Jun 26, 2018
This is a very good point: any feature selection (or feature engineering) you apply to a dataset should be thoroughly evaluated with cross-validation to see whether it actually improves performance. Machine learning is still largely an empirical field, which means it’s nearly impossible to tell ahead of time what effect a particular choice will have on model performance. The only way to find out for sure is to try a technique and then test it!
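
As a rough illustration of what "try it and then test it" can look like, here is a minimal sketch using scikit-learn. The synthetic dataset, the choice of `SelectKBest` with `k=5`, and the logistic regression model are all assumptions made for the example, not a recommendation. The part that matters is that the selection step sits inside the pipeline, so it is refit on each training fold and the held-out fold never influences which features are kept.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption for illustration): 30 features, 5 informative.
X, y = make_classification(
    n_samples=500,
    n_features=30,
    n_informative=5,
    n_redundant=0,
    random_state=0,
)

# Baseline: the model trained on every feature.
baseline = make_pipeline(
    StandardScaler(),
    LogisticRegression(max_iter=1000),
)

# Candidate: same model, but with univariate feature selection first.
# Keeping the selector inside the pipeline means it is fit only on the
# training portion of each fold.
with_selection = make_pipeline(
    StandardScaler(),
    SelectKBest(score_func=f_classif, k=5),
    LogisticRegression(max_iter=1000),
)

# Score both pipelines with the same 5-fold cross-validation.
baseline_scores = cross_val_score(baseline, X, y, cv=5, scoring="accuracy")
selection_scores = cross_val_score(with_selection, X, y, cv=5, scoring="accuracy")

print(f"all features:   {baseline_scores.mean():.3f} +/- {baseline_scores.std():.3f}")
print(f"top 5 features: {selection_scores.mean():.3f} +/- {selection_scores.std():.3f}")
```

Whether the selected-feature pipeline comes out ahead depends on the data and the model, which is exactly the point: the comparison has to be run rather than assumed. If the two means differ by less than their fold-to-fold spread, the selection step probably isn't doing much either way.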