High-dimensional data analysis is a major challenge for engineers and researchers in data mining and Machine Learning (ML). Feature selection offers a simple yet effective way to address this challenge by eliminating redundant and irrelevant data. Removing irrelevant data improves learning accuracy, reduces computation time, and makes the learning model and the data easier to understand. When developing an ML model in practice, it is common that not all the variables in the dataset are useful. Adding redundant variables reduces the model's ability to generalize and may also lower the overall accuracy of a classifier. In addition, every extra variable makes the resulting model more complex.

Feature Selection

In statistics and Machine Learning, feature selection (also known as variable selection, attribute selection, or variable subset selection) is the practice of choosing a subset of relevant features (variables or predictors) for use in model construction. It is the automatic selection of the attributes in the data (such as columns in tabular data) that are most relevant to the predictive modeling problem at hand.

Feature Selection and Dimensionality Reduction

Feature selection is different from dimensionality reduction. Both reduce the number of attributes in the dataset; however, dimensionality reduction does so by creating new combinations of attributes, whereas feature selection includes or excludes attributes already present in the data without modifying them. Examples of dimensionality reduction methods are principal component analysis, singular value decomposition, and Sammon’s mapping.
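
The distinction can be illustrated with a minimal sketch in plain Python. The toy dataset, the function names select_features and project_features, and the hand-picked weights are all assumptions made for illustration; a real dimensionality reduction method such as principal component analysis would learn its weights from the data rather than take them as given.

```python
# Toy dataset: 4 samples, 3 features (hypothetical values for illustration).
rows = [
    [1.0, 10.0, 0.5],
    [2.0, 20.0, 0.4],
    [3.0, 30.0, 0.6],
    [4.0, 40.0, 0.5],
]


def select_features(rows, keep):
    """Feature selection: keep a subset of the original columns, unchanged."""
    return [[row[j] for j in keep] for row in rows]


def project_features(rows, weights):
    """Linear dimensionality reduction: every new column is a weighted
    combination of all original columns. The weights are fixed by hand here
    (an assumption); a method such as PCA would learn them from the data."""
    return [
        [sum(w * x for w, x in zip(ws, row)) for ws in weights]
        for row in rows
    ]


# Both results have 2 columns instead of 3, but only the first one
# still contains values that appear in the original data.
selected = select_features(rows, keep=[0, 2])
projected = project_features(rows, weights=[[0.5, 0.5, 0.0], [0.0, 0.1, 1.0]])

print("selected :", selected[0])
print("projected:", projected[0])
```

In the selected output each column can still be traced back to a named attribute of the dataset, while each projected column mixes several attributes and no longer corresponds to any single one.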

Objective of Feature Selection

The objective of feature selection in ML is to identify the best set of features for building useful models of the subject under study. Depending on whether the data is labeled, feature selection methods in Machine Learning can be classified into the following categories:

  • Supervised methods: These methods are used for labeled data. They identify the features that are relevant for increasing the efficiency of supervised models, such as classification and regression models.
  • Unsupervised methods: These methods are used for unlabeled data.
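
As a rough illustration of the two categories, the sketch below scores features once without labels and once with them. The variance threshold and the Pearson correlation are only two of many possible scoring criteria, chosen here as assumptions because they are easy to compute by hand; the toy data and the function names are hypothetical as well.

```python
from statistics import pvariance

# Toy data: 4 samples, 3 features. Feature 1 is constant.
X = [
    [1.0, 5.0, 0.2],
    [2.0, 5.0, 0.9],
    [3.0, 5.0, 0.1],
    [4.0, 5.0, 0.8],
]
y = [0, 0, 1, 1]  # labels, used only by the supervised score


def column(X, j):
    """Return feature j as a flat list."""
    return [row[j] for row in X]


def variance_filter(X, threshold=0.0):
    """Unsupervised: keep features whose variance exceeds the threshold.
    No labels are needed."""
    return [j for j in range(len(X[0])) if pvariance(column(X, j)) > threshold]


def pearson(a, b):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / (var_a * var_b) ** 0.5


def correlation_ranking(X, y):
    """Supervised: rank features by absolute correlation with the labels,
    strongest first."""
    scores = {j: abs(pearson(column(X, j), y)) for j in range(len(X[0]))}
    return sorted(scores, key=scores.get, reverse=True)


kept = variance_filter(X)
ranking = correlation_ranking(X, y)
print("kept without labels:", kept)
print("ranking with labels:", ranking)
```

The unsupervised score can only tell that the constant feature carries no information, whereas the supervised score can also tell which of the remaining features tracks the label more closely.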


From a taxonomic point of view, feature selection methods can be classified as:

  • Filter methods: These methods rely on intrinsic properties of the features, measured through univariate statistics, rather than on cross-validation performance. They are faster and computationally cheaper than wrapper methods, which makes them a practical choice for high-dimensional data.
  • Wrapper methods: Wrappers require a method for searching the space of possible feature subsets. Each candidate subset is evaluated by training a classifier on it and measuring the resulting performance. The selection process is therefore tied to the particular ML algorithm being fit to the dataset. Wrapper methods usually provide better predictive accuracy than filter methods.
  • Embedded methods: These methods combine the advantages of filter and wrapper methods: they account for interactions between features while keeping the computational cost reasonable.
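
The wrapper idea can be sketched as a greedy forward search. Everything specific in this example is an assumption made to keep it short: the nearest-centroid classifier, the leave-one-out accuracy used as the quality measure, the toy data, and the function names. Any other learner, search strategy, or validation scheme could take their place.

```python
def nearest_centroid_predict(train_X, train_y, x):
    """Predict the label whose class centroid is closest to x."""
    centroids = {}
    for label in set(train_y):
        members = [r for r, t in zip(train_X, train_y) if t == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]

    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))

    # sorted() makes tie-breaking between labels deterministic.
    return min(sorted(centroids), key=lambda lab: dist(centroids[lab]))


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of the classifier on the given feature subset."""
    hits = 0
    for i in range(len(X)):
        train_X = [[r[j] for j in features] for k, r in enumerate(X) if k != i]
        train_y = [t for k, t in enumerate(y) if k != i]
        test_x = [X[i][j] for j in features]
        hits += nearest_centroid_predict(train_X, train_y, test_x) == y[i]
    return hits / len(X)


def forward_selection(X, y, max_features):
    """Greedy wrapper: repeatedly add the feature that most improves the
    classifier's score, and stop when no candidate improves it."""
    selected, best_score = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        scored = [(loo_accuracy(X, y, selected + [j]), j) for j in remaining]
        score, j = max(scored, key=lambda s: s[0])
        if score <= best_score:
            break
        selected.append(j)
        remaining.remove(j)
        best_score = score
    return selected, best_score


# Toy data: feature 0 separates the classes; features 1 and 2 are noise.
X = [
    [1.0, 7.0, 3.0],
    [1.2, 2.0, 9.0],
    [0.8, 5.0, 1.0],
    [3.0, 6.0, 2.0],
    [3.2, 1.0, 8.0],
    [2.8, 4.0, 4.0],
]
y = [0, 0, 0, 1, 1, 1]

selected, score = forward_selection(X, y, max_features=2)
print("selected features:", selected, "accuracy:", score)
```

Note that the classifier is retrained for every candidate subset and every held-out sample. That repeated training is the source of the extra computational cost of wrapper methods compared with filter methods, which score each feature once.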


The advantages of feature selection can be summed up as:

  • Decreases overfitting: Less redundant data means fewer opportunities to make decisions based on noise.
  • Reduces training time: Less data means that algorithms train faster.
  • Improves accuracy: Less misleading data means better modeling accuracy.