Feature selection in medical domains
Datasets in the medical domain often contain many features, not all of which are useful for training a classifier. This research aims to find the best subset of features for each of five medical datasets, namely Heart Disease, Parkinson, Thoracic Surgery, Diabetic Retinopathy and Breast Cancer, all of which are publicly available in the UCI repository. We use Filter and Wrapper feature selection methods, which are scheme-independent and scheme-specific, respectively. In the Wrapper approach, we applied different classification algorithms, such as Support Vector Machine (SVM), Naïve Bayes, Decision Tree, Random Forest and K-Nearest Neighbor (KNN), for both feature selection and classification. Afterwards, the same classifiers were applied to each dataset restricted to its selected features. The goal of this study is a comparative analysis of the effect of feature selection on classifier accuracy across the five datasets, which vary in data types and dimensionality.
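To make the Filter versus Wrapper distinction concrete, the following is a minimal sketch in plain Python. It is not the procedure used in this study: the synthetic data, the correlation-based filter score, the greedy forward search, the single hold-out split, and all function names (`make_toy_data`, `filter_select`, `wrapper_select`, and so on) are assumptions chosen for illustration. It only shows that a filter ranks features without a classifier, whereas a wrapper scores candidate subsets by the accuracy of a specific classifier (here a small KNN).

```python
import random
from statistics import mean


def make_toy_data(n=120, seed=0):
    """Synthetic stand-in data (assumption): features 0-1 carry signal, 2-4 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append([
            2.0 * label + rng.gauss(0, 0.5),
            -1.5 * label + rng.gauss(0, 0.5),
            rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1),
        ])
        y.append(label)
    return X, y


def abs_correlation(col, y):
    """Absolute Pearson correlation between one feature column and the labels."""
    mx, my = mean(col), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(col, y))
    sx = sum((a - mx) ** 2 for a in col) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return abs(cov / (sx * sy)) if sx and sy else 0.0


def filter_select(X, y, k):
    """Filter (scheme-independent): rank features by a statistic; no classifier is used."""
    scores = [abs_correlation([row[j] for row in X], y) for j in range(len(X[0]))]
    ranked = sorted(range(len(scores)), key=lambda j: -scores[j])
    return sorted(ranked[:k])


def knn_predict(train_X, train_y, x, feats, k=3):
    """Majority vote of the k nearest training rows, using only the features in feats."""
    dists = sorted(
        (sum((row[j] - x[j]) ** 2 for j in feats), lab)
        for row, lab in zip(train_X, train_y)
    )
    votes = [lab for _, lab in dists[:k]]
    return 1 if 2 * sum(votes) > len(votes) else 0


def holdout_accuracy(X, y, feats, split=0.7):
    """Accuracy of KNN on a held-out tail of the data, restricted to feats."""
    cut = int(len(X) * split)
    hits = sum(
        knn_predict(X[:cut], y[:cut], x, feats) == lab
        for x, lab in zip(X[cut:], y[cut:])
    )
    return hits / (len(X) - cut)


def wrapper_select(X, y):
    """Wrapper (scheme-specific): greedy forward search scored by the classifier itself."""
    remaining = list(range(len(X[0])))
    chosen, best = [], 0.0
    while remaining:
        scored = [(holdout_accuracy(X, y, chosen + [j]), j) for j in remaining]
        acc, j = max(scored, key=lambda t: t[0])
        if acc <= best:  # stop when no candidate improves accuracy
            break
        chosen.append(j)
        remaining.remove(j)
        best = acc
    return sorted(chosen), best


X, y = make_toy_data()
filter_feats = filter_select(X, y, k=2)
wrapper_feats, wrapper_acc = wrapper_select(X, y)
print("filter subset:", filter_feats)
print("wrapper subset:", wrapper_feats, "hold-out accuracy:", round(wrapper_acc, 3))
```

In this sketch the wrapper reuses one hold-out split for both selecting and scoring the subset, which tends to give optimistic accuracy; a more careful comparison would separate the two, for example with nested cross-validation.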