Evolutionary instance resampling for difficult data sets
Richardson, William Dale
In machine learning, data set properties such as class imbalance and overlap often pose difficulties for classification algorithms. A number of methods alleviate these difficulties by adjusting the distribution of the training data before the classifier is constructed. This adjustment, known as resampling, is typically effected by weighting, removing, or duplicating instances, but finding a good resampling for a given data set is a nontrivial problem. Genetic algorithms are frequently used to search for solutions in large, difficult search spaces. In this thesis, four evolutionary approaches are applied to the problem of instance resampling across a variety of data sets and classifier paradigms. In many cases, evolutionary pre-processing produces better classifiers. In particular, an integer-based, one-to-one representation and a cluster-based, real-valued weighting encoding are shown to improve classifier performance on difficult data sets.
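As an illustration of the general idea only, the following is a minimal sketch of evolutionary instance resampling and is not the method evaluated in the thesis. It assumes one reading of an integer-based, one-to-one encoding: each gene holds the number of copies of one training instance (0 removes it, values above 1 duplicate it). The nearest-centroid classifier, the balanced-accuracy fitness, the toy data generator, and all function names and parameter values are assumptions chosen to keep the example self-contained.

```python
import random

def make_toy(rng, n_major, n_minor):
    # Hypothetical imbalanced, overlapping 2-D data: two Gaussian blobs.
    X = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n_major)]
    X += [(rng.gauss(1.5, 1.0), rng.gauss(1.5, 1.0)) for _ in range(n_minor)]
    y = [0] * n_major + [1] * n_minor
    return X, y

def nearest_centroid_fit(X, y, counts):
    # counts[i] is the number of copies of instance i in the resampled set.
    sums, totals = {}, {}
    for xi, yi, c in zip(X, y, counts):
        if c == 0:
            continue
        s = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            s[j] += c * v
        totals[yi] = totals.get(yi, 0) + c
    return {k: [v / totals[k] for v in s] for k, s in sums.items()}

def predict(centroids, x):
    # Assign x to the class with the closest centroid (squared distance).
    return min(centroids,
               key=lambda k: sum((a - b) ** 2 for a, b in zip(centroids[k], x)))

def balanced_accuracy(centroids, X, y):
    # Mean per-class recall, so the minority class counts as much as the majority.
    recalls = []
    for cls in sorted(set(y)):
        idx = [i for i, yi in enumerate(y) if yi == cls]
        hits = sum(1 for i in idx if predict(centroids, X[i]) == cls)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)

def fitness(counts, X_tr, y_tr, X_val, y_val):
    centroids = nearest_centroid_fit(X_tr, y_tr, counts)
    if not centroids:  # every instance was removed
        return 0.0
    return balanced_accuracy(centroids, X_val, y_val)

def evolve(X_tr, y_tr, X_val, y_val, pop_size=20, generations=30,
           max_copies=3, mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    n = len(X_tr)

    def score(ind):
        return fitness(ind, X_tr, y_tr, X_val, y_val)

    # Include the unmodified data set (one copy each) in the initial population.
    pop = [[1] * n]
    pop += [[rng.randint(0, max_copies) for _ in range(n)]
            for _ in range(pop_size - 1)]
    for _ in range(generations):
        ranked = sorted(pop, key=score, reverse=True)
        next_pop = ranked[:2]  # elitism: carry the two best over unchanged
        while len(next_pop) < pop_size:
            a = max(rng.sample(ranked, 3), key=score)  # tournament selection
            b = max(rng.sample(ranked, 3), key=score)
            cut = rng.randrange(1, n)                  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [rng.randint(0, max_copies) if rng.random() < mutation_rate
                     else g for g in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=score)
    return best, score(best)

if __name__ == "__main__":
    rng = random.Random(1)
    X_tr, y_tr = make_toy(rng, 40, 8)
    X_val, y_val = make_toy(rng, 40, 8)
    baseline = fitness([1] * len(X_tr), X_tr, y_tr, X_val, y_val)
    best, best_score = evolve(X_tr, y_tr, X_val, y_val)
    print("baseline:", baseline, "evolved:", best_score)
```

Because the unmodified data set is seeded into the population and the best individuals are carried over each generation, the evolved resampling in this sketch never scores below the baseline on the validation data. A fair comparison would still need a separate test set, since the fitness function has already seen the validation data.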