Classification analysis in microarray data using biological pathway and gene family information
Abstract
In this thesis, we conducted a classification analysis of microarray gene expression data. Principal Component Analysis (PCA) was applied to datasets organized by biological pathway and gene family information. Principal components were selected in two ways: by a 1-stage PCA that uses only pathway information, and by a 2-stage PCA that uses both pathway and gene family information. Support Vector Machine (SVM) classification was then performed on the principal component scores produced by each of the two PCA procedures. The results suggest that SVM classification based on either the 1-stage or the 2-stage PCA has good predictive power and a reasonably low error rate.
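The abstract does not spell out how the two PCA procedures are built, so the following is only a minimal sketch under stated assumptions, not the thesis's implementation. It assumes NumPy, represents each pathway as a list of gene (column) indices and each gene family as a set of indices, and reads "2-stage" as: PCA within each gene family's members of a pathway, then a second PCA on that pathway's family-level scores. The helper names (`fit_one_stage`, `fit_two_stage`, `train_linear_svm`, etc.), the synthetic data, and the pathway/family assignments are all hypothetical. A plain linear SVM trained by subgradient descent on the hinge loss stands in for whatever SVM implementation and kernel the thesis actually used.

```python
import numpy as np

def pca_fit(X, k):
    """Column means and top-k principal directions of X (via SVD)."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k]

def pca_transform(X, model):
    """Project X onto the stored principal directions to get component scores."""
    mu, V = model
    return (X - mu) @ V.T

def fit_one_stage(X, pathways, k=1):
    # One PCA per pathway; each pathway is a list of gene (column) indices.
    return [(idx, pca_fit(X[:, idx], k)) for idx in pathways]

def transform_one_stage(X, models):
    return np.hstack([pca_transform(X[:, idx], m) for idx, m in models])

def fit_two_stage(X, pathways, families, k=1):
    # Assumed reading of "2-stage": PCA within each (pathway, family) block,
    # then PCA on that pathway's family-level scores.
    # Genes that belong to no family are ignored in this sketch.
    models = []
    for idx in pathways:
        blocks = [[g for g in idx if g in fam] for fam in families]
        stage1 = [(blk, pca_fit(X[:, blk], k)) for blk in blocks if blk]
        Z = np.hstack([pca_transform(X[:, blk], m) for blk, m in stage1])
        models.append((stage1, pca_fit(Z, k)))
    return models

def transform_two_stage(X, models):
    out = []
    for stage1, m2 in models:
        Z = np.hstack([pca_transform(X[:, blk], m) for blk, m in stage1])
        out.append(pca_transform(Z, m2))
    return np.hstack(out)

def train_linear_svm(Z, y, lam=0.01, lr=0.1, epochs=500):
    """Linear SVM: full-batch subgradient descent on
    lam/2 * ||w||^2 + mean hinge loss, with labels y in {-1, +1}."""
    n, d = Z.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        viol = y * (Z @ w + b) < 1  # points inside the margin or misclassified
        w = w - lr * (lam * w - Z[viol].T @ y[viol] / n)
        b = b + lr * y[viol].sum() / n
    return w, b

def predict(Z, w, b):
    return np.where(Z @ w + b >= 0, 1, -1)

# Synthetic demo: 12 genes, 2 pathways, 2 families; only pathway 0 carries signal.
rng = np.random.default_rng(0)
y = np.repeat([-1.0, 1.0], 30)
rng.shuffle(y)
X = rng.normal(size=(60, 12))
X[:, :6] += 2.0 * y[:, None]
pathways = [list(range(0, 6)), list(range(6, 12))]
families = [{0, 1, 2, 6, 7, 8}, {3, 4, 5, 9, 10, 11}]
tr, te = slice(0, 40), slice(40, 60)  # PCA and SVM are fit on training rows only

m1 = fit_one_stage(X[tr], pathways)
w, b = train_linear_svm(transform_one_stage(X[tr], m1), y[tr])
acc_one = np.mean(predict(transform_one_stage(X[te], m1), w, b) == y[te])

m2 = fit_two_stage(X[tr], pathways, families)
w, b = train_linear_svm(transform_two_stage(X[tr], m2), y[tr])
acc_two = np.mean(predict(transform_two_stage(X[te], m2), w, b) == y[te])
print(f"1-stage test accuracy: {acc_one:.2f}, 2-stage: {acc_two:.2f}")
```

Fitting the PCA loadings on the training rows and reusing them to project the held-out rows keeps the test error rate honest; recomputing PCA on the full dataset before splitting would leak test information into the features.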