HIV classification using DNA sequences
Many phylogenetic methods for HIV classification analyze a collection of whole-genome sequences to classify a new virus. Because whole-genome analysis is computationally slow, methods that analyze only selected genomic regions were developed. However, phylogenetic analyses based on complete genomes are more reliable than those based on short segments of the HIV genome. We propose a new phylogenetic classification method based on coalescent theory. We choose the best-fitting model for each gene segment of the sequences using the Akaike information criterion (AIC). We then use maximum likelihood estimation to infer a phylogenetic tree and K-means clustering to classify the sequences. This approach takes advantage of the whole genome while also recognizing the uniqueness of each gene region. Using information from the entire genome, we observed a significant improvement in HIV strain classification. We tested the method on 150 sequences sampled from the Los Alamos HIV database and obtained 100% subtyping accuracy.
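The abstract names three steps: per-segment model selection by AIC, maximum likelihood tree inference, and K-means clustering. The sketch below illustrates only the first and last steps in plain Python. Tree inference is normally delegated to a dedicated phylogenetics program and is omitted here. The sketch assumes each gene segment has already been fitted under several candidate models, yielding a maximized log-likelihood and a count of free parameters. The segment names, model names, and numbers are hypothetical placeholders, not results from this work. The abstract also does not say which features K-means clusters, so each sequence is represented by a hypothetical numeric vector (for example, coordinates derived from pairwise tree distances).

```python
from typing import Dict, List, Sequence, Tuple


def aic(log_likelihood: float, num_params: int) -> float:
    """Akaike information criterion: AIC = 2k - 2 ln L (lower is better)."""
    return 2 * num_params - 2 * log_likelihood


def select_models(fits: Dict[str, Dict[str, Tuple[float, int]]]) -> Dict[str, str]:
    """For each gene segment, pick the candidate model with the lowest AIC.

    fits maps segment name -> {model name -> (max log-likelihood, free parameters)}.
    """
    return {
        segment: min(candidates, key=lambda m: aic(*candidates[m]))
        for segment, candidates in fits.items()
    }


def kmeans(points: List[Sequence[float]], k: int, iters: int = 100) -> List[int]:
    """Plain Lloyd's K-means; centroids start at the first k points (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical fits: (maximized log-likelihood, number of free model parameters).
EXAMPLE_FITS = {
    "gag": {"JC69": (-5230.4, 0), "HKY85": (-5105.2, 4), "GTR+G": (-5090.1, 9)},
    "env": {"JC69": (-8120.7, 0), "HKY85": (-8119.9, 4), "GTR+G": (-8118.0, 9)},
}

# Hypothetical 2-D feature vectors for six sequences forming two groups.
EXAMPLE_POINTS = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.2), (4.9, 5.0)]

if __name__ == "__main__":
    print(select_models(EXAMPLE_FITS))  # {'gag': 'GTR+G', 'env': 'JC69'}
    print(kmeans(EXAMPLE_POINTS, k=2))  # [0, 1, 0, 1, 0, 1]
```

With these made-up numbers, the richer GTR+G model wins for `gag`, where it raises the likelihood substantially, while the simplest model wins for `env`, where extra parameters barely improve the fit. This is the sense in which per-segment selection can "recognize the uniqueness of every gene region". Seeding centroids with the first k points keeps the example deterministic. A practical implementation would more likely use random restarts or k-means++ seeding. Turning cluster labels into HIV subtypes would additionally require reference sequences of known subtype, which this sketch does not model.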