Multiple sequence alignment using traveling salesman problem algorithms
MetadataShow full item record
The construction of multiple sequence alignments (MSAs) is a fundamental problem in computational biology, but computing an optimal MSA is NP-hard. It is therefore necessary to develop heuristics to come as close to the optimum as possible but operate within reasonable time and space bounds. In this thesis, an approach pioneered by Korostensky and Gonnet is developed and tested extensively. The idea is to apply a Traveling Salesman Problem (TSP) algorithm to find an optimal circular order to build an MSA progressively. The sum-of-pairs (SP) score is used for computing pairwise alignments and evaluating the quality of an MSA. Using the reference alignments from the benchmark alignment database BALISBASE, the performance of my program, tspMSA, was evaluated extensively with more than 60 reference data sets. Except for one test case, the SP scores obtained from the TSP algorithm are significant better than the scores obtained from two popular progressive alignment programs, PILEUP and CLUSTALW. The SP scores of MSAs built from different starting points and different directions of an optimal circular order are studied.