Using traveling salesman problem algorithms to determine multiple sequence alignment orders
MetadataShow full item record
Multiple Sequence Alignment (MSA) is one of the most important tools in modern biology. The MSA problem is NP-hard, therefore, heuristic approaches are needed to align a large set of data within a reasonable time. Among existing heuristic approaches, CLUSTALW has been found to be the progressive alignment program that provides the best quality alignments, while the program POA provides very fast alignments. In this thesis, a new MSA algorithm is implemented and tested extensively. We use a Traveling Salesman Problem (TSP) algorithm to determine a circular tour in which the sequences are aligned. Sequences are aligned progressively by repeatedly merging the closest nodes along the TSP tour. Quality assessment of our algorithm, TspMsa, with CLUSTALW and POA was conducted using the BAliBASE benchmarks. It is found that TspMsa provides alignments which are similar to those from CLUSTALW in most test cases. Both programs give alignments which are significantly better than those from POA. For alignments of large sets of sequences, TspMsa and POA run in considerably shorter execution times than CLUSTALW.