Comparison of phosphorylation patterns across eukaryotes by discriminative N-gram analysis
MetadataShow full item record
Abstract Background How protein phosphorylation relates to kingdom/phylum divergence is largely unknown and the amino acid residues surrounding the phosphorylation site have profound importance on protein kinase–substrate interactions. Standard motif analysis is not adequate for large scale comparative analysis because each phophopeptide is assigned to a unique motif and perform poorly with the unbalanced nature of the input datasets. Results First the discriminative n-grams of five species from five different kingdom/phyla were identified. A signature with 5540 discriminative n-grams that could be found in other species from the same kingdoms/phyla was created. Using a test data set, the ability of the signature to classify species in their corresponding kingdom/phylum was confirmed using classification methods. Lastly, ortholog proteins among proteins with n-grams were identified in order to determine to what degree was the identity of the detected n-grams a property of phosphosites rather than a consequence of species-specific or kingdom/phylum-specific protein inventory. The motifs were grouped in clusters of equal physico-chemical nature and their distribution was similar between species in the same kingdom/phylum while clear differences were found among species of different kingdom/phylum. For example, the animal-specific top discriminative n-grams contained many basic amino acids and the plant-specific motifs were mainly acidic. Secondary structure prediction methods show that the discriminative n-grams in the majority of the cases lack from a regular secondary structure as on average they had 88 % of random coil compared to 66 % found in the phosphoproteins they were derived from. Conclusions The discriminative n-grams were able to classify organisms in their corresponding kingdom/phylum, they show different patterns among species of different kingdom/phylum and these regions can contribute to evolutionary divergence as they are in disordered regions that can evolve rapidly. The differences found possibly reflect group-specific differences in the kinomes of the different groups of species.