Salmonella serotyping and source prediction using whole genome sequencing data
In this study, we used whole genome sequencing (WGS) and bioinformatics approaches to address challenges in Salmonella serotyping and microbial source tracking (MST). First, we developed a web-based bioinformatics tool for predicting Salmonella serotypes (SeqSero, http://www.denglab.info/SeqSero) that identifies Salmonella antigen determinants (e.g., the rfb gene cluster and the wzx, wzy, fliC and fljB genes) from WGS data. Based on our curated databases of these determinants, SeqSero can theoretically cover almost the full spectrum of Salmonella serotypes. Three datasets were used to evaluate the performance of SeqSero: 1) raw reads from 308 genomes of Salmonella isolates whose serotypes were confirmed by CDC; 2) raw reads from 3,306 genomes of Salmonella isolates sequenced and made publicly available by GenomeTrakr, a U.S. national monitoring network operated by the Food and Drug Administration; and 3) 354 other publicly available draft or complete Salmonella genomes from NCBI. The evaluation showed that SeqSero can predict serotypes from WGS data reliably, with high accuracy and speed. Second, we analyzed the population structure of a broad-host-range pathogen, Salmonella enterica serovar Typhimurium (ST). A total of 1,267 ST genomes from clinical and various animal, food and environmental sources were included to identify population groups (major phylogenetic lineages) and clades (smaller phylogenetic groups within a population group), as well as their association with particular sources. A maximum likelihood phylogenetic tree was constructed from whole genome SNPs (wgSNPs), and 10 major population groups were identified. Clustering of isolates from the same source was observed in 6 population groups, including clusters in which isolates from poultry, bovine, swine, and wild bird sources were overrepresented. Analyses of evolutionary relationships, metabolic profiles, gene content and pseudogene distribution provided further support for the source-cluster association.
The observed source association demonstrated the potential of WGS-based subtyping for MST of ST. This potential was initially evaluated using two different sets of genomes, one of which was drawn from publicly available genomes in the FDA GenomeTrakr database.
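
As an illustration of what "overrepresented" can mean for a source within a phylogenetic cluster, the following minimal sketch computes an upper-tail hypergeometric probability. The abstract does not state which statistical test was used in the study, so the choice of test is an assumption. The function name enrichment_p_value and all counts other than the 1,267 total genomes are hypothetical.

```python
from math import comb


def enrichment_p_value(total, source_total, cluster_size, source_in_cluster):
    """Upper-tail hypergeometric probability P(X >= source_in_cluster).

    total             -- number of genomes in the whole collection
    source_total      -- genomes in the collection from the source of interest
    cluster_size      -- genomes in the cluster being examined
    source_in_cluster -- genomes in that cluster from the source of interest
    """
    upper = min(cluster_size, source_total)
    # Count the ways to pick a cluster containing at least the observed
    # number of isolates from the source, then divide by all possible clusters.
    favourable = sum(
        comb(source_total, i) * comb(total - source_total, cluster_size - i)
        for i in range(source_in_cluster, upper + 1)
    )
    return favourable / comb(total, cluster_size)


# Hypothetical counts for illustration only: 200 poultry isolates among
# 1,267 genomes, and a cluster of 80 genomes containing 40 poultry isolates.
p_value = enrichment_p_value(total=1267, source_total=200,
                             cluster_size=80, source_in_cluster=40)
print(f"P(at least 40 poultry isolates by chance) = {p_value:.3g}")
```

A small probability indicates that the cluster contains more isolates from the source than random sampling from the collection would be expected to produce. In practice such a test would be repeated per cluster and per source, with a correction for multiple comparisons.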