A framework for ontology-based question answering with application to parasite immunology
Asiaee, Amir H
Tarleton, Rick L
MetadataShow full item record
Abstract Background Large quantities of biomedical data are being produced at a rapid pace for a variety of organisms. With ontologies proliferating, data is increasingly being stored using the RDF data model and queried using RDF based querying languages. While existing systems facilitate the querying in various ways, the scientist must map the question in his or her mind to the interface used by the systems. The field of natural language processing has long investigated the challenges of designing natural language based retrieval systems. Recent efforts seek to bring the ability to pose natural language questions to RDF data querying systems while leveraging the associated ontologies. These analyze the input question and extract triples (subject, relationship, object), if possible, mapping them to RDF triples in the data. However, in the biomedical context, relationships between entities are not always explicit in the question and these are often complex involving many intermediate concepts. Results We present a new framework, OntoNLQA, for querying RDF data annotated using ontologies which allows posing questions in natural language. OntoNLQA offers five steps in order to answer natural language questions. In comparison to previous systems, OntoNLQA differs in how some of the methods are realized. In particular, it introduces a novel approach for discovering the sophisticated semantic associations that may exist between the key terms of a natural language question, in order to build an intuitive query and retrieve precise answers. We apply this framework to the context of parasite immunology data, leading to a system called AskCuebee that allows parasitologists to pose genomic, proteomic and pathway questions in natural language related to the parasite, Trypanosoma cruzi. We separately evaluate the accuracy of each component of OntoNLQA as implemented in AskCuebee and the accuracy of the whole system. AskCuebee answers 68 % of the questions in a corpus of 125 questions, and 60 % of the questions in a new previously unseen corpus. If we allow simple corrections by the scientists, this proportion increases to 92 %. Conclusions We introduce a novel framework for question answering and apply it to parasite immunology data. Evaluations of translating the questions to RDF triple queries by combining machine learning, lexical similarity matching with ontology classes, properties and instances for specificity, and discovering associations between them demonstrate that the approach performs well and improves on previous systems. Subsequently, OntoNLQA offers a viable framework for building question answering systems in other biomedical domains.