Semantic web topic models
We are currently coping with a plethora of text (more than 80% of the web) generated and disseminated globally, and thanks to new technologies such as smart devices and social networks it keeps growing exponentially every day. This tremendous amount of mostly unstructured text is easy for humans to process and comprehend, but significantly hard for machines to understand. Needless to say, this volume of text is an invaluable source of information and knowledge. There is therefore an increasing need for methods and algorithms that effectively process this sheer volume of text and extract high-quality information automatically.

Probabilistic topic models are a class of latent variable models for textual data that can produce interpretable summaries of documents in the form of their constituent topics. However, because topic models are entirely unsupervised, the topics they create are not always meaningful and understandable to humans. In this dissertation, we develop novel topic models that combine probabilistic topic modeling with domain knowledge in the form of ontologies within a single framework. These models effectively enhance the topic modeling process and produce topics that are better aligned with user modeling goals.

We first describe the ontology-based topic model, OntoLDA, in which a document is a mixture of topics, topics are distributions over ontology concepts, and concepts are multinomial distributions over words. We demonstrate the utility of this model for automatically generating topic labels. We next propose the sOntoLDA topic model, which incorporates the DBpedia ontology into topic modeling, and apply it to semantic tagging of web documents. For all these models, we develop learning algorithms and demonstrate their usefulness through experiments on real-world datasets.
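The generative story implied by the OntoLDA description above (documents as mixtures of topics, topics as distributions over ontology concepts, concepts as multinomials over words) can be sketched as follows. This is a minimal illustrative sketch, not the dissertation's implementation: all dimensions, hyperparameter values, and variable names here are assumptions, and the Dirichlet priors mirror standard LDA conventions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (illustrative only, not from the dissertation)
V, C, K, doc_len = 50, 8, 3, 20   # vocabulary, concepts, topics, words per document
alpha, delta, beta = 0.1, 0.1, 0.01  # assumed symmetric Dirichlet hyperparameters

# Topics are distributions over ontology concepts,
# and concepts are multinomial distributions over words.
topic_concept = rng.dirichlet(np.full(C, delta), size=K)   # K x C
concept_word = rng.dirichlet(np.full(V, beta), size=C)     # C x V

def generate_document():
    """Sample one document under the sketched OntoLDA generative process."""
    theta = rng.dirichlet(np.full(K, alpha))       # per-document topic mixture
    words = []
    for _ in range(doc_len):
        z = rng.choice(K, p=theta)                 # draw a topic
        c = rng.choice(C, p=topic_concept[z])      # draw an ontology concept given the topic
        w = rng.choice(V, p=concept_word[c])       # draw a word given the concept
        words.append(int(w))
    return words

doc = generate_document()
```

The extra concept layer between topics and words is what lets inference tie topics to ontology concepts, which in turn supports tasks such as topic labeling and semantic tagging.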