Topic-driven document summarization using ontology knowledge
Abstract
Topic-driven summarization aims at extracting a summary from a single document or a
document collection based on a given topic. This thesis presents an extractive, ontology-based approach to topic-driven summarization. We use an ontology built from the information present in Wikipedia. Given a document and one or more topic terms or phrases, our summarization system generates a topic-related summary. Producing a good summary that contains information related to the topic requires understanding the topic, so we first expand the initial topic using the Wikipedia ontology. The document is represented as a graph using entities from the Wikipedia ontology. A spreading activation algorithm is then applied to find all nodes in the document graph that are semantically related to the expanded topic terms. The algorithm assigns a weight to each node, which determines the node's relative importance. These weights are used to decide which sentences should be included in the extractive summary.
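The weighting and selection steps described above can be illustrated with a minimal sketch. It is not the system presented in this thesis: the abstract does not state the activation update rule, decay factor, firing threshold, or sentence scoring function, so everything below is an assumption made for illustration. This includes the names `spread_activation` and `select_sentences`, the parameters `decay`, `threshold`, and `max_steps`, the fire-once policy, and the choice to score a sentence by summing the weights of its entities.

```python
from collections import defaultdict


def spread_activation(graph, seeds, decay=0.5, threshold=0.1, max_steps=3):
    """Assign a weight to each node reachable from the seed (topic) nodes.

    graph: dict mapping node -> dict of neighbor -> edge weight in [0, 1].
    seeds: expanded topic terms that appear as nodes in the document graph.
    All parameter values are illustrative assumptions, not values from the thesis.
    """
    activation = defaultdict(float)
    for node in seeds:
        if node in graph:
            activation[node] = 1.0  # topic nodes start fully activated

    frontier = set(activation)
    fired = set()  # each node spreads its activation at most once

    for _ in range(max_steps):
        incoming = defaultdict(float)
        for node in frontier:
            if node in fired or activation[node] < threshold:
                continue
            fired.add(node)
            for neighbor, weight in graph.get(node, {}).items():
                # Activation is attenuated by the edge weight and a global decay.
                incoming[neighbor] += activation[node] * weight * decay

        next_frontier = set()
        for node, value in incoming.items():
            activation[node] = min(1.0, activation[node] + value)
            if node not in fired:
                next_frontier.add(node)

        if not next_frontier:
            break
        frontier = next_frontier

    return dict(activation)


def select_sentences(sentence_entities, weights, k=1):
    """Return the indices of the k highest-scoring sentences in document order.

    sentence_entities: one list of entity names per sentence.
    A sentence's score is the sum of the weights of its distinct entities
    (an assumed scoring rule).
    """
    scores = [
        sum(weights.get(entity, 0.0) for entity in set(entities))
        for entities in sentence_entities
    ]
    # Rank by descending score, breaking ties by sentence position.
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return sorted(ranked[:k])
```

In this sketch, nodes that are never reached from a topic node receive no weight, so sentences that mention only such entities score zero and are the last to be selected.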