Multilingual text similarity analysis in Islamic texts
Abstract
Text similarity measures have been widely studied and used in machine learning and information retrieval for many years. We present a framework that applies different text similarity measures to the problem of text similarity across multilingual representations of the Qur’an and the Hadith. For the Qur’an, we compare and contrast the effect of applying five similarity measures across four representations of the Qur’an, and we analyze the results along two classes: identical verse pairs and similar verse pairs. For the Hadith, we apply the same methodology to the larger body of text that the Hadith comprises, and we use multithreading to speed up the similarity computations. We compare and contrast the application of the similarity measures across the English and Arabic representations. Based on the results of our text similarity analysis, we propose interlinking Hadiths with similar semantic content by investigating the equivalence classes that result from applying different similarity thresholds.
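The pipeline summarized above (pairwise similarity over a text collection, parallelized across threads, then grouping at a threshold) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: bag-of-words cosine similarity stands in for any one of the five measures, which the abstract does not name; "equivalence classes" are assumed to be the transitive closure of the "similarity >= threshold" relation, computed with union-find, since a raw threshold relation is not itself transitive; and the function names and toy sentences are hypothetical. Note that in CPython, threads give limited speed-up for pure-Python CPU-bound scoring because of the GIL, so a process pool or vectorized library may be preferable in practice.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity of term-frequency vectors (lowercased whitespace tokens).

    Assumption: a stand-in for one of the paper's (unnamed) similarity measures.
    """
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def pairwise_similarities(texts, workers=4):
    """Score every unordered pair (i, j) with i < j, spreading work over a thread pool."""
    pairs = list(combinations(range(len(texts)), 2))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = pool.map(lambda p: cosine_similarity(texts[p[0]], texts[p[1]]), pairs)
        return dict(zip(pairs, scores))


def equivalence_classes(n, scores, threshold):
    """Group items linked by similarity >= threshold, closed under transitivity.

    Assumption (hypothetical): classes are connected components found with union-find.
    """
    parent = list(range(n))

    def find(x):
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), s in scores.items():
        if s >= threshold:
            parent[find(i)] = find(j)

    classes = {}
    for i in range(n):
        classes.setdefault(find(i), []).append(i)
    return sorted(classes.values())


if __name__ == "__main__":
    # Hypothetical toy sentences, not Qur'an or Hadith data.
    texts = [
        "the cat sat on the mat",
        "the cat sat on a mat",
        "dogs bark loudly",
        "a dog barks loudly",
    ]
    scores = pairwise_similarities(texts)
    for t in (0.5, 0.25, 0.2):
        print(t, equivalence_classes(len(texts), scores, t))
```

Lowering the threshold merges classes: with these toy sentences, 0.5 yields `[[0, 1], [2], [3]]`, 0.25 yields `[[0, 1], [2, 3]]`, and 0.2 merges everything into `[[0, 1, 2, 3]]`, which illustrates why the choice of threshold matters when interlinking texts.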