Volume 21, Issue 1 (6-2024)                   JSDP 2024, 21(1): 71-88





Mottaghi M S, Minaei-Bidgoli B. A Graph-based Algorithm for Clustering Qur’anic Surahs. JSDP 2024; 21 (1) : 6
URL: http://jsdp.rcisp.ac.ir/article-1-1220-en.html
Shahid Beheshti University
Abstract:

The Holy Qur'an was revealed by God Almighty, and many scholars and researchers have tried to understand and interpret it. Computer systems give these researchers an opportunity to speed up their work and reach further. Clustering is one of the methods used to understand the structure of data: samples are divided into groups so that members of each cluster are similar to one another and different from members of other clusters. Clustering of Qur'anic surahs has been the subject of several computational studies of the Qur'an, which differ in how they vectorize the surahs. Thabet formed a vector for each surah by taking selected stems of Qur'anic words as features and the normalized probability of their occurrence in the surah as feature values; because the resulting data matrix was sparse, he clustered only 24 surahs. Using a similar vectorization, Moisl applied concepts from statistical sampling theory to calculate a minimum surah-length threshold per feature, which addressed the problem of shorter surahs and allowed more surahs to be clustered. Instead of using words as features, Sharaf considered 13 features, such as whether a surah refers to the story of Adam and Iblis and the number of occurrences of the phrase «یا أَیُّهَا الَّذینَ آمَنُوا» (O you who believe), and specified how each feature is measured; he then formed a data matrix and clustered the surahs. In another study, Sufi et al. took the topics identified for each verse in Tafsir Rahnama as features, built a binary data matrix recording whether each topic appears in the commentary on each surah, and clustered the surahs. In this article, we cluster the surahs of the Holy Qur'an based on the co-occurrence of words within them, using an existing graph-based approach.
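Two of the earlier encodings described above turn each surah into a fixed-length vector. A minimal Python sketch follows, assuming that Thabet's "normalized probability" is simply a stem's count divided by the surah's length (his exact normalization may differ) and that Sufi et al.'s matrix is a plain 1/0 presence flag per topic. The function names and the toy stems and topics are hypothetical and are not Qur'anic data.

```python
from collections import Counter


def relative_frequency_vectors(surahs, features):
    """Encode each surah (a list of stems) as the relative frequency of each
    feature stem in that surah: count / surah length (assumed normalization)."""
    vectors = {}
    for name, stems in surahs.items():
        counts = Counter(stems)
        total = len(stems)
        # Empty surahs get an all-zero vector instead of dividing by zero.
        vectors[name] = [counts[f] / total if total else 0.0 for f in features]
    return vectors


def binary_presence_matrix(surah_topics, topics):
    """Encode each surah as 1/0 flags: is each topic recorded for it?"""
    return {
        name: [1 if topic in found else 0 for topic in topics]
        for name, found in surah_topics.items()
    }


# Toy data: hypothetical transliterated stems and topic labels.
surahs = {"S1": ["qwl", "rbb", "qwl", "Elm"], "S2": ["rbb"]}
print(relative_frequency_vectors(surahs, ["qwl", "rbb", "Elm"]))
# {'S1': [0.5, 0.25, 0.25], 'S2': [0.0, 1.0, 0.0]}
print(binary_presence_matrix({"S1": {"prayer"}, "S2": set()}, ["prayer", "charity"]))
# {'S1': [1, 0], 'S2': [0, 0]}
```

A short surah contains few of the chosen stems, so most of its entries are zero; with many features this yields the sparse matrix that limited Thabet's analysis to 24 surahs.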
In the present study, we first represent each surah as a weighted undirected graph. We then form a vector for each surah by taking closed frequent subgraphs as features and their relative occurrence in the surah as feature values, and finally cluster the surahs. We used the silhouette score to evaluate clustering quality; the best clustering among different runs achieved a silhouette score of 0.91. This research provides a structural infrastructure for specifying the semantic layer of the Holy Qur'an's surahs, intended for computational linguistics researchers working in Qur'anic studies.
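The first and last steps of this pipeline can be sketched in Python under assumptions the abstract does not settle. `cooccurrence_graph` and its `window` parameter are hypothetical: it links two distinct words whenever they appear within a sliding window of tokens and counts how often that happens, which is one common way to build a weighted undirected co-occurrence graph but not necessarily the paper's construction. `silhouette_score` follows Rousseeuw's definition with Euclidean distance; the distance measure used in the study may differ. The middle step, mining closed frequent subgraphs (for example with gSpan or Takigawa and Mamitsuka's method, both in the reference list) and computing their relative occurrence per surah, is not reproduced here.

```python
import math
from collections import defaultdict


def cooccurrence_graph(tokens, window=2):
    """Weighted undirected graph as {(u, v): weight} with u < v.

    Two distinct words are linked whenever they fall within `window`
    consecutive tokens; the weight counts how often that happens.
    `window` is a hypothetical parameter, not taken from the paper."""
    weights = defaultdict(int)
    for i, u in enumerate(tokens):
        for v in tokens[i + 1:i + window]:
            if u != v:  # ignore self-loops from repeated words
                weights[tuple(sorted((u, v)))] += 1
    return dict(weights)


def silhouette_score(points, labels):
    """Mean silhouette (Rousseeuw, 1987) with Euclidean distance:
    s(i) = (b - a) / max(a, b), where a is the mean distance to the rest of
    i's own cluster and b the smallest mean distance to another cluster."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    n = len(points)

    def mean_dist(i, members):
        return sum(math.dist(points[i], points[j]) for j in members) / len(members)

    total = 0.0
    for i in range(n):
        own = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not own:
            continue  # singleton cluster: s(i) = 0 by convention
        a = mean_dist(i, own)
        b = min(
            mean_dist(i, [j for j in range(n) if labels[j] == c])
            for c in clusters
            if c != labels[i]
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / n


print(cooccurrence_graph(["a", "b", "a", "c"]))
# {('a', 'b'): 2, ('a', 'c'): 1}
print(round(silhouette_score([(0, 0), (0, 1), (10, 0), (10, 1)], [0, 0, 1, 1]), 3))
# 0.9
```

A score near 1, such as the 0.91 reported above, means that points lie much closer to their own cluster than to the nearest other cluster; the two well-separated toy pairs score about 0.9 for the same reason.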

Article number: 6
Type of Study: Research | Subject: Paper
Received: 2021/04/10 | Accepted: 2022/09/24 | Published: 2024/08/3 | ePublished: 2024/08/3

References
1. A. Aslani and M. Esmaeili, "Finding frequent patterns in the Holy Qur'an using text-mining methods," Signal and Data Processing, vol. 15, no. 3, pp. 89-100, 1397 SH (in Persian). Available: https://jsdp.rcisp.ac.ir/. [Accessed: 19 Ordibehesht 1403 SH].
2. A. Aslani and M. Esmaeili, "Finding Frequent Patterns in Holy Quran Using Text Mining," JSDP, vol. 15, no. 3, pp. 89-100, 2018. Available: https://jsdp.rcisp.ac.ir/. [Accessed: May 10, 2024]. [DOI:10.29252/jsdp.15.3.89]
3. F. Sadegh-Zamani and M. H. Zarghami, "Determining the structure of relations between family members based on personality traits," Educational Measurement, no. 33, pp. 189-208, 1397 SH (in Persian). Available: http://journals.atu.ac.ir. [Accessed: 5 Esfand 1399 SH].
4. F. Sadegh-Zamani and M. H. Zarghami, "Determining structure of relations between family members based on personality characteristics," Quarterly of Educational Measurement, no. 33, pp. 189-208, 2018. Available: http://journals.atu.ac.ir. [Accessed: Feb. 24, 2021].
5. M. Sufi, A. R. Ali-Ahmadi, H. Ali-Ahmadi, and B. Minaei, "Clustering the surahs of the Qur'an with data-mining techniques," Approaches in Qur'anic and Hadith Sciences, vol. 50, pp. 103-120, 1397 SH (in Persian). Available: https://jquran.um.ac.ir. [Accessed: 5 Esfand 1399 SH].
6. M. Sufi, A. R. Ali-Ahmadi, H. Ali-Ahmadi, and B. Minaei-Bidgoli, "Clustering of Qur'anic Surahs Using Data Mining Techniques," New Approaches in Quran and Hadith Studies, vol. 50, pp. 103-120, 2018-2019. Available: https://jquran.um.ac.ir. [Accessed: Feb. 24, 2021].
7. B. Gholizadeh, Discrete Structures, 31st printing, Tehran: Sharif University Scientific Publications Institute, 1391 SH (in Persian).
8. B. Gholizadeh, Discrete Mathematics, Tehran: Sharif University Press, 2012-2013.
9. A. A. Hashemi Rafsanjani and a group of researchers at the Center for Qur'anic Culture and Teachings, Tafsir Rahnama, 3rd printing, Qom: Bustan-e Ketab, 1379 SH (in Persian).
10. A. Hashemi-Rafsanjani, and a group of researchers from Quranic Science and Culture Center, Tafsir Rahnama, Qom: Bustan Ketab Publisher, 2000-2001.
11. E. Aftab and M. K. Malik, "eRock at Qur'an QA 2022: contemporary deep neural networks for Qur'an based reading comprehension question answers," in Proceedings of the 5th Workshop on Open-Source Arabic Corpora and Processing Tools with Shared Tasks on Qur'an QA and Fine-Grained Hate Speech Detection, 2022, pp. 96-103. Available: https://aclanthology.org/2022.osact-1.11.pdf. [Accessed: May 10, 2024].
12. M. E. Aktas and E. Akbas, "Text classification via network topology: A case study on the Holy Quran," in 2019 18th IEEE International Conference on Machine Learning and Applications (ICMLA), Dec. 16, 2019, pp. 1557-1562. Available: http://www.math.uco.edu. [Accessed: Feb. 24, 2021]. [DOI:10.1109/ICMLA.2019.00257]
13. E. Alpaydin, Introduction to Machine Learning, 2nd ed., Cambridge, MA: MIT Press, 2010. Available: https://books.google.com. [Accessed: Feb. 24, 2021].
14. E. Atwell, N. Habash, B. Louw, B. Abu Shawar, T. McEnery, W. Zaghouani, and M. El-Haj, "Understanding the Quran: A new grand challenge for computer science and artificial intelligence," ACM-BCS Visions of Computer Science 2010, 2010. Available: https://eprints.lancs.ac.uk. [Accessed: Feb. 25, 2021].
15. C. M. Bishop, Pattern Recognition and Machine Learning, New York: Springer-Verlag, 2006. Available: http://users.isr.ist.utl.pt/. [Accessed: Feb. 25, 2021].
16. K. Dukes and N. Habash, "Morphological Annotation of Quranic Arabic," in Proc. LREC, 2010. Available: http://citeseerx.ist.psu.edu/. [Accessed: Feb. 25, 2021].
17. K. Dukes, "Quranic Arabic Corpus," May 1, 2011. [Online]. Available: http://corpus.quran.com/. [Accessed: Feb. 24, 2021].
18. M. H. Elahimanesh, B. Minaei-Bidgoli, M. J. Gholami, and H. Juzi, "An Introduction to Noor Corpus and its Language Model," First International Conference on Persian Language Processing (ICPLP), Semnan University, 2012. Available: researchgate.net. [Accessed: Feb. 25, 2021].
19. H. Veeramani, S. Thapa, and U. Naseem, "LowResContextQA at Qur'an QA 2023 Shared Task: Temporal and Sequential Representation Augmented Question Answering Span Detection in Arabic," in Proceedings of ArabicNLP 2023, 2023, pp. 708-713. Available: https://aclanthology.org/2023.arabicnlp-1.78. [Accessed: May 10, 2024]. [DOI:10.18653/v1/2023.arabicnlp-1.78]
20. A. Lim, "The berth planning problem," Operations Research Letters, vol. 22, pp. 105-110, March 1998. Available: https://www.sciencedirect.com/. [Accessed: Feb. 25, 2021]. [DOI:10.1016/S0167-6377(98)00010-8]
21. R. Malhas and T. Elsayed, "Arabic machine reading comprehension on the Holy Qur'an using CL-AraBERT," Information Processing & Management, vol. 59, no. 6, 2022. Available: https://doi.org/10.1016/j.ipm.2022.103068. [Accessed: May 10, 2024]. [DOI:10.1016/j.ipm.2022.103068]
22. G. Mediamer, "Semantic Feature Analysis for Multi-Label Text Classification on Topics of the Al-Quran Verses," Journal of Information Processing Systems, vol. 20, no. 1, 2024. Available: https://jips-k.org/digital-library/2024/20/1/1. [Accessed: May 10, 2024].
23. Y. Mellah, I. Touahri, Z. Kaddari, Z. Haja, J. Berrich, and T. Bouchentouf, "LARSA22 at Qur'an QA 2022: text-to-text transformer for finding answers to questions from Qur'an," in Proceedings of the 5th Workshop on Open-Source Arabic Corpora and Processing Tools with Shared Tasks on Qur'an QA and Fine-Grained Hate Speech Detection, 2022, pp. 112-119. Available: https://aclanthology.org/2022.osact-1.13. [Accessed: May 10, 2024].
24. M. Mohammed, S. Amin, and M. M. Aref, "An English Islamic articles dataset (EIAD) for developing an IslamBot question answering chatbot," in 2022 5th International Conference on Computing and Informatics (ICCI), 2022, pp. 303-309. Available: https://ieeexplore.ieee.org/abstract/document/9756122/. [Accessed: May 10, 2024]. [DOI:10.1109/ICCI54321.2022.9756122]
25. H. Moisl, "Sura Length and Lexical Probability Estimation in Cluster Analysis of the Qur'an," ACM Transactions on Asian Language Information Processing (TALIP), vol. 8, pp. 1-19, 2009. Available: https://eprints.ncl.ac.uk/. [Accessed: Feb. 25, 2021]. [DOI:10.1145/1644879.1644886]
26. A. B. Muhammad, "Annotation of conceptual co-reference and text mining the Qur'an," Ph.D. dissertation, School of Computing, University of Leeds, UK, 2012. Available: http://etheses.whiterose.ac.uk/. [Accessed: Feb. 25, 2021].
27. C. Nicolini, C. Bordier, and A. Bifone, "Community detection in weighted brain connectivity networks beyond the resolution limit," Neuroimage, vol. 146, pp. 28-39, 2017. Available: researchgate.net. [Accessed: Feb. 25, 2021]. [DOI:10.1016/j.neuroimage.2016.11.026]
28. F. Rousseau, E. Kiagias, and M. Vazirgiannis, "Text categorization as a graph classification problem," In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, Beijing, China, 26-31 July 2015, pp. 1702-1712. Available: https://www.aclweb.org/. [Accessed: Feb. 25, 2021]. [DOI:10.3115/v1/P15-1164]
29. P. J. Rousseeuw, "Silhouettes: a graphical aid to the interpretation and validation of cluster analysis," Journal of Computational and Applied Mathematics, vol. 20, pp. 53-65, 1987. Available: https://www.sciencedirect.com/. [Accessed: Feb. 25, 2021]. [DOI:10.1016/0377-0427(87)90125-7]
30. A. B. M. Sharaf, and E. Atwell, "QurAna: Corpus of the Quran annotated with Pronominal Anaphora," LREC, 2012, pp. 130-137. Available: http://citeseerx.ist.psu.edu/. [Accessed: Feb. 25, 2021].
31. A. B. M. Sharaf, and E. Atwell, "QurSim: A corpus for evaluation of relatedness in short texts," LREC, 2012, pp. 2295-2302. Available: http://textminingthequran.com/. [Accessed: Feb. 25, 2021].
32. S. A. Shirkhorshidi, S. Aghabozorgi, and T. Y. Wah, "A comparison study on similarity and dissimilarity measures in clustering continuous data," PLoS ONE, vol. 10, 2015. Available: researchgate.net. [Accessed: Feb. 25, 2021]. [DOI:10.1371/journal.pone.0144059]
33. M. A. Siddiqui, S. M. Faraz, and S. A. Sattar, "Discovering the thematic structure of the Quran using probabilistic topic model," in 2013 Taibah University International Conference on Advances in Information Technology for the Holy Quran and Its Sciences, IEEE, 2013, pp. 234-239. Available: researchgate.net. [Accessed: Feb. 25, 2021]. [DOI:10.1109/NOORIC.2013.55]
34. I. Takigawa and H. Mamitsuka, "Efficiently mining δ-tolerance closed frequent subgraphs," Machine Learning, vol. 82, pp. 95-121. Available: https://link.springer.com. [Accessed: Feb. 25, 2021]. [DOI:10.1007/s10994-010-5215-6]
35. P. N. Tan, M. Steinbach, A. Karpatne, and V. Kumar, Introduction to Data Mining, 2nd ed., New York: Pearson Education, 2018.
36. N. Thabet, "Understanding the thematic structure of the Qur'an: an exploratory multivariate approach," in Proceedings of the ACL Student Research Workshop, 2005, pp. 7-12. Available: https://www.aclweb.org/. [Accessed: Feb. 25, 2021]. [DOI:10.3115/1628960.1628963]
37. X. Yan and J. Han, "gSpan: Graph-based substructure pattern mining," in 2002 IEEE International Conference on Data Mining, 2002, pp. 721-724. Available: https://sites.cs.ucsb.edu/. [Accessed: Feb. 25, 2021].
38. R. Zafarani, M. A. Abbasi, and H. Liu, Social Media Mining: An Introduction, London: Cambridge University Press, 2014. Available: http://citeseerx.ist.psu.edu/. [Accessed: Feb. 25, 2021]. [DOI:10.1017/CBO9781139088510]



Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

© 2015 All Rights Reserved | Signal and Data Processing