This research approaches authorship attribution through Topological Data Analysis (TDA), using it to uncover structural patterns in classical Persian poetry. The study focuses on the works of Ferdowsi and Hafez, two preeminent Persian poets, and explores the latent structures in their verses with two TDA methods: Persistent Homology and Mapper. The distinction between non-semantic and semantic authorship attribution methods lays the groundwork, showing why capturing structural nuances in textual data matters. The main focus is Persistent Homology, a technique that goes beyond traditional text analysis methods. It operates in high-dimensional spaces, extracts topological features, and renders them interpretable through persistence diagrams. The paper sets out the mathematical underpinnings of Persistent Homology and gives a stepwise exposition of its application, covering homology, simplicial complexes, and group theory. Together, these foundations make it possible to extract meaningful topological signatures from the poetic corpus. Mapper, the second TDA tool, plays a complementary role: it combines dimensionality reduction with simplicial complex construction to depict the intrinsic topological structure of the dataset. Mapper's workflow, from filter function selection to binning and clustering, is described in detail. Moving from theory to practice, the research adopts a case study approach, applying TDA to the poems of Ferdowsi and Hafez.
Beyond applying the algorithms, the study assesses accuracy, testing the Mapper algorithm and measuring how precisely it classifies poems within the identified clusters. The research also incorporates semantic clustering to bring out thematic resonances in the verses. The results underscore the efficacy of TDA methods in revealing the structure of Persian poetry and offer a nuanced perspective on their interpretability and utility for authorship attribution. With its semantic richness and structural subtleties, poetry proves fertile ground for TDA and extends the reach of text classification methods. This research therefore contributes to the evolving discourse at the intersection of literature and data science, showing how TDA can serve as a lens for deciphering authorial expression.
Type of Study:
Applicable |
Subject:
Paper Received: 2023/09/5 | Accepted: 2025/06/11 | Published: 2025/12/19 | ePublished: 2025/12/19