Signal and Data Processing

A Model for Detecting of Persian Rumors based on the Analysis of Contextual Features in the Content of Social Networks
(Persian title, translated: A model for detecting Persian rumors based on the analysis of content features in social network text)

Section: Text processing papers. Article type: Research.

A rumor is a collective effort in which the power of words is used to interpret an ambiguous but appealing situation; identifying the language of rumor can therefore help in detecting rumors. Previous research on rumor detection has focused mostly on the textual information in users' retweets and reply tweets and less on the original text of the rumor. Most of this research addresses English, and only limited work has been done on Persian. This paper therefore focuses solely on the original text of Persian rumors, introduces features with high content-information value, and presents a model based on physical and non-physical content features for detecting Persian rumors published on Twitter and Telegram. The proposed model identifies Persian rumors in the Twitter dataset with an F-measure of 0.848, rumors in the Kermanshah earthquake dataset with an F-measure of 0.952, and Telegram rumors with an F-measure of 0.867, which shows that the model can identify rumors by focusing only on the content features of the source rumor text.

A rumor is a collective attempt to interpret a vague but attractive situation by using the power of words. Identifying the language of rumor can therefore help in detecting rumors. To address the rumor detection problem, previous research has focused more on the contextual information in reply tweets and less on the content features of the original rumor. Most studies have addressed English, and only limited work has been done on detecting rumors in Persian. This study analyzes the content of the original rumor and introduces informative content features for the early identification of Persian rumors (i.e., when a rumor has been published on news media but has not yet spread on social media) on Twitter and Telegram.
Therefore, the proposed model is based on physical and non-physical content features in three categories: lexical, syntactic, and pragmatic. These features combine common content features with newly proposed content-based features. Since no social context information is available at the time a rumor is posted, the proposed model is independent of propagation-based features and relies on the content of the original rumor. Although the proposed model ignores much information (including user information, users' reactions to the rumor, and propagation structures), content analysis of the original rumor still yields information that is useful for classification. Several experiments were performed on various combinations of feature sets (i.e., common and proposed content features) to explore how well the features distinguish rumors from non-rumors, separately and jointly. To this end, three machine learning algorithms, Random Forest (RF), AdaBoost, and Support Vector Machine (SVM), were used as strong classifiers to evaluate the accuracy of the proposed model. Achieving the best performance from the classification algorithms on the training dataset requires feature selection. In this study, the Sequential Forward Floating Search (SFFS) approach was used to select valuable features. In addition, t-test results (p-value <= 0.05) show that most of the new features proposed in this study differ significantly between rumor and non-rumor documents. The experimental results show that the newly proposed features improve the accuracy of rumor detection.
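As a rough illustration of how a Sequential Forward Floating Search proceeds, the following Python sketch adds the best feature at each step and then "floats" backward, dropping a feature whenever that produces a better subset of the smaller size than any seen so far. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `sffs`, the `score` callback, and the toy score table are hypothetical. In a setting like the one described above, `score` might be the cross-validated F-measure of a classifier trained on the candidate feature subset.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def sffs(features: Sequence[str],
         score: Callable[[List[str]], float],
         k: int) -> Tuple[List[str], float]:
    """Return the best size-k subset found by SFFS and its score.

    `score` is a hypothetical callback that rates a candidate subset
    (higher is better).
    """
    features = list(features)
    if not 1 <= k <= len(features):
        raise ValueError("k must be between 1 and the number of features")

    selected: List[str] = []
    best: Dict[int, Tuple[float, List[str]]] = {}  # size -> (score, subset)

    while len(selected) < k:
        # Forward step: add the single feature that helps the most.
        candidates = [f for f in features if f not in selected]
        add = max(candidates, key=lambda f: score(selected + [f]))
        selected = selected + [add]
        s = score(selected)
        if len(selected) not in best or s > best[len(selected)][0]:
            best[len(selected)] = (s, list(selected))

        # Floating backward step: drop features while doing so beats the
        # best subset of the smaller size recorded so far.
        while len(selected) > 2:
            drop = max(selected,
                       key=lambda f: score([g for g in selected if g != f]))
            reduced = [g for g in selected if g != drop]
            s_reduced = score(reduced)
            if s_reduced > best[len(reduced)][0]:
                selected = reduced
                best[len(reduced)] = (s_reduced, list(reduced))
            else:
                break

    return best[k][1], best[k][0]


# Hypothetical toy scores for four features "a".."d" (illustration only).
TOY_SCORES = {frozenset(name): value for name, value in {
    "a": 0.60, "b": 0.50, "c": 0.40, "d": 0.30,
    "ab": 0.70, "ac": 0.65, "ad": 0.62, "bc": 0.75, "bd": 0.60, "cd": 0.50,
    "abc": 0.72, "abd": 0.71, "acd": 0.70, "bcd": 0.80,
}.items()}


def toy_score(subset: List[str]) -> float:
    return TOY_SCORES[frozenset(subset)]


subset, value = sffs("abcd", toy_score, 3)
print(sorted(subset), value)
```

With these toy scores, a purely greedy forward search would stop at the subset a, b, c. The floating step instead drops a after c is added (because b, c scores higher than a, b) and then reaches b, c, d, which is the kind of correction SFFS is meant to allow.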
The F-measure of the proposed model for detecting Persian rumors was 0.848 on the Twitter dataset, 0.952 on the Kermanshah earthquake dataset, and 0.867 on the Telegram dataset, which indicates that the proposed method can identify rumors by focusing only on the content features of the original rumor text. The evaluation on Twitter rumors shows that, despite the short length of tweets and the limited content information that can be extracted from them, the proposed model detects Twitter rumors with acceptable accuracy. Hence, the results demonstrate the ability of content features to distinguish rumors from non-rumors.

Keywords: Persian rumors detection, Content analysis, Physical and non-physical content features, Text processing

50 29
http://jsdp.rcisp.ac.ir/browse.php?a_code=A-10-1862-1&slc_lang=fa&sid=1

Zoleikha Jahanbakhsh-Nagadeh (zoleikha.jahanbakhsh@srbiau.ac.ir), Department of Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran.
Mohammad-Reza Feizi-Derakhshi (mfeizi@tabrizu.ac.ir), Department of Computer Engineering, University of Tabriz, Tabriz, Iran.
Arash Sharifi (a.sharifi@srbiau.ac.ir), Department of Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran.