Signal and Data Processing

Language: fa
Title: A real-world spell checker using context-sensitive features
Persian title (translated): A ranker for semantic spell checkers using context-sensitive features
Section: Text processing papers
Type: Research paper

Abstract: Nowadays, a large volume of electronic documents is generated every day. Because these documents are written by many different people, they contain spelling errors, and these errors lower the quality of the documents. Automatic writing-assistance tools such as spell checkers and correctors can therefore help improve their quality. Context-sensitive (real-word) errors are misspellings that have been wrongly converted into another valid word of the language. As the name suggests, detecting and correcting them requires analyzing the information in the surrounding text, that is, discourse analysis.
In this paper, we propose a language-independent, discourse-aware discriminative ranker that uses information from the whole document and a log-linear model for ranking. To evaluate the method, we add it to two baseline context-sensitive spell checkers, one based on statistical machine translation (SMT) and the other on a language model, and we test it on two different Persian test sets. The proposed method improves detection and correction recall by about 17% over the SMT-based baseline.

Keywords: Spell checker, context-sensitive error, statistical machine translation, context-aware ranking
Pages: 3-14
URL: http://jsdp.rcisp.ac.ir/browse.php?a_code=A-10-483-1&slc_lang=fa&sid=1

Authors:
Behzad Mirzababaei (بهزاد میرزابابایی), b.mirzababaei@ut.ac.ir, University of Tehran (دانشگاه تهران), ID 10031947532846002155, corresponding author: No
Heshaam Faili (هشام فیلی), hfaili@ut.ac.ir, University of Tehran (دانشگاه تهران), ID 10031947532846002156, corresponding author: Yes
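
The abstract describes ranking correction candidates with a log-linear model over context-sensitive features drawn from the whole document. The following is only a minimal sketch of that general idea, not the authors' implementation: the feature names (`lm_logprob`, `doc_count`, `is_original`), the hand-set weights, the language-model scores, and the example sentence are all hypothetical and chosen for illustration.

```python
import math

# Hypothetical weights; a real system would learn these from training data.
WEIGHTS = {"lm_logprob": 1.0, "doc_count": 0.5, "is_original": 0.3}


def features(candidate, original, document_tokens, lm_logprob):
    """Build an illustrative feature vector for one correction candidate."""
    return {
        # Local evidence: an assumed language-model log-probability.
        "lm_logprob": lm_logprob,
        # Document-level evidence: how often the candidate occurs in the text.
        "doc_count": float(document_tokens.count(candidate)),
        # Mild bias toward keeping the word the writer actually typed.
        "is_original": 1.0 if candidate == original else 0.0,
    }


def rank(candidates, original, document_tokens, weights=WEIGHTS):
    """Rank candidates with a log-linear model: p(c) = exp(w . f(c)) / Z.

    `candidates` maps each candidate word to its (assumed) LM log-probability.
    Returns (candidate, probability) pairs, best first.
    """
    scores = {}
    for cand, lm_logprob in candidates.items():
        feats = features(cand, original, document_tokens, lm_logprob)
        scores[cand] = sum(weights[name] * value for name, value in feats.items())

    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores.values())
    z = sum(math.exp(s - top) for s in scores.values())
    probs = {cand: math.exp(s - top) / z for cand, s in scores.items()}
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)


# Toy document in which "piece" is a real-word error for "peace".
doc = "the peace talks stalled but a lasting piece agreement would bring peace".split()
ranked = rank({"piece": -6.0, "peace": -5.5}, original="piece", document_tokens=doc)
```

In this toy setup the document-level count lets "peace" outrank the typed word "piece", which is the kind of whole-document evidence the abstract refers to; the actual feature set and training procedure are not given in this record.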