Signal and Data Processing

Phrase-Boundary Translation Model Using Shallow Syntactic Labels
Section: Text processing articles. Article type: Research paper. Language: Persian (fa).

The phrase-boundary model for statistical machine translation labels rules with the classes of the boundary words of target-side phrases in the training corpus. In this paper, we extend the phrase-boundary model with shallow syntactic labels, namely POS tags and chunk labels. Giving priority to chunk labels, the proposed model names non-terminals with the shallow syntactic labels found at the boundaries of the target-side phrases. Unlike the base phrase-boundary model, our variant uses phrase labels in addition to word classes: if a boundary has no chunk label, the labeler falls back to the POS tag of the boundary word. When no single label covers the whole target span, the two boundary labels are concatenated. By using chunks as phrase labels, the proposed model generalizes the rules and so reduces model sparseness. Sparseness matters more for language pairs with large word-order differences, because fewer aligned phrase pairs are available for rule extraction; likewise, the greater the word-order difference, the fewer aligned phrases match constituents of a syntactic parse tree.
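The labeling rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `label_span`, the `+` separator, the inclusive word indices, the `(start, end, label)` chunk tuples, and the exact test for when a chunk counts as lying on a boundary are all hypothetical choices made for the example.

```python
from typing import List, Tuple

# A chunk is (start, end, label) with inclusive word indices (assumed format).
Chunk = Tuple[int, int, str]


def label_span(start: int, end: int, pos_tags: List[str],
               chunks: List[Chunk], sep: str = "+") -> str:
    """Return a shallow syntactic label for the target span [start, end]."""
    # 1. A chunk that covers the span exactly labels the whole span.
    for c_start, c_end, c_label in chunks:
        if (c_start, c_end) == (start, end):
            return c_label

    # 2. A single word that is not a chunk by itself gets its POS tag.
    if start == end:
        return pos_tags[start]

    # 3. Otherwise label each boundary, preferring a chunk label and
    #    falling back to the POS tag of the boundary word.
    left = next(
        (lab for s, e, lab in chunks if s == start and e <= end),
        pos_tags[start],
    )
    right = next(
        (lab for s, e, lab in chunks if e == end and s >= start),
        pos_tags[end],
    )

    # 4. No label covers the whole span, so concatenate the boundary labels.
    return left + sep + right


# Toy target sentence: "the big house is very nice" (illustrative data only).
pos_tags = ["DT", "JJ", "NN", "VBZ", "RB", "JJ"]
chunks = [(0, 2, "NP"), (3, 3, "VP"), (4, 5, "ADJP")]

for span in [(0, 2), (0, 3), (1, 3), (1, 1), (2, 4)]:
    print(span, label_span(span[0], span[1], pos_tags, chunks))
```

On this toy sentence, the span covering "the big house" receives the chunk label NP, the span "the big house is" receives the concatenation NP+VP, and the span "big house is" starts inside a chunk, so its left boundary falls back to the POS tag and the result is JJ+VP.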
Compared with Syntax Augmented Machine Translation (SAMT), which labels rules using the syntax trees of the target-side sentences, the proposed model does not need deep syntactic parsing. It is therefore applicable even to low-resource languages that have no syntactic parser. We ran translation experiments from Persian and German into English, language pairs whose source and target word orders differ considerably. In these experiments, our model improved on a variant of SAMT by about 0.5 BLEU points.

Keywords: Statistical machine translation, Hierarchical models, Word tag, Chunk label
Pages: 115-126
URL: http://jsdp.rcisp.ac.ir/browse.php?a_code=A-10-1013-1&slc_lang=fa&sid=1
Authors:
Shahram Salami (shsalami@gmail.com), Shahid Beheshti University, corresponding author
Mehrnoush Shamsfard (m-shams@sbu.ac.ir), Shahid Beheshti University