Volume 20, Issue 3 (12-2023) | JSDP 2023, 20(3): 103-126

Rabiei Zadeh A, Amirkhani H. A survey on short text similarity measurement methods. JSDP 2023; 20 (3) : 8
URL: http://jsdp.rcisp.ac.ir/article-1-1307-en.html
AI Laboratory of Computer Research Center of Islamic Science (Noor)
Abstract:
Measuring the similarity between two text snippets is an essential task in many NLP problems, and it remains one of the most challenging tasks in the field. Various methods have been proposed to measure text similarity. This survey reviews more than 150 related papers, introduces a comprehensive taxonomy with three main categories, and discusses the advantages and disadvantages of these methods. The first category is lexical methods, which focus only on the surface similarity of a text pair. These methods treat the text as a sequence of characters, tokens, or a mixture of the two. Some recent studies use deep learning techniques to detect lexical similarity in the alias detection task. The second category is semantic methods, which take the meaning of words into account, either through pre-built knowledge bases such as WordNet or through corpus-based methods. Some recent studies use modern deep learning techniques such as transformers and Siamese networks to create document embeddings that outperform other methods. The final category is hybrid methods, which combine the other approaches and, in some cases, also use syntactic parsing. Note, however, that high-quality syntactic parsers are not available for many languages, and using them can negatively affect performance and speed.
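To make the lexical category concrete, here is a minimal Python sketch of surface similarity computed over tokens, over characters, and over a mixture of the two. It is an illustrative example under stated assumptions, not a method taken from the survey: it assumes a naive lowercase whitespace tokenizer, character trigrams, Jaccard overlap as the set similarity, and a hypothetical `weight` parameter that blends the token-level and character-level scores.

```python
def tokens(text: str) -> set[str]:
    # Naive tokenizer (assumption): lowercase, then split on whitespace.
    return set(text.lower().split())


def char_ngrams(text: str, n: int = 3) -> set[str]:
    # Set of overlapping character n-grams (spaces included) of the lowercased text.
    s = text.lower()
    if len(s) < n:
        return {s} if s else set()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    # |A ∩ B| / |A ∪ B|; two empty sets are treated as identical.
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def lexical_similarity(t1: str, t2: str, n: int = 3, weight: float = 0.5) -> float:
    # Hypothetical "mixture" score: weighted average of token-level
    # and character-level Jaccard similarity.
    token_score = jaccard(tokens(t1), tokens(t2))
    char_score = jaccard(char_ngrams(t1, n), char_ngrams(t2, n))
    return weight * token_score + (1 - weight) * char_score


if __name__ == "__main__":
    print(round(lexical_similarity("the cat sat", "the cat sits"), 3))  # 0.542
```

For "the cat sat" and "the cat sits", the token sets share 2 of 4 distinct tokens (0.5), while the character trigram sets share 7 of 12 distinct trigrams (about 0.583). This shows why character-level views are more tolerant of inflection and spelling variation. Because such scores look only at the surface form, they cannot recognize synonyms or paraphrases, which is the gap the semantic category addresses.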
Article number: 8
Type of Study: Research | Subject: Paper
Received: 2022/04/20 | Accepted: 2023/02/22 | Published: 2024/01/14 | ePublished: 2024/01/14

Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

© 2015 All Rights Reserved | Signal and Data Processing