Signal and Data Processing

Search published articles

Showing 3 results for News

Ijaz: An Operational system for single-document summarization of Persian news texts

Asef Pour Masoomi, Mohsen Kahani, Seyyed Ahmad Toosi, Ahmad Estiri
Volume 11, Issue 1 (9-2014)

Abstract

The rapid growth of documents published on the web has created new demands for processing, classification, and information retrieval, and the use of natural language processing tools has increased worldwide as a result. Automatic summarization is at the core of a wide range of text-processing tools, such as decision systems, accountability systems, and search engines, and has long been studied as an important problem in computer science. This paper introduces "Ijaz", a text summarization system for Persian documents. We first review related work in this field, especially on Persian text summarization. We then investigate several new features that improve the proposed summarizer. In addition, for the first time, the proposed method is evaluated on a large corpus with standardized assessment tools and compared with existing approaches for Persian text. The results of these evaluations are remarkable.

Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting

Dr. Hadi Veisi, Mr. Sayed Akbar Ghoreishi, Dr. Azam Bastanfard
Volume 17, Issue 4 (2-2021)

Abstract

The Islamic Republic of Iran Broadcasting (IRIB), one of the largest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, the IRIB archive is one of the richest archives in Iran and contains a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieving content from the archive, are key issues for the broadcaster. The aim of this research is to design a content retrieval engine for the IRIB's media and productions using spoken term detection (STD), also known as keyword spotting. The goal of an STD system is to search for a set of keywords in a set of speech documents. One approach to STD uses a speech recognition system: speech is recognized and converted into text, and the text is then searched for the keywords. The variety of speech documents and the limited vocabulary of the speech recognizer are two challenges of this approach. Large vocabulary continuous speech recognition (LVCSR) systems have a large but finite vocabulary and cannot recognize out-of-vocabulary (OOV) words. Therefore, LVCSR-based STD systems suffer from the OOV problem and cannot spot OOV keywords. Methods such as sub-word units (e.g., phonemes or syllables) and proxy words have been introduced to overcome the vocabulary limitation and handle OOV keywords.
This paper proposes a Persian (Farsi) STD system based on speech recognition that uses the proxy words method to deal with OOV keywords. To improve the performance of this method, we use a Long Short-Term Memory network with Connectionist Temporal Classification (LSTM-CTC).
In our experiments, we designed and implemented a large vocabulary continuous speech recognition system for the Farsi language. The Large FarsDat dataset, which contains 80 hours of speech from 100 speakers, is used to train the speech recognition system, which is implemented with the Kaldi toolkit. Because the dataset is limited, Subspace Gaussian Mixture Models (SGMM) are used to train the acoustic model. The acoustic model is trained on context-dependent triphones, and the language model is a word trigram model. The word error rate (WER) of the speech recognition system is 2.71% on the FarsDat test set and 28.23% on Persian news collected from IRIB data.
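The WER figures above follow the usual definition: the word-level edit distance (substitutions, deletions, and insertions) between the reference transcript and the recognizer output, divided by the number of reference words. The sketch below is a minimal illustration of that definition and is not the scoring tool used in the paper; the function name `word_error_rate` and whitespace tokenization are assumptions made for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()  # assumption: whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("a b c d", "a x c")` counts one substitution and one deletion against four reference words.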
Term detection is based on weighted finite-state transducers (WFSTs). In this method, the speech recognizer first converts a speech document into a lattice (the lattice retains the full set of recognition hypotheses with their probabilities rather than only the most probable one), and the lattice is then converted to a WFST. This WFST contains the probabilities of all words computed by the speech recognizer. Text retrieval techniques are then used to index and search the WFST output. The proxy words method is used to deal with OOV keywords: each OOV word is represented by in-vocabulary words with similar pronunciation. To improve the performance of the proxy words method, an LSTM-CTC network is proposed. This network is trained on the characters of isolated words (not continuous sentences). It recomputes the probabilities and re-verifies the proxy outputs, which improves the proxy words method because that method suffers from false alarms. Since LSTM-CTC is an end-to-end network trained on characters, it does not need a phonetic lexicon and can support OOV words. Because the LSTM-CTC is trained on isolated words, it reduces the weight of the language model and focuses on the acoustic model.
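The proxy words idea can be illustrated with a small sketch: given a phoneme sequence for an OOV keyword, rank in-vocabulary words by how close their pronunciations are and search for the closest ones instead. The example below uses a plain phoneme edit distance, which is a simplification chosen for illustration; it is not the WFST-based procedure of the paper. The toy lexicon, the phoneme symbols, and the function names are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute or match
        prev = curr
    return prev[-1]


def proxy_words(oov_phones: Sequence[str],
                lexicon: Dict[str, List[str]],
                k: int = 2) -> List[Tuple[str, int]]:
    """Return the k in-vocabulary words closest in pronunciation to the OOV word."""
    scored = [(word, edit_distance(oov_phones, phones))
              for word, phones in lexicon.items()]
    # Sort by distance, breaking ties alphabetically for determinism.
    scored.sort(key=lambda pair: (pair[1], pair[0]))
    return scored[:k]


# Hypothetical toy lexicon (English-like symbols, for illustration only).
TOY_LEXICON = {
    "cat": ["k", "ae", "t"],
    "cap": ["k", "ae", "p"],
    "dog": ["d", "ao", "g"],
    "bat": ["b", "ae", "t"],
}

# An OOV keyword pronounced /k ae t s/ is mapped to its nearest proxies.
proxies = proxy_words(["k", "ae", "t", "s"], TOY_LEXICON, k=2)
```

Because acoustically similar proxies also match unrelated speech, a lookup like this tends to produce false alarms, which is the weakness the abstract says the LSTM-CTC re-verification step addresses.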
The proposed STD system achieves an Actual Term Weighted Value (ATWV) of 0.9206 for in-vocabulary keywords; for OOV keywords, the ATWV is 0.2 using the proxy words method. Applying the proposed LSTM-CTC improves this ATWV to 0.3058. On the Persian news dataset, the proposed method achieves an ATWV of 0.8008.
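To make the ATWV figures easier to interpret, the sketch below computes the metric as it is commonly defined in the NIST 2006 Spoken Term Detection evaluation: one minus the average over keywords of the miss probability plus a weighted false-alarm probability. It is assumed here, not stated in the abstract, that the paper uses this definition with the customary weight of 999.9; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable

BETA = 999.9  # assumption: customary weight from the NIST STD 2006 definition


@dataclass
class TermCounts:
    n_true: int         # reference occurrences of the keyword
    n_correct: int      # correctly detected occurrences
    n_false_alarm: int  # detections with no matching reference occurrence


def atwv(terms: Iterable[TermCounts], speech_seconds: float,
         beta: float = BETA) -> float:
    """Term-weighted value at the system's actual decision threshold."""
    # Keywords that never occur in the reference are excluded from the average.
    scored = [t for t in terms if t.n_true > 0]
    if not scored:
        raise ValueError("no keyword occurs in the reference")

    total = 0.0
    for t in scored:
        p_miss = 1.0 - t.n_correct / t.n_true
        # Non-target trials are approximated as one per second of speech.
        p_false_alarm = t.n_false_alarm / (speech_seconds - t.n_true)
        total += p_miss + beta * p_false_alarm
    return 1.0 - total / len(scored)
```

Under this definition a perfect system scores 1.0, a system that outputs nothing scores 0.0, and false alarms are penalized heavily enough that the score can become negative.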

Review on Large Language Models in Finance: Text and Time Series Analysis for Investor Behavior and Market Prediction

Dr Saeede Anbaee Farimani, Dr Raheleh Ghouchannezhad Noor Nia, Dr Majid Vafaei Jahan
Volume 22, Issue 2 (9-2025)

Abstract

The rise of social media platforms, online news media, and digital content has generated a vast volume of text and time series data, which plays a significant role in investors' decision-making and financial market volatility. Data extracted from these platforms provide information on public sentiment, immediate reactions to news, and informal analyses, which, if processed appropriately, can be very useful indicators for forecasting financial market trends. Billions of dollars are invested and lost depending on the accuracy of such forecasts. Advances in deep learning, especially in large language models (LLMs) and novel time series analysis algorithms, have opened new windows onto processing and analyzing this complex data. Advanced language models identify hidden patterns and nonlinear dependencies among news, market sentiment, and price fluctuations while taking into account the context and semantic details of the text, and these patterns can be exploited by intelligent market analysis systems. This review systematically analyzes research trends on the relationship between text data from websites and social networks and the behavior of financial markets, covering more than 200 scientific papers published between 2006 and 2024. The study focuses on identifying advanced methods in text representation, sentiment analysis, predictive modeling, and language model applications for analyzing real-time and unstructured data. Multiple information sources (Twitter, news agencies, blogs, and specialized forums) are considered from the perspectives of credibility, data structure, and influence on market decisions. Given the complexity of financial markets such as stocks and forex, there is an ever-increasing demand for hybrid models capable of analyzing time series and text data simultaneously.
This paper aims to analyze current research accomplishments, identify research gaps, and propose future directions for the fields of text mining, AI, and deep learning. These directions can pave the way for the next generation of real-time, adaptive recommender, prediction, and correlation analysis systems for financial markets.
