The widespread use of social media platforms and the rapid dissemination of news through digital media have generated massive volumes of textual and time series data, which directly influence investor behavior in financial markets. In response, large language models (LLMs) and advanced natural language processing (NLP) techniques, together with time series analysis methods, have become critical tools for collecting and analyzing these datasets and for uncovering hidden patterns within them. This comprehensive review examines more than 200 studies published between 2006 and 2024, focusing on the interplay between financial markets and web-based news events through the lens of text mining. It systematically evaluates information sources, text representation techniques, sentiment analysis approaches, and predictive modeling frameworks. Notably, it covers recent advances in applying LLMs to time series analysis and real-time data processing, a significant development in the field. The primary objective is to map the current frontiers of knowledge in big data analytics and to identify promising avenues for future research. In particular, the review emphasizes the role of text mining, artificial intelligence, and deep learning in building advanced systems for market prediction, decision support, and correlation analysis in financial domains such as stock exchanges and forex markets. By addressing challenges related to the scale, diversity, and reliability of data, the study offers insights for improving the accuracy and efficiency of financial decision-making.
Rights and permissions

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.