Signal and Data Processing

fa Maximizing the score in a stochastic match-3 game using deep reinforcement learning Maximize Score in stochastic match-3 games using reinforcement learning Digital data processing articles Paper Applied Applicable <div style="text-align: justify;">Computer games have played an important role in the development of artificial intelligence in recent years. Various methods, including rule-based methods, tree search, and machine learning (supervised learning and reinforcement learning), have been developed to create intelligent agents in various games. Notable examples are Deep Blue in chess and AlphaGo in Go. AlphaGo is the first computer program to defeat a professional human Go player. Likewise, Deep Blue is a professional chess computer system and the first program to win against a world champion. In this paper, we focus on the match-3 game, a popular mobile-phone game with a very large stochastic state space and a stochastic reward function, which make learning difficult. Much research has been done in the past on various games, including match-3. The main goal of this research has generally been optimal play or predicting the difficulty of levels designed for human players. Predicting level difficulty helps game developers improve the quality of their games and provide a better user experience. In this paper, an intelligent agent based on deep reinforcement learning is presented whose goal is to maximize the score in the match-3 game. Reinforcement learning is a branch of machine learning in which the agent learns, through its experience of interacting with the environment, the optimal policy for selecting actions in different spaces. In deep reinforcement learning, reinforcement learning algorithms are used together with deep neural networks. In the proposed method, different mapping mechanisms are used for the action space and the state space. In addition, a novel neural network structure customized for the match-3 game environment is proposed so that it can learn the large state space. 
The contributions of this paper can be summarized as follows: an approach is presented for mapping the action space to a two-dimensional matrix, which makes it easy to separate valid and invalid actions. A method is designed for mapping the state space to the input of the deep neural network; by reducing the depth of the convolutional filters, it shrinks the input space and thereby improves the learning process. In addition, the reward function stabilizes the learning process by separating stochastic rewards from deterministic rewards. A comparison of the proposed method with other existing methods, including PPO, DQN, A3C, a greedy method, and human agents, shows the superior performance of the proposed method in the match-3 game.  </div> <div style="text-align: justify;">Computer games have played an important role in the development of artificial intelligence in recent years. Throughout the history of artificial intelligence, computer games have served as a suitable environment for trial and error and for evaluating new approaches and algorithms. Different methods, including rule-based methods, tree search methods, and machine learning methods (supervised learning and reinforcement learning), have been developed to create intelligent agents in different games. Notable examples are Deep Blue in chess and AlphaGo in Go. AlphaGo is the first computer program to defeat an expert human Go player. Deep Blue is a chess-playing expert system and the first computer program to win a match against a world champion. In this paper, we focus on the match-3 game, a popular mobile-phone game with a very large random state space, which makes learning difficult, and a random reward function, which makes learning unstable. Much research has been done in the past on different games, including match-3. 
The aim of this research has generally been to play optimally or to predict the difficulty of stages designed for human players. Predicting the difficulty of stages helps game developers improve the quality of their games and provide a better experience for users. Based on the approach used, past work can be divided into three main categories: search-based methods, machine learning methods, and heuristic methods. In this paper, we present an intelligent agent based on deep reinforcement learning whose goal is to maximize the score in the match-3 game. Reinforcement learning, which has received considerable attention recently, is a branch of machine learning in which the agent learns the optimal policy for choosing actions in different spaces through its experience of interacting with the environment. In deep reinforcement learning, reinforcement learning algorithms are combined with deep neural networks. The proposed method uses different mapping mechanisms for the action space and the state space. A novel neural network structure customized for the match-3 game environment is also proposed so that the agent can learn the large state space. The contributions of this article can be summarized as follows. First, an approach for mapping the action space to a two-dimensional matrix is presented, which makes it easy to separate valid and invalid actions. Second, an approach is designed to map the state space to the input of the deep neural network; it reduces the input space by reducing the depth of the convolutional filter and thus improves the learning process. Third, the reward function stabilizes the learning process by separating random rewards from deterministic rewards. A comparison of the proposed method with other existing methods, including PPO, DQN, A3C, a greedy method, and human agents, shows the superior performance of the proposed method in the match-3 game.  
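The abstract does not specify how the action space is laid out as a two-dimensional matrix, so the following is only an illustrative sketch of the general idea under stated assumptions, not the authors' encoding. It assumes that an action is a swap of two adjacent tiles, that a swap is valid only if it produces a line of three equal tiles, that the board currently contains no match, and that the matrix has 2H rows and W columns (rows 0..H-1 for "swap with the right neighbour", rows H..2H-1 for "swap with the tile below"). The names `has_match`, `action_mask`, and `best_valid_action` are hypothetical.

```python
from typing import List, Optional, Tuple

Board = List[List[int]]  # board[r][c] is a tile colour id


def has_match(board: Board) -> bool:
    """Return True if three equal tiles line up in any row or column."""
    h, w = len(board), len(board[0])
    for r in range(h):
        for c in range(w):
            t = board[r][c]
            if c + 2 < w and t == board[r][c + 1] == board[r][c + 2]:
                return True
            if r + 2 < h and t == board[r + 1][c] == board[r + 2][c]:
                return True
    return False


def action_mask(board: Board) -> List[List[bool]]:
    """Build the assumed (2H x W) validity matrix for adjacent swaps.

    Row r (r < H) marks swapping (r, c) with its right neighbour;
    row H + r marks swapping (r, c) with the tile below it.
    Assumes the board has no match before the swap.
    """
    h, w = len(board), len(board[0])
    mask = [[False] * w for _ in range(2 * h)]
    for r in range(h):
        for c in range(w):
            for plane, (dr, dc) in enumerate(((0, 1), (1, 0))):
                r2, c2 = r + dr, c + dc
                if r2 >= h or c2 >= w:
                    continue  # the neighbour is off the board
                # Try the swap, test for a match, then undo it.
                board[r][c], board[r2][c2] = board[r2][c2], board[r][c]
                mask[plane * h + r][c] = has_match(board)
                board[r][c], board[r2][c2] = board[r2][c2], board[r][c]
    return mask


def best_valid_action(
    q: List[List[float]], mask: List[List[bool]]
) -> Optional[Tuple[int, int]]:
    """Pick the highest-valued entry of q among entries the mask allows."""
    best: Optional[Tuple[int, int]] = None
    for i, row in enumerate(mask):
        for j, ok in enumerate(row):
            if ok and (best is None or q[i][j] > q[best[0]][best[1]]):
                best = (i, j)
    return best
```

With this layout, the validity mask has the same shape as a matrix of per-action values, so invalid swaps can be excluded by an element-wise test before the agent picks its action. How the paper actually combines the matrix with the network output is not stated in the abstract.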
</div> deep reinforcement learning, stochastic game, match-3, large state space deep reinforcement learning, random game, match-3, large state space 129 140 http://jsdp.rcisp.ac.ir/browse.php?a_code=A-10-1930-1&slc_lang=fa&sid=1 Mehdy Roayaei Ardakany مهدی رعایائی اردکانی mroayaei@modares.ac.ir 100319475328460012586 100319475328460012586 Yes Tarbiat Modares University دانشگاه تربیت مدرس Ali Afrougheh علی افروغه ali74afrougheh@gmail.com 100319475328460012587 100319475328460012587 No Tarbiat Modares University دانشگاه تربیت مدرس