Volume 20, Issue 4 (3-2024)                   JSDP 2024, 20(4): 129-140


Roayaei Ardakany M, Afroughrh A. Maximize Score in stochastic match-3 games using reinforcement learning. JSDP 2024; 20 (4) : 9
URL: http://jsdp.rcisp.ac.ir/article-1-1345-en.html
Tarbiat Modares University
Abstract:
Computer games have played an important role in the development of artificial intelligence in recent years. Throughout the history of artificial intelligence, computer games have served as a suitable test environment for evaluating new approaches and algorithms. Different methods, including rule-based methods, tree-search methods, and machine-learning methods (supervised learning and reinforcement learning), have been developed to create intelligent agents for different games, and games have provided a convenient environment for trial-and-error testing of artificial-intelligence ideas and algorithms. Notable examples include Deep Blue in chess and AlphaGo in Go: AlphaGo was the first computer program to defeat an expert human Go player, and Deep Blue, a chess-playing expert system, was the first computer program to win a match against a world champion.
In this paper, we focus on the match-3 game. Match-3 is a popular genre on mobile phones; it has a very large, random state space, which makes learning difficult, and a stochastic reward function, which makes learning unstable. Much research has been conducted in the past on different games, including match-3. The aim of this research has generally been either to play optimally or to predict the difficulty of stages designed for human players; predicting stage difficulty helps game developers improve the quality of their games and provide a better experience for users. Based on the approach used, past work can be divided into three main categories: search-based methods, machine-learning methods, and heuristic methods.
In this paper, an intelligent agent based on deep reinforcement learning is presented, whose goal is to maximize the score in the match-3 game. Reinforcement learning, which has received a great deal of attention recently, is a branch of machine learning in which an agent learns an optimal action-selection policy from its experience of interacting with the environment. In deep reinforcement learning, reinforcement-learning algorithms are combined with deep neural networks.
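For context, the core of value-based deep reinforcement learning (the DQN family) is a temporal-difference update in which an online network is regressed toward the target y = r + gamma * max_a' Q_target(s', a'). The Python sketch below is a generic illustration of this update, not the agent proposed in this paper; the network sizes, discount factor, and variable names are assumptions made only for the example.

import torch
import torch.nn as nn

# Generic DQN-style temporal-difference update (illustrative only;
# network sizes and the discount factor gamma are assumptions).
state_dim, n_actions, gamma = 16, 4, 0.99
q_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net.load_state_dict(q_net.state_dict())  # periodically synced copy of the online network

def td_loss(s, a, r, s_next, done):
    """Squared TD error: (Q(s,a) - [r + gamma * max_a' Q_target(s',a') * (1 - done)])^2."""
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)   # Q(s, a) from the online network
    with torch.no_grad():                                   # the target is treated as a constant
        target = r + gamma * target_net(s_next).max(dim=1).values * (1 - done)
    return nn.functional.mse_loss(q_sa, target)

# A batch of fake transitions, just to show the shapes involved.
batch = 8
s, s_next = torch.randn(batch, state_dim), torch.randn(batch, state_dim)
a = torch.randint(0, n_actions, (batch,))
r, done = torch.randn(batch), torch.zeros(batch)
td_loss(s, a, r, s_next, done).backward()  # gradients flow only through the online network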
In the proposed method, different mapping mechanisms for the action space and the state space are used, and a novel neural-network structure customized for the match-3 environment is proposed so that the agent can learn over the large state space. The contributions of this article can be summarized as follows. First, an approach is presented for mapping the action space to a two-dimensional matrix in which valid and invalid actions can easily be separated (see the sketch below). Second, an approach is designed for mapping the state space to the input of the deep neural network; it reduces the input space by reducing the depth of the convolutional filters and thereby improves the learning process. Third, the reward function makes the learning process stable by separating random rewards from deterministic rewards.
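To make the first contribution concrete, the following Python sketch shows one possible way to map the swap actions of a match-3 board onto a two-dimensional mask in which valid swaps (those producing a run of at least three identical tiles) are marked 1 and invalid swaps 0. This is a minimal illustration under assumed conventions (board size, tile encoding, and a two-channel horizontal/vertical layout), not the authors' implementation.

import numpy as np

def creates_match(board, r, c):
    """Return True if the tile at (r, c) is part of a horizontal or vertical run of 3+."""
    v = board[r, c]
    rows, cols = board.shape
    # horizontal run through (r, c)
    run, cc = 1, c - 1
    while cc >= 0 and board[r, cc] == v:
        run += 1; cc -= 1
    cc = c + 1
    while cc < cols and board[r, cc] == v:
        run += 1; cc += 1
    if run >= 3:
        return True
    # vertical run through (r, c)
    run, rr = 1, r - 1
    while rr >= 0 and board[rr, c] == v:
        run += 1; rr -= 1
    rr = r + 1
    while rr < rows and board[rr, c] == v:
        run += 1; rr += 1
    return run >= 3

def action_mask(board):
    """Map the action space to a 2-D matrix per channel:
    channel 0 = swap with the right neighbour, channel 1 = swap with the lower neighbour.
    An entry is 1 if the swap produces at least one match, 0 otherwise."""
    rows, cols = board.shape
    mask = np.zeros((2, rows, cols), dtype=np.int8)
    for r in range(rows):
        for c in range(cols):
            for ch, (dr, dc) in enumerate([(0, 1), (1, 0)]):
                r2, c2 = r + dr, c + dc
                if r2 >= rows or c2 >= cols:
                    continue  # swap would leave the board: invalid
                board[r, c], board[r2, c2] = board[r2, c2], board[r, c]      # try the swap
                if creates_match(board, r, c) or creates_match(board, r2, c2):
                    mask[ch, r, c] = 1
                board[r, c], board[r2, c2] = board[r2, c2], board[r, c]      # undo the swap
    return mask

# Example: a random 9x9 board with 6 tile types (sizes are assumptions).
rng = np.random.default_rng(0)
board = rng.integers(0, 6, size=(9, 9))
print(action_mask(board).sum(), "valid swaps")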
A comparison of the proposed method with existing methods, including PPO, DQN, A3C, a greedy baseline, and human players, shows the superior performance of the proposed method in the match-3 game.
 
Article number: 9
Full-Text [PDF 745 kb]
Type of Study: Applied | Subject: Paper
Received: 2022/10/27 | Accepted: 2023/12/11 | Published: 2024/04/25 | ePublished: 2024/04/25

References
[1] M. Campbell, A. J. Hoane, and F. H. Hsu, "Deep Blue," Artificial Intelligence, vol. 134, no. 1-2, pp. 57-83, 2002. [DOI:10.1016/S0004-3702(01)00129-1]
[2] D. Silver et al., "Mastering the game of Go with deep neural networks and tree search," Nature, vol. 529, no. 7587, pp. 484-489, 2016. [DOI:10.1038/nature16961] [PMID]
[3] V. Mnih et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529-533, 2015. [DOI:10.1038/nature14236] [PMID]
[4] H. Van Hasselt, A. Guez, and D. Silver, "Deep reinforcement learning with double Q-learning," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30, no. 1, 2016. [DOI:10.1609/aaai.v30i1.10295]
[5] Z. Wang, T. Schaul, M. Hessel, H. Van Hasselt, M. Lanctot, and N. De Freitas, "Dueling network architectures for deep reinforcement learning," in Proceedings of the 33rd International Conference on Machine Learning (ICML), 2016, pp. 2939-2947.
[6] V. Mnih et al., "Asynchronous methods for deep reinforcement learning," in Proceedings of the 33rd International Conference on Machine Learning (ICML), 2016, vol. 48, pp. 1928-1937.
[7] J. von Neumann, "Zur Theorie der Gesellschaftsspiele," Mathematische Annalen, vol. 100, no. 1, pp. 295-320, 1928. [DOI:10.1007/BF01448847]
[8] D. E. Knuth and R. W. Moore, "An analysis of alpha-beta pruning," Artificial Intelligence, vol. 6, no. 4, pp. 293-326, 1975. [DOI:10.1016/0004-3702(75)90019-3]
[9] J. Schaeffer, R. Lake, P. Lu, and M. Bryant, "CHINOOK: The world man-machine checkers champion," AI Magazine, vol. 17, no. 1, 1996.
[10] M. Enzenberger, M. Müller, B. Arneson, and R. Segal, "FUEGO: An open-source framework for board games and Go engine based on Monte Carlo tree search," IEEE Transactions on Computational Intelligence and AI in Games, vol. 2, no. 4, pp. 259-270, 2010. [DOI:10.1109/TCIAIG.2010.2083662]
[12] D. Hadar and O. Samuel, "Crushing Candy Crush: An AI project," Hebrew University of Jerusalem, 2015.
[13] E. R. Poromaa, "Crushing Candy Crush," KTH Royal Institute of Technology, Stockholm, Sweden, 2017.
[14] S. Purmonen, "Predicting game level difficulty using deep neural networks," KTH Royal Institute of Technology, Stockholm, Sweden, 2017.
[15] G. Tesauro, "Temporal difference learning and TD-Gammon," Communications of the ACM, vol. 38, no. 3, pp. 58-68, 1995. [DOI:10.1145/203330.203343]
[16] V. Mnih et al., "Playing Atari with deep reinforcement learning," arXiv preprint arXiv:1312.5602, 2013.
[17] V. Mnih et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529-533, 2015. [DOI:10.1038/nature14236] [PMID]
[18] Y. Shin, J. Kim, K. Jin, and Y. B. Kim, "Playtesting in Match 3 game using strategic plays via reinforcement learning," IEEE Access, vol. 8, pp. 51593-51600, 2020. [DOI:10.1109/ACCESS.2020.2980380]
[19] I. Kamaldinov and I. Makarov, "Deep reinforcement learning methods in match-3 game," in Lecture Notes in Computer Science, vol. 11832, pp. 51-62, 2019. [DOI:10.1007/978-3-030-37334-4_5]
[20] N. Napolitano, "Testing match-3 video games with deep reinforcement learning," arXiv, 2020.
[21] L. Gualà, S. Leucci, and E. Natale, "Bejeweled, Candy Crush and other match-three games are (NP-)hard," in 2014 IEEE Conference on Computational Intelligence and Games (CIG), pp. 1-21, 2014. [DOI:10.1109/CIG.2014.6932866]
[22] S. F. Gudmundsson et al., "Human-like playtesting with deep learning," in IEEE Conference on Computational Intelligence and Games (CIG), 2018. [DOI:10.1109/CIG.2018.8490442]
[23] L. Kaiser, M. Babaeizadeh, P. Milos, et al., "Model-based reinforcement learning for Atari," in International Conference on Learning Representations, 2019.
[24] O. Vinyals, I. Babuschkin, W. M. Czarnecki, M. Mathieu, A. Dudzik, J. Chung, et al., "Grandmaster level in StarCraft II using multi-agent reinforcement learning," Nature, vol. 575, no. 7782, pp. 350-354, 2019. [DOI:10.1038/s41586-019-1724-z] [PMID]
[25] R.-Z. Liu, Z.-J. Pang, Z.-Y. Meng, W. Wang, Y. Yu, and T. Lu, "On efficient reinforcement learning for full-length game of StarCraft II," Journal of Artificial Intelligence Research, vol. 75, pp. 213-260, 2022. [DOI:10.1613/jair.1.13743]
[26] J. Perolat, B. De Vylder, D. Hennes, E. Tarassov, F. Strub, V. de Boer, et al., "Mastering the game of Stratego with model-free multiagent reinforcement learning," Science, vol. 378, no. 6623, pp. 990-996, 2022. [DOI:10.1126/science.add4679] [PMID]

Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
