Volume 20, Issue 4 (3-2024)                   JSDP 2024, 20(4): 141-160





Alavi N, Tahmoresnezhad J. Regularized Knowledge Transfer for Multi-Agent Reinforcement Learning. JSDP 2024; 20 (4) : 10
URL: http://jsdp.rcisp.ac.ir/article-1-1056-en.html
Urmia University of Technology
Abstract:
Reinforcement learning (RL) trains machine learning models to make a sequence of decisions: an agent learns by interacting with its environment, observing the results of those interactions, and receiving a positive or negative reward accordingly. RL has many applications in multi-agent systems, especially in dynamic and unknown environments. However, most multi-agent reinforcement learning (MARL) algorithms suffer from several problems, in particular the exponential computational complexity of handling the joint state-action space, which limits the scalability of these algorithms in realistic multi-agent problems. Applications of MARL range from robot soccer, networks, cloud computing, and job scheduling to optimal reactive power dispatch.
Reinforcement learning algorithms face serious challenges, such as the limited use of equilibrium-based algorithms in practice and the high computational complexity of finding an equilibrium. On the other hand, since agents have no concept of equilibrium policies, they tend to act aggressively toward their goals, which results in a high probability of collisions.
Consequently, this paper presents a novel algorithm called Regularized Knowledge Transfer for Multi-Agent Reinforcement Learning (RKT-MARL), which relies on the Markov decision process (MDP) model. Unlike traditional reinforcement learning methods, RKT-MARL exploits sparse interactions and knowledge transfer to achieve an equilibrium across agents. Moreover, RKT-MARL uses negotiation to find the equilibrium set. It applies the minimum variance method to select the best action in the equilibrium set and transfers the knowledge of state-action values across agents. RKT-MARL also initializes the Q-values in coordinate states as a weighted combination of current environmental information and previous knowledge. To evaluate the proposed method, groups of experiments are conducted on five grid-world games, and the results show the fast convergence and high scalability of RKT-MARL. The fast convergence indicates that the agents quickly solve the reinforcement learning problem and approach their goals.
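The abstract does not give the formulas behind the two mechanisms it names, so the following Python sketch is only one plausible reading, not the authors' implementation. It assumes that "minimum variance" means choosing the joint action in the equilibrium set whose Q-values differ least across agents, and that the Q-value initialization is a convex blend of a current-environment estimate and a previously learned value. The names `select_min_variance_action`, `init_coordinated_q`, and the weight `alpha` are hypothetical.

```python
from statistics import pvariance


def select_min_variance_action(equilibrium_set, q_tables, state):
    """Pick the joint action whose per-agent Q-values have the smallest variance.

    equilibrium_set: list of joint actions (tuples, one action per agent).
    q_tables: one dict per agent, mapping (state, joint_action) -> Q-value.
    Assumption: missing entries default to 0.0; ties go to the first candidate.
    """
    def spread(joint_action):
        values = [q.get((state, joint_action), 0.0) for q in q_tables]
        return pvariance(values)

    return min(equilibrium_set, key=spread)


def init_coordinated_q(q_env, q_prev, alpha=0.5):
    """Blend a current-environment estimate with previously learned knowledge.

    Assumption: a convex combination with hypothetical weight alpha in [0, 1].
    """
    return alpha * q_env + (1.0 - alpha) * q_prev


if __name__ == "__main__":
    # Two agents, two candidate joint actions in a single state "s".
    q_agent_0 = {("s", ("up", "left")): 1.0, ("s", ("left", "up")): 5.0}
    q_agent_1 = {("s", ("up", "left")): 1.2, ("s", ("left", "up")): 0.0}
    candidates = [("up", "left"), ("left", "up")]
    print(select_min_variance_action(candidates, [q_agent_0, q_agent_1], "s"))
    print(init_coordinated_q(q_env=2.0, q_prev=4.0, alpha=0.25))
```

Under this reading, the selection rule favors joint actions that treat agents evenly rather than ones that maximize a single agent's value, and `alpha` controls how much the transferred knowledge outweighs the fresh estimate.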
 
Article number: 10
Type of Study: Research | Subject: Paper
Received: 2019/08/2 | Accepted: 2023/12/11 | Published: 2024/04/25 | ePublished: 2024/04/25




Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

© 2015 All Rights Reserved | Signal and Data Processing