Regularized Knowledge Transfer for Multi-Agent Reinforcement Learning

Alavi, Niloofar; Tahmoresnezhad, Jafar

doi:10.61186/jsdp.20.4.141

Volume 20, Issue 4 (12-1402), pages 141-160



Alavi N, Tahmoresnezhad J. Regularized Knowledge Transfer for Multi-Agent Reinforcement Learning. JSDP 2024; 20 (4) : 10
URL: http://jsdp.rcisp.ac.ir/article-1-1056-fa.html

Alavi N, Tahmoresnezhad J. Regularized Knowledge Transfer for Multi-Agent Reinforcement Learning. Signal and Data Processing (JSDP). 1402; 20 (4): 141-160



Niloofar Alavi, Jafar Tahmoresnezhad*

Urmia University of Technology

Abstract:

Reinforcement learning trains machine learning models to make sequential decisions: an agent learns by interacting with its environment, observes the outcome of each interaction, and receives a positive or negative reward accordingly. Reinforcement learning has many applications in multi-agent systems, especially in dynamic and unknown environments. However, most multi-agent reinforcement learning algorithms face problems such as computational complexity that grows exponentially with the joint state space, which prevents them from scaling to real-world multi-agent problems. Applications of multi-agent reinforcement learning range from robot soccer, networks, cloud computing, and job scheduling to reactive power dispatch. This paper introduces a new algorithm, regularized knowledge transfer for multi-agent reinforcement learning (RKT-MARL), which is built on the Markov decision process model. Unlike traditional reinforcement learning methods, the algorithm uses the concepts of sparse interactions and knowledge transfer to reach an equilibrium among agents. In addition, RKT-MARL uses a negotiation mechanism to find the equilibrium set and a minimum-variance method to select the best action from that set. The proposed algorithm also transfers knowledge of state-action values among different agents. Moreover, RKT-MARL initializes the Q-values in coordination states by weighting current environmental information and prior knowledge. To evaluate the proposed method, a set of experiments was conducted on five grid-world games; the results indicate fast convergence and high scalability for RKT-MARL.
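The abstract does not give formulas, so the following is only an illustrative sketch of two of the ideas it names: initializing Q-values in a coordination state from both current information and prior knowledge, and picking one joint action from an equilibrium set by minimum variance. The convex-combination form, the coefficient `lam`, the variance-over-agents criterion, and both function names are assumptions made for illustration, not the authors' definitions.

```python
import numpy as np

def init_coordination_q(q_current, q_prior, lam=0.5):
    """Blend current Q-values with transferred prior Q-values.

    ASSUMPTION: a convex combination with a hypothetical
    coefficient lam in [0, 1]; the paper's exact rule may differ.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    q_current = np.asarray(q_current, dtype=float)
    q_prior = np.asarray(q_prior, dtype=float)
    return lam * q_current + (1.0 - lam) * q_prior

def select_min_variance(equilibrium_set, q_values):
    """Pick the joint action whose per-agent Q-values vary least.

    ASSUMPTION: "minimum variance" is read here as the variance of
    the agents' Q-values for a joint action (most balanced payoff).
    equilibrium_set: list of joint actions (tuples).
    q_values: dict mapping joint action -> per-agent Q-values.
    """
    return min(equilibrium_set, key=lambda a: np.var(q_values[a]))

# Hypothetical two-agent example.
q0 = init_coordination_q([1.0, 3.0], [3.0, 1.0], lam=0.5)
eq_set = [(0, 1), (1, 0)]
q = {(0, 1): [1.0, 5.0], (1, 0): [3.0, 3.5]}
best = select_min_variance(eq_set, q)
```

In this toy setting the second joint action is chosen because its per-agent values are closer together; how RKT-MARL actually builds the equilibrium set through negotiation is not specified in the abstract and is left out of the sketch.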

Article number: 10

Keywords: multi-agent reinforcement learning, knowledge transfer, meta and Nash equilibria, regularization, sparse interactions, negotiation between agents.


Study type: Research | Article subject: Digital data processing papers
Received: 1398/5/11 | Accepted: 1402/9/20 | Published: 1403/2/6 | Published online: 1403/2/6



This article may be redistributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License.
