Cooperative multi-agent reinforcement learning (MARL) has made remarkable progress. However, these advanced methods still face substantial challenges in real-world applications. A significant direction for improving cooperative MARL techniques and addressing these challenges is robust and adaptive partner modelling. Reasoning about the beliefs of partners, such as their intentions and behaviors, is crucial for partner modelling; in cognitive science this ability is known as theory of mind (ToM). In animals, biological ToM reasoning in the prefrontal cortex (PFC) precedes decision-making and plays an important role in survival in complex environments. However, the biological PFC is too complex to be incorporated directly into conventional artificial neural networks (ANNs), either functionally or structurally. Large reasoning language models (LRMs) have recently demonstrated significant human-like reasoning abilities and impressive performance. Therefore, we propose an improved LRM framework to simulate the PFC for robust and adaptive partner modelling. Despite the excellent performance of LRMs in various fields, their ToM reasoning capabilities remain limited in complex MARL scenarios. Therefore, we further propose a ToM reasoner to enhance the ToM reasoning abilities of LRMs. Our framework exhibits robustness and adaptability across various LRM sizes, improving the ToM reasoning ability of agents and facilitating more effective partner modelling, thereby achieving higher performance scores in cooperative benchmarks.
Funding: Supported by the Strategic Priority Research Program of the Chinese Academy of Sciences, China (No. XDA27040200), the Beijing Nova Program, China (No. 20230484369), and the Youth Innovation Promotion Association of the Chinese Academy of Sciences, China.