Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 32271110, 62441614) and the Tsinghua University Initiative Scientific Research Program (Grant No. 20235080047).
Abstract: While advanced Large Language Models (LLMs) can simulate human-like prosocial behaviors, the degree to which they align with human prosocial values, and the affective mechanisms underlying that alignment, remains unclear. This study addressed these gaps using the third-party punishment (TPP) paradigm, comparing LLM agents (GPT and DeepSeek series) with human participants (n = 100). The LLM agents (n = 500, 100 agents per model) were constructed one-to-one from the demographic and psychological features of the human participants. Prompt engineering was employed to initiate TPP games and to record the agents' punitive decisions and affective responses. Results revealed that: (1) GPT-4o, DeepSeek-V3, and DeepSeek-R1 demonstrated stronger fairness value alignment, choosing punitive options more frequently than humans in TPP games; (2) all LLMs replicated the human pathway from unfairness through negative affective responses to punitive decisions, with stronger mediation effects of negative emotion in the DeepSeek models than in the GPT models; (3) only DeepSeek-R1 exhibited the human-like positive feedback loop from previous punitive decisions to positive affective feedback and subsequent punitive choices; (4) most LLMs (excluding GPT-3.5) showed significant representational similarity to human affect-decision patterns; (5) notably, all LLMs displayed rigid affective dynamics, characterized by lower affective variability and higher affective inertia than the flexible, context-sensitive fluctuations observed in humans. These findings highlight notable advances in prosocial value alignment but underscore the need to enhance LLMs' affective dynamics in order to foster robust, adaptive prosocial agents. Such advances could not only accelerate LLMs' alignment with human values but also provide empirical support for the broader applicability of prosocial theories to LLM agents.
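The contrast the abstract draws between "rigid" and "flexible" affective dynamics rests on two time-series summaries of trial-by-trial affect ratings. The following is a minimal illustrative sketch, not the authors' analysis code: it assumes the conventional operationalizations from the affect-dynamics literature (variability as within-agent standard deviation, inertia as lag-1 autocorrelation), and all function names and example series are hypothetical.

```python
# Hypothetical sketch of the two affect-dynamics metrics named in the abstract.
# Assumes affect is a 1-D series of ratings, one value per TPP trial.
import numpy as np

def affective_variability(affect: np.ndarray) -> float:
    """Within-agent variability: sample standard deviation of the affect series."""
    return float(np.std(affect, ddof=1))

def affective_inertia(affect: np.ndarray) -> float:
    """Affective inertia: lag-1 autocorrelation, i.e. how strongly affect at
    trial t predicts affect at trial t + 1."""
    return float(np.corrcoef(affect[:-1], affect[1:])[0, 1])

# Illustrative series (not real data): a smoothly drifting, self-perpetuating
# series versus one with large context-driven swings.
rigid = np.array([4.00, 4.05, 4.10, 4.12, 4.15, 4.13, 4.10, 4.08])
flexible = np.array([2.0, 6.0, 3.5, 5.0, 1.5, 6.5, 3.0, 5.5])

print(affective_variability(rigid), affective_inertia(rigid))        # low SD, high lag-1 r
print(affective_variability(flexible), affective_inertia(flexible))  # high SD, low lag-1 r
```

Under these definitions, the pattern reported for the LLMs (low variability, high inertia) corresponds to a flat, strongly autocorrelated affect series, whereas the human pattern corresponds to larger, weakly autocorrelated fluctuations.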
Funding: Supported by the National Key Research and Development Program of China (2022ZD0114900).
Abstract: The release of the generative pre-trained transformer (GPT) series has once again brought artificial general intelligence (AGI) to the forefront of the artificial intelligence (AI) field. However, how to define and evaluate AGI remains unclear. This perspective article proposes that the evaluation of AGI should be rooted in dynamic embodied physical and social interactions (DEPSI). More specifically, we propose five critical characteristics to be considered as AGI benchmarks and suggest the Tong test as an AGI evaluation system. The Tong test describes a value- and ability-oriented testing system that delineates five levels of AGI milestones through a virtual environment with DEPSI, allowing for infinite task generation. We contrast the Tong test with classical AI testing systems across various aspects and propose a systematic evaluation system to promote standardized, quantitative, and objective benchmarks and evaluation of AGI.
Abstract: Purpose: The purpose of this paper is to explore whether the four value alignment strategies available to educators (Scaffolding, Balancing, Intervention, and Refuge) previously identified in the mathematics education literature comprehensively capture educator value alignment strategies in an intervention context. Design/Approach/Methods: To this end, we analyse semi-structured interview data from two teacher-leaders involved in the Getting Ready in Numeracy (G.R.I.N.) intervention program through a value alignment lens. Findings: We ascertain that a fifth strategy, the Beacon strategy, is needed to describe the range of value alignment strategies employed by educators in the G.R.I.N. program. The Beacon strategy involves the educator digging in and reasserting their expectations until the student behaves in a manner that aligns with the educator's values. In part, it involves the educator being able to recognize their own values and to communicate these values clearly to students. Originality/Value: This article further explores the strategies that educators have at their disposal for aligning their values with those of their students. The uncovering of the Beacon strategy is particularly valuable, as it suggests that educators could be purposefully pursuing value alignment even when they do not appear to take any active steps to move further towards their students' sets of values.