Funding: Supported by the Deputyship for Research & Innovation, Ministry of Education in Saudi Arabia (Project No. MoE-IF-UJ-R2-22-04220773-1).
Abstract: Domain randomization is a widely adopted technique in deep reinforcement learning (DRL) for improving agent generalization by exposing policies to diverse environmental conditions. This paper investigates the impact of three reset strategies (normal, non-randomized, and randomized) on agent performance using the Deep Deterministic Policy Gradient (DDPG) and Twin Delayed DDPG (TD3) algorithms in the CarRacing-v2 environment. Two experimental setups were used: an extended training regime with DDPG for 1000 steps per episode across 1000 episodes, and a fast-execution setup comparing DDPG and TD3 over 30 episodes with 50 steps per episode under constrained computational resources. A step-based reward scaling mechanism was applied under the randomized reset condition to promote broader state exploration. Experimental results show that randomized resets significantly enhance learning efficiency and generalization, with DDPG demonstrating superior performance across all reset strategies. In particular, DDPG combined with randomized resets achieves the highest smoothed rewards (reaching approximately 15), the best stability, and the fastest convergence. These differences are statistically significant, as confirmed by t-tests: DDPG outperforms TD3 under randomized (t = −101.91, p < 0.0001), normal (t = −21.59, p < 0.0001), and non-randomized (t = −62.46, p < 0.0001) reset conditions. The findings underscore the critical role of reset strategy and reward shaping in enhancing the robustness and adaptability of DRL agents in continuous control tasks, particularly in environments where computational efficiency and training stability are crucial.
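The randomized-reset condition and the step-based reward scaling can be illustrated with a minimal sketch. The abstract does not give the exact scaling rule, so the linear schedule below, the random-action placeholder policy, and the per-episode reseeding used to stand in for randomized resets are all illustrative assumptions:

```python
# Minimal sketch of randomized resets plus step-based reward scaling in
# CarRacing-v2 (fast-execution setup: 30 episodes, 50 steps). Requires
# gymnasium[box2d]. The scaling form below is an assumption, not the
# paper's exact mechanism.
import gymnasium as gym
import numpy as np

env = gym.make("CarRacing-v2", continuous=True)
rng = np.random.default_rng(0)

n_episodes, max_steps = 30, 50
for episode in range(n_episodes):
    # Randomized reset: a fresh seed each episode generates a new track.
    obs, info = env.reset(seed=int(rng.integers(0, 2**31 - 1)))
    shaped_return = 0.0
    for step in range(max_steps):
        action = env.action_space.sample()  # placeholder for the DDPG/TD3 policy
        obs, reward, terminated, truncated, info = env.step(action)
        # Step-based reward scaling (assumed linear form): later steps are
        # weighted more, rewarding longer, more exploratory rollouts.
        scale = 1.0 + step / max_steps
        shaped_return += scale * reward
        if terminated or truncated:
            break
    print(f"episode {episode}: shaped return = {shaped_return:.2f}")
env.close()
```

Swapping the sampled action for a trained DDPG or TD3 actor, and fixing the seed across episodes, would reproduce the non-randomized condition for comparison.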
Funding: Supported in part by the National Natural Science Foundation of China (62276119), the Natural Science Foundation of Jiangsu Province (BK20241764), and the Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX22_2860).
Abstract: Dear Editor, This letter investigates predefined-time optimization problems (OPs) of multi-agent systems (MASs), where each agent of the MAS is subject to inequality constraints and the team objective function accounts for impulse effects. First, the penalty method is introduced to address the inequality constraints. Then, a novel optimization strategy is developed, which requires only that the team objective function be strongly convex.
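As a sketch of the penalty step, a standard exterior-penalty reformulation is shown below; the letter's exact penalty function is not given in the abstract, so the quadratic max-penalty and the weight ρ are illustrative assumptions:

```latex
% Exterior-penalty reformulation (assumed form). Agent i holds a local
% objective f_i; the inequality constraints g_{i,j}(x) <= 0 are folded
% into the team objective with a penalty weight \rho > 0.
\min_{x \in \mathbb{R}^n} \sum_{i=1}^{N} f_i(x)
\quad \text{s.t.} \quad g_{i,j}(x) \le 0
\;\;\Longrightarrow\;\;
\min_{x \in \mathbb{R}^n} \sum_{i=1}^{N} \Big( f_i(x)
  + \rho \sum_{j} \big[ \max\{ 0,\, g_{i,j}(x) \} \big]^{2} \Big)
```

For sufficiently large ρ, minimizers of the unconstrained penalized problem approximate the constrained optimum, which is why only convexity properties of the (penalized) team objective need to be assumed.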
Funding: Supported by JSPS KAKENHI Grant Numbers JP22K11823 to AH and JP22J01508 to KW.
Abstract: Melatonin (N-acetyl-5-methoxytryptamine) is known as the hormone of darkness because it is synthesized at night and is involved in regulating the circadian clock. The hormone is primarily synthesized by the vertebrate pineal gland, but it is ubiquitous among invertebrates, unicellular organisms, plants, and even cyanobacteria (Hattori and Suzuki, 2024). Melatonin is evolutionarily well conserved and, beyond regulating the circadian rhythm, serves several physiological functions, such as immune response, bone and glucose metabolism, and memory formation.
Funding: Supported by the National Natural Science Foundation of China (Nos. 22278308 and 22109114) and the Open Foundation of Shanghai Jiao Tong University Shaoxing Research Institute of Renewable Energy and Molecular Engineering (Grant No. JDSX2022023).
Abstract: Hard carbon (HC) is widely used in sodium-ion batteries (SIBs), but its performance has always been limited by low initial Coulombic efficiency (ICE) and poor cycling stability. A cathode compensation agent is a favorable strategy to make up for the loss of active sodium ions consumed by the HC anode. Yet there remains a lack of agents that decompose effectively to replenish active sodium ions while also regulating carbon defects to reduce irreversible sodium-ion consumption. Here, we propose the sodium salt of 1,2-dihydroxybenzene (NaDB) as a cathode compensation agent with high specific capacity (347.9 mAh g⁻¹), a low desodiation potential (2.4–2.8 V), and high utilization (99%). Meanwhile, its byproduct functionalizes HC with additional C=O groups and promotes its reversible capacity. Consequently, the presodiated hard carbon (pHC) anode exhibits a highly reversible capacity of 204.7 mAh g⁻¹ with 98% retention over 1000 cycles at a 5 C rate. Moreover, with 5 wt% NaDB initially coated on the Na3V2(PO4)3 (NVP) cathode, the capacity retention of the NVP + NaDB | HC cell increases from 22% to 89% after 1000 cycles at a 1 C rate. This work provides a new avenue for improving the reversible capacity and cycling performance of SIBs through the design of functional cathode compensation agents.
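The compensation idea admits a simple mass-balance estimate. In the sketch below, Q_HC, η, and m_HC (the anode's first-cycle charge capacity, ICE, and mass) are generic symbols not taken from the abstract; only the 347.9 mAh g⁻¹ capacity and the 99% utilization are reported values:

```latex
% Back-of-the-envelope sodium compensation balance (assumed form).
% Irreversible Na loss of the HC anode, and the NaDB mass that offsets it:
\Delta Q_{\mathrm{loss}} = (1 - \eta)\, Q_{\mathrm{HC}}\, m_{\mathrm{HC}},
\qquad
m_{\mathrm{NaDB}} \approx \frac{\Delta Q_{\mathrm{loss}}}
  {347.9\ \mathrm{mAh\,g^{-1}} \times 0.99}
```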
Abstract: Based on publicly disclosed company information, this article compiles and organizes a selection of domestic Chinese companies working on AI Agent applications. From Chatbots to AI Agents: according to OpenAI's definition, the ultimate goal of artificial general intelligence (AGI) is to create an artificial intelligence (AI) that can accomplish a wide range of complex tasks like a human while also communicating naturally. OpenAI divides the development of AGI into five levels: Level 1 is "chatbots", AI with conversational language ability; Level 2 is "reasoners", AI with human-level reasoning capable of solving a variety of complex problems; Level 3 is "agents".
Funding: 2023 Zhejiang Provincial Department of Education General Project: Research on an interdisciplinary teaching model to promote the development of computational thinking in the context of the new curriculum standards (Grant No. Y202351596); Key Project of Zhejiang Provincial Education Science Planning: Research on an interdisciplinary teaching model to promote students' computational thinking from multiple analytical perspectives (Grant No. 2025SB103).
Abstract: This study constructs a reflective feedback model based on a pedagogical agent (PA) and explores its impact on students' problem-solving ability and cognitive load. A quasi-experimental design was used, with 84 students from a middle school selected as research subjects (44 in the experimental group and 40 in the control group). The experimental group used the reflective feedback model, while the control group used a factual feedback model. The results show that, compared with factual feedback, the reflective feedback model based on the pedagogical agent significantly improves students' problem-solving ability, especially at the action and thinking levels. In addition, the model effectively reduces students' cognitive load, especially its intrinsic and extraneous components.
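A minimal sketch of the implied two-group comparison follows; the abstract names neither the statistical test nor the score distributions, so the Welch t-test and the synthetic scores below are assumptions for illustration only:

```python
# Hypothetical two-group comparison for the quasi-experimental design
# (44 experimental vs. 40 control students). The scores are synthetic;
# the real study's measures and test may differ.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
experimental = rng.normal(loc=78.0, scale=8.0, size=44)  # hypothetical scores
control = rng.normal(loc=71.0, scale=8.0, size=40)       # hypothetical scores

# Welch's t-test (assumed choice): does not assume equal group variances.
t_stat, p_value = stats.ttest_ind(experimental, control, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```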