Traditionally, Chinese people have placed great value on virtue, with a long-held belief that one should never appropriate valuable items lost by others. However, a recent regulation by the government of south China’s Guangdong Province
Employee performance is widely regarded as a cornerstone of organizational success, and in fast-changing industries it becomes even more critical. China’s electric vehicle (EV) sector exemplifies this challenge: rapid innovation and intense competition require companies to motivate employees for both immediate efficiency and long-term commitment. This study explores how extrinsic rewards (bonuses, gifts, promotions, and benefits) and intrinsic rewards (recognition, career development, learning opportunities, and responsibility) influence task and contextual performance. A quantitative design was employed, using survey data and statistical analyses to test the proposed framework. The findings show that both extrinsic and intrinsic rewards significantly enhance performance but operate differently: extrinsic rewards are more closely linked to short-term improvements, while intrinsic rewards foster deeper engagement and sustained contributions. By combining Herzberg’s Two-Factor Theory and Self-Determination Theory, the study demonstrates that effective reward systems must balance financial incentives with psychological motivators. The results provide theoretical contributions and practical guidance for managers seeking to strengthen motivation, build resilience, and promote sustainable performance.
BACKGROUND Anhedonia, a hallmark symptom of major depressive disorder (MDD), is often resistant to common antidepressants. Preliminary evidence indicates that Pediococcus acidilactici (P. acidilactici) CCFM6432 may offer potential benefits in ameliorating this symptomatology in patients with MDD. AIM To further assess the efficacy of P. acidilactici CCFM6432 in alleviating anhedonia in patients with MDD, using a combination of objective and subjective assessment tools. METHODS Adult patients with MDD exhibiting anhedonic symptoms were enrolled and randomly assigned to two treatment groups: one receiving standard antidepressant therapy plus P. acidilactici CCFM6432, and the other receiving standard antidepressant treatment along with a placebo, for 30 days. Assessments were conducted at baseline and post-intervention using the Hamilton Depression Rating Scale (HAMD), the Temporal Experience of Pleasure Scale (TEPS), and synchronous electroencephalography (EEG) during a "Doors Guessing Task." Changes in both clinical outcomes and EEG biomarkers, specifically the stimulus-preceding negativity (SPN) and feedback-related negativity amplitudes, were analyzed. RESULTS Of the 92 screened participants, 71 were enrolled and 55 completed the study (CCFM6432 group: n = 27; placebo group: n = 28). No baseline differences were noted between the groups in terms of demographics, clinical assessments, or EEG metrics. A mixed-design analysis of variance revealed that the CCFM6432 group showed significantly greater improvements in both HAMD and TEPS scores compared with the placebo group. Moreover, the CCFM6432 group demonstrated a significant increase in SPN amplitudes, which was inversely correlated with the improvements observed in HAMD scores. No such changes were observed in the placebo group. CONCLUSION Adjunctive administration of P. acidilactici CCFM6432 not only augments the therapeutic efficacy of antidepressants but also significantly ameliorates the symptoms of anhedonia in MDD.
Robot navigation in complex crowd service scenarios, such as medical logistics and commercial guidance, requires a dynamic balance between safety and efficiency, while the traditional fixed reward mechanism lacks environmental adaptability and struggles to cope with variability in crowd density and pedestrian motion patterns. This paper proposes a navigation method that integrates spatiotemporal risk field modeling and adaptive reward optimization, aiming to improve the robot’s decision-making ability in diverse crowd scenarios through dynamic risk assessment and nonlinear weight adjustment. We construct a spatiotemporal risk field model based on a Gaussian kernel function, combining crowd density, relative distance, and motion speed to quantify environmental complexity and realize crowd-density-sensitive risk assessment dynamically. We apply an exponential decay function to the reward design to address the linear conflict problem of fixed weights in multi-objective optimization. We adaptively adjust the weight allocation between safety constraints and navigation efficiency based on real-time risk values, prioritizing safety in highly dense areas and navigation efficiency in sparse areas. Experimental results show that our method improves the navigation success rate by 9.0% over state-of-the-art models in high-density scenarios, with a 10.7% reduction in intrusion time ratio. Simulation comparisons validate the risk field model’s ability to capture risk superposition effects in dense scenarios and the suppression of near-field dangerous behaviors by the exponential decay mechanism. Our parametric optimization paradigm establishes an explicit mapping between navigation objectives and risk parameters through rigorous mathematical formalization, providing an interpretable approach for the safe deployment of service robots in dynamic environments.
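The two mechanisms described in this abstract can be illustrated with a minimal sketch. All function names, parameters, and constants below are illustrative assumptions, not the paper's actual formulation: a Gaussian-kernel risk field summed over nearby pedestrians, and an exponentially decaying efficiency weight that shifts the reward toward safety as risk rises.

```python
import math

def risk_field(robot_xy, pedestrians, sigma=1.0):
    """Sum of Gaussian kernels: each pedestrian contributes risk that
    grows with speed and decays with squared distance to the robot."""
    x, y = robot_xy
    risk = 0.0
    for (px, py, speed) in pedestrians:
        d2 = (x - px) ** 2 + (y - py) ** 2
        risk += (1.0 + speed) * math.exp(-d2 / (2.0 * sigma ** 2))
    return risk

def adaptive_reward(r_safety, r_efficiency, risk, k=1.0):
    """Risk-adaptive weighting: the efficiency weight decays
    exponentially with risk, so high-risk (dense) regions prioritize
    safety and low-risk (sparse) regions prioritize efficiency."""
    w_eff = math.exp(-k * risk)   # decays toward 0 as risk grows
    w_safe = 1.0 - w_eff
    return w_safe * r_safety + w_eff * r_efficiency
```

The nonlinear decay avoids the fixed linear trade-off between the two objectives: the weight allocation itself becomes a function of the measured risk.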
For better flexibility and greater coverage, Unmanned Aerial Vehicles (UAVs) have been applied in Flying Mobile Edge Computing (F-MEC) systems to offer offloading services to User Equipment (UEs). This paper considers a disaster-affected scenario in which UAVs take on the role of MEC servers to provide computing resources for Disaster Relief Devices (DRDs). To ensure fairness among DRDs, a max-min problem is formulated to optimize the saved time by jointly designing the UAVs’ trajectories, the offloading policy, and the serving time under the constraint of the UAVs’ energy capacity. To solve this non-convex problem, we first model the service process as a Markov Decision Process (MDP) with a Reward Shaping (RS) technique, and then propose a Deep Reinforcement Learning (DRL) based algorithm to find the optimal solution for the MDP. Simulations show that the proposed RS-DRL algorithm is valid and effective, and outperforms the baseline algorithms.
This study investigated the effects of monetary rewards and punishments on behavioral inhibition in children with attention deficit hyperactivity disorder (ADHD) tendencies. The study adopted the stop-signal task paradigm, with 66 children with ADHD tendencies as participants. A mixed 2 (reward/punishment type: reward, punishment) × 2 (stimulus type: monetary stimulus, social stimulus) design was used, with reward/punishment type as a between-subjects variable and stimulus type as a within-subjects variable. The results showed that monetary punishment promotes behavioral inhibition in children with an ADHD tendency better than reward does. In addition, monetary punishment and social rewards affected the speed–accuracy trade-off of inhibited behavior in children with an ADHD tendency. These findings suggest that withdrawal of a material token resulted in more behavioral compliance in children with an ADHD tendency.
Early life stress correlates with a higher prevalence of neurological disorders, including autism, attention-deficit/hyperactivity disorder, schizophrenia, depression, and Parkinson's disease. These conditions, primarily involving abnormal development and damage of the dopaminergic system, pose significant public health challenges. Microglia, the primary immune cells in the brain, are crucial in regulating neuronal circuit development and survival. From the embryonic stage to adulthood, microglia exhibit stage-specific gene expression profiles, transcriptome characteristics, and functional phenotypes, which enhance their susceptibility to early life stress. However, the role of microglia in mediating dopaminergic system disorders under early life stress conditions remains poorly understood. This review presents an up-to-date overview of preclinical studies elucidating the impact of early life stress on microglia and the resulting dopaminergic system disorders, along with the underlying mechanisms and the therapeutic potential for neurodegenerative and neurodevelopmental conditions. Impaired microglial activity damages dopaminergic neurons by diminishing neurotrophic support (e.g., insulin-like growth factor-1) and hinders dopaminergic axon growth through defective phagocytosis and synaptic pruning. Blunted microglial immunoreactivity suppresses striatal dopaminergic circuit development and reduces neuronal transmission. Furthermore, inflammation and oxidative stress induced by activated microglia can directly damage dopaminergic neurons, inhibiting dopamine synthesis, reuptake, and receptor activity, while enhanced microglial phagocytosis inhibits dopamine axon extension. These long-lasting effects of microglial perturbations may be driven by early life stress–induced epigenetic reprogramming of microglia. Indirectly, early life stress may influence microglial function through various pathways, such as astrocytic activation, the hypothalamic–pituitary–adrenal axis, the gut–brain axis, and maternal immune signaling. Finally, various therapeutic strategies and molecular mechanisms for targeting microglia to restore the dopaminergic system are summarized and discussed, including classical antidepressants and antipsychotics, antibiotics and anti-inflammatory agents, and herbal-derived medicines. Further investigations combining pharmacological interventions and genetic strategies are essential to elucidate the causal role of microglial phenotypic and functional perturbations in the dopaminergic system disrupted by early life stress.
This paper investigates impulsive orbital attack-defense (AD) games under multiple constraints and victory conditions, involving three spacecraft: an attacker, a target, and a defender. In the AD scenario, the attacker aims to breach the defender's interception to rendezvous with the target, while the defender seeks to protect the target by blocking or actively pursuing the attacker. Four maneuvering constraints and five potential game outcomes are incorporated to model AD game problems more accurately and increase their complexity, which reduces the effectiveness of traditional methods such as differential games and game-tree searches. To address these challenges, this study proposes a multi-agent deep reinforcement learning solution with variable reward functions. Two attack strategies, Direct attack (DA) and Bypass attack (BA), are developed for the attacker, each focusing on different mission priorities. Similarly, two defense strategies, Direct interdiction (DI) and Collinear interdiction (CI), are designed for the defender, each optimizing specific defensive actions through tailored reward functions. Each reward function incorporates both process rewards (e.g., distance and angle) and outcome rewards, derived from physical principles and validated via geometric analysis. Extensive simulations of the four strategy confrontations demonstrate average defensive success rates of 75% for DI vs. DA, 40% for DI vs. BA, 80% for CI vs. DA, and 70% for CI vs. BA. The results indicate that CI outperforms DI for defenders, while BA outperforms DA for attackers, and that defenders achieve their objectives more effectively under identical maneuvering capabilities. Trajectory evolution analyses further illustrate the effectiveness of the proposed variable reward function-driven strategies. These strategies and analyses offer valuable guidance for practical orbital defense scenarios and lay a foundation for future multi-agent game research.
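A reward of the shape this abstract describes, process terms (e.g., distance and angle) plus a terminal outcome term, can be sketched minimally. The structure and weights below are illustrative assumptions, not the paper's actual reward functions:

```python
def game_reward(dist, dist_prev, angle_err, outcome,
                w_d=1.0, w_a=0.5, w_o=100.0):
    """Hypothetical variable reward: dense process terms reward progress
    toward the objective (distance closed, angular alignment), while a
    large sparse outcome term dominates at episode end.
    outcome: +1 on success, -1 on failure, 0 on intermediate steps."""
    r_process = w_d * (dist_prev - dist) - w_a * abs(angle_err)
    r_outcome = w_o * outcome
    return r_process + r_outcome
```

Weighting the outcome term far above the process terms keeps the dense signal from overriding the actual victory condition.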
To solve the control problem of multiple-input multiple-output (MIMO) systems in complex and variable control environments, this paper proposes a model-free adaptive LSAC-PID method based on deep reinforcement learning (RL) for the automatic control of mobile robots. According to environmental feedback, the RL agent of the upper controller outputs the optimal parameters to the lower MIMO PID controllers, realizing real-time optimal PID control. First, a model-free adaptive MIMO PID hybrid control strategy is presented to realize real-time optimal tuning of control parameters using the soft actor-critic (SAC) algorithm, a state-of-the-art RL algorithm. Second, to improve the RL convergence speed and the control performance, a Lyapunov-based reward shaping method for off-policy RL algorithms is designed, and a self-adaptive LSAC-PID tuning approach with a Lyapunov-based reward is then derived. Through the policy evaluation and policy improvement of soft policy iteration, the convergence and optimality of the proposed LSAC-PID algorithm are proved mathematically. Finally, based on the proposed reward shaping method, the reward function is designed to improve system stability for a line-following robot. Simulation and experimental results show that the proposed adaptive LSAC-PID approach offers fast convergence, strong generalization, and high real-time performance, and achieves real-time optimal tuning of MIMO PID parameters without the system model or control-loop decoupling.
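The two-level structure described above, an upper RL agent retuning the gains of lower-level PID loops online, can be sketched minimally. The SAC policy itself is omitted; the class below only shows the interface a gain-tuning agent would drive, and all names are illustrative:

```python
class PID:
    """Minimal PID loop whose gains can be retuned online, as by the
    upper-level RL agent in the adaptive scheme described above."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_err = 0.0

    def set_gains(self, kp, ki, kd):
        """Called each decision step with the agent's output action."""
        self.kp, self.ki, self.kd = kp, ki, kd

    def step(self, err, dt):
        """One control update: proportional + integral + derivative."""
        self.integral += err * dt
        deriv = (err - self.prev_err) / dt
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv
```

A MIMO controller would hold one such loop per output channel, with the agent emitting the full gain vector at once.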
The city of Suqian in east China’s Jiangsu Province put a new regulation into practice six months ago whereby police authorities reward residents who volunteer to make peace in neighborhood quarrels, mediate in civil disputes or, for that matter, help in putting out fires.
Saving people in distress can now bring a good Samaritan big bucks. The government of Guangzhou, Guangdong Province, announced in October that the maximum reward to people who risk their lives to save the lives and property of others, whether civilians or civil servants, would be raised from 50,000 yuan ($6,667) to 300,000 yuan ($40,000). The high-
Combined inoculation with dark septate endophytes (DSEs) and arbuscular mycorrhizal fungi (AMF) has been shown to promote plant growth, yet the underlying plant-fungus interaction mechanisms remain unclear. To elucidate the nature of this symbiosis, it is crucial to explore carbon (C) transport from plants to fungi and the nutrient exchange between them. In this study, a pot experiment was conducted with two phosphorus (P) fertilization levels (low and normal) and four fungal inoculation treatments (no inoculation, single inoculation with AMF or DSE, and co-inoculation with AMF and DSE). The ¹³C isotope pulse labeling method was employed to quantify photosynthetic C transfer from plants to the different fungi, shedding light on the mechanisms of nutrient exchange between plants and fungi. Soil and mycelium δ¹³C, the soil C/N ratio, and the soil C/P ratio were higher at the low P level than at the normal P level. However, the soil microbial biomass C/P ratio was lower at the low P level, suggesting that the low P level was beneficial to soil C fixation and to fungal P mineralization and transport in soil. At the low P level, the P reward to plants from AMF and DSE increased significantly when the plants transferred the same amount of C to the fungi, and the two fungi synergistically promoted plant nutrient uptake and growth. At the normal P level, root P content was significantly higher in AMF-inoculated plants than in DSE-inoculated plants, indicating that AMF contributed more than DSE to plant P uptake for the same amount of C received. Moreover, plants preferentially allocated more C to AMF. These findings indicate a source-sink balance between plant C allocation and fungal P contribution. Overall, AMF and DSE conferred a higher reward to plants at the low P level through functionally synergistic strategies.
At the end of September, banners carrying the slogan "Catch one thief, get 1,000 yuan" appeared in Gulou District of Fuzhou, capital of southeast China’s Fujian Province. According to local officials, the goal of
Autonomous unmanned aerial vehicle (UAV) manipulation is necessary for the defense department to execute tactical missions given by commanders in the future unmanned battlefield. A large amount of research has been devoted to improving the autonomous decision-making ability of UAVs in interactive environments, where finding the optimal maneuvering decision-making policy has become one of the key issues for enabling UAV intelligence. In this paper, we propose a maneuvering decision-making algorithm for autonomous air-delivery based on deep reinforcement learning under the guidance of expert experience. Specifically, we refine the guidance-towards-area and guidance-towards-specific-point tasks for the air-delivery process based on traditional air-to-surface fire control methods. Moreover, we construct the UAV maneuvering decision-making model based on Markov decision processes (MDPs) and present a reward shaping method for the guidance-towards-area and guidance-towards-specific-point tasks using a potential-based function and expert-guided advice. The proposed algorithm accelerates the convergence of the maneuvering decision-making policy and increases the stability of the policy output during the later stage of training. The effectiveness of the proposed maneuvering decision-making policy is illustrated by the training curves and extensive experimental results for the trained policy.
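Potential-based reward shaping of the kind mentioned above has a standard form, F(s, s') = γΦ(s') − Φ(s), which is known to preserve the optimal policy while densifying the learning signal. A minimal sketch, with a hypothetical potential based on distance to a delivery point (the actual potential and expert-advice terms used in the paper are not specified here):

```python
def shaped_reward(r, s, s_next, phi, gamma=0.99):
    """Environment reward plus the potential-based shaping term
    gamma * phi(s_next) - phi(s); this form leaves the optimal
    policy unchanged."""
    return r + gamma * phi(s_next) - phi(s)

# Hypothetical 1-D potential: negative distance to a target at x = 10.
phi = lambda s: -abs(s - 10.0)
```

With this potential, any step that closes distance to the target earns a positive shaping bonus even before the sparse task reward arrives.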
In public goods games, punishments and rewards have been shown to be effective mechanisms for maintaining individual cooperation. However, punishments and rewards are costly ways to incentivize cooperation, so covering the cost of penalties and rewards has been a persistent problem in promoting the development of cooperation. In real society, specialized institutions exist that punish wrongdoers or reward good people, financed by collecting taxes. Motivated by this phenomenon, we propose a tax-based strong altruistic punishment (reward) strategy in the public goods game. Through theoretical analysis and numerical calculation, we show that tax-based strong altruistic punishment (reward) has more evolutionary advantages than traditional strong altruistic punishment (reward) in maintaining cooperation, and that tax-based strong altruistic reward leads to a higher level of cooperation than tax-based strong altruistic punishment.
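The tax-funded institution idea can be illustrated with a one-round payoff sketch. The payoff structure and parameter values below are illustrative assumptions, not the paper's model:

```python
def public_goods_payoffs(contribs, r=3.0, tax=0.1, fine=2.0):
    """One round of a public goods game with a tax-financed punishment
    institution: contributions are multiplied by r and shared equally,
    every player pays a flat tax, and the institution levies a fine on
    each free-rider (zero contributor)."""
    n = len(contribs)
    share = r * sum(contribs) / n      # each player's share of the pool
    payoffs = []
    for c in contribs:
        p = share - c - tax            # pool share minus own cost and tax
        if c == 0:
            p -= fine                  # institution punishes free-riders
        payoffs.append(p)
    return payoffs
```

With a sufficiently large fine relative to the contribution cost, cooperators outperform defectors, which is the qualitative condition under which such an institution can stabilize cooperation.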
Cross-lingual image description, the task of generating image captions in a target language from images and descriptions in a source language, is addressed in this study through a novel approach that combines neural network models and semantic matching techniques. Experiments conducted on the Flickr8k and AraImg2k benchmark datasets, featuring images and descriptions in English and Arabic, showcase remarkable performance improvements over state-of-the-art methods. Our model, equipped with the Image & Cross-Language Semantic Matching module and the Target Language Domain Evaluation module, significantly enhances the semantic relevance of generated image descriptions. For English-to-Arabic and Arabic-to-English cross-language image description, our approach achieves CIDEr scores of 87.9% for English and 81.7% for Arabic, respectively, emphasizing the substantial contributions of our methodology. Comparative analyses with previous work further affirm the superior performance of our approach, and visual results underscore that our model generates image captions that are both semantically accurate and stylistically consistent with the target language. In summary, this study advances the field of cross-lingual image description, offering an effective solution for generating image captions across languages, with the potential to impact multilingual communication and accessibility. Future research directions include expanding to more languages and incorporating diverse visual and textual data sources.
Multi-agent reinforcement learning has recently been applied to pursuit problems. However, it suffers from the large number of time steps per training episode and thus often struggles to converge effectively, resulting in low rewards and an inability of agents to learn strategies. This paper proposes a deep reinforcement learning (DRL) training method that employs an ensemble segmented multi-reward function design to address this convergence problem. The ensemble reward function combines the advantages of two reward functions, which enhances the training of agents over long episodes. We then eliminate the non-monotonic behavior that the trigonometric functions in the traditional 2D polar-coordinate observation representation introduce into the reward function. Experimental results demonstrate that this method outperforms the traditional single reward function mechanism in the pursuit scenario by improving the agents’ policy scores on the task. These ideas offer a solution to the convergence challenges faced by DRL models in long-episode pursuit problems, leading to improved model training performance.
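The non-monotonicity issue attributed above to trigonometric terms can be illustrated: a reward on, say, cos(error) has a vanishing gradient near 0 and π, whereas a signed error wrapped to [−π, π] is monotonic in the true angular offset. A sketch of the wrapping (names illustrative; this is not the paper's exact formulation):

```python
import math

def angle_error(theta_agent, theta_target):
    """Signed heading error wrapped to [-pi, pi]: monotonic in the true
    angular offset, unlike a reward built on cos(error), whose gradient
    vanishes near 0 and pi."""
    return (theta_target - theta_agent + math.pi) % (2 * math.pi) - math.pi
```

A pursuit reward can then penalize `abs(angle_error(...))` directly, giving the agent a consistent gradient toward the target bearing at every offset.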
Funding (P. acidilactici CCFM6432 study): Supported by the Top Talent Support Program for Young and Middle-aged People of Wuxi Health Committee, No. BJ2023086, and the Wuxi Taihu Talent Project, No. WXTTP 2021.
Funding (crowd navigation study): Supported by the Sichuan Science and Technology Program (2025ZNSFSC0005).
Funding (F-MEC study): Supported by the Key Research and Development Program of Jiangsu Province (No. BE2020084-2) and the National Key Research and Development Program of China (No. 2020YFB1600104).
Funding (ADHD study): Supported by the National General Projects in 2020 of the 13th Five-Year Plan of National Education Science in China: A Study on Attention Training Interventions for ADHD Children in Regular Classes from the Perspective of Educational Neuroscience (BHA200123).
Funding: supported by the National Natural Science Foundation of China, Nos. 82304990 (to NY), 81973748 (to JC), 82174278 (to JC); the National Key R&D Program of China, No. 2023YFE0209500 (to JC); the China Postdoctoral Science Foundation, No. 2023M732380 (to NY); the Guangzhou Key Laboratory of Formula-Pattern of Traditional Chinese Medicine, No. 202102010014 (to JC); the Huang Zhendong Research Fund for Traditional Chinese Medicine of Jinan University, No. 201911 (to JC); the National Innovation and Entrepreneurship Training Program for Undergraduates in China, No. 202310559128 (to NY and QM); and the Innovation and Entrepreneurship Training Program for Undergraduates at Jinan University, Nos. CX24380, CX24381 (both to NY and QM).
Abstract: Early life stress correlates with a higher prevalence of neurological disorders, including autism, attention-deficit/hyperactivity disorder, schizophrenia, depression, and Parkinson's disease. These conditions, which primarily involve abnormal development and damage of the dopaminergic system, pose significant public health challenges. Microglia, the primary immune cells in the brain, are crucial in regulating neuronal circuit development and survival. From the embryonic stage to adulthood, microglia exhibit stage-specific gene expression profiles, transcriptome characteristics, and functional phenotypes, which heighten their susceptibility to early life stress. However, the role of microglia in mediating dopaminergic system disorders under early life stress remains poorly understood. This review presents an up-to-date overview of preclinical studies elucidating how early life stress acts on microglia to produce dopaminergic system disorders, along with the underlying mechanisms and therapeutic potential for neurodegenerative and neurodevelopmental conditions. Impaired microglial activity damages dopaminergic neurons by diminishing neurotrophic support (e.g., insulin-like growth factor-1) and hinders dopaminergic axon growth through defective phagocytosis and synaptic pruning. Furthermore, blunted microglial immunoreactivity suppresses striatal dopaminergic circuit development and reduces neuronal transmission. Conversely, inflammation and oxidative stress induced by activated microglia can directly damage dopaminergic neurons, inhibiting dopamine synthesis, reuptake, and receptor activity, while enhanced microglial phagocytosis inhibits dopamine axon extension. These long-lasting effects of microglial perturbations may be driven by early life stress-induced epigenetic reprogramming of microglia. Indirectly, early life stress may influence microglial function through various pathways, such as astrocytic activation, the hypothalamic–pituitary–adrenal axis, the gut–brain axis, and maternal immune signaling. Finally, various therapeutic strategies and molecular mechanisms for targeting microglia to restore the dopaminergic system are summarized and discussed, including classical antidepressants and antipsychotics, antibiotics and anti-inflammatory agents, and herbal-derived medicines. Further investigations combining pharmacological interventions and genetic strategies are essential to elucidate the causal role of microglial phenotypic and functional perturbations in the dopaminergic system disrupted by early life stress.
Funding: supported by the National Key R&D Program of China: Gravitational Wave Detection Project (Grant Nos. 2021YFC22026, 2021YFC2202601, 2021YFC2202603) and the National Natural Science Foundation of China (Grant Nos. 12172288 and 12472046).
Abstract: This paper investigates impulsive orbital attack-defense (AD) games under multiple constraints and victory conditions, involving three spacecraft: an attacker, a target, and a defender. In the AD scenario, the attacker aims to breach the defender's interception to rendezvous with the target, while the defender seeks to protect the target by blocking or actively pursuing the attacker. Four maneuvering constraints and five potential game outcomes are incorporated to model AD game problems more accurately and increase complexity, thereby reducing the effectiveness of traditional methods such as differential games and game-tree search. To address these challenges, this study proposes a multi-agent deep reinforcement learning solution with variable reward functions. Two attack strategies, Direct attack (DA) and Bypass attack (BA), are developed for the attacker, each focusing on different mission priorities. Similarly, two defense strategies, Direct interdiction (DI) and Collinear interdiction (CI), are designed for the defender, each optimizing specific defensive actions through tailored reward functions. Each reward function incorporates both process rewards (e.g., distance and angle) and outcome rewards, derived from physical principles and validated via geometric analysis. Extensive simulations of the four strategy confrontations demonstrate average defensive success rates of 75% for DI vs. DA, 40% for DI vs. BA, 80% for CI vs. DA, and 70% for CI vs. BA. The results indicate that CI outperforms DI for defenders, while BA outperforms DA for attackers. Moreover, defenders achieve their objectives more effectively under identical maneuvering capabilities. Trajectory evolution analyses further illustrate the effectiveness of the proposed variable-reward-function-driven strategies. These strategies and analyses offer valuable guidance for practical orbital defense scenarios and lay a foundation for future multi-agent game research.
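The abstract mentions process rewards built from distance and angle terms. A hypothetical sketch of such a term is shown below; the weights, normalization, and functional form are assumptions for illustration, not the paper's actual reward design.

```python
# Assumed example of a process reward combining a normalized distance term
# (closer is better) and an alignment term (bearing angle of 0 is best).
# The outcome reward would be added separately at episode termination.
import math

def process_reward(dist, angle, w_d=0.5, w_a=0.5, d_max=1000.0):
    r_dist = 1.0 - min(dist / d_max, 1.0)  # 1 at zero distance, 0 beyond d_max
    r_angle = math.cos(angle)              # 1 when aligned, -1 when opposed
    return w_d * r_dist + w_a * r_angle

print(process_reward(dist=500.0, angle=0.0))
```

Blending dense geometric terms like these with sparse outcome rewards is a common way to keep long-horizon game episodes learnable.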
Funding: the National Key R&D Program of China (No. 2018YFB1308400).
Abstract: To solve the control problem of multiple-input multiple-output (MIMO) systems in complex and variable control environments, a model-free adaptive LSAC-PID method based on deep reinforcement learning (RL) is proposed in this paper for the automatic control of mobile robots. According to environmental feedback, the RL agent of the upper controller outputs the optimal parameters to the lower MIMO PID controllers, realizing real-time optimal PID control. First, a model-free adaptive MIMO PID hybrid control strategy is presented to realize real-time optimal tuning of control parameters using the soft actor-critic (SAC) algorithm, a state-of-the-art RL algorithm. Second, to improve RL convergence speed and control performance, a Lyapunov-based reward shaping method for off-policy RL is designed, and a self-adaptive LSAC-PID tuning approach with a Lyapunov-based reward is then derived. Through the policy evaluation and policy improvement of soft policy iteration, the convergence and optimality of the proposed LSAC-PID algorithm are proved mathematically. Finally, based on the proposed reward shaping method, the reward function is designed to improve system stability for a line-following robot. Simulation and experimental results show that the proposed adaptive LSAC-PID approach offers fast convergence, high generalization, and high real-time performance, and achieves real-time optimal tuning of MIMO PID parameters without the system model or control-loop decoupling.
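The two-level structure described here (an RL agent that outputs gains to a lower PID loop) can be sketched as follows. This is an assumed minimal illustration, not the LSAC-PID implementation: the SAC actor is stubbed out as a fixed-gain `policy` function, and all gain values are placeholders.

```python
# Hierarchical sketch: an upper-level policy maps the observed state to PID
# gains; the lower-level PID loop applies them each control step.

class PID:
    def __init__(self, kp, ki, kd):
        self.set_gains(kp, ki, kd)
        self.integral = 0.0
        self.prev_error = 0.0

    def set_gains(self, kp, ki, kd):
        # Called whenever the upper-level agent emits new gains.
        self.kp, self.ki, self.kd = kp, ki, kd

    def step(self, error, dt):
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

def policy(state):
    # Placeholder for the SAC actor: in the real method this is a trained
    # network; here it returns fixed illustrative gains.
    return 1.2, 0.1, 0.05

pid = PID(*policy(state=None))
u = pid.step(error=0.5, dt=0.01)
print(u)
```

In the actual method the gains would be refreshed from the policy at each environment step, closing the loop between environmental feedback and controller tuning.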
Abstract: The city of Suqian in east China's Jiangsu Province put a new regulation into practice six months ago whereby police authorities reward residents who volunteer to make peace in neighborhood quarrels, mediate in civil disputes or, for that matter, help in putting out fires.
Abstract: Saving people in distress can now bring a good Samaritan big bucks. The government of Guangzhou, Guangdong Province, announced in October that the maximum reward to people who risk their lives to save the lives and property of others, whether civilians or civil servants, would be raised from 50,000 yuan ($6,667) to 300,000 yuan ($40,000). The high-
Funding: supported by the National Key Research and Development Program of China (No. 2022YFF1303303) and the National Natural Science Foundation of China (No. 52394194).
Abstract: Combined inoculation with dark septate endophytes (DSEs) and arbuscular mycorrhizal fungi (AMF) has been shown to promote plant growth, yet the underlying plant-fungus interaction mechanisms remain unclear. To elucidate the nature of this symbiosis, it is crucial to explore carbon (C) transport from plants to fungi and nutrient exchange between them. In this study, a pot experiment was conducted with two phosphorus (P) fertilization levels (low and normal) and four fungal inoculation treatments (no inoculation, single inoculation with AMF or DSE, and co-inoculation with AMF and DSE). The ¹³C isotope pulse labeling method was employed to quantify photosynthetic C transfer from plants to the different fungi, shedding light on the mechanisms of nutrient exchange between plants and fungi. Soil and mycelium δ¹³C, the soil C/N ratio, and the soil C/P ratio were higher at the low P level than at the normal P level. However, the soil microbial biomass C/P ratio was lower at the low P level, suggesting that the low P level was beneficial to soil C fixation and to soil fungal P mineralization and transport. At the low P level, the P reward to plants from AMF and DSE increased significantly when the plants transferred the same amount of C to the fungi, and the two fungi synergistically promoted plant nutrient uptake and growth. At the normal P level, root P content was significantly higher in the AMF-inoculated plants than in the DSE-inoculated plants, indicating that AMF contributed more than DSE to plant P uptake for the same amount of C received. Moreover, plants preferentially allocated more C to AMF. These findings indicate a source-sink balance between plant C allocation and fungal P contribution. Overall, AMF and DSE conferred a higher reward to plants at the low P level through functionally synergistic strategies.
Abstract: At the end of September, banners carrying the slogan "Catch one thief, get 1,000 yuan" appeared in Gulou District of Fuzhou, capital of southeast China's Fujian Province. According to local officials, the goal of
Funding: supported by the Key Research and Development Program of Shaanxi (2022GXLH-02-09), the Aeronautical Science Foundation of China (20200051053001), and the Natural Science Basic Research Program of Shaanxi (2020JM-147).
Abstract: Autonomous unmanned aerial vehicle (UAV) manipulation is necessary for the defense department to execute tactical missions given by commanders in the future unmanned battlefield. A large amount of research has been devoted to improving the autonomous decision-making ability of UAVs in interactive environments, where finding the optimal maneuvering decision-making policy has become one of the key issues in enabling UAV intelligence. In this paper, we propose a maneuvering decision-making algorithm for autonomous air delivery based on deep reinforcement learning under the guidance of expert experience. Specifically, we refine the guidance-towards-area and guidance-towards-specific-point tasks for the air-delivery process based on traditional air-to-surface fire control methods. Moreover, we construct the UAV maneuvering decision-making model based on Markov decision processes (MDPs), and present a reward shaping method for the two guidance tasks using a potential-based function and expert-guided advice. The proposed algorithm accelerates the convergence of the maneuvering decision-making policy and increases the stability of the policy's output during the later stages of training. The effectiveness of the proposed policy is illustrated by the training curves and extensive experimental results for the trained policy.
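The potential-based reward shaping named in this abstract has a standard form: the shaped reward adds F(s, s') = γΦ(s') − Φ(s) to the environment reward, which provably preserves the optimal policy. The potential Φ below is an assumed example (negative distance to a target point), not the paper's actual function.

```python
# Potential-based reward shaping in its classic form. The shaping term
# F(s, s') = GAMMA * phi(s_next) - phi(s) is added to the environment reward.
# phi is an illustrative assumed potential: negative distance to the target.

GAMMA = 0.99

def phi(state):
    x, y = state
    return -((x - 10.0) ** 2 + (y - 10.0) ** 2) ** 0.5

def shaped_reward(env_reward, s, s_next):
    return env_reward + GAMMA * phi(s_next) - phi(s)

# Moving from (0, 0) toward the target at (10, 10) yields a positive bonus.
print(shaped_reward(0.0, s=(0.0, 0.0), s_next=(3.0, 4.0)))
```

Because the shaping term telescopes over any trajectory, it densifies feedback (useful early in training, as the abstract notes) without changing which policy is optimal.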
Funding: the National Natural Science Foundation of China (Grant No. 71961003).
Abstract: In public goods games, punishments and rewards have been shown to be effective mechanisms for maintaining individual cooperation. However, punishments and rewards are costly ways to incentivize cooperation, so the generation of costly penalties and rewards has been a persistent problem in promoting the development of cooperation. In real societies, specialized institutions funded by taxes exist to punish wrongdoers or reward good people. Motivated by this phenomenon, we propose a tax-based strong altruistic punishment or reward strategy in the public goods game. Through theoretical analysis and numerical calculation, we find that tax-based strong altruistic punishment (reward) has more evolutionary advantages than traditional strong altruistic punishment (reward) in maintaining cooperation, and that tax-based strong altruistic reward leads to a higher level of cooperation than tax-based strong altruistic punishment.
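A payoff calculation for the kind of institution described here can be sketched as follows. This is an assumed toy model, not the paper's exact formulation: every player pays a flat tax that finances a fine levied on defectors, and the contribution cost, multiplier, tax, and fine values are illustrative.

```python
# Toy public goods game with a tax-funded punishment institution:
# all players pay `tax`; the institution fines each defector `fine`.

def payoffs(strategies, c=1.0, r=3.0, tax=0.1, fine=0.8):
    """strategies: list of 'C' (contribute c to the pool) or 'D' (free-ride)."""
    n = len(strategies)
    pool = r * c * strategies.count("C")   # multiplied public pool
    share = pool / n                       # split equally among all players
    out = []
    for s in strategies:
        p = share - tax                    # everyone receives the share, pays tax
        if s == "C":
            p -= c                         # cooperators paid the contribution
        else:
            p -= fine                      # institution fines defectors
        out.append(p)
    return out

print(payoffs(["C", "C", "D", "D"]))
```

With these illustrative values defection still narrowly pays, showing why the fine level (and how the tax funds it) is the key lever the evolutionary analysis has to tune.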
Abstract: Cross-lingual image description, the task of generating image captions in a target language from images and descriptions in a source language, is addressed in this study through a novel approach that combines neural network models and semantic matching techniques. Experiments conducted on the Flickr8k and AraImg2k benchmark datasets, featuring images and descriptions in English and Arabic, show remarkable performance improvements over state-of-the-art methods. Our model, equipped with an Image & Cross-Language Semantic Matching module and a Target Language Domain Evaluation module, significantly enhances the semantic relevance of generated image descriptions. For English-to-Arabic and Arabic-to-English cross-language image description, our approach achieves CIDEr scores of 87.9% for English and 81.7% for Arabic, respectively, underscoring the substantial contributions of our methodology. Comparative analyses with previous works further affirm the superior performance of our approach, and visual results show that our model generates image captions that are both semantically accurate and stylistically consistent with the target language. In summary, this study advances the field of cross-lingual image description, offering an effective solution for generating image captions across languages, with the potential to impact multilingual communication and accessibility. Future research directions include expanding to more languages and incorporating diverse visual and textual data sources.
Funding: the National Natural Science Foundation of China (Nos. 61803260, 61673262 and 61175028).
Abstract: Multi-agent reinforcement learning has recently been applied to pursuit problems. However, it suffers from a large number of time steps per training episode and thus often struggles to converge, resulting in low rewards and an inability of agents to learn strategies. This paper proposes a deep reinforcement learning (DRL) training method that employs an ensemble segmented multi-reward function design to address this convergence problem. The ensemble reward function combines the advantages of two reward functions, which enhances the training of agents over long episodes. We also eliminate the non-monotonic behavior introduced into the reward function by the trigonometric functions in the traditional 2D polar-coordinate observation representation. Experimental results demonstrate that this method outperforms the traditional single-reward-function mechanism in the pursuit scenario, improving agents' policy scores on the task. These ideas offer a solution to the convergence challenges faced by DRL models in long-episode pursuit problems, leading to improved training performance.
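A segmented reward of the general kind described here can be sketched as follows. The segmentation rule, scaling constants, and capture radius below are assumptions for illustration, not the paper's design: a dense distance-based term guides the pursuer through most of the long episode, and a sparse terminal bonus takes over at capture.

```python
# Assumed sketch of a segmented pursuit reward: dense shaping while far from
# the evader, a sparse capture bonus once within the capture radius.
import math

def distance(pursuer, evader):
    return math.hypot(pursuer[0] - evader[0], pursuer[1] - evader[1])

def segmented_reward(pursuer, evader, capture_radius=1.0):
    d = distance(pursuer, evader)
    if d <= capture_radius:
        return 100.0        # terminal segment: sparse capture bonus
    return -0.1 * d         # pursuit segment: dense penalty on distance

print(segmented_reward((0, 0), (3, 4)))    # far from the evader
print(segmented_reward((0, 0), (0.5, 0)))  # within capture radius
```

Note that the distance is computed in Cartesian coordinates; this sidesteps the non-monotonicity that trigonometric terms in a polar-coordinate observation can introduce, in the spirit of the fix the abstract describes.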