Background: Nonsuicidal self-injury (NSSI) in adolescents with depression disorders often exhibits addictive patterns, potentially linked to serum beta-endorphin levels and neural reward responsiveness. Beta-endorphin, involved in reward processing, alongside dysregulated neural reward pathways, may reinforce self-injurious behaviors, highlighting the need to explore these mechanisms. Methods: Adolescents (aged 12-17 years) with depression disorders were divided into an NSSI group (21 subjects) and a control group (11 subjects) according to inclusion criteria. Serum beta-endorphin concentration was measured using the enzyme-linked immunosorbent assay method. The Addiction Factor Scale was used to assess addiction levels. Statistical analyses were conducted using SPSS 25.0. The oxygenated hemoglobin response signal was detected using functional near-infrared spectroscopy. Analyses were performed using NIRS_KIT 2.0. Results: Compared with the control group, the NSSI group exhibited lower serum beta-endorphin concentration. Additionally, 85.7% of those in the NSSI group displayed addictive behaviors, and serum beta-endorphin concentration was negatively correlated with the Addiction Factor Scale score. The reward task activated channels 17, 20, and 21 (corresponding to the dorsolateral prefrontal cortex [PFC] and frontopolar PFC) in the gain condition and channels 20 and 21 in the loss condition. The oxygenated hemoglobin concentration of the differential waveform (Δ[oxy-Hb]) of channel 12 (corresponding to the frontopolar PFC) correlated positively with the Addiction Factor Scale score and negatively with the serum beta-endorphin concentration.
To solve the control problem of multiple-input multiple-output (MIMO) systems in complex and variable control environments, this paper proposes a model-free adaptive LSAC-PID method based on deep reinforcement learning (RL) for automatic control of mobile robots. Based on environmental feedback, the RL agent of the upper controller outputs the optimal parameters to the lower MIMO PID controllers, realizing real-time optimal PID control. First, a model-free adaptive MIMO PID hybrid control strategy is presented to tune the control parameters optimally in real time using the soft actor-critic (SAC) algorithm, a state-of-the-art RL algorithm. Second, to improve the RL convergence speed and the control performance, a Lyapunov-based reward shaping method for off-policy RL algorithms is designed, and a self-adaptive LSAC-PID tuning approach with a Lyapunov-based reward is then determined. Through the policy evaluation and policy improvement steps of soft policy iteration, the convergence and optimality of the proposed LSAC-PID algorithm are proved mathematically. Finally, based on the proposed reward shaping method, the reward function is designed to improve system stability for a line-following robot. Simulation and experimental results show that the proposed adaptive LSAC-PID approach has good control performance, including fast convergence, high generalization, and high real-time performance, and achieves real-time optimal tuning of MIMO PID parameters without a system model or control loop decoupling.
This paper investigates impulsive orbital attack-defense (AD) games under multiple constraints and victory conditions, involving three spacecraft: attacker, target, and defender. In the AD scenario, the attacker aims to breach the defender's interception to rendezvous with the target, while the defender seeks to protect the target by blocking or actively pursuing the attacker. Four different maneuvering constraints and five potential game outcomes are incorporated to model AD game problems more accurately; the added complexity reduces the effectiveness of traditional methods such as differential games and game-tree searches. To address these challenges, this study proposes a multiagent deep reinforcement learning solution with variable reward functions. Two attack strategies, Direct attack (DA) and Bypass attack (BA), are developed for the attacker, each focusing on different mission priorities. Similarly, two defense strategies, Direct interdiction (DI) and Collinear interdiction (CI), are designed for the defender, each optimizing specific defensive actions through tailored reward functions. Each reward function incorporates both process rewards (e.g., distance and angle) and outcome rewards, derived from physical principles and validated via geometric analysis. Extensive simulations of four strategy confrontations demonstrate average defensive success rates of 75% for DI vs. DA, 40% for DI vs. BA, 80% for CI vs. DA, and 70% for CI vs. BA. Results indicate that CI outperforms DI for defenders, while BA outperforms DA for attackers. Moreover, defenders achieve their objectives more effectively under identical maneuvering capabilities. Trajectory evolution analyses further illustrate the effectiveness of the proposed variable reward function-driven strategies. These strategies and analyses offer valuable guidance for practical orbital defense scenarios and lay a foundation for future multi-agent game research.
Combined inoculation with dark septate endophytes (DSEs) and arbuscular mycorrhizal fungi (AMF) has been shown to promote plant growth, yet the underlying plant-fungus interaction mechanisms remain unclear. To elucidate the nature of this symbiosis, it is crucial to explore carbon (C) transport from plants to fungi and nutrient exchange between them. In this study, a pot experiment was conducted with two phosphorus (P) fertilization levels (low and normal) and four fungal inoculation treatments (no inoculation, single inoculation with AMF or DSE, and co-inoculation with AMF and DSE). The ¹³C isotope pulse labeling method was employed to quantify the transfer of photosynthetic C from plants to different fungi, shedding light on the mechanisms of nutrient exchange between plants and fungi. Soil and mycelium δ¹³C, soil C/N ratio, and soil C/P ratio were higher at the low P level than at the normal P level. However, soil microbial biomass C/P ratio was lower at the low P level, suggesting that the low P level was beneficial to soil C fixation and soil fungal P mineralization and transport. At the low P level, the P reward to plants from AMF and DSE increased significantly when the plants transferred the same amount of C to the fungi, and the two fungi synergistically promoted plant nutrient uptake and growth. At the normal P level, the root P content was significantly higher in the AMF-inoculated plants than in the DSE-inoculated plants, indicating that AMF contributed more than DSE to plant P uptake with the same amount of C received. Moreover, plants preferentially allocated more C to AMF. These findings indicate the presence of a source-sink balance between plant C allocation and fungal P contribution. Overall, AMF and DSE conferred a higher reward to plants at the low P level through functional synergistic strategies.
This paper considers the optimal replacement problem of a repairable system consisting of one component and a single repairman. Assuming that the system after repair is not "as good as new", and using the geometric process, we consider a replacement policy T based on the age of the system. The problem is to determine the optimal replacement policy T* such that the long-run expected benefit per unit time is maximized. The explicit expression of the long-run expected benefit per unit time can also be found. Under some conditions, the existence and uniqueness of the optimal policy T* can be proved. Finally, we prove that the policy T* is better than the policy T* in .
Aim: To investigate the model-free multi-step average-reward reinforcement learning algorithm. Methods: By combining the R-learning algorithm with the temporal difference learning (TD(λ) learning) algorithm for average-reward problems, a novel incremental algorithm, called R(λ) learning, was proposed. Results and Conclusion: The proposed algorithm is a natural extension of Q(λ) learning, the multi-step discounted-reward reinforcement learning algorithm, to the average-reward case. Simulation results show that R(λ) learning with intermediate λ values yields a significant performance improvement over simple R-learning.
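The abstract above does not give the update rules, so the following is only a minimal sketch of how R-learning might be combined with eligibility traces. It assumes the standard tabular R-learning update (an average-reward estimate `rho` subtracted from each reward) and Watkins-style accumulating traces that are cleared after a non-greedy action; the class name `RLambda` and the step sizes `alpha`, `beta`, and `lam` are hypothetical and may differ from the paper's actual R(λ) algorithm.

```python
from collections import defaultdict


class RLambda:
    """Tabular average-reward learner with eligibility traces (illustrative sketch)."""

    def __init__(self, n_actions, alpha=0.1, beta=0.01, lam=0.5):
        self.n_actions = n_actions
        self.alpha = alpha              # step size for action values
        self.beta = beta                # step size for the average-reward estimate
        self.lam = lam                  # trace decay (no discount factor in this setting)
        self.q = defaultdict(float)     # action values Q(s, a)
        self.trace = defaultdict(float) # eligibility traces e(s, a)
        self.rho = 0.0                  # running estimate of the average reward

    def best(self, state):
        """Largest action value available in `state`."""
        return max(self.q[(state, a)] for a in range(self.n_actions))

    def update(self, s, a, r, s_next):
        """Apply one transition (s, a, r, s_next) and return the TD error."""
        was_greedy = self.q[(s, a)] == self.best(s)
        # Average-reward TD error: the reward is measured relative to rho.
        delta = r - self.rho + self.best(s_next) - self.q[(s, a)]
        self.trace[(s, a)] += 1.0
        for key in list(self.trace):
            self.q[key] += self.alpha * delta * self.trace[key]
            self.trace[key] *= self.lam
        if was_greedy:
            # As in R-learning, rho is adjusted only after greedy actions.
            self.rho += self.beta * delta
        else:
            # Assumption: cut traces after exploratory actions (Watkins-style).
            self.trace.clear()
        return delta
```

Setting `lam=0` reduces the sketch to one-step R-learning, which is the baseline the abstract compares against.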
The orbitofrontal cortex (OFC) is particularly important for the neural representation of reward value. Previous studies indicated that electroencephalogram (EEG) activity in the OFC was involved in drug administration and withdrawal. The present study investigated EEG activity in the OFC in rats during the development of food reward and craving. Two environments were used separately for control and food-related EEG recordings. In the food-related environment, rats were first trained to eat chocolate peanuts; then they either had no access to this food but could see and smell it (craving trials), or had free access to this food (reward trials). The EEG in the left OFC was recorded during these trials. We showed that, in the food-related environment, the EEG activity peaking in the delta band (2-4 Hz) was significantly correlated with the stimulus, increasing during food reward and decreasing during food craving when compared with that in the control environment. Our data suggest that EEG activity in the OFC can be altered by food reward; moreover, delta rhythm in this region could be used as an index for monitoring the changed signal underlying this reward.
Chronic pain is often accompanied by negative emotions, and the progression of negative emotions may further impede pain relief. Both chronic pain and negative emotions are closely associated with reward/motivation circuits. Treatment with acupuncture is effective for the relief of pain and emotional disorders, although the acupoint combinations vary. Thus, this study aimed to elucidate the relationship of chronic pain and emotional disorders with the reward/motivation circuits. In association with the theory of "seven emotional factors that cause pain" in traditional Chinese medicine, the potential mechanism of "spirit-regulation with acupuncture" for the relief of chronic pain and negative emotions was explored. The findings suggest that chronic pain and its related negative emotions may be effectively relieved with acupuncture after optimizing its acupoint combination based on "spirit-regulation". However, the role of reward/motivation circuits for negative emotions in the relief of chronic pain with acupuncture needs to be demonstrated.
Abstract: Traditionally, Chinese people place much value in virtue, with a long-held belief that one should never appropriate valuable items lost by others. However, a recent regulation by the government of south China's Guangdong Province
Funding: Supported by the National Natural Science Foundation of China, Nos. 82304990 (to NY), 81973748 (to JC), 82174278 (to JC); the National Key R&D Program of China, No. 2023YFE0209500 (to JC); China Postdoctoral Science Foundation, No. 2023M732380 (to NY); Guangzhou Key Laboratory of Formula-Pattern of Traditional Chinese Medicine, No. 202102010014 (to JC); Huang Zhendong Research Fund for Traditional Chinese Medicine of Jinan University, No. 201911 (to JC); National Innovation and Entrepreneurship Training Program for Undergraduates in China, No. 202310559128 (to NY and QM); and Innovation and Entrepreneurship Training Program for Undergraduates at Jinan University, Nos. CX24380, CX24381 (both to NY and QM).
Abstract: Early life stress correlates with a higher prevalence of neurological disorders, including autism, attention-deficit/hyperactivity disorder, schizophrenia, depression, and Parkinson's disease. These conditions, primarily involving abnormal development and damage of the dopaminergic system, pose significant public health challenges. Microglia, as the primary immune cells in the brain, are crucial in regulating neuronal circuit development and survival. From the embryonic stage to adulthood, microglia exhibit stage-specific gene expression profiles, transcriptome characteristics, and functional phenotypes, enhancing the susceptibility to early life stress. However, the role of microglia in mediating dopaminergic system disorders under early life stress conditions remains poorly understood. This review presents an up-to-date overview of preclinical studies elucidating the impact of early life stress on microglia, leading to dopaminergic system disorders, along with the underlying mechanisms and therapeutic potential for neurodegenerative and neurodevelopmental conditions. Impaired microglial activity damages dopaminergic neurons by diminishing neurotrophic support (e.g., insulin-like growth factor-1) and hinders dopaminergic axon growth through defective phagocytosis and synaptic pruning. Furthermore, blunted microglial immunoreactivity suppresses striatal dopaminergic circuit development and reduces neuronal transmission. In addition, inflammation and oxidative stress induced by activated microglia can directly damage dopaminergic neurons, inhibiting dopamine synthesis, reuptake, and receptor activity. Enhanced microglial phagocytosis inhibits dopamine axon extension. These long-lasting effects of microglial perturbations may be driven by early life stress–induced epigenetic reprogramming of microglia. Indirectly, early life stress may influence microglial function through various pathways, such as astrocytic activation, the hypothalamic–pituitary–adrenal axis, the gut–brain axis, and maternal immune signaling. Finally, various therapeutic strategies and molecular mechanisms for targeting microglia to restore the dopaminergic system are summarized and discussed. These strategies include classical antidepressants and antipsychotics, antibiotics and anti-inflammatory agents, and herbal-derived medicine. Further investigations combining pharmacological interventions and genetic strategies are essential to elucidate the causal role of microglial phenotypic and functional perturbations in the dopaminergic system disrupted by early life stress.
Abstract: The city of Suqian in east China's Jiangsu Province put a new regulation into practice six months ago whereby police authorities reward residents who volunteer to make peace in neighborhood quarrels, mediate in civil disputes or, for that matter, help in putting out fires.
Abstract: Saving people in distress can now bring a good Samaritan big bucks. The government of Guangzhou, Guangdong Province, announced in October that the maximum reward to people who risk their lives to save the lives and property of others, whether civilians or civil servants, would be raised from 50,000 yuan ($6,667) to 300,000 yuan ($40,000). The high-
Abstract: At the end of September, banners carrying the slogan "Catch one thief, get 1,000 yuan" appeared in Gulou District of Fuzhou, capital of southeast China's Fujian Province. According to local officials, the goal of
Funding: Supported by the Top Talent Support Program for Young and Middle-aged People of Wuxi Health Committee, No. BJ2023086; and the Wuxi Taihu Talent Project, No. WXTTP 2021.
Abstract: BACKGROUND: Anhedonia, a hallmark symptom of major depressive disorder (MDD), is often resistant to common antidepressants. Preliminary evidence indicates that Pediococcus acidilactici (P. acidilactici) CCFM6432 may offer potential benefits in ameliorating this symptomatology in patients with MDD. AIM: To further assess the efficacy of P. acidilactici CCFM6432 in alleviating anhedonia in patients with MDD, using a combination of objective and subjective assessment tools. METHODS: Adult patients with MDD exhibiting anhedonic symptoms were enrolled and randomly assigned to two treatment groups: one receiving standard antidepressant therapy plus P. acidilactici CCFM6432, and the other receiving standard antidepressant treatment along with a placebo, for 30 days. Assessments were conducted at baseline and post-intervention using the Hamilton Depression Rating Scale (HAMD), Temporal Experience of Pleasure Scale (TEPS), and synchronous electroencephalography (EEG) during a "Doors Guessing Task." Changes in both clinical outcomes and EEG biomarkers, specifically the stimulus-preceding negativity (SPN) and feedback-related negativity amplitudes, were analyzed. RESULTS: Of the 92 screened participants, 71 were enrolled and 55 completed the study (CCFM6432 group: n = 27; Placebo group: n = 28). No baseline differences were noted between the groups in terms of demographics, clinical assessments, or EEG metrics. A mixed-design analysis of variance revealed that the CCFM6432 group showed significantly greater improvements in both HAMD and TEPS scores compared to the Placebo group. Moreover, the CCFM6432 group demonstrated a significant increase in SPN amplitudes, which were inversely correlated with the improvements observed in HAMD scores. No such changes were observed in the Placebo group. CONCLUSION: Adjunctive administration of P. acidilactici CCFM6432 not only augments the therapeutic efficacy of antidepressants but also significantly ameliorates the symptoms of anhedonia in MDD.
Funding: Supported by the Sichuan Science and Technology Program (2025ZNSFSC0005).
Abstract: Robot navigation in complex crowd service scenarios, such as medical logistics and commercial guidance, requires a dynamic balance between safety and efficiency, while the traditional fixed reward mechanism lacks environmental adaptability and struggles to adapt to the variability of crowd density and pedestrian motion patterns. This paper proposes a navigation method that integrates spatiotemporal risk field modeling and adaptive reward optimization, aiming to improve the robot's decision-making ability in diverse crowd scenarios through dynamic risk assessment and nonlinear weight adjustment. We construct a spatiotemporal risk field model based on a Gaussian kernel function by combining crowd density, relative distance, and motion speed to quantify environmental complexity and dynamically realize crowd-density-sensitive risk assessment. We apply an exponential decay function to reward design to address the linear conflict problem of fixed weights in multi-objective optimization. We adaptively adjust weight allocation between safety constraints and navigation efficiency based on real-time risk values, prioritizing safety in highly dense areas and navigation efficiency in sparse areas. Experimental results show that our method improves the navigation success rate by 9.0% over state-of-the-art models in high-density scenarios, with a 10.7% reduction in intrusion time ratio. Simulation comparisons validate the risk field model's ability to capture risk superposition effects in dense scenarios and the suppression of near-field dangerous behaviors by the exponential decay mechanism. Our parametric optimization paradigm establishes an explicit mapping between navigation objectives and risk parameters through rigorous mathematical formalization, providing an interpretable approach for safe deployment of service robots in dynamic environments.
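The abstract names the two ingredients (a Gaussian-kernel risk field and exponentially decaying reward weights) without giving formulas, so the sketch below only illustrates the general idea. Every functional form and parameter here (`sigma`, `k`, the `1 + speed` scaling, and the function names) is an assumption for illustration, not the authors' model.

```python
import math


def risk_field(robot_xy, pedestrians, sigma=1.0):
    """Sum one Gaussian kernel per pedestrian, evaluated at the robot position.

    Each pedestrian is (x, y, speed). Summation means risk superposes in dense
    crowds; the (1 + speed) factor is an assumed way to weight faster walkers.
    """
    rx, ry = robot_xy
    total = 0.0
    for px, py, speed in pedestrians:
        dist_sq = (px - rx) ** 2 + (py - ry) ** 2
        total += (1.0 + speed) * math.exp(-dist_sq / (2.0 * sigma ** 2))
    return total


def adaptive_weights(risk, k=1.0):
    """Return (w_safety, w_efficiency) for a given risk value.

    The efficiency weight decays exponentially as risk grows, so safety
    dominates in dense areas and efficiency dominates in sparse ones.
    """
    w_efficiency = math.exp(-k * risk)
    return 1.0 - w_efficiency, w_efficiency


def reward(progress, intrusion_penalty, risk, k=1.0):
    """Blend a progress term and a safety penalty using risk-dependent weights."""
    w_safety, w_efficiency = adaptive_weights(risk, k)
    return w_efficiency * progress - w_safety * intrusion_penalty
```

With no pedestrians the risk is zero and the reward reduces to the progress term alone; as pedestrians crowd around the robot, the same intrusion penalty counts for progressively more.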
Funding: Supported by the National General Projects in 2020 of the 13th Five Year Plan of National Education Science in China: A Study on Attention Training Interventions for ADHD Children in Regular Classes from the Perspective of Educational Neuroscience (BHA200123).
Abstract: The study investigated the effects of monetary rewards and punishments on behavioral inhibition in children with attention deficit hyperactivity disorder (ADHD) tendencies. The study adopted the stop-signal task paradigm, with 66 children with ADHD tendencies as the research subjects. A 2 (reward and punishment type: reward, punishment) × 2 (stimulus type: monetary stimulus, social stimulus) mixed design was used, with reward and punishment type as the between-subjects variable and stimulus type as the within-subjects variable. The results showed that monetary punishment promotes behavioral inhibition in children with an ADHD tendency better than reward does. In addition, monetary punishment and social rewards affected the speed–accuracy trade-off of inhibited behavior in children with an ADHD tendency. These findings suggest that withdrawal of a material token resulted in more behavioral compliance in children with an ADHD tendency.
Funding: Supported by the Key Research and Development Program of Jiangsu Province (No. BE2020084-2) and the National Key Research and Development Program of China (No. 2020YFB1600104).
Abstract: For better flexibility and greater coverage areas, Unmanned Aerial Vehicles (UAVs) have been applied in Flying Mobile Edge Computing (F-MEC) systems to offer offloading services for User Equipment (UEs). This paper considers a disaster-affected scenario where UAVs undertake the role of MEC servers to provide computing resources for Disaster Relief Devices (DRDs). Considering the fairness of DRDs, a max-min problem is formulated to optimize the saved time by jointly designing the trajectory of the UAVs, the offloading policy and serving time under the constraint of the UAVs' energy capacity. To solve the above non-convex problem, we first model the service process as a Markov Decision Process (MDP) with the Reward Shaping (RS) technique, and then propose a Deep Reinforcement Learning (DRL) based algorithm to find the optimal solution for the MDP. Simulations show that the proposed RS-DRL algorithm is valid and effective, and has better performance than the baseline algorithms.
Funding: Supported by the National Natural Science Foundation of China (No. 82260878); Guizhou Medical University Affiliated Hospital Doctoral Research Initiation Fund Project (gyfybsky-2021-44); Guizhou Provincial Science and Technology Plan Project (Qiankehe Achievements LC[2022]014); High-level Innovative Talents Cultivation Program of Guizhou Province (QianKeHe[2016]5679); Province Guiyang City Science and Technology Projects, Zhu Subjects Contract ([2022]4-2-5); Guizhou Science and Technology Planning Project (QianKeHe[2020]4Y198).
Abstract: Background: Nonsuicidal self-injury (NSSI) in adolescents with depressive disorders often exhibits addictive patterns, potentially linked to serum beta-endorphin levels and neural reward responsiveness. Beta-endorphin, involved in reward processing, alongside dysregulated neural reward pathways, may reinforce self-injurious behaviors, highlighting the need to explore these mechanisms. Methods: Adolescents (aged 12-17 years) with depressive disorders were divided into an NSSI group (21 subjects) and a control group (11 subjects) according to inclusion criteria. Serum beta-endorphin concentration was measured using the enzyme-linked immunosorbent assay method. The Addiction Factor Scale was used to assess addiction levels. Statistical analyses were conducted using SPSS 25.0. The oxygenated hemoglobin response signal was detected using functional near-infrared spectroscopy. Analyses were performed using NIRS_KIT 2.0. Results: Compared with the control group, the NSSI group exhibited lower serum beta-endorphin concentration. Additionally, 85.7% of those in the NSSI group displayed addictive behaviors, and serum beta-endorphin concentration was negatively correlated with the Addiction Factor Scale score. The reward task activated channels 17, 20, and 21 (corresponding to the dorsolateral prefrontal cortex [PFC] and frontopolar PFC) in the gain condition and channels 20 and 21 in the loss condition. The oxygenated hemoglobin concentration of the differential waveform (Δ[oxy-Hb]) of channel 12 (corresponding to the frontopolar PFC) correlated positively with the Addiction Factor Scale score and negatively with the serum beta-endorphin concentration.
Funding: The National Key R&D Program of China (No. 2018YFB1308400).
Abstract: To solve the control problem of multiple-input multiple-output (MIMO) systems in complex and variable control environments, this paper proposes a model-free adaptive LSAC-PID method based on deep reinforcement learning (RL) for automatic control of mobile robots. According to the environmental feedback, the RL agent of the upper controller outputs the optimal parameters to the lower MIMO PID controllers, which realizes real-time optimal PID control. First, a model-free adaptive MIMO PID hybrid control strategy is presented to realize real-time optimal tuning of control parameters using the soft actor-critic (SAC) algorithm, a state-of-the-art RL algorithm. Second, to improve the RL convergence speed and the control performance, a Lyapunov-based reward shaping method for off-policy RL algorithms is designed, and a self-adaptive LSAC-PID tuning approach with Lyapunov-based reward is then determined. Through the policy evaluation and policy improvement of the soft policy iteration, the convergence and optimality of the proposed LSAC-PID algorithm are proved mathematically. Finally, based on the proposed reward shaping method, the reward function is designed to improve the system stability for the line-following robot. The simulation and experiment results show that the proposed adaptive LSAC-PID approach has good control performance, such as fast convergence speed, high generalization and high real-time performance, and achieves real-time optimal tuning of MIMO PID parameters without the system model and control loop decoupling.
Funding: Supported by the National Key R&D Program of China: Gravitational Wave Detection Project (Grant Nos. 2021YFC22026, 2021YFC2202601, 2021YFC2202603) and the National Natural Science Foundation of China (Grant Nos. 12172288 and 12472046).
Abstract: This paper investigates impulsive orbital attack-defense (AD) games under multiple constraints and victory conditions, involving three spacecraft: attacker, target, and defender. In the AD scenario, the attacker aims to breach the defender's interception to rendezvous with the target, while the defender seeks to protect the target by blocking or actively pursuing the attacker. Four different maneuvering constraints and five potential game outcomes are incorporated to model AD game problems more accurately; the added complexity reduces the effectiveness of traditional methods such as differential games and game-tree searches. To address these challenges, this study proposes a multi-agent deep reinforcement learning solution with variable reward functions. Two attack strategies, Direct attack (DA) and Bypass attack (BA), are developed for the attacker, each focusing on different mission priorities. Similarly, two defense strategies, Direct interdiction (DI) and Collinear interdiction (CI), are designed for the defender, each optimizing specific defensive actions through tailored reward functions. Each reward function incorporates both process rewards (e.g., distance and angle) and outcome rewards, derived from physical principles and validated via geometric analysis. Extensive simulations of four strategy confrontations demonstrate average defensive success rates of 75% for DI vs. DA, 40% for DI vs. BA, 80% for CI vs. DA, and 70% for CI vs. BA. Results indicate that CI outperforms DI for defenders, while BA outperforms DA for attackers. Moreover, defenders achieve their objectives more effectively under identical maneuvering capabilities. Trajectory evolution analyses further illustrate the effectiveness of the proposed variable reward function-driven strategies. These strategies and analyses offer valuable guidance for practical orbital defense scenarios and lay a foundation for future multi-agent game research.
Funding: Supported by the National Key Research and Development Program of China (No. 2022YFF1303303) and the National Natural Science Foundation of China (No. 52394194).
Abstract: Combined inoculation with dark septate endophytes (DSEs) and arbuscular mycorrhizal fungi (AMF) has been shown to promote plant growth, yet the underlying plant-fungus interaction mechanisms remain unclear. To elucidate the nature of this symbiosis, it is crucial to explore carbon (C) transport from plants to fungi and nutrient exchange between them. In this study, a pot experiment was conducted with two phosphorus (P) fertilization levels (low and normal) and four fungal inoculation treatments (no inoculation, single inoculation of AMF and DSE, and co-inoculation of AMF and DSE). The ¹³C isotope pulse labeling method was employed to quantify the photosynthetic C transfer from plants to different fungi, shedding light on the mechanisms of nutrient exchange between plants and fungi. Soil and mycelium δ¹³C, soil C/N ratio, and soil C/P ratio were higher at the low P level than at the normal P level. However, soil microbial biomass C/P ratio was lower at the low P level, suggesting that the low P level was beneficial to soil C fixation and soil fungal P mineralization and transport. At the low P level, the P reward to plants from AMF and DSE increased significantly when the plants transferred the same amount of C to the fungi, and the two fungi synergistically promoted plant nutrient uptake and growth. At the normal P level, the root P content was significantly higher in the AMF-inoculated plants than in the DSE-inoculated plants, indicating that AMF contributed more than DSE to plant P uptake with the same amount of C received. Moreover, plants preferentially allocated more C to AMF. These findings indicate the presence of a source-sink balance between plant C allocation and fungal P contribution. Overall, AMF and DSE conferred a higher reward to plants at the low P level through functional synergistic strategies.
Abstract: This paper considers the optimal replacement problem of a repairable system consisting of one component and a single repairman. Assuming that the system after repair is not 'as good as new' and using the geometric process, we consider a replacement policy T based on the age of the system. The problem is to determine the optimal replacement policy T* such that the long-run expected benefit per unit time is maximized. An explicit expression for the long-run expected benefit per unit time is also derived. Under some conditions, the existence and uniqueness of the optimal policy T* can be proved. Finally, we prove that the policy T* is better than the policy T* in .
Abstract: Aim: To investigate the model-free multi-step average-reward reinforcement learning algorithm. Methods: By combining the R-learning algorithm with the temporal difference learning (TD(λ) learning) algorithm for average-reward problems, a novel incremental algorithm, called R(λ) learning, was proposed. Results and Conclusion: The proposed algorithm is a natural extension of Q(λ) learning, the multi-step discounted-reward reinforcement learning algorithm, to the average-reward case. Simulation results show that R(λ) learning with intermediate λ values yields a significant performance improvement over simple R learning.
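To make the combination described above concrete, here is a minimal tabular sketch that adds eligibility traces to the standard R-learning update (average-reward estimate `rho`, undiscounted TD error). It is an illustration under assumptions and not the paper's exact R(λ) algorithm: the class name `RLambdaAgent`, the accumulating traces, and the Watkins-style rule of clearing traces after a non-greedy action are all choices made for this example.

```python
from collections import defaultdict

class RLambdaAgent:
    """Tabular R-learning with eligibility traces (illustrative sketch)."""

    def __init__(self, actions, alpha=0.1, beta=0.01, lam=0.5):
        self.actions = list(actions)
        self.alpha = alpha   # step size for the relative action values
        self.beta = beta     # step size for the average-reward estimate
        self.lam = lam       # trace decay (no discount factor is used)
        self.q = defaultdict(float)
        self.trace = defaultdict(float)
        self.rho = 0.0       # running estimate of the average reward

    def best_value(self, state):
        return max(self.q[(state, a)] for a in self.actions)

    def update(self, s, a, r, s_next):
        v_s = self.best_value(s)
        was_greedy = self.q[(s, a)] == v_s
        # Average-reward TD error: the reward is measured relative to rho.
        delta = r - self.rho + self.best_value(s_next) - self.q[(s, a)]
        self.trace[(s, a)] += 1.0
        # Spread the error over recently visited pairs, then decay traces.
        for key in list(self.trace):
            self.q[key] += self.alpha * delta * self.trace[key]
            self.trace[key] *= self.lam
        if was_greedy:
            # As in R-learning, rho is adjusted only after greedy actions.
            self.rho += self.beta * (r - self.rho + self.best_value(s_next) - v_s)
        else:
            # Assumed Watkins-style cut: drop traces after exploration.
            self.trace.clear()
```

With `lam=0` the trace of every earlier pair vanishes after one step and the update reduces to one-step R learning; intermediate values of `lam` let a single TD error also adjust earlier state-action pairs, which is the multi-step behavior the abstract refers to.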
Funding: National Science Foundation of China (30470553, 30530270, 30670669, 30770700); 973 Program (2005CB522803, 2007CB947703); 863 Program (O7013810, 2006AA02A116); The Major State Basic Research of China (2003CB716600); Chinese-Finnish International Collaboration Project-neuro (30621130076); Program of CASC (KSCX1-YW-R-33, YZ200737); National Key Technologies R & D Program and Yunnan Science and Technique Program (2006PT08-2).
Abstract: The orbitofrontal cortex (OFC) is particularly important for the neural representation of reward value. Previous studies indicated that electroencephalogram (EEG) activity in the OFC was involved in drug administration and withdrawal. The present study investigated EEG activity in the OFC in rats during the development of food reward and craving. Two environments were used separately for control and food-related EEG recordings. In the food-related environment, rats were first trained to eat chocolate peanuts; then they either had no access to this food but could see and smell it (craving trials), or had free access to this food (reward trials). The EEG in the left OFC was recorded during these trials. We showed that, in the food-related environment, the EEG activity peaking in the delta band (2-4 Hz) was significantly correlated with the stimulus, increasing during food reward and decreasing during food craving when compared with that in the control environment. Our data suggest that EEG activity in the OFC can be altered by food reward; moreover, the delta rhythm in this region could be used as an index for monitoring the signal changes underlying this reward.
Funding: Supported by the Youth Science Fund of the National Natural Science Foundation of China (81804180) and the General Program for TCM Scientific Research of Health Commission of Hubei Province (ZY2021M031).
Abstract: Chronic pain is often accompanied by negative emotions, and the progression of negative emotions may further impede pain relief. Both chronic pain and negative emotions are closely associated with reward/motivation circuits. Treatment with acupuncture is effective for the relief of pain and emotional disorders, although the acupoint combinations vary. Thus, this study aimed to elucidate the relationship of chronic pain and emotional disorders with the reward/motivation circuits. Drawing on the theory of "seven emotional factors that cause pain" in traditional Chinese medicine, the potential mechanism by which "spirit-regulation with acupuncture" relieves chronic pain and negative emotions was explored. The findings suggest that chronic pain and its related negative emotions may be effectively relieved with acupuncture after optimizing its acupoint combination based on "spirit-regulation". However, the role of reward/motivation circuits for negative emotions in the relief of chronic pain with acupuncture needs to be demonstrated.