Employee performance is widely regarded as a cornerstone of organizational success, and in fast-changing industries it becomes even more critical. China's electric vehicle (EV) sector exemplifies this challenge, where rapid innovation and intense competition require companies to motivate employees for both immediate efficiency and long-term commitment. This study explores how extrinsic rewards, including bonuses, gifts, promotions, and benefits, and intrinsic rewards, including recognition, career development, learning opportunities, and responsibility, influence task and contextual performance. A quantitative design was employed, using survey data and statistical analyses to test the proposed framework. The findings show that both extrinsic and intrinsic rewards significantly enhance performance but operate differently. Extrinsic rewards are more closely linked to short-term improvements, while intrinsic rewards foster deeper engagement and sustained contributions. By combining Herzberg's Two-Factor Theory and Self-Determination Theory, the study demonstrates that effective reward systems must balance financial incentives with psychological motivators. The results provide theoretical contributions and practical guidance for managers seeking to strengthen motivation, build resilience, and promote sustainable performance.
Robot navigation in complex crowd service scenarios, such as medical logistics and commercial guidance, requires a dynamic balance between safety and efficiency, while the traditional fixed reward mechanism lacks environmental adaptability and struggles to adapt to the variability of crowd density and pedestrian motion patterns. This paper proposes a navigation method that integrates spatiotemporal risk field modeling and adaptive reward optimization, aiming to improve the robot's decision-making ability in diverse crowd scenarios through dynamic risk assessment and nonlinear weight adjustment. We construct a spatiotemporal risk field model based on a Gaussian kernel function by combining crowd density, relative distance, and motion speed to quantify environmental complexity and realize crowd-density-sensitive risk assessment dynamically. We apply an exponential decay function to reward design to address the linear conflict problem of fixed weights in multi-objective optimization. We adaptively adjust weight allocation between safety constraints and navigation efficiency based on real-time risk values, prioritizing safety in highly dense areas and navigation efficiency in sparse areas. Experimental results show that our method improves the navigation success rate by 9.0% over state-of-the-art models in high-density scenarios, with a 10.7% reduction in intrusion time ratio. Simulation comparisons validate the risk field model's ability to capture risk superposition effects in dense scenarios and the suppression of near-field dangerous behaviors by the exponential decay mechanism. Our parametric optimization paradigm establishes an explicit mapping between navigation objectives and risk parameters through rigorous mathematical formalization, providing an interpretable approach for the safe deployment of service robots in dynamic environments.
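The abstract names the ingredients of the method (a Gaussian kernel over relative distance, density and speed factors, and an exponential decay that trades safety off against efficiency) without giving formulas. The sketch below is one minimal reading of those ingredients; the kernel form, the speed scaling, and the parameters `sigma` and `k` are assumptions for illustration, not the paper's definitions.

```python
import numpy as np

def risk_field(robot_pos, pedestrians, sigma=1.5):
    """Risk at the robot's position: Gaussian kernels of relative distance,
    scaled by pedestrian speed. Crowd density enters through the number of
    summed terms. pedestrians is a list of (position, velocity) pairs."""
    risk = 0.0
    for pos, vel in pedestrians:
        d = np.linalg.norm(np.asarray(robot_pos) - np.asarray(pos))
        risk += (1.0 + np.linalg.norm(vel)) * np.exp(-d**2 / (2 * sigma**2))
    return risk

def adaptive_weights(risk, k=1.0):
    """Exponential-decay trade-off: the efficiency weight decays as risk
    grows, so safety dominates in dense areas and efficiency in sparse ones."""
    w_eff = float(np.exp(-k * risk))
    return 1.0 - w_eff, w_eff  # (w_safe, w_eff)

def step_reward(progress, intrusion_penalty, risk):
    """Per-step navigation reward with risk-adaptive weights."""
    w_safe, w_eff = adaptive_weights(risk)
    return w_eff * progress - w_safe * intrusion_penalty
```

Under this reading, the nonlinearity of the exponential is what resolves the "linear conflict" of fixed weights: near-field pedestrians drive the risk (and hence the safety weight) up sharply rather than proportionally.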
The study investigated the effects of monetary rewards and punishments on behavioral inhibition in children with attention deficit hyperactivity disorder (ADHD) tendencies. The study adopted the stop-signal task paradigm, with 66 children with ADHD tendencies as the research subjects. A 2 (reward and punishment type: reward, punishment) × 2 (stimulus type: monetary stimulus, social stimulus) mixed design was used, with reward and punishment type as the between-subjects variable and stimulus type as the within-subjects variable. The results showed that monetary punishment promotes behavioral inhibition in children with an ADHD tendency better than reward does. In addition, the study showed that monetary punishment and social rewards affected the speed–accuracy trade-off of inhibited behavior in children with an ADHD tendency. These findings suggest that withdrawal of a material token resulted in more behavioral compliance in children with an ADHD tendency.
In order to solve the control problem of multiple-input multiple-output (MIMO) systems in complex and variable control environments, a model-free adaptive LSAC-PID method based on deep reinforcement learning (RL) is proposed in this paper for the automatic control of mobile robots. According to the environmental feedback, the RL agent of the upper controller outputs the optimal parameters to the lower MIMO PID controllers, realizing real-time PID optimal control. First, a model-free adaptive MIMO PID hybrid control strategy is presented to realize real-time optimal tuning of control parameters based on the soft actor-critic (SAC) algorithm, a state-of-the-art RL algorithm. Second, to improve the RL convergence speed and the control performance, a Lyapunov-based reward shaping method for off-policy RL algorithms is designed, and a self-adaptive LSAC-PID tuning approach with Lyapunov-based reward is then derived. Through the policy evaluation and policy improvement of soft policy iteration, the convergence and optimality of the proposed LSAC-PID algorithm are proved mathematically. Finally, based on the proposed reward shaping method, the reward function is designed to improve system stability for a line-following robot. The simulation and experiment results show that the proposed adaptive LSAC-PID approach has good control performance, including fast convergence, high generalization, and high real-time performance, and achieves real-time optimal tuning of MIMO PID parameters without the system model or control loop decoupling.
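The two-level structure is concrete enough to sketch: the upper SAC agent maps the state to PID gains, a conventional lower-level PID loop applies them, and the reward is shaped by the decrease of a Lyapunov candidate. The additive shaping form, the choice of candidate (squared tracking error), and all names below are assumptions; the paper proves convergence for its own construction.

```python
class PIDLoop:
    """One loop of the lower-level MIMO PID controller; the upper SAC agent
    supplies fresh (kp, ki, kd) gains at every control step."""
    def __init__(self):
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error, gains, dt=0.02):
        kp, ki, kd = gains
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return kp * error + ki * self.integral + kd * derivative

def shaped_reward(base_reward, v_prev, v_curr, beta=1.0):
    """Lyapunov-based shaping: add a bonus when a chosen Lyapunov candidate V
    (e.g., the squared tracking error) decreases between consecutive steps."""
    return base_reward + beta * (v_prev - v_curr)
```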
This paper investigates impulsive orbital attack-defense (AD) games under multiple constraints and victory conditions, involving three spacecraft: attacker, target, and defender. In the AD scenario, the attacker aims to breach the defender's interception to rendezvous with the target, while the defender seeks to protect the target by blocking or actively pursuing the attacker. Four different maneuvering constraints and five potential game outcomes are incorporated to more accurately model AD game problems and increase complexity, thereby reducing the effectiveness of traditional methods such as differential games and game-tree searches. To address these challenges, this study proposes a multi-agent deep reinforcement learning solution with variable reward functions. Two attack strategies, direct attack (DA) and bypass attack (BA), are developed for the attacker, each focusing on different mission priorities. Similarly, two defense strategies, direct interdiction (DI) and collinear interdiction (CI), are designed for the defender, each optimizing specific defensive actions through tailored reward functions. Each reward function incorporates both process rewards (e.g., distance and angle) and outcome rewards, derived from physical principles and validated via geometric analysis. Extensive simulations of four strategy confrontations demonstrate average defensive success rates of 75% for DI vs. DA, 40% for DI vs. BA, 80% for CI vs. DA, and 70% for CI vs. BA. The results indicate that CI outperforms DI for defenders, while BA outperforms DA for attackers. Moreover, defenders achieve their objectives more effectively under identical maneuvering capabilities. Trajectory evolution analyses further illustrate the effectiveness of the proposed variable-reward-function-driven strategies. These strategies and analyses offer valuable guidance for practical orbital defense scenarios and lay a foundation for future multi-agent game research.
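The "process rewards plus outcome rewards" structure can be made concrete with a small sketch. The linear process terms, the weights, and the outcome labels below are illustrative assumptions; the paper derives its actual terms from physical principles and geometric analysis.

```python
def defender_reward(dist_to_attacker, angle_error, outcome, w_d=0.01, w_a=0.1):
    """Variable reward = dense process terms (distance, angle) plus a
    terminal outcome term, evaluated once per decision step."""
    process = -w_d * dist_to_attacker - w_a * angle_error
    if outcome == "intercepted":   # defender blocks the attacker: success
        return process + 100.0
    if outcome == "breached":      # attacker rendezvouses with the target
        return process - 100.0
    return process                 # non-terminal step: process reward only
```

Swapping the weights and the sign conventions of the process terms is, on this reading, what differentiates the DI/CI (and DA/BA) variants.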
Combined inoculation with dark septate endophytes (DSEs) and arbuscular mycorrhizal fungi (AMF) has been shown to promote plant growth, yet the underlying plant-fungus interaction mechanisms remain unclear. To elucidate the nature of this symbiosis, it is crucial to explore carbon (C) transport from plants to fungi and nutrient exchange between them. In this study, a pot experiment was conducted with two phosphorus (P) fertilization levels (low and normal) and four fungal inoculation treatments (no inoculation, single inoculation of AMF or DSE, and co-inoculation of AMF and DSE). The ¹³C isotope pulse labeling method was employed to quantify the photosynthetic C transferred from plants to the different fungi, shedding light on the mechanisms of nutrient exchange between plants and fungi. Soil and mycelium δ¹³C, the soil C/N ratio, and the soil C/P ratio were higher at the low P level than at the normal P level. However, the soil microbial biomass C/P ratio was lower at the low P level, suggesting that the low P level was beneficial to soil C fixation and soil fungal P mineralization and transport. At the low P level, the P reward to plants from AMF and DSE increased significantly when the plants transferred the same amount of C to the fungi, and the two fungi synergistically promoted plant nutrient uptake and growth. At the normal P level, the root P content was significantly higher in the AMF-inoculated plants than in the DSE-inoculated plants, indicating that AMF contributed more than DSE to plant P uptake with the same amount of C received. Moreover, plants preferentially allocated more C to AMF. These findings indicate the presence of a source-sink balance between plant C allocation and fungal P contribution. Overall, AMF and DSE conferred a higher reward to plants at the low P level through functionally synergistic strategies.
By integrating deep neural networks with reinforcement learning, the Double Deep Q Network (DDQN) algorithm overcomes the limitations of Q-learning in handling continuous spaces and is widely applied in the path planning of mobile robots. However, the traditional DDQN algorithm suffers from sparse rewards and inefficient utilization of high-quality data. To address these problems, an improved DDQN algorithm based on average Q-value estimation and reward redistribution is proposed. First, to enhance the precision of the target Q-value, the average of multiple previously learned Q-values from the target Q network is used to replace the single Q-value from the current target Q network. Next, a reward redistribution mechanism is designed to overcome the sparse reward problem by adjusting the final reward of each action using the round reward from trajectory information. Additionally, a reward-prioritized experience selection method is introduced, which ranks experience samples according to reward values to ensure frequent utilization of high-quality data. Finally, simulation experiments are conducted to verify the effectiveness of the proposed algorithm in a fixed-position scenario and in random environments. The experimental results show that, compared with the traditional DDQN algorithm, the proposed algorithm achieves shorter average running time, higher average return, and fewer average steps, with performance improved by 11.43% in the fixed scenario and 8.33% in random environments. It not only plans economical and safe paths but also significantly improves efficiency and generalization in path planning, making it suitable for widespread application in autonomous navigation and industrial automation.
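The two core mechanisms are simple enough to sketch. The window size `k`, the plain averaging, and the mixing rate `alpha` below are assumptions about the construction; the abstract states only that multiple past target Q-values are averaged and that step rewards are adjusted using the round (episode) reward.

```python
import collections
import numpy as np

class AveragedTarget:
    """Bootstrap from the mean of the last k target-network Q estimates
    instead of the single latest value, smoothing the target."""
    def __init__(self, k=5):
        self.history = collections.deque(maxlen=k)

    def value(self, q_target):
        self.history.append(q_target)
        return float(np.mean(self.history))

def redistribute(step_rewards, alpha=0.1):
    """Reward redistribution: blend each step reward with the episode
    ('round') return so sparse terminal feedback reaches earlier actions."""
    episode_return = sum(step_rewards)
    return [(1 - alpha) * r + alpha * episode_return for r in step_rewards]
```

The redistributed rewards would then be what is stored in the replay buffer, which also gives the reward-prioritized experience selection something non-sparse to rank.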
In public goods games, punishments and rewards have been shown to be effective mechanisms for maintaining individual cooperation. However, punishments and rewards are costly ways to incentivize cooperation, so how costly penalties and rewards arise has been a long-standing problem in promoting the development of cooperation. In real society, specialized institutions exist that punish wrongdoers or reward good actors using collected taxes. Motivated by this phenomenon, we propose a tax-based strong altruistic punishment (reward) strategy in the public goods game. Through theoretical analysis and numerical calculation, we find that tax-based strong altruistic punishment (reward) has more evolutionary advantages than traditional strong altruistic punishment (reward) in maintaining cooperation, and that tax-based strong altruistic reward leads to a higher level of cooperation than tax-based strong altruistic punishment.
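One round of the underlying game is easy to write down. The flat tax and the fixed fine on defectors below are illustrative assumptions; the paper's scheme (and its reward variant, which would pay cooperators from the tax pool instead) may differ in detail.

```python
import numpy as np

def public_goods_payoffs(strategies, c=1.0, r=3.0, tax=0.1, fine=1.0):
    """Public goods game with a tax-funded punishing institution.
    strategies: array of 1 (cooperate) / 0 (defect)."""
    s = np.asarray(strategies, dtype=float)
    n = len(s)
    share = r * c * s.sum() / n        # equal share of the multiplied pool
    payoffs = share - c * s            # cooperators pay the contribution c
    payoffs -= tax                     # everyone pays the institution's tax
    payoffs -= fine * (1.0 - s)        # the institution fines defectors
    return payoffs

# Example: two cooperators, two defectors.
print(public_goods_payoffs([1, 1, 0, 0]))  # [0.4, 0.4, 0.4, 0.4]
```

Spreading the enforcement cost over all players via the tax is what removes the second-order free-rider problem: punishers no longer bear the cost alone.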
Multi-agent reinforcement learning has recently been applied to solve pursuit problems. However, it suffers from a large number of time steps per training episode and thus often struggles to converge effectively, resulting in low rewards and an inability for agents to learn strategies. This paper proposes a deep reinforcement learning (DRL) training method that employs an ensemble segmented multi-reward function design to address this convergence problem. The ensemble reward function combines the advantages of two reward functions, which enhances the training of agents over long episodes. We also eliminate the non-monotonic behavior that trigonometric functions introduce into the reward function under the traditional 2D polar-coordinate observation representation. Experimental results demonstrate that this method outperforms the traditional single-reward-function mechanism in the pursuit scenario by improving agents' policy scores on the task. These ideas offer a solution to the convergence challenges faced by DRL models in long-episode pursuit problems, leading to improved model training performance.
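The abstract does not spell out the two component rewards or the segmentation; the sketch below is one plausible reading, with the linear segmentation schedule, both reward terms, and the bearing-vector observation all assumed rather than taken from the paper.

```python
import numpy as np

def ensemble_reward(dist, prev_dist, captured, step, horizon):
    """Segmented ensemble reward for pursuit: a dense closing-distance term
    dominates early in a long episode, a sparse capture bonus dominates late."""
    dense = prev_dist - dist            # positive when closing on the evader
    sparse = 10.0 if captured else 0.0
    w = min(step / horizon, 1.0)        # shifts weight between segments
    return (1.0 - w) * dense + w * sparse

def observation(rel_x, rel_y):
    """Avoid the trig-induced non-monotonicity of raw polar angles: report
    distance plus a normalized bearing vector, which varies smoothly."""
    d = np.hypot(rel_x, rel_y)
    return np.array([d, rel_x / (d + 1e-8), rel_y / (d + 1e-8)])
```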
Due to the long-horizon issue, a substantial number of visits to the state space is required during the exploration phase of reinforcement learning (RL) to gather valuable information. Additionally, due to the challenge posed by sparse rewards, the planning phase of reinforcement learning spends a considerable amount of time on repetitive and unproductive tasks before adequately accessing sparse reward signals. To address these challenges, this work proposes a space partitioning and reverse merging (SPaRM) framework based on reward-free exploration (RFE). The framework consists of two parts: the space partitioning module and the reverse merging module. The former partitions the entire state space into a specific number of subspaces to expedite the exploration phase; this work establishes its theoretical sample-complexity lower bound. The latter starts planning in reverse from near the target and gradually extends to the starting state, as opposed to the conventional practice of starting at the beginning, which facilitates the early involvement of the sparse reward at the target in the policy update process. This work designs two experimental environments: a complex maze and a set of randomly generated maps. Compared with two state-of-the-art (SOTA) algorithms, experimental results validate the effectiveness and superior performance of the proposed algorithm.
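To see why reverse planning helps with sparse rewards, consider a toy stand-in: propagating values backward from the goal over a known grid maze, so states near the sparse reward are valued first and the frontier extends toward the start. This is only an illustration of the direction of propagation; the paper's reverse merging module operates on learned subspaces, not a known grid.

```python
import collections

def reverse_values(free_cells, goal, gamma=0.99):
    """Backward BFS value propagation from the goal on a 4-connected grid.
    free_cells: set of traversable (x, y) cells; goal must be in the set."""
    values = {goal: 1.0}
    queue = collections.deque([goal])
    while queue:
        x, y = queue.popleft()
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nxt in free_cells and nxt not in values:
                values[nxt] = gamma * values[(x, y)]
                queue.append(nxt)
    return values

# Example: a 1x3 corridor with the goal at the right end.
print(reverse_values({(0, 0), (1, 0), (2, 0)}, goal=(2, 0)))
```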
Future unmanned battles urgently require intelligent combat policies, and multi-agent reinforcement learning offers a promising solution. However, due to the complexity of combat operations and the large size of the combat group, this task suffers from the credit assignment problem more than other reinforcement learning tasks. This study uses reward shaping to relieve the credit assignment problem and improve policy training for the new generation of large-scale unmanned combat operations. We first prove that multiple reward shaping functions do not change the Nash equilibrium in stochastic games, providing theoretical support for their use. According to the characteristics of combat operations, we propose tactical reward shaping (TRS), which comprises maneuver shaping advice and threat-assessment-based attack shaping advice. Then, we investigate the effects of different types and combinations of shaping advice on combat policies through experiments. The results show that TRS improves both the efficiency and attack accuracy of combat policies, with the combination of maneuver reward shaping advice and ally-focused attack shaping advice achieving the best performance compared with the baseline strategy.
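The equilibrium-preservation claim parallels the known guarantee for potential-based shaping in stochastic games. Assuming the TRS advice takes (or can be cast in) that form, with an arbitrary state potential $\Phi_i$ for each agent $i$, it reads:

```latex
F_i(s, s') = \gamma\,\Phi_i(s') - \Phi_i(s), \qquad
\tilde{R}_i(s, a, s') = R_i(s, a, s') + F_i(s, s')
```

Because each $F_i$ telescopes along any trajectory, every agent's shaped discounted return differs from the original by the policy-independent constant $-\Phi_i(s_0)$, so no agent's best response changes and the set of Nash equilibria is preserved, for any number of shaping functions applied simultaneously.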
Traditionally, organizations assume that compensation/pay and monetary benefits are what all employees need to work harder, be productive, or remain with the company. According to Abraham Maslow, within every person is a hierarchy of five needs: physiological needs, safety needs, social needs, esteem needs, and self-actualization needs. Organizations must be able to identify what employees desire in order to secure optimum performance and to meet the needs of both employees and employers. This research focuses on the generational gap and the significance of intrinsic and extrinsic rewards in the workforce. The purpose and objective of this research are to test the significance of monetary versus non-monetary rewards among the different generations in the organization. A self-designed questionnaire distributed to a multi-generational group of employees of selected organizations was used to collect the analyzed data; a sixty-five percent (65%) response rate was obtained. Secondary data were used to elucidate the needs in this area of study. Because the workforce is predicted to become more diverse in terms of age, organizations will be unlikely to implement one set of rewards for the multiple generations, owing to the differing expectations and requirements among the generations. However, the results indicate no significant difference in monetary versus non-monetary rewards among the different generations in the workforce.
We extend the traditional nonnegative-reward testing with negative rewards. In this new testing framework, the may preorder and the must preorder are the inverse of each other. More surprisingly, it turns out that real-reward must testing is no more powerful than nonnegative-reward testing, at least for finite processes. In order to prove that result, we exploit an important property of failure simulation concerning the inclusion of the testing outcomes between two related processes.
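A sketch of why the two preorders invert, in the standard vector-based outcome notation (assumed here, not quoted from the paper): applying a test $T$ to a process $P$ yields a set $A(T,P) \subseteq [0,1]^{\Omega}$ of success-probability tuples over the success actions $\Omega$, which a reward vector $h$ weights:

```latex
P \sqsubseteq_{\mathrm{may}} Q \iff \forall T, h:\;
  \sup_{o \in A(T,P)} h \cdot o \;\le\; \sup_{o \in A(T,Q)} h \cdot o,
\qquad
P \sqsubseteq_{\mathrm{must}} Q \iff \forall T, h:\;
  \inf_{o \in A(T,P)} h \cdot o \;\le\; \inf_{o \in A(T,Q)} h \cdot o.
```

Once $h$ ranges over real vectors, $\sup_{o} (-h) \cdot o = -\inf_{o} h \cdot o$, so negating rewards swaps suprema with infima and hence the may condition under $-h$ for $P \sqsubseteq Q$ becomes the must condition under $h$ for $Q \sqsubseteq P$: the two preorders are each other's inverse.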
The orbitofrontal cortex (OFC) is particularly important for the neural representation of reward value. Previous studies indicated that electroencephalogram (EEG) activity in the OFC is involved in drug administration and withdrawal. The present study investigated EEG activity in the OFC of rats during the development of food reward and craving. Two environments were used separately for control and food-related EEG recordings. In the food-related environment, rats were first trained to eat chocolate peanuts; then they either had no access to this food but could see and smell it (craving trials), or had free access to it (reward trials). The EEG in the left OFC was recorded during these trials. We showed that, in the food-related environment, EEG activity peaking in the delta band (2-4 Hz) was significantly correlated with the stimulus, increasing during food reward and decreasing during food craving compared with the control environment. Our data suggest that EEG activity in the OFC can be altered by food reward; moreover, delta rhythm in this region could serve as an index of the changed signaling underlying this reward.
There is emerging evidence implicating glucagon-like peptide-1 (GLP-1) in reward, including palatable food reinforcement and alcohol-based reward circuitry. While recent findings suggest that mesolimbic structures, such as the ventral tegmental area (VTA) and the nucleus accumbens (NAc), are critical anatomical sites mediating GLP-1's inhibitory actions, the present study focused on the potential novel impact of GLP-1 within the habenula, a region of the forebrain expressing GLP-1 receptors. Given that the habenula has also been implicated in the neural control of reward and reinforcement, we hypothesized that this brain region, like the VTA and NAc, might mediate the anhedonic effects of GLP-1. Rats were stereotaxically implanted with guide cannulae targeting the habenula and trained on a progressive ratio 3 (PR3) schedule of reinforcement. Separate rats were trained on an alcohol two-bottle choice paradigm with intermittent access. The GLP-1 agonist exendin-4 (Ex-4) was administered directly into the habenula to determine the effects on operant responding for palatable food as well as alcohol intake. Our results indicated that Ex-4 reliably suppressed PR3 responding and that this effect was dose-dependent. A similar suppressive effect on alcohol consumption was observed. These findings provide initial and compelling evidence that the habenula may mediate the inhibitory action of GLP-1 on reward, including operant and drug reward. Our findings further suggest that GLP-1 receptor mechanisms outside of the midbrain and ventral striatum are critically involved in brain reward neurotransmission.
There is no question that learning a foreign language like English is different from learning other subjects, mainly because it is new to us Chinese learners and there is not a rich enough language environment. But that doesn't mean we have no way to learn it and do it well. If asked to identify the most powerful influences on learning, most teachers and learners would probably place motivation high on their lists. It seems only sensible to assume that English learning is most likely to occur when learners want to learn; that is, when motivation such as interest, curiosity, or a desire to achieve is present, learners become engaged in learning. But how do we teachers motivate our students to enjoy learning and to learn well? Here, rewards, both extrinsic and intrinsic, are of great value and play a vital role in English learning.
The psychological mechanism of reward is to form an operant conditioned reflex through positive and negative reinforcement. The positive effect of reward is to strengthen external learning motivation, and reward can sometimes improve creativity. The negative effects are weakening students' creativity, weakening the internal motivation for learning, and hindering the development of autonomy. Teachers should apply educational rewards scientifically: take students' age into account, consider the difficulty of tasks, pay attention to stimulating students' internal motivation, and give priority to spiritual rewards, supplemented by material rewards.
Objective To investigate the combined effect of the Demand-Control-Support (DCS) model and the Effort-Reward Imbalance (ERI) model on the risk estimation of depression in humans, in comparison with the effects when they are used separately. Methods A total of 3,632 males and 1,706 females from 13 factories and companies in Henan province were recruited in this cross-sectional study. Perceived job stress was evaluated with the Job Content Questionnaire and the Effort-Reward Imbalance Questionnaire (Chinese version). Depressive symptoms were assessed using the Center for Epidemiological Studies Depression Scale (CES-D). Results The DC ratio (demands/job control) and ERI were shown to be independently associated with depressive symptoms; the outcomes for low social support and overcommitment were similar. High DC with low social support (SS), high ERI with high overcommitment, and high DC with high ERI each posed a greater risk of depressive symptoms than any of these factors alone. The ERI and SS models seem to be effective in estimating the risk of depressive symptoms when used separately. Conclusion The DC ratio performed better when used in combination with low SS, and the effect for physical demands was better than for psychological demands. The combination of the DCS and ERI models could improve the risk estimation of depressive symptoms in humans.
Reward-based decision-making has been found to activate several brain areas, including the ventrolateral prefrontal lobe, orbitofrontal cortex, anterior cingulate cortex, ventral striatum, and mesolimbic dopaminergic system. In this study, we observed brain areas activated under three degrees of uncertainty in a reward-based decision-making task (certain, risky, and ambiguous). The tasks were presented using a brain-function audiovisual stimulation system. We conducted brain scans of 15 healthy volunteers using a 3.0 T magnetic resonance scanner. We used SPM8 to analyze the location and intensity of activation during the reward-based decision-making task with respect to the three conditions. We found that the orbitofrontal cortex was activated in the certain-reward condition, while the prefrontal cortex, precentral gyrus, occipital visual cortex, inferior parietal lobe, cerebellar posterior lobe, middle temporal gyrus, inferior temporal gyrus, limbic lobe, and midbrain were activated during the risky condition. The prefrontal cortex, temporal pole, inferior temporal gyrus, occipital visual cortex, and cerebellar posterior lobe were activated during ambiguous decision-making. The ventrolateral prefrontal lobe, frontal pole of the prefrontal lobe, orbitofrontal cortex, precentral gyrus, inferior temporal gyrus, fusiform gyrus, supramarginal gyrus, inferior parietal lobule, and cerebellar posterior lobe exhibited greater activation in the risky than in the certain condition (P < 0.05). The frontal pole and dorsolateral region of the prefrontal lobe, as well as the cerebellar posterior lobe, showed significantly greater activation in the ambiguous condition than in the risky condition (P < 0.05). The prefrontal lobe, occipital lobe, parietal lobe, temporal lobe, limbic lobe, midbrain, and posterior lobe of the cerebellum were activated during decision-making about uncertain rewards. Thus, we observed different levels and regions of activation for different types of reward processing during decision-making. Specifically, as the degree of reward uncertainty increased, the number of activated brain areas increased, including greater activation of brain areas associated with loss.
The nucleus accumbens shell (NAcSh) plays an important role in reward and aversion. Traditionally, NAc dopamine receptor 2-expressing (D2) neurons are assumed to function in aversion. However, this has been challenged by recent reports that attribute positive motivational roles to D2 neurons. Using optogenetics and multiple behavioral tasks, we found that activation of D2 neurons in the dorsomedial NAcSh drives preference and increases the motivation for rewards, whereas activation of ventral NAcSh D2 neurons induces aversion. Stimulation of D2 neurons in the ventromedial NAcSh increases movement speed, and stimulation of D2 neurons in the ventrolateral NAcSh decreases movement speed. Combining retrograde tracing and in situ hybridization, we demonstrated that glutamatergic and GABAergic neurons in the ventral pallidum receive inputs differentially from the dorsomedial and ventral NAcSh. Altogether, these findings shed light on the controversy regarding the function of NAcSh D2 neurons and provide new insights into the heterogeneity of the NAcSh.