3,788 articles found
Research on Adaptive Reward Optimization Method for Robot Navigation in Complex Dynamic Environment
1
Authors: Jie He, Dongmei Zhao, Tao Liu, Qingfeng Zou, Jian’an Xie. Computers, Materials & Continua, 2025, No. 8, pp. 2733-2749 (17 pages)
Robot navigation in complex crowd service scenarios, such as medical logistics and commercial guidance, requires a dynamic balance between safety and efficiency, while the traditional fixed reward mechanism lacks environmental adaptability and struggles to adapt to the variability of crowd density and pedestrian motion patterns. This paper proposes a navigation method that integrates spatiotemporal risk field modeling and adaptive reward optimization, aiming to improve the robot’s decision-making ability in diverse crowd scenarios through dynamic risk assessment and nonlinear weight adjustment. We construct a spatiotemporal risk field model based on a Gaussian kernel function by combining crowd density, relative distance, and motion speed to quantify environmental complexity and realize crowd-density-sensitive risk assessment dynamically. We apply an exponential decay function to reward design to address the linear conflict problem of fixed weights in multi-objective optimization. We adaptively adjust weight allocation between safety constraints and navigation efficiency based on real-time risk values, prioritizing safety in highly dense areas and navigation efficiency in sparse areas. Experimental results show that our method improves the navigation success rate by 9.0% over state-of-the-art models in high-density scenarios, with a 10.7% reduction in intrusion time ratio. Simulation comparisons validate the risk field model’s ability to capture risk superposition effects in dense scenarios and the suppression of near-field dangerous behaviors by the exponential decay mechanism. Our parametric optimization paradigm establishes an explicit mapping between navigation objectives and risk parameters through rigorous mathematical formalization, providing an interpretable approach for safe deployment of service robots in dynamic environments.
Keywords: machine learning; reinforcement learning; robots; autonomous navigation; reward shaping
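The abstract above describes a Gaussian-kernel risk field and a risk-dependent trade-off between safety and efficiency, but gives no formulas. The following is only a minimal sketch of that general idea, not the authors' model: the function names (`risk_field`, `safety_weight`, `shaped_reward`), the speed scaling, and the parameters `sigma`, `k`, and `d_safe` are all assumptions made for illustration.

```python
import math


def risk_field(robot_xy, pedestrians, sigma=1.0):
    """Sum one Gaussian kernel per pedestrian, evaluated at the robot position.

    pedestrians: iterable of (x, y, speed). Scaling by (1 + speed) is an
    illustrative assumption, not a formula taken from the paper.
    """
    rx, ry = robot_xy
    total = 0.0
    for px, py, speed in pedestrians:
        dist_sq = (px - rx) ** 2 + (py - ry) ** 2
        total += (1.0 + speed) * math.exp(-dist_sq / (2.0 * sigma ** 2))
    return total


def safety_weight(risk, k=1.0):
    """Map risk >= 0 to a weight in [0, 1) that grows nonlinearly with risk."""
    return 1.0 - math.exp(-k * risk)


def shaped_reward(progress, min_dist, risk, d_safe=0.5, k=1.0):
    """Blend an efficiency term and a safety term using the risk-based weight.

    The safety penalty decays exponentially with the distance to the nearest
    pedestrian, so it is strongest in the near field.
    """
    w = safety_weight(risk, k)
    safety_penalty = -math.exp(-min_dist / d_safe)
    return (1.0 - w) * progress + w * safety_penalty
```

In sparse scenes the risk is close to zero, so the reward reduces to the progress term; in dense scenes the weight approaches one and the near-field penalty dominates.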
Monetary reward and punishment effects on behavioral inhibition in children with attention deficit hyperactivity disorder tendencies
2
Authors: Huifang Yang, Peixuan Kuang. Journal of Psychology in Africa, 2025, No. 4, pp. 535-540 (6 pages)
The study investigated the effects of monetary rewards and punishments on the behavioral inhibition in children with attention deficit hyperactivity disorder (ADHD) tendencies. The present study adopted the signal stopping task paradigm, with 66 children with ADHD tendencies as the research subjects. A mixed design of 2 (reward and punishment type: reward, punishment) × 2 (stimulus type: monetary stimulus, social stimulus) was used. The analysis applied a between intervention group (with reward and punishment type variables) and within type of reward approach (by stimulus type as intra subject variables). The results showed that monetary punishment better promotes behavioral inhibition in children with an ADHD tendency than does reward. In addition, this study showed that monetary punishment and social rewards affected the speed–accuracy trade-off of inhibited behavior in children with an ADHD tendency. These findings suggest that withdrawal of a material token resulted in more behavioural compliance in children with an ADHD tendency.
Keywords: reward; punishment; behavioral inhibition; attention deficit hyperactivity disorder; children with ADHD tendency
A Study on the Addictive Feature of Nonsuicidal Self-Injury in Adolescents With Depression Disorders and Its Correlation With Serum Beta-Endorphin Concentration and Neural Reward Responsiveness
3
Authors: Jie Li, Xiaogang Zhu, Peiwen Zhang, Yuxing Wang, Jian Zhong, Yiming Wang, Lixia Yang. iRADIOLOGY, 2025, No. 6, pp. 456-464 (9 pages)
Background: Nonsuicidal self-injury (NSSI) in adolescents with depression disorders often exhibits addictive patterns, potentially linked to serum beta-endorphin levels and neural reward responsiveness. Beta-endorphin, involved in reward processing, alongside dysregulated neural reward pathways, may reinforce self-injurious behaviors, highlighting the need to explore these mechanisms. Methods: Adolescents (aged 12-17 years) with depression disorders were divided into an NSSI group (21 subjects) and a control group (11 subjects) according to inclusion criteria. Serum beta-endorphin concentration was measured using the enzyme-linked immunosorbent assay method. The Addiction Factor Scale was used to assess addiction levels. Statistical analyses were conducted using SPSS 25.0. The oxygenated hemoglobin response signal was detected using functional near-infrared spectroscopy. Analyses were performed using NIRS_KIT 2.0. Results: Compared with the control group, the NSSI group exhibited lower serum beta-endorphin concentration. Additionally, 85.7% of those in the NSSI group displayed addictive behaviors, and serum beta-endorphin concentration was negatively correlated with the Addiction Factor Scale score. The reward task activated channels 17, 20, and 21 (corresponding to the dorsolateral prefrontal cortex [PFC] and frontopolar PFC) in the gain condition and channels 20 and 21 in the loss condition. The oxygenated hemoglobin concentration of the differential waveform (Δ[oxy-Hb]) of channel 12 (corresponding to the frontopolar PFC) correlated positively with the Addiction Factor Scale score and negatively with the serum beta-endorphin concentration.
Keywords: adolescents with depression disorders; beta-endorphin; functional near-infrared spectroscopy; neural reward responsiveness; non-suicidal self-injury
Self-Adaptive LSAC-PID Approach Based on Lyapunov Reward Shaping for Mobile Robots
4
Authors: YU Xinyi, XU Siyu, FAN Yuehai, OU Linlin. Journal of Shanghai Jiaotong University (Science), 2025, No. 6, pp. 1085-1102 (18 pages)
In order to solve the control problem of multiple-input multiple-output (MIMO) systems in complex and variable control environments, a model-free adaptive LSAC-PID method based on deep reinforcement learning (RL) is proposed in this paper for automatic control of mobile robots. According to the environmental feedback, the RL agent of the upper controller outputs the optimal parameters to the lower MIMO PID controllers, which can realize real-time optimal PID control. First, a model-free adaptive MIMO PID hybrid control strategy is presented to realize real-time optimal tuning of control parameters using the soft actor-critic (SAC) algorithm, which is a state-of-the-art RL algorithm. Second, in order to improve the RL convergence speed and the control performance, a Lyapunov-based reward shaping method for off-policy RL algorithms is designed, and a self-adaptive LSAC-PID tuning approach with Lyapunov-based reward is then determined. Through the policy evaluation and policy improvement of the soft policy iteration, the convergence and optimality of the proposed LSAC-PID algorithm are proved mathematically. Finally, based on the proposed reward shaping method, the reward function is designed to improve the system stability for the line-following robot. The simulation and experiment results show that the proposed adaptive LSAC-PID approach has good control performance, such as fast convergence speed, high generalization, and high real-time performance, and achieves real-time optimal tuning of MIMO PID parameters without the system model and control loop decoupling.
Keywords: multiple-input multiple-output (MIMO); PID tuning; reinforcement learning (RL); Lyapunov-based reward shaping; soft actor-critic (SAC); mobile robot
Variable reward function-driven strategies for impulsive orbital attack-defense games under multiple constraints and victory conditions
5
Authors: Liran Zhao, Sihan Xu, Qinbo Sun, Zhaohui Dang. Defence Technology(防务技术), 2025, No. 9, pp. 159-183 (25 pages)
This paper investigates impulsive orbital attack-defense (AD) games under multiple constraints and victory conditions, involving three spacecraft: attacker, target, and defender. In the AD scenario, the attacker aims to breach the defender's interception to rendezvous with the target, while the defender seeks to protect the target by blocking or actively pursuing the attacker. Four different maneuvering constraints and five potential game outcomes are incorporated to more accurately model AD game problems and increase complexity, thereby reducing the effectiveness of traditional methods such as differential games and game-tree searches. To address these challenges, this study proposes a multiagent deep reinforcement learning solution with variable reward functions. Two attack strategies, Direct attack (DA) and Bypass attack (BA), are developed for the attacker, each focusing on different mission priorities. Similarly, two defense strategies, Direct interdiction (DI) and Collinear interdiction (CI), are designed for the defender, each optimizing specific defensive actions through tailored reward functions. Each reward function incorporates both process rewards (e.g., distance and angle) and outcome rewards, derived from physical principles and validated via geometric analysis. Extensive simulations of four strategy confrontations demonstrate average defensive success rates of 75% for DI vs. DA, 40% for DI vs. BA, 80% for CI vs. DA, and 70% for CI vs. BA. Results indicate that CI outperforms DI for defenders, while BA outperforms DA for attackers. Moreover, defenders achieve their objectives more effectively under identical maneuvering capabilities. Trajectory evolution analyses further illustrate the effectiveness of the proposed variable reward function-driven strategies. These strategies and analyses offer valuable guidance for practical orbital defense scenarios and lay a foundation for future multi-agent game research.
Keywords: Orbital attack-defense game; Impulsive maneuver; Multi-agent deep reinforcement learning; Reward function design
Phosphorus reward mechanisms of an arbuscular mycorrhizal fungus and a dark septate endophyte to plant carbon allocation:Synergism or competition?
6
Authors: Yinli BI, Linlin XIE, Xiao WANG, Yang ZHOU. Pedosphere, 2025, No. 5, pp. 869-878 (10 pages)
Combined inoculation with dark septate endophytes (DSEs) and arbuscular mycorrhizal fungi (AMF) has been shown to promote plant growth, yet the underlying plant-fungus interaction mechanisms remain unclear. To elucidate the nature of this symbiosis, it is crucial to explore carbon (C) transport from plants to fungi and nutrient exchange between them. In this study, a pot experiment was conducted with two phosphorus (P) fertilization levels (low and normal) and four fungal inoculation treatments (no inoculation, single inoculation of AMF and DSE, and co-inoculation of AMF and DSE). The ¹³C isotope pulse labeling method was employed to quantify the plant photosynthetic C transfer from plants to different fungi, shedding light on the mechanisms of nutrient exchange between plants and fungi. Soil and mycelium δ¹³C, soil C/N ratio, and soil C/P ratio were higher at the low P level than at the normal P level. However, soil microbial biomass C/P ratio was lower at the low P level, suggesting that the low P level was beneficial to soil C fixation and soil fungal P mineralization and transport. At the low P level, the P reward to plants from AMF and DSE increased significantly when the plants transferred the same amount of C to the fungi, and the two fungi synergistically promoted plant nutrient uptake and growth. At the normal P level, the root P content was significantly higher in the AMF-inoculated plants than in the DSE-inoculated plants, indicating that AMF contributed more than DSE to plant P uptake with the same amount of C received. Moreover, plants preferentially allocated more C to AMF. These findings indicate the presence of a source-sink balance between plant C allocation and fungal P contribution. Overall, AMF and DSE conferred a higher reward to plants at the low P level through functional synergistic strategies.
Keywords: Alternaria sp.; Diversispora epigaea; nutrient exchange; plant-fungus association; plant P uptake; reward/investment ratio; stable isotope pulse labeling; symbiotic interaction
Generational Gap: Intrinsic (Non-monetary) Versus Extrinsic (Monetary) Rewards in the Workforce
7
Authors: Charles Chekwa, Mmutakaego Chukwuanu, Daisey Richardson. Chinese Business Review, 2013, No. 6, pp. 414-424 (11 pages)
Traditionally, organizations assume that compensation/pay and monetary benefits are what all employees need to work harder, be productive, or remain with the company. According to Abraham Maslow, within every person is a hierarchy of five needs: physiological needs, safety needs, social needs, esteem needs, and self-actualization needs. Organizations must be able to identify what employees desire to secure optimum performance and to meet the needs of both employees and employers. This research focuses on the generational gap and the significance of intrinsic and extrinsic rewards in the workforce. The purpose and objective of this research are to test the significance of monetary versus non-monetary rewards among the different generations in the organization. A self-designed questionnaire distributed to a multi-generational group of employees of selected organizations was used to collect the analyzed data. Sixty-five (65%) responses were obtained. Secondary data were used to elucidate the needs in this area of study. Because the workforce is predicted to become more diverse in terms of age, organizations will be unlikely to implement one set of rewards for the multiple generations. This is due to the differing expectations and requirements among the generations. However, the results indicate no significant difference in monetary versus non-monetary rewards among the different generations in the workforce.
Keywords: monetary benefits; intrinsic reward; extrinsic reward; motivation; multi-generational workforce; monetary and non-monetary rewards
On Real Reward Testing
8
Authors: 杨伟忠, 邓玉欣. Journal of Shanghai Jiaotong University (Science) (indexed: EI), 2011, No. 4, pp. 479-484 (6 pages)
We extend the traditional nonnegative reward testing with negative rewards. In this new testing framework, may preorder and must preorder are the inverse of each other. More surprisingly, it turns out that the real reward must testing is no more powerful than the nonnegative reward testing, at least for finite processes. In order to prove that result, we exploit an important property of failure simulation about the inclusion of the testing outcomes between two related processes.
Keywords: probabilistic processes; real reward testing; nonnegative reward testing; failure simulation
Delta EEG Activity in Left Orbitofrontal Cortex in Rats Related to Food Reward and Craving (Cited by: 3)
9
Authors: 付玉, 陈艳梅, 曾涛, 彭沿平, 田绍华, 马原野. Zoological Research (indexed: CAS, CSCD, PKU Core), 2008, No. 3, pp. 260-264 (5 pages)
The orbitofrontal cortex (OFC) is particularly important for the neural representation of reward value. Previous studies indicated that electroencephalogram (EEG) activity in the OFC was involved in drug administration and withdrawal. The present study investigated EEG activity in the OFC in rats during the development of food reward and craving. Two environments were used separately for control and food-related EEG recordings. In the food-related environment, rats were first trained to eat chocolate peanuts; then they either had no access to this food, but could see and smell it (craving trials), or had free access to this food (reward trials). The EEG in the left OFC was recorded during these trials. We showed that, in the food-related environment, the EEG activity peaking in the delta band (2-4 Hz) was significantly correlated with the stimulus, increasing during food reward and decreasing during food craving when compared with that in the control environment. Our data suggest that EEG activity in the OFC can be altered by food reward; moreover, delta rhythm in this region could be used as an index monitoring changed signal underlying this reward.
Keywords: Orbitofrontal cortex; EEG; Reward; Craving; Delta band
A Functional Inhibitory Role of Habenular Glucagon-Like Peptide-1 (GLP-1) in Forebrain Reward Signaling
10
Authors: Max Johnson, Alev M. Brigande, Jiahe Yue, Kayla J. Colvin, Olivia Dao, Paul J. Currie. Journal of Behavioral and Brain Science, 2021, No. 9, pp. 205-215 (11 pages)
There is emerging evidence implicating glucagon-like peptide-1 (GLP-1) in reward, including palatable food reinforcement and alcohol-based reward circuitry. While recent findings suggest that mesolimbic structures, such as the ventral tegmental area (VTA) and the nucleus accumbens (NAc), are critical anatomical sites mediating the role of GLP-1’s inhibitory actions, the present study focused on the potential novel impact of GLP-1 within the habenula, a region of the forebrain expressing GLP-1 receptors. Given that the habenula has also been implicated in the neural control of reward and reinforcement, we hypothesized that this brain region, like the VTA and NAc, might mediate the anhedonic effects of GLP-1. Rats were stereotaxically implanted with guide cannula targeting the habenula and trained on a progressive ratio 3 (PR3) schedule of reinforcement. Separate rats were trained on an alcohol two-bottle choice paradigm with intermittent access. The GLP-1 agonist exendin-4 (Ex-4) was administered directly into the habenula to determine the effects on operant responding for palatable food as well as alcohol intake. Our results indicated that Ex-4 reliably suppressed PR3 responding and that this effect was dose-dependent. A similar suppressive effect on alcohol consumption was observed. These findings provide initial and compelling evidence that the habenula may mediate the inhibitory action of GLP-1 on reward, including operant and drug reward. Our findings further suggest that GLP-1 receptor mechanisms outside of the midbrain and ventral striatum are critically involved in brain reward neurotransmission.
Keywords: Alcohol; Anhedonia; Appetitive Motivation; Brain Reward; Ethanol; Exendin-4; GLP-1 Receptors; Operant Responding; Palatable Food Intake; Reward Salience
Improved Double Deep Q Network Algorithm Based on Average Q-Value Estimation and Reward Redistribution for Robot Path Planning
11
Authors: Yameng Yin, Lieping Zhang, Xiaoxu Shi, Yilin Wang, Jiansheng Peng, Jianchu Zou. Computers, Materials & Continua (indexed: SCIE, EI), 2024, No. 11, pp. 2769-2790 (22 pages)
By integrating deep neural networks with reinforcement learning, the Double Deep Q Network (DDQN) algorithm overcomes the limitations of Q-learning in handling continuous spaces and is widely applied in the path planning of mobile robots. However, the traditional DDQN algorithm suffers from sparse rewards and inefficient utilization of high-quality data. Targeting those problems, an improved DDQN algorithm based on average Q-value estimation and reward redistribution was proposed. First, to enhance the precision of the target Q-value, the average of multiple previously learned Q-values from the target Q network is used to replace the single Q-value from the current target Q network. Next, a reward redistribution mechanism is designed to overcome the sparse reward problem by adjusting the final reward of each action using the round reward from trajectory information. Additionally, a reward-prioritized experience selection method is introduced, which ranks experience samples according to reward values to ensure frequent utilization of high-quality data. Finally, simulation experiments are conducted to verify the effectiveness of the proposed algorithm in fixed-position scenarios and random environments. The experimental results show that, compared to the traditional DDQN algorithm, the proposed algorithm achieves shorter average running time, higher average return, and fewer average steps. The performance of the proposed algorithm is improved by 11.43% in the fixed scenario and 8.33% in random environments. It not only plans economic and safe paths but also significantly improves efficiency and generalization in path planning, making it suitable for widespread application in autonomous navigation and industrial automation.
Keywords: Double Deep Q Network; path planning; average Q-value estimation; reward redistribution mechanism; reward-prioritized experience selection method
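The averaged-target step described in the abstract above can be illustrated with a small sketch. This is a plain-Python toy under stated assumptions, not the authors' implementation: the class name `AveragedTarget`, the snapshot count `k`, and the representation of each Q network as a callable `q(state, action)` are all hypothetical. Action selection follows the standard Double DQN rule (the online network chooses the action, the target networks evaluate it).

```python
from collections import deque


class AveragedTarget:
    """Keep the last k target-network snapshots and average their estimates."""

    def __init__(self, k=3):
        # Oldest snapshots are dropped automatically once k is exceeded.
        self.snapshots = deque(maxlen=k)

    def push(self, q_fn):
        """Store a snapshot; q_fn(state, action) returns a float Q-value."""
        self.snapshots.append(q_fn)

    def target(self, reward, next_state, online_q, actions, gamma=0.99, done=False):
        """Return reward + gamma * mean_k Q_k(next_state, a*), where a* is
        the action preferred by the online network (Double DQN selection)."""
        if done:
            return reward
        if not self.snapshots:
            raise ValueError("no target snapshots stored yet")
        best_action = max(actions, key=lambda a: online_q(next_state, a))
        avg_q = sum(q(next_state, best_action) for q in self.snapshots) / len(self.snapshots)
        return reward + gamma * avg_q
```

Averaging over several snapshots is meant to smooth out the noise of any single target estimate; how many snapshots to keep is a tuning choice that the abstract does not specify.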
On Principle of Rewards in English Learning
12
Author: 熊莉芸. 《广西中医学院学报》, 2004, No. 2, pp. 110-114 (5 pages)
There is no question that learning a foreign language like English is different from learning other subjects, mainly because it is new to us Chinese and there is not enough of a language environment. But that doesn’t mean we have no way to learn it and do it well. If asked to identify the most powerful influences on learning, motivation would probably be high on most teachers’ and learners’ lists. It seems only sensible to assume that English learning is most likely to occur when the learners want to learn. That is, when motivation such as interest, curiosity, or a desire to achieve is present, the learners will be engaged in learning. However, how do we teachers motivate our students to like learning and learn well? Here, rewards, both extrinsic and intrinsic, are of great value and play a vital role in English learning.
Keywords: extrinsic and intrinsic rewards; motivation; activate; stimulate
Discussion on the Effectiveness of Educational Reward
13
Author: Sanzhen Xu. Journal of Contemporary Educational Research, 2021, No. 2, pp. 85-89 (5 pages)
The psychological mechanism of reward is to form operant conditioned reflexes through positive reinforcement and negative reinforcement. The positive effect of reward is to strengthen external learning motivation, and reward can sometimes improve creativity. The negative effects are weakening students' creativity, weakening the internal motivation for learning, and hindering the development of autonomy. Teachers should apply educational rewards scientifically, take students' age into account, consider the difficulty of tasks, pay attention to stimulating students' internal motivation, and give priority to spiritual rewards, supplemented by material rewards.
Keywords: Education; Spiritual reward; Material reward; Internal motivation; External motivation
Co-effect of Demand-control-support Model and Effort-reward Imbalance Model on Depression Risk Estimation in Humans: Findings from Henan Province of China (Cited by: 9)
14
Authors: YU Shan Fa, NAKATA Akinori, GU Gui Zhen, SWANSON Naomi G, ZHOU Wen Hui, HE Li Hua, WANG Sheng. Biomedical and Environmental Sciences (indexed: SCIE, CAS, CSCD), 2013, No. 12, pp. 962-971 (10 pages)
Objective: To investigate the co-effect of the Demand-control-support (DCS) model and the Effort-reward Imbalance (ERI) model on the risk estimation of depression in humans, in comparison with the effects when they are used separately. Methods: A total of 3,632 males and 1,706 females from 13 factories and companies in Henan province were recruited in this cross-sectional study. Perceived job stress was evaluated with the Job Content Questionnaire and the Effort-Reward Imbalance Questionnaire (Chinese version). Depressive symptoms were assessed by using the Center for Epidemiological Studies Depression Scale (CES-D). Results: DC (demands/job control ratio) and ERI were shown to be independently associated with depressive symptoms. The outcomes for low social support and overcommitment were similar. High DC and low social support (SS), high ERI and high overcommitment, and high DC and high ERI posed greater risks of depressive symptoms than each of them did alone. The ERI model and SS model seem to be effective in estimating the risk of depressive symptoms if they are used separately. Conclusion: The DC had better performance when it was used in combination with low SS. The effect on physical demands was better than on psychological demands. The combination of DCS and ERI models could improve the risk estimate of depressive symptoms in humans.
Keywords: Depression; Work-related stress; Demand-control-support; Effort-reward imbalance
Brain areas activated by uncertain reward-based decision-making in healthy volunteers (Cited by: 4)
15
Authors: Zongjun Guo, Juan Chen, Shien Liu, Yuhuan Li, Bo Sun, Zhenbo Gao. Neural Regeneration Research (indexed: SCIE, CAS, CSCD), 2013, No. 35, pp. 3344-3352 (9 pages)
Reward-based decision-making has been found to activate several brain areas, including the ventrolateral prefrontal lobe, orbitofrontal cortex, anterior cingulate cortex, ventral striatum, and mesolimbic dopaminergic system. In this study, we observed brain areas activated under three degrees of uncertainty in a reward-based decision-making task (certain, risky, and ambiguous). The tasks were presented using a brain function audiovisual stimulation system. We conducted brain scans of 15 healthy volunteers using a 3.0T magnetic resonance scanner. We used SPM8 to analyze the location and intensity of activation during the reward-based decision-making task, with respect to the three conditions. We found that the orbitofrontal cortex was activated in the certain reward condition, while the prefrontal cortex, precentral gyrus, occipital visual cortex, inferior parietal lobe, cerebellar posterior lobe, middle temporal gyrus, inferior temporal gyrus, limbic lobe, and midbrain were activated during the 'risk' condition. The prefrontal cortex, temporal pole, inferior temporal gyrus, occipital visual cortex, and cerebellar posterior lobe were activated during ambiguous decision-making. The ventrolateral prefrontal lobe, frontal pole of the prefrontal lobe, orbitofrontal cortex, precentral gyrus, inferior temporal gyrus, fusiform gyrus, supramarginal gyrus, inferior parietal lobule, and cerebellar posterior lobe exhibited greater activation in the 'risk' than in the 'certain' condition (P < 0.05). The frontal pole and dorsolateral region of the prefrontal lobe, as well as the cerebellar posterior lobe, showed significantly greater activation in the 'ambiguous' condition compared to the 'risk' condition (P < 0.05). The prefrontal lobe, occipital lobe, parietal lobe, temporal lobe, limbic lobe, midbrain, and posterior lobe of the cerebellum were activated during decision-making about uncertain rewards. Thus, we observed different levels and regions of activation for different types of reward processing during decision-making. Specifically, when the degree of reward uncertainty increased, the number of activated brain areas increased, including greater activation of brain areas associated with loss.
Keywords: neural regeneration; neuroimaging; decision-making; reward; uncertainty; cognitive processing; functional magnetic resonance imaging; brain; grants-supported paper; neuroregeneration
Projections from D2 Neurons in Different Subregions of Nucleus Accumbens Shell to Ventral Pallidum Play Distinct Roles in Reward and Aversion (Cited by: 4)
16
Authors: Yun Yao, Ge Gao, Kai Liu, Xin Shi, Mingxiu Cheng, Yan Xiong, Sen Song. Neuroscience Bulletin (indexed: SCIE, CAS, CSCD), 2021, No. 5, pp. 623-640 (18 pages)
The nucleus accumbens shell (NAcSh) plays an important role in reward and aversion. Traditionally, NAc dopamine receptor 2-expressing (D2) neurons are assumed to function in aversion. However, this has been challenged by recent reports which attribute positive motivational roles to D2 neurons. Using optogenetics and multiple behavioral tasks, we found that activation of D2 neurons in the dorsomedial NAcSh drives preference and increases the motivation for rewards, whereas activation of ventral NAcSh D2 neurons induces aversion. Stimulation of D2 neurons in the ventromedial NAcSh increases movement speed and stimulation of D2 neurons in the ventrolateral NAcSh decreases movement speed. Combining retrograde tracing and in situ hybridization, we demonstrated that glutamatergic and GABAergic neurons in the ventral pallidum receive inputs differentially from the dorsomedial and ventral NAcSh. All together, these findings shed light on the controversy regarding the function of NAcSh D2 neurons, and provide new insights into understanding the heterogeneity of the NAcSh.
Keywords: Nucleus accumbens shell; Ventral pallidum; D2 neurons; Reward; Aversion; Motivation
Repeated Failure in Reward Pursuit Alters Innate Drosophila Larval Behaviors (Cited by: 2)
17
Authors: Yue Fei, Dikai Zhu, Yixuan Sun, Caixia Gong, Shenyang Huang, Zhefeng Gong. Neuroscience Bulletin (indexed: SCIE, CAS, CSCD), 2018, No. 6, pp. 901-911 (11 pages)
Animals always seek rewards and the related neural basis has been well studied. However, what happens when animals fail to get a reward is largely unknown, although this is commonly seen in behaviors such as predation. Here, we set up a behavioral model of repeated failure in reward pursuit (RFRP) in Drosophila larvae. In this model, the larvae were repeatedly prevented from reaching attractants such as yeast and butyl acetate, before finally abandoning further attempts. After giving up, they usually showed a decreased locomotor speed and impaired performance in light avoidance and sugar preference, which were named as phenotypes of RFRP states. In larvae that had developed RFRP phenotypes, the octopamine concentration was greatly elevated, while tbh mutants devoid of octopamine were less likely to develop RFRP phenotypes, and octopamine feeding efficiently restored such defects. By down-regulating tbh in different groups of neurons and imaging neuronal activity, neurons that regulated the development of RFRP states and the behavioral exhibition of RFRP phenotypes were mapped to a small subgroup of non-glutamatergic and glutamatergic octopaminergic neurons in the central larval brain. Our results establish a model for investigating the effect of depriving an expected reward in Drosophila and provide a simplified framework for the associated neural basis.
Keywords: Drosophila larva; Repeated failure in reward pursuit; Octopamine
O-GlcNAcylation in Ventral Tegmental Area Dopaminergic Neurons Regulates Motor Learning and the Response to Natural Reward (Cited by: 1)
18
Authors: Ming-Shuo Shao, Xiao Yang, Chen-Chun Zhang, Chang-You Jiang, Ying Mao, Wen-Dong Xu, Lan Ma, Fei-Fei Wang. Neuroscience Bulletin (SCIE, CAS, CSCD), 2022, Issue 3, pp. 263-274 (12 pages)
Protein O-GlcNAcylation is a post-translational modification that links environmental stimuli with changes in intracellular signal pathways, and its disturbance has been found in neurodegenerative diseases and metabolic disorders. However, its role in the mesolimbic dopamine (DA) system, especially in the ventral tegmental area (VTA), needs to be elucidated. Here, we found that injection of Thiamet G, an O-GlcNAcase (OGA) inhibitor, in the VTA and nucleus accumbens (NAc) of mice facilitated neuronal O-GlcNAcylation and decreased the operant response to sucrose as well as the latency to fall in the rotarod test. Mice with DAergic neuron-specific knockout of O-GlcNAc transferase (OGT) displayed severe metabolic abnormalities and died within 4–8 weeks after birth. Furthermore, mice specifically overexpressing OGT in DAergic neurons in the VTA had learning defects in the operant response to sucrose and impaired motor learning in the rotarod test. In contrast, overexpression of OGT in GABAergic neurons in the VTA had no effect on these behaviors. These results suggest that protein O-GlcNAcylation of DAergic neurons in the VTA plays an important role in regulating the response to natural reward and motor learning in mice.
Keywords: O-GlcNAcylation; Dopaminergic neurons; Natural reward; Motor learning
Choice of discount rate in reinforcement learning with long-delay rewards (Cited by: 1)
19
Authors: LIN Xiangyang, XING Qinghua, LIU Fuxian. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2022, Issue 2, pp. 381-392 (12 pages)
In the world, most successes are the result of long-term effort. The reward of success is extremely high, but a long investment process is required first. People who are “myopic” value only short-term rewards and are unwilling to make early-stage investments, so they rarely achieve the ultimate success and the corresponding high rewards. Similarly, for a reinforcement learning (RL) model with long-delay rewards, the discount rate determines the strength of the agent’s “farsightedness”. To enable the trained agent to make a chain of correct choices and finally succeed, this paper first obtains the feasible region of the discount rate through mathematical derivation; this region satisfies the agent’s “farsightedness” requirement. Then, to avoid the complicated problem of solving implicit equations when choosing feasible solutions, a simple method is explored and verified by theoretical demonstration and mathematical experiments. Next, a series of RL experiments is designed and implemented to verify the validity of the theory. Finally, the model is extended from the finite process to the infinite process, and the validity of the extended model is verified by theory and experiments. The research not only reveals the significance of the discount rate but also provides a theoretical basis and a practical method for choosing the discount rate in future research.
Keywords: reinforcement learning (RL); discount rate; long-delay reward; Q-learning; treasure-detecting model; feasible solution
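The link between the discount rate and "farsightedness" described in the abstract above can be illustrated with a toy calculation. The sketch below is not the paper's derivation or its treasure-detecting model; it assumes a hypothetical task in which the agent pays a fixed cost on each of `n_steps` steps and then receives a single delayed reward, and it compares that plan with quitting immediately (return 0). All function and parameter names are illustrative assumptions.

```python
def discounted_return(gamma, n_steps, step_cost, final_reward):
    """Discounted return of paying `step_cost` on each of `n_steps` steps,
    then receiving `final_reward` once at step `n_steps`."""
    investment = sum(-step_cost * gamma**t for t in range(n_steps))
    return investment + final_reward * gamma**n_steps


def min_feasible_gamma(n_steps, step_cost, final_reward, tol=1e-9):
    """Bisect for the smallest discount rate in (0, 1] at which the long-term
    plan has a positive discounted return (i.e. beats quitting at return 0).

    Returns None when even an undiscounted agent (gamma = 1) would not profit.
    Bisection is valid here because the condition return > 0 is equivalent to
    final_reward > step_cost * sum(gamma**-k for k in 1..n_steps), and the
    right-hand side decreases strictly as gamma grows.
    """
    if discounted_return(1.0, n_steps, step_cost, final_reward) <= 0:
        return None
    lo, hi = 0.0, 1.0  # return is negative at lo, positive at hi
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if discounted_return(mid, n_steps, step_cost, final_reward) > 0:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    # Longer delays push the feasible region of gamma closer to 1.
    for n in (1, 10, 50):
        print(n, min_feasible_gamma(n, step_cost=1.0, final_reward=100.0))
```

In this toy setting, any discount rate above the returned threshold makes the delayed reward worth the early investment, while a smaller ("myopic") rate makes quitting look better. Solving the threshold numerically sidesteps the implicit equation in `gamma`, which is in the same spirit as the simplification the abstract mentions, though the paper's actual method may differ.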
Impact of social relationship on firms' sharing reward program (Cited by: 1)
20
Authors: Wei Wei, Mei Shu'e, Zhong Weijun. Journal of Southeast University (English Edition) (EI, CAS), 2018, Issue 4, pp. 540-544 (5 pages)
To support strategic decisions on firms’ sharing reward programs (SRPs), a nested Stackelberg game is developed. The sharing behavior among users and the rewarding strategy of firms are modeled. The optimal sharing bonus is worked out, and the impact of social relationships among customers is discussed. The results show that the higher the bonus, the more effort the inductor is willing to make to persuade the inductee to buy. In addition, firms should take the social relationship into consideration when setting the optimal sharing bonus. If the social relationship is weak, there is no need to adopt the SRP. Otherwise, there are two ways to reward the inductors. Also, the stronger the social relationship, the smaller the sharing bonus that should be offered to the inductors, and the higher the expected profits. As a result, it is reasonable for firms to implement SRPs on social media where users are familiar with each other.
Keywords: social relationship; sharing reward program; incentive strategy; social commerce