Journal Articles
183 articles found
Reputational preference and other-regarding preference based rewarding mechanism promotes cooperation in spatial social dilemmas
1
Authors: Huayan Pei, Guanghui Yan, Huanmin Wang. 《Chinese Physics B》, SCIE EI CAS CSCD, 2021, Issue 5, pp. 206-214 (9 pages)
To study the incentive mechanisms of cooperation, we propose a preference rewarding mechanism in the spatial prisoner's dilemma game, which simultaneously considers reputational preference, other-regarding preference and the dynamic adjustment of vertex weight. The vertex weight of a player is adaptively adjusted according to the comparison result of his own reputation and the average reputation value of his immediate neighbors. Players are inclined to pay a personal cost to reward the cooperative neighbor with the greatest vertex weight. The vertex weight of a player is proportional to the preference rewards he can obtain from direct neighbors. We find that the preference rewarding mechanism significantly facilitates the evolution of cooperation, and the dynamic adjustment of vertex weight has a powerful effect on the emergence of cooperative behavior. To validate multiple effects, strategy distribution and the average payoff and fitness of players are discussed in a microcosmic view.
Keywords: cooperation; rewarding mechanism; reputational preference; other-regarding preference
The Use of a "Rewarding System" for Healthcare Personnel
2
Authors: Fabia Pioli, Chiara Gatti, Nadia Storti, Alessandro Maccioni, Laura Fermani, Mauro Pelagalli. 《Journal of Health Science》, 2018, Issue 6, pp. 414-419 (6 pages)
This work aims to identify a method by which the coordinator of the OU (operational unit) can train and gratify personnel through the use of a rewarding system. The continuous transformations of the Italian healthcare scene lead operators to face ever new needs and problems. Professionals cannot be considered merely as workers, but as bearers of qualified intellectual, professional and cultural skills. Individual coordinators are required to be real leaders within their operational units and to use their managerial skills in achieving company objectives and in evaluating the personnel they manage. The main factor to which difficulties in the management of staff are related is motivation, defined as a state of mind, together with aspirations, needs and orientations, that pushes people to act and to adopt a behavior characterized by commitment, perseverance and determination. The need to better rationalize the available resources and to promote high-quality health care, improving safety, efficiency and appropriateness, has led the general management and the coordinator of the OU to use reward systems. With the introduction of this procedure, aimed at enhancing merit and encouraging virtuous behavior during the provision of health services, the public employment reform participates in the evolution of the regulatory framework and drives the change that is taking place in the world of work.
Keywords: motivation; gratification; rewarding system
Charitable Heart and Rewarding Conduct: Impression from the Reception of the Goodwill Delegation of California
3
Authors: Li Jin, Chang Jiuqing. 《International Understanding》, 2003, Issue 4, pp. 37-40 (4 pages)
Keywords: charitable heart; rewarding conduct
ACR-MLM: a privacy-preserving framework for anonymous and confidential rewarding in blockchain-based multi-level marketing
4
Authors: Saeed Banaeian Far, Azadeh Imani Rad, Maryam Rajabzadeh Asaar. 《Data Science and Management》, 2022, Issue 4, pp. 219-231 (13 pages)
Network marketing is a trading technique that provides companies with the opportunity to increase sales. With the increasing number of Internet-based purchases, several threats are increasingly observed in this field, such as user privacy violations, company owner (CO) fraud, the changing of sold products' information, and the scalability of selling networks. This study presents the concept of a blockchain-based market called ACR-MLM that functions based on the multi-level marketing (MLM) model, through which registered users receive anonymous and confidential rewards for their own and their subgroups' sales. Applying a public blockchain as the ACR-MLM framework's infrastructure solves existing problems in MLM-based markets, such as CO fraud (against the government or its users), user privacy violations (obtaining their real names or subgroup users), and scalability (when vast numbers of users have been registered). To provide confidentiality and scalability to the ACR-MLM framework, hierarchical identity-based encryption (HIBE) was applied with a functional encryption (FE) scheme. Finally, the security of ACR-MLM is analyzed using the random oracle (RO) model and then evaluated.
Keywords: anonymous rewarding; blockchain; functional encryption; multi-level marketing; privacy
INTERNATIONAL COOPERATION IS A REWARDING ROUTE FOR HI-TECH DEVELOPMENT
5
Authors: Xie Ming et al. (Institute of Modern Physics, CAS). 《Bulletin of the Chinese Academy of Sciences》, 1997, Issue 1, pp. 84-87 (4 pages)
The CAS Institute of Modern Physics is a center of pure basic research concerning nuclear physics, accelerator physics and related technology. In recent years, it succeeded in the construction of China's first production line for manufacturing radiation-crosslinked (RC) wire and cable with the aid of international cooperation, achieving rewarding benefits from it.
Keywords: international cooperation; hi-tech development
REWARDING COOPERATION BETWEEN CAS AND YUNNAN TRIGGERS A SOCIO-ECONOMIC TAKE-OFF
6
Authors: Li Jiating (People's Government of Yunnan Province). 《Bulletin of the Chinese Academy of Sciences》, 1999, Issue 4, pp. 217-220 (4 pages)
Ⅰ. THE SUGGESTION OF THE STRATEGIC MEASURE. Situated at the junction between the vast Eurasian landmass and the south Asian subcontinent, Yunnan Province...
Keywords: CAS; Yunnan; cooperation
Efficacy of Pediococcus acidilactici CCFM6432 in alleviating anhedonia in major depressive disorder: A randomized controlled trial
7
Authors: Du-Xing Li, Qi-Ming Hu, Chen-Chen Xu, Hong-Yu Yang, Ji-Kang Liu, Yi-Fan Sun, Gang Wang, Jun Wang, Zhen-He Zhou. 《World Journal of Psychiatry》, 2025, Issue 7, pp. 184-197 (14 pages)
BACKGROUND: Anhedonia, a hallmark symptom of major depressive disorder (MDD), is often resistant to common antidepressants. Preliminary evidence indicates that Pediococcus acidilactici (P. acidilactici) CCFM6432 may offer potential benefits in ameliorating this symptomatology in patients with MDD.
AIM: To further assess the efficacy of P. acidilactici CCFM6432 in alleviating anhedonia in patients with MDD, using a combination of objective and subjective assessment tools.
METHODS: Adult patients with MDD exhibiting anhedonic symptoms were enrolled and randomly assigned to two treatment groups: one receiving standard antidepressant therapy plus P. acidilactici CCFM6432, and the other receiving standard antidepressant treatment along with a placebo, for 30 days. Assessments were conducted at baseline and post-intervention using the Hamilton Depression Rating Scale (HAMD), Temporal Experience of Pleasure Scale (TEPS), and synchronous electroencephalography (EEG) during a "Doors Guessing Task." Changes in both clinical outcomes and EEG biomarkers, specifically the stimulus-preceding negativity (SPN) and feedback-related negativity amplitudes, were analyzed.
RESULTS: Of the 92 screened participants, 71 were enrolled and 55 completed the study (CCFM6432 group: n = 27; placebo group: n = 28). No baseline differences were noted between the groups in terms of demographics, clinical assessments, or EEG metrics. A mixed-design analysis of variance revealed that the CCFM6432 group showed significantly greater improvements in both HAMD and TEPS scores compared to the placebo group. Moreover, the CCFM6432 group demonstrated a significant increase in SPN amplitudes, which were inversely correlated with the improvements observed in HAMD scores. No such changes were observed in the placebo group.
CONCLUSION: Adjunctive administration of P. acidilactici CCFM6432 not only augments the therapeutic efficacy of antidepressants but also significantly ameliorates the symptoms of anhedonia in MDD.
Keywords: anhedonia; probiotics; depression; event-related potentials; reward processing
Research on Adaptive Reward Optimization Method for Robot Navigation in Complex Dynamic Environment
8
Authors: Jie He, Dongmei Zhao, Tao Liu, Qingfeng Zou, Jian'an Xie. 《Computers, Materials & Continua》, 2025, Issue 8, pp. 2733-2749 (17 pages)
Robot navigation in complex crowd service scenarios, such as medical logistics and commercial guidance, requires a dynamic balance between safety and efficiency, while the traditional fixed reward mechanism lacks environmental adaptability and struggles to adapt to the variability of crowd density and pedestrian motion patterns. This paper proposes a navigation method that integrates spatiotemporal risk field modeling and adaptive reward optimization, aiming to improve the robot's decision-making ability in diverse crowd scenarios through dynamic risk assessment and nonlinear weight adjustment. We construct a spatiotemporal risk field model based on a Gaussian kernel function by combining crowd density, relative distance, and motion speed to quantify environmental complexity and realize crowd-density-sensitive risk assessment dynamically. We apply an exponential decay function to the reward design to address the linear conflict problem of fixed weights in multi-objective optimization. We adaptively adjust the weight allocation between safety constraints and navigation efficiency based on real-time risk values, prioritizing safety in highly dense areas and navigation efficiency in sparse areas. Experimental results show that our method improves the navigation success rate by 9.0% over state-of-the-art models in high-density scenarios, with a 10.7% reduction in intrusion time ratio. Simulation comparisons validate the risk field model's ability to capture risk superposition effects in dense scenarios and the suppression of near-field dangerous behaviors by the exponential decay mechanism. Our parametric optimization paradigm establishes an explicit mapping between navigation objectives and risk parameters through rigorous mathematical formalization, providing an interpretable approach for safe deployment of service robots in dynamic environments.
Keywords: machine learning; reinforcement learning; robots; autonomous navigation; reward shaping
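The exponential-decay weight adjustment described in the abstract above can be sketched as follows. The function name, reward scales, and decay constant `k` are illustrative assumptions, not the paper's actual formulation:

```python
import math

def adaptive_reward(risk, progress, collision, k=2.0):
    """Risk-adaptive blend of safety and efficiency reward terms.

    risk: non-negative spatiotemporal risk value for the current state
    progress: efficiency term, e.g. distance-to-goal reduction this step
    collision: whether the robot intruded on a pedestrian's space
    """
    w_eff = math.exp(-k * risk)   # efficiency weight decays exponentially with risk
    w_safe = 1.0 - w_eff          # safety weight dominates in dense, risky areas
    r_safety = -10.0 if collision else 0.0
    return w_safe * r_safety + w_eff * progress
```

At zero risk the reward reduces to the pure efficiency term, while at high risk values collisions are penalized at nearly full weight, matching the "safety first in dense areas, efficiency first in sparse areas" behavior the abstract describes.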
RS-DRL-based offloading policy and UAV trajectory design in F-MEC systems
9
Authors: Yulu Yang, Han Xu, Zhu Jin, Tiecheng Song, Jing Hu, Xiaoqin Song. 《Digital Communications and Networks》, 2025, Issue 2, pp. 377-386 (10 pages)
For better flexibility and greater coverage areas, Unmanned Aerial Vehicles (UAVs) have been applied in Flying Mobile Edge Computing (F-MEC) systems to offer offloading services for the User Equipment (UEs). This paper considers a disaster-affected scenario where UAVs undertake the role of MEC servers to provide computing resources for Disaster Relief Devices (DRDs). Considering the fairness of DRDs, a max-min problem is formulated to optimize the saved time by jointly designing the trajectory of the UAVs, the offloading policy and serving time under the constraint of the UAVs' energy capacity. To solve the above non-convex problem, we first model the service process as a Markov Decision Process (MDP) with the Reward Shaping (RS) technique, and then propose a Deep Reinforcement Learning (DRL) based algorithm to find the optimal solution for the MDP. Simulations show that the proposed RS-DRL algorithm is valid and effective, and has better performance than the baseline algorithms.
Keywords: flying mobile edge computing; task offloading; reward shaping; deep reinforcement learning
Emerging role of microglia in the developing dopaminergic system:Perturbation by early life stress
10
Authors: Kaijie She, Naijun Yuan, Minyi Huang, Wenjun Zhu, Manshi Tang, Qingyu Ma, Jiaxu Chen. 《Neural Regeneration Research》, 2026, Issue 1, pp. 126-140 (15 pages)
Early life stress correlates with a higher prevalence of neurological disorders, including autism, attention-deficit/hyperactivity disorder, schizophrenia, depression, and Parkinson's disease. These conditions, primarily involving abnormal development and damage of the dopaminergic system, pose significant public health challenges. Microglia, as the primary immune cells in the brain, are crucial in regulating neuronal circuit development and survival. From the embryonic stage to adulthood, microglia exhibit stage-specific gene expression profiles, transcriptome characteristics, and functional phenotypes, enhancing the susceptibility to early life stress. However, the role of microglia in mediating dopaminergic system disorders under early life stress conditions remains poorly understood. This review presents an up-to-date overview of preclinical studies elucidating the impact of early life stress on microglia, leading to dopaminergic system disorders, along with the underlying mechanisms and therapeutic potential for neurodegenerative and neurodevelopmental conditions. Impaired microglial activity damages dopaminergic neurons by diminishing neurotrophic support (e.g., insulin-like growth factor-1) and hinders dopaminergic axon growth through defective phagocytosis and synaptic pruning. Blunted microglial immunoreactivity also suppresses striatal dopaminergic circuit development and reduces neuronal transmission. Furthermore, inflammation and oxidative stress induced by activated microglia can directly damage dopaminergic neurons, inhibiting dopamine synthesis, reuptake, and receptor activity. Enhanced microglial phagocytosis inhibits dopamine axon extension. These long-lasting effects of microglial perturbations may be driven by early life stress-induced epigenetic reprogramming of microglia. Indirectly, early life stress may influence microglial function through various pathways, such as astrocytic activation, the hypothalamic-pituitary-adrenal axis, the gut-brain axis, and maternal immune signaling. Finally, various therapeutic strategies and molecular mechanisms for targeting microglia to restore the dopaminergic system are summarized and discussed. These strategies include classical antidepressants and antipsychotics, antibiotics and anti-inflammatory agents, and herbal-derived medicine. Further investigations combining pharmacological interventions and genetic strategies are essential to elucidate the causal role of microglial phenotypic and functional perturbations in the dopaminergic system disrupted by early life stress.
Keywords: Chinese herbal drugs; dopamine; early life stress; epigenetics; gut-brain axis; hypothalamo-pituitary-adrenal axis; innate immune memory; microglia; neuroinflammation; Parkinson disease; phagocytosis; reward
Variable reward function-driven strategies for impulsive orbital attack-defense games under multiple constraints and victory conditions
11
Authors: Liran Zhao, Sihan Xu, Qinbo Sun, Zhaohui Dang. 《Defence Technology》, 2025, Issue 9, pp. 159-183 (25 pages)
This paper investigates impulsive orbital attack-defense (AD) games under multiple constraints and victory conditions, involving three spacecraft: attacker, target, and defender. In the AD scenario, the attacker aims to breach the defender's interception to rendezvous with the target, while the defender seeks to protect the target by blocking or actively pursuing the attacker. Four different maneuvering constraints and five potential game outcomes are incorporated to more accurately model AD game problems and increase complexity, thereby reducing the effectiveness of traditional methods such as differential games and game-tree searches. To address these challenges, this study proposes a multi-agent deep reinforcement learning solution with variable reward functions. Two attack strategies, Direct attack (DA) and Bypass attack (BA), are developed for the attacker, each focusing on different mission priorities. Similarly, two defense strategies, Direct interdiction (DI) and Collinear interdiction (CI), are designed for the defender, each optimizing specific defensive actions through tailored reward functions. Each reward function incorporates both process rewards (e.g., distance and angle) and outcome rewards, derived from physical principles and validated via geometric analysis. Extensive simulations of four strategy confrontations demonstrate average defensive success rates of 75% for DI vs. DA, 40% for DI vs. BA, 80% for CI vs. DA, and 70% for CI vs. BA. Results indicate that CI outperforms DI for defenders, while BA outperforms DA for attackers. Moreover, defenders achieve their objectives more effectively under identical maneuvering capabilities. Trajectory evolution analyses further illustrate the effectiveness of the proposed variable reward function-driven strategies. These strategies and analyses offer valuable guidance for practical orbital defense scenarios and lay a foundation for future multi-agent game research.
Keywords: orbital attack-defense game; impulsive maneuver; multi-agent deep reinforcement learning; reward function design
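One common way to realize the "process rewards plus outcome rewards" structure mentioned in the abstract above is a weighted sum of dense distance/angle terms and a sparse terminal term. The weights, units, and terminal values below are illustrative assumptions, not the paper's tuned design:

```python
def ad_game_reward(distance, angle, outcome, w_d=0.01, w_a=0.1):
    """One-step variable reward for an attack-defense game (illustrative).

    distance: range to the objective (e.g. attacker-target separation)
    angle: alignment error toward the objective, in radians
    outcome: 'win', 'loss', or 'ongoing' at this step
    """
    r_process = -w_d * distance - w_a * angle   # dense shaping: closer and better aligned
    r_outcome = {"win": 100.0, "loss": -100.0, "ongoing": 0.0}[outcome]
    return r_process + r_outcome
```

Switching the weights or the sign conventions per strategy (DA/BA for the attacker, DI/CI for the defender) would yield the "tailored reward function" per role that the abstract describes.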
Phosphorus reward mechanisms of an arbuscular mycorrhizal fungus and a dark septate endophyte to plant carbon allocation:Synergism or competition?
12
Authors: Yinli Bi, Linlin Xie, Xiao Wang, Yang Zhou. 《Pedosphere》, 2025, Issue 5, pp. 869-878 (10 pages)
Combined inoculation with dark septate endophytes (DSEs) and arbuscular mycorrhizal fungi (AMF) has been shown to promote plant growth, yet the underlying plant-fungus interaction mechanisms remain unclear. To elucidate the nature of this symbiosis, it is crucial to explore carbon (C) transport from plants to fungi and nutrient exchange between them. In this study, a pot experiment was conducted with two phosphorus (P) fertilization levels (low and normal) and four fungal inoculation treatments (no inoculation, single inoculation of AMF and DSE, and co-inoculation of AMF and DSE). The ¹³C isotope pulse labeling method was employed to quantify the photosynthetic C transferred from plants to different fungi, shedding light on the mechanisms of nutrient exchange between plants and fungi. Soil and mycelium δ¹³C, soil C/N ratio, and soil C/P ratio were higher at the low P level than at the normal P level. However, soil microbial biomass C/P ratio was lower at the low P level, suggesting that the low P level was beneficial to soil C fixation and soil fungal P mineralization and transport. At the low P level, the P reward to plants from AMF and DSE increased significantly when the plants transferred the same amount of C to the fungi, and the two fungi synergistically promoted plant nutrient uptake and growth. At the normal P level, the root P content was significantly higher in the AMF-inoculated plants than in the DSE-inoculated plants, indicating that AMF contributed more than DSE to plant P uptake with the same amount of C received. Moreover, plants preferentially allocated more C to AMF. These findings indicate the presence of a source-sink balance between plant C allocation and fungal P contribution. Overall, AMF and DSE conferred a higher reward to plants at the low P level through functional synergistic strategies.
Keywords: Alternaria sp.; Diversispora epigaea; nutrient exchange; plant-fungus association; plant P uptake; reward/investment ratio; stable isotope pulse labeling; symbiotic interaction
Rewarding and Revitalizing
13
Authors: HU YUE. 《Beijing Review》, 2009, Issue 47, p. 33 (1 page)
The fiscal stimulus package continues to play the leading role in China's economic prosperity. After several nervous months, China is finally breathing a sigh of relief as the powerful stimulus shifts the nation's growth engine out of...
Keywords: Rewarding and Revitalizing
Characterization of glutamatergic VTA neural population responses to aversive and rewarding conditioning in freely-moving mice
14
Authors: Quentin Montardy, Zheng Zhou, Zhuogui Lei, Xuemei Liu, Pengyu Zeng, Chen Chen, Yuanming Liu, Paula Sanz-Leon, Kang Huang, Liping Wang. 《Science Bulletin》, SCIE EI CAS CSCD, 2019, Issue 16, pp. 1167-1178 (12 pages)
The Ventral Tegmental Area (VTA) is a midbrain structure known to integrate aversive and rewarding stimuli, but little is known about the role of VTA glutamatergic (VGluT2) neurons in these functions. Direct activation of VGluT2 soma evokes rewarding behaviors, while activation of their downstream projections evokes aversive behaviors. To facilitate our understanding of these conflicting properties, we recorded calcium signals from VTA VGluT2+ neurons using fiber photometry in VGluT2-cre mice to investigate how this population was recruited by aversive and rewarding stimulation, both during unconditioned and conditioned protocols. Our results revealed that, as a population, VTA VGluT2+ neurons responded similarly to unconditioned-aversive and unconditioned-rewarding stimulation. During aversive and rewarding conditioning, the CS-evoked responses gradually increased across trials whilst the US-evoked response remained stable. Retrieval 24 h after conditioning, during which mice received only CS presentation, resulted in VTA VGluT2+ neurons strongly responding to CS presentation and to the expected US, but only for aversive conditioning. To help understand these differences based on VTA VGluT2+ neuronal networks, the inputs and outputs of VTA VGluT2+ neurons were investigated using Cholera Toxin B (CTB) and rabies virus. Based on our results, we propose that the divergent VTA VGluT2+ neuronal responses to aversion and reward conditioning may be partly due to the existence of VTA VGluT2+ subpopulations that are characterized by their connectivity.
Keywords: Ventral Tegmental Area; aversion; reward; conditioning; behavior
UAV maneuvering decision-making algorithm based on deep reinforcement learning under the guidance of expert experience (Cited: 1)
15
Authors: ZHAN Guang, ZHANG Kun, LI Ke, PIAO Haiyin. 《Journal of Systems Engineering and Electronics》, SCIE CSCD, 2024, Issue 3, pp. 644-665 (22 pages)
Autonomous unmanned aerial vehicle (UAV) manipulation is necessary for the defense department to execute tactical missions given by commanders in the future unmanned battlefield. A large amount of research has been devoted to improving the autonomous decision-making ability of UAVs in an interactive environment, where finding the optimal maneuvering decision-making policy has become one of the key issues for enabling the intelligence of UAVs. In this paper, we propose a maneuvering decision-making algorithm for autonomous air-delivery based on deep reinforcement learning under the guidance of expert experience. Specifically, we refine the guidance-towards-area and guidance-towards-specific-point tasks for the air-delivery process based on traditional air-to-surface fire control methods. Moreover, we construct the UAV maneuvering decision-making model based on Markov decision processes (MDPs). Specifically, we present a reward shaping method for the guidance-towards-area and guidance-towards-specific-point tasks using a potential-based function and expert-guided advice. The proposed algorithm could accelerate the convergence of the maneuvering decision-making policy and increase the stability of the policy output during the later stage of the training process. The effectiveness of the proposed maneuvering decision-making policy is illustrated by the curves of training parameters and extensive experimental results for testing the trained policy.
Keywords: unmanned aerial vehicle (UAV); maneuvering decision-making; autonomous air-delivery; deep reinforcement learning; reward shaping; expert experience
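The potential-based reward shaping named in the abstract above follows the standard form F(s, s') = γΦ(s') − Φ(s), which is known to leave the optimal policy unchanged. The potential Φ below (negative distance to a goal point) is a hypothetical stand-in for the paper's task-specific potential:

```python
def shaped_reward(r_env, s, s_next, phi, gamma=0.99):
    """Environment reward plus the shaping term F(s, s') = gamma*phi(s') - phi(s)."""
    return r_env + gamma * phi(s_next) - phi(s)

# Hypothetical 1-D potential: higher (less negative) when closer to a goal at x = 10.
phi = lambda x: -abs(10.0 - x)
```

With this potential, any step that moves toward the goal earns a positive shaping bonus even when the environment reward is still zero, which is exactly how shaping speeds up convergence on sparse tasks.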
Evolutionary dynamics of tax-based strong altruistic reward and punishment in a public goods game
16
Authors: Zhi-Hao Yang, Yan-Long Yang. 《Chinese Physics B》, SCIE EI CAS CSCD, 2024, Issue 9, pp. 247-257 (11 pages)
In public goods games, punishments and rewards have been shown to be effective mechanisms for maintaining individual cooperation. However, punishments and rewards are costly to incentivize cooperation. Therefore, the generation of costly penalties and rewards has been a complex problem in promoting the development of cooperation. In real society, specialized institutions exist to punish evil people or reward good people by collecting taxes. Inspired by this phenomenon, we propose a strong altruistic punishment or reward strategy in the public goods game. Through theoretical analysis and numerical calculation, we show that tax-based strong altruistic punishment (reward) has more evolutionary advantages than traditional strong altruistic punishment (reward) in maintaining cooperation, and that tax-based strong altruistic reward leads to a higher level of cooperation than tax-based strong altruistic punishment.
Keywords: evolutionary game theory; strong altruism; punishment; reward
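A toy reading of the tax-funded institution in a one-shot public goods game: everyone pays a tax that finances an institution which fines defectors. The parameter values (enhancement factor r, contribution cost c, tax, fine) are assumptions for illustration only, not the paper's model:

```python
def public_goods_payoffs(strategies, r=3.0, c=1.0, tax=0.1, fine=0.5):
    """Payoffs when a tax-funded institution punishes defectors.

    strategies: list of 'C' (contribute c to the pool) or 'D' (free-ride).
    The multiplied pool is shared equally; all players pay the tax.
    """
    n = len(strategies)
    pool = r * c * strategies.count("C")
    share = pool / n
    payoffs = []
    for s in strategies:
        p = share - tax              # everyone funds the institution via the tax
        if s == "C":
            p -= c                   # cooperators also bear the contribution cost
        else:
            p -= fine                # the institution fines each defector
        payoffs.append(p)
    return payoffs
```

With a fine smaller than the contribution cost, defectors still out-earn cooperators in a single round; the evolutionary claim in the abstract concerns long-run strategy dynamics, which this one-shot payoff table only sets up.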
Enhancing Cross-Lingual Image Description: A Multimodal Approach for Semantic Relevance and Stylistic Alignment
17
Authors: Emran Al-Buraihy, Dan Wang. 《Computers, Materials & Continua》, SCIE EI, 2024, Issue 6, pp. 3913-3938 (26 pages)
Cross-lingual image description, the task of generating image captions in a target language from images and descriptions in a source language, is addressed in this study through a novel approach that combines neural network models and semantic matching techniques. Experiments conducted on the Flickr8k and AraImg2k benchmark datasets, featuring images and descriptions in English and Arabic, showcase remarkable performance improvements over state-of-the-art methods. Our model, equipped with the Image & Cross-Language Semantic Matching module and the Target Language Domain Evaluation module, significantly enhances the semantic relevance of generated image descriptions. For English-to-Arabic and Arabic-to-English cross-language image descriptions, our approach achieves CIDEr scores of 87.9% for English and 81.7% for Arabic, emphasizing the substantial contributions of our methodology. Comparative analyses with previous works further affirm the superior performance of our approach, and visual results underscore that our model generates image captions that are both semantically accurate and stylistically consistent with the target language. In summary, this study advances the field of cross-lingual image description, offering an effective solution for generating image captions across languages, with the potential to impact multilingual communication and accessibility. Future research directions include expanding to more languages and incorporating diverse visual and textual data sources.
Keywords: cross-language image description; multimodal deep learning; semantic matching; reward mechanisms
Reward Function Design Method for Long Episode Pursuit Tasks Under Polar Coordinate in Multi-Agent Reinforcement Learning
18
Authors: DONG Yubo, CUI Tao, ZHOU Yufan, SONG Xun, ZHU Yue, DONG Peng. 《Journal of Shanghai Jiaotong University (Science)》, EI, 2024, Issue 4, pp. 646-655 (10 pages)
Multi-agent reinforcement learning has recently been applied to solve pursuit problems. However, it suffers from a large number of time steps per training episode, thus always struggling to converge effectively, resulting in low rewards and an inability for agents to learn strategies. This paper proposes a deep reinforcement learning (DRL) training method that employs an ensemble segmented multi-reward function design approach to address the aforementioned convergence problem. The ensemble reward function combines the advantages of two reward functions, which enhances the training effect of agents in long episodes. Then, we eliminate the non-monotonic behavior in the reward function introduced by the trigonometric functions in the traditional 2D polar coordinate observation representation. Experimental results demonstrate that this method outperforms the traditional single reward function mechanism in the pursuit scenario by enhancing agents' policy scores on the task. These ideas offer a solution to the convergence challenges faced by DRL models in long-episode pursuit problems, leading to improved model training performance.
Keywords: multi-agent reinforcement learning; deep reinforcement learning (DRL); long episode; reward function
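The non-monotonicity the abstract above removes arises when raw trigonometric features of a bearing feed directly into the reward. One standard remedy, sketched here under assumed names rather than the paper's exact construction, is to wrap the angular error into [-π, π) and reward its magnitude monotonically:

```python
import math

def angle_error(theta_agent, theta_target):
    """Wrap the heading error into [-pi, pi)."""
    return (theta_target - theta_agent + math.pi) % (2.0 * math.pi) - math.pi

def heading_reward(theta_agent, theta_target):
    """Monotone in the true angular distance; ranges over [-1, 0]."""
    return -abs(angle_error(theta_agent, theta_target)) / math.pi
```

Because the wrapped error is used instead of sin/cos of the raw angle, the reward decreases strictly as the pursuer's heading drifts from the target, with no spurious local optima at angle wrap-around.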
Improved Double Deep Q Network Algorithm Based on Average Q-Value Estimation and Reward Redistribution for Robot Path Planning
19
Authors: Yameng Yin, Lieping Zhang, Xiaoxu Shi, Yilin Wang, Jiansheng Peng, Jianchu Zou. 《Computers, Materials & Continua》, SCIE EI, 2024, Issue 11, pp. 2769-2790 (22 pages)
By integrating deep neural networks with reinforcement learning, the Double Deep Q Network (DDQN) algorithm overcomes the limitations of Q-learning in handling continuous spaces and is widely applied in the path planning of mobile robots. However, the traditional DDQN algorithm suffers from sparse rewards and inefficient utilization of high-quality data. Targeting those problems, an improved DDQN algorithm based on average Q-value estimation and reward redistribution was proposed. First, to enhance the precision of the target Q-value, the average of multiple previously learned Q-values from the target Q network is used to replace the single Q-value from the current target Q network. Next, a reward redistribution mechanism is designed to overcome the sparse reward problem by adjusting the final reward of each action using the round reward from trajectory information. Additionally, a reward-prioritized experience selection method is introduced, which ranks experience samples according to reward values to ensure frequent utilization of high-quality data. Finally, simulation experiments are conducted to verify the effectiveness of the proposed algorithm in fixed-position scenarios and random environments. The experimental results show that compared to the traditional DDQN algorithm, the proposed algorithm achieves shorter average running time, higher average return and fewer average steps. The performance of the proposed algorithm is improved by 11.43% in the fixed scenario and 8.33% in random environments. It not only plans economical and safe paths but also significantly improves efficiency and generalization in path planning, making it suitable for widespread application in autonomous navigation and industrial automation.
Keywords: Double Deep Q Network; path planning; average Q-value estimation; reward redistribution mechanism; reward-prioritized experience selection method
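The average Q-value estimation idea above, replacing the single target-network estimate with a running average of the last k estimates, can be sketched as follows; the class interface and the choice of k are assumptions for illustration:

```python
from collections import deque

class AveragedTargetQ:
    """Stabilize the DDQN target by averaging recent target-network Q-values."""

    def __init__(self, k=5):
        self.history = deque(maxlen=k)  # last k target-network estimates

    def target(self, q_target, reward, gamma=0.99, done=False):
        # q_target: target network's Q(s', argmax_a online-Q(s', a)),
        # i.e. the Double-DQN action-selection/evaluation decoupling.
        self.history.append(q_target)
        avg_q = sum(self.history) / len(self.history)
        return reward if done else reward + gamma * avg_q
```

Averaging over several past estimates damps the overestimation spikes of any single target-network snapshot, at the cost of reacting more slowly to genuine value changes.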
SPaRM: an efficient exploration and planning framework for sparse reward reinforcement learning
20
Authors: BAN Jian, LI Gongyan, XU Shaoyun. 《High Technology Letters》, EI CAS, 2024, Issue 4, pp. 344-355 (12 pages)
Due to the issue of long horizons, a substantial number of visits to the state space is required during the exploration phase of reinforcement learning (RL) to gather valuable information. Additionally, due to the challenge posed by sparse rewards, the planning phase of reinforcement learning consumes a considerable amount of time on repetitive and unproductive tasks before adequately accessing sparse reward signals. To address these challenges, this work proposes a space partitioning and reverse merging (SPaRM) framework based on reward-free exploration (RFE). The framework consists of two parts: the space partitioning module and the reverse merging module. The former module partitions the entire state space into a specific number of subspaces to expedite the exploration phase. This work establishes its theoretical sample complexity lower bound. The latter module starts planning in reverse from near the target and gradually extends to the starting state, as opposed to the conventional practice of starting at the beginning. This facilitates the early involvement of sparse rewards at the target in the policy update process. This work designs two experimental environments: a complex maze and a set of randomly generated maps. Compared with two state-of-the-art (SOTA) algorithms, experimental results validate the effectiveness and superior performance of the proposed algorithm.
Keywords: reinforcement learning (RL); sparse reward; reward-free exploration (RFE); space partitioning (SP); reverse merging (RM)