2 articles found
Human experience-guided reinforcement learning for carrier-based aircraft support operation scheduling
1
Authors: Xudong Chen, Yizhe Luo, Qihang Sun, Wenxiao Guo, Zhao Jin, Shuo Feng, Yucheng Shi, Mingliang Xu. Defence Technology (防务技术), 2025, No. 12, pp. 211-224 (14 pages)
The efficiency of carrier-based aircraft support operation scheduling critically impacts aircraft carrier operational effectiveness by determining sortie generation rates, yet faces significant challenges in complex deck environments characterized by resource coupling, dynamic constraints, and high-dimensional state-action spaces. Traditional optimization algorithms and vanilla reinforcement learning (RL) struggle with computational inefficiency, sparse rewards, and adaptability to dynamic scenarios, while human expert systems are constrained by the quality of expert knowledge, and poor expert guidance may even have a negative impact. To address these limitations, this paper proposes a human experience-guided actor-critic reinforcement learning framework that synergizes domain expertise with adaptive learning. First, a dynamic Markov decision process (MDP) model is developed to rigorously simulate carrier deck operations, explicitly encoding constraints on positions, resources, and collision avoidance. Building upon this foundation, a human experience database is constructed to enable real-time pattern-matching-based intervention during agent-environment interactions, dynamically correcting wrong actions to avoid catastrophic states while refining exploration efficiency. Finally, the policy and value network objectives are reshaped to incorporate human intent through hybrid reward functions and adaptive guidance weighting, ensuring balanced integration of expert knowledge with RL's exploration capabilities. Extensive simulations across three scenarios demonstrate superior performance compared to state-of-the-art methods and maintain robustness under suboptimal human guidance. These results validate the framework's ability to harmonize human expertise with adaptive learning, offering a practical solution for real-world carriers.
Keywords: reinforcement learning from human feedback; carrier-based aircraft scheduling; resource allocation; dynamic decision-making
From Algorithm to Expert: RLHF-Guided Vision-Language Model for 3D-EEM Fluorescence Spectroscopy Matching
2
Authors: Chenglong Lu, Jiehui Li, Tonglin Chen, Changhua Zhou, Yixin Fan, Xinlin Ren, Ziyi Ju, Wei Wang. Computers, Materials & Continua, 2026, No. 5, pp. 1883-1900 (18 pages)
Existing methods for tracing water pollution sources typically integrate three-dimensional excitation-emission matrix (3D-EEM) fluorescence spectroscopy with similarity-based matching algorithms. However, these approaches exhibit high error rates in borderline cases and necessitate expert manual review, which limits scalability and introduces inconsistencies between algorithmic outputs and expert judgment. To address these limitations, we propose a large vision-language model (VLM) designed as an “expert agent” to automatically refine similarity scores, ensuring alignment with expert decisions and overcoming key application bottlenecks. The model consists of two core components: (1) a rule-based similarity calculation module that generates initial spectral similarity scores, and (2) a pre-trained large vision-language model fine-tuned via supervised learning and reinforcement learning with human feedback (RLHF) to emulate expert assessments. To facilitate training and evaluation, we introduce two expert-annotated datasets, Spec1k and SpecReason, which capture both quantitative corrections and qualitative reasoning patterns, allowing the model to emulate expert decision-making processes. Experimental results demonstrate that our method achieves 81.45% source attribution accuracy, 38.24% higher than rule-based and machine learning baselines. Real-world deployment further validates its effectiveness.
Keywords: vision-language model; reinforcement learning with human feedback; pollution source tracing; 3D fluorescence spectroscopy