Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 62325602, 62406292, 62302459, 62406293, and 62036010).
Abstract: The efficiency of carrier-based aircraft support operation scheduling critically affects aircraft carrier operational effectiveness because it determines sortie generation rates, yet scheduling faces significant challenges in complex deck environments characterized by resource coupling, dynamic constraints, and high-dimensional state-action spaces. Traditional optimization algorithms and vanilla reinforcement learning (RL) struggle with computational inefficiency, sparse rewards, and poor adaptability to dynamic scenarios, while human expert systems are limited by the quality of expert knowledge, and poor expert guidance may even degrade performance. To address these limitations, this paper proposes a human experience-guided actor-critic reinforcement learning framework that combines domain expertise with adaptive learning. First, a dynamic Markov decision process (MDP) model is developed to rigorously simulate carrier deck operations, explicitly encoding constraints on positions, resources, and collision avoidance. Building on this foundation, a human experience database is constructed to enable real-time, pattern-matching-based intervention during agent-environment interactions, dynamically correcting wrong actions to avoid catastrophic states while improving exploration efficiency. Finally, the policy and value network objectives are reshaped to incorporate human intent through hybrid reward functions and adaptive guidance weighting, balancing expert knowledge against RL's exploration capabilities. Extensive simulations across three scenarios show that the framework outperforms state-of-the-art methods and remains robust under suboptimal human guidance. These results validate the framework's ability to harmonize human expertise with adaptive learning, offering a practical solution for real-world carriers.
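The intervention and guidance-weighting ideas in this abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the paper's implementation: the `ExperienceRule` structure, the exponential decay schedule, the intervention penalty, and the state key `fuel_stations_free` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class ExperienceRule:
    """Hypothetical database entry: a state pattern plus the action an expert prefers."""
    matches: Callable[[Dict[str, int]], bool]
    expert_action: int


def guided_action(state: Dict[str, int], agent_action: int,
                  rules: List[ExperienceRule]) -> Tuple[int, bool]:
    """Return (action, intervened). The first matching rule overrides the agent."""
    for rule in rules:
        if rule.matches(state) and agent_action != rule.expert_action:
            return rule.expert_action, True
    return agent_action, False


def guidance_weight(step: int, initial: float = 1.0, decay: float = 0.99) -> float:
    """Assumed schedule: the influence of human guidance fades exponentially."""
    return initial * decay ** step


def hybrid_reward(env_reward: float, intervened: bool, step: int,
                  penalty: float = 1.0) -> float:
    """Environment reward minus a weighted penalty whenever an intervention occurred."""
    return env_reward - guidance_weight(step) * penalty * float(intervened)


# Usage: one illustrative rule that forces action 0 when no fuel station is free.
rules = [ExperienceRule(lambda s: s["fuel_stations_free"] == 0, expert_action=0)]
action, intervened = guided_action({"fuel_stations_free": 0}, 2, rules)
reward = hybrid_reward(1.0, intervened, step=0)
```

A decaying weight is only one way to keep weak guidance from dominating later training; the abstract does not specify the weighting rule actually used.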
Abstract: Existing methods for tracing water pollution sources typically integrate three-dimensional excitation-emission matrix (3D-EEM) fluorescence spectroscopy with similarity-based matching algorithms. However, these approaches exhibit high error rates in borderline cases and require manual expert review, which limits scalability and introduces inconsistencies between algorithmic outputs and expert judgment. To address these limitations, we propose a large vision-language model (VLM) designed as an "expert agent" that automatically refines similarity scores, ensuring alignment with expert decisions and overcoming key application bottlenecks. The model consists of two core components: (1) a rule-based similarity calculation module that generates initial spectral similarity scores, and (2) a pre-trained large vision-language model fine-tuned via supervised learning and reinforcement learning with human feedback (RLHF) to emulate expert assessments. To facilitate training and evaluation, we introduce two expert-annotated datasets, Spec1k and SpecReason, which capture both quantitative corrections and qualitative reasoning patterns, allowing the model to emulate expert decision-making processes. Experimental results demonstrate that our method achieves 81.45% source attribution accuracy, 38.24% higher than rule-based and machine learning baselines. Real-world deployment further validates its effectiveness.
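The two-stage pipeline described in this abstract (an initial rule-based score, then an expert-style refinement) can be sketched as follows. This is a sketch under stated assumptions: cosine similarity stands in for the unspecified rule-based measure, and `refine_score` is a hypothetical placeholder for the fine-tuned VLM, which the abstract does not describe in detail.

```python
import math
from typing import Sequence


def eem_cosine_similarity(a: Sequence[Sequence[float]],
                          b: Sequence[Sequence[float]]) -> float:
    """Cosine similarity between two EEM intensity matrices, flattened row by row.

    Assumption: cosine similarity is used here only as a common example of a
    rule-based spectral similarity score.
    """
    flat_a = [v for row in a for v in row]
    flat_b = [v for row in b for v in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("EEM matrices must have the same shape")
    dot = sum(x * y for x, y in zip(flat_a, flat_b))
    norm = math.sqrt(sum(x * x for x in flat_a)) * math.sqrt(sum(y * y for y in flat_b))
    return dot / norm if norm else 0.0


def refine_score(initial: float, correction: float) -> float:
    """Hypothetical refinement step: apply an expert-style correction, clamped to [0, 1]."""
    return min(1.0, max(0.0, initial + correction))


# Usage: score a sample spectrum against a candidate source, then refine it.
sample = [[1.0, 0.0], [0.0, 1.0]]
source = [[1.0, 0.0], [0.0, 1.0]]
initial = eem_cosine_similarity(sample, source)
final = refine_score(initial, correction=-0.1)
```

In the described system the correction would come from the model's assessment of the spectra rather than a fixed number; the clamp simply keeps the refined value a valid similarity score.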