95 articles found
Research on Models and Policy Optimization for Integrated Development of Agriculture,Culture,and Tourism in Metropolitan Suburbs:A Case of Study of Cihui Subdistrict,Wuhan City,China
1
Authors: XU Jiachen, TENG Jin’e, WANG Zhijie, GAN Chang 《Journal of Landscape Research》 2025, No. 6, pp. 32-36 (5 pages)
With the in-depth advancement of rural revitalization and urban-rural integration strategies, the integration of agriculture, culture, and tourism has become an important path for promoting high-quality development in metropolitan suburbs. Taking Cihui Subdistrict in Wuhan as an example, this research systematically sorts out its resource endowments, development models, and implementation effectiveness of the agriculture-culture-tourism integration through field research and case analysis. It further delves into the existing problems, such as insufficient planning and coordination, weak factor support, and insufficient industrial integration, along with their underlying causes. On this basis, targeted countermeasures are proposed from the aspects of scientific planning, industrial collaboration, talent introduction and cultivation, brand building, and policy optimization. The study aims to build an integrated development system of agriculture, culture, and tourism tailored to the characteristics of metropolitan suburbs, providing theoretical references and policy inspiration for similar regions.
Keywords: Integration of agriculture, culture, and tourism; Metropolitan suburbs; Cihui Subdistrict; Mode innovation; Policy optimization
Dynamic hedging of 50ETF options using Proximal Policy Optimization
2
Authors: Lei Liu, Mengmeng Hao, Jinde Cao 《Journal of Automation and Intelligence》 2025, No. 3, pp. 198-206 (9 pages)
This paper employs the Proximal Policy Optimization (PPO) algorithm to study the risk hedging problem of the Shanghai Stock Exchange (SSE) 50ETF options. First, the action and state spaces were designed based on the characteristics of the hedging task, and a reward function was developed according to the cost function of the options. Second, combining the concept of curriculum learning, the agent was guided to adopt a simulated-to-real learning approach for dynamic hedging tasks, reducing the learning difficulty and addressing the issue of insufficient option data. A dynamic hedging strategy for 50ETF options was constructed. Finally, numerical experiments demonstrate the superiority of the designed algorithm over traditional hedging strategies in terms of hedging effectiveness.
Keywords: B-S model; Option hedging; Reinforcement learning; 50ETF; Proximal policy optimization (PPO)
Gait Learning Reproduction for Quadruped Robots Based on Experience Evolution Proximal Policy Optimization
3
Authors: LI Chunyang, ZHU Xiaoqing, RUAN Xiaogang, LIU Xinyuan, ZHANG Siyuan 《Journal of Shanghai Jiaotong University (Science)》 2025, No. 6, pp. 1125-1133 (9 pages)
Bionic gait learning of quadruped robots based on reinforcement learning has become a hot research topic. The proximal policy optimization (PPO) algorithm has a low probability of learning a successful gait from scratch due to problems such as reward sparsity. To solve the problem, we propose an experience evolution proximal policy optimization (EEPPO) algorithm which integrates PPO with priori knowledge highlighting by evolutionary strategy. We use the successfully trained samples as priori knowledge to guide the learning direction in order to increase the success probability of the learning algorithm. To verify the effectiveness of the proposed EEPPO algorithm, we have conducted simulation experiments of the quadruped robot gait learning task on PyBullet. Experimental results show that the central pattern generator based radial basis function (CPG-RBF) network and the policy network are simultaneously updated to achieve the quadruped robot’s bionic diagonal trot gait learning task using key information such as the robot’s speed, posture, and joint information. Experimental comparison results with the traditional soft actor-critic (SAC) algorithm validate the superiority of the proposed EEPPO algorithm, which can learn a more stable diagonal trot gait in flat terrain.
Keywords: quadruped robot; proximal policy optimization (PPO); priori knowledge; evolutionary strategy; bionic gait learning
A Lyapunov characterization of robust policy optimization
4
Authors: Leilei Cui, Zhong-Ping Jiang 《Control Theory and Technology》 EI CSCD 2023, No. 3, pp. 374-389 (16 pages)
In this paper, we study the robustness property of policy optimization (particularly the Gauss-Newton gradient descent algorithm, which is equivalent to the policy iteration in reinforcement learning) subject to noise at each iteration. By invoking the concept of input-to-state stability and utilizing Lyapunov's direct method, it is shown that, if the noise is sufficiently small, the policy iteration algorithm converges to a small neighborhood of the optimal solution even in the presence of noise at each iteration. Explicit expressions of the upper bound on the noise and the size of the neighborhood to which the policies ultimately converge are provided. Based on Willems' fundamental lemma, a learning-based policy iteration algorithm is proposed. The persistent excitation condition can be readily guaranteed by checking the rank of the Hankel matrix related to an exploration signal. The robustness of the learning-based policy iteration to measurement noise and unknown system disturbances is theoretically demonstrated by the input-to-state stability of the policy iteration. Several numerical simulations are conducted to demonstrate the efficacy of the proposed method.
Keywords: policy optimization; policy iteration (PI); input-to-state stability (ISS); Lyapunov's direct method
Policy Optimization Study Based on Evolutionary Learning
5
Authors: 刘素平, 丁永生 《Journal of Donghua University(English Edition)》 EI CAS 2009, No. 6, pp. 621-624 (4 pages)
In order to achieve an intelligent and automated self-management network, dynamic policy configuration and selection are needed. A given policy suits only a certain network environment. If the network environment changes, the policy no longer suits it. Therefore, policy-based management should also have a similar "natural selection" process. Useful policies are retained, and policies which have lost their effectiveness are eliminated. A policy optimization method based on evolutionary learning was proposed. For different shooting times, the priority of a policy with high shooting times is raised, while a policy with a low rate has lower priority, and a policy with no shooting over a long period becomes dormant. Thus the strategy of survival of the fittest is realized, and the degree of self-learning in policy management is improved.
Keywords: policy-based management; evolution learning; policy optimization
Guided Proximal Policy Optimization with Structured Action Graph for Complex Decision-making (Cited by: 1)
6
Authors: Yiming Yang, Dengpeng Xing, Wannian Xia, Peng Wang 《Machine Intelligence Research》 2025, No. 4, pp. 797-816 (20 pages)
Reinforcement learning encounters formidable challenges when tasked with intricate decision-making scenarios, primarily due to the expansive parameterized action spaces and the vastness of the corresponding policy landscapes. To surmount these difficulties, we devise a practical structured action graph model augmented by guiding policies that integrate trust region constraints. Based on this, we propose guided proximal policy optimization with structured action graph (GPPO-SAG), which has demonstrated pronounced efficacy in refining policy learning and enhancing performance across sophisticated tasks characterized by parameterized action spaces. Rigorous empirical evaluations of our model have been performed on comprehensive gaming platforms, including the entire suite of StarCraft II and Hearthstone, yielding exceptionally favorable outcomes. Our source code is at https://github.com/sachiel321/GPPO-SAG.
Keywords: Reinforcement learning; trust region policy optimization; complex decision-making; policy guiding; structured action graph
Robust trajectory maneuver scheduling near the flyby of small celestial bodies on the basis of proximal policy optimization
7
Authors: Hang Hu, Weiren Wu, Jinxiu Zhang, Jihe Wang, Yuqi Song 《Astrodynamics》 2025, No. 6, pp. 855-875 (21 pages)
Owing to the large communication delay in deep space exploration missions, trajectory maneuvers prior to the flyby of small celestial bodies generally need to be scheduled in advance. However, the lack of prior data and the presence of environmental uncertainties in deep space are significant challenges for maneuver scheduling. To solve this problem, in this study, robust maneuver scheduling networks based on proximal policy optimization were proposed. A reward function that considers the terminal state accuracy of the spacecraft after maneuvering and the total velocity impulse cost was designed for the maneuver scheduling networks. An additional constant was added to the variance of the actor network to improve the performance of the generated maneuvering strategy. Compared with the actor-critic algorithm and genetic algorithm, the maneuvering strategy generated by the maneuver scheduling networks demonstrated the best performance in most simulation scenarios and maintained a better balance between the terminal state accuracy and the total velocity impulse cost. The robustness of the maneuver strategy against uncertain perturbations in the environment and uncertain initial state deviations of the spacecraft was validated in several maneuver scenarios in the simulation. In addition, the generated maneuvering strategy exhibited excellent real-time performance. The time cost to make a decision remained below 0.7 s in the worst case when tested on a Raspberry Pi 4B with 4 GB of memory and a CPU frequency limited to 800 MHz. The robustness against uncertainties and real-time capability of the proposed method revealed its potential for onboard application in future deep space exploration missions.
Keywords: deep space exploration; spacecraft maneuver; trajectory scheduling; proximal policy optimization (PPO)
Evaluating end-to-end autonomous driving architectures: a proximal policy optimization approach in simulated environments
8
Authors: Angelo Morgado, Kaoru Ota, Mianxiong Dong, Nuno Pombo 《Autonomous Intelligent Systems》 2025, No. 1, pp. 191-205 (15 pages)
Autonomous driving systems (ADS) are at the forefront of technological innovation, promising enhanced safety, efficiency, and convenience in transportation. This study investigates the potential of end-to-end reinforcement learning (RL) architectures for ADS, specifically focusing on a Go-To-Point task involving lane-keeping and navigation through basic urban environments. The study uses the Proximal Policy Optimization (PPO) algorithm within the CARLA simulation environment. Traditional modular systems, which separate driving tasks into perception, decision-making, and control, provide interpretability and reliability in controlled scenarios but struggle with adaptability to dynamic, real-world conditions. In contrast, end-to-end systems offer a more integrated approach, potentially enhancing flexibility and decision-making cohesion. This research introduces CARLA-GymDrive, a novel framework integrating the CARLA simulator with the Gymnasium API, enabling seamless RL experimentation with both discrete and continuous action spaces. Through a two-phase training regimen, the study evaluates the efficacy of PPO in an end-to-end ADS focused on basic tasks like lane-keeping and waypoint navigation. A comparative analysis with modular architectures is also provided. The findings highlight the strengths of PPO in managing continuous control tasks, achieving smoother and more adaptable driving behaviors than value-based algorithms like Deep Q-Networks. However, challenges remain in generalization and computational demands, with end-to-end systems requiring extensive training time. While the study underscores the potential of end-to-end architectures, it also identifies limitations in scalability and real-world applicability, suggesting that modular systems may currently be more feasible for practical ADS deployment. Nonetheless, the CARLA-GymDrive framework and the insights gained from PPO-based ADS contribute significantly to the field, laying a foundation for future advancements in AD.
Keywords: Autonomous Driving Systems (ADS); End-to-End Architecture; Software System Architecture; Proximal policy optimization (PPO); Real-Time Embedded Systems; Simulation Framework
Proximal policy optimization with an integral compensator for quadrotor control (Cited by: 7)
9
Authors: Huan HU, Qing-ling WANG 《Frontiers of Information Technology & Electronic Engineering》 SCIE EI CSCD 2020, No. 5, pp. 777-795 (19 pages)
We use the advanced proximal policy optimization (PPO) reinforcement learning algorithm to optimize the stochastic control strategy to achieve speed control of the "model-free" quadrotor. The model is controlled by four learned neural networks, which directly map the system states to control commands in an end-to-end style. By introducing an integral compensator into the actor-critic framework, the speed tracking accuracy and robustness have been greatly enhanced. In addition, a two-phase learning scheme which includes both offline and online learning is developed for practical use. A model with strong generalization ability is learned in the offline phase. Then, the flight policy of the model is continuously optimized in the online learning phase. Finally, the performance of our proposed algorithm is compared with that of the traditional PID algorithm.
Keywords: Reinforcement learning; Proximal policy optimization; Quadrotor control; Neural network
A STOCHASTIC TRUST-REGION FRAMEWORK FOR POLICY OPTIMIZATION (Cited by: 1)
10
Authors: Mingming Zhao, Yongfeng Li, Zaiwen Wen 《Journal of Computational Mathematics》 SCIE CSCD 2022, No. 6, pp. 1004-1030 (27 pages)
In this paper, we study a few challenging theoretical and numerical issues on the well-known trust region policy optimization for deep reinforcement learning. The goal is to find a policy that maximizes the total expected reward when the agent acts according to the policy. The trust region subproblem is constructed with a surrogate function coherent to the total expected reward and a general distance constraint around the latest policy. We solve the subproblem using a preconditioned stochastic gradient method with a line search scheme to ensure that each step promotes the model function and stays in the trust region. To overcome the bias caused by sampling to the function estimations under the random settings, we add the empirical standard deviation of the total expected reward to the predicted increase in a ratio in order to update the trust region radius and decide whether the trial point is accepted. Moreover, for a Gaussian policy, which is commonly used for continuous action space, the maximization with respect to the mean and covariance is performed separately to control the entropy loss. Our theoretical analysis shows that the deterministic version of the proposed algorithm tends to generate a monotonic improvement of the total expected reward and the global convergence is guaranteed under moderate assumptions. Comparisons with the state-of-the-art methods demonstrate the effectiveness and robustness of our method over robotic control and game-playing tasks from OpenAI Gym.
Keywords: Deep reinforcement learning; Stochastic trust region method; Policy optimization; Global convergence; Entropy control
STUDY ON THE OPTIMIZATION OF TRANSPORT CONTROL POLICY IN COMMUNICATION NETWORK (Cited by: 1)
11
Authors: Fan Shuyan, Han Weizhan, Lu Ran 《Journal of Electronics(China)》 2010, No. 2, pp. 261-266 (6 pages)
In communication networks with a policy-based Transport Control on-Demand (TCoD) function, the transport control policies have a great impact on the network effectiveness. To evaluate and optimize the transport policies in a communication network, a policy-based TCoD network model is given and a comprehensive evaluation index system of the network effectiveness is put forward from both network application and handling mechanism perspectives. A TCoD network prototype system based on Asynchronous Transfer Mode/Multi-Protocol Label Switching (ATM/MPLS) is introduced and some experiments are performed on it. The prototype system is evaluated and analyzed with the comprehensive evaluation index system. The results show that the index system can be used to judge whether the communication network can meet the application requirements or not, and can provide references for the optimization of the transport policies so as to improve the communication network effectiveness.
Keywords: Communication network; Comprehensive evaluation index system; Network Application Effectiveness (NAE); Transport Control on-Demand (TCoD); Policy optimization
Towards Jumping Skill Learning by Target-guided Policy Optimization for Quadruped Robots
12
Authors: Chi Zhang, Wei Zou, Ningbo Cheng, Shuomo Zhang 《Machine Intelligence Research》 EI CSCD 2024, No. 6, pp. 1162-1177 (16 pages)
Endowing quadruped robots with the skill to jump forward helps them overcome barriers and pass through complex terrains. In this paper, a model-free control architecture with target-guided policy optimization and deep reinforcement learning (DRL) for quadruped robot jumping is presented. First, the jumping phase is divided into take-off and flight-landing phases, and optimal strategies with soft actor-critic (SAC) are constructed for the two phases respectively. Second, policy learning including expectations, penalties in the overall jumping process, and extrinsic excitations is designed. Corresponding policies and constraints are all provided for successful take-off, excellent flight attitude, and stable standing after landing. In order to avoid the low efficiency of random exploration, a curiosity module is introduced as extrinsic rewards. Additionally, the target-guided module encourages the robot to explore closer and closer to the desired jumping target. Simulation results indicate that the quadruped robot can realize completed forward jumping locomotion with good horizontal and vertical distances, as well as excellent motion attitudes.
Keywords: Jumping locomotion for quadruped robot; policy optimization; deep reinforcement learning (DRL); locomotion control; robot learning
Managing International Student Education in China:Cross-Cultural Adaptation and Policy Optimization
13
Authors: Qi Zhang 《Advances in Social Behavior Research》 2024, No. 7, pp. 41-45 (5 pages)
This paper aims to explore the current status and challenges of international student education in China, with a focus on cross-cultural adaptation and institutional policy optimisation. It comes at a time when China is attracting more international students than ever, as part of the Belt and Road Initiative. However, international students also report significant cross-cultural adaptation challenges, including language issues, insufficient administrative support, and limited opportunities for social integration. This study, using a mixed-method approach that combines quantitative surveys and qualitative interviews, mainly with international students and university administrators from 10 leading Chinese universities, found that language proficiency is the biggest barrier to academic integration (78% of respondents reported it as a major barrier), and institutional support for cross-cultural adaptation often lags behind. For example, only 38% of international students felt that their universities provided sufficient support for cross-cultural adaptation. The paper recommends reinforcing language support, providing cross-cultural sensitivity training for staff, and creating structured mentorship programmes to improve international students’ academic and social integration in China.
Keywords: International students; cross-cultural adaptation; education management; language barriers; policy optimization
Optimization Scheduling of Hydrogen-Coupled Electro-Heat-Gas Integrated Energy System Based on Generative Adversarial Imitation Learning
14
Authors: Baiyue Song, Chenxi Zhang, Wei Zhang, Leiyu Wan 《Energy Engineering》 2025, No. 12, pp. 4919-4945 (27 pages)
Hydrogen energy is a crucial support for China’s low-carbon energy transition. With the large-scale integration of renewable energy, the combination of hydrogen and integrated energy systems has become one of the most promising directions of development. This paper proposes an optimized scheduling model for a hydrogen-coupled electro-heat-gas integrated energy system (HCEHG-IES) using generative adversarial imitation learning (GAIL). The model aims to enhance renewable-energy absorption, reduce carbon emissions, and improve grid-regulation flexibility. First, the optimal scheduling problem of HCEHG-IES under uncertainty is modeled as a Markov decision process (MDP). To overcome the limitations of conventional deep reinforcement learning algorithms (including long optimization time, slow convergence, and subjective reward design), this study augments the PPO algorithm by incorporating a discriminator network and expert data. The newly developed algorithm, termed GAIL, enables the agent to perform imitation learning from expert data. Based on this model, dynamic scheduling decisions are made in continuous state and action spaces, generating optimal energy-allocation and management schemes. Simulation results indicate that, compared with traditional reinforcement-learning algorithms, the proposed algorithm offers better economic performance. Guided by expert data, the agent avoids blind optimization, shortens the offline training time, and improves convergence performance. In the online phase, the algorithm enables flexible energy utilization, thereby promoting renewable-energy absorption and reducing carbon emissions.
Keywords: Hydrogen energy; optimization dispatch; generative adversarial imitation learning; proximal policy optimization; imitation learning; renewable energy
Rural Revitalization and the Transformation of Xinhui Chenpi Industry: A Case Study of Policy Implementation and Development Pathways
15
Authors: Yuxin Yang 《Proceedings of Business and Economic Studies》 2025, No. 5, pp. 132-141 (10 pages)
This paper examines the transformation and development of the Xinhui Chenpi industry under the rural revitalization strategy in China. The study highlights the significant growth of the industry, with the annual production of chenpi reaching approximately 7,000 tons and the total output value surpassing 26 billion yuan in 2024. The paper proposes strategies to foster sustainable growth in industries facing challenges such as inefficient production processes, inconsistent product quality, and a lack of policy awareness among operators. These strategies include optimizing support policies, enhancing regulatory frameworks, and leveraging digital technologies for brand building and market expansion. The research contributes to understanding the development trajectory of the Xinhui Chenpi industry and provides insights for policymakers and industry practitioners.
Keywords: Rural revitalization; Industrial transformation; Policy optimization; Digital marketing
Research on Utility Evaluation and Optimization of the Third Pillar Pension in Multi-level Pension Security for Employees in New Business Forms
16
Authors: Ren Feixiao, Wang Wenbo, Zhang Kexin 《Journal of Humanities and Nature》 2025, No. 2, pp. 3-19 (17 pages)
Against the backdrop of uneven pressure on the three-pillar pension system and a mismatch between pension funds and the demographic structure, a large number of employees in new forms of employment remain outside the pension security system, facing relatively high pension risks. Given their high job mobility, weak long-term planning ability, and large income fluctuations, individual pension schemes may become a breakthrough point for improving the pension situation of employees in new forms of employment while maintaining the balance of the three-pillar pension system. In line with the national goal of building a multi-level and multi-pillar old-age insurance system, and to study the supplementary role of the third-pillar individual pension policy for employees in new forms of employment, this article constructs an evaluation system using the analytic hierarchy process and designs a questionnaire. After conducting a questionnaire survey in six cities in Shandong Province, the collected data are analyzed. It is found that the short-term effect of the current policy is that residents' awareness of pension issues is gradually improving and the participation rate is increasing, but the behavior is short-term, and residents generally tend to avoid pension risks. Therefore, regarding the deepening of the individual pension system, the article puts forward four suggestions: (1) Conduct comprehensive publicity through multiple channels and with emphasis on key points; (2) Enhance the system's attractiveness according to the characteristics of the target population; (3) Improve the public's awareness of pension planning and financial literacy; (4) Strengthen the connection and transformation among different pillars of the pension system.
Keywords: New Business Format; Personal Pension System; Analytic Hierarchy Process; Policy optimization
Safe Deep Reinforcement Learning for Real-time AC Optimal Power Flow:A Near-optimal Solution
17
Authors: Bin Feng, Jiayue Zhao, Gang Huang, Yijie Hu, Huating Xu, Changxin Guo, Zhe Chen 《CSEE Journal of Power and Energy Systems》 2026, No. 1, pp. 99-111 (13 pages)
The real-time AC optimal power flow (OPF) problem is a key issue in making fast and accurate decisions to ensure the safety and economy of power systems. With the rapid development of renewable energies, fluctuations have grown stronger; thus, a novel approach called safe deep reinforcement learning is proposed in this paper. Herein, the real-time ACOPF problem is modeled as a constrained Markov decision process, and primal-dual optimization (PDO) based proximal policy optimization (PPO) is used to learn the optimal generator outputs in the primal domain and security constraints in the dual domain, which avoids manually selecting a trade-off between penalties for constraint violations and rewards for the economy. Before training, behavior cloning clones the expert experience into the initial weights of neural networks. Moreover, multiprocessing training is utilized to accelerate the training speed. Case studies are conducted on the IEEE 118-bus system and the modified IEEE 118-bus system. Compared with other methods, the experimental results show that the proposed method can achieve security and near-optimal economic goals by quickly solving the real-time ACOPF problem.
Keywords: Behavior cloning; deep reinforcement learning; multiprocessing training; optimal power flow; primal-dual optimization; proximal policy optimization
A PPO-Based DRL Approach for Scalable Communication in Civilian UAV Networks
18
Authors: Chu Thi Minh Hue, Nguyen Minh Quy 《Computers, Materials & Continua》 2026, No. 5, pp. 1869-1882 (14 pages)
Nowadays, Unmanned Aerial Vehicles (UAVs) are making increasingly important contributions to numerous applications that enhance human quality of life, such as sensing and data collection, computing, and communication. However, communication between UAVs still faces challenges due to highly dynamic topology, volatile wireless links, and strict energy budgets. In this work, we introduce an improved communication scheme based on Proximal Policy Optimization (PPO). Our solution casts hop-by-hop relay selection as a Markov decision process and develops a decentralized Proximal Policy Optimization framework in an actor-critic form. A key novelty is the design of the reward function, which jointly considers the delivery ratio, end-to-end delay, and energy efficiency, enabling flexible prioritization in dynamic environments. The simulation results across swarms of 20-70 UAVs show that the proposed framework enhances the delivery ratio by up to 5% over a Deep Q-Network baseline (reaching ≈80% at 70 nodes), reduces latency by about 2-3 ms in medium-to-dense settings (from ~43 to 35-36 ms), and attains comparable or slightly lower total energy consumption (typically 0.5%-2% lower). The results indicate that the proposed communication scheme, which is adaptive and scalable in learning-based UAV scenarios, paves the way for real-world UAV deployments.
Keywords: Reinforcement learning; proximal policy optimization (PPO); UAV; 6G
OPTIMAL HARVESTING POLICY FOR INSHORE-OFFSHORE FISHERY MODEL WITH IMPULSIVE DIFFUSION (Cited by: 7)
19
Authors: 董玲珍, 陈兰荪, 孙丽华 《Acta Mathematica Scientia》 SCIE CSCD 2007, No. 2, pp. 405-412 (8 pages)
This article studies the inshore-offshore fishery model with impulsive diffusion. The existence and global asymptotic stability of both the trivial periodic solution and the positive periodic solution are obtained. The complexity of this system is also analyzed. Moreover, the optimal harvesting policy is given for the inshore subpopulation, which includes the maximum sustainable yield and the corresponding harvesting effort.
Keywords: Impulsive diffusion; inshore-offshore fishery model; global asymptotic stability; periodic solution; optimal harvesting policy
Optimal switching policy for performance enhancement of distributed parameter systems based on event-driven control (Cited by: 1)
20
Authors: 穆文英, 崔宝同, 楼旭阳, 李纹 《Chinese Physics B》 SCIE EI CAS CSCD 2014, No. 7, pp. 211-217 (7 pages)
This paper aims to improve the performance of a class of distributed parameter systems for the optimal switching of actuators and controllers based on event-driven control. It is assumed that, among the multiple available actuators, only one actuator can receive the control signal and be activated over an unfixed time interval, and the other actuators remain dormant. After incorporating a state observer into the event generator, the event-driven control loop and the minimum inter-event time are ultimately bounded. Based on the event-driven state feedback control, the time intervals of unfixed length can be obtained. The optimal switching policy is based on finite horizon linear quadratic optimal control at the beginning of each time subinterval. A simulation example demonstrates the effectiveness of the proposed policy.
Keywords: distributed parameter systems; optimal switching policy; event-driven