Dynamic soaring, inspired by the wind-riding flight of birds such as albatrosses, is a biomimetic technique that leverages wind fields to enhance the endurance of unmanned aerial vehicles (UAVs). Achieving a precise soaring trajectory is crucial for maximizing energy efficiency during flight. Existing nonlinear programming methods depend heavily on initial values that are hard to determine. Therefore, this paper introduces a deep reinforcement learning method based on a differentially flat model for dynamic soaring trajectory planning and optimization. First, the gliding trajectory is parameterized using Fourier basis functions, achieving a flexible trajectory representation with a minimal number of hyperparameters. The trajectory optimization problem is then formulated as a Markov decision process, and the trajectory hyperparameters are optimized with the Proximal Policy Optimization (PPO2) algorithm from deep reinforcement learning (DRL), reducing the strong reliance on initial value settings. Finally, a comparison with the nonlinear programming method shows that the trajectory generated by the proposed approach is smoother while meeting the same performance requirements: the proposed method achieves a 34% reduction in maximum thrust, a 39.4% decrease in maximum thrust difference, and a 33% reduction in maximum airspeed difference.
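As an illustration of the Fourier-based trajectory representation described above, the sketch below evaluates one coordinate of a periodic trajectory as a truncated Fourier series; the harmonic count, coefficients, and period are hypothetical choices for illustration, not values from the paper.

```python
import numpy as np

def fourier_trajectory(coeffs_a, coeffs_b, period, t):
    # Truncated Fourier series: x(t) = a0/2 + sum_k [a_k cos(k w t) + b_k sin(k w t)],
    # with w = 2*pi/period. A handful of coefficients describes a whole periodic cycle.
    w = 2.0 * np.pi / period
    x = coeffs_a[0] / 2.0
    for k in range(1, len(coeffs_a)):
        x = x + coeffs_a[k] * np.cos(k * w * t) + coeffs_b[k - 1] * np.sin(k * w * t)
    return x

# A 3-harmonic representation: 7 scalar hyperparameters for one coordinate.
t = np.linspace(0.0, 10.0, 101)
z = fourier_trajectory([0.0, 5.0, 0.0, 1.0], [2.0, 0.0, 0.0], period=10.0, t=t)
```

Because the basis is periodic by construction, the trajectory closes on itself after one period, which is convenient for the closed soaring loops the paper targets.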
To investigate the effects of various random factors on the preventive maintenance (PM) decision-making of a class of two-unit series systems, an optimal quasi-periodic PM policy is introduced. In the model, PM is assumed to be perfect for unit 1 and only mechanical service for unit 2. PM is performed randomly according to a dynamic PM plan distributed over each implementation period. Replacement is determined by the competition between unplanned and planned replacements: the unplanned replacement is triggered by a catastrophic failure of unit 2, and the planned replacement is executed when the PM count reaches the threshold N. Through modeling and analysis, a solution algorithm for the optimal implementation period and PM number is given, and the optimization process and parametric sensitivity are illustrated by a numerical example. Results show that the implementation period should be decreased as far as practice allows, which increases the mean operating time and decreases the long-run cost rate.
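The joint search over the implementation period and the PM threshold N can be sketched as a grid search on a long-run cost rate. The cost model below (constant PM cost, planned-replacement cost, and a linear failure-risk penalty in the period) is a hypothetical stand-in, not the paper's model.

```python
import numpy as np

def long_run_cost_rate(T, N, c_pm=1.0, c_rep=10.0, hazard=0.5):
    # Hypothetical renewal cycle: N PMs at period T, then a planned replacement.
    # First term: (total PM cost + replacement cost) per unit of cycle time;
    # second term: a failure-risk penalty that grows with the period length.
    return (N * c_pm + c_rep) / (N * T) + hazard * T

def optimize(T_grid, N_grid):
    # Exhaustive search over the two decision variables.
    return min((long_run_cost_rate(T, N), T, N) for T in T_grid for N in N_grid)

rate, T_opt, N_opt = optimize(np.arange(0.5, 5.1, 0.5), range(1, 11))
```

Even this toy model reproduces the trade-off the abstract describes: a longer period lowers PM expenditure per unit time but raises failure risk, so the cost rate has an interior optimum in T.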
At the beginning of 2025, the price in China's national carbon market exhibited a continuous unilateral downward trajectory, a departure from the overall steady upward trend in carbon prices since the market launched in 2021. The analysis suggests that the primary reason for the recent decline is a reversal of supply and demand dynamics in the carbon market, with increased quota supply amid a sluggish economy. Downward pressure on carbon prices is expected to persist in the short term, but with more industries being included and continued policy optimization and improvement, a rise in China's medium- to long-term carbon prices is highly probable. Recommendations for enterprises engaged in carbon asset operations and management: first, refine carbon asset reserves and trading strategies; second, accelerate internal CCER project development; third, explore carbon financial instrument applications; fourth, establish and improve internal carbon pricing mechanisms; fifth, proactively plan for the inclusion of new industries.
This paper examines the transformation and development of the Xinhui Chenpi industry under China's rural revitalization strategy. The study highlights the industry's significant growth, with annual chenpi production reaching approximately 7,000 tons and total output value surpassing 26 billion yuan in 2024. The paper proposes strategies to foster sustainable growth in an industry facing challenges such as inefficient production processes, inconsistent product quality, and a lack of policy awareness among operators. These strategies include optimizing support policies, enhancing regulatory frameworks, and leveraging digital technologies for brand building and market expansion. The research contributes to understanding the development trajectory of the Xinhui Chenpi industry and provides insights for policymakers and industry practitioners.
Hydrogen energy is a crucial support for China's low-carbon energy transition. With the large-scale integration of renewable energy, combining hydrogen with integrated energy systems has become one of the most promising directions of development. This paper proposes an optimized scheduling model for a hydrogen-coupled electro-heat-gas integrated energy system (HCEHG-IES) using generative adversarial imitation learning (GAIL). The model aims to enhance renewable-energy absorption, reduce carbon emissions, and improve grid-regulation flexibility. First, the optimal scheduling problem of the HCEHG-IES under uncertainty is modeled as a Markov decision process (MDP). To overcome the limitations of conventional deep reinforcement learning algorithms, including long optimization times, slow convergence, and subjective reward design, this study augments the PPO algorithm with a discriminator network and expert data; the resulting algorithm, termed GAIL, enables the agent to perform imitation learning from expert data. Based on this model, dynamic scheduling decisions are made in continuous state and action spaces, generating optimal energy-allocation and management schemes. Simulation results indicate that, compared with traditional reinforcement-learning algorithms, the proposed algorithm offers better economic performance. Guided by expert data, the agent avoids blind exploration, shortens offline training time, and improves convergence. In the online phase, the algorithm enables flexible energy utilization, thereby promoting renewable-energy absorption and reducing carbon emissions.
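A minimal sketch of the GAIL ingredient named above: a discriminator is trained to separate expert state-action pairs from agent-generated ones, and its output supplies a surrogate imitation reward to the PPO agent. The logistic discriminator and the -log(1-D) reward form are standard GAIL choices, assumed here rather than taken from this paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def discriminator_reward(d_sa):
    # Surrogate reward: larger when the discriminator thinks (s, a) is expert-like.
    return -np.log(1.0 - d_sa + 1e-8)

def bce_loss(w, expert_sa, agent_sa):
    # Binary cross-entropy: discriminator pushed toward 1 on expert pairs,
    # toward 0 on agent pairs (linear-logistic discriminator for brevity).
    d_e = sigmoid(expert_sa @ w)
    d_a = sigmoid(agent_sa @ w)
    return -np.mean(np.log(d_e + 1e-8)) - np.mean(np.log(1.0 - d_a + 1e-8))
```

Minimizing this loss while the policy maximizes the surrogate reward is the adversarial loop that lets the agent imitate expert scheduling data instead of relying on a hand-designed reward.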
This paper employs the PPO (Proximal Policy Optimization) algorithm to study the risk hedging problem of Shanghai Stock Exchange (SSE) 50ETF options. First, the action and state spaces are designed based on the characteristics of the hedging task, and a reward function is developed from the cost function of the options. Second, drawing on curriculum learning, the agent is guided through a simulated-to-real learning approach for dynamic hedging tasks, reducing the learning difficulty and addressing the issue of insufficient option data; a dynamic hedging strategy for 50ETF options is thus constructed. Finally, numerical experiments demonstrate that the designed algorithm outperforms traditional hedging strategies in hedging effectiveness.
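A hedging reward of the kind described above typically penalizes the unhedged P&L plus transaction costs. The sketch below pairs such a per-step reward with a Black-Scholes call delta as the classical baseline; the reward shape and parameter names are assumptions for illustration, not the paper's exact cost function.

```python
import math

def bs_call_delta(S, K, T, r, sigma):
    # Black-Scholes delta of a European call: N(d1), via the error function.
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    return 0.5 * (1.0 + math.erf(d1 / math.sqrt(2.0)))

def step_reward(dV, delta, dS, kappa, dDelta, S):
    # Hypothetical per-step reward: minus the hedged P&L error |dV - delta*dS|,
    # minus a proportional transaction cost kappa on the rebalancing trade.
    return -abs(dV - delta * dS) - kappa * abs(dDelta) * S
```

Under this sign convention a perfect, cost-free hedge earns reward 0 and any residual exposure or trading cost is negative, which is the form an RL agent can maximize directly.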
Against the backdrop of uneven pressure on the three-pillar pension system and a mismatch between pension funds and the demographic structure, a large number of employees in new forms of employment remain outside the pension security system and face relatively high pension risks. Given their high job mobility, weak long-term planning ability, and large income fluctuations, individual pension schemes may, while maintaining the balance of the three-pillar pension system, become a breakthrough point for improving their pension situation. In line with the national goal of building a multi-level, multi-pillar old-age insurance system, and to study the supplementary role of the third-pillar individual pension policy for employees in new forms of employment, this article constructs an evaluation system using the analytic hierarchy process and designs a questionnaire. After a questionnaire survey in six cities in Shandong Province, the collected data are analyzed. The findings show that under the current policy, residents' awareness of pension issues is gradually improving and the participation rate is increasing, but behavior remains short-term and residents generally tend to avoid pension risks. Therefore, regarding the deepening of the individual pension system, the article puts forward four suggestions: (1) conduct comprehensive publicity through multiple channels with emphasis on key points; (2) enhance the system's attractiveness according to the characteristics of the target population; (3) improve the public's awareness of pension planning and financial literacy; (4) strengthen the connection and conversion among the pillars of the pension system.
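The analytic hierarchy process mentioned above derives criterion weights from a pairwise comparison matrix, conventionally via its principal eigenvector. The 3-criterion matrix below is invented for illustration; only the eigenvector weighting step itself is standard AHP.

```python
import numpy as np

def ahp_weights(pairwise):
    # Weights = normalized principal eigenvector of a positive
    # reciprocal pairwise comparison matrix (Saaty's method).
    vals, vecs = np.linalg.eig(pairwise)
    k = np.argmax(vals.real)           # Perron root has the largest real part
    w = np.abs(vecs[:, k].real)
    return w / w.sum()

# Hypothetical matrix: criterion 1 weakly preferred to 2, strongly to 3.
A = np.array([[1.0, 2.0, 5.0],
              [0.5, 1.0, 3.0],
              [0.2, 1.0 / 3.0, 1.0]])
w = ahp_weights(A)
```

The resulting weight vector sums to one and preserves the stated preference order, which is what makes it usable as the top layer of the questionnaire-based evaluation system.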
Bionic gait learning for quadruped robots based on reinforcement learning has become a hot research topic. The proximal policy optimization (PPO) algorithm has a low probability of learning a successful gait from scratch due to problems such as reward sparsity. To address this, we propose an experience evolution proximal policy optimization (EEPPO) algorithm, which integrates PPO with prior knowledge highlighted by an evolutionary strategy. We use successfully trained samples as prior knowledge to guide the learning direction and thereby increase the success probability of the learning algorithm. To verify the effectiveness of EEPPO, we conducted simulation experiments of the quadruped robot gait learning task in PyBullet. The central pattern generator-based radial basis function (CPG-RBF) network and the policy network are updated simultaneously, using key information such as the robot's speed, posture, and joint states, to achieve the quadruped robot's bionic diagonal trot gait. Comparison with the traditional soft actor-critic (SAC) algorithm validates the superiority of EEPPO, which learns a more stable diagonal trot gait on flat terrain.
In this paper, we investigate the problem of fast spectrum sharing in vehicle-to-everything communication. To improve the spectrum efficiency of the whole system, the spectrum of vehicle-to-infrastructure links is reused by vehicle-to-vehicle links. To this end, we model spectrum sharing as a deep reinforcement learning problem and tackle it with proximal policy optimization. A considerable number of interactions are often required to train an agent with good performance, so simulation-based training is commonly used in communication networks. Nevertheless, severe performance degradation may occur when the agent is deployed directly in the real world, even if it performs well on the simulator, because of the reality gap between the simulated and real environments. To address this issue, we make preliminary efforts by proposing an algorithm based on meta reinforcement learning. The algorithm enables the agent to adapt rapidly to a new task using knowledge extracted from similar tasks, leading to fewer interactions and less training time. Numerical results show that our method achieves near-optimal performance and exhibits rapid convergence.
Since last year, China's inbound tourism market has accelerated its recovery. With the introduction and optimization of various facilitation policies and the development of new products, the inbound tourism market has shown enormous growth potential. According to data from the Data Center of the Ministry of Culture and Tourism, the number of inbound tourists reached a new high during the 2025 Spring Festival, and the UK became China's third-largest source of inbound tourists after the Republic of Korea and Japan.
This paper investigates a distributed heterogeneous hybrid blocking flow-shop scheduling problem (DHHBFSP) that minimizes total tardiness and total energy consumption simultaneously, and proposes an improved proximal policy optimization (IPPO) method for making real-time decisions. A multi-objective Markov decision process is modeled for the DHHBFSP, in which the reward is a vector with dynamic weights rather than the common objective-related scalar. A factory agent (FA) is formulated for each factory to select unscheduled jobs and is trained with the proposed IPPO to improve decision quality. Multiple FAs work asynchronously to allocate jobs that arrive at the shop at random. A two-stage training strategy is introduced in the IPPO, which learns from both single- and dual-policy data for better data utilization. The proposed IPPO is tested on randomly generated instances and compared with variants of basic proximal policy optimization (PPO), dispatch rules, multi-objective metaheuristics, and multi-agent reinforcement learning methods. Extensive experimental results suggest that the proposed strategies significantly improve on basic PPO, and that IPPO outperforms state-of-the-art scheduling methods in both convergence and solution quality.
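The vector reward with dynamic weights can be sketched as a weighted scalarization of the two objectives (tardiness, energy) whose weights shift during training. The linear schedule below is a hypothetical example of such a weighting rule, not the paper's actual scheme.

```python
import numpy as np

def scalarize(reward_vec, weights):
    # Collapse a multi-objective reward vector into the scalar the agent optimizes.
    return float(np.dot(weights, reward_vec))

def dynamic_weights(step, total_steps):
    # Hypothetical schedule: start fully focused on tardiness (index 0),
    # then shift half the emphasis toward energy (index 1) over training.
    w_t = 1.0 - 0.5 * step / total_steps
    return np.array([w_t, 1.0 - w_t])
```

Because the weights always sum to one, the scalarized reward stays on a comparable scale across training while the trade-off between the two objectives moves.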
The Unmanned Aerial Vehicle (UAV) is a burgeoning electric transportation carrier with substantial promise for the logistics sector. A reinforcement learning framework, Centralized-S Proximal Policy Optimization (C-SPPO), based on a centralized decision process and considering policy entropy (S), is proposed. The framework plans the best scheduling scheme with the objective of minimizing both the timeout of order requests and the flight impact of UAVs that may lead to conflicts. In this framework, matching intents are generated from the observations of UAV agents, and the final conflict-free matching results are output under the guidance of a centralized decision maker. A pre-activation operation is also introduced to further enhance cooperation among UAV agents. Simulation experiments based on real-world data from New York City show that the proposed C-SPPO outperforms the baseline algorithms in Average Delay Time (ADT), Maximum Delay Time (MDT), Order Delay Rate (ODR), Average Flight Distance (AFD), and Flight Impact Ratio (FIR). Furthermore, the framework scales to scenarios of different sizes without requiring additional training.
This article studies an inshore-offshore fishery model with impulsive diffusion. The existence and global asymptotic stability of both the trivial periodic solution and the positive periodic solution are obtained, and the complexity of the system is analyzed. Moreover, the optimal harvesting policy for the inshore subpopulation is given, including the maximum sustainable yield and the corresponding harvesting effort.
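For the simplest single-stock benchmark behind such harvesting results, logistic growth dx/dt = rx(1 - x/K) - Ex gives an equilibrium (sustainable) yield Y(E) = EK(1 - E/r), maximized at effort E = r/2 with maximum sustainable yield rK/4. The sketch below uses that textbook closed form with illustrative parameters; the paper's impulsive-diffusion model is more elaborate.

```python
def sustainable_yield(E, r, K):
    # Equilibrium harvest under logistic growth with constant effort E:
    # stock settles at x* = K(1 - E/r), so yield Y = E*x* = E*K*(1 - E/r).
    return E * K * (1.0 - E / r)

r, K = 0.8, 1000.0          # illustrative intrinsic growth rate and capacity
E_msy = r / 2.0             # effort maximizing the sustainable yield
msy = sustainable_yield(E_msy, r, K)   # closed form: r*K/4
```

Any effort above or below E_msy produces a strictly smaller sustainable yield, which is the single-population intuition that the inshore harvesting policy generalizes.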
This paper considers a model of an insurance company that is allowed to invest in a risky asset and to purchase proportional reinsurance. The objective is to find the policy that maximizes the expected total discounted dividend pay-out until the time of bankruptcy plus the terminal value of the company, under a liquidity constraint. We solve this problem via the corresponding problem with zero terminal value, and we analyze the influence of the terminal value on the optimal policy.
Optimal policies in Markov decision problems may be quite sensitive to the transition probabilities, and in practice some transition probabilities may be uncertain. The goals of this study are to find the robust range of a given optimal policy and to obtain value intervals for the exact transition probabilities. Our research yields useful contributions for Markov decision processes (MDPs) with uncertain transition probabilities. We first propose a maximum-likelihood method for estimating unknown transition probabilities. Since the estimates may be far from accurate, and the highest expected total reward of the MDP may be sensitive to them, we analyze the robustness of an optimal policy and propose an approach for robust analysis. After defining a robust optimal policy with uncertain transition probabilities represented as sets of numbers, we formulate a model to obtain such a policy. Finally, we define the value intervals of the exact transition probabilities and construct models to determine their lower and upper bounds. Numerical examples show the practicability of our methods.
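When transition probabilities are only known to lie in intervals, the inner step of a robust Bellman backup is a worst-case expectation over the box of feasible distributions. For box bounds with a sum-to-one constraint this inner problem has a greedy solution: push as much probability as allowed onto the worst next states. The sketch below shows that step; it is a standard robust-MDP construction, offered under the assumption that intervals are the uncertainty sets in use.

```python
import numpy as np

def worst_case_expectation(v, p_lo, p_hi):
    # Minimize p @ v over { p : p_lo <= p <= p_hi, sum(p) = 1 }.
    # Greedy: start at the lower bounds, then spend the remaining mass
    # on next states in order of increasing value.
    order = np.argsort(v)          # worst next state first
    p = p_lo.astype(float).copy()
    slack = 1.0 - p.sum()
    for i in order:
        add = min(p_hi[i] - p_lo[i], slack)
        p[i] += add
        slack -= add
    return float(p @ v)
```

Plugging this in place of the nominal expectation inside value iteration yields the pessimistic (robust) value function against which a policy's robust range can be checked.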
Ground-to-air confrontation task assignment is large-scale and must handle many concurrent assignments and random events. Existing task assignment methods applied to ground-to-air confrontation are inefficient on complex tasks, and multiagent systems suffer from interaction conflicts. This study proposes a multiagent architecture based on one general agent with multiple narrow agents (OGMN) to reduce task assignment conflicts. To address the slow speed of traditional dynamic task assignment algorithms, this paper proposes the proximal policy optimization for task assignment of general and narrow agents (PPO-TAGNA) algorithm. Building on the optimal assignment strategy and the training framework of deep reinforcement learning (DRL), the algorithm adds a multi-head attention mechanism and a staged reward mechanism to the bilateral band-clipping PPO algorithm to improve training efficiency. Finally, simulation experiments on a digital battlefield show that the OGMN architecture combined with PPO-TAGNA obtains rewards faster and achieves a higher win ratio. Analysis of agent behavior verifies the efficiency, superiority, and rational resource utilization of the method.
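The "bilateral band clipping" referred to above is the standard clipped surrogate objective of PPO: the probability ratio between the new and old policies is confined to the band [1-eps, 1+eps] on both sides. A minimal sketch of that loss:

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    # L = -E[ min( r*A, clip(r, 1-eps, 1+eps)*A ) ], with
    # r = pi_new(a|s) / pi_old(a|s) and A the advantage estimate.
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return -np.mean(np.minimum(unclipped, clipped))
```

Taking the minimum with the clipped term removes the incentive to move the ratio outside the band in whichever direction would inflate the objective, which is what keeps PPO updates conservative.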
ARINC653 systems, which are widely used in the avionics industry, are an important class of safety-critical applications. Partitions are the core concept of the ARINC653 system architecture: because of partitioning, the system designer must statically allocate adequate time slots to each partition in the design phase. Although some time slot allocation policies can be borrowed from task scheduling, no existing literature gives an optimal allocation policy. In this paper, we present a partition configuration policy and prove that it is optimal in the sense that if this policy fails to configure adequate time slots for each partition, so does every other policy. Then, by simulation, we show the effects of different partition configuration policies on the time slot allocation of partitions and on task response time.
To guarantee the heterogeneous delay requirements of diverse vehicular services, it is necessary to design a fully cooperative policy for both Vehicle-to-Infrastructure (V2I) and Vehicle-to-Vehicle (V2V) links. This paper investigates reducing the delay of edge information sharing for V2V links while satisfying the delay requirements of the V2I links. Specifically, a mean delay minimization problem and a maximum individual delay minimization problem are formulated to improve global network performance and to ensure fairness for a single user, respectively. A multi-agent reinforcement learning framework is designed to solve these two problems, with a new reward function that evaluates the utilities of the two optimization objectives in a unified framework. A proximal policy optimization approach then enables each V2V user to learn its policy from the shared global network reward. The effectiveness of the proposed approach is validated against baseline approaches through extensive simulation experiments.
Many human-machine collaborative scheduling systems aid human decision making by providing optimal scheduling algorithms, but they do not take the operator's attention into consideration, although doing so could yield better solutions. In this paper, we propose a human-machine collaborative support scheduling system for intelligence information from multiple unmanned aerial vehicles (multi-UAVs) based on an eye tracker. First, a target recognition algorithm is applied to the images from the multi-UAVs to recognize the targets they contain. Then, the support system uses the eye tracker to obtain the eye-gaze points, which identify the focused targets in the images. Finally, heuristic scheduling algorithms take both the attributes of the targets and the operator's attention into consideration to determine the processing sequence of the images. Because the processing times of the images collected by the multi-UAVs are uncertain, with only their upper and lower bounds known in advance, the processing times are modeled as intervals. The objective of the scheduling problem is to minimize the mean weighted completion time. This paper proposes new polynomial-time heuristic scheduling algorithms that schedule the images containing focused targets first. Scheduling experiments under six different processing-time distributions indicate that the proposed algorithm is insensitive to the distribution and has negligible computational time; the absolute error of the best-performing heuristic solution is only about 1%. We then incorporate the best-performing heuristic into the human-machine collaborative support system to verify the performance of the system.
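A heuristic of the kind described, focused-target images first, then a weighted-shortest-processing-time order, can be sketched as below. Using the interval midpoint as the processing-time surrogate and the exact tie-breaking rule are assumptions for illustration, not the paper's specific heuristics.

```python
def schedule_images(images):
    # images: list of (id, focused, weight, p_lo, p_hi), where [p_lo, p_hi]
    # is the interval of possible processing times for that image.
    def key(img):
        _, focused, w, p_lo, p_hi = img
        p_mid = 0.5 * (p_lo + p_hi)
        # Group 0 = focused targets first; within a group, classic WSPT:
        # larger weight-to-time ratio earlier (negated for ascending sort).
        return (0 if focused else 1, -w / p_mid)
    return [img[0] for img in sorted(images, key=key)]

order = schedule_images([("a", False, 1.0, 2.0, 4.0),
                         ("b", True, 1.0, 2.0, 4.0),
                         ("c", True, 3.0, 2.0, 4.0)])
```

For deterministic times, WSPT minimizes mean weighted completion time on a single machine; applying it to interval midpoints is one natural polynomial-time surrogate for the uncertain case.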
Funding: support received from the National Natural Science Foundation of China (Grant Nos. 52372398 & 62003272).
Funding: the National Natural Science Foundation of China (Nos. 51275090, 71201025), the Program for Special Talent in Six Fields of Jiangsu Province (No. 2008144), the Scientific Research Foundation of the Graduate School of Southeast University (No. YBJJ1302), and the Scientific Innovation Research of College Graduates in Jiangsu Province (No. CXLX12_0078).
Funding: Research on the Digital Transformation of the Xinhui Dried Tangerine Peel Industry under the Rural Revitalization Strategy (2023HSQX100).
Funding: supported by the State Grid Corporation Technology Project (No. 522437250003).
基金Funding: Supported by the Foundation of the Key Laboratory of System Control and Information Processing, Ministry of Education, China (Scip20240111); the Aeronautical Science Foundation of China (Grant 2024Z071108001); and the Foundation of the Key Laboratory of Traffic Information and Safety of Anhui Higher Education Institutes, Anhui Sanlian University (KLAHEI18018).
文摘Abstract: This paper employs the PPO (Proximal Policy Optimization) algorithm to study the risk-hedging problem of Shanghai Stock Exchange (SSE) 50ETF options. First, the action and state spaces were designed based on the characteristics of the hedging task, and a reward function was developed from the cost function of the options. Second, drawing on the concept of curriculum learning, the agent was guided through a simulated-to-real learning approach for dynamic hedging tasks, reducing the learning difficulty and addressing the shortage of option data. A dynamic hedging strategy for 50ETF options was thus constructed. Finally, numerical experiments demonstrate that the designed algorithm outperforms traditional hedging strategies in terms of hedging effectiveness.
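The paper's exact cost function is not reproduced here; purely as an illustrative sketch (the proportional-cost form and the `kappa` coefficient are assumptions), a per-step hedging reward can be built as the negative of the replication error plus a transaction cost:

```python
def hedging_reward(portfolio_dpnl, option_dpnl, traded_value, kappa=0.001):
    """Negative per-step hedging cost: the gap between the hedge
    portfolio's PnL change and the option's PnL change, plus a
    proportional transaction cost on the absolute traded value."""
    replication_error = abs(portfolio_dpnl - option_dpnl)
    transaction_cost = kappa * abs(traded_value)
    return -(replication_error + transaction_cost)
```

A perfectly replicating, zero-trading step earns reward 0; any tracking error or turnover is penalized, which is the shape a cost-based reward for dynamic hedging typically takes.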
基金Funding: Funded by the National College Students' Innovation and Entrepreneurship Training Program (No. 202410456025) and supported by the China Center of the Serbian Academy of Sciences and Arts and the Hong Kong Institute of Humanities and Natural Sciences and Technology.
文摘Abstract: Against the backdrop of uneven pressure on the three-pillar pension system and a mismatch between pension funds and the demographic structure, a large number of employees in new forms of employment remain outside the pension security system and face relatively high pension risks. Given their high job mobility, weak long-term planning ability, and large income fluctuations, individual pension schemes, while maintaining the balance of the three-pillar pension system, may become a breakthrough point for improving the pension situation of these employees. In line with the national goal of building a multi-level, multi-pillar old-age insurance system, and to study the supplementary role of the third-pillar individual pension policy for employees in new forms of employment, this article constructs an evaluation system using the analytic hierarchy process and designs a questionnaire. After a questionnaire survey in six cities in Shandong Province, the collected data are analyzed. The findings show that under the current policy, residents' awareness of pension issues is gradually improving and the participation rate is increasing, but participation behavior remains short-term and residents generally tend to avoid pension risks. The article therefore puts forward four suggestions for deepening the individual pension system: (1) conduct comprehensive publicity through multiple channels with emphasis on key points; (2) enhance the system's attractiveness according to the characteristics of the target population; (3) improve the public's awareness of pension planning and financial literacy; (4) strengthen the connection and conversion among the different pillars of the pension system.
基金Funding: The National Natural Science Foundation of China (No. 62103009).
文摘Abstract: Bionic gait learning of quadruped robots based on reinforcement learning has become a hot research topic. The proximal policy optimization (PPO) algorithm has a low probability of learning a successful gait from scratch due to problems such as reward sparsity. To address this, we propose an experience evolution proximal policy optimization (EEPPO) algorithm, which integrates PPO with prior knowledge highlighted by an evolutionary strategy. We use successfully trained samples as prior knowledge to guide the learning direction and thus increase the success probability of the learning algorithm. To verify the effectiveness of the proposed EEPPO algorithm, we conducted simulation experiments on the quadruped robot gait-learning task in PyBullet. Experimental results show that the central pattern generator based radial basis function (CPG-RBF) network and the policy network are updated simultaneously, achieving the quadruped robot's bionic diagonal trot gait using key information such as the robot's speed, posture, and joint states. Comparison with the traditional soft actor-critic (SAC) algorithm validates the superiority of the proposed EEPPO algorithm, which learns a more stable diagonal trot gait on flat terrain.
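The abstract does not spell out how successful samples steer learning; one plausible minimal sketch (the function names and the elite fraction are assumptions, not the paper's design) keeps an archive of transitions from previously successful gaits and mixes them into each update batch:

```python
import random

def mixed_batch(rollout, elite_archive, batch_size=8, elite_frac=0.25, rng=None):
    """Build a training batch biased toward prior successes: draw a
    fixed fraction from an archive of samples taken from previously
    successful gaits, and fill the remainder from the current rollout."""
    rng = rng or random.Random(0)
    n_elite = min(int(batch_size * elite_frac), len(elite_archive))
    batch = rng.sample(elite_archive, n_elite)
    batch += rng.sample(rollout, batch_size - n_elite)
    return batch
```

The effect is that even when the sparse reward gives the current rollout little signal, every update still sees some known-good behaviour, which is the role prior knowledge plays in the EEPPO idea.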
基金Funding: L. Liang was supported in part by the Natural Science Foundation of Jiangsu Province under Grant BK20220810 and in part by the National Natural Science Foundation of China under Grants 62201145 and 62231019; S. Jin was supported in part by the National Natural Science Foundation of China (NSFC) under Grants 62261160576, 62341107, and 61921004.
文摘Abstract: In this paper, we investigate the problem of fast spectrum sharing in vehicle-to-everything communication. To improve the spectrum efficiency of the whole system, the spectrum of vehicle-to-infrastructure links is reused by vehicle-to-vehicle links. We model this as a deep reinforcement learning problem and tackle it with proximal policy optimization. A considerable number of interactions are often required to train an agent with good performance, so simulation-based training is commonly used in communication networks. Nevertheless, severe performance degradation may occur when the agent is deployed directly in the real world, even if it performs well on the simulator, because of the reality gap between the simulated and real environments. To address this issue, we make preliminary efforts by proposing an algorithm based on meta reinforcement learning. This algorithm enables the agent to adapt rapidly to a new task with knowledge extracted from similar tasks, leading to fewer interactions and less training time. Numerical results show that our method achieves near-optimal performance and exhibits rapid convergence.
文摘Abstract: Since last year, China's inbound tourism market has accelerated its recovery. With the introduction and optimization of various facilitation policies and the development of new products, the inbound tourism market has shown strong potential for growth. According to the Data Center of the Ministry of Culture and Tourism, the number of inbound tourists reached a new high during the 2025 Spring Festival, and the UK became China's third-largest source of inbound tourists after the Republic of Korea and Japan.
基金Funding: Partially supported by the National Key Research and Development Program of the Ministry of Science and Technology of China (2022YFE0114200) and the National Natural Science Foundation of China (U20A6004).
文摘Abstract: This paper investigates a distributed heterogeneous hybrid blocking flow-shop scheduling problem (DHHBFSP) that minimizes total tardiness and total energy consumption simultaneously, and proposes an improved proximal policy optimization (IPPO) method for making real-time decisions on the DHHBFSP. A multi-objective Markov decision process is modeled for the DHHBFSP, where the reward is represented by a vector with dynamic weights instead of the common objective-related scalar value. A factory agent (FA) is formulated for each factory to select unscheduled jobs and is trained by the proposed IPPO to improve decision quality. Multiple FAs work asynchronously to allocate jobs that arrive at the shop at random. A two-stage training strategy is introduced in the IPPO, which learns from both single- and dual-policy data for better data utilization. The proposed IPPO is tested on randomly generated instances and compared with variants of the basic proximal policy optimization (PPO), dispatch rules, multi-objective metaheuristics, and multi-agent reinforcement learning methods. Extensive experimental results suggest that the proposed strategies significantly improve on the basic PPO, and that the IPPO outperforms state-of-the-art scheduling methods in both convergence and solution quality.
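The vector reward with dynamic weights is only named in the abstract; as a minimal sketch of one plausible scalarization (the weighting rule below is an assumption, not the paper's), the per-objective magnitudes can be normalized so that whichever objective currently dominates receives the larger weight:

```python
def dynamic_weights(tardiness, energy, eps=1e-8):
    """Weight each objective by its current relative magnitude so the
    dominant objective is emphasized (one possible dynamic scheme)."""
    total = tardiness + energy + eps
    return tardiness / total, energy / total

def scalarized_reward(tardiness_delta, energy_delta, weights):
    """Collapse the reward vector (increments of the two objectives,
    to be minimized) into a single scalar for the policy update."""
    w_t, w_e = weights
    return -(w_t * tardiness_delta + w_e * energy_delta)
```

Recomputing the weights at each decision step is what makes the scalarization "dynamic": the agent's effective objective shifts toward whichever cost is currently worse.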
基金Funding: Supported by the Chinese Special Research Project for Civil Aircraft (No. MJZ17N22), the National Natural Science Foundation of China (Nos. U2133207 and U2333214), the China Postdoctoral Science Foundation (No. 2023M741687), and the National Social Science Fund of China (No. 22&ZD169).
文摘Abstract: The Unmanned Aerial Vehicle (UAV) is a burgeoning electric transportation carrier that holds substantial promise for the logistics sector. A reinforcement learning framework, Centralized-S Proximal Policy Optimization (C-SPPO), based on a centralized decision process and considering policy entropy (S), is proposed. The framework plans the best scheduling scheme with the objective of minimizing both the timeout of order requests and the flight impact of UAVs that may lead to conflicts. In this framework, matching intents are generated from the observations of UAV agents, and the final conflict-free matching results are output under the guidance of a centralized decision maker. A pre-activation operation is also introduced to further enhance cooperation among the UAV agents. Simulation experiments based on real-world data from New York City show that the proposed C-SPPO outperforms the baseline algorithms in Average Delay Time (ADT), Maximum Delay Time (MDT), Order Delay Rate (ODR), Average Flight Distance (AFD), and Flight Impact Ratio (FIR). Furthermore, the framework scales to scenarios of different sizes without additional training.
文摘Abstract: This article studies an inshore-offshore fishery model with impulsive diffusion. The existence and global asymptotic stability of both the trivial periodic solution and the positive periodic solution are obtained, and the complexity of the system is analyzed. Moreover, the optimal harvesting policy is given for the inshore subpopulation, including the maximum sustainable yield and the corresponding harvesting effort.
基金Funding: Supported by the Doctoral Foundation of Xinjiang University and the National Natural Science Foundation of China.
文摘Abstract: This paper considers a model of an insurance company that is allowed to invest in a risky asset and to purchase proportional reinsurance. The objective is to find the policy that maximizes the expected total discounted dividend payout until the time of bankruptcy plus the terminal value of the company, under a liquidity constraint. We obtain the solution of this problem by first solving the problem with zero terminal value. We also analyze the influence of the terminal value on the optimal policy.
基金Funding: Supported by the National Natural Science Foundation of China (71571019).
文摘Abstract: Optimal policies in Markov decision problems may be quite sensitive to transition probabilities, and in practice some transition probabilities may be uncertain. The goals of this study are to find the robust range of a given optimal policy and to obtain value intervals for the exact transition probabilities. Our research yields useful contributions for Markov decision processes (MDPs) with uncertain transition probabilities. We first propose a method for estimating unknown transition probabilities based on maximum likelihood. Since the estimate may be far from accurate, and the highest expected total reward of the MDP may be sensitive to these probabilities, we analyze the robustness of an optimal policy and propose an approach for robust analysis. After defining a robust optimal policy with uncertain transition probabilities represented as sets of numbers, we formulate a model to obtain such a policy. Finally, we define the value intervals of the exact transition probabilities and construct models to determine their lower and upper bounds. Numerical examples show the practicability of our methods.
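For a finite MDP, the maximum-likelihood step the abstract mentions reduces to the relative frequency of observed transitions; a minimal sketch under that reading (the data layout is an assumption):

```python
def mle_transition_probs(counts):
    """Maximum-likelihood estimate of p(s' | s, a) from observed
    transition counts: the relative frequency of each successor state.

    counts: dict mapping (s, a) -> dict mapping s' -> observed count.
    """
    probs = {}
    for sa, successors in counts.items():
        total = sum(successors.values())
        probs[sa] = {sp: c / total for sp, c in successors.items()}
    return probs
```

With few observations these frequencies can be far from the true probabilities, which is exactly why the paper then asks how far they can be perturbed before the optimal policy changes.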
基金Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 62106283 and 72001214), which funded the experiments, and by the Natural Science Foundation of Shaanxi Province (Grant No. 2020JQ-484).
文摘Abstract: Ground-to-air confrontation task assignment is large in scale and must handle many concurrent assignments and random events. Existing task assignment methods applied to ground-to-air confrontation suffer from low efficiency on complex tasks and from interaction conflicts in multiagent systems. This study proposes a multiagent architecture based on one general agent with multiple narrow agents (OGMN) to reduce task assignment conflicts. Considering the slow speed of traditional dynamic task assignment algorithms, this paper proposes the proximal policy optimization for task assignment of general and narrow agents (PPO-TAGNA) algorithm. Based on the idea of the optimal assignment strategy and combined with a deep reinforcement learning (DRL) training framework, the algorithm adds a multi-head attention mechanism and a stage reward mechanism to the bilateral band-clipping PPO algorithm to address low training efficiency. Finally, simulation experiments are carried out on a digital battlefield. The OGMN-based multiagent architecture combined with the PPO-TAGNA algorithm obtains higher rewards faster and achieves a higher win ratio. Analysis of agent behavior verifies the efficiency, superiority, and rational resource utilization of the method.
基金Funding: Supported by the National Natural Science Foundation of China under Grant No. 90718019 and the National High-Tech Research and Development Plan of China under Grant No. 2007AA010304.
文摘Abstract: ARINC653 systems, which are widely used in the avionics industry, are an important class of safety-critical applications, and partitions are the core concept of the ARINC653 system architecture. Because of partitioning, the system designer must statically allocate adequate time slots to each partition in the design phase. Although some time slot allocation policies can be borrowed from task scheduling policies, no existing literature gives an optimal allocation policy. In this paper, we present a partition configuration policy and prove that it is optimal in the sense that if this policy fails to configure adequate time slots for each partition, no other policy can. By simulation, we then show the effects of different partition configuration policies on the time slot allocation of partitions and on task response time.
基金Funding: Supported in part by the National Natural Science Foundation of China under Grants 61901078, 61771082, 61871062, and U20A20157; in part by the Science and Technology Research Program of Chongqing Municipal Education Commission under Grant KJQN201900609; in part by the Natural Science Foundation of Chongqing under Grant cstc2020jcyj-zdxmX0024; in part by the University Innovation Research Group of Chongqing under Grant CXQT20017; and in part by the China University Industry-University-Research Collaborative Innovation Fund (Future Network Innovation Research and Application Project) under Grant 2021FNA04008.
文摘Abstract: To guarantee the heterogeneous delay requirements of diverse vehicular services, it is necessary to design a fully cooperative policy for both Vehicle-to-Infrastructure (V2I) and Vehicle-to-Vehicle (V2V) links. This paper investigates reducing the delay of edge information sharing for V2V links while satisfying the delay requirements of the V2I links. Specifically, a mean delay minimization problem and a maximum individual delay minimization problem are formulated to improve global network performance and to ensure the fairness of individual users, respectively. A multi-agent reinforcement learning framework is designed to solve these two problems, with a new reward function that evaluates the utilities of the two optimization objectives in a unified framework. A proximal policy optimization approach is then proposed to let each V2V user learn its policy from the shared global network reward. The effectiveness of the proposed approach is validated against baseline approaches through extensive simulation experiments.
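As a sketch of how the two objectives can be evaluated "in a unified framework" (the blend weight `alpha` and the exact form are assumptions, not the paper's reward function):

```python
def unified_reward(v2v_delays, alpha=0.5):
    """Unified utility for the two objectives: penalize both the mean
    delay (global network performance) and the maximum individual delay
    (fairness toward the worst-off user), blended by alpha in [0, 1]."""
    mean_delay = sum(v2v_delays) / len(v2v_delays)
    max_delay = max(v2v_delays)
    return -(alpha * mean_delay + (1.0 - alpha) * max_delay)
```

Because the max term dominates when one user's delay spikes, sharing this single scalar as the global reward pushes every V2V agent toward policies that are both efficient on average and fair to individuals.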
基金Funding: Supported by the National Natural Science Foundation of China (No. 61403410).
文摘Abstract: Many human-machine collaborative scheduling support systems aid human decision making by providing several optimal scheduling algorithms, none of which takes the operator's attention into consideration, although such systems could exploit that attention to obtain better solutions. In this paper, we propose a human-machine collaborative support scheduling system for intelligence information from multiple unmanned aerial vehicles (multi-UAVs) based on an eye tracker. First, a target recognition algorithm is applied to the images from the multi-UAVs to recognize the targets they contain. The system then uses the eye tracker to capture eye-gaze points and thereby identify the targets the operator is focusing on. Finally, heuristic scheduling algorithms take both the attributes of the targets and the operator's attention into account to determine the processing sequence of the images. Because the processing time of the images collected by the multi-UAVs is uncertain, while its upper and lower bounds are known in advance, the processing time is modeled as interval processing time. The objective of the scheduling problem is to minimize the mean weighted completion time. This paper proposes new polynomial-time heuristic scheduling algorithms that schedule the images containing the focused targets first. Scheduling experiments under six different processing-time distributions indicate that the proposed algorithm is not sensitive to the distribution and has negligible computational time; the absolute error of the best-performing heuristic solution is only about 1%. The best-performing heuristic is then incorporated into the human-machine collaborative support system to verify the system's performance.
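The paper's exact heuristics are not given here; a minimal sketch in the spirit described (focused images first, then a weighted-shortest-processing-time order; using the interval midpoint as the representative processing time is an assumption):

```python
def schedule_images(images):
    """Order images for processing: images containing operator-focused
    targets come first; within each group, apply the WSPT rule on the
    midpoint of the [lo, hi] processing-time interval divided by the
    image weight, which targets mean weighted completion time.

    images: list of dicts with keys 'id', 'focused', 'weight', 'lo', 'hi'.
    """
    def key(img):
        midpoint = (img["lo"] + img["hi"]) / 2.0
        return (0 if img["focused"] else 1, midpoint / img["weight"])
    return [img["id"] for img in sorted(images, key=key)]
```

Sorting is O(n log n), consistent with the polynomial-time claim, and the two-level key is one simple way to encode "focused targets first" without discarding the weighted-completion-time objective inside each group.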