This paper employs the Proximal Policy Optimization (PPO) algorithm to study risk hedging for Shanghai Stock Exchange (SSE) 50ETF options. First, the action and state spaces are designed around the characteristics of the hedging task, and a reward function is developed from the cost function of the options. Second, drawing on the concept of curriculum learning, the agent is guided through a simulated-to-real learning approach for dynamic hedging tasks, which reduces the learning difficulty and addresses the shortage of option data. On this basis, a dynamic hedging strategy for 50ETF options is constructed. Finally, numerical experiments demonstrate that the designed algorithm outperforms traditional hedging strategies in hedging effectiveness.
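For readers unfamiliar with PPO, the following is a minimal sketch of its standard clipped surrogate objective, written in plain Python. It is not taken from the paper: the function name `ppo_clip_objective`, the default `clip_eps=0.2`, and the toy inputs are illustrative assumptions, and the paper's hedging-specific state, action, and reward design is not reproduced here.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective of standard PPO (to be maximized).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy. All names are illustrative.
    """
    terms = []
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the current and the old policy.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic choice: the smaller of the two surrogate terms.
        terms.append(min(ratio * adv, clipped_ratio * adv))
    return sum(terms) / len(terms)


# Toy example (assumed numbers): ratios are 2.0 and 0.5.
new_logp = [math.log(0.5), math.log(0.1)]
old_logp = [math.log(0.25), math.log(0.2)]
advantages = [1.0, -1.0]
objective = ppo_clip_objective(new_logp, old_logp, advantages)
```

In the toy example the first ratio is capped at 1.2 and the second is floored at 0.8, so the update gains nothing from moving the policy further than the clipping range allows.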
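To smooth fluctuations in micro-source output in the grid-connected system of half-bridge converter series Y-connection micro-grids (HCSY-MG), and to ensure that the sums of the DC-side voltages of each phase are equal and that the grid-connected currents are three-phase balanced, a charge and discharge optimization control strategy for a distributed hybrid energy storage system (HESS) based on improved proximal policy optimization (PPO) is proposed. Taking into account the grid-connected current of the HCSY-MG system and the characteristics of the distributed HESS, the main system variables affecting the grid-connected current and the optimal topology for connecting the HESS to the system are determined. Then, based on the characteristics of the series system, the charge and discharge problem of the distributed HESS is converted into a Markov decision process for deep reinforcement learning. In addition, to address the difficulty of determining the entropy loss weight in the PPO algorithm, an improved PPO algorithm is proposed that balances the agent's convergence and exploration. Finally, typical operating data from a new-energy power generation base are used as a case study to verify the feasibility and effectiveness of the proposed control strategy.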
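For context, the entropy loss weight mentioned in this abstract appears in the standard PPO training objective as the coefficient $c_2$ below. This is the usual textbook form, not the improved variant proposed in the paper, whose details the abstract does not give.

```latex
L_t^{\mathrm{CLIP+VF+S}}(\theta)
  = \hat{\mathbb{E}}_t\!\left[
      L_t^{\mathrm{CLIP}}(\theta)
      - c_1\, L_t^{\mathrm{VF}}(\theta)
      + c_2\, S[\pi_\theta](s_t)
    \right]
```

Here $L_t^{\mathrm{CLIP}}$ is the clipped surrogate term, $L_t^{\mathrm{VF}}$ is the value-function error, and $S[\pi_\theta](s_t)$ is the policy entropy. A larger $c_2$ encourages exploration, while a smaller one favors faster convergence, which is the trade-off the abstract refers to.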
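The stepped carbon trading mechanism and the solution algorithm for the optimal scheduling model are both important factors in the optimal scheduling of a community integrated energy system (CIES), yet the existing literature does not consider these two factors comprehensively. To this end, this paper takes the stepped carbon trading mechanism into account and proposes using the proximal policy optimization (PPO) algorithm to solve the CIES low-carbon optimal scheduling problem. The method builds a reinforcement learning interaction environment based on the low-carbon optimal scheduling model, uses equipment state parameters and operating parameters to define the agent's state space, action space, and reward function, and then obtains, through offline training, an agent that can generate the optimal policy. Case study results show that the CIES low-carbon optimal scheduling method obtained with the PPO algorithm can fully exploit the advantages of the stepped carbon trading mechanism in reducing carbon emissions and improving energy utilization.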
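The abstract does not state how the stepped carbon trading cost is computed. As a rough illustration only, a stepped scheme is often modeled as a piecewise-linear cost in which each additional interval of emissions above the free quota is priced higher than the last. Every name and number below (`stepped_carbon_cost`, `quota`, `base_price`, `step_len`, `growth`) is a hypothetical choice, and the sale of surplus quota is ignored for simplicity.

```python
def stepped_carbon_cost(emission, quota, base_price, step_len, growth):
    """Piecewise-linear carbon trading cost (illustrative assumption).

    Emissions above `quota` are split into intervals of length `step_len`;
    interval k (starting at 0) is priced at base_price * (1 + growth * k).
    """
    remaining = emission - quota
    if remaining <= 0:
        # Simplification: surplus quota earns nothing in this sketch.
        return 0.0
    cost = 0.0
    tier = 0
    while remaining > 0:
        amount = min(remaining, step_len)
        cost += base_price * (1 + growth * tier) * amount
        remaining -= amount
        tier += 1
    return cost


# Assumed numbers: 150 units above quota, priced at 10 then 12.5 per unit.
example_cost = stepped_carbon_cost(
    emission=250, quota=100, base_price=10, step_len=100, growth=0.25
)
```

In a reinforcement learning environment of the kind the abstract describes, a cost like this would typically enter the reward with a negative sign, so that the agent is penalized more heavily the further emissions exceed the quota.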
SATech-01 is an experimental satellite for space science exploration and on-orbit demonstration of advanced technologies. The satellite is equipped with 16 experimental payloads and supports multiple working modes to meet the observation requirements of the various payloads. Because the platform's power supply and data storage systems are limited, proposing reasonable mission planning schemes that improve the scientific return of the payloads becomes a critical issue. In this article, we formulate the integrated task scheduling of SATech-01 as a multi-objective optimization problem and propose a novel Fair Integrated Scheduling with Proximal Policy Optimization (FIS-PPO) algorithm to solve it. We use multiple decision heads to generate decisions for each task and design an action mask to ensure that the schedule meets the platform constraints. Experimental results show that FIS-PPO can push the capability of the platform to the limit and improve the overall observation efficiency by 31.5% compared with the rule-based plans currently in use. Moreover, fairness is considered in the reward design, and our method achieves much better performance in terms of equal task opportunities. Because of its low computational complexity, our task scheduling algorithm has the potential to be deployed directly on board for real-time task scheduling in future space projects.
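The action mask mentioned above can be illustrated with a generic masked softmax, a common way to keep a policy from selecting infeasible actions. This is a sketch of the general technique, not the FIS-PPO implementation: the function name `masked_softmax`, the logits, and the mask values are assumptions, and in a multi-head policy one such distribution would presumably be computed per decision head.

```python
import math


def masked_softmax(logits, mask):
    """Softmax over allowed actions only; masked actions get probability 0.

    `mask[i]` is True when action i satisfies the constraints.
    """
    if not any(mask):
        raise ValueError("at least one action must be allowed")
    # Subtract the largest allowed logit for numerical stability.
    top = max(l for l, ok in zip(logits, mask) if ok)
    exps = [math.exp(l - top) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


# Assumed example: the second action would violate a platform constraint.
probs = masked_softmax([1.0, 2.0, 3.0], [True, False, True])
```

Because the masked action receives exactly zero probability, sampling from `probs` can never produce a decision that breaks the constraint encoded in the mask.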
Funding: Supported by the Foundation of Key Laboratory of System Control and Information Processing, Ministry of Education, China (Scip20240111); the Aeronautical Science Foundation of China (Grant 2024Z071108001); and the Foundation of Key Laboratory of Traffic Information and Safety of Anhui Higher Education Institutes, Anhui Sanlian University (KLAHEI18018).
Funding: Supported by the Strategic Priority Program on Space Science, Chinese Academy of Sciences.