Funding: Supported by the National Natural Science Foundation of China (52007173 and U22B2098).
Abstract: The real-time AC optimal power flow (ACOPF) problem is central to making fast and accurate decisions that keep power systems secure and economical. With the rapid development of renewable energy, fluctuations in the system have become more pronounced, so this paper proposes an approach based on safe deep reinforcement learning. The real-time ACOPF problem is modeled as a constrained Markov decision process, and proximal policy optimization (PPO) based on primal-dual optimization (PDO) is used to learn the optimal generator outputs in the primal domain and to account for the security constraints in the dual domain, which avoids manually selecting a trade-off between penalties for constraint violations and rewards for economy. Before training, behavior cloning transfers expert experience into the initial weights of the neural networks. Moreover, multiprocessing is used to accelerate training. Case studies are conducted on the IEEE 118-bus system and a modified IEEE 118-bus system. Compared with other methods, the experimental results show that the proposed method achieves security and near-optimal economy while solving the real-time ACOPF problem quickly.
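The primal-dual idea in the abstract can be illustrated with a deliberately small sketch. It is not the paper's method: the PPO policy update is swapped for plain gradient ascent on a single scalar action, and the reward, the constraint, the step sizes, and the function name `primal_dual` are all hypothetical choices made for illustration. What it shows is how a Lagrange multiplier is learned alongside the decision variable, so that no penalty weight has to be tuned by hand.

```python
def primal_dual(steps=5000, lr_primal=0.01, lr_dual=0.01):
    """Toy primal-dual optimization (assumed example, not the paper's setup).

    Maximize the "economy" reward r(a) = -(a - 2)^2
    subject to the "security" cost  c(a) = a - 1 <= 0.
    Lagrangian: L(a, lam) = r(a) - lam * c(a), with lam >= 0.
    """
    a, lam = 0.0, 0.0
    for _ in range(steps):
        # Primal step: gradient ascent on L with respect to the action a.
        grad_a = -2.0 * (a - 2.0) - lam
        a += lr_primal * grad_a
        # Dual step: raise lam while the constraint is violated, lower it
        # otherwise, and project back onto lam >= 0.
        cost = a - 1.0
        lam = max(0.0, lam + lr_dual * cost)
    return a, lam


if __name__ == "__main__":
    a, lam = primal_dual()
    print(f"action={a:.4f}, multiplier={lam:.4f}")
```

In this toy problem the unconstrained optimum a = 2 violates the limit, so the iterates settle at the constraint boundary a = 1 with multiplier 2. The multiplier plays the role that a hand-picked penalty coefficient would otherwise play.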