The annual compliance cycle of the carbon trading system allows generation companies (GenCos) to decouple the timing of carbon allowance purchases from their actual emissions. However, trading a large volume of allowances within a single day can significantly impact carbon prices. Faced with uncertain future carbon and electricity prices, GenCos must solve a challenging multistage stochastic optimization problem to coordinate their carbon trading strategies with daily power generation decisions. In this paper, a two-layer hybrid mathematical-deep reinforcement learning (DRL) optimization framework is proposed. The upper DRL layer tackles the stochastic, year-long carbon trading and allowance usage optimization problem, aiming for long-term optimality and providing guidance for short-term decisions in the lower layer. The lower mathematical optimization layer solves the deterministic daily power generation scheduling problem while enforcing strict technical constraints. To accelerate learning over the annual compliance cycle, a decision-timeline transfer learning method is proposed, enabling the DRL agent to progressively refine its policy by training sequentially on monthly, weekly, and daily decision environments. Case studies demonstrate that, with these methods, a GenCo can reduce emission costs and increase profits by effectively leveraging carbon price fluctuations within the compliance cycle.
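The two-layer interaction described above can be illustrated with a minimal sketch. This is not the paper's method: the upper DRL policy is replaced by a hand-written heuristic, and the lower-layer dispatch is reduced to a single-unit capacity cap; all function names, parameters, and numbers below are hypothetical. The sketch only shows the structure: each day, the upper layer decides how many allowances to buy given the remaining deficit and the current carbon price, while the lower layer produces a deterministic generation schedule whose emissions feed back into the upper layer's state.

```python
def upper_trade_decision(days_left, deficit, carbon_price, ref_price=60.0):
    """Heuristic stand-in for the upper DRL policy: spread the remaining
    allowance purchases over the days left, buying more aggressively when
    the carbon price sits below a reference level (illustrative only)."""
    if deficit <= 0.0 or days_left <= 0:
        return 0.0
    if days_left == 1:
        return deficit  # final day of the compliance cycle: cover the gap
    base = deficit / days_left
    scale = 1.5 if carbon_price < ref_price else 0.5
    return min(deficit, base * scale)


def lower_dispatch(demand_mwh, capacity_mwh, emission_rate):
    """Toy stand-in for the lower deterministic layer: serve demand up to
    capacity and report the resulting emissions (tCO2)."""
    gen = min(demand_mwh, capacity_mwh)
    return gen, gen * emission_rate


def simulate_compliance_year(prices, demand_mwh=1000.0,
                             capacity_mwh=1200.0, emission_rate=0.8):
    """Roll both layers forward over one compliance cycle of len(prices) days."""
    horizon = len(prices)
    allowances = emitted = cost = 0.0
    for day, price in enumerate(prices):
        _, em = lower_dispatch(demand_mwh, capacity_mwh, emission_rate)
        emitted += em
        # Deficit against the (here, perfectly known) full-cycle emissions.
        deficit = emission_rate * demand_mwh * horizon - allowances
        buy = upper_trade_decision(horizon - day, deficit, price)
        allowances += buy
        cost += buy * price
    return allowances, emitted, cost
```

With, say, five cheap days followed by five expensive ones, the heuristic front-loads purchases into the cheap window, so the total cost lands below what a flat purchase schedule at the high price would give; this is the same price-fluctuation leverage the abstract attributes to the learned policy, here reproduced by a fixed rule.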
Funding: supported by the Natural Science Foundation of China-Smart Grid Joint Fund of State Grid Corporation of China (No. U2066212), the National Natural Science Foundation of China (No. 52207105), and the Key Science and Technology Projects of China Southern Power Grid Corporation (No. 066600KK52222023).