The National Natural Science Foundation of China (General Program, Key Program, Major Research Plan)
This paper proposes an online optimal scheduling strategy for microgrids based on deep reinforcement learning (DRL). Uncertain renewable energy resources and complex power flow constraints make economic and safe microgrid operation challenging. To address these challenges, we formulate the online scheduling problem as a constrained Markov decision process (CMDP) that minimizes operating cost subject to constraints on the operating states and scheduling actions. To avoid solving a complicated nonlinear optimal power flow problem and to reduce the dependence on accurate forecasts and system models, we design a convolutional neural network (CNN) architecture to learn the optimal scheduling policy. The network extracts high-quality features from raw microgrid observations and makes scheduling decisions directly from those features. To ensure that these decisions satisfy the complex power flow constraints, we propose a novel DRL algorithm that combines the Lagrange multiplier method with the soft actor-critic (SAC) algorithm to train the network. We verify the approach in simulation studies using real-world power system data. The results show that the proposed approach effectively learns, from data, a cost-effective scheduling strategy that satisfies the power flow constraints and mitigates the effect of randomness on microgrid operation.
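The idea of combining a Lagrange multiplier with an actor-critic learner can be illustrated with a minimal sketch. The abstract does not give the update rules, so the following is a generic Lagrangian relaxation of a CMDP and not the paper's algorithm: the function names, the step size, and the violation values are all hypothetical. The multiplier `lam` prices constraint violation inside the reward seen by the learner and is raised by projected dual ascent while the average violation stays above its limit.

```python
# Generic Lagrangian-relaxation sketch for a constrained MDP.
# All names and numbers are illustrative assumptions, not the paper's method.

def lagrangian_reward(cost, violation, lam):
    """Reward for the learner: negative operating cost minus the
    multiplier-weighted constraint violation."""
    return -cost - lam * violation


def update_multiplier(lam, avg_violation, limit=0.0, step=0.01):
    """Projected dual ascent: increase lam when the average violation
    exceeds the limit, and never let lam drop below zero."""
    return max(0.0, lam + step * (avg_violation - limit))


# Hypothetical average violations observed over successive training rounds.
lam = 0.0
for avg_violation in [0.5, 0.3, 0.1, 0.0, 0.0]:
    lam = update_multiplier(lam, avg_violation, step=0.1)

# Penalized reward for one hypothetical transition under the final multiplier.
reward = lagrangian_reward(cost=10.0, violation=2.0, lam=lam)
```

In a full training loop, the penalized reward would replace the plain reward in the critic targets of an off-the-shelf SAC implementation, and the multiplier update would run alongside the policy updates. Once the violations reach zero, the multiplier stops growing.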