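In recent years, integrated energy systems, in which multiple energy forms and devices interact, have been widely applied and studied. However, for dynamic and complex multi-energy systems, traditional optimal scheduling methods often fail to meet real-time and accuracy requirements. This paper therefore designs an optimal scheduling method for integrated energy systems driven by the Soft Deep Deterministic Policy Gradient (Soft-DDPG) algorithm. With the objective of minimizing the total system operating cost over the scheduling horizon, a comprehensive energy-efficiency evaluation model of equipment operation is established, and Soft-DDPG is then used to optimize and control the energy-efficiency scheduling action of each energy device. Soft-DDPG introduces the softmax operator into the computation of the action-value function, which effectively mitigates Q-value overestimation. In addition, the algorithm adds random noise to its action-selection policy, which improves learning efficiency. Experimental results show that the proposed method addresses the bottleneck of poor real-time performance and low accuracy in energy-efficiency scheduling of integrated energy systems, achieves efficient and flexible scheduling, and reduces the total operating cost of the system.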
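The two Soft-DDPG ingredients named in the abstract above (a softmax operator in the action-value computation, and random noise in action selection) can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything here is an assumption: the function names `softmax_value` and `noisy_action`, the inverse-temperature parameter `beta`, the use of a finite set of candidate Q-values, and the choice of Gaussian noise are all hypothetical and are not taken from the paper.

```python
import math
import random


def softmax_value(q_values, beta=1.0):
    """Softmax-weighted average of candidate Q-values.

    Assumed form: sum_i softmax(beta * q)_i * q_i. The result never exceeds
    max(q_values), which is why it can damp overestimation relative to a
    hard max.
    """
    m = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp(beta * (q - m)) for q in q_values]
    total = sum(weights)
    return sum(w * q for w, q in zip(weights, q_values)) / total


def noisy_action(action, sigma, low, high, rng=random):
    """Add zero-mean Gaussian exploration noise, then clip to [low, high].

    Gaussian noise is an assumption; the abstract only says "random noise".
    """
    return min(high, max(low, action + rng.gauss(0.0, sigma)))


if __name__ == "__main__":
    candidates = [1.0, 2.0, 3.0]  # hypothetical Q-values of sampled actions
    print(softmax_value(candidates, beta=1.0))
    print(noisy_action(0.5, sigma=0.1, low=0.0, high=1.0))
```

With `beta = 0` the operator reduces to a plain average of the candidate values, and as `beta` grows it approaches the hard maximum, so `beta` trades off bias toward underestimation against bias toward overestimation.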
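To address the poor computational accuracy and insufficient adaptability to strong disturbances of existing reentry guidance methods based on the deep deterministic policy gradient (DDPG) algorithm, a reentry guidance method based on long short-term memory DDPG (LSTM-DDPG) is proposed, building on the DDPG training framework. The method decouples longitudinal and lateral guidance. For longitudinal guidance, the state and action spaces required for reinforcement learning are first constructed for the reentry guidance problem; next, the decision points and the command computation strategy within each guidance cycle are determined, and a reward function that accounts for overall performance is designed; then an LSTM network is introduced to build the reinforcement learning training network, and an online update strategy is used to improve the algorithm's applicability across multiple tasks. Lateral guidance uses a dynamic bank reversal method based on crossrange error to obtain the sign of the bank angle. Simulations of reentry glide for the U.S. Common Aero Vehicle-Hypersonic (CAV-H) show that, compared with the traditional numerical predictor-corrector method, the proposed guidance method achieves comparable terminal accuracy with higher computational efficiency; compared with existing DDPG-based reentry guidance methods, it achieves comparable computational efficiency with higher terminal accuracy and robustness.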
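The lateral guidance logic described in the abstract above (a bank reversal driven by crossrange error that yields the sign of the bank angle) is commonly implemented as a deadband rule, and a minimal sketch of that idea follows. The abstract does not specify the rule, so the function name `bank_sign`, the sign convention, and the way the "dynamic" threshold is supplied by the caller are all assumptions made for illustration.

```python
def bank_sign(crossrange_error, threshold, previous_sign):
    """Return the bank-angle sign (+1 or -1) for the current guidance cycle.

    Assumed deadband rule: keep the previous sign while the crossrange error
    stays inside [-threshold, +threshold], and reverse the bank once the
    error leaves that corridor. The threshold is passed in each cycle so the
    caller can vary it (for example, shrinking it as velocity decreases).
    """
    if crossrange_error > threshold:
        return -1  # error too far to one side: bank the other way
    if crossrange_error < -threshold:
        return 1
    return previous_sign  # inside the corridor: no reversal


if __name__ == "__main__":
    sign = 1
    # Hypothetical (error, threshold) pairs along a trajectory.
    for error, limit in [(1.0, 3.0), (3.5, 3.0), (0.5, 2.5), (-2.8, 2.5)]:
        sign = bank_sign(error, limit, sign)
        print(error, limit, sign)
```

Holding the previous sign inside the corridor avoids chattering between reversals, while the magnitude of the bank angle is left to the longitudinal channel.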
To address the insufficient capture of time-series features and the poor detection of small-sample attacks by traditional intrusion detection systems in dynamic environments, this paper proposes an intrusion detection method based on LSTM-DDPG. By integrating a Long Short-Term Memory (LSTM) network into the Deep Deterministic Policy Gradient (DDPG) framework, a detection model capable of both time-series modeling and dynamic policy optimization is constructed. The TON-IoT dataset is used for experimental verification. The experimental results show that, compared with standalone DDPG and standalone LSTM respectively, the fusion model significantly improves accuracy (+13.07%/+21.58%), precision (+34.75%/+9.55%), recall (+29.43%/+99.13%), and F1 score (+31.89%/+49.93%). The recall for the small-sample MITM (man-in-the-middle) attack increases by 3.29%. The method demonstrates the effectiveness of fusing time-series features with reinforcement learning and provides a new approach to dynamic network security protection. Future work will focus on balancing the model's detection performance between small-sample and large-sample classes.
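For readers comparing the four reported metrics, the standard definitions of accuracy, precision, recall, and F1 for a single class can be sketched as follows. The function name `classification_metrics` and the confusion counts in the usage example are hypothetical; they are not values from the paper, and the abstract does not say how the metrics were averaged across attack classes.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion counts.

    tp, fp, fn, tn are true positives, false positives, false negatives,
    and true negatives for one class. Ratios with a zero denominator are
    reported as 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(classification_metrics(tp=8, fp=2, fn=4, tn=6))
```

Because recall depends only on `tp` and `fn`, a rare class such as a small-sample attack can show low recall even when overall accuracy is high, which is why the abstract reports it separately.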