Journal Articles
39 articles found
1. Multi-Agent Deep Q-Networks for Efficient Edge Federated Learning Communications in Software-Defined IoT
Authors: Prohim Tam, Sa Math, Ahyoung Lee, Seokhoon Kim. Computers, Materials & Continua (SCIE, EI), 2022, Issue 5: 3319-3335 (17 pages)
Federated learning (FL) activates distributed on-device computation techniques to improve algorithm performance through the interaction of local model updates and global model distributions in aggregation averaging processes. However, in large-scale heterogeneous Internet of Things (IoT) cellular networks, massive multi-dimensional model update iterations and resource-constrained computation are challenging aspects that must be tackled. This paper introduces a system model that converges software-defined networking (SDN) and network functions virtualization (NFV) to enable device/resource abstractions and provide NFV-enabled edge FL (eFL) aggregation servers for advancing automation and controllability. Multi-agent deep Q-networks (MADQNs) are targeted to enforce self-learning softwarization, optimize resource allocation policies, and inform computation offloading decisions. With gathered network conditions and resource states, the proposed agent explores various actions to estimate the expected long-term reward in a particular state observation. In the exploration phase, optimal actions for joint resource allocation and offloading decisions in different possible states are obtained by maximum Q-value selection. An action-based virtual network function (VNF) forwarding graph (VNFFG) is orchestrated to map VNFs to an eFL aggregation server with sufficient communication and computation resources in the NFV infrastructure (NFVI). The proposed scheme identifies deficient allocation actions, modifies the VNF backup instances, and reallocates the virtual resources for the exploitation phase. A deep neural network (DNN) serves as the value function approximator, and an epsilon-greedy algorithm balances exploration and exploitation. The scheme primarily considers the criticality of FL model services and congestion states to optimize the long-term policy. Simulation results show that the proposed scheme outperforms the reference schemes on Quality of Service (QoS) metrics, including packet drop ratio, packet drop count, packet delivery ratio, delay, and throughput.
Keywords: deep Q-networks; federated learning; network functions virtualization; quality of service; software-defined networking
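The epsilon-greedy balance between exploration and exploitation described in this abstract is a standard DQN ingredient. A minimal PyTorch sketch, assuming a generic state vector and discrete action set (the network layout and names here are illustrative, not the paper's architecture):

```python
import random
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """DNN value-function approximator: maps a state vector to one Q-value per action."""
    def __init__(self, state_dim: int, num_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, num_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

def epsilon_greedy_action(q_net: QNetwork, state: torch.Tensor,
                          epsilon: float, num_actions: int) -> int:
    """Explore with probability epsilon; otherwise exploit the maximum Q-value."""
    if random.random() < epsilon:
        return random.randrange(num_actions)          # exploration phase
    with torch.no_grad():
        return int(q_net(state).argmax().item())      # exploitation: max-Q selection
```

A high epsilon early in training lets many joint allocation/offloading actions be tried; as epsilon falls, the maximum-Q selections dominate.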
2. Reinforcement Learning with an Ensemble of Binary Action Deep Q-Networks
Authors: A. M. Hafiz, M. Hassaballah, Abdullah Alqahtani, Shtwai Alsubai, Mohamed Abdel Hameed. Computer Systems Science & Engineering (SCIE, EI), 2023, Issue 9: 2651-2666 (16 pages)
With the advent of Reinforcement Learning (RL) and its continuous progress, state-of-the-art RL systems have emerged for many challenging real-world tasks. Given the scope of this area, various techniques are found in the literature. One notable technique, multiple Deep Q-Network (DQN) based RL systems, uses multiple DQN-based entities that learn together and communicate with each other. In such a scheme, the learning has to be distributed wisely among all entities and the inter-entity communication protocol has to be carefully designed. As more complex DQNs come to the fore, the overall complexity of these multi-entity systems has increased manyfold, leading to issues like difficulty in training, the need for high resources, longer training time, and difficulty in fine-tuning, which in turn cause performance issues. Taking a cue from the parallel processing found in nature and its efficacy, we propose a lightweight ensemble-based approach for solving core RL tasks. It uses multiple binary-action DQNs having a shared state and reward. The benefits of the proposed approach are overall simplicity, faster convergence, and better performance compared to conventional DQN-based approaches. The approach can potentially be extended to any type of DQN by forming its ensemble. In extensive experiments, the proposed ensemble approach obtains promising results on OpenAI Gym tasks and Atari 2600 games compared to recent techniques. It achieves a state-of-the-art score of 500 on the CartPole-v1 task, 259.2 on the LunarLander-v2 task, and state-of-the-art results on four out of five Atari 2600 games.
Keywords: deep Q-networks; ensemble learning; reinforcement learning; OpenAI Gym environments
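One plausible reading of "multiple binary-action DQNs having a shared state and reward" is one two-valued head per candidate action, with the executed action chosen by comparing the members' "take" values. A hedged sketch under that assumption (the abstract does not publish this exact decomposition):

```python
import torch
import torch.nn as nn

class BinaryActionDQN(nn.Module):
    """One ensemble member: scores 'skip' vs 'take' for a single action."""
    def __init__(self, state_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 2),  # index 0: Q(s, skip), index 1: Q(s, take)
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

def ensemble_act(members: list, state: torch.Tensor) -> int:
    """All members see the shared state; execute the action whose member
    assigns the highest Q-value to 'take'."""
    with torch.no_grad():
        take_values = [m(state)[1].item() for m in members]
    return max(range(len(members)), key=lambda i: take_values[i])
```

Each member trains on the shared reward, so the ensemble stays lightweight compared with one large multi-action DQN.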
3. Resource Allocation in V2X Networks: A Double Deep Q-Network Approach with Graph Neural Networks
Authors: Zhengda Huan, Jian Sun, Zeyu Chen, Ziyi Zhang, Xiao Sun, Zenghui Xiao. Computers, Materials & Continua, 2025, Issue 9: 5427-5443 (17 pages)
With the advancement of Vehicle-to-Everything (V2X) technology, efficient resource allocation in dynamic vehicular networks has become a critical challenge for achieving optimal performance. Existing methods suffer from high computational complexity and decision latency under high-density traffic and heterogeneous network conditions. To address these challenges, this study presents an innovative framework that combines Graph Neural Networks (GNNs) with a Double Deep Q-Network (DDQN), utilizing dynamic graph structures and reinforcement learning. An adaptive neighbor sampling mechanism is introduced to dynamically select the most relevant neighbors based on interference levels and network topology, thereby improving decision accuracy and efficiency. Meanwhile, the framework models communication links as nodes and interference relationships as edges, effectively capturing the direct impact of interference on resource allocation while reducing computational complexity and preserving critical interaction information. Employing an aggregation mechanism based on the Graph Attention Network (GAT), it dynamically adjusts the neighbor sampling scope and performs attention-weighted aggregation based on node importance, ensuring more efficient and adaptive resource management. This design ensures reliable Vehicle-to-Vehicle (V2V) communication while maintaining high Vehicle-to-Infrastructure (V2I) throughput. The framework retains the global feature learning capabilities of GNNs and supports distributed network deployment, allowing vehicles to extract low-dimensional graph embeddings from local observations for real-time resource decisions. Experimental results demonstrate that the proposed method significantly reduces computational overhead, mitigates latency, and improves resource utilization efficiency in vehicular networks under complex traffic scenarios. This research not only provides a novel solution to resource allocation challenges in V2X networks but also advances the application of DDQN in intelligent transportation systems, offering substantial theoretical significance and practical value.
Keywords: resource allocation; V2X; double deep Q-network; graph neural network
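The Double DQN component this framework builds on computes its bootstrap target by letting the online network select the next action while the target network evaluates it, which reduces Q-value overestimation. A generic sketch (the GNN embedding that would feed these networks is omitted):

```python
import torch

def double_dqn_targets(online_net, target_net, rewards: torch.Tensor,
                       next_states: torch.Tensor, dones: torch.Tensor,
                       gamma: float = 0.99) -> torch.Tensor:
    """Decoupled selection/evaluation of the next action.
    `dones` is a float tensor (1.0 for terminal transitions)."""
    with torch.no_grad():
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)   # selection
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)  # evaluation
    return rewards + gamma * (1.0 - dones) * next_q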
4. Transformer-Aided Deep Double Dueling Spatial-Temporal Q-Network for Spatial Crowdsourcing Analysis
Authors: Yu Li, Mingxiao Li, Dongyang Ou, Junjie Guo, Fangyuan Pan. Computer Modeling in Engineering & Sciences (SCIE, EI), 2024, Issue 4: 893-909 (17 pages)
With the rapid development of mobile Internet, spatial crowdsourcing has become more and more popular. Spatial crowdsourcing consists of many different types of applications, such as spatial crowd-sensing services. In terms of spatial crowd-sensing, it collects and analyzes traffic sensing data from clients like vehicles and traffic lights to construct intelligent traffic prediction models. Besides collecting sensing data, spatial crowdsourcing also includes spatial delivery services like DiDi and Uber. Appropriate task assignment and worker selection dominate the service quality for spatial crowdsourcing applications. Previous research conducted task assignment via traditional matching approaches or simple network models. However, advanced mining methods are lacking to explore the relationship between workers, task publishers, and the spatio-temporal attributes in tasks. Therefore, in this paper, we propose a Deep Double Dueling Spatial-temporal Q-Network (D3SQN) to adaptively learn the spatial-temporal relationship between tasks, task publishers, and workers in a dynamic environment to achieve optimal allocation. Specifically, D3SQN is revised through reinforcement learning by adding a spatial-temporal transformer that can estimate the expected state values and action advantages so as to improve the accuracy of task assignments. Extensive experiments are conducted over real data collected from DiDi and ELM, and the simulation results verify the effectiveness of our proposed models.
Keywords: historical behavior analysis; spatial crowdsourcing; deep double dueling Q-networks
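Estimating "expected state values and action advantages", as the abstract puts it, is the dueling decomposition. A minimal dueling head, assuming the spatial-temporal transformer supplies the feature vector (layer sizes are illustrative):

```python
import torch
import torch.nn as nn

class DuelingHead(nn.Module):
    """Dueling architecture: Q(s,a) = V(s) + A(s,a) - mean_a' A(s,a')."""
    def __init__(self, feature_dim: int, num_actions: int):
        super().__init__()
        self.value = nn.Linear(feature_dim, 1)                 # state value V(s)
        self.advantage = nn.Linear(feature_dim, num_actions)   # advantages A(s, a)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        v = self.value(features)
        a = self.advantage(features)
        return v + a - a.mean(dim=-1, keepdim=True)  # subtract mean for identifiability
```

Subtracting the mean advantage keeps V and A separately identifiable, which is what stabilizes dueling training.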
5. Multi-Agent Path Planning Method Based on Improved Deep Q-Network in Dynamic Environments (Cited by 1)
Authors: LI Shuyi, LI Minzhe, JING Zhongliang. Journal of Shanghai Jiaotong University (Science) (EI), 2024, Issue 4: 601-612 (12 pages)
The multi-agent path planning problem presents significant challenges in dynamic environments, primarily due to the ever-changing positions of obstacles and the complex interactions between agents' actions. These factors contribute to a tendency for the solution to converge slowly and, in some cases, diverge altogether. To address this issue, this paper introduces a novel approach utilizing a double dueling deep Q-network (D3QN), tailored for dynamic multi-agent environments. A novel reward function based on multi-agent positional constraints is designed, and a training strategy based on incremental learning is performed to achieve collaborative path planning of multiple agents. Moreover, a greedy and Boltzmann probability selection policy is introduced for action selection and for avoiding convergence to local extrema. To match radar and image sensors, a convolutional neural network-long short-term memory (CNN-LSTM) architecture is constructed to extract the features of multi-source measurements as the input of the D3QN. The algorithm's efficacy and reliability are validated in a simulated environment using the Robot Operating System and Gazebo. The simulation results show that the proposed algorithm provides a real-time solution for path planning tasks in dynamic scenarios. In terms of average success rate and accuracy, the proposed method is superior to other deep learning algorithms, and the convergence speed is also improved.
Keywords: multi-agent; path planning; deep reinforcement learning; deep Q-network
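The Boltzmann half of the selection policy mentioned above samples actions in proportion to exponentiated Q-values instead of always taking the maximum, which helps escape local extrema. A generic Boltzmann sampler (the temperature value is an assumption):

```python
import numpy as np

def boltzmann_action(q_values: np.ndarray, temperature: float = 1.0) -> int:
    """Softmax exploration: sample an action with probability proportional to
    exp(Q / T). High T explores broadly; low T approaches greedy selection."""
    prefs = q_values / temperature
    prefs -= prefs.max()                         # numerical stability
    probs = np.exp(prefs) / np.exp(prefs).sum()
    return int(np.random.choice(len(q_values), p=probs))
```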
6. Multi-UAV path planning for multiple emergency payloads delivery in natural disaster scenarios (Cited by 1)
Authors: Zarina Kutpanova, Mustafa Kadhim, Xu Zheng, Nurkhat Zhakiyev. Journal of Electronic Science and Technology, 2025, Issue 2: 1-18 (18 pages)
Unmanned aerial vehicles (UAVs) are widely used in situations with uncertain and risky areas lacking network coverage. In natural disasters, timely delivery of first aid supplies is crucial. Current UAVs face risks such as crashing into birds or unexpected structures. Airdrop systems with parachutes risk dispersing payloads away from target locations. The objective here is to use multiple UAVs to distribute payloads cooperatively to assigned locations. The civil defense department must balance coverage, accurate landing, and flight safety while considering battery power and capability. Deep Q-network (DQN) models are commonly used in multi-UAV path planning to effectively represent the surroundings and action spaces. Earlier strategies focused on advanced DQNs for UAV path planning in different configurations but rarely addressed non-cooperative scenarios and disaster environments. This paper introduces a new DQN framework to tackle challenges in disaster environments. It considers unforeseen structures and birds that could cause UAV crashes, and it assumes urgent landing zones and winch-based airdrop systems for precise delivery and return. A new DQN model is developed, which incorporates the battery life, safe flying distance between UAVs, and remaining delivery points to encode surrounding hazards into the state space and Q-networks. Additionally, a unique reward system is created to improve UAV action sequences for better delivery coverage and safe landings. The experimental results demonstrate that multi-UAV first aid delivery in disaster environments can achieve advanced performance.
Keywords: deep Q-network; first aid delivery; multi-UAV path planning; reinforcement learning; unmanned aerial vehicle (UAV)
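The state encoding described, battery life, safe inter-UAV distance, remaining delivery points, and surrounding hazards, might be flattened into a single DQN input roughly as follows; the feature layout and hazard-grid representation are assumptions, since the abstract names only the ingredients:

```python
import numpy as np

def encode_state(battery: float, min_uav_distance: float,
                 remaining_points: int, hazard_map: np.ndarray) -> np.ndarray:
    """Concatenate battery level, closest inter-UAV distance, remaining
    delivery points, and a flattened local hazard grid into one DQN input."""
    scalars = np.array([battery, min_uav_distance, float(remaining_points)],
                       dtype=np.float32)
    return np.concatenate([scalars, hazard_map.ravel().astype(np.float32)])
```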
7. Intelligent Scheduling of Virtual Power Plants Based on Deep Reinforcement Learning
Authors: Shaowei He, Wenchao Cui, Gang Li, Hairun Xu, Xiang Chen, Yu Tai. Computers, Materials & Continua, 2025, Issue 7: 861-886 (26 pages)
The Virtual Power Plant (VPP), as an innovative power management architecture, achieves flexible dispatch and resource optimization of power systems by integrating distributed energy resources. However, due to significant differences in the operational costs and flexibility of various types of generation resources, as well as the volatility and uncertainty of renewable energy sources (such as wind and solar power) and the complex variability of load demand, the scheduling optimization of virtual power plants has become a critical issue that needs to be addressed. To solve this, this paper proposes an intelligent scheduling method for virtual power plants based on Deep Reinforcement Learning (DRL), utilizing Deep Q-Networks (DQN) for real-time optimal scheduling of dynamic peaking units (DPU) and stable baseload units (SBU) in the virtual power plant. By modeling the scheduling problem as a Markov Decision Process (MDP) and designing an optimization objective function that integrates both performance and cost, the scheduling efficiency and economic performance of the virtual power plant are significantly improved. Simulation results show that, compared with traditional scheduling methods and other deep reinforcement learning algorithms, the proposed method demonstrates significant advantages in key performance indicators: response time is shortened by up to 34%, task success rate is increased by up to 46%, and costs are reduced by approximately 26%. Experimental results verify the efficiency and scalability of the method under complex load environments and renewable energy volatility, providing strong technical support for the intelligent scheduling of virtual power plants.
Keywords: deep reinforcement learning; deep Q-network; virtual power plant; intelligent scheduling; Markov decision process
8. Energy Optimization for Autonomous Mobile Robot Path Planning Based on Deep Reinforcement Learning
Authors: Longfei Gao, Weidong Wang, Dieyun Ke. Computers, Materials & Continua, 2026, Issue 1: 984-998 (15 pages)
At present, energy consumption is one of the main bottlenecks in autonomous mobile robot development. To address the challenge of high energy consumption in path planning for autonomous mobile robots navigating unknown and complex environments, this paper proposes an Attention-Enhanced Dueling Deep Q-Network (AD-Dueling DQN), which integrates a multi-head attention mechanism and a prioritized experience replay strategy into a Dueling-DQN reinforcement learning framework. A multi-objective reward function, centered on energy efficiency, is designed to comprehensively consider path length, terrain slope, motion smoothness, and obstacle avoidance, enabling optimal low-energy trajectory generation in 3D space from the source. The incorporation of a multi-head attention mechanism allows the model to dynamically focus on energy-critical state features, such as slope gradients and obstacle density, thereby significantly improving its ability to recognize and avoid energy-intensive paths. Additionally, the prioritized experience replay mechanism accelerates learning from key decision-making experiences, suppressing inefficient exploration and guiding the policy toward low-energy solutions more rapidly. The effectiveness of the proposed path planning algorithm is validated through simulation experiments conducted in multiple off-road scenarios. Results demonstrate that AD-Dueling DQN consistently achieves the lowest average energy consumption across all tested environments. Moreover, the proposed method exhibits faster convergence and greater training stability compared to baseline algorithms, highlighting its global optimization capability under energy-aware objectives in complex terrains. This study offers an efficient and scalable intelligent control strategy for the development of energy-conscious autonomous navigation systems.
Keywords: autonomous mobile robot; deep reinforcement learning; energy optimization; multi-head attention mechanism; prioritized experience replay; dueling deep Q-network
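Proportional prioritized experience replay, one of the two mechanisms the abstract highlights, samples transitions with probability proportional to their TD error. A minimal sketch, assuming the common proportional variant (the importance-sampling weight correction that full implementations include is omitted here):

```python
import numpy as np

class PrioritizedReplay:
    """Minimal proportional prioritized replay: transitions with larger
    TD errors are sampled more often, p_i = (|delta_i| + eps) ** alpha."""
    def __init__(self, capacity: int, alpha: float = 0.6, eps: float = 1e-5):
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.buffer, self.priorities = [], []

    def add(self, transition, td_error: float):
        if len(self.buffer) >= self.capacity:   # drop the oldest entry
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append(transition)
        self.priorities.append((abs(td_error) + self.eps) ** self.alpha)

    def sample(self, batch_size: int):
        p = np.asarray(self.priorities)
        probs = p / p.sum()
        idx = np.random.choice(len(self.buffer), size=batch_size, p=probs)
        return [self.buffer[i] for i in idx], idx
```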
9. Convolutional Neural Network-Based Deep Q-Network (CNN-DQN) Resource Management in Cloud Radio Access Network (Cited by 2)
Authors: Amjad Iqbal, Mau-Luen Tham, Yoong Choon Chang. China Communications (SCIE, CSCD), 2022, Issue 10: 129-142 (14 pages)
The recent surge of mobile subscribers and user data traffic has accelerated the telecommunication sector towards the adoption of fifth-generation (5G) mobile networks. Cloud radio access network (CRAN) is a prominent framework in the 5G mobile network to meet the above requirements by deploying low-cost and intelligent multiple distributed antennas known as remote radio heads (RRHs). However, achieving the optimal resource allocation (RA) in CRAN using the traditional approach is still challenging due to the complex structure. In this paper, we introduce a convolutional neural network-based deep Q-network (CNN-DQN) to balance energy consumption and guarantee the user quality of service (QoS) demand in downlink CRAN. We first formulate the Markov decision process (MDP) for energy efficiency (EE) and build a 3-layer CNN to capture the environment features as the input state space. We then use a DQN to turn the RRHs on and off dynamically based on the user QoS demand and energy consumption in the CRAN. Finally, we solve the RA problem based on the user constraints and transmit power to guarantee the user QoS demand and maximize the EE with a minimum number of active RRHs. In the end, we conduct simulations to compare our proposed scheme with the Nature DQN and the traditional approach.
Keywords: energy efficiency (EE); Markov decision process (MDP); convolutional neural network (CNN); cloud RAN; deep Q-network (DQN)
10. Manufacturing Resource Scheduling Based on Deep Q-Network (Cited by 1)
Authors: ZHANG Yufei, ZOU Yuanhao, ZHAO Xiaodong. Wuhan University Journal of Natural Sciences (CAS, CSCD), 2022, Issue 6: 531-538 (8 pages)
To optimize machine allocation and task dispatching in smart manufacturing factories, this paper proposes a manufacturing resource scheduling framework based on reinforcement learning (RL). The framework formulates the entire scheduling process as a multi-stage sequential decision problem and obtains the scheduling order through the combination of a deep convolutional neural network (CNN) and an improved deep Q-network (DQN). Specifically, with respect to the representation of the Markov decision process (MDP), the feature matrix is considered as the state space and a set of heuristic dispatching rules is denoted as the action space. In addition, the deep CNN is employed to approximate the state-action values, and the double dueling deep Q-network with prioritized experience replay and noisy network (D3QPN2) is adopted to determine the appropriate action according to the current state. In the experiments, compared with the traditional heuristic method, the proposed method is able to learn a high-quality scheduling policy and achieve shorter makespan on standard public datasets.
Keywords: smart manufacturing; job shop scheduling; convolutional neural network; deep Q-network
11. Walking Stability Control Method for Biped Robot on Uneven Ground Based on Deep Q-Network
Authors: Baoling Han, Yuting Zhao, Qingsheng Luo. Journal of Beijing Institute of Technology (EI, CAS), 2019, Issue 3: 598-605 (8 pages)
A gait control method for a biped robot based on the deep Q-network (DQN) algorithm is proposed to enhance the stability of walking on uneven ground. This control strategy is an intelligent learning method of posture adjustment. The robot is taken as an agent and trained to walk steadily on an uneven surface with obstacles, using a simple reward function based on forward progress. The reward-punishment (RP) mechanism of the DQN algorithm is established after obtaining the offline gait, which is generated in advance by foot trajectory planning. Instead of implementing a complex dynamic model, the proposed method enables the biped robot to learn to adjust its posture on uneven ground and ensures walking stability. The performance and effectiveness of the proposed algorithm were validated in the V-REP simulation environment. The results demonstrate that the biped robot's lateral tilt angle is less than 3° after implementing the proposed method and that walking stability is obviously improved.
Keywords: deep Q-network (DQN); biped robot; uneven ground; walking stability; gait control
12. UAV Autonomous Navigation for Wireless Powered Data Collection with Onboard Deep Q-Network
Authors: LI Yuting, DING Yi, GAO Jiangchuan, LIU Yusha, HU Jie, YANG Kun. ZTE Communications, 2023, Issue 2: 80-87 (8 pages)
In a rechargeable wireless sensor network, utilizing an unmanned aerial vehicle (UAV) as a mobile base station (BS) to charge sensors and collect data effectively prolongs the network's lifetime. In this paper, we jointly optimize the UAV's flight trajectory and the sensor selection and operation modes to maximize the average data traffic of all sensors within a wireless sensor network (WSN) during a finite UAV flight time, while ensuring the energy required for each sensor by wireless power transfer (WPT). We consider a practical scenario where the UAV has no prior knowledge of sensor locations. The UAV performs autonomous navigation based on the status information obtained within the coverage area, which is modeled as a Markov decision process (MDP). A deep Q-network (DQN) is employed to execute the navigation based on the UAV position, the battery level state, channel conditions, and the current data traffic of sensors within the UAV's coverage area. Our simulation results demonstrate that the DQN algorithm significantly improves the network performance in terms of average data traffic and trajectory design.
Keywords: unmanned aerial vehicle; wireless power transfer; deep Q-network; autonomous navigation
13. Automatic depth matching method of well log based on deep reinforcement learning (Cited by 3)
Authors: XIONG Wenjun, XIAO Lizhi, YUAN Jiangru, YUE Wenzheng. Petroleum Exploration and Development (SCIE), 2024, Issue 3: 634-646 (13 pages)
In traditional well log depth matching tasks, manual adjustments are required, which is significantly labor-intensive for multiple wells and leads to low work efficiency. This paper introduces a multi-agent deep reinforcement learning (MARL) method to automate the depth matching of multi-well logs. The method defines multiple top-down dual sliding windows based on a convolutional neural network (CNN) to extract and capture similar feature sequences on well logs, and it establishes an interaction mechanism between the agents and the environment to control the depth matching process. Specifically, an agent selects an action to translate or scale the feature sequence based on a double deep Q-network (DDQN). Through the feedback of the reward signal, it evaluates the effectiveness of each action, aiming to obtain the optimal strategy and improve the accuracy of the matching task. Our experiments show that MARL can automatically perform depth matching for well logs in multiple wells and reduce manual intervention. In a field application, a comparative analysis of dynamic time warping (DTW), deep Q-learning network (DQN), and DDQN methods revealed that the DDQN algorithm, with its dual-network evaluation mechanism, significantly improves performance by identifying and aligning more details in the well log feature sequences, thus achieving higher depth matching accuracy.
Keywords: artificial intelligence; machine learning; depth matching; well log; multi-agent deep reinforcement learning; convolutional neural network; double deep Q-network
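The agent's action set, translating or scaling a log feature sequence along depth, could look like the following sketch; the action names and step sizes are hypothetical, chosen only to illustrate the mechanism the abstract describes:

```python
import numpy as np

def apply_action(sequence: np.ndarray, depths: np.ndarray, action: str):
    """Apply a discrete agent action to a log feature sequence by shifting
    or stretching its depth axis."""
    if action == "shift_up":
        return sequence, depths - 0.1                   # translate 0.1 m shallower
    if action == "shift_down":
        return sequence, depths + 0.1                   # translate 0.1 m deeper
    if action == "stretch":
        mid = depths.mean()
        return sequence, mid + (depths - mid) * 1.01    # scale about the midpoint
    return sequence, depths                             # stop / no-op
```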
14. Associative Tasks Computing Offloading Scheme in Internet of Medical Things with Deep Reinforcement Learning (Cited by 1)
Authors: Jiang Fan, Qin Junwei, Liu Lei, Tian Hui. China Communications (SCIE, CSCD), 2024, Issue 4: 38-52 (15 pages)
The Internet of Medical Things (IoMT) is regarded as a critical technology for intelligent healthcare in the foreseeable 6G era. Nevertheless, due to the limited computing power of edge devices and task-related coupling relationships, IoMT faces unprecedented challenges. Considering the associative connections among tasks, this paper proposes a computation offloading policy for multiple user devices (UDs) that incorporates device-to-device (D2D) communication and a multi-access edge computing (MEC) technique under an IoMT scenario. Specifically, to minimize the total delay and energy consumption with respect to the requirements of IoMT, we first analyze and model the detailed local execution, MEC execution, D2D execution, and associated task offloading exchange models. Consequently, the associated task offloading scheme for multiple UDs is formulated as a mixed-integer non-convex optimization problem. Considering the advantages of deep reinforcement learning (DRL) in processing tasks with coupling relationships, a Double-DQN-based associative tasks computing offloading (DDATO) algorithm is then proposed to obtain the optimal solution, which can make the best offloading decision under the condition that the tasks of UDs are associative. Furthermore, to reduce the complexity of the DDATO algorithm, a cache-aided procedure is intentionally introduced before the data training process. This avoids redundant offloading and computing procedures for tasks that have previously been cached by other UDs. In addition, we use a dynamic ε-greedy strategy in the action selection section of the algorithm, thus preventing the algorithm from falling into a locally optimal solution. Simulation results demonstrate that, compared with other existing methods for associative task models with different structures in the IoMT network, the proposed algorithm can lower the total cost more effectively and efficiently while also providing a tradeoff between delay and energy consumption tolerance.
Keywords: associative tasks; cache-aided procedure; double deep Q-network; Internet of Medical Things (IoMT); multi-access edge computing (MEC)
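The dynamic ε-greedy strategy mentioned above typically anneals ε over training so that early broad exploration gives way to exploitation, which is what keeps the policy from settling into a locally optimal offloading decision. One common linear schedule (the paper's exact schedule is not given):

```python
def dynamic_epsilon(step: int, eps_start: float = 1.0,
                    eps_end: float = 0.05, decay_steps: int = 10_000) -> float:
    """Linearly anneal epsilon from eps_start to eps_end over decay_steps;
    after decay_steps the rate stays at eps_end."""
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)
```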
15. Value Function Mechanism in WSNs-Based Mango Plantation Monitoring System
Authors: Wen-Tsai Sung, Indra Griha Tofik Isa, Sung-Jung Hsiao. Computers, Materials & Continua (SCIE, EI), 2024, Issue 9: 3733-3759 (27 pages)
Mango fruit is one of the main fruit commodities that contributes to Taiwan's income. The implementation of technology is an alternative for increasing the quality and quantity of mango plantation productivity. In this study, a Wireless Sensor Networks (WSNs) based intelligent mango plantation monitoring system is developed that implements deep reinforcement learning (DRL) technology to perform prediction tasks over three classes: "optimal," "sub-optimal," and "not-optimal" conditions, based on three parameters: humidity, temperature, and soil moisture. The key idea is to provide a precise decision-making mechanism in the real-time monitoring system. A value function-based DRL model called the deep Q-network (DQN) is employed, which contributes to optimizing the future reward and producing precise decision recommendations for the agent and system behavior. The WSNs experiment indicates that the system's accuracy in capturing the real-time environment parameters is 98.39%. Meanwhile, the comparative accuracy of the proposed DQN, individual Q-learning, uniform coverage (UC), and Naïve Bayes classifier (NBC) models is 97.60%, 95.30%, 96.50%, and 92.30%, respectively. These comparative experiments show that the proposed DQN achieves the best accuracy. Testing with 22 test scenarios for "optimal," "sub-optimal," and "not-optimal" conditions was carried out to ensure the system runs well on real-world data, where the accuracy reaches 95.45%. The cost analysis shows that the system is low-cost compared to the conventional system.
Keywords: intelligent monitoring system; deep reinforcement learning (DRL); wireless sensor networks (WSNs); deep Q-network (DQN)
16. Research on Policy Optimization of Reinforcement Learning for Satellite Mission Planning
Author: Yan Mi. 北斗与空间信息应用技术, 2024, Issue 6: 40-44 (5 pages)
With the rapid development of aerospace technology, satellite mission planning faces a growing number of challenges. Conventional mission planning strategies are typically built on human experience and preset algorithmic architectures, a paradigm that shows insufficient adaptability when confronted with constantly changing task environments and resource constraints. To overcome these limitations, this paper incorporates Reinforcement Learning (RL) into satellite mission planning, aiming to advance the intelligence of mission planning strategies through exploration driven by agent-environment interaction.
Keywords: reinforcement learning; satellite mission planning; policy optimization; Q-learning; deep Q-network
17. Modeling Fish Schooling Behavior Based on Deep Reinforcement Learning (Cited by 1)
Authors: Chen Pengyu, Wang Fang, Liu Shuo, Yue Shengzhi, Song Yanan, Jin Zhaoyi, Lin Yuanshan. 广东海洋大学学报 (Journal of Guangdong Ocean University) (CAS, CSCD, PKU Core), 2023, Issue 3: 1-9 (9 pages)
[Objective] To model fish schooling behavior using deep reinforcement learning and explore the mechanism by which schooling behavior forms. [Method] Traditional rule-based modeling of collective behavior depends heavily on human prior knowledge and may fail to characterize schooling behavior well. To address this, a Deep Q-Networks (DQN) based method for modeling fish schooling behavior is proposed: the state of an individual fish is expressed as the angle (a continuous value) between its heading and the average heading of its neighbors, its action is a discretized turning angle, and a neural network represents the individual's movement policy. In an environment with a single learner and multiple teachers, the change in the number of neighbors serves as the immediate reward, and the DQN algorithm trains the neural network to obtain the individual movement policy. [Result] With the proposed method, individual fish learn the teachers' movement policy; the learned policy gives rise to schooling behavior in different scenarios, and the characteristics of the emergent schooling behavior resemble those of real fish schools. [Conclusion] The proposed method models fish schooling behavior effectively and helps analyze and understand complex fish population behavior. The local interaction mechanism identified between individual fish offers a new perspective for understanding school formation, fish migration, and fishing ground formation, and provides a reference for high-density industrial aquaculture.
Keywords: fish school; collective behavior modeling; deep reinforcement learning; deep Q-networks
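The reward defined in this abstract, the change in the learner's neighbor count, can be sketched directly; the radius-based neighborhood test is an assumption, since the abstract does not state how neighbors are counted:

```python
import numpy as np

def neighbor_count(positions: np.ndarray, i: int, radius: float) -> int:
    """Number of other fish within `radius` of fish i (positions: N x 2 array)."""
    d = np.linalg.norm(positions - positions[i], axis=1)
    return int(((d < radius) & (d > 0)).sum())   # d > 0 excludes fish i itself

def immediate_reward(prev_neighbors: int, curr_neighbors: int) -> int:
    """Immediate reward = change in the learner's neighbor count, so turning
    actions that keep the fish inside the group are reinforced."""
    return curr_neighbors - prev_neighbors
```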
18. A New Reward System Based on Human Demonstrations for Hard Exploration Games
Authors: Wadhah Zeyad Tareq, Mehmet Fatih Amasyali. Computers, Materials & Continua (SCIE, EI), 2022, Issue 2: 2401-2414 (14 pages)
The main idea of reinforcement learning is evaluating the chosen action depending on the current reward. According to this concept, many algorithms have achieved proper performance on classic Atari 2600 games. The main challenge arises when the reward is sparse or missing. Such environments are complex exploration environments like the Montezuma's Revenge, Pitfall, and Private Eye games. Approaches built to deal with such challenges have been very demanding. This work introduces a different reward system that enables a simple classical algorithm to learn fast and achieve high performance in hard exploration environments. Moreover, we add some simple enhancements to several hyperparameters, such as the number of actions and the sampling ratio, that help improve performance. We include the extra reward within the human demonstrations. After that, we use Prioritized Double Deep Q-Networks (Prioritized DDQN) to learn from these demonstrations. Our approach enables the Prioritized DDQN, with a short learning time, to finish the first level of Montezuma's Revenge and to perform well in both Pitfall and Private Eye. We use the same games to compare our results with several baselines, such as the Rainbow and Deep Q-learning from Demonstrations (DQfD) algorithms. The results show that the new reward system enables Prioritized DDQN to outperform the baselines in hard exploration games with short learning time.
Keywords: deep reinforcement learning; human demonstrations; prioritized double deep Q-networks; Atari
19. Optimal Operation Method for the Energy Storage System of a Photovoltaic-Storage Hybrid Power Plant Based on Deep Reinforcement Learning in the Spot Market Environment (Cited by 10)
Authors: Gong Kai, Wang Xu, Deng Hui, Jiang Chuanwen, Ma Junchao, Fang Le. 电网技术 (Power System Technology) (EI, CSCD, PKU Core), 2022, Issue 9: 3365-3375 (11 pages)
A photovoltaic-energy storage hybrid power plant can not only effectively reduce real-time photovoltaic output deviations but is also a potential market participant capable of providing both electric energy and frequency regulation ancillary services. Achieving these three goals requires an energy storage dispatch strategy coordinated with photovoltaic output, yet most existing dispatch strategies for such plants cannot simultaneously coordinate the three decisions of reducing real-time photovoltaic output deviations and participating in the energy and frequency regulation ancillary service markets. Moreover, the uncertainty of spot market prices and frequency regulation signals, together with the storage dispatch strategy, turns the storage operation optimization problem into a stochastic, dynamic, non-convex optimization problem. Most existing studies handle this non-convexity with stochastic scenario methods or heuristic algorithms, yielding operation schemes with certain limitations that are difficult to adjust dynamically according to real-time data. This paper therefore proposes a deep Q-network (DQN) based optimal operation method for the energy storage system of a photovoltaic-storage hybrid power plant in a spot market environment. The method overcomes the non-convex optimization difficulty and, combined with the proposed closed-loop storage dispatch strategy, achieves hourly dynamic optimal operation of the storage system while accounting for deviation assessment costs, energy market revenue, and frequency regulation ancillary service revenue, thereby maximizing the plant's economic returns. Test cases based on real market data verify the feasibility and effectiveness of the proposed method.
Keywords: energy storage; deep Q-network; uncertainty; electricity market; optimal operation
20. A Deep Reinforcement Learning Based Signal Control Algorithm for a Single Intersection (Cited by 12)
Authors: Guo Mengjie, Ren Anhu. 电子测量技术 (Electronic Measurement Technology), 2019, Issue 24: 49-52 (4 pages)
To address traffic congestion, this paper combines deep reinforcement learning with traffic signal control. A single-intersection road model is constructed, and the signal control problem is cast as a reinforcement learning problem in which an agent interacts with the intersection at discrete time steps, with the waiting time at the intersection as the objective function. Leveraging the decision-making capability of reinforcement learning and the perception capability of deep learning, the agent observes the environment state, selects and executes the likely optimal control policy for the current state, and updates the next state according to the reward function. Simulation experiments on the SUMO traffic simulator show that, compared with fixed-time control, the proposed method reduces vehicle waiting time to varying degrees under different traffic saturation levels, verifying the effectiveness of the algorithm.
Keywords: deep learning; traffic signal control; deep Q-network; SUMO
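With intersection waiting time as the objective, a natural immediate reward is the reduction in cumulative waiting time between consecutive decisions; a one-line sketch (the sign convention is an assumption, since the abstract only names the objective):

```python
def waiting_time_reward(prev_total_wait: float, curr_total_wait: float) -> float:
    """Reward = reduction in cumulative vehicle waiting time at the
    intersection; positive when the chosen signal phase shortens queues."""
    return prev_total_wait - curr_total_wait
```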