With the advancement of Vehicle-to-Everything (V2X) technology, efficient resource allocation in dynamic vehicular networks has become a critical challenge for achieving optimal performance. Existing methods suffer from high computational complexity and decision latency under high-density traffic and heterogeneous network conditions. To address these challenges, this study presents an innovative framework that combines Graph Neural Networks (GNNs) with a Double Deep Q-Network (DDQN), utilizing dynamic graph structures and reinforcement learning. An adaptive neighbor sampling mechanism is introduced to dynamically select the most relevant neighbors based on interference levels and network topology, thereby improving decision accuracy and efficiency. Meanwhile, the framework models communication links as nodes and interference relationships as edges, effectively capturing the direct impact of interference on resource allocation while reducing computational complexity and preserving critical interaction information. Employing an aggregation mechanism based on the Graph Attention Network (GAT), it dynamically adjusts the neighbor sampling scope and performs attention-weighted aggregation based on node importance, ensuring more efficient and adaptive resource management. This design ensures reliable Vehicle-to-Vehicle (V2V) communication while maintaining high Vehicle-to-Infrastructure (V2I) throughput. The framework retains the global feature learning capabilities of GNNs and supports distributed network deployment, allowing vehicles to extract low-dimensional graph embeddings from local observations for real-time resource decisions. Experimental results demonstrate that the proposed method significantly reduces computational overhead, mitigates latency, and improves resource utilization efficiency in vehicular networks under complex traffic scenarios. This research not only provides a novel solution to resource allocation challenges in V2X networks but also advances the application of DDQN in intelligent transportation systems, offering substantial theoretical significance and practical value.
Funding: Project ZR2023MF111, supported by the Shandong Provincial Natural Science Foundation.
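The abstract's GAT-based, attention-weighted neighbor aggregation can be illustrated with a minimal single-head sketch. The function and variable names, feature dimensions, and the single-head form are illustrative assumptions, not taken from the paper; only the standard GAT scoring (shared projection, LeakyReLU attention, softmax mixing) is shown.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_aggregate(link_feat, neighbor_feats, w, attn_vec):
    """GAT-style attention aggregation over sampled interfering links: score each
    neighbor against the current link, softmax-normalize, and mix neighbor features
    by those weights to obtain a low-dimensional embedding."""
    h_i = link_feat @ w                                   # project the current link
    h_n = neighbor_feats @ w                              # project sampled neighbors
    scores = leaky_relu(np.array([attn_vec @ np.concatenate([h_i, h]) for h in h_n]))
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                                  # attention weights over neighbors
    return (alpha[:, None] * h_n).sum(axis=0)

rng = np.random.default_rng(0)
w, attn_vec = rng.normal(size=(4, 8)), rng.normal(size=16)
print(gat_aggregate(rng.normal(size=4), rng.normal(size=(3, 4)), w, attn_vec))
```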
The recent surge of mobile subscribers and user data traffic has accelerated the telecommunication sector towards the adoption of fifth-generation (5G) mobile networks. The cloud radio access network (CRAN) is a prominent framework in the 5G mobile network to meet these requirements by deploying low-cost and intelligent multiple distributed antennas known as remote radio heads (RRHs). However, achieving the optimal resource allocation (RA) in CRAN using the traditional approach is still challenging due to the complex structure. In this paper, we introduce the convolutional neural network-based deep Q-network (CNN-DQN) to balance the energy consumption and guarantee the user quality of service (QoS) demand in downlink CRAN. We first formulate the Markov decision process (MDP) for energy efficiency (EE) and build up a 3-layer CNN to capture the environment feature as an input state space. We then use DQN to turn the RRHs on/off dynamically based on the user QoS demand and energy consumption in the CRAN. Finally, we solve the RA problem based on the user constraint and transmit power to guarantee the user QoS demand and maximize the EE with a minimum number of active RRHs. In the end, we conduct simulations to compare our proposed scheme with the Nature DQN and the traditional approach.
Funding: Supported by Universiti Tunku Abdul Rahman (UTAR), Malaysia, under UTARRF (IPSR/RMC/UTARRF/2021-C1/T05).
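A rough sketch of the CNN-DQN idea follows: a 3-layer CNN encodes the environment feature map and a linear head outputs one Q-value per candidate RRH switching action, from which a greedy on/off decision is taken. The class name, layer widths, input size, and action encoding are assumptions for illustration; the paper's exact architecture is not reproduced here.

```python
import torch
import torch.nn as nn

class CnnDqn(nn.Module):
    """Illustrative 3-layer CNN Q-network: input is a feature map of the CRAN
    environment, output is one Q-value per candidate RRH switching action."""
    def __init__(self, n_rrh_actions, in_channels=1):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.head = nn.LazyLinear(n_rrh_actions)   # Q-value per candidate action

    def forward(self, state_map):
        x = self.features(state_map)
        return self.head(x.flatten(start_dim=1))

q_net = CnnDqn(n_rrh_actions=8)
q_values = q_net(torch.randn(1, 1, 10, 10))   # batch of one 10x10 environment map
action = q_values.argmax(dim=1)               # greedy RRH on/off decision
```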
To optimize machine allocation and task dispatching in smart manufacturing factories, this paper proposes a manufacturing resource scheduling framework based on reinforcement learning (RL). The framework formulates the entire scheduling process as a multi-stage sequential decision problem, and further obtains the scheduling order through the combination of a deep convolutional neural network (CNN) and an improved deep Q-network (DQN). Specifically, with respect to the representation of the Markov decision process (MDP), the feature matrix is considered as the state space and a set of heuristic dispatching rules are denoted as the action space. In addition, the deep CNN is employed to approximate the state-action values, and the double dueling deep Q-network with prioritized experience replay and noisy network (D3QPN2) is adopted to determine the appropriate action according to the current state. In the experiments, compared with the traditional heuristic method, the proposed method is able to learn a high-quality scheduling policy and achieve a shorter makespan on standard public datasets.
Funding: Supported by the National Key Research and Development Plan (2019YFB1706401).
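The dueling decomposition used by D3QPN2-style agents splits the Q-value into a state value and per-action advantages. A minimal sketch follows; the feature dimension and the number of dispatching-rule actions are illustrative assumptions, and the prioritized replay and noisy-network components are omitted.

```python
import torch
import torch.nn as nn

class DuelingHead(nn.Module):
    """Dueling decomposition: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    def __init__(self, feature_dim, n_actions):
        super().__init__()
        self.value = nn.Linear(feature_dim, 1)               # state value V(s)
        self.advantage = nn.Linear(feature_dim, n_actions)   # advantages A(s, a)

    def forward(self, features):
        v = self.value(features)
        a = self.advantage(features)
        return v + a - a.mean(dim=1, keepdim=True)

head = DuelingHead(feature_dim=64, n_actions=10)   # e.g. 10 heuristic dispatching rules
print(head(torch.randn(4, 64)).shape)              # -> torch.Size([4, 10])
```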
The multi-agent path planning problem presents significant challenges in dynamic environments, primarily due to the ever-changing positions of obstacles and the complex interactions between agents' actions. These factors contribute to a tendency for the solution to converge slowly and, in some cases, diverge altogether. To address this issue, this paper introduces a novel approach utilizing a double dueling deep Q-network (D3QN), tailored for dynamic multi-agent environments. A novel reward function based on multi-agent positional constraints is designed, and a training strategy based on incremental learning is performed to achieve collaborative path planning of multiple agents. Moreover, a combined greedy and Boltzmann probability selection policy is introduced for action selection to avoid convergence to local extrema. To combine radar and image sensor measurements, a convolutional neural network-long short-term memory (CNN-LSTM) architecture is constructed to extract features from the multi-source measurements as the input of the D3QN. The algorithm's efficacy and reliability are validated in a simulated environment built on the Robot Operating System and Gazebo. The simulation results show that the proposed algorithm provides a real-time solution for path planning tasks in dynamic scenarios. In terms of average success rate and accuracy, the proposed method is superior to other deep learning algorithms, and the convergence speed is also improved.
Funding: National Natural Science Foundation of China (Nos. 61673262 and 50779033), National GF Basic Research Program (No. JCKY2021110B134), and the Fundamental Research Funds for the Central Universities.
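One hedged reading of the "greedy and Boltzmann probability selection policy" is to act greedily part of the time and otherwise sample actions in proportion to a softmax over Q-values. The mixing probability, temperature, and function names below are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def select_action(q_values, temperature=1.0, greedy_prob=0.5):
    """Mixed greedy/Boltzmann selection: exploit with probability greedy_prob,
    otherwise sample from a softmax over the Q-values."""
    if rng.random() < greedy_prob:
        return int(np.argmax(q_values))              # greedy exploitation
    logits = q_values / temperature
    probs = np.exp(logits - logits.max())            # numerically stable softmax
    probs /= probs.sum()
    return int(rng.choice(len(q_values), p=probs))   # Boltzmann exploration

print(select_action(np.array([0.2, 1.5, 0.9, 0.1])))
```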
A gait control method for a biped robot based on the deep Q-network (DQN) algorithm is proposed to enhance the stability of walking on uneven ground. This control strategy is an intelligent learning method of posture adjustment. The robot is taken as an agent and trained to walk steadily on an uneven surface with obstacles, using a simple reward function based on forward progress. The reward-punishment (RP) mechanism of the DQN algorithm is established after obtaining the offline gait, which is generated in advance by foot trajectory planning. Instead of implementing a complex dynamic model, the proposed method enables the biped robot to learn to adjust its posture on uneven ground and ensures walking stability. The performance and effectiveness of the proposed algorithm were validated in the V-REP simulation environment. The results demonstrate that the biped robot's lateral tilt angle is less than 3° after implementing the proposed method and that walking stability is obviously improved.
Funding: Supported by the National Ministries and Research Funds (3020020221111).
Federated learning (FL) leverages distributed on-device computation, improving algorithm performance through the interaction of local model updates and global model distribution in the aggregation-averaging process. However, in large-scale heterogeneous Internet of Things (IoT) cellular networks, massive multi-dimensional model update iterations and resource-constrained computation are challenging aspects to be tackled. This paper introduces a system model that converges software-defined networking (SDN) and network functions virtualization (NFV) to enable device/resource abstractions and provide NFV-enabled edge FL (eFL) aggregation servers for advancing automation and controllability. Multi-agent deep Q-networks (MADQNs) are applied to enforce self-learning softwarization, optimize resource allocation policies, and support computation offloading decisions. With the gathered network conditions and resource states, the proposed agent explores various actions to estimate the expected long-term rewards in a particular state observation. In the exploration phase, optimal actions for joint resource allocation and offloading decisions in different possible states are obtained by maximum Q-value selection. An action-based virtual network function (VNF) forwarding graph (VNFFG) is orchestrated to map VNFs to an eFL aggregation server with sufficient communication and computation resources in the NFV infrastructure (NFVI). The proposed scheme identifies deficient allocation actions, modifies the VNF backup instances, and reallocates the virtual resources for the exploitation phase. A deep neural network (DNN) is used as the value function approximator, and an epsilon-greedy algorithm balances exploration and exploitation. The scheme primarily considers the criticality of FL model services and congestion states to optimize the long-term policy. Simulation results show that the proposed scheme outperforms the reference schemes in terms of Quality of Service (QoS) performance metrics, including packet drop ratio, packet drop counts, packet delivery ratio, delay, and throughput.
Funding: Funded by BK21 FOUR (Fostering Outstanding Universities for Research) (No. 5199990914048); supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2020R1I1A3066543); and supported by the Soonchunhyang University Research Fund.
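The epsilon-greedy balance named in the abstract is standard and can be sketched in a few lines; the decay schedule and parameter values below are assumptions, since the paper's schedule is not specified here.

```python
import numpy as np

rng = np.random.default_rng(42)

def epsilon_greedy(q_values, epsilon):
    """Epsilon-greedy over joint allocation/offloading actions: explore with
    probability epsilon, otherwise take the max-Q action."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def epsilon_at(step, eps_start=1.0, eps_end=0.05, decay_steps=10_000):
    """Simple linear decay schedule (illustrative placeholder)."""
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)

print(epsilon_greedy(np.array([0.1, 0.7, 0.3]), epsilon=epsilon_at(5_000)))
```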
With the rapid development of the mobile Internet, spatial crowdsourcing has become more and more popular. Spatial crowdsourcing comprises many different types of applications, such as spatial crowd-sensing services. Spatial crowd-sensing collects and analyzes traffic sensing data from clients such as vehicles and traffic lights to construct intelligent traffic prediction models. Besides collecting sensing data, spatial crowdsourcing also includes spatial delivery services such as DiDi and Uber. Appropriate task assignment and worker selection largely determine the service quality of spatial crowdsourcing applications. Previous research conducted task assignment via traditional matching approaches or simple network models; however, advanced mining methods for exploring the relationships between workers, task publishers, and the spatio-temporal attributes of tasks are lacking. Therefore, in this paper, we propose a Deep Double Dueling Spatial-temporal Q Network (D3SQN) to adaptively learn the spatial-temporal relationships between tasks, task publishers, and workers in a dynamic environment to achieve optimal allocation. Specifically, D3SQN extends reinforcement learning with a spatial-temporal transformer that estimates the expected state values and action advantages so as to improve the accuracy of task assignments. Extensive experiments are conducted on real data collected from DiDi and ELM, and the simulation results verify the effectiveness of our proposed models.
Funding: Supported in part by the Pioneer and Leading Goose R&D Program of Zhejiang Province under Grants 2022C01083 and 2023C01217 (Dr. Yu Li, https://zjnsf.kjt.zj.gov.cn/).
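A highly hedged sketch of how a transformer encoder could feed dueling value/advantage heads, in the spirit of D3SQN's description: a sequence of spatio-temporal task/worker embeddings is summarized and scored. Class names, dimensions, pooling, and the two-layer encoder are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class SpatialTemporalQHead(nn.Module):
    """Transformer encoder over a spatio-temporal embedding sequence, followed by
    dueling state-value and action-advantage heads (illustrative sketch)."""
    def __init__(self, d_model=32, n_actions=4, nhead=4):
        super().__init__()
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True), num_layers=2)
        self.value = nn.Linear(d_model, 1)
        self.advantage = nn.Linear(d_model, n_actions)

    def forward(self, seq):                      # seq: (batch, time_steps, d_model)
        pooled = self.encoder(seq).mean(dim=1)   # average over the temporal axis
        v, a = self.value(pooled), self.advantage(pooled)
        return v + a - a.mean(dim=1, keepdim=True)

head = SpatialTemporalQHead()
print(head(torch.randn(2, 6, 32)).shape)         # 2 requests x 6 time steps -> Q over 4 workers
```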
With the advent of Reinforcement Learning (RL) and its continuous progress, state-of-the-art RL systems have emerged for many challenging real-world tasks. Given the scope of this area, various techniques are found in the literature. One notable technique, multiple Deep Q-Network (DQN) based RL systems, uses multiple DQN-based entities that learn together and communicate with each other. The learning has to be distributed wisely among all entities in such a scheme, and the inter-entity communication protocol has to be carefully designed. As more complex DQNs come to the fore, the overall complexity of these multi-entity systems has increased manyfold, leading to issues such as difficulty in training, the need for high resources, longer training times, and difficulty in fine-tuning, all of which cause performance issues. Taking a cue from the parallel processing found in nature and its efficacy, we propose a lightweight ensemble-based approach for solving core RL tasks. It uses multiple binary-action DQNs with a shared state and reward. The benefits of the proposed approach are overall simplicity, faster convergence, and better performance compared to conventional DQN-based approaches. The approach can potentially be extended to any type of DQN by forming its ensemble. Through extensive experimentation, promising results are obtained using the proposed ensemble approach on OpenAI Gym tasks and Atari 2600 games compared with recent techniques. The proposed approach gives a state-of-the-art score of 500 on the CartPole-v1 task, 259.2 on the LunarLander-v2 task, and state-of-the-art results on four out of five Atari 2600 games.
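The ensemble of binary-action DQNs with a shared state and reward can be sketched as one small Q-network per action, each scoring "take my action" versus "do not". The combination rule shown (the action whose member reports the highest "take" Q-value wins) is a hypothetical reading for illustration; the paper's exact combination is not given here.

```python
import torch
import torch.nn as nn

class BinaryActionDQN(nn.Module):
    """One ensemble member: Q-values for 'take this action' vs 'do not take it'."""
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2))
    def forward(self, state):
        return self.net(state)

def ensemble_act(members, state):
    """Hypothetical combination rule: every member scores its own action on the
    shared state; the action whose member is most confident wins."""
    act_scores = [m(state)[:, 1] for m in members]        # Q of 'take this action'
    return torch.stack(act_scores, dim=1).argmax(dim=1)

members = [BinaryActionDQN(state_dim=4) for _ in range(2)]  # e.g. CartPole: push left / push right
print(ensemble_act(members, torch.randn(1, 4)))
```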
In a rechargeable wireless sensor network, utilizing an unmanned aerial vehicle (UAV) as a mobile base station (BS) to charge sensors and collect data effectively prolongs the network's lifetime. In this paper, we jointly optimize the UAV's flight trajectory and the sensor selection and operation modes to maximize the average data traffic of all sensors within a wireless sensor network (WSN) during the UAV's finite flight time, while ensuring the energy required by each sensor through wireless power transfer (WPT). We consider a practical scenario in which the UAV has no prior knowledge of sensor locations. The UAV performs autonomous navigation based on the status information obtained within its coverage area, which is modeled as a Markov decision process (MDP). The deep Q-network (DQN) is employed to execute the navigation based on the UAV position, the battery level state, channel conditions, and the current data traffic of sensors within the UAV's coverage area. Our simulation results demonstrate that the DQN algorithm significantly improves the network performance in terms of average data traffic and trajectory design.
Fiber allocation in optical cable production is critical for optimizing production efficiency, product quality, and inventory management. However, factors like fiber length and storage time complicate this process, making heuristic optimization algorithms inadequate. To tackle these challenges, this paper proposes a new framework: the dueling-double-deep Q-network with twin state-value and action-advantage functions (D3QNTF). First, dual action-advantage and state-value functions are used to prevent overestimation of action values. Second, a method for random initialization of feasible solutions improves sample quality early in the optimization. Finally, a strict penalty for errors is added to the reward mechanism, making the agent more sensitive to and better at avoiding illegal actions, which reduces decision errors. Experimental results show that the proposed method outperforms state-of-the-art algorithms, including greedy algorithms, genetic algorithms, deep Q-networks, double deep Q-networks, and standard dueling-double-deep Q-networks. The findings highlight the potential of the D3QNTF framework for fiber allocation in optical cable production.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 52205519 and 62273264).
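One plausible, clipped-double-Q-style reading of "twin state-value and action-advantage functions" is two dueling heads combined by an element-wise minimum so that the conservative estimate curbs overestimation. The class name, dimensions, and the minimum-combination rule are assumptions for illustration; the paper's exact combination is not reproduced here.

```python
import torch
import torch.nn as nn

class TwinDuelingHead(nn.Module):
    """Two dueling heads whose Q-estimates are combined by an element-wise minimum
    (a clipped-double-Q reading of 'twin' value/advantage functions)."""
    def __init__(self, feat_dim, n_actions):
        super().__init__()
        self.v1, self.v2 = nn.Linear(feat_dim, 1), nn.Linear(feat_dim, 1)
        self.a1, self.a2 = nn.Linear(feat_dim, n_actions), nn.Linear(feat_dim, n_actions)

    def forward(self, x):
        q1 = self.v1(x) + self.a1(x) - self.a1(x).mean(dim=1, keepdim=True)
        q2 = self.v2(x) + self.a2(x) - self.a2(x).mean(dim=1, keepdim=True)
        return torch.minimum(q1, q2)   # conservative estimate curbs overestimation

head = TwinDuelingHead(feat_dim=32, n_actions=5)   # e.g. 5 candidate fiber assignments
print(head(torch.randn(2, 32)).shape)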
Recent years have seen the rapid development of autonomous driving systems, which are typically designed with either a hierarchical architecture or an end-to-end architecture. The hierarchical architecture is complicated and hard to design, while the end-to-end architecture is more promising due to its simple structure. This paper puts forward an end-to-end autonomous driving method based on the deep reinforcement learning algorithm Dueling Double Deep Q-Network, making it possible for the vehicle to learn end-to-end driving by itself. The paper first proposes an architecture for the end-to-end lane-keeping task. Unlike the traditional image-only state space, the presented state space is composed of both camera images and vehicle motion information. Then the corresponding dueling neural network structure is introduced, which reduces variance and improves sampling efficiency. Third, the proposed method is applied to The Open Racing Car Simulator (TORCS) to demonstrate its performance, where it surpasses human drivers. Finally, the saliency map of the neural network is visualized, which indicates that the trained network drives by observing the lane lines. A video of the presented work is available online at https://youtu.be/76ciJmIHMD8 or https://v.youku.com/v_show/id_XNDM4ODc0MTM4NA==.html.
Funding: Supported by the National Key Research and Development Project of China under Grant 2018YFB1600600 and the Beijing Natural Science Foundation under Grant JQ18010; the authors also thank the Tsinghua University-Didi Joint Research Center for Future Mobility for its support.
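The combined image-plus-motion state space can be sketched as a two-branch Q-network: a small CNN encodes the camera frame, an MLP encodes the motion vector, and a dueling head scores discrete driving actions. Layer sizes, the 64x64 input, the motion features named in the comment, and the seven-action discretization are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class ImageMotionQNet(nn.Module):
    """Two-branch dueling Q-network over camera image + vehicle motion vector."""
    def __init__(self, motion_dim=3, n_actions=7):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2), nn.ReLU(), nn.Flatten())
        self.motion_mlp = nn.Sequential(nn.Linear(motion_dim, 32), nn.ReLU())
        self.value = nn.LazyLinear(1)
        self.advantage = nn.LazyLinear(n_actions)

    def forward(self, image, motion):
        feats = torch.cat([self.cnn(image), self.motion_mlp(motion)], dim=1)
        v, a = self.value(feats), self.advantage(feats)
        return v + a - a.mean(dim=1, keepdim=True)

net = ImageMotionQNet()
q = net(torch.randn(1, 3, 64, 64), torch.randn(1, 3))  # image + (speed, yaw rate, lateral offset)
print(q.shape)
```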
In this paper, an unmanned aerial vehicle (UAV)-aided wireless emergency communication system is studied, where a UAV is deployed to support ground user equipments (UEs) for emergency communications. We aim to maximize the number of UEs served, the fairness, and the overall uplink data rate by optimizing the trajectory of the UAV and the transmission power of the UEs. We propose a deep Q-network (DQN) based algorithm, which combines the well-known deep neural network (DNN) and Q-learning, to solve the UAV trajectory problem. Then, based on the optimized UAV trajectory, we further propose a successive convex approximation (SCA) based algorithm to tackle the power control problem for each UE. Numerical simulations demonstrate that the proposed DQN-based algorithm achieves a considerable performance gain over the existing benchmark algorithms in terms of fairness, the number of UEs served, and overall uplink data rate by jointly optimizing the UAV's trajectory and the transmission power.
In dense-traffic unmanned aerial vehicle (UAV) ad-hoc networks, traffic congestion can cause increased delay and packet loss, which limit the performance of the networks; therefore, a traffic balancing strategy is required to control the traffic. In this study, we propose TQNGPSR, a traffic-aware Q-network enhanced geographic routing protocol based on greedy perimeter stateless routing (GPSR), for UAV ad-hoc networks. The protocol enforces a traffic balancing strategy using the congestion information of neighbors, and evaluates the quality of a wireless link with the Q-network algorithm, a reinforcement learning algorithm. Based on the evaluation of each wireless link, the protocol makes routing decisions among multiple available choices to reduce delay and decrease packet loss. We simulate the performance of TQNGPSR and compare it with AODV, OLSR, GPSR, and QNGPSR. Simulation results show that TQNGPSR obtains higher packet delivery ratios and lower end-to-end delays than GPSR and QNGPSR. In high node-density scenarios, it also outperforms AODV and OLSR in terms of packet delivery ratio, end-to-end delay, and throughput.
Funding: Supported by the National Natural Science Foundation of China (No. 61501399) and the National Key R&D Program of China (No. 2018AAA0102302).
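The Q-network-based link evaluation can be sketched as a small scorer applied to every candidate next hop, with the packet forwarded to the highest-scoring neighbor. The per-neighbor feature set, network size, and function names are hypothetical, and the weights here are untrained; the sketch only shows the decision flow, not the protocol's learned policy.

```python
import torch
import torch.nn as nn

# Hypothetical per-neighbor features: [geographic progress toward the destination,
# normalized queue occupancy, link SNR]; the real protocol's feature set may differ.
link_scorer = nn.Sequential(nn.Linear(3, 16), nn.ReLU(), nn.Linear(16, 1))

def choose_next_hop(neighbor_features):
    """Score every candidate next hop with the Q-network and forward the packet
    to the highest-scoring neighbor."""
    with torch.no_grad():
        scores = link_scorer(neighbor_features).squeeze(-1)
    return int(scores.argmax())

neighbors = torch.tensor([[0.8, 0.9, 0.4],    # good progress, but congested
                          [0.6, 0.1, 0.7]])   # less progress, light load, good link
print(choose_next_hop(neighbors))
```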
High penetration of distributed renewable energy sources and electric vehicles (EVs) makes the future active distribution network (ADN) highly variable. These characteristics pose great challenges for traditional voltage control methods. Voltage control based on the deep Q-network (DQN) algorithm offers a potential solution to this problem because it possesses human-level control performance. However, traditional DQN methods may overestimate action reward values, resulting in degraded solutions. In this paper, an intelligent voltage control method based on the averaged weighted double deep Q-network (AWDDQN) algorithm is proposed to overcome the overestimation of action reward values in the DQN algorithm and the underestimation of action reward values in the double deep Q-network (DDQN) algorithm. Using the proposed method, the voltage control objective is incorporated into the designed action reward values and normalized to form a Markov decision process (MDP) model, which is solved by the AWDDQN algorithm. The designed AWDDQN-based intelligent voltage control agent is trained offline and used as an online intelligent dynamic voltage regulator for the ADN. The proposed voltage control method is validated on the IEEE 33-bus and 123-bus systems containing renewable energy sources and EVs, and compared with DQN- and DDQN-based methods and traditional mixed-integer nonlinear programming based methods. The simulation results show that the proposed method has better convergence and less voltage volatility than the other methods.
Funding: Supported in part by the Anhui Province Natural Science Foundation (No. 2108085UD02), the National Natural Science Foundation of China (No. 51577047), and the 111 Project (No. BP0719039).
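A hedged sketch of an averaged weighted double-DQN target follows: the online network selects the next action (the double-DQN split), and the evaluation averages the Q-estimates of several recent target-network snapshots, which damps both over- and under-estimation. The function name and the uniform weights are placeholder assumptions; the paper's actual weighting scheme is not reproduced here.

```python
import numpy as np

def awddqn_target(q_online_next, q_target_snapshots, reward, gamma=0.99, done=False):
    """Averaged weighted double-DQN target (illustrative reading):
    online-net action selection, weighted average of past target-net evaluations."""
    a_star = int(np.argmax(q_online_next))                      # double-DQN action selection
    evals = np.array([snap[a_star] for snap in q_target_snapshots])
    weights = np.ones_like(evals) / len(evals)                  # uniform weights as a placeholder
    return reward + gamma * (0.0 if done else float(weights @ evals))

snapshots = [np.array([0.4, 1.2, 0.8]), np.array([0.5, 1.0, 0.9])]  # two past target nets
print(awddqn_target(np.array([0.3, 1.1, 0.7]), snapshots, reward=1.0))
```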