Blockchain technology, based on decentralized data storage and distributed consensus design, has become a promising solution to address data security risks and provide privacy protection in the Internet of Things (IoT), owing to its tamper-proof and non-repudiation features. Although blockchain typically does not require the endorsement of third-party trust organizations, it usually must perform substantial mathematical calculations to prevent malicious attacks, which imposes strict requirements on the computation resources of participating devices. Offloading the computation tasks that support blockchain consensus to edge service nodes or the cloud, while still providing data privacy protection for IoT applications, can effectively overcome the computation and energy limitations of IoT devices. However, how to make reasonable offloading decisions for IoT devices remains an open issue. Exploiting the self-learning ability of Reinforcement Learning (RL), this paper proposes an RL-enabled Swarm Intelligence Optimization Algorithm (RLSIOA) that aims to improve the quality of initial solutions and efficiently optimize computation-task offloading decisions. The algorithm considers the various factors that affect the revenue IoT devices obtain from executing consensus algorithms (e.g., Proof-of-Work), and it optimizes both the proportion of sub-tasks to be offloaded and the scale of computing resources to be rented from the edge and cloud so as to maximize device revenue. Experimental results show that RLSIOA obtains higher-quality offloading decisions at lower latency cost than representative benchmark algorithms.
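As a rough illustration of the decision variables involved, the sketch below evaluates the revenue of one candidate offloading plan under simple assumed models (a linear rental cost and a Proof-of-Work win probability proportional to effective hash rate). All names, prices, and the cost model are hypothetical, not the paper's actual formulation.

```python
# Hypothetical revenue model for one candidate offloading plan (not the
# paper's actual formulation): a device offloads fractions of its PoW
# workload to the edge/cloud and rents computing power there.

BLOCK_REWARD = 6.25          # reward if the device mines the block (assumed)
NETWORK_HASH = 1e6           # total network hash rate (assumed units)
EDGE_PRICE, CLOUD_PRICE = 0.02, 0.01   # rental price per hash-rate unit (assumed)
ENERGY_PRICE = 0.005         # local energy cost per hash-rate unit (assumed)

def plan_revenue(local_rate, edge_frac, cloud_frac, rented_edge, rented_cloud):
    """Expected revenue of one candidate solution.

    edge_frac / cloud_frac: proportion of sub-tasks offloaded to edge / cloud.
    rented_edge / rented_cloud: scale of rented computing resources.
    """
    assert 0.0 <= edge_frac + cloud_frac <= 1.0
    local_frac = 1.0 - edge_frac - cloud_frac
    # Effective hash rate combines what stays local with what is rented.
    effective = local_frac * local_rate + rented_edge + rented_cloud
    win_prob = effective / (NETWORK_HASH + effective)   # PoW success chance
    cost = (rented_edge * EDGE_PRICE + rented_cloud * CLOUD_PRICE
            + local_frac * local_rate * ENERGY_PRICE)
    return BLOCK_REWARD * win_prob - cost

# A swarm optimizer (as in RLSIOA) would search over these five variables.
print(plan_revenue(local_rate=50.0, edge_frac=0.4, cloud_frac=0.3,
                   rented_edge=400.0, rented_cloud=900.0))
```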
Cybertwin-enabled 6th Generation (6G) networks are envisioned to support artificial intelligence-native management to meet the changing demands of 6G applications. Multi-Agent Deep Reinforcement Learning (MADRL) technologies driven by Cybertwins have been proposed for adaptive task-offloading strategies. However, related works do not consider the random transmission delay between Cybertwin-driven agents and the underlying networks, which destroys the standard Markov property and increases the decision reaction time, degrading task-offloading performance. To address this problem, we propose a pipelined task-offloading method that lowers the decision reaction time and model it as a delay-aware Markov Decision Process (MDP). We then design a delay-aware MADRL algorithm to minimize the weighted sum of task execution latency and energy consumption. First, the state space is augmented with the last-received state and the historical actions to rebuild the Markov property. Second, Gate Transformer-XL is introduced to capture the importance of historical actions and to maintain a consistent input dimension despite the dynamic changes caused by random transmission delays. Third, a sampling method and a new loss function, combining the difference between the current and target state values with the difference between the real and augmented state-action values, are designed to obtain state-transition trajectories close to the real ones. Numerical results demonstrate that the proposed methods effectively reduce reaction time and improve task-offloading performance in random-delay Cybertwin-enabled 6G networks.
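The state-augmentation step of the delay-aware MDP can be pictured with a minimal sketch: the agent acts on the last state it actually received plus a fixed-length buffer of the actions issued since then. The class name, buffer size, and encoding are illustrative assumptions, not the paper's design.

```python
import numpy as np
from collections import deque

class DelayAwareAgentState:
    """Augmented observation for a delay-aware MDP (illustrative sketch):
    the last state received from the network plus the actions sent since,
    zero-padded to a fixed length so the input dimension stays constant
    even when the transmission delay varies."""

    def __init__(self, state_dim, action_dim, max_delay_steps=4):
        self.state_dim = state_dim
        self.action_dim = action_dim
        self.max_delay = max_delay_steps
        self.last_state = np.zeros(state_dim)
        self.pending_actions = deque(maxlen=max_delay_steps)

    def on_state_received(self, state):
        self.last_state = np.asarray(state, dtype=float)
        self.pending_actions.clear()   # actions before this state are resolved

    def on_action_sent(self, action_onehot):
        self.pending_actions.append(np.asarray(action_onehot, dtype=float))

    def augmented_observation(self):
        pads = self.max_delay - len(self.pending_actions)
        history = list(self.pending_actions) + [np.zeros(self.action_dim)] * pads
        return np.concatenate([self.last_state] + history)

obs = DelayAwareAgentState(state_dim=5, action_dim=3)
obs.on_state_received(np.ones(5))
obs.on_action_sent([1, 0, 0])
print(obs.augmented_observation().shape)   # (5 + 4*3,) = (17,)
```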
This paper focuses on the problem of multi-station, multi-robot spot-welding task assignment and proposes a deep reinforcement learning (DRL) framework made up of a shared graph attention network and independent policy networks. The graph of the welding-spot distribution is encoded by the graph attention network. Independent policy networks, with an attention mechanism as the decoder, process the encoded graph and decide how to assign robots to tasks. The policy networks convert the large-scale welding-spot allocation problem into multiple small-scale single-robot welding path-planning problems, each of which is quickly solved by existing methods. The model is then trained through reinforcement learning. In addition, a task-balancing method is used to allocate tasks across multiple stations. The proposed algorithm is compared with classical algorithms, and the results show that the DRL-based algorithm produces higher-quality solutions.
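A compact sketch of the decoder idea, assuming standard scaled dot-product attention: welding-spot embeddings from the graph encoder are scored against a robot's context vector, and the softmax weights act as an assignment distribution. The dimensions and names are made up for illustration and are not the paper's architecture.

```python
import math
import torch

def assignment_scores(robot_context, spot_embeddings, mask=None):
    """Score welding spots for one robot via scaled dot-product attention.

    robot_context:   (d,) decoder query for the robot's current state.
    spot_embeddings: (n, d) per-spot embeddings from the graph encoder.
    mask:            (n,) True for spots already assigned (excluded).
    Returns a probability distribution over the n spots.
    """
    d = robot_context.shape[-1]
    logits = spot_embeddings @ robot_context / math.sqrt(d)   # (n,)
    if mask is not None:
        logits = logits.masked_fill(mask, float("-inf"))
    return torch.softmax(logits, dim=-1)

ctx = torch.randn(16)
spots = torch.randn(10, 16)
probs = assignment_scores(ctx, spots, mask=torch.zeros(10, dtype=torch.bool))
print(probs.argmax().item())   # greedy pick of the next spot for this robot
```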
The advent of the internet-of-everything era has led to the increased use of mobile edge computing. The rise of artificial intelligence offers many possibilities for meeting users' low-latency task-offloading demands, but existing techniques rigidly assume that the terminal has only one task to offload in each time slot. In practical scenarios, a terminal often has numerous computing tasks to execute, so subsequent task offloading accumulates delay, and efficiently processing multiple computing tasks on the terminal becomes highly challenging. To address the low-latency offloading requirements of multiple computational tasks on terminal devices, we propose a terminal multitask parallel offloading algorithm based on deep reinforcement learning. Specifically, we first establish a mobile edge computing system model consisting of a single edge server and multiple terminal users. We then model the task-offloading decision problem as a Markov decision process and solve it using the Dueling Deep-Q Network algorithm to obtain the optimal offloading strategy. Experimental results demonstrate that, under the same constraints, our proposed algorithm reduces the average system latency.
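For reference, a minimal Dueling Deep-Q Network of the kind used above separates a state-value stream from an advantage stream and recombines them. The layer sizes here are arbitrary placeholders.

```python
import torch
import torch.nn as nn

class DuelingDQN(nn.Module):
    """Minimal dueling architecture: Q(s,a) = V(s) + A(s,a) - mean_a A(s,a).
    Subtracting the mean advantage keeps the decomposition identifiable."""

    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)               # V(s): state-value stream
        self.advantage = nn.Linear(hidden, n_actions)   # A(s,a): advantage stream

    def forward(self, state):
        h = self.trunk(state)
        v, a = self.value(h), self.advantage(h)
        return v + a - a.mean(dim=-1, keepdim=True)

q = DuelingDQN(state_dim=8, n_actions=4)
print(q(torch.randn(2, 8)).shape)   # torch.Size([2, 4])
```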
The rapid advancement of Industry 4.0 has revolutionized manufacturing, shifting production from centralized control to decentralized, intelligent systems. Smart factories are now expected to achieve high adaptability and resource efficiency, particularly in mass-customization scenarios where production schedules must accommodate dynamic and personalized demands. To address the challenges of dynamic task allocation, uncertainty, and real-time decision-making, this paper proposes Pathfinder, a deep reinforcement learning-based scheduling framework. Pathfinder models scheduling data through three key matrices: execution time (the time required for a job to complete), completion time (the actual time at which a job is finished), and efficiency (the performance of executing a single job). By leveraging neural networks, Pathfinder extracts essential features from these matrices, enabling intelligent decision-making in dynamic production environments. Unlike traditional approaches with fixed scheduling rules, Pathfinder dynamically selects from ten diverse scheduling rules, optimizing decisions based on real-time environmental conditions. To further enhance scheduling efficiency, a specialized reward function is designed to support dynamic task allocation and real-time adjustments. This function helps Pathfinder continuously refine its scheduling strategy, improving machine utilization and minimizing job completion times. Through reinforcement learning, Pathfinder adapts to evolving production demands, ensuring robust performance in real-world applications. Experimental results demonstrate that Pathfinder outperforms traditional scheduling approaches, offering improved coordination and efficiency in smart factories. By integrating deep reinforcement learning, adaptable scheduling strategies, and an innovative reward function, Pathfinder provides an effective solution to the growing challenges of multi-robot job scheduling in mass-customization environments.
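Pathfinder's action space (choosing among ten dispatching rules) can be pictured as below. The specific rules listed are common dispatching heuristics assumed for illustration, since the abstract does not name the ten rules it uses.

```python
# Illustrative action space for a rule-selecting scheduler: the DRL agent's
# discrete action picks which dispatching rule orders the waiting jobs.
# The rules shown are standard heuristics assumed for illustration.

RULES = {
    0: ("SPT",  lambda jobs: min(jobs, key=lambda j: j["exec_time"])),
    1: ("LPT",  lambda jobs: max(jobs, key=lambda j: j["exec_time"])),
    2: ("FIFO", lambda jobs: min(jobs, key=lambda j: j["arrival"])),
    3: ("LIFO", lambda jobs: max(jobs, key=lambda j: j["arrival"])),
    4: ("EDD",  lambda jobs: min(jobs, key=lambda j: j["due"])),
    # ... rules 5-9 (e.g., slack-, efficiency-, utilization-based) omitted
}

def dispatch(action, waiting_jobs):
    """Apply the rule selected by the agent's action to pick the next job."""
    name, rule = RULES[action]
    return name, rule(waiting_jobs)

jobs = [{"id": 1, "exec_time": 5, "arrival": 0, "due": 9},
        {"id": 2, "exec_time": 2, "arrival": 1, "due": 4}]
print(dispatch(0, jobs))   # SPT picks job 2 (shortest execution time)
```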
Unmanned Aerial Vehicles (UAVs) are useful in dangerous and dynamic tasks such as search-and-rescue, forest surveillance, and anti-terrorist operations. These tasks are better solved through the collaboration of multiple UAVs under human supervision. However, it is still difficult for humans to monitor, understand, predict, and control the behaviors of the UAVs, owing to the task complexity as well as the black-box machine learning and planning algorithms in use. In this paper, the coactive design method is adopted to analyze the cognitive capabilities required by the tasks and to design the interdependencies among the heterogeneous teammates, whether UAV or human, for coherent collaboration. Then, an agent-based task planner is proposed to automatically decompose a complex task into a sequence of explainable subtasks under constraints on resources, execution time, social rules, and costs. In addition, a deep reinforcement learning approach is designed for the UAVs to learn optimal policies for a flocking behavior and a path planner that are easy for the human operator to understand and control. Finally, a mixed-initiative action-selection mechanism is used to evaluate the learned policies as well as the human's decisions. Experimental results demonstrate the effectiveness of the proposed methods.
With the arrival of 5G, latency-sensitive applications are becoming increasingly diverse. Mobile Edge Computing (MEC) technology features high bandwidth, low latency, and low energy consumption, and has attracted much attention among researchers. To improve the Quality of Service (QoS), this study focuses on computation offloading in MEC. We consider QoS from the perspectives of computational cost, the curse of dimensionality, user privacy, and catastrophic forgetting for new users. A QoS model is established based on delay and energy consumption, and an adaptive task-offloading algorithm for MEC is built on DDQN and Federated Learning (FL). The proposed algorithm combines the QoS model with a deep reinforcement learning algorithm to obtain an optimal offloading policy from the local link and node state information within the channel coherence time, thereby coping with time-varying transmission channels and reducing computing energy consumption and task-processing delay. To address privacy and catastrophic forgetting, we use FL to exploit multiple users' data in a distributed manner when learning the decision model, protecting data privacy and improving model generality. During FL iterations, the communication delay of individual devices can be excessive, which inflates the overall delay cost; we therefore adopt a communication-delay optimization algorithm based on a unary outlier-detection mechanism to reduce the communication delay of FL. The simulation results indicate that, compared with existing schemes, the proposed method significantly reduces the computation cost on a device and improves the QoS when handling complex tasks.
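The abstract does not detail the unary (single-variable) outlier-detection mechanism; a common choice it could resemble is the interquartile-range test sketched below, which would flag high-delay devices before they stall an FL round. This is an assumed instantiation, not the paper's exact method.

```python
import numpy as np

def delay_outliers(delays, k=1.5):
    """Flag devices whose FL communication delay is a high-side outlier,
    using the standard IQR rule on a single variable (an assumed stand-in
    for the paper's unary outlier-detection mechanism)."""
    delays = np.asarray(delays, dtype=float)
    q1, q3 = np.percentile(delays, [25, 75])
    threshold = q3 + k * (q3 - q1)
    return delays > threshold

round_delays = [1.2, 0.9, 1.1, 1.3, 6.8, 1.0]   # seconds, per device
print(delay_outliers(round_delays))  # only the 6.8 s straggler is flagged
```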
The overall performance of multi-robot collaborative systems is significantly affected by multi-robot task allocation. To improve the effectiveness, robustness, and safety of multi-robot collaborative systems, a multimodal multi-objective evolutionary algorithm based on deep reinforcement learning is proposed in this paper. The improved multimodal multi-objective evolutionary algorithm is used to solve multi-robot task allocation problems. Moreover, a deep reinforcement learning strategy is used in the last generation to provide a high-quality path for each assigned robot in an end-to-end manner. Comparisons with three popular multimodal multi-objective evolutionary algorithms on three different multi-robot task allocation scenarios verify the performance of the proposed algorithm. The experimental results show that the proposed algorithm generates sufficient equivalent schemes to improve the availability and robustness of multi-robot collaborative systems in uncertain environments, and also produces the best scheme to improve the overall task-execution efficiency of multi-robot collaborative systems.
The scale of ground-to-air confrontation task assignment is large, and many concurrent assignments and random events must be handled. Existing task-assignment methods applied to ground-to-air confrontation deal with complex tasks inefficiently and suffer interaction conflicts in multiagent systems. This study proposes a multiagent architecture based on one general agent with multiple narrow agents (OGMN) to reduce task-assignment conflicts. Considering the slow speed of traditional dynamic task-assignment algorithms, this paper proposes the Proximal Policy Optimization for Task Assignment of General and Narrow Agents (PPO-TAGNA) algorithm. Building on the idea of the optimal assignment strategy and a deep reinforcement learning (DRL) training framework, the algorithm adds a multi-head attention mechanism and a stage-reward mechanism to the bilaterally clipped PPO algorithm to address low training efficiency. Finally, simulation experiments are carried out on a digital battlefield. The OGMN architecture combined with the PPO-TAGNA algorithm obtains higher rewards faster and achieves a higher win ratio. Analysis of agent behavior verifies the efficiency, superiority, and rationality of resource utilization of this method.
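As context for the PPO variant mentioned above, the standard clipped surrogate objective that such methods build on is sketched here; the stage-reward and attention additions of PPO-TAGNA are not reproduced.

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    """Standard PPO clipped surrogate loss (the base objective that
    variants such as PPO-TAGNA extend). Clipping the probability ratio
    keeps the new policy close to the one that collected the data."""
    ratio = torch.exp(logp_new - logp_old)          # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * advantages
    return -torch.min(unclipped, clipped).mean()    # negate: optimizers minimize

logp_new = torch.tensor([-0.9, -1.2], requires_grad=True)
logp_old = torch.tensor([-1.0, -1.0])
adv = torch.tensor([0.5, -0.3])
print(ppo_clip_loss(logp_new, logp_old, adv))
```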
The Multi-access Edge Cloud (MEC) networks extend cloud computing services and capabilities to the edge of the networks. By bringing computation and storage capabilities closer to end-users and connected devices, MEC networks can support a wide range of applications. MEC networks can also leverage various types of resources, including computation resources, network resources, radio resources, and location-based resources, to provide multidimensional resources for intelligent applications in 5G/6G. However, tasks generated by users often consist of multiple subtasks that require different types of resources. Offloading multi-resource task requests to the edge cloud so as to maximize benefits is challenging because of the heterogeneity of the resources provided by devices. To address this issue, we mathematically model task requests with multiple subtasks and prove that the task-offloading problem for multi-resource task requests is NP-hard. Furthermore, we propose a novel Dual-Agent Deep Reinforcement Learning algorithm with Node First and Link features (NF_L_DA_DRL), based on the policy network, to optimize the benefits generated by offloading multi-resource task requests in MEC networks. Finally, simulation results show that the proposed algorithm effectively improves the benefit of task offloading with higher resource utilization compared with baseline algorithms.
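The multi-resource task requests described above can be pictured with a small data model: a task is a set of subtasks, each demanding a different resource type. The field names and units are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Subtask:
    """One component of a user task, demanding a single resource type."""
    resource_type: str     # e.g. "computation", "network", "radio", "location"
    demand: float          # amount of that resource required (assumed units)

@dataclass
class TaskRequest:
    """A user task made of heterogeneous subtasks, as modeled above.
    An offloading decision must place every subtask on an edge node
    that can supply its resource type."""
    task_id: int
    subtasks: list = field(default_factory=list)

req = TaskRequest(task_id=7, subtasks=[
    Subtask("computation", 2.5),
    Subtask("network", 0.8),
    Subtask("radio", 1.0),
])
print(len(req.subtasks), "subtasks to place on edge-cloud nodes")
```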
To improve the quality of the computation experience for mobile devices, mobile edge computing (MEC) is a promising paradigm that provides computing capabilities in close proximity within a sliced radio access network supporting both traditional communication and MEC services. However, this kind of intensive computing problem is a high-dimensional NP-hard problem, and some machine learning methods are ineffective at solving it. In this paper, a Markov decision process model is established to find an excellent task-offloading scheme that maximizes long-term utility, so that the best offloading decision is made according to the task queue state, the energy queue state, and the channel quality between the mobile users and the base station. To tackle the curse of dimensionality in the state space, a candidate network is introduced in the edge computing optimized offloading (ECOO) algorithm, which applies the deep deterministic policy gradient algorithm. Simulation experiments show that ECOO is superior to some deep reinforcement learning algorithms in terms of energy consumption and time delay, demonstrating that it handles high-dimensional problems well.
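For orientation, the core deep deterministic policy gradient update that ECOO builds on looks roughly like the following. The network classes, sizes, and hyperparameters are placeholders, not the paper's architecture.

```python
import copy
import torch
import torch.nn as nn

# Minimal DDPG update step (the base algorithm ECOO applies); networks and
# dimensions are illustrative placeholders.
state_dim, action_dim = 6, 2
actor = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                      nn.Linear(64, action_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                       nn.Linear(64, 1))
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def ddpg_update(batch, target_actor, target_critic, gamma=0.99):
    s, a, r, s2 = batch
    # Critic: regress Q(s,a) toward the bootstrapped target from target nets.
    with torch.no_grad():
        q_target = r + gamma * target_critic(torch.cat([s2, target_actor(s2)], 1))
    q = critic(torch.cat([s, a], dim=1))
    critic_loss = nn.functional.mse_loss(q, q_target)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
    # Actor: ascend the critic's value of the actor's own actions.
    actor_loss = -critic(torch.cat([s, actor(s)], dim=1)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

t_actor, t_critic = copy.deepcopy(actor), copy.deepcopy(critic)
batch = (torch.randn(8, state_dim), torch.randn(8, action_dim),
         torch.randn(8, 1), torch.randn(8, state_dim))
ddpg_update(batch, t_actor, t_critic)
```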
Vehicular edge computing (VEC) is emerging as a promising paradigm to meet the requirements of compute-intensive applications in the internet of vehicles (IoV). Non-orthogonal multiple access (NOMA) has advantages in improving spectrum efficiency and dealing with bandwidth scarcity and cost, so combining VEC with NOMA is an encouraging direction. In this paper, we jointly optimize the task-offloading decision and resource allocation to maximize the service utility of the NOMA-VEC system. To solve the optimization problem, we propose a multiagent deep graph reinforcement learning algorithm. The algorithm extracts topological features and relationship information between agents from the system state as observations, and outputs the task-offloading decision and resource allocation simultaneously with a local policy network, which is updated by a local learner. Simulation results demonstrate that the proposed method achieves a 1.52% to 5.80% improvement in system service utility compared with the benchmark algorithms.
Unmanned aerial vehicle (UAV) swarm technology is one of the research hotspots of recent years. With the continuous improvement of UAV autonomy, swarm technology will become one of the main trends of future UAV development. This paper studies the behavior decision-making process of a UAV swarm rendezvous task based on the double deep Q-network (DDQN) algorithm. We design a guided reward function that effectively solves the convergence problem caused by sparse returns in deep reinforcement learning (DRL) over long-horizon tasks. We also propose the concept of a temporary storage area, optimizing the memory replay unit of the traditional DDQN algorithm, improving its convergence speed, and accelerating training. Unlike traditional task environments, this paper establishes a continuous state-space task environment model to improve the fidelity of the UAV task environment. Based on the DDQN algorithm, the collaborative tasks of the UAV swarm are trained in different task scenarios. The experimental results validate that the DDQN algorithm efficiently trains the UAV swarm to complete the given collaborative tasks while meeting the swarm's requirements for centralization and autonomy, improving the intelligence of collaborative task execution. The simulation results show that, after training, the proposed UAV swarm carries out the rendezvous task well, with a mission success rate of 90%.
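The DDQN mechanism referenced above decouples action selection from evaluation when forming targets; a minimal sketch of that target computation follows (the paper's temporary-storage replay optimization is not shown).

```python
import torch

def double_dqn_target(reward, next_state, online_q, target_q, gamma=0.99):
    """Double DQN target: the online network picks the next action and the
    target network evaluates it, which curbs Q-value overestimation."""
    with torch.no_grad():
        next_action = online_q(next_state).argmax(dim=1, keepdim=True)
        next_value = target_q(next_state).gather(1, next_action)
        return reward + gamma * next_value

# Usage with placeholder networks standing in for real Q-networks:
online = torch.nn.Linear(4, 3)
target = torch.nn.Linear(4, 3)
y = double_dqn_target(torch.ones(5, 1), torch.randn(5, 4), online, target)
print(y.shape)   # torch.Size([5, 1])
```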
Task scheduling plays a crucial role in cloud computing and is a key factor determining cloud computing performance. To solve the task-scheduling problem for remote sensing data processing in cloud computing, this paper proposes a Workflow Task Scheduling Algorithm based on Deep Reinforcement Learning (WDRL). The remote sensing data processing model is transformed into a directed acyclic graph scheduling problem. The algorithm is then designed by establishing a Markov decision model and adopting a fitness calculation method. Finally, the advantages of reinforcement learning and deep neural networks are combined to minimize the makespan of remote sensing data processing from experience. The experiments are implemented with CloudSim and Python and compare the evolution of completion time for remote sensing data processing. The results show that, compared with several traditional meta-heuristic scheduling algorithms, WDRL effectively optimizes task-scheduling efficiency.
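Since the data-processing workflow is cast as a directed acyclic graph, the scheduler repeatedly chooses among "ready" tasks whose predecessors have all finished. A tiny helper illustrating that readiness test is shown below; the example workflow is invented.

```python
# Minimal DAG readiness helper for workflow scheduling: a task can be
# dispatched once all of its predecessors have completed.

dag = {            # task -> set of predecessor tasks (illustrative workflow)
    "ingest": set(),
    "calibrate": {"ingest"},
    "mosaic": {"calibrate"},
    "classify": {"calibrate"},
    "report": {"mosaic", "classify"},
}

def ready_tasks(dag, done):
    """Tasks whose predecessors are all finished and which are not done."""
    return [t for t, preds in dag.items() if t not in done and preds <= done]

done = {"ingest", "calibrate"}
print(ready_tasks(dag, done))   # ['mosaic', 'classify'] can run in parallel
```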
This paper focuses on the scheduling problem of workflow tasks that exhibit interdependencies. Unlike independent batch tasks, workflows typically consist of multiple subtasks with intrinsic correlations and dependencies; the various computational tasks must be distributed to appropriate computing-node resources in accordance with task dependencies to ensure smooth completion of the entire workflow. Workflow scheduling must consider an array of factors, including task dependencies, the availability of computational resources, and the schedulability of tasks. Therefore, this paper delves into the distributed graph database workflow task-scheduling problem and proposes a workflow scheduling methodology based on deep reinforcement learning (DRL). The method optimizes the maximum completion time (makespan) and the response time of workflow tasks, aiming to enhance responsiveness while minimizing the makespan. The experimental results indicate that the Q-learning Deep Reinforcement Learning (Q-DRL) algorithm markedly diminishes the makespan and refines the average response time within distributed graph database environments. In makespan, Q-DRL achieves mean reductions of 12.4% and 11.9% over the established First-fit and Random scheduling strategies, respectively; it also surpasses the DRL-Cloud and Improved Deep Q-learning Network (IDQN) algorithms, with improvements of 4.4% and 2.6%, respectively. In average response time, Q-DRL performs significantly better in scheduling workflow tasks, decreasing the average by 2.27% and 4.71% compared with IDQN and DRL-Cloud, respectively. The Q-DRL algorithm also demonstrates a notable increase in the efficiency of system resource utilization, reducing the average idle rate by 5.02% and 9.30% in comparison to IDQN and DRL-Cloud, respectively. These findings support the assertion that Q-DRL not only maintains a lower average idle rate but also effectively curtails the average response time, substantially improving processing efficiency and optimizing resource utilization within distributed graph database systems.
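The two metrics optimized above can be made concrete with a short helper: makespan is the latest finish time across tasks, and response time is finish time minus submission time, averaged. The record layout is illustrative.

```python
# Illustrative computation of the two metrics optimized above.
# Each record: (submit_time, start_time, finish_time) for one workflow task.

tasks = [(0.0, 0.5, 3.0), (0.0, 1.0, 4.5), (2.0, 4.5, 6.0)]

makespan = max(finish for _, _, finish in tasks)            # latest completion
avg_response = sum(finish - submit for submit, _, finish in tasks) / len(tasks)

print(f"makespan={makespan:.1f}, average response time={avg_response:.2f}")
```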
Traditionally, heuristic re-planning algorithms are used to tackle dynamic task planning for multiple satellites. However, traditional heuristic strategies depend on the concrete tasks, which often compromises the optimality of the results. Noticing that the historical information of cooperative task planning influences later planning results, we propose a hybrid learning algorithm for dynamic multi-satellite task planning, based on multi-agent reinforcement learning with policy iteration and on transfer learning. The reinforcement learning strategy of each satellite is described with neural networks, and the policy-network individuals with the best topological structure and weights are found by applying co-evolutionary search iteratively. To avoid the failure of historical learning caused by randomly occurring observation requests, a novel approach is proposed to balance the quality and efficiency of the task planning: it converts the historical learning strategy into the current initial learning strategy by applying the transfer learning algorithm. The simulations and analysis show the feasibility and adaptability of the proposed approach, especially for situations with randomly occurring observation requests.
Recently, with the increasing complexity of multiplex Unmanned Aerial Vehicle (multi-UAV) collaboration in dynamic task environments, multi-UAV systems have shown new characteristics of inter-coupling among multiplex groups and intra-correlation within groups. However, previous studies often overlooked the structural impact of dynamic risks on agents across multiplex UAV groups, a critical issue for modern multi-UAV communication. To address this problem, we integrate the influence of dynamic risks on agents across multiplex UAV group structures into a multi-UAV task-migration problem and formulate it as a partially observable Markov game. We then propose a Hybrid Attention Multi-agent Reinforcement Learning (HAMRL) algorithm, which uses attention structures to learn the dynamic characteristics of the task environment and integrates hybrid attention mechanisms to establish efficient intra- and inter-group communication aggregation for information extraction and group collaboration. Experimental results show that, in this comprehensive and challenging model, our algorithm significantly outperforms state-of-the-art algorithms in convergence speed and performance, owing to the rational design of the communication mechanisms.
With the rapid development of mobile internet technology and increasing concerns over data privacy, Federated Learning (FL) has emerged as a significant framework for training machine learning models. Given the advancements in technology, User Equipment (UE) can now process multiple computing tasks simultaneously, and since a UE can hold multiple data sources suitable for different FL tasks, multi-task FL is a promising way to serve different application requests at the same time. However, running multiple FL tasks simultaneously strains a device's computation resources and raises energy consumption, making energy consumption the central challenge. Owing to factors such as limited battery capacity and device heterogeneity, a UE may fail to complete its local training task efficiently, and some UEs with high-quality data may become stragglers. To alleviate the energy-consumption challenge in a multi-task FL environment, we design an automatic Multi-Task FL Deployment (MFLD) algorithm to reach load-balancing and energy-consumption goals. The MFLD algorithm leverages Deep Reinforcement Learning (DRL) techniques to automatically select UEs and allocate computation resources according to the task requirements. Extensive experiments validate our proposed approach and show significant improvements in task deployment success rate and energy consumption cost.
Funding information for the papers above:
Funding (RLSIOA): supported by the Project of Science and Technology Research Program of Chongqing Education Commission of China (No. KJZD-K202401105); the High-Quality Development Action Plan for Graduate Education at Chongqing University of Technology (No. gzljg2023308, No. gzljd2024204); the Graduate Innovation Program of Chongqing University of Technology (No. gzlcx20233197); and the Yunnan Provincial Key R&D Program (202203AA080006).
Funding (delay-aware MADRL for Cybertwin 6G): funded by the National Key Research and Development Program of China under Grant 2019YFB1803301 and the Beijing Natural Science Foundation (L202002).
Funding (spot-welding task assignment): National Key Research and Development Program of China, Grant/Award Number: 2021YFB1714700; Postdoctoral Research Foundation of China, Grant/Award Number: 2024M752364; Postdoctoral Fellowship Program of CPSF, Grant/Award Number: GZB20240525.
Funding (terminal multitask parallel offloading): supported by the National Natural Science Foundation of China (62202215); the Liaoning Province Applied Basic Research Program (Youth Special Project, 2023JH2/101600038); the Shenyang Youth Science and Technology Innovation Talent Support Program (RC220458); the Guangxuan Program of Shenyang Ligong University (SYLUGXRC202216); and the Basic Research Special Funds for Undergraduate Universities in Liaoning Province (LJ212410144067).
Funding (Pathfinder): supported by the National Natural Science Foundation of China under Grant No. 62372110 and the Fujian Provincial Natural Science Foundation under Grants 2023J02008 and 2024H0009.
Funding (human-supervised multi-UAV collaboration): co-supported by the National Natural Science Foundation of China (Nos. 61906203, 61876187) and the National Key Laboratory of Science and Technology on UAV, Northwestern Polytechnical University, China (No. 614230110080817).
Funding (DDQN-FL adaptive task offloading): supported by the National Natural Science Foundation of China (62032013, 62072094); the Liaoning Province Science and Technology Fund Project (2020MS086); the Shenyang Science and Technology Plan Project (20206424); the Fundamental Research Funds for the Central Universities (N2116014, N180101028); and the CERNET Innovation Project (NGII20190504).
Funding (multimodal multi-objective evolutionary algorithm): the Shanghai Pujiang Program (No. 22PJD030); the National Natural Science Foundation of China (Nos. 61603244 and 71904116); and the National Natural Science Foundation of China-Shandong Joint Fund (No. U2006228).
Funding (OGMN / PPO-TAGNA): the Project of the National Natural Science Foundation of China (Grant No. 62106283); the Project of the National Natural Science Foundation of China (Grant No. 72001214), which funded the experiments; and the Project of the Natural Science Foundation of Shaanxi Province (Grant No. 2020JQ-484).
Funding (paper not identified in this listing): supported by the National Natural Science Foundation of China (60474035); the National Research Foundation for the Doctoral Program of Higher Education of China (20050359004); and the Natural Science Foundation of Anhui Province (070412035).
Funding (NF_L_DA_DRL): supported in part by the National Natural Science Foundation of China under Grants 62201105, 62331017, and 62075024; in part by the Natural Science Foundation of Chongqing under Grant cstc2021jcyj-msxmX0404; in part by the Chongqing Municipal Education Commission under Grant KJQN202100643; and in part by the Guangdong Basic and Applied Basic Research Foundation under Grant 2022A1515110056.
Funding (ECOO): National Natural Science Foundation of China (No. 11461038); Science and Technology Support Program of Gansu Province (No. 144NKCA040).
Funding (NOMA-VEC): supported by the Talent Fund of Beijing Jiaotong University (No. 2023XKRC028), the CCF-Lenovo Blue Ocean Research Fund, and the Beijing Natural Science Foundation under Grant No. L221003.
Funding (UAV swarm DDQN): supported by the Aeronautical Science Foundation (2017ZC53033).
Funding (WDRL): funded in part by the Key Research and Promotion Projects of Henan Province under Grant Nos. 212102210079, 222102210052, 222102210007, and 222102210062.
Funding (Q-DRL workflow scheduling): funded by the Science and Technology Foundation of State Grid Corporation of China (Grant No. 5108-202218280A-2-397-XG).
Funding (HAMRL): supported by the Key Research and Development Program of Jiangsu Province of China (No. BE2022157); the National Natural Science Foundation of China (Nos. 62303111, 62076060, and 61932007); the Defense Industrial Technology Development Program (No. JCKY2021214B002); and the Fellowship of the China Postdoctoral Science Foundation (No. 2022M720715).