Exo-atmospheric vehicles are constrained by limited maneuverability, which leads to the contradiction between evasive maneuver and precision strike. To address the problem of Integrated Evasion and Impact (IEI) decision under multi-constraint conditions, a hierarchical intelligent decision-making method based on Deep Reinforcement Learning (DRL) was proposed. First, an intelligent decision-making framework of "DRL evasion decision" + "impact prediction guidance decision" was established: it takes the impact point deviation correction ability as the constraint and the maximum miss distance as the objective, and effectively solves the problem of poor decision-making performance caused by the large IEI decision space. Second, to solve the sparse reward problem faced by evasion decision-making, a hierarchical decision-making method consisting of a maneuver timing decision and a maneuver duration decision was proposed, and the corresponding Markov Decision Process (MDP) was designed. A detailed simulation experiment was designed to analyze the advantages and computational complexity of the proposed method. Simulation results show that the proposed model has good performance and low computational resource requirements. The minimum miss distance is 21.3 m under the condition of guaranteeing the impact point accuracy, and the single decision-making time is 4.086 ms on an STM32F407 single-chip microcomputer, which demonstrates the method's engineering application value.
Funding: co-supported by the National Natural Science Foundation of China (No. 62103432), the China Postdoctoral Science Foundation (No. 284881), and the Young Talent Fund of University Association for Science and Technology in Shaanxi, China (No. 20210108).
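As a rough illustration of the timing/duration split described above, the sketch below trains two tabular Q-learners over a toy one-dimensional engagement; the state discretization, dynamics, and constraint penalty are all illustrative assumptions, not the paper's model.

```python
import numpy as np

# Two-level evasion decision sketch: a timing learner decides WHEN to maneuver,
# a duration learner decides HOW LONG; the environment is a toy stand-in.
rng = np.random.default_rng(0)
N_STATES = 20                  # discretized time-to-go bins (assumption)
TIMING_ACTIONS = 2             # 0: wait, 1: start maneuver
DURATION_ACTIONS = 5           # maneuver duration levels (assumption)

q_timing = np.zeros((N_STATES, TIMING_ACTIONS))
q_duration = np.zeros((N_STATES, DURATION_ACTIONS))

def step_env(state, start, duration):
    """Hypothetical reward: maximize miss distance, but a long maneuver that
    exhausts the impact-point correction ability is heavily penalized."""
    miss = rng.normal(5.0 + 2.0 * start * duration, 1.0)   # toy dynamics
    correction_ok = duration < 4                           # toy constraint
    reward = miss if correction_ok else miss - 100.0
    return min(state + 1, N_STATES - 1), reward, state == N_STATES - 1

alpha, gamma, eps = 0.1, 0.95, 0.1
for episode in range(2000):
    s, done = 0, False
    while not done:
        a_t = rng.integers(TIMING_ACTIONS) if rng.random() < eps else q_timing[s].argmax()
        a_d = 0
        if a_t == 1:  # the duration level only acts once the timing level fires
            a_d = rng.integers(DURATION_ACTIONS) if rng.random() < eps else q_duration[s].argmax()
        s2, r, done = step_env(s, a_t, a_d)
        q_timing[s, a_t] += alpha * (r + gamma * q_timing[s2].max() - q_timing[s, a_t])
        if a_t == 1:
            q_duration[s, a_d] += alpha * (r + gamma * q_duration[s2].max() - q_duration[s, a_d])
        s = s2
```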
The rapid advancement of Industry 4.0 has revolutionized manufacturing, shifting production from centralized control to decentralized, intelligent systems. Smart factories are now expected to achieve high adaptability and resource efficiency, particularly in mass customization scenarios where production schedules must accommodate dynamic and personalized demands. To address the challenges of dynamic task allocation, uncertainty, and real-time decision-making, this paper proposes Pathfinder, a deep reinforcement learning-based scheduling framework. Pathfinder models scheduling data through three key matrices: execution time (the time required for a job to complete), completion time (the actual time at which a job is finished), and efficiency (the performance of executing a single job). By leveraging neural networks, Pathfinder extracts essential features from these matrices, enabling intelligent decision-making in dynamic production environments. Unlike traditional approaches with fixed scheduling rules, Pathfinder dynamically selects from ten diverse scheduling rules, optimizing decisions based on real-time environmental conditions. To further enhance scheduling efficiency, a specialized reward function is designed to support dynamic task allocation and real-time adjustments. This function helps Pathfinder continuously refine its scheduling strategy, improving machine utilization and minimizing job completion times. Through reinforcement learning, Pathfinder adapts to evolving production demands, ensuring robust performance in real-world applications. Experimental results demonstrate that Pathfinder outperforms traditional scheduling approaches, offering improved coordination and efficiency in smart factories. By integrating deep reinforcement learning, adaptable scheduling strategies, and an innovative reward function, Pathfinder provides an effective solution to the growing challenges of multi-robot job scheduling in mass customization environments.
Funding: supported by the National Natural Science Foundation of China under Grant No. 62372110 and the Fujian Provincial Natural Science Foundation under Grants 2023J02008 and 2024H0009.
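To make the rule-selection idea concrete, here is a toy sketch in which an epsilon-greedy value learner (a bandit stand-in for Pathfinder's DQN) picks among a handful of classical dispatch rules; the four rules, the single-machine flow-time objective, and all parameters are illustrative assumptions rather than the paper's ten-rule setup.

```python
import random

# Each "action" is a dispatch rule; the learner discovers which rule yields
# the lowest total flow time on randomly generated job sets.
RULES = {
    "SPT": lambda jobs: min(jobs, key=lambda j: j["exec"]),   # shortest processing time
    "LPT": lambda jobs: max(jobs, key=lambda j: j["exec"]),   # longest processing time
    "FIFO": lambda jobs: jobs[0],
    "EDD": lambda jobs: min(jobs, key=lambda j: j["due"]),    # earliest due date
}

def simulate(rule_name, n_jobs=30, seed=0):
    """Total flow time when one machine repeatedly applies the chosen rule."""
    rnd = random.Random(seed)
    jobs = [{"exec": rnd.uniform(1, 9), "due": rnd.uniform(5, 60)} for _ in range(n_jobs)]
    t, total = 0.0, 0.0
    while jobs:
        j = RULES[rule_name](jobs)
        jobs.remove(j)
        t += j["exec"]
        total += t          # completion time of this job accumulates into flow time
    return total

values, counts = {r: 0.0 for r in RULES}, {r: 0 for r in RULES}
rnd = random.Random(1)
for step in range(200):
    r = rnd.choice(list(RULES)) if rnd.random() < 0.1 else max(values, key=values.get)
    reward = -simulate(r, seed=step)    # shorter completion times -> higher reward
    counts[r] += 1
    values[r] += (reward - values[r]) / counts[r]
print(max(values, key=values.get))      # SPT should win on this toy objective
```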
In this work, we consider an Unmanned Aerial Vehicle (UAV)-aided covert transmission network, which adopts the uplink transmission of Communication Nodes (CNs) as a cover to facilitate covert transmission to a Primary Communication Node (PCN). Specifically, all nodes transmit to the UAV exploiting uplink Non-Orthogonal Multiple Access (NOMA), while the UAV performs covert transmission to the PCN at the same frequency. To minimize the average age of covert information, we formulate a joint optimization problem of UAV trajectory and power allocation design subject to multi-dimensional constraints, including the covertness demand, communication quality requirement, maximum flying speed, and maximum available resources. To address this problem, we embed Signomial Programming (SP) into Deep Reinforcement Learning (DRL) and propose a DRL framework capable of handling constrained Markov decision processes, named SP-embedded Soft Actor-Critic (SSAC). By adopting SSAC, we achieve the joint optimization of UAV trajectory and power allocation. Our simulations show the optimized UAV trajectory and verify the superiority of SSAC over various existing baseline schemes. The results of this study suggest that, by maintaining appropriate distances from both the PCN and CNs, one can effectively enhance the performance of covert communication by reducing the detection probability of the CNs.
Funding: co-supported by the National Natural Science Foundation of China (Nos. 62025110 and 62271093) and the Natural Science Foundation of Chongqing, China (No. CSTB2023NSCQ-LZX0108).
This paper focuses on the problem of multi-station multi-robot spot welding task assignment, and proposes a deep reinforcement learning (DRL) framework made up of a public graph attention network and independent policy networks. The welding spot distribution graph is encoded using the graph attention network. Independent policy networks with an attention mechanism as the decoder can handle the encoded graph and decide how to assign robots to different tasks. The policy network is used to convert the large-scale welding spot allocation problem into multiple small-scale single-robot welding path planning problems, and the path planning problem is quickly solved through existing methods. Then, the model is trained through reinforcement learning. In addition, a task balancing method is used to allocate tasks to multiple stations. The proposed algorithm is compared with classical algorithms, and the results show that the DRL-based algorithm can produce higher-quality solutions.
Funding: National Key Research and Development Program of China (Grant No. 2021YFB1714700), Postdoctoral Research Foundation of China (Grant No. 2024M752364), and Postdoctoral Fellowship Program of CPSF (Grant No. GZB20240525).
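A minimal sketch of the decoding step, assuming random vectors in place of the trained graph-attention embeddings: each robot scores the unassigned welding spots with scaled dot-product attention, masks the assigned ones, and decodes greedily.

```python
import numpy as np

# Attention-style assignment decoding; embeddings are random stand-ins for the
# paper's graph-attention encoder output, and the round-robin robot order,
# spot/robot counts, and embedding size are assumptions for illustration.
rng = np.random.default_rng(0)
d = 16
spot_emb = rng.normal(size=(12, d))     # encoded welding spots
robot_emb = rng.normal(size=(3, d))     # one query vector per robot

assigned = np.zeros(12, dtype=bool)
plan = {r: [] for r in range(3)}
for step in range(12):
    r = step % 3                                      # round-robin over robots
    scores = spot_emb @ robot_emb[r] / np.sqrt(d)     # scaled dot-product attention
    scores[assigned] = -np.inf                        # mask already-assigned spots
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()                              # softmax over feasible spots
    choice = int(np.argmax(probs))                    # greedy decode; sample when training
    plan[r].append(choice)
    assigned[choice] = True
print(plan)
```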
Significant breakthroughs in the Internet of Things (IoT) and 5G technologies have driven many smart healthcare activities, leading to a flood of computationally intensive applications in smart healthcare networks. Mobile Edge Computing (MEC) is considered an efficient solution for providing powerful computing capabilities to latency- or energy-sensitive nodes. The low-latency and high-reliability requirements of healthcare application services can be met through optimal offloading and resource allocation for the computational tasks of the nodes. In this study, we established a system model consisting of two types of nodes by considering non-divisible computational tasks with a trade-off between latency and energy consumption. To minimize the processing cost of the system tasks, a Mixed-Integer Nonlinear Programming (MINLP) task offloading problem is formulated. Furthermore, this problem is decomposed into a task offloading decision problem and a resource allocation problem. The resource allocation problem is solved using traditional optimization algorithms, and the offloading decision problem is solved using a deep reinforcement learning algorithm. We propose an Online Offloading based on Deep Reinforcement Learning (OO-DRL) algorithm with parallel deep neural networks and a weight-sensitive experience replay mechanism. Simulation results show that, compared with several existing methods, our proposed algorithm can perform real-time task offloading in a smart healthcare network under dynamically varying environments and reduce the system task processing cost.
Funding: supported in part by the National Natural Science Foundation of China under Grant 62371181, in part by the Changzhou Science and Technology International Cooperation Program under Grant CZ20230029, in part by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (2021R1A2B5B02087169), and in part by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (RS-202300259004), supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation).
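The parallel-candidate flavor of such offloading learners can be sketched as follows: a relaxed per-node score (standing in for the DNN output) is quantized into several candidate binary offloading vectors, each candidate's cost is evaluated, and the cheapest is kept. The cost model and quantization rule are illustrative assumptions, not the paper's MINLP formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8                                   # nodes with non-divisible tasks

def cost(x, task, rate):
    """Toy latency/energy cost: offloaded tasks (x=1) pay a transmission delay
    plus a fixed energy term; local tasks (x=0) pay a processing delay."""
    local = np.where(x == 0, task / 2.0, 0.0)
    remote = np.where(x == 1, task / rate + 0.1, 0.0)
    return float(local.sum() + remote.sum())

task = rng.uniform(1, 5, N)             # task sizes
rate = rng.uniform(1, 8, N)             # uplink rates
relaxed = rng.uniform(0, 1, N)          # stand-in for the DNN's relaxed output

# Quantization: start from plain thresholding, then cumulatively flip the
# least-confident bits to generate extra candidate decisions.
base = (relaxed > 0.5).astype(int)
confidence = np.abs(relaxed - 0.5)
candidates = [base]
for k in np.argsort(confidence)[:4]:
    c = candidates[-1].copy()
    c[k] ^= 1
    candidates.append(c)

best = min(candidates, key=lambda x: cost(x, task, rate))
print(best, cost(best, task, rate))
```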
The Virtual Power Plant (VPP), as an innovative power management architecture, achieves flexible dispatch and resource optimization of power systems by integrating distributed energy resources. However, due to significant differences in the operational costs and flexibility of various types of generation resources, the volatility and uncertainty of renewable energy sources (such as wind and solar power), and the complex variability of load demand, the scheduling optimization of virtual power plants has become a critical issue that needs to be addressed. To solve this, this paper proposes an intelligent scheduling method for virtual power plants based on Deep Reinforcement Learning (DRL), utilizing Deep Q-Networks (DQN) for real-time optimized scheduling of the dynamic peaking units (DPU) and stable baseload units (SBU) in the virtual power plant. By modeling the scheduling problem as a Markov Decision Process (MDP) and designing an optimization objective function that integrates both performance and cost, the scheduling efficiency and economic performance of the virtual power plant are significantly improved. Simulation results show that, compared with traditional scheduling methods and other deep reinforcement learning algorithms, the proposed method demonstrates significant advantages in key performance indicators: response time is shortened by up to 34%, task success rate is increased by up to 46%, and costs are reduced by approximately 26%. Experimental results verify the efficiency and scalability of the method under complex load environments and renewable energy volatility, providing strong technical support for the intelligent scheduling of virtual power plants.
Funding: supported by the National Key Research and Development Program of China, Grant No. 2020YFB0905900.
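For reference, here is a compact PyTorch sketch of the DQN temporal-difference update that such a scheduler rests on; the three-feature state, six-action dispatch space, and synthetic batch are assumptions for illustration, not the paper's VPP model.

```python
import torch
import torch.nn as nn

# Toy state: (load demand, renewable output, price); toy actions: 6 discrete
# (DPU level, SBU level) combinations. Both are illustrative assumptions.
STATE_DIM, N_ACTIONS = 3, 6

qnet = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
target = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
target.load_state_dict(qnet.state_dict())       # target net tracks the online net
opt = torch.optim.Adam(qnet.parameters(), lr=1e-3)

def td_update(batch, gamma=0.99):
    s, a, r, s2, done = batch
    q = qnet(s).gather(1, a.unsqueeze(1)).squeeze(1)     # Q(s, a) for taken actions
    with torch.no_grad():
        q_next = target(s2).max(dim=1).values            # bootstrapped target value
        y = r + gamma * (1 - done) * q_next
    loss = nn.functional.mse_loss(q, y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# One synthetic batch to show the shapes involved.
B = 32
batch = (torch.randn(B, STATE_DIM), torch.randint(0, N_ACTIONS, (B,)),
         torch.randn(B), torch.randn(B, STATE_DIM), torch.zeros(B))
print(td_update(batch))
```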
Recent studies employing deep learning to solve the traveling salesman problem (TSP) have mainly focused on learning construction heuristics. Such methods can improve TSP solutions but still depend on additional programs, while methods that focus on learning improvement heuristics to iteratively refine solutions remain insufficient. Traditional improvement heuristics are guided by a manually designed search strategy and may achieve only limited improvements. This paper proposes a novel framework for learning improvement heuristics, which automatically discovers better improvement policies for heuristics to iteratively solve the TSP. Our framework first designs a new architecture based on a transformer model to parameterize the policy network, introducing an action-dropout layer to prevent action selection from overfitting. It then proposes a deep reinforcement learning approach integrating a simulated annealing mechanism (named RL-SA) to learn the pairwise selection policy, aiming to improve the 2-opt algorithm's performance. RL-SA leverages the whale optimization algorithm to generate initial solutions for better sampling efficiency and uses a Gaussian perturbation strategy to tackle the sparse reward problem of reinforcement learning. The experimental results show that the proposed approach is significantly superior to state-of-the-art learning-based methods, and further reduces the gap between learning-based methods and highly optimized solvers on the benchmark datasets. Moreover, our pre-trained model M can be applied to guide the SA algorithm (named M-SA (ours)), which performs better than existing deep models on small-, medium-, and large-scale TSPLIB datasets. Additionally, M-SA (ours) achieves excellent generalization performance on a real-world dataset of global liner shipping routes, with optimization percentages in distance reduction ranging from 3.52% to 17.99%.
Funding: supported by the National Natural Science Foundation of China (Grant Nos. 72101046 and 61672128).
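The classical mechanism that RL-SA learns to steer, 2-opt moves filtered by a simulated-annealing acceptance rule, can be sketched on a random instance as follows; the cooling schedule and instance size are arbitrary choices, and the learned policy would replace the uniform move sampling.

```python
import math
import random

random.seed(0)
pts = [(random.random(), random.random()) for _ in range(40)]

def tour_len(tour):
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

tour = list(range(len(pts)))
best, best_len = tour[:], tour_len(tour)
T = 1.0
for it in range(20000):
    i, j = sorted(random.sample(range(len(pts)), 2))
    cand = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]      # 2-opt: reverse a segment
    delta = tour_len(cand) - tour_len(tour)
    if delta < 0 or random.random() < math.exp(-delta / T):   # SA acceptance rule
        tour = cand
        if tour_len(tour) < best_len:
            best, best_len = tour[:], tour_len(tour)
    T = max(1e-3, T * 0.9995)                                 # geometric cooling
print(round(best_len, 3))
```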
In the wake of major natural or human-made disasters, the communication infrastructure within disaster-stricken areas is frequently damaged. Unmanned aerial vehicles (UAVs), thanks to merits such as rapid deployment and high mobility, are commonly regarded as an ideal option for constructing temporary communication networks. Considering the limited computing capability and battery power of UAVs, this paper proposes a two-layer UAV cooperative computing offloading strategy for emergency disaster relief scenarios. The multi-agent twin delayed deep deterministic policy gradient (MATD3) algorithm, integrated with prioritized experience replay (PER), is utilized to jointly optimize the scheduling strategies of the UAVs, the task offloading ratios, and UAV mobility, aiming to minimize the energy consumption and delay of the system. To address this non-convex optimization problem, a Markov decision process (MDP) is established. Simulation results demonstrate that, compared with four baseline algorithms, the proposed algorithm exhibits better convergence performance, verifying its feasibility and efficacy.
Funding: supported by the Basic Scientific Research Business Fund Project of Higher Education Institutions in Heilongjiang Province (145409601) and the First Batch of Experimental Teaching and Teaching Laboratory Construction Research Projects in Heilongjiang Province (SJGZ20240038).
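As a reference point for the PER component, below is a minimal proportional prioritized replay buffer; the list-based storage (a sum-tree is typically used at scale), the alpha/beta hyper-parameters, and the dummy transitions are illustrative assumptions.

```python
import random

class PERBuffer:
    """Proportional prioritized experience replay: sampling probability is
    proportional to |TD error|^alpha, with importance-sampling weights."""

    def __init__(self, capacity=10000, alpha=0.6):
        self.data, self.prio, self.alpha, self.cap = [], [], alpha, capacity

    def add(self, transition, td_error=1.0):
        if len(self.data) >= self.cap:
            self.data.pop(0)
            self.prio.pop(0)
        self.data.append(transition)
        self.prio.append((abs(td_error) + 1e-6) ** self.alpha)

    def sample(self, k, beta=0.4):
        total = sum(self.prio)
        probs = [p / total for p in self.prio]
        idx = random.choices(range(len(self.data)), weights=probs, k=k)
        n = len(self.data)
        weights = [(n * probs[i]) ** (-beta) for i in idx]    # IS correction
        w_max = max(weights)
        return idx, [self.data[i] for i in idx], [w / w_max for w in weights]

    def update(self, idx, td_errors):
        for i, e in zip(idx, td_errors):
            self.prio[i] = (abs(e) + 1e-6) ** self.alpha

buf = PERBuffer()
for t in range(100):
    buf.add(("s", t % 4, -1.0, "s2"), td_error=random.random())
print(buf.sample(8)[0])
```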
This study proposes an automatic control system for Autonomous Underwater Vehicle (AUV) docking, utilizing a digital twin (DT) environment based on the HoloOcean platform, which integrates six-degree-of-freedom (6-DOF) motion equations and hydrodynamic coefficients to create a realistic simulation. Although conventional model-based and visual servoing approaches often struggle in dynamic underwater environments due to limited adaptability and extensive parameter tuning requirements, deep reinforcement learning (DRL) offers a promising alternative. In the positioning stage, the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm is employed for synchronized depth and heading control, offering stable training, reduced overestimation bias, and superior handling of continuous control compared to other DRL methods. During the searching stage, zig-zag heading motion combined with a state-of-the-art object detection algorithm facilitates docking station localization. For the docking stage, this study proposes an innovative Image-based DDPG (I-DDPG), enhanced and trained in a Unity-MATLAB simulation environment, to achieve visual target tracking. Furthermore, integrating a DT environment enables efficient and safe policy training, reduces dependence on costly real-world tests, and improves sim-to-real transfer performance. Both simulation and real-world experiments were conducted, demonstrating the effectiveness of the system in improving AUV control strategies and supporting the transition from simulation to real-world operations in underwater environments. The results highlight the scalability and robustness of the proposed system, as evidenced by the TD3 controller achieving 25% less oscillation than an adaptive fuzzy controller when reaching the target depth, thereby demonstrating superior stability, accuracy, and potential for broader and more complex autonomous underwater tasks.
Funding: supported by the National Science and Technology Council, Taiwan [Grant NSTC 111-2628-E-006-005-MY3], supported by the Ocean Affairs Council, Taiwan, and sponsored in part by the Higher Education Sprout Project, Ministry of Education, to the Headquarters of University Advancement at National Cheng Kung University (NCKU).
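The two TD3 ingredients credited here, twin target critics and clipped target-policy smoothing, amount to the target computation sketched below in PyTorch; the state/action dimensions and noise scales are common defaults, not the paper's settings.

```python
import torch
import torch.nn as nn

# Toy dims: 8 state features, 2 continuous actions (depth and heading commands).
S, A = 8, 2
actor_t = nn.Sequential(nn.Linear(S, 64), nn.ReLU(), nn.Linear(64, A), nn.Tanh())
q1_t = nn.Sequential(nn.Linear(S + A, 64), nn.ReLU(), nn.Linear(64, 1))
q2_t = nn.Sequential(nn.Linear(S + A, 64), nn.ReLU(), nn.Linear(64, 1))

def td3_target(r, s2, done, gamma=0.99, sigma=0.2, clip=0.5):
    with torch.no_grad():
        a2 = actor_t(s2)
        noise = (torch.randn_like(a2) * sigma).clamp(-clip, clip)  # target smoothing
        a2 = (a2 + noise).clamp(-1.0, 1.0)
        sa = torch.cat([s2, a2], dim=1)
        q_min = torch.min(q1_t(sa), q2_t(sa)).squeeze(1)           # twin-critic min
        return r + gamma * (1 - done) * q_min                      # curbs overestimation

B = 16
y = td3_target(torch.randn(B), torch.randn(B, S), torch.zeros(B))
print(y.shape)   # torch.Size([16])
```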
This paper investigates a distributed heterogeneous hybrid blocking flow-shop scheduling problem (DHHBFSP) designed to minimize total tardiness and total energy consumption simultaneously, and proposes an improved proximal policy optimization (IPPO) method to make real-time decisions for the DHHBFSP. A multi-objective Markov decision process is modeled for the DHHBFSP, where the reward function is represented by a vector with dynamic weights instead of the common objective-related scalar value. A factory agent (FA) is formulated for each factory to select unscheduled jobs and is trained by the proposed IPPO to improve decision quality. Multiple FAs work asynchronously to allocate jobs that arrive randomly at the shop. A two-stage training strategy is introduced in the IPPO, which learns from both single- and dual-policy data for better data utilization. The proposed IPPO is tested on randomly generated instances and compared with variants of the basic proximal policy optimization (PPO), dispatch rules, multi-objective metaheuristics, and multi-agent reinforcement learning methods. Extensive experimental results suggest that the proposed strategies offer significant improvements over the basic PPO, and that the proposed IPPO outperforms state-of-the-art scheduling methods in both convergence and solution quality.
Funding: partially supported by the National Key Research and Development Program of the Ministry of Science and Technology of China (2022YFE0114200) and the National Natural Science Foundation of China (U20A6004).
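The vector-reward idea can be illustrated in a few lines: per episode, a fresh weight vector scalarizes the (tardiness, energy) reward increments that the agent observes; the Dirichlet weight sampling and toy reward increments are assumptions for illustration, not the paper's scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

def scalarize(reward_vec, w):
    """Weighted sum of a vector reward; the weights vary across episodes so a
    single policy is exposed to different objective trade-offs."""
    return float(np.dot(w, reward_vec))

for episode in range(3):
    w = rng.dirichlet([1.0, 1.0])               # dynamic weights over 2 objectives
    ep_return = 0.0
    for step in range(5):
        r_vec = np.array([-rng.uniform(0, 3),   # negative tardiness increment
                          -rng.uniform(0, 2)])  # negative energy increment
        ep_return += scalarize(r_vec, w)
    print(f"episode {episode}: w={np.round(w, 2)}, return={ep_return:.2f}")
```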
Opportunistic mobile crowdsensing (MCS) non-intrusively exploits human mobility trajectories, and participants' smart devices acting as sensors have become a promising paradigm for various urban data acquisition tasks. In practice, however, opportunistic MCS faces several challenges from the perspectives of both MCS participants and the data platform. On the one hand, participants face uncertainties in conducting MCS tasks, including their mobility and implicit interactions among participants, and the economic returns given to participants by the MCS data platform are determined not only by their own actions but also by other participants' strategic actions. On the other hand, the platform can only observe the participants' uploaded sensing data, which depends on the unknown effort/action exerted by the participants, while, to optimize its overall objective, the platform needs to properly reward certain participants to incentivize them to provide high-quality data. To address the challenge of balancing individual incentives and platform objectives in MCS, this paper proposes MARCS, an online sensing policy based on multi-agent deep reinforcement learning (MADRL) with centralized training and decentralized execution (CTDE). Specifically, the interactions between MCS participants and the data platform are modeled as a partially observable Markov game, where participants, acting as agents, use DRL-based policies to make decisions based on local observations, such as task trajectories and platform payments. To align individual and platform goals effectively, the platform leverages the Shapley value to estimate the contribution of each participant's sensed data, using these estimates as immediate rewards to guide agent training. Experimental results on real mobility trajectory datasets indicate that the revenue of MARCS is almost 35%, 53%, and 100% higher than that of DDPG, Actor-Critic, and model predictive control (MPC), respectively, on the participant side, with similar results on the platform side, showing superior performance compared to the baselines.
Funding: sponsored by the Qinglan Project of Jiangsu Province and the Jiangsu Provincial Key Research and Development Program (No. BE2020084-1).
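The Shapley-value reward can be approximated by sampling permutations and accumulating marginal contributions, as in this sketch; the participant set and the platform value function are illustrative stand-ins for the paper's data-quality revenue model.

```python
import random

random.seed(0)
PARTICIPANTS = ["p1", "p2", "p3", "p4"]
quality = {"p1": 3.0, "p2": 1.0, "p3": 2.0, "p4": 0.5}   # toy data quality

def v(coalition):
    """Toy platform value: diminishing returns over summed data quality."""
    return sum(quality[p] for p in coalition) ** 0.5

def shapley_mc(n_perms=5000):
    """Monte Carlo Shapley estimate: average each participant's marginal
    contribution over random arrival orders."""
    phi = {p: 0.0 for p in PARTICIPANTS}
    for _ in range(n_perms):
        perm = random.sample(PARTICIPANTS, len(PARTICIPANTS))
        coalition, prev = [], 0.0
        for p in perm:
            coalition.append(p)
            val = v(coalition)
            phi[p] += val - prev        # marginal contribution of p
            prev = val
    return {p: s / n_perms for p, s in phi.items()}

print({p: round(s, 3) for p, s in shapley_mc().items()})
```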
With the development of the future Web of Healthcare Things (WoHT), there will be a trend of densely deploying medical sensors with massive simultaneous online communication requirements. The dense deployment and simultaneous online communication of massive numbers of medical sensors will inevitably generate overlapping interference, making it extremely challenging to support data transmission at a medical-grade quality-of-service level. To handle this challenge, this paper proposes a hypergraph interference coordination-aided resource allocation method based on Deep Reinforcement Learning (DRL). Specifically, we build a novel hypergraph interference model for the considered WoHT by analyzing the impact of the overlapping interference. Due to the high complexity of directly solving the hypergraph interference model, the original resource allocation problem is converted into a sequential decision-making problem through Markov Decision Process (MDP) modeling. Then, a policy- and value-based resource allocation algorithm is proposed to solve this problem under simultaneous online communication and dense deployment. In addition, to enhance the agent's ability to explore the optimal allocation strategy, we propose a resource allocation algorithm with an asynchronous parallel architecture. Simulation results verify that the proposed algorithms achieve higher network throughput than existing algorithms in the considered WoHT scenario.
Funding: supported in part by the National Natural Science Foundation of China under Grant No. 62301094, in part by the Researchers Supporting Project Number (RSPD2024R681), King Saud University, Riyadh, Saudi Arabia, and in part by the Science and Technology Research Program of the Chongqing Education Commission of China under Grants KJQN202201157 and KJQN202301135.
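To illustrate what a hypergraph interference model buys, the sketch below groups mutually interfering sensors into hyperedges and greedily assigns resource blocks so that no hyperedge reuses one; the topology and block count are toy assumptions, and the paper's DRL agent would replace this greedy rule.

```python
# Each hyperedge is a set of sensors whose transmissions overlap; unlike a plain
# graph, a hyperedge captures interference among more than two sensors at once.
sensors = range(6)
hyperedges = [{0, 1, 2}, {2, 3}, {3, 4, 5}]     # overlapping-interference groups
N_BLOCKS = 3

assignment = {}
for s in sensors:
    # Blocks already used by any sensor sharing a hyperedge with s.
    conflicting = {assignment[o] for e in hyperedges if s in e
                   for o in e if o != s and o in assignment}
    free = [b for b in range(N_BLOCKS) if b not in conflicting]
    assignment[s] = free[0] if free else -1      # -1: defer to the learned policy
print(assignment)
```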
Despite its immense potential, the application of digital twin technology in real industrial scenarios still faces numerous challenges. This study focuses on industrial assembly lines in sectors such as microelectronics, pharmaceuticals, and food packaging, where precision and speed are paramount, applying digital twin technology to the robotic assembly process. The innovation of this research lies in the development of a digital twin architecture and system for Delta robots that is suitable for real industrial environments. Based on this system, a deep reinforcement learning algorithm for obstacle-avoidance path planning in Delta robots has been developed, significantly enhancing learning efficiency through an improved intermediate reward mechanism. Experiments on communication and interaction between the digital twin system and the physical robot validate the effectiveness of this method. The system not only deepens the integration of digital twin technology, deep reinforcement learning, and robotics, offering an efficient solution for path planning and target grasping in Delta robots, but also underscores the transformative potential of digital twin technology in intelligent manufacturing, with broad applicability across diverse industrial domains.
Funding: supported in part by the National Natural Science Foundation of China under Grants 62303098 and 62173073, in part by the China Postdoctoral Science Foundation under Grant 2022M720679, in part by the Central University Basic Research Fund of China under Grant N2304021, and in part by the Liaoning Provincial Science and Technology Plan Project-Technology Innovation Guidance of the Science and Technology Department under Grant 2023JH1/10400011.
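An intermediate reward of the kind described is often built with potential-based shaping on distance-to-goal, sketched below, so that learning receives dense feedback between start and grasp; the coefficients, goal position, and collision penalty are illustrative assumptions, not the paper's exact mechanism.

```python
import math

GOAL = (1.0, 0.5, 0.2)   # hypothetical target position in the workspace

def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def shaped_reward(prev_pos, pos, collided, gamma=0.99):
    """Potential-based shaping: gamma * Phi(s') - Phi(s) with Phi = -distance
    to goal, which densifies the reward without changing the optimal policy."""
    if collided:
        return -10.0                             # terminal collision penalty
    shaping = gamma * (-dist(pos, GOAL)) - (-dist(prev_pos, GOAL))
    step_cost = -0.01                            # discourage long detours
    return shaping + step_cost

print(shaped_reward((0, 0, 0), (0.1, 0.05, 0.02), False))
```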
In parallel steering coordination control strategies for path tracking, it is difficult for a driver steering model with fixed parameters to match the actual driver, and a steering coordination control strategy designed under a single objective and simple conditions struggles to accommodate multi-dimensional state-variable inputs. In this paper, we propose a deep reinforcement learning algorithm-based multi-objective parallel human-machine steering coordination strategy for path tracking that considers driver misoperation and external disturbance. First, the driver steering mathematical model is constructed based on the driver's preview characteristics and steering delay response, and the driver characteristic parameters are fitted after collecting actual driver driving data. Second, considering that the vehicle is susceptible to external disturbances during driving, a Tube MPC (Tube Model Predictive Control)-based path tracking steering controller is designed based on the vehicle system dynamics error model. After verifying that the driver steering model matches real driver steering operation characteristics, the DQN (Deep Q-Network), DDPG (Deep Deterministic Policy Gradient), and TD3 (Twin Delayed Deep Deterministic Policy Gradient) deep reinforcement learning algorithms are utilized to design a multi-objective parallel steering coordination strategy that handles the vehicle's multi-dimensional state-variable inputs. Finally, evaluation indexes for tracking accuracy, lateral safety, human-machine conflict, and driver steering load are designed for different driver operation states and road environments, and the performance of parallel steering coordination control strategies based on different deep reinforcement learning algorithms and fuzzy algorithms is compared through simulations and hardware-in-the-loop experiments. The results show that the parallel steering coordination strategy based on a deep reinforcement learning algorithm can more effectively assist the driver in tracking the target path under lateral wind interference and driver misoperation, and that the TD3-based coordination control strategy has better overall performance.
Funding: supported by the National Natural Science Foundation of China (Grant Nos. U22A20246 and 52372382), the Hefei Municipal Natural Science Foundation (Grant No. 2022008), the Open Fund of the State Key Laboratory of Mechanical Behavior and System Safety of Traffic Engineering Structures (Grant No. KF2023-06), and the S&T Program of Hebei (Grant No. 225676162GH).
In Heterogeneous Vehicle-to-Everything Networks (HVNs), multiple users such as vehicles, handheld devices, and infrastructure can communicate with each other to obtain more advanced services. However, the increasing number of entities accessing HVNs presents a huge technical challenge for allocating the limited wireless resources. Traditional model-driven resource allocation approaches are no longer applicable because of the rich data and the interference problem of multiple communication modes reusing resources in HVNs. In this paper, we investigate a wireless resource allocation scheme, including power control and spectrum allocation, based on a resource block reuse strategy. To meet the high capacity requirements of cellular users and the high reliability requirements of Vehicle-to-Vehicle (V2V) user pairs, we propose a data-driven Multi-Agent Deep Reinforcement Learning (MADRL) resource allocation scheme for the HVN. Simulation results demonstrate that, compared to existing algorithms, the proposed MADRL-based scheme achieves a high sum capacity and probability of successful V2V transmission, while providing close-to-limit performance.
Underwater wireless sensor networks (UWSNs) have emerged as a new paradigm of real-time organized systems, utilized in a diverse array of scenarios to manage the underwater environment surrounding them. One of the major challenges these systems confront is topology control via clustering, which reduces the overload of wireless communications within a network and ensures low energy consumption and good scalability. This study presents a clustering technique in which the clustering process and cluster head (CH) selection are performed based on the Markov decision process and deep reinforcement learning (DRL). The DRL algorithm selects the CH by maximizing a defined reward function. Subsequently, the sensed data are collected by the CHs and then sent to the autonomous underwater vehicles. In the final phase, the energy consumed by each sensor is calculated and its residual energy is updated. The autonomous underwater vehicle performs all clustering and CH selection operations. This procedure persists until the point of cessation, when the sensors' power has been reduced to such an extent that no node can become a CH. Analysis of the findings from this investigation, and their comparison with alternative frameworks, shows that this method can be used to control the cluster size and the number of CHs, which ultimately improves the energy usage of nodes and prolongs the lifespan of the network. Our simulation results illustrate that the suggested methodology surpasses the conventional low-energy adaptive clustering hierarchy, the distance- and energy-constrained K-means clustering scheme, and the vector-based forwarding protocol, and is viable for deployment in an actual operational environment.
The advent of the internet-of-everything era has led to the increased use of mobile edge computing. The rise of artificial intelligence has provided many possibilities for the low-latency task-offloading demands of users, but existing technologies rigidly assume that there is only one task to be offloaded in each time slot at the terminal. In practical scenarios, there are often numerous computing tasks to be executed at the terminal, leading to a cumulative delay for subsequent task offloading. Therefore, the efficient processing of multiple computing tasks on the terminal has become highly challenging. To address the low-latency offloading requirements for multiple computational tasks on terminal devices, we propose a terminal multitask parallel offloading algorithm based on deep reinforcement learning. Specifically, we first establish a mobile edge computing system model consisting of a single edge server and multiple terminal users. We then model the task offloading decision problem as a Markov decision process, and solve it using the Dueling Deep-Q Network algorithm to obtain the optimal offloading strategy. Experimental results demonstrate that, under the same constraints, our proposed algorithm reduces the average system latency.
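For reference, the dueling head that distinguishes a Dueling Deep-Q Network combines a state-value stream and an advantage stream with mean-subtracted aggregation, as in this PyTorch sketch; the layer sizes and the size of the offloading action space are assumptions for illustration.

```python
import torch
import torch.nn as nn

class DuelingQ(nn.Module):
    """Dueling head: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""

    def __init__(self, state_dim=10, n_actions=4):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU())
        self.value = nn.Linear(64, 1)            # state-value stream V(s)
        self.adv = nn.Linear(64, n_actions)      # advantage stream A(s, a)

    def forward(self, s):
        h = self.trunk(s)
        v, a = self.value(h), self.adv(h)
        # Mean subtraction keeps the V/A decomposition identifiable.
        return v + a - a.mean(dim=1, keepdim=True)

q = DuelingQ()
print(q(torch.randn(5, 10)).shape)               # torch.Size([5, 4])
```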
This paper investigates a wireless powered communication network (WPCN) facilitated by an unmanned aerial vehicle (UAV) in Internet of Things (IoT) networks, where multiple IoT devices (IoTDs) gather energy from a terrestrial energy station (ES) during the wireless energy transfer (WET) stage, followed by the UAV collecting data from these powered IoTDs with the time division multiple access (TDMA) protocol in the wireless information transfer (WIT) stage. To overcome the radio propagation challenges caused by obstructions, we incorporate a reconfigurable intelligent surface (RIS) to enhance the link quality of the ES-IoTD and IoTD-UAV links. The primary objective is to maximize the average sum rate of all IoTDs by jointly optimizing the UAV trajectory, ES transmit power, and RIS phase shifts, along with the time allocation for WET and WIT. To this end, we reformulate the optimization problem as a Markov decision process (MDP) and introduce a deep reinforcement learning (DRL) approach to address it, called the proximal policy optimization (PPO)-based energy harvesting with trajectory design and phase shift optimization (PPO-EHTDPS) algorithm. By continuously exploring the environment, the PPO algorithm refines its policy to optimize the UAV trajectory, the energy phase shifts, the ES transmit power, and the WET/WIT time allocation. Additionally, a continuous phase shift optimization algorithm is employed to determine the information phase shifts for each IoTD to maximize the average sum rate. Finally, numerical results demonstrate that the proposed PPO-EHTDPS algorithm achieves a significantly higher average sum rate and better convergence performance than the benchmark algorithms.
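At the core of any PPO-based scheme such as PPO-EHTDPS is the clipped surrogate objective, sketched below; the clipping coefficient and the synthetic batch are standard/illustrative choices rather than the paper's settings.

```python
import torch

def ppo_clip_loss(logp_new, logp_old, adv, eps=0.2):
    """PPO clipped surrogate: limit how far the new policy moves per update."""
    ratio = torch.exp(logp_new - logp_old)               # pi_new / pi_old
    unclipped = ratio * adv
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * adv
    return -torch.min(unclipped, clipped).mean()         # maximize the surrogate

# Synthetic batch to show the shapes; advantages come from GAE in practice.
B = 64
logp_old = torch.randn(B)
logp_new = logp_old + 0.05 * torch.randn(B)
adv = torch.randn(B)
print(ppo_clip_loss(logp_new, logp_old, adv).item())
```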
Most blind image quality assessment (BIQA) methods require a large amount of time to collect human opinion scores as training labels, which limits their usability in practice. Thus, we present an opinion-unaware BIQA method based on deep reinforcement learning which is trained without subjective scores, named DRL-IQA. Inspired by the human visual perception process, our model is formulated as a quality-reinforced agent, which consists of a dynamic distortion generation part and a quality perception part. By considering the image distortion degradation process as a sequential decision-making process, the dynamic distortion generation part can develop a strategy to add as many different distortions as possible to an image, which enriches the distortion space to alleviate overfitting. A reward function calculated from the quality degradation after adding distortion is utilized to continuously optimize the strategy. Furthermore, the quality perception part can extract rich quality features from the quality degradation process without using subjective scores, and accurately predict the state values that represent the image quality. Experimental results reveal that our method achieves competitive quality prediction performance compared to other state-of-the-art BIQA methods.
The overall performance of multi-robot collaborative systems is significantly affected by multi-robot task allocation. To improve the effectiveness, robustness, and safety of multi-robot collaborative systems, a multimodal multi-objective evolutionary algorithm based on deep reinforcement learning is proposed in this paper. The improved multimodal multi-objective evolutionary algorithm is used to solve multi-robot task allocation problems. Moreover, a deep reinforcement learning strategy is used in the last generation to provide a high-quality path for each assigned robot in an end-to-end manner. Comparisons with three popular multimodal multi-objective evolutionary algorithms on three different scenarios of multi-robot task allocation problems are carried out to verify the performance of the proposed algorithm. The experimental test results show that the proposed algorithm can generate sufficient equivalent schemes to improve the availability and robustness of multi-robot collaborative systems in uncertain environments, and also produces the best scheme to improve the overall task execution efficiency of multi-robot collaborative systems.
基金co-supported by the National Natural Science Foundation of China(No.62103432)the China Postdoctoral Science Foundation(No.284881)the Young Talent fund of University Association for Science and Technology in Shaanxi,China(No.20210108)。
文摘Exo-atmospheric vehicles are constrained by limited maneuverability,which leads to the contradiction between evasive maneuver and precision strike.To address the problem of Integrated Evasion and Impact(IEI)decision under multi-constraint conditions,a hierarchical intelligent decision-making method based on Deep Reinforcement Learning(DRL)was proposed.First,an intelligent decision-making framework of“DRL evasion decision”+“impact prediction guidance decision”was established:it takes the impact point deviation correction ability as the constraint and the maximum miss distance as the objective,and effectively solves the problem of poor decisionmaking effect caused by the large IEI decision space.Second,to solve the sparse reward problem faced by evasion decision-making,a hierarchical decision-making method consisting of maneuver timing decision and maneuver duration decision was proposed,and the corresponding Markov Decision Process(MDP)was designed.A detailed simulation experiment was designed to analyze the advantages and computational complexity of the proposed method.Simulation results show that the proposed model has good performance and low computational resource requirement.The minimum miss distance is 21.3 m under the condition of guaranteeing the impact point accuracy,and the single decision-making time is 4.086 ms on an STM32F407 single-chip microcomputer,which has engineering application value.
基金supported by National Natural Science Foundation of China under Grant No.62372110Fujian Provincial Natural Science of Foundation under Grants 2023J02008,2024H0009.
文摘The rapid advancement of Industry 4.0 has revolutionized manufacturing,shifting production from centralized control to decentralized,intelligent systems.Smart factories are now expected to achieve high adaptability and resource efficiency,particularly in mass customization scenarios where production schedules must accommodate dynamic and personalized demands.To address the challenges of dynamic task allocation,uncertainty,and realtime decision-making,this paper proposes Pathfinder,a deep reinforcement learning-based scheduling framework.Pathfinder models scheduling data through three key matrices:execution time(the time required for a job to complete),completion time(the actual time at which a job is finished),and efficiency(the performance of executing a single job).By leveraging neural networks,Pathfinder extracts essential features from these matrices,enabling intelligent decision-making in dynamic production environments.Unlike traditional approaches with fixed scheduling rules,Pathfinder dynamically selects from ten diverse scheduling rules,optimizing decisions based on real-time environmental conditions.To further enhance scheduling efficiency,a specialized reward function is designed to support dynamic task allocation and real-time adjustments.This function helps Pathfinder continuously refine its scheduling strategy,improving machine utilization and minimizing job completion times.Through reinforcement learning,Pathfinder adapts to evolving production demands,ensuring robust performance in real-world applications.Experimental results demonstrate that Pathfinder outperforms traditional scheduling approaches,offering improved coordination and efficiency in smart factories.By integrating deep reinforcement learning,adaptable scheduling strategies,and an innovative reward function,Pathfinder provides an effective solution to the growing challenges of multi-robot job scheduling in mass customization environments.
基金This study was co-supported by the National Natural Science Foundation of China(No.62025110&62271093)the Natural Science Foundation of Chongqing,China(No.CSTB2023NSCQ-LZX0108).
文摘In this work,we consider an Unmanned Aerial Vehicle(UAV)-aided covert transmission network,which adopts the uplink transmission of Communication Nodes(CNs)as a cover to facilitate covert transmission to a Primary Communication Node(PCN).Specifically,all nodes transmit to the UAV exploiting uplink non-Orthogonal Multiple Access(NOMA),while the UAV performs covert transmission to the PCN at the same frequency.To minimize the average age of covert information,we formulate a joint optimization problem of UAV trajectory and power allocation designing subject to multi-dimensional constraints including covertness demand,communication quality requirement,maximum flying speed,and the maximum available resources.To address this problem,we embed Signomial Programming(SP)into Deep Reinforcement Learning(DRL)and propose a DRL framework capable of handling the constrained Markov decision processes,named SP embedded Soft Actor-Critic(SSAC).By adopting SSAC,we achieve the joint optimization of UAV trajectory and power allocation.Our simulations show the optimized UAV trajectory and verify the superiority of SSAC compared with various existing baseline schemes.The results of this study suggest that by maintaining appropriate distances from both the PCN and CNs,one can effectively enhance the performance of covert communication by reducing the detection probability of the CNs.
基金National Key Research and Development Program of China,Grant/Award Number:2021YFB1714700Postdoctoral Research Foundation of China,Grant/Award Number:2024M752364Postdoctoral Fellowship Program of CPSF,Grant/Award Number:GZB20240525。
文摘This paper focuses on the problem of multi-station multi-robot spot welding task assignment,and proposes a deep reinforcement learning(DRL)framework,which is made up of a public graph attention network and independent policy networks.The graph of welding spots distribution is encoded using the graph attention network.Independent policy networks with attention mechanism as a decoder can handle the encoded graph and decide to assign robots to different tasks.The policy network is used to convert the large scale welding spots allocation problem to multiple small scale singlerobot welding path planning problems,and the path planning problem is quickly solved through existing methods.Then,the model is trained through reinforcement learning.In addition,the task balancing method is used to allocate tasks to multiple stations.The proposed algorithm is compared with classical algorithms,and the results show that the algorithm based on DRL can produce higher quality solutions.
基金supported in part by the National Natural Science Foundation of China under Grant 62371181in part by the Changzhou Science and Technology International Cooperation Program under Grant CZ20230029+1 种基金supported by the National Research Foundation of Korea(NRF)grant funded by the Korea government.(MSIT)(2021R1A2B5B02087169)supported by the MSIT(Ministry of Science and ICT),Korea,under the ITRC(Information Technology Research Center)support program(RS-202300259004)supervised by the IITP(Institute for Information&Communications Technology Planning&Evaluation)。
文摘Significant breakthroughs in the Internet of Things(IoT)and 5G technologies have driven several smart healthcare activities,leading to a flood of computationally intensive applications in smart healthcare networks.Mobile Edge Computing(MEC)is considered as an efficient solution to provide powerful computing capabilities to latency or energy sensitive nodes.The low-latency and high-reliability requirements of healthcare application services can be met through optimal offloading and resource allocation for the computational tasks of the nodes.In this study,we established a system model consisting of two types of nodes by considering nondivisible and trade-off computational tasks between latency and energy consumption.To minimize processing cost of the system tasks,a Mixed-Integer Nonlinear Programming(MINLP)task offloading problem is proposed.Furthermore,this problem is decomposed into task offloading decisions and resource allocation problems.The resource allocation problem is solved using traditional optimization algorithms,and the offloading decision problem is solved using a deep reinforcement learning algorithm.We propose an Online Offloading based on the Deep Reinforcement Learning(OO-DRL)algorithm with parallel deep neural networks and a weightsensitive experience replay mechanism.Simulation results show that,compared with several existing methods,our proposed algorithm can perform real-time task offloading in a smart healthcare network in dynamically varying environments and reduce the system task processing cost.
基金supported by the National Key Research and Development Program of China,Grant No.2020YFB0905900.
文摘The Virtual Power Plant(VPP),as an innovative power management architecture,achieves flexible dispatch and resource optimization of power systems by integrating distributed energy resources.However,due to significant differences in operational costs and flexibility of various types of generation resources,as well as the volatility and uncertainty of renewable energy sources(such as wind and solar power)and the complex variability of load demand,the scheduling optimization of virtual power plants has become a critical issue that needs to be addressed.To solve this,this paper proposes an intelligent scheduling method for virtual power plants based on Deep Reinforcement Learning(DRL),utilizing Deep Q-Networks(DQN)for real-time optimization scheduling of dynamic peaking unit(DPU)and stable baseload unit(SBU)in the virtual power plant.By modeling the scheduling problem as a Markov Decision Process(MDP)and designing an optimization objective function that integrates both performance and cost,the scheduling efficiency and economic performance of the virtual power plant are significantly improved.Simulation results show that,compared with traditional scheduling methods and other deep reinforcement learning algorithms,the proposed method demonstrates significant advantages in key performance indicators:response time is shortened by up to 34%,task success rate is increased by up to 46%,and costs are reduced by approximately 26%.Experimental results verify the efficiency and scalability of the method under complex load environments and the volatility of renewable energy,providing strong technical support for the intelligent scheduling of virtual power plants.
基金Project supported by the National Natural Science Foundation of China(Grant Nos.72101046 and 61672128)。
文摘Recent studies employing deep learning to solve the traveling salesman problem(TSP)have mainly focused on learning construction heuristics.Such methods can improve TSP solutions,but still depend on additional programs.However,methods that focus on learning improvement heuristics to iteratively refine solutions remain insufficient.Traditional improvement heuristics are guided by a manually designed search strategy and may only achieve limited improvements.This paper proposes a novel framework for learning improvement heuristics,which automatically discovers better improvement policies for heuristics to iteratively solve the TSP.Our framework first designs a new architecture based on a transformer model to make the policy network parameterized,which introduces an action-dropout layer to prevent action selection from overfitting.It then proposes a deep reinforcement learning approach integrating a simulated annealing mechanism(named RL-SA)to learn the pairwise selected policy,aiming to improve the 2-opt algorithm's performance.The RL-SA leverages the whale optimization algorithm to generate initial solutions for better sampling efficiency and uses the Gaussian perturbation strategy to tackle the sparse reward problem of reinforcement learning.The experiment results show that the proposed approach is significantly superior to the state-of-the-art learning-based methods,and further reduces the gap between learning-based methods and highly optimized solvers in the benchmark datasets.Moreover,our pre-trained model M can be applied to guide the SA algorithm(named M-SA(ours)),which performs better than existing deep models in small-,medium-,and large-scale TSPLIB datasets.Additionally,the M-SA(ours)achieves excellent generalization performance in a real-world dataset on global liner shipping routes,with the optimization percentages in distance reduction ranging from3.52%to 17.99%.
基金supported by the Basic Scientific Research Business Fund Project of Higher Education Institutions in Heilongjiang Province(145409601)the First Batch of Experimental Teaching and Teaching Laboratory Construction Research Projects in Heilongjiang Province(SJGZ20240038).
文摘In the wake of major natural disasters or human-made disasters,the communication infrastruc-ture within disaster-stricken areas is frequently dam-aged.Unmanned aerial vehicles(UAVs),thanks to their merits such as rapid deployment and high mobil-ity,are commonly regarded as an ideal option for con-structing temporary communication networks.Con-sidering the limited computing capability and battery power of UAVs,this paper proposes a two-layer UAV cooperative computing offloading strategy for emer-gency disaster relief scenarios.The multi-agent twin delayed deep deterministic policy gradient(MATD3)algorithm integrated with prioritized experience replay(PER)is utilized to jointly optimize the scheduling strategies of UAVs,task offloading ratios,and their mobility,aiming to diminish the energy consumption and delay of the system to the minimum.In order to address the aforementioned non-convex optimiza-tion issue,a Markov decision process(MDP)has been established.The results of simulation experiments demonstrate that,compared with the other four base-line algorithms,the algorithm introduced in this paper exhibits better convergence performance,verifying its feasibility and efficacy.
基金supported by the National Science and Technology Council,Taiwan[Grant NSTC 111-2628-E-006-005-MY3]supported by the Ocean Affairs Council,Taiwansponsored in part by Higher Education Sprout Project,Ministry of Education to the Headquarters of University Advancement at National Cheng Kung University(NCKU).
文摘This study proposes an automatic control system for Autonomous Underwater Vehicle(AUV)docking,utilizing a digital twin(DT)environment based on the HoloOcean platform,which integrates six-degree-of-freedom(6-DOF)motion equations and hydrodynamic coefficients to create a realistic simulation.Although conventional model-based and visual servoing approaches often struggle in dynamic underwater environments due to limited adaptability and extensive parameter tuning requirements,deep reinforcement learning(DRL)offers a promising alternative.In the positioning stage,the Twin Delayed Deep Deterministic Policy Gradient(TD3)algorithm is employed for synchronized depth and heading control,which offers stable training,reduced overestimation bias,and superior handling of continuous control compared to other DRL methods.During the searching stage,zig-zag heading motion combined with a state-of-the-art object detection algorithm facilitates docking station localization.For the docking stage,this study proposes an innovative Image-based DDPG(I-DDPG),enhanced and trained in a Unity-MATLAB simulation environment,to achieve visual target tracking.Furthermore,integrating a DT environment enables efficient and safe policy training,reduces dependence on costly real-world tests,and improves sim-to-real transfer performance.Both simulation and real-world experiments were conducted,demonstrating the effectiveness of the system in improving AUV control strategies and supporting the transition from simulation to real-world operations in underwater environments.The results highlight the scalability and robustness of the proposed system,as evidenced by the TD3 controller achieving 25%less oscillation than the adaptive fuzzy controller when reaching the target depth,thereby demonstrating superior stability,accuracy,and potential for broader and more complex autonomous underwater tasks.
基金partially supported by the National Key Research and Development Program of the Ministry of Science and Technology of China(2022YFE0114200)the National Natural Science Foundation of China(U20A6004).
文摘This paper investigates a distributed heterogeneous hybrid blocking flow-shop scheduling problem(DHHBFSP)designed to minimize the total tardiness and total energy consumption simultaneously,and proposes an improved proximal policy optimization(IPPO)method to make real-time decisions for the DHHBFSP.A multi-objective Markov decision process is modeled for the DHHBFSP,where the reward function is represented by a vector with dynamic weights instead of the common objectiverelated scalar value.A factory agent(FA)is formulated for each factory to select unscheduled jobs and is trained by the proposed IPPO to improve the decision quality.Multiple FAs work asynchronously to allocate jobs that arrive randomly at the shop.A two-stage training strategy is introduced in the IPPO,which learns from both single-and dual-policy data for better data utilization.The proposed IPPO is tested on randomly generated instances and compared with variants of the basic proximal policy optimization(PPO),dispatch rules,multi-objective metaheuristics,and multi-agent reinforcement learning methods.Extensive experimental results suggest that the proposed strategies offer significant improvements to the basic PPO,and the proposed IPPO outperforms the state-of-the-art scheduling methods in both convergence and solution quality.
基金sponsored by Qinglan Project of Jiangsu Province,and Jiangsu Provincial Key Research and Development Program(No.BE2020084-1).
文摘Opportunistic mobile crowdsensing(MCS)non-intrusively exploits human mobility trajectories,and the participants’smart devices as sensors have become promising paradigms for various urban data acquisition tasks.However,in practice,opportunistic MCS has several challenges from both the perspectives of MCS participants and the data platform.On the one hand,participants face uncertainties in conducting MCS tasks,including their mobility and implicit interactions among participants,and participants’economic returns given by the MCS data platform are determined by not only their own actions but also other participants’strategic actions.On the other hand,the platform can only observe the participants’uploaded sensing data that depends on the unknown effort/action exerted by participants to the platform,while,for optimizing its overall objective,the platform needs to properly reward certain participants for incentivizing them to provide high-quality data.To address the challenge of balancing individual incentives and platform objectives in MCS,this paper proposes MARCS,an online sensing policy based on multi-agent deep reinforcement learning(MADRL)with centralized training and decentralized execution(CTDE).Specifically,the interactions between MCS participants and the data platform are modeled as a partially observable Markov game,where participants,acting as agents,use DRL-based policies to make decisions based on local observations,such as task trajectories and platform payments.To align individual and platform goals effectively,the platform leverages Shapley value to estimate the contribution of each participant’s sensed data,using these estimates as immediate rewards to guide agent training.The experimental results on real mobility trajectory datasets indicate that the revenue of MARCS reaches almost 35%,53%,and 100%higher than DDPG,Actor-Critic,and model predictive control(MPC)respectively on the participant side and similar results on the platform side,which show superior performance compared to baselines.
基金supported in part by the National Natural Science Foundation of China under Grant No.62301094in part by the Researchers Supporting Project Number(RSPD2024R681)King Saud University,Riyadh,Saudi Arabia,in part by the Science and Technology Research Program of the Chongqing Education Commission of China under Grants KJQN202201157 and KJQN202301135.
文摘With the development of the future Web of Healthcare Things(WoHT),there will be a trend of densely deploying medical sensors with massive simultaneous online communication requirements.The dense deployment and simultaneous online communication of massive medical sensors will inevitably generate overlapping interference.This will be extremely challenging to support data transmission at the medical-grade quality of service level.To handle the challenge,this paper proposes a hypergraph interference coordination-aided resource allocation based on the Deep Reinforcement Learning(DRL)method.Specifically,we build a novel hypergraph interference model for the considered WoHT by analyzing the impact of the overlapping interference.Due to the high complexity of directly solving the hypergraph interference model,the original resource allocation problem is converted into a sequential decision-making problem through the Markov Decision Process(MDP)modeling method.Then,a policy and value-based resource allocation algorithm is proposed to solve this problem under simultaneous online communication and dense deployment.In addition,to enhance the exploration ability of the optimal allocation strategy for the agent,we propose a resource allocation algorithm with an asynchronous parallel architecture.Simulation results verify that the proposed algorithms can achieve higher network throughput than the existing algorithms in the considered WoHT scenario.
基金supported in part by the National Natural Science Foundation of China under Grants 62303098 and 62173073in part by China Postdoctoral Science Foundation under Grant 2022M720679+1 种基金in part by the Central University Basic Research Fund of China under Grant N2304021in part by the Liaoning Provincial Science and Technology Plan Project-Technology Innovation Guidance of the Science and Technology Department under Grant 2023JH1/10400011.
Abstract: Despite its immense potential, the application of digital twin technology in real industrial scenarios still faces numerous challenges. This study focuses on industrial assembly lines in sectors such as microelectronics, pharmaceuticals, and food packaging, where precision and speed are paramount, and applies digital twin technology to the robotic assembly process. The innovation of this research lies in the development of a digital twin architecture and system for Delta robots that is suitable for real industrial environments. Based on this system, a deep reinforcement learning algorithm for obstacle-avoidance path planning in Delta robots has been developed, significantly enhancing learning efficiency through an improved intermediate reward mechanism. Experiments on communication and interaction between the digital twin system and the physical robot validate the effectiveness of this method. The system not only deepens the integration of digital twin technology, deep reinforcement learning, and robotics, offering an efficient solution for path planning and target grasping in Delta robots, but also underscores the transformative potential of digital twin technology in intelligent manufacturing, with broad applicability across diverse industrial domains.
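A minimal sketch of one plausible intermediate reward for obstacle-avoidance path planning is shown below, assuming a point-mass end effector: potential-based shaping rewards per-step progress toward the grasp target, giving the learner a dense signal long before the sparse terminal bonus. The constants and the shaping form are assumptions, not the paper's exact improved mechanism.

```python
# Dense (intermediate) reward: progress toward the goal every step, a penalty
# for entering an obstacle safety margin, and a terminal bonus at the target.
import numpy as np

def intermediate_reward(pos, next_pos, goal, obstacles, safe_radius=0.05):
    # Potential-based shaping: positive when the step reduces goal distance.
    progress = np.linalg.norm(pos - goal) - np.linalg.norm(next_pos - goal)
    reward = 10.0 * progress
    for obs in obstacles:  # dense penalty inside the safety margin
        d = np.linalg.norm(next_pos - obs)
        if d < safe_radius:
            reward -= 5.0 * (safe_radius - d) / safe_radius
    if np.linalg.norm(next_pos - goal) < 0.01:
        reward += 100.0  # sparse terminal bonus for reaching the grasp target
    return reward

# Example step in workspace coordinates (meters); values are illustrative.
r = intermediate_reward(np.array([0.20, 0.0, 0.1]), np.array([0.15, 0.0, 0.1]),
                        goal=np.array([0.0, 0.0, 0.1]),
                        obstacles=[np.array([0.10, 0.02, 0.1])])
```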
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. U22A20246 and 52372382), the Hefei Municipal Natural Science Foundation (Grant No. 2022008), the Open Fund of the State Key Laboratory of Mechanical Behavior and System Safety of Traffic Engineering Structures (Grant No. KF2023-06), and the S&T Program of Hebei (Grant No. 225676162GH).
Abstract: In parallel steering coordination control for path tracking, a driver steering model with fixed parameters is difficult to match to the actual driver, and coordination strategies designed for a single objective under simple conditions struggle to handle multi-dimensional state-variable inputs. In this paper, we propose a deep reinforcement learning-based multi-objective parallel human-machine steering coordination strategy for path tracking that accounts for driver misoperation and external disturbance. First, a mathematical driver steering model is constructed based on the driver's preview characteristics and steering delay response, and the driver characteristic parameters are fitted from collected real driving data. Second, considering that the vehicle is susceptible to external disturbances while driving, a path-tracking steering controller based on Tube Model Predictive Control (Tube MPC) is designed from the vehicle system dynamics error model. After verifying that the driver steering model reproduces real driver steering behavior, the DQN (Deep Q-Network), DDPG (Deep Deterministic Policy Gradient), and TD3 (Twin Delayed Deep Deterministic Policy Gradient) deep reinforcement learning algorithms are used to design a multi-objective parallel steering coordination strategy that handles the vehicle's multi-dimensional state-variable inputs. Finally, evaluation indices for tracking accuracy, lateral safety, human-machine conflict, and driver steering load are designed for different driver operation states and road environments, and the performance of parallel steering coordination strategies based on the different deep reinforcement learning algorithms and fuzzy algorithms is compared through simulations and hardware-in-the-loop experiments. The results show that the deep reinforcement learning-based parallel steering coordination strategy more effectively assists the driver in tracking the target path under lateral wind interference and driver misoperation, and the TD3-based coordination strategy has the best overall performance.
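For reference, the sketch below shows the TD3 critic target that distinguishes TD3 from DDPG in this comparison: target-policy smoothing noise and the minimum over twin target critics curb the Q-overestimation that often destabilizes DDPG. State and action dimensions, network sizes, and the steering-assist interpretation are illustrative assumptions.

```python
# Core of TD3: clipped double-Q targets with target-policy smoothing.
# (The actor is updated less often than the critics, e.g., every 2nd step.)
import copy

import torch
import torch.nn as nn

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))

state_dim, act_dim, gamma = 8, 1, 0.99  # e.g., tracking errors in, assist torque out
actor = mlp(state_dim, act_dim)
q1, q2 = mlp(state_dim + act_dim, 1), mlp(state_dim + act_dim, 1)
actor_tgt, q1_tgt, q2_tgt = map(copy.deepcopy, (actor, q1, q2))

def td3_critic_target(next_s, reward, done, noise_std=0.2, noise_clip=0.5):
    """Bellman target with smoothed target actions and pessimistic twin critics."""
    with torch.no_grad():
        noise = (torch.randn(next_s.shape[0], act_dim) * noise_std)
        noise = noise.clamp(-noise_clip, noise_clip)      # target-policy smoothing
        next_a = (torch.tanh(actor_tgt(next_s)) + noise).clamp(-1.0, 1.0)
        sa = torch.cat([next_s, next_a], dim=-1)
        q_min = torch.min(q1_tgt(sa), q2_tgt(sa))          # clipped double-Q
        return reward + gamma * (1.0 - done) * q_min

# Example batch: regression targets for one critic update step.
s2 = torch.randn(32, state_dim)
y = td3_critic_target(s2, torch.rand(32, 1), torch.zeros(32, 1))
```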
Funding: Funded in part by the National Key Research and Development Program of China (Project 2020YFB1807204), in part by the National Natural Science Foundation of China (U2001213 and 61971191), in part by the Beijing Natural Science Foundation under Grant L201011, and in part by the Key Project of the Natural Science Foundation of Jiangxi Province (20202ACBL202006).
Abstract: In Heterogeneous Vehicle-to-Everything Networks (HVNs), multiple entities such as vehicles, handheld devices, and infrastructure can communicate with each other to obtain more advanced services. However, the growing number of entities accessing HVNs poses a major technical challenge for allocating the limited wireless resources. Traditional model-driven resource allocation approaches are no longer applicable because of the wealth of available data and the interference caused by multiple communication modes reusing resources in HVNs. In this paper, we investigate a wireless resource allocation scheme comprising power control and spectrum allocation based on a resource block reuse strategy. To meet the high capacity requirements of cellular users and the high reliability requirements of Vehicle-to-Vehicle (V2V) user pairs, we propose a data-driven Multi-Agent Deep Reinforcement Learning (MADRL) resource allocation scheme for the HVN. Simulation results demonstrate that, compared to existing algorithms, the proposed MADRL-based scheme achieves a high sum capacity and a high probability of successful V2V transmission, with performance close to the theoretical limit.
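One plausible per-agent reward for such a scheme, sketched below, trades the cellular sum capacity off against a V2V reliability indicator; the SINR threshold and the weighting are illustrative assumptions rather than the paper's actual reward design.

```python
# Per-agent reward: weighted sum of cellular sum capacity and a V2V
# reliability term that rewards links meeting an SINR threshold.
import numpy as np

def v2v_agent_reward(cellular_rates, v2v_sinr_db, sinr_threshold_db=5.0, lam=0.5):
    capacity_term = np.sum(cellular_rates)         # bits/s/Hz over cellular users
    v2v_term = 1.0 if v2v_sinr_db >= sinr_threshold_db else -1.0  # outage penalty
    return lam * capacity_term + (1.0 - lam) * v2v_term

# Example: three cellular users' rates and one V2V pair's measured SINR.
print(v2v_agent_reward(np.array([2.1, 1.4, 3.0]), v2v_sinr_db=7.2))
```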
Abstract: Underwater wireless sensor networks (UWSNs) have emerged as a new paradigm of real-time organized systems, utilized in a diverse array of scenarios to manage the surrounding underwater environment. One of the major challenges these systems confront is topology control via clustering, which reduces the wireless communication overhead within a network and ensures low energy consumption and good scalability. This study presents a clustering technique in which the clustering process and cluster head (CH) selection are performed based on a Markov decision process and deep reinforcement learning (DRL). The DRL algorithm selects the CH by maximizing a defined reward function. Subsequently, the sensed data are collected by the CHs and sent to the autonomous underwater vehicle, which performs all clustering and CH selection operations. In the final phase, the energy consumed by each sensor is calculated and its residual energy is updated. This procedure continues until no node has enough remaining power to become a CH. Analysis of the findings and comparison with alternative frameworks show that this method can control the cluster size and the number of CHs, which ultimately optimizes the energy usage of nodes and prolongs the network lifetime. Our simulation results illustrate that the suggested methodology surpasses the conventional low-energy adaptive clustering hierarchy, the distance- and energy-constrained K-means clustering scheme, and the vector-based forwarding protocol, and is viable for deployment in an actual operational environment.
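The sketch below illustrates reward-maximizing CH selection under an assumed reward that favors high residual energy and a central position, together with the stopping rule under which no node with sufficient power remains; the weights and the energy threshold are illustrative, not the paper's exact reward function.

```python
# CH selection by reward maximization with a lifetime stopping rule.
import numpy as np

def ch_reward(residual_energy, positions, candidate):
    # Favor high residual energy and small mean distance to other nodes.
    dists = np.linalg.norm(positions - positions[candidate], axis=1)
    return residual_energy[candidate] - 0.1 * dists.mean()

def select_cluster_head(residual_energy, positions, e_min=0.05):
    alive = np.flatnonzero(residual_energy > e_min)  # nodes able to serve as CH
    if alive.size == 0:
        return None  # network lifetime exhausted: no node can become a CH
    rewards = [ch_reward(residual_energy, positions, i) for i in alive]
    return alive[int(np.argmax(rewards))]

# Example: 20 sensors with random residual energies and 2-D positions.
rng = np.random.default_rng(0)
energy, pos = rng.uniform(0.0, 1.0, 20), rng.uniform(0, 100, (20, 2))
print(select_cluster_head(energy, pos))
```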
Funding: Supported by the National Natural Science Foundation of China (62202215), the Liaoning Province Applied Basic Research Program (Youth Special Project, 2023JH2/101600038), the Shenyang Youth Science and Technology Innovation Talent Support Program (RC220458), the Guangxuan Program of Shenyang Ligong University (SYLUGXRC202216), and the Basic Research Special Funds for Undergraduate Universities in Liaoning Province (LJ212410144067).
Abstract: The advent of the Internet-of-Everything era has led to the increased use of mobile edge computing. The rise of artificial intelligence has opened many possibilities for meeting users' low-latency task-offloading demands, but existing technologies rigidly assume that only one task is to be offloaded from the terminal in each time slot. In practical scenarios, a terminal often has numerous computing tasks to execute, leading to cumulative delay for subsequently offloaded tasks. Efficiently processing multiple computing tasks on the terminal has therefore become highly challenging. To address the low-latency offloading requirements of multiple computational tasks on terminal devices, we propose a terminal multitask parallel offloading algorithm based on deep reinforcement learning. Specifically, we first establish a mobile edge computing system model consisting of a single edge server and multiple terminal users. We then model the task offloading decision problem as a Markov decision process and solve it using the Dueling Deep-Q Network algorithm to obtain the optimal offloading strategy. Experimental results demonstrate that, under the same constraints, our proposed algorithm reduces the average system latency.
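The Dueling Deep-Q Network at the core of the offloading policy can be sketched as follows: a shared trunk feeds separate state-value and advantage streams, which are recombined with a mean-subtracted advantage so the decomposition is identifiable. Layer sizes and the example state layout are illustrative assumptions.

```python
# Dueling DQN: Q(s,a) = V(s) + A(s,a) - mean_a A(s,a).
import torch
import torch.nn as nn

class DuelingDQN(nn.Module):
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)               # V(s): worth of the state
        self.advantage = nn.Linear(hidden, n_actions)   # A(s,a): per-action edge

    def forward(self, state):
        h = self.trunk(state)
        v, a = self.value(h), self.advantage(h)
        return v + a - a.mean(dim=-1, keepdim=True)     # identifiable Q-values

# Example: state = queued task sizes plus channel gain; actions = {local, offload}.
q_net = DuelingDQN(state_dim=6, n_actions=2)
q_values = q_net(torch.randn(1, 6))
action = q_values.argmax(dim=-1)  # greedy offloading decision for this slot
```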
Abstract: This paper investigates a wireless powered communication network (WPCN) facilitated by an unmanned aerial vehicle (UAV) in Internet of Things (IoT) networks, where multiple IoT devices (IoTDs) harvest energy from a terrestrial energy station (ES) during the wireless energy transfer (WET) stage, after which the UAV collects data from the powered IoTDs using the time division multiple access (TDMA) protocol in the wireless information transfer (WIT) stage. To overcome radio propagation challenges caused by obstructions, we incorporate a reconfigurable intelligent surface (RIS) to enhance the quality of the ES-IoTD and IoTD-UAV links. The primary objective is to maximize the average sum rate of all IoTDs by jointly optimizing the UAV trajectory, ES transmit power, and RIS phase shifts, along with the time allocation between WET and WIT. To this end, we reformulate the optimization problem as a Markov Decision Process (MDP) and introduce a deep reinforcement learning (DRL) approach, the proximal policy optimization (PPO)-based energy harvesting with trajectory design and phase shift optimization (PPO-EHTDPS) algorithm. By continuously exploring the environment, the PPO algorithm refines its policy to optimize the UAV trajectory, the energy phase shifts, the ES transmit power, and the WET/WIT time allocation. Additionally, a continuous phase shift optimization algorithm is employed to determine the information phase shifts of each IoTD that maximize the average sum rate. Finally, numerical results demonstrate that the proposed PPO-EHTDPS algorithm achieves a significantly higher average sum rate and better convergence performance than the benchmark algorithms.
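At the heart of PPO-based training is the clipped surrogate policy loss, sketched below; the log-probabilities and advantages would come from rollouts of the UAV/RIS environment, and the clip range of 0.2 is the customary default rather than a value reported in the abstract.

```python
# PPO clipped surrogate objective: limit the policy update by clipping the
# importance ratio between the new and the rollout-time (old) policy.
import torch

def ppo_policy_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    ratio = torch.exp(logp_new - logp_old)               # importance weight
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps)
    return -torch.min(ratio * advantages, clipped * advantages).mean()

# Example: 64 transitions with log-probs stored at collection time.
logp_old = torch.randn(64)
logp_new = logp_old + 0.1 * torch.randn(64)  # stand-in for the updated policy
loss = ppo_policy_loss(logp_new, logp_old, torch.randn(64))
```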
Funding: Supported by the Fundamental Research Funds for the Central Universities.
Abstract: Most blind image quality assessment (BIQA) methods require a large amount of time to collect human opinion scores as training labels, which limits their usability in practice. We therefore present an opinion-unaware BIQA method based on deep reinforcement learning, named DRL-IQA, which is trained without subjective scores. Inspired by the human visual perception process, our model is formulated as a quality-reinforced agent consisting of a dynamic distortion generation part and a quality perception part. By treating image distortion degradation as a sequential decision-making process, the dynamic distortion generation part develops a strategy to add as many different distortions as possible to an image, which enriches the distortion space and alleviates overfitting. A reward function calculated from the quality degradation after adding each distortion is used to continuously optimize the strategy. Furthermore, the quality perception part extracts rich quality features from the quality degradation process without using subjective scores and accurately predicts the state values that represent image quality. Experimental results reveal that our method achieves quality prediction performance competitive with other state-of-the-art BIQA methods.
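The distortion-generation part can be pictured as the following sequential decision loop: at each step the agent selects a distortion, applies it, and receives the measured quality drop as its reward. The three-distortion action set and the PSNR-based degradation proxy are illustrative assumptions; the paper's distortion space and quality measure are not specified in the abstract.

```python
# Distortion generation as a sequential decision process: reward = quality drop.
import numpy as np

def psnr(ref, img):
    mse = np.mean((ref.astype(float) - img.astype(float)) ** 2)
    return 99.0 if mse == 0 else 10 * np.log10(255.0 ** 2 / mse)

DISTORTIONS = {  # illustrative action set
    0: lambda img, rng: np.clip(img + rng.normal(0, 12, img.shape), 0, 255),  # noise
    1: lambda img, rng: (img // 32) * 32,                                     # quantize
    2: lambda img, rng: np.clip(img * 0.6 + 40, 0, 255),                      # contrast
}

def distortion_episode(image, policy, n_steps=3, seed=0):
    rng = np.random.default_rng(seed)
    current, rewards = image.copy(), []
    for _ in range(n_steps):
        action = policy(current)                   # quality-reinforced agent acts
        before = psnr(image, current)
        current = DISTORTIONS[action](current, rng)
        rewards.append(before - psnr(image, current))  # quality drop as reward
    return current, rewards

# Example rollout with a random stand-in policy on a synthetic image.
img = np.random.default_rng(1).integers(0, 256, (64, 64)).astype(float)
_, rewards = distortion_episode(img, policy=lambda s: np.random.randint(3))
```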
Funding: Supported by the Shanghai Pujiang Program (No. 22PJD030), the National Natural Science Foundation of China (Nos. 61603244 and 71904116), and the National Natural Science Foundation of China-Shandong Joint Fund (No. U2006228).
Abstract: The overall performance of multi-robot collaborative systems is significantly affected by multi-robot task allocation. To improve the effectiveness, robustness, and safety of multi-robot collaborative systems, this paper proposes a multimodal multi-objective evolutionary algorithm based on deep reinforcement learning. The improved multimodal multi-objective evolutionary algorithm is used to solve multi-robot task allocation problems, and a deep reinforcement learning strategy is applied in the last generation to provide a high-quality path for each assigned robot in an end-to-end manner. Comparisons with three popular multimodal multi-objective evolutionary algorithms on three different multi-robot task allocation scenarios verify the performance of the proposed algorithm. The experimental results show that the proposed algorithm can generate sufficient equivalent schemes to improve the availability and robustness of multi-robot collaborative systems in uncertain environments, while also producing the best scheme to improve the overall task execution efficiency of multi-robot collaborative systems.
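The notion of "equivalent schemes" can be made concrete with a Pareto-dominance check over the objective vectors of candidate allocations, sketched below; the two objectives (makespan, energy) are illustrative placeholders for the paper's actual criteria.

```python
# Pareto front of candidate task-allocation schemes (minimization): schemes
# on the front that share an objective vector are "equivalent" alternatives.
import numpy as np

def dominates(f_a, f_b):
    """True if scheme a is at least as good on all objectives and better on one."""
    return np.all(f_a <= f_b) and np.any(f_a < f_b)

def pareto_front(objective_matrix):
    """Return indices of non-dominated allocation schemes."""
    n = len(objective_matrix)
    return [i for i in range(n)
            if not any(dominates(objective_matrix[j], objective_matrix[i])
                       for j in range(n) if j != i)]

# Four candidate allocations scored on (makespan, energy); two are equivalent.
schemes = np.array([[10.0, 5.0], [8.0, 7.0], [12.0, 6.0], [8.0, 7.0]])
print(pareto_front(schemes))  # -> [0, 1, 3]
```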