The generation of synthetic trajectories has become essential in various fields for analyzing complex movement patterns. However, the use of real-world trajectory data poses significant privacy risks, such as location re-identification and correlation attacks. To address these challenges, privacy-preserving trajectory generation methods are critical for applications relying on sensitive location data. This paper introduces DPIL-Traj, an advanced framework designed to generate synthetic trajectories while achieving a superior balance between data utility and privacy preservation. First, the framework incorporates Differential Privacy Clustering, which anonymizes trajectory data by applying differential privacy techniques that add noise, protecting sensitive user information. Second, Imitation Learning is used to replicate decision-making behaviors observed in real-world trajectories. By learning from expert trajectories, this component generates synthetic data that closely mimics real-world decision-making processes while optimizing the quality of the generated trajectories. Finally, Markov-based Trajectory Generation is employed to capture and maintain the inherent temporal dynamics of movement patterns. Extensive experiments on the GeoLife trajectory dataset show that, compared with state-of-the-art approaches, DPIL-Traj improves utility performance by an average of 19.85% and privacy performance by an average of 12.51%. Ablation studies further reveal that DP clustering effectively safeguards privacy, imitation learning enhances utility under noise, and the Markov module strengthens temporal coherence.
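As a rough illustration of the noise-adding step this abstract mentions, the sketch below applies the standard Laplace mechanism to per-cluster visit counts. It is not DPIL-Traj's actual clustering procedure, which the abstract does not specify; the function names, the choice of counts as the protected quantity, and the sensitivity of 1 are all assumptions made for the example.

```python
import math
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale) by inverse-CDF sampling."""
    u = rng.random() - 0.5            # uniform on [-0.5, 0.5)
    u = max(u, -0.5 + 1e-12)          # avoid log(0) at the lower boundary
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))


def privatize_counts(counts, epsilon, sensitivity=1.0, rng=None):
    """Add Laplace noise with scale sensitivity/epsilon to each count.

    Hypothetical helper: a smaller epsilon means more noise (more privacy),
    a larger epsilon means less noise (more utility).
    """
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    return [c + laplace_noise(scale, rng) for c in counts]


if __name__ == "__main__":
    cluster_counts = [120, 45, 8]     # made-up visits per location cluster
    print(privatize_counts(cluster_counts, epsilon=0.5, rng=random.Random(0)))
```

The privacy budget epsilon is the only tuning knob here, which is why utility and privacy trade off against each other in methods of this kind.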
Hydrogen energy is a crucial support for China’s low-carbon energy transition. With the large-scale integration of renewable energy, the combination of hydrogen and integrated energy systems has become one of the most promising directions of development. This paper proposes an optimized scheduling model for a hydrogen-coupled electro-heat-gas integrated energy system (HCEHG-IES) using generative adversarial imitation learning (GAIL). The model aims to enhance renewable-energy absorption, reduce carbon emissions, and improve grid-regulation flexibility. First, the optimal scheduling problem of HCEHG-IES under uncertainty is modeled as a Markov decision process (MDP). To overcome the limitations of conventional deep reinforcement learning algorithms (long optimization time, slow convergence, and subjective reward design), this study augments the PPO algorithm by incorporating a discriminator network and expert data. The newly developed algorithm, termed GAIL, enables the agent to perform imitation learning from expert data. Based on this model, dynamic scheduling decisions are made in continuous state and action spaces, generating optimal energy-allocation and management schemes. Simulation results indicate that, compared with traditional reinforcement-learning algorithms, the proposed algorithm offers better economic performance. Guided by expert data, the agent avoids blind optimization, shortens the offline training time, and improves convergence performance. In the online phase, the algorithm enables flexible energy utilization, thereby promoting renewable-energy absorption and reducing carbon emissions.
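The discriminator idea in this abstract can be sketched in a few lines. The example below trains a tiny logistic-regression discriminator to separate expert state-action features from policy ones, then turns its output into a surrogate reward. This is an illustrative sketch only: the paper's network, features, and PPO update are not given in the abstract, so the linear model, the made-up data, and the reward form -log(1 - D), one common convention among several, are assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def discriminator(w, b, x):
    """Probability that feature vector x (state and action) is expert data."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_discriminator(expert, policy, steps=500, lr=0.5):
    """Fit the discriminator by gradient descent on binary cross-entropy."""
    dim = len(expert[0])
    w, b = [0.0] * dim, 0.0
    data = [(x, 1.0) for x in expert] + [(x, 0.0) for x in policy]
    for _ in range(steps):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in data:
            err = discriminator(w, b, x) - y   # d(BCE)/d(logit)
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b


def imitation_reward(w, b, x, eps=1e-8):
    """Surrogate reward: larger when x looks more expert-like."""
    return -math.log(1.0 - discriminator(w, b, x) + eps)


if __name__ == "__main__":
    expert = [[1.0, 0.9], [0.9, 1.1], [1.1, 1.0]]        # made-up samples
    policy = [[-1.0, -0.8], [-0.9, -1.2], [-1.1, -1.0]]  # made-up samples
    w, b = train_discriminator(expert, policy)
    print(imitation_reward(w, b, [1.0, 1.0]), imitation_reward(w, b, [-1.0, -1.0]))
```

In a full GAIL loop this reward would replace the hand-designed one inside the policy-gradient update, and the discriminator and policy would be trained alternately.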
Robots are key to expanding the scope of space applications. End-to-end training for robot vision-based detection and precision operations is challenging owing to constraints such as extreme environments and high computational overhead. This study proposes a lightweight integrated framework for grasp detection and imitation learning, named GD-IL. It comprises a grasp detection algorithm based on manipulability and a Gaussian mixture model (manipulability-GMM), and a grasp trajectory generation algorithm based on two-stage robot imitation learning (TS-RIL). In the manipulability-GMM algorithm, we apply GMM clustering and ellipse regression to the object point cloud, propose two judgment criteria to generate multiple candidate grasp bounding boxes for the robot, and use manipulability as a metric for selecting the optimal grasp bounding box. The two stages of the TS-RIL algorithm are grasp trajectory learning and robot pose optimization. In the first stage, the robot grasp trajectory is characterized using a second-order dynamic movement primitive model and Gaussian mixture regression (GMR). By adjusting the functional form of the forcing term, the robot closely approximates the target grasping trajectory. In the second stage, a robot pose optimization model is built from the derived pose error formula and the manipulability metric. This model allows the robot to adjust its configuration in real time while grasping, thereby effectively avoiding singularities. Finally, an algorithm verification platform is developed on the Robot Operating System (ROS), and a series of comparative experiments is conducted in real-world scenarios. The experimental results demonstrate that GD-IL significantly improves the effectiveness and robustness of grasp detection and trajectory imitation learning, outperforming existing state-of-the-art methods in execution efficiency, manipulability, and success rate.
Mobile Edge Computing (MEC) is a promising way to alleviate the computation and storage burdens of terminals in wireless networks. The high energy consumption of MEC servers challenges the establishment of smart cities and limits the service time of servers powered by rechargeable batteries. In addition, the Orthogonal Multiple Access (OMA) technique cannot utilize limited spectrum resources fully and efficiently. Therefore, Non-Orthogonal Multiple Access (NOMA)-based energy-efficient task scheduling among MEC servers for delay-constrained mobile applications is important, especially in highly dynamic vehicular edge computing networks. The varied movement patterns of vehicles lead to unbalanced offloading requirements and different load pressures on MEC servers. Self-Imitation Learning (SIL)-based Deep Reinforcement Learning (DRL) has emerged as a promising machine learning technique for overcoming obstacles in various research fields, especially in time-varying networks. In this paper, we first introduce related MEC technologies in vehicular networks. Then, we propose an energy-efficient approach for task scheduling in vehicular edge computing networks based on DRL, with the aim of both guaranteeing the task latency requirement for multiple users and minimizing the total energy consumption of MEC servers. Numerical results demonstrate that the proposed algorithm outperforms other methods.
Providing autonomous systems with an effective quantity and quality of information about a desired task is challenging. In particular, autonomous vehicles must have a reliable view of their workspace to robustly accomplish driving functions. In machine vision, deep learning techniques, and specifically convolutional neural networks, have proven to be the state-of-the-art technology. As these networks typically involve millions of parameters and elements, designing an optimal architecture for deep learning structures is a difficult task that is under investigation by researchers worldwide. This study experimentally evaluates the impact of three major architectural properties of convolutional networks (the number of layers, the number of filters, and the filter size) on their performance. Several models with different properties are developed, equally trained, and then applied to an autonomous car in a realistic simulation environment. A new ensemble approach is also proposed to calculate and update weights for the models according to their mean squared error values. Performance results are reported and compared by design property for further investigation. Surprisingly, the number of filters by itself does not strongly affect performance. By contrast, proper allocation of filters with different kernel sizes across the layers yields a considerable improvement in performance. The findings of this study give researchers a clear direction for designing optimal network architectures for deep learning purposes.
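The abstract says only that ensemble weights are calculated and updated from each model's mean squared error. One simple scheme consistent with that description, shown purely as an assumption and not as the authors' formula, is inverse-MSE weighting: a model with a lower error gets a proportionally larger say in the combined steering prediction. The function names and sample numbers are hypothetical.

```python
def inverse_mse_weights(mse_values, eps=1e-12):
    """Return normalized weights proportional to 1 / MSE for each model."""
    inverse = [1.0 / (m + eps) for m in mse_values]   # eps guards against MSE == 0
    total = sum(inverse)
    return [v / total for v in inverse]


def ensemble_predict(predictions, weights):
    """Weighted average of the individual model predictions."""
    return sum(w * p for w, p in zip(weights, predictions))


if __name__ == "__main__":
    mse = [0.1, 0.2, 0.4]                 # made-up validation errors of three CNNs
    weights = inverse_mse_weights(mse)
    steering = [0.12, 0.10, 0.20]         # made-up steering-angle predictions
    print(weights, ensemble_predict(steering, weights))
```

Recomputing the weights whenever new error estimates arrive gives the "update" behaviour the abstract refers to.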
This paper studies imitation learning in nonlinear multi-player game systems with heterogeneous control input dynamics. We propose a model-free, data-driven inverse reinforcement learning (RL) algorithm for a learner to find the cost functions of an N-player Nash expert system given the expert's states and control inputs. This allows us to address the imitation learning problem without prior knowledge of the expert's system dynamics. To achieve this, we first provide a basic model-based algorithm built upon RL and inverse optimal control. It serves as the foundation for our final model-free inverse RL algorithm, which is implemented via neural network-based value function approximators. Theoretical analysis and simulation examples verify the methods.
Biomimetic grasping is crucial for robots to interact with the environment and perform complex tasks, making it a key focus in robotics and embodied intelligence. However, achieving human-level finger coordination and force control remains challenging due to the need for multimodal perception, including visual, kinesthetic, and tactile feedback. Although some recent approaches have demonstrated remarkable performance in grasping diverse objects, they often rely on expensive tactile sensors or are restricted to rigid objects. To address these challenges, we introduce SoftGrasp, a novel multimodal imitation learning approach for adaptive, multi-stage grasping of objects with varying sizes, shapes, and hardness. First, we develop an immersive demonstration platform with force feedback to collect rich, human-like grasping datasets. Inspired by human proprioceptive manipulation, this platform gathers multimodal signals, including visual images, robot finger joint angles, and joint torques, during demonstrations. Next, we utilize a multi-head attention mechanism to align and integrate multimodal features, dynamically allocating attention to ensure comprehensive learning. On this basis, we design a behavior cloning method based on an angle-torque loss function, enabling multimodal imitation learning. Finally, we validate SoftGrasp in extensive experiments across various scenarios, demonstrating its ability to adaptively adjust joint forces and finger angles based on real-time inputs. These capabilities result in a 98% success rate in real-world experiments, achieving dexterous and stable grasping. Source code and demonstration videos are available at https://github.com/nubot-nudt/SoftGrasp.
This study focuses on enhancing the evasion capabilities of unmanned ground vehicles (UGVs) using Generative Adversarial Imitation Learning (GAIL). The UGVs are trained to evade unmanned aerial vehicles (UAVs). A decision-making neural network is trained via GAIL to refine evasion strategies using expert demonstrations. The simulation environment was developed with OpenAI Gym and calibrated with real-world data to improve accuracy. The integrated platform, including the proposed algorithm, was tested in flight experiments. Results showed that the UGVs could effectively evade UAVs in a complex and dynamic environment.
We propose a new framework for entity and event extraction based on generative adversarial imitation learning, an inverse reinforcement learning method that uses a generative adversarial network (GAN). We assume that instances and labels vary in difficulty, so the gains and penalties (rewards) are expected to be diverse. We utilize discriminators to estimate proper rewards according to the difference between the labels assigned by the ground truth (expert) and by the extractor (agent). Our experiments demonstrate that the proposed framework outperforms state-of-the-art methods.
The intermittency of renewable energy generation, variability of load demand, and stochasticity of market price pose direct challenges to the optimal energy management of microgrids. To cope with these different forms of operation uncertainty, an imitation learning based real-time decision-making solution for microgrid economic dispatch is proposed. In this solution, the optimal dispatch trajectories obtained by solving the optimization problem using historical deterministic operation patterns serve as the expert samples for imitation learning. To improve the generalization performance of imitation learning and the expressive ability for uncertain variables, a hybrid model combining unsupervised and supervised learning is utilized. A denoising autoencoder based unsupervised learning model is adopted to enhance the feature extraction of operation patterns. Furthermore, a long short-term memory network based supervised learning model is used to efficiently characterize the mapping between the input space, composed of the extracted operation patterns and system state variables, and the output space, composed of the optimal dispatch trajectories. The numerical simulation results demonstrate that, under various operation uncertainties, the operation cost achieved by the proposed solution is close to the minimum theoretical value. Compared with the traditional model predictive control method and the basic behavior cloning imitation learning method, the operation cost of the proposed solution is reduced by 6.3% and 2.8%, respectively, over a test period of three months.
Reinforcement learning (RL) has shown significant success in sequential decision making in fields such as autonomous vehicles, robotics, marketing, and gaming. This success has drawn attention to RL control for building energy systems, which are becoming more complicated because they must be optimized for multiple, potentially conflicting goals such as occupant comfort, energy use, and grid interactivity. However, for real-world applications, RL has several drawbacks, such as requiring large amounts of training data and time and exhibiting unstable control behavior during early exploration, which make it infeasible to apply directly to building control tasks. To address these issues, an imitation learning approach is utilized herein, in which the RL agent starts with a policy transferred from accepted rule-based and heuristic policies. This approach succeeds in reducing the training time, preventing unstable early exploration behavior, and improving upon an accepted rule-based policy, all of which make RL a more practical control approach for real-world applications in the domain of building controls.
Generative adversarial imitation learning (GAIL) directly imitates the behavior of experts from human demonstrations instead of designing explicit reward signals as in reinforcement learning. GAIL also overcomes the defects of traditional imitation learning by using a generative adversarial network framework and shows excellent performance in many fields. However, GAIL acts directly on immediate rewards, whose effect is reflected in the value function only after a period of accumulation. Thus, when faced with complex practical problems, the learning efficiency of GAIL is often extremely low and the policy may be slow to learn. One way to solve this problem is to directly guide the action (policy) in the agent's learning process, as in the control sharing (CS) method. This paper combines reinforcement learning and imitation learning and proposes a novel GAIL framework called generative adversarial imitation learning based on control sharing policy (GACS). GACS learns model constraints from expert samples and uses adversarial networks to guide learning directly. The actions are produced by adversarial networks and are used to optimize the policy and effectively improve learning efficiency. Experiments in the autonomous driving environment and the real-time strategy game Breakout show that GACS has better generalization capabilities, imitates expert behavior more efficiently, and can learn better policies relative to other frameworks.
Imitation learning is a control design paradigm that seeks to learn a control policy reproducing demonstrations from expert agents. By substituting expert demonstrations for optimal behaviours, the same paradigm leads to the design of control policies closely approximating the optimal state-feedback. This approach requires training a machine learning algorithm (in our case deep neural networks) directly on state-control pairs originating from optimal trajectories. We have shown in previous work that, when restricted to low-dimensional state and control spaces, this approach is very successful in several deterministic, non-linear problems in continuous time. In this work, we refine our previous studies using as a test case a simple quadcopter model with quadratic and time-optimal objective functions. We describe in detail the best learning pipeline we have developed, which is able to approximate the state-feedback map via deep neural networks to a very high accuracy. We introduce the use of the softplus activation function in the hidden units of neural networks, showing that it results in a smoother control profile whilst retaining the benefits of rectifiers. We show how to evaluate the optimality of the trained state-feedback, and find that already with two layers the objective function value reached and its optimal value differ by less than one percent. We then also consider an additional metric linked to the system's asymptotic behaviour: the time taken to converge to the policy’s fixed point. With respect to these metrics, we show that improvements in the mean absolute error do not necessarily correspond to better policies.
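The remark about softplus giving a smoother control profile than rectifiers can be seen directly from the functions themselves. The sketch below, which is illustrative and not taken from the paper, compares ReLU with softplus(x) = log(1 + exp(x)) and its derivative, the logistic sigmoid. ReLU has a kink at zero, while softplus is smooth everywhere yet approaches ReLU for large |x|.

```python
import math


def relu(x):
    """Rectified linear unit: piecewise linear, not differentiable at 0."""
    return max(x, 0.0)


def softplus(x):
    """Numerically stable log(1 + exp(x)), a smooth approximation of ReLU."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def softplus_grad(x):
    """Derivative of softplus, which is the logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


if __name__ == "__main__":
    for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
        print(x, relu(x), round(softplus(x), 4), round(softplus_grad(x), 4))
```

Because the gradient changes continuously, a network built from softplus units produces a control signal without the abrupt slope changes that rectifiers can introduce.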
The flexibility of electrical heating devices can help address the issues arising from the growing presence of unpredictable renewable energy sources in the energy system. In particular, heat pumps offer an effective solution when combined with smart control methods that adjust the heat pump’s power output in reaction to demand response signals. This paper combines imitation learning based on an artificial neural network with an intelligent control approach for heat pumps. We train the model using the output data of an optimization problem that determines the optimal operation schedule of a heat pump. The objective is to minimize the electricity cost under a time-variable electricity tariff while keeping the building temperature within acceptable boundaries. We evaluate our novel method, PSC-ANN, on various multi-family buildings with differing insulation levels that use an underfloor heating system as thermal storage. The results show that PSC-ANN outperforms a positively evaluated intelligent control approach from the literature and a conventional control approach. Further, our experiments reveal that an imitation learning model trained for a specific building is also applicable to other similar buildings without the need to train it again with new data. Our approach also reduces the execution time compared to optimally solving the corresponding optimization problem. PSC-ANN can be integrated into multiple buildings, enabling them to better utilize renewable energy sources by adjusting their electricity consumption in response to volatile external signals.
Traditional expert-designed branching rules in branch-and-bound (B&B) are static, often failing to adapt to diverse and evolving problem instances. Crafting these rules is labor-intensive and may not scale well with complex problems. Given the frequent need to solve varied combinatorial optimization problems, leveraging statistical learning to auto-tune B&B algorithms for specific problem classes becomes attractive. This paper proposes a graph pointer network model to learn the branching rules. Graph features, global features, and historical features are designed to represent the solver state. The graph neural network processes graph features, while the pointer mechanism assimilates the global and historical features to finally determine the variable on which to branch. The model is trained to imitate the expert strong branching rule by a tailored top-k Kullback-Leibler divergence loss function. Experiments on a series of benchmark problems demonstrate that the proposed approach significantly outperforms the widely used expert-designed branching rules. It also outperforms state-of-the-art machine-learning-based branch-and-bound methods in terms of solving speed and search tree size on all the test instances. In addition, the model can generalize to unseen instances and scale to larger instances.
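The abstract does not define its tailored top-k Kullback-Leibler loss, so the following is only one plausible reading, offered as an assumption: keep the k candidate variables that strong branching scores highest, renormalize the expert scores and the model logits over those k with a softmax, and measure KL(expert || model). All names and numbers are hypothetical.

```python
import math


def softmax(scores):
    """Softmax with max-subtraction for numerical stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_kl_loss(expert_scores, model_logits, k):
    """KL(expert || model) restricted to the expert's k best candidates."""
    ranked = sorted(range(len(expert_scores)),
                    key=lambda i: expert_scores[i], reverse=True)
    top = ranked[:k]
    p = softmax([expert_scores[i] for i in top])   # expert distribution
    q = softmax([model_logits[i] for i in top])    # model distribution
    return sum(pi * math.log(pi / max(qi, 1e-12))
               for pi, qi in zip(p, q) if pi > 0.0)


if __name__ == "__main__":
    strong_branching = [3.2, 0.1, 2.9, 0.4, 1.5]   # made-up expert scores
    model = [2.0, 0.3, 2.5, 0.1, 0.2]              # made-up model logits
    print(top_k_kl_loss(strong_branching, model, k=3))
```

Restricting the loss to the top candidates focuses training on distinguishing the variables that actually matter for the branching decision, rather than on ranking clearly poor ones.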
The automatic and rapid generation of excavation trajectories is the foundation for achieving an intelligent excavator. To obtain high-performance trajectories that enhance operational capacity while avoiding the numerous issues present in existing methods for generating effective excavation paths, this paper proposes a trajectory generation method for excavators based on imitation learning, using the mole as a bionic prototype. Given the high excavation efficiency of moles, this paper first analyzes the structural characteristics of the mole’s forelimbs, its digging principles, morphology, and trajectory patterns. Subsequently, a higher-order polynomial is employed to fit and optimize the mole’s excavation trajectory. Next, imitation learning is conducted on sample trajectories based on Dynamic Movement Primitives, followed by the introduction of an obstacle avoidance algorithm. Simulation experiments and comparisons demonstrate that the mole-inspired trajectory method used in this paper performs well, can generate obstacle avoidance trajectories, and transfers conveniently across different machine models.
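For readers unfamiliar with Dynamic Movement Primitives, the sketch below rolls out a standard one-dimensional discrete DMP: a critically damped spring pulls the state toward the goal, and a phase-dependent forcing term shapes the path. It is a generic textbook-style formulation, not the excavator model from the paper; the gains, the basis-width heuristic, and the example weights are assumptions, and fitting the weights to a demonstrated trajectory is omitted.

```python
import math


def dmp_rollout(y0, goal, weights, duration=2.0, dt=0.001,
                alpha=25.0, beta=6.25, alpha_x=4.0, tau=1.0):
    """Integrate a 1-D DMP with explicit Euler steps and return the path.

    tau * dz = alpha * (beta * (goal - y) - z) + forcing
    tau * dy = z
    tau * dx = -alpha_x * x          (phase variable, decays from 1 to 0)
    """
    n = len(weights)
    # Gaussian basis centres placed along the decaying phase variable.
    centres = [math.exp(-alpha_x * i / max(n - 1, 1)) for i in range(n)]
    widths = [n ** 1.5 / c for c in centres]       # heuristic widths (assumed)
    y, z, x = y0, 0.0, 1.0
    path = [y]
    for _ in range(round(duration / dt)):
        psi = [math.exp(-h * (x - c) ** 2) for h, c in zip(widths, centres)]
        blend = sum(p * w for p, w in zip(psi, weights)) / (sum(psi) + 1e-10)
        forcing = blend * x * (goal - y0)          # vanishes as x goes to 0
        z += (alpha * (beta * (goal - y) - z) + forcing) / tau * dt
        y += z / tau * dt
        x += -alpha_x * x / tau * dt
        path.append(y)
    return path


if __name__ == "__main__":
    shaped = dmp_rollout(0.0, 1.0, [50.0, -30.0, 20.0, 0.0, 10.0])
    print(shaped[0], shaped[len(shaped) // 4], shaped[-1])
```

Because the forcing term is scaled by the decaying phase, the trajectory is guaranteed to settle at the goal whatever the learned weights are, and changing `goal` or `tau` rescales the same motion, which is what makes DMPs convenient to transfer between machines.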
IntuiGrasp is a novel three-fingered dexterous hand that pioneers bio-inspired demonstrations with intuitive priors (BDIP) to bridge the gap between human tactile intuition and robotic execution. Unlike conventional programming, BDIP leverages humans' innate priors (e.g., “A pack of tissues requires gentle grasps, cups demand firm contact”) by enabling real-time transfer of gesture and force policies during physical demonstration. When a human demonstrator wears IntuiGrasp, driven rings provide real-time haptic feedback on contact stress and slip, while integrated tactile sensors translate these human policies into image data, offering valuable data for imitation learning. In this study, human teachers use IntuiGrasp to demonstrate how to grasp three types of objects: a cup, a crumpled tissue pack, and a thin playing card. IntuiGrasp translates the policies for grasping these objects into image information that describes tactile sensations in real time.
Animals can adapt to their surroundings by modifying their trunk morphology, whereas legged robots currently utilize rigid trunks. This study introduces a single-degree-of-freedom (DoF), six-revolute (6R) morphing trunk mechanism designed to equip legged robots with variable-width capabilities. Subsequently, a morphology-aware locomotion learning pipeline, based on reinforcement learning, is proposed for real-time trunk-width deformation and adaptive legged locomotion. The proposed variable-width trunk is integrated into a quadrupedal robot, and the learning pipeline is employed to train the adaptive locomotion controller of this robot. This study has three key contributions: (1) An overconstrained morphing mechanism is designed to achieve single-DoF trunk-width deformation, thereby minimizing power consumption and simplifying motion control. (2) A novel morphology-adaptive learning pipeline is introduced that utilizes adversarial joint-level motion imitation to ensure coordination consistency during morphological adaptation. This method addresses dynamic disturbances and interlimb coordination disruptions caused by width modifications. (3) A historical proprioception-based asymmetric neural network architecture is utilized to attain implicit terrain perception without visual input. Collectively, these developments enable the proposed variable-width legged robot to maintain consistent locomotion across complex terrains and facilitate rapid width deformation in response to environmental changes. Extensive simulation experiments validate the proposed design and control methodology.
Edge computation offloading allows mobile end devices to execute compute-intensive tasks on edge servers. End devices can decide whether tasks are offloaded to edge servers, offloaded to cloud servers, or executed locally according to the current network condition and the devices' profiles in an online manner. In this paper, we propose an edge computation offloading framework based on deep imitation learning (DIL) and knowledge distillation (KD), which helps end devices quickly make fine-grained decisions to optimize the delay of computation tasks online. We formalize the computation offloading problem as a multi-label classification problem. Training samples for our DIL model are generated in an offline manner. After the model is trained, we leverage KD to obtain a lightweight DIL model, which further reduces the model's inference delay. Numerical experiments show that the offloading decisions made by our model not only outperform those made by other related policies in the latency metric, but also have the shortest inference delay among all policies.
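To make the distillation step concrete, here is a minimal sketch of a temperature-softened distillation loss for a multi-label setting, in which a small student model is pushed toward a larger teacher's per-label probabilities. The abstract does not give the paper's loss, so the per-label binary cross-entropy form, the temperature value, and the example logits are assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Mean binary cross-entropy between softened teacher and student outputs.

    Each label is treated independently (multi-label classification); dividing
    logits by the temperature softens both sets of probabilities.
    """
    loss = 0.0
    for zt, zs in zip(teacher_logits, student_logits):
        p = sigmoid(zt / temperature)                 # soft target from teacher
        q = sigmoid(zs / temperature)                 # student prediction
        q = min(max(q, 1e-12), 1.0 - 1e-12)           # keep log() finite
        loss += -(p * math.log(q) + (1.0 - p) * math.log(1.0 - q))
    return loss / len(teacher_logits)


if __name__ == "__main__":
    teacher = [2.0, -1.0, 0.5]   # made-up logits, one per subtask offloading label
    student = [1.5, -0.5, 0.0]
    print(distillation_loss(teacher, student))
```

In practice this term is usually combined with the ordinary supervised loss on the expert offloading labels, so the student learns from both the data and the teacher.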
Recent advancements have shown that control strategies using Deep Reinforcement Learning (DRL) can significantly improve the management of HVAC control and energy systems in buildings, leading to substantial energy savings and better comfort. Unlike conventional rule-based controllers, however, DRL controllers demand considerable time and data to develop effective policies. Transfer learning using pre-trained models can help address this issue. In this work, we use imitation learning (IL) as a method of pre-training and reinforcement learning (RL) for fine-tuning. However, HVAC systems can vary depending on the location, building size, structure, construction materials, and weather conditions. The diversity of HVAC control systems across different buildings complicates the use of IL and RL. Neural network weights trained on the source building cannot be directly transferred to the target building because of differences in input features and in the number of controlled devices. To overcome this problem, we propose a novel padding method to ensure that both the source and target buildings share the same state space dimensionality. Thus, the trained neural network weights are transferable, and only the output layer must be adjusted to fit the dimensionality of the target action space. Additionally, we evaluate the performance of an existing padding technique, zero padding, for comparison. Our experiments show that the novel padding technique outperforms zero padding by 1.37% and training from scratch by 4.59% on average.
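The zero-padding baseline mentioned in this abstract is simple enough to sketch. The idea is to extend each building's state vector to a common length so that a network trained on the source building accepts the target building's observations. The paper's own novel padding method is not described in the abstract and is not reproduced here; the function names and example values are hypothetical.

```python
def zero_pad_state(state, target_dim):
    """Append zeros so the state vector has exactly target_dim entries."""
    if len(state) > target_dim:
        raise ValueError("state is longer than the shared dimensionality")
    return list(state) + [0.0] * (target_dim - len(state))


def shared_dimension(source_state, target_state):
    """Common input size: the larger of the two state dimensionalities."""
    return max(len(source_state), len(target_state))


if __name__ == "__main__":
    source = [21.5, 0.4, 18.0, 0.7, 5.2]   # made-up: two zones plus outdoor temp
    target = [22.1, 0.5, 6.0]              # made-up: one zone plus outdoor temp
    dim = shared_dimension(source, target)
    print(zero_pad_state(source, dim))
    print(zero_pad_state(target, dim))
```

With matched input sizes, the hidden-layer weights can be copied across buildings and only the output layer needs resizing for the target action space, as the abstract describes.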
Funding for the DPIL-Traj paper: Supported by the Natural Science Foundation of Fujian Province of China (2025J01380), the National Natural Science Foundation of China (No. 62471139), the Major Health Research Project of Fujian Province (2021ZD01001), Fujian Provincial Units Special Funds for Education and Research (2022639), the Fujian University of Technology Research Start-up Fund (GY-S24002), and Fujian Research and Training Grants for Young and Middle-aged Leaders in Healthcare (GY-H-24179).
Funding: Supported by State Grid Corporation Technology Project (No. 522437250003).
Abstract: Hydrogen energy is a crucial support for China's low-carbon energy transition. With the large-scale integration of renewable energy, the combination of hydrogen and integrated energy systems has become one of the most promising directions of development. This paper proposes an optimized scheduling model for a hydrogen-coupled electro-heat-gas integrated energy system (HCEHG-IES) using generative adversarial imitation learning (GAIL). The model aims to enhance renewable-energy absorption, reduce carbon emissions, and improve grid-regulation flexibility. First, the optimal scheduling problem of HCEHG-IES under uncertainty is modeled as a Markov decision process (MDP). To overcome the limitations of conventional deep reinforcement learning algorithms (long optimization time, slow convergence, and subjective reward design), this study augments the PPO algorithm with a discriminator network and expert data. The resulting algorithm, termed GAIL, enables the agent to perform imitation learning from expert data. Based on this model, dynamic scheduling decisions are made in continuous state and action spaces, generating optimal energy-allocation and management schemes. Simulation results indicate that, compared with traditional reinforcement-learning algorithms, the proposed algorithm offers better economic performance. Guided by expert data, the agent avoids blind optimization, shortens the offline training time, and improves convergence performance. In the online phase, the algorithm enables flexible energy utilization, thereby promoting renewable-energy absorption and reducing carbon emissions.
Funding: Supported by National Natural Science Foundation of China (Grant No. 52475280); Shaanxi Provincial Natural Science Basic Research Program (Grant No. 2025SYSSYSZD-105).
Abstract: Robots are key to expanding the scope of space applications. End-to-end training for robot vision-based detection and precision operations is challenging owing to constraints such as extreme environments and high computational overhead. This study proposes a lightweight integrated framework for grasp detection and imitation learning, named GD-IL. It comprises a grasp detection algorithm based on manipulability and a Gaussian mixture model (manipulability-GMM), and a grasp trajectory generation algorithm based on a two-stage robot imitation learning algorithm (TS-RIL). In the manipulability-GMM algorithm, we apply GMM clustering and ellipse regression to the object point cloud, propose two judgment criteria to generate multiple candidate grasp bounding boxes for the robot, and use manipulability as a metric for selecting the optimal grasp bounding box. The two stages of the TS-RIL algorithm are grasp trajectory learning and robot pose optimization. In the first stage, the robot grasp trajectory is characterized using a second-order dynamic movement primitive model and Gaussian mixture regression (GMR). By adjusting the functional form of the forcing term, the robot closely approximates the target grasping trajectory. In the second stage, a robot pose optimization model is built based on the derived pose error formula and manipulability metric. This model allows the robot to adjust its configuration in real time while grasping, thereby effectively avoiding singularities. Finally, an algorithm verification platform is developed based on the Robot Operating System, and a series of comparative experiments are conducted in real-world scenarios. The experimental results demonstrate that GD-IL significantly improves the effectiveness and robustness of grasp detection and trajectory imitation learning, outperforming existing state-of-the-art methods in execution efficiency, manipulability, and success rate.
Funding: Supported in part by the National Natural Science Foundation of China under Grant 61971084 and Grant 62001073; in part by the National Natural Science Foundation of Chongqing under Grant cstc2019jcyj-msxmX0208; in part by the open research fund of National Mobile Communications Research Laboratory, Southeast University, under Grant 2020D05.
Abstract: Mobile Edge Computing (MEC) is a promising way to alleviate the computation and storage burdens of terminals in wireless networks. The huge energy consumption of MEC servers challenges the establishment of smart cities and limits the service time of servers powered by rechargeable batteries. In addition, the Orthogonal Multiple Access (OMA) technique cannot utilize limited spectrum resources fully and efficiently. Therefore, Non-Orthogonal Multiple Access (NOMA)-based energy-efficient task scheduling among MEC servers for delay-constrained mobile applications is important, especially in highly dynamic vehicular edge computing networks. The various movement patterns of vehicles lead to unbalanced offloading requirements and different load pressure for MEC servers. Self-Imitation Learning (SIL)-based Deep Reinforcement Learning (DRL) has emerged as a promising machine learning technique to break through obstacles in various research fields, especially in time-varying networks. In this paper, we first introduce related MEC technologies in vehicular networks. Then, we propose an energy-efficient approach for task scheduling in vehicular edge computing networks based on DRL, with the purpose of both guaranteeing the task latency requirement for multiple users and minimizing the total energy consumption of MEC servers. Numerical results demonstrate that the proposed algorithm outperforms other methods.
Abstract: Providing autonomous systems with an effective quantity and quality of information from a desired task is challenging. In particular, autonomous vehicles must have a reliable vision of their workspace to robustly accomplish driving functions. In machine vision, deep learning techniques, and specifically convolutional neural networks, have proven to be the state-of-the-art technology. As these networks typically involve millions of parameters and elements, designing an optimal architecture for deep learning structures is a difficult task that is under investigation by researchers worldwide. This study experimentally evaluates the impact of three major architectural properties of convolutional networks (the number of layers, the number of filters, and the filter size) on their performance. Several models with different properties are developed, equally trained, and then applied to an autonomous car in a realistic simulation environment. A new ensemble approach is also proposed to calculate and update weights for the models according to their mean squared error values. Based on design properties, performance results are reported and compared for further investigation. Surprisingly, the number of filters by itself does not largely affect performance. Instead, proper allocation of filters with different kernel sizes across the layers yields a considerable improvement in performance. The findings of this study give researchers a clear direction for designing optimal network architectures for deep learning purposes.
Abstract: This paper studies imitation learning in nonlinear multi-player game systems with heterogeneous control input dynamics. We propose a model-free data-driven inverse reinforcement learning (RL) algorithm for a learner to find the cost functions of an N-player Nash expert system given the expert's states and control inputs. This allows us to address the imitation learning problem without prior knowledge of the expert's system dynamics. To achieve this, we provide a basic model-based algorithm that is built upon RL and inverse optimal control. This serves as the foundation for our final model-free inverse RL algorithm, which is implemented via neural network-based value function approximators. Theoretical analysis and simulation examples verify the methods.
Funding: Supported by the Innovation Science Foundation of National University of Defense Technology, China (24-ZZCX-GZZ-11); the National Science Foundation of China (62373201).
Abstract: Biomimetic grasping is crucial for robots to interact with the environment and perform complex tasks, making it a key focus in robotics and embodied intelligence. However, achieving human-level finger coordination and force control remains challenging due to the need for multimodal perception, including visual, kinesthetic, and tactile feedback. Although some recent approaches have demonstrated remarkable performance in grasping diverse objects, they often rely on expensive tactile sensors or are restricted to rigid objects. To address these challenges, we introduce SoftGrasp, a novel multimodal imitation learning approach for adaptive, multi-stage grasping of objects with varying sizes, shapes, and hardness. First, we develop an immersive demonstration platform with force feedback to collect rich, human-like grasping datasets. Inspired by human proprioceptive manipulation, this platform gathers multimodal signals, including visual images, robot finger joint angles, and joint torques, during demonstrations. Next, we utilize a multi-head attention mechanism to align and integrate multimodal features, dynamically allocating attention to ensure comprehensive learning. On this basis, we design a behavior cloning method based on an angle-torque loss function, enabling multimodal imitation learning. Finally, we validate SoftGrasp in extensive experiments across various scenarios, demonstrating its ability to adaptively adjust joint forces and finger angles based on real-time inputs. These capabilities result in a 98% success rate in real-world experiments, achieving dexterous and stable grasping. Source code and demonstration videos are available at https://github.com/nubot-nudt/SoftGrasp.
Abstract: This study focuses on enhancing the evasion capabilities of unmanned ground vehicles (UGVs) using Generative Adversarial Imitation Learning (GAIL). The UGVs are trained to evade unmanned aerial vehicles (UAVs). A decision-making neural network is trained via GAIL to refine evasion strategies using expert demonstrations. The simulation environment was developed with OpenAI Gym and calibrated with real-world data to improve accuracy. The integrated platform, including the proposed algorithm, was tested in flight experiments. Results showed that the UGVs could effectively evade UAVs in a complex and dynamic environment.
Abstract: We propose a new framework for entity and event extraction based on generative adversarial imitation learning, an inverse reinforcement learning method that uses a generative adversarial network (GAN). We assume that instances and labels vary in difficulty, so the gains and penalties (rewards) are expected to be diverse. We utilize discriminators to estimate proper rewards according to the difference between the labels assigned by the ground truth (expert) and by the extractor (agent). Our experiments demonstrate that the proposed framework outperforms state-of-the-art methods.
Funding: Supported in part by the National Natural Science Foundation of China (No. 52177119).
Abstract: The intermittency of renewable energy generation, variability of load demand, and stochasticity of market price pose direct challenges to the optimal energy management of microgrids. To cope with these different forms of operation uncertainty, an imitation-learning-based real-time decision-making solution for microgrid economic dispatch is proposed. In this solution, the optimal dispatch trajectories obtained by solving the optimization problem on historical deterministic operation patterns serve as the expert samples for imitation learning. To improve the generalization performance of imitation learning and the ability to represent uncertain variables, a hybrid model combining unsupervised and supervised learning is utilized. A denoising-autoencoder-based unsupervised learning model is adopted to enhance the feature extraction of operation patterns. Furthermore, a long short-term memory network based supervised learning model is used to efficiently characterize the mapping between the input space, composed of the extracted operation patterns and system state variables, and the output space, composed of the optimal dispatch trajectories. The numerical simulation results demonstrate that, under various operation uncertainties, the operation cost achieved by the proposed solution is close to the minimum theoretical value. Compared with the traditional model predictive control method and the basic clone imitation learning method, the operation cost of the proposed solution is reduced by 6.3% and 2.8%, respectively, over a test period of three months.
Funding: This work was authored in part by the National Renewable Energy Laboratory, United States, operated by Alliance for Sustainable Energy, LLC, for the U.S. Department of Energy (DOE) under Contract No. DE-AC36-08GO28308.
Abstract: Reinforcement learning (RL) has shown significant success in sequential decision making in fields like autonomous vehicles, robotics, marketing, and gaming. This success has drawn attention to RL as a control approach for building energy systems, which are becoming complicated due to the need to optimize for multiple, potentially conflicting goals such as occupant comfort, energy use, and grid interactivity. However, for real-world applications, RL has several drawbacks, such as requiring large amounts of training data and time and exhibiting unstable control behavior during the early exploration process, which make it infeasible to apply directly to building control tasks. To address these issues, an imitation learning approach is utilized herein, where the RL agent starts with a policy transferred from accepted rule-based and heuristic policies. This approach succeeds in reducing the training time, preventing unstable early exploration behavior, and improving upon an accepted rule-based policy. Together, these results make RL a more practical control approach for real-world applications in the domain of building controls.
Funding: Supported in part by the National Natural Science Foundation of China (U1808206).
Abstract: Generative adversarial imitation learning (GAIL) directly imitates the behavior of experts from human demonstrations instead of relying on explicitly designed reward signals as reinforcement learning does. GAIL also overcomes the defects of traditional imitation learning by using a generative adversarial network framework, and it shows excellent performance in many fields. However, GAIL acts directly on immediate rewards, which are reflected in the value function only after a period of accumulation. Thus, when faced with complex practical problems, the learning efficiency of GAIL is often extremely low and the policy may be slow to learn. One way to solve this problem is to directly guide the action (policy) in the agent's learning process, as in the control sharing (CS) method. This paper combines reinforcement learning and imitation learning and proposes a novel GAIL framework called generative adversarial imitation learning based on control sharing policy (GACS). GACS learns model constraints from expert samples and uses adversarial networks to guide learning directly. The actions are produced by adversarial networks and are used to optimize the policy and effectively improve learning efficiency. Experiments in the autonomous driving environment and the real-time strategy game Breakout show that GACS has better generalization capabilities, imitates expert behavior more efficiently, and learns better policies than other frameworks.
Abstract: Imitation learning is a control design paradigm that seeks to learn a control policy reproducing demonstrations from expert agents. By substituting optimal behaviours for expert demonstrations, the same paradigm leads to the design of control policies closely approximating the optimal state-feedback. This approach requires training a machine learning algorithm (in our case deep neural networks) directly on state-control pairs originating from optimal trajectories. We have shown in previous work that, when restricted to low-dimensional state and control spaces, this approach is very successful in several deterministic, non-linear problems in continuous time. In this work, we refine our previous studies using as a test case a simple quadcopter model with quadratic and time-optimal objective functions. We describe in detail the best learning pipeline we have developed, which is able to approximate the state-feedback map via deep neural networks to a very high accuracy. We introduce the use of the softplus activation function in the hidden units of neural networks, showing that it results in a smoother control profile whilst retaining the benefits of rectifiers. We show how to evaluate the optimality of the trained state-feedback, and find that already with two layers the objective function value reached differs from its optimal value by less than one percent. We later also consider an additional metric linked to the system's asymptotic behaviour: the time taken to converge to the policy's fixed point. With respect to these metrics, we show that improvements in the mean absolute error do not necessarily correspond to better policies.
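To illustrate why softplus can give a smoother control profile than a rectifier, the following is a minimal sketch (not code from the paper) of the two activations. Softplus, log(1 + e^x), is differentiable everywhere and approaches ReLU for large |x|; the numerically stable form used here is a common implementation choice.

```python
import math


def relu(x):
    # Rectifier: has a kink (non-differentiable point) at x = 0.
    return max(0.0, x)


def softplus(x):
    # Stable evaluation of log(1 + exp(x)) that avoids overflow for large x.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


if __name__ == "__main__":
    for x in (-5.0, -1.0, 0.0, 1.0, 5.0):
        print(f"x={x:+.1f}  relu={relu(x):.4f}  softplus={softplus(x):.4f}")
```

Away from zero the two functions nearly coincide, while near zero softplus rounds off the kink, which is consistent with the smoother control profile the abstract reports.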
Funding: Supported by the project AsimutE (Autoconsommation et Stockage Intelligents pour une Meilleure Utilisation de l'Énergie) from the European Territorial Cooperation program Interreg.
Abstract: The flexibility of electrical heating devices can help address the issues arising from the growing presence of unpredictable renewable energy sources in the energy system. In particular, heat pumps offer an effective solution when smart control methods adjust their power output in reaction to demand response signals. This paper combines imitation learning based on an artificial neural network with an intelligent control approach for heat pumps. We train the model on the output data of an optimization problem that determines the optimal operation schedule of a heat pump. The objective is to minimize the electricity cost under a time-variable electricity tariff while keeping the building temperature within acceptable boundaries. We evaluate our novel method, PSC-ANN, on various multi-family buildings with differing insulation levels that utilize an underfloor heating system as thermal storage. The results show that PSC-ANN outperforms a positively evaluated intelligent control approach from the literature and a conventional control approach. Further, our experiments reveal that an imitation learning model trained for a specific building is also applicable to other similar buildings without the need to retrain it with new data. Our approach also reduces the execution time compared with optimally solving the corresponding optimization problem. PSC-ANN can be integrated into multiple buildings, enabling them to better utilize renewable energy sources by adjusting their electricity consumption in response to volatile external signals.
Funding: Supported by the Open Project of Xiangjiang Laboratory (22XJ02003); Scientific Project of the National University of Defense Technology (NUDT) (ZK21-07, 23-ZZCX-JDZ-28); the National Science Fund for Outstanding Young Scholars (62122093); the National Natural Science Foundation of China (72071205).
Abstract: Traditional expert-designed branching rules in branch-and-bound (B&B) are static, often failing to adapt to diverse and evolving problem instances. Crafting these rules is labor-intensive and may not scale well to complex problems. Given the frequent need to solve varied combinatorial optimization problems, leveraging statistical learning to auto-tune B&B algorithms for specific problem classes becomes attractive. This paper proposes a graph pointer network model to learn the branching rules. Graph features, global features, and historical features are designed to represent the solver state. The graph neural network processes the graph features, while the pointer mechanism assimilates the global and historical features to finally determine the variable on which to branch. The model is trained to imitate the expert strong branching rule using a tailored top-k Kullback-Leibler divergence loss function. Experiments on a series of benchmark problems demonstrate that the proposed approach significantly outperforms the widely used expert-designed branching rules. It also outperforms state-of-the-art machine-learning-based branch-and-bound methods in terms of solving speed and search tree size on all the test instances. In addition, the model can generalize to unseen instances and scale to larger instances.
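The abstract does not define its tailored top-k Kullback-Leibler loss. The sketch below shows one plausible reading, stated as an assumption: keep only the k candidate variables with the highest expert (strong branching) scores, turn both the expert and model scores on those candidates into distributions with a softmax, and compute KL(expert || model). The function names and the use of softmax over raw scores are hypothetical.

```python
import math


def log_softmax(scores):
    # Log-probabilities computed with the log-sum-exp trick for stability.
    m = max(scores)
    lse = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - lse for s in scores]


def top_k_kl(expert_scores, model_scores, k):
    """KL(expert || model) restricted to the expert's top-k candidates (assumed form)."""
    order = sorted(range(len(expert_scores)),
                   key=lambda i: expert_scores[i], reverse=True)[:k]
    log_p = log_softmax([expert_scores[i] for i in order])
    log_q = log_softmax([model_scores[i] for i in order])
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


if __name__ == "__main__":
    # Hypothetical scores for four candidate branching variables.
    expert = [3.0, 0.5, 2.0, -1.0]
    model = [1.0, 2.5, 2.0, 0.0]
    print(top_k_kl(expert, model, k=2))
```

Restricting the loss to the top-k candidates focuses training on the variables that matter for the branching decision rather than on the ranking of clearly poor candidates.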
Funding: Supported by the National Science Foundation of China (Grant No. 52375246, No. 52372428, No. 52105100); Guangxi Science and Technology Program (Grant No. 2023AB09014); Jilin Province Science and Technology Development Program (Grant No. 20230201094GX, No. 20230201069GX).
Abstract: The automatic and rapid generation of excavation trajectories is the foundation for achieving an intelligent excavator. To obtain high-performance trajectories that enhance operational capacity while avoiding the numerous issues of existing methods for generating excavation paths, this paper proposes a trajectory generation method for excavators based on imitation learning, using the mole as a bionic prototype. Given the high excavation efficiency of moles, this paper first analyzes the structural characteristics of the mole's forelimbs, its digging principles, morphology, and trajectory patterns. Subsequently, a higher-order polynomial is employed to fit and optimize the mole's excavation trajectory. Next, imitation learning is conducted on sample trajectories based on Dynamic Movement Primitives, followed by the introduction of an obstacle avoidance algorithm. Simulation experiments and comparisons demonstrate that the mole-inspired trajectory method performs well, can generate obstacle-avoiding trajectories, and transfers conveniently across different machine models.
Abstract: IntuiGrasp is a novel three-fingered dexterous hand that pioneers bio-inspired demonstrations with intuitive priors (BDIP) to bridge the gap between human tactile intuition and robotic execution. Unlike conventional programming, BDIP leverages humans' innate priors (e.g., “A pack of tissues requires gentle grasps, cups demand firm contact”) by enabling real-time transfer of gesture and force policies during physical demonstration. When a human demonstrator wears IntuiGrasp, driven rings provide real-time haptic feedback on contact stress and slip, while integrated tactile sensors translate these human policies into image data, offering valuable data for imitation learning. In this study, human teachers use IntuiGrasp to demonstrate how to grasp three types of objects: a cup, a crumpled tissue pack, and a thin playing card. IntuiGrasp translates the policies for grasping these objects into image information that describes tactile sensations in real time.
Funding: Supported by State Key Lab of Mechanical System and Vibration Project of China (Grant No. MSVZD202008).
Abstract: Animals can adapt to their surroundings by modifying their trunk morphology, whereas legged robots currently use rigid trunks. This study introduces a single-degree-of-freedom (DoF), six-revolute (6R) morphing trunk mechanism designed to equip legged robots with variable-width capabilities. Subsequently, a morphology-aware locomotion learning pipeline based on reinforcement learning is proposed for real-time trunk-width deformation and adaptive legged locomotion. The proposed variable-width trunk is integrated into a quadrupedal robot, and the learning pipeline is employed to train the adaptive locomotion controller of this robot. This study makes three key contributions: (1) An overconstrained morphing mechanism is designed to achieve single-DoF trunk-width deformation, thereby minimizing power consumption and simplifying motion control. (2) A novel morphology-adaptive learning pipeline is introduced that utilizes adversarial joint-level motion imitation to ensure coordination consistency during morphological adaptation. This method addresses dynamic disturbances and interlimb coordination disruptions caused by width modifications. (3) A historical proprioception-based asymmetric neural network architecture is utilized to attain implicit terrain perception without visual input. Collectively, these developments enable the proposed variable-width legged robot to maintain consistent locomotion across complex terrains and to deform its width rapidly in response to environmental changes. Extensive simulation experiments validate the proposed design and control methodology.
Funding: This work was supported in part by the National Science Foundation of China under Grant No. 61972432; the Program for Guangdong Introducing Innovative and Entrepreneurial Teams under Grant No. 2017ZT07X355.
Abstract: Edge computation offloading allows mobile end devices to execute compute-intensive tasks on edge servers. End devices can decide online whether tasks are offloaded to edge servers, offloaded to cloud servers, or executed locally, according to the current network condition and the devices' profiles. In this paper, we propose an edge computation offloading framework based on deep imitation learning (DIL) and knowledge distillation (KD), which assists end devices in quickly making fine-grained decisions to optimize the delay of computation tasks online. We formalize the computation offloading problem as a multi-label classification problem. Training samples for our DIL model are generated offline. After the model is trained, we leverage KD to obtain a lightweight DIL model, which further reduces the model's inference delay. Numerical experiments show that the offloading decisions made by our model not only outperform those made by other related policies in the latency metric, but also have the shortest inference delay among all policies.
Funding: Financial support of Taighde Éireann - Research Ireland under Grant No. 18/CRT/6223.
Abstract: Recent advancements have shown that control strategies using Deep Reinforcement Learning (DRL) can significantly improve the management of HVAC control and energy systems in buildings, leading to significant energy savings and better comfort. However, unlike conventional rule-based controllers, DRL controllers demand considerable time and data to develop effective policies. Transfer learning using pre-trained models can help address this issue. In this work, we use imitation learning (IL) for pre-training and reinforcement learning (RL) for fine-tuning. However, HVAC systems can vary depending on the location, building size, structure, construction materials, and weather conditions. The diversity in HVAC control systems across different buildings complicates the use of IL and RL. Neural network weights trained on the source building cannot be directly transferred to the target building because of differences in input features and in the number of controlled devices. To overcome this problem, we propose a novel padding method to ensure that both the source and target buildings share the same state space dimensionality. Thus, the trained neural network weights are transferable, and only the output layer must be adjusted to fit the dimensionality of the target action space. Additionally, we evaluate the performance of an existing padding technique, zero padding, for comparison. Our experiments show that the novel padding technique outperforms zero padding by 1.37% and training from scratch by 4.59% on average.
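The abstract does not describe the novel padding method, so it is not reproduced here. For orientation only, the zero-padding baseline it is compared against can be sketched as follows: shorter state vectors are extended with zeros so that source and target buildings share one input dimensionality. The function name `zero_pad` and the example feature values are hypothetical.

```python
def zero_pad(state, target_dim):
    """Extend a state vector with trailing zeros up to target_dim (baseline sketch)."""
    if len(state) > target_dim:
        raise ValueError("state is longer than the shared dimensionality")
    return list(state) + [0.0] * (target_dim - len(state))


if __name__ == "__main__":
    # Hypothetical observations: the source building has 3 features, the target 5.
    source_state = [21.5, 0.4, 18.0]
    target_state = [22.0, 0.3, 17.5, 19.0, 0.8]
    shared_dim = max(len(source_state), len(target_state))
    print(zero_pad(source_state, shared_dim))
    print(zero_pad(target_state, shared_dim))
```

With a shared input size, the hidden-layer weights learned on the source building can be loaded for the target building, leaving only the output layer to be resized, as the abstract notes.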