This work proposes an iterative learning model predictive control (ILMPC) approach based on an adaptive fault observer (FOBILMPC) for fault-tolerant control and trajectory tracking in air-breathing hypersonic vehicles. To enlarge the available control authority, this online control law builds on model predictive control (MPC) combined with the concept of iterative learning control (ILC). By using offline data to reduce the errors of the linearized model, the strategy effectively improves the robustness of the control system and guarantees that disturbances are suppressed. Based on the proposed ILMPC approach, an adaptive fault observer is designed to enhance overall fault tolerance by estimating and compensating for actuator disturbances and the degree of faults. A linearized model of the longitudinal dynamics is established during the derivation. Numerical simulations demonstrate that, compared with an offline controller, the proposed ILMPC approach reduces tracking error and accelerates convergence, making it a promising candidate for the design of hypersonic vehicle control systems.
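The core of the ILC component is the trial-to-trial input update. As a minimal, hedged sketch (the first-order plant, gains, and trajectory below are illustrative assumptions, not the vehicle model from the abstract), a P-type update u_{k+1} = u_k + γ·e_k drives the tracking error down across iterations:

```python
import numpy as np

def plant(u, a=0.2, b=1.0):
    # toy first-order system: x[t+1] = a*x[t] + b*u[t], with output y[t] = x[t+1]
    x, y = 0.0, np.zeros_like(u)
    for t, ut in enumerate(u):
        x = a * x + b * ut
        y[t] = x
    return y

T = 50
ref = np.sin(np.linspace(0, 2 * np.pi, T))   # desired output trajectory
u = np.zeros(T)                              # trial-0 input guess
gamma = 1.0                                  # learning gain (|1 - gamma*b| < 1)
for k in range(30):                          # 30 learning trials
    e = ref - plant(u)                       # tracking error of trial k
    u = u + gamma * e                        # P-type update: u_{k+1} = u_k + gamma * e_k
final_err = np.max(np.abs(ref - plant(u)))
print(final_err)                             # shrinks toward zero across trials
```

Each trial reuses the full error profile of the previous run, which is exactly the "use of offline/past data" that gives ILC its edge on repetitive tasks.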
Ensuring the consistent mechanical performance of three-dimensional (3D)-printed continuous fiber-reinforced composites is a significant challenge in additive manufacturing. The current reliance on manual monitoring exacerbates this challenge by rendering the process vulnerable to environmental changes and unexpected factors, resulting in defects and inconsistent product quality, particularly in unmanned long-term operations or printing in extreme environments. To address these issues, we developed a process monitoring and closed-loop feedback control strategy for the 3D printing process. Real-time printing image data were captured and analyzed using a well-trained neural network model, and a real-time control module enabling closed-loop feedback control of the flow rate was developed. The neural network model, based on image processing and artificial intelligence, recognized flow rate values with an accuracy of 94.70%. The experimental results showed significant improvements in both the surface quality and mechanical properties of the printed composites, with a three- to six-fold improvement in tensile strength and elastic modulus, demonstrating the effectiveness of the strategy. This study provides a generalized process monitoring and feedback control method for the 3D printing of continuous fiber-reinforced composites, and offers a potential solution for remote online monitoring and closed-loop adjustment in unmanned or extreme space environments.
The graded density impactor (GDI) dynamic loading technique is crucial for acquiring the dynamic physical property parameters of materials used in weapons. The accuracy and timeliness of GDI structural design are key to achieving controllable stress-strain-rate loading. In this study, we have, for the first time, combined one-dimensional fluid simulation software with machine learning methods. We first elucidated the mechanisms by which GDI structures control stress and strain rates. Subsequently, we constructed a machine learning model to create a structure-property response surface. The results show that altering the loading velocity and interlayer thickness has a pronounced regulatory effect on stress and strain rates. In contrast, the impedance distribution index and target thickness have less significant effects on stress regulation, although there is a matching relationship between target thickness and interlayer thickness. Compared with traditional design methods, the machine learning approach offers a 10^4–10^5-fold increase in efficiency and the potential to reach a global optimum, holding promise for guiding the design of GDIs.
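A structure-property response surface of the kind described is, at heart, a regression surrogate fitted to simulator samples. The sketch below fits a quadratic surface by least squares and then queries it in place of the expensive solver; the analytic "simulator" and the variable ranges are invented stand-ins for the one-dimensional fluid code, not the paper's physics:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulator(v, h):
    # invented analytic stand-in for the expensive 1-D fluid code:
    # "peak stress" as a function of loading velocity v and interlayer thickness h
    return 2.0 * v + 0.5 * v * h - 0.3 * h**2

# sample the simulator on a small design of experiments
v = rng.uniform(1.0, 5.0, 200)
h = rng.uniform(0.1, 2.0, 200)
y = simulator(v, h)

# quadratic response surface: features [1, v, h, v^2, v*h, h^2]
X = np.column_stack([np.ones_like(v), v, h, v**2, v * h, h**2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# query the cheap surrogate instead of rerunning the simulator
vq, hq = 3.0, 1.0
pred = np.array([1.0, vq, hq, vq**2, vq * hq, hq**2]) @ coef
print(pred, simulator(vq, hq))               # surrogate reproduces the sampled model
```

Once fitted, sweeping or optimizing over the surrogate costs microseconds per query, which is where the reported orders-of-magnitude efficiency gain over rerunning the solver comes from.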
This paper investigates a multiplayer Pareto game for affine nonlinear stochastic systems disturbed by both external noise and internal multiplicative noise. The Pareto cooperative optimal strategies with the H_(∞) constraint are obtained by integrating H_(2)/H_(∞) theory with Pareto game theory. First, a nonlinear stochastic bounded real lemma (SBRL) is derived, explicitly accounting for non-zero initial conditions. Through the analysis of four cross-coupled Hamilton-Jacobi equations (HJEs), we establish necessary and sufficient conditions for the existence of Pareto optimal strategies with the H_(∞) constraint. Second, to address the complexity of solving these nonlinear partial differential HJEs, we propose a neural network (NN) framework with synchronous tuning rules for the actor, critic, and disturbance components, based on a reinforcement learning (RL) approach. The designed tuning rules ensure convergence of the actor-critic-disturbance components to the desired values, enabling the realization of robust Pareto control strategies. The convergence of the proposed algorithm is rigorously analyzed using a Lyapunov function constructed for the NN weight errors. Finally, a numerical simulation example is provided to demonstrate the effectiveness of the proposed methods and main results.
The increasingly stringent performance requirements in integrated circuit manufacturing, characterized by smaller feature sizes and higher productivity, necessitate that the wafer stage execute extreme motions with nanometer-level accuracy. This demanding requirement has driven the widespread application of iterative learning control (ILC), given the repetitive nature of wafer scanning. ILC enables substantial performance improvement by using past measurement data in combination with system model knowledge. However, challenges arise when the data are contaminated by stochastic noise, or when the system model exhibits significant uncertainties, constraining the achievable performance. In response to this issue, an extended state observer (ESO)-based adaptive ILC approach is proposed in the frequency domain. Despite being model-based, it utilizes only a rough system model and compensates for the resulting model uncertainties using an ESO, thereby achieving high robustness against uncertainties with minimal modeling effort. Additionally, an adaptive learning law is developed to mitigate the performance limitations caused by stochastic noise, yielding high convergence accuracy without compromising convergence speed. Simulation and experimental comparisons with existing model-based and data-driven inversion-based ILC methods validate the effectiveness and superiority of the proposed method.
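The ESO is the ingredient that absorbs model uncertainty: it estimates the lumped "total disturbance" so a controller can cancel it. A hedged sketch of a linear ESO on a toy first-order plant (the plant, observer gains, and constant disturbance are assumptions for illustration, not the wafer-stage model): the second observer state converges to the unknown disturbance.

```python
dt = 1e-3
beta1, beta2 = 20.0, 100.0        # observer gains (assumed; place error poles near -7 and -14)
d, u = 0.5, 1.0                   # unknown constant disturbance, constant test input
x, z1, z2 = 0.0, 0.0, 0.0         # true state, ESO state estimate, ESO disturbance estimate
for _ in range(5000):             # 5 s of forward-Euler simulation
    err = x - z1                                  # innovation: measured minus estimated state
    x += dt * (-x + u + d)                        # plant: x' = -x + u + d
    z1 += dt * (-z1 + u + z2 + beta1 * err)       # ESO state channel
    z2 += dt * (beta2 * err)                      # ESO "total disturbance" channel
print(z2)                          # converges near the unknown d = 0.5
```

Feeding -z2 back into u would cancel the disturbance, which is the "minimal modeling effort" robustness mechanism the abstract describes.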
This study presents an emergency control method for sub-synchronous oscillations in wind power grid-connected systems based on transfer learning, addressing the insufficient generalization ability of traditional methods in complex real-world scenarios. By combining deep reinforcement learning with a transfer learning framework, cross-scenario knowledge transfer is achieved, significantly enhancing the adaptability of the control strategy. First, a sub-synchronous oscillation emergency control model for the wind power grid-integration system is constructed under fixed scenarios based on deep reinforcement learning. A reward evaluation system based on the system's active power oscillation pattern is proposed, introducing penalty terms for the number of machine-shedding rounds and the number of machines shed. This avoids the economic losses and grid security risks caused by excessive one-time shedding of wind turbines. Furthermore, transfer learning is introduced into model training to enhance the model's generalization capability in the complex scenarios of actual wind power grid-integration systems. The Maximum Mean Discrepancy (MMD) algorithm is introduced to calculate the distribution differences between source data and target data, improving the online decision-making reliability of the emergency control model. Finally, the effectiveness of the proposed transfer learning-based emergency control method for multi-scenario sub-synchronous oscillations is analyzed on the New England 39-bus system.
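MMD measures the distance between the source-domain and target-domain data distributions via kernel embeddings. A minimal numpy sketch of the (biased) squared MMD with a Gaussian kernel; the sample sizes, dimensions, and bandwidth are arbitrary choices for illustration:

```python
import numpy as np

def mmd2(X, Y, sigma=1.0):
    """Biased squared Maximum Mean Discrepancy with a Gaussian kernel."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
        return np.exp(-d2 / (2 * sigma**2))
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

rng = np.random.default_rng(1)
same = mmd2(rng.normal(0, 1, (200, 2)), rng.normal(0, 1, (200, 2)))   # identical distributions
diff = mmd2(rng.normal(0, 1, (200, 2)), rng.normal(2, 1, (200, 2)))   # mean-shifted target
print(same, diff)   # near zero vs. clearly positive
```

A small MMD between a new operating scenario and the training data suggests the transferred policy's decisions can be trusted online, which is the reliability check the abstract alludes to.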
The integration of artificial intelligence into the development and production of mechatronic products offers a substantial opportunity to enhance efficiency, adaptability, and system performance. This paper examines the utilization of reinforcement learning as a control strategy, with a particular focus on its deployment in pivotal stages of the product development lifecycle, specifically between system architecture and system integration and verification. A controller based on reinforcement learning was developed and evaluated in comparison to traditional proportional-integral controllers in dynamic and fault-prone environments. The results illustrate the superior adaptability, stability, and optimization potential of the reinforcement learning approach, particularly in addressing dynamic disturbances and ensuring robust performance. The study illustrates how reinforcement learning can facilitate the transition from conceptual design to implementation by automating optimization processes, enabling interface automation, and enhancing system-level testing. Based on the aforementioned findings, this paper presents future directions for research, which include the integration of domain-specific knowledge into the reinforcement learning process and the validation of this process in real-world environments. The results underscore the potential of artificial intelligence-driven methodologies to revolutionize the design and deployment of intelligent mechatronic systems.
A mixed adaptive dynamic programming (ADP) scheme based on zero-sum game theory is developed to address optimal control problems of autonomous underwater vehicle (AUV) systems subject to disturbances and safety constraints. By combining prior dynamic knowledge with actual sampled data, the proposed approach effectively mitigates the degradation caused by an inaccurate dynamic model and significantly improves the training speed of the ADP algorithm. Initially, the dataset is enriched with sufficient reference data collected from a nominal model that neglects modelling bias. The control object also interacts with the real environment and continuously gathers adequate sampled data into the dataset. To comprehensively leverage the advantages of model-based and model-free methods during training, an adaptive tuning factor is introduced based on the dataset, which possesses model-referenced information and conforms to the distribution of the real-world environment; this factor balances the influence of the model-based control law and the data-driven policy gradient on the direction of policy improvement. As a result, the proposed approach accelerates learning compared with data-driven methods, while also enhancing tracking performance in comparison with model-based control methods. Moreover, the optimal control problem under disturbances is formulated as a zero-sum game, and an actor-critic-disturbance framework is introduced to approximate the optimal control input, cost function, and disturbance policy, respectively. Furthermore, the convergence of the proposed algorithm based on the value iteration method is analysed. Finally, an example of AUV path following based on improved line-of-sight guidance is presented to demonstrate the effectiveness of the proposed method.
Inverse reinforcement learning optimal control operates under a learner-expert framework. The learner system can imitate the expert system's demonstrated behaviors without requiring a predefined cost function, so it can handle optimal control problems effectively. This paper proposes an inverse reinforcement learning optimal control method for Takagi-Sugeno (T-S) fuzzy systems. Based on the learner system, an expert system is constructed, where the learner only knows the expert's optimal control policy. To reconstruct the unknown cost function, we first develop a model-based inverse reinforcement learning algorithm for the case where the system dynamics are known. The developed model-based learning algorithm consists of two learning stages: an inner reinforcement learning loop and an outer inverse optimal control loop. The inner loop obtains the optimal control policy from the learner's cost function, while the outer loop updates the learner's state-penalty matrices using only the expert's optimal control policy. Then, to remove the requirement that the system dynamics be known, a data-driven integral learning algorithm is presented. It is proved that both algorithms are convergent and that the developed inverse reinforcement learning optimal control scheme ensures that the controlled fuzzy learner systems are asymptotically stable. Finally, we apply the proposed fuzzy optimal control to a truck-trailer system, and computer simulation results verify the effectiveness of the presented approach.
The increasing deployment of Internet of Things (IoT) devices has introduced significant security challenges, including identity spoofing, unauthorized access, and data integrity breaches. Traditional security mechanisms rely on centralized frameworks that suffer from single points of failure, scalability issues, and inefficiencies in real-time security enforcement. To address these limitations, this study proposes the Blockchain-Enhanced Trust and Access Control for IoT Security (BETAC-IoT) model, which integrates blockchain technology, smart contracts, federated learning, and Merkle tree-based integrity verification to enhance IoT security. The proposed model eliminates reliance on centralized authentication by employing decentralized identity management, ensuring tamper-proof data storage, and automating access control through smart contracts. Experimental evaluation using a synthetic IoT dataset shows that the BETAC-IoT model improves access control enforcement accuracy by 92%, reduces device authentication time by 52% (from 2.5 to 1.2 s), and enhances threat detection efficiency by 7% (from 85% to 92%) using federated learning. Additionally, the hybrid blockchain architecture achieves a 300% increase in transaction throughput when comparing private blockchain performance (1200 TPS) to public chains (300 TPS). Access control enforcement accuracy was quantified through confusion matrix analysis, with high precision and minimal false positives observed across access decision categories. Although the model presents advantages in security and scalability, challenges such as computational overhead, blockchain storage constraints, and interoperability with existing IoT systems remain areas for future research. This study contributes to advancing decentralized security frameworks for IoT, providing a resilient and scalable solution for securing connected environments.
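Merkle tree-based integrity verification, one building block named above, reduces a batch of records to a single root hash so that any tampering with any record is detectable from the root alone. A minimal sketch (the device readings are invented examples):

```python
import hashlib

def merkle_root(leaves):
    """Merkle root over SHA-256 leaf hashes; an odd level duplicates its last node."""
    level = [hashlib.sha256(x).digest() for x in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])                       # pad odd level
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]        # hash adjacent pairs
    return level[0].hex()

readings = [b"dev1:21.5C", b"dev2:22.1C", b"dev3:20.9C"]
root = merkle_root(readings)
tampered = merkle_root([b"dev1:99.9C", b"dev2:22.1C", b"dev3:20.9C"])
print(root != tampered)   # any modified reading changes the root
```

Anchoring only the root on-chain keeps storage costs constant while still letting any party audit the full batch, which is why this structure pairs naturally with blockchain storage constraints.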
The exponential growth of Internet of Things (IoT) devices has created unprecedented challenges in data processing and resource management for time-critical applications. Traditional cloud computing paradigms cannot meet the stringent latency requirements of modern IoT systems, while pure edge computing faces resource constraints that limit processing capabilities. This paper addresses these challenges by proposing a novel Deep Reinforcement Learning (DRL)-enhanced priority-based scheduling framework for hybrid edge-cloud computing environments. Our approach integrates adaptive priority assignment with a two-level concurrency control protocol that ensures both optimal performance and data consistency. The framework introduces three key innovations: (1) a DRL-based dynamic priority assignment mechanism that learns from system behavior, (2) a hybrid concurrency control protocol combining local edge validation with global cloud coordination, and (3) an integrated mathematical model that formalizes sensor-driven transactions across edge-cloud architectures. Extensive simulations across diverse workload scenarios demonstrate significant quantitative improvements: 40% latency reduction, 25% throughput increase, 85% resource utilization (compared to 60% for heuristic methods), 40% reduction in energy consumption (300 vs. 500 J per task), and 50% improvement in scalability factor (1.8 vs. 1.2 for EDF) compared to state-of-the-art heuristic and meta-heuristic approaches. These results establish the framework as a robust solution for large-scale IoT and autonomous applications requiring real-time processing with consistency guarantees.
Dear Editor, This letter proposes a deep synchronization control (DSC) method to synchronize grid-forming converters with power grids. The method involves constructing a novel controller for grid-forming converters based on the stable deep dynamics model. To enhance the performance of the controller, the dynamics model is optimized within the deep reinforcement learning (DRL) framework. Simulation results verify that the proposed method can reduce frequency deviation and improve active power responses.
In this paper, the containment control problem in nonlinear multi-agent systems (NMASs) under denial-of-service (DoS) attacks is addressed. First, a prediction model is obtained by using the broad learning technique to train offline on historical data generated by the system in the absence of DoS attacks. Second, the dynamic linearization method is used to obtain an equivalent linearized model of the NMASs. Then, a novel model-free adaptive predictive control (MFAPC) framework based on historical and online data generated by the system is proposed, which combines the trained prediction model with the model-free adaptive control method. The development of the MFAPC method motivates a much simpler robust predictive control solution that is convenient to use in the case of DoS attacks. Meanwhile, the MFAPC algorithm provides a unified predictive framework for solving consensus tracking and containment control problems. The boundedness of the containment error is proven by using the contraction mapping principle and mathematical induction. Finally, the proposed MFAPC is assessed through comparative experiments.
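Dynamic linearization is the backbone of model-free adaptive control: the plant is represented step-by-step through an estimated pseudo-partial derivative (PPD) rather than a physical model. A hedged single-agent sketch of the compact-form version on a toy nonlinear plant (the plant, gains, and reference are illustrative assumptions; the multi-agent setting, prediction model, and DoS attacks from the paper are omitted):

```python
import numpy as np

rho, lam, eta, mu = 0.6, 1.0, 0.5, 1.0   # MFAC step sizes and regularizers (assumed)
phi = 1.0                                 # pseudo-partial-derivative (PPD) estimate
y, u = 0.0, 0.0
y_prev, u_prev = 0.0, 0.0
ref = 1.0                                 # constant reference to track
for k in range(300):
    du, dy = u - u_prev, y - y_prev
    if abs(du) > 1e-8:                    # PPD estimation from the latest I/O increments
        phi += eta * du * (dy - phi * du) / (mu + du**2)
    u_prev, y_prev = u, y
    u = u + rho * phi * (ref - y) / (lam + phi**2)   # compact-form control update
    y = 0.6 * y + u + 0.1 * np.sin(y)     # plant, unknown to the controller
print(y)                                  # settles near ref = 1
```

Everything the controller uses is measured input/output data, which is what makes the approach attractive when a reliable NMAS model is unavailable.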
Reinforcement learning (RL) has roots in dynamic programming and is called adaptive/approximate dynamic programming (ADP) within the control community. This paper reviews recent developments in ADP along with RL and its applications to various advanced control fields. First, the background of the development of ADP is described, emphasizing the significance of regulation and tracking control problems. Some effective offline and online algorithms for ADP/adaptive critic control are presented, where the main results for discrete-time systems and continuous-time systems are surveyed, respectively. Then, the research progress on adaptive critic control based on the event-triggered framework and under uncertain environments is discussed, where event-based design, robust stabilization, and game design are reviewed. Moreover, extensions of ADP for addressing control problems in complex environments have attracted enormous attention. The ADP architecture is revisited from the perspective of data-driven and RL frameworks, showing how they significantly advance the ADP formulation. Finally, several typical control applications of RL and ADP are summarized, particularly in the fields of wastewater treatment processes and power systems, followed by some general prospects for future research. Overall, this comprehensive survey of ADP and RL for advanced control applications demonstrates their remarkable potential in the artificial intelligence era, as well as their vital role in promoting environmental protection and industrial intelligence.
With the boom in maritime activities, the need for highly reliable maritime communication, an important component of 5G/6G communication networks, is becoming urgent. However, the bandwidth reuse characteristic of 5G/6G networks will inevitably lead to severe interference, resulting in degradation of the communication performance of maritime users. In this paper, we propose a safe deep reinforcement learning based interference coordination scheme to jointly optimize the power control and bandwidth allocation in maritime communication systems, and exploit the quality-of-service requirements of users as risk value references to evaluate the communication policies. In particular, this scheme designs a deep neural network to select the communication policies through the evaluation network and update the parameters using the target network, which improves the communication performance and speeds up the convergence rate. Moreover, the Nash equilibrium of the interference coordination game and the computational complexity of the proposed scheme are analyzed. Simulation and experimental results verify the performance gain of the proposed scheme compared with benchmarks.
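The evaluation/target-network split mentioned above stabilizes learning: actions come from the online evaluation network, while a slowly moving target copy provides the learning targets. A minimal numpy sketch with a linear Q-function stand-in for the deep network (dimensions, τ, and the ε-greedy parameter are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, tau = 4, 3, 0.05      # toy sizes and soft-update rate (assumed)
theta_eval = rng.normal(size=(n_states, n_actions))   # evaluation (online) network
theta_target = theta_eval.copy()                      # slowly tracking target network

def select_action(state, eps=0.1):
    # epsilon-greedy selection over the evaluation network's Q-values
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    return int(np.argmax(state @ theta_eval))

theta_eval += 0.1 * rng.normal(size=theta_eval.shape) # stand-in for a gradient step
d0 = np.linalg.norm(theta_eval - theta_target)
theta_target = tau * theta_eval + (1 - tau) * theta_target  # soft parameter sync
d1 = np.linalg.norm(theta_eval - theta_target)
a = select_action(np.ones(n_states))
print(a, d0, d1)                            # target moves a fraction tau toward eval
```

Because the target lags the evaluation network, the learning targets change smoothly, which is what speeds up and stabilizes convergence in such schemes.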
Performing diverse motor skills with a universal controller has been a longstanding challenge for legged robots. While motion imitation-based reinforcement learning (RL) has shown remarkable performance in reproducing designed motor skills, the trained controller is only suitable for one specific type of motion. Motion synthesis has been well developed to generate a variety of different motions for character animation, but those motions only contain kinematic information and cannot be used for control. In this study, we introduce a control pipeline combining motion synthesis and motion imitation-based RL for generic motor skills. We design an animation state machine to synthesize motion from various sources and feed the generated kinematic reference trajectory to the RL controller as part of the input. With the proposed method, we show that a single policy is able to learn various motor skills simultaneously. Further, we notice the ability of the policy to uncover the correlations lurking behind the reference motions to improve control performance. We analyze this ability based on the predictability of the reference trajectory and use the quantified measurements to optimize the design of the controller. To demonstrate the effectiveness of our method, we deploy the trained policy on hardware and, with a single control policy, the quadruped robot can perform various learned skills, including automatic gait transitions, high kick, and forward jump.
This paper proposes a modified iterative learning control (MILC) periodic feedback-feedforward algorithm to reduce the vibration of a rotor caused by coupled unbalance and parallel misalignment. The control of the rotor vibration is provided by an active magnetic actuator (AMA). The iterative gain of the MILC algorithm presented here is self-adjusted based on the magnitude of the vibration. Notch filters are adopted to extract the synchronous (1×Ω) and twice-rotational-frequency (2×Ω) components of the rotor vibration. Both the notch frequency of the filter and the size of the feedforward storage used during the experiment adapt in real time to the rotational speed. The method proposed in this work provides effective suppression of rotor vibration under sudden changes or fluctuations of the rotor speed. Simulations and experiments using the proposed MILC algorithm demonstrate the feasibility and robustness of the technique.
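Extracting the synchronous (1×Ω) and twice-rotational-frequency (2×Ω) components can be illustrated by quadrature correlation over an integer number of revolutions, a simple stand-in for the notch filters used in the paper (the amplitudes, speed, and noise level below are invented for the sketch):

```python
import numpy as np

fs, omega = 10_000, 50.0                    # sample rate (Hz) and rotor speed (Hz), assumed
t = np.arange(0, 1, 1 / fs)                 # one second = 50 full revolutions
# vibration with 1x and 2x components plus measurement noise
vib = 3.0 * np.sin(2 * np.pi * omega * t + 0.4) \
    + 1.2 * np.sin(2 * np.pi * 2 * omega * t) \
    + 0.3 * np.random.default_rng(2).normal(size=t.size)

def sync_amplitude(x, f):
    # correlate with quadrature references over an integer number of periods
    c = 2 * np.mean(x * np.cos(2 * np.pi * f * t))
    s = 2 * np.mean(x * np.sin(2 * np.pi * f * t))
    return np.hypot(c, s)

a1 = sync_amplitude(vib, omega)
a2 = sync_amplitude(vib, 2 * omega)
print(a1, a2)                               # close to the true amplitudes 3.0 and 1.2
```

Because the 1× and 2× references are orthogonal over whole revolutions, each amplitude estimate is insensitive to the other component, mirroring the role of speed-tracking notch frequencies in the algorithm.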
This article studies the effective traffic signal control problem of multiple intersections in a city-level traffic system. A novel regional multi-agent cooperative reinforcement learning algorithm called RegionSTLight is proposed to improve traffic efficiency. First, a regional multi-agent Q-learning framework is proposed, which can equivalently decompose the global Q value of the traffic system into the local values of several regions. Based on this framework and the idea of human-machine cooperation, a dynamic zoning method is designed to divide the traffic network into several strongly coupled regions according to real-time traffic flow densities. To achieve better cooperation inside each region, a lightweight spatio-temporal fusion feature extraction network is designed. Experiments in synthetic, real-world, and city-level scenarios show that the proposed RegionSTLight converges more quickly, is more stable, and obtains better asymptotic performance than state-of-the-art models.
This paper focuses on the development of a learning-based controller for a class of uncertain mechanical systems modeled by the Euler-Lagrange formulation. The considered system can depict the behavior of a large class of engineering systems, such as vehicular systems, robot manipulators, and satellites. All these systems are often characterized by highly nonlinear dynamics, heavy modeling uncertainties, and unknown perturbations; therefore, accurate model-based nonlinear control approaches become unavailable. Motivated by this challenge, a reinforcement learning (RL) adaptive control methodology based on the actor-critic framework is investigated to compensate for the uncertain mechanical dynamics. The approximation inaccuracies caused by RL and the exogenous unknown disturbances are circumvented via a continuous robust integral of the sign of the error (RISE) control approach. Unlike a classical RISE control law, a tanh(·) function is utilized instead of a sign(·) function to obtain a smoother control signal. The developed controller requires very little prior knowledge of the dynamic model, is robust to unknown dynamics and exogenous disturbances, and achieves asymptotic output tracking. Finally, co-simulations through ADAMS and MATLAB/Simulink on a three-degrees-of-freedom (3-DOF) manipulator and experiments on a real-time electromechanical servo system verify the performance of the proposed approach.
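The motivation for replacing sign(·) with tanh(·) is continuity of the switching term at zero error, which removes control-signal chattering while barely changing the term away from zero. A tiny numerical comparison (the error range and sharpness gain are arbitrary illustration values):

```python
import numpy as np

e = np.linspace(-0.2, 0.2, 401)      # tracking-error range (arbitrary)
k = 20.0                             # sharpness of the smooth switch (arbitrary)
sign_term = np.sign(e)               # classical discontinuous switching term
tanh_term = np.tanh(k * e)           # smooth replacement

far = np.abs(e) > 0.15               # away from zero, the two nearly coincide
max_far_diff = np.max(np.abs(sign_term - tanh_term)[far])
max_step = np.max(np.abs(np.diff(tanh_term)))   # no jump anywhere on the grid
print(max_far_diff, max_step)
```

The sign term jumps by 1 across zero on this grid, whereas the tanh term changes by at most about k times the grid spacing, which is why the resulting actuator command is smooth.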
For unachievable tracking problems, where the system output cannot precisely track a given reference, achieving the best possible approximation of the reference trajectory becomes the objective. This study investigates solutions using the P-type learning control scheme. Initially, we demonstrate the necessity of gradient information for achieving the best approximation. Subsequently, we propose an input-output-driven learning gain design to handle the imprecise gradients of a class of uncertain systems. However, it is discovered that the desired performance may not be attainable when faced with incomplete information. To address this issue, an extended iterative learning control scheme is introduced. In this scheme, the tracking errors are modified through output data sampling, which incorporates low-memory footprints and offers flexibility in learning gain design. The input sequence is shown to converge towards the desired input, yielding an output that is closest to the given reference in the least-squares sense. Numerical simulations validate the theoretical findings.
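For an unachievable reference, a gradient-type learning update converges to the input whose output is closest to the reference in the least-squares sense, which is why gradient information matters. A hedged linear toy sketch (the random system matrix and step size are illustrative, not the paper's uncertain-system setting):

```python
import numpy as np

rng = np.random.default_rng(3)
G = rng.normal(size=(8, 4))               # more outputs than inputs: exact tracking impossible
ref = rng.normal(size=8)

gamma = 1.0 / np.linalg.norm(G, 2) ** 2   # step size below the 2/sigma_max^2 stability bound
u = np.zeros(4)
for _ in range(20000):                    # gradient-type learning iterations
    e = ref - G @ u                       # tracking error of the current iteration
    u = u + gamma * G.T @ e               # update along the error gradient G^T e
u_star, *_ = np.linalg.lstsq(G, ref, rcond=None)
print(np.max(np.abs(u - u_star)))         # learned input matches the least-squares input
```

A plain P-type update u + γe is not even dimensionally applicable here (e has 8 entries, u has 4); the transposed-system factor Gᵀ is the gradient information whose necessity the study establishes.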
Funding: supported by the National Natural Science Foundation of China (12072090).
Funding: supported by the National Key Research and Development Program of China (Grant Nos. 2023YFB4604100 and 2022YFB3806104), the Key Research and Development Program in Shaanxi Province (Grant No. 2021LLRH-08-17), the Young Elite Scientists Sponsorship Program by CAST (No. 2023QNRC001), the K C Wong Education Foundation of China, the Youth Innovation Team of Shaanxi Universities of China, and the Key Research and Development Program of Shaanxi Province (Grant 2021LLRH-08-3.1).
Abstract: Ensuring consistent mechanical performance of three-dimensional (3D)-printed continuous fiber-reinforced composites is a significant challenge in additive manufacturing. The current reliance on manual monitoring exacerbates this challenge by leaving the process vulnerable to environmental changes and unexpected factors, resulting in defects and inconsistent product quality, particularly during unmanned long-term operation or printing in extreme environments. To address these issues, we developed a process monitoring and closed-loop feedback control strategy for the 3D printing process. Real-time printing images were captured and analyzed by a well-trained neural network model, and a real-time control module enabled closed-loop feedback control of the flow rate. The neural network model, based on image processing and artificial intelligence, recognized flow rate values with an accuracy of 94.70%. The experimental results showed significant improvements in both the surface quality and mechanical properties of the printed composites, with a three- to six-fold improvement in tensile strength and elastic modulus, demonstrating the effectiveness of the strategy. This study provides a generalized process monitoring and feedback control method for the 3D printing of continuous fiber-reinforced composites and offers a potential solution for remote online monitoring and closed-loop adjustment in unmanned or extreme space environments.
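The closed-loop flow-rate correction described above can be caricatured as a proportional loop around a flow estimate. A minimal sketch, assuming a stand-in estimator in place of the trained network, plus illustrative gain and disturbance values:

```python
# Minimal sketch of the closed-loop idea: a (stand-in) flow-rate estimator
# feeds a proportional correction of the extrusion multiplier. The real
# system uses a trained neural network on printing images; the estimator,
# gain, and disturbance here are hypothetical.

def estimate_flow(true_flow):
    """Stand-in for the neural-network flow-rate recognizer."""
    return true_flow

def control_step(multiplier, target=100.0, gain=0.4, disturbance=0.0):
    true_flow = multiplier * (1.0 + disturbance)     # delivered flow, % of nominal
    error = target - estimate_flow(true_flow)
    return multiplier + gain * error                 # proportional correction

m = 100.0
history = []
for step in range(25):
    m = control_step(m, disturbance=-0.15)           # persistent 15% under-extrusion
    history.append(m * 0.85)                         # delivered flow each step

print(f"delivered flow after correction: {history[-1]:.1f}%")
```

With the proportional correction active, the multiplier settles where the delivered flow equals the 100% target despite the persistent under-extrusion.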
Funding: supported by the Guangdong Major Project of Basic and Applied Basic Research (Grant No. 2021B0301030001), the National Key Research and Development Program of China (Grant No. 2021YFB3802300), and the Foundation of the National Key Laboratory of Shock Wave and Detonation Physics (Grant No. JCKYS2022212004).
Abstract: The graded density impactor (GDI) dynamic loading technique is crucial for acquiring the dynamic physical property parameters of materials used in weapons. The accuracy and timeliness of GDI structural design are key to achieving controllable stress and strain-rate loading. In this study, we have, for the first time, combined one-dimensional fluid dynamics software with machine learning methods. We first elucidated the mechanisms by which GDI structures control stress and strain rates. Subsequently, we constructed a machine learning model to create a structure-property response surface. The results show that altering the loading velocity and interlayer thickness has a pronounced regulatory effect on stress and strain rates. In contrast, the impedance distribution index and target thickness have less significant effects on stress regulation, although there is a matching relationship between target thickness and interlayer thickness. Compared with traditional design methods, the machine learning approach offers a 10^4 to 10^5 times increase in efficiency and the potential to reach a global optimum, holding promise for guiding the design of GDI.
Funding: supported by the National Natural Science Foundation of China (12426609, 62203220, 62373229), the Taishan Scholar Project Foundation of Shandong Province (tsqnz20230619, tsqn202408110), the Fundamental Research Foundation of the Central Universities (23Cx06024A), the Natural Science Foundation of Shandong Province (ZR2024QF096), and the Outstanding Youth Innovation Team in Shandong Higher Education Institutions (2023KJ061).
Abstract: This paper investigates a multiplayer Pareto game for affine nonlinear stochastic systems subject to both external disturbances and internal multiplicative noise. The Pareto cooperative optimal strategies with the H∞ constraint are obtained by integrating H2/H∞ theory with Pareto game theory. First, a nonlinear stochastic bounded real lemma (SBRL) is derived, explicitly accounting for non-zero initial conditions. Through the analysis of four cross-coupled Hamilton-Jacobi equations (HJEs), we establish necessary and sufficient conditions for the existence of Pareto optimal strategies with the H∞ constraint. Second, to address the complexity of solving these nonlinear partial differential HJEs, we propose a neural network (NN) framework with synchronous tuning rules for the actor, critic, and disturbance components, based on a reinforcement learning (RL) approach. The designed tuning rules ensure convergence of the actor-critic-disturbance components to the desired values, enabling the realization of robust Pareto control strategies. The convergence of the proposed algorithm is rigorously analyzed using a constructed Lyapunov function for the NN weight errors. Finally, a numerical simulation example demonstrates the effectiveness of the proposed methods and main results.
Funding: supported by the National Natural Science Foundation of China (52375530, 52075132), the Natural Science Foundation of Heilongjiang Province (YQ2022E025), the State Key Laboratory of Precision Electronic Manufacturing Technology and Equipment (Guangdong University of Technology) (JMDZ202312), the Fundamental Research Funds for the Central Universities (HIT.OCEF.2024034), the China Postdoctoral Science Foundation (2019M651278, 2020T130155), the Heilongjiang Province Postdoctoral Science Foundation (LBH-Z19066), and the Space Drive and Manipulation Mechanism Laboratory of BICE and National Key Laboratory of Space Intelligent Control (No. BICE-SDMM-2024-01).
Abstract: The increasingly stringent performance requirements in integrated circuit manufacturing, characterized by smaller feature sizes and higher productivity, require the wafer stage to execute extreme motions with nanometer-level accuracy. This demanding requirement has led to the widespread application of iterative learning control (ILC), given the repetitive nature of wafer scanning. ILC enables substantial performance improvement by using past measurement data in combination with system model knowledge. However, challenges arise when the data is contaminated by stochastic noise or when the system model exhibits significant uncertainties, constraining the achievable performance. In response to this issue, an extended state observer (ESO) based adaptive ILC approach is proposed in the frequency domain. Despite being model-based, it uses only a rough system model and compensates for the resulting model uncertainties with an ESO, thereby achieving high robustness against uncertainties with minimal modeling effort. Additionally, an adaptive learning law is developed to mitigate the performance limitations caused by stochastic noise, yielding high convergence accuracy without compromising convergence speed. Simulation and experimental comparisons with existing model-based and data-driven inversion-based ILC validate the effectiveness and superiority of the proposed method.
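The ESO component can be illustrated independently of the ILC loop: the observer treats all model mismatch and disturbance as an extra state and estimates it online. A sketch for a toy first-order plant, with illustrative observer gains (this is not the wafer-stage model):

```python
# Hedged sketch of an extended state observer: the lumped disturbance d is
# appended as an extra state z2 and reconstructed from the output error.
# Plant, gains, and disturbance value are illustrative assumptions.

def run_eso(l1=60.0, l2=900.0, dt=1e-4, T=2.0):
    x, z1, z2 = 0.0, 0.0, 0.0       # true state, state estimate, disturbance estimate
    b0, u = 1.0, 0.5                # rough model gain, constant test input
    for _ in range(int(T / dt)):
        d = 0.7                     # unknown lumped disturbance
        x += (d + b0 * u) * dt      # true plant: xdot = d + b0*u
        err = x - z1
        z1 += (z2 + b0 * u + l1 * err) * dt
        z2 += l2 * err * dt         # extended state tracks the disturbance
    return z2

est = run_eso()
print(f"estimated disturbance: {est:.3f} (true value 0.7)")
```

The observer error dynamics have characteristic polynomial s² + l1·s + l2 = (s + 30)², so the disturbance estimate converges well within the 2 s window.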
Funding: funded by the Science and Technology Project of State Grid Xinjiang Electric Power Co., Ltd., grant number SGXJ0000TKJS2400168.
Abstract: This study presents an emergency control method for sub-synchronous oscillations in grid-connected wind power systems based on transfer learning, addressing the insufficient generalization ability of traditional methods in complex real-world scenarios. By combining deep reinforcement learning with a transfer learning framework, cross-scenario knowledge transfer is achieved, significantly enhancing the adaptability of the control strategy. First, a sub-synchronous oscillation emergency control model for the grid-connected wind power system is constructed under fixed scenarios based on deep reinforcement learning. A reward evaluation system based on the system's active power oscillation pattern is proposed, introducing penalty functions for the number of shedding rounds and the number of turbines shed. This avoids the economic losses and grid security risks caused by excessive one-time shedding of wind turbines. Furthermore, transfer learning is introduced into model training to enhance the model's generalization capability in the complex scenarios of actual grid-connected wind power systems. By introducing the Maximum Mean Discrepancy (MMD) algorithm to calculate the distribution differences between source data and target data, the online decision-making reliability of the emergency control model is improved. Finally, the effectiveness of the proposed transfer learning based emergency control method for multi-scenario sub-synchronous oscillations is analyzed using the New England 39-bus system.
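The MMD criterion mentioned above is straightforward to compute. A sketch with a Gaussian kernel and illustrative one-dimensional samples (the kernel and bandwidth choices are assumptions, not taken from the paper):

```python
# Hedged sketch of the MMD criterion used to gauge source/target scenario
# mismatch. Kernel, bandwidth, and sample values are illustrative.
import math

def gaussian_kernel(x, y, sigma=1.0):
    return math.exp(-(x - y) ** 2 / (2 * sigma ** 2))

def mmd_squared(xs, ys, sigma=1.0):
    """Biased estimate of the squared Maximum Mean Discrepancy."""
    k = lambda a, b: gaussian_kernel(a, b, sigma)
    kxx = sum(k(a, b) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(k(a, b) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(k(a, b) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2 * kxy

source = [0.1 * i for i in range(20)]          # source-scenario samples
near   = [0.1 * i + 0.05 for i in range(20)]   # slightly shifted target
far    = [0.1 * i + 2.0 for i in range(20)]    # strongly shifted target

print(mmd_squared(source, near), mmd_squared(source, far))
```

A small MMD flags a target scenario close to the training distribution, which is the signal used to judge whether the transferred policy can be trusted online.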
Abstract: The integration of artificial intelligence into the development and production of mechatronic products offers a substantial opportunity to enhance efficiency, adaptability, and system performance. This paper examines the utilization of reinforcement learning as a control strategy, with a particular focus on its deployment in pivotal stages of the product development lifecycle, specifically between system architecture and system integration and verification. A controller based on reinforcement learning was developed and evaluated in comparison to traditional proportional-integral controllers in dynamic and fault-prone environments. The results illustrate the superior adaptability, stability, and optimization potential of the reinforcement learning approach, particularly in addressing dynamic disturbances and ensuring robust performance. The study illustrates how reinforcement learning can facilitate the transition from conceptual design to implementation by automating optimization processes, enabling interface automation, and enhancing system-level testing. Based on the aforementioned findings, this paper presents future directions for research, which include the integration of domain-specific knowledge into the reinforcement learning process and the validation of this process in real-world environments. The results underscore the potential of artificial intelligence-driven methodologies to revolutionize the design and deployment of intelligent mechatronic systems.
Funding: National Key Research and Development Program of China, Grant/Award Number: 2021YFC2801700; Defense Industrial Technology Development Program, Grant/Award Numbers: JCKY2021110B024, JCKY2022110C072; Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project, Grant/Award Number: 2022ZD0116305; Natural Science Foundation of Hefei, China, Grant/Award Number: 202321; National Natural Science Foundation of China, Grant/Award Numbers: U2013601, U20A20225; Yangtze River Delta S&T Innovation Community Joint Research Project, Grant/Award Number: 2022CSJGG0900; Anhui Province Natural Science Funds for Distinguished Young Scholar, Grant/Award Number: 2308085J02; State Key Laboratory of Intelligent Green Vehicle and Mobility, Grant/Award Number: KFY2417; State Key Laboratory of Advanced Design and Manufacturing Technology for Vehicle, Grant/Award Number: 32215010.
Abstract: A mixed adaptive dynamic programming (ADP) scheme based on zero-sum game theory is developed to address optimal control problems of autonomous underwater vehicle (AUV) systems subject to disturbances and safety constraints. By combining prior dynamic knowledge with actual sampled data, the proposed approach effectively mitigates the deficiencies caused by an inaccurate dynamic model and significantly improves the training speed of the ADP algorithm. Initially, the dataset is enriched with sufficient reference data collected from a nominal model that does not account for modelling bias. The controlled plant also interacts with the real environment and continuously gathers sampled data into the dataset. To comprehensively leverage the advantages of model-based and model-free methods during training, an adaptive tuning factor is introduced based on the dataset, which carries model-referenced information and conforms to the distribution of the real-world environment; this factor balances the influence of the model-based control law and the data-driven policy gradient on the direction of policy improvement. As a result, the proposed approach accelerates learning compared to data-driven methods while also enhancing tracking performance compared to model-based control methods. Moreover, the optimal control problem under disturbances is formulated as a zero-sum game, and an actor-critic-disturbance framework is introduced to approximate the optimal control input, cost function, and disturbance policy, respectively. Furthermore, the convergence of the proposed value iteration based algorithm is analysed. Finally, an example of AUV path following based on improved line-of-sight guidance demonstrates the effectiveness of the proposed method.
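The adaptive tuning factor can be sketched as a convex blend of a model-based update direction and a data-driven one. The specific decay rule for the weight below is an illustrative assumption, not the paper's rule:

```python
# Hedged sketch of blending model-based and data-driven update directions.
# The weight shifts from the model to the data as samples accumulate; the
# decay law and gradient values are hypothetical.

def blended_update(grad_model, grad_data, n_samples, n_ref=100):
    """Convex blend with an adaptive tuning factor alpha in (0, 1]."""
    alpha = n_ref / (n_ref + n_samples)
    return [alpha * gm + (1 - alpha) * gd for gm, gd in zip(grad_model, grad_data)]

g_model = [1.0, -0.5]     # update direction from the nominal AUV model
g_data  = [0.6, -0.9]     # policy gradient estimated from sampled data

early = blended_update(g_model, g_data, n_samples=0)      # trust the model
late  = blended_update(g_model, g_data, n_samples=10000)  # trust the data
print(early, late)
```

Early in training the blend reproduces the model-based direction exactly; with abundant real data it approaches the data-driven gradient, which mirrors the trade-off the abstract describes.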
Funding: The National Natural Science Foundation of China (62173172).
Abstract: Inverse reinforcement learning optimal control is formulated in a learner-expert framework. The learner system can imitate the expert system's demonstrated behaviors and does not require a predefined cost function, so it can handle optimal control problems effectively. This paper proposes an inverse reinforcement learning optimal control method for Takagi-Sugeno (T-S) fuzzy systems. Based on the learner system, an expert system is constructed, and the learner system knows only the expert system's optimal control policy. To reconstruct the unknown cost function, we first develop a model-based inverse reinforcement learning algorithm for the case where the system dynamics are known. The developed model-based learning algorithm consists of two learning stages: an inner reinforcement learning loop and an outer inverse optimal control loop. The inner loop obtains the optimal control policy from the learner's cost function, and the outer loop updates the learner's state-penalty matrices using only the expert's optimal control policy. Then, to eliminate the requirement that the system dynamics be known, a data-driven integral learning algorithm is presented. It is proved that the two presented algorithms are convergent and that the developed inverse reinforcement learning optimal control scheme ensures that the controlled fuzzy learner systems are asymptotically stable. Finally, we apply the proposed fuzzy optimal control to the truck-trailer system, and computer simulation results verify the effectiveness of the presented approach.
Abstract: The increasing deployment of Internet of Things (IoT) devices has introduced significant security challenges, including identity spoofing, unauthorized access, and data integrity breaches. Traditional security mechanisms rely on centralized frameworks that suffer from single points of failure, scalability issues, and inefficiencies in real-time security enforcement. To address these limitations, this study proposes the Blockchain-Enhanced Trust and Access Control for IoT Security (BETAC-IoT) model, which integrates blockchain technology, smart contracts, federated learning, and Merkle tree-based integrity verification to enhance IoT security. The proposed model eliminates reliance on centralized authentication by employing decentralized identity management, ensuring tamper-proof data storage, and automating access control through smart contracts. Experimental evaluation using a synthetic IoT dataset shows that the BETAC-IoT model improves access control enforcement accuracy by 92%, reduces device authentication time by 52% (from 2.5 to 1.2 s), and enhances threat detection efficiency by 7% (from 85% to 92%) using federated learning. Additionally, the hybrid blockchain architecture achieves a 300% increase in transaction throughput when comparing private blockchain performance (1200 TPS) to public chains (300 TPS). Access control enforcement accuracy was quantified through confusion matrix analysis, with high precision and minimal false positives observed across access decision categories. Although the model offers advantages in security and scalability, challenges such as computational overhead, blockchain storage constraints, and interoperability with existing IoT systems remain areas for future research. This study contributes to advancing decentralized security frameworks for IoT, providing a resilient and scalable solution for securing connected environments.
Funding: supported by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2025R909), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: The exponential growth of Internet of Things (IoT) devices has created unprecedented challenges in data processing and resource management for time-critical applications. Traditional cloud computing paradigms cannot meet the stringent latency requirements of modern IoT systems, while pure edge computing faces resource constraints that limit processing capabilities. This paper addresses these challenges by proposing a novel Deep Reinforcement Learning (DRL)-enhanced priority-based scheduling framework for hybrid edge-cloud computing environments. Our approach integrates adaptive priority assignment with a two-level concurrency control protocol that ensures both optimal performance and data consistency. The framework introduces three key innovations: (1) a DRL-based dynamic priority assignment mechanism that learns from system behavior, (2) a hybrid concurrency control protocol combining local edge validation with global cloud coordination, and (3) an integrated mathematical model that formalizes sensor-driven transactions across edge-cloud architectures. Extensive simulations across diverse workload scenarios demonstrate significant quantitative improvements: 40% latency reduction, 25% throughput increase, 85% resource utilization (compared to 60% for heuristic methods), 40% reduction in energy consumption (300 vs. 500 J per task), and 50% improvement in scalability factor (1.8 vs. 1.2 for EDF) compared to state-of-the-art heuristic and meta-heuristic approaches. These results establish the framework as a robust solution for large-scale IoT and autonomous applications requiring real-time processing with consistency guarantees.
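The priority-based dispatch idea can be sketched with a standard heap. The priority rule below (deadline urgency plus a learned bias standing in for the DRL output), the edge capacity, and the task values are all illustrative assumptions:

```python
# Hedged sketch of priority-based edge/cloud task dispatch. A heap orders
# tasks by a priority score; a learned bias term would come from the DRL
# policy in the paper's framework. All numbers are hypothetical.
import heapq

def priority(task, learned_bias=0.0):
    """Smaller value = dispatched first (deadline urgency + learned bias)."""
    return task["deadline"] - task["arrival"] + learned_bias

def dispatch(tasks, edge_capacity=2):
    queue = [(priority(t), i, t) for i, t in enumerate(tasks)]
    heapq.heapify(queue)
    placement = {}
    while queue:
        _, _, t = heapq.heappop(queue)
        if edge_capacity > 0:
            edge_capacity -= 1
            placement[t["name"]] = "edge"    # urgent tasks get low-latency slots
        else:
            placement[t["name"]] = "cloud"   # the rest offload to the cloud
    return placement

tasks = [
    {"name": "sensor-fusion", "arrival": 0, "deadline": 5},
    {"name": "video-archive", "arrival": 0, "deadline": 60},
    {"name": "brake-command", "arrival": 1, "deadline": 3},
    {"name": "log-upload",    "arrival": 2, "deadline": 90},
]
print(dispatch(tasks))
```

The index in the heap tuple breaks ties deterministically; in the full framework the DRL agent would adjust `learned_bias` per task from observed system behavior rather than leave it at zero.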
Funding: supported in part by the National Natural Science Foundation of China (62033005, 62273270) and the Natural Science Foundation of Shaanxi Province (2023JC-XJ17).
Abstract: Dear Editor, This letter proposes a deep synchronization control (DSC) method to synchronize grid-forming converters with power grids. The method constructs a novel controller for grid-forming converters based on a stable deep dynamics model. To enhance the performance of the controller, the dynamics model is optimized within the deep reinforcement learning (DRL) framework. Simulation results verify that the proposed method can reduce frequency deviation and improve active power responses.
Funding: supported in part by the National Natural Science Foundation of China (62403396, 62433018, 62373113), the Guangdong Basic and Applied Basic Research Foundation (2023A1515011527, 2023B1515120010), and the Postdoctoral Fellowship Program of CPSF (GZB20240621).
Abstract: In this paper, the containment control problem for nonlinear multi-agent systems (NMASs) under denial-of-service (DoS) attacks is addressed. First, a prediction model is obtained using the broad learning technique to train, offline, historical data generated by the system in the absence of DoS attacks. Second, the dynamic linearization method is used to obtain an equivalent linearized model of the NMASs. Then, a novel model-free adaptive predictive control (MFAPC) framework based on historical and online data generated by the system is proposed, which combines the trained prediction model with the model-free adaptive control method. The development of the MFAPC method motivates a much simpler robust predictive control solution that is convenient to use in the case of DoS attacks. Meanwhile, the MFAPC algorithm provides a unified predictive framework for solving consensus tracking and containment control problems. The boundedness of the containment error is proven using the contraction mapping principle and mathematical induction. Finally, the proposed MFAPC is assessed through comparative experiments.
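The dynamic linearization step behind model-free adaptive control can be sketched in compact form: a pseudo-partial-derivative (PPD) is estimated online from input/output increments and drives the input update. The toy plant and gains below are illustrative assumptions, and the sketch omits the prediction model and DoS-attack handling:

```python
# Hedged sketch of compact-form dynamic linearization (the basis of MFAC):
# the unknown plant is treated as dy = phi * du, with phi estimated online.
# Plant, gains, and reference are hypothetical.

def mfac_track(ref, steps=60, eta=0.5, mu=1.0, rho=0.6, lam=1.0):
    phi = 1.0                 # pseudo-partial-derivative (PPD) estimate
    u_prev, u, y = 0.0, 0.0, 0.0
    outputs = []
    for _ in range(steps):
        # toy nonlinear plant (unknown to the controller)
        y_next = 0.6 * y + 0.3 * u / (1 + y ** 2) + 0.5 * u
        # PPD estimation from the latest input/output increments
        du = u - u_prev
        if abs(du) > 1e-8:
            phi += eta * du / (mu + du ** 2) * ((y_next - y) - phi * du)
        # control update driven by the current tracking error
        u_prev, u = u, u + rho * phi / (lam + phi ** 2) * (ref - y_next)
        y = y_next
        outputs.append(y)
    return outputs

ys = mfac_track(ref=2.0)
print(f"final output: {ys[-1]:.3f} (reference 2.0)")
```

The normalized estimator keeps the PPD update bounded, and the input law acts like integral control built from the estimated gain, so the output settles at the reference without any model of the plant.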
Funding: supported in part by the National Natural Science Foundation of China (62222301, 62073085, 62073158, 61890930-5, 62021003), the National Key Research and Development Program of China (2021ZD0112302, 2021ZD0112301, 2018YFC1900800-5), and the Beijing Natural Science Foundation (JQ19013).
Abstract: Reinforcement learning (RL) has roots in dynamic programming and is called adaptive/approximate dynamic programming (ADP) within the control community. This paper reviews recent developments in ADP along with RL and its applications to various advanced control fields. First, the background of the development of ADP is described, emphasizing the significance of regulation and tracking control problems. Some effective offline and online algorithms for ADP/adaptive critic control are presented, and the main results for discrete-time systems and continuous-time systems are surveyed, respectively. Then, the research progress on adaptive critic control based on the event-triggered framework and under uncertain environments is discussed, where event-based design, robust stabilization, and game design are reviewed. Moreover, extensions of ADP for addressing control problems in complex environments have attracted enormous attention. The ADP architecture is revisited from the perspective of data-driven and RL frameworks, showing how they significantly advance the ADP formulation. Finally, several typical control applications of RL and ADP are summarized, particularly in the fields of wastewater treatment processes and power systems, followed by some general prospects for future research. Overall, this comprehensive survey of ADP and RL for advanced control applications demonstrates their remarkable potential in the artificial intelligence era. In addition, they play a vital role in promoting environmental protection and industrial intelligence.
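The core ADP mechanism the survey reviews, iterating a value function until it satisfies the Bellman equation, can be shown on the simplest possible case: value iteration for a scalar discrete-time LQR problem, where the Bellman backup takes a Riccati form. The plant and cost weights are illustrative:

```python
# Hedged sketch of the value-iteration core of ADP on a scalar LQR problem:
# iterate the Riccati-form Bellman backup until the cost kernel converges.
# Plant and weights are illustrative test values.

a, b, q, r = 1.2, 1.0, 1.0, 1.0   # unstable scalar plant, quadratic costs

p = 0.0                            # value-function kernel, V(x) = p * x^2
for _ in range(200):               # value-iteration sweeps
    k = b * p * a / (r + b * p * b)          # greedy policy gain for current V
    p = q + a * p * a - a * p * b * k        # Bellman backup (Riccati form)

u_gain = b * p * a / (r + b * p * b)
print(f"converged P = {p:.4f}, optimal feedback u = -{u_gain:.4f} x")
```

Starting from p = 0, the sequence increases monotonically to the stabilizing solution of the discrete algebraic Riccati equation, and the resulting feedback places the closed-loop pole a − b·k inside the unit circle.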
Abstract: With the boom in maritime activities, the need for highly reliable maritime communication, an important component of 5G/6G communication networks, is becoming urgent. However, the bandwidth reuse characteristic of 5G/6G networks inevitably leads to severe interference, degrading the communication performance of maritime users. In this paper, we propose a safe deep reinforcement learning based interference coordination scheme to jointly optimize power control and bandwidth allocation in maritime communication systems, and we exploit the quality-of-service requirements of users as risk value references to evaluate the communication policies. In particular, the scheme designs a deep neural network that selects communication policies through an evaluation network and updates the parameters using a target network, which improves communication performance and speeds up the convergence rate. Moreover, the Nash equilibrium of the interference coordination game and the computational complexity of the proposed scheme are analyzed. Simulation and experimental results verify the performance gain of the proposed scheme compared with benchmarks.
Funding: supported by the National Natural Science Foundation of China (No. 12132013).
Abstract: Performing diverse motor skills with a universal controller has been a longstanding challenge for legged robots. While motion imitation-based reinforcement learning (RL) has shown remarkable performance in reproducing designed motor skills, the trained controller is only suitable for one specific type of motion. Motion synthesis is well developed for generating a variety of motions in character animation, but those motions contain only kinematic information and cannot be used for control. In this study, we introduce a control pipeline combining motion synthesis and motion imitation-based RL for generic motor skills. We design an animation state machine to synthesize motion from various sources and feed the generated kinematic reference trajectory to the RL controller as part of its input. With the proposed method, we show that a single policy is able to learn various motor skills simultaneously. Further, we observe the policy's ability to uncover the correlations lurking behind the reference motions to improve control performance. We analyze this ability based on the predictability of the reference trajectory and use the quantified measurements to optimize the design of the controller. To demonstrate the effectiveness of our method, we deploy the trained policy on hardware: with a single control policy, the quadruped robot can perform various learned skills, including automatic gait transitions, a high kick, and a forward jump.
Funding: supported by the National Natural Science Foundation of China (Grant Nos. 51975037, 52375075).
Abstract: This paper proposes a modified iterative learning control (MILC) periodic feedback-feedforward algorithm to reduce the vibration of a rotor caused by coupled unbalance and parallel misalignment. The rotor vibration is controlled by an active magnetic actuator (AMA). The iterative gain of the MILC algorithm presented here self-adjusts based on the magnitude of the vibration. Notch filters are adopted to extract the synchronous (1×Ω) and twice-rotational-frequency (2×Ω) components of the rotor vibration. Both the notch frequency of the filter and the size of the feedforward storage used during the experiment adapt in real time to the rotational speed. The method proposed in this work can effectively suppress the rotor vibration in the case of sudden changes or fluctuations of the rotor speed. Simulations and experiments using the proposed MILC algorithm are carried out and demonstrate the feasibility and robustness of the technique.
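The synchronous-component extraction that feeds the MILC update can be sketched by correlating the vibration signal with sinusoids at 1×Ω and 2×Ω over whole rotations; this is the discrete-time counterpart of the notch-filter step. The signal amplitudes below are illustrative test values:

```python
# Hedged sketch of extracting the 1xOmega (unbalance) and 2xOmega
# (misalignment) components of a rotor vibration signal by correlation
# over an integer number of rotations. Amplitudes are illustrative.
import math

def synchronous_amplitude(signal, omega, dt, order=1):
    """Amplitude of the (order * omega) component via sine/cosine correlation."""
    n = len(signal)
    s = sum(v * math.sin(order * omega * i * dt) for i, v in enumerate(signal))
    c = sum(v * math.cos(order * omega * i * dt) for i, v in enumerate(signal))
    return 2.0 * math.hypot(s, c) / n

omega, dt, n = 2 * math.pi * 50.0, 1e-4, 2000     # 50 Hz rotor, 0.2 s window
vib = [0.8 * math.sin(omega * i * dt) + 0.3 * math.sin(2 * omega * i * dt + 0.4)
       for i in range(n)]

a1 = synchronous_amplitude(vib, omega, dt, order=1)   # unbalance component
a2 = synchronous_amplitude(vib, omega, dt, order=2)   # misalignment component
print(f"1x amplitude: {a1:.3f}, 2x amplitude: {a2:.3f}")
```

Because the window spans exactly ten rotations, the sine and cosine correlations are orthogonal to every other harmonic, so each amplitude is recovered regardless of phase; tracking Ω in real time, as the paper does, keeps this property when the speed changes.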
Funding: supported by the National Science and Technology Major Project (2021ZD0112702), the National Natural Science Foundation (NNSF) of China (62373100, 62233003), and the Natural Science Foundation of Jiangsu Province of China (BK20202006).
Abstract: This article studies the effective traffic signal control problem for multiple intersections in a city-level traffic system. A novel regional multi-agent cooperative reinforcement learning algorithm called RegionSTLight is proposed to improve traffic efficiency. First, a regional multi-agent Q-learning framework is proposed, which can equivalently decompose the global Q value of the traffic system into the local values of several regions. Based on this framework and the idea of human-machine cooperation, a dynamic zoning method is designed to divide the traffic network into several strongly coupled regions according to real-time traffic flow densities. To achieve better cooperation inside each region, a lightweight spatio-temporal fusion feature extraction network is designed. Experiments on synthetic, real-world, and city-level scenarios show that the proposed RegionSTLight converges more quickly, is more stable, and obtains better asymptotic performance compared to state-of-the-art models.
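The regional decomposition can be sketched as a sum of local Q values, so that greedy joint actions decompose by region. The regions, states, actions, and Q entries below are illustrative, not from the paper:

```python
# Hedged sketch of the regional Q decomposition: the global value of a
# joint signal-control action is the sum of per-region local Q values.
# All table entries are hypothetical.

def global_q(regional_qs, state, joint_action):
    """Q_global(s, a) = sum over regions of Q_region(s_r, a_r)."""
    return sum(q[(state[r], joint_action[r])] for r, q in regional_qs.items())

regional_qs = {
    "north": {("low", "extend_green"): 2.0, ("low", "switch"): 0.5},
    "south": {("high", "extend_green"): 1.0, ("high", "switch"): 3.0},
}
state = {"north": "low", "south": "high"}

best = max(
    [{"north": a1, "south": a2} for a1 in ("extend_green", "switch")
     for a2 in ("extend_green", "switch")],
    key=lambda ja: global_q(regional_qs, state, ja),
)
print(best, global_q(regional_qs, state, best))
```

Because the global value is a sum, each region could equivalently pick its own argmax independently; the brute-force maximization over joint actions above is only for illustration.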
Funding: supported in part by the National Key R&D Program of China under Grant 2021YFB2011300 and the National Natural Science Foundation of China under Grant 52075262.
Abstract: This paper focuses on the development of a learning-based controller for a class of uncertain mechanical systems modeled by the Euler-Lagrange formulation. The considered system can depict the behavior of a large class of engineering systems, such as vehicular systems, robot manipulators, and satellites. All these systems are often characterized by highly nonlinear dynamics, heavy modeling uncertainties, and unknown perturbations, so accurate-model-based nonlinear control approaches become unavailable. Motivated by this challenge, a reinforcement learning (RL) adaptive control methodology based on the actor-critic framework is investigated to compensate for the uncertain mechanical dynamics. The approximation inaccuracies caused by RL and the exogenous unknown disturbances are circumvented via a continuous robust integral of the sign of the error (RISE) control approach. Unlike a classical RISE control law, a tanh(·) function is utilized instead of a sign(·) function to obtain a smoother control signal. The developed controller requires very little prior knowledge of the dynamic model, is robust to unknown dynamics and exogenous disturbances, and achieves asymptotic output tracking. Finally, co-simulations through ADAMS and MATLAB/Simulink on a three degrees-of-freedom (3-DOF) manipulator and experiments on a real-time electromechanical servo system verify the performance of the proposed approach.
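The smoothing idea, replacing sign(e) with tanh(e/ε) inside the robust integral term, can be sketched on a first-order plant. The gains, smoothing width, plant, and disturbance below are illustrative assumptions, not the paper's Euler-Lagrange design:

```python
# Hedged sketch of a smoothed RISE-style law: the robust integral term
# accumulates ks*alpha*e + beta*tanh(e/eps), avoiding the discontinuity
# of sign(e). Plant, gains, and disturbance are hypothetical.
import math

def simulate(eps=0.05, ks=8.0, alpha=2.0, beta=4.0, dt=1e-3, T=5.0):
    """Track ref = 1 on xdot = -x + u + d under a bounded disturbance."""
    x, nu, ref = 0.0, 0.0, 1.0
    for k in range(int(T / dt)):
        t = k * dt
        e = ref - x
        nu += (ks * alpha * e + beta * math.tanh(e / eps)) * dt  # robust integral
        u = ks * e + nu
        d = 0.8 * math.sin(2 * t)          # bounded exogenous disturbance
        x += (-x + u + d) * dt             # Euler step of the toy plant
    return abs(ref - x)

final_err = simulate()
print(f"tracking error after 5 s: {final_err:.4f}")
```

Near the origin tanh(e/ε) behaves like a high-gain linear term of slope β/ε, so the loop stays smooth while still rejecting most of the sinusoidal disturbance; a true sign(e) term would chatter at this sample rate.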
Funding: supported by the National Natural Science Foundation of China (62173333, 12271522), the Beijing Natural Science Foundation (Z210002), and the Research Fund of Renmin University of China (2021030187).
Abstract: For unachievable tracking problems, where the system output cannot precisely track a given reference, achieving the best possible approximation of the reference trajectory becomes the objective. This study investigates solutions using the P-type learning control scheme. Initially, we demonstrate the necessity of gradient information for achieving the best approximation. Subsequently, we propose an input-output-driven learning gain design to handle the imprecise gradients of a class of uncertain systems. However, the desired performance may not be attainable when faced with incomplete information. To address this issue, an extended iterative learning control scheme is introduced. In this scheme, the tracking errors are modified through output data sampling, which incorporates a low memory footprint and offers flexibility in learning gain design. The input sequence is shown to converge to the desired input, resulting in an output that is closest to the given reference in the least-squares sense. Numerical simulations validate the theoretical findings.
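The unachievable-tracking setting can be illustrated with a static lifted plant map G whose range does not contain the reference: a gradient-type update u ← u + γGᵀe then converges to the least-squares approximation, matching the role of gradient information noted above. G, γ, and the reference below are illustrative:

```python
# Hedged sketch of best-approximation learning for an unachievable
# reference: iterate u <- u + gamma * G^T * e on a tall plant map G.
# The map, gain, and reference are illustrative.

def mat_vec(G, v):
    return [sum(g * x for g, x in zip(row, v)) for row in G]

def transpose_vec(G, v):
    cols = len(G[0])
    return [sum(G[i][j] * v[i] for i in range(len(G))) for j in range(cols)]

G = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]            # 3 outputs, 2 inputs: not every reference is reachable
ref = [1.0, 1.0, 0.0]       # inconsistent with y3 = y1 + y2

u, gamma = [0.0, 0.0], 0.2
for _ in range(500):
    e = [r - y for r, y in zip(ref, mat_vec(G, u))]
    grad = transpose_vec(G, e)              # gradient of ||ref - G u||^2 / 2
    u = [ui + gamma * gi for ui, gi in zip(u, grad)]

print("input:", [round(x, 3) for x in u], "residual:", [round(v, 3) for v in e])
```

The iteration converges because γ keeps the spectrum of I − γGᵀG inside the unit interval; the limit u = [1/3, 1/3] solves the normal equations GᵀGu = Gᵀref, leaving the unavoidable least-squares residual.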