One application of software engineering is the vast and widely popular video game entertainment industry. Success of a video game product depends on how well the player base receives it. Of research towards understand...One application of software engineering is the vast and widely popular video game entertainment industry. Success of a video game product depends on how well the player base receives it. Of research towards understanding factors of success behind releasing a video game, we are interested in studying a factor known as Replayability. Towards a software engineering oriented game design methodology, we collect player opinions on Replayability via surveys and provide methods to analyze the data. We believe these results can help game designers to more successfully produce entertaining games with longer lasting appeal by utilizing our software engineering techniques.展开更多
It is well known that the kit completeness of parts processed in the previous stage is crucial for the subsequent manufacturing stage.This paper studies the flexible job shop scheduling problem(FJSP)with the objective...It is well known that the kit completeness of parts processed in the previous stage is crucial for the subsequent manufacturing stage.This paper studies the flexible job shop scheduling problem(FJSP)with the objective of material kitting,where a material kit is a collection of components that ensures that a batch of components can be ready at the same time during the product assembly process.In this study,we consider completion time variance and maximumcompletion time as scheduling objectives,continue the weighted summation process formultiple objectives,and design adaptive weighted summation parameters to optimize productivity and reduce the difference in completion time between components in the same kit.The Soft Actor Critic(SAC)algorithm is designed to be combined with the Adaptive Multi-Buffer Experience Replay(AMBER)mechanism to propose the SAC-AMBER algorithm.The AMBER mechanism optimizes the experience sampling and policy updating process and enhances learning efficiency by categorically storing the experience into the standard buffer,the high equipment utilization buffer,and the high productivity buffer.Experimental results show that the SAC-AMBER algorithm can effectively reduce the maximum completion time on multiple datasets,reduce the difference in component completion time in the same kit,and thus optimize the readiness of the part kits,demonstrating relatively good stability and convergence.Compared with traditional heuristics,meta-heuristics,and other deep reinforcement learning methods,the SAC-AMBER algorithm performs better in terms of solution quality and computational efficiency,and through extensive testing on multiple datasets,the algorithm has been confirmed to have good generalization ability,providing an effective solution to the FJSP problem.展开更多
In the current era of intelligent technologies,comprehensive and precise regional coverage path planning is critical for tasks such as environmental monitoring,emergency rescue,and agricultural plant protection.Owing ...In the current era of intelligent technologies,comprehensive and precise regional coverage path planning is critical for tasks such as environmental monitoring,emergency rescue,and agricultural plant protection.Owing to their exceptional flexibility and rapid deployment capabilities,unmanned aerial vehicles(UAVs)have emerged as the ideal platforms for accomplishing these tasks.This study proposes a swarm A^(*)-guided Deep Q-Network(SADQN)algorithm to address the coverage path planning(CPP)problem for UAV swarms in complex environments.Firstly,to overcome the dependency of traditional modeling methods on regular terrain environments,this study proposes an improved cellular decomposition method for map discretization.Simultaneously,a distributed UAV swarm system architecture is adopted,which,through the integration of multi-scale maps,addresses the issues of redundant operations and flight conflicts inmulti-UAV cooperative coverage.Secondly,the heuristic mechanism of the A^(*)algorithmis combinedwith full-coverage path planning,and this approach is incorporated at the initial stage ofDeep Q-Network(DQN)algorithm training to provide effective guidance in action selection,thereby accelerating convergence.Additionally,a prioritized experience replay mechanism is introduced to further enhance the coverage performance of the algorithm.To evaluate the efficacy of the proposed algorithm,simulation experiments were conducted in several irregular environments and compared with several popular algorithms.Simulation results show that the SADQNalgorithmoutperforms othermethods,achieving performance comparable to that of the baseline prior algorithm,with an average coverage efficiency exceeding 2.6 and fewer turning maneuvers.In addition,the algorithm demonstrates excellent generalization ability,enabling it to adapt to different environments.展开更多
In the wake of major natural disasters or human-made disasters,the communication infrastruc-ture within disaster-stricken areas is frequently dam-aged.Unmanned aerial vehicles(UAVs),thanks to their merits such as rapi...In the wake of major natural disasters or human-made disasters,the communication infrastruc-ture within disaster-stricken areas is frequently dam-aged.Unmanned aerial vehicles(UAVs),thanks to their merits such as rapid deployment and high mobil-ity,are commonly regarded as an ideal option for con-structing temporary communication networks.Con-sidering the limited computing capability and battery power of UAVs,this paper proposes a two-layer UAV cooperative computing offloading strategy for emer-gency disaster relief scenarios.The multi-agent twin delayed deep deterministic policy gradient(MATD3)algorithm integrated with prioritized experience replay(PER)is utilized to jointly optimize the scheduling strategies of UAVs,task offloading ratios,and their mobility,aiming to diminish the energy consumption and delay of the system to the minimum.In order to address the aforementioned non-convex optimiza-tion issue,a Markov decision process(MDP)has been established.The results of simulation experiments demonstrate that,compared with the other four base-line algorithms,the algorithm introduced in this paper exhibits better convergence performance,verifying its feasibility and efficacy.展开更多
Cardiovascular diseases(CVDs)continue to present a leading cause ofmortalityworldwide,emphasizing the importance of early and accurate prediction.Electrocardiogram(ECG)signals,central to cardiac monitoring,have increa...Cardiovascular diseases(CVDs)continue to present a leading cause ofmortalityworldwide,emphasizing the importance of early and accurate prediction.Electrocardiogram(ECG)signals,central to cardiac monitoring,have increasingly been integratedwithDeep Learning(DL)for real-time prediction of CVDs.However,DL models are prone to performance degradation due to concept drift and to catastrophic forgetting.To address this issue,we propose a realtime CVDs prediction approach,referred to as ADWIN-GFR that combines Convolutional Neural Network(CNN)layers,for spatial feature extraction,with Gated Recurrent Units(GRU),for temporal modeling,alongside adaptive drift detection and mitigation mechanisms.The proposed approach integratesAdaptiveWindowing(ADWIN)for realtime concept drift detection,a fine-tuning strategy based on Generative Features Replay(GFR)to preserve previously acquired knowledge,and a dynamic replay buffer ensuring variance,diversity,and data distribution coverage.Extensive experiments conducted on the MIT-BIH arrhythmia dataset demonstrate that ADWIN-GFR outperforms standard fine-tuning techniques,achieving an average post-drift accuracy of 95.4%,amacro F1-score of 93.9%,and a remarkably low forgetting score of 0.9%.It also exhibits an average drift detection delay of 12 steps and achieves an adaptation gain of 17.2%.These findings underscore the potential of ADWIN-GFR for deployment in real-world cardiac monitoring systems,including wearable ECG devices and hospital-based patient monitoring platforms.展开更多
Secure platooning control plays an important role in enhancing the cooperative driving safety of automated vehicles subject to various security vulnerabilities.This paper focuses on the distributed secure control issu...Secure platooning control plays an important role in enhancing the cooperative driving safety of automated vehicles subject to various security vulnerabilities.This paper focuses on the distributed secure control issue of automated vehicles affected by replay attacks.A proportional-integral-observer(PIO)with predetermined forgetting parameters is first constructed to acquire the dynamical information of vehicles.Then,a time-varying parameter and two positive scalars are employed to describe the temporal behavior of replay attacks.In light of such a scheme and the common properties of Laplace matrices,the closed-loop system with PIO-based controllers is transformed into a switched and time-delayed one.Furthermore,some sufficient conditions are derived to achieve the desired platooning performance by the view of the Lyapunov stability theory.The controller gains are analytically determined by resorting to the solution of certain matrix inequalities only dependent on maximum and minimum eigenvalues of communication topologies.Finally,a simulation example is provided to illustrate the effectiveness of the proposed control strategy.展开更多
In some military application scenarios,Unmanned Aerial Vehicles(UAVs)need to perform missions with the assistance of on-board cameras when radar is not available and communication is interrupted,which brings challenge...In some military application scenarios,Unmanned Aerial Vehicles(UAVs)need to perform missions with the assistance of on-board cameras when radar is not available and communication is interrupted,which brings challenges for UAV autonomous navigation and collision avoidance.In this paper,an improved deep-reinforcement-learning algorithm,Deep Q-Network with a Faster R-CNN model and a Data Deposit Mechanism(FRDDM-DQN),is proposed.A Faster R-CNN model(FR)is introduced and optimized to obtain the ability to extract obstacle information from images,and a new replay memory Data Deposit Mechanism(DDM)is designed to train an agent with a better performance.During training,a two-part training approach is used to reduce the time spent on training as well as retraining when the scenario changes.In order to verify the performance of the proposed method,a series of experiments,including training experiments,test experiments,and typical episodes experiments,is conducted in a 3D simulation environment.Experimental results show that the agent trained by the proposed FRDDM-DQN has the ability to navigate autonomously and avoid collisions,and performs better compared to the FRDQN,FR-DDQN,FR-Dueling DQN,YOLO-based YDDM-DQN,and original FR outputbased FR-ODQN.展开更多
Pupil dynamics are the important characteristics of face spoofing detection.The face recognition system is one of the most used biometrics for authenticating individual identity.The main threats to the facial recognit...Pupil dynamics are the important characteristics of face spoofing detection.The face recognition system is one of the most used biometrics for authenticating individual identity.The main threats to the facial recognition system are different types of presentation attacks like print attacks,3D mask attacks,replay attacks,etc.The proposed model uses pupil characteristics for liveness detection during the authentication process.The pupillary light reflex is an involuntary reaction controlling the pupil’s diameter at different light intensities.The proposed framework consists of two-phase methodologies.In the first phase,the pupil’s diameter is calculated by applying stimulus(light)in one eye of the subject and calculating the constriction of the pupil size on both eyes in different video frames.The above measurement is converted into feature space using Kohn and Clynes model-defined parameters.The Support Vector Machine is used to classify legitimate subjects when the diameter change is normal(or when the eye is alive)or illegitimate subjects when there is no change or abnormal oscillations of pupil behavior due to the presence of printed photograph,video,or 3D mask of the subject in front of the camera.In the second phase,we perform the facial recognition process.Scale-invariant feature transform(SIFT)is used to find the features from the facial images,with each feature having a size of a 128-dimensional vector.These features are scale,rotation,and orientation invariant and are used for recognizing facial images.The brute force matching algorithm is used for matching features of two different images.The threshold value we considered is 0.08 for good matches.To analyze the performance of the framework,we tested our model in two Face antispoofing datasets named Replay attack datasets and CASIA-SURF datasets,which were used because they contain the videos of the subjects in each sample having three modalities(RGB,IR,Depth).The CASIA-SURF datasets showed an 89.9%Equal Error Rate,while the Replay Attack datasets showed a 92.1%Equal Error Rate.展开更多
Various organizations store data online rather than on physical servers.As the number of user’s data stored in cloud servers increases,the attack rate to access data from cloud servers also increases.Different resear...Various organizations store data online rather than on physical servers.As the number of user’s data stored in cloud servers increases,the attack rate to access data from cloud servers also increases.Different researchers worked on different algorithms to protect cloud data from replay attacks.None of the papers used a technique that simultaneously detects a full-message and partial-message replay attack.This study presents the development of a TKN(Text,Key and Name)cryptographic algorithm aimed at protecting data from replay attacks.The program employs distinct ways to encrypt plain text[P],a user-defined Key[K],and a Secret Code[N].The novelty of the TKN cryptographic algorithm is that the bit value of each text is linked to another value with the help of the proposed algorithm,and the length of the cipher text obtained is twice the length of the original text.In the scenario that an attacker executes a replay attack on the cloud server,engages in cryptanalysis,or manipulates any data,it will result in automated modification of all associated values inside the backend.This mechanism has the benefit of enhancing the detectability of replay attacks.Nevertheless,the attacker cannot access data not included in any of the papers,regardless of how effective the attack strategy is.At the end of paper,the proposed algorithm’s novelty will be compared with different algorithms,and it will be discussed how far the proposed algorithm is better than all other algorithms.展开更多
Plug-in Hybrid Electric Vehicles(PHEVs)represent an innovative breed of transportation,harnessing diverse power sources for enhanced performance.Energy management strategies(EMSs)that coordinate and control different ...Plug-in Hybrid Electric Vehicles(PHEVs)represent an innovative breed of transportation,harnessing diverse power sources for enhanced performance.Energy management strategies(EMSs)that coordinate and control different energy sources is a critical component of PHEV control technology,directly impacting overall vehicle performance.This study proposes an improved deep reinforcement learning(DRL)-based EMSthat optimizes realtime energy allocation and coordinates the operation of multiple power sources.Conventional DRL algorithms struggle to effectively explore all possible state-action combinations within high-dimensional state and action spaces.They often fail to strike an optimal balance between exploration and exploitation,and their assumption of a static environment limits their ability to adapt to changing conditions.Moreover,these algorithms suffer from low sample efficiency.Collectively,these factors contribute to convergence difficulties,low learning efficiency,and instability.To address these challenges,the Deep Deterministic Policy Gradient(DDPG)algorithm is enhanced using entropy regularization and a summation tree-based Prioritized Experience Replay(PER)method,aiming to improve exploration performance and learning efficiency from experience samples.Additionally,the correspondingMarkovDecision Process(MDP)is established.Finally,an EMSbased on the improvedDRLmodel is presented.Comparative simulation experiments are conducted against rule-based,optimization-based,andDRL-based EMSs.The proposed strategy exhibitsminimal deviation fromthe optimal solution obtained by the dynamic programming(DP)strategy that requires global information.In the typical driving scenarios based onWorld Light Vehicle Test Cycle(WLTC)and New European Driving Cycle(NEDC),the proposed method achieved a fuel consumption of 2698.65 g and an Equivalent Fuel Consumption(EFC)of 2696.77 g.Compared to the DP strategy baseline,the proposed method improved the fuel efficiency variances(FEV)by 18.13%,15.1%,and 8.37%over the Deep QNetwork(DQN),Double DRL(DDRL),and original DDPG methods,respectively.The observational outcomes demonstrate that the proposed EMS based on improved DRL framework possesses good real-time performance,stability,and reliability,effectively optimizing vehicle economy and fuel consumption.展开更多
Network attack detection and mitigation require packet collection,pre-processing,feature analysis,classification,and post-processing.Models for these tasks sometimes become complex or inefficient when applied to real-...Network attack detection and mitigation require packet collection,pre-processing,feature analysis,classification,and post-processing.Models for these tasks sometimes become complex or inefficient when applied to real-time data samples.To mitigate hybrid assaults,this study designs an efficient forensic layer employing deep learning pattern analysis and multidomain feature extraction.In this paper,we provide a novel multidomain feature extraction method using Fourier,Z,Laplace,Discrete Cosine Transform(DCT),1D Haar Wavelet,Gabor,and Convolutional Operations.Evolutionary method dragon fly optimisation reduces feature dimensionality and improves feature selection accuracy.The selected features are fed into VGGNet and GoogLeNet models using binary cascaded neural networks to analyse network traffic patterns,detect anomalies,and warn network administrators.The suggested model tackles the inadequacies of existing approaches to hybrid threats,which are growing more common and challenge conventional security measures.Our model integrates multidomain feature extraction,deep learning pattern analysis,and the forensic layer to improve intrusion detection and prevention systems.In diverse attack scenarios,our technique has 3.5% higher accuracy,4.3% higher precision,8.5% higher recall,and 2.9% lower delay than previous models.展开更多
Path planning and obstacle avoidance are two challenging problems in the study of intelligent robots. In this paper, we develop a new method to alleviate these problems based on deep Q-learning with experience replay ...Path planning and obstacle avoidance are two challenging problems in the study of intelligent robots. In this paper, we develop a new method to alleviate these problems based on deep Q-learning with experience replay and heuristic knowledge. In this method, a neural network has been used to resolve the "curse of dimensionality" issue of the Q-table in reinforcement learning. When a robot is walking in an unknown environment, it collects experience data which is used for training a neural network;such a process is called experience replay.Heuristic knowledge helps the robot avoid blind exploration and provides more effective data for training the neural network. The simulation results show that in comparison with the existing methods, our method can converge to an optimal action strategy with less time and can explore a path in an unknown environment with fewer steps and larger average reward.展开更多
In this paper,a resilient distributed control scheme against replay attacks for multi-agent networked systems subject to input and state constraints is proposed.The methodological starting point relies on a smart use ...In this paper,a resilient distributed control scheme against replay attacks for multi-agent networked systems subject to input and state constraints is proposed.The methodological starting point relies on a smart use of predictive arguments with a twofold aim:1)Promptly detect malicious agent behaviors affecting normal system operations;2)Apply specific control actions,based on predictive ideas,for mitigating as much as possible undesirable domino effects resulting from adversary operations.Specifically,the multi-agent system is topologically described by a leader-follower digraph characterized by a unique leader and set-theoretic receding horizon control ideas are exploited to develop a distributed algorithm capable to instantaneously recognize the attacked agent.Finally,numerical simulations are carried out to show benefits and effectiveness of the proposed approach.展开更多
As an advanced combat weapon,Unmanned Aerial Vehicles(UAVs)have been widely used in military wars.In this paper,we formulated the Autonomous Navigation Control(ANC)problem of UAVs as a Markov Decision Process(MDP)and ...As an advanced combat weapon,Unmanned Aerial Vehicles(UAVs)have been widely used in military wars.In this paper,we formulated the Autonomous Navigation Control(ANC)problem of UAVs as a Markov Decision Process(MDP)and proposed a novel Deep Reinforcement Learning(DRL)method to allow UAVs to perform dynamic target tracking tasks in large-scale unknown environments.To solve the problem of limited training experience,the proposed Imaginary Filtered Hindsight Experience Replay(IFHER)generates successful episodes by reasonably imagining the target trajectory in the failed episode to augment the experiences.The welldesigned goal,episode,and quality filtering strategies ensure that only high-quality augmented experiences can be stored,while the sampling filtering strategy of IFHER ensures that these stored augmented experiences can be fully learned according to their high priorities.By training in a complex environment constructed based on the parameters of a real UAV,the proposed IFHER algorithm improves the convergence speed by 28.99%and the convergence result by 11.57%compared to the state-of-the-art Twin Delayed Deep Deterministic Policy Gradient(TD3)algorithm.The testing experiments carried out in environments with different complexities demonstrate the strong robustness and generalization ability of the IFHER agent.Moreover,the flight trajectory of the IFHER agent shows the superiority of the learned policy and the practical application value of the algorithm.展开更多
This paper presents learning-enabled barriercertified safe controllers for systems that operate in a shared environment for which multiple systems with uncertain dynamics and behaviors interact.That is,safety constrai...This paper presents learning-enabled barriercertified safe controllers for systems that operate in a shared environment for which multiple systems with uncertain dynamics and behaviors interact.That is,safety constraints are imposed by not only the ego system’s own physical limitations but also other systems operating nearby.Since the model of the external agent is required to impose control barrier functions(CBFs)as safety constraints,a safety-aware loss function is defined and minimized to learn the uncertain and unknown behavior of external agents.More specifically,the loss function is defined based on barrier function error,instead of the system model error,and is minimized for both current samples as well as past samples stored in the memory to assure a fast and generalizable learning algorithm for approximating the safe set.The proposed model learning and CBF are then integrated together to form a learning-enabled zeroing CBF(L-ZCBF),which employs the approximated trajectory information of the external agents provided by the learned model but shrinks the safety boundary in case of an imminent safety violation using instantaneous sensory observations.It is shown that the proposed L-ZCBF assures the safety guarantees during learning and even in the face of inaccurate or simplified approximation of external agents,which is crucial in safety-critical applications in highly interactive environments.The efficacy of the proposed method is examined in a simulation of safe maneuver control of a vehicle in an urban area.展开更多
Reinforcement Learning(RL)based control algorithms can learn the control strategies for nonlinear and uncertain environment during interacting with it.Guided by the rewards generated by environment,a RL agent can lear...Reinforcement Learning(RL)based control algorithms can learn the control strategies for nonlinear and uncertain environment during interacting with it.Guided by the rewards generated by environment,a RL agent can learn the control strategy directly in a model-free way instead of investigating the dynamic model of the environment.In the paper,we propose the sampled-data RL control strategy to reduce the computational demand.In the sampled-data control strategy,the whole control system is of a hybrid structure,in which the plant is of continuous structure while the controller(RL agent)adopts a discrete structure.Given that the continuous states of the plant will be the input of the agent,the state–action value function is approximated by the fully connected feed-forward neural networks(FCFFNN).Instead of learning the controller at every step during the interaction with the environment,the learning and acting stages are decoupled to learn the control strategy more effectively through experience replay.In the acting stage,the most effective experience obtained during the interaction with the environment will be stored and during the learning stage,the stored experience will be replayed to customized times,which helps enhance the experience replay process.The effectiveness of proposed approach will be verified by simulation examples.展开更多
The GB/T 27930-2015 protocol is the communication protocol between the non-vehicle-mounted charger and the battery management system (BMS) stipulated by the state. However, as the protocol adopts the way of broadcast ...The GB/T 27930-2015 protocol is the communication protocol between the non-vehicle-mounted charger and the battery management system (BMS) stipulated by the state. However, as the protocol adopts the way of broadcast communication and plaintext to transmit data, the data frame does not contain the source address and the destination address, making the Electric Vehicle (EV) vulnerable to replay attack in the charging process. In order to verify the security problems of the protocol, this paper uses 27,655 message data in the complete charging process provided by Shanghai Thaisen electric company, and analyzes these actual data frames one by one with the program written by C++. In order to enhance the security of the protocol, Rivest-Shamir-Adleman (RSA) digital signature and adding random numbers are proposed to resist replay attack. Under the experimental environment of Eclipse, the normal charging of electric vehicles, RSA digital signature and random number defense are simulated. Experimental results show that RSA digital signature cannot resist replay attack, and adding random numbers can effectively enhance the ability of EV to resist replay attack during charging.展开更多
In this paper,a data-based feedback relearning algorithm is proposed for the robust control problem of uncertain nonlinear systems.Motivated by the classical on-policy and off-policy algorithms of reinforcement learni...In this paper,a data-based feedback relearning algorithm is proposed for the robust control problem of uncertain nonlinear systems.Motivated by the classical on-policy and off-policy algorithms of reinforcement learning,the online feedback relearning(FR)algorithm is developed where the collected data includes the influence of disturbance signals.The FR algorithm has better adaptability to environmental changes(such as the control channel disturbances)compared with the off-policy algorithm,and has higher computational efficiency and better convergence performance compared with the on-policy algorithm.Data processing based on experience replay technology is used for great data efficiency and convergence stability.Simulation experiments are presented to illustrate convergence stability,optimality and algorithmic performance of FR algorithm by comparison.展开更多
Reinforcement learning(RL) algorithms have been demonstrated to solve a variety of continuous control tasks. However,the training efficiency and performance of such methods limit further applications. In this paper, w...Reinforcement learning(RL) algorithms have been demonstrated to solve a variety of continuous control tasks. However,the training efficiency and performance of such methods limit further applications. In this paper, we propose an off-policy heterogeneous actor-critic(HAC) algorithm, which contains soft Q-function and ordinary Q-function. The soft Q-function encourages the exploration of a Gaussian policy, and the ordinary Q-function optimizes the mean of the Gaussian policy to improve the training efficiency. Experience replay memory is another vital component of off-policy RL methods. We propose a new sampling technique that emphasizes recently experienced transitions to boost the policy training. Besides, we integrate HAC with hindsight experience replay(HER) to deal with sparse reward tasks, which are common in the robotic manipulation domain. Finally, we evaluate our methods on a series of continuous control benchmark tasks and robotic manipulation tasks. The experimental results show that our method outperforms prior state-of-the-art methods in terms of training efficiency and performance, which validates the effectiveness of our method.展开更多
Continual learning(CL)studies the problem of learning to accumulate knowledge over time from a stream of data.A crucial challenge is that neural networks suffer from performance degradation on previously seen data,kno...Continual learning(CL)studies the problem of learning to accumulate knowledge over time from a stream of data.A crucial challenge is that neural networks suffer from performance degradation on previously seen data,known as catastrophic forgetting,due to allowing parameter sharing.In this work,we consider a more practical online class-incremental CL setting,where the model learns new samples in an online manner and may continuously experience new classes.Moreover,prior knowledge is unavailable during training and evaluation.Existing works usually explore sample usages from a single dimension,which ignores a lot of valuable supervisory information.To better tackle the setting,we propose a novel replay-based CL method,which leverages multi-level representations produced by the intermediate process of training samples for replay and strengthens supervision to consolidate previous knowledge.Specifically,besides the previous raw samples,we store the corresponding logits and features in the memory.Furthermore,to imitate the prediction of the past model,we construct extra constraints by leveraging multi-level information stored in the memory.With the same number of samples for replay,our method can use more past knowledge to prevent interference.We conduct extensive evaluations on several popular CL datasets,and experiments show that our method consistently outperforms state-of-the-art methods with various sizes of episodic memory.We further provide a detailed analysis of these results and demonstrate that our method is more viable in practical scenarios.展开更多
文摘One application of software engineering is the vast and widely popular video game entertainment industry. Success of a video game product depends on how well the player base receives it. Of research towards understanding factors of success behind releasing a video game, we are interested in studying a factor known as Replayability. Towards a software engineering oriented game design methodology, we collect player opinions on Replayability via surveys and provide methods to analyze the data. We believe these results can help game designers to more successfully produce entertaining games with longer lasting appeal by utilizing our software engineering techniques.
文摘It is well known that the kit completeness of parts processed in the previous stage is crucial for the subsequent manufacturing stage.This paper studies the flexible job shop scheduling problem(FJSP)with the objective of material kitting,where a material kit is a collection of components that ensures that a batch of components can be ready at the same time during the product assembly process.In this study,we consider completion time variance and maximumcompletion time as scheduling objectives,continue the weighted summation process formultiple objectives,and design adaptive weighted summation parameters to optimize productivity and reduce the difference in completion time between components in the same kit.The Soft Actor Critic(SAC)algorithm is designed to be combined with the Adaptive Multi-Buffer Experience Replay(AMBER)mechanism to propose the SAC-AMBER algorithm.The AMBER mechanism optimizes the experience sampling and policy updating process and enhances learning efficiency by categorically storing the experience into the standard buffer,the high equipment utilization buffer,and the high productivity buffer.Experimental results show that the SAC-AMBER algorithm can effectively reduce the maximum completion time on multiple datasets,reduce the difference in component completion time in the same kit,and thus optimize the readiness of the part kits,demonstrating relatively good stability and convergence.Compared with traditional heuristics,meta-heuristics,and other deep reinforcement learning methods,the SAC-AMBER algorithm performs better in terms of solution quality and computational efficiency,and through extensive testing on multiple datasets,the algorithm has been confirmed to have good generalization ability,providing an effective solution to the FJSP problem.
文摘In the current era of intelligent technologies,comprehensive and precise regional coverage path planning is critical for tasks such as environmental monitoring,emergency rescue,and agricultural plant protection.Owing to their exceptional flexibility and rapid deployment capabilities,unmanned aerial vehicles(UAVs)have emerged as the ideal platforms for accomplishing these tasks.This study proposes a swarm A^(*)-guided Deep Q-Network(SADQN)algorithm to address the coverage path planning(CPP)problem for UAV swarms in complex environments.Firstly,to overcome the dependency of traditional modeling methods on regular terrain environments,this study proposes an improved cellular decomposition method for map discretization.Simultaneously,a distributed UAV swarm system architecture is adopted,which,through the integration of multi-scale maps,addresses the issues of redundant operations and flight conflicts inmulti-UAV cooperative coverage.Secondly,the heuristic mechanism of the A^(*)algorithmis combinedwith full-coverage path planning,and this approach is incorporated at the initial stage ofDeep Q-Network(DQN)algorithm training to provide effective guidance in action selection,thereby accelerating convergence.Additionally,a prioritized experience replay mechanism is introduced to further enhance the coverage performance of the algorithm.To evaluate the efficacy of the proposed algorithm,simulation experiments were conducted in several irregular environments and compared with several popular algorithms.Simulation results show that the SADQNalgorithmoutperforms othermethods,achieving performance comparable to that of the baseline prior algorithm,with an average coverage efficiency exceeding 2.6 and fewer turning maneuvers.In addition,the algorithm demonstrates excellent generalization ability,enabling it to adapt to different environments.
基金supported by the Basic Scientific Research Business Fund Project of Higher Education Institutions in Heilongjiang Province(145409601)the First Batch of Experimental Teaching and Teaching Laboratory Construction Research Projects in Heilongjiang Province(SJGZ20240038).
文摘In the wake of major natural disasters or human-made disasters,the communication infrastruc-ture within disaster-stricken areas is frequently dam-aged.Unmanned aerial vehicles(UAVs),thanks to their merits such as rapid deployment and high mobil-ity,are commonly regarded as an ideal option for con-structing temporary communication networks.Con-sidering the limited computing capability and battery power of UAVs,this paper proposes a two-layer UAV cooperative computing offloading strategy for emer-gency disaster relief scenarios.The multi-agent twin delayed deep deterministic policy gradient(MATD3)algorithm integrated with prioritized experience replay(PER)is utilized to jointly optimize the scheduling strategies of UAVs,task offloading ratios,and their mobility,aiming to diminish the energy consumption and delay of the system to the minimum.In order to address the aforementioned non-convex optimiza-tion issue,a Markov decision process(MDP)has been established.The results of simulation experiments demonstrate that,compared with the other four base-line algorithms,the algorithm introduced in this paper exhibits better convergence performance,verifying its feasibility and efficacy.
基金supported by Princess Nourah bint Abdulrahman University Researchers Supporting Project number(PNURSP2025R196)Princess Nourah bint Abdulrahman University,Riyadh,Saudi Arabia.
文摘Cardiovascular diseases(CVDs)continue to present a leading cause ofmortalityworldwide,emphasizing the importance of early and accurate prediction.Electrocardiogram(ECG)signals,central to cardiac monitoring,have increasingly been integratedwithDeep Learning(DL)for real-time prediction of CVDs.However,DL models are prone to performance degradation due to concept drift and to catastrophic forgetting.To address this issue,we propose a realtime CVDs prediction approach,referred to as ADWIN-GFR that combines Convolutional Neural Network(CNN)layers,for spatial feature extraction,with Gated Recurrent Units(GRU),for temporal modeling,alongside adaptive drift detection and mitigation mechanisms.The proposed approach integratesAdaptiveWindowing(ADWIN)for realtime concept drift detection,a fine-tuning strategy based on Generative Features Replay(GFR)to preserve previously acquired knowledge,and a dynamic replay buffer ensuring variance,diversity,and data distribution coverage.Extensive experiments conducted on the MIT-BIH arrhythmia dataset demonstrate that ADWIN-GFR outperforms standard fine-tuning techniques,achieving an average post-drift accuracy of 95.4%,amacro F1-score of 93.9%,and a remarkably low forgetting score of 0.9%.It also exhibits an average drift detection delay of 12 steps and achieves an adaptation gain of 17.2%.These findings underscore the potential of ADWIN-GFR for deployment in real-world cardiac monitoring systems,including wearable ECG devices and hospital-based patient monitoring platforms.
基金supported in part by the National Natural Science Foundation of China (61973219,U21A2019,61873058)the Hainan Province Science and Technology Special Fund (ZDYF2022SHFZ105)。
文摘Secure platooning control plays an important role in enhancing the cooperative driving safety of automated vehicles subject to various security vulnerabilities.This paper focuses on the distributed secure control issue of automated vehicles affected by replay attacks.A proportional-integral-observer(PIO)with predetermined forgetting parameters is first constructed to acquire the dynamical information of vehicles.Then,a time-varying parameter and two positive scalars are employed to describe the temporal behavior of replay attacks.In light of such a scheme and the common properties of Laplace matrices,the closed-loop system with PIO-based controllers is transformed into a switched and time-delayed one.Furthermore,some sufficient conditions are derived to achieve the desired platooning performance by the view of the Lyapunov stability theory.The controller gains are analytically determined by resorting to the solution of certain matrix inequalities only dependent on maximum and minimum eigenvalues of communication topologies.Finally,a simulation example is provided to illustrate the effectiveness of the proposed control strategy.
文摘In some military application scenarios,Unmanned Aerial Vehicles(UAVs)need to perform missions with the assistance of on-board cameras when radar is not available and communication is interrupted,which brings challenges for UAV autonomous navigation and collision avoidance.In this paper,an improved deep-reinforcement-learning algorithm,Deep Q-Network with a Faster R-CNN model and a Data Deposit Mechanism(FRDDM-DQN),is proposed.A Faster R-CNN model(FR)is introduced and optimized to obtain the ability to extract obstacle information from images,and a new replay memory Data Deposit Mechanism(DDM)is designed to train an agent with a better performance.During training,a two-part training approach is used to reduce the time spent on training as well as retraining when the scenario changes.In order to verify the performance of the proposed method,a series of experiments,including training experiments,test experiments,and typical episodes experiments,is conducted in a 3D simulation environment.Experimental results show that the agent trained by the proposed FRDDM-DQN has the ability to navigate autonomously and avoid collisions,and performs better compared to the FRDQN,FR-DDQN,FR-Dueling DQN,YOLO-based YDDM-DQN,and original FR outputbased FR-ODQN.
基金funded by Researchers Supporting Program at King Saud University (RSPD2023R809).
文摘Pupil dynamics are the important characteristics of face spoofing detection.The face recognition system is one of the most used biometrics for authenticating individual identity.The main threats to the facial recognition system are different types of presentation attacks like print attacks,3D mask attacks,replay attacks,etc.The proposed model uses pupil characteristics for liveness detection during the authentication process.The pupillary light reflex is an involuntary reaction controlling the pupil’s diameter at different light intensities.The proposed framework consists of two-phase methodologies.In the first phase,the pupil’s diameter is calculated by applying stimulus(light)in one eye of the subject and calculating the constriction of the pupil size on both eyes in different video frames.The above measurement is converted into feature space using Kohn and Clynes model-defined parameters.The Support Vector Machine is used to classify legitimate subjects when the diameter change is normal(or when the eye is alive)or illegitimate subjects when there is no change or abnormal oscillations of pupil behavior due to the presence of printed photograph,video,or 3D mask of the subject in front of the camera.In the second phase,we perform the facial recognition process.Scale-invariant feature transform(SIFT)is used to find the features from the facial images,with each feature having a size of a 128-dimensional vector.These features are scale,rotation,and orientation invariant and are used for recognizing facial images.The brute force matching algorithm is used for matching features of two different images.The threshold value we considered is 0.08 for good matches.To analyze the performance of the framework,we tested our model in two Face antispoofing datasets named Replay attack datasets and CASIA-SURF datasets,which were used because they contain the videos of the subjects in each sample having three modalities(RGB,IR,Depth).The CASIA-SURF datasets showed an 89.9%Equal Error Rate,while the Replay Attack datasets showed a 92.1%Equal Error Rate.
基金Deanship of Scientific Research at Majmaah University for supporting this work under Project Number R-2023-811.
文摘Various organizations store data online rather than on physical servers.As the number of user’s data stored in cloud servers increases,the attack rate to access data from cloud servers also increases.Different researchers worked on different algorithms to protect cloud data from replay attacks.None of the papers used a technique that simultaneously detects a full-message and partial-message replay attack.This study presents the development of a TKN(Text,Key and Name)cryptographic algorithm aimed at protecting data from replay attacks.The program employs distinct ways to encrypt plain text[P],a user-defined Key[K],and a Secret Code[N].The novelty of the TKN cryptographic algorithm is that the bit value of each text is linked to another value with the help of the proposed algorithm,and the length of the cipher text obtained is twice the length of the original text.In the scenario that an attacker executes a replay attack on the cloud server,engages in cryptanalysis,or manipulates any data,it will result in automated modification of all associated values inside the backend.This mechanism has the benefit of enhancing the detectability of replay attacks.Nevertheless,the attacker cannot access data not included in any of the papers,regardless of how effective the attack strategy is.At the end of paper,the proposed algorithm’s novelty will be compared with different algorithms,and it will be discussed how far the proposed algorithm is better than all other algorithms.
文摘Plug-in Hybrid Electric Vehicles(PHEVs)represent an innovative breed of transportation,harnessing diverse power sources for enhanced performance.Energy management strategies(EMSs)that coordinate and control different energy sources is a critical component of PHEV control technology,directly impacting overall vehicle performance.This study proposes an improved deep reinforcement learning(DRL)-based EMSthat optimizes realtime energy allocation and coordinates the operation of multiple power sources.Conventional DRL algorithms struggle to effectively explore all possible state-action combinations within high-dimensional state and action spaces.They often fail to strike an optimal balance between exploration and exploitation,and their assumption of a static environment limits their ability to adapt to changing conditions.Moreover,these algorithms suffer from low sample efficiency.Collectively,these factors contribute to convergence difficulties,low learning efficiency,and instability.To address these challenges,the Deep Deterministic Policy Gradient(DDPG)algorithm is enhanced using entropy regularization and a summation tree-based Prioritized Experience Replay(PER)method,aiming to improve exploration performance and learning efficiency from experience samples.Additionally,the correspondingMarkovDecision Process(MDP)is established.Finally,an EMSbased on the improvedDRLmodel is presented.Comparative simulation experiments are conducted against rule-based,optimization-based,andDRL-based EMSs.The proposed strategy exhibitsminimal deviation fromthe optimal solution obtained by the dynamic programming(DP)strategy that requires global information.In the typical driving scenarios based onWorld Light Vehicle Test Cycle(WLTC)and New European Driving Cycle(NEDC),the proposed method achieved a fuel consumption of 2698.65 g and an Equivalent Fuel Consumption(EFC)of 2696.77 g.Compared to the DP strategy baseline,the proposed method improved the fuel efficiency variances(FEV)by 18.13%,15.1%,and 8.37%over the Deep QNetwork(DQN),Double DRL(DDRL),and original DDPG methods,respectively.The observational outcomes demonstrate that the proposed EMS based on improved DRL framework possesses good real-time performance,stability,and reliability,effectively optimizing vehicle economy and fuel consumption.
文摘Network attack detection and mitigation require packet collection,pre-processing,feature analysis,classification,and post-processing.Models for these tasks sometimes become complex or inefficient when applied to real-time data samples.To mitigate hybrid assaults,this study designs an efficient forensic layer employing deep learning pattern analysis and multidomain feature extraction.In this paper,we provide a novel multidomain feature extraction method using Fourier,Z,Laplace,Discrete Cosine Transform(DCT),1D Haar Wavelet,Gabor,and Convolutional Operations.Evolutionary method dragon fly optimisation reduces feature dimensionality and improves feature selection accuracy.The selected features are fed into VGGNet and GoogLeNet models using binary cascaded neural networks to analyse network traffic patterns,detect anomalies,and warn network administrators.The suggested model tackles the inadequacies of existing approaches to hybrid threats,which are growing more common and challenge conventional security measures.Our model integrates multidomain feature extraction,deep learning pattern analysis,and the forensic layer to improve intrusion detection and prevention systems.In diverse attack scenarios,our technique has 3.5% higher accuracy,4.3% higher precision,8.5% higher recall,and 2.9% lower delay than previous models.
基金supported by the National Natural Science Foundation of China(61751210,61572441)。
文摘Path planning and obstacle avoidance are two challenging problems in the study of intelligent robots. In this paper, we develop a new method to alleviate these problems based on deep Q-learning with experience replay and heuristic knowledge. In this method, a neural network has been used to resolve the "curse of dimensionality" issue of the Q-table in reinforcement learning. When a robot is walking in an unknown environment, it collects experience data which is used for training a neural network;such a process is called experience replay.Heuristic knowledge helps the robot avoid blind exploration and provides more effective data for training the neural network. The simulation results show that in comparison with the existing methods, our method can converge to an optimal action strategy with less time and can explore a path in an unknown environment with fewer steps and larger average reward.
文摘In this paper,a resilient distributed control scheme against replay attacks for multi-agent networked systems subject to input and state constraints is proposed.The methodological starting point relies on a smart use of predictive arguments with a twofold aim:1)Promptly detect malicious agent behaviors affecting normal system operations;2)Apply specific control actions,based on predictive ideas,for mitigating as much as possible undesirable domino effects resulting from adversary operations.Specifically,the multi-agent system is topologically described by a leader-follower digraph characterized by a unique leader and set-theoretic receding horizon control ideas are exploited to develop a distributed algorithm capable to instantaneously recognize the attacked agent.Finally,numerical simulations are carried out to show benefits and effectiveness of the proposed approach.
基金co-supported by the National Natural Science Foundation of China(Nos.62003267 and 61573285)the Natural Science Basic Research Plan in Shaanxi Province of China(No.2020JQ-220)+1 种基金the Open Project of Science and Technology on Electronic Information Control Laboratory,China(No.JS20201100339)the Open Project of Science and Technology on Electromagnetic Space Operations and Applications Laboratory,China(No.JS20210586512).
文摘As an advanced combat weapon,Unmanned Aerial Vehicles(UAVs)have been widely used in military wars.In this paper,we formulated the Autonomous Navigation Control(ANC)problem of UAVs as a Markov Decision Process(MDP)and proposed a novel Deep Reinforcement Learning(DRL)method to allow UAVs to perform dynamic target tracking tasks in large-scale unknown environments.To solve the problem of limited training experience,the proposed Imaginary Filtered Hindsight Experience Replay(IFHER)generates successful episodes by reasonably imagining the target trajectory in the failed episode to augment the experiences.The welldesigned goal,episode,and quality filtering strategies ensure that only high-quality augmented experiences can be stored,while the sampling filtering strategy of IFHER ensures that these stored augmented experiences can be fully learned according to their high priorities.By training in a complex environment constructed based on the parameters of a real UAV,the proposed IFHER algorithm improves the convergence speed by 28.99%and the convergence result by 11.57%compared to the state-of-the-art Twin Delayed Deep Deterministic Policy Gradient(TD3)algorithm.The testing experiments carried out in environments with different complexities demonstrate the strong robustness and generalization ability of the IFHER agent.Moreover,the flight trajectory of the IFHER agent shows the superiority of the learned policy and the practical application value of the algorithm.
文摘This paper presents learning-enabled barriercertified safe controllers for systems that operate in a shared environment for which multiple systems with uncertain dynamics and behaviors interact.That is,safety constraints are imposed by not only the ego system’s own physical limitations but also other systems operating nearby.Since the model of the external agent is required to impose control barrier functions(CBFs)as safety constraints,a safety-aware loss function is defined and minimized to learn the uncertain and unknown behavior of external agents.More specifically,the loss function is defined based on barrier function error,instead of the system model error,and is minimized for both current samples as well as past samples stored in the memory to assure a fast and generalizable learning algorithm for approximating the safe set.The proposed model learning and CBF are then integrated together to form a learning-enabled zeroing CBF(L-ZCBF),which employs the approximated trajectory information of the external agents provided by the learned model but shrinks the safety boundary in case of an imminent safety violation using instantaneous sensory observations.It is shown that the proposed L-ZCBF assures the safety guarantees during learning and even in the face of inaccurate or simplified approximation of external agents,which is crucial in safety-critical applications in highly interactive environments.The efficacy of the proposed method is examined in a simulation of safe maneuver control of a vehicle in an urban area.
基金supported by Imperial College London,UK,King’s College London,UK and Engineering and Physical Sciences Research Council(EPSRC),UK.
文摘Reinforcement Learning(RL)based control algorithms can learn the control strategies for nonlinear and uncertain environment during interacting with it.Guided by the rewards generated by environment,a RL agent can learn the control strategy directly in a model-free way instead of investigating the dynamic model of the environment.In the paper,we propose the sampled-data RL control strategy to reduce the computational demand.In the sampled-data control strategy,the whole control system is of a hybrid structure,in which the plant is of continuous structure while the controller(RL agent)adopts a discrete structure.Given that the continuous states of the plant will be the input of the agent,the state–action value function is approximated by the fully connected feed-forward neural networks(FCFFNN).Instead of learning the controller at every step during the interaction with the environment,the learning and acting stages are decoupled to learn the control strategy more effectively through experience replay.In the acting stage,the most effective experience obtained during the interaction with the environment will be stored and during the learning stage,the stored experience will be replayed to customized times,which helps enhance the experience replay process.The effectiveness of proposed approach will be verified by simulation examples.
文摘The GB/T 27930-2015 protocol is the communication protocol between the non-vehicle-mounted charger and the battery management system (BMS) stipulated by the state. However, as the protocol adopts the way of broadcast communication and plaintext to transmit data, the data frame does not contain the source address and the destination address, making the Electric Vehicle (EV) vulnerable to replay attack in the charging process. In order to verify the security problems of the protocol, this paper uses 27,655 message data in the complete charging process provided by Shanghai Thaisen electric company, and analyzes these actual data frames one by one with the program written by C++. In order to enhance the security of the protocol, Rivest-Shamir-Adleman (RSA) digital signature and adding random numbers are proposed to resist replay attack. Under the experimental environment of Eclipse, the normal charging of electric vehicles, RSA digital signature and random number defense are simulated. Experimental results show that RSA digital signature cannot resist replay attack, and adding random numbers can effectively enhance the ability of EV to resist replay attack during charging.
基金supported in part by the National Key Research and Development Program of China(2021YFB1714700)the National Natural Science Foundation of China(62022061,6192100028)。
文摘In this paper,a data-based feedback relearning algorithm is proposed for the robust control problem of uncertain nonlinear systems.Motivated by the classical on-policy and off-policy algorithms of reinforcement learning,the online feedback relearning(FR)algorithm is developed where the collected data includes the influence of disturbance signals.The FR algorithm has better adaptability to environmental changes(such as the control channel disturbances)compared with the off-policy algorithm,and has higher computational efficiency and better convergence performance compared with the on-policy algorithm.Data processing based on experience replay technology is used for great data efficiency and convergence stability.Simulation experiments are presented to illustrate convergence stability,optimality and algorithmic performance of FR algorithm by comparison.
基金supported by National Key Research and Development Program of China(NO.2018AAA0103003)National Natural Science Foundation of China(NO.61773378)+1 种基金Basic Research Program(NO.JCKY*******B029)Strategic Priority Research Program of Chinese Academy of Science(NO.XDB32050100).
文摘Reinforcement learning(RL) algorithms have been demonstrated to solve a variety of continuous control tasks. However,the training efficiency and performance of such methods limit further applications. In this paper, we propose an off-policy heterogeneous actor-critic(HAC) algorithm, which contains soft Q-function and ordinary Q-function. The soft Q-function encourages the exploration of a Gaussian policy, and the ordinary Q-function optimizes the mean of the Gaussian policy to improve the training efficiency. Experience replay memory is another vital component of off-policy RL methods. We propose a new sampling technique that emphasizes recently experienced transitions to boost the policy training. Besides, we integrate HAC with hindsight experience replay(HER) to deal with sparse reward tasks, which are common in the robotic manipulation domain. Finally, we evaluate our methods on a series of continuous control benchmark tasks and robotic manipulation tasks. The experimental results show that our method outperforms prior state-of-the-art methods in terms of training efficiency and performance, which validates the effectiveness of our method.
基金supported in part by the National Natura Science Foundation of China(U2013602,61876181,51521003)the Nationa Key R&D Program of China(2020YFB13134)+2 种基金Shenzhen Science and Technology Research and Development Foundation(JCYJ20190813171009236)Beijing Nova Program of Science and Technology(Z191100001119043)the Youth Innovation Promotion Association,Chinese Academy of Sciences。
文摘Continual learning(CL)studies the problem of learning to accumulate knowledge over time from a stream of data.A crucial challenge is that neural networks suffer from performance degradation on previously seen data,known as catastrophic forgetting,due to allowing parameter sharing.In this work,we consider a more practical online class-incremental CL setting,where the model learns new samples in an online manner and may continuously experience new classes.Moreover,prior knowledge is unavailable during training and evaluation.Existing works usually explore sample usages from a single dimension,which ignores a lot of valuable supervisory information.To better tackle the setting,we propose a novel replay-based CL method,which leverages multi-level representations produced by the intermediate process of training samples for replay and strengthens supervision to consolidate previous knowledge.Specifically,besides the previous raw samples,we store the corresponding logits and features in the memory.Furthermore,to imitate the prediction of the past model,we construct extra constraints by leveraging multi-level information stored in the memory.With the same number of samples for replay,our method can use more past knowledge to prevent interference.We conduct extensive evaluations on several popular CL datasets,and experiments show that our method consistently outperforms state-of-the-art methods with various sizes of episodic memory.We further provide a detailed analysis of these results and demonstrate that our method is more viable in practical scenarios.