Vehicle Edge Computing (VEC) and Cloud Computing (CC) significantly enhance the processing efficiency of delay-sensitive and computation-intensive applications by offloading compute-intensive tasks from resource-constrained onboard devices to nearby Roadside Units (RSUs), thereby achieving lower delay and energy consumption. However, due to the limited storage capacity and energy budget of RSUs, it is challenging to meet the demands of the highly dynamic Internet of Vehicles (IoV) environment. Therefore, determining reasonable service caching and computation offloading strategies is crucial. To address this, this paper proposes a joint service caching scheme for cloud-edge collaborative IoV computation offloading. By modeling the dynamic optimization problem as a Markov Decision Process (MDP), the scheme jointly optimizes task delay, energy consumption, load balancing, and privacy entropy to achieve better quality of service. Additionally, a dynamic adaptive multi-objective deep reinforcement learning algorithm is proposed. Each Double Deep Q-Network (DDQN) agent obtains rewards for different objectives based on distinct reward functions and dynamically updates the objective weights by learning the value changes between objectives using Radial Basis Function Networks (RBFNs), thereby efficiently approximating Pareto-optimal decisions for multiple objectives. Extensive experiments demonstrate that the proposed algorithm can better coordinate the three-tier computing resources of cloud, edge, and vehicles. Compared to existing algorithms, the proposed method reduces task delay and energy consumption by 10.64% and 5.1%, respectively.
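To make the weight-update mechanism concrete, the following is a minimal sketch of one way an RBFN could map recent per-objective value changes to normalized scalarization weights for the DDQN agents. The Gaussian centers, width, and two-objective setup are illustrative assumptions, not the paper's actual network.

```python
import numpy as np

def rbfn_weights(value_deltas, centers, gamma=1.0):
    """Map recent per-objective value changes to normalized objective weights
    through a Gaussian RBF layer (a hypothetical design, not the paper's)."""
    acts = np.exp(-gamma * np.sum((centers - value_deltas) ** 2, axis=1))
    return acts / (acts.sum() + 1e-8)   # one weight per RBF center / objective

# Toy usage: two objectives (delay, energy). With one RBF center per objective,
# a large recent change in an objective's value pulls weight toward it.
centers = np.array([[1.0, 0.0], [0.0, 1.0]])   # prototype "delta" patterns
deltas = np.array([0.8, 0.1])                  # recent Q-value changes per objective
w = rbfn_weights(deltas, centers)
reward_vec = np.array([-2.0, -0.5])            # (negative delay, negative energy)
print(w, float(w @ reward_vec))                # scalarized reward fed to each DDQN
```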
In response to the distinctly different heating load characteristics within a heterogeneous building complex, traditional heating load allocation strategies based on fixed weights can no longer meet the requirements for energy conservation and improving indoor temperature satisfaction rates. This study addresses this problem by proposing an adaptive-weighted multi-objective reinforcement learning (Adaptive-Weighted MORL) framework for a heterogeneous building complex comprising a training gym, an office building, a dormitory, and a cafeteria. The framework achieves dynamically balanced optimization between heating load and thermal comfort through an adaptive weight adjustment mechanism that integrates the proximal policy optimization (PPO) algorithm and the non-dominated sorting genetic algorithm II (NSGA-II). PPO learns optimal heating load allocation strategies to adapt to environmental changes, while NSGA-II generates Pareto-optimal solution sets to guide the updates of PPO's weight coefficients. This mechanism dynamically adjusts the heating load weight and the thermal comfort weight, prioritizing the thermal comfort weight under extreme weather conditions. Results demonstrate that, compared to the PPO method and the traditional fixed-weight approach, the proposed framework achieves an overall energy saving rate of 22.1% and a peak heating load reduction exceeding 40%, while maintaining indoor temperature satisfaction rates above 91% in most building types. Notably, under extreme conditions (such as the peak load day of March 17), the framework achieves a 39% peak reduction rate and a 22.2% daily energy saving rate. These findings validate its effectiveness in complex dynamic environments. Overall, this framework provides an intelligent solution for optimizing heating load allocation across heterogeneous building types, effectively balancing the conflicting objectives of energy efficiency and thermal comfort while adapting to dynamic environmental conditions.
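The abstract does not specify how the Pareto front steers the weight updates; below is one plausible rule, sketched under stated assumptions: normalize both objectives, take the knee point of the NSGA-II front, and weight each objective by its remaining slack. The knee-point rule and the toy front are inventions for illustration only.

```python
import numpy as np

def weights_from_pareto(front):
    """Derive (load, comfort) scalarization weights from a Pareto front of
    minimized objectives: pick the knee point nearest the ideal corner and
    weight each objective by how much room it still has to improve."""
    f = np.asarray(front, dtype=float)                      # shape (n, 2)
    norm = (f - f.min(0)) / (np.ptp(f, axis=0) + 1e-9)      # scale to [0, 1]
    knee = norm[np.argmin(np.linalg.norm(norm, axis=1))]    # closest to (0, 0)
    slack = 1.0 - knee                                      # remaining improvement
    return slack / (slack.sum() + 1e-9)

# Toy front: columns are (heating load, thermal discomfort), both minimized.
front = [[120.0, 0.9], [150.0, 0.4], [200.0, 0.2]]
w_load, w_comfort = weights_from_pareto(front)
print(w_load, w_comfort)   # these weights would scalarize PPO's two-part reward
```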
In parallel steering coordination control strategies for path tracking, a driver steering model with fixed parameters is difficult to match to the actual driver, and a steering coordination strategy designed for a single objective under simple conditions is difficult to adapt to multi-dimensional state-variable inputs. In this paper, we propose a deep reinforcement learning based multi-objective parallel human-machine steering coordination strategy for path tracking that considers driver misoperation and external disturbance. First, a driver steering mathematical model is constructed based on the driver's preview characteristics and steering delay response, and the driver characteristic parameters are fitted after collecting actual driving data. Second, considering that the vehicle is susceptible to external disturbances while driving, a Tube MPC (Tube Model Predictive Control) based path tracking steering controller is designed from the vehicle system dynamics error model. After verifying that the driver steering model reproduces real driver steering operation characteristics, the DQN (Deep Q-Network), DDPG (Deep Deterministic Policy Gradient), and TD3 (Twin Delayed Deep Deterministic Policy Gradient) deep reinforcement learning algorithms are used to design a multi-objective parallel steering coordination strategy that handles the vehicle's multi-dimensional state-variable input. Finally, evaluation indices for tracking accuracy, lateral safety, human-machine conflict, and driver steering load are designed for different driver operation states and road environments, and the performance of parallel steering coordination strategies based on the different deep reinforcement learning algorithms and a fuzzy algorithm is compared through simulations and hardware-in-the-loop experiments. The results show that the parallel steering coordination strategy based on deep reinforcement learning can more effectively assist the driver in tracking the target path under lateral wind interference and driver misoperation, and that the TD3-based coordination strategy has the best overall performance.
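As a point of reference for the driver model described above, here is a minimal single-point preview driver sketch with a first-order steering lag, a common textbook form. The gain, lag constant, and step size are illustrative, not the fitted parameters from the paper's driving data.

```python
import numpy as np

def preview_driver_step(y_err_preview, delta_prev, K=0.15, tau=0.2, dt=0.01):
    """Single-point preview driver model with first-order steering lag.
    y_err_preview: lateral path error at the preview point [m]
    delta_prev:    previous steering angle command [rad]"""
    delta_target = K * y_err_preview                  # proportional preview response
    # The first-order lag stands in for the driver's neuromuscular delay.
    return delta_prev + (dt / tau) * (delta_target - delta_prev)

# Toy usage: constant 0.5 m preview error, starting from zero steering.
delta = 0.0
for _ in range(100):
    delta = preview_driver_step(0.5, delta)
print(delta)   # converges toward K * 0.5 = 0.075 rad
```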
This paper investigates a distributed heterogeneous hybrid blocking flow-shop scheduling problem (DHHBFSP) designed to minimize total tardiness and total energy consumption simultaneously, and proposes an improved proximal policy optimization (IPPO) method to make real-time decisions for the DHHBFSP. A multi-objective Markov decision process is modeled for the DHHBFSP, where the reward function is represented by a vector with dynamic weights instead of the common objective-related scalar value. A factory agent (FA) is formulated for each factory to select unscheduled jobs and is trained by the proposed IPPO to improve decision quality. Multiple FAs work asynchronously to allocate jobs that arrive randomly at the shop. A two-stage training strategy is introduced in the IPPO, which learns from both single- and dual-policy data for better data utilization. The proposed IPPO is tested on randomly generated instances and compared with variants of the basic proximal policy optimization (PPO), dispatch rules, multi-objective metaheuristics, and multi-agent reinforcement learning methods. Extensive experimental results suggest that the proposed strategies offer significant improvements over the basic PPO, and that the proposed IPPO outperforms state-of-the-art scheduling methods in both convergence and solution quality.
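To illustrate how a vector reward with dynamic weights might be consumed during training, the sketch below scalarizes a (tardiness, energy) reward vector with weights that drift over the course of training. The linear drift schedule is an assumption made for illustration; the paper's actual weight dynamics are not specified in the abstract.

```python
import numpy as np

def scalarize(reward_vec, step, total_steps):
    """Scalarize a (tardiness, energy) reward vector with dynamic weights:
    a hypothetical schedule drifting emphasis from tardiness toward energy."""
    w_tard = 1.0 - 0.5 * (step / total_steps)    # 1.0 -> 0.5 over training
    w = np.array([w_tard, 1.0 - w_tard])
    return float(w @ np.asarray(reward_vec))

# Toy usage: the environment returns one reward per objective (negated costs).
r_vec = [-3.0, -1.2]                    # (negative tardiness, negative energy)
print(scalarize(r_vec, step=0, total_steps=1000))     # early: tardiness-heavy
print(scalarize(r_vec, step=1000, total_steps=1000))  # late: balanced
```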
In recent years, researchers have leveraged single-agent reinforcement learning to boost educational outcomes and deliver personalized interventions; yet this paradigm provides no capacity for inter-agent interaction. Multi-agent reinforcement learning (MARL) overcomes this limitation by allowing several agents to learn simultaneously within a shared environment, each choosing actions that maximize its own or the group's rewards. By explicitly modeling and exploiting agent-to-agent dynamics, MARL can align those interactions with pedagogical goals such as peer tutoring, collaborative problem-solving, or gamified competition, thus opening richer avenues for adaptive and socially informed learning experiences. This survey investigates the impact of MARL on educational outcomes by examining evidence of its effectiveness in enhancing learner performance, engagement, and equity, and in reducing teacher workload compared to single-agent or traditional approaches. It explores the educational domains and pedagogical problems addressed by MARL, identifies the algorithmic families used, and analyzes their influence on learning. The review also assesses experimental settings and evaluation metrics to determine ecological validity, and outlines current challenges and future research directions in applying MARL to education.
Federated learning is a distributed framework that trains a centralised model using data from multiple clients without transferring that data to a central server. Despite rapid progress, federated learning still faces several unsolved challenges. Specifically, communication costs and system heterogeneity, such as non-identical data distributions, hinder federated learning's progress. Several approaches have recently emerged for federated learning involving heterogeneous clients with varying computational capabilities (namely, heterogeneous federated learning). However, heterogeneous federated learning faces two key challenges: optimising model size and determining client selection ratios. Moreover, efficiently aggregating local models from clients with diverse capabilities is crucial for addressing system heterogeneity and communication efficiency. This paper proposes an evolutionary multiobjective optimisation framework for heterogeneous federated learning (MOHFL) to address these issues. Our approach elegantly formulates and solves a bi-objective optimisation problem that minimises communication cost and model error rate. The decision variables in this framework comprise the model size and client selection ratio for each of the Q client clusters, yielding a total of 2×Q optimisation parameters to be tuned. We develop a partition-based strategy for MOHFL that segregates clients into clusters based on their communication and computation capabilities. Additionally, we implement an adaptive model sizing mechanism that dynamically assigns appropriate subnetwork architectures to clients based on their computational constraints. We also propose a unified aggregation framework to combine models of varying sizes from heterogeneous clients effectively. Extensive experiments on multiple datasets demonstrate the effectiveness and superiority of our proposed method compared to existing approaches.
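The selection step of any evolutionary multi-objective optimiser rests on Pareto dominance over the two objectives. The sketch below shows a standard non-dominated filter; the flat encoding of the 2×Q decision variables is illustrative, not MOHFL's actual representation.

```python
import numpy as np

def non_dominated(objs):
    """Return a mask of Pareto non-dominated rows for minimized objectives,
    applied here to (communication cost, error rate) pairs."""
    objs = np.asarray(objs, dtype=float)
    n = len(objs)
    keep = np.ones(n, dtype=bool)
    for i in range(n):
        for j in range(n):
            # j dominates i: no worse on every objective, strictly better on one.
            if i != j and np.all(objs[j] <= objs[i]) and np.any(objs[j] < objs[i]):
                keep[i] = False
                break
    return keep

# Toy population: each row encodes (size_1, ratio_1, size_2, ratio_2) for Q=2.
pop = np.array([[0.5, 0.3, 0.8, 0.2], [0.2, 0.9, 0.4, 0.5], [0.9, 0.1, 0.9, 0.1]])
objs = [[10.0, 0.20], [14.0, 0.12], [11.0, 0.25]]   # (comm cost, error) per row
print(pop[non_dominated(objs)])   # the third candidate is dominated and dropped
```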
Adversarial Reinforcement Learning (ARL) models for intelligent devices and Network Intrusion Detection Systems (NIDS) improve system resilience against sophisticated cyber-attacks. As a core component of ARL, Adversarial Training (AT) enables NIDS agents to discover and prevent new attack paths by exposing them to adversarial examples, thereby increasing detection accuracy, reducing False Positives (FPs), and enhancing network security. To develop robust decision-making capabilities for real-world network disruptions and hostile activity, NIDS agents are trained in adversarial scenarios to monitor the current state and notify management of any abnormal or malicious activity. The accuracy and timeliness of the IDS are crucial to the network's availability and reliability. This paper analyzes ARL applications in NIDS, reviewing State-of-The-Art (SoTA) methodologies, open issues, and future research prospects. These include Reinforcement Machine Learning (RML) based NIDS, in which an agent interacts with the environment to achieve a goal, and Deep Reinforcement Learning (DRL) based NIDS, which can solve complex decision-making problems. Additionally, this survey addresses adversarial circumstances in cybersecurity and their importance for ARL and NIDS. Architectural design, RL algorithms, feature representation, and training methodologies are examined across ARL-NIDS studies. This comprehensive study evaluates ARL for intelligent NIDS research, benefiting cybersecurity researchers, practitioners, and policymakers, and promotes research and innovation in cybersecurity defense.
Unmanned Aerial Vehicles (UAVs) have become integral components of smart city infrastructures, supporting applications such as emergency response, surveillance, and data collection. However, the high mobility and dynamic topology of Flying Ad Hoc Networks (FANETs) present significant challenges for maintaining reliable, low-latency communication. Conventional geographic routing protocols often struggle when link quality varies and mobility patterns are unpredictable. To overcome these limitations, this paper proposes an improved routing protocol based on reinforcement learning. The approach integrates Q-learning with mechanisms that are both link-aware and mobility-aware. The proposed method optimizes relay node selection using an adaptive reward function that accounts for energy consumption, delay, and link quality. Additionally, a Kalman filter is integrated to predict UAV mobility, improving the stability of communication links under dynamic network conditions. Simulation experiments were conducted using realistic scenarios, varying the number of UAVs to assess scalability. Key performance metrics were analyzed, including the packet delivery ratio, end-to-end delay, and total energy consumption. The results demonstrate that the proposed approach significantly improves the packet delivery ratio by 12%-15% and reduces delay by up to 25.5% compared to the conventional GEO and QGEO protocols. However, this improvement comes at the cost of higher energy consumption due to additional computation and control overhead. Despite this trade-off, the proposed solution ensures reliable and efficient communication, making it well suited for large-scale UAV networks operating in complex urban environments.
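The two named ingredients, a Kalman filter for mobility prediction and a composite relay-selection reward, can be sketched compactly. Below is a standard 1-D constant-velocity Kalman step plus an illustrative reward; the motion model, noise values, and reward weights are assumptions, as the abstract does not give them.

```python
import numpy as np

def kalman_predict_update(x, P, z, dt=0.1, q=0.01, r=0.5):
    """One predict/update step of a 1-D constant-velocity Kalman filter.
    x: state [position, velocity], P: covariance, z: noisy position measurement."""
    F = np.array([[1.0, dt], [0.0, 1.0]])       # constant-velocity motion model
    H = np.array([[1.0, 0.0]])                  # only position is observed
    x = F @ x                                   # predict state
    P = F @ P @ F.T + q * np.eye(2)             # predict covariance
    y = z - (H @ x)[0]                          # innovation
    S = (H @ P @ H.T)[0, 0] + r
    K = (P @ H.T)[:, 0] / S                     # Kalman gain
    x = x + K * y
    P = (np.eye(2) - np.outer(K, H[0])) @ P
    return x, P

def relay_reward(energy, delay, link_q, w=(0.3, 0.3, 0.4)):
    """Composite relay-selection reward; the weights are illustrative."""
    return -w[0] * energy - w[1] * delay + w[2] * link_q

x, P = np.array([0.0, 1.0]), np.eye(2)
for z in [0.12, 0.21, 0.33, 0.41]:              # noisy position track
    x, P = kalman_predict_update(x, P, z)
print(x)   # [estimated position, estimated velocity] feeds link-stability terms
```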
Muon scattering tomography (MST) is a powerful noninvasive imaging technique with significant applications in nuclear material detection and security screening. Traditional MST usually relies on the point of closest approach (PoCA) algorithm to reconstruct images from muon scattering data; however, PoCA often suffers from suboptimal image clarity and resolution. To overcome these challenges, we propose a novel approach that leverages reinforcement learning (RL) to enhance MST reconstruction, termed the μRL-enhanced method. By framing the MST optimization task as an RL problem, we developed an intelligent agent capable of dynamically adjusting the key PoCA parameters. The agent is trained using a multi-objective reward function that guides the optimization toward higher-quality reconstructions. Our experimental results show that the μRL-enhanced method significantly outperforms the traditional PoCA baseline across multiple benchmark metrics. Specifically, the proposed approach on average attains a 307% improvement in the intersection over union (IoU), a 79% increase in the structural similarity index measure (SSIM), and an 8.4% enhancement in the peak signal-to-noise ratio (PSNR) across four experiments. Furthermore, when benchmarked against the maximum likelihood scattering and displacement (MLSD) algorithm, the μRL-enhanced method offers modest gains in PSNR and IoU, together with a one-third increase in SSIM. These improvements demonstrate the enhanced reconstruction accuracy and structural fidelity of the μRL-enhanced method, highlighting its potential to advance MST technologies and their applications.
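A multi-objective reward built from the metrics the abstract reports could look like the sketch below, which mixes IoU and normalized PSNR over a reconstructed volume. The weights and the PSNR normalization constant are assumptions, and SSIM is omitted to keep the sketch dependency-free.

```python
import numpy as np

def iou(pred, target, thresh=0.5):
    """Intersection over union of two binarized reconstruction volumes."""
    p, t = pred > thresh, target > thresh
    return np.logical_and(p, t).sum() / (np.logical_or(p, t).sum() + 1e-9)

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB."""
    mse = np.mean((pred - target) ** 2)
    return 10.0 * np.log10(max_val ** 2 / (mse + 1e-12))

def reconstruction_reward(pred, target, w_iou=0.5, w_psnr=0.5, psnr_scale=40.0):
    """Weighted mix of IoU and normalized PSNR as an RL reward for an agent
    tuning PoCA parameters (weights and scale are illustrative)."""
    return w_iou * iou(pred, target) + w_psnr * psnr(pred, target) / psnr_scale

# Toy usage on random 16^3 volumes standing in for reconstructions.
rng = np.random.default_rng(0)
target = rng.random((16, 16, 16))
pred = np.clip(target + 0.1 * rng.standard_normal(target.shape), 0, 1)
print(reconstruction_reward(pred, target))
```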
Multi-agent reinforcement learning (MARL) has proven its effectiveness in cooperative multi-agent systems (MASs) but still faces the curse of dimensionality and limited learning efficiency. The main difficulty is caused by the strong inter-agent coupling inherent in an MARL problem, which is yet to be fully exploited by existing algorithms. In this work, we recognize a learning graph characterizing the dependence between individual rewards and individual policies. We then propose a graph-based reward aggregation (GRA) method, which utilizes the inherent coupling relationships among agents to eliminate redundant information. Specifically, GRA passes information among cooperating agents through graph attention networks to obtain aggregated rewards that contribute to the fitting of the value function, enabling each agent to learn a decentralized, executable cooperation policy. In addition, we propose a variant of GRA, named GRA-decen, which achieves decentralized training and decentralized execution (DTDE) when each agent only has access to information from a subset of agents during learning. We conduct experiments in different environments and demonstrate the practicality and scalability of our algorithms.
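The core operation, aggregating neighbor rewards over the learning graph with attention, can be sketched with a single dot-product attention head. This is a minimal stand-in for a graph attention network; the scoring rule, feature shapes, and toy graph are illustrative, not GRA's architecture.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def aggregate_rewards(rewards, feats, adj, i, W=None):
    """Aggregate neighbor rewards for agent i with one attention head.
    rewards: (n,) individual rewards; feats: (n, d) agent features;
    adj: (n, n) 0/1 learning-graph adjacency (self-loops included)."""
    n, d = feats.shape
    W = np.eye(d) if W is None else W
    nbrs = np.flatnonzero(adj[i])                  # agents coupled to i
    scores = (feats[nbrs] @ W) @ (W @ feats[i])    # dot-product attention scores
    alpha = softmax(scores)                        # attention coefficients
    return float(alpha @ rewards[nbrs])            # i's aggregated reward

# Toy usage: 3 agents, agent 0 coupled to itself and agent 1.
feats = np.array([[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]])
adj = np.array([[1, 1, 0], [1, 1, 1], [0, 1, 1]])
rewards = np.array([0.5, 1.0, -0.2])
print(aggregate_rewards(rewards, feats, adj, i=0))
```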
Ride-hailing electric vehicles are mobile resources with dispatch potential to improve resilience. However, they have not been well investigated, because their charging and order-serving are affected or managed by both the power grid dispatching center and the ride-hailing platform. Effective pre-event strategies can improve prevention capability for high-impact, low-probability (HILP) events and lay the foundation for measures in the response and restoration stages. First, this paper proposes the notion of a resilience reserve to expand existing research on power system resilience. Second, it puts forward an interactive deep reinforcement learning method that considers the interests of both the power grid dispatching center and the ride-hailing platform; the method improves the resilience reserve through order dispatch, orderly charging management of ride-hailing electric vehicles, and a pricing strategy for charging stations. Finally, a practical example covering about 107.32 km² in the center of Chengdu verifies that the proposed method improves the resilience reserve of the power system without obviously damaging the interests of the ride-hailing platform.
Reinforcement learning (RL), as an important branch of machine learning, has recently attracted extensive attention and achieved success in many applications. Its main idea is to enable agents to continuously learn to make optimal decisions by trying to maximize a reward function for their actions and interactions with the environment. However, making high-quality decisions in complex and uncertain real-world scenarios is a challenging task, as interference and attacks in such scenarios tend to destroy existing strategies. Maintaining RL's optimal performance across diverse cases and adapting to changing environments remain important challenges. This article presents a comprehensive review of recent advancements in robust reinforcement learning (RRL) and analyzes them from the perspectives of challenges, methodologies, and applications. It systematically evaluates current progress in RRL and summarizes the commonly used benchmark platforms. Finally, several open challenges are discussed to stimulate further research and guide future developments in this area.
The integration of human factors into artificial intelligence (AI) systems has emerged as a critical research frontier, particularly in reinforcement learning (RL), where human-AI interaction (HAII) presents both opportunities and challenges. As RL continues to demonstrate remarkable success in model-free and partially observable environments, its real-world deployment increasingly requires effective collaboration with human operators and stakeholders. This article systematically examines HAII techniques in RL through both theoretical analysis and practical case studies. We establish a conceptual framework built upon three fundamental pillars of effective human-AI collaboration: computational trust modeling, system usability, and decision understandability. Our comprehensive review organizes HAII methods into five key categories: (1) learning from human feedback, including various shaping approaches; (2) learning from human demonstration through inverse RL and imitation learning; (3) shared autonomy architectures for dynamic control allocation; (4) human-in-the-loop querying strategies for active learning; and (5) explainable RL techniques for interpretable policy generation. Recent state-of-the-art works are critically reviewed, with particular emphasis on advances incorporating large language models in human-AI interaction research. To illustrate some concepts, we present three detailed case studies: an empirical trust model for farmers adopting AI-driven agricultural management systems, the implementation of ethical constraints in robotic motion planning through human-guided RL, and an experimental investigation of human trust dynamics using a multi-armed bandit paradigm. These applications demonstrate how HAII principles can enhance RL systems' practical utility while bridging the gap between theoretical RL and real-world human-centered applications, ultimately contributing to more deployable and socially beneficial intelligent systems.
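For readers unfamiliar with the multi-armed bandit paradigm named in the third case study, here is a minimal epsilon-greedy bandit loop. The arm payoffs, epsilon, and the framing of one arm as "follow the AI suggestion" are invented for illustration; they are not the study's protocol.

```python
import numpy as np

def run_bandit(true_means, steps=1000, eps=0.1, seed=0):
    """Epsilon-greedy multi-armed bandit with incremental value estimates.
    In a trust experiment, one arm could stand for 'follow the AI suggestion'."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    q = np.zeros(k)          # estimated value per arm
    n = np.zeros(k)          # pull counts
    for _ in range(steps):
        a = rng.integers(k) if rng.random() < eps else int(np.argmax(q))
        r = rng.normal(true_means[a], 1.0)      # noisy reward from the chosen arm
        n[a] += 1
        q[a] += (r - q[a]) / n[a]               # incremental mean update
    return q, n

q, n = run_bandit([0.2, 0.8, 0.5])
print(np.argmax(q), n)      # the best arm should dominate the pull counts
```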
With the advent of sixth-generation mobile communications (6G), space-air-ground integrated networks have become mainstream. This paper focuses on collaborative scheduling for mobile edge computing (MEC) under a three-tier heterogeneous architecture composed of mobile devices, unmanned aerial vehicles (UAVs), and macro base stations (BSs). This scenario typically faces fast channel fading, dynamic computational loads, and energy constraints, whereas classical queuing-theoretic or convex-optimization approaches struggle to yield robust solutions in highly dynamic settings. To address this issue, we formulate a multi-agent Markov decision process (MDP) for an air-ground-fused MEC system, unify link selection, bandwidth/power allocation, and task offloading into a continuous action space, and propose a joint scheduling strategy that is based on an improved MATD3 algorithm. The improvements include Alternating Layer Normalization (ALN) in the actor to suppress gradient variance, Residual Orthogonalization (RO) in the critic to reduce the correlation between the twin Q-value estimates, and a dynamic-temperature reward to enable adaptive trade-offs during training. On a multi-user, dual-link simulation platform, we conduct ablation and baseline comparisons. The results reveal that the proposed method has better convergence and stability. Compared with MADDPG, TD3, and DSAC, our algorithm achieves more robust performance across key metrics.
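The twin Q-value estimates mentioned above are the signature of the TD3 family that MATD3 builds on: the TD target takes the minimum of the two critics to curb overestimation bias. A generic sketch follows; it does not reproduce the paper's RO or ALN modifications.

```python
import numpy as np

def td3_target(r, q1_next, q2_next, done, gamma=0.99):
    """Twin-critic TD target as used in TD3-family methods such as MATD3:
    the minimum of the two Q estimates gives a pessimistic next-state value."""
    q_min = np.minimum(q1_next, q2_next)
    return r + gamma * (1.0 - done) * q_min     # bootstrap unless terminal

# Toy batch: both critics evaluate the target policy's next action.
r = np.array([1.0, 0.5, -0.2])
q1 = np.array([10.0, 4.0, 2.0])
q2 = np.array([9.0, 5.0, 1.5])
done = np.array([0.0, 0.0, 1.0])
print(td3_target(r, q1, q2, done))   # regression targets for both critics
```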
To address the high costs and operational instability of distribution networks caused by the large-scale integration of distributed energy resources (DERs) such as photovoltaic (PV) systems, wind turbines (WT), and energy storage (ES) devices, as well as the increased grid load fluctuations and safety risks due to uncoordinated electric vehicle (EV) charging, this paper proposes a novel dual-scale hierarchical collaborative optimization strategy. The strategy decouples system-level economic dispatch from distributed EV agent control, effectively resolving the resource coordination conflicts that arise from the high computational complexity and poor scalability of existing centralized optimization, or from the reliance on local-information decision-making in fully decentralized frameworks. At the lower level, an EV charging and discharging model with a hybrid discrete-continuous action space is established and optimized using an improved Parameterized Deep Q-Network (PDQN) algorithm, which directly handles mode selection and power regulation while embedding physical constraints to ensure safety. At the upper level, microgrid (MG) operators adopt a dynamic pricing strategy optimized through deep reinforcement learning (DRL) to maximize economic benefits and achieve peak-valley shaving. Simulation results show that the proposed strategy outperforms traditional methods, reducing the total operating cost of the MG by 21.6%, decreasing the peak-to-valley load difference by 33.7%, reducing the number of voltage limit violations by 88.9%, and lowering the average electricity cost for EV users by 15.2%. The method yields a win-win outcome for operators and users, providing a reliable and efficient scheduling solution for distribution networks with high renewable energy penetration.
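The hybrid discrete-continuous action space is what distinguishes PDQN-style methods: a parameter network proposes a continuous power level for every discrete mode, then a Q-network scores each (mode, power) pair and the best mode is executed. The sketch below uses linear stand-ins for the trained networks; the mode set, power range, and shapes are assumptions.

```python
import numpy as np

def pdqn_select_action(state, actor_params, q_net, modes=("charge", "discharge", "idle")):
    """PDQN-style hybrid action selection: continuous parameters per discrete
    mode, then an argmax over the Q values of the (mode, parameter) pairs."""
    powers = actor_params(state)                          # one power value per mode
    q_vals = [q_net(state, m, powers[m]) for m in range(len(modes))]
    m_star = int(np.argmax(q_vals))
    return modes[m_star], float(powers[m_star])

# Toy stand-ins: random linear maps in place of trained neural networks.
rng = np.random.default_rng(1)
W_actor, W_q = rng.standard_normal((3, 4)), rng.standard_normal((3, 5))
actor = lambda s: np.tanh(W_actor @ s) * 7.0              # power in [-7, 7] kW
q_fn = lambda s, m, p: float(W_q[m] @ np.append(s, p))    # mode-specific Q head
mode, power = pdqn_select_action(np.array([0.2, 0.9, 0.1, 0.5]), actor, q_fn)
print(mode, round(power, 2))
```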
Addressing optimal confrontation methods in multi-agent attack-defense scenarios is a complex challenge. Multi-Agent Reinforcement Learning (MARL) provides an effective framework for tackling sequential decision-making problems, significantly enhancing swarm intelligence in maneuvering. However, applying MARL to unmanned swarms presents two primary challenges. First, defensive agents must balance autonomy with collaboration under limited perception while coordinating against adversaries. Second, current algorithms aim to maximize global or individual rewards, making them sensitive to fluctuations in enemy strategies and environmental changes, especially when rewards are sparse. To tackle these issues, we propose Multi-Agent Reinforcement Learning with Layered Autonomy and Collaboration (MARL-LAC) for collaborative confrontations. The algorithm integrates dual twin Critics to mitigate the high variance associated with policy gradients. Furthermore, MARL-LAC employs layered autonomy and collaboration to address multi-objective problems, specifically learning a global reward function for the swarm alongside local reward functions for individual defensive agents. Experimental results demonstrate that MARL-LAC enhances decision-making and collaborative behaviors among agents, outperforming existing algorithms and underscoring the importance of layered autonomy and collaboration in multi-agent systems. The observed adversarial behaviors show that agents using MARL-LAC effectively maintain cohesive formations that conceal their intentions by confusing the offensive agent while successfully encircling the target.
Effective partitioning is crucial for enabling parallel restoration of power systems after blackouts. This paper proposes a novel partitioning method based on deep reinforcement learning. First, the partitioning decision process is formulated as a Markov decision process (MDP) model that maximizes modularity, with the key partitioning constraints on parallel restoration taken into account. Second, based on the partitioning objective and constraints, the reward function of the partitioning MDP model is set by adopting a relative-deviation normalization scheme to reduce mutual interference between the reward and penalty terms in the reward function, and a soft bonus scaling mechanism is introduced to mitigate overestimation caused by abrupt jumps in the reward. Then, the deep Q-network method is applied to solve the partitioning MDP model and generate partitioning schemes, with two experience replay buffers employed to speed up training. Finally, case studies on the IEEE 39-bus test system demonstrate that the proposed method generates a high-modularity partitioning result that meets all key partitioning constraints, thereby improving the parallelism and reliability of the restoration process. Moreover, simulation results demonstrate that an appropriate discount factor is crucial for ensuring both the convergence speed and the stability of the partitioning training.
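The quantity being maximized here is the standard Newman modularity of the network partition; the sketch below computes it directly from the definition on a toy 4-bus graph (the graph itself is invented, not the IEEE 39-bus system).

```python
import numpy as np

def modularity(adj, labels):
    """Newman modularity of a partition of an undirected graph.
    adj: symmetric 0/1 adjacency matrix; labels: community id per node."""
    adj = np.asarray(adj, dtype=float)
    m = adj.sum() / 2.0                         # number of edges
    k = adj.sum(axis=1)                         # node degrees
    q = 0.0
    for i in range(len(adj)):
        for j in range(len(adj)):
            if labels[i] == labels[j]:
                q += adj[i, j] - k[i] * k[j] / (2.0 * m)
    return q / (2.0 * m)

# Toy 4-bus graph: two tightly connected pairs joined by one tie line.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]])
print(modularity(adj, labels=[0, 0, 1, 1]))    # splitting at the tie line
```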
While reinforcement learning-based underwater acoustic adaptive modulation shows promise for enabling environment-adaptive communication, as supported by extensive simulation-based research, its practical performance remains underexplored in field investigations. To evaluate the practical applicability of this emerging technique in adverse shallow-sea channels, a field experiment was conducted using three communication modes: orthogonal frequency division multiplexing (OFDM), M-ary frequency-shift keying (MFSK), and direct sequence spread spectrum (DSSS) for reinforcement learning-driven adaptive modulation. Specifically, a Q-learning method is used to select the optimal modulation mode according to channel quality, quantified by signal-to-noise ratio, multipath spread length, and Doppler frequency offset. Experimental results demonstrate that the reinforcement learning-based adaptive modulation scheme outperformed fixed-threshold detection in terms of total throughput and average bit error rate, surpassing conventional adaptive modulation strategies.
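A tabular Q-learning loop over a quantized channel state captures the method's shape. The bin edges, reward, and rates below are invented for illustration; the experiment's actual quantization and reward design are not given in the abstract.

```python
import numpy as np

# The channel state (SNR, multipath spread, Doppler) is binned, and the agent
# picks among OFDM / MFSK / DSSS.
MODES = ["OFDM", "MFSK", "DSSS"]
N_BINS = (4, 3, 3)                                    # (snr, multipath, doppler)
Q = np.zeros(N_BINS + (len(MODES),))

def quantize(snr_db, multipath_ms, doppler_hz):
    s = np.digitize(snr_db, [5, 10, 20])              # -> 0..3
    m = np.digitize(multipath_ms, [5, 15])            # -> 0..2
    d = np.digitize(doppler_hz, [2, 8])               # -> 0..2
    return s, m, d

def q_update(state, action, reward, next_state, alpha=0.1, gamma=0.9):
    target = reward + gamma * Q[next_state].max()
    Q[state + (action,)] += alpha * (target - Q[state + (action,)])

# One illustrative step: good SNR, mild multipath, low Doppler.
s = quantize(18.0, 4.0, 1.0)
a = int(Q[s].argmax())                                # greedy mode choice
reward = 1.0                                          # e.g., throughput minus BER penalty
q_update(s, a, reward, quantize(17.0, 4.5, 1.2))
print(MODES[a], Q[s])
```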
As the types of traffic requests increase, the elastic optical network (EON) is considered a promising architecture for carrying multiple types of traffic requests simultaneously, including immediate reservation (IR) and advance reservation (AR). Various resource allocation schemes for IR/AR requests have been designed in EONs to reduce the bandwidth blocking probability (BBP). However, these schemes do not consider the differing transmission requirements of IR requests and cannot maintain a low BBP for high-priority requests. In this paper, multiple priorities are considered in the hybrid IR/AR request scenario. We modify the asynchronous advantage actor-critic (A3C) model and propose an A3C-assisted priority resource allocation (APRA) algorithm. APRA integrates the priority and transmission quality of IR requests into the design of the A3C reward function, then dynamically allocates dedicated resources to different IR requests according to their time-varying requirements. By maximizing the reward, the transmission quality of IR requests can be matched with their priority, ensuring a lower BBP for high-priority IR requests. Simulation results show that APRA reduces the BBP of high-priority IR requests from 0.0341 to 0.0138, and the overall network operation gain is improved by 883 compared to the scheme that does not consider priority.
Wi-Fi technology has evolved significantly since its introduction in 1997, advancing to Wi-Fi 6 as the latest standard, with Wi-Fi 7 currently under development. Despite these advancements, integrating machine learning into Wi-Fi networks remains challenging, especially in decentralized environments with multiple access points (mAPs). This paper is a short review that summarizes the potential applications of federated reinforcement learning (FRL) across eight key areas of Wi-Fi functionality, including channel access, link adaptation, beamforming, multi-user transmissions, channel bonding, multi-link operation, spatial reuse, and multi-basic service set (multi-BSS) coordination. FRL is highlighted as a promising framework for enabling decentralized training and decision-making while preserving data privacy. To illustrate its role in practice, we present a case study on link activation in a multi-link operation (MLO) environment with multiple APs. Through theoretical discussion and simulation results, the study demonstrates how FRL can improve performance and reliability, paving the way for more adaptive and collaborative Wi-Fi networks in the era of Wi-Fi 7 and beyond.
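The aggregation step at the heart of any FRL deployment is sample-weighted federated averaging of the per-AP models; the sketch below applies it to tabular Q-values. This is a generic sketch of the FedAvg idea, not the case study's algorithm, and the table shapes and counts are invented.

```python
import numpy as np

def fed_avg(local_tables, counts):
    """Sample-weighted federated averaging of per-AP Q-tables: each AP's local
    table contributes in proportion to how much data it was trained on."""
    w = np.asarray(counts, dtype=float)
    w /= w.sum()
    return sum(wi * t for wi, t in zip(w, local_tables))

# Three APs each hold a local 4-state x 2-action Q-table; no raw data is shared.
rng = np.random.default_rng(2)
tables = [rng.standard_normal((4, 2)) for _ in range(3)]
global_q = fed_avg(tables, counts=[120, 300, 80])   # more data -> more influence
print(global_q.shape)   # the averaged table is redistributed to every AP
```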
基金supported by Key Science and Technology Program of Henan Province,China(Grant Nos.242102210147,242102210027)Fujian Province Young and Middle aged Teacher Education Research Project(Science and Technology Category)(No.JZ240101)(Corresponding author:Dong Yuan).
文摘Vehicle Edge Computing(VEC)and Cloud Computing(CC)significantly enhance the processing efficiency of delay-sensitive and computation-intensive applications by offloading compute-intensive tasks from resource-constrained onboard devices to nearby Roadside Unit(RSU),thereby achieving lower delay and energy consumption.However,due to the limited storage capacity and energy budget of RSUs,it is challenging to meet the demands of the highly dynamic Internet of Vehicles(IoV)environment.Therefore,determining reasonable service caching and computation offloading strategies is crucial.To address this,this paper proposes a joint service caching scheme for cloud-edge collaborative IoV computation offloading.By modeling the dynamic optimization problem using Markov Decision Processes(MDP),the scheme jointly optimizes task delay,energy consumption,load balancing,and privacy entropy to achieve better quality of service.Additionally,a dynamic adaptive multi-objective deep reinforcement learning algorithm is proposed.Each Double Deep Q-Network(DDQN)agent obtains rewards for different objectives based on distinct reward functions and dynamically updates the objective weights by learning the value changes between objectives using Radial Basis Function Networks(RBFN),thereby efficiently approximating the Pareto-optimal decisions for multiple objectives.Extensive experiments demonstrate that the proposed algorithm can better coordinate the three-tier computing resources of cloud,edge,and vehicles.Compared to existing algorithms,the proposed method reduces task delay and energy consumption by 10.64%and 5.1%,respectively.
文摘In response to the distinctly different heating load characteristics within heterogeneous building complex,traditional heating load allocation strategies based on fixed weights can no longer meet the requirements for energy conservation and improving indoor temperature satisfaction rates.This study addresses this problem by proposing an adaptive-weighted multi-objective reinforcement learning(Adaptive-Weighted MORL)framework for a heterogeneous building complex comprising a training gym,office building,dormitory,and cafeteria.The framework achieves dynamic balance optimization between heating load and thermal comfort through an adaptive weight adjustment mechanism integrating proximal policy optimization(PPO)algorithm and non-dominated sorting genetic algorithm II(NSGA-II).PPO learns optimal heating load allocation strategies to adapt to environmental changes,while NSGA-II generates Pareto-optimal solution sets to guide PPO’s weight coefficient updates.This mechanism dynamically adjusts the heating load weight and thermal comfort weight,prioritizing thermal comfort weight under extreme weather conditions.Results demonstrate that,compared to the PPO method and traditional fixed-weight approach,the proposed framework achieves an overall energy saving rate of 22.1%,and a peak heating load reduction exceeding 40%,while maintaining indoor temperature satisfaction rates above 91%in most building types.Notably,under extreme conditions(such as the peak load day of March 17),the framework achieves a 39%peak reduction rate and a 22.2%daily energy saving rate.These findings thoroughly validate its effectiveness in complex dynamic environments.Overall,this framework provides an intelligent solution for optimizing heating load allocation across heterogeneous building types,effectively balancing the conflicting objectives of energy efficiency and thermal comfort while adapting to dynamic environmental conditions.
基金Supported by National Natural Science Foundation of China(Grant Nos.U22A20246,52372382)Hefei Municipal Natural Science Foundation(Grant No.2022008)+1 种基金the Open Fund of State Key Laboratory of Mechanical Behavior and System Safety of Traffic Engineering Structures(Grant No.KF2023-06)S&T Program of Hebei(Grant No.225676162GH).
文摘In the parallel steering coordination control strategy for path tracking,it is difficult to match the current driver steering model using the fixed parameters with the actual driver,and the designed steering coordination control strategy under a single objective and simple conditions is difficult to adapt to the multi-dimensional state variables’input.In this paper,we propose a deep reinforcement learning algorithm-based multi-objective parallel human-machine steering coordination strategy for path tracking considering driver misoperation and external disturbance.Firstly,the driver steering mathematical model is constructed based on the driver preview characteristics and steering delay response,and the driver characteristic parameters are fitted after collecting the actual driver driving data.Secondly,considering that the vehicle is susceptible to the influence of external disturbances during the driving process,the Tube MPC(Tube Model Predictive Control)based path tracking steering controller is designed based on the vehicle system dynamics error model.After verifying that the driver steering model meets the driver steering operation characteristics,DQN(Deep Q-network),DDPG(Deep Deterministic Policy Gradient)and TD3(Twin Delayed Deep Deterministic Policy Gradient)deep reinforcement learning algorithms are utilized to design a multi-objective parallel steering coordination strategy which satisfies the multi-dimensional state variables’input of the vehicle.Finally,the tracking accuracy,lateral safety,human-machine conflict and driver steering load evaluation index are designed in different driver operation states and different road environments,and the performance of the parallel steering coordination control strategies with different deep reinforcement learning algorithms and fuzzy algorithms are compared by simulations and hardware in the loop experiments.The results show that the parallel steering collaborative strategy based on a deep reinforcement learning algorithm can more effectively assist the driver in tracking the target path under lateral wind interference and driver misoperation,and the TD3-based coordination control strategy has better overall performance.
基金partially supported by the National Key Research and Development Program of the Ministry of Science and Technology of China(2022YFE0114200)the National Natural Science Foundation of China(U20A6004).
文摘This paper investigates a distributed heterogeneous hybrid blocking flow-shop scheduling problem(DHHBFSP)designed to minimize the total tardiness and total energy consumption simultaneously,and proposes an improved proximal policy optimization(IPPO)method to make real-time decisions for the DHHBFSP.A multi-objective Markov decision process is modeled for the DHHBFSP,where the reward function is represented by a vector with dynamic weights instead of the common objectiverelated scalar value.A factory agent(FA)is formulated for each factory to select unscheduled jobs and is trained by the proposed IPPO to improve the decision quality.Multiple FAs work asynchronously to allocate jobs that arrive randomly at the shop.A two-stage training strategy is introduced in the IPPO,which learns from both single-and dual-policy data for better data utilization.The proposed IPPO is tested on randomly generated instances and compared with variants of the basic proximal policy optimization(PPO),dispatch rules,multi-objective metaheuristics,and multi-agent reinforcement learning methods.Extensive experimental results suggest that the proposed strategies offer significant improvements to the basic PPO,and the proposed IPPO outperforms the state-of-the-art scheduling methods in both convergence and solution quality.
文摘In recent years,researchers have leveraged single-agent reinforcement learning to boost educational outcomes and deliver personalized interventions;yet this paradigm provides no capacity for inter-agent interaction.Multi-agent reinforcement learning(MARL)overcomes this limitation by allowing several agents to learn simultaneously within a shared environment,each choosing actions that maximize its own or the group's rewards.By explicitly modeling and exploiting agent-to-agent dynamics,MARL can align those interactions with pedagogical goals such as peer tutoring,collaborative problem-solving,or gamified competition,thus opening richer avenues for adaptive and socially informed learning experiences.This survey investigates the impact of MARL on educational outcomes by examining evidence of its effectiveness in enhancing learner performance,engagement,equity,and reducing teacher workload compared to single agent or traditional approaches.It explores the educational domains and pedagogical problems addressed by MARL,identifies the algorithmic families used,and analyzes their influence on learning.The review also assesses experimental settings and evaluation metrics to determine ecological validity,and outlines current challenges and future research directions in applying MARL to education.
基金supported by the National Research Foundation of Korea grant funded by the Korea government(RS-2023-00217116)。
文摘Federated learning is a distributed framework that trains a centralised model using data from multiple clients without transferring that data to a central server.Despite rapid progress,federated learning still faces several unsolved challenges.Specifically,communication costs and system heterogeneity,such as nonidentical data distribution,hinder federated learning's progress.Several approaches have recently emerged for federated learning involving heterogeneous clients with varying computational capabilities(namely,heterogeneous federated learning).However,heterogeneous federated learning faces two key challenges:optimising model size and determining client selection ratios.Moreover,efficiently aggregating local models from clients with diverse capabilities is crucial for addressing system heterogeneity and communication efficiency.This paper proposes an evolutionary multiobjective optimisation framework for heterogeneous federated learning(MOHFL)to address these issues.Our approach elegantly formulates and solves a biobjective optimisation problem that minimises communication cost and model error rate.The decision variables in this framework comprise model sizes and client selection ratios for each Q client cluster,yielding a total of 2×Q optimisation parameters to be tuned.We develop a partition-based strategy for MOHFL that segregates clients into clusters based on their communication and computation capabilities.Additionally,we implement an adaptive model sizing mechanism that dynamically assigns appropriate subnetwork architectures to clients based on their computational constraints.We also propose a unified aggregation framework to combine models of varying sizes from heterogeneous clients effectively.Extensive experiments on multiple datasets demonstrate the effectiveness and superiority of our proposed method compared to existing approaches.
文摘Adversarial Reinforcement Learning(ARL)models for intelligent devices and Network Intrusion Detection Systems(NIDS)improve systemresilience against sophisticated cyber-attacks.As a core component of ARL,Adversarial Training(AT)enables NIDS agents to discover and prevent newattack paths by exposing them to competing examples,thereby increasing detection accuracy,reducing False Positives(FPs),and enhancing network security.To develop robust decision-making capabilities for real-world network disruptions and hostile activity,NIDS agents are trained in adversarial scenarios to monitor the current state and notify management of any abnormal or malicious activity.The accuracy and timeliness of the IDS were crucial to the network’s availability and reliability at this time.This paper analyzes ARL applications in NIDS,revealing State-of-The-Art(SoTA)methodology,issues,and future research prospects.This includes Reinforcement Machine Learning(RML)-based NIDS,which enables an agent to interact with the environment to achieve a goal,andDeep Reinforcement Learning(DRL)-based NIDS,which can solve complex decision-making problems.Additionally,this survey study addresses cybersecurity adversarial circumstances and their importance for ARL and NIDS.Architectural design,RL algorithms,feature representation,and training methodologies are examined in the ARL-NIDS study.This comprehensive study evaluates ARL for intelligent NIDS research,benefiting cybersecurity researchers,practitioners,and policymakers.The report promotes cybersecurity defense research and innovation.
基金funded by Hung Yen University of Technology and Education under grand number UTEHY.L.2025.62.
文摘Unmanned Aerial Vehicles(UAVs)have become integral components in smart city infrastructures,supporting applications such as emergency response,surveillance,and data collection.However,the high mobility and dynamic topology of Flying Ad Hoc Networks(FANETs)present significant challenges for maintaining reliable,low-latency communication.Conventional geographic routing protocols often struggle in situations where link quality varies and mobility patterns are unpredictable.To overcome these limitations,this paper proposes an improved routing protocol based on reinforcement learning.This new approach integrates Q-learning with mechanisms that are both link-aware and mobility-aware.The proposed method optimizes the selection of relay nodes by using an adaptive reward function that takes into account energy consumption,delay,and link quality.Additionally,a Kalman filter is integrated to predict UAV mobility,improving the stability of communication links under dynamic network conditions.Simulation experiments were conducted using realistic scenarios,varying the number of UAVs to assess scalability.An analysis was conducted on key performance metrics,including the packet delivery ratio,end-to-end delay,and total energy consumption.The results demonstrate that the proposed approach significantly improves the packet delivery ratio by 12%–15%and reduces delay by up to 25.5%when compared to conventional GEO and QGEO protocols.However,this improvement comes at the cost of higher energy consumption due to additional computations and control overhead.Despite this trade-off,the proposed solution ensures reliable and efficient communication,making it well-suited for large-scale UAV networks operating in complex urban environments.
基金supported by the National Natural Science Foundation of China (No.12222502)。
文摘Muon scattering tomography(MST) is a powerful noninvasive imaging technique with significant applications in nuclear material detection and security screening.Traditional MST usually relies on the point of closest approach(PoCA) algorithm to reconstruct images from muon scattering data;however,PoCA often suffers from suboptimal image clarity and resolution.To overcome these challenges,we propose a novel approach that leverages reinforcement learning(RL) to enhance MST reconstruction,termed the μRL-enhanced method.By framing the MST optimization task as an RL problem,we developed an intelligent agent capable of dynamically adjusting the key PoCA parameters.The agent is trained using a multi-objective reward function that guides the optimization toward higher-quality reconstructions.Our experimental results show that theμRL-enhanced method significantly outperforms the traditional PoCA baseline acros s multiple benchmark metrics.Specifically,the proposed approach on average attains a 307% improvement in the intersection over union(IoU),a 79% increase in the structural similarity index measure(SSIM),and a 8.4% enhancement in the peak signal-to-noise ratio(PSNR) across four experiments.Furthermore,when benchmarked against the maximum likelihood scattering and displacement(MLSD)algorithm,the μRL-enhanced method offers modest gains in PS NR and IoU,together with a one-third increase in SSIM.These improvements demonstrate the enhanced reconstruction accuracy and structural fidelity of the μRL-enhanced method,highlighting its potential to advance MST technologies and their applications.
基金supported in part by the National Natural Science Foundation of China(grants 62203073 and 62573068)the Natural Science Foundation of Chongqing,China(grant CSTB2022NSCQMSX0577)。
文摘Multi-agent reinforcement learning(MARL)has proven its effectiveness in cooperative multi-agent systems(MASs)but still faces issues on the curse of dimensionality and learning efficiency.The main difficulty is caused by the strong inter-agent coupling nature embedded in an MARL problem,which is yet to be fully exploited in existing algorithms.In this work,we recognize a learning graph characterizing the dependence between individual rewards and individual policies.Then we propose a graph-based reward aggregation(GRA)method,which utilizes the inherent coupling relationship among agents to eliminate redundant information.Specifically,GRA passes information among cooperating agents through graph attention networks to obtain aggregated rewards that contribute to the fitting of the value function,making each agent learn a decentralized executable cooperation policy.In addition,we propose a variant of GRA,named GRA-decen,which achieves decentralized training and decentralized execution(DTDE)when each agent only has access to information of partial agents in the learning process.We conduct experiments in different environments and demonstrate the practicality and scalability of our algorithms.
文摘Ride-hailing electric vehicles are mobile resources with dispatch potential to improve resilience.However,they have not been well investigated because their charging and order-serving are affected or managed by the power grid dispatching center and the ride-hailing platform.Effective pre-strategies can improve the prevention ability for high-impact and low-probability(HILP)events and provide the foundation for measures in the response and restoration stages.First,this paper proposes a resilience reserve to expand the existing research on power system resilience.Secondly,this paper puts forward an interactive method of deep reinforcement learning,which considers the interests of both the power grid dispatching center and the ride-hailing platform.It improves the resilience reserve by achieving the order dispatch,orderly charging management of ride-hailing electric vehicles,and the pricing strategy of charging stations.Finally,this paper uses a practical example covering about 107.32 km2 in the center of Chengdu to verify that the proposed method improves the resilience reserve of the power system without obviously damaging the interests of the ride-hailing platform.
文摘Reinforcement learning(RL),as an important branch of machine learning,has recently achieved extensive attention and success in many applications.Its main idea is to enable agents to continuously learn to make optimal decisions by trying to maximize a reward function for their actions and interactions with the environment.However,making highquality decisions in complex and uncertain real-world scenarios is a challenging task.The interference and attacks in such scenarios tend to destroy the existing strategies.Maintaining RL's optimal performance in various cases and adapting to changing environments remains an important challenge.This article presents a comprehensive review of recent advancements in robust reinforcement learning(RRL),and analyzes them from the perspectives of challenges,methodologies,and applications.It systematically evaluates current progress in RRL and summarizes the commonly used benchmark platforms.Finally,several open challenges are discussed to stimulate further research and guide future developments in this area.
基金funded by the U.S.Department of Education under Grant Number ED#P116S210005the National Science Foundation under Grant Numbers 2226936 and 2420405.
文摘Theintegration of human factors into artificial intelligence(AI)systems has emerged as a critical research frontier,particularly in reinforcement learning(RL),where human-AI interaction(HAII)presents both opportunities and challenges.As RL continues to demonstrate remarkable success in model-free and partially observable environments,its real-world deployment increasingly requires effective collaboration with human operators and stakeholders.This article systematically examines HAII techniques in RL through both theoretical analysis and practical case studies.We establish a conceptual framework built upon three fundamental pillars of effective human-AI collaboration:computational trust modeling,system usability,and decision understandability.Our comprehensive review organizes HAII methods into five key categories:(1)learning from human feedback,including various shaping approaches;(2)learning from human demonstration through inverse RL and imitation learning;(3)shared autonomy architectures for dynamic control allocation;(4)human-in-the-loop querying strategies for active learning;and(5)explainable RL techniques for interpretable policy generation.Recent state-of-the-art works are critically reviewed,with particular emphasis on advances incorporating large language models in human-AI interaction research.To illustrate some concepts,we present three detailed case studies:an empirical trust model for farmers adopting AI-driven agricultural management systems,the implementation of ethical constraints in roboticmotion planning through human-guided RL,and an experimental investigation of human trust dynamics using a multi-armed bandit paradigm.These applications demonstrate how HAII principles can enhance RL systems’practical utility while bridging the gap between theoretical RL and real-world human-centered applications,ultimately contributing to more deployable and socially beneficial intelligent systems.
文摘With the advent of sixth-generation mobile communications(6G),space-air-ground integrated networks have become mainstream.This paper focuses on collaborative scheduling for mobile edge computing(MEC)under a three-tier heterogeneous architecture composed of mobile devices,unmanned aerial vehicles(UAVs),and macro base stations(BSs).This scenario typically faces fast channel fading,dynamic computational loads,and energy constraints,whereas classical queuing-theoretic or convex-optimization approaches struggle to yield robust solutions in highly dynamic settings.To address this issue,we formulate a multi-agent Markov decision process(MDP)for an air-ground-fused MEC system,unify link selection,bandwidth/power allocation,and task offloading into a continuous action space and propose a joint scheduling strategy that is based on an improved MATD3 algorithm.The improvements include Alternating Layer Normalization(ALN)in the actor to suppress gradient variance,Residual Orthogonalization(RO)in the critic to reduce the correlation between the twin Q-value estimates,and a dynamic-temperature reward to enable adaptive trade-offs during training.On a multi-user,dual-link simulation platform,we conduct ablation and baseline comparisons.The results reveal that the proposed method has better convergence and stability.Compared with MADDPG,TD3,and DSAC,our algorithm achieves more robust performance across key metrics.
Funding: supported in part by the Research on Key Technologies for the Development of an Active Balancing Cooperative Control System for Distribution Networks and the National Natural Science Foundation of China under Grants 521532240029 and 62303006.
Abstract: To address the high costs and operational instability of distribution networks caused by the large-scale integration of distributed energy resources (DERs), such as photovoltaic (PV) systems, wind turbines (WT), and energy storage (ES) devices, as well as the increased grid load fluctuations and safety risks due to uncoordinated electric vehicle (EV) charging, this paper proposes a novel dual-scale hierarchical collaborative optimization strategy. The strategy decouples system-level economic dispatch from distributed EV agent control, resolving the resource coordination conflicts that arise from the high computational complexity and poor scalability of existing centralized optimization, or from the reliance on purely local information in fully decentralized frameworks. At the lower level, an EV charging and discharging model with a hybrid discrete-continuous action space is established and optimized using an improved Parameterized Deep Q-Network (PDQN) algorithm, which directly handles mode selection and power regulation while embedding physical constraints to ensure safety. At the upper level, microgrid (MG) operators adopt a dynamic pricing strategy optimized through deep reinforcement learning (DRL) to maximize economic benefits and achieve peak-valley shaving. Simulation results show that the proposed strategy outperforms traditional methods, reducing the total operating cost of the MG by 21.6%, decreasing the peak-to-valley load difference by 33.7%, reducing the number of voltage limit violations by 88.9%, and lowering the average electricity cost for EV users by 15.2%. The method thus yields a win-win outcome for operators and users, providing a reliable and efficient scheduling solution for distribution networks with high renewable energy penetration.
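The lower-level hybrid action space can be sketched as follows. This is a minimal illustration of PDQN-style action selection under assumed mode names, power limits, and toy stand-in networks, not the paper's exact model: a discrete mode is chosen by Q-values, each mode carries its own continuous power parameter, and clipping embeds the physical constraint mentioned above.

```python
# Minimal sketch of PDQN-style hybrid action selection for an EV agent.
# Modes, limits, and the toy "networks" are illustrative assumptions;
# a real PDQN would use neural networks for both components.
import numpy as np

MODES = ("charge", "discharge", "idle")
P_MAX = 7.0   # assumed charger power limit in kW

def actor_params(state: np.ndarray) -> np.ndarray:
    """Continuous head: one raw power value per discrete mode.
    Stand-in for the PDQN parameter network x_k(s)."""
    W = np.array([[3.0, -1.0], [-2.0, 4.0], [0.0, 0.0]])
    return W @ state

def q_values(state: np.ndarray, params: np.ndarray) -> np.ndarray:
    """Discrete head: Q(s, k, x_k) for each mode k.
    Stand-in for the PDQN Q-network."""
    return np.tanh(params) + state.sum()

def select_action(state: np.ndarray):
    raw = actor_params(state)
    # Physical constraint embedded at the action level: clip to [0, P_MAX].
    power = np.clip(raw, 0.0, P_MAX)
    k = int(np.argmax(q_values(state, power)))
    return MODES[k], (float(power[k]) if MODES[k] != "idle" else 0.0)

print(select_action(np.array([0.4, 0.9])))
```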
Funding: co-supported by the National Natural Science Foundation of China (Nos. 72371052 and 71871042).
Abstract: Determining optimal confrontation methods in multi-agent attack-defense scenarios is a complex challenge. Multi-Agent Reinforcement Learning (MARL) provides an effective framework for tackling sequential decision-making problems, significantly enhancing swarm intelligence in maneuvering. However, applying MARL to unmanned swarms presents two primary challenges. First, defensive agents must balance autonomy with collaboration under limited perception while coordinating against adversaries. Second, current algorithms aim to maximize global or individual rewards, making them sensitive to fluctuations in enemy strategies and environmental changes, especially when rewards are sparse. To tackle these issues, we propose Multi-Agent Reinforcement Learning with Layered Autonomy and Collaboration (MARL-LAC) for collaborative confrontations. The algorithm integrates dual twin critics to mitigate the high variance associated with policy gradients. Furthermore, MARL-LAC employs layered autonomy and collaboration to address multi-objective problems, specifically learning a global reward function for the swarm alongside local reward functions for individual defensive agents. Experimental results demonstrate that MARL-LAC enhances decision-making and collaborative behaviors among agents, outperforming existing algorithms and underscoring the importance of layered autonomy and collaboration in multi-agent systems. The observed adversarial behaviors show that agents using MARL-LAC maintain cohesive formations that conceal their intentions by confusing the offensive agent while successfully encircling the target.
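The layered reward structure, a global swarm reward alongside local per-agent rewards, can be sketched in a few lines. The fixed mixing weight below is an assumption made for illustration; MARL-LAC learns its global and local reward functions rather than hand-coding them.

```python
# Sketch of a layered reward for a defensive swarm: each agent's
# training signal mixes a shared global term with its own local term.
# The fixed mixing weight is an illustrative assumption.
from typing import List

def layered_rewards(global_reward: float,
                    local_rewards: List[float],
                    w_global: float = 0.6) -> List[float]:
    """Per-agent reward = w * shared swarm objective
    + (1 - w) * individual objective (e.g., formation keeping)."""
    return [w_global * global_reward + (1.0 - w_global) * r
            for r in local_rewards]

# Example: the swarm encircled the target (+1); agents differ locally.
print(layered_rewards(1.0, [0.2, -0.1, 0.5]))
```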
Funding: funded by the Beijing Engineering Research Center of Electric Rail Transportation.
Abstract: Effective partitioning is crucial for enabling parallel restoration of power systems after blackouts. This paper proposes a novel partitioning method based on deep reinforcement learning. First, the partitioning decision process is formulated as a Markov decision process (MDP) model that maximizes modularity, with the key partitioning constraints on parallel restoration taken into account. Second, based on the partitioning objective and constraints, the reward function of the partitioning MDP model is designed using a relative deviation normalization scheme, which reduces mutual interference between the reward and penalty terms; a soft bonus scaling mechanism is introduced to mitigate the overestimation caused by abrupt jumps in the reward. Then, the deep Q-network (DQN) method is applied to solve the partitioning MDP model and generate partitioning schemes, with two experience replay buffers employed to speed up training. Finally, case studies on the IEEE 39-bus test system demonstrate that the proposed method generates a high-modularity partitioning result that satisfies all key partitioning constraints, thereby improving the parallelism and reliability of the restoration process. Moreover, simulation results show that an appropriate discount factor is crucial for both the convergence speed and the stability of training.
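To clarify the two reward-design devices, the sketch below gives one plausible reading; the concrete formulas are assumptions, since the paper's exact expressions are not reproduced here. Reward and penalty terms are normalized by their relative deviation from running baselines so neither dominates, and a tanh-bounded scaling caps abrupt bonus jumps that would otherwise inflate Q-value estimates.

```python
# Hedged sketch of the two reward-design devices described above.
# The concrete formulas are illustrative assumptions, not the paper's.
import math

def relative_deviation(value: float, baseline: float) -> float:
    """Normalize a term by its relative deviation from a running
    baseline so reward and penalty live on comparable scales."""
    return (value - baseline) / (abs(baseline) + 1e-8)

def soft_bonus(bonus: float, scale: float = 1.0) -> float:
    """Bound abrupt bonus jumps with tanh to curb Q overestimation."""
    return scale * math.tanh(bonus / scale)

def partition_reward(modularity, mod_base, penalty, pen_base):
    r = relative_deviation(modularity, mod_base)   # reward term
    p = relative_deviation(penalty, pen_base)      # constraint penalty
    return soft_bonus(r - p)

print(round(partition_reward(0.62, 0.55, 1.0, 2.0), 4))
```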
Funding: funding from the National Key Research and Development Program of China (No. 2018YFE0110000), the National Natural Science Foundation of China (Nos. 11274259 and 11574258), and the Science and Technology Commission Foundation of Shanghai (No. 21DZ1205500) in support of the present research.
Abstract: While reinforcement learning-based underwater acoustic adaptive modulation shows promise for environment-adaptive communication, as supported by extensive simulation-based research, its practical performance remains underexplored in the field. To evaluate the practical applicability of this emerging technique in adverse shallow-sea channels, a field experiment was conducted using three communication modes for reinforcement learning-driven adaptive modulation: orthogonal frequency division multiplexing (OFDM), M-ary frequency-shift keying (MFSK), and direct sequence spread spectrum (DSSS). Specifically, a Q-learning method selects the optimal modulation mode according to the channel quality, quantified by the signal-to-noise ratio, multipath spread length, and Doppler frequency offset. Experimental results demonstrate that the reinforcement learning-based adaptive modulation scheme outperformed fixed-threshold detection in both total throughput and average bit error rate, surpassing conventional adaptive modulation strategies.
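The scheme lends itself to a compact sketch: the channel state is quantized from SNR, multipath spread, and Doppler offset, and a Q-table over the three modes is updated from measured link performance. Bin edges, reward shape, and hyperparameters below are illustrative assumptions, not the experiment's settings.

```python
# Minimal sketch of Q-learning-based adaptive modulation selection.
# State: quantized (SNR, multipath spread, Doppler offset).
# Action: one of the three modes used in the field experiment.
# Bin edges, reward shape, and hyperparameters are assumptions.
import random

MODES = ("OFDM", "MFSK", "DSSS")
ALPHA, EPS = 0.2, 0.1   # per-packet, bandit-style update (no bootstrap)

def quantize(snr_db, multipath_ms, doppler_hz):
    """Map channel measurements to a small discrete state."""
    s = 0 if snr_db < 5 else 1 if snr_db < 15 else 2
    m = 0 if multipath_ms < 5 else 1
    d = 0 if abs(doppler_hz) < 2 else 1
    return (s, m, d)

Q = {}  # (state, mode) -> value

def choose(state):
    if random.random() < EPS:
        return random.choice(MODES)
    return max(MODES, key=lambda a: Q.get((state, a), 0.0))

def update(state, mode, throughput_bps, ber):
    reward = throughput_bps * (1.0 - ber)   # assumed reward shape
    old = Q.get((state, mode), 0.0)
    Q[(state, mode)] = old + ALPHA * (reward - old)

s = quantize(snr_db=8.0, multipath_ms=6.0, doppler_hz=1.0)
mode = choose(s)
update(s, mode, throughput_bps=1200.0, ber=0.02)
print(mode, Q)
```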
Abstract: As the types of traffic requests increase, the elastic optical network (EON) is considered a promising architecture for carrying multiple types of traffic requests simultaneously, including immediate reservation (IR) and advance reservation (AR). Various resource allocation schemes for IR/AR requests have been designed in EONs to reduce the bandwidth blocking probability (BBP). However, these schemes do not consider the different transmission requirements of IR requests and cannot maintain a low BBP for high-priority requests. In this paper, multiple priorities are considered in the hybrid IR/AR request scenario. We modify the asynchronous advantage actor-critic (A3C) model and propose an A3C-assisted priority resource allocation (APRA) algorithm. APRA integrates the priority and transmission quality of IR requests into the design of the A3C reward function, then dynamically allocates dedicated resources to different IR requests according to their time-varying requirements. By maximizing the reward, the transmission quality of IR requests is matched to their priority, ensuring a lower BBP for high-priority IR requests. Simulation results show that APRA reduces the BBP of high-priority IR requests from 0.0341 to 0.0138, and the overall network operation gain is improved by 883 compared to the scheme that does not consider priority.
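The core of APRA is a reward that couples request priority to achieved transmission quality; a hedged sketch of one such coupling follows. The functional form and constants are assumptions, since the paper's exact reward is not reproduced here.

```python
# Hedged sketch of a priority-aware reward for IR request allocation.
# Maximizing it pushes high-priority requests toward low blocking and
# priority-matched transmission quality. Constants are assumptions.
def apra_reward(priority: int, quality: float, blocked: bool,
                n_levels: int = 3, block_penalty: float = 2.0) -> float:
    """priority: 1 (lowest) .. n_levels (highest).
    quality: achieved transmission quality, normalized to [0, 1]."""
    w = priority / n_levels                # higher priority, higher weight
    if blocked:
        return -block_penalty * w          # blocking high priority hurts most
    # Penalize mismatch between quality and the level priority calls for.
    target = w
    return w * quality - abs(quality - target)

for p in (1, 2, 3):
    print(p, round(apra_reward(p, quality=0.9, blocked=False), 3))
```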
Funding: funded by the Deanship of Scientific Research (DSR) at King Abdulaziz University, Jeddah, Saudi Arabia, grant number RG-2-611-42 (A.O.A.).
Abstract: Wi-Fi technology has evolved significantly since its introduction in 1997, advancing to Wi-Fi 6 as the latest standard, with Wi-Fi 7 currently under development. Despite these advancements, integrating machine learning into Wi-Fi networks remains challenging, especially in decentralized environments with multiple access points (mAPs). This paper is a short review summarizing the potential applications of federated reinforcement learning (FRL) across eight key areas of Wi-Fi functionality: channel access, link adaptation, beamforming, multi-user transmissions, channel bonding, multi-link operation, spatial reuse, and multi-basic service set (multi-BSS) coordination. FRL is highlighted as a promising framework for enabling decentralized training and decision-making while preserving data privacy. To illustrate its role in practice, we present a case study on link activation in a multi-link operation (MLO) environment with multiple APs. Through theoretical discussion and simulation results, the study demonstrates how FRL can improve performance and reliability, paving the way for more adaptive and collaborative Wi-Fi networks in the era of Wi-Fi 7 and beyond.
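The decentralized training pattern underlying these FRL applications can be sketched with a federated-averaging step over per-AP model weights. The toy weight vectors below are illustrative stand-ins; production FRL for Wi-Fi would aggregate neural policy or value networks, and the sample-count weighting is one common choice rather than a fixed standard.

```python
# Sketch of the FedAvg aggregation step at the heart of FRL:
# each AP trains locally on private observations, and only model
# weights are shared, preserving data privacy. Toy vectors stand in
# for neural network parameters.
import numpy as np

def fedavg(local_weights: list, n_samples: list) -> np.ndarray:
    """Aggregate per-AP weights, weighted by local sample counts."""
    total = sum(n_samples)
    return sum(w * (n / total) for w, n in zip(local_weights, n_samples))

# Three APs with locally trained (toy) parameter vectors.
aps = [np.array([0.2, 1.0]), np.array([0.4, 0.8]), np.array([0.3, 1.2])]
counts = [100, 300, 200]
global_model = fedavg(aps, counts)
print(global_model)   # broadcast back to every AP for the next round
```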