To address low learning efficiency and inadequate path safety in spraying robot navigation within complex, obstacle-rich environments, where dense, dynamic, and unpredictable obstacles challenge conventional methods, this paper proposes a hybrid algorithm integrating Q-learning and an improved A*-Artificial Potential Field (A-APF) method. Centered on the Q-learning framework, the algorithm leverages safety-oriented guidance generated by A-APF and employs a dynamic coordination mechanism that adaptively balances exploration and exploitation. The proposed system comprises four core modules: (1) an environment modeling module that constructs grid-based obstacle maps; (2) an A-APF module that combines heuristic search from the A* algorithm with repulsive force strategies from APF to generate guidance; (3) a Q-learning module that learns optimal state-action values (Q-values) through spraying robot-environment interaction and a reward function emphasizing path optimality and safety; and (4) a dynamic optimization module that ensures adaptive cooperation between Q-learning and A-APF through exploration rate control and environment-aware constraints. Simulation results demonstrate that the proposed method significantly enhances path safety in complex underground mining environments. Quantitatively, compared with the traditional Q-learning algorithm, the proposed method shortens training time by 42.95% and reduces training failures from 78 to 3. Compared with the static fusion algorithm, it further reduces both training time (by 10.78%) and training failures (by 50%), thereby improving overall training efficiency.
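As a rough illustration of how planner guidance can be combined with Q-learning, the sketch below pairs the standard tabular Q-learning update with an epsilon-greedy rule that follows a suggested action while exploring. It is a minimal sketch under stated assumptions, not the paper's method: the abstract does not give the coordination rule, and the names `q_update`, `choose_action`, and `guide_action` are hypothetical.

```python
import random


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update; q maps (state, action) to a value."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


def choose_action(q, state, actions, epsilon, guide_action=None, rng=random):
    """Epsilon-greedy choice.

    Assumption for illustration: when exploring, follow the planner's
    suggested action (guide_action) if one is supplied, otherwise pick a
    uniformly random action.
    """
    if rng.random() < epsilon:
        return guide_action if guide_action is not None else rng.choice(actions)
    return max(actions, key=lambda a: q.get((state, a), 0.0))
```

Decaying `epsilon` over training episodes would shift the agent from guided exploration toward exploiting its learned Q-values; how the paper schedules this is not stated in the abstract.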
Cooperative task assignment is one of the key research focuses in the field of unmanned aerial vehicles (UAVs). In this paper, an energy learning hyper-heuristic (EL-HH) algorithm is proposed to address the cooperative task assignment problem of heterogeneous UAVs under complex constraints. First, a mathematical model is designed to define the scenario, complex constraints, and objective function of the problem. Then, the scheme encoding, the EL-HH strategy, multiple optimization operators, and the task sequence and time adjustment strategies are designed in the EL-HH algorithm. The scheme encoding is designed with three layers: task sequence, UAV sequence, and waiting time. The EL-HH strategy applies an energy learning method to adaptively adjust the energies of operators, thereby facilitating the selection and application of operators. Multiple optimization operators can update schemes in different ways, enabling the algorithm to fully explore the solution space. Afterward, the task sequence and time adjustment strategies are applied to adjust the task order and insert waiting time. Through the iterative optimization process, a satisfactory assignment scheme is ultimately produced. Finally, simulation and experiment verify the effectiveness of the proposed algorithm.
With the development of economic globalization, distributed manufacturing is becoming increasingly prevalent. Recently, integrated scheduling of distributed production and assembly has attracted considerable attention. This research studies a distributed flexible job shop scheduling problem with assembly operations. Firstly, a mixed integer programming model is formulated to minimize the maximum completion time. Secondly, a Q-learning-assisted coevolutionary algorithm is presented to solve the model: (1) multiple populations are developed to seek the required decisions simultaneously; (2) an encoding and decoding method based on problem features is applied to represent individuals; (3) a hybrid approach of heuristic rules and random methods is employed to acquire a high-quality population; (4) three evolutionary strategies with crossover and mutation methods are adopted to enhance exploration capabilities; (5) three neighborhood structures based on problem features are constructed, and a Q-learning-based iterative local search method is devised to improve exploitation abilities. The Q-learning approach is applied to intelligently select better neighborhood structures. Finally, a group of instances is constructed to perform comparison experiments. The effectiveness of the Q-learning approach is verified by comparing the developed algorithm with its variant without the Q-learning method. Three renowned meta-heuristic algorithms are also compared with the developed algorithm. The comparison results demonstrate that the designed method exhibits better performance on the formulated problem.
A novel hyper-heuristic framework was proposed to improve the adaptability of evolutionary algorithms (EAs) in optimization. The algorithm in use could be changed during the evolutionary process according to the performance of the candidate algorithms. In addition, a large number of elite individuals were employed in the framework, and these elite individuals helped it achieve better performance, whereas such a number of elite individuals would stall global convergence in a conventional single algorithm. The time complexity was analyzed to demonstrate that the novel framework did not increase it. The simulation results indicate that the proposed framework outperforms every single algorithm that composes it.
The solution strategy of a heuristic algorithm is pre-set and performs well in the conventional cloud resource scheduling process. However, for complex and dynamic cloud service scheduling tasks, differences in service attributes make a single strategy inefficient. In this paper, we present a hyper-heuristic algorithm based on reinforcement learning (HHRL) to optimize the completion time of the task sequence. Firstly, in the reward table setting stage of HHRL, we introduce population diversity and integrate maximum time to comprehensively determine the task scheduling and the selection of low-level heuristic strategies. Secondly, a task computational complexity estimation method integrated with linear regression is proposed to influence task scheduling priorities. In addition, we propose a high-quality candidate solution migration method to ensure the continuity and diversity of the solving process. Compared with HHSA, ACO, GA, F-PSO, and others, HHRL can quickly obtain task complexity, select appropriate heuristic strategies for task scheduling, search for the best makespan, and detect disturbances in population diversity more effectively.
Routing plays a critical role in data transmission for underwater acoustic sensor networks (UWSNs) in the internet of underwater things (IoUT). Traditional routing methods suffer from high end-to-end delay, limited bandwidth, and high energy consumption. With the development of artificial intelligence and machine learning algorithms, many researchers apply these new methods to improve the quality of routing. In this paper, we propose a Q-learning-based multi-hop cooperative routing protocol (QMCR) for UWSNs. Our protocol can automatically choose nodes with the maximum Q-value as forwarders based on distance information. Moreover, we combine cooperative communications with the Q-learning algorithm to reduce network energy consumption and improve communication efficiency. Experimental results show that the running time of QMCR is less than one-tenth of that of the artificial fish-swarm algorithm (AFSA), while the routing energy consumption is kept at the same level. Owing to its extremely fast speed, QMCR is a promising method of routing design for UWSNs, especially when the network faces the extremely dynamic underwater acoustic channels of the real ocean environment.
The paper presents a fuzzy Q-learning (FQL) and optical flow-based autonomous navigation approach. The FQL method makes decisions in an unknown environment without mapping, using motion information and a reinforcement signal fed into an evolutionary algorithm. The reinforcement signal is calculated by estimating the optical flow densities in areas of the camera image to determine whether they are "dense" or "thin", which is related to the proximity of objects. The results show that the present approach improves the rate of learning compared with a method that uses a simple reward system and no evolutionary component. The proposed system was implemented in a virtual robotics system using the CoppeliaSim software in communication with Python.
The Cross-domain Heuristic Search Challenge (CHeSC) is a competition focused on creating efficient search algorithms adaptable to diverse problem domains. Selection hyper-heuristics are a class of algorithms that dynamically choose heuristics during the search process. Numerous selection hyper-heuristics have different implementation strategies. However, comparisons between them are lacking in the literature, and previous works have not highlighted the beneficial and detrimental implementation methods of different components. The question is how to effectively employ them to produce an efficient search heuristic. Furthermore, the algorithms that competed in the inaugural CHeSC have not been collectively reviewed. This work conducts a review analysis of the top twenty competitors from this competition to identify effective and ineffective strategies influencing algorithmic performance. A summary of the main characteristics and classification of the algorithms is presented. The analysis underlines efficient and inefficient methods in eight key components, including search points, search phases, heuristic selection, move acceptance, feedback, Tabu mechanism, restart mechanism, and low-level heuristic parameter control. This review analyzes the components with reference to the competition's final leaderboard and discusses future research directions for these components. The effective approaches, identified as having the highest quality index, are mixed search point, iterated search phases, relay hybridization selection, threshold acceptance, mixed learning, Tabu heuristics, stochastic restart, and dynamic parameters. Findings are also compared with recent trends in hyper-heuristics. This work enhances the understanding of selection hyper-heuristics, offering valuable insights for researchers and practitioners aiming to develop effective search algorithms for diverse problem domains.
With the growth of deregulated retail power markets, end users with smart meters and controllers may optimize their energy cost portfolios by comparing price plans offered by several retail energy firms. To help smart grid end users reduce both power payments and usage dissatisfaction, this article proposes a decision system based on reinforcement learning to aid electricity price plan selection. The decision problem is modeled as an enhanced state-based Markov decision process (MDP) without transition probabilities. A kernel approximation-integrated batch Q-learning approach is used to solve it. Several adjustments to the sampling and data representation are made to improve computational and prediction performance. Using a continuous high-dimensional state space, the proposed approach can uncover the underlying characteristics of time-varying pricing schemes. Without any prior knowledge of the market environment, the best decision-making policy can be learned, as shown in case studies that use data from actual historical price plans. Experiments show that the proposed decision approach may reduce cost and energy usage dissatisfaction by using user data to build an accurate prediction strategy. This research also looks at how smart city energy planners rely on precise load forecasts. It presents a hybrid method that extracts associated characteristics to improve the accuracy of residential power consumption forecasts using machine learning (ML). Forecast precision is measured with loss functions, including the RMSE. In response to the growing interest in explainable artificial intelligence (XAI), this research presents a methodology for estimating smart home energy usage. Using SHapley Additive exPlanations (SHAP) approaches, this strategy makes it easy for consumers to comprehend their energy use trends. To predict future energy use, the study employs gradient boosting in conjunction with long short-term memory neural networks.
As the global economy develops and people's awareness of environmental protection increases, the efficient scheduling of production lines in workshops has received more and more attention. However, there is very little research focusing on distributed scheduling for heterogeneous factories. This study addresses a multi-objective distributed heterogeneous permutation flow shop scheduling problem with sequence-dependent setup times (DHPFSP-SDST). The objective is to optimize the trade-off between the maximum completion time (makespan) and total energy consumption. First, we establish a mathematical model to describe the problem. Second, we use the artificial bee colony (ABC) algorithm to optimize the two objectives, incorporating five local search strategies tailored to the problem characteristics to enhance the algorithm's performance. Third, to improve the convergence speed of the algorithm, a Q-learning-based strategy is designed to select the appropriate local search operator during iterations. Finally, based on experiments conducted on 72 instances, statistical analysis and discussions show that the Q-learning-based ABC algorithm solves the problem more effectively than its peers.
Purpose – The examination timetabling problem is an NP-hard problem. A large number of approaches have been developed for this problem to find more appropriate search strategies. Hyper-heuristics are one representative class of methods. In a hyper-heuristic, the high-level search is executed to construct heuristic lists by traditional methods (such as Tabu search, variable neighborhoods and so on). The purpose of this paper is to apply an evolutionary strategy instead of traditional methods for the high-level search to improve the capability of global search. Design/methodology/approach – This paper combines a hyper-heuristic with an evolutionary strategy to solve examination timetabling problems. First, four graph coloring heuristics are employed to construct heuristic lists. Within the evolutionary algorithm framework, iterative initialization is utilized to increase the number of feasible solutions in the population; meanwhile, the crossover and mutation operators are applied to find potential heuristic lists in the heuristic space (high-level search). Finally, two local search methods are combined to optimize the feasible solutions in the solution space (low-level search). Findings – Experimental results demonstrate that the proposed approach obtains competitive results and outperforms the compared approaches on some benchmark instances. Originality/value – The contribution of this paper is the development of a framework which combines an evolutionary algorithm and a hyper-heuristic for examination timetabling problems.
The uninterrupted operation of the quay crane (QC) ensures that a large container ship can depart port within laytime, which effectively reduces handling costs for the container terminal and ship owners. QC waiting caused by delays of automated guided vehicles (AGVs) in an uncertain environment can be alleviated by dynamic scheduling optimization. A dynamic scheduling process is introduced in this paper to solve the AGV scheduling and path planning problems, in which the scheduling scheme determines the starting and ending nodes of paths, and the choice of paths between nodes affects the scheduling of subsequent AGVs. This work proposes a two-stage mixed integer optimization model to minimize the transportation cost of AGVs under the constraint of laytime. A dynamic optimization algorithm, including an improved rule-based heuristic algorithm and the integration of the Dijkstra algorithm and the Q-learning algorithm, is designed to find the optimal AGV scheduling and path schemes. A new conflict avoidance strategy based on graph theory is also proposed to reduce the probability of path conflicts between AGVs. Numerical experiments are conducted to demonstrate the effectiveness of the proposed model and algorithm over existing methods.
A hyper-heuristic algorithm is a general solution framework that adaptively selects the optimizer to address complex problems. A classical hyper-heuristic framework consists of two levels: the high-level heuristic and a set of low-level heuristics. The low-level heuristics to be used in the optimization process are chosen by the high-level tactics in the hyper-heuristic. In this study, a Cooperative Multi-Stage Hyper-Heuristic (CMS-HH) algorithm is proposed to address certain combinatorial optimization problems. In the CMS-HH, a genetic algorithm is introduced to perturb the initial solution to increase the diversity of the solution. In the search phase, an online learning mechanism based on multi-armed bandits and relay hybridization technology are proposed to improve the quality of the solution. In addition, a multi-point search is introduced to cooperatively search with a single-point search when the state of the solution does not change in continuous time. The performance of the CMS-HH algorithm is assessed in six specific combinatorial optimization problems, including Boolean satisfiability problems, one-dimensional packing problems, permutation flow-shop scheduling problems, personnel scheduling problems, traveling salesman problems, and vehicle routing problems. The experimental results demonstrate the efficiency and significance of the proposed CMS-HH algorithm.
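To make the idea of bandit-based heuristic selection concrete, the sketch below uses the classic UCB1 rule to pick which low-level heuristic to apply next. This is an assumption for illustration only: the abstract says the online learning mechanism is based on multi-armed bandits but does not name the bandit policy, and `ucb1_select` and `record` are hypothetical helper names.

```python
import math


def ucb1_select(counts, totals, c=math.sqrt(2)):
    """Return the index of the low-level heuristic with the highest UCB1 score.

    counts[i] is how often heuristic i has been applied and totals[i] is the
    sum of rewards it has earned (e.g. normalized objective improvement).
    """
    for i, n in enumerate(counts):
        if n == 0:
            return i  # apply every heuristic once before comparing scores
    t = sum(counts)
    scores = [
        totals[i] / counts[i] + c * math.sqrt(math.log(t) / counts[i])
        for i in range(len(counts))
    ]
    return max(range(len(counts)), key=scores.__getitem__)


def record(counts, totals, i, reward):
    """Store the reward observed after applying heuristic i."""
    counts[i] += 1
    totals[i] += reward
```

The exploration bonus shrinks as a heuristic is applied more often, so rarely used heuristics are retried occasionally while consistently useful ones dominate the selection.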
Two-stage hybrid flow shop scheduling has been extensively considered in single-factory settings. However, the distributed two-stage hybrid flow shop scheduling problem (DTHFSP) with fuzzy processing time is seldom investigated in multiple factories. Furthermore, the integration of reinforcement learning and metaheuristics is seldom applied to solve DTHFSP. In the current study, DTHFSP with fuzzy processing time was investigated, and a novel Q-learning-based teaching-learning-based optimization (QTLBO) was constructed to minimize makespan. Several teachers were employed in QTLBO. The teacher phase, learner phase, teacher's self-learning phase, and learner's self-learning phase were designed. The Q-learning algorithm was implemented with 9 states, 4 actions defined as combinations of the above phases, a reward, and an adaptive action selection, which were applied to dynamically adjust the algorithm structure. A number of experiments were conducted. The computational results demonstrate that the new strategies of QTLBO are effective; furthermore, QTLBO presents promising results on the considered DTHFSP.
The pipeline isolation plugging robot (PIPR) is an important tool in pipeline maintenance operations. During the plugging process, the flow field induces violent vibration, which can cause serious damage to the pipeline and the PIPR. In this paper, we propose a dynamic regulating strategy to reduce the plugging-induced vibration by regulating the spoiler angle and plugging velocity. Firstly, dynamic plugging simulation and experiment are performed to study the flow field changes during dynamic plugging, and the pressure difference is proposed to evaluate the degree of flow field vibration. Secondly, the mathematical models of pressure difference with plugging states and spoiler angles are established based on the extreme learning machine (ELM) optimized by an improved sparrow search algorithm (ISSA). Finally, a modified Q-learning algorithm based on simulated annealing is applied to determine the optimal strategy for the spoiler angle and plugging velocity in real time. The results show that the proposed method can reduce the plugging-induced vibration by 19.9% and 32.7% on average, compared with single-regulating methods. This study can effectively ensure the stability of the plugging process.
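One common way to combine simulated annealing with Q-learning is to replace epsilon-greedy exploration with a Metropolis acceptance test on a randomly proposed action. The sketch below illustrates that general idea only; the abstract does not specify how the paper modifies Q-learning, and the function name `sa_select` and the dictionary-based Q-table are hypothetical.

```python
import math
import random


def sa_select(q, state, actions, temperature, rng=random):
    """Pick an action using a Metropolis-style acceptance test.

    A random candidate action is accepted over the greedy action with
    probability exp((Q(candidate) - Q(greedy)) / temperature). The
    temperature must be positive; lowering it over time makes the choice
    increasingly greedy.
    """
    greedy = max(actions, key=lambda a: q.get((state, a), 0.0))
    candidate = rng.choice(actions)
    delta = q.get((state, candidate), 0.0) - q.get((state, greedy), 0.0)  # always <= 0
    if rng.random() < math.exp(delta / temperature):
        return candidate
    return greedy
```

At a high temperature nearly every candidate is accepted, which gives broad exploration early in training; as the temperature cools, the selection approaches the greedy policy.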
In this paper, the problem of trajectory design of unmanned aerial vehicles (UAVs) for maximizing the number of satisfied users is studied in a UAV-based cellular network where the UAV works as a flying base station that serves users, and a user indicates its satisfaction in terms of completion of its data request within an allowable maximum waiting time. The trajectory design is formulated as an optimization problem whose goal is to maximize the number of satisfied users. To solve this problem, a machine learning framework based on the double Q-learning algorithm is proposed. The algorithm enables the UAV to find the optimal trajectory that maximizes the number of satisfied users. Compared with traditional learning algorithms, such as Q-learning, which selects and evaluates the action using the same Q-table, the proposed algorithm decouples the selection from the evaluation and therefore avoids the overestimation that leads to sub-optimal policies. Simulation results show that the proposed algorithm can achieve up to 19.4% and 14.1% gains in terms of the number of satisfied users compared with the random algorithm and the Q-learning algorithm, respectively.
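The decoupling of action selection from action evaluation can be sketched with the standard tabular double Q-learning update, which keeps two Q-tables and updates one of them at random on each step. This is a generic sketch, not the paper's UAV-specific implementation; the state and action encodings and the name `double_q_update` are assumptions for illustration.

```python
import random


def double_q_update(qa, qb, s, a, r, s_next, actions,
                    alpha=0.1, gamma=0.9, rng=random):
    """Apply one double Q-learning step to tables qa and qb.

    Both tables are dicts keyed by (state, action). The table being updated
    selects the best next action, and the other table supplies that action's
    value, so a single table never both selects and evaluates.
    """
    if rng.random() < 0.5:
        select, evaluate = qa, qb
    else:
        select, evaluate = qb, qa
    best = max(actions, key=lambda x: select.get((s_next, x), 0.0))
    target = r + gamma * evaluate.get((s_next, best), 0.0)
    old = select.get((s, a), 0.0)
    select[(s, a)] = old + alpha * (target - old)
```

When acting, a common choice is to pick the action that maximizes the sum of the two tables' values for the current state.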
A novel microgrid control strategy is presented in this paper. A resilient community microgrid model, which is equipped with solar PV generation, electric vehicles (EVs), and an improved inverter control system, is considered. To fully exploit the capability of the community microgrid to operate in either grid-connected mode or islanded mode, as well as to achieve improved stability of the microgrid system, universal droop control, virtual inertia control, and a reinforcement learning-based control mechanism are combined in a cohesive manner, in which adaptive control parameters are determined online to tune the influence of the controllers. The microgrid model and control mechanisms are implemented in MATLAB/Simulink and set up in real-time simulation to test the feasibility and effectiveness of the proposed model. Experimental results reveal the effectiveness of regulating the controller's frequency and voltage for various operating conditions and scenarios of a microgrid.
There are many studies on the flexible job shop scheduling problem with fuzzy processing time and on scheduling with deteriorating jobs, but most scholars neglect the connection between them: the purpose of both models is to simulate a more realistic factory environment. From this perspective, the solutions can be more precise and practical if both issues are considered simultaneously. Therefore, the deterioration effect is treated as a part of the fuzzy job shop scheduling problem in this paper, which means the linear increase of a certain processing time is transformed into an internal linear shift of a triangular fuzzy processing time. The other contributions are as follows. A new algorithm called the reinforcement learning based biased bi-population evolutionary algorithm (RB2EA) is proposed, which uses the Q-learning algorithm to adjust the size of the two populations and the interaction frequency according to the quality of the population. A local enhancement method that combines multiple local search strategies is presented. An interaction mechanism is designed to promote the convergence of the bi-population. Extensive experiments are designed to evaluate the efficacy of RB2EA, and the conclusion can be drawn that RB2EA is able to solve the energy-efficient fuzzy flexible job shop scheduling problem with deteriorating jobs (EFFJSPD) efficiently.
In an unmanned aerial vehicle ad-hoc network (UANET), sparse and rapidly mobile unmanned aerial vehicles (UAVs)/nodes can dynamically change the UANET topology. This may lead to UANET service performance issues. In this study, for planning rapidly changing UAV swarms, we propose a dynamic value iteration network (DVIN) model trained using the episodic Q-learning method with the connection information of UANETs to generate a state value spread function, which enables UAVs/nodes to adapt to novel physical locations. We then evaluate the performance of the DVIN model and compare it with the non-dominated sorting genetic algorithm II and the exhaustive method. Simulation results demonstrate that the proposed model significantly reduces the decision-making time for UAV/node path planning with a high average success rate.
With the establishment of "carbon peaking and carbon neutrality" goals in China, along with the development of new power systems and ongoing electricity market reforms, pumped-storage power stations (PSPSs) will increasingly play a significant role in power systems. Therefore, this study focuses on trading and bidding strategies for PSPSs in the electricity market. Firstly, a comprehensive framework for PSPSs participating in the electricity energy and frequency regulation (FR) ancillary service market is proposed. Subsequently, a two-layer trading model is developed to achieve joint clearing in the energy and frequency regulation markets. The upper-layer model aims to maximize the revenue of the power station by optimizing the bidding strategies using a Q-learning algorithm. The lower-layer model minimizes the total electricity purchasing cost of the system. Finally, the proposed bi-level trading model is validated by studying an actual case in which data are obtained from a provincial power system in China. The results indicate that through this decision-making method, PSPSs can achieve higher economic revenue in the market, which will provide a reference for the planning and operation of PSPSs.
Funding (spraying robot navigation paper): supported by the National Natural Science Foundation of China (Grant No. 52374156).
Funding (heterogeneous UAV cooperative task assignment paper): funded by the National Natural Science Foundation of China (Grant No. 62203217), the Jiangsu Province Basic Research Program Natural Science Foundation (Grant No. BK20220885), the Hong Kong, Macao and Taiwan Science and Technology Cooperation Project of Special Foundation in Jiangsu Science and Technology Plan (Grant No. BZ2023057), the Fundamental Research Funds for the Central Universities (Grant No. NJ2024012), the China Postdoctoral Science Foundation (Grant No. GZC20242230), and the Postgraduate Research & Practice Innovation Program of Jiangsu Province (Grant No. KYCX24_0586).
Abstract: Cooperative task assignment is one of the key research focuses in the field of unmanned aerial vehicles (UAVs). In this paper, an energy learning hyper-heuristic (EL-HH) algorithm is proposed to address the cooperative task assignment problem of heterogeneous UAVs under complex constraints. First, a mathematical model is designed to define the scenario, complex constraints, and objective function of the problem. Then, the scheme encoding, the EL-HH strategy, multiple optimization operators, and the task sequence and time adjustment strategies are designed in the EL-HH algorithm. The scheme encoding is designed with three layers: task sequence, UAV sequence, and waiting time. The EL-HH strategy applies an energy learning method to adaptively adjust the energies of operators, thereby facilitating the selection and application of operators. Multiple optimization operators can update schemes in different ways, enabling the algorithm to fully explore the solution space. Afterward, the task sequence and time adjustment strategies are applied to adjust the task order and insert waiting time. Through the iterative optimization process, a satisfactory assignment scheme is ultimately produced. Finally, simulation and experiment verify the effectiveness of the proposed algorithm.
Abstract: With the development of economic globalization, distributed manufacturing is becoming more and more prevalent. Recently, integrated scheduling of distributed production and assembly has attracted much attention. This research studies a distributed flexible job shop scheduling problem with assembly operations. Firstly, a mixed integer programming model is formulated to minimize the maximum completion time. Secondly, a Q-learning-assisted coevolutionary algorithm is presented to solve the model: (1) multiple populations are developed to seek the required decisions simultaneously; (2) an encoding and decoding method based on problem features is applied to represent individuals; (3) a hybrid approach of heuristic rules and random methods is employed to acquire a high-quality population; (4) three evolutionary strategies with crossover and mutation methods are adopted to enhance exploration capabilities; (5) three neighborhood structures based on problem features are constructed, and a Q-learning-based iterative local search method is devised to improve exploitation abilities. The Q-learning approach is applied to intelligently select better neighborhood structures. Finally, a group of instances is constructed to perform comparison experiments. The effectiveness of the Q-learning approach is verified by comparing the developed algorithm with its variant without the Q-learning method. Three renowned meta-heuristic algorithms are used in comparison with the developed algorithm. The comparison results demonstrate that the designed method exhibits better performance in coping with the formulated problem.
Funding: National Natural Science Foundation of China (Nos. 70871091, 61075064, 61034004, 61005090); Program for New Century Excellent Talents in University of Ministry of Education of China; Ph.D. Programs Foundation of Ministry of Education of China (No. 20100072110038).
Abstract: A novel hyper-heuristic algorithm framework was proposed to improve the adaptability of evolutionary algorithms (EAs) in optimization. The algorithm in use could be switched during the evolutionary process according to the performance of the candidate algorithms. In addition, a large number of elite individuals were employed in the framework, and these elite individuals helped it achieve better performance, whereas the same number of elite individuals would stall global convergence in a conventional single algorithm. The time complexity was analyzed to demonstrate that the novel framework did not increase it. The simulation results indicate that the proposed framework outperforms any single algorithm that composes the framework.
Funding: Supported in part by the National Key R&D Program of China under Grant 2017YFB1302400; the Jinan "20 New Colleges and Universities" Funded Scientific Research Leader Studio under Grant 2021GXRC079; the Major Agricultural Applied Technological Innovation Projects of Shandong Province under Grant SD2019NJ014; the Shandong Natural Science Foundation under Grant ZR2019MF064; and the Beijing Advanced Innovation Center for Intelligent Robots and Systems under Grant 2019IRS19.
Abstract: The solution strategy of a heuristic algorithm is pre-set and performs well in conventional cloud resource scheduling. However, for complex and dynamic cloud service scheduling tasks, a single strategy solves such problems inefficiently because service attributes differ. In this paper, we present a hyper-heuristic algorithm based on reinforcement learning (HHRL) to optimize the completion time of the task sequence. Firstly, in the reward table setting stage of HHRL, we introduce population diversity and integrate maximum time to comprehensively determine the task scheduling and the selection of low-level heuristic strategies. Secondly, a task computational complexity estimation method integrated with linear regression is proposed to influence task scheduling priorities. In addition, we propose a high-quality candidate solution migration method to ensure the continuity and diversity of the solving process. Compared with HHSA, ACO, GA, F-PSO, etc., HHRL can quickly obtain task complexity, select appropriate heuristic strategies for task scheduling, search for the best makespan, and has a stronger ability to detect disturbances in population diversity.
Funding: Supported by the National Key Research and Development Program of China under Grant No. 2016YFC1400200; in part by the Basic Research Program of Science and Technology of Shenzhen, China under Grant No. JCYJ20190809161805508; in part by the Fundamental Research Funds for the Central Universities of China under Grant No. 20720200092; in part by the Xiamen University's Honors Program for Undergraduates in Marine Sciences under Grant No. 22320152201106; and in part by the National Natural Science Foundation of China under Grants No. 41476026, 41976178 and 61801139.
Abstract: Routing plays a critical role in data transmission for underwater acoustic sensor networks (UWSNs) in the internet of underwater things (IoUT). Traditional routing methods suffer from high end-to-end delay, limited bandwidth, and high energy consumption. With the development of artificial intelligence and machine learning algorithms, many researchers apply these new methods to improve the quality of routing. In this paper, we propose a Q-learning-based multi-hop cooperative routing protocol (QMCR) for UWSNs. Our protocol can automatically choose nodes with the maximum Q-value as forwarders based on distance information. Moreover, we combine cooperative communications with the Q-learning algorithm to reduce network energy consumption and improve communication efficiency. Experimental results show that the running time of QMCR is less than one-tenth of that of the artificial fish-swarm algorithm (AFSA), while the routing energy consumption is kept at the same level. Owing to its speed, QMCR is a promising method of routing design for UWSNs, especially under the highly dynamic underwater acoustic channels of the real ocean environment.
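Choosing the forwarder with the maximum Q-value can be sketched as follows. The abstract does not give the reward or update rule, so the distance-progress reward, the parameter values, and the function names here are hypothetical.

```python
def distance_reward(d_current, d_neighbor):
    """Hypothetical reward: how much closer to the sink the neighbor is
    than the current node (positive means progress)."""
    return d_current - d_neighbor


def update_q(q_values, neighbor, reward, max_q_next, alpha=0.5, gamma=0.8):
    """Q-learning update for forwarding to `neighbor`; max_q_next is the
    best Q-value that neighbor reports for its own next hop."""
    old = q_values.get(neighbor, 0.0)
    q_values[neighbor] = old + alpha * (reward + gamma * max_q_next - old)
    return q_values[neighbor]


def select_forwarder(q_values, neighbors):
    """Pick the neighbor with the largest Q-value (first one wins ties)."""
    return max(neighbors, key=lambda n: q_values.get(n, 0.0))
```

For example, a node 100 m from the sink with neighbors at 60 m and 90 m would, after one update each, prefer the 60 m neighbor.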
Abstract: The paper presents a fuzzy Q-learning (FQL) and optical flow-based autonomous navigation approach. The FQL method makes decisions in an unknown environment without mapping, using motion information and feeding a reinforcement signal into an evolutionary algorithm. The reinforcement signal is calculated by estimating the optical flow densities in areas of the camera image to determine whether they are "dense" or "thin", which relates to the proximity of objects. The results show that the approach improves the rate of learning compared with a method that uses a simple reward system and no evolutionary component. The proposed system was implemented in a virtual robotics system using the CoppeliaSim software in communication with Python.
Funding: Funded by the Ministry of Higher Education (MoHE) Malaysia, under the Transdisciplinary Research Grant Scheme (TRGS/1/2019/UKM/01/4/2).
Abstract: The Cross-domain Heuristic Search Challenge (CHeSC) is a competition focused on creating efficient search algorithms adaptable to diverse problem domains. Selection hyper-heuristics are a class of algorithms that dynamically choose heuristics during the search process. Numerous selection hyper-heuristics have different implementation strategies. However, comparisons between them are lacking in the literature, and previous works have not highlighted the beneficial and detrimental implementation methods of different components. The question is how to effectively employ them to produce an efficient search heuristic. Furthermore, the algorithms that competed in the inaugural CHeSC have not been collectively reviewed. This work conducts a review analysis of the top twenty competitors from this competition to identify effective and ineffective strategies influencing algorithmic performance. A summary of the main characteristics and classification of the algorithms is presented. The analysis underlines efficient and inefficient methods in eight key components, including search points, search phases, heuristic selection, move acceptance, feedback, Tabu mechanism, restart mechanism, and low-level heuristic parameter control. This review analyzes the components with reference to the competition's final leaderboard and discusses future research directions for these components. The effective approaches, identified as having the highest quality index, are mixed search point, iterated search phases, relay hybridization selection, threshold acceptance, mixed learning, Tabu heuristics, stochastic restart, and dynamic parameters. Findings are also compared with recent trends in hyper-heuristics. This work enhances the understanding of selection hyper-heuristics, offering valuable insights for researchers and practitioners aiming to develop effective search algorithms for diverse problem domains.
Abstract: By comparing price plans offered by several retail energy firms, end users with smart meters and controllers may optimize their energy use cost portfolios, owing to the growth of deregulated retail power markets. To help smart grid end users decrease power payments and usage dissatisfaction, this article suggests a decision system based on reinforcement learning to aid with electricity price plan selection. An enhanced state-based Markov decision process (MDP) without transition probabilities models the decision problem. A kernel approximate-integrated batch Q-learning approach is used to tackle the given problem. Several adjustments to the sampling and data representation are made to increase the computational and prediction performance. Using a continuous high-dimensional state space, the suggested approach can uncover the underlying characteristics of time-varying pricing schemes. Without knowing anything regarding the market environment in advance, the best decision-making policy may be learned via case studies that use data from actual historical price plans. Experiments show that the suggested decision approach may reduce cost and energy usage dissatisfaction by using user data to build an accurate prediction strategy. In this research, we look at how smart city energy planners rely on precise load forecasts. It presents a hybrid method that extracts associated characteristics to improve accuracy in residential power consumption forecasts using machine learning (ML). The precision of forecasts can be measured with loss functions such as the RMSE. This research presents a methodology for estimating smart home energy usage in response to the growing interest in explainable artificial intelligence (XAI). Using Shapley Additive explanations (SHAP) approaches, this strategy makes it easy for consumers to comprehend their energy use trends. To predict future energy use, the study employs gradient boosting in conjunction with long short-term memory neural networks.
Funding: Supported by the Science and Technology Development Fund (FDCT), Macao SAR (No. 0019/2021/A); National Natural Science Foundation of China (No. 62173356); Zhuhai Industry-University-Research Project with Hongkong and Macao (No. ZH22017002210014PWC); Guangdong Basic and Applied Basic Research Foundation (No. 2023A1515011531); and Key Technologies for Scheduling and Optimization of Complex Distributed Manufacturing Systems (No. 22JR10KA007).
Abstract: As the global economy develops and people's awareness of environmental protection increases, the efficient scheduling of production lines in workshops has received more and more attention. However, there is very little research focusing on distributed scheduling for heterogeneous factories. This study addresses a multi-objective distributed heterogeneous permutation flow shop scheduling problem with sequence-dependent setup times (DHPFSP-SDST). The objective is to optimize the trade-off between the maximum completion time (makespan) and total energy consumption. First, we establish a mathematical model to describe the problem. Second, we use the artificial bee colony (ABC) algorithm to optimize the two objectives, incorporating five local search strategies tailored to the problem characteristics to enhance the algorithm's performance. Third, to improve the convergence speed of the algorithm, a Q-learning-based strategy is designed to select the appropriate local search operator during iterations. Finally, based on experiments conducted on 72 instances, statistical analysis and discussions show that the Q-learning-based ABC algorithm solves the problem more effectively than its peers.
Abstract: Purpose: The examination timetabling problem is an NP-hard problem. A large number of approaches to this problem have been developed to find more appropriate search strategies. Hyper-heuristics are one representative class of methods. In a hyper-heuristic, the high-level search is executed to construct heuristic lists by traditional methods (such as Tabu search, variable neighborhoods and so on). The purpose of this paper is to apply an evolutionary strategy instead of traditional methods for high-level search to improve the capability of global search. Design/methodology/approach: This paper combines a hyper-heuristic with an evolutionary strategy to solve examination timetabling problems. First, four graph coloring heuristics are employed to construct heuristic lists. Within the evolutionary algorithm framework, iterative initialization is utilized to increase the number of feasible solutions in the population; meanwhile, the crossover and mutation operators are applied to find potential heuristic lists in the heuristic space (high-level search). Finally, two local search methods are combined to optimize the feasible solutions in the solution space (low-level search). Findings: Experimental results demonstrate that the proposed approach obtains competitive results and outperforms the compared approaches on some benchmark instances. Originality/value: The contribution of this paper is the development of a framework which combines an evolutionary algorithm and a hyper-heuristic for examination timetabling problems.
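To make the role of a graph coloring heuristic concrete, the sketch below orders exams by largest degree and assigns slots greedily. The abstract does not name its four heuristics, so largest degree is used here only as one well-known example, and the `conflicts` data layout is an assumption.

```python
def largest_degree_order(conflicts):
    """Order exams by how many other exams they clash with, most
    constrained first; ties are broken by exam name for determinism."""
    return sorted(conflicts, key=lambda e: (-len(conflicts[e]), e))


def greedy_timetable(conflicts, order):
    """Give each exam the lowest-numbered slot not already used by an
    exam it conflicts with (i.e. one sharing at least one student)."""
    slot = {}
    for exam in order:
        used = {slot[o] for o in conflicts[exam] if o in slot}
        s = 0
        while s in used:
            s += 1
        slot[exam] = s
    return slot


# conflicts[e] is the set of exams that share a student with exam e.
conflicts = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}, "D": set()}
timetable = greedy_timetable(conflicts, largest_degree_order(conflicts))
```

In a hyper-heuristic, a high-level search would vary which ordering heuristic is applied at each step rather than fixing one as done here.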
Funding: Supported in part by the National Natural Science Foundation of China (61473053) and the Science and Technology Innovation Foundation of Dalian, China (2020JJ26GX033).
Abstract: The uninterrupted operation of the quay crane (QC) ensures that a large container ship can depart port within laytime, which effectively reduces the handling cost for the container terminal and ship owners. QC waiting caused by delays of automated guided vehicles (AGVs) in an uncertain environment can be alleviated by dynamic scheduling optimization. A dynamic scheduling process is introduced in this paper to solve the AGV scheduling and path planning problems, in which the scheduling scheme determines the starting and ending nodes of paths, and the choice of paths between nodes affects the scheduling of subsequent AGVs. This work proposes a two-stage mixed integer optimization model to minimize the transportation cost of AGVs under the constraint of laytime. A dynamic optimization algorithm, including an improved rule-based heuristic algorithm and the integration of the Dijkstra algorithm and the Q-learning algorithm, is designed to solve for the optimal AGV scheduling and path schemes. A new conflict avoidance strategy based on graph theory is also proposed to reduce the probability of path conflicts between AGVs. Numerical experiments are conducted to demonstrate the effectiveness of the proposed model and algorithm over existing methods.
Funding: Supported by the National Key Research and Development Plan (No. 2020YFB1713600); the National Natural Science Foundation of China (No. 62063021); the Lanzhou Science Bureau Project (No. 2018-rc-98); the Public Welfare Project of Zhejiang Natural Science Foundation (No. LGJ19E050001); and the Project of Zhejiang Natural Science Foundation (No. LQ20F020011).
Abstract: A hyper-heuristic algorithm is a general solution framework that adaptively selects the optimizer to address complex problems. A classical hyper-heuristic framework consists of two levels: the high-level heuristic and a set of low-level heuristics. The low-level heuristics to be used in the optimization process are chosen by the high-level strategy of the hyper-heuristic. In this study, a Cooperative Multi-Stage Hyper-Heuristic (CMS-HH) algorithm is proposed to address certain combinatorial optimization problems. In CMS-HH, a genetic algorithm is introduced to perturb the initial solution to increase the diversity of the solution. In the search phase, an online learning mechanism based on multi-armed bandits and a relay hybridization technique are proposed to improve the quality of the solution. In addition, a multi-point search is introduced to search cooperatively with a single-point search when the solution has not changed for a sustained period. The performance of the CMS-HH algorithm is assessed on six specific combinatorial optimization problems, including Boolean satisfiability problems, one-dimensional packing problems, permutation flow-shop scheduling problems, personnel scheduling problems, traveling salesman problems, and vehicle routing problems. The experimental results demonstrate the efficiency and significance of the proposed CMS-HH algorithm.
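A bandit-based high-level strategy can be sketched with the classic UCB1 rule, where each arm is one low-level heuristic. The abstract does not say which bandit policy CMS-HH uses, so UCB1, the class name, and the reward convention (e.g. a normalized improvement in [0, 1]) are assumptions for illustration.

```python
import math


class UCB1Selector:
    """Chooses among low-level heuristics, treating each one as a bandit arm."""

    def __init__(self, n_heuristics, c=math.sqrt(2)):
        self.counts = [0] * n_heuristics   # times each heuristic was applied
        self.means = [0.0] * n_heuristics  # running mean reward per heuristic
        self.c = c                         # exploration weight

    def select(self):
        # Try every heuristic once before using the confidence bound.
        for i, n in enumerate(self.counts):
            if n == 0:
                return i
        total = sum(self.counts)
        scores = [
            m + self.c * math.sqrt(math.log(total) / n)
            for m, n in zip(self.means, self.counts)
        ]
        return scores.index(max(scores))

    def update(self, i, reward):
        # Incremental mean update after applying heuristic i.
        self.counts[i] += 1
        self.means[i] += (reward - self.means[i]) / self.counts[i]
```

In use, the search loop would call `select()`, apply the chosen heuristic to the current solution, and pass the observed improvement to `update()`.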
Abstract: Two-stage hybrid flow shop scheduling has been extensively considered in single-factory settings. However, the distributed two-stage hybrid flow shop scheduling problem (DTHFSP) with fuzzy processing time is seldom investigated in multiple factories. Furthermore, the integration of reinforcement learning and metaheuristics is seldom applied to solve DTHFSP. In the current study, DTHFSP with fuzzy processing time was investigated, and a novel Q-learning-based teaching-learning-based optimization (QTLBO) was constructed to minimize makespan. Several teachers were recruited for this study. The teacher phase, learner phase, teacher's self-learning phase, and learner's self-learning phase were designed. The Q-learning algorithm was implemented with 9 states, 4 actions defined as combinations of the above phases, a reward, and an adaptive action selection, which were applied to dynamically adjust the algorithm structure. A number of experiments were conducted. The computational results demonstrate that the new strategies of QTLBO are effective; furthermore, it presents promising results on the considered DTHFSP.
Funding: This work was financially supported by the National Natural Science Foundation of China (Grant No. 51575528) and the Science Foundation of China University of Petroleum, Beijing (No. 2462022QEDX011).
Abstract: The pipeline isolation plugging robot (PIPR) is an important tool in pipeline maintenance operations. During the plugging process, the flow field induces violent vibration, which can cause serious damage to the pipeline and the PIPR. In this paper, we propose a dynamic regulating strategy to reduce the plugging-induced vibration by regulating the spoiler angle and plugging velocity. Firstly, dynamic plugging simulation and experiment are performed to study the flow field changes during dynamic plugging, and the pressure difference is proposed to evaluate the degree of flow field vibration. Secondly, the mathematical models of pressure difference with plugging states and spoiler angles are established based on the extreme learning machine (ELM) optimized by an improved sparrow search algorithm (ISSA). Finally, a modified Q-learning algorithm based on simulated annealing is applied to determine the optimal strategy for the spoiler angle and plugging velocity in real time. The results show that the proposed method can reduce the plugging-induced vibration by 19.9% and 32.7% on average, compared with single-regulating methods. This study can effectively ensure the stability of the plugging process.
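One common way to combine simulated annealing with Q-learning is to replace epsilon-greedy action selection with a Metropolis acceptance test. The sketch below shows that scheme; the abstract does not detail the paper's modification, so the acceptance rule, the geometric cooling schedule, and all parameter values are assumptions.

```python
import math
import random


def sa_select(q_row, temperature, rng=random):
    """Metropolis-style action selection over one state's Q-values.

    A random candidate action replaces the greedy action with probability
    exp((Q[candidate] - Q[greedy]) / T), so exploration fades as T cools.
    """
    greedy = max(range(len(q_row)), key=q_row.__getitem__)
    candidate = rng.randrange(len(q_row))
    accept_prob = math.exp((q_row[candidate] - q_row[greedy]) / temperature)
    return candidate if rng.random() < accept_prob else greedy


def cool(temperature, rate=0.95, t_min=1e-3):
    """Hypothetical geometric cooling schedule with a lower bound."""
    return max(t_min, temperature * rate)
```

At high temperature almost any candidate is accepted, while near `t_min` the selection is effectively greedy.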
Funding: Supported in part by the National Natural Science Foundation of China under Grant 61671086 and Grant 61629101.
Abstract: In this paper, the problem of trajectory design of unmanned aerial vehicles (UAVs) for maximizing the number of satisfied users is studied in a UAV-based cellular network where the UAV works as a flying base station that serves users, and a user is satisfied when its data request is completed within an allowable maximum waiting time. The trajectory design is formulated as an optimization problem whose goal is to maximize the number of satisfied users. To solve this problem, a machine learning framework based on the double Q-learning algorithm is proposed. The algorithm enables the UAV to find the optimal trajectory that maximizes the number of satisfied users. Compared to traditional learning algorithms, such as Q-learning, which selects and evaluates the action using the same Q-table, the proposed algorithm decouples the selection from the evaluation and therefore avoids the overestimation that leads to sub-optimal policies. Simulation results show that the proposed algorithm can achieve up to 19.4% and 14.1% gains in terms of the number of satisfied users compared to the random algorithm and the Q-learning algorithm, respectively.
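The decoupling of action selection from evaluation can be shown with a minimal tabular double Q-learning update. This is a generic sketch of the standard technique, not the paper's code; the dictionary-based tables and parameter values are assumptions.

```python
import random


def double_q_update(qa, qb, state, action, reward, next_state, n_actions,
                    alpha=0.1, gamma=0.9, rng=random):
    """One double Q-learning step on two tables mapping (state, action) -> value.

    A coin flip picks the table to update. That table selects the best next
    action, and the other table evaluates it, which counters the
    overestimation of a single max over noisy values.
    """
    if rng.random() < 0.5:
        select, evaluate = qa, qb
    else:
        select, evaluate = qb, qa
    best = max(range(n_actions), key=lambda a: select.get((next_state, a), 0.0))
    target = reward + gamma * evaluate.get((next_state, best), 0.0)
    old = select.get((state, action), 0.0)
    select[(state, action)] = old + alpha * (target - old)
```

When acting, a typical choice is to be greedy with respect to the sum (or average) of the two tables.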
Abstract: A novel microgrid control strategy is presented in this paper. A resilient community microgrid model, which is equipped with solar PV generation, electric vehicles (EVs), and an improved inverter control system, is considered. To fully exploit the capability of the community microgrid to operate in either grid-connected mode or islanded mode, as well as to achieve improved stability of the microgrid system, universal droop control, virtual inertia control, and a reinforcement learning-based control mechanism are combined in a cohesive manner, in which adaptive control parameters are determined online to tune the influence of the controllers. The microgrid model and control mechanisms are implemented in MATLAB/Simulink and set up in real-time simulation to test the feasibility and effectiveness of the proposed model. Experimental results reveal the effectiveness of the controllers in regulating frequency and voltage under various operating conditions and scenarios of a microgrid.
Abstract: There are many studies on the flexible job shop scheduling problem with fuzzy processing time and on deteriorating scheduling, but most scholars neglect the connection between them, namely that the purpose of both models is to simulate a more realistic factory environment. From this perspective, the solutions can be more precise and practical if both issues are considered simultaneously. Therefore, the deterioration effect is treated as a part of the fuzzy job shop scheduling problem in this paper, which means the linear increase of a certain processing time is transformed into an internal linear shift of a triangular fuzzy processing time. The other contributions are as follows. A new algorithm called the reinforcement learning based biased bi-population evolutionary algorithm (RB2EA) is proposed, which utilizes the Q-learning algorithm to adjust the size of the two populations and the interaction frequency according to the quality of the population. A local enhancement method which combines multiple local search strategies is presented. An interaction mechanism is designed to promote the convergence of the bi-population. Extensive experiments are designed to evaluate the efficacy of RB2EA, and it can be concluded that RB2EA is able to solve the energy-efficient fuzzy flexible job shop scheduling problem with deteriorating jobs (EFFJSPD) efficiently.
Funding: Project supported by the National Natural Science Foundation of China (No. 61501399), SAIC MOTOR (No. 1925), and the National Key R&D Program of China (No. 2018AAA0102302).
Abstract: In an unmanned aerial vehicle ad-hoc network (UANET), sparse and rapidly mobile unmanned aerial vehicles (UAVs)/nodes can dynamically change the UANET topology. This may lead to UANET service performance issues. In this study, for planning rapidly changing UAV swarms, we propose a dynamic value iteration network (DVIN) model trained using the episodic Q-learning method with the connection information of UANETs to generate a state value spread function, which enables UAVs/nodes to adapt to novel physical locations. We then evaluate the performance of the DVIN model and compare it with the non-dominated sorting genetic algorithm II and the exhaustive method. Simulation results demonstrate that the proposed model significantly reduces the decision-making time for UAV/node path planning with a high average success rate.
Funding: Supported by the Innovation Project of the China Southern Power Grid Co., Ltd. (020000KK52210005).
Abstract: With the establishment of "carbon peaking and carbon neutrality" goals in China, along with the development of new power systems and ongoing electricity market reforms, pumped-storage power stations (PSPSs) will increasingly play a significant role in power systems. Therefore, this study focuses on trading and bidding strategies for PSPSs in the electricity market. Firstly, a comprehensive framework for PSPSs participating in the electricity energy and frequency regulation (FR) ancillary service market is proposed. Subsequently, a two-layer trading model is developed to achieve joint clearing in the energy and frequency regulation markets. The upper-layer model aims to maximize the revenue of the power station by optimizing the bidding strategies using a Q-learning algorithm. The lower-layer model minimizes the total electricity purchasing cost of the system. Finally, the proposed bi-level trading model is validated by studying an actual case in which data are obtained from a provincial power system in China. The results indicate that through this decision-making method, PSPSs can achieve higher economic revenue in the market, which will provide a reference for the planning and operation of PSPSs.