Understanding the reinforcement effect of the newly developed prestressed reinforcement components (PRCs), a system composed of prestressed steel bars (PSBs), protective sleeves, lateral pressure plates (LPPs), and anchoring elements, is technically significant for the rational design of prestressed subgrade. A three-dimensional finite element model was established, verified against a novel static model test, and used to systematically analyze the influence of prestress levels and reinforcement modes on the reinforcement effect of the subgrade. The results show that the PRCs provide additional confining pressure to the subgrade through the diffusion effect of the prestress, which effectively improves the service performance of the subgrade. Compared with unreinforced conventional subgrades, the settlements of prestress-reinforced subgrades are reduced. The settlement attenuation rate (Rs) near the LPPs is larger than that at the subgrade center, and increasing the prestress contributes positively to the stability of the subgrade structure. In the multi-row reinforcement mode, the reinforcement effect of PRCs can extend from the reinforced area to the unreinforced area. In addition, as the horizontal distance from the LPPs increases, the additional confining pressure converted by the PSBs and LPPs gradually diminishes as it spreads to the core load-bearing area of the subgrade, resulting in a decrease in Rs. Under the single-row reinforcement mode, PRCs can be strategically arranged according to the local areas where subgrade defects readily occur or have been observed, to obtain the desired reinforcement effect. Moreover, excessive prestress should not be applied near the subgrade shoulder line, to avoid shear failure of the subgrade shoulder. PRCs can be flexibly used for preventing and treating various subgrade defects of newly constructed or existing railway lines, achieving targeted and classified prevention and effectively improving the bearing performance and deformation resistance of the subgrade. The research results are instructive for further elucidating the prestress reinforcement effect of PRCs on railway subgrades.
Lunar core samples are the key materials for accurately assessing and developing lunar resources. However, the difficulty of maintaining borehole stability during lunar coring limits the coring depth. Here, a strategy is proposed for reinforcing the borehole with a reinforcement fluid that spontaneously undergoes a phase transition in a vacuum environment. Based on this strategy, a reinforcement liquid suitable for a wide temperature range and a high vacuum environment was developed. A feasibility study on reinforcing the borehole with the reinforcement liquid was carried out, and it was found that the cohesion of the simulated lunar soil can be increased from 2 to 800 kPa after using the reinforcement liquid. Further, a series of coring experiments was conducted on a self-developed high-vacuum (vacuum degree of 5 Pa) and low-temperature (between -30 and 50 ℃) simulation platform. It is confirmed that the high-boiling-point reinforcement liquid pre-placed in the drill pipe can be released spontaneously during drilling and finally complete the reinforcement of the borehole. The reinforcement effect on the borehole is better when the solute concentration is between 0.15 and 0.25 g/mL.
Carbon fiber reinforced polymer (CFRP) is an advanced material widely used in bridge structures, demonstrating a promising application prospect. CFRP possesses excellent mechanical properties, construction advantages, and durability benefits. Its application in bridge reinforcement can significantly enhance the overall performance of the reinforced bridge, thereby improving the durability and extending the service life of the bridge. Therefore, it is necessary to further explore how CFRP can be effectively applied in bridge reinforcement projects to improve the quality of such projects and ensure the safety of bridges during operation.
To improve the wettability of hypereutectic Al-60Si alloy and enhance the mechanical properties of the joints, Al-60Si alloy was joined by ultrasonic soldering with Sn-9Zn solder, and a sound joint with in-situ Si particle reinforcement was obtained. The oxide film of the Al-60Si alloy at the interface was identified by transmission electron microscopy (TEM) analysis as amorphous Al2O3. The oxide of the Si particles in the base metal was also alumina. The oxide film of the Al-60Si alloy was observed to be removed by ultrasonic vibration rather than by holding treatment. Si particle-reinforced joints (35.7 vol.%) were obtained by increasing the ultrasonication time. The shear strength peaked at 99.5 MPa for soldering at 330 ℃ with an ultrasonic vibration time of 50 s. A model for the formation of the Si particle-reinforced joint under ultrasound was proposed, and ultrasonic vibration was considered to promote the dissolution of Al and the migration of Si particles.
This paper investigates the challenges associated with Unmanned Aerial Vehicle (UAV) collaborative search and target tracking in dynamic and unknown environments characterized by a limited field of view. The primary objective is to explore the unknown environments to locate and track targets effectively. To address this problem, we propose a novel Multi-Agent Reinforcement Learning (MARL) method based on a Graph Neural Network (GNN). Firstly, a method is introduced for encoding continuous-space multi-UAV problem data into spatial graphs, which establish essential relationships among agents, obstacles, and targets. Secondly, a Graph Attention Network (GAT) model is presented, which focuses exclusively on adjacent nodes, learns attention weights adaptively, and allows agents to better process information in dynamic environments. Reward functions are specifically designed to tackle exploration challenges in environments with sparse rewards. A framework that integrates centralized training and distributed execution is introduced, which facilitates the improvement of the models. Simulation results show that the proposed method outperforms the existing MARL method in search rate and tracking performance, with fewer collisions. The experiments show that the proposed method can be extended to applications with a larger number of agents, which provides a potential solution to the challenging problem of multi-UAV autonomous tracking in dynamic unknown environments.
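The attention step described above, in which each node attends only to its adjacent nodes and the weights are normalized over that neighborhood, can be sketched as follows. This is a minimal single-head illustration in the style of the standard graph attention layer, not the authors' model: the node names, feature vectors, projection matrix `W`, and attention vector `a` are all assumed values chosen for the example, and every neighbor list is assumed to be non-empty.

```python
import math


def leaky_relu(x, slope=0.2):
    """LeakyReLU non-linearity applied to the raw attention score."""
    return x if x > 0 else slope * x


def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def gat_attention(features, neighbors, W, a):
    """Return attention weights alpha[i][j] over each node's adjacent nodes only."""
    # Project every node feature with the shared matrix W.
    z = {i: matvec(W, h) for i, h in features.items()}
    alpha = {}
    for i, nbrs in neighbors.items():
        # Score each neighbor from the concatenation [z_i || z_j].
        scores = {
            j: leaky_relu(sum(c * v for c, v in zip(a, z[i] + z[j])))
            for j in nbrs
        }
        # Softmax over the neighborhood (shifted by the max for stability).
        m = max(scores.values())
        exps = {j: math.exp(s - m) for j, s in scores.items()}
        total = sum(exps.values())
        alpha[i] = {j: e / total for j, e in exps.items()}
    return alpha, z


def aggregate(alpha, z):
    """Attention-weighted sum of projected neighbor features for each node."""
    return {
        i: [sum(w * z[j][d] for j, w in w_i.items()) for d in range(len(z[i]))]
        for i, w_i in alpha.items()
    }


# Hypothetical spatial graph: one UAV adjacent to an obstacle and a target.
features = {
    "uav": [1.0, 0.0],
    "obstacle": [0.0, 1.0],
    "target": [1.0, 1.0],
    "far_uav": [0.5, 0.5],  # outside the field of view, so not adjacent
}
neighbors = {"uav": ["obstacle", "target"]}
W = [[1.0, 0.0], [0.0, 1.0]]   # assumed projection (identity for clarity)
a = [0.5, -0.5, 1.0, 1.0]      # assumed attention vector

alpha, z = gat_attention(features, neighbors, W, a)
uav_embedding = aggregate(alpha, z)["uav"]
```

In a trained model, `W` and `a` would be learned parameters, and a self-loop is usually added so that a node also attends to its own features.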
To solve the problems of poor security guarantees and insufficient training efficiency in conventional reinforcement learning methods for decision-making, this study proposes a hybrid framework that combines deep reinforcement learning with rule-based decision-making methods. A risk assessment model for lane-change maneuvers that considers uncertain predictions of surrounding vehicles is established as a safety filter to improve learning efficiency while correcting dangerous actions for safety enhancement. On this basis, a Risk-fused DDQN is constructed utilizing the model-based risk assessment and supervision mechanism. The proposed reinforcement learning algorithm sets up a separate experience buffer for dangerous trials and punishes such actions, which is shown to improve the sampling efficiency and training outcomes. Compared with conventional DDQN methods, the proposed algorithm improves the convergence value of the cumulative reward by 7.6% and 2.2% in the two constructed scenarios in the simulation study and reduces the number of training episodes by 52.2% and 66.8%, respectively. The success rate of lane change is improved by 57.3%, while the time headway is increased by at least 16.5% in real vehicle tests, which confirms the higher training efficiency, scenario adaptability, and security of the proposed Risk-fused DDQN.
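The idea of a separate experience buffer for dangerous trials, combined with a penalty on those actions, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `DualReplayBuffer`, the `danger_fraction` mixing ratio, the `risk_penalty` value, and the boolean `dangerous` flag (standing in for the output of the risk assessment model) are all hypothetical. The `double_dqn_target` helper shows the standard Double DQN target, in which the online network selects the next action and the target network evaluates it.

```python
import random
from collections import deque


class DualReplayBuffer:
    """Replay memory that keeps dangerous transitions in their own buffer (hypothetical sketch)."""

    def __init__(self, capacity=10_000, danger_fraction=0.25, risk_penalty=1.0, seed=0):
        self.safe = deque(maxlen=capacity)
        self.dangerous = deque(maxlen=capacity)
        self.danger_fraction = danger_fraction  # assumed share of each batch taken from dangerous trials
        self.risk_penalty = risk_penalty        # assumed penalty subtracted from a dangerous action's reward
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done, dangerous):
        if dangerous:
            # Punish the action flagged by the (assumed) risk filter and store it separately.
            self.dangerous.append((state, action, reward - self.risk_penalty, next_state, done))
        else:
            self.safe.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Draw a fixed share from the dangerous buffer, the rest from the safe buffer.
        n_danger = min(len(self.dangerous), int(batch_size * self.danger_fraction))
        n_safe = min(len(self.safe), batch_size - n_danger)
        batch = self.rng.sample(list(self.dangerous), n_danger)
        batch += self.rng.sample(list(self.safe), n_safe)
        self.rng.shuffle(batch)
        return batch


def double_dqn_target(reward, done, gamma, q_online_next, q_target_next):
    """Double DQN target: the online net picks the action, the target net evaluates it."""
    if done:
        return reward
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best_action]
```

Sampling a fixed share of each batch from the dangerous buffer is one plausible way to make rare unsafe experiences visible to the learner; the abstract does not state how the two buffers are mixed.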
This study investigates the flexural performance of ultra-high performance concrete (UHPC) in reinforced concrete T-beams, focusing on the effects of interfacial treatments. Three concrete T-beam specimens were fabricated and tested: a control beam (RC-T), a UHPC-reinforced beam with a chiseled interface (UN-C-50F), and a UHPC-reinforced beam featuring both a chiseled interface and anchored steel rebars (UN-CS-50F). The test results indicated that both chiseling and the incorporation of anchored rebars effectively created a synergistic combination between the concrete T-beam and the UHPC reinforcement layer, with UN-CS-50F exhibiting the highest flexural resistance. The cracking load and ultimate load of UN-CS-50F were 221.5% and 40.8% higher, respectively, than those of RC-T. Finite element (FE) models were developed to provide further insights into the behavior of the UHPC-reinforced T-beams, showing a maximum deviation of just 8% when validated against experimental data. A parametric analysis based on the validated FE model varied the height, thickness, and material strength of the UHPC reinforcement layer, revealing that increasing the UHPC layer thickness from 30 to 50 mm improved the ultimate resistance by 20%, while reducing the UHPC reinforcement height from 440 to 300 mm led to a 10% decrease in bending resistance. The interfacial anchoring rebars significantly reduced crack propagation and enhanced stress redistribution, highlighting the importance of strengthening interfacial bonds and optimizing the geometric parameters of UHPC for improved T-beam performance. These findings offer valuable insights for the design and retrofitting of UHPC-reinforced bridge girders.
Granite residual soil (GRS) is a type of weathering soil that can decompose upon contact with water, potentially causing geological hazards. In this study, cement, an alkaline solution, and glass fiber were used to reinforce GRS. The effects of cement content and the SiO2/Na2O ratio of the alkaline solution on the static and dynamic strengths of GRS were discussed. Microscopically, the reinforcement mechanism and coupling effect were examined using X-ray diffraction (XRD), micro-computed tomography (micro-CT), and scanning electron microscopy (SEM). The results indicated that the addition of 2% cement and an alkaline solution with an SiO2/Na2O ratio of 0.5 led to the densest matrix, lowest porosity, and highest static compressive strength, which was 4994 kPa with a dynamic impact resistance of 75.4 kN after adding glass fiber. The compressive strength and dynamic impact resistance were a result of the coupling effect of cement hydration, a pozzolanic reaction of clay minerals in the GRS, and the alkali activation of clay minerals. Excessive cement addition or an excessively high SiO2/Na2O ratio in the alkaline solution can have negative effects, such as the destruction of C-(A)-S-H gels by the alkaline solution and hindering the production of N-A-S-H gels. This can result in damage to the matrix of reinforced GRS, leading to a decrease in both static and dynamic strengths. This study suggests that further research is required to gain a more precise understanding of the effects of this mixture in terms of reducing the carbon footprint and optimizing its properties. The findings indicate that cement and alkaline solution are appropriate for GRS and that the reinforced GRS can be used for high-strength foundation and embankment construction. The study provides an analysis of strategies for mitigating and managing GRS slope failures, as well as enhancing roadbed performance.
Cooperative multi-agent reinforcement learning (MARL) is a key technology for enabling cooperation in complex multi-agent systems. It has achieved remarkable progress in areas such as gaming, autonomous driving, and multi-robot control. Empowering cooperative MARL with multi-task decision-making capabilities is expected to further broaden its application scope. In multi-task scenarios, cooperative MARL algorithms need to address 3 types of multi-task problems: reward-related multi-task, arising from different reward functions; multi-domain multi-task, caused by differences in state and action spaces and state transition functions; and scalability-related multi-task, resulting from the dynamic variation in the number of agents. Most existing studies focus on scalability-related multi-task problems. However, with the increasing integration between large language models (LLMs) and multi-agent systems, a growing number of LLM-based multi-agent systems have emerged, enabling more complex multi-task cooperation. This paper provides a comprehensive review of the latest advances in this field. By combining multi-task reinforcement learning with cooperative MARL, we categorize and analyze the 3 major types of multi-task problems under multi-agent settings, offering more fine-grained classifications and summarizing key insights for each. In addition, we summarize commonly used benchmarks and discuss future directions of research in this area, which hold promise for further enhancing the multi-task cooperation capabilities of multi-agent systems and expanding their practical applications in the real world.
Exo-atmospheric vehicles are constrained by limited maneuverability, which leads to a contradiction between evasive maneuver and precision strike. To address the problem of Integrated Evasion and Impact (IEI) decision under multi-constraint conditions, a hierarchical intelligent decision-making method based on Deep Reinforcement Learning (DRL) was proposed. First, an intelligent decision-making framework of "DRL evasion decision" + "impact prediction guidance decision" was established: it takes the impact point deviation correction ability as the constraint and the maximum miss distance as the objective, and effectively solves the problem of poor decision-making performance caused by the large IEI decision space. Second, to solve the sparse reward problem faced by evasion decision-making, a hierarchical decision-making method consisting of maneuver timing decision and maneuver duration decision was proposed, and the corresponding Markov Decision Process (MDP) was designed. A detailed simulation experiment was designed to analyze the advantages and computational complexity of the proposed method. Simulation results show that the proposed model has good performance and low computational resource requirements. The minimum miss distance is 21.3 m under the condition of guaranteeing the impact point accuracy, and the single decision-making time is 4.086 ms on an STM32F407 single-chip microcomputer, which has engineering application value.
Grouting has been the most effective approach to mitigate water inrush disasters in underground engineering due to its ability to plug groundwater and enhance rock strength. Nevertheless, there is a lack of potent numerical tools for assessing grouting effectiveness in water-rich fractured strata. In this study, the hydro-mechanical coupled discontinuous deformation analysis (HM-DDA) is extended for the first time to simulate the grouting process in a water-rich discrete fracture network (DFN), including slurry migration, fracture dilation, water plugging in a seepage field, and joint reinforcement after coagulation. To validate the capabilities of the developed method, several numerical examples are conducted incorporating Newtonian fluid and Bingham slurry. The simulation results closely align with the analytical solutions. Additionally, a set of compression tests is conducted on fresh and grouted rock specimens to verify the reinforcement method and calibrate the rational properties of reinforced joints. An engineering-scale model based on a real water inrush case of the Yonglian tunnel in a water-rich fractured zone has been established. The model demonstrates the effectiveness of grouting reinforcement in mitigating water inrush disasters. The results indicate that increased grouting pressure greatly affects the regulation of water outflow from the tunnel face and the prevention of rock detachment from the face after excavation.
Low Earth orbit (LEO) satellite networks exhibit distinct characteristics, e.g., limited resources of individual satellite nodes and dynamic network topology, which have brought many challenges for routing algorithms. To satisfy the quality of service (QoS) requirements of various users, it is critical to research efficient routing strategies that fully utilize satellite resources. This paper proposes a multi-QoS information optimized routing algorithm based on reinforcement learning for LEO satellite networks. The algorithm guarantees that services with high-level assurance demands are prioritized under limited satellite resources, while considering the load balancing performance of the satellite networks for services with low-level assurance demands, to ensure the full and effective utilization of satellite resources. An auxiliary path search algorithm is proposed to accelerate the convergence of the satellite routing algorithm. Simulation results show that the generated routing strategy can process the QoS demands of high-assurance services in a timely manner and fully meet them, while effectively improving the load balancing performance of the link.
Dopamine polymerization reaction and hydrothermal method were used to prepare a nickel-coated Al2O3 reinforcement phase (Ni/Al2O3). Ni/Al2O3 reinforced Sn1.0Ag0.5Cu (SAC105) composite solder was prepared using the traditional casting method. The results show that the nickel coating layer is continuous with uneven thickness. The interface between nickel and aluminum oxide exhibits metallurgical bonding with a coherent interface relationship. The strength, toughness, and wettability of the SAC105 solder on the substrate are improved, while the conductivity is not decreased significantly. The fracture mode of the composites transitions from a mixed ductile-brittle mode to a ductile-dominated mode characterized by many dimples. The prepared composite brazing material was made into solder paste for copper plate lap joint experiments. The maximum shear strength is achieved when the doping amount is 0.3 wt%. The growth index of the intermetallic compound at the brazing interface of the Ni/Al2O3 reinforced SAC105 composite solder is linearly fitted to n = 0.39, demonstrating that the growth of the intermetallic compound at the interface is a combined effect of grain boundary diffusion and bulk diffusion.
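The linear fit of the growth index can be illustrated with the usual power-law description of intermetallic layer growth, d = k·t^n, which becomes a straight line of slope n in log-log coordinates. The sketch below assumes that model; the function name `fit_growth_index` and the thickness values are synthetic placeholders generated for illustration, not measurements from the study.

```python
import math


def fit_growth_index(times, thicknesses):
    """Least-squares fit of ln(thickness) = ln(k) + n * ln(time); returns (n, k)."""
    xs = [math.log(t) for t in times]
    ys = [math.log(d) for d in thicknesses]
    n_pts = len(xs)
    x_mean = sum(xs) / n_pts
    y_mean = sum(ys) / n_pts
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    n = sxy / sxx                        # slope = growth index
    k = math.exp(y_mean - n * x_mean)    # intercept gives the rate constant
    return n, k


# Synthetic data (assumed, for illustration only): d = k * t**n with k = 0.8, n = 0.39.
aging_hours = [24, 72, 168, 336, 720]
imc_thickness_um = [0.8 * t ** 0.39 for t in aging_hours]

n_fit, k_fit = fit_growth_index(aging_hours, imc_thickness_um)
```

In the soldering literature, an index near 0.5 is commonly attributed to bulk diffusion control and an index near 0.33 to grain boundary diffusion, so a fitted value between the two is typically read as a mixed mechanism.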
The small modular reactor (SMR) is at the research forefront of nuclear reactor technology. The advancement of intelligent control technologies now paves a new way for the design and construction of unmanned SMRs. The autonomous control process of an SMR can be divided into three stages, namely state diagnosis, autonomous decision-making, and coordinated control. In this paper, the autonomous state recognition and task planning of an unmanned SMR are investigated. An operating condition recognition method based on the knowledge base of SMR operation is proposed using artificial neural network (ANN) technology, which provides a basis for the state judgment in intelligent reactor control path planning. An improved reinforcement learning path planning algorithm is utilized to implement the path transfer decision-making. This algorithm performs condition transitions with minimal cost under specified modes. In summary, full-range intelligent decision-planning of the SMR control path is realized, which provides some theoretical basis for the design and construction of unmanned SMRs in the future.
AIM: To investigate the refractive and histological changes in guinea pig eyes after posterior scleral reinforcement with scleral allografts. METHODS: Four-week-old guinea pigs were implanted with scleral allografts, and the changes in refraction, corneal curvature, and axial length were monitored for 51 d. The effects of methylprednisolone (MPS) on refraction parameters were also evaluated. The microstructure and ultra-microstructure of the eyes were observed on days 9 and 51 after the operation. Repeated-measures analysis of variance and one-way analysis of variance were used. RESULTS: The refraction outcome of the implanted eye decreased after the operation, and the refraction change of the 3 mm scleral allograft group was significantly different from that of the control group (P=0.005) and the sham surgical group (P=0.004). After the application of MPS solution, the reduction of the refraction outcome was significantly suppressed (P=0.008). Inflammatory encapsulation appeared 9 d after surgery. On day 51 after the operation, the loose implanted materials had been absorbed, while the adherent implanted materials in the MPS group were still tightly attached to the recipient's eyeball. CONCLUSION: After implantation of scleral allografts, the refraction of guinea pig eyes fluctuated from a decrease to an increase. The outcome of the scleral allografts is affected by implantation methods and the inflammatory response. Stability of the material can be improved by MPS.
In multiple Unmanned Aerial Vehicle (UAV) systems, achieving efficient navigation is essential for executing complex tasks and enhancing autonomy. Traditional navigation methods depend on predefined control strategies and trajectory planning and often perform poorly in complex environments. To improve the UAV-environment interaction efficiency, this study proposes a multi-UAV integrated navigation algorithm based on Deep Reinforcement Learning (DRL). This algorithm integrates the Inertial Navigation System (INS), Global Navigation Satellite System (GNSS), and Visual Navigation System (VNS) for comprehensive information fusion. Specifically, an improved multi-UAV integrated navigation algorithm called Information Fusion with Multi-Agent Deep Deterministic Policy Gradient (IF-MADDPG) was developed. This algorithm enables UAVs to learn collaboratively and optimize their flight trajectories in real time. Through simulations and experiments, test scenarios in GNSS-denied environments were constructed to evaluate the effectiveness of the algorithm. The experimental results demonstrate that the IF-MADDPG algorithm significantly enhances the collaborative navigation capabilities of multiple UAVs in formation maintenance and GNSS-denied environments. Additionally, it has advantages in terms of mission completion time. This study provides a novel approach for efficient collaboration in multi-UAV systems, which significantly improves the robustness and adaptability of navigation systems.
The high maneuverability of modern fighters in close air combat imposes significant cognitive demands on pilots, making rapid, accurate decision-making challenging. While reinforcement learning (RL) has shown promise in this domain, the existing methods often lack strategic depth and generalization in complex, high-dimensional environments. To address these limitations, this paper proposes an optimized self-play method enhanced by advancements in fighter modeling, neural network design, and algorithmic frameworks. Unlike traditional 3-DOF models, this study employs a six-degree-of-freedom (6-DOF) F-16 fighter model based on open-source aerodynamic data, featuring airborne equipment and a realistic visual simulation platform. To capture temporal dynamics, Long Short-Term Memory (LSTM) layers are integrated into the neural network, complemented by delayed input stacking. The RL environment incorporates expert strategies, curiosity-driven rewards, and curriculum learning to improve adaptability and strategic decision-making. Experimental results demonstrate that the proposed approach achieves a winning rate exceeding 90% against classical single-agent methods. Additionally, through enhanced 3D visual platforms, we conducted human-agent confrontation experiments, where the agent attained an average winning rate of over 75%. The agent's maneuver trajectories closely align with human pilot strategies, showcasing its potential in decision-making and pilot training applications. This study highlights the effectiveness of integrating advanced modeling and self-play techniques in developing robust air combat decision-making systems.
The challenge of enhancing the generalization capacity of reinforcement learning (RL) agents remains a formidable obstacle. Existing RL methods, despite achieving superhuman performance on certain benchmarks, often struggle with this aspect. A potential reason is that the benchmarks used for training and evaluation may not adequately offer a diverse set of transferable tasks. Although recent studies have developed benchmarking environments to address this shortcoming, they typically fall short in providing tasks that both ensure a solid foundation for generalization and exhibit significant variability. To overcome these limitations, this work introduces the concept that 'objects are composed of more fundamental components' in environment design, as implemented in the proposed environment called Summon the Magic (StM). This environment generates tasks where objects are derived from extensible and shareable basic components, facilitating strategy reuse and enhancing generalization. Furthermore, two new metrics, adaptation sensitivity range (ASR) and parameter correlation coefficient (PCC), are proposed to better capture and evaluate the generalization process of RL agents. Experimental results show that increasing the number of basic components of the object reduces the proximal policy optimization (PPO) agent's training-testing gap by 60.9% (in episode reward), significantly alleviating overfitting. Additionally, linear variations in other environmental factors, such as the training monster set proportion and the total number of basic components, uniformly decrease the gap by at least 32.1%. These results highlight StM's effectiveness in benchmarking and probing the generalization capabilities of RL algorithms.
Smart learning environments have been considered vital sources and essential needs in modern digital education systems. With the rapid proliferation of smart and assistive technologies, smart learning processes have become quite convenient, comfortable, and financially affordable. This shift has led to the emergence of pervasive computing environments, where the user's intelligent behavior is supported by smart gadgets; however, it is becoming more challenging due to the inconsistent behavior of artificial intelligence (AI) assistive technologies in terms of networking issues, slow user responses to technologies, and limited computational resources. This paper presents a context-aware, predictive-reasoning-based formalism for smart learning environments that helps students manage their academic as well as extra-curricular activities autonomously with limited human intervention. The system consists of a three-tier architecture: autonomously acquiring contextualized information from the environment, modeling the system using the OWL 2 RL profile of the Web Ontology Language and the Semantic Web Rule Language (SWRL), and performing reasoning to infer the desired goals whenever and wherever needed. For contextual reasoning, we develop a non-monotonic formalism to reason with contextual information using rule-based reasoning. The focus is on distributed problem solving, where context-aware agents exchange information using rule-based reasoning and specify constraints to accomplish desired goals. To formally model-check and simulate the system behavior, we model the case study of a smart learning environment in the UPPAAL model checker and verify the desired properties of the model, such as safety, liveness, and robustness properties, to reflect the overall correctness of the system, with a minimum analysis time of 0.002 s and memory utilization of 34,712 KB.
Blockchain technology, based on decentralized data storage and distributed consensus design, has become a promising solution to address data security risks and provide privacy protection in the Internet-of-Things (IoT) due to its tamper-proof and non-repudiation features. Although blockchain typically does not require the endorsement of third-party trust organizations, it mostly needs to perform necessary mathematical calculations to prevent malicious attacks, which results in stricter requirements for computation resources on the participating devices. Offloading the computation tasks required to support blockchain consensus to edge service nodes or the cloud, while providing data privacy protection for IoT applications, can effectively address the limitations of computation and energy resources in IoT devices. However, how to make reasonable offloading decisions for IoT devices remains an open issue. Owing to the excellent self-learning ability of Reinforcement Learning (RL), this paper proposes an RL-enabled Swarm Intelligence Optimization Algorithm (RLSIOA) that aims to improve the quality of initial solutions and achieve efficient optimization of computation task offloading decisions. The algorithm considers various factors that may affect the revenue obtained by IoT devices executing consensus algorithms (e.g., Proof-of-Work), and it optimizes the proportion of sub-tasks to be offloaded and the scale of computing resources to be rented from the edge and cloud to maximize the revenue of devices. Experimental results show that RLSIOA can obtain higher-quality offloading decision-making schemes at lower latency costs compared to representative benchmark algorithms.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 51978672 and 52308335), the Natural Science Funding of Hunan Province (Grant No. 2023JJ41054), and the Natural Science Research Project of Anhui Educational Committee (Grant No. 2023AH051170).
Abstract: Understanding the reinforcement effect of the newly developed prestressed reinforcement components (PRCs), a system composed of prestressed steel bars (PSBs), protective sleeves, lateral pressure plates (LPPs), and anchoring elements, is technically significant for the rational design of prestressed subgrade. A three-dimensional finite element model was established, verified against a novel static model test, and used to systematically analyze the influence of prestress levels and reinforcement modes on the reinforcement effect of the subgrade. The results show that the PRCs provide additional confining pressure to the subgrade through the diffusion effect of the prestress, which effectively improves the service performance of the subgrade. Compared with unreinforced conventional subgrades, the settlements of prestress-reinforced subgrades are reduced. The settlement attenuation rate (Rs) near the LPPs is larger than that at the subgrade center, and increasing the prestress contributes positively to the stability of the subgrade structure. In the multi-row reinforcement mode, the reinforcement effect of PRCs can extend from the reinforced area to the unreinforced area. In addition, as the horizontal distance from the LPPs increases, the additional confining pressure converted by the PSBs and LPPs gradually diminishes as it spreads to the core load-bearing area of the subgrade, resulting in a decrease in Rs. Under the single-row reinforcement mode, PRCs can be strategically arranged in the local areas where subgrade defects readily occur or have been observed, to obtain the desired reinforcement effect. Moreover, excessive prestress should not be applied near the subgrade shoulder line, to avoid shear failure of the subgrade shoulder. PRCs can be flexibly used for preventing and treating various subgrade defects of newly constructed or existing railway lines, achieving targeted and classified prevention, and effectively improving the bearing performance and deformation resistance of the subgrade. The research results are instructive for further elucidating the prestress reinforcement effect of PRCs on railway subgrades.
Funding: National Natural Science Foundation of China (Nos. U2013603, 51827901, and 52403383); Program for Guangdong Introducing Innovative and Entrepreneurial Teams (No. 2019ZT08G315); Institute of New Energy and Low-Carbon Technology (Sichuan University); State Key Laboratory of Coal Mine Disaster Dynamics and Control of Chongqing University.
Abstract: Lunar core samples are the key materials for accurately assessing and developing lunar resources. However, the difficulty of maintaining borehole stability during lunar coring limits the coring depth. Here, a strategy is proposed that reinforces the borehole with a reinforcement fluid that spontaneously undergoes a phase transition in a vacuum environment. Based on this strategy, a reinforcement liquid suitable for a wide temperature range and a high-vacuum environment was developed. A feasibility study on reinforcing the borehole with the reinforcement liquid was carried out, and it was found that the cohesion of the simulated lunar soil can be increased from 2 to 800 kPa after using the reinforcement liquid. Further, a series of coring experiments was conducted using a self-developed high-vacuum (vacuum degree of 5 Pa) and low-temperature (between -30 and 50℃) simulation platform. It was confirmed that the high-boiling-point reinforcement liquid pre-placed in the drill pipe can be released spontaneously during drilling and finally complete the reinforcement of the borehole. The reinforcement effect is better when the solute concentration is between 0.15 and 0.25 g/mL.
Abstract: Carbon fiber reinforced polymer (CFRP) is an advanced material widely used in bridge structures, with promising application prospects. CFRP offers excellent mechanical properties, construction advantages, and durability benefits. Its application in bridge reinforcement can significantly enhance the overall performance of the reinforced bridge, thereby improving durability and extending service life. It is therefore necessary to further explore how CFRP can be applied effectively in bridge reinforcement projects to improve their quality and ensure the safety of bridges during operation.
Funding: Financial support from the National Natural Science Foundation of China (Nos. 52275385, U2167216) and the Sichuan Province Science and Technology Support Program, China (No. 2022YFG0086).
Abstract: To improve the wettability of hypereutectic Al−60Si alloy and enhance the mechanical properties of the joints, Al−60Si alloy was joined by ultrasonic soldering with Sn-9Zn solder, and a sound joint with in-situ Si particle reinforcement was obtained. The oxide film of the Al−60Si alloy at the interface was identified by transmission electron microscopy (TEM) analysis as amorphous Al2O3. The oxide on the Si particles in the base metal was also alumina. The oxide film of the Al−60Si alloy was observed to be removed by ultrasonic vibration rather than by holding treatment. Si particle-reinforced joints (35.7 vol.%) were obtained by increasing the ultrasonication time. The shear strength peaked at 99.5 MPa for soldering at 330℃ with an ultrasonic vibration time of 50 s. A model for the formation of Si particle-reinforced joints under ultrasound was proposed, in which ultrasonic vibration is considered to promote the dissolution of Al and the migration of Si particles.
Funding: Supported by the National Natural Science Foundation of China (Nos. 12272104, U22B2013).
Abstract: This paper investigates the challenges associated with Unmanned Aerial Vehicle (UAV) collaborative search and target tracking in dynamic and unknown environments characterized by a limited field of view. The primary objective is to explore the unknown environments to locate and track targets effectively. To address this problem, we propose a novel Multi-Agent Reinforcement Learning (MARL) method based on a Graph Neural Network (GNN). Firstly, a method is introduced for encoding continuous-space multi-UAV problem data into spatial graphs, which establish essential relationships among agents, obstacles, and targets. Secondly, a Graph AttenTion network (GAT) model is presented, which focuses exclusively on adjacent nodes, learns attention weights adaptively, and allows agents to better process information in dynamic environments. Reward functions are specifically designed to tackle exploration challenges in environments with sparse rewards. A framework that integrates centralized training and distributed execution facilitates model training. Simulation results show that the proposed method outperforms the existing MARL method in search rate and tracking performance, with fewer collisions. The experiments show that the proposed method can be extended to applications with a larger number of agents, which provides a potential solution to the challenging problem of multi-UAV autonomous tracking in dynamic unknown environments.
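The attention step that the abstract describes (weights learned only over adjacent nodes) can be illustrated with the standard GAT scoring rule, e_ij = LeakyReLU(a^T [z_i || z_j]) followed by a softmax over the neighbors of i. The sketch below is only an illustration of that textbook rule, not the authors' model: the node names, the already-projected features `z`, and the attention vectors `a_src` and `a_dst` (the two halves of `a`) are all made-up assumptions.

```python
import math

def leaky_relu(x, slope=0.2):
    """LeakyReLU with the slope commonly used in GAT implementations."""
    return x if x > 0 else slope * x

def attention_weights(z, neighbors, a_src, a_dst):
    """Return alpha[i][j] for each node i and each j in neighbors[i].

    z maps a node name to its projected feature vector (assumed precomputed).
    a^T [z_i || z_j] is evaluated as a_src . z_i + a_dst . z_j.
    Every node listed in `neighbors` must have at least one neighbor.
    """
    def dot(u, v):
        return sum(p * q for p, q in zip(u, v))

    alpha = {}
    for i, nbrs in neighbors.items():
        scores = {j: leaky_relu(dot(a_src, z[i]) + dot(a_dst, z[j])) for j in nbrs}
        top = max(scores.values())  # subtract the max for a numerically stable softmax
        exps = {j: math.exp(s - top) for j, s in scores.items()}
        total = sum(exps.values())
        alpha[i] = {j: e / total for j, e in exps.items()}
    return alpha

# Hypothetical spatial graph: uav0 is adjacent to uav1 and a target, not to the obstacle.
z = {"uav0": [1.0, 0.0], "uav1": [0.5, 0.5], "target": [0.0, 1.0], "obstacle": [1.0, 1.0]}
neighbors = {"uav0": ["uav1", "target"]}
alpha = attention_weights(z, neighbors, a_src=[0.3, -0.1], a_dst=[0.2, 0.4])
```

Because the softmax runs over `neighbors[i]` only, non-adjacent nodes such as the obstacle receive no weight at all, which is the locality property the abstract highlights.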
Funding: Supported by the National Key Research and Development Program of China (Grant No. 2022YFE0117100), the National Science Foundation of China (Grant Nos. 52102468, 52325212), and the Fundamental Research Funds for the Central Universities.
Abstract: To solve the problems of poor safety guarantees and insufficient training efficiency in conventional reinforcement learning methods for decision-making, this study proposes a hybrid framework that combines deep reinforcement learning with rule-based decision-making methods. A risk assessment model for lane-change maneuvers that considers uncertain predictions of surrounding vehicles is established as a safety filter to improve learning efficiency while correcting dangerous actions for safety enhancement. On this basis, a Risk-fused DDQN is constructed using the model-based risk assessment and supervision mechanism. The proposed reinforcement learning algorithm sets up a separate experience buffer for dangerous trials and punishes such actions, which is shown to improve sampling efficiency and training outcomes. Compared with conventional DDQN methods, the proposed algorithm improves the convergence value of the cumulative reward by 7.6% and 2.2% in the two constructed scenarios in the simulation study and reduces the number of training episodes by 52.2% and 66.8%, respectively. In real vehicle tests, the success rate of lane changes is improved by 57.3% while the time headway is increased by at least 16.5%, which confirms the higher training efficiency, scenario adaptability, and safety of the proposed Risk-fused DDQN.
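The two mechanisms named in the abstract, a safety filter that corrects dangerous actions and a separate experience buffer for dangerous trials, can be sketched in a few lines. This is a minimal illustration under assumptions, not the paper's Risk-fused DDQN: the class `DualReplayBuffer`, the function `safety_filter`, the risk threshold, the penalty value, and the sampling fraction are all hypothetical.

```python
import random
from collections import deque

class DualReplayBuffer:
    """Keeps dangerous transitions apart from ordinary ones (hypothetical design)."""

    def __init__(self, capacity=10000):
        self.safe = deque(maxlen=capacity)
        self.dangerous = deque(maxlen=capacity)

    def add(self, transition, is_dangerous):
        (self.dangerous if is_dangerous else self.safe).append(transition)

    def sample(self, batch_size, danger_fraction=0.25):
        # Draw a fixed share from the dangerous buffer, capped by what is stored.
        n_danger = min(len(self.dangerous), int(batch_size * danger_fraction))
        n_safe = min(len(self.safe), batch_size - n_danger)
        return (random.sample(list(self.dangerous), n_danger)
                + random.sample(list(self.safe), n_safe))

def safety_filter(proposed_action, risk_of, fallback_action,
                  risk_threshold=0.5, penalty=-1.0):
    """Return (executed_action, reward_adjustment, was_dangerous).

    risk_of is any callable that maps an action to a risk score in [0, 1];
    here it stands in for the model-based risk assessment.
    """
    if risk_of(proposed_action) > risk_threshold:
        return fallback_action, penalty, True
    return proposed_action, 0.0, False

# Made-up risk scores for two lane-change actions.
risk_table = {"keep_lane": 0.1, "change_left": 0.8}
action, adjustment, dangerous = safety_filter("change_left", risk_table.get, "keep_lane")

buffer = DualReplayBuffer()
buffer.add(("state", "change_left", adjustment, "next_state"), dangerous)
```

The stored transition keeps the proposed (rejected) action together with its penalty, so a learner sampling from the dangerous buffer sees which choices were overridden; whether the original work stores transitions this way is not stated in the abstract.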
Funding: The National Natural Science Foundation of China (Grant #52278161), the Science and Technology Project of Guangzhou (Grant #2024A04J9888), and the Guangdong Basic and Applied Basic Research Foundation (Grant #2023A1515010535).
Abstract: This study investigates the flexural performance of ultra-high performance concrete (UHPC) in reinforced concrete T-beams, focusing on the effects of interfacial treatments. Three concrete T-beam specimens were fabricated and tested: a control beam (RC-T), a UHPC-reinforced beam with a chiseled interface (UN-C-50F), and a UHPC-reinforced beam featuring both a chiseled interface and anchored steel rebars (UN-CS-50F). The test results indicated that both chiseling and the incorporation of anchored rebars effectively created a synergistic combination between the concrete T-beam and the UHPC reinforcement layer, with UN-CS-50F exhibiting the highest flexural resistance. The cracking load and ultimate load of UN-CS-50F were 221.5% and 40.8% higher, respectively, than those of RC-T. Finite element (FE) models were developed to provide further insights into the behavior of the UHPC-reinforced T-beams, showing a maximum deviation of just 8% when validated against experimental data. A parametric analysis based on the validated FE model varied the height, thickness, and material strength of the UHPC reinforcement layer, revealing that increasing the UHPC layer thickness from 30 to 50 mm improved the ultimate resistance by 20%, while reducing the UHPC reinforcement height from 440 to 300 mm led to a 10% decrease in bending resistance. The interfacial anchoring rebars significantly reduced crack propagation and enhanced stress redistribution, highlighting the importance of strengthening interfacial bonds and optimizing the geometric parameters of UHPC for improved T-beam performance. These findings offer valuable insights for the design and retrofitting of UHPC-reinforced bridge girders.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 52278336 and 42302032) and the Guangdong Basic and Applied Research Foundation (Grant No. 2023B1515020061).
Abstract: Granite residual soil (GRS) is a type of weathered soil that can disintegrate upon contact with water, potentially causing geological hazards. In this study, cement, an alkaline solution, and glass fiber were used to reinforce GRS. The effects of cement content and the SiO2/Na2O ratio of the alkaline solution on the static and dynamic strengths of GRS were discussed. Microscopically, the reinforcement mechanism and coupling effect were examined using X-ray diffraction (XRD), micro-computed tomography (micro-CT), and scanning electron microscopy (SEM). The results indicated that the addition of 2% cement and an alkaline solution with an SiO2/Na2O ratio of 0.5 led to the densest matrix, lowest porosity, and highest static compressive strength, which was 4994 kPa, with a dynamic impact resistance of 75.4 kN after adding glass fiber. The compressive strength and dynamic impact resistance resulted from the coupling effect of cement hydration, a pozzolanic reaction of clay minerals in the GRS, and the alkali activation of clay minerals. Excessive cement addition or an excessively high SiO2/Na2O ratio in the alkaline solution can have negative effects, such as the destruction of C-(A)-S-H gels by the alkaline solution and hindered production of N-A-S-H gels. This can damage the matrix of the reinforced GRS, leading to a decrease in both static and dynamic strengths. This study suggests that further research is required to gain a more precise understanding of the effects of this mixture in terms of reducing the carbon footprint and optimizing its properties. The findings indicate that cement and alkaline solution are appropriate for GRS and that the reinforced GRS can be used for high-strength foundation and embankment construction. The study provides an analysis of strategies for mitigating and managing GRS slope failures, as well as enhancing roadbed performance.
Funding: The National Natural Science Foundation of China (62136008, 62293541), the Beijing Natural Science Foundation (4232056), and the Beijing Nova Program (20240484514).
Abstract: Cooperative multi-agent reinforcement learning (MARL) is a key technology for enabling cooperation in complex multi-agent systems. It has achieved remarkable progress in areas such as gaming, autonomous driving, and multi-robot control. Empowering cooperative MARL with multi-task decision-making capabilities is expected to further broaden its application scope. In multi-task scenarios, cooperative MARL algorithms need to address 3 types of multi-task problems: reward-related multi-task, arising from different reward functions; multi-domain multi-task, caused by differences in state and action spaces and state transition functions; and scalability-related multi-task, resulting from dynamic variation in the number of agents. Most existing studies focus on scalability-related multi-task problems. However, with the increasing integration between large language models (LLMs) and multi-agent systems, a growing number of LLM-based multi-agent systems have emerged, enabling more complex multi-task cooperation. This paper provides a comprehensive review of the latest advances in this field. By combining multi-task reinforcement learning with cooperative MARL, we categorize and analyze the 3 major types of multi-task problems under multi-agent settings, offering more fine-grained classifications and summarizing key insights for each. In addition, we summarize commonly used benchmarks and discuss future directions of research in this area, which hold promise for further enhancing the multi-task cooperation capabilities of multi-agent systems and expanding their practical applications in the real world.
Funding: Co-supported by the National Natural Science Foundation of China (No. 62103432), the China Postdoctoral Science Foundation (No. 284881), and the Young Talent Fund of University Association for Science and Technology in Shaanxi, China (No. 20210108).
Abstract: Exo-atmospheric vehicles are constrained by limited maneuverability, which leads to a conflict between evasive maneuvering and precision strike. To address the problem of Integrated Evasion and Impact (IEI) decision-making under multi-constraint conditions, a hierarchical intelligent decision-making method based on Deep Reinforcement Learning (DRL) was proposed. First, an intelligent decision-making framework of “DRL evasion decision” + “impact prediction guidance decision” was established: it takes the impact point deviation correction ability as the constraint and the maximum miss distance as the objective, and effectively solves the problem of poor decision-making performance caused by the large IEI decision space. Second, to solve the sparse reward problem faced by evasion decision-making, a hierarchical decision-making method consisting of a maneuver timing decision and a maneuver duration decision was proposed, and the corresponding Markov Decision Process (MDP) was designed. A detailed simulation experiment was designed to analyze the advantages and computational complexity of the proposed method. Simulation results show that the proposed model has good performance and low computational resource requirements. The minimum miss distance is 21.3 m under the condition of guaranteeing the impact point accuracy, and a single decision takes 4.086 ms on an STM32F407 single-chip microcomputer, which demonstrates engineering application value.
Funding: Supported by the China Scholarship Council (CSC, Grant No. 202108050072) and JSPS KAKENHI (Grant No. JP19KK0121).
Abstract: Grouting has been the most effective approach to mitigating water inrush disasters in underground engineering due to its ability to plug groundwater and enhance rock strength. Nevertheless, there is a lack of potent numerical tools for assessing grouting effectiveness in water-rich fractured strata. In this study, the hydro-mechanical coupled discontinuous deformation analysis (HM-DDA) is extended for the first time to simulate the grouting process in a water-rich discrete fracture network (DFN), including slurry migration, fracture dilation, water plugging in a seepage field, and joint reinforcement after coagulation. To validate the capabilities of the developed method, several numerical examples are conducted incorporating a Newtonian fluid and a Bingham slurry. The simulation results closely align with the analytical solutions. Additionally, a set of compression tests is conducted on fresh and grouted rock specimens to verify the reinforcement method and calibrate rational properties of the reinforced joints. An engineering-scale model based on a real water inrush case of the Yonglian tunnel in a water-rich fractured zone has been established. The model demonstrates the effectiveness of grouting reinforcement in mitigating water inrush disasters. The results indicate that increased grouting pressure greatly affects the regulation of water outflow from the tunnel face and the prevention of rock detachment at the face after excavation.
Funding: National Key Research and Development Program (2021YFB2900604).
Abstract: Low Earth orbit (LEO) satellite networks exhibit distinct characteristics, e.g., limited resources of individual satellite nodes and dynamic network topology, which pose many challenges for routing algorithms. To satisfy the quality of service (QoS) requirements of various users, it is critical to research efficient routing strategies that fully utilize satellite resources. This paper proposes a multi-QoS information optimized routing algorithm based on reinforcement learning for LEO satellite networks. It guarantees that services with high-level assurance demands are prioritized under limited satellite resources, while considering the load balancing performance of the satellite network for services with low-level assurance demands, to ensure the full and effective utilization of satellite resources. An auxiliary path search algorithm is proposed to accelerate the convergence of the satellite routing algorithm. Simulation results show that the generated routing strategy can process high-assurance services in a timely manner and fully meet their QoS demands while effectively improving the load balancing performance of the links.
Funding: National Natural Science Foundation of China (U1604132); Central Plains Talents Program-Fund of Central Plains Leading Talents (ZYYCYU002130); Key Technology Research and Development Program of Henan Province (222102230114); Major Scientific Research Foundation of Higher Education of Henan Province (23B430003).
Abstract: A dopamine polymerization reaction and a hydrothermal method were used to prepare a nickel-coated Al2O3 reinforcement phase (Ni/Al2O3). Ni/Al2O3-reinforced Sn-1.0Ag-0.5Cu (SAC105) composite solder was prepared using a traditional casting method. The results show that the nickel coating layer is continuous but uneven in thickness. The interface between nickel and aluminum oxide exhibits metallurgical bonding with a coherent interface relationship. The strength, toughness, and wettability of the SAC105 solder on the substrate are improved, while the conductivity is not decreased significantly. The fracture mode of the composites transitions from a mixed toughness-brittleness mode to a purely toughness-dominated mode, characterized by many dimples. The prepared composite brazing material was made into solder paste for copper plate lap joint experiments. The maximum shear strength is achieved when the doping amount is 0.3 wt%. The growth index of the intermetallic compound at the brazing interface of the Ni/Al2O3-reinforced SAC105 composite solder is linearly fitted to n=0.39, demonstrating that the growth of the intermetallic compound at the interface is a combined effect of grain boundary diffusion and bulk diffusion.
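A growth index of this kind is commonly obtained by assuming a power law for the intermetallic layer thickness, x = k·t^n, so that ln x = ln k + n·ln t and the slope of a straight-line fit gives n. In that convention, n near 1/2 is usually associated with bulk (volume) diffusion and n near 1/3 with grain boundary diffusion, which is consistent with reading n=0.39 as a combined effect. The sketch below shows only the fitting step; the times, thicknesses, and the rate constant 1.2 are synthetic values chosen for illustration, not data from the paper, and the paper's own fitting procedure may differ.

```python
import math

def fit_growth_exponent(times, thicknesses):
    """Least-squares fit of ln(x) = ln(k) + n * ln(t); returns (n, k)."""
    xs = [math.log(t) for t in times]
    ys = [math.log(x) for x in thicknesses]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    n = sxy / sxx                        # slope of the log-log line
    k = math.exp(mean_y - n * mean_x)    # intercept converted back to k
    return n, k

# Synthetic, noise-free example generated from x = 1.2 * t**0.39 (hypothetical units).
times = [24, 72, 168, 336, 720]
thicknesses = [1.2 * t ** 0.39 for t in times]
n, k = fit_growth_exponent(times, thicknesses)
```

With real measurements the points scatter around the line, so the fitted n should be reported together with a goodness-of-fit measure before it is interpreted mechanistically.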
Abstract: The small modular reactor (SMR) is at the research forefront of nuclear reactor technology. The advancement of intelligent control technologies now paves a new way for the design and construction of unmanned SMRs. The autonomous control process of an SMR can be divided into three stages, namely state diagnosis, autonomous decision-making, and coordinated control. In this paper, the autonomous state recognition and task planning of unmanned SMRs are investigated. An operating condition recognition method based on the knowledge base of SMR operation is proposed using artificial neural network (ANN) technology, which provides a basis for the state judgment in intelligent reactor control path planning. An improved reinforcement learning path planning algorithm is used to implement the path transfer decision-making. This algorithm performs condition transitions with minimal cost under specified modes. In summary, full-range intelligent decision-planning of the SMR control path is realized, thus providing a theoretical basis for the design and construction of unmanned SMRs in the future.
Funding: Supported by the Scientific Research Project of Shanghai Municipal Health Commission (No. 202140416) and the Clinical Research Boosting Program of the Ninth People’s Hospital Affiliated to Shanghai Jiao Tong University School of Medicine (No. JYLJ202117).
Abstract: AIM: To investigate the refractive and histological changes in guinea pig eyes after posterior scleral reinforcement with scleral allografts. METHODS: Four-week-old guinea pigs were implanted with scleral allografts, and the changes in refraction, corneal curvature, and axial length were monitored for 51 d. The effects of methylprednisolone (MPS) on refraction parameters were also evaluated. The microstructure and ultra-microstructure of the eyes were observed on days 9 and 51 after the operation. Repeated-measures analysis of variance and one-way analysis of variance were used. RESULTS: The refraction of the implanted eye decreased after the operation, and the refraction change of the 3 mm scleral allograft group was significantly different from that of the control group (P=0.005) and the sham surgery group (P=0.004). After the application of MPS solution, the reduction in refraction was significantly suppressed (P=0.008). Inflammatory encapsulation appeared 9 d after surgery. At 51 d after the operation, the loosely implanted materials had been absorbed, while the adherent implanted materials in the MPS group were still tightly attached to the recipient’s eyeball. CONCLUSION: After implantation of scleral allografts, the refraction of guinea pig eyes first decreased and then increased. The outcome of the scleral allografts is affected by the implantation method and the inflammatory response. Stability of the material can be improved by MPS.
Funding: Co-supported by the National Natural Science Foundation of China (Nos. 92371201 and 52192633), the Natural Science Foundation of Shaanxi Province of China (No. 2022JC-03), and the Aeronautical Science Foundation of China (No. ASFC-20220019070002).
Abstract: In multiple Unmanned Aerial Vehicle (UAV) systems, achieving efficient navigation is essential for executing complex tasks and enhancing autonomy. Traditional navigation methods depend on predefined control strategies and trajectory planning and often perform poorly in complex environments. To improve the UAV-environment interaction efficiency, this study proposes a multi-UAV integrated navigation algorithm based on Deep Reinforcement Learning (DRL). This algorithm integrates the Inertial Navigation System (INS), Global Navigation Satellite System (GNSS), and Visual Navigation System (VNS) for comprehensive information fusion. Specifically, an improved multi-UAV integrated navigation algorithm called Information Fusion with Multi-Agent Deep Deterministic Policy Gradient (IF-MADDPG) was developed. This algorithm enables UAVs to learn collaboratively and optimize their flight trajectories in real time. Through simulations and experiments, test scenarios in GNSS-denied environments were constructed to evaluate the effectiveness of the algorithm. The experimental results demonstrate that the IF-MADDPG algorithm significantly enhances the collaborative navigation capabilities of multiple UAVs in formation maintenance and GNSS-denied environments. Additionally, it has advantages in terms of mission completion time. This study provides a novel approach for efficient collaboration in multi-UAV systems, which significantly improves the robustness and adaptability of navigation systems.
Funding: Co-supported by the National Natural Science Foundation of China (No. 91852115).
Abstract: The high maneuverability of modern fighters in close air combat imposes significant cognitive demands on pilots, making rapid, accurate decision-making challenging. While reinforcement learning (RL) has shown promise in this domain, existing methods often lack strategic depth and generalization in complex, high-dimensional environments. To address these limitations, this paper proposes an optimized self-play method enhanced by advancements in fighter modeling, neural network design, and algorithmic frameworks. Unlike traditional 3-DOF models, this study employs a six-degree-of-freedom (6-DOF) F-16 fighter model based on open-source aerodynamic data, featuring airborne equipment and a realistic visual simulation platform. To capture temporal dynamics, Long Short-Term Memory (LSTM) layers are integrated into the neural network, complemented by delayed input stacking. The RL environment incorporates expert strategies, curiosity-driven rewards, and curriculum learning to improve adaptability and strategic decision-making. Experimental results demonstrate that the proposed approach achieves a winning rate exceeding 90% against classical single-agent methods. Additionally, through enhanced 3D visual platforms, we conducted human-agent confrontation experiments, in which the agent attained an average winning rate of over 75%. The agent's maneuver trajectories closely align with human pilot strategies, showcasing its potential in decision-making and pilot training applications. This study highlights the effectiveness of integrating advanced modeling and self-play techniques in developing robust air combat decision-making systems.
Funding: Supported by the National Key R&D Program of China (No. 2023YFB4502200), the National Natural Science Foundation of China (Nos. U22A2028, 61925208, 62222214, 62341411, 62102398, 62102399, U20A20227, 62302478, 62302482, 62302483, 62302480, 62302481), the Strategic Priority Research Program of the Chinese Academy of Sciences (Nos. XDB0660300, XDB0660301, XDB0660302), the Chinese Academy of Sciences Project for Young Scientists in Basic Research (No. YSBR-029), and the Youth Innovation Promotion Association of Chinese Academy of Sciences and Xplore Prize.
Abstract: Enhancing the generalization capacity of reinforcement learning (RL) agents remains a formidable challenge. Existing RL methods, despite achieving superhuman performance on certain benchmarks, often struggle with this aspect. A potential reason is that the benchmarks used for training and evaluation may not offer a sufficiently diverse set of transferable tasks. Although recent studies have developed benchmarking environments to address this shortcoming, they typically fall short in providing tasks that both ensure a solid foundation for generalization and exhibit significant variability. To overcome these limitations, this work introduces the concept that ‘objects are composed of more fundamental components’ into environment design, as implemented in the proposed environment called summon the magic (StM). This environment generates tasks in which objects are derived from extensible and shareable basic components, facilitating strategy reuse and enhancing generalization. Furthermore, two new metrics, adaptation sensitivity range (ASR) and parameter correlation coefficient (PCC), are proposed to better capture and evaluate the generalization process of RL agents. Experimental results show that increasing the number of basic components of the object reduces the proximal policy optimization (PPO) agent’s training-testing gap by 60.9% (in episode reward), significantly alleviating overfitting. Additionally, linear variations in other environmental factors, such as the training monster set proportion and the total number of basic components, uniformly decrease the gap by at least 32.1%. These results highlight StM’s effectiveness in benchmarking and probing the generalization capabilities of RL algorithms.
Funding: Supported by the National Research Foundation (NRF), Republic of Korea, under project BK21 FOUR (4299990213939).
Abstract: Smart learning environments are considered vital sources and essential needs in modern digital education systems. With the rapid proliferation of smart and assistive technologies, smart learning processes have become convenient, comfortable, and financially affordable. This shift has led to the emergence of pervasive computing environments, where the user’s intelligent behavior is supported by smart gadgets; however, this support is becoming more challenging due to the inconsistent behavior of artificial intelligence (AI) assistive technologies caused by networking issues, slow user responses to technologies, and limited computational resources. This paper presents a context-aware predictive reasoning based formalism for smart learning environments that helps students manage their academic as well as extra-curricular activities autonomously with limited human intervention. The system consists of a three-tier architecture: acquiring contextualized information from the environment autonomously, modeling the system using Web Ontology Rule Language (OWL 2 RL) and Semantic Web Rule Language (SWRL), and performing reasoning to infer the desired goals whenever and wherever needed. For contextual reasoning, we develop a non-monotonic reasoning based formalism to reason with contextual information using rule-based reasoning. The focus is on distributed problem solving, where context-aware agents exchange information using rule-based reasoning and specify constraints to accomplish desired goals. To formally model-check and simulate the system behavior, we model the case study of a smart learning environment in the UPPAAL model checker and verify the desired properties in the model, such as safety, liveness, and robustness properties, to reflect the overall correctness of the system, achieving a minimum analysis time of 0.002 s and memory utilization of 34,712 KB.
Funding: Supported by the Project of Science and Technology Research Program of Chongqing Education Commission of China (No. KJZD-K202401105), the High-Quality Development Action Plan for Graduate Education at Chongqing University of Technology (No. gzljg2023308, No. gzljd2024204), the Graduate Innovation Program of Chongqing University of Technology (No. gzlcx20233197), and the Yunnan Provincial Key R&D Program (202203AA080006).
Abstract: Blockchain technology, based on decentralized data storage and distributed consensus design, has become a promising solution for addressing data security risks and providing privacy protection in the Internet of Things (IoT) due to its tamper-proof and non-repudiation features. Although blockchain typically does not require the endorsement of third-party trust organizations, it generally needs to perform mathematical calculations to prevent malicious attacks, which places stricter requirements on the computation resources of the participating devices. Offloading the computation tasks required to support blockchain consensus to edge service nodes or the cloud, while providing data privacy protection for IoT applications, can effectively address the limited computation and energy resources of IoT devices. However, how to make reasonable offloading decisions for IoT devices remains an open issue. Motivated by the excellent self-learning ability of Reinforcement Learning (RL), this paper proposes an RL-enabled Swarm Intelligence Optimization Algorithm (RLSIOA) that aims to improve the quality of initial solutions and achieve efficient optimization of computation task offloading decisions. The algorithm considers various factors that may affect the revenue obtained by IoT devices executing consensus algorithms (e.g., Proof-of-Work), and it optimizes the proportion of sub-tasks to be offloaded and the scale of computing resources to be rented from the edge and cloud to maximize the revenue of devices. Experimental results show that RLSIOA can obtain higher-quality offloading decision-making schemes at lower latency costs compared to representative benchmark algorithms.