The increasing complexity of on-orbit tasks places great demands on the flexible operation of space robotic arms, driving the development of space robots from single-arm manipulation to multi-arm collaboration. In this paper, a combined approach of Learning from Demonstration (LfD) and Reinforcement Learning (RL) is proposed for space multi-arm collaborative skill learning. The combination effectively resolves the trade-off between learning efficiency and solution feasibility in LfD, as well as the time-consuming pursuit of the optimal solution in RL. With the prior knowledge from LfD, space robotic arms can achieve efficient guided learning in a high-dimensional state-action space. Specifically, an LfD approach with Probabilistic Movement Primitives (ProMP) is first used to encode and reproduce the demonstrated actions, generating a distribution that initializes the policy. Then, in the RL stage, a Relative Entropy Policy Search (REPS) algorithm modified for continuous state-action spaces is employed for further policy improvement. More importantly, the learned behaviors maintain and reflect the characteristics of the demonstrations. In addition, a series of supplementary policy search mechanisms is designed to accelerate the exploration process. The effectiveness of the proposed method has been verified both theoretically and experimentally. Moreover, comparisons with state-of-the-art methods confirm that the approach outperforms them.
The cloud platform has limited defense resources and cannot fully protect the edge servers used to process crowd sensing data in the Internet of Things. To guarantee the network's overall security, we present a network defense resource allocation method based on multi-armed bandits that maximizes the network's overall benefit. Firstly, we propose a method for dynamically setting node defense resource thresholds to obtain the defender (attacker) benefit function of edge servers (nodes) and distribution. Secondly, we design a defense resource sharing mechanism for neighboring nodes to obtain the defense capability of nodes. Subsequently, we use the decomposability and Lipschitz continuity of the defender's total expected utility to reduce the difference between the utility's discrete and continuous arms and analyze the difference theoretically. Finally, experimental results show that the method maximizes the defender's total expected utility and reduces the difference between the discrete and continuous arms of the utility.
Robotic systems are expected to play an increasingly important role in future space activities. Robotic on-orbit service, whose key is the capturing technology, has become a research hot spot in recent years. This paper studies the dynamics modeling and impedance control of a multi-arm free-flying space robotic system capturing a non-cooperative target. Firstly, since a control-oriented dynamics model is essential in control algorithm design and code realization, an analytical approach is suggested rather than a numerical algorithm. Using a general and a quasi-coordinate Lagrangian formulation, the kinematics and dynamics equations are derived. Then, an impedance control algorithm is developed that allows coordinated control of the multiple manipulators to capture a target. By enforcing a reference impedance, the end-effectors behave like a mass-damper-spring system fixed in inertial space in reaction to any contact force between the capture hands and the target. Meanwhile, the position and attitude of the base are kept stable by using gas jet thrusters to counteract the manipulators' reaction. Finally, a simulation using a space robot with two manipulators and a free-floating non-cooperative target is presented to verify the effectiveness of the proposed method.
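The reference impedance mentioned in this abstract can be illustrated with a single-axis sketch. The Python below is not the paper's controller: it only integrates the standard mass-damper-spring relation M·x'' + D·x' + K·x = F_ext for one end-effector axis. The function name, the gains, the step size, and the constant contact force are all hypothetical values chosen for illustration.

```python
def simulate_impedance(mass, damping, stiffness, contact_force,
                       dt=1e-3, duration=5.0):
    """Integrate M*x'' + D*x' + K*x = F_ext with semi-implicit Euler.

    x is the end-effector displacement from its reference position along
    one axis (a hypothetical 1-D simplification, not the paper's model).
    Returns the list of displacements, one per time step.
    """
    x, v = 0.0, 0.0
    trajectory = [x]
    steps = int(round(duration / dt))
    for _ in range(steps):
        # Acceleration demanded by the reference impedance.
        accel = (contact_force - damping * v - stiffness * x) / mass
        v += accel * dt   # update velocity first (semi-implicit Euler)
        x += v * dt       # then position, using the new velocity
        trajectory.append(x)
    return trajectory


# Hypothetical gains: critically damped, since D = 2 * sqrt(K * M).
traj = simulate_impedance(mass=1.0, damping=20.0, stiffness=100.0,
                          contact_force=5.0)
```

Under a constant contact force the displacement settles at F_ext / K (here 5 / 100 = 0.05 in the chosen length unit), which is the compliant yielding that an impedance law is meant to produce.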
As a combination of edge computing and artificial intelligence, edge intelligence has become a promising technique and provides its users with a series of fast, precise, and customized services. In edge intelligence, when learning agents are deployed on the edge side, data aggregation from the end side to the designated edge devices is an important research topic. Considering the varying importance of end devices, this paper studies the weighted data aggregation problem in a single-hop end-to-edge communication network. Firstly, to ensure that all end devices with various weights are treated fairly in data aggregation, a distributed end-to-edge cooperative scheme is proposed. Then, to handle the massive contention on the wireless channel caused by end devices, a multi-armed bandit (MAB) algorithm is designed to help the end devices find their most appropriate update rates. Unlike traditional data aggregation works, combining the MAB gives our algorithm a higher efficiency in data aggregation. Through a theoretical analysis, we show that the efficiency of our algorithm is asymptotically optimal. Comparative experiments with previous works are also conducted to show the strength of our algorithm.
To address the high latency of traditional cloud computing and the limited processing capacity of Internet of Things (IoT) users, Multi-access Edge Computing (MEC) migrates computing and storage capabilities from the remote data center to the edge of the network, providing users with computation services quickly and directly. In this paper, we investigate the impact of the randomness caused by the movement of the IoT user on offloading decisions, where the connection between the IoT user and the MEC servers is uncertain. This uncertainty is the main obstacle to assigning tasks accurately. Consequently, if the assigned task does not match the real connection time, a migration (when the connection time is not long enough to finish processing) is triggered. To address the impact of this uncertainty, we formulate the offloading decision as an optimization problem that considers transmission, computation, and migration. With the help of Stochastic Programming (SP), we use a posteriori recourse to compensate for inaccurate predictions. Meanwhile, in heterogeneous networks, since multiple candidate MEC servers may be selectable simultaneously due to overlapping, we also introduce Multi-Arm Bandit (MAB) theory for MEC selection. Extensive simulations validate the improvement and effectiveness of the proposed SP-based Multi-arm bandit Method (SMM) for offloading in terms of reward, cost, energy consumption, and delay. The results show that SMM achieves about a 20% improvement compared with the traditional offloading method that does not consider the randomness, and it also outperforms the existing SP/MAB-based offloading method.
The overall cross-linking copolymerization of acrylic acid and multi-armed cross-linkers is investigated by in situ interferometry. The results show that the more arms the cross-linkers have, the higher the polymerization rate is. However, the results also indicate lower cross-linking efficiency and some defects in the gel network.
Artificial intelligence has permeated all aspects of our lives today. However, to make AI behave like real AI, the critical bottleneck lies in the speed of computing. Quantum computers employ the peculiar and unique properties of quantum states such as superposition, entanglement, and interference to process information in ways that classical computers cannot. As a new paradigm of computation, quantum computers are capable of performing tasks intractable for classical processors, thus providing a quantum leap in AI research and making the development of real AI a possibility. In this regard, quantum machine learning not only enhances the classical machine learning approach but more importantly it provides an avenue to explore new machine learning models that have no classical counterparts. The qubit-based quantum computers cannot naturally represent the continuous variables commonly used in machine learning, since the measurement outputs of qubit-based circuits are generally discrete. Therefore, a continuous-variable (CV) quantum architecture based on a photonic quantum computing model is selected for our study. In this work, we employ machine learning and optimization to create photonic quantum circuits that can solve the contextual multi-armed bandit problem, a problem in the domain of reinforcement learning, which demonstrates that quantum reinforcement learning algorithms can be learned by a quantum device.
The process of making decisions is something humans do inherently and routinely, to the extent that it appears commonplace. However, in order to achieve good overall performance, decisions must take into account both the outcomes of past decisions and the opportunities of future ones. Reinforcement learning, which is fundamental to sequential decision-making, consists of the following components: (1) a set of decision epochs; (2) a set of environment states; (3) a set of available actions to transition between states; (4) state-action dependent immediate rewards for each action. At each decision, the environment state provides the decision maker with a set of available actions from which to choose. As a result of selecting a particular action in the state, the environment generates an immediate reward for the decision maker and shifts to a different state and decision. The ultimate goal for the decision maker is to maximize the total reward after a sequence of time steps. This paper will focus on an archetypal example of reinforcement learning, the stochastic multi-armed bandit problem. After introducing the dilemma, I will briefly cover the most common methods used to solve it, namely the UCB and εn-greedy algorithms. I will also introduce my own greedy implementation, the strict-greedy algorithm, which more tightly follows the greedy pattern in algorithm design, and show that it runs comparably to the two accepted algorithms.
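The two standard methods named in this abstract can be sketched in a few lines. The Python below is a minimal illustration on Bernoulli arms, not the paper's code: UCB1 uses the usual exploration bonus sqrt(2 ln t / n_i), and the εn-greedy variant uses a decaying rate ε_t = min(1, c·K / t), which is a simplified schedule assumed here. The function names, the constant `c`, and the arm means are hypothetical. The author's strict-greedy algorithm is not specified in the abstract, so it is not shown.

```python
import math
import random


def pull(means, arm, rng):
    """Draw a Bernoulli reward from the chosen arm."""
    return 1.0 if rng.random() < means[arm] else 0.0


def ucb1(means, horizon, rng):
    """Run UCB1 for `horizon` rounds; return the pull count of each arm."""
    k = len(means)
    counts, sums = [0] * k, [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialisation: pull every arm once
        else:
            # Empirical mean plus confidence bonus.
            arm = max(range(k), key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2.0 * math.log(t) / counts[i]))
        sums[arm] += pull(means, arm, rng)
        counts[arm] += 1
    return counts


def epsilon_n_greedy(means, horizon, rng, c=5.0):
    """Greedy play with a decaying exploration rate (assumed schedule)."""
    k = len(means)
    counts, sums = [0] * k, [0.0] * k
    for t in range(1, horizon + 1):
        epsilon = min(1.0, c * k / t)
        if rng.random() < epsilon:
            arm = rng.randrange(k)  # explore uniformly at random
        else:
            # Exploit the best empirical mean; unpulled arms score 0.
            arm = max(range(k),
                      key=lambda i: sums[i] / counts[i] if counts[i] else 0.0)
        sums[arm] += pull(means, arm, rng)
        counts[arm] += 1
    return counts
```

A typical call is `ucb1([0.2, 0.5, 0.8], 5000, random.Random(0))`; over a long horizon most pulls should concentrate on the arm with the highest mean, which is how the exploration-exploitation dilemma gets resolved in practice.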
Most meta-analyses have concentrated on combining treatment effect measures based on comparisons of two treatments. Meta-analysis of multi-arm trials is a key component of submission to summarize evidence from all possible studies. In this paper, an exact binomial model based on logistic regression is proposed to compare different treatments in multi-arm trials. Two approaches, unconditional maximum likelihood and conditional maximum likelihood, are examined and compared for the logistic regression model. The proposed models are applied to data from 27 randomized clinical trials (RCTs) that evaluate the efficacy of antiplatelet therapy in reducing venous thrombosis and pulmonary embolism.
In developing and exploring extreme and harsh underwater environments, underwater robots can effectively replace humans to complete tasks. To meet the requirements of flexible underwater motion and comprehensive subsea operation, a novel octopus-inspired robot with eight soft limbs was designed and developed. This robot is capable of underwater bipedal walking, multi-arm swimming, and grasping objects. To interact closely with the seabed environment while minimizing disturbance, the robot employs cable-driven flexible arms to walk on the seafloor in a bipedal walking mode. Multi-arm swimming offers a means of three-dimensional spatial movement, allowing the robot to swiftly explore and navigate over large areas, thereby enhancing its flexibility. Furthermore, the robot's walking arm enables it to grasp and transport objects underwater, thereby enhancing its practicality in underwater environments. Simplified motion models and gait generation strategies were proposed for the robot's two modes of locomotion, swimming and walking, inspired by the movement characteristics of octopus multi-arm swimming and bipedal walking. Experimental verification shows that the robot's average speed in underwater bipedal walking reaches 7.26 cm/s, while its horizontal speed in multi-arm swimming is 8.6 cm/s.
Mobile Crowdsensing (MCS) represents a transformative approach to collecting data from the environment, as it utilizes the ubiquity and sensory capabilities of mobile devices carried by human participants. This paradigm enables scales of data collection critical for applications ranging from environmental monitoring to urban planning. However, effectively harnessing this distributed data collection capability faces significant challenges. One of the most significant is the variability in the sensing qualities of the participating devices, which are initially unknown and must be learned over time to optimize task assignments. This paper tackles the dual challenges of managing task diversity to mitigate data redundancy and optimizing task assignment amidst the inherent variability of worker performance. We introduce a novel model that dynamically adjusts task weights based on assignment frequency to promote diversity and incorporates a flexible approach to account for the different qualities of task completion, especially in scenarios with overlapping task assignments. Our strategy aims to maximize the overall weighted quality of the data collected within the constraints of a predefined budget, and it leverages a combinatorial multi-armed bandit framework with an upper confidence bound approach to guide decision-making. We demonstrate the efficacy of our approach through a combination of regret analysis and simulations grounded in realistic scenarios.
Federated learning (FL) is an intricate and privacy-preserving technique that enables distributed mobile devices to collaboratively train a machine learning model. However, in real-world FL scenarios, the training performance is affected by a combination of factors such as the mobility of user devices and limited communication and computational resources, which makes the user scheduling problem crucial. To tackle this problem, we jointly consider user mobility and communication and computational capacities, and formulate a stochastic optimization problem to minimize the convergence time. Specifically, we first establish a convergence bound on the training performance based on the heterogeneity of users' data, and then leverage this bound to derive the participation rate for each user. After deriving the user-specific participation rate, we aim to minimize the training latency by optimizing user scheduling under constraints on energy consumption and participation rate. Afterward, we transform this optimization problem into the contextual multi-armed bandit framework based on the Lyapunov method and solve it with the submodular reward enhanced linear upper confidence bound (SR-linUCB) algorithm. Experimental results demonstrate the superiority of our proposed algorithm in training performance and time consumption compared with state-of-the-art algorithms in both independent and identically distributed (IID) and non-IID settings.
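The linear upper confidence bound idea underlying the algorithm named in this abstract can be sketched as follows. This is plain disjoint LinUCB in pure Python, not the paper's SR-linUCB: the submodular reward enhancement and the Lyapunov-based transformation are not described in the abstract and are omitted. The class name, `select_arm`, the ridge prior A = I, and the parameter `alpha` are assumptions made for illustration.

```python
import math


class LinUCBArm:
    """One arm of disjoint LinUCB, storing A^{-1} and b directly."""

    def __init__(self, dim):
        # A starts as the identity matrix, so A^{-1} is the identity too.
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim

    def _a_inv_times(self, vec):
        return [sum(row[j] * vec[j] for j in range(len(vec)))
                for row in self.a_inv]

    def ucb(self, context, alpha):
        """Estimated reward plus alpha times the confidence width."""
        theta = self._a_inv_times(self.b)       # ridge estimate A^{-1} b
        a_inv_x = self._a_inv_times(context)
        mean = sum(t * x for t, x in zip(theta, context))
        width = math.sqrt(max(0.0, sum(x * a for x, a in zip(context, a_inv_x))))
        return mean + alpha * width

    def update(self, context, reward):
        """Rank-one update of A^{-1} (Sherman-Morrison) and of b."""
        a_inv_x = self._a_inv_times(context)
        denom = 1.0 + sum(x * a for x, a in zip(context, a_inv_x))
        dim = len(context)
        for i in range(dim):
            for j in range(dim):
                # A^{-1} is symmetric, so (A^{-1}x)(x^T A^{-1}) = a_inv_x a_inv_x^T.
                self.a_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        self.b = [b + reward * x for b, x in zip(self.b, context)]


def select_arm(arms, context, alpha=1.0):
    """Return the index of the arm with the largest upper confidence bound."""
    scores = [arm.ucb(context, alpha) for arm in arms]
    return max(range(len(arms)), key=lambda i: scores[i])
```

In a scheduling setting each arm would correspond to a candidate user and the context to observed features such as channel state; that mapping is an assumption here, since the abstract does not give the feature definition.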
In this paper, we investigate the minimization of age of information (AoI), a metric that measures information freshness, at the network edge with unreliable wireless communications. In particular, we consider a set of users transmitting status updates, which are collected by the users randomly over time, to an edge server through unreliable orthogonal channels. This raises a natural question: with random status update arrivals and obscure channel conditions, can we devise an intelligent scheduling policy that matches the users and channels to stabilize the queues of all users while minimizing the average AoI? To answer it, we define a bipartite graph and formulate a dynamic edge activation problem with stability constraints. Then, we propose an online matching-while-learning algorithm (MatL) and discuss its implementation for wireless scheduling. Finally, simulation results demonstrate that MatL reliably learns the channel states and manages the users' buffers for fresher information at the edge.
To cope with the increasing threat of ballistic missiles (BMs) within a shorter reaction time, the shooting policy of the layered defense system needs to be optimized. The main decision-making problem of shooting optimization is how to choose the next BM to shoot according to the previous engagements and their results, thus maximizing the expected return from BMs killed or minimizing the cost of BM penetration. Motivated by this, this study aims to determine an optimal shooting policy for a two-layer missile defense (TLMD) system. This paper considers a scenario in which the TLMD system wishes to shoot at a collection of BMs one at a time and to maximize the return obtained from BMs killed before the system's demise. To provide a policy analysis tool, this paper develops a general model for shooting decision-making, in which the shooting engagements are described as a discounted reward Markov decision process. The index shooting policy is a strategy that effectively balances the shooting returns against the risk that the defense mission fails. The numerical results show that the index policy outperforms a range of competitors, especially in terms of the mean return and the mean number of BMs killed.
Timely information updates are critical for real-time monitoring and control applications in the Internet of Things (IoT). In this paper, we consider a multi-antenna cellular IoT for state updates, where a base station (BS) collects information from randomly distributed IoT nodes through time-varying channels. Specifically, multiple IoT nodes are allowed to transmit their state updates simultaneously in a spatial multiplexing manner. Inspired by age of information (AoI), we introduce a novel concept, age of transmission (AoT), for scenarios in which the BS cannot obtain the generation time of the packets waiting to be transmitted. The deadline-constrained AoT-optimal scheduling problem is formulated as a restless multi-armed bandit (RMAB) problem. Firstly, we prove the indexability of the scheduling problem and derive the closed form of the Whittle index. Then, the interference graph and complementary graph are constructed to illustrate the interference between two nodes. The complete subgraphs are detected in the complementary graph to avoid inter-node interference. Next, an AoT-optimal scheduling strategy based on the Whittle index and complete subgraph detection is proposed. Finally, numerous simulations are conducted to verify the performance of the proposed strategy.
Acute and infected wounds resulting from accidents, battlefield trauma, or surgical interventions have become a global healthcare burden due to the complex bacterial infection environment. However, conventional gauze dressings make insufficient contact with irregular wounds and lack antibacterial activity against multi-drug-resistant bacteria. In this study, we develop in situ nanofibrous dressings tailored to fit wounds of various shapes and sizes while providing nanoscale comfort and excellent antibacterial properties. Our approach involves fabricating these dressings with a handheld electrospinning device that allows the direct deposition of nanofiber dressings onto specific irregular wound sites, resulting in perfect conformal wound closure without any mismatch in 2 min. The nanofibrous dressings are loaded with multi-armed antibiotics that exhibit outstanding antibacterial activity against Staphylococcus aureus (S. aureus) and methicillin-resistant S. aureus. Compared to conventional vancomycin, this in situ nanofibrous dressing shows great antibacterial performance against up to 98% of multi-drug-resistant bacteria. In vitro and in vivo experiments demonstrate the ability of in situ nanofibrous dressings to prevent multi-drug-resistant bacterial infection, greatly alleviate inflammation, and promote wound healing. Our findings highlight the potential of these personalized nanofibrous dressings for clinical applications, including emergency, accident, and surgical healthcare treatment.
On-orbit construction and maintenance technology will play a significant role in future space exploration. Dexterous multifunctional spacecraft equipped with multiple arms, for instance, Spider Fab Bot, have attracted a great deal of attention due to their versatility in completing these missions. In such engineering practice, point-to-point movement in a complex environment is the fundamental issue. This paper investigates the three-dimensional point-to-point path planning problem, and a hierarchical path planning architecture is developed to generate the trajectory of the multi-arm spacecraft effectively and efficiently. In the proposed three-level architecture, the high-level planner generates the global constrained centric trajectory of the spacecraft under a rigid envelope assumption; the middle-level planner contributes the action sequence, a combination of the newly developed general translational and rotational locomotion modes, to handle the relative position and attitude of the arms about the centroid of the spacecraft; the low-level planner optimally maps the position and attitude of the end-effector of each arm from the operational space to the joint space. Finally, a simulation experiment is carried out, and the results verify the effectiveness of the proposed three-level path planning strategy.
As the penetration of renewable energy continues to increase, stochastic and intermittent generation resources are gradually replacing conventional generators, bringing significant challenges in stabilizing power system frequency. Thus, aggregating demand-side resources for frequency regulation has attracted attention from both academia and industry. However, in practice, conventional aggregation approaches suffer from random and uncertain user behaviors, such as opting out of control signals. A risk-averse multi-armed bandit learning approach is adopted to learn the behaviors of the users, and a novel aggregation strategy is developed for residential heating, ventilation, and air conditioning (HVAC) to provide reliable secondary frequency regulation. The simulation results show that, compared with the conventional approach, the risk-averse multi-armed bandit learning approach performs better in secondary frequency regulation, with fewer users being selected and fewer opting out of the control. In addition, the proposed approach is more robust to random and changing user behaviors.
Multi-armed nanorods and nanobars of semiconductor selenium were simultaneously synthesized, following a biomineralization process, through bio-membrane bi-templates of rush at room temperature. The multi-armed nanorods are 60 nm in diameter and 1.5 μm in length; the nanobars are 150 nm in diameter and 1000 to 1100 nm in length. The XRD pattern indicates that these nanocrystals crystallized in the hexagonal structure with lattice constants a = 0.437 nm, c = 0.495 nm. The possible formation mechanism was investigated.
Funding: Co-supported by the National Natural Science Foundation of China (No. 12372045), the Guangdong Basic and Applied Basic Research Foundation, China (No. 2023B1515120018), and the Shenzhen Science and Technology Program, China (No. JCYJ20220818102207015).
Abstract: The increasing complexity of on-orbit tasks places great demands on the flexible operation of space robotic arms, driving the development of space robots from single-arm manipulation to multi-arm collaboration. In this paper, a combined approach of Learning from Demonstration (LfD) and Reinforcement Learning (RL) is proposed for space multi-arm collaborative skill learning. The combination effectively resolves the trade-off between learning efficiency and solution feasibility in LfD, as well as the time-consuming pursuit of the optimal solution in RL. With the prior knowledge from LfD, space robotic arms can achieve efficient guided learning in a high-dimensional state-action space. Specifically, an LfD approach with Probabilistic Movement Primitives (ProMP) is first used to encode and reproduce the demonstrated actions, generating a distribution that initializes the policy. Then, in the RL stage, a Relative Entropy Policy Search (REPS) algorithm modified for continuous state-action spaces is employed for further policy improvement. More importantly, the learned behaviors maintain and reflect the characteristics of the demonstrations. In addition, a series of supplementary policy search mechanisms is designed to accelerate the exploration process. The effectiveness of the proposed method has been verified both theoretically and experimentally. Moreover, comparisons with state-of-the-art methods confirm that the approach outperforms them.
Funding: Supported by the National Natural Science Foundation of China (NSFC) [grant numbers 62172377, 61872205], the Shandong Provincial Natural Science Foundation [grant number ZR2019MF018], and the Startup Research Foundation for Distinguished Scholars (No. 202112016).
Abstract: The cloud platform has limited defense resources and cannot fully protect the edge servers used to process crowd sensing data in the Internet of Things. To guarantee the network's overall security, we present a network defense resource allocation method based on multi-armed bandits that maximizes the network's overall benefit. Firstly, we propose a method for dynamically setting node defense resource thresholds to obtain the defender (attacker) benefit function of edge servers (nodes) and distribution. Secondly, we design a defense resource sharing mechanism for neighboring nodes to obtain the defense capability of nodes. Subsequently, we use the decomposability and Lipschitz continuity of the defender's total expected utility to reduce the difference between the utility's discrete and continuous arms and analyze the difference theoretically. Finally, experimental results show that the method maximizes the defender's total expected utility and reduces the difference between the discrete and continuous arms of the utility.
Funding: Supported by the National Natural Science Foundation of China (61673009).
Abstract: Robotic systems are expected to play an increasingly important role in future space activities. Robotic on-orbit service, whose key is the capturing technology, has become a research hot spot in recent years. This paper studies the dynamics modeling and impedance control of a multi-arm free-flying space robotic system capturing a non-cooperative target. Firstly, since a control-oriented dynamics model is essential in control algorithm design and code realization, an analytical approach is suggested rather than a numerical algorithm. Using a general and a quasi-coordinate Lagrangian formulation, the kinematics and dynamics equations are derived. Then, an impedance control algorithm is developed that allows coordinated control of the multiple manipulators to capture a target. By enforcing a reference impedance, the end-effectors behave like a mass-damper-spring system fixed in inertial space in reaction to any contact force between the capture hands and the target. Meanwhile, the position and attitude of the base are kept stable by using gas jet thrusters to counteract the manipulators' reaction. Finally, a simulation using a space robot with two manipulators and a free-floating non-cooperative target is presented to verify the effectiveness of the proposed method.
Funding: Supported by the National Natural Science Foundation of China (NSFC) (62102232, 62122042, 61971269) and the Natural Science Foundation of Shandong Province (ZR2021QF064).
Abstract: As a combination of edge computing and artificial intelligence, edge intelligence has become a promising technique and provides its users with a series of fast, precise, and customized services. In edge intelligence, when learning agents are deployed on the edge side, data aggregation from the end side to the designated edge devices is an important research topic. Considering the varying importance of end devices, this paper studies the weighted data aggregation problem in a single-hop end-to-edge communication network. Firstly, to ensure that all end devices with various weights are treated fairly in data aggregation, a distributed end-to-edge cooperative scheme is proposed. Then, to handle the massive contention on the wireless channel caused by end devices, a multi-armed bandit (MAB) algorithm is designed to help the end devices find their most appropriate update rates. Unlike traditional data aggregation works, combining the MAB gives our algorithm a higher efficiency in data aggregation. Through a theoretical analysis, we show that the efficiency of our algorithm is asymptotically optimal. Comparative experiments with previous works are also conducted to show the strength of our algorithm.
Funding: This work was supported in part by the Zhejiang Lab under Grant 20210AB02, in part by the Sichuan International Science and Technology Innovation Cooperation/Hong Kong, Macao and Taiwan Science and Technology Innovation Cooperation Project under Grant 2019YFH0163, and in part by the Key Research and Development Project of Sichuan Provincial Department of Science and Technology under Grant 2018JZ0071.
Abstract: To address the high latency of traditional cloud computing and the limited processing capacity of Internet of Things (IoT) users, Multi-access Edge Computing (MEC) migrates computing and storage capabilities from the remote data center to the edge of the network, providing users with computation services quickly and directly. In this paper, we investigate the impact of the randomness caused by the movement of the IoT user on offloading decisions, where the connection between the IoT user and the MEC servers is uncertain. This uncertainty is the main obstacle to assigning tasks accurately. Consequently, if the assigned task does not match the real connection time well, a migration is triggered (the connection time is not enough to finish processing). To address the impact of this uncertainty, we formulate the offloading decision as an optimization problem considering transmission, computation, and migration. With the help of Stochastic Programming (SP), we use the posteriori recourse to compensate for inaccurate predictions. Meanwhile, since multiple candidate MEC servers may be selectable simultaneously in heterogeneous networks due to overlapping coverage, we also introduce Multi-Arm Bandit (MAB) theory for MEC selection. Extensive simulations validate the improvement and effectiveness of the proposed SP-based Multi-arm bandit Method (SMM) for offloading in terms of reward, cost, energy consumption, and delay. The results show that SMM can achieve about 20% improvement compared with the traditional offloading method that does not consider the randomness, and it also outperforms the existing SP/MAB-based methods for offloading.
Abstract: The overall cross-linking copolymerization of acrylic acid with multi-armed cross-linkers is investigated by in situ interferometry. The results show that the more arms the cross-linkers have, the higher the polymerization rate is. However, more arms also mean lower cross-linking efficiency and some defects in the gel network.
Abstract: Artificial intelligence has permeated all aspects of our lives today. However, to make AI behave like real AI, the critical bottleneck lies in the speed of computing. Quantum computers employ the peculiar and unique properties of quantum states such as superposition, entanglement, and interference to process information in ways that classical computers cannot. As a new paradigm of computation, quantum computers are capable of performing tasks intractable for classical processors, thus providing a quantum leap in AI research and making the development of real AI a possibility. In this regard, quantum machine learning not only enhances the classical machine learning approach but, more importantly, provides an avenue to explore new machine learning models that have no classical counterparts. Qubit-based quantum computers cannot naturally represent the continuous variables commonly used in machine learning, since the measurement outputs of qubit-based circuits are generally discrete. Therefore, a continuous-variable (CV) quantum architecture based on a photonic quantum computing model is selected for our study. In this work, we employ machine learning and optimization to create photonic quantum circuits that can solve the contextual multi-armed bandit problem, a problem in the domain of reinforcement learning, which demonstrates that quantum reinforcement learning algorithms can be learned by a quantum device.
Abstract: The process of making decisions is something humans do inherently and routinely, to the extent that it appears commonplace. However, in order to achieve good overall performance, decisions must take into account both the outcomes of past decisions and the opportunities of future ones. Reinforcement learning, which is fundamental to sequential decision-making, consists of the following components: (1) a set of decision epochs; (2) a set of environment states; (3) a set of available actions to transition between states; (4) state-action dependent immediate rewards for each action. At each decision, the environment state provides the decision maker with a set of available actions from which to choose. As a result of selecting a particular action in the state, the environment generates an immediate reward for the decision maker and shifts to a different state and decision. The ultimate goal for the decision maker is to maximize the total reward after a sequence of time steps. This paper will focus on an archetypal example of reinforcement learning, the stochastic multi-armed bandit problem. After introducing the dilemma, I will briefly cover the most common methods used to solve it, namely the UCB and εn-greedy algorithms. I will also introduce my own greedy implementation, the strict-greedy algorithm, which more tightly follows the greedy pattern in algorithm design, and show that it runs comparably to the two accepted algorithms.
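The two standard methods named above can be sketched on a Bernoulli bandit as follows. This is a minimal illustration, not the paper's code: the arm probabilities, the function names, the UCB1 variant of UCB, and the exploration schedule eps_n = min(1, c*k/n) are all assumptions, and the author's strict-greedy algorithm is not reproduced because the abstract does not specify it.

```python
import math
import random


def _pull(means, arm, rng):
    """Draw a Bernoulli reward from the (hypothetical) arm probability."""
    return 1.0 if rng.random() < means[arm] else 0.0


def ucb1(means, horizon, rng):
    """UCB1: play the arm with the highest mean plus confidence bonus."""
    k = len(means)
    counts, sums, total = [0] * k, [0.0] * k, 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialization: play every arm once
        else:
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = _pull(means, arm, rng)
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total, counts


def epsilon_n_greedy(means, horizon, rng, c=5.0):
    """eps_n-greedy: explore with a probability that decays as 1/n."""
    k = len(means)
    counts, sums, total = [0] * k, [0.0] * k, 0.0
    for n in range(1, horizon + 1):
        eps = min(1.0, c * k / n)  # assumed decay schedule
        if rng.random() < eps:
            arm = rng.randrange(k)  # explore uniformly
        else:
            # Exploit the best empirical mean; untried arms go first.
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a] if counts[a] else float("inf"),
            )
        reward = _pull(means, arm, rng)
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total, counts


if __name__ == "__main__":
    arms = [0.1, 0.9]  # hypothetical success probabilities
    print("UCB1:", ucb1(arms, 5000, random.Random(0)))
    print("eps_n-greedy:", epsilon_n_greedy(arms, 5000, random.Random(1)))
```

Both routines return the cumulative reward and the per-arm pull counts, which is enough to compare how quickly each method concentrates its pulls on the better arm.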
Abstract: Most meta-analyses have concentrated on combining treatment effect measures based on comparisons of two treatments. Meta-analysis of multi-arm trials is a key component of submissions that summarize evidence from all possible studies. In this paper, an exact binomial model based on logistic regression is proposed to compare different treatments in multi-arm trials. Two approaches, unconditional maximum likelihood and conditional maximum likelihood, are examined and compared for the logistic regression model. The proposed models are applied to data from 27 randomized clinical trials (RCTs) that assess the efficacy of antiplatelet therapy in reducing venous thrombosis and pulmonary embolism.
Funding: Provided by the Hy Action Plan Project (Grant no. 7172755A), the Key Projects of Science and Technology Plan of Zhejiang Province (Grant no. 2019C04018), and partially by the Ministry of Science and Higher Education of the Russian Federation as part of the World-class Research Center program: Advanced Digital Technologies (contract No. 075-15-2022-312 dated 20.04.2022).
Abstract: In developing and exploring extreme and harsh underwater environments, underwater robots can effectively replace humans to complete tasks. To meet the requirements of flexible underwater motion and comprehensive subsea operation, a novel octopus-inspired robot with eight soft limbs was designed and developed. This robot possesses the capabilities of underwater bipedal walking, multi-arm swimming, and grasping objects. To interact closely with the seabed environment while minimizing disturbance, the robot employs cable-driven flexible arms to walk on the underwater floor in a bipedal walking mode. Multi-arm swimming offers a means of three-dimensional spatial movement, allowing the robot to swiftly explore and navigate over large areas, thereby enhancing its flexibility. Furthermore, the robot's walking arm enables it to grasp and transport objects underwater, thereby enhancing its practicality in underwater environments. Simplified motion models and gait generation strategies are proposed for the robot's two locomotion modes, swimming and walking, inspired by the multi-arm swimming and bipedal walking of octopuses. Experimental verification shows that the robot's average speed in underwater bipedal walking reaches 7.26 cm/s, while its horizontal speed in multi-arm swimming is 8.6 cm/s.
Funding: Supported in part by NSF (Nos. SaTC 2310298, CNS 2214940, CPS 2128378, CNS 2107014, and CNS 2150152).
Abstract: Mobile Crowdsensing (MCS) represents a transformative approach to collecting data from the environment, as it utilizes the ubiquity and sensory capabilities of mobile devices with human participants. This paradigm enables scales of data collection critical for applications ranging from environmental monitoring to urban planning. However, the effective harnessing of this distributed data collection capability faces significant challenges. One of the most significant is the variability in the sensing qualities of the participating devices, which are initially unknown and must be learned over time to optimize task assignments. This paper tackles the dual challenges of managing task diversity to mitigate data redundancy and optimizing task assignment amidst the inherent variability of worker performance. We introduce a novel model that dynamically adjusts task weights based on assignment frequency to promote diversity and incorporates a flexible approach to account for the different qualities of task completion, especially in scenarios with overlapping task assignments. Our strategy aims to maximize the overall weighted quality of data collected within the constraints of a predefined budget. It leverages a combinatorial multi-armed bandit framework with an upper confidence bound approach to guide decision-making. We demonstrate the efficacy of our approach through a combination of regret analysis and simulations grounded in realistic scenarios.
Abstract: Federated learning (FL) is an intricate and privacy-preserving technique that enables distributed mobile devices to collaboratively train a machine learning model. However, in real-world FL scenarios, the training performance is affected by a combination of factors such as the mobility of user devices and limited communication and computational resources, making the user scheduling problem crucial. To tackle this problem, we jointly consider user mobility and communication and computational capacities, and develop a stochastic optimization problem to minimize the convergence time. Specifically, we first establish a convergence bound on the training performance based on the heterogeneity of users' data, and then leverage this bound to derive the participation rate for each user. After deriving the user-specific participation rate, we aim to minimize the training latency by optimizing user scheduling under the constraints of energy consumption and participation rate. Afterward, we transform this optimization problem into the contextual multi-armed bandit framework based on the Lyapunov method and solve it with the submodular reward enhanced linear upper confidence bound (SR-linUCB) algorithm. Experimental results demonstrate the superiority of our proposed algorithm in training performance and time consumption compared with state-of-the-art algorithms for both independent and identically distributed (IID) and non-IID settings.
Funding: Supported in part by the Shanghai Pujiang Program under Grant No. 21PJ1402600, in part by the Natural Science Foundation of Chongqing, China under Grant No. CSTB2022NSCQ-MSX0375, in part by the Song Shan Laboratory Foundation under Grant No. YYJC022022007, in part by the Zhejiang Provincial Natural Science Foundation of China under Grant LGJ22F010001, in part by the National Key Research and Development Program of China under Grant 2020YFA0711301, and in part by the National Natural Science Foundation of China under Grant 61922049.
Abstract: In this paper, we investigate the minimization of age of information (AoI), a metric that measures information freshness, at the network edge with unreliable wireless communications. In particular, we consider a set of users transmitting status updates, which each user collects randomly over time, to an edge server through unreliable orthogonal channels. This raises a natural question: with random status update arrivals and obscure channel conditions, can we devise an intelligent scheduling policy that matches the users and channels to stabilize the queues of all users while minimizing the average AoI? To give an adequate answer, we define a bipartite graph and formulate a dynamic edge activation problem with stability constraints. Then, we propose an online matching while learning algorithm (MatL) and discuss its implementation for wireless scheduling. Finally, simulation results demonstrate that MatL reliably learns the channel states and manages the users' buffers for fresher information at the edge.
Funding: Supported by the National Natural Science Foundation of China (71701209, 71771216), the Shaanxi Natural Science Foundation (2019JQ-250), and the China Post-doctoral Fund (2019M653962).
Abstract: To cope with the increasing threat of ballistic missiles (BMs) within a shorter reaction time, the shooting policy of the layered defense system needs to be optimized. The main decision-making problem of shooting optimization is how to choose the next BM to be shot according to the previous engagements and results, thus maximizing the expected return of BMs killed or minimizing the cost of BM penetration. Motivated by this, this study aims to determine an optimal shooting policy for a two-layer missile defense (TLMD) system. This paper considers a scenario in which the TLMD system wishes to shoot at a collection of BMs one at a time and to maximize the return obtained from BMs killed before the system's demise. To provide a policy analysis tool, this paper develops a general model for shooting decision-making in which the shooting engagements are described as a discounted-reward Markov decision process. The index shooting policy is a strategy that can effectively balance the shooting returns and the risk that the defense mission fails, and the goal is to maximize the return obtained from BMs killed before the system's demise. The numerical results show that the index policy is better than a range of competitors, especially in terms of the mean return and the mean number of BMs killed.
Funding: Supported by the Fundamental Research Funds for the Central Universities (2020ZDPYMS26), the National Natural Science Foundation of China (62071472, 51734009), the Natural Science Foundation of Jiangsu Province (BK20210489, BK20200650), the China Postdoctoral Science Foundation (2019M660133), the Future Network Scientific Research Fund Project (FNSRFP-2021-YB-12), and the Program for "Industrial IoT and Emergency Collaboration" Innovative Research Team in CUMT (No. 2020ZY002).
Abstract: Timely information updates are critical for real-time monitoring and control applications in the Internet of Things (IoT). In this paper, we consider a multi-antenna cellular IoT for state update where a base station (BS) collects information from randomly distributed IoT nodes through a time-varying channel. Specifically, multiple IoT nodes are allowed to transmit their state updates simultaneously in a spatial multiplexing manner. Inspired by age of information (AoI), we introduce a novel concept of age of transmission (AoT) for scenarios in which the BS cannot obtain the generation time of the packets waiting to be transmitted. The deadline-constrained AoT-optimal scheduling problem is formulated as a restless multi-armed bandit (RMAB) problem. Firstly, we prove the indexability of the scheduling problem and derive the closed form of the Whittle index. Then, the interference graph and complementary graph are constructed to illustrate the interference between two nodes. The complete subgraphs are detected in the complementary graph to avoid inter-node interference. Next, an AoT-optimal scheduling strategy based on the Whittle index and complete subgraph detection is proposed. Finally, numerous simulations are conducted to verify the performance of the proposed strategy.
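The idea of scheduling only nodes that form a complete subgraph of the complementary graph can be sketched as below. This is an illustrative greedy heuristic under stated assumptions, not the paper's detection algorithm: the function names, the `max_streams` cap (standing in for the number of spatial streams), and the per-node index values (standing in for Whittle indices, whose closed form is not given in the abstract) are all hypothetical.

```python
from itertools import combinations


def complement_graph(nodes, interference_edges):
    """Adjacency sets of the complementary graph: an edge means 'no interference'."""
    blocked = {frozenset(edge) for edge in interference_edges}
    adjacency = {n: set() for n in nodes}
    for u, v in combinations(nodes, 2):
        if frozenset((u, v)) not in blocked:
            adjacency[u].add(v)
            adjacency[v].add(u)
    return adjacency


def greedy_schedule(nodes, interference_edges, index, max_streams):
    """Pick nodes in decreasing index order, keeping the chosen set a clique
    of the complementary graph (so no two scheduled nodes interfere)."""
    adjacency = complement_graph(nodes, interference_edges)
    chosen = []
    for n in sorted(nodes, key=lambda node: index[node], reverse=True):
        if len(chosen) >= max_streams:
            break
        if all(n in adjacency[c] for c in chosen):
            chosen.append(n)
    return chosen


if __name__ == "__main__":
    nodes = [0, 1, 2, 3]
    interference = [(0, 1), (2, 3)]             # hypothetical interfering pairs
    index = {0: 5.0, 1: 4.0, 2: 3.0, 3: 1.0}    # hypothetical index values
    print(greedy_schedule(nodes, interference, index, max_streams=2))
```

A greedy pass does not guarantee the maximum-weight clique; it is used here only because it keeps the example short.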
Funding: National Key R&D Program of China, Grant/Award Number: 2022YFB3804700; Guangdong Innovative and Entrepreneurial Research Team Program, Grant/Award Number: 2019ZT08Y191; Shenzhen Science and Technology Program, Grant/Award Numbers: KQTD20190929172743294, JCYJ20200109141231365; Guangdong Major Talent Introduction Project, Grant/Award Number: 2019CX01Y196; Beijing Institute of Genomics, Chinese Academy of Sciences, Grant/Award Number: QYZDJ-SSW-SLH039; Shenzhen Key Laboratory of Smart Healthcare Engineering, Grant/Award Number: ZDSYS20200811144003009; National Natural Science Foundation of China, Grant/Award Numbers: 21535001, 21761142006, 22234004, 52203243, 81730051.
Abstract: Acute and infected wounds resulting from accidents, battlefield trauma, or surgical interventions have become a global healthcare burden due to the complex bacterial infection environment. However, conventional gauze dressings make insufficient contact with irregular wounds and lack antibacterial activity against multi-drug-resistant bacteria. In this study, we develop in situ nanofibrous dressings tailored to fit wounds of various shapes and sizes while providing nanoscale comfort and excellent antibacterial properties. Our approach involves the fabrication of these dressings using a handheld electrospinning device that allows for the direct deposition of nanofiber dressings onto specific irregular wound sites, resulting in perfect conformal wound closure without any mismatch in 2 min. The nanofibrous dressings are loaded with multi-armed antibiotics that exhibit outstanding antibacterial activity against Staphylococcus aureus (S. aureus) and methicillin-resistant S. aureus. Compared to conventional vancomycin, this in situ nanofibrous dressing shows great antibacterial performance against up to 98% of multi-drug-resistant bacteria. In vitro and in vivo experiments demonstrate the ability of in situ nanofibrous dressings to prevent multi-drug-resistant bacterial infection, greatly alleviate inflammation, and promote wound healing. Our findings highlight the potential of these personalized nanofibrous dressings for clinical applications, including emergency, accident, and surgical healthcare treatment.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 62003115 and 11972130) and the Shenzhen Natural Science Fund (the Stable Support Plan Program GXWD2020123015542700320200821170719001).
Abstract: On-orbit construction and maintenance technology will play a significant role in future space exploration. Dexterous multifunctional spacecraft equipped with multiple arms, for instance, Spider Fab Bot, have attracted a great deal of attention due to their versatility in completing these missions. In such engineering practice, point-to-point movement in a complex environment is the fundamental issue. This paper investigates the three-dimensional point-to-point path planning problem, and a hierarchical path planning architecture is developed to generate the trajectory of the multi-arm spacecraft effectively and efficiently. In the proposed 3-level architecture, the high-level planner generates the global constrained centric trajectory of the spacecraft under a rigid envelope assumption; the middle-level planner contributes the action sequence, a combination of the newly developed general translational and rotational locomotion modes, to cope with the relative position and attitude of the arms about the centroid of the spacecraft; and the low-level planner optimally maps the position/attitude of the end-effector of each arm from the operational space to the joint space. Finally, a simulation experiment is carried out, and the results verify the effectiveness of the proposed three-layer path planning strategy.
Funding: Supported by the National Natural Science Foundation of China (No. 51907026), the Natural Science Foundation of Jiangsu (No. BK20190361), the Jiangsu Provincial Key Laboratory of Smart Grid Technology and Equipment, and the Global Energy Interconnection Research Institute (No. SGGR0000WLJS1900107).
Abstract: As the penetration of renewable energy continues to increase, stochastic and intermittent generation resources gradually replace conventional generators, bringing significant challenges in stabilizing power system frequency. Thus, aggregating demand-side resources for frequency regulation has attracted attention from both academia and industry. However, in practice, conventional aggregation approaches suffer from random and uncertain user behaviors such as opting out of control signals. A risk-averse multi-armed bandit learning approach is adopted to learn the behaviors of the users, and a novel aggregation strategy is developed for residential heating, ventilation, and air conditioning (HVAC) to provide reliable secondary frequency regulation. The simulation results show that, compared with the conventional approach, the risk-averse multi-armed bandit learning approach performs better in secondary frequency regulation, with fewer users being selected and fewer opting out of the control. Besides, the proposed approach is more robust to random and changing behaviors of the users.
Funding: This work was supported by the National Natural Science Foundation of China (Grant Nos. 20175013 & 20471042) and the Nano-Foundation of Shanghai in China (Grant Nos. 0259 nm021, 0352 nm129).
Abstract: Multi-armed nanorods and nanobars of semiconductor selenium were simultaneously synthesized, in the light of the biomineralization process, through bio-membrane bi-templates of rush at room temperature. The multi-armed nanorods are 60 nm in diameter and 1.5 μm in length; the nanobars are 150 nm in diameter and 1000 to 1100 nm in length. The XRD pattern indicates that these nanocrystals crystallized in the hexagonal structure with lattice constants a = 0.437 nm, c = 0.495 nm. The possible formation mechanism was investigated.
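For orientation, the reported lattice constants fix the volume of the hexagonal unit cell and the axial ratio. The relation below is the standard geometric formula for a hexagonal cell, applied here as a worked example; the resulting numbers are not stated in the abstract.

```latex
V = \frac{\sqrt{3}}{2}\,a^{2}c
  = \frac{\sqrt{3}}{2}\,(0.437\ \text{nm})^{2}\,(0.495\ \text{nm})
  \approx 0.0819\ \text{nm}^{3},
\qquad
\frac{c}{a} = \frac{0.495}{0.437} \approx 1.13
```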