Funding: Supported by the National Natural Science Foundation of China (62373364, 62176259) and the Key Research and Development Program of Jiangsu Province (BE2022095).
Abstract: To address the problem that overly conservative offline reinforcement learning (RL) methods limit policy generalization in the out-of-distribution (OOD) region, this article designs a surrogate target for the OOD value function based on dataset distance and proposes a novel generalized Q-learning mechanism with distance regularization (GQDR). Theoretically, we prove that GQDR converges and that the difference between the Q-value learned by GQDR and its true value is bounded. Furthermore, an offline generalized actor-critic method with distance regularization (OGACDR) is proposed by combining GQDR with the actor-critic learning framework. Two implementations of OGACDR, OGACDR-EXP and OGACDR-SQR, are introduced according to exponential (EXP) and open-square (SQR) distance weight functions, and OGACDR is theoretically proved to provide safe policy improvement. Experimental results on Gym-MuJoCo continuous control tasks show that OGACDR not only alleviates the overestimation and overconservatism of the Q-value function but also outperforms conservative offline RL baselines.
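The abstract does not give GQDR's formulas, so the following is only an illustration of the general idea of weighting an OOD value target by dataset distance. The nearest-neighbour Euclidean distance, the exponential form `exp(-d / temperature)`, the `temperature` parameter, and the blending rule in `surrogate_target` are all assumptions made for this sketch, not the paper's definitions.

```python
import math


def dataset_distance(query, dataset):
    """Euclidean distance from a query (state, action) vector to its nearest dataset point.

    Assumption: "dataset distance" is taken here as a nearest-neighbour distance.
    """
    return min(math.dist(query, point) for point in dataset)


def exp_weight(distance, temperature=1.0):
    """Hypothetical exponential distance weight.

    Equals 1 for points that lie in the dataset (distance 0) and decays
    towards 0 as the query moves away from the data.
    """
    return math.exp(-distance / temperature)


def surrogate_target(bellman_target, conservative_value, distance, temperature=1.0):
    """Hypothetical blend of an ordinary Bellman target and a conservative value.

    Near the data the Bellman target dominates; far from the data the
    conservative value dominates. This is an illustrative rule only.
    """
    w = exp_weight(distance, temperature)
    return w * bellman_target + (1.0 - w) * conservative_value


if __name__ == "__main__":
    data = [(0.0, 0.0), (1.0, 0.0), (3.0, 4.0)]
    for query in [(1.0, 0.0), (1.0, 0.5), (1.0, 3.0)]:
        d = dataset_distance(query, data)
        print(query, round(d, 3), round(surrogate_target(10.0, 2.0, d), 3))
```

The point of the sketch is the qualitative behaviour: a query that coincides with a dataset point keeps its Bellman target unchanged, while the target for a distant query is pulled towards the conservative value instead of being penalized uniformly.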
Abstract: One of the key challenges in ad-hoc networks is the resource discovery problem. The underlying question is how efficiently and quickly a queried resource or object can be resolved in such a highly dynamic, self-evolving network. Broadcasting is a basic technique in Mobile Ad-hoc Networks (MANETs); it refers to sending a packet from one node to every other node within the transmission range. Flooding is a type of broadcast in which the received packet is retransmitted once by every node. The naive flooding technique floods the network with query messages, while the random walk scheme operates by contacting subsets of each node's neighbors at every step, thereby restricting the search space. Many earlier works have mainly focused on simulation-based analysis of the flooding technique and its variants in a wired network scenario. Although there have been some empirical studies in peer-to-peer (P2P) networks, analytical results are still lacking, especially in the context of mobile P2P networks. In this article, we mathematically model several widely used existing search techniques and compare them with the proposed improved random walk method, a simple, lightweight approach suitable for the non-DHT architecture. We provide analytical expressions to measure the performance of the different flooding-based search techniques and of our proposed technique. We analytically derive three key performance measures: the average number of steps needed to find a resource, the probability of locating a resource, and the average number of messages generated during the entire search process.
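The three performance measures named above can be illustrated with a small simulation. This is a minimal sketch, not the article's analytical model or its improved random walk: the static random graph, the per-link message counting, the TTL limit, and all function names (`make_graph`, `flood`, `random_walk`, `compare`) are assumptions introduced here, and the random walk shown is the plain multi-walker variant.

```python
import random


def make_graph(n, degree, rng):
    """Connected undirected graph: a ring plus random extra edges (requires n > degree)."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        adj[v].add((v + 1) % n)
        adj[(v + 1) % n].add(v)
    for v in range(n):
        while len(adj[v]) < degree:
            u = rng.randrange(n)
            if u != v:
                adj[v].add(u)
                adj[u].add(v)
    return {v: sorted(nb) for v, nb in adj.items()}


def flood(adj, source, target, ttl):
    """TTL-limited flooding: each node forwards the query once to all neighbours.

    Returns (found, hops, messages); one message is counted per link used.
    """
    if source == target:
        return True, 0, 0
    seen, frontier, messages = {source}, [source], 0
    for hop in range(1, ttl + 1):
        nxt = []
        for v in frontier:
            for u in adj[v]:
                messages += 1
                if u not in seen:
                    seen.add(u)
                    nxt.append(u)
        if target in seen:
            return True, hop, messages
        frontier = nxt
    return False, ttl, messages


def random_walk(adj, source, target, walkers, ttl, rng):
    """Plain random walk with several walkers, each moving to one random neighbour per step."""
    if source == target:
        return True, 0, 0
    positions, messages = [source] * walkers, 0
    for step in range(1, ttl + 1):
        positions = [rng.choice(adj[v]) for v in positions]
        messages += walkers
        if target in positions:
            return True, step, messages
    return False, ttl, messages


def compare(trials=200, n=60, degree=4, ttl=8, walkers=4, seed=0):
    """Estimate success probability, average steps (successful runs), and average messages."""
    rng = random.Random(seed)
    adj = make_graph(n, degree, rng)
    runs = {"flooding": [], "random_walk": []}
    for _ in range(trials):
        s, t = rng.sample(range(n), 2)
        runs["flooding"].append(flood(adj, s, t, ttl))
        runs["random_walk"].append(random_walk(adj, s, t, walkers, ttl, rng))
    stats = {}
    for name, results in runs.items():
        hits = [r for r in results if r[0]]
        stats[name] = {
            "p_found": len(hits) / trials,
            "avg_steps": sum(r[1] for r in hits) / len(hits) if hits else None,
            "avg_messages": sum(r[2] for r in results) / trials,
        }
    return stats


if __name__ == "__main__":
    for name, s in compare().items():
        print(name, s)
```

In this sketch the random walk's message count is capped at `walkers * ttl` per query, whereas flooding's count grows with the number of links reached, which is the trade-off between message overhead and discovery probability that the abstract refers to.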
Funding: The Natural Science Foundation of China (50673085, 41576077), the National High-Tech Research and Development Programme of China (2010AA09Z203), and the Fundamental Research Funds for the Central Universities of China (201562026).
Abstract: The pursuit of a high oil recovery rate has been a persistent objective of the oil industry. Pseudomonas sp. LP-7 and Bacillus sp. PAH-2 were isolated from oil-contaminated surface soil samples of an oilfield. The antimicrobial degradation rates (ADRs) of polymers achieved by LP-7 and PAH-2 were evaluated at 35 °C in mineral salt media in a shake-flask trial. The ADRs of the copolymer synthesized using a surfactant at a concentration of 5% reached 8.4% for PAH-2 and 15.3% for LP-7. When the polymer concentration was 2 g/L, the ADRs of the copolymer reached 10.4% for PAH-2 and 21.3% for LP-7. All results confirmed that the ADRs of the copolymers increased with increasing content of HDDE (a capsaicin derivative monomer) in the polymer. The copolymers also showed excellent antimicrobial degradation performance in the presence of Cu²⁺, Zn²⁺, and Pb²⁺ ions, indicating great potential for applications in enhanced oil recovery.