Funding: Supported by the National Key R&D Program of China (No. 2023YFB4502200), the National Natural Science Foundation of China (Nos. U22A2028, 61925208, 62222214, 62341411, 62102398, 62102399, U20A20227, 62302478, 62302482, 62302483, 62302480, 62302481), the Strategic Priority Research Program of the Chinese Academy of Sciences (Nos. XDB0660300, XDB0660301, XDB0660302), the Chinese Academy of Sciences Project for Young Scientists in Basic Research (No. YSBR-029), the Youth Innovation Promotion Association of the Chinese Academy of Sciences, and the Xplore Prize.
Abstract: Enhancing the generalization capacity of reinforcement learning (RL) agents remains a formidable challenge. Although existing RL methods achieve superhuman performance on certain benchmarks, they often generalize poorly. A potential reason is that the benchmarks used for training and evaluation may not offer a sufficiently diverse set of transferable tasks. Recent studies have developed benchmarking environments to address this shortcoming, but these typically fall short of providing tasks that both ensure a solid foundation for generalization and exhibit significant variability. To overcome these limitations, this work introduces into environment design the concept that ‘objects are composed of more fundamental components’, implemented in a proposed environment called summon the magic (StM). StM generates tasks in which objects are built from extensible, shareable basic components, which facilitates strategy reuse and enhances generalization. In addition, two new metrics, adaptation sensitivity range (ASR) and parameter correlation coefficient (PCC), are proposed to better capture and evaluate the generalization process of RL agents. Experimental results show that increasing the number of basic components per object reduces the training-testing gap (in episode reward) of a proximal policy optimization (PPO) agent by 60.9%, substantially alleviating overfitting. Likewise, linear variations in other environmental factors, such as the proportion of the monster set used for training and the total number of basic components, consistently decrease the gap by at least 32.1%. These results highlight StM's effectiveness for benchmarking and probing the generalization capabilities of RL algorithms.
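The compositional idea can be illustrated with a small, hypothetical sketch: each monster is represented as a set of basic components, the monsters are split into training and test sets by a chosen proportion, and a helper reports any component that appears in a test monster but in no training monster. The component names (`c0` to `c5`), the fixed number of components per monster, and the seeded random split are all assumptions for illustration; the abstract does not describe how StM actually generates or splits its monsters.

```python
from itertools import combinations
import random

# Hypothetical component identifiers; the abstract does not list StM's actual components.
BASIC_COMPONENTS = [f"c{i}" for i in range(6)]


def make_monsters(components, parts_per_monster):
    """Enumerate every monster built from `parts_per_monster` distinct basic components.

    Each monster is a frozenset, so two monsters share a basic component
    exactly when their sets intersect.
    """
    return [frozenset(combo) for combo in combinations(components, parts_per_monster)]


def split_monsters(monsters, train_proportion, seed=0):
    """Shuffle monsters reproducibly and assign the first `train_proportion` of them to training."""
    if not 0.0 <= train_proportion <= 1.0:
        raise ValueError("train_proportion must lie in [0, 1]")
    shuffled = list(monsters)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_proportion * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def unseen_components(train, test):
    """Return the basic components used by test monsters but never seen in training."""
    seen = set().union(*train)
    return set().union(*test) - seen


if __name__ == "__main__":
    monsters = make_monsters(BASIC_COMPONENTS, parts_per_monster=2)
    train, test = split_monsters(monsters, train_proportion=0.6)
    print(len(monsters), len(train), len(test))  # 15 9 6
    # Test monsters are new combinations; ideally every component was already seen.
    print(sorted(unseen_components(train, test)))
```

In this sketch, raising `parts_per_monster` or enlarging `BASIC_COMPONENTS` grows the number of distinct monsters combinatorially while every monster still draws on the same shared pool, which is one way to picture how a compositional design can add task variability without discarding a common foundation for generalization.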
Abstract: Organoboron compounds have become important intermediates for constructing new compounds in synthetic and pharmaceutical chemistry. It has been found that, with bis(pinacolato)diboron (B₂pin₂) as the boron source and a Cu(II) complex of an organophosphorus ligand (L) as the catalyst, aryl olefins can be efficiently converted into either hydroboration (hydrogen-reduced borylation) products or dehydrogenative borylation products. The regioselectivity of the reaction at the β-C position of the aryl olefins can be controlled by tuning the type of ligand and additive. Mechanistically, Cu(II), L, and B₂pin₂ first form LCu(I)Bpin. The aryl olefin substrate then undergoes an addition reaction with this species to form the active intermediate PhCH(LCu(I))CH₂Bpin. Metathesis of this intermediate with water gives the hydroboration product, whereas oxidation of the same intermediate with 2,2,6,6-tetramethylpiperidin-1-oxyl (TEMPO) gives the trans dehydrogenative borylation product.
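The mechanism described above can be summarized as a schematic reaction sequence. This is a sketch under stated assumptions: styrene (PhCH=CH₂) is taken as the representative aryl olefin because the intermediate is written with a phenyl group; the LCu(I)OH species in step (iii) is inferred from the metathesis with water rather than stated in the abstract; the species released in the TEMPO step are not given, so they are left as an ellipsis; and the equations are not balanced.

```latex
% Schematic, unbalanced sequence; styrene assumed as the model aryl olefin.
\begin{align*}
\text{(i)}\quad   & \mathrm{Cu^{II}} + \mathrm{L} + \mathrm{B_2pin_2} \longrightarrow \mathrm{LCu^{I}Bpin} \\
\text{(ii)}\quad  & \mathrm{LCu^{I}Bpin} + \mathrm{PhCH{=}CH_2} \longrightarrow \mathrm{PhCH(LCu^{I})CH_2Bpin} \\
% (iii) LCu^{I}OH is an inferred by-product of the Cu--C / H--OH metathesis.
\text{(iii)}\quad & \mathrm{PhCH(LCu^{I})CH_2Bpin} + \mathrm{H_2O} \longrightarrow \mathrm{PhCH_2CH_2Bpin} + \mathrm{LCu^{I}OH} \\
% (iv) trans product = (E)-styrylboronate; other species not given in the abstract.
\text{(iv)}\quad  & \mathrm{PhCH(LCu^{I})CH_2Bpin} + \mathrm{TEMPO} \longrightarrow (E)\text{-}\mathrm{PhCH{=}CHBpin} + \cdots
\end{align*}
```

Steps (iii) and (iv) branch from the same intermediate, which is why the choice of proton source or oxidant (together with the ligand and additive) decides whether the hydroboration or the dehydrogenative borylation product is obtained.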