Funding: Supported by the National Key Research and Development Program of China (No. 2018AAA0103005) and the National Natural Science Foundation of China (No. 61873266).
Abstract: In this paper, an efficient skill learning framework is proposed for robotic insertion, based on one-shot demonstration and reinforcement learning. First, the robot action is composed of two parts: expert action and refinement action. A force Jacobian matrix is calibrated with only one demonstration, based on which stable and safe expert action can be generated. The deep deterministic policy gradient (DDPG) method is employed to learn the refinement action, which aims to improve the assembly efficiency. Second, an episode-step exploration strategy is developed, which uses the expert action as a benchmark and adjusts the exploration intensity dynamically. A safety-efficiency reward function is designed for compliant insertion. Third, to improve adaptability to different components, a skill saving and selection mechanism is proposed. Several typical components are used to train the skill models, and the trained models and force Jacobian matrices are saved in a skill pool. Given a new component, the most appropriate model is selected from the skill pool according to the force Jacobian matrix and directly used to accomplish insertion tasks. Fourth, a simulation environment is established under the guidance of the force Jacobian matrix, which avoids a tedious training process on real robotic systems. Simulations and experiments are conducted to validate the effectiveness of the proposed methods.
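The action composition and skill selection described in this abstract can be illustrated with a minimal sketch. The abstract does not say how force Jacobian matrices are compared or how the refinement action is bounded, so the Frobenius distance, the clipping limit, and all names below (`select_skill`, `compose_action`, the pool entries) are assumptions made for illustration only.

```python
import math

def frobenius_distance(a, b):
    """Frobenius norm of the difference between two equally sized matrices."""
    return math.sqrt(sum((x - y) ** 2
                         for row_a, row_b in zip(a, b)
                         for x, y in zip(row_a, row_b)))

def select_skill(skill_pool, jacobian):
    """Return the name of the stored skill whose force Jacobian is closest.

    skill_pool is a list of (name, force_jacobian) pairs. Using the
    Frobenius distance as the similarity measure is an assumption.
    """
    name, _ = min(skill_pool, key=lambda item: frobenius_distance(item[1], jacobian))
    return name

def compose_action(expert_action, refinement_action, limit):
    """Add a clipped refinement to the expert action, component by component.

    Clipping the refinement to [-limit, limit] is a hypothetical safety
    choice, not something stated in the abstract.
    """
    return [e + max(-limit, min(limit, r))
            for e, r in zip(expert_action, refinement_action)]

# Hypothetical skill pool with 2x2 force Jacobians.
pool = [("round_peg", [[1.0, 0.0], [0.0, 1.0]]),
        ("square_peg", [[2.0, 0.0], [0.0, 2.0]])]
chosen = select_skill(pool, [[1.9, 0.0], [0.0, 2.2]])
action = compose_action([1.0, -0.5], [0.3, -2.0], limit=0.5)
```

In this sketch the expert action always contributes in full, while the learned refinement can only shift it within a fixed band, which is one simple way to keep exploration close to the safe baseline.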
Abstract: A novel one-shot decorrelator for asynchronous CDMA systems is developed. Compared with the existing one-shot decorrelator, it reduces complexity and performs better while eliminating all multiple-access interference (MAI). The decorrelator is shown to be near-far resistant in both AWGN and fading channels.
Abstract: One-shot systems such as missiles and extinguishers are placed in storage for a long time and used only once during their lives. Their reliability deteriorates with time even in storage, and their failures are detected only through inspection of their characteristics. Thus, we need to decide on an appropriate inspection policy for such systems. In this paper, we deal with a system comprising non-identical units in series, where only minimal repairs are performed when unit failures are detected by periodic inspections. The system is replaced and becomes “as good as new” when the nth failure of the system is detected. Our objective is to find the optimal inspection interval and number of failures before replacement that minimize the expected total system cost per unit of time.
Abstract: This paper presents a numerical method for PDE-constrained optimization problems. These problems arise in many fields of science and engineering, including those dealing with real applications. The physical problem is modeled by partial differential equations (PDEs) and involves optimization of some quantity. The PDEs are in most cases nonlinear and are solved using numerical methods. Since such numerical solutions are used routinely, the recent trend has been to develop numerical methods and algorithms so that the optimization problems can also be solved numerically using the same PDE solver. We present one such numerical method, which is based on simultaneous pseudo-time stepping. The efficiency of the method is increased with the help of a multigrid strategy. An application example is included for an aerodynamic shape optimization problem.
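The idea of simultaneous pseudo-time stepping can be sketched on a toy problem: the state, adjoint, and design residuals are all driven to zero in one explicit iteration rather than in nested loops. The scalar model below (state equation a·u − q = 0, cost ½(u − u_target)² + ½α·q²) is an assumption chosen so the answer can be checked by hand; the PDE discretization, preconditioning, and multigrid acceleration mentioned in the abstract are not modeled.

```python
def one_shot_solve(a=2.0, alpha=1.0, u_target=1.0, dt=0.1, steps=500):
    """Explicit pseudo-time stepping on state, adjoint and design together.

    Lagrangian (toy, assumed): L = 0.5*(u - u_target)**2 + 0.5*alpha*q**2
                                   + lam*(a*u - q)
    """
    u = lam = q = 0.0
    for _ in range(steps):
        state_res = a * u - q                  # dL/dlam: state equation
        adjoint_res = (u - u_target) + a * lam  # dL/du: adjoint equation
        design_res = alpha * q - lam            # dL/dq: reduced gradient
        # All three unknowns advance in the same pseudo-time step.
        u, lam, q = (u - dt * state_res,
                     lam - dt * adjoint_res,
                     q - dt * design_res)
    return u, lam, q

u, lam, q = one_shot_solve()
```

For the default parameters the optimality conditions can be solved by hand (u = 0.2, q = 0.4, λ = 0.4), which gives a simple check that the coupled iteration settles on the constrained optimum.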
Funding: Supported by the National Science and Technology Major Project under Grant No. 2020AAA0106900, the National Natural Science Foundation of China under Grant Nos. U19B2307 and 61876152, the Shaanxi Provincial Key Research and Development Program of China under Grant No. 2021KWZ-03, and the Natural Science Basic Research Program of Shaanxi Province of China under Grant No. 2021JCW-03.
Abstract: Given a query patch from a novel class, one-shot object detection aims to detect all instances of this class in a target image through semantic similarity comparison. However, due to the extremely limited guidance in the novel class as well as the unseen appearance difference between the query and target instances, it is difficult to appropriately exploit their semantic similarity and generalize well. To mitigate this problem, we present a universal Cross-Attention Transformer (CAT) module for accurate and efficient semantic similarity comparison in one-shot object detection. The proposed CAT utilizes the transformer mechanism to comprehensively capture bi-directional correspondence between any paired pixels from the query and the target image, which empowers us to sufficiently exploit their semantic characteristics for accurate similarity comparison. In addition, the proposed CAT enables feature dimensionality compression for inference speedup without performance loss. Extensive experiments on three object detection datasets (MS-COCO, PASCAL VOC, and FSOD) under the one-shot setting demonstrate the effectiveness and efficiency of our model; e.g., it surpasses CoAE, a major baseline in this task, by 1.0% in average precision (AP) on MS-COCO and runs nearly 2.5 times faster.
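To make the bi-directional correspondence concrete, here is a minimal sketch of plain scaled dot-product cross-attention applied in both directions between query-patch features and target-image features. It is not the CAT module itself: learned projections, multiple heads, and the dimensionality compression are omitted, and the function names and toy feature values are assumptions.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Each row of `queries` attends over all rows of `keys`/`values`."""
    scale = math.sqrt(len(keys[0]))
    attended = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        attended.append([sum(w * v[j] for w, v in zip(weights, values))
                         for j in range(len(values[0]))])
    return attended

# Toy "pixel" features: 2 query-patch pixels, 3 target-image pixels, 2 channels.
query_feats = [[1.0, 0.0], [0.0, 1.0]]
target_feats = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]

# Bi-directional: target pixels attend to the query, and vice versa.
target_enhanced = cross_attention(target_feats, query_feats, query_feats)
query_enhanced = cross_attention(query_feats, target_feats, target_feats)
```

Each output row is a convex combination of the other image's features, so every pixel on one side is re-expressed in terms of the pixels it most resembles on the other side.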
Funding: Supported by the National Natural Science Foundation of China (92256304, U23A20593) and the Fundamental Research Funds for the Central Universities (020514380294).
Abstract: Efficient multi-resonance thermally activated delayed fluorescence (MR-TADF) materials hold significant potential for applications in organic light-emitting diodes (OLEDs) and ultra-high-definition displays. However, the stringent synthesis conditions and low yields typically associated with these materials pose substantial challenges for their practical applications. In this study, we introduce an innovative strategy that involves peripheral modification with sulfur and selenium atoms for two materials, CFDBNS and CFDBNSe. This approach enables a directed one-shot borylation process, achieving synthesis yields of 66% and 25%, respectively, while also enhancing reverse intersystem crossing rates. Both emitters exhibit ultra-narrowband sky-blue emissions centered around 474 nm, with full width at half maximum (FWHM) values as narrow as 19 nm in dilute toluene solutions, along with high photoluminescence quantum yields of 98% and 99% in doped films, respectively. The OLEDs based on CFDBNS and CFDBNSe display sky-blue emissions with peaks at 476 and 477 nm and exceptionally narrow FWHM values of 23 nm. Furthermore, the devices demonstrate remarkable performance, achieving maximum external quantum efficiencies of 24.1% and 27.2%. This work presents a novel and straightforward approach for the incorporation of heavy atoms, facilitating the rapid construction of efficient MR-TADF materials for OLEDs.
Funding: Supported in part by the National Natural Science Foundation of China [Grant number 62471075], the Major Science and Technology Project Grant of the Chongqing Municipal Education Commission [Grant number KJZD-M202301901], and the Graduate Innovation Project Funding of Chongqing University of Technology [Grant number gzlcx20253249].
Abstract: In recent years, diffusion models have achieved remarkable progress in image generation. However, extending them to text-to-video (T2V) generation remains challenging, particularly in maintaining semantic consistency and visual quality across frames. Existing approaches often overlook the synergy between high-level semantics and low-level texture information, resulting in blurry or temporally inconsistent outputs. To address these issues, we propose Dual Consistency Training (DCT), a novel framework designed to jointly optimize semantic and texture consistency in video generation. Specifically, we introduce a multi-scale spatial adapter to enhance spatial feature extraction, and leverage the complementary strengths of CLIP and VGG, where CLIP focuses on high-level semantics and VGG captures fine-grained texture and detail. During training, a stepwise strategy is adopted to impose semantic and texture losses, constraining discrepancies between generated and ground-truth frames. Furthermore, we propose CLWS, which dynamically adjusts the balance between semantic and texture losses to facilitate more stable and effective optimization. Remarkably, DCT achieves high-quality video generation using only a single training video on a single NVIDIA A6000 GPU. Extensive experiments demonstrate that our method significantly improves temporal coherence and visual fidelity across various video generation tasks, verifying its effectiveness and generalizability.
Funding: Supported by the National Key R&D Program of China (Grant No. 2018YFA0701500), the Hong Kong Research Grant Council (Grant Nos. 27206321, 17205922), the National Natural Science Foundation of China (Grant Nos. 62122004, 61874138, 61888102, 61771176, and 62171173), the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDB44000000), Research on the GaN Chip for 5G Applications (Grant No. JCYJ20210324120409025), Research on high-reliable GaN power device and the related industrial power system (Grant No. HZQBKCZYZ-2021052), and the Key Project of Department of Education of Guangdong Province (No. 2018KCXTD026). Also supported by ACCESS (AI Chip Center for Emerging Smart Systems), sponsored by the Innovation and Technology Fund (ITF), Hong Kong SAR.
Abstract: Autonomous one-shot on-the-fly learning copes with the high privacy, small datasets, and in-stream data at the edge. Implementing such learning on digital hardware suffers from the well-known von Neumann and scaling bottlenecks. Optical neural networks, featuring large parallelism, low latency, and high efficiency, offer a promising solution. However, ex-situ training of conventional optical networks, where optical path configuration and deep learning model optimization are separated, incurs hardware, energy, and time overheads and defeats the advantages in edge learning. Here, we introduced a bio-inspired material-algorithm co-design to construct a hydrogel-based optical Willshaw model (HOWM), manifesting Hebbian-rule-based structural plasticity for simultaneous optical path configuration and deep learning model optimization thanks to the underlying opto-chemical reactions. We first employed the HOWM as an all-optical in-sensor AI processor for one-shot pattern classification, association, and denoising. We then leveraged the HOWM to function as a ternary content addressable memory (TCAM) of an optical memory augmented neural network (MANN) for one-shot learning of the Omniglot dataset. The HOWM-empowered one-shot on-the-fly edge learning leads to a 1000× boost in energy efficiency and a 10× boost in speed, which paves the way for next-generation autonomous, efficient, and affordable smart edge systems.
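For readers unfamiliar with the Willshaw model, the following is a minimal software sketch of the classical binary associative memory it refers to: a connection is switched on once (clipped Hebbian learning) when its input and output units are active together, and recall thresholds on the number of active cue units. This is only a textbook analogue under assumed toy patterns; it does not model the hydrogel optics or the opto-chemical plasticity described in the abstract.

```python
def train_willshaw(pairs, n_in, n_out):
    """One-shot clipped Hebbian storage of (input, output) binary patterns."""
    weights = [[0] * n_out for _ in range(n_in)]
    for x, y in pairs:
        for i in range(n_in):
            if x[i]:
                for j in range(n_out):
                    if y[j]:
                        weights[i][j] = 1  # a connection is set once and stays on
    return weights

def recall(weights, cue):
    """Output unit j fires if every active cue unit connects to it."""
    threshold = sum(cue)
    if threshold == 0:
        return [0] * len(weights[0])
    sums = [sum(weights[i][j] for i in range(len(cue)) if cue[i])
            for j in range(len(weights[0]))]
    return [1 if s >= threshold else 0 for s in sums]

# Two hypothetical pattern pairs, each stored from a single presentation.
pairs = [([1, 1, 0, 0], [1, 0, 0]),
         ([0, 0, 1, 1], [0, 1, 0])]
w = train_willshaw(pairs, n_in=4, n_out=3)
full_cue = recall(w, [1, 1, 0, 0])
partial_cue = recall(w, [1, 0, 0, 0])  # incomplete cue still retrieves the pattern
```

Because each association is written in a single pass and never revised, the same structure serves for one-shot classification and for completing or denoising a partial cue.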
Funding: This work was supported by a Ulucu PhD studentship. Y. Jin is funded by an Alexander von Humboldt Professorship for Artificial Intelligence endowed by the German Federal Ministry of Education and Research.
Abstract: Neural architecture search (NAS) has become increasingly popular in the deep learning community recently, mainly because it allows interested users without rich expertise to benefit from the success of deep neural networks (DNNs). However, NAS is still laborious and time-consuming because a large number of performance estimations are required during the search process, and training DNNs is computationally intensive. To overcome this major limitation, improving computational efficiency is essential in the design of NAS. However, a systematic overview of computationally efficient NAS (CE-NAS) methods is still lacking. To fill this gap, we provide a comprehensive survey of the state of the art in CE-NAS by categorizing the existing work into proxy-based and surrogate-assisted NAS methods, together with a thorough discussion of their design principles and a quantitative comparison of their performance and computational complexity. The remaining challenges and open research questions are also discussed, and promising research topics in this emerging field are suggested.
Funding: Partially supported by the National Natural Science Foundation of China (Nos. 62072151, 62376236, 61932009), the Anhui Provincial Natural Science Fund for the Distinguished Young Scholars, China (No. 2008085J30), the Open Foundation of Yunnan Key Laboratory of Software Engineering, China (No. 2023SE103), the CCF-Baidu Open Fund, the CAAI-Huawei MindSpore Open Fund, the Shenzhen Science and Technology Program, China (No. ZDSYS20230626091302006), and the Key Project of Science and Technology of Guangxi, China (No. AB22035022-2021AB20147).
Abstract: Post-training quantization (PTQ) can reduce the memory footprint and latency of deep model inference while still preserving model accuracy, with only a small unlabeled calibration set and without retraining on the full training set. To calibrate a quantized model, current PTQ methods usually select some unlabeled data at random from the training set as calibration data. However, we show that random data selection results in performance instability and degradation due to the activation distribution mismatch. In this paper, we attempt to solve the crucial task of appropriate calibration data selection, and propose a novel one-shot calibration data selection method termed SelectQ, which selects specific data for calibration via dynamic clustering. SelectQ uses the statistical information of activations and performs layer-wise clustering to learn an activation distribution on the training set. For that purpose, a new metric called knowledge distance is proposed to calculate the distances of the activation statistics to centroids. Finally, after calibration with the selected data, quantization noise can be alleviated by mitigating the distribution mismatch within activations. Extensive experiments on the ImageNet dataset show that SelectQ increases the top-1 accuracy of ResNet18 by over 15% in 4-bit quantization, compared to randomly sampled calibration data. It is noteworthy that SelectQ involves neither backward propagation nor batch normalization parameters, which means that it has fewer limitations in practical applications.
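The selection step can be pictured with a small sketch: given per-sample activation statistics and a set of cluster centroids, pick the samples that lie closest to each centroid as calibration data. The abstract does not define the knowledge distance or the dynamic clustering procedure, so plain Euclidean distance, fixed centroids, and the function name `select_calibration` are stand-in assumptions here.

```python
import math

def select_calibration(stats, centroids, per_centroid):
    """Pick the indices of the samples nearest to each centroid.

    stats:      one activation-statistics vector per candidate sample
    centroids:  cluster centres in the same space (assumed given)
    Euclidean distance is a placeholder for the paper's knowledge distance.
    """
    chosen = []
    for centre in centroids:
        ranked = sorted(range(len(stats)),
                        key=lambda i: math.dist(stats[i], centre))
        picked = [i for i in ranked if i not in chosen][:per_centroid]
        chosen.extend(picked)
    return sorted(chosen)

# Hypothetical (mean, std) activation statistics for five candidate images.
stats = [[0.0, 1.0], [0.1, 1.1], [5.0, 2.0], [5.2, 2.1], [9.0, 9.0]]
centroids = [[0.0, 1.0], [5.0, 2.0]]
selected = select_calibration(stats, centroids, per_centroid=1)
```

In this toy case the outlier at index 4 is never chosen, which is the intended contrast with random sampling: calibration data are drawn from where the activation statistics concentrate.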
Funding: Supported by the National Natural Science Foundation of China (No. 62206237), Japan Science Promotion Society (Nos. 22K12093 and 22K12094), and Japan Science and Technology Agency (No. JPMJST2281).
Abstract: Lightweight modules play a key role in 3D object detection tasks for autonomous driving and are necessary for the application of 3D object detectors. At present, research still focuses on constructing complex models and calculations to improve detection precision at the expense of running speed. However, building a lightweight model to learn the global features from point cloud data for 3D object detection is a significant problem. In this paper, we focus on combining convolutional neural networks with self-attention-based vision transformers to realize lightweight and high-speed computing for 3D object detection. We propose lightweight detection 3D (LWD-3D), which is a point cloud conversion and lightweight vision transformer for autonomous driving. LWD-3D utilizes a one-shot regression framework in 2D space and generates a 3D object bounding box from point cloud data, which provides a new feature representation method based on a vision transformer for 3D detection applications. Experimental results on the KITTI 3D dataset show that LWD-3D achieves real-time detection (time per image < 20 ms). LWD-3D obtains a mean average precision (mAP) 75% higher than that of another 3D real-time detector with half the number of parameters. Our research extends the application of vision transformers to 3D object detection tasks.