Journal Articles
7 articles found
1. KitWaSor: Pioneering pre-trained model for kitchen waste sorting with an innovative million-level benchmark dataset
Authors: Leyuan Fang, Shuaiyu Ding, Hao Feng, Junwu Yu, Lin Tang, Pedram Ghamisi. CAAI Transactions on Intelligence Technology, 2025, No. 1, pp. 94-114 (21 pages)
Intelligent sorting is an important prerequisite for the full quantitative consumption and harmless disposal of kitchen waste. Existing object detection methods based on ImageNet pre-trained models are an effective way to perform such sorting. However, owing to the significant domain gap between natural images and kitchen waste images, an ImageNet pre-trained model struggles to capture the diverse scales and dense distribution characteristic of kitchen waste, leading to poor generalisation. In this article, the authors propose the first pre-trained model for kitchen waste sorting, called KitWaSor, which combines contrastive learning (CL) and masked image modelling (MIM) through self-supervised learning (SSL). First, to address the issue of diverse scales, the authors propose a mixed masking strategy that adds an incomplete masking branch to the original random masking branch; it prevents the complete loss of small-scale objects while avoiding excessive leakage of large-scale object pixels. Second, to address the issue of dense distribution, the authors introduce semantic consistency constraints on top of the mixed masking strategy: object semantic reasoning is performed through these constraints to compensate for the lack of contextual information. To train KitWaSor, the authors construct the first million-level kitchen waste dataset spanning seasonal and regional distributions, named KWD-Million. Extensive experiments show that KitWaSor achieves state-of-the-art (SOTA) performance on the two downstream tasks most relevant to kitchen waste sorting (i.e. image classification and object detection), demonstrating the effectiveness of the proposed model.
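The abstract describes the mixed masking strategy only at a high level. Purely as a rough illustration (not the authors' implementation), the sketch below shows how a standard random masking branch and a lower-ratio "incomplete" masking branch could be generated for a ViT-style patch grid; the function name, mask ratios, and patch count are assumptions.

```python
import torch

def mixed_masking(batch_size: int, num_patches: int,
                  random_ratio: float = 0.75, incomplete_ratio: float = 0.45):
    """Illustrative mixed masking: a high-ratio random mask plus a lower-ratio
    ("incomplete") mask that leaves more patches visible, so small-scale objects
    are less likely to be masked out entirely. True = patch is masked.
    The ratios here are assumptions, not values from the paper."""
    def make_mask(ratio: float) -> torch.Tensor:
        noise = torch.rand(batch_size, num_patches)
        num_masked = int(num_patches * ratio)
        ids = noise.argsort(dim=1, descending=True)[:, :num_masked]
        mask = torch.zeros(batch_size, num_patches, dtype=torch.bool)
        return mask.scatter_(1, ids, torch.ones_like(ids, dtype=torch.bool))

    random_mask = make_mask(random_ratio)          # original random branch
    incomplete_mask = make_mask(incomplete_ratio)  # added incomplete branch
    return random_mask, incomplete_mask

# Example: masks for a 14x14 patch grid (ViT-B/16 on 224x224 inputs)
rand_m, inc_m = mixed_masking(batch_size=4, num_patches=196)
print(rand_m.float().mean().item(), inc_m.float().mean().item())  # ~0.75, ~0.45
```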
Keywords: contrastive learning, kitchen waste, masked image modeling, pre-trained model, self-supervised learning
2. DenseCL: A simple framework for self-supervised dense visual pre-training (Cited by: 1)
Authors: Xinlong Wang, Rufeng Zhang, Chunhua Shen, Tao Kong. Visual Informatics (EI), 2023, No. 1, pp. 30-40 (11 pages)
Self-supervised learning aims to learn a universal feature representation without labels. To date, most existing self-supervised learning methods are designed and optimized for image classification. These pre-trained models can be sub-optimal for dense prediction tasks due to the discrepancy between image-level and pixel-level prediction. To fill this gap, we aim to design an effective, dense self-supervised learning framework that works directly at the level of pixels (or local features) by taking into account the correspondence between local features. Specifically, we present dense contrastive learning (DenseCL), which implements self-supervised learning by optimizing a pairwise contrastive (dis)similarity loss at the pixel level between two views of input images. Compared to supervised ImageNet pre-training and other self-supervised learning methods, our self-supervised DenseCL pre-training demonstrates consistently superior performance when transferring to downstream dense prediction tasks, including object detection, semantic segmentation and instance segmentation. Specifically, our approach significantly outperforms the strong MoCo-v2 by 2.0% AP on PASCAL VOC object detection, 1.1% AP on COCO object detection, 0.9% AP on COCO instance segmentation, 3.0% mIoU on PASCAL VOC semantic segmentation and 1.8% mIoU on Cityscapes semantic segmentation. The improvements reach up to 3.5% AP and 8.8% mIoU over MoCo-v2, and 6.1% AP and 6.1% mIoU over the supervised counterpart under the frozen-backbone evaluation protocol.
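The pixel-level loss described here can be made concrete with a minimal sketch. The code below approximates a dense InfoNCE loss between two views, taking cross-view correspondence as the most similar location; the tensor shapes, temperature, and external negative bank are assumptions and simplifications, not the official DenseCL implementation.

```python
import torch
import torch.nn.functional as F

def dense_contrastive_loss(f_q, f_k, neg_bank, tau: float = 0.2):
    """Simplified dense (pixel-level) InfoNCE in the spirit of DenseCL.
    f_q, f_k: (B, C, H, W) dense projections of two augmented views.
    neg_bank: (K, C) bank of negative feature vectors, assumed maintained elsewhere."""
    B, C, H, W = f_q.shape
    q = F.normalize(f_q.flatten(2), dim=1)    # (B, C, H*W)
    k = F.normalize(f_k.flatten(2), dim=1)    # (B, C, H*W)
    neg = F.normalize(neg_bank, dim=1)        # (K, C)

    # Match each query location to its most similar location in the other view.
    sim = torch.einsum('bci,bcj->bij', q, k)  # (B, HW, HW) cross-view similarity
    idx = sim.argmax(dim=2)                   # (B, HW) correspondence indices
    k_matched = torch.gather(k, 2, idx.unsqueeze(1).expand(-1, C, -1))  # (B, C, HW)

    l_pos = (q * k_matched).sum(dim=1, keepdim=True)   # (B, 1, HW) positive logits
    l_neg = torch.einsum('bci,kc->bki', q, neg)        # (B, K, HW) negative logits
    logits = torch.cat([l_pos, l_neg], dim=1) / tau    # (B, 1+K, HW)
    labels = torch.zeros(B, H * W, dtype=torch.long, device=q.device)  # positive at index 0
    return F.cross_entropy(logits, labels)

# Toy usage with random tensors
loss = dense_contrastive_loss(torch.randn(2, 128, 7, 7), torch.randn(2, 128, 7, 7),
                              torch.randn(4096, 128))
print(loss.item())
```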
Keywords: self-supervised learning, visual pre-training, dense prediction tasks
3. Self-Supervised Task Augmentation for Few-Shot Intent Detection (Cited by: 1)
Authors: Peng-Fei Sun, Ya-Wen Ouyang, Ding-Jie Song, Xin-Yu Dai. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2022, No. 3, pp. 527-538 (12 pages)
Few-shot intent detection is a practical yet challenging task, because new intents emerge frequently and collecting large-scale data for them can be costly. Meta-learning, a promising technique for leveraging data from previous tasks to enable efficient learning of new tasks, has been a popular way to tackle this problem. However, existing meta-learning models have been shown to overfit when meta-training tasks are insufficient. To overcome this challenge, we present STAM, a novel self-supervised task augmentation with meta-learning framework. First, we introduce task augmentation, which explores two different strategies and combines them to extend the meta-training tasks. Second, we devise two auxiliary losses that integrate self-supervised learning into meta-learning to learn more generalizable and transferable features. Experimental results show that STAM achieves consistent and considerable performance improvements over existing state-of-the-art methods on four datasets.
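The abstract does not spell out the two augmentation strategies, so the sketch below is only a generic illustration of task augmentation for meta-training, not STAM's actual method: it builds extra N-way K-shot episodes from augmented copies of utterances, with random word dropout standing in for a real augmentation and all names being placeholders.

```python
import random

def augment_utterance(text: str) -> str:
    """Toy augmentation (random word dropout); a stand-in for the paper's strategies."""
    words = text.split()
    if len(words) <= 3:
        return text
    drop = random.randrange(len(words))
    return " ".join(w for i, w in enumerate(words) if i != drop)

def build_episode(data_by_intent: dict, n_way: int = 5, k_shot: int = 1,
                  q_query: int = 5, augment: bool = False):
    """Sample one N-way K-shot episode; with augment=True the examples are replaced
    by augmented copies, enlarging the pool of distinct meta-training tasks.
    Assumes each intent has at least k_shot + q_query utterances."""
    intents = random.sample(sorted(data_by_intent), n_way)
    support, query = [], []
    for label, intent in enumerate(intents):
        examples = random.sample(data_by_intent[intent], k_shot + q_query)
        if augment:
            examples = [augment_utterance(x) for x in examples]
        support += [(x, label) for x in examples[:k_shot]]
        query += [(x, label) for x in examples[k_shot:]]
    return support, query
```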
Keywords: self-supervised learning, task augmentation, meta-learning, few-shot intent detection
4. A Phonetic-Semantic Pre-Training Model for Robust Speech Recognition (Cited by: 1)
Authors: Xueyang Wu, Rongzhong Lian, Di Jiang, Yuanfeng Song, Weiwei Zhao, Qian Xu, Qiang Yang. CAAI Artificial Intelligence Research, 2022, No. 1, pp. 1-7 (7 pages)
Robustness is a long-standing challenge for automatic speech recognition (ASR), as any deployed ASR system faces much noisier speech samples than its clean training corpora. However, it is impractical to annotate every type of noisy environment. In this work, we propose a novel phonetic-semantic pre-training (PSP) framework that effectively improves the performance of ASR in practical noisy environments by seamlessly integrating pre-training, self-supervised learning, and fine-tuning. In particular, PSP has three fundamental stages. First, pre-train the phone-to-word transducer (PWT) to map a generated phone sequence to the target text using only unpaired text data; second, continue training the PWT on more complex data generated by an empirical phone-perturbation heuristic, in addition to self-supervised signals obtained by recovering the tainted phones; and third, fine-tune the resulting PWT with real-world speech data. We perform experiments on two real-life datasets collected from industrial scenarios and on synthetic noisy datasets. PSP effectively improves the traditional ASR pipeline, with relative character error rate (CER) reductions of 28.63% and 26.38% on the two real-life datasets, respectively, and it also demonstrates robustness on highly noisy synthetic speech datasets.
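The phone-perturbation idea in the second stage can be pictured with a small sketch. The code below randomly substitutes, deletes, and inserts phones to simulate a noisy acoustic front end; the phone inventory, probabilities, and function name are assumptions rather than the paper's actual heuristic.

```python
import random

PHONE_INVENTORY = ["AA", "AE", "AH", "B", "D", "IY", "K", "L", "M", "N", "S", "T"]  # toy subset

def perturb_phones(phones, sub_p=0.10, del_p=0.05, ins_p=0.05, rng=random):
    """Illustrative phone-perturbation heuristic: randomly substitute, delete,
    or insert phones so the phone-to-word transducer learns to recover from
    front-end errors. Probabilities are assumed, not the paper's settings."""
    out = []
    for p in phones:
        r = rng.random()
        if r < del_p:
            continue                                  # simulate a dropped phone
        elif r < del_p + sub_p:
            out.append(rng.choice(PHONE_INVENTORY))   # simulate a misrecognised phone
        else:
            out.append(p)                             # keep the clean phone
        if rng.random() < ins_p:
            out.append(rng.choice(PHONE_INVENTORY))   # simulate a spurious insertion
    return out

# e.g. CMU-style phones for "recognize speech", perturbed before training the PWT
clean = ["R", "EH", "K", "AH", "G", "N", "AY", "Z", "S", "P", "IY", "CH"]
print(perturb_phones(clean))
```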
Keywords: pre-training, automatic speech recognition, self-supervised learning
5. Pre-training in Medical Data: A Survey
Authors: Yixuan Qiu, Feng Lin, Weitong Chen, Miao Xu. Machine Intelligence Research (EI, CSCD), 2023, No. 2, pp. 147-179 (33 pages)
Medical data refers to health-related information associated with regular patient care or collected as part of a clinical trial program. There are many categories of such data, such as clinical imaging data, bio-signal data, electronic health records (EHR), and multi-modality medical data. With the development of deep neural networks over the last decade, the emerging pre-training paradigm has become dominant, as it significantly improves the performance of machine learning methods in data-limited scenarios. In recent years, studies of pre-training in the medical domain have achieved significant progress. To summarize these technological advancements, this work provides a comprehensive survey of recent advances in pre-training on several major types of medical data. In this survey, we summarize a large number of related publications and the existing benchmarks in the medical domain. In particular, the survey briefly describes how some pre-training methods are applied to or developed for medical data. From a data-driven perspective, we examine the extensive use of pre-training in many medical scenarios. Moreover, based on the summary of recent pre-training studies, we identify several challenges in this field to provide insights for future studies.
Keywords: medical data, pre-training, transfer learning, self-supervised learning, medical image data, electrocardiogram (ECG) data
6. Enhancing Intermodal Interaction for Unified Vision-Language Understanding and Generation
Authors: Yang Qin, Huiming Xie, Yujie Li, Benying Tan, Shuxue Ding. Data Intelligence, 2025, No. 2, pp. 358-380 (23 pages)
The majority of vision-language pre-training (VLP) models rely on pre-trained object detectors, which incur high costs and restrict the recognition of object classes. Additionally, their encoder-based structures hinder their ability to perform text generation tasks effectively. To mitigate these challenges, we propose a Detector-free Vision-and-Language Pre-training (D-VLP) model designed to bolster intermodal interaction for unified understanding and generation tasks. Our D-VLP model employs a co-modality decoder equipped with a fused multi-attention self-attention module, enhancing feature fusion and information alignment between images and text. It is pre-trained using a novel Prefix Masked Language Modeling (prefixMLM) approach, leveraging the strengths of masked language modeling and unidirectional language modeling, which enables bidirectional processing and autoregressive token generation. Extensive experiments demonstrate that D-VLP surpasses state-of-the-art models in vision-language tasks, highlighting its superior performance and adaptability across various image-text tasks with minimal adjustments.
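The prefixMLM objective combines bidirectional attention over a prefix with autoregressive generation of the remaining tokens. As an illustration of the general prefix-LM masking idea only (not D-VLP's exact module), the sketch below builds such an attention mask; the function name and token split are assumptions.

```python
import torch

def prefix_lm_attention_mask(seq_len: int, prefix_len: int) -> torch.Tensor:
    """Illustrative attention mask for a prefix-LM objective: tokens in the prefix
    attend bidirectionally to the whole prefix, while later tokens attend causally
    (prefix plus earlier generated tokens). True = attention allowed."""
    mask = torch.zeros(seq_len, seq_len, dtype=torch.bool)
    mask[:, :prefix_len] = True                     # every position sees the full prefix
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    mask[:, prefix_len:] = causal[:, prefix_len:]   # positions after the prefix are causal
    return mask

# 8 tokens in total, the first 4 forming the bidirectional (image/text) prefix
print(prefix_lm_attention_mask(8, 4).int())
```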
Keywords: image-and-text, representation learning, pre-training, Transformer, self-supervised learning
7. Pre-trained models for natural language processing: A survey (Cited by: 198)
Authors: QIU XiPeng, SUN TianXiang, XU YiGe, SHAO YunFan, DAI Ning, HUANG XuanJing. Science China (Technological Sciences) (SCIE, EI, CAS, CSCD), 2020, No. 10, pp. 1872-1897 (26 pages)
Recently, the emergence of pre-trained models (PTMs) has brought natural language processing (NLP) into a new era. In this survey, we provide a comprehensive review of PTMs for NLP. We first briefly introduce language representation learning and its research progress. Then we systematically categorize existing PTMs based on a taxonomy from four different perspectives. Next, we describe how to adapt the knowledge of PTMs to downstream tasks. Finally, we outline some potential directions of PTMs for future research. This survey is intended to be a hands-on guide for understanding, using, and developing PTMs for various NLP tasks.
Keywords: deep learning, neural network, natural language processing, pre-trained model, distributed representation, word embedding, self-supervised learning, language modelling