Journal Articles
34 articles found
1. SRS-Net: Training object detectors from scratch for remote sensing images without pretraining (Cited by: 2)
Authors: Haining WANG, Yang LI, Yuqiang FANG, Yurong LIAO, Bitao JIANG, Xitao ZHANG, Shuyan NI. Chinese Journal of Aeronautics (SCIE, EI, CAS, CSCD), 2023, Issue 8, pp. 269-283 (15 pages)
Most of the current object detection algorithms use pretrained models that are trained on ImageNet and then fine-tuned in the network, which can achieve good performance in terms of general object detectors. However, in the field of remote sensing image object detection, as pretrained models are significantly different from remote sensing data, it is meaningful to explore a train-from-scratch technique for remote sensing images. This paper proposes an object detection framework trained from scratch, SRS-Net, and describes the design of a densely connected backbone network to provide integrated hidden layer supervision for the convolution module. Then, two necessary improvement principles are proposed: studying the role of normalization in the network structure, and improving data augmentation methods for remote sensing images. To evaluate the proposed framework, we performed many ablation experiments on the DIOR, DOTA, and AS datasets. The results show that the improved backbone network, the normalization method, and the training data enhancement strategy each increased the performance of the object detection network trained from scratch. These principles compensate for the lack of pretrained models. Furthermore, we found that SRS-Net could achieve performance similar to or slightly better than baseline methods, and surpassed most advanced general detectors.
Keywords: Dense connection; Object detection; Pretraining; Remote sensing image; Train from scratch
2. Swin3D++: Effective Multi-Source Pretraining for 3D Indoor Scene Understanding
Authors: Yu-Qi Yang, Yu-Xiao Guo, Yang Liu. Computational Visual Media, 2025, Issue 3, pp. 465-481 (17 pages)
Data diversity and abundance are essential for improving the performance and generalization of models in natural language processing and 2D vision. However, the 3D vision domain suffers from a lack of 3D data, and simply combining multiple 3D datasets for pretraining a 3D backbone does not yield significant improvement, due to the domain discrepancies among different 3D datasets that impede effective feature learning. In this work, we identify the main sources of the domain discrepancies between 3D indoor scene datasets, and propose Swin3D++, an enhanced architecture based on Swin3D for efficient pretraining on multi-source 3D point clouds. Swin3D++ introduces domain-specific mechanisms to Swin3D's modules to address domain discrepancies and enhance the network capability on multi-source pretraining. Moreover, we devise a simple source-augmentation strategy to increase the pretraining data scale and facilitate supervised pretraining. We validate the effectiveness of our design, and demonstrate that Swin3D++ surpasses the state-of-the-art 3D pretraining methods on typical indoor scene understanding tasks.
Keywords: 3D scenes; Indoor; Pretraining; Multi-source data; Data augmentation
3. Multimodal Pretraining from Monolingual to Multilingual (Cited by: 1)
Authors: Liang Zhang, Ludan Ruan, Anwen Hu, Qin Jin. Machine Intelligence Research (EI, CSCD), 2023, Issue 2, pp. 220-232 (13 pages)
Multimodal pretraining has made convincing achievements in various downstream tasks in recent years. However, since the majority of the existing works construct models based on English, their applications are limited by language. In this work, we address this issue by developing models with multimodal and multilingual capabilities. We explore two types of methods to extend the multimodal pretraining model from monolingual to multilingual. Specifically, we propose a pretraining-based model named multilingual multimodal pretraining (MLMM), and two generalization-based models named multilingual CLIP (M-CLIP) and multilingual acquisition (MLA). In addition, we further extend the generalization-based models to incorporate the audio modality and develop the multilingual CLIP for vision, language, and audio (CLIP4VLA). Our models achieve state-of-the-art performance on multilingual vision-text retrieval, visual question answering, and image captioning benchmarks. Based on the experimental results, we discuss the pros and cons of the two types of models and their potential practical applications.
Keywords: Multilingual pretraining; Multimodal pretraining; Cross-lingual transfer; Multilingual generation; Cross-modal retrieval
4. Exploring Fragment Adding Strategies to Enhance Molecule Pretraining in AI-Driven Drug Discovery (Cited by: 3)
Authors: Zhaoxu Meng, Cheng Chen, Xuan Zhang, Wei Zhao, Xuefeng Cui. Big Data Mining and Analytics (EI, CSCD), 2024, Issue 3, pp. 565-576 (12 pages)
The effectiveness of AI-driven drug discovery can be enhanced by pretraining on small molecules. However, the conventional masked language model pretraining techniques are not suitable for molecule pretraining due to the limited vocabulary size and the non-sequential structure of molecules. To overcome these challenges, we propose FragAdd, a strategy that involves adding a chemically implausible molecular fragment to the input molecule. This approach allows for the incorporation of rich local information and the generation of a high-quality graph representation, which is advantageous for tasks like virtual screening. Consequently, we have developed a virtual screening protocol that focuses on identifying binders of estrogen receptor alpha, a nuclear receptor. Our results demonstrate a significant improvement in the binding capacity of the retrieved molecules. Additionally, we demonstrate that the FragAdd strategy can be combined with other self-supervised methods to further expedite the drug discovery process.
Keywords: Pretraining; Information retrieval; Drug discovery; Virtual screening; Molecule property prediction
5. MVContrast: Unsupervised Pretraining for Multi-view 3D Object Recognition (Cited by: 2)
Authors: Luequan Wang, Hongbin Xu, Wenxiong Kang. Machine Intelligence Research (EI, CSCD), 2023, Issue 6, pp. 872-883 (12 pages)
3D shape recognition has drawn much attention in recent years. The view-based approach performs best of all. However, the current multi-view methods are almost all fully supervised, and the pretraining models are almost all based on ImageNet. Although the pretraining results of ImageNet are quite impressive, there is still a significant discrepancy between multi-view datasets and ImageNet. Multi-view datasets naturally retain rich 3D information. In addition, large-scale datasets such as ImageNet require considerable cleaning and annotation work, so it is difficult to regenerate a second dataset. In contrast, unsupervised learning methods can learn general feature representations without any extra annotation. To this end, we propose a three-stage unsupervised joint pretraining model. Specifically, we decouple the final representations into three fine-grained representations. Data augmentation is utilized to obtain pixel-level representations within each view. We then boost spatially invariant features at the view level. Finally, we exploit global information at the shape level through a novel extract-and-swap module. Experimental results demonstrate that the proposed method gains significantly in 3D object classification and retrieval tasks, and shows generalization to cross-dataset tasks.
Keywords: Multi-view; Unsupervised pretraining; Contrastive learning; 3D vision; Shape recognition
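The abstract describes a multi-level contrastive pretraining scheme without giving its loss. As a rough illustration of the contrastive objective such methods typically build on, here is a plain-Python InfoNCE sketch over two augmented views of a batch; the function names and the temperature value are illustrative, not taken from the paper.

```python
import math

def cosine(u, v):
    # cosine similarity between two embedding vectors
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce_loss(views_a, views_b, temperature=0.1):
    """Average InfoNCE loss: each embedding in views_a should match the
    same-index embedding in views_b against all others in the batch."""
    losses = []
    for i, anchor in enumerate(views_a):
        logits = [cosine(anchor, cand) / temperature for cand in views_b]
        # numerically stable log-sum-exp over all candidates
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_z - logits[i])  # -log softmax at the positive index
    return sum(losses) / len(losses)
```

When the two view lists are aligned (each positive pair shares an index), the loss is near zero; shuffling the second list raises it, which is the signal the pretraining stage exploits.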
6. Learning Top-K Subtask Planning Tree Based on Discriminative Representation Pretraining for Decision-making
Authors: Jingqing Ruan, Kaishen Wang, Qingyang Zhang, Dengpeng Xing, Bo Xu. Machine Intelligence Research (EI, CSCD), 2024, Issue 4, pp. 782-800 (19 pages)
Decomposing complex real-world tasks into simpler subtasks and devising a subtask execution plan is critical for humans to achieve effective decision-making. However, replicating this process remains challenging for AI agents and naturally raises two questions: (1) How to extract discriminative knowledge representation from priors? (2) How to develop a rational plan to decompose complex problems? To address these issues, we introduce a groundbreaking framework that incorporates two main contributions. First, our multiple-encoder and individual-predictor regime goes beyond traditional architectures to extract nuanced task-specific dynamics from datasets, enriching the feature space for subtasks. Second, we innovate in planning by introducing a top-K subtask planning tree generated through an attention mechanism, which allows for dynamic adaptability and forward-looking decision-making. Our framework is empirically validated against the challenging BabyAI benchmark, including multiple combinatorially rich synthetic tasks (e.g., GoToSeq, SynthSeq, BossLevel), where it not only outperforms competitive baselines but also demonstrates superior adaptability and effectiveness in complex task decomposition.
Keywords: Reinforcement learning; Representation learning; Subtask planning; Task decomposition; Pretraining
7. Pretraining Enhanced RNN Transducer
Authors: Junyu Lu, Rongzhong Lian, Di Jiang, Yuanfeng Song, Zhiyang Su, Victor Junqiu Wei, Lin Yang. CAAI Artificial Intelligence Research, 2024, Issue 1, pp. 74-81 (8 pages)
Recurrent neural network transducer (RNN-T) is an important branch of current end-to-end automatic speech recognition (ASR). Various promising approaches have been designed for boosting the RNN-T architecture; however, few studies exploit the effectiveness of pretrained methods in this framework. In this paper, we introduce the pretrained acoustic extractor (PAE) and the pretrained linguistic network (PLN) to enhance the Conformer long short-term memory (Conformer-LSTM) transducer. First, we construct the input of the acoustic encoder with two different latent representations: one extracted by PAE from the raw waveform, and the other obtained from filter-bank transformation. Second, we fuse an extra semantic feature from the PLN into the joint network to reduce illogical and homophonic errors. Compared with previous works, our approaches are able to obtain pretrained representations for better model generalization. Evaluation on two large-scale datasets has demonstrated that our proposed approaches yield better performance than existing approaches.
Keywords: Pretraining; Automatic speech recognition; Self-supervised learning
8. Enhanced Panoramic Image Generation with GAN and CLIP Models
Authors: Shilong Li, Qiang Zhao. Journal of Beijing Institute of Technology, 2025, Issue 1, pp. 91-101 (11 pages)
Panoramic images, offering a 360-degree view, are essential in virtual reality (VR) and augmented reality (AR), enhancing realism with high-quality textures. However, acquiring complete and high-quality panoramic textures is challenging. This paper introduces a method using generative adversarial networks (GANs) and the contrastive language-image pretraining (CLIP) model to restore and control texture in panoramic images. The GAN model captures complex structures and maintains consistency, while CLIP enables fine-grained texture control via semantic text-image associations. GAN inversion optimizes latent codes for precise texture details. The resulting low dynamic range (LDR) images are converted to high dynamic range (HDR) using the Blender engine for seamless texture blending. Experimental results demonstrate the effectiveness and flexibility of this method in panoramic texture restoration and generation.
Keywords: Panoramic images; Environment texture; Generative adversarial networks (GANs); Contrastive language-image pretraining (CLIP) model; Blender engine; Fine-grained control; Texture generation
9. Evaluating chat generative pretrained transformer in answering questions on endoscopic mucosal resection and endoscopic submucosal dissection
Authors: Shi-Song Wang, Hui Gao, Peng-Yao Lin, Tian-Chen Qian, Ying Du, Lei Xu. World Journal of Gastrointestinal Oncology, 2025, Issue 10, pp. 290-303 (14 pages)
BACKGROUND: With the rising use of endoscopic submucosal dissection (ESD) and endoscopic mucosal resection (EMR), patients are increasingly questioning various aspects of these endoscopic procedures. At the same time, conversational artificial intelligence (AI) tools like chat generative pretrained transformer (ChatGPT) are rapidly emerging as sources of medical information. AIM: To evaluate ChatGPT's reliability and usefulness regarding ESD and EMR for patients and healthcare professionals. METHODS: In this study, 30 specific questions related to ESD and EMR were identified. Then, these questions were repeatedly entered into ChatGPT, with two independent answers generated for each question. A Likert scale was used to rate the accuracy, completeness, and comprehensibility of the responses. Meanwhile, a binary category (high/low) was used to evaluate each aspect of the two responses generated by ChatGPT and the response retrieved from Google. RESULTS: By analyzing the average scores of the three raters, our findings indicated that the responses generated by ChatGPT received high ratings for accuracy (mean score of 5.14 out of 6), completeness (mean score of 2.34 out of 3), and comprehensibility (mean score of 2.96 out of 3). Kendall's coefficients of concordance indicated good agreement among raters (all P<0.05). For the responses generated by Google, more than half were classified by experts as having low accuracy and low completeness. CONCLUSION: ChatGPT provided accurate and reliable answers in response to questions about ESD and EMR. Future studies should address ChatGPT's current limitations by incorporating more detailed and up-to-date medical information. This could establish AI chatbots as a significant resource for both patients and healthcare professionals.
Keywords: Endoscopic submucosal dissection; Endoscopic mucosal resection; Artificial intelligence; Chat generative pretrained transformer; Patient education; Google
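The study reports Kendall's coefficient of concordance (W) to check agreement among the three raters. As a reminder of what that statistic computes, here is a minimal sketch assuming no tied ranks (a real analysis, including this study's, would need a tie correction):

```python
def kendalls_w(ratings):
    """Kendall's W for `ratings`: one list of scores per rater, one score
    per rated item. No tied scores within a rater are assumed."""
    m, n = len(ratings), len(ratings[0])
    # convert each rater's scores into ranks 1..n
    rank_rows = []
    for row in ratings:
        order = sorted(range(n), key=row.__getitem__)
        ranks = [0] * n
        for r, idx in enumerate(order, start=1):
            ranks[idx] = r
        rank_rows.append(ranks)
    # sum of ranks per item, then squared deviation from the mean sum
    totals = [sum(row[i] for row in rank_rows) for i in range(n)]
    mean_total = sum(totals) / n
    s = sum((t - mean_total) ** 2 for t in totals)
    # W = 12S / (m^2 (n^3 - n)); 1 = perfect agreement, 0 = none
    return 12 * s / (m * m * (n ** 3 - n))
```

Two raters ranking three answers identically give W = 1.0; opposite rankings give W = 0.0.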
10. Performance of GPT-4 for automated prostate biopsy decision-making based on mpMRI: a multi-center evidence study
Authors: Ming-Jun Shi, Zhi-Xiang Wang, Shuang-Kun Wang, Xuan-Hao Li, Yan-Lin Zhang, Ying Yan, Ran An, Li-Ning Dong, Lei Qiu, Tian Tian, Jia-Xin Liu, Hong-Chen Song, Ya-Fan Wang, Che Deng, Zi-Bing Cao, Hong-Yin Wang, Zheng Wang, Wei Wei, Jian Song, Jian Lu, Xuan Wei, Zhen-Chang Wang. Military Medical Research, 2025, Issue 11, pp. 1735-1746 (12 pages)
Background: Multiparametric magnetic resonance imaging (mpMRI) has significantly advanced prostate cancer (PCa) detection, yet decisions on invasive biopsy with moderate prostate imaging reporting and data system (PI-RADS) scores remain ambiguous. Methods: To explore the decision-making capacity of Generative Pretrained Transformer-4 (GPT-4) for automated prostate biopsy recommendations, we included 2299 individuals who underwent prostate biopsy from 2018 to 2023 in 3 large medical centers, with available mpMRI before biopsy and documented clinical-histopathological records. GPT-4 generated structured reports with given prompts. The performance of GPT-4 was quantified using confusion matrices, and sensitivity, specificity, and area under the curve were calculated. Multiple artificial evaluation procedures were conducted. Wilcoxon's rank sum test, Fisher's exact test, and Kruskal-Wallis tests were used for comparisons. Results: Utilizing the largest sample size in the Chinese population, patients with moderate PI-RADS scores (scores 3 and 4) accounted for 39.7% (912/2299), defined as the subset-of-interest (SOI). The detection rates of clinically significant PCa corresponding to PI-RADS scores 2-5 were 9.4%, 27.3%, 49.2%, and 80.1%, respectively. Nearly 47.5% (433/912) of SOI patients were histopathologically proven to have undergone unnecessary prostate biopsies. With the assistance of GPT-4, 20.8% (190/912) of the SOI population could avoid unnecessary biopsies, and it performed even better [28.8% (118/410)] in the most heterogeneous subgroup of PI-RADS score 3. More than 90.0% of GPT-4-generated reports were comprehensive and easy to understand, though satisfaction with accuracy was lower (82.8%). GPT-4 also demonstrated cognitive potential for handling complex problems. Additionally, the Chain of Thought method enabled us to better understand the decision-making logic behind GPT-4. Eventually, we developed a ProstAIGuide platform to facilitate accessibility for both doctors and patients. Conclusions: This multi-center study highlights the clinical utility of GPT-4 for prostate biopsy decision-making and advances our understanding of the latest artificial intelligence implementation in various medical scenarios.
Keywords: Prostate biopsy; Generative Pretrained Transformer-4 (GPT-4); Decision-making; Prostate cancer; Multiparametric magnetic resonance imaging (mpMRI)
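The study derives sensitivity and specificity from confusion matrices; the relationship can be sketched with invented counts (not the study's data):

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical example: of 100 clinically significant cancers, the model
# recommends biopsy for 80 (TP) and misses 20 (FN); of 100 benign cases,
# it spares 60 (TN) and still recommends biopsy for 40 (FP).
sens, spec = sensitivity_specificity(tp=80, fn=20, tn=60, fp=40)
```

Here specificity is what drives the paper's headline number: every true negative is an unnecessary biopsy avoided.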
11. Classification of Conversational Sentences Using an Ensemble Pre-Trained Language Model with the Fine-Tuned Parameter
Authors: R. Sujatha, K. Nimala. Computers, Materials & Continua (SCIE, EI), 2024, Issue 2, pp. 1669-1686 (18 pages)
Sentence classification is the process of categorizing a sentence based on its context. Sentence categorization requires more semantic highlights than other tasks, such as dependency parsing, which requires more syntactic elements. Most existing strategies focus on the general semantics of a conversation without involving the context of the sentence, recognizing the progress, and comparing impacts. An ensemble pre-trained language model is used here to classify the conversation sentences from the conversation corpus. The conversational sentences are classified into four categories: information, question, directive, and commission. These classification label sequences are used for analyzing the conversation progress and predicting the pecking order of the conversation. An ensemble of Bidirectional Encoder Representations from Transformers (BERT), Robustly Optimized BERT Pretraining Approach (RoBERTa), Generative Pre-Trained Transformer (GPT), DistilBERT, and Generalized Autoregressive Pretraining for Language Understanding (XLNet) models is trained on the conversation corpus with tuned hyperparameters. Hyperparameter tuning is carried out for better performance on sentence classification. This Ensemble of Pre-trained Language Models with Hyperparameter Tuning (EPLM-HT) system is trained on an annotated conversation dataset. The proposed approach outperformed the base BERT, GPT, DistilBERT, and XLNet transformer models. The proposed ensemble model with the fine-tuned parameters achieved an F1_score of 0.88.
Keywords: Bidirectional Encoder Representations from Transformers; Conversation; Ensemble model; Fine-tuning; Generalized autoregressive pretraining for language understanding; Generative pre-trained transformer; Hyperparameter tuning; Natural language processing; Robustly optimized BERT pretraining approach; Sentence classification; Transformer models
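The abstract does not state how the five models' predictions are combined. One common choice, assumed here purely for illustration, is soft voting (averaging class probabilities) over the four conversational classes the paper names:

```python
# Hypothetical sketch of the ensemble step: model outputs and the soft-voting
# rule are assumptions for illustration, not the paper's exact method.

LABELS = ["information", "question", "directive", "commission"]

def soft_vote(model_probs):
    """Average class-probability vectors from several classifiers and
    return (winning label, averaged probabilities)."""
    n_models = len(model_probs)
    n_classes = len(LABELS)
    avg = [sum(p[c] for p in model_probs) / n_models for c in range(n_classes)]
    return LABELS[max(range(n_classes), key=avg.__getitem__)], avg

# e.g. three fine-tuned models disagree on one sentence:
preds = [
    [0.10, 0.70, 0.15, 0.05],  # model 1 says "question"
    [0.40, 0.45, 0.10, 0.05],  # model 2 leans "question"
    [0.55, 0.30, 0.10, 0.05],  # model 3 says "information"
]
label, avg = soft_vote(preds)  # averaging favors "question"
```

Soft voting lets a confident minority be outvoted only when the majority's combined probability mass is larger, which tends to be more robust than hard majority voting on small ensembles.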
12. PAL-BERT: An Improved Question Answering Model
Authors: Wenfeng Zheng, Siyu Lu, Zhuohang Cai, Ruiyang Wang, Lei Wang, Lirong Yin. Computer Modeling in Engineering & Sciences (SCIE, EI), 2024, Issue 6, pp. 2729-2745 (17 pages)
In the field of natural language processing (NLP), there have been various pre-training language models in recent years, with question answering systems gaining significant attention. However, as algorithms, data, and computing power advance, the issue of increasingly larger models and a growing number of parameters has surfaced. Consequently, model training has become more costly and less efficient. To enhance the efficiency and accuracy of the training process while reducing the model volume, this paper proposes a first-order pruning model, PAL-BERT, based on the ALBERT model according to the characteristics of question-answering (QA) systems and language models. Firstly, a first-order network pruning method based on the ALBERT model is designed, and the PAL-BERT model is formed. Then, the parameter optimization strategy of the PAL-BERT model is formulated, and the Mish function is used as an activation function instead of ReLU to improve performance. Finally, after comparison experiments with the traditional deep learning models TextCNN and BiLSTM, it is confirmed that PAL-BERT is a pruning model compression method that can significantly reduce training time and optimize training efficiency. Compared with traditional models, PAL-BERT significantly improves performance on NLP tasks.
Keywords: PAL-BERT; Question answering model; Pretraining language models; ALBERT; Pruning model; Network pruning; TextCNN; BiLSTM
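The abstract calls PAL-BERT a "first-order" pruning method without defining the criterion. A common first-order choice, assumed here for illustration, is the Taylor importance score |w * dL/dw|: weights whose removal barely changes the loss, to first order, are the ones zeroed out.

```python
def first_order_prune(weights, grads, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest
    first-order Taylor importance |w * g|. Flat lists stand in for the
    model's parameter tensors; the scoring rule is an assumption."""
    scores = [abs(w * g) for w, g in zip(weights, grads)]
    k = int(len(weights) * sparsity)  # how many weights to remove
    cutoff = sorted(scores)[k - 1] if k else None
    pruned, removed = [], 0
    for w, s in zip(weights, scores):
        if removed < k and s <= cutoff:
            pruned.append(0.0)  # pruned connection
            removed += 1
        else:
            pruned.append(w)    # surviving connection
    return pruned
```

Unlike zeroth-order (magnitude) pruning, a large weight with a near-zero gradient-weight product can still be removed, since the loss is locally insensitive to it.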
13. Unlocking the Potential: A Comprehensive Systematic Review of ChatGPT in Natural Language Processing Tasks
Authors: Ebtesam Ahmad Alomari. Computer Modeling in Engineering & Sciences (SCIE, EI), 2024, Issue 10, pp. 43-85 (43 pages)
As Natural Language Processing (NLP) continues to advance, driven by the emergence of sophisticated large language models such as ChatGPT, there has been a notable growth in research activity. This rapid uptake reflects increasing interest in the field and induces critical inquiries into ChatGPT's applicability in the NLP domain. This review paper systematically investigates the role of ChatGPT in diverse NLP tasks, including information extraction, Named Entity Recognition (NER), event extraction, relation extraction, Part of Speech (PoS) tagging, text classification, sentiment analysis, emotion recognition, and text annotation. The novelty of this work lies in its comprehensive analysis of the existing literature, addressing a critical gap in understanding ChatGPT's adaptability, limitations, and optimal application. In this paper, we employed a systematic stepwise approach following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework to direct our search process and seek relevant studies. Our review reveals ChatGPT's significant potential in enhancing various NLP tasks. Its adaptability in information extraction tasks, sentiment analysis, and text classification showcases its ability to comprehend diverse contexts and extract meaningful details. Additionally, ChatGPT's flexibility in annotation tasks reduces manual effort and accelerates the annotation process, making it a valuable asset in NLP development and research. Furthermore, GPT-4 and prompt engineering emerge as a complementary mechanism, empowering users to guide the model and enhance overall accuracy. Despite its promising potential, challenges persist. ChatGPT's performance needs to be tested using more extensive datasets and diverse data structures. Subsequently, its limitations in handling domain-specific language and the need for fine-tuning in specific applications highlight the importance of further investigation to address these issues.
Keywords: Generative AI; Large language model (LLM); Natural language processing (NLP); ChatGPT; GPT (generative pretraining transformer); GPT-4; Sentiment analysis; NER; Information extraction; Annotation; Text classification
14. Automatic Keyphrase Extraction from Scientific Chinese Medical Abstracts Based on Character-Level Sequence Labeling (Cited by: 4)
Authors: Liangping Ding, Zhixiong Zhang, Huan Liu, Jie Li, Gaihong Yu. Journal of Data and Information Science (CSCD), 2021, Issue 3, pp. 35-57 (23 pages)
Purpose: Automatic keyphrase extraction (AKE) is an important task for grasping the main points of a text. In this paper, we aim to combine the benefits of the sequence labeling formulation and pretrained language models to propose an automatic keyphrase extraction model for Chinese scientific research. Design/methodology/approach: We regard AKE from Chinese text as a character-level sequence labeling task to avoid segmentation errors of Chinese tokenizers, and initialize our model with the pretrained language model BERT, which was released by Google in 2018. We collect data from the Chinese Science Citation Database and construct a large-scale dataset from the medical domain, which contains 100,000 abstracts as the training set, 6,000 abstracts as the development set, and 3,094 abstracts as the test set. We use unsupervised keyphrase extraction methods including term frequency (TF), TF-IDF, and TextRank, and supervised machine learning methods including Conditional Random Field (CRF), Bidirectional Long Short-Term Memory Network (BiLSTM), and BiLSTM-CRF as baselines. Experiments are designed to compare word-level and character-level sequence labeling approaches on supervised machine learning models and BERT-based models. Findings: Compared with character-level BiLSTM-CRF, the best baseline model with an F1 score of 50.16%, our character-level sequence labeling model based on BERT obtains an F1 score of 59.80%, a 9.64% absolute improvement. Research limitations: We only consider the automatic keyphrase extraction task rather than keyphrase generation, so only keyphrases that occur in the given text can be extracted. In addition, our proposed dataset is not suitable for dealing with nested keyphrases. Practical implications: We make our character-level IOB format dataset of Chinese Automatic Keyphrase Extraction from scientific Chinese medical abstracts (CAKE) publicly available for the benefit of the research community at: https://github.com/possible1402/Dataset-For-Chinese-Medical-Keyphrase-Extraction. Originality/value: By designing comparative experiments, our study demonstrates that the character-level formulation is more suitable for the Chinese automatic keyphrase extraction task under the general trend of pretrained language models. Our proposed dataset also provides a unified method for model evaluation and can promote the development of Chinese automatic keyphrase extraction to some extent.
Keywords: Automatic keyphrase extraction; Character-level sequence labeling; Pretrained language model; Scientific Chinese medical abstracts
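The character-level IOB formulation described above can be illustrated with a toy labeling function: each character receives B (begin), I (inside), or O (outside) depending on whether it starts or continues a gold keyphrase, with no word segmentation involved. The greedy matching below is an assumption; the paper's exact preprocessing is not given.

```python
def char_iob_tags(text, keyphrases):
    """Assign a B/I/O tag to every character of `text` based on exact
    occurrences of the gold keyphrases. Overlaps keep the first match."""
    tags = ["O"] * len(text)
    for phrase in keyphrases:
        start = text.find(phrase)
        while start != -1:
            # only tag a span whose characters are all still unlabeled
            if all(t == "O" for t in tags[start:start + len(phrase)]):
                tags[start] = "B"
                for i in range(start + 1, start + len(phrase)):
                    tags[i] = "I"
            start = text.find(phrase, start + 1)
    return tags

# e.g. tagging a tiny invented snippet for the keyphrases "肺癌" and "诊断":
tags = char_iob_tags("肺癌早期诊断研究", ["肺癌", "诊断"])
# → ["B", "I", "O", "O", "B", "I", "O", "O"]
```

Because the unit is the character, a tokenizer that wrongly merges or splits words can never shift a keyphrase boundary, which is the motivation the paper gives for the character-level setup.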
15. RoBGP: A Chinese Nested Biomedical Named Entity Recognition Model Based on RoBERTa and Global Pointer (Cited by: 3)
Authors: Xiaohui Cui, Chao Song, Dongmei Li, Xiaolong Qu, Jiao Long, Yu Yang, Hanchao Zhang. Computers, Materials & Continua (SCIE, EI), 2024, Issue 3, pp. 3603-3618 (16 pages)
Named Entity Recognition (NER) stands as a fundamental task within the field of biomedical text mining, aiming to extract specific types of entities such as genes, proteins, and diseases from complex biomedical texts and categorize them into predefined entity types. This process can provide basic support for the automatic construction of knowledge bases. In contrast to general texts, biomedical texts frequently contain numerous nested entities and local dependencies among these entities, presenting significant challenges to prevailing NER models. To address these issues, we propose a novel Chinese nested biomedical NER model based on RoBERTa and Global Pointer (RoBGP). Our model initially utilizes the RoBERTa-wwm-ext-large pretrained language model to dynamically generate word-level initial vectors. It then incorporates a Bidirectional Long Short-Term Memory network for capturing bidirectional semantic information, effectively addressing the issue of long-distance dependencies. Furthermore, the Global Pointer model is employed to comprehensively recognize all nested entities in the text. We conduct extensive experiments on the Chinese medical dataset CMeEE and the results demonstrate the superior performance of RoBGP over several baseline models. This research confirms the effectiveness of RoBGP in Chinese biomedical NER, providing reliable technical support for biomedical information extraction and knowledge base construction.
Keywords: Biomedicine; Knowledge base; Named entity recognition; Pretrained language model; Global pointer
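A span-scoring head such as Global Pointer handles nesting because every (start, end) pair is scored independently, so overlapping spans can both be kept at decoding time. The scores below are invented for illustration; a real model would produce one score matrix per entity type.

```python
def decode_spans(scores, threshold=0.0):
    """Keep every candidate span whose score clears the threshold,
    nested or not. `scores` maps (start, end) token indices to floats."""
    return sorted(
        (start, end) for (start, end), s in scores.items()
        if s > threshold and start <= end
    )

# Hypothetical token indices for "慢性 肾 功能 不全" (chronic renal insufficiency):
scores = {
    (0, 3): 2.1,   # the whole phrase: a disease entity
    (1, 2): 1.4,   # nested inside it: "kidney function"
    (2, 3): -0.8,  # low-scoring span, rejected by the threshold
}
# decode_spans(scores) keeps both the outer and the nested span
```

A BIO tagger would be forced to pick one of the two overlapping entities; independent span scoring is what lets RoBGP "comprehensively recognize all nested entities".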
16. Spatio-temporal intention learning for recommendation of next point-of-interest (Cited by: 2)
Authors: Hao Li, Peng Yue, Shangcheng Li, Chenxiao Zhang, Can Yang. Geo-Spatial Information Science (CSCD), 2024, Issue 2, pp. 384-397 (14 pages)
Next point-of-interest (POI) recommendation has been applied by many internet companies to enhance the user travel experience. Recent research advocates deep-learning methods to model long-term check-in sequences and mine mobility patterns of people to improve recommendation performance. Existing approaches model general user preferences based on historical check-ins and can be termed preference pattern models. The preference pattern differs from the intention pattern in that it does not emphasize the user mobility pattern of revisiting POIs, which is a common behavior and a kind of intention for users. An effective module is needed to predict when and where users will repeat visits. In this paper, we propose a Spatio-Temporal Intention Learning Self-Attention Network (STILSAN) for next POI recommendation. STILSAN employs a preference-intention module to capture the user's long-term preference and recognizes the user's intention to revisit some specific POIs at a specific time. Meanwhile, we design a spatial encoder module as a pretrained model for learning POI spatial features by simulating the spatial clustering phenomenon and the spatial proximity of the POIs. Experiments are conducted on two real-world check-in datasets. The experimental results demonstrate that all the proposed modules can effectively improve recommendation accuracy and that STILSAN yields outstanding improvements over the state-of-the-art models.
Keywords: point-of-interest (POI), recommendation, spatial pretrained model, self-attention, revisiting intention
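The revisit-intention signal can be pictured with a toy scoring rule: a POI is a likely revisit target if it was visited often, recently, and at hours close to the query time. This heuristic stand-in (the decay and time-affinity formulas are assumptions, not STILSAN's learned self-attention module) only illustrates the signal the paper's intention module is designed to learn:

```python
def revisit_scores(checkins, query_hour, decay=0.9):
    """Score each previously visited POI for a revisit at query_hour.

    checkins: list of (poi_id, hour) pairs, oldest first.  A POI scores
    higher the more often it was visited (frequency), the more recently
    it was visited (exponential decay over visit age), and the closer
    its past visit hours are to the query hour (circular 24 h distance).
    """
    scores = {}
    for age, (poi, hour) in enumerate(reversed(checkins)):
        diff = min(abs(hour - query_hour), 24 - abs(hour - query_hour))
        affinity = 1.0 / (1.0 + diff)            # 1.0 at the same hour
        scores[poi] = scores.get(poi, 0.0) + decay ** age * affinity
    return scores

# Two morning cafe visits dominate one evening gym visit at 8 a.m.
s = revisit_scores([("cafe", 8), ("gym", 18), ("cafe", 9)], query_hour=8)
print(max(s, key=s.get))  # cafe
```

A learned model replaces these hand-set weights with attention over the check-in sequence, but the frequency/recency/time structure it exploits is the same.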
An Efficient and Robust Hand Gesture Recognition System of Sign Language Employing Finetuned Inception-V3 and Efficientnet-B0 Network (Cited by 1)
17
Authors: Adnan Hussain, Sareer Ul Amin, Muhammad Fayaz, Sanghyun Seo. Computer Systems Science & Engineering (SCIE/EI), 2023, Issue 9, pp. 3509-3525.
Hand Gesture Recognition (HGR) is a promising research area with an extensive range of applications, such as surgery, video game techniques, and sign language translation, where sign language is a complicated structured form of hand gestures. The fundamental building blocks of structured expressions in sign language are the arrangement of the fingers, the orientation of the hand, and the hand's position relative to the body. The importance of HGR has increased due to the growing number of touchless applications and the rapid growth of the hearing-impaired population. Therefore, real-time HGR is one of the most effective interaction methods between computers and humans. Developing a user-friendly interface with good recognition performance should be the goal of real-time HGR systems. Nowadays, Convolutional Neural Networks (CNNs) show great recognition rates for different image-level classification tasks. It is challenging to train deep CNN networks such as VGG-16, VGG-19, Inception-v3, and EfficientNet-B0 from scratch because few sizable labeled image datasets are available for static hand gesture images. Hence, an efficient and robust hand gesture recognition system for sign language employing fine-tuned Inception-v3 and EfficientNet-B0 networks is proposed to identify hand gestures using a comparatively small HGR dataset. Experiments show that Inception-v3 achieved 90% accuracy with 0.93 precision, 0.91 recall, and 0.90 F1-score, while EfficientNet-B0 achieved 99% accuracy with 0.98 precision, 0.97 recall, and 0.98 F1-score.
Keywords: pretrained CNN, hand gesture recognition, transfer learning
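The per-class scores reported above follow the standard definitions of precision, recall, and F1. A minimal sketch with made-up gesture labels (the label names and predictions are hypothetical, not the paper's data):

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Per-class precision, recall, and F1, the metrics reported for each
    model in the paper (e.g. EfficientNet-B0: 0.98 precision, 0.97 recall)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy gesture predictions: one "palm" frame is mistaken for "fist",
# so "fist" gets precision 2/3, recall 1.0, F1 0.8.
y_true = ["fist", "palm", "fist", "palm"]
y_pred = ["fist", "fist", "fist", "palm"]
print(precision_recall_f1(y_true, y_pred, "fist"))
```

Averaging these per-class values over all gesture classes gives the macro scores typically quoted alongside accuracy.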
LKMT:Linguistics Knowledge-Driven Multi-Task Neural Machine Translation for Urdu and English
18
Authors: Muhammad Naeem Ul Hassan, Zhengtao Yu, Jian Wang, Ying Li, Shengxiang Gao, Shuwan Yang, Cunli Mao. Computers, Materials & Continua (SCIE/EI), 2024, Issue 10, pp. 951-969.
Thanks to the strong representation capability of pre-trained language models, supervised machine translation models have achieved outstanding performance. However, the performance of these models drops sharply when the scale of the parallel training corpus is limited. Since pre-trained language models have a strong ability for monolingual representation, the key challenge for machine translation is to construct an in-depth relationship between the source and target languages by injecting lexical and syntactic information into pre-trained language models. To alleviate the dependence on the parallel corpus, we propose a Linguistics Knowledge-Driven Multi-Task (LKMT) approach that injects part-of-speech and syntactic knowledge into pre-trained models, thus enhancing machine translation performance. On the one hand, we integrate part-of-speech and dependency labels into the embedding layer and exploit a large-scale monolingual corpus to update all parameters of the pre-trained language model, ensuring that the updated model contains the latent lexical and syntactic information. On the other hand, we leverage an extra self-attention layer to explicitly inject linguistic knowledge into the machine translation model enhanced by the pre-trained language model. Experiments on the benchmark dataset show that our proposed LKMT approach improves Urdu-English translation accuracy by 1.97 points and English-Urdu translation accuracy by 2.42 points, highlighting the effectiveness of the LKMT framework. Detailed ablation experiments confirm the positive impact of part-of-speech and dependency parsing on machine translation.
Keywords: Urdu NMT (neural machine translation), Urdu natural language processing, Urdu linguistic features, low-resource language, pretrained model
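The embedding-layer injection can be pictured as summing a word embedding with part-of-speech and dependency-label embeddings, much as BERT sums token, position, and segment embeddings. A sketch with randomly initialized tables (the table sizes, dimension, and ids are illustrative assumptions, not LKMT's actual configuration):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                 # toy embedding dimension
tok_emb = rng.normal(size=(100, d))   # hypothetical token vocabulary
pos_emb = rng.normal(size=(17, d))    # hypothetical POS tagset
dep_emb = rng.normal(size=(40, d))    # hypothetical dependency labelset

def embed(token_ids, pos_ids, dep_ids):
    """Embedding layer that injects lexical and syntactic knowledge by
    summing word, part-of-speech, and dependency-label embeddings for
    each position; all three tables are trained jointly downstream."""
    return tok_emb[token_ids] + pos_emb[pos_ids] + dep_emb[dep_ids]

# Two tokens, each with a POS tag and a dependency label.
x = embed(np.array([5, 9]), np.array([1, 3]), np.array([7, 2]))
print(x.shape)  # (2, 8)
```

Because the sum keeps the embedding dimension unchanged, the enriched vectors drop into the rest of the pre-trained encoder without architectural changes.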
Cross-Sensor Generative Self-Supervised Learning Network for Fault Detection Under Few Samples
19
Authors: ZHU Huijuan, ZHAO Yunbo, YAN Xiaohui, KANG Yu, LIU Binkun. Journal of Systems Science & Complexity, 2025, Issue 3, pp. 1000-1020.
In this paper, a cross-sensor generative self-supervised learning network is proposed for multi-sensor fault detection. By modeling the sensor signals in multiple dimensions, the pretext task mines correlation information between channels, so that the shared features of multi-sensor data can be captured and the gap between channel-level data features is reduced. Meanwhile, to model fault features in the downstream task, a salience module is developed that optimizes cross-sensor data features based on a small amount of labeled data, making warning feature information prominent and improving separator accuracy. Finally, experimental results on the public FEMTO-ST dataset and the private SMT shock absorber dataset (SMT-SA) show that the proposed method performs favorably against other state-of-the-art methods.
Keywords: fault detection, generative self-supervised learning, multi-dimension, cross-sensor, multi-sensor, pretraining
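The cross-sensor pretext task rests on the premise that channels of the same machine share structure, so one channel can be reconstructed from the others. A toy stand-in using least squares in place of the paper's generative network (the signal shapes, noise level, and seed are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical multi-sensor record: 3 correlated channels, 200 time steps,
# each observing the same underlying signal with independent noise.
base = rng.normal(size=200)
signals = np.stack([base + 0.1 * rng.normal(size=200) for _ in range(3)])

def pretext_reconstruction_error(signals, masked=0):
    """Cross-sensor pretext task: hide one channel and reconstruct it
    from the remaining channels by least squares.  A low error means the
    channels share structure an encoder can exploit during pretraining."""
    target = signals[masked]
    inputs = np.delete(signals, masked, axis=0).T        # (T, C-1)
    coef, *_ = np.linalg.lstsq(inputs, target, rcond=None)
    recon = inputs @ coef
    return float(np.mean((recon - target) ** 2))

err = pretext_reconstruction_error(signals)
print(err < 0.05)  # True: strongly correlated channels reconstruct well
```

The network in the paper learns a nonlinear version of this mapping; the labeled-data salience module then refines the resulting features for fault separation.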
Swin3D: A pretrained transformer backbone for 3D indoor scene understanding
20
Authors: Yu-Qi Yang, Yu-Xiao Guo, Jian-Yu Xiong, Yang Liu, Hao Pan, Peng-Shuai Wang, Xin Tong, Baining Guo. Computational Visual Media, 2025, Issue 1, pp. 83-101.
The use of pretrained backbones with fine-tuning has shown success for 2D vision and natural language processing tasks, with advantages over task-specific networks. In this paper, we introduce a pretrained 3D backbone, called Swin3D, for 3D indoor scene understanding. We designed a 3D Swin Transformer as our backbone network, which enables efficient self-attention on sparse voxels with linear memory complexity, making the backbone scalable to large models and datasets. We also introduce a generalized contextual relative positional embedding scheme to capture various irregularities of point signals for improved network performance. We pretrained a large Swin3D model on the synthetic Structured3D dataset, which is an order of magnitude larger than the ScanNet dataset. Our model pretrained on the synthetic dataset not only generalizes well to downstream segmentation and detection on real 3D point datasets but also outperforms state-of-the-art methods on downstream tasks, with +2.3 mIoU and +2.2 mIoU on S3DIS Area 5 and 6-fold semantic segmentation, respectively, +1.8 mIoU on ScanNet segmentation (val), +1.9 mAP@0.5 on ScanNet detection, and +8.1 mAP@0.5 on S3DIS detection. A series of extensive ablation studies further validated the scalability, generality, and superior performance enabled by our approach.
Keywords: 3D pretraining, point cloud analysis, transformer backbone, Swin Transformer, 3D semantic segmentation, 3D object detection
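A simplified, non-contextual version of the relative positional embedding idea: each voxel pair in an attention window looks up a bias from a table indexed by its clipped relative (dx, dy, dz) offset. The window contents and table values below are illustrative stand-ins; Swin3D's actual scheme is contextual and learned:

```python
import numpy as np

def relative_position_bias(coords, table, max_offset=2):
    """Look up an attention bias for every voxel pair in a window from a
    table indexed by the pair's clipped relative (dx, dy, dz) offset.
    coords: (N, 3) integer voxel coordinates; table: flat array of size
    (2 * max_offset + 1) ** 3.  Returns an (N, N) bias matrix to add to
    the attention logits."""
    rel = coords[:, None, :] - coords[None, :, :]          # (N, N, 3)
    rel = np.clip(rel, -max_offset, max_offset) + max_offset
    size = 2 * max_offset + 1
    idx = (rel[..., 0] * size + rel[..., 1]) * size + rel[..., 2]
    return table[idx]                                      # (N, N)

size = 5
table = np.arange(size ** 3, dtype=float)  # stand-in for learned parameters
coords = np.array([[0, 0, 0], [1, 0, 0], [0, 2, 0]])
bias = relative_position_bias(coords, table)
print(bias.shape)  # (3, 3)
```

Because the table is indexed by offset rather than absolute position, the same parameters serve every window, which is what keeps the scheme cheap on sparse voxels.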