期刊文献+
共找到32篇文章
< 1 2 >
每页显示 20 50 100
Multimodal Pretrained Knowledge for Real-world Object Navigation
1
作者 Hui Yuan Yan Huang +4 位作者 Naigong Yu Dongbo Zhang Zetao Du Ziqi Liu Kun Zhang 《Machine Intelligence Research》 2025年第4期713-729,共17页
Most visual-language navigation(VLN)research focuses on simulate environments,but applying these methods to real-world scenarios is challenging because of misalignments between vision and language in complex environme... Most visual-language navigation(VLN)research focuses on simulate environments,but applying these methods to real-world scenarios is challenging because of misalignments between vision and language in complex environments,leading to path deviations.To address this,we propose a novel vision-and-language object navigation strategy that uses multimodal pretrained knowledge as a cross-modal bridge to link semantic concepts in both images and text.This improves navigation supervision at key-points and enhances robustness.Specifically,we 1)randomly generate key-points within a specific density range and optimize them on the basis of challenging locations;2)use pretrained multimodal knowledge to efficiently retrieve target objects;3)combine depth information with simultaneous localization and mapping(SLAM)map data to predict optimal positions and orientations for accurate navigation;and 4)implement the method on a physical robot,successfully conducting navigation tests.Our approach achieves a maximum success rate of 66.7%,outperforming existing VLN methods in real-world environments. 展开更多
关键词 Visual-and-language object navigation key-points multimodal pretrained knowledge optimal positions and orientations physical robot
原文传递
Swin3D: A pretrained transformer backbone for 3D indoor scene understanding
2
作者 Yu-Qi Yang Yu-Xiao Guo +5 位作者 Jian-Yu Xiong Yang Liu Hao Pan Peng-Shuai Wang Xin Tong Baining Guo 《Computational Visual Media》 2025年第1期83-101,共19页
The use of pretrained backbones with finetuning has shown success for 2D vision and natural language processing tasks,with advantages over taskspecific networks.In this paper,we introduce a pretrained 3D backbone,call... The use of pretrained backbones with finetuning has shown success for 2D vision and natural language processing tasks,with advantages over taskspecific networks.In this paper,we introduce a pretrained 3D backbone,called Swin3D,for 3D indoor scene understanding.We designed a 3D Swin Transformer as our backbone network,which enables efficient selfattention on sparse voxels with linear memory complexity,making the backbone scalable to large models and datasets.We also introduce a generalized contextual relative positional embedding scheme to capture various irregularities of point signals for improved network performance.We pretrained a large Swin3D model on a synthetic Structured3D dataset,which is an order of magnitude larger than the ScanNet dataset.Our model pretrained on the synthetic dataset not only generalizes well to downstream segmentation and detection on real 3D point datasets but also outperforms state-of-the-art methods on downstream tasks with+2.3 mIoU and+2.2 mIoU on S3DIS Area5 and 6-fold semantic segmentation,respectively,+1.8 mIoU on ScanNet segmentation(val),+1.9 mAP@0.5 on ScanNet detection,and+8.1 mAP@0.5 on S3DIS detection.A series of extensive ablation studies further validated the scalability,generality,and superior performance enabled by our approach. 展开更多
关键词 3D pretraining ponitcloud analysis trans-former backbone Swin Transformer 3D semantic segmentation 3D object detection
原文传递
Evaluating chat generative pretrained transformer in answering questions on endoscopic mucosal resection and endoscopic submucosal dissection
3
作者 Shi-Song Wang Hui Gao +3 位作者 Peng-Yao Lin Tian-Chen Qian Ying Du Lei Xu 《World Journal of Gastrointestinal Oncology》 2025年第10期290-303,共14页
BACKGROUND With the rising use of endoscopic submucosal dissection(ESD)and endoscopic mucosal resection(EMR),patients are increasingly questioning various aspects of these endoscopic procedures.At the same time,conver... BACKGROUND With the rising use of endoscopic submucosal dissection(ESD)and endoscopic mucosal resection(EMR),patients are increasingly questioning various aspects of these endoscopic procedures.At the same time,conversational artificial intelligence(AI)tools like chat generative pretrained transformer(ChatGPT)are rapidly emerging as sources of medical information.AIM To evaluate ChatGPT’s reliability and usefulness regarding ESD and EMR for patients and healthcare professionals.METHODS In this study,30 specific questions related to ESD and EMR were identified.Then,these questions were repeatedly entered into ChatGPT,with two independent answers generated for each question.A Likert scale was used to rate the accuracy,completeness,and comprehensibility of the responses.Meanwhile,a binary category(high/Low)was used to evaluate each aspect of the two responses generated by ChatGPT and the response retrieved from Google.RESULTS By analyzing the average scores of the three raters,our findings indicated that the responses generated by ChatGPT received high ratings for accuracy(mean score of 5.14 out of 6),completeness(mean score of 2.34 out of 3),and comprehensibility(mean score of 2.96 out of 3).Kendall’s coefficients of concordance indicated good agreement among raters(all P<0.05).For the responses generated by Google,more than half were classified by experts as having low accuracy and low completeness.CONCLUSION ChatGPT provided accurate and reliable answers in response to questions about ESD and EMR.Future studies should address ChatGPT’s current limitations by incorporating more detailed and up-to-date medical information.This could establish AI chatbots as significant resource for both patients and health care professionals. 展开更多
关键词 Endoscopic submucosal dissection Endoscopic mucosal dissection Artificial intelligence Chat generative pretrained transformer Patient education Google
暂未订购
Generative pretrained transformer 4:an innovative approach to facilitate value-based healthcare
4
作者 Han Lyu Zhixiang Wang +6 位作者 Jia Li Jing Sun Xinghao Wang Pengling Ren Linkun Cai Zhenchang Wang Max Wintermark 《Intelligent Medicine》 EI CSCD 2024年第1期10-15,共6页
Objective Appropriate medical imaging is important for value-based care.We aim to evaluate the performance of generative pretrained transformer 4(GPT-4),an innovative natural language processing model,providing approp... Objective Appropriate medical imaging is important for value-based care.We aim to evaluate the performance of generative pretrained transformer 4(GPT-4),an innovative natural language processing model,providing appropriate medical imaging automatically in different clinical scenarios.Methods Institutional Review Boards(IRB)approval was not required due to the use of nonidentifiable data.Instead,we used 112 questions from the American College of Radiology(ACR)Radiology-TEACHES Program as prompts,which is an open-sourced question and answer program to guide appropriate medical imaging.We included 69 free-text case vignettes and 43 simplified cases.For the performance evaluation of GPT-4 and GPT-3.5,we considered the recommendations of ACR guidelines as the gold standard,and then three radiologists analyzed the consistency of the responses from the GPT models with those of the ACR.We set a five-score criterion for the evaluation of the consistency.A paired t-test was applied to assess the statistical significance of the findings.Results For the performance of the GPT models in free-text case vignettes,the accuracy of GPT-4 was 92.9%,whereas the accuracy of GPT-3.5 was just 78.3%.GPT-4 can provide more appropriate suggestions to reduce the overutilization of medical imaging than GPT-3.5(t=3.429,P=0.001).For the performance of the GPT models in simplified scenarios,the accuracy of GPT-4 and GPT-3.5 was 66.5%and 60.0%,respectively.The differences were not statistically significant(t=1.858,P=0.070).GPT-4 was characterized by longer reaction times(27.1 s in average)and extensive responses(137.1 words on average)than GPT-3.5.Conclusion As an advanced tool for improving value-based healthcare in clinics,GPT-4 may guide appropriate medical imaging accurately and efficiently。 展开更多
关键词 Generative pretrained transformer 4 model Natural language processing Medical imaging APPROPRIATENESS
原文传递
Pretrained Models and Evaluation Data for the Khmer Language
5
作者 Shengyi Jiang Sihui Fu +1 位作者 Nankai Lin Yingwen Fu 《Tsinghua Science and Technology》 SCIE EI CAS CSCD 2022年第4期709-718,共10页
Trained on a large corpus,pretrained models(PTMs)can capture different levels of concepts in context and hence generate universal language representations,which greatly benefit downstream natural language processing(N... Trained on a large corpus,pretrained models(PTMs)can capture different levels of concepts in context and hence generate universal language representations,which greatly benefit downstream natural language processing(NLP)tasks.In recent years,PTMs have been widely used in most NLP applications,especially for high-resource languages,such as English and Chinese.However,scarce resources have discouraged the progress of PTMs for low-resource languages.Transformer-based PTMs for the Khmer language are presented in this work for the first time.We evaluate our models on two downstream tasks:Part-of-speech tagging and news categorization.The dataset for the latter task is self-constructed.Experiments demonstrate the effectiveness of the Khmer models.In addition,we find that the current Khmer word segmentation technology does not aid performance improvement.We aim to release our models and datasets to the community in hopes of facilitating the future development of Khmer NLP applications. 展开更多
关键词 pretrained models Khmer language word segmentation part-of-speech(POS)tagging news categorization
原文传递
Enhanced Panoramic Image Generation with GAN and CLIP Models
6
作者 Shilong Li Qiang Zhao 《Journal of Beijing Institute of Technology》 2025年第1期91-101,共11页
Panoramic images, offering a 360-degree view, are essential in virtual reality(VR) and augmented reality(AR), enhancing realism with high-quality textures. However, acquiring complete and high-quality panoramic textur... Panoramic images, offering a 360-degree view, are essential in virtual reality(VR) and augmented reality(AR), enhancing realism with high-quality textures. However, acquiring complete and high-quality panoramic textures is challenging. This paper introduces a method using generative adversarial networks(GANs) and the contrastive language-image pretraining(CLIP) model to restore and control texture in panoramic images. The GAN model captures complex structures and maintains consistency, while CLIP enables fine-grained texture control via semantic text-image associations. GAN inversion optimizes latent codes for precise texture details. The resulting low dynamic range(LDR) images are converted to high dynamic range(HDR) using the Blender engine for seamless texture blending. Experimental results demonstrate the effectiveness and flexibility of this method in panoramic texture restoration and generation. 展开更多
关键词 panoramic images environment texture generative adversarial networks(GANs) contrastive language-image pretraining(CLIP)model blender engine fine-grained control texture generation
在线阅读 下载PDF
RoBGP:A Chinese Nested Biomedical Named Entity Recognition Model Based on RoBERTa and Global Pointer 被引量:3
7
作者 Xiaohui Cui Chao Song +4 位作者 Dongmei Li Xiaolong Qu Jiao Long Yu Yang Hanchao Zhang 《Computers, Materials & Continua》 SCIE EI 2024年第3期3603-3618,共16页
Named Entity Recognition(NER)stands as a fundamental task within the field of biomedical text mining,aiming to extract specific types of entities such as genes,proteins,and diseases from complex biomedical texts and c... Named Entity Recognition(NER)stands as a fundamental task within the field of biomedical text mining,aiming to extract specific types of entities such as genes,proteins,and diseases from complex biomedical texts and categorize them into predefined entity types.This process can provide basic support for the automatic construction of knowledge bases.In contrast to general texts,biomedical texts frequently contain numerous nested entities and local dependencies among these entities,presenting significant challenges to prevailing NER models.To address these issues,we propose a novel Chinese nested biomedical NER model based on RoBERTa and Global Pointer(RoBGP).Our model initially utilizes the RoBERTa-wwm-ext-large pretrained language model to dynamically generate word-level initial vectors.It then incorporates a Bidirectional Long Short-Term Memory network for capturing bidirectional semantic information,effectively addressing the issue of long-distance dependencies.Furthermore,the Global Pointer model is employed to comprehensively recognize all nested entities in the text.We conduct extensive experiments on the Chinese medical dataset CMeEE and the results demonstrate the superior performance of RoBGP over several baseline models.This research confirms the effectiveness of RoBGP in Chinese biomedical NER,providing reliable technical support for biomedical information extraction and knowledge base construction. 展开更多
关键词 BIOMEDICINE knowledge base named entity recognition pretrained language model global pointer
在线阅读 下载PDF
Spatio-temporal intention learning for recommendation of next point-of-interest 被引量:2
8
作者 Hao Li Peng Yue +2 位作者 Shangcheng Li Chenxiao Zhang Can Yang 《Geo-Spatial Information Science》 CSCD 2024年第2期384-397,共14页
Next point-of-interest(POI)recommendation has been applied by many internet companies to enhance the user travel experience.Recent research advocates deep-learning methods to model long-term check-in sequences and min... Next point-of-interest(POI)recommendation has been applied by many internet companies to enhance the user travel experience.Recent research advocates deep-learning methods to model long-term check-in sequences and mine mobility patterns of people to improve recommendation performance.Existing approaches model general user preferences based on historical check-ins and can be termed as preference pattern models.The preference pattern is different from the intention pattern,in that it does not emphasize the user mobility pattern of revisiting POIs,which is a common behavior and kind of intention for users.An effective module is needed to predict when and where users will repeat visits.In this paper,we propose a Spatio-Temporal Intention Learning Self-Attention Network(STILSAN)for next POI recommendation.STILSAN employs a preference-intention module to capture the user’s long-term preference and recognizes the user’s intention to revisit some specific POIs at a specific time.Meanwhile,we design a spatial encoder module as a pretrained model for learning POI spatial feature by simulating the spatial clustering phenomenon and the spatial proximity of the POIs.Experiments are conducted on two real-world check-in datasets.The experimental results demonstrate that all the proposed modules can effectively improve recommendation accuracy and STILSAN yields outstanding improvements over the state-of-the-art models. 展开更多
关键词 Point-of-Interest(POI) RECOMMENDATION spatial pretrained model selfattention revisiting intention
原文传递
Classification of Conversational Sentences Using an Ensemble Pre-Trained Language Model with the Fine-Tuned Parameter
9
作者 R.Sujatha K.Nimala 《Computers, Materials & Continua》 SCIE EI 2024年第2期1669-1686,共18页
Sentence classification is the process of categorizing a sentence based on the context of the sentence.Sentence categorization requires more semantic highlights than other tasks,such as dependence parsing,which requir... Sentence classification is the process of categorizing a sentence based on the context of the sentence.Sentence categorization requires more semantic highlights than other tasks,such as dependence parsing,which requires more syntactic elements.Most existing strategies focus on the general semantics of a conversation without involving the context of the sentence,recognizing the progress and comparing impacts.An ensemble pre-trained language model was taken up here to classify the conversation sentences from the conversation corpus.The conversational sentences are classified into four categories:information,question,directive,and commission.These classification label sequences are for analyzing the conversation progress and predicting the pecking order of the conversation.Ensemble of Bidirectional Encoder for Representation of Transformer(BERT),Robustly Optimized BERT pretraining Approach(RoBERTa),Generative Pre-Trained Transformer(GPT),DistilBERT and Generalized Autoregressive Pretraining for Language Understanding(XLNet)models are trained on conversation corpus with hyperparameters.Hyperparameter tuning approach is carried out for better performance on sentence classification.This Ensemble of Pre-trained Language Models with a Hyperparameter Tuning(EPLM-HT)system is trained on an annotated conversation dataset.The proposed approach outperformed compared to the base BERT,GPT,DistilBERT and XLNet transformer models.The proposed ensemble model with the fine-tuned parameters achieved an F1_score of 0.88. 展开更多
关键词 Bidirectional encoder for representation of transformer conversation ensemble model fine-tuning generalized autoregressive pretraining for language understanding generative pre-trained transformer hyperparameter tuning natural language processing robustly optimized BERT pretraining approach sentence classification transformer models
在线阅读 下载PDF
PAL-BERT:An Improved Question Answering Model
10
作者 Wenfeng Zheng Siyu Lu +3 位作者 Zhuohang Cai Ruiyang Wang Lei Wang Lirong Yin 《Computer Modeling in Engineering & Sciences》 SCIE EI 2024年第6期2729-2745,共17页
In the field of natural language processing(NLP),there have been various pre-training language models in recent years,with question answering systems gaining significant attention.However,as algorithms,data,and comput... In the field of natural language processing(NLP),there have been various pre-training language models in recent years,with question answering systems gaining significant attention.However,as algorithms,data,and computing power advance,the issue of increasingly larger models and a growing number of parameters has surfaced.Consequently,model training has become more costly and less efficient.To enhance the efficiency and accuracy of the training process while reducing themodel volume,this paper proposes a first-order pruningmodel PAL-BERT based on the ALBERT model according to the characteristics of question-answering(QA)system and language model.Firstly,a first-order network pruning method based on the ALBERT model is designed,and the PAL-BERT model is formed.Then,the parameter optimization strategy of the PAL-BERT model is formulated,and the Mish function was used as an activation function instead of ReLU to improve the performance.Finally,after comparison experiments with traditional deep learning models TextCNN and BiLSTM,it is confirmed that PALBERT is a pruning model compression method that can significantly reduce training time and optimize training efficiency.Compared with traditional models,PAL-BERT significantly improves the NLP task’s performance. 展开更多
关键词 PAL-BERT question answering model pretraining language models ALBERT pruning model network pruning TextCNN BiLSTM
在线阅读 下载PDF
Unlocking the Potential:A Comprehensive Systematic Review of ChatGPT in Natural Language Processing Tasks
11
作者 Ebtesam Ahmad Alomari 《Computer Modeling in Engineering & Sciences》 SCIE EI 2024年第10期43-85,共43页
As Natural Language Processing(NLP)continues to advance,driven by the emergence of sophisticated large language models such as ChatGPT,there has been a notable growth in research activity.This rapid uptake reflects in... As Natural Language Processing(NLP)continues to advance,driven by the emergence of sophisticated large language models such as ChatGPT,there has been a notable growth in research activity.This rapid uptake reflects increasing interest in the field and induces critical inquiries into ChatGPT’s applicability in the NLP domain.This review paper systematically investigates the role of ChatGPT in diverse NLP tasks,including information extraction,Name Entity Recognition(NER),event extraction,relation extraction,Part of Speech(PoS)tagging,text classification,sentiment analysis,emotion recognition and text annotation.The novelty of this work lies in its comprehensive analysis of the existing literature,addressing a critical gap in understanding ChatGPT’s adaptability,limitations,and optimal application.In this paper,we employed a systematic stepwise approach following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses(PRISMA)framework to direct our search process and seek relevant studies.Our review reveals ChatGPT’s significant potential in enhancing various NLP tasks.Its adaptability in information extraction tasks,sentiment analysis,and text classification showcases its ability to comprehend diverse contexts and extract meaningful details.Additionally,ChatGPT’s flexibility in annotation tasks reducesmanual efforts and accelerates the annotation process,making it a valuable asset in NLP development and research.Furthermore,GPT-4 and prompt engineering emerge as a complementary mechanism,empowering users to guide the model and enhance overall accuracy.Despite its promising potential,challenges persist.The performance of ChatGP Tneeds tobe testedusingmore extensivedatasets anddiversedata structures.Subsequently,its limitations in handling domain-specific language and the need for fine-tuning in specific applications highlight the importance of further investigations to address these issues. 展开更多
关键词 Generative AI large languagemodel(LLM) natural language processing(NLP) ChatGPT GPT(generative pretraining transformer) GPT-4 sentiment analysis NER information extraction ANNOTATION text classification
在线阅读 下载PDF
LKMT:Linguistics Knowledge-Driven Multi-Task Neural Machine Translation for Urdu and English
12
作者 Muhammad Naeem Ul Hassan Zhengtao Yu +4 位作者 Jian Wang Ying Li Shengxiang Gao Shuwan Yang Cunli Mao 《Computers, Materials & Continua》 SCIE EI 2024年第10期951-969,共19页
Thanks to the strong representation capability of pre-trained language models,supervised machine translation models have achieved outstanding performance.However,the performances of these models drop sharply when the ... Thanks to the strong representation capability of pre-trained language models,supervised machine translation models have achieved outstanding performance.However,the performances of these models drop sharply when the scale of the parallel training corpus is limited.Considering the pre-trained language model has a strong ability for monolingual representation,it is the key challenge for machine translation to construct the in-depth relationship between the source and target language by injecting the lexical and syntactic information into pre-trained language models.To alleviate the dependence on the parallel corpus,we propose a Linguistics Knowledge-Driven MultiTask(LKMT)approach to inject part-of-speech and syntactic knowledge into pre-trained models,thus enhancing the machine translation performance.On the one hand,we integrate part-of-speech and dependency labels into the embedding layer and exploit large-scale monolingual corpus to update all parameters of pre-trained language models,thus ensuring the updated language model contains potential lexical and syntactic information.On the other hand,we leverage an extra self-attention layer to explicitly inject linguistic knowledge into the pre-trained language model-enhanced machine translation model.Experiments on the benchmark dataset show that our proposed LKMT approach improves the Urdu-English translation accuracy by 1.97 points and the English-Urdu translation accuracy by 2.42 points,highlighting the effectiveness of our LKMT framework.Detailed ablation experiments confirm the positive impact of part-of-speech and dependency parsing on machine translation. 展开更多
关键词 Urdu NMT(neural machine translation) Urdu natural language processing Urdu Linguistic features low resources language linguistic features pretrain model
在线阅读 下载PDF
Automatic Keyphrase Extraction from Scientific Chinese Medical Abstracts Based on Character-Level Sequence Labeling 被引量:4
13
作者 Liangping Ding Zhixiong Zhang +2 位作者 Huan Liu Jie Li GaihongYu 《Journal of Data and Information Science》 CSCD 2021年第3期35-57,共23页
Purpose:Automatic keyphrase extraction(AKE)is an important task for grasping the main points of the text.In this paper,we aim to combine the benefits of sequence labeling formulation and pretrained language model to p... Purpose:Automatic keyphrase extraction(AKE)is an important task for grasping the main points of the text.In this paper,we aim to combine the benefits of sequence labeling formulation and pretrained language model to propose an automatic keyphrase extraction model for Chinese scientific research.Design/methodology/approach:We regard AKE from Chinese text as a character-level sequence labeling task to avoid segmentation errors of Chinese tokenizer and initialize our model with pretrained language model BERT,which was released by Google in 2018.We collect data from Chinese Science Citation Database and construct a large-scale dataset from medical domain,which contains 100,000 abstracts as training set,6,000 abstracts as development set and 3,094 abstracts as test set.We use unsupervised keyphrase extraction methods including term frequency(TF),TF-IDF,TextRank and supervised machine learning methods including Conditional Random Field(CRF),Bidirectional Long Short Term Memory Network(BiLSTM),and BiLSTM-CRF as baselines.Experiments are designed to compare word-level and character-level sequence labeling approaches on supervised machine learning models and BERT-based models.Findings:Compared with character-level BiLSTM-CRF,the best baseline model with F1 score of 50.16%,our character-level sequence labeling model based on BERT obtains F1 score of 59.80%,getting 9.64%absolute improvement.Research limitations:We just consider automatic keyphrase extraction task rather than keyphrase generation task,so only keyphrases that are occurred in the given text can be extracted.In addition,our proposed dataset is not suitable for dealing with nested keyphrases.Practical implications:We make our character-level IOB format dataset of Chinese Automatic Keyphrase Extraction from scientific Chinese medical abstracts(CAKE)publicly available for the benefits of research community,which is available at:https://github.com/possible1402/Dataset-For-Chinese-Medical-Keyphrase-Extraction.Originality/value:By designing comparative experiments,our study demonstrates that character-level formulation is more suitable for Chinese automatic keyphrase extraction task under the general trend of pretrained language models.And our proposed dataset provides a unified method for model evaluation and can promote the development of Chinese automatic keyphrase extraction to some extent. 展开更多
关键词 Automatic keyphrase extraction Character-level sequence labeling pretrained language model Scientific chinese medical abstracts
在线阅读 下载PDF
An Efficient and Robust Hand Gesture Recognition System of Sign Language Employing Finetuned Inception-V3 and Efficientnet-B0 Network 被引量:1
14
作者 Adnan Hussain Sareer Ul Amin +1 位作者 Muhammad Fayaz Sanghyun Seo 《Computer Systems Science & Engineering》 SCIE EI 2023年第9期3509-3525,共17页
Hand Gesture Recognition(HGR)is a promising research area with an extensive range of applications,such as surgery,video game techniques,and sign language translation,where sign language is a complicated structured for... Hand Gesture Recognition(HGR)is a promising research area with an extensive range of applications,such as surgery,video game techniques,and sign language translation,where sign language is a complicated structured form of hand gestures.The fundamental building blocks of structured expressions in sign language are the arrangement of the fingers,the orientation of the hand,and the hand’s position concerning the body.The importance of HGR has increased due to the increasing number of touchless applications and the rapid growth of the hearing-impaired population.Therefore,real-time HGR is one of the most effective interaction methods between computers and humans.Developing a user-free interface with good recognition performance should be the goal of real-time HGR systems.Nowadays,Convolutional Neural Network(CNN)shows great recognition rates for different image-level classification tasks.It is challenging to train deep CNN networks like VGG-16,VGG-19,Inception-v3,and Efficientnet-B0 from scratch because only some significant labeled image datasets are available for static hand gesture images.However,an efficient and robust hand gesture recognition system of sign language employing finetuned Inception-v3 and Efficientnet-Bo network is proposed to identify hand gestures using a comparative small HGR dataset.Experiments show that Inception-v3 achieved 90%accuracy and 0.93%precision,0.91%recall,and 0.90%f1-score,respectively,while EfficientNet-B0 achieved 99%accuracy and 0.98%,0.97%,0.98%,precision,recall,and f1-score respectively. 展开更多
关键词 pretrained CNN hand gesture recognition transfer learning
在线阅读 下载PDF
SRS-Net: Training object detectors from scratch for remote sensing images without pretraining 被引量:2
15
作者 Haining WANG Yang LI +4 位作者 Yuqiang FANG Yurong LIAO Bitao JIANG Xitao ZHANG Shuyan NI 《Chinese Journal of Aeronautics》 SCIE EI CAS CSCD 2023年第8期269-283,共15页
Most of the current object detection algorithms use pretrained models that are trained on ImageNet and then fine-tuned in the network,which can achieve good performance in terms of general object detectors.However,in ... Most of the current object detection algorithms use pretrained models that are trained on ImageNet and then fine-tuned in the network,which can achieve good performance in terms of general object detectors.However,in the field of remote sensing image object detection,as pretrained models are significantly different from remote sensing data,it is meaningful to explore a train-fromscratch technique for remote sensing images.This paper proposes an object detection framework trained from scratch,SRS-Net,and describes the design of a densely connected backbone network to provide integrated hidden layer supervision for the convolution module.Then,two necessary improvement principles are proposed:studying the role of normalization in the network structure,and improving data augmentation methods for remote sensing images.To evaluate the proposed framework,we performed many ablation experiments on the DIOR,DOTA,and AS datasets.The results show that whether using the improved backbone network,the normalization method or training data enhancement strategy,the performance of the object detection network trained from scratch increased.These principles compensate for the lack of pretrained models.Furthermore,we found that SRS-Net could achieve similar to or slightly better performance than baseline methods,and surpassed most advanced general detectors. 展开更多
关键词 Denseconnection Object detection Pretraining Remote sensing image Trainfrom scratch
原文传递
Unveiling the Hidden Interactions Among Features:A Heterogeneous Graph Approach for Personality Prediction
16
作者 Yuxuan Song Yilin Wu +2 位作者 Qiudan Li Liping Chen Daniel Zeng 《Machine Intelligence Research》 2025年第1期91-100,共10页
Identifying personalities accurately helps merchants and management departments understand user needs in detail and improve the quality of service and decision-making efficiency.Existing research on text-based persona... Identifying personalities accurately helps merchants and management departments understand user needs in detail and improve the quality of service and decision-making efficiency.Existing research on text-based personality prediction mainly uses deep neural networks or pretrained language models to mine deep semantics,ignoring the dynamic interactions among personality features.This paper presents a novel personality prediction method that simultaneously taps into the capability of graph neural networks to model the deep interactions among features and that of pretrained language models to learn latent semantics with a hierarchical aggregation mechanism.Specifically,the proposed model leverages self-attention to capture the interaction relationships among POS tags,entities,personality tags,etc.,and considers the labels’cooccurrence patterns.The efficacy of the proposed model is evaluated on the myPersonality and PANDORA datasets.This research contributes to the personality prediction literature from the perspective of a multigranular personality feature learning perspective and provides business value for consuming predictive analytics. 展开更多
关键词 Personality prediction graph representation learning pretrained models graph convolutional networks attention mechanism
原文传递
Cross-Sensor Generative Self-Supervised Learning Network for Fault Detection Under Few Samples
17
作者 ZHU Huijuan ZHAO Yunbo +2 位作者 YAN Xiaohui KANG Yu LIU Binkun 《Journal of Systems Science & Complexity》 2025年第3期1000-1020,共21页
In this paper,a cross-sensor generative self-supervised learning network is proposed for fault detection of multi-sensor.By modeling the sensor signals in multiple dimensions to achieve correlation information mining ... In this paper,a cross-sensor generative self-supervised learning network is proposed for fault detection of multi-sensor.By modeling the sensor signals in multiple dimensions to achieve correlation information mining between channels to deal with the pretext task,the shared features between multi-sensor data can be captured,and the gap between channel data features will be reduced.Meanwhile,in order to model fault features in the downstream task,the salience module is developed to optimize cross-sensor data features based on a small amount of labeled data to make warning feature information prominent for improving the separator accuracy.Finally,experimental results on the public datasets FEMTO-ST dataset and the private datasets SMT shock absorber dataset(SMT-SA dataset)show that the proposed method performs favorably against other STATE-of-the-art methods. 展开更多
关键词 Fault detection generative self-supervised learning multi-dimension cross-sensor MULTISENSOR pretraining
原文传递
Exploring Fragment Adding Strategies to Enhance Molecule Pretraining in AI-Driven Drug Discovery 被引量:3
18
作者 Zhaoxu Meng Cheng Chen +2 位作者 Xuan Zhang Wei Zhao Xuefeng Cui 《Big Data Mining and Analytics》 EI CSCD 2024年第3期565-576,共12页
The effectiveness of Al-driven drug discovery can be enhanced by pretraining on small molecules.However,the conventional masked language model pretraining techniques are not suitable for molecule pretraining due to th... The effectiveness of Al-driven drug discovery can be enhanced by pretraining on small molecules.However,the conventional masked language model pretraining techniques are not suitable for molecule pretraining due to the limited vocabulary size and the non-sequential structure of molecules.To overcome these challenges,we propose FragAdd,a strategy that involves adding a chemically implausible molecular fragment to the input molecule.This approach allows for the incorporation of rich local information and the generation of a high-quality graph representation,which is advantageous for tasks like virtual screening.Consequently,we have developed a virtual screening protocol that focuses on identifying estrogen receptor alpha binders on a nucleus receptor.Our results demonstrate a significant improvement in the binding capacity of the retrieved molecules.Additionally,we demonstrate that the FragAdd strategy can be combined with other self-supervised methods to further expedite the drug discovery process. 展开更多
关键词 pretraining information retrieval drug discovery virtual screening molecule property prediction
原文传递
Learning to compose diversified prompts for image emotion classification 被引量:2
19
作者 Sinuo Deng Lifang Wu +4 位作者 Ge Shi Lehao Xing Meng Jian Ye Xiang Ruihai Dong 《Computational Visual Media》 CSCD 2024年第6期1169-1183,共15页
Image emotion classification(IEC)aims to extract the abstract emotions evoked in images.Recently,language-supervised methods such as con-trastive language-image pretraining(CLIP)have demonstrated superior performance ... Image emotion classification(IEC)aims to extract the abstract emotions evoked in images.Recently,language-supervised methods such as con-trastive language-image pretraining(CLIP)have demonstrated superior performance in image under-standing.However,the underexplored task of IEC presents three major challenges:a tremendous training objective gap between pretraining and IEC,shared suboptimal prompts,and invariant prompts for all instances.In this study,we propose a general framework that effectively exploits the language-supervised CLIP method for the IEC task.First,a prompt-tuning method that mimics the pretraining objective of CLIP is introduced,to exploit the rich image and text semantics associated with CLIP.Subsequently,instance-specific prompts are automatically composed,conditioning them on the categories and image content of instances,diversifying the prompts,and thus avoiding suboptimal problems.Evaluations on six widely used affective datasets show that the proposed method significantly outperforms state-of-the-art methods(up to 9.29%accuracy gain on the EmotionROI dataset)on IEC tasks with only a few trained parameters.The code is publicly available at https://github.com/dsn0w/PT-DPC/for research purposes. 展开更多
关键词 image emotion analysis multimodal learning pretraining model prompt tuning
原文传递
Learning Top-K Subtask Planning Tree Based on Discriminative Representation Pretraining for Decision-making
20
作者 Jingqing Ruan Kaishen Wang +2 位作者 Qingyang Zhang Dengpeng Xing Bo Xu 《Machine Intelligence Research》 EI CSCD 2024年第4期782-800,共19页
Decomposing complex real-world tasks into simpler subtasks and devising a subtask execution plan is critical for humans to achieve effective decision-making.However,replicating this process remains challenging for AI ... Decomposing complex real-world tasks into simpler subtasks and devising a subtask execution plan is critical for humans to achieve effective decision-making.However,replicating this process remains challenging for AI agents and naturally raises two questions:(1)How to extract discriminative knowledge representation from priors?(2)How to develop a rational plan to decompose complex problems?To address these issues,we introduce a groundbreaking framework that incorporates two main contributions.First,our multiple-encoder and individual-predictor regime goes beyond traditional architectures to extract nuanced task-specific dynamics from datasets,enriching the feature space for subtasks.Second,we innovate in planning by introducing a top-K subtask planning tree generated through an attention mechanism,which allows for dynamic adaptability and forward-looking decision-making.Our framework is empirically validated against challenging benchmarks BabyAI including multiple combinatorially rich synthetic tasks(e.g.,GoToSeq,SynthSeq,BossLevel),where it not only outperforms competitive baselines but also demonstrates superior adaptability and effectiveness incomplex task decomposition. 展开更多
关键词 Reinforcement learning representation learning subtask planning task decomposition pretraining.
原文传递
上一页 1 2 下一页 到第
使用帮助 返回顶部