Journal Articles
Found 248 articles
Evaluating chat generative pretrained transformer in answering questions on endoscopic mucosal resection and endoscopic submucosal dissection
1
Authors: Shi-Song Wang, Hui Gao, Peng-Yao Lin, Tian-Chen Qian, Ying Du, Lei Xu. World Journal of Gastrointestinal Oncology, 2025, No. 10, pp. 290-303 (14 pages).
BACKGROUND: With the rising use of endoscopic submucosal dissection (ESD) and endoscopic mucosal resection (EMR), patients are increasingly questioning various aspects of these endoscopic procedures. At the same time, conversational artificial intelligence (AI) tools like chat generative pretrained transformer (ChatGPT) are rapidly emerging as sources of medical information. AIM: To evaluate ChatGPT's reliability and usefulness regarding ESD and EMR for patients and healthcare professionals. METHODS: In this study, 30 specific questions related to ESD and EMR were identified. These questions were then repeatedly entered into ChatGPT, with two independent answers generated for each question. A Likert scale was used to rate the accuracy, completeness, and comprehensibility of the responses, while a binary category (high/low) was used to evaluate each aspect of the two responses generated by ChatGPT and the response retrieved from Google. RESULTS: Analysis of the three raters' average scores indicated that the responses generated by ChatGPT received high ratings for accuracy (mean score of 5.14 out of 6), completeness (mean score of 2.34 out of 3), and comprehensibility (mean score of 2.96 out of 3). Kendall's coefficients of concordance indicated good agreement among raters (all P < 0.05). More than half of the responses generated by Google were classified by experts as having low accuracy and low completeness. CONCLUSION: ChatGPT provided accurate and reliable answers to questions about ESD and EMR. Future studies should address ChatGPT's current limitations by incorporating more detailed and up-to-date medical information, which could establish AI chatbots as a significant resource for both patients and healthcare professionals.
Keywords: Endoscopic submucosal dissection, endoscopic mucosal resection, artificial intelligence, chat generative pretrained transformer, patient education, Google
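The abstract above reports inter-rater agreement via Kendall's coefficient of concordance. As a hedged illustration (not the paper's own code), a minimal NumPy sketch of Kendall's W for an m-raters-by-n-items score matrix, assuming no tied scores within any single rater's row:

```python
import numpy as np

def kendalls_w(ratings):
    """Kendall's W for an (m raters x n items) score matrix,
    assuming no tied scores within any single rater's row."""
    ratings = np.asarray(ratings, dtype=float)
    m, n = ratings.shape
    # Convert each rater's scores to ranks 1..n (no-ties shortcut).
    ranks = ratings.argsort(axis=1).argsort(axis=1) + 1
    rank_sums = ranks.sum(axis=0)
    s = ((rank_sums - rank_sums.mean()) ** 2).sum()  # spread of rank sums
    return 12 * s / (m ** 2 * (n ** 3 - n))          # 0 = no agreement, 1 = perfect
```

Three raters who rank four responses identically give W = 1.0; perfectly opposed rankings drive W toward 0.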
SRS-Net: Training object detectors from scratch for remote sensing images without pretraining (cited 2 times)
2
Authors: Haining WANG, Yang LI, Yuqiang FANG, Yurong LIAO, Bitao JIANG, Xitao ZHANG, Shuyan NI. Chinese Journal of Aeronautics (SCIE, EI, CAS, CSCD), 2023, No. 8, pp. 269-283 (15 pages).
Most current object detection algorithms use models pretrained on ImageNet and then fine-tuned in the network, which achieves good performance for general object detectors. However, in remote sensing image object detection, pretrained models differ significantly from remote sensing data, so it is worthwhile to explore a train-from-scratch technique for remote sensing images. This paper proposes SRS-Net, an object detection framework trained from scratch, and describes the design of a densely connected backbone network that provides integrated hidden-layer supervision for the convolution module. Two necessary improvement principles are then proposed: studying the role of normalization in the network structure, and improving data augmentation methods for remote sensing images. To evaluate the proposed framework, we performed extensive ablation experiments on the DIOR, DOTA, and AS datasets. The results show that the improved backbone network, the normalization method, and the training data augmentation strategy each increase the performance of the object detection network trained from scratch; these principles compensate for the lack of pretrained models. Furthermore, SRS-Net achieves performance similar to or slightly better than baseline methods, and surpasses most advanced general detectors.
Keywords: Dense connection, object detection, pretraining, remote sensing image, train from scratch
Swin3D++: Effective Multi-Source Pretraining for 3D Indoor Scene Understanding
3
Authors: Yu-Qi Yang, Yu-Xiao Guo, Yang Liu. Computational Visual Media, 2025, No. 3, pp. 465-481 (17 pages).
Data diversity and abundance are essential for improving the performance and generalization of models in natural language processing and 2D vision. However, the 3D vision domain suffers from a lack of 3D data, and simply combining multiple 3D datasets for pretraining a 3D backbone does not yield significant improvement, because domain discrepancies among different 3D datasets impede effective feature learning. In this work, we identify the main sources of the domain discrepancies between 3D indoor scene datasets, and propose Swin3D++, an enhanced architecture based on Swin3D for efficient pretraining on multi-source 3D point clouds. Swin3D++ introduces domain-specific mechanisms into Swin3D's modules to address domain discrepancies and enhance the network's capability for multi-source pretraining. Moreover, we devise a simple source-augmentation strategy to increase the pretraining data scale and facilitate supervised pretraining. We validate the effectiveness of our design and demonstrate that Swin3D++ surpasses state-of-the-art 3D pretraining methods on typical indoor scene understanding tasks.
Keywords: 3D scenes, indoor, pretraining, multi-source data, data augmentation
Multimodal Pretrained Knowledge for Real-world Object Navigation
4
Authors: Hui Yuan, Yan Huang, Naigong Yu, Dongbo Zhang, Zetao Du, Ziqi Liu, Kun Zhang. Machine Intelligence Research, 2025, No. 4, pp. 713-729 (17 pages).
Most visual-language navigation (VLN) research focuses on simulated environments, but applying these methods to real-world scenarios is challenging because misalignments between vision and language in complex environments lead to path deviations. To address this, we propose a novel vision-and-language object navigation strategy that uses multimodal pretrained knowledge as a cross-modal bridge to link semantic concepts in both images and text, improving navigation supervision at key-points and enhancing robustness. Specifically, we 1) randomly generate key-points within a specific density range and optimize them on the basis of challenging locations; 2) use pretrained multimodal knowledge to efficiently retrieve target objects; 3) combine depth information with simultaneous localization and mapping (SLAM) map data to predict optimal positions and orientations for accurate navigation; and 4) implement the method on a physical robot, successfully conducting navigation tests. Our approach achieves a maximum success rate of 66.7%, outperforming existing VLN methods in real-world environments.
Keywords: Vision-and-language object navigation, key-points, multimodal pretrained knowledge, optimal positions and orientations, physical robot
Swin3D: A pretrained transformer backbone for 3D indoor scene understanding
5
Authors: Yu-Qi Yang, Yu-Xiao Guo, Jian-Yu Xiong, Yang Liu, Hao Pan, Peng-Shuai Wang, Xin Tong, Baining Guo. Computational Visual Media, 2025, No. 1, pp. 83-101 (19 pages).
The use of pretrained backbones with fine-tuning has shown success in 2D vision and natural language processing tasks, with advantages over task-specific networks. In this paper, we introduce a pretrained 3D backbone, called Swin3D, for 3D indoor scene understanding. We designed a 3D Swin Transformer as our backbone network, which enables efficient self-attention on sparse voxels with linear memory complexity, making the backbone scalable to large models and datasets. We also introduce a generalized contextual relative positional embedding scheme to capture various irregularities of point signals for improved network performance. We pretrained a large Swin3D model on the synthetic Structured3D dataset, which is an order of magnitude larger than the ScanNet dataset. Our model pretrained on the synthetic dataset not only generalizes well to downstream segmentation and detection on real 3D point datasets, but also outperforms state-of-the-art methods on downstream tasks, with +2.3 mIoU and +2.2 mIoU on S3DIS Area 5 and 6-fold semantic segmentation, respectively, +1.8 mIoU on ScanNet segmentation (val), +1.9 mAP@0.5 on ScanNet detection, and +8.1 mAP@0.5 on S3DIS detection. A series of extensive ablation studies further validated the scalability, generality, and superior performance enabled by our approach.
Keywords: 3D pretraining, point cloud analysis, transformer backbone, Swin Transformer, 3D semantic segmentation, 3D object detection
Exploring Fragment Adding Strategies to Enhance Molecule Pretraining in AI-Driven Drug Discovery (cited 3 times)
6
Authors: Zhaoxu Meng, Cheng Chen, Xuan Zhang, Wei Zhao, Xuefeng Cui. Big Data Mining and Analytics, 2024, No. 3, pp. 565-576 (12 pages).
The effectiveness of AI-driven drug discovery can be enhanced by pretraining on small molecules. However, conventional masked language model pretraining techniques are not suitable for molecule pretraining due to the limited vocabulary size and the non-sequential structure of molecules. To overcome these challenges, we propose FragAdd, a strategy that adds a chemically implausible molecular fragment to the input molecule. This approach allows for the incorporation of rich local information and the generation of a high-quality graph representation, which is advantageous for tasks like virtual screening. Consequently, we have developed a virtual screening protocol that focuses on identifying binders of estrogen receptor alpha, a nuclear receptor. Our results demonstrate a significant improvement in the binding capacity of the retrieved molecules. Additionally, we show that the FragAdd strategy can be combined with other self-supervised methods to further expedite the drug discovery process.
Keywords: Pretraining, information retrieval, drug discovery, virtual screening, molecule property prediction
Learning Top-K Subtask Planning Tree Based on Discriminative Representation Pretraining for Decision-making
7
Authors: Jingqing Ruan, Kaishen Wang, Qingyang Zhang, Dengpeng Xing, Bo Xu. Machine Intelligence Research, 2024, No. 4, pp. 782-800 (19 pages).
Decomposing complex real-world tasks into simpler subtasks and devising a subtask execution plan is critical for humans to achieve effective decision-making. However, replicating this process remains challenging for AI agents and naturally raises two questions: (1) How to extract discriminative knowledge representations from priors? (2) How to develop a rational plan to decompose complex problems? To address these issues, we introduce a framework with two main contributions. First, our multiple-encoder and individual-predictor regime goes beyond traditional architectures to extract nuanced task-specific dynamics from datasets, enriching the feature space for subtasks. Second, we innovate in planning by introducing a top-K subtask planning tree generated through an attention mechanism, which allows for dynamic adaptability and forward-looking decision-making. Our framework is empirically validated on the challenging BabyAI benchmark, including multiple combinatorially rich synthetic tasks (e.g., GoToSeq, SynthSeq, BossLevel), where it not only outperforms competitive baselines but also demonstrates superior adaptability and effectiveness in complex task decomposition.
Keywords: Reinforcement learning, representation learning, subtask planning, task decomposition, pretraining
Generative pretrained transformer 4: an innovative approach to facilitate value-based healthcare
8
Authors: Han Lyu, Zhixiang Wang, Jia Li, Jing Sun, Xinghao Wang, Pengling Ren, Linkun Cai, Zhenchang Wang, Max Wintermark. Intelligent Medicine, 2024, No. 1, pp. 10-15 (6 pages).
Objective: Appropriate medical imaging is important for value-based care. We aim to evaluate the performance of generative pretrained transformer 4 (GPT-4), an innovative natural language processing model, in automatically recommending appropriate medical imaging in different clinical scenarios. Methods: Institutional Review Board (IRB) approval was not required due to the use of nonidentifiable data. We used 112 questions from the American College of Radiology (ACR) Radiology-TEACHES Program, an open-source question-and-answer program that guides appropriate medical imaging, as prompts, including 69 free-text case vignettes and 43 simplified cases. For the performance evaluation of GPT-4 and GPT-3.5, we took the recommendations of the ACR guidelines as the gold standard, and three radiologists analyzed the consistency of the GPT models' responses with those of the ACR using a five-point consistency criterion. A paired t-test was applied to assess the statistical significance of the findings. Results: For free-text case vignettes, the accuracy of GPT-4 was 92.9%, whereas the accuracy of GPT-3.5 was just 78.3%; GPT-4 can provide more appropriate suggestions to reduce the overutilization of medical imaging than GPT-3.5 (t = 3.429, P = 0.001). For simplified scenarios, the accuracy of GPT-4 and GPT-3.5 was 66.5% and 60.0%, respectively; the difference was not statistically significant (t = 1.858, P = 0.070). GPT-4 was characterized by longer reaction times (27.1 s on average) and more extensive responses (137.1 words on average) than GPT-3.5. Conclusion: As an advanced tool for improving value-based healthcare in clinics, GPT-4 may guide appropriate medical imaging accurately and efficiently.
Keywords: Generative pretrained transformer 4 model, natural language processing, medical imaging, appropriateness
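The study above compares the two models' per-case consistency scores with a paired t-test. A hedged sketch of that statistical step with SciPy; the scores below are made-up placeholders, not the study's data:

```python
from scipy.stats import ttest_rel

# Hypothetical per-case consistency scores (1-5) for the same eight
# vignettes rated under two models; values are illustrative only.
gpt4_scores = [5, 4, 5, 5, 4, 5, 4, 5]
gpt35_scores = [4, 3, 4, 4, 3, 3, 4, 4]

# Paired t-test: each vignette serves as its own control, so we test
# whether the mean of the per-case differences is zero.
result = ttest_rel(gpt4_scores, gpt35_scores)
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.4f}")
```

A paired design is appropriate here because both models answer the identical set of prompts, so between-case variability cancels out of the comparison.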
Pretraining Enhanced RNN Transducer
9
Authors: Junyu Lu, Rongzhong Lian, Di Jiang, Yuanfeng Song, Zhiyang Su, Victor Junqiu Wei, Lin Yang. CAAI Artificial Intelligence Research, 2024, No. 1, pp. 74-81 (8 pages).
The recurrent neural network transducer (RNN-T) is an important branch of current end-to-end automatic speech recognition (ASR). Various promising approaches have been designed to boost the RNN-T architecture; however, few studies exploit the effectiveness of pretraining in this framework. In this paper, we introduce a pretrained acoustic extractor (PAE) and a pretrained linguistic network (PLN) to enhance the Conformer long short-term memory (Conformer-LSTM) transducer. First, we construct the input of the acoustic encoder from two different latent representations: one extracted by the PAE from the raw waveform, and the other obtained from a filter-bank transformation. Second, we fuse an extra semantic feature from the PLN into the joint network to reduce illogical and homophonic errors. Compared with previous works, our approaches obtain pretrained representations for better model generalization. Evaluation on two large-scale datasets demonstrates that the proposed approaches yield better performance than existing approaches.
Keywords: Pretraining, automatic speech recognition, self-supervised learning
An active vibration reduction technique for mixed-spectrum noise (cited 1 time)
10
Authors: Zhong Zhi, Niu Guobiao, Liu Lei, Shan Mingguang. Experimental Technology and Management (Peking University Core Journal), 2025, No. 6, pp. 46-54 (9 pages).
In fields such as ships and offshore engineering equipment, vibration-noise conditions exhibit complex broadband-narrowband composite characteristics. Previous active control techniques suppressed only a single noise type, yielding poor overall vibration reduction. To address this, a mixed-spectrum active vibration-and-noise control (MSN-HVNC) algorithm capable of suppressing broadband-narrowband composite noise was designed and experimentally validated on an X-type small floating-raft machinery rig. MSN-HVNC consists of two cooperating subsystems: a narrowband noise control subsystem (NBCS) and a broadband noise control subsystem (WBCS). The WBCS uses a filtered-x least-mean-square (FxLMS) algorithm with a pretrained coefficient-selection model to suppress broadband noise, while the NBCS uses adaptive notch filtering to suppress energy-concentrated narrowband line-spectrum noise. The residual vibration noise after control measures the reduction level and serves as the error signal for updating the controller weights. An experimental platform based on the X-type small floating raft was then built to carry out active vibration control experiments. The results show that MSN-HVNC achieves average reductions of 23.6 dB and 21.3 dB for single-frequency narrowband vibration noise at 50 Hz and 75 Hz, respectively, and an average reduction of 12.4 dB for mixed-excitation vibration signals in a simulated multi-source coupled scenario, outperforming conventional control algorithms and showing good suppression of broadband-narrowband mixed-spectrum noise.
Keywords: Active control, mixed-spectrum noise, pretrained model, cooperative control
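The broadband subsystem above relies on the classic FxLMS update, in which the reference signal is filtered through a model of the secondary path before driving the LMS weight update. A hedged single-channel sketch in NumPy; the secondary path `s`, the 50 Hz disturbance, and all parameters are synthetic placeholders, not the paper's MSN-HVNC system:

```python
import numpy as np

rng = np.random.default_rng(0)
fs, n = 1000, 20000
t = np.arange(n) / fs
d = np.sin(2 * np.pi * 50 * t) + 0.1 * rng.standard_normal(n)  # 50 Hz line + broadband
x = d.copy()                    # reference signal (perfect coherence assumed)
s = np.array([0.0, 0.5, 0.25])  # secondary-path FIR model, assumed already identified

L, mu = 32, 0.01
w = np.zeros(L)                 # adaptive FIR control filter
xbuf = np.zeros(L)              # recent reference samples
ybuf = np.zeros(len(s))         # control outputs seen by the secondary path
sxbuf = np.zeros(len(s))        # reference samples feeding the filtered-x signal
fxbuf = np.zeros(L)             # recent filtered-reference samples
e = np.zeros(n)                 # residual at the error sensor

for i in range(n):
    xbuf = np.roll(xbuf, 1); xbuf[0] = x[i]
    y = w @ xbuf                                     # anti-noise output
    ybuf = np.roll(ybuf, 1); ybuf[0] = y
    e[i] = d[i] + s @ ybuf                           # residual after cancellation
    sxbuf = np.roll(sxbuf, 1); sxbuf[0] = x[i]
    fxbuf = np.roll(fxbuf, 1); fxbuf[0] = s @ sxbuf  # reference filtered by s
    w -= mu * e[i] * fxbuf                           # LMS update on filtered-x

print(f"residual power: {np.mean(e[:200]**2):.3f} -> {np.mean(e[-2000:]**2):.3f}")
```

Filtering the reference through the secondary-path model is what keeps the gradient estimate aligned with the error actually measured after the actuator-to-sensor path; omitting it makes plain LMS unstable in this configuration.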
Drug molecule property prediction based on Bert+GCN multimodal data fusion (cited 1 time)
11
Authors: Yan Xiaoying, Jin Yanchun, Feng Yuehua, Zhang Shaowu. Progress in Biochemistry and Biophysics (Peking University Core Journal), 2025, No. 3, pp. 783-794 (12 pages).
Objective: Drug development is costly, lengthy, and has a low success rate. Accurately predicting molecular properties is important for effectively screening drug candidates and optimizing molecular structures. Traditional feature-engineering approaches to molecular property prediction require researchers to have a deep disciplinary background and broad expertise. As artificial intelligence matures, many prediction algorithms outperforming traditional feature engineering have emerged, yet these models still suffer from scarce labeled data and poor generalization. We therefore propose BGMF, a molecular property prediction algorithm based on Bert+GCN multimodal data fusion, which integrates multimodal drug-molecule data and exploits large amounts of unlabeled drug molecules to learn useful molecular information. Methods: BGMF extracts atom sequences, molecular fingerprint sequences, and molecular graph data from drug SMILES expressions, and learns features by combining the pretrained Bert model with a graph convolutional network (GCN), mining the global features of "words" in drug molecules while fusing the local topological features of the molecular graph, thereby fully exploiting global-local contextual semantics. A dual-decoder design over atom sequences and fingerprint sequences then strengthens the molecular feature representation. Results: Across 43 molecular property prediction tasks on 5 datasets, BGMF outperformed existing methods in AUC. An independent test set further verified the model's good generalization, and t-SNE visualization of the generated molecular fingerprint representations showed that BGMF successfully captures the intrinsic structure and characteristics of different fingerprints. Conclusion: By combining a graph convolutional network with Bert and integrating molecular graph data into the fingerprint-recovery and masked-atom-recovery tasks, BGMF effectively captures the intrinsic structure and features of molecular fingerprints and efficiently predicts drug molecular properties.
Keywords: Bert pretraining, attention mechanism, molecular fingerprint, molecular property prediction, graph convolutional neural network
Tibetan speech emotion recognition under low-resource conditions (cited 1 time)
12
Authors: Zhang Weizhao, Li Haoyuan, Yang Hongwu. Journal of Signal Processing (Peking University Core Journal), 2025, No. 9, pp. 1558-1569 (12 pages).
Although speech emotion recognition for mainstream languages has advanced considerably in recent years, research for low-resource languages still faces many difficulties in dataset construction, feature extraction, and recognition model design. Targeting Tibetan speech emotion recognition under low-resource conditions, this work first built a preliminary Tibetan Emotion Speech Dataset (TESD-2500) through video clipping, audio extraction and enhancement, and manual annotation and proofreading. The dataset covers four emotion types (angry, sad, happy, and neutral) with 2500 speech samples, and its emotion categories and sample counts are still being expanded. A multi-feature-fusion recognition model combining cross-attention and co-attention mechanisms was then designed: a bidirectional long short-term memory network (BiLSTM) models the temporal dynamics of Mel-frequency cepstral coefficients (MFCC) to extract dynamic temporal representations of the speech signal; AlexNet extracts time-frequency features from spectrograms to capture the joint time-frequency distribution; a cross-attention mechanism computes correlation weights between these two heterogeneous feature types; the large pretrained model WavLM extracts deep features of the speech signal, which are reweighted and reconstructed by a co-attention mechanism using the cross-attention results as weight vectors; finally, the MFCC temporal features, spectrogram time-frequency features, and weighted deep pretrained features are concatenated into a multi-level fused representation and mapped to emotion classes through fully connected layers to complete Tibetan speech emotion classification. The proposed model achieved 76.56% weighted accuracy and 75.42% unweighted accuracy on TESD-2500, significantly outperforming the baseline models. Generalization tests reached 74.27% weighted / 73.60% unweighted accuracy on IEMOCAP and 92.61% weighted / 91.68% unweighted accuracy on EmoDB. The methods and results can also serve as a reference for speech emotion recognition in other low-resource languages.
Keywords: Speech emotion recognition, low resource, multi-feature fusion, pretrained model, Tibetan
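The model above feeds MFCCs into the BiLSTM branch. As a hedged illustration of that front-end step (not the paper's configuration), a minimal NumPy/SciPy MFCC pipeline — framing, power spectrum, triangular mel filterbank, log, DCT — with illustrative default parameters:

```python
import numpy as np
from scipy.fft import dct

def mfcc(signal, sr=16000, n_fft=512, hop=256, n_mels=26, n_ceps=13):
    """Minimal MFCC: frame -> power spectrum -> mel filterbank -> log -> DCT."""
    # Frame the signal with a Hann window.
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1)) ** 2
    # Triangular mel filterbank spanning 0 .. sr/2.
    def hz2mel(f): return 2595 * np.log10(1 + f / 700)
    def mel2hz(m): return 700 * (10 ** (m / 2595) - 1)
    mels = np.linspace(hz2mel(0), hz2mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel2hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fb[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
        fb[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge
    logmel = np.log(spec @ fb.T + 1e-10)
    # Keep the first n_ceps cepstral coefficients.
    return dct(logmel, type=2, axis=1, norm="ortho")[:, :n_ceps]
```

One second of 16 kHz audio with these defaults yields a (61, 13) coefficient matrix, which is the kind of frame-level sequence a BiLSTM would consume.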
My views on the development and application of medical GPT (cited 2 times)
13
Author: Bai Chunxue. Metaverse in Medicine, 2025, No. 1, pp. 6-14 (9 pages).
To address the many challenges and inherent limitations in developing medical GPT technology, the author proposes a comprehensive and deep innovation strategy that requires fine-grained reshaping of every link from data acquisition to system operation and maintenance. Data collection should shift from accumulating quantity to pursuing quality: carefully selecting from massive medical information to ensure that every data item is highly representative, accurate, and usable. This requires both advanced technical means and the deep participation of medical experts for precise screening and value mining. In choosing the base model, the traditional simple question-answer framework should be abandoned in favor of building digital avatars of experts, so that patients feel as if they are facing an experienced physician and receive more personalized, professional medical consultation, greatly improving the interaction experience and trust. To ensure the safety and accuracy of medical information, AI-based intelligent quality control should replace past blind reliance, combining automatic algorithmic review with manual re-checking to enforce quality. Model training, evaluation, and optimization should also better integrate practical experience: on the basis of evidence-based medicine, the clinical wisdom and experience of master physicians should be incorporated into the model so that medical GPT can combine the latest research findings with clinical practice to give patients more precise, individualized diagnosis and treatment recommendations. In short, four transformations are needed: from data cleaning to careful curation, from simple consultation to expert avatars, from blind trust to quality control, and from pure evidence-based reasoning to combining it with master physicians' experience.
Keywords: Artificial intelligence, generative pretrained transformer, medical GPT, natural language processing, public evidence
Advances in multimodal continual learning methods
14
Authors: Zhang Wei, Qian Longyue, Zhang Lin, Li Teng. Journal of Data Acquisition and Processing (Peking University Core Journal), 2025, No. 5, pp. 1122-1138 (17 pages).
Multimodal continual learning (MMCL), an important research direction in machine learning and artificial intelligence, aims to achieve continuous knowledge accumulation and task adaptation by fusing data of multiple modalities (such as images, text, or speech). Compared with traditional single-modality learning methods, MMCL can process multi-source heterogeneous data in parallel while effectively retaining existing knowledge and adapting to new task requirements, showing great application potential in intelligent systems. This paper systematically surveys multimodal continual learning. First, the theoretical foundations of MMCL are presented along three dimensions: basic concepts, evaluation systems, and classic single-modality continual learning methods. Second, the advantages and challenges of MMCL in practical applications are analyzed in depth: despite its clear strengths in multimodal information fusion, it still faces key challenges such as modality imbalance and heterogeneous fusion, which both constrain the performance of current methods and point the way for future research. On this basis, the state of the art in MMCL is comprehensively reviewed from four main perspectives: replay-based, regularization-based, parameter-isolation, and large-model-based methods. Finally, future development trends of MMCL are discussed.
Keywords: Multimodal continual learning, modality alignment, catastrophic forgetting, pretrained model, task adaptability
Multimodal Pretraining from Monolingual to Multilingual (cited 1 time)
15
Authors: Liang Zhang, Ludan Ruan, Anwen Hu, Qin Jin. Machine Intelligence Research, 2023, No. 2, pp. 220-232 (13 pages).
Multimodal pretraining has made convincing achievements in various downstream tasks in recent years. However, since the majority of existing works build models on English data, their applications are limited by language. In this work, we address this issue by developing models with multimodal and multilingual capabilities. We explore two types of methods to extend a multimodal pretraining model from monolingual to multilingual. Specifically, we propose a pretraining-based model named multilingual multimodal pretraining (MLMM), and two generalization-based models named multilingual CLIP (M-CLIP) and multilingual acquisition (MLA). In addition, we further extend the generalization-based models to incorporate the audio modality and develop multilingual CLIP for vision, language, and audio (CLIP4VLA). Our models achieve state-of-the-art performance on multilingual vision-text retrieval, visual question answering, and image captioning benchmarks. Based on the experimental results, we discuss the pros and cons of the two types of models and their potential practical applications.
Keywords: Multilingual pretraining, multimodal pretraining, cross-lingual transfer, multilingual generation, cross-modal retrieval
A joint entity-relation extraction model for medical text based on adversarial training and global pointer networks
16
Authors: Duan Yufeng, Bai Ping. Information Science (Peking University Core Journal), 2025, No. 3, pp. 47-57 (11 pages).
[Purpose/Significance] Building on a comparative analysis of existing relation extraction methods, this paper constructs a relation extraction model suited to medical text. [Method/Process] An AGP model is built to perform relation extraction: embedded representations of medical text are fed into a Transformer encoder to further extract textual features, and a global pointer network decodes them. To improve robustness, adversarial training is introduced. [Result/Conclusion] The AGP model achieves F1 scores of 0.6190, 0.5321, and 0.5684 on the CMeIE-V1, CMeIE-V2, and DiaKG datasets, respectively. The experimental results show that the AGP model outperforms the baseline models on medical-text relation extraction. [Innovation/Limitation] The proposed model does not yet integrate large language models.
Keywords: Adversarial training, global pointer network, relation extraction, pretrained model, medical text
Named entity recognition for grain harvester maintenance knowledge based on XLNet-BiLSTM-AFF-CRF
17
Authors: Li Xianwang, Liu Saihu, Huang Zhongxiang, Zhang Xiadong. Journal of Chinese Agricultural Mechanization (Peking University Core Journal), 2025, No. 2, pp. 319-325, 352 (8 pages).
To address missing contextual semantic features, insufficient long-range dependency information, and high entity complexity in recognizing grain-harvester maintenance entities, this paper proposes XLNet-BiLSTM-AFF-CRF, a named entity recognition model for grain harvester maintenance knowledge that introduces attention-based feature fusion. The model uses the generalized autoregressive XLNet pretrained model, based on Transformer-XL, as the embedding layer to extract character vectors; a bidirectional long short-term memory network (BiLSTM) then captures contextual semantic features; attentional feature fusion (AFF) combines the XLNet output with the BiLSTM output to enrich the semantic information of the sequence; finally, a conditional random field (CRF) learns labeling constraint rules and generates the globally optimal sequence. Experiments on a purpose-built maintenance corpus show precision, recall, and F1 of 98.4%, 97.6%, and 97.9%, respectively, all higher than the comparison models, verifying the effectiveness of the proposed model.
Keywords: Grain harvesting machinery, maintenance, named entity recognition, attention mechanism, generalized autoregressive pretrained language model (XLNet)
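The CRF layer described above produces the globally optimal tag sequence via Viterbi decoding over emission and transition scores. A hedged NumPy sketch of that decoding step (generic linear-chain CRF decoding, not the paper's implementation):

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """Highest-scoring tag path for a linear-chain CRF.
    emissions: (T, K) per-token tag scores; transitions: (K, K) scores
    for moving from tag i to tag j between adjacent tokens."""
    T, K = emissions.shape
    score = emissions[0].copy()          # best score ending in each tag so far
    back = np.zeros((T, K), dtype=int)   # backpointers
    for t in range(1, T):
        # cand[i, j] = best path ending in tag i, then transitioning to tag j.
        cand = score[:, None] + transitions + emissions[t][None, :]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    # Trace the best path backwards from the highest final score.
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

The transition matrix is what lets the CRF enforce labeling constraints (e.g., penalizing an illegal tag bigram with a large negative score), which per-token classification alone cannot do.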
Health and wellness in the era of new quality productive forces
18
Authors: Bai Chunxue, Zhang Lichuan, Zhu Wensi, Cai Qinyi. Metaverse in Medicine, 2025, No. 1, pp. 21-27 (7 pages).
The Internet of Things (IoT), the metaverse, and medical GPT can be applied to empower the health and wellness field, improving the quality and efficiency of health care. IoT technology deploys sensors and monitoring devices to track individual health status in real time and collect data, providing a scientific basis for personalized health plans. The metaverse builds a three-dimensional virtual space in which users can participate in diverse health-promotion activities under virtual identities and enjoy immersive health education, such as exercise training and rehabilitation courses in virtual reality. Medical GPT uses natural language processing, combined with a user's genetic and physiological information, to provide personalized health consultation and prevention strategies, generate customized health management plans, assess health risks, and offer professional consultation and education. Together, these technologies raise the personalization and intelligence of health management, expand access to health education resources so that residents of remote areas can also enjoy quality resources, reduce dependence on traditional medical resources, lower costs, improve service efficiency, and, through interactive and engaging virtual reality experiences, increase user participation in and satisfaction with health-promotion activities. They help drive innovation in medical and health services, promote equitable allocation of health resources, and raise the overall level of population health.
Keywords: Virtual reality, augmented reality, Internet of Things, generative pretrained transformer
Automated scoring of interpreting tests based on the China's Standards of English Language Ability (2024 edition)
19
Authors: Wang Weiwei, Zhang Yuqi, Wang Ke. Modern Foreign Languages (Peking University Core Journal), 2025, No. 4, pp. 536-546 (11 pages).
From the three dimensions of explanation, evaluation, and generalization, this study validates the reliability of an automated interpreting-quality scoring model, developed on the basis of the China's Standards of English Language Ability (2024 edition), across different interpreting test scenarios, in order to examine the model's scoring quality. The results show that for homogeneous test-taker populations and simultaneous interpreting tasks, the model agrees and correlates highly with human scoring, whereas for heterogeneous populations and consecutive interpreting tasks its scoring quality still needs improvement; the precision and reliability of automated scoring are also affected by speech recognition accuracy. Automated scoring technology has broad application prospects in language testing, but its algorithms and feature extraction models require further optimization to improve scoring stability. Future research should build on automated scoring to advance formative assessment in interpreter training and construct a comprehensive assessment system combining formative and summative evaluation.
Keywords: China's Standards of English Language Ability, automated interpreting-quality scoring model, interpreting scoring quality, automated scoring
Research on a text classification method combining adversarial training and improved BERT
20
Authors: Ji Xunsheng, Cai Zhiwan, Hu Kai. Computer and Digital Engineering, 2025, No. 8, pp. 2200-2204, 2245 (6 pages).
To address the widespread polysemy in Chinese text and the robustness problems that irregular text causes for classification models, this paper proposes AT-NEZHA (Adversarial Training NEZHA), a classification model combining adversarial training with a Chinese pretrained model. On one hand, the word embeddings of NEZHA, a Chinese improvement of the BERT model, fuse contextual information to resolve polysemy in Chinese text; on the other hand, an adversarial training algorithm applies gradient-based perturbations to the embedding-layer parameter matrix to increase the training loss, driving the model toward better parameters and thereby improving robustness. Experimental results show that AT-NEZHA effectively improves text classification accuracy.
Keywords: Text classification, potential safety hazards, adversarial training, pretrained model, robustness
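The adversarial training step described above perturbs the embedding-layer parameters along the loss gradient. A common instance of this idea is the Fast Gradient Method (FGM); the sketch below is a hedged NumPy illustration of that perturbation, not the paper's AT-NEZHA code, and `eps` is an illustrative hyperparameter:

```python
import numpy as np

def fgm_perturbation(grad, eps=1.0):
    """Fast Gradient Method step: perturb the embedding matrix along the
    gradient of the loss, rescaled to L2 norm eps."""
    norm = np.linalg.norm(grad)
    if norm == 0.0:
        return np.zeros_like(grad)
    return eps * grad / norm

# Usage sketch: inside a training step, after backprop on the clean batch,
#   emb_adv = emb + fgm_perturbation(emb_grad, eps=0.5)
# then a second forward/backward pass on emb_adv adds the adversarial loss
# before the optimizer update, and the perturbation is removed afterwards.
```

Because the perturbation points in the direction that increases the loss most steeply (locally), training on it forces the model to flatten the loss surface around each embedding, which is the source of the robustness gain.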