404 articles found
Deep Learning-Based Natural Language Processing Model and Optical Character Recognition for Detection of Online Grooming on Social Networking Services
1
Authors: Sangmin Kim, Byeongcheon Lee, Muazzam Maqsood, Jihoon Moon, Seungmin Rho. Computer Modeling in Engineering & Sciences, 2025, Issue 5, pp. 2079-2108 (30 pages)
The increased accessibility of social networking services (SNSs) has facilitated communication and information sharing among users. However, it has also heightened concerns about digital safety, particularly for children and adolescents, who are increasingly exposed to online grooming crimes. Early and accurate identification of grooming conversations is crucial in preventing long-term harm to victims. However, research on grooming detection in South Korea remains limited, as existing models are trained primarily on English text and fail to reflect the unique linguistic features of SNS conversations, leading to inaccurate classifications. To address these issues, this study proposes a novel framework that integrates optical character recognition (OCR) technology with KcELECTRA, a deep learning-based natural language processing (NLP) model that performs well on colloquial Korean. In the proposed framework, the KcELECTRA model is fine-tuned on an extensive dataset, including Korean social media conversations, Korean ethical verification data from AI-Hub, and Korean hate speech data from HuggingFace, to enable more accurate classification of text extracted from social media conversation images. Experimental results show that the proposed framework achieves an accuracy of 0.953, outperforming existing transformer-based models. Furthermore, OCR technology extracts text from images with high accuracy, demonstrating that the proposed framework is effective for online grooming detection. The proposed framework is expected to contribute to more accurate detection of grooming text and the prevention of grooming-related crimes.
Keywords: online grooming; KcELECTRA; natural language processing; optical character recognition; social networking service; text classification
Research on the Automatic Pattern Abstraction and Recognition Methodology for Large-scale Database System based on Natural Language Processing (cited by 1)
2
Authors: Rong Wang, Cuizhen Jiao, Wenhua Dai. International Journal of Technology Management, 2015, Issue 9, pp. 125-127 (3 pages)
In this paper, we study an automatic pattern abstraction and recognition method for large-scale database systems based on natural language processing. In a distributed database, data spread across different nodes, and even across different regions, can be recognized through the network connections between nodes. Because database model designs usually contain many tables in order to reduce data redundancy, we draw on NLP theory to optimize the traditional method. Experimental analysis and simulation demonstrate the correctness of our method.
Keywords: pattern abstraction and recognition; database system; natural language processing
Automated Handwriting Recognition and Speech Synthesizer for Indigenous Language Processing
3
Authors: Bassam A. Y. Alqaralleh, Fahad Aldhaban, Feras Mohammed A-Matarneh, Esam A. AlQaralleh. Computers, Materials & Continua [SCIE, EI], 2022, Issue 8, pp. 3913-3927 (15 pages)
In recent years, handwriting recognition analysis for indigenous languages has gained significant interest among research communities. Recent developments in artificial intelligence (AI), natural language processing (NLP), and computational linguistics (CL) have proven useful in the analysis of regional low-resource languages. Automatic lexical tasks can be extended to various NLP applications, as is apparent from the availability of effective machine recognition models and open-access handwritten databases. Arabic is a widely spoken Semitic language written in the cursive Arabic alphabet from right to left, and Arabic handwritten character recognition (HCR) is a crucial process in optical character recognition. In this view, this paper presents an effective Computational Linguistics with Deep Learning-based Handwriting Recognition and Speech Synthesizer (CLDL-THRSS) model for indigenous languages. The CLDL-THRSS model involves two stages of operation, namely automated handwriting recognition and speech recognition. First, the automated handwriting recognition procedure involves preprocessing, segmentation, feature extraction, and classification, with a Capsule Network (CapsNet)-based feature extractor employed to recognize handwritten Arabic characters. For optimal hyperparameter tuning, the cuckoo search (CS) optimization technique is used to tune the parameters of the CapsNet method. In addition, a deep neural network with hidden Markov model (DNN-HMM) is employed for the automatic speech synthesizer. To validate the performance of the proposed CLDL-THRSS model, a detailed experimental validation process was carried out and the outcomes were investigated in terms of different measures. The experimental outcomes showed that the CLDL-THRSS technique outperformed the compared methods.
Keywords: computational linguistics; handwriting character recognition; natural language processing; indigenous language
Continuous Arabic Sign Language Recognition in User Dependent Mode
4
Authors: K. Assaleh, T. Shanableh, M. Fanaswala, F. Amin, H. Bajaj. Journal of Intelligent Learning Systems and Applications, 2010, Issue 1, pp. 19-27 (9 pages)
Arabic Sign Language recognition is an emerging field of research. Previous attempts at automatic vision-based recognition of Arabic Sign Language mainly focused on finger spelling and recognizing isolated gestures. In this paper, we report the first continuous Arabic Sign Language recognition system, building on existing research in feature extraction and pattern recognition. The development of the presented work required collecting a continuous Arabic Sign Language database, which we designed and recorded in cooperation with a sign language expert. We intend to make the collected database available to the research community. Our system, based on spatio-temporal feature extraction and hidden Markov models, achieved an average word recognition rate of 94%, bearing in mind the use of a high-perplexity vocabulary and an unrestrictive grammar. We compare our proposed work against existing sign language techniques based on accumulated image difference and motion estimation. The experimental results show that the proposed work outperforms existing solutions in terms of recognition accuracy.
Keywords: pattern recognition; motion analysis; image/video processing and sign language
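Decoding in an HMM-based continuous recognizer comes down to finding the most likely hidden-state sequence for an observation sequence. The sketch below is a generic, textbook Viterbi decoder in log space, not the authors' system: the two sign-word states, the quantized observation symbols ("up", "down"), and every probability are hypothetical toy values standing in for the spatio-temporal features described above.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for a discrete HMM, computed in log space."""
    # best[t][s]: best log-probability of any state path that ends in state s at time t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]]) for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + math.log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Start from the best final state and follow the back-pointers.
    path = [max(best[-1], key=best[-1].get)]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Toy model: two hypothetical sign words and two quantized motion symbols.
states = ("HELLO", "THANKS")
start_p = {"HELLO": 0.6, "THANKS": 0.4}
trans_p = {"HELLO": {"HELLO": 0.7, "THANKS": 0.3}, "THANKS": {"HELLO": 0.4, "THANKS": 0.6}}
emit_p = {"HELLO": {"up": 0.8, "down": 0.2}, "THANKS": {"up": 0.1, "down": 0.9}}
print(viterbi(["up", "up", "down", "down"], states, start_p, trans_p, emit_p))
# → ['HELLO', 'HELLO', 'THANKS', 'THANKS']
```

Working in log space avoids numeric underflow on long gesture sequences; the toy tables deliberately contain no zero probabilities, since `math.log(0)` would raise an error.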
A Convolutional Neural Network Based Optical Character Recognition for Purely Handwritten Characters and Digits
5
Authors: Syed Atir Raza, Muhammad Shoaib Farooq, Uzma Farooq, Hanen Karamti, Tahir Khurshaid, Imran Ashraf. Computers, Materials & Continua, 2025, Issue 8, pp. 3149-3173 (25 pages)
Urdu, a prominent subcontinental language, serves as a versatile means of communication. However, its handwritten expressions present challenges for optical character recognition (OCR). While various OCR techniques have been proposed, most of them focus on recognizing printed Urdu characters and digits. To the best of our knowledge, very little research has focused solely on recognizing purely handwritten Urdu, and the results of such proposed methods are often inadequate. In this study, we introduce a novel approach to recognizing purely handwritten Urdu digits and characters using Convolutional Neural Networks (CNN). Our proposed method uses convolutional layers to extract important features from input images and classifies them using fully connected layers, enabling efficient and accurate detection of handwritten Urdu digits and characters. We implemented the proposed technique on a large publicly available dataset of handwritten Urdu digits and characters. The findings demonstrate that the CNN model achieves an accuracy of 98.30% and an F1 score of 88.6%, indicating its effectiveness in detecting and classifying handwritten Urdu digits and characters. These results have far-reaching implications for various applications, including document analysis, text recognition, and language understanding, which have previously been unexplored in the context of Urdu handwriting data. This work lays a solid foundation for future research and development in Urdu language detection and processing, opening up new opportunities for advancement in this field.
Keywords: image processing; natural language processing; handwritten Urdu characters; optical character recognition; deep learning; feature extraction; classification
EALMDA-Based Data Augmentation Method for Medical Named Entity Recognition
6
Authors: 道路, 刘纳, 郑国风, 李晨, 杨杰. 《郑州大学学报(理学版)》 (Journal of Zhengzhou University, Natural Science Edition) [PKU Core], 2026, Issue 1, pp. 43-50 (8 pages)
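Medical named entity recognition identifies named entities in unstructured medical text and plays an important role in many downstream tasks. Because medical named entities are complex, annotating them requires experts with domain knowledge, which leads to a severe scarcity of labeled data in the medical domain. To address this problem, an entity aware mask local mixup data augmentation (EALMDA) method for named entity recognition is proposed. First, an entity-aware mask channel extracts the key elements and masks the non-entity parts to preserve the core semantics. Second, the masked sentences are mixed using a linear combination of two sampling strategies, contextual entity similarity and k-nearest neighbors, which preserves the core semantics while increasing sample diversity. Finally, after sequence linearization, the sentences are fed into the generative model to obtain augmented samples. In comparative experiments against mainstream data augmentation baselines on five mainstream medical named entity recognition datasets, including NCBI-disease, under simulated low-resource scenarios, the proposed method significantly outperforms the baselines.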
Keywords: data augmentation; named entity recognition; natural language processing; generative model; Mixup
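The first and last steps of the pipeline above (masking non-entity tokens, then linearizing the sequence for a generative model) can be sketched on BIO-tagged tokens. This is a minimal illustration under assumptions: the `[MASK]` token, the BIO tag set, and the label-before-token linearization (a DAGA-style convention) are not specified in the abstract, and the similarity/k-NN mixup step is not shown.

```python
MASK = "[MASK]"  # hypothetical mask token


def entity_aware_mask(tokens, tags, mask_token=MASK):
    """Keep tokens inside entities (B-/I- tags) and replace every non-entity ('O') token with a mask."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    return [tok if tag != "O" else mask_token for tok, tag in zip(tokens, tags)]


def linearize(tokens, tags):
    """Put each entity label directly before its token so a language model can generate labeled text."""
    out = []
    for tok, tag in zip(tokens, tags):
        if tag != "O":
            out.append(tag)
        out.append(tok)
    return out


tokens = ["Patients", "with", "breast", "cancer", "improved"]
tags = ["O", "O", "B-Disease", "I-Disease", "O"]
print(entity_aware_mask(tokens, tags))
# → ['[MASK]', '[MASK]', 'breast', 'cancer', '[MASK]']
print(linearize(tokens, tags))
# → ['Patients', 'with', 'B-Disease', 'breast', 'I-Disease', 'cancer', 'improved']
```

Linearizing labels inline lets an ordinary token-level generator emit new sentences whose entity annotations can be recovered by reading the label tokens back out.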
Design and Engineering Implementation of a Multimodal Intelligent Processing System for Open-Source Intelligence
7
Authors: 董泽云, 甘莅豪, 薛楠, 陆泰廷. 《大数据》 (Big Data Research), 2026, Issue 1, pp. 71-83 (13 pages)
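To address the modality fragmentation, insufficient structuring capability, and poor user interactivity of open-source intelligence systems, an intelligent information processing system that integrates computer vision, natural language processing, and text-to-speech technologies is proposed. Based on multi-source heterogeneous data, a complete closed-loop workflow is designed, covering data collection, preprocessing, deep modeling, intelligent decision-making, and user interaction and feedback, with key breakthroughs in cross-modal data fusion, structured processing of intelligence content, and voice broadcasting with multimedia visualization. Experimental results show that the system performs well on key metrics such as intelligence extraction accuracy, response time, and interpretable user feedback. The system is modular and extensible and suits scenarios such as government security, financial risk control, and public opinion monitoring.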
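Keywords: open-source intelligence; computer vision; natural language processing; text-to-speech; speech recognition; multimodal fusion; large language model; artificial intelligence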
Research on the Construction and In-Depth Application of a Russian Corpus Based on Natural Language Processing
8
Author: 张芷若. 《中国科技术语》 (China Terminology), 2026, Issue 1, pp. 137-139 (3 pages)
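This article studies how Russian corpus construction can be integrated with computational language processing technologies. Drawing on corpus typology and annotation mechanisms, it reviews adaptation strategies for morphological, syntactic, and semantic processing and, by integrating multi-source corpora with deep models, builds an application-system framework for translation, question answering, and public opinion analysis. The results show that coordinating corpus standards with processing mechanisms can enhance system robustness and semantic parsing capability.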
Keywords: Russian corpus; natural language processing; syntactic analysis; deep learning; word sense disambiguation
Automatic Text Summarization Using Genetic Algorithm and Repetitive Patterns (cited by 2)
9
Authors: Ebrahim Heidary, Hamïd Parvïn, Samad Nejatian, Karamollah Bagherifard, Vahideh Rezaie, Zulkefli Mansor, Kim-Hung Pho. Computers, Materials & Continua [SCIE, EI], 2021, Issue 4, pp. 1085-1101 (17 pages)
Given the increasing volume of text documents, automatic summarization is an important tool for the quick and optimal use of such sources. Automatic summarization is a text compression process that produces a shorter document so that the important goals and main features of the input document can be accessed quickly. In this study, a novel method is introduced for selective text summarization using a genetic algorithm and the generation of repetitive patterns. An important feature of the proposed method, which distinguishes it from previous methods, is that it identifies and extracts the relationships between the main features of the input text and creates repetitive patterns in order to produce and optimize the vector of main document features used to generate the summary. The method attempts to cover all the main parameters of a summary, including an unambiguous summary with the highest precision, continuity, and consistency. To investigate the efficiency of the proposed algorithm, the results were evaluated with respect to precision and recall criteria. The evaluation showed that the method optimizes the feature dimensions and generates a sequence of summary sentences that is most consistent with the main goals and features of the input document.
Keywords: natural language processing; extractive summarization; features optimization; repetitive patterns; genetic algorithm
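The genetic-algorithm side of extractive summarization can be sketched as a search over fixed-size sets of sentence indices. This is a generic GA under stated assumptions, not the authors' method: the fitness function below (total document frequency of the distinct words a summary covers) is hypothetical and does not model the paper's repetitive-pattern features, and the selection, crossover, and mutation settings are illustrative defaults.

```python
import random
from collections import Counter


def tokenize(sentence):
    """Lowercase words with surrounding punctuation stripped."""
    words = (w.strip(".,;:!?").lower() for w in sentence.split())
    return [w for w in words if w]


def fitness(chosen, sentences, doc_freq):
    """Hypothetical fitness: total document frequency of the distinct words the summary covers."""
    covered = set()
    for i in chosen:
        covered.update(tokenize(sentences[i]))
    return sum(doc_freq[w] for w in covered)


def ga_summarize(sentences, k, pop_size=20, generations=50, mutation_rate=0.2, seed=0):
    """Select k sentence indices with a simple genetic algorithm (truncation selection + elitism)."""
    rng = random.Random(seed)
    n = len(sentences)
    doc_freq = Counter(w for s in sentences for w in tokenize(s))

    def repair(genes):
        # Drop duplicate indices, then top up with random unused indices until k remain.
        genes = list(dict.fromkeys(genes))
        while len(genes) < k:
            g = rng.randrange(n)
            if g not in genes:
                genes.append(g)
        return tuple(sorted(genes))

    def score(individual):
        return fitness(individual, sentences, doc_freq)

    population = [tuple(sorted(rng.sample(range(n), k))) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]  # the fittest half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, k) if k > 1 else 0  # one-point crossover
            child = list(a[:cut]) + list(b[cut:])
            if rng.random() < mutation_rate:
                child[rng.randrange(k)] = rng.randrange(n)  # point mutation
            children.append(repair(child))
        population = parents + children
    return max(population, key=score)
```

Returning sorted indices keeps the summary sentences in their original document order, which helps the continuity the abstract aims for; a fixed `seed` makes runs reproducible.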
SUBDIVIDING VERBS TO IMPROVE SYNTACTIC PARSING (cited by 2)
10
Authors: Liu Ting, Ma Jinshan, Zhang Huipeng, Li Sheng. Journal of Electronics (China), 2007, Issue 3, pp. 347-352 (6 pages)
This paper proposes a new way to improve the performance of dependency parsers: subdividing verbs according to their grammatical functions and integrating verb-subclass information into a lexicalized parsing model. First, the verb subdivision scheme is described. Second, a maximum entropy model is presented to distinguish verb subclasses. Finally, a statistical parser is developed to evaluate the verb subdivision. Experimental results indicate that using verb subclasses improves parsing performance.
Keywords: verb subdivision; maximum entropy model; syntactic parsing; natural language processing
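A maximum entropy classifier of the kind used to distinguish verb subclasses scores each class by summing the weights of the active features and normalizes with a softmax. The sketch below shows only that prediction step; the subclass names (`transitive`, `intransitive`), the feature strings, and the weights are hypothetical, since the abstract does not list the paper's subclasses or feature templates, and weight training is not shown.

```python
import math


def maxent_probs(features, weights, classes):
    """Log-linear (maximum entropy) prediction: p(c | x) is proportional to exp(sum of active feature weights)."""
    scores = {c: sum(weights.get((f, c), 0.0) for f in features) for c in classes}
    top = max(scores.values())  # subtract the max score for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: v / total for c, v in exps.items()}


# Hypothetical learned weights keyed by (feature, class).
weights = {
    ("next=NOUN", "transitive"): 1.5,
    ("word=eat", "transitive"): 0.5,
    ("next=PUNCT", "intransitive"): 1.2,
}
probs = maxent_probs(["word=eat", "next=NOUN"], weights, ["transitive", "intransitive"])
print(probs)
```

Here the transitive score is 2.0 and the intransitive score is 0.0, so the transitive probability is 1 / (1 + e^-2), roughly 0.88.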
Number Entities Recognition in Multiple Rounds of Dialogue Systems (cited by 1)
11
Authors: Shan Zhang, Bin Cao, Yueshen Xu, Jing Fan. Computer Modeling in Engineering & Sciences [SCIE, EI], 2021, Issue 4, pp. 309-323 (15 pages)
As a representative technique in natural language processing (NLP), named entity recognition is used in many tasks, such as dialogue systems, machine translation, and information extraction. In dialogue systems, a common case for named entity recognition is that many entities are composed of numbers and are split into parts located in different places. For example, in multi-round dialogue systems, a phone number is likely to be divided into several parts, because phone numbers are usually long and are emphasized. In this paper, an entity consisting of numbers is called a number entity. The discontinuous positions of number entities have many causes; we identify two from real-world dialogue systems. The first is the repeated confirmation of different components of a number entity, and the second is interruption by mood words. Extracting number entities is useful in many tasks, such as user information completion and service request correction. However, existing entity extraction methods cannot extract entities consisting of discontinuous entity blocks. To address these problems, we propose a comprehensive method for number entity recognition that can extract number entities in multi-round dialogue systems. We conduct extensive experiments on a real-world dataset, and the results demonstrate the high performance of our method.
Keywords: natural language processing; dialogue systems; named entity recognition; number entity; discontinuous entity blocks
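The two causes of discontinuity named above (repeated confirmation and interrupting mood words) can be illustrated with a toy heuristic that reassembles digit fragments across turns. This is not the paper's method: the rule that a fragment identical to the previous one is a confirmation is a hypothetical simplification, and it would wrongly drop a genuinely repeated digit group such as "555 555".

```python
import re


def assemble_number_entity(turns):
    """Join digit fragments spread over several dialogue turns into one number entity (toy heuristic)."""
    fragments = []
    for turn in turns:
        # Only digit runs are kept, so mood words such as "um" or "uh" are skipped automatically.
        for fragment in re.findall(r"\d+", turn):
            # A fragment identical to the previous one is treated as a repeated
            # confirmation rather than new digits (hypothetical rule).
            if fragments and fragment == fragments[-1]:
                continue
            fragments.append(fragment)
    return "".join(fragments)


turns = ["my phone number is 138", "138, yes", "um, 0013 uh 8000"]
print(assemble_number_entity(turns))
# → 13800138000
```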
Person-specific named entity recognition using SVM with rich feature sets (cited by 2)
12
Author: Hui NIE. Chinese Journal of Library and Information Science, 2012, Issue 3, pp. 27-46 (20 pages)
Purpose: The purpose of the study is to explore the potential use of natural language processing (NLP) and machine learning (ML) techniques, and to find a feasible strategy and an effective approach to the NER task for Web-oriented, person-specific information extraction. Design/methodology/approach: An SVM-based multi-classification approach combined with a set of rich features derived from state-of-the-art NLP techniques is proposed for the NER task. A group of experiments was designed to investigate the influence of various NLP-based features on system performance, especially the semantic features. Optimal parameter settings for the SVM models, including the kernel function, the margin parameter, and the context window size, were also explored experimentally. Findings: The SVM-based multi-classification approach proved effective for the NER task. This work shows that NLP-based features are of great importance in data-driven NE recognition, particularly the semantic features. The study indicates that higher-order kernel functions may not be desirable for this specific classification problem in practical applications; the simple linear-kernel SVM model performed better in this case. Moreover, the modified SVM models with an uneven margin parameter are more general and flexible, and proved better at handling the imbalanced data problem. Research limitations/implications: The SVM-based approach to the NER problem has been shown to be effective only on limited experimental data. Further research needs to be conducted on large batches of real Web data. In addition, the performance of the NER system needs to be tested when it is incorporated into a complete IE framework. Originality/value: The specially designed experiments make it feasible to fully explore the characteristics of the data and obtain the optimal parameter settings for the NER task, leading to favorable recall, precision, and F1 measures. The overall system performance (F1 value) for all types of named entities exceeds 88.6%, which can meet the requirements of practical applications.
Keywords: named entity recognition; natural language processing; SVM-based classifier; feature selection
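The context window size tuned in that study determines how many neighboring words feed each token's feature vector. A minimal sketch of window feature extraction follows; the feature names (`w[-1]`, `is_title`, and so on) and the `<PAD>` symbol are hypothetical, the paper's semantic features are not reproduced, and the resulting dictionaries would still need to be vectorized (for example, one-hot encoded) before training an SVM.

```python
def window_features(tokens, i, size=2):
    """Feature dict for token i: the token itself plus words in a symmetric context window."""
    feats = {"w[0]": tokens[i].lower(), "is_title": tokens[i].istitle()}
    for offset in range(-size, size + 1):
        if offset == 0:
            continue
        j = i + offset
        # Positions outside the sentence are filled with a padding symbol.
        feats[f"w[{offset:+d}]"] = tokens[j].lower() if 0 <= j < len(tokens) else "<PAD>"
    return feats


tokens = ["Professor", "Smith", "joined", "the", "lab"]
print(window_features(tokens, 1, size=2))
```

Increasing `size` adds more context features per token at the cost of a sparser, larger feature space, which is the trade-off a window-size experiment explores.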
Generating Factual Text via Entailment Recognition Task
13
Authors: Jinqiao Dai, Pengsen Cheng, Jiayong Liu. Computers, Materials & Continua [SCIE, EI], 2024, Issue 7, pp. 547-565 (19 pages)
Generating diverse and factual text is challenging and is receiving increasing attention. By sampling from the latent space, variational autoencoder-based models have recently enhanced the diversity of generated text. However, existing research predominantly depends on summarization models to provide paragraph-level semantic information for enhancing factual correctness. The challenge lies in effectively generating factual text using sentence-level variational autoencoder-based models. In this paper, a novel model called the fact-aware conditional variational autoencoder is proposed to balance the factual correctness and diversity of generated text. Specifically, our model encodes the input sentences and uses them as facts to build a conditional variational autoencoder network; training this network enables the model to generate text based on the input facts. Building on this foundation, the input text is passed to a discriminator along with the generated text. Through adversarial training, the model is encouraged to generate text that is indistinguishable to the discriminator, thereby enhancing the quality of the generated text. To further improve factual correctness, and inspired by natural language inference systems, an entailment recognition task is introduced and trained together with the discriminator via multi-task learning. Moreover, based on the entailment recognition results, a penalty term is added to reconstruct the model's loss, forcing the generator to produce text consistent with the facts. Experimental results demonstrate that, compared with competitive models, our model achieves substantial improvements in both the quality and factual correctness of the text while sacrificing only a small amount of diversity. Furthermore, when diversity and quality metrics are evaluated together, our model also achieves the best performance.
Keywords: text generation; entailment recognition task; natural language processing; artificial intelligence
Dart Games Optimizer with Deep Learning-Based Computational Linguistics Named Entity Recognition
14
Authors: Mesfer Al Duhayyim, Hala J. Alshahrani, Khaled Tarmissi, Heyam H. Al-Baity, Abdullah Mohamed, Ishfaq Yaseen, Amgad Atta Abdelmageed, Mohamed I. Eldesouki. Intelligent Automation & Soft Computing [SCIE], 2023, Issue 9, pp. 2549-2566 (18 pages)
Computational linguistics is an engineering-based scientific discipline that deals with understanding written and spoken language from a computational viewpoint. The field also helps construct artefacts that are useful for processing and producing language, either in bulk or in a dialogue setting. Named Entity Recognition (NER) is a fundamental task in the data extraction process. It concentrates on identifying and labelling atomic components of texts and grouping them into entity types such as organizations, people, places, and times. Further, an NER mechanism can identify and extract additional entity types as requirements dictate. The significance of NER has been well established in Natural Language Processing (NLP) tasks, and various research investigations have been conducted to develop novel NER methods. Conventional approaches range from rule-based and hand-crafted feature-based Machine Learning (ML) techniques to Deep Learning (DL) techniques. In this context, the current study introduces a novel Dart Games Optimizer with Hybrid Deep Learning-Driven Computational Linguistics (DGOHDL-CL) model for NER. The DGOHDL-CL technique aims to identify and label the atomic components of texts as a collection of named entities. In the DGOHDL-CL technique, word embedding is first performed with the word2vec model. For the NER mechanism, a Convolutional Gated Recurrent Unit (CGRU) model is employed. Finally, the DGO technique is used as a hyperparameter tuning strategy for the CGRU algorithm to boost the NER outcomes. No earlier studies have integrated the DGO mechanism with the CGRU model for NER. To demonstrate the superiority of the proposed DGOHDL-CL technique, an extensive simulation analysis was executed on two datasets, CoNLL-2003 and OntoNotes 5.0. The experimental outcomes establish the promising performance of the DGOHDL-CL technique over other models.
Keywords: named entity recognition; deep learning; natural language processing; computational linguistics; dart games optimizer
A Survey of Natural Language Processing Research (cited by 14)
15
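Authors: 赵铁军, 许木璠, 陈安东. 《新疆师范大学学报(哲学社会科学版)》 (Journal of Xinjiang Normal University, Philosophy and Social Sciences Edition) [PKU Core], 2025, Issue 2, pp. 89-111, F0002 (24 pages)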
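In recent years, natural language processing has attracted wide attention for its many achievements in analyzing and modeling human language. Large-scale pre-trained language models now show strong conversational question answering and text generation abilities, sparking a new wave of NLP research. NLP is widely applied in machine translation, text summarization, information extraction, and other fields. This paper first discusses the methods NLP uses to analyze textual information at four different linguistic levels and outlines the basic tasks that make up NLP. It then discusses the current state of NLP in specific downstream tasks, including the history of its application to each task, current research trends, and open challenges. Finally, since research on large-scale pre-trained language models places higher demands on datasets, it discusses the existing datasets and evaluation benchmarks in the NLP field.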
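Keywords: natural language processing; syntactic analysis; semantic analysis; machine translation; question answering system; information extraction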
Knowledge Graph-Based Intelligent Fault Diagnosis Method for Drilling Top Drive Systems (cited by 1)
16
Authors: 陈冬, 肖远山, 尹志勇, 张彦龙, 叶智慧. 《天然气工业》 (Natural Gas Industry) [PKU Core], 2025, Issue 2, pp. 125-135 (11 pages)
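Drilling top drive systems have complex structures and diverse fault types, and existing fault tree analysis and expert systems struggle to cope with complex, changing field conditions. To address this, a knowledge graph-based fault diagnosis method for drilling top drives is proposed, exploiting the strengths of knowledge graphs in fusing structured and unstructured information, analyzing associations among fault modes, and transferring prior knowledge. Using Bidirectional Encoder Representations from Transformers (BERT), two hybrid neural network models, BERT-BiLSTM-CRF and BERT-BiLSTM-Attention, were built to perform named entity recognition and relation extraction, respectively, on top drive fault text data; similarity computation was then used to fuse fault knowledge and support intelligent question answering, yielding the top drive fault diagnosis method. The results show that: (1) for fault entity recognition, the BERT-BiLSTM-CRF model reaches a precision of 95.49% and effectively identifies information entities in fault texts; (2) for fault relation extraction, the BERT-BiLSTM-Attention model reaches a precision of 93.61%, correctly establishing the relation edges of the knowledge graph; (3) the developed question answering system puts the knowledge graph to intelligent use, with answer accuracy above 90% on several different question types, meeting field requirements. It is concluded that the knowledge graph-based fault diagnosis method makes effective use of prior knowledge about top drives, enables rapid fault localization and intelligent diagnosis, and has good application prospects.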
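Keywords: drilling equipment; top drive; fault diagnosis; deep learning; knowledge graph; natural language processing; named entity recognition; intelligent question answering system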
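The similarity computation used for knowledge fusion and question answering can be illustrated by linking a user's fault mention to the closest entity already in the graph. The abstract does not state which similarity measure is used; the character-bigram cosine similarity, the 0.5 threshold, and the English entity names below are all assumptions for illustration.

```python
import math
from collections import Counter


def char_ngrams(text, n=2):
    """Multiset of overlapping character n-grams."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def link_entity(mention, kg_entities, threshold=0.5):
    """Return the most similar knowledge-graph entity, or None if nothing clears the threshold."""
    mention_vec = char_ngrams(mention)
    best = max(kg_entities, key=lambda e: cosine(mention_vec, char_ngrams(e)))
    return best if cosine(mention_vec, char_ngrams(best)) >= threshold else None


entities = ["hydraulic pump failure", "motor overheating", "gearbox oil leak"]
print(link_entity("hydraulic pump fault", entities))
# → hydraulic pump failure
```

A threshold lets the system answer "unknown" instead of forcing a match; a learned embedding similarity (for example, from the BERT encoder) would be a natural replacement for the bigram vectors.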
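Automatic Identification and Extraction of Data from Organic Chemistry Literature Based on Language Expression Patterns and Natural Language Processing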
17
Authors: 陈维明, 戴静芳, 李英勇, 周俊红, 高犇, 赵英莉, 徐挺军, 薛小松. 《有机化学》 (Chinese Journal of Organic Chemistry) [PKU Core], 2025, Issue 6, pp. 2189-2198 (10 pages)
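Journal articles are an important source of scientific data, which in the past was mostly identified and extracted by manual indexing. With advances in information technology and artificial intelligence, automatically identifying and extracting scientific data from journal literature is becoming feasible. This work studies a method that combines language expression patterns with rule-based natural language processing (NLP) to automatically identify and extract chemical data and information from journal articles. The method was used to automatically extract chemical data from 3275 experimental research articles published in Chinese Journal of Organic Chemistry over the ten years 2013-2022, covering more than 30 kinds of chemical data, including product properties, synthetic reaction parameters, physical property data, and spectroscopic data. The extracted data were processed into corresponding databases, which already provide knowledge services for the journal. A performance test on all 422 articles published in the journal in 2022 showed extraction accuracies of 100% for optical rotation data, 99.85% for melting point data, 99.55% for ¹⁹F NMR spectra, 99.80% for ¹³C NMR spectra, 99.47% for physical state data, and 98.76% for product names (4665 product names extracted, 58 of them problematic). Product-name extraction in this work uses a local-context-based method for excluding irrelevant content; adopting systematic and semi-systematic compound nomenclature patterns could further improve its accuracy. In principle, automatic extraction based on language expression patterns and NLP is not restricted to any particular discipline and suits all scientific data.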
Keywords: chemical data; identification and extraction; language expression patterns; natural language processing
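Rule-based extraction driven by language expression patterns can be sketched with regular expressions for two of the data types listed above, melting points and optical rotations. The two patterns below are hypothetical and cover only one common way of writing each value; the paper's actual rule set is not given in the abstract, and real articles use many more variants.

```python
import re

# Hypothetical expression patterns; real articles use many more variants.
MP_PATTERN = re.compile(
    r"\bm\.?\s?p\.?\s*:?\s*(?P<low>\d+(?:\.\d+)?)\s*(?:[~\-–]\s*(?P<high>\d+(?:\.\d+)?))?\s*(?:°C|℃)",
    re.IGNORECASE,
)
ROTATION_PATTERN = re.compile(
    r"\[α\]\s*\S*\s*(?P<value>[+\-−]?\d+(?:\.\d+)?)\s*\(c\s*(?P<conc>\d+(?:\.\d+)?),\s*(?P<solvent>[^)]+)\)"
)


def extract_physical_data(text):
    """Return melting-point and optical-rotation records found in an experimental paragraph."""
    records = []
    for m in MP_PATTERN.finditer(text):
        records.append({"type": "melting_point", "low": float(m["low"]),
                        "high": float(m["high"]) if m["high"] else None})
    for m in ROTATION_PATTERN.finditer(text):
        records.append({"type": "optical_rotation",
                        "value": float(m["value"].replace("−", "-")),  # normalize Unicode minus
                        "c": float(m["conc"]),
                        "solvent": m["solvent"].strip()})
    return records


print(extract_physical_data("White solid, m.p. 120~122 ℃; [α]20D +35.2 (c 1.0, CHCl3)."))
```

The leading `\b` keeps words such as "temp" from being misread as "mp"; guards of this kind are where most of the effort in a pattern-based extractor tends to go.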
Predictive Effects of Multidimensional Linguistic Complexity on the Language Quality of Expository Writing
18
Authors: 彭程, 鲍珍. 《外语与外语教学》 (Foreign Languages and Their Teaching) [PKU Core], 2025, Issue 2, pp. 35-45, 146 (12 pages)
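Using expository essays written by third-year English majors, this study applies natural language processing methods to examine how well multidimensional linguistic indices predict the language quality of writing. It finds that (1) some lexical, syntactic, and phrasal indices correlate significantly with writing language scores; and (2) the association strength of bigrams and of dependency collocations together explain about 49% of the variance in writing language scores, the proportion of subordinate structures and the proportion of complex nominals together explain 12.4%, and content-word frequency explains 8%. These results indicate that, compared with lexical and syntactic indices, phrasal association is a more effective predictor of the language quality of expository writing, with implications for second-language writing assessment.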
Keywords: phrasal complexity; syntactic complexity; lexical complexity; expository writing; natural language processing
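Bigram association strength can be made concrete with pointwise mutual information (PMI), one common association measure; the abstract does not say which measure the study used, so PMI is an assumption here. In practice, scores come from a large reference corpus and a learner text is scored by averaging the PMI of its bigrams; the tiny "reference corpus" below is a toy.

```python
import math
from collections import Counter


def bigram_pmi(tokens):
    """Pointwise mutual information (log2) for every adjacent word pair in a token list."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = len(tokens) - 1
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


def mean_pmi(text_tokens, reference_scores):
    """Average PMI of a text's bigrams, looked up in reference-corpus scores (unseen pairs skipped)."""
    found = [reference_scores[b] for b in zip(text_tokens, text_tokens[1:]) if b in reference_scores]
    return sum(found) / len(found) if found else 0.0


reference = "the cat sat on the mat".split()
scores = bigram_pmi(reference)
print(round(scores[("cat", "sat")], 3), round(scores[("the", "cat")], 3))
```

"cat sat" scores higher than "the cat" because "the" is frequent on its own, which is exactly the frequency correction that makes PMI-style measures reward tightly associated phrases.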
HTLR: A Named Entity Recognition Framework with Hierarchical Fusion of Multiple Types of Knowledge
19
Authors: 吕学强, 王涛, 游新冬, 徐戈. 《计算机应用》 (Journal of Computer Applications) [PKU Core], 2025, Issue 1, pp. 40-47 (8 pages)
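The Chinese named entity recognition (NER) task aims to extract the entities contained in unstructured text and assign them predefined entity categories. Because most Chinese NER methods learn semantics insufficiently when contextual information is scarce, an NER framework that hierarchically fuses multiple types of knowledge, HTLR (Chinese NER method based on Hierarchical Transformer fusing Lexicon and Radical), is proposed to help the model learn richer and more comprehensive contextual and semantic information. First, potential words in the corpus are identified and vectorized using a published Chinese lexicon and word-vector table, and an optimized position encoding models the semantic relations between words and their related characters, so that Chinese lexical knowledge is learned. Second, the corpus is converted into code sequences that represent glyph information, using the glyph-based character encoding published by the Handian website (汉典网), and the RFE-CNN (Radical Feature Extraction-Convolutional Neural Network) model is proposed to extract glyph knowledge. Finally, a Hierarchical Transformer model is proposed in which low-level modules separately learn the semantic relations between characters and words and between characters and glyphs, and a high-level module further fuses character, lexical, glyph, and other knowledge to help the model learn semantically richer character representations. Experiments on the public Weibo, Resume, MSRA, and OntoNotes4.0 datasets show that, compared with the mainstream method NFLAT (Non-Flat-LAttice Transformer for Chinese named entity recognition), the proposed method improves the F1 score by 9.43, 0.75, 1.76, and 6.45 percentage points on the four datasets, respectively, achieving the best results. This shows that multiple types of semantic knowledge, hierarchical fusion, the RFE-CNN structure, and the Hierarchical Transformer structure are effective for learning rich semantic knowledge and improving model performance.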
Keywords: named entity recognition; natural language processing; knowledge graph construction; lexicon enhancement; glyph enhancement
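The lexicon step (identifying potential words in a character sequence and relating them to characters by position) can be sketched as lexicon matching that records head and tail indices, plus the four relative distances used by FLAT-style lattice Transformers. This is an assumption-laden illustration: HTLR's optimized position encoding is not described in detail in the abstract, the FLAT-style distances are a stand-in, and the toy lexicon is hypothetical.

```python
def match_lexicon(chars, lexicon, max_len=4):
    """Find every multi-character lexicon word in a character sequence as (word, head, tail) spans."""
    spans = []
    for head in range(len(chars)):
        for tail in range(head + 1, min(head + max_len, len(chars))):
            word = "".join(chars[head:tail + 1])
            if word in lexicon:
                spans.append((word, head, tail))
    return spans


def relative_distances(span_a, span_b):
    """Head/tail relative distances between two spans (FLAT-style, assumed here)."""
    (_, h_a, t_a), (_, h_b, t_b) = span_a, span_b
    return {"hh": h_a - h_b, "ht": h_a - t_b, "th": t_a - h_b, "tt": t_a - t_b}


chars = list("南京市长江大桥")
lexicon = {"南京", "南京市", "市长", "长江", "长江大桥", "大桥"}  # toy lexicon
spans = match_lexicon(chars, lexicon)
print(spans)
print(relative_distances(spans[1], spans[3]))
```

Keeping every overlapping match ("南京市" and "市长" share a character) is what lets a lattice model defer the segmentation decision to training rather than committing to one segmentation up front.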
Aspect Semantics-Enhanced Fusion Network for Aspect-Level Sentiment Analysis
20
Authors: 郑诚, 陈雪灵. 《小型微型计算机系统》 (Journal of Chinese Computer Systems) [PKU Core], 2025, Issue 9, pp. 2105-2112 (8 pages)
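Aspect-level sentiment analysis aims to identify the sentiment expressed toward aspect terms. Recently, graph convolutional networks based on dependency trees have proven effective for this task. However, the syntactic dependency tree is not specific to sentiment analysis and cannot focus on particular aspect terms. To address this, this paper proposes an aspect semantics-enhanced fusion network model that combines syntactic, semantic, and lexical information with aspect terms for aspect-level sentiment analysis. First, the fast gradient method (FGM) adversarial training algorithm is used for data augmentation. Second, to make full use of the effective information in the syntactic dependency tree, a graph convolutional network and an attention mechanism learn the syntactic and lexical information in the tree, respectively. Meanwhile, an aspect-enhanced attention mechanism is combined with self-attention to strengthen the sentence's awareness of aspect semantics. Finally, an asymmetric loss is used as the loss function. Experiments on benchmark datasets verify the effectiveness of the proposed model.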
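Keywords: natural language processing; aspect-level sentiment analysis; data augmentation; attention mechanism; syntactic dependency tree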
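Fast-gradient adversarial training, as commonly formulated, perturbs the word embeddings along the normalized loss gradient, r_adv = ε · g / ‖g‖₂, and adds the loss on the perturbed input to the clean loss. The sketch below shows only that perturbation on plain Python lists; in a real model a deep learning framework would supply the gradient, and the value of ε and the layer that gets perturbed are assumptions, since the abstract does not give them.

```python
import math


def fgm_perturbation(grad, epsilon=1.0):
    """Fast Gradient Method: r_adv = epsilon * g / ||g||_2 for the loss gradient g w.r.t. an embedding."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return [0.0] * len(grad)  # no gradient signal, no perturbation
    return [epsilon * g / norm for g in grad]


def perturb(embedding, grad, epsilon=1.0):
    """Adversarial embedding = original embedding + FGM perturbation."""
    return [e + r for e, r in zip(embedding, fgm_perturbation(grad, epsilon))]


print(perturb([1.0, 1.0], [3.0, 4.0], epsilon=0.5))
```

Normalizing by the L2 norm fixes the perturbation size at ε regardless of how large the raw gradient is, so ε directly controls how far each adversarial example moves from the clean embedding.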