3,942 articles found
Exploring Recovery through Life Narratives in Psychiatric Home-Visit Nursing: A Natural Language Processing Approach Using BERTopic
1
Authors: Ichiro Kutsuna, Masanao Ikeya, Akane Fujii, Aiko Hoshino, Kazuya Sakai. Source: International Journal of Mental Health Promotion, 2026, Issue 2, pp. 31-47 (17 pages)
Background: In mental health, recovery is emphasized, and qualitative analyses of service users’ narratives have accumulated; however, while qualitative approaches excel at capturing rich context and generating new concepts, they are limited in generalizability and feasible data volume. This study aimed to quantify the subjective life history narratives of users of psychiatric home-visit nursing using natural language processing (NLP) and to clarify the relationships between linguistic features and recovery-related indicators. Methods: We audio-recorded semi-structured interviews on daily life, transcribed them verbatim, and collected self-report questionnaires (Recovery Assessment Scale [RAS]) and clinician ratings (Global Assessment of Functioning [GAF]) from Japanese users of psychiatric home-visit nursing. Using the artificial intelligence-based topic-modeling method BERTopic, we extracted topics from the interview texts, calculated each participant’s topic proportions, and then examined associations between topic proportions and recovery-related indicators using Pearson correlation analyses. Results: “School” showed a significant positive correlation with RAS (r = 0.39, p = 0.05), whereas “Family” showed a significant negative correlation (r = -0.46, p = 0.02). GAF was positively correlated with word count (r = 0.44, p = 0.02) and “Hospital” (r = 0.42, p = 0.03), and negatively correlated with “Backchannels” (aizuchi) (r = -0.41, p = 0.03). Conclusion: The present results suggest that the quantity, quality, and content of narratives can serve as useful indicators of mental health and recovery, and that objective NLP-based analysis of service users’ narratives can complement traditional self-report scales and clinician ratings to inform the design of recovery-oriented care in psychiatric home-visit nursing.
Keywords: personal recovery; life history narratives; natural language processing; psychiatric home-visit nursing; artificial intelligence
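The correlation step described in the abstract above can be sketched in plain Python. This is a minimal illustration rather than the authors' pipeline: the toy topic labels and scores are made up, the helper names `topic_proportion` and `pearson_r` are hypothetical, and topic assignment (BERTopic in the study) is assumed to have been done already. Defining a participant's topic proportion as the share of utterances with that label is also an assumption.

```python
import math


def topic_proportion(utterance_topics, topic):
    """Share of one participant's utterances assigned to `topic`."""
    if not utterance_topics:
        raise ValueError("participant has no utterances")
    return sum(1 for t in utterance_topics if t == topic) / len(utterance_topics)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: one topic label per utterance, plus a made-up score.
participants = [
    (["school", "school", "family", "hospital"], 82),
    (["family", "family", "school", "hospital"], 64),
    (["family", "family", "family", "hospital"], 55),
    (["school", "school", "school", "family"], 90),
]

school_share = [topic_proportion(topics, "school") for topics, _ in participants]
scores = [score for _, score in participants]
r_school = pearson_r(school_share, scores)
```

The sketch returns only the coefficient; the p-values reported in the abstract would need a significance test on top of it.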
Natural language processing for disaster-resilient infrastructure: Research focus and future opportunities
2
Authors: Muhammad Ali Moriyani, Lemlem Asaye, Chau Le, Trung Le, Harun Pirim, Om Parkash Yadav, Tuyen Le. Source: Resilient Cities and Structures, 2025, Issue 4, pp. 47-71 (25 pages)
The increasing frequency and severity of natural disasters, exacerbated by global warming, necessitate novel solutions to strengthen the resilience of Critical Infrastructure Systems (CISs). Recent research reveals the significant potential of natural language processing (NLP) to analyze unstructured human language during disasters, thereby helping to uncover disruptions and providing situational awareness that supports various aspects of CIS resilience. Despite this potential, few studies have systematically mapped the global research on NLP applications that support the resilience of CISs. This paper contributes to the body of knowledge by presenting a review of current knowledge using the scientometric review technique. Using 231 bibliographic records from the Scopus and Web of Science core collections, we identify five key research areas where researchers have used NLP to support the resilience of CISs during natural disasters: sentiment analysis, crisis informatics, data and knowledge visualization, disaster impacts, and content analysis. Furthermore, we map the utility of NLP in the identified research foci with respect to four aspects of resilience (i.e., preparedness, absorption, recovery, and adaptability) and present common techniques and potential future research directions. This review highlights that NLP has the potential to become a supplementary data source to support the resilience of CISs. The results of this study serve as an introductory-level guide designed to help scholars and practitioners unlock the potential of NLP for strengthening the resilience of CISs against natural disasters.
Keywords: natural language processing; NLP; critical infrastructure; resilience; disaster
Chinese DeepSeek: Performance of Various Oversampling Techniques on Public Perceptions Using Natural Language Processing
3
Authors: Anees Ara, Muhammad Mujahid, Amal Al-Rasheed, Shaha Al-Otaibi, Tanzila Saba. Source: Computers, Materials & Continua, 2025, Issue 8, pp. 2717-2731 (15 pages)
DeepSeek, a Chinese open-source artificial intelligence (AI) model, has gained a lot of attention due to its economical training and efficient inference. DeepSeek, a model trained with large-scale reinforcement learning without supervised fine-tuning as a preliminary step, demonstrates remarkable reasoning capabilities across a wide range of tasks. DeepSeek is a prominent AI-driven chatbot that assists individuals in learning and enhances responses by generating insightful solutions to inquiries. Users hold divergent viewpoints regarding advanced models like DeepSeek, posting about both their merits and shortcomings across several social media platforms. This research presents a new framework for predicting public sentiment to evaluate perceptions of DeepSeek. To transform the unstructured data into a suitable form, we first collect DeepSeek-related tweets from Twitter and then apply various preprocessing methods. Subsequently, we annotate the tweets using the Valence Aware Dictionary and sEntiment Reasoner (VADER) methodology and the lexicon-driven TextBlob. Next, we classify the attitudes obtained from the cleaned data using the proposed hybrid model, which consists of long short-term memory (LSTM) and bidirectional gated recurrent units (BiGRU). To strengthen it, we include multi-head attention, regularizer activation, and dropout units. Topic modeling employing K-means clustering and Latent Dirichlet Allocation (LDA) was used to analyze public behavior concerning DeepSeek. The perceptions are 82.5% positive, 15.2% negative, and 2.3% neutral using TextBlob, and 82.8% positive, 16.1% negative, and 1.2% neutral using the VADER analysis. The slight difference in results indicates that the two analyses agree on overall perceptions while possibly handling language peculiarities differently. The results indicate that the proposed model surpassed previous state-of-the-art approaches.
Keywords: DeepSeek; prediction; natural language processing; deep learning analysis; TextBlob; imbalance data
Deep Learning-Based Natural Language Processing Model and Optical Character Recognition for Detection of Online Grooming on Social Networking Services
4
Authors: Sangmin Kim, Byeongcheon Lee, Muazzam Maqsood, Jihoon Moon, Seungmin Rho. Source: Computer Modeling in Engineering & Sciences, 2025, Issue 5, pp. 2079-2108 (30 pages)
The increased accessibility of social networking services (SNSs) has facilitated communication and information sharing among users. However, it has also heightened concerns about digital safety, particularly for children and adolescents who are increasingly exposed to online grooming crimes. Early and accurate identification of grooming conversations is crucial in preventing long-term harm to victims. However, research on grooming detection in South Korea remains limited, as existing models are trained primarily on English text and fail to reflect the unique linguistic features of SNS conversations, leading to inaccurate classifications. To address these issues, this study proposes a novel framework that integrates optical character recognition (OCR) technology with KcELECTRA, a deep learning-based natural language processing (NLP) model that shows excellent performance in processing colloquial Korean. In the proposed framework, the KcELECTRA model is fine-tuned on an extensive dataset, including Korean social media conversations, Korean ethical verification data from AI-Hub, and Korean hate speech data from HuggingFace, to enable more accurate classification of text extracted from social media conversation images. Experimental results show that the proposed framework achieves an accuracy of 0.953, outperforming existing transformer-based models. Furthermore, OCR technology shows high accuracy in extracting text from images, demonstrating that the proposed framework is effective for online grooming detection. The proposed framework is expected to contribute to the more accurate detection of grooming text and the prevention of grooming-related crimes.
Keywords: online grooming; KcELECTRA; natural language processing; optical character recognition; social networking service; text classification
Detection of Maliciously Disseminated Hate Speech in Spanish Using Fine-Tuning and In-Context Learning Techniques with Large Language Models
5
Authors: Tomás Bernal-Beltrán, Ronghao Pan, José Antonio García-Díaz, María del Pilar Salas-Zárate, Mario Andrés Paredes-Valverde, Rafael Valencia-García. Source: Computers, Materials & Continua, 2026, Issue 4, pp. 353-390 (38 pages)
The malicious dissemination of hate speech via compromised accounts, automated bot networks and malware-driven social media campaigns has become a growing cybersecurity concern. Automatically detecting such content in Spanish is challenging due to linguistic complexity and the scarcity of annotated resources. In this paper, we compare two predominant AI-based approaches for the forensic detection of malicious hate speech: (1) fine-tuning encoder-only models that have been trained in Spanish and (2) In-Context Learning techniques (Zero- and Few-Shot Learning) with large-scale language models. Our approach goes beyond binary classification, proposing a comprehensive, multidimensional evaluation that labels each text by: (1) type of speech, (2) recipient, (3) level of intensity (ordinal) and (4) targeted group (multi-label). Performance is evaluated on an annotated Spanish corpus using standard metrics such as precision, recall and F1-score, together with stability-oriented metrics (Zero-to-Few Shot Retention and Zero-to-Few Shot Gain) that assess the transition from zero-shot to few-shot prompting. The results indicate that fine-tuned encoder-only models (notably MarIA and BETO variants) consistently deliver the strongest and most reliable performance: in our experiments their macro F1-scores lie in the range of approximately 46%–66% depending on the task. Zero-shot approaches are much less stable and typically yield substantially lower performance (observed F1-scores range approximately 0%–39%), often producing invalid outputs in practice. Few-shot prompting (e.g., Qwen 38B, Mistral 7B) generally improves stability and recall relative to pure zero-shot, bringing F1-scores into a moderate range of approximately 20%–51% but still falling short of fully fine-tuned models. These findings highlight the importance of supervised adaptation and discuss the potential of both paradigms as components in AI-powered cybersecurity and malware forensics systems designed to identify and mitigate coordinated online hate campaigns.
Keywords: hate speech detection; malicious communication campaigns; AI-driven cybersecurity; social media analytics; large language models; prompt-tuning; fine-tuning; in-context learning; natural language processing
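The macro F1-score that the abstract above reports can be illustrated with a short plain-Python sketch. It is not the authors' evaluation code: the function name `macro_f1`, the label names, and the choice to count any prediction outside the valid label set (such as an unparseable model output) as a miss are assumptions made here for illustration.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 over the valid `labels`.

    A prediction that is not in `labels` (for example an invalid model
    output) never counts as a true positive, so it lowers recall for
    the true class.
    """
    per_class = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Hypothetical gold labels and predictions; "???" stands for an invalid output.
gold = ["hate", "hate", "none", "none"]
clean_predictions = ["hate", "none", "none", "none"]
noisy_predictions = ["hate", "???", "none", "none"]

score_clean = macro_f1(gold, clean_predictions, labels=["hate", "none"])
score_noisy = macro_f1(gold, noisy_predictions, labels=["hate", "none"])
```

Because every class contributes equally, a rare class that is handled badly pulls the macro average down more than it would pull down plain accuracy.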
Construction and Performance Evaluation of an Intelligent Contract Review Platform Based on NLP and Multi-Model Fusion
6
Authors: 张雨晴, 吴方元, 曾辉. Source: 《中阿科技论坛(中英文)》, 2026, Issue 1, pp. 47-51 (5 pages)
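Traditional contract review is inefficient and struggles to identify latent risks. To address these problems, this article builds an intelligent contract review platform based on natural language processing (NLP) and multi-model fusion, aiming to create an intelligent risk-control system that covers the full contract life cycle. The platform integrates models such as BiLSTM-CRF, RoBERTa, and TextCNN to accurately extract key clauses from contracts and perform structured analysis of their risk points. In tests on a dataset of 20,000 contracts, the platform achieved an F1 score of 96.0% on key-clause extraction and a risk-identification accuracy of 96.3%; in a concurrency stress test with 200 simultaneous users, the system handled more than 2,240 requests per second. Ablation experiments further showed that the multi-model fusion strategy improved overall performance by 4.9%. In addition, a user survey gave the platform a satisfaction score of 4.4 out of 5. The intelligent contract review platform markedly improves contract review efficiency, effectively reduces contract fulfilment risk, and offers a practical technical path and reference for developing and applying intelligent contract systems.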
Keywords: intelligent contract review; multi-model fusion; natural language processing; performance evaluation
Design of a Quality-Control Text Recognition and Processing System Based on OCR+NLP
7
Authors: 杨卫军, 魏帅, 张利茸, 白凯, 秦企妍. Source: 《信息技术》, 2026, Issue 2, pp. 28-34 (7 pages)
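To address problems in quality-control management in tobacco laboratories, this paper designs a quality-control document management system that combines optical character recognition (OCR) and natural language processing (NLP). Building on traditional quality management practice, OCR is used to automatically recognize and extract the text in tobacco laboratory quality-control documents, efficiently converting paper or scanned documents into editable electronic files, while NLP performs semantic analysis to extract key information and quality-control data, improving the accuracy and efficiency of data extraction. The OCR+NLP quality-control document management system was compared with other algorithms; the experimental results show that the proposed algorithm achieves a review efficiency above 95% and an accuracy of up to 98%, supporting the data reliability of quality-control document management.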
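Keywords: quality-control documents; information management system; tobacco quality testing; optical character recognition; natural language processing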
Methods and Challenges of Large Models in NLP Benchmarking
8
Author: 吴迪. Source: 《黎明职业大学学报》, 2025, Issue 2, pp. 85-92 (8 pages)
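To evaluate large-scale pre-trained models (such as GPT, BERT, and T5) effectively, benchmarking has become increasingly important as a standardized evaluation method. First, the paper reviews the main methods and datasets currently used to benchmark large language models (LLMs) in natural language processing (NLP) and analyzes the benchmark frameworks used for different tasks such as knowledge question answering, code generation, mathematics, and Chinese-language ability. It then examines the strengths and weaknesses of existing benchmarks, describing their role and shortcomings in model comparison, performance evaluation, and advancing research; it also discusses the challenges facing Chinese benchmarks (such as the characteristics of the Chinese language, Chinese datasets, traditional evaluation metrics, and insufficient interpretability). Finally, it proposes future directions for benchmarking, including introducing more challenging tasks, strengthening qualitative evaluation methods, and promoting multimodal, cross-domain benchmarks (such as the ARC-AGI tasks), with the aim of driving continued progress and greater intelligence in large NLP models.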
Keywords: natural language processing (NLP); large models (LLMs); benchmarking; large-scale pre-trained models
Research on Text Mining of Syndrome Element Syndrome Differentiation by Natural Language Processing (Cited by: 5)
9
Authors: DENG Wen-Xiang, ZHU Jian-Ping, LI Jing, YUAN Zhi-Ying, WU Hua-Ying, YAO Zhong-Hua, ZHANG Yi-Ge, ZHANG Wen-An, HUANG Hui-Yong. Source: Digital Chinese Medicine, 2019, Issue 2, pp. 61-71 (11 pages)
Objective: Natural language processing (NLP) was used to excavate and visualize the core content of syndrome element syndrome differentiation (SESD). Methods: The first step was to build a text mining and analysis environment based on the Python language and to build a corpus from the core chapters of SESD. The second step was to digitalize the corpus; the main steps included word segmentation, information cleaning and merging, document-entry matrix, dictionary compilation and information conversion. The third step was to mine and display the internal information of the SESD corpus by means of word cloud, keyword extraction and visualization. Results: NLP played a positive role in computer recognition and comprehension of SESD. Different chapters had different keywords and weights. Deficiency syndrome elements were an important component of SESD, such as "Qi deficiency", "Yang deficiency" and "Yin deficiency". The important syndrome elements of substantiality included "Blood stasis", "Qi stagnation", etc. Core syndrome elements were closely related. Conclusions: Syndrome differentiation and treatment was the core of SESD. Using NLP to excavate syndrome differentiation could help reveal the internal relationships within it and provide a basis for artificial intelligence to learn syndrome differentiation.
Keywords: syndrome element syndrome differentiation (SESD); natural language processing (NLP); diagnostics of TCM; artificial intelligence; text mining
Policy Content Analysis of Eco-Environmental Access Lists Based on Natural Language Processing (NLP) (Cited by: 3)
10
Authors: 魏泽洋, 汪自书, 宫曼莉, 谢丹, 杨洋, 刘毅. Source: 《环境工程技术学报》 (PKU Core), 2025, Issue 1, pp. 1-10 (10 pages)
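The eco-environmental access list is the core instrument of the eco-environmental zoning control system, achieving source prevention through four dimensions: spatial layout constraints, pollutant discharge control, environmental risk prevention and control, and resource and energy use efficiency control. Access lists have voluminous policy texts, diverse control measures, and complex wording, so identifying the objects, modes, and stringency of control is an important basis for implementing eco-environmental zoning control policy. This study uses unsupervised natural-language machine learning to mine policy vocabulary patterns in access lists and to assign multi-dimensional quantitative labels to the policy text, and applies natural-language deep learning models to classify and evaluate the control measures in the lists. Hebei Province is one of the provinces in China with the most complete range of industrial categories and the most complex resource and environmental problems, so its eco-environmental access control is typical and representative. Taking the industrial control measures in Hebei Province's eco-environmental access list as an example, the study identified 10 categories of policy keyword features and 64 main policy keywords, with the sentences containing these keywords covering 95% of the full list; constructed 24 control-measure-by-industry classification labels; and applied and compared the BERT, RoBERTa, and ALBERT deep learning models for classifying the policy text. The highest precision, recall, and F1 score reached 0.95, 0.79, and 0.86 respectively, and the trained models identified the policy content of the access list well. The results show that Hebei Province's access list still falls short in making control measures explicit, specific, and quantitative, and that content on fine-grained industrial control, assessment indicators, and time limits needs to be supplemented and refined. The proposed method has good prospects for application; the authors recommend combining it with frontier artificial intelligence methods to further improve automatic processing efficiency, dynamic analysis, and the ability to provide fine-grained policy adjustment suggestions.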
Keywords: eco-environmental zoning control; eco-environmental access list; policy text; natural language processing (NLP)
Deep Learning with Natural Language Processing Enabled Sentimental Analysis on Sarcasm Classification (Cited by: 3)
11
Authors: Abdul Rahaman Wahab Sait, Mohamad Khairi Ishak. Source: Computer Systems Science & Engineering (SCIE, EI), 2023, Issue 3, pp. 2553-2567 (15 pages)
Sentiment analysis (SA) is the procedure of recognizing the emotions related to the data that exist in social networking. The existence of sarcasm in textual data is a major challenge to the efficiency of SA. Earlier works on sarcasm detection in text utilize lexical as well as pragmatic cues, namely interjections, punctuation, and sentiment shift, that are vital indicators of sarcasm. With the advent of deep learning, recent works leverage neural networks to learn lexical and contextual features, removing the need for handcrafted features. In this aspect, this study designs a deep learning with natural language processing enabled SA (DLNLP-SA) technique for sarcasm classification. The proposed DLNLP-SA technique aims to detect and classify the occurrence of sarcasm in the input data. Besides, the DLNLP-SA technique holds various sub-processes, namely preprocessing, feature vector conversion, and classification. Initially, the pre-processing is performed in diverse ways such as single character removal, multi-spaces removal, URL removal, stopword removal, and tokenization. Secondly, the transformation of feature vectors takes place using the N-gram feature vector technique. Finally, mayfly optimization (MFO) with a multi-head self-attention based gated recurrent unit (MHSA-GRU) model is employed for the detection and classification of sarcasm. To verify the enhanced outcomes of the DLNLP-SA model, a comprehensive experimental investigation is performed on the News Headlines Dataset from the Kaggle Repository, and the results signified its supremacy over the existing approaches.
Keywords: sentiment analysis; sarcasm detection; deep learning; natural language processing; N-grams; hyperparameter tuning
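The preprocessing and N-gram steps listed in the abstract above can be sketched as follows. This is a minimal plain-Python illustration under stated assumptions, not the paper's implementation: the tiny `STOPWORDS` set and the function names are hypothetical, and stripping punctuation and digits is an extra step added here so that whitespace tokenization gives clean tokens.

```python
import re

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"a", "an", "the", "is", "to", "of", "and", "in", "on"}


def preprocess(text):
    """Apply the abstract's cleaning steps and return a list of tokens."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # URL removal
    text = re.sub(r"[^a-z\s]", " ", text)       # assumption: drop punctuation/digits
    text = re.sub(r"\b[a-z]\b", " ", text)      # single character removal
    text = re.sub(r"\s+", " ", text).strip()    # multi-space removal
    tokens = text.split()                       # whitespace tokenization
    return [tok for tok in tokens if tok not in STOPWORDS]  # stopword removal


def ngrams(tokens, n):
    """Return all contiguous n-grams of `tokens` as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


headline = "Oh great, the  Monday meeting is back! http://x.co/a b"
tokens = preprocess(headline)
bigrams = ngrams(tokens, 2)
```

Counting how often each n-gram occurs per document would then give the feature vector that a classifier consumes.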
Natural Language Processing with Optimal Deep Learning-Enabled Intelligent Image Captioning System (Cited by: 1)
12
Authors: Radwa Marzouk, Eatedal Alabdulkreem, Mohamed K Nour, Mesfer Al Duhayyim, Mahmoud Othman, Abu Sarwar Zamani, Ishfaq Yaseen, Abdelwahed Motwakel. Source: Computers, Materials & Continua (SCIE, EI), 2023, Issue 2, pp. 4435-4451 (17 pages)
The recent developments in Multimedia Internet of Things (MIoT) devices, empowered with Natural Language Processing (NLP) models, seem to be a promising future of smart devices. NLP plays an important role in industrial models such as speech understanding, emotion detection, home automation, and so on. If an image needs to be captioned, then the objects in that image, its actions and connections, and any silent feature that remains under-projected or missing from the images should be identified. The aim of the image captioning process is to generate a caption for an image. In the next step, the image should be provided with one of the most significant and detailed descriptions that is syntactically as well as semantically correct. In this scenario, a computer vision model is used to identify the objects and NLP approaches are followed to describe the image. The current study develops a Natural Language Processing with Optimal Deep Learning Enabled Intelligent Image Captioning System (NLPODL-IICS). The aim of the presented NLPODL-IICS model is to produce a proper description for an input image. To attain this, the proposed NLPODL-IICS follows two stages, namely encoding and decoding. Initially, at the encoding side, the proposed NLPODL-IICS model makes use of Hunger Games Search (HGS) with the Neural Architecture Search Network (NASNet) model. This model represents the input data appropriately by inserting it into a predefined length vector. Besides, during the decoding phase, the Chimp Optimization Algorithm (COA) with a deeper Long Short Term Memory (LSTM) approach is followed to concatenate the description sentences produced by the method. The application of the HGS and COA algorithms helps in accomplishing proper parameter tuning for the NASNet and LSTM models respectively. The proposed NLPODL-IICS model was experimentally validated with the help of two benchmark datasets. A widespread comparative analysis confirmed the superior performance of the NLPODL-IICS model over other models.
Keywords: natural language processing; information retrieval; image captioning; deep learning; metaheuristics
Numerical‐discrete‐scheme‐incorporated recurrent neural network for tasks in natural language processing (Cited by: 1)
13
Authors: Mei Liu, Wendi Luo, Zangtai Cai, Xiujuan Du, Jiliang Zhang, Shuai Li. Source: CAAI Transactions on Intelligence Technology (SCIE, EI), 2023, Issue 4, pp. 1415-1424 (10 pages)
A variety of neural networks have been presented to deal with issues in deep learning in the last decades. Despite the prominent success achieved by neural networks, the field still lacks theoretical guidance for designing an efficient neural network model, and verifying the performance of a model needs excessive resources. Previous research studies have demonstrated that many existing models can be regarded as different numerical discretizations of differential equations. This connection sheds light on designing an effective recurrent neural network (RNN) by resorting to numerical analysis. Simple RNN is regarded as a discretisation of the forward Euler scheme. Considering the limited solution accuracy of the forward Euler methods, a Taylor‐type discrete scheme is presented with lower truncation error and a Taylor‐type RNN (T‐RNN) is designed with its guidance. Extensive experiments are conducted to evaluate its performance on statistical language models and emotion analysis tasks. The noticeable gains obtained by T‐RNN present its superiority and the feasibility of designing the neural network model using numerical methods.
Keywords: deep learning; natural language processing; neural network; text analysis
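The link between a simple RNN and the forward Euler scheme mentioned in the abstract above can be made concrete with a small sketch. It is an illustration under assumptions, not the paper's model: the ODE dh/dt = -h + tanh(w*h + u*x + b) is one common continuous-time reading of an RNN cell, the scalar weights and function names are hypothetical, and the paper's Taylor-type scheme is not reproduced here.

```python
import math


def forward_euler(f, h0, step, n_steps):
    """Integrate dh/dt = f(h) from h0 using the forward Euler update."""
    h = h0
    for _ in range(n_steps):
        h = h + step * f(h)
    return h


def euler_rnn_step(h, x, w, u, b, step=1.0):
    """One forward Euler step of dh/dt = -h + tanh(w*h + u*x + b), scalar case.

    With step = 1.0 the update collapses to the familiar simple-RNN rule
    h_next = tanh(w*h + u*x + b).
    """
    return h + step * (-h + math.tanh(w * h + u * x + b))


# Test problem with a known solution: dh/dt = -h, h(0) = 1, so h(1) = exp(-1).
exact = math.exp(-1.0)
error_coarse = abs(forward_euler(lambda h: -h, 1.0, 0.1, 10) - exact)
error_fine = abs(forward_euler(lambda h: -h, 1.0, 0.01, 100) - exact)
```

The error shrinks only in proportion to the step size, which is the limited accuracy that motivates looking for a scheme with lower truncation error.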
Sentence, Phrase, and Triple Annotations to Build a Knowledge Graph of Natural Language Processing Contributions—A Trial Dataset (Cited by: 1)
14
Authors: Jennifer D’Souza, Sören Auer. Source: Journal of Data and Information Science (CSCD), 2021, Issue 3, pp. 6-34 (29 pages)
Purpose: This work aims to normalize the NLPCONTRIBUTIONS scheme (henceforward, NLPCONTRIBUTIONGRAPH) to structure, directly from article sentences, the contributions information in Natural Language Processing (NLP) scholarly articles via a two-stage annotation methodology: 1) pilot stage: to define the scheme (described in prior work); and 2) adjudication stage: to normalize the graphing model (the focus of this paper). Design/methodology/approach: We re-annotate, a second time, the contributions-pertinent information across 50 prior-annotated NLP scholarly articles in terms of a data pipeline comprising contribution-centered sentences, phrases, and triple statements. To this end, care was taken in the adjudication annotation stage to reduce annotation noise while formulating the guidelines for our proposed novel NLP contributions structuring and graphing scheme. Findings: The application of NLPCONTRIBUTIONGRAPH on the 50 articles resulted in a dataset of 900 contribution-focused sentences, 4,702 contribution-information-centered phrases, and 2,980 surface-structured triples. The intra-annotation agreement between the first and second stages, in terms of F1-score, was 67.92% for sentences, 41.82% for phrases, and 22.31% for triple statements, indicating that with increased granularity of the information, the annotation decision variance is greater. Research limitations: NLPCONTRIBUTIONGRAPH has limited scope for structuring scholarly contributions compared with STEM (Science, Technology, Engineering, and Medicine) scholarly knowledge at large. Further, the annotation scheme in this work is designed by only an intra-annotator consensus: a single annotator first annotated the data to propose the initial scheme, following which the same annotator re-annotated the data to normalize the annotations in an adjudication stage. However, the expected goal of this work is to achieve a standardized retrospective model of capturing NLP contributions from scholarly articles. This would entail a larger initiative of enlisting multiple annotators to accommodate different worldviews into a “single” set of structures and relationships as the final scheme. Given that the initial scheme is first proposed and the complexity of the annotation task in the realistic timeframe, our intra-annotation procedure is well-suited. Nevertheless, the model proposed in this work is presently limited since it does not incorporate multiple annotator worldviews. This is planned as future work to produce a robust model. Practical implications: We demonstrate NLPCONTRIBUTIONGRAPH data integrated into the Open Research Knowledge Graph (ORKG), a next-generation KG-based digital library with intelligent computations enabled over structured scholarly knowledge, as a viable aid to assist researchers in their day-to-day tasks. Originality/value: NLPCONTRIBUTIONGRAPH is a novel scheme to annotate research contributions from NLP articles and integrate them in a knowledge graph, which to the best of our knowledge does not exist in the community. Furthermore, our quantitative evaluations over the two-stage annotation tasks offer insights into task difficulty.
Keywords: scholarly knowledge graphs; open science graphs; knowledge representation; natural language processing; semantic publishing
Word Embeddings and Semantic Spaces in Natural Language Processing (Cited by: 2)
15
Author: Peter J. Worth. Source: International Journal of Intelligence Science, 2023, Issue 1, pp. 1-21 (21 pages)
One of the critical hurdles, and breakthroughs, in the field of Natural Language Processing (NLP) in the last two decades has been the development of techniques for text representation that solve the so-called curse of dimensionality, a problem which plagues NLP in general given that the feature set for learning starts as a function of the size of the language in question, typically upwards of hundreds of thousands of terms. As such, much of the research and development in NLP in the last two decades has been in finding and optimizing solutions to this problem, that is, to feature selection in NLP. This paper looks at the development of these various techniques, leveraging a variety of statistical methods which rest on linguistic theories that were advanced in the middle of the last century, namely the distributional hypothesis, which suggests that words that are found in similar contexts generally have similar meanings. In this survey paper we look at the development of some of the most popular of these techniques from a mathematical as well as data structure perspective, from Latent Semantic Analysis to Vector Space Models to their more modern variants which are typically referred to as word embeddings. In this review of algorithms such as Word2Vec, GloVe, ELMo and BERT, we explore the idea of semantic spaces more generally beyond applicability to NLP.
Keywords: natural language processing; vector space models; semantic spaces; word embeddings; representation learning; text vectorization; machine learning; deep learning
Unlocking the Potential: A Comprehensive Systematic Review of ChatGPT in Natural Language Processing Tasks
16
Author: Ebtesam Ahmad Alomari. Source: Computer Modeling in Engineering & Sciences (SCIE, EI), 2024, Issue 10, pp. 43-85 (43 pages)
As Natural Language Processing (NLP) continues to advance, driven by the emergence of sophisticated large language models such as ChatGPT, there has been a notable growth in research activity. This rapid uptake reflects increasing interest in the field and induces critical inquiries into ChatGPT’s applicability in the NLP domain. This review paper systematically investigates the role of ChatGPT in diverse NLP tasks, including information extraction, Named Entity Recognition (NER), event extraction, relation extraction, Part of Speech (PoS) tagging, text classification, sentiment analysis, emotion recognition and text annotation. The novelty of this work lies in its comprehensive analysis of the existing literature, addressing a critical gap in understanding ChatGPT’s adaptability, limitations, and optimal application. In this paper, we employed a systematic stepwise approach following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework to direct our search process and seek relevant studies. Our review reveals ChatGPT’s significant potential in enhancing various NLP tasks. Its adaptability in information extraction tasks, sentiment analysis, and text classification showcases its ability to comprehend diverse contexts and extract meaningful details. Additionally, ChatGPT’s flexibility in annotation tasks reduces manual effort and accelerates the annotation process, making it a valuable asset in NLP development and research. Furthermore, GPT-4 and prompt engineering emerge as a complementary mechanism, empowering users to guide the model and enhance overall accuracy. Despite its promising potential, challenges persist. The performance of ChatGPT needs to be tested using more extensive datasets and diverse data structures. Moreover, its limitations in handling domain-specific language and the need for fine-tuning in specific applications highlight the importance of further investigations to address these issues.
Keywords: generative AI; large language model (LLM); natural language processing (NLP); ChatGPT; GPT (generative pretraining transformer); GPT-4; sentiment analysis; NER; information extraction; annotation; text classification
Research on the Automatic Pattern Abstraction and Recognition Methodology for Large-scale Database System based on Natural Language Processing (Cited by: 1)
17
Authors: Rong Wang, Cuizhen Jiao, Wenhua Dai. Source: International Journal of Technology Management, 2015, Issue 9, pp. 125-127 (3 pages)
In this research paper, we study an automatic pattern abstraction and recognition method for large-scale database systems based on natural language processing. In a distributed database, data spread across different nodes and even across regions can be recognized through the network connections between nodes. To reduce data redundancy, given that a database's model design usually contains many forms, we combine NLP theory to optimize the traditional method. The experimental analysis and simulation prove the correctness of our method.
Keywords: pattern abstraction and recognition; database system; natural language processing
Spontaneous Language Analysis in Alzheimer's Disease: Evaluation of Natural Language Processing Technique for Analyzing Lexical Performance
18
Authors: Liu Ning, Yuan Zhenming 《Journal of Shanghai Jiaotong University (Science)》 EI 2022, No. 2, pp. 160-167 (8 pages)
Language disorder, a common manifestation of Alzheimer's disease (AD), has attracted widespread attention in recent years. This paper uses a novel natural language processing (NLP) method, compared with the latest deep learning technology, to detect AD and explore lexical performance. Our proposed approach has two stages. First, the dialogue contents are summarized into two categories with the same category. Second, the term frequency-inverse document frequency (TF-IDF) algorithm is used to extract the keywords of the transcripts, and the similarity of keywords between the groups is calculated separately by cosine distance. Several deep learning methods are used to compare the performance. Meanwhile, the keywords with the best performance are used to analyze AD patients' lexical performance. In the Predictive Challenge of Alzheimer's Disease held by iFlytek in 2019, the proposed AD diagnosis model achieves better performance in binary classification by adjusting the number of keywords. The model's F1 score improves considerably over the baseline of 75.4%, and its training process is simple and efficient. We analyze the keywords of the model and find that AD patients use fewer nouns and verbs than normal controls. This paper proposes a computer-assisted AD diagnosis model on a small Chinese dataset, which provides a potential way of assisting the diagnosis of AD and analyzing lexical performance in clinical settings.
Keywords: natural language processing (NLP); Alzheimer's disease (AD); mild cognitive impairment; term frequency-inverse document frequency (TF-IDF); bag of words
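The second stage described in the abstract above (TF-IDF keyword extraction followed by a cosine comparison) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the smoothed IDF formula, the function names `tfidf_vectors`, `top_keywords`, and `cosine_similarity`, and the idea of passing pre-tokenized transcripts are all hypothetical choices made for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumption: smoothed IDF, log((1 + N) / (1 + df)) + 1; the abstract
    does not say which TF-IDF variant was used.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        term_counts = Counter(doc)
        vectors.append({
            term: (count / len(doc))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in term_counts.items()
        })
    return vectors


def top_keywords(vector, k):
    """Return the k highest-weighted terms (ties broken alphabetically)."""
    ranked = sorted(vector.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def cosine_similarity(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

In a setup like the one the abstract describes, `k` in `top_keywords` would play the role of the adjustable number of keywords, and a transcript could be assigned to whichever group's keyword vector it is most similar to; that decision rule is also an assumption here.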
Towards the processing breakdown of syntactic garden path phenomenon: A semantic perspective of natural language expert system (Cited by: 1)
19
Authors: DU Jia-li, YU Ping-fang, XU Jing, ZHAO Hong-yan 《通讯和计算机(中英文版)》 2008, No. 11, pp. 53-61 (9 pages)
Keywords: database; linguistics; computer technology; semantics
Automated labelling of radiology reports using natural language processing: Comparison of traditional and newer methods (Cited by: 1)
20
Authors: Seo Yi Chng, Paul J.W. Tern, Matthew R.X. Kan, Lionel T.E. Cheng 《Health Care Science》 2023, No. 2, pp. 120-128 (9 pages)
Automated labelling of radiology reports using natural language processing allows for the labelling of ground truth for the large datasets of radiological studies that are required for training computer vision models. This paper explains the necessary data preprocessing steps, reviews the main methods for automated labelling, and compares their performance. There are four main methods of automated labelling: (1) rules-based text-matching algorithms, (2) conventional machine learning models, (3) neural network models, and (4) Bidirectional Encoder Representations from Transformers (BERT) models. Rules-based labellers perform a brute-force search against manually curated keywords and are able to achieve high F1 scores. However, they require proper handling of negative words. Machine learning models require preprocessing that involves tokenization and vectorization of text into numerical vectors. Multilabel classification approaches are required in labelling radiology reports, and conventional models can achieve good performance if they have large enough training sets. Deep learning models make use of connected neural networks, often a long short-term memory network, and are similarly able to achieve good performance if trained on a large data set. BERT is a transformer-based model that utilizes attention. Pretrained BERT models only require fine-tuning with small data sets. In particular, domain-specific BERT models can achieve superior performance compared with the other methods for automated labelling.
Keywords: automated labelling; machine learning; natural language processing; neural network; radiology
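The rules-based approach mentioned in the abstract above (keyword search plus handling of negative words) might look something like the sketch below. Everything specific in it is an assumption made for illustration: the keyword lists in `FINDING_KEYWORDS`, the cue list in `NEGATION_CUES`, the five-token look-back window, and the function name `label_report` are hypothetical and are not taken from the paper.

```python
import re

# Hypothetical, manually curated keyword lists (one entry per label).
FINDING_KEYWORDS = {
    "pneumothorax": ["pneumothorax"],
    "effusion": ["pleural effusion", "effusion"],
}

# Hypothetical negation cues; real labellers use much longer lists.
NEGATION_CUES = ("no", "not", "without", "negative for")


def label_report(text, keywords=FINDING_KEYWORDS, window=5):
    """Return {label: bool} for one free-text report.

    A label is set to True when one of its keywords appears in a sentence
    and no negation cue occurs in the `window` tokens before the keyword.
    """
    labels = {name: False for name in keywords}
    # Split into rough sentences so negation does not leak across them.
    for sentence in re.split(r"[.;\n]+", text.lower()):
        for name, terms in keywords.items():
            for term in terms:
                match = re.search(r"\b" + re.escape(term) + r"\b", sentence)
                if match is None:
                    continue
                preceding = sentence[:match.start()].split()[-window:]
                context = " " + " ".join(preceding) + " "
                negated = any(f" {cue} " in context for cue in NEGATION_CUES)
                if not negated:
                    labels[name] = True
    return labels
```

A fixed look-back window is a deliberately crude stand-in for the "proper handling of negative words" the abstract calls for: a cue that sits further back than the window, or that follows the keyword, is missed, which is one reason such labellers need careful curation.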