为解决自然语言数据处理模型进行数据处理时存在效果差、资源消耗大等问题,提出一种基于多尺度特征提取和注意力机制的融合算法。通过不同尺度的特征数据提取,并在特征图上应用加权算法,从而增强对某些特定尺度特征的关注,并基于该融合...为解决自然语言数据处理模型进行数据处理时存在效果差、资源消耗大等问题,提出一种基于多尺度特征提取和注意力机制的融合算法。通过不同尺度的特征数据提取,并在特征图上应用加权算法,从而增强对某些特定尺度特征的关注,并基于该融合算法对自然语言数据处理模型进行优化。仿真实验的结果表明:该融合算法特征提取效果较好,显著提升了计算机进行数据处理的各项能力。将优化后的自然语言处理(natural language processing,NLP)数据处理模型与CSAMT数据处理模型、BETG数据处理模型和优化前的NLP数据处理模型的性能进行对比可知:经过CBAM-MS-CNN优化的NLP数据处理模型的各项性能均优于其他模型。研究结果表明:该融合算法可以满足电子化移交流程中非结构化数据管理领域中的高可靠性、智能处理等业务需求,能提升数据处理效率和数据质量,减少人工录入数据和人工复核数据的工作量。展开更多
The increasing frequency and severity of natural disasters,exacerbated by global warming,necessitate novel solutions to strengthen the resilience of Critical Infrastructure Systems(CISs).Recent research reveals the si...The increasing frequency and severity of natural disasters,exacerbated by global warming,necessitate novel solutions to strengthen the resilience of Critical Infrastructure Systems(CISs).Recent research reveals the sig-nificant potential of natural language processing(NLP)to analyze unstructured human language during disasters,thereby facilitating the uncovering of disruptions and providing situational awareness supporting various aspects of resilience regarding CISs.Despite this potential,few studies have systematically mapped the global research on NLP applications with respect to supporting various aspects of resilience of CISs.This paper contributes to the body of knowledge by presenting a review of current knowledge using the scientometric review technique.Using 231 bibliographic records from the Scopus and Web of Science core collections,we identify five key research areas where researchers have used NLP to support the resilience of CISs during natural disasters,including sentiment analysis,crisis informatics,data and knowledge visualization,disaster impacts,and content analysis.Furthermore,we map the utility of NLP in the identified research focus with respect to four aspects of resilience(i.e.,preparedness,absorption,recovery,and adaptability)and present various common techniques used and potential future research directions.This review highlights that NLP has the potential to become a supplementary data source to support the resilience of CISs.The results of this study serve as an introductory-level guide designed to help scholars and practitioners unlock the potential of NLP for strengthening the resilience of CISs against natural disasters.展开更多
近年来,大语言模型(large language models,LLMs)在自然语言处理(natural language processing,NLP)等领域取得了显著进展,展现出强大的语言理解与生成能力。然而,在实际应用过程中,大语言模型仍然面临诸多挑战。其中,幻觉(hallucinati...近年来,大语言模型(large language models,LLMs)在自然语言处理(natural language processing,NLP)等领域取得了显著进展,展现出强大的语言理解与生成能力。然而,在实际应用过程中,大语言模型仍然面临诸多挑战。其中,幻觉(hallucination)问题引起了学术界和工业界的广泛关注。如何有效检测大语言模型幻觉,成为确保其在文本生成等下游任务可靠、安全、可信应用的关键挑战。该研究着重对大语言模型幻觉检测方法进行综述:首先,介绍了大语言模型概念,进一步明确了幻觉的定义与分类,系统梳理了大语言模型从构建到部署应用全生命周期各环节的特点,并深入分析了幻觉的产生机制与诱因;其次,立足于实际应用需求,考虑到在不同任务场景下模型透明度的差异等因素,将幻觉检测方法划分为针对白盒模型和黑盒模型2类,并进行了重点梳理和深入对比;而后,分析总结了现阶段主流的幻觉检测基准,为后续开展幻觉检测奠定基础;最后,指出了大语言模型幻觉检测的各种潜在研究方法和新的挑战。展开更多
Objective Natural language processing (NLP) was used to excavate and visualize the core content of syndrome element syndrome differentiation (SESD). Methods The first step was to build a text mining and analysis envir...Objective Natural language processing (NLP) was used to excavate and visualize the core content of syndrome element syndrome differentiation (SESD). Methods The first step was to build a text mining and analysis environment based on Python language, and built a corpus based on the core chapters of SESD. The second step was to digitalize the corpus. The main steps included word segmentation, information cleaning and merging, document-entry matrix, dictionary compilation and information conversion. The third step was to mine and display the internal information of SESD corpus by means of word cloud, keyword extraction and visualization. Results NLP played a positive role in computer recognition and comprehension of SESD. Different chapters had different keywords and weights. Deficiency syndrome elements were an important component of SESD, such as "Qi deficiency""Yang deficiency" and "Yin deficiency". The important syndrome elements of substantiality included "Blood stasis""Qi stagnation", etc. Core syndrome elements were closely related. Conclusions Syndrome differentiation and treatment was the core of SESD. Using NLP to excavate syndromes differentiation could help reveal the internal relationship between syndromes differentiation and provide basis for artificial intelligence to learn syndromes differentiation.展开更多
The motivation for this research comes from the gap found in discovering the common ground for medical context learning through analytics for different purposes of diagnosing,recommending,prescribing,or treating patie...The motivation for this research comes from the gap found in discovering the common ground for medical context learning through analytics for different purposes of diagnosing,recommending,prescribing,or treating patients for uniform phenotype features from patients’profile.The authors of this paper while searching for possible solutions for medical context learning found that unified corpora tagged with medical nomenclature was missing to train the analytics for medical context learning.Therefore,here we demonstrated a mechanism to come up with uniform NER(Named Entity Recognition)tagged medical corpora that is fed with 14407 endocrine patients’data set in Comma Separated Values(CSV)format diagnosed with diabetes mellitus and comorbidity diseases.The other corpus is of ICD-10-CM coding scheme in text format taken from www.icd10data.com.ICD-10-CM corpus is to be tagged for understanding the medical context with uniformity for which we are conducting different experiments using common natural language programming(NLP)techniques and frameworks like TensorFlow,Keras,Long Short-Term Memory(LSTM),and Bi-LSTM.In our preliminary experiments,albeit label sets in form of(instance,label)pair were tagged with Sequential()model formed on TensorFlow.Keras and Bi-LSTM NLP algorithms.The maximum accuracy achieved for model validation was 0.8846.展开更多
随着计算机算力的提升和智能设备的普及,社会逐步进入智慧化时代。高校图书馆作为高校的文献信息中心,进行智慧化转型提升服务质量是时代所需。因此,文章借助智能问答技术,设计了基于自然语言处理(Natural Language Processing,NLP)的...随着计算机算力的提升和智能设备的普及,社会逐步进入智慧化时代。高校图书馆作为高校的文献信息中心,进行智慧化转型提升服务质量是时代所需。因此,文章借助智能问答技术,设计了基于自然语言处理(Natural Language Processing,NLP)的图书馆智能问答系统,创新图书馆参考咨询服务模式,提高图书馆服务水平和效率。展开更多
Digitalization has changed the way of information processing, and newtechniques of legal data processing are evolving. Text mining helps to analyze andsearch different court cases available in the form of digital text...Digitalization has changed the way of information processing, and newtechniques of legal data processing are evolving. Text mining helps to analyze andsearch different court cases available in the form of digital text documents toextract case reasoning and related data. This sort of case processing helps professionals and researchers to refer the previous case with more accuracy in reducedtime. The rapid development of judicial ontologies seems to deliver interestingproblem solving to legal knowledge formalization. Mining context informationthrough ontologies from corpora is a challenging and interesting field. Thisresearch paper presents a three tier contextual text mining framework throughontologies for judicial corpora. This framework comprises on the judicial corpus,text mining processing resources and ontologies for mining contextual text fromcorpora to make text and data mining more reliable and fast. A top-down ontologyconstruction approach has been adopted in this paper. The judicial corpus hasbeen selected with a sufficient dataset to process and evaluate the results.The experimental results and evaluations show significant improvements incomparison with the available techniques.展开更多
Language disorder,a common manifestation of Alzheimer’s disease(AD),has attracted widespread attention in recent years.This paper uses a novel natural language processing(NLP)method,compared with latest deep learning...Language disorder,a common manifestation of Alzheimer’s disease(AD),has attracted widespread attention in recent years.This paper uses a novel natural language processing(NLP)method,compared with latest deep learning technology,to detect AD and explore the lexical performance.Our proposed approach is based on two stages.First,the dialogue contents are summarized into two categories with the same category.Second,term frequency—inverse document frequency(TF-IDF)algorithm is used to extract the keywords of transcripts,and the similarity of keywords between the groups was calculated separately by cosine distance.Several deep learning methods are used to compare the performance.In the meanwhile,keywords with the best performance are used to analyze AD patients’lexical performance.In the Predictive Challenge of Alzheimer’s Disease held by iFlytek in 2019,the proposed AD diagnosis model achieves a better performance in binary classification by adjusting the number of keywords.The F1 score of the model has a considerable improvement over the baseline of 75.4%,and the training process of which is simple and efficient.We analyze the keywords of the model and find that AD patients use less noun and verb than normal controls.A computer-assisted AD diagnosis model on small Chinese dataset is proposed in this paper,which provides a potential way for assisting diagnosis of AD and analyzing lexical performance in clinical setting.展开更多
As Natural Language Processing(NLP)continues to advance,driven by the emergence of sophisticated large language models such as ChatGPT,there has been a notable growth in research activity.This rapid uptake reflects in...As Natural Language Processing(NLP)continues to advance,driven by the emergence of sophisticated large language models such as ChatGPT,there has been a notable growth in research activity.This rapid uptake reflects increasing interest in the field and induces critical inquiries into ChatGPT’s applicability in the NLP domain.This review paper systematically investigates the role of ChatGPT in diverse NLP tasks,including information extraction,Name Entity Recognition(NER),event extraction,relation extraction,Part of Speech(PoS)tagging,text classification,sentiment analysis,emotion recognition and text annotation.The novelty of this work lies in its comprehensive analysis of the existing literature,addressing a critical gap in understanding ChatGPT’s adaptability,limitations,and optimal application.In this paper,we employed a systematic stepwise approach following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses(PRISMA)framework to direct our search process and seek relevant studies.Our review reveals ChatGPT’s significant potential in enhancing various NLP tasks.Its adaptability in information extraction tasks,sentiment analysis,and text classification showcases its ability to comprehend diverse contexts and extract meaningful details.Additionally,ChatGPT’s flexibility in annotation tasks reducesmanual efforts and accelerates the annotation process,making it a valuable asset in NLP development and research.Furthermore,GPT-4 and prompt engineering emerge as a complementary mechanism,empowering users to guide the model and enhance overall accuracy.Despite its promising potential,challenges persist.The performance of ChatGP Tneeds tobe testedusingmore extensivedatasets anddiversedata structures.Subsequently,its limitations in handling domain-specific language and the need for fine-tuning in specific applications highlight the importance of further investigations to address these issues.展开更多
文摘为解决自然语言数据处理模型进行数据处理时存在效果差、资源消耗大等问题,提出一种基于多尺度特征提取和注意力机制的融合算法。通过不同尺度的特征数据提取,并在特征图上应用加权算法,从而增强对某些特定尺度特征的关注,并基于该融合算法对自然语言数据处理模型进行优化。仿真实验的结果表明:该融合算法特征提取效果较好,显著提升了计算机进行数据处理的各项能力。将优化后的自然语言处理(natural language processing,NLP)数据处理模型与CSAMT数据处理模型、BETG数据处理模型和优化前的NLP数据处理模型的性能进行对比可知:经过CBAM-MS-CNN优化的NLP数据处理模型的各项性能均优于其他模型。研究结果表明:该融合算法可以满足电子化移交流程中非结构化数据管理领域中的高可靠性、智能处理等业务需求,能提升数据处理效率和数据质量,减少人工录入数据和人工复核数据的工作量。
基金financial support from the National Science Foundation(NSF)EPSCoR R.I.I.Track-2 Program,awarded under the NSF grant number 2119691.
文摘The increasing frequency and severity of natural disasters,exacerbated by global warming,necessitate novel solutions to strengthen the resilience of Critical Infrastructure Systems(CISs).Recent research reveals the sig-nificant potential of natural language processing(NLP)to analyze unstructured human language during disasters,thereby facilitating the uncovering of disruptions and providing situational awareness supporting various aspects of resilience regarding CISs.Despite this potential,few studies have systematically mapped the global research on NLP applications with respect to supporting various aspects of resilience of CISs.This paper contributes to the body of knowledge by presenting a review of current knowledge using the scientometric review technique.Using 231 bibliographic records from the Scopus and Web of Science core collections,we identify five key research areas where researchers have used NLP to support the resilience of CISs during natural disasters,including sentiment analysis,crisis informatics,data and knowledge visualization,disaster impacts,and content analysis.Furthermore,we map the utility of NLP in the identified research focus with respect to four aspects of resilience(i.e.,preparedness,absorption,recovery,and adaptability)and present various common techniques used and potential future research directions.This review highlights that NLP has the potential to become a supplementary data source to support the resilience of CISs.The results of this study serve as an introductory-level guide designed to help scholars and practitioners unlock the potential of NLP for strengthening the resilience of CISs against natural disasters.
文摘近年来,大语言模型(large language models,LLMs)在自然语言处理(natural language processing,NLP)等领域取得了显著进展,展现出强大的语言理解与生成能力。然而,在实际应用过程中,大语言模型仍然面临诸多挑战。其中,幻觉(hallucination)问题引起了学术界和工业界的广泛关注。如何有效检测大语言模型幻觉,成为确保其在文本生成等下游任务可靠、安全、可信应用的关键挑战。该研究着重对大语言模型幻觉检测方法进行综述:首先,介绍了大语言模型概念,进一步明确了幻觉的定义与分类,系统梳理了大语言模型从构建到部署应用全生命周期各环节的特点,并深入分析了幻觉的产生机制与诱因;其次,立足于实际应用需求,考虑到在不同任务场景下模型透明度的差异等因素,将幻觉检测方法划分为针对白盒模型和黑盒模型2类,并进行了重点梳理和深入对比;而后,分析总结了现阶段主流的幻觉检测基准,为后续开展幻觉检测奠定基础;最后,指出了大语言模型幻觉检测的各种潜在研究方法和新的挑战。
基金the funding support from the National Natural Science Foundation of China (No. 81874429)Digital and Applied Research Platform for Diagnosis of Traditional Chinese Medicine (No. 49021003005)+1 种基金2018 Hunan Provincial Postgraduate Research Innovation Project (No. CX2018B465)Excellent Youth Project of Hunan Education Department in 2018 (No. 18B241)
文摘Objective Natural language processing (NLP) was used to excavate and visualize the core content of syndrome element syndrome differentiation (SESD). Methods The first step was to build a text mining and analysis environment based on Python language, and built a corpus based on the core chapters of SESD. The second step was to digitalize the corpus. The main steps included word segmentation, information cleaning and merging, document-entry matrix, dictionary compilation and information conversion. The third step was to mine and display the internal information of SESD corpus by means of word cloud, keyword extraction and visualization. Results NLP played a positive role in computer recognition and comprehension of SESD. Different chapters had different keywords and weights. Deficiency syndrome elements were an important component of SESD, such as "Qi deficiency""Yang deficiency" and "Yin deficiency". The important syndrome elements of substantiality included "Blood stasis""Qi stagnation", etc. Core syndrome elements were closely related. Conclusions Syndrome differentiation and treatment was the core of SESD. Using NLP to excavate syndromes differentiation could help reveal the internal relationship between syndromes differentiation and provide basis for artificial intelligence to learn syndromes differentiation.
基金This research is supported by Shifa International Hospital,Pakistan.Endocrine patients’data contributed for diagnosis of diabetes,and its comorbidities holds a lot of worth to come up with these observations from experimental study。
文摘The motivation for this research comes from the gap found in discovering the common ground for medical context learning through analytics for different purposes of diagnosing,recommending,prescribing,or treating patients for uniform phenotype features from patients’profile.The authors of this paper while searching for possible solutions for medical context learning found that unified corpora tagged with medical nomenclature was missing to train the analytics for medical context learning.Therefore,here we demonstrated a mechanism to come up with uniform NER(Named Entity Recognition)tagged medical corpora that is fed with 14407 endocrine patients’data set in Comma Separated Values(CSV)format diagnosed with diabetes mellitus and comorbidity diseases.The other corpus is of ICD-10-CM coding scheme in text format taken from www.icd10data.com.ICD-10-CM corpus is to be tagged for understanding the medical context with uniformity for which we are conducting different experiments using common natural language programming(NLP)techniques and frameworks like TensorFlow,Keras,Long Short-Term Memory(LSTM),and Bi-LSTM.In our preliminary experiments,albeit label sets in form of(instance,label)pair were tagged with Sequential()model formed on TensorFlow.Keras and Bi-LSTM NLP algorithms.The maximum accuracy achieved for model validation was 0.8846.
文摘随着计算机算力的提升和智能设备的普及,社会逐步进入智慧化时代。高校图书馆作为高校的文献信息中心,进行智慧化转型提升服务质量是时代所需。因此,文章借助智能问答技术,设计了基于自然语言处理(Natural Language Processing,NLP)的图书馆智能问答系统,创新图书馆参考咨询服务模式,提高图书馆服务水平和效率。
文摘Digitalization has changed the way of information processing, and newtechniques of legal data processing are evolving. Text mining helps to analyze andsearch different court cases available in the form of digital text documents toextract case reasoning and related data. This sort of case processing helps professionals and researchers to refer the previous case with more accuracy in reducedtime. The rapid development of judicial ontologies seems to deliver interestingproblem solving to legal knowledge formalization. Mining context informationthrough ontologies from corpora is a challenging and interesting field. Thisresearch paper presents a three tier contextual text mining framework throughontologies for judicial corpora. This framework comprises on the judicial corpus,text mining processing resources and ontologies for mining contextual text fromcorpora to make text and data mining more reliable and fast. A top-down ontologyconstruction approach has been adopted in this paper. The judicial corpus hasbeen selected with a sufficient dataset to process and evaluate the results.The experimental results and evaluations show significant improvements incomparison with the available techniques.
基金the Natural Science Foundation of Zhejiang Province(No.GF20F020063)the Fujian Province Young and Middle-Aged Teacher Education Research Project(No.JAT170480)。
文摘Language disorder,a common manifestation of Alzheimer’s disease(AD),has attracted widespread attention in recent years.This paper uses a novel natural language processing(NLP)method,compared with latest deep learning technology,to detect AD and explore the lexical performance.Our proposed approach is based on two stages.First,the dialogue contents are summarized into two categories with the same category.Second,term frequency—inverse document frequency(TF-IDF)algorithm is used to extract the keywords of transcripts,and the similarity of keywords between the groups was calculated separately by cosine distance.Several deep learning methods are used to compare the performance.In the meanwhile,keywords with the best performance are used to analyze AD patients’lexical performance.In the Predictive Challenge of Alzheimer’s Disease held by iFlytek in 2019,the proposed AD diagnosis model achieves a better performance in binary classification by adjusting the number of keywords.The F1 score of the model has a considerable improvement over the baseline of 75.4%,and the training process of which is simple and efficient.We analyze the keywords of the model and find that AD patients use less noun and verb than normal controls.A computer-assisted AD diagnosis model on small Chinese dataset is proposed in this paper,which provides a potential way for assisting diagnosis of AD and analyzing lexical performance in clinical setting.
文摘As Natural Language Processing(NLP)continues to advance,driven by the emergence of sophisticated large language models such as ChatGPT,there has been a notable growth in research activity.This rapid uptake reflects increasing interest in the field and induces critical inquiries into ChatGPT’s applicability in the NLP domain.This review paper systematically investigates the role of ChatGPT in diverse NLP tasks,including information extraction,Name Entity Recognition(NER),event extraction,relation extraction,Part of Speech(PoS)tagging,text classification,sentiment analysis,emotion recognition and text annotation.The novelty of this work lies in its comprehensive analysis of the existing literature,addressing a critical gap in understanding ChatGPT’s adaptability,limitations,and optimal application.In this paper,we employed a systematic stepwise approach following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses(PRISMA)framework to direct our search process and seek relevant studies.Our review reveals ChatGPT’s significant potential in enhancing various NLP tasks.Its adaptability in information extraction tasks,sentiment analysis,and text classification showcases its ability to comprehend diverse contexts and extract meaningful details.Additionally,ChatGPT’s flexibility in annotation tasks reducesmanual efforts and accelerates the annotation process,making it a valuable asset in NLP development and research.Furthermore,GPT-4 and prompt engineering emerge as a complementary mechanism,empowering users to guide the model and enhance overall accuracy.Despite its promising potential,challenges persist.The performance of ChatGP Tneeds tobe testedusingmore extensivedatasets anddiversedata structures.Subsequently,its limitations in handling domain-specific language and the need for fine-tuning in specific applications highlight the importance of further investigations to address these issues.