期刊文献+
共找到767篇文章
< 1 2 39 >
每页显示 20 50 100
Multilingual Text Summarization in Healthcare Using Pre-Trained Transformer-Based Language Models
1
作者 Josua Käser Thomas Nagy +1 位作者 Patrick Stirnemann Thomas Hanne 《Computers, Materials & Continua》 2025年第4期201-217,共17页
We analyze the suitability of existing pre-trained transformer-based language models(PLMs)for abstractive text summarization on German technical healthcare texts.The study focuses on the multilingual capabilities of t... We analyze the suitability of existing pre-trained transformer-based language models(PLMs)for abstractive text summarization on German technical healthcare texts.The study focuses on the multilingual capabilities of these models and their ability to perform the task of abstractive text summarization in the healthcare field.The research hypothesis was that large language models could perform high-quality abstractive text summarization on German technical healthcare texts,even if the model is not specifically trained in that language.Through experiments,the research questions explore the performance of transformer language models in dealing with complex syntax constructs,the difference in performance between models trained in English and German,and the impact of translating the source text to English before conducting the summarization.We conducted an evaluation of four PLMs(GPT-3,a translation-based approach also utilizing GPT-3,a German language Model,and a domain-specific bio-medical model approach).The evaluation considered the informativeness using 3 types of metrics based on Recall-Oriented Understudy for Gisting Evaluation(ROUGE)and the quality of results which is manually evaluated considering 5 aspects.The results show that text summarization models could be used in the German healthcare domain and that domain-independent language models achieved the best results.The study proves that text summarization models can simplify the search for pre-existing German knowledge in various domains. 展开更多
关键词 text summarization pre-trained transformer-based language models large language models technical healthcare texts natural language processing
在线阅读 下载PDF
Deep Learning-Based Natural Language Processing Model and Optical Character Recognition for Detection of Online Grooming on Social Networking Services
2
作者 Sangmin Kim Byeongcheon Lee +2 位作者 Muazzam Maqsood Jihoon Moon Seungmin Rho 《Computer Modeling in Engineering & Sciences》 2025年第5期2079-2108,共30页
The increased accessibility of social networking services(SNSs)has facilitated communication and information sharing among users.However,it has also heightened concerns about digital safety,particularly for children a... The increased accessibility of social networking services(SNSs)has facilitated communication and information sharing among users.However,it has also heightened concerns about digital safety,particularly for children and adolescents who are increasingly exposed to online grooming crimes.Early and accurate identification of grooming conversations is crucial in preventing long-term harm to victims.However,research on grooming detection in South Korea remains limited,as existing models trained primarily on English text and fail to reflect the unique linguistic features of SNS conversations,leading to inaccurate classifications.To address these issues,this study proposes a novel framework that integrates optical character recognition(OCR)technology with KcELECTRA,a deep learning-based natural language processing(NLP)model that shows excellent performance in processing the colloquial Korean language.In the proposed framework,the KcELECTRA model is fine-tuned by an extensive dataset,including Korean social media conversations,Korean ethical verification data from AI-Hub,and Korean hate speech data from Hug-gingFace,to enable more accurate classification of text extracted from social media conversation images.Experimental results show that the proposed framework achieves an accuracy of 0.953,outperforming existing transformer-based models.Furthermore,OCR technology shows high accuracy in extracting text from images,demonstrating that the proposed framework is effective for online grooming detection.The proposed framework is expected to contribute to the more accurate detection of grooming text and the prevention of grooming-related crimes. 展开更多
关键词 Online grooming KcELECTRA natural language processing optical character recognition social networking service text classification
在线阅读 下载PDF
Research on Text Mining of Syndrome Element Syndrome Differentiation by Natural Language Processing 被引量:5
3
作者 DENG Wen-Xiang ZHU Jian-Ping +6 位作者 LI Jing YUAN Zhi-Ying WU Hua-Ying YAO Zhong-Hua ZHANG Yi-Ge ZHANG Wen-An HUANG Hui-Yong 《Digital Chinese Medicine》 2019年第2期61-71,共11页
Objective Natural language processing (NLP) was used to excavate and visualize the core content of syndrome element syndrome differentiation (SESD). Methods The first step was to build a text mining and analysis envir... Objective Natural language processing (NLP) was used to excavate and visualize the core content of syndrome element syndrome differentiation (SESD). Methods The first step was to build a text mining and analysis environment based on Python language, and built a corpus based on the core chapters of SESD. The second step was to digitalize the corpus. The main steps included word segmentation, information cleaning and merging, document-entry matrix, dictionary compilation and information conversion. The third step was to mine and display the internal information of SESD corpus by means of word cloud, keyword extraction and visualization. Results NLP played a positive role in computer recognition and comprehension of SESD. Different chapters had different keywords and weights. Deficiency syndrome elements were an important component of SESD, such as "Qi deficiency""Yang deficiency" and "Yin deficiency". The important syndrome elements of substantiality included "Blood stasis""Qi stagnation", etc. Core syndrome elements were closely related. Conclusions Syndrome differentiation and treatment was the core of SESD. Using NLP to excavate syndromes differentiation could help reveal the internal relationship between syndromes differentiation and provide basis for artificial intelligence to learn syndromes differentiation. 展开更多
关键词 Syndrome element syndrome differentiation (SESD) natural language processing (NLP) Diagnostics of TCM Artificial intelligence text mining
在线阅读 下载PDF
Automatic Text Summarization Using Genetic Algorithm and Repetitive Patterns 被引量:2
4
作者 Ebrahim Heidary Hamïd Parvïn +4 位作者 Samad Nejatian Karamollah Bagherifard Vahideh Rezaie Zulkefli Mansor Kim-Hung Pho 《Computers, Materials & Continua》 SCIE EI 2021年第4期1085-1101,共17页
Taking into account the increasing volume of text documents,automatic summarization is one of the important tools for quick and optimal utilization of such sources.Automatic summarization is a text compression process... Taking into account the increasing volume of text documents,automatic summarization is one of the important tools for quick and optimal utilization of such sources.Automatic summarization is a text compression process for producing a shorter document in order to quickly access the important goals and main features of the input document.In this study,a novel method is introduced for selective text summarization using the genetic algorithm and generation of repetitive patterns.One of the important features of the proposed summarization is to identify and extract the relationship between the main features of the input text and the creation of repetitive patterns in order to produce and optimize the vector of the main document features in the production of the summary document compared to other previous methods.In this study,attempts were made to encompass all the main parameters of the summary text including unambiguous summary with the highest precision,continuity and consistency.To investigate the efficiency of the proposed algorithm,the results of the study were evaluated with respect to the precision and recall criteria.The results of the study evaluation showed the optimization the dimensions of the features and generation of a sequence of summary document sentences having the most consistency with the main goals and features of the input document. 展开更多
关键词 natural language processing extractive summarization features optimization repetitive patterns genetic algorithm
在线阅读 下载PDF
Automatic Persian Text Summarization Using Linguistic Features from Text Structure Analysis 被引量:1
5
作者 Ebrahim Heidary Hamïd Parvïn +2 位作者 Samad Nejatian Karamollah Bagherifard Vahideh Rezaie 《Computers, Materials & Continua》 SCIE EI 2021年第12期2845-2861,共17页
With the remarkable growth of textual data sources in recent years,easy,fast,and accurate text processing has become a challenge with significant payoffs.Automatic text summarization is the process of compressing text... With the remarkable growth of textual data sources in recent years,easy,fast,and accurate text processing has become a challenge with significant payoffs.Automatic text summarization is the process of compressing text documents into shorter summaries for easier review of its core contents,which must be done without losing important features and information.This paper introduces a new hybrid method for extractive text summarization with feature selection based on text structure.The major advantage of the proposed summarization method over previous systems is the modeling of text structure and relationship between entities in the input text,which improves the sentence feature selection process and leads to the generation of unambiguous,concise,consistent,and coherent summaries.The paper also presents the results of the evaluation of the proposed method based on precision and recall criteria.It is shown that the method produces summaries consisting of chains of sentences with the aforementioned characteristics from the original text. 展开更多
关键词 natural language processing extractive summarization linguistic feature text structure analysis
在线阅读 下载PDF
Numerical‐discrete‐scheme‐incorporated recurrent neural network for tasks in natural language processing 被引量:1
6
作者 Mei Liu Wendi Luo +3 位作者 Zangtai Cai Xiujuan Du Jiliang Zhang Shuai Li 《CAAI Transactions on Intelligence Technology》 SCIE EI 2023年第4期1415-1424,共10页
A variety of neural networks have been presented to deal with issues in deep learning in the last decades.Despite the prominent success achieved by the neural network,it still lacks theoretical guidance to design an e... A variety of neural networks have been presented to deal with issues in deep learning in the last decades.Despite the prominent success achieved by the neural network,it still lacks theoretical guidance to design an efficient neural network model,and verifying the performance of a model needs excessive resources.Previous research studies have demonstrated that many existing models can be regarded as different numerical discretizations of differential equations.This connection sheds light on designing an effective recurrent neural network(RNN)by resorting to numerical analysis.Simple RNN is regarded as a discretisation of the forward Euler scheme.Considering the limited solution accuracy of the forward Euler methods,a Taylor‐type discrete scheme is presented with lower truncation error and a Taylor‐type RNN(T‐RNN)is designed with its guidance.Extensive experiments are conducted to evaluate its performance on statistical language models and emotion analysis tasks.The noticeable gains obtained by T‐RNN present its superiority and the feasibility of designing the neural network model using numerical methods. 展开更多
关键词 deep learning natural language processing neural network text analysis
在线阅读 下载PDF
Word Embeddings and Semantic Spaces in Natural Language Processing 被引量:2
7
作者 Peter J. Worth 《International Journal of Intelligence Science》 2023年第1期1-21,共21页
One of the critical hurdles, and breakthroughs, in the field of Natural Language Processing (NLP) in the last two decades has been the development of techniques for text representation that solves the so-called curse ... One of the critical hurdles, and breakthroughs, in the field of Natural Language Processing (NLP) in the last two decades has been the development of techniques for text representation that solves the so-called curse of dimensionality, a problem which plagues NLP in general given that the feature set for learning starts as a function of the size of the language in question, upwards of hundreds of thousands of terms typically. As such, much of the research and development in NLP in the last two decades has been in finding and optimizing solutions to this problem, to feature selection in NLP effectively. This paper looks at the development of these various techniques, leveraging a variety of statistical methods which rest on linguistic theories that were advanced in the middle of the last century, namely the distributional hypothesis which suggests that words that are found in similar contexts generally have similar meanings. In this survey paper we look at the development of some of the most popular of these techniques from a mathematical as well as data structure perspective, from Latent Semantic Analysis to Vector Space Models to their more modern variants which are typically referred to as word embeddings. In this review of algoriths such as Word2Vec, GloVe, ELMo and BERT, we explore the idea of semantic spaces more generally beyond applicability to NLP. 展开更多
关键词 natural language processing Vector Space Models Semantic Spaces Word Embeddings Representation Learning text Vectorization Machine Learning Deep Learning
在线阅读 下载PDF
Study on controllability of semantic accessibility scale from the internet-based system of automatic text summarization and evaluation 被引量:2
8
作者 DU Jia-li YU Ping-fang +1 位作者 ZHAO Hong-yan XU Jing 《通讯和计算机(中英文版)》 2008年第9期54-60,共7页
关键词 通信技术 计算机技术 控制方法 自动化系统
在线阅读 下载PDF
Literature classification and its applications in condensed matter physics and materials science by natural language processing
9
作者 吴思远 朱天念 +5 位作者 涂思佳 肖睿娟 袁洁 吴泉生 李泓 翁红明 《Chinese Physics B》 SCIE EI CAS CSCD 2024年第5期117-123,共7页
The exponential growth of literature is constraining researchers’access to comprehensive information in related fields.While natural language processing(NLP)may offer an effective solution to literature classificatio... The exponential growth of literature is constraining researchers’access to comprehensive information in related fields.While natural language processing(NLP)may offer an effective solution to literature classification,it remains hindered by the lack of labelled dataset.In this article,we introduce a novel method for generating literature classification models through semi-supervised learning,which can generate labelled dataset iteratively with limited human input.We apply this method to train NLP models for classifying literatures related to several research directions,i.e.,battery,superconductor,topological material,and artificial intelligence(AI)in materials science.The trained NLP‘battery’model applied on a larger dataset different from the training and testing dataset can achieve F1 score of 0.738,which indicates the accuracy and reliability of this scheme.Furthermore,our approach demonstrates that even with insufficient data,the not-well-trained model in the first few cycles can identify the relationships among different research fields and facilitate the discovery and understanding of interdisciplinary directions. 展开更多
关键词 natural language processing text mining materials science
原文传递
Natural Language Processing with Optimal Deep Learning Based Fake News Classification
10
作者 Sara AAlthubiti Fayadh Alenezi Romany F.Mansour 《Computers, Materials & Continua》 SCIE EI 2022年第11期3529-3544,共16页
The recent advancements made in World Wide Web and social networking have eased the spread of fake news among people at a faster rate.At most of the times,the intention of fake news is to misinform the people and make... The recent advancements made in World Wide Web and social networking have eased the spread of fake news among people at a faster rate.At most of the times,the intention of fake news is to misinform the people and make manipulated societal insights.The spread of low-quality news in social networking sites has a negative influence upon people as well as the society.In order to overcome the ever-increasing dissemination of fake news,automated detection models are developed using Artificial Intelligence(AI)and Machine Learning(ML)methods.The latest advancements in Deep Learning(DL)models and complex Natural Language Processing(NLP)tasks make the former,a significant solution to achieve Fake News Detection(FND).In this background,the current study focuses on design and development of Natural Language Processing with Sea Turtle Foraging Optimizationbased Deep Learning Technique for Fake News Detection and Classification(STODL-FNDC)model.The aim of the proposed STODL-FNDC model is to discriminate fake news from legitimate news in an effectual manner.In the proposed STODL-FNDC model,the input data primarily undergoes pre-processing and Glove-based word embedding.Besides,STODL-FNDC model employs Deep Belief Network(DBN)approach for detection as well as classification of fake news.Finally,STO algorithm is utilized after adjusting the hyperparameters involved in DBN model,in an optimal manner.The novelty of the study lies in the design of STO algorithm with DBN model for FND.In order to improve the detection performance of STODL-FNDC technique,a series of simulations was carried out on benchmark datasets.The experimental outcomes established the better performance of STODL-FNDC approach over other methods with a maximum accuracy of 95.50%. 展开更多
关键词 natural language processing text mining fake news detection deep belief network machine learning evolutionary algorithm
在线阅读 下载PDF
Unlocking the Potential:A Comprehensive Systematic Review of ChatGPT in Natural Language Processing Tasks
11
作者 Ebtesam Ahmad Alomari 《Computer Modeling in Engineering & Sciences》 SCIE EI 2024年第10期43-85,共43页
As Natural Language Processing(NLP)continues to advance,driven by the emergence of sophisticated large language models such as ChatGPT,there has been a notable growth in research activity.This rapid uptake reflects in... As Natural Language Processing(NLP)continues to advance,driven by the emergence of sophisticated large language models such as ChatGPT,there has been a notable growth in research activity.This rapid uptake reflects increasing interest in the field and induces critical inquiries into ChatGPT’s applicability in the NLP domain.This review paper systematically investigates the role of ChatGPT in diverse NLP tasks,including information extraction,Name Entity Recognition(NER),event extraction,relation extraction,Part of Speech(PoS)tagging,text classification,sentiment analysis,emotion recognition and text annotation.The novelty of this work lies in its comprehensive analysis of the existing literature,addressing a critical gap in understanding ChatGPT’s adaptability,limitations,and optimal application.In this paper,we employed a systematic stepwise approach following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses(PRISMA)framework to direct our search process and seek relevant studies.Our review reveals ChatGPT’s significant potential in enhancing various NLP tasks.Its adaptability in information extraction tasks,sentiment analysis,and text classification showcases its ability to comprehend diverse contexts and extract meaningful details.Additionally,ChatGPT’s flexibility in annotation tasks reducesmanual efforts and accelerates the annotation process,making it a valuable asset in NLP development and research.Furthermore,GPT-4 and prompt engineering emerge as a complementary mechanism,empowering users to guide the model and enhance overall accuracy.Despite its promising potential,challenges persist.The performance of ChatGP Tneeds tobe testedusingmore extensivedatasets anddiversedata structures.Subsequently,its limitations in handling domain-specific language and the need for fine-tuning in specific applications highlight the importance of further investigations to address these issues. 展开更多
关键词 Generative AI large languagemodel(LLM) natural language processing(NLP) ChatGPT GPT(generative pretraining transformer) GPT-4 sentiment analysis NER information extraction ANNOTATION text classification
在线阅读 下载PDF
Research on the Classification of Digital Cultural Texts Based on ASSC-TextRCNN Algorithm
12
作者 Zixuan Guo Houbin Wang +1 位作者 Sameer Kumar Yuanfang Chen 《Computers, Materials & Continua》 2026年第3期2119-2145,共27页
With the rapid development of digital culture,a large number of cultural texts are presented in the form of digital and network.These texts have significant characteristics such as sparsity,real-time and non-standard ... With the rapid development of digital culture,a large number of cultural texts are presented in the form of digital and network.These texts have significant characteristics such as sparsity,real-time and non-standard expression,which bring serious challenges to traditional classification methods.In order to cope with the above problems,this paper proposes a new ASSC(ALBERT,SVD,Self-Attention and Cross-Entropy)-TextRCNN digital cultural text classification model.Based on the framework of TextRCNN,the Albert pre-training language model is introduced to improve the depth and accuracy of semantic embedding.Combined with the dual attention mechanism,the model’s ability to capture and model potential key information in short texts is strengthened.The Singular Value Decomposition(SVD)was used to replace the traditional Max pooling operation,which effectively reduced the feature loss rate and retained more key semantic information.The cross-entropy loss function was used to optimize the prediction results,making the model more robust in class distribution learning.The experimental results indicate that,in the digital cultural text classification task,as compared to the baseline model,the proposed ASSC-TextRCNN method achieves an 11.85%relative improvement in accuracy and an 11.97%relative increase in the F1 score.Meanwhile,the relative error rate decreases by 53.18%.This achievement not only validates the effectiveness and advanced nature of the proposed approach but also offers a novel technical route and methodological underpinnings for the intelligent analysis and dissemination of digital cultural texts.It holds great significance for promoting the in-depth exploration and value realization of digital culture. 展开更多
关键词 text classification natural language processing textRCNN model albert pre-training singular value decomposition cross-entropy loss function
在线阅读 下载PDF
TG-SMR:AText Summarization Algorithm Based on Topic and Graph Models 被引量:1
13
作者 Mohamed Ali Rakrouki Nawaf Alharbe +1 位作者 Mashael Khayyat Abeer Aljohani 《Computer Systems Science & Engineering》 SCIE EI 2023年第4期395-408,共14页
Recently,automation is considered vital in most fields since computing methods have a significant role in facilitating work such as automatic text summarization.However,most of the computing methods that are used in r... Recently,automation is considered vital in most fields since computing methods have a significant role in facilitating work such as automatic text summarization.However,most of the computing methods that are used in real systems are based on graph models,which are characterized by their simplicity and stability.Thus,this paper proposes an improved extractive text summarization algorithm based on both topic and graph models.The methodology of this work consists of two stages.First,the well-known TextRank algorithm is analyzed and its shortcomings are investigated.Then,an improved method is proposed with a new computational model of sentence weights.The experimental results were carried out on standard DUC2004 and DUC2006 datasets and compared to four text summarization methods.Finally,through experiments on the DUC2004 and DUC2006 datasets,our proposed improved graph model algorithm TG-SMR(Topic Graph-Summarizer)is compared to other text summarization systems.The experimental results prove that the proposed TG-SMR algorithm achieves higher ROUGE scores.It is foreseen that the TG-SMR algorithm will open a new horizon that concerns the performance of ROUGE evaluation indicators. 展开更多
关键词 natural language processing text summarization graph model topic model
在线阅读 下载PDF
A Rule Based System for Speech Language Context Understanding
14
作者 Imran Sarwar Bajwa Muhammad Abbas Choudhary 《Journal of Donghua University(English Edition)》 EI CAS 2006年第6期39-42,共4页
Speech or Natural language contents are major tools of communication.This research paper presents a natural language processing based automated system for understanding speech language text.A new rule based model has ... Speech or Natural language contents are major tools of communication.This research paper presents a natural language processing based automated system for understanding speech language text.A new rule based model has been presented for analyzing the natural languages and extracting the relative meanings from the given text.User writes the natural language text in simple English in a few paragraphs and the designed system has a sound ability of analyzing the given script by the user.After composite analysis and extraction of associated information,the designed system gives particular meanings to an assortment of speech language text on the basis of its context.The designed system uses standard speech language rules that are clearly defined for all speech languages as English,Urdu,Chinese,Arabic,French,etc.The designed system provides a quick and reliable way to comprehend speech language context and generate respective meanings. 展开更多
关键词 automatic text understanding speech language processing information extraction language engineering.
在线阅读 下载PDF
Construction of an Automatic Bengali Text Summarizer Using Machine Learning Approaches
15
作者 Busrat Jahan Mahfuja Khatun +2 位作者 Zinat Ara Zabu Afranul Hoque Sayed Uddin Rayhan 《Journal of Data Analysis and Information Processing》 2022年第1期43-57,共15页
In our study, we chose python as the programming platform for finding an Automatic Bengali Document Summarizer. English has sufficient tools to process and receive summarized records. However, there is no specifically... In our study, we chose python as the programming platform for finding an Automatic Bengali Document Summarizer. English has sufficient tools to process and receive summarized records. However, there is no specifically applicable to Bengali since Bengali has a lot of ambiguity, it differs from English in terms of grammar. Afterward, this language holds an important place because this language is spoken by 26 core people all over the world. As a result, it has taken a new method to summarize Bengali documents. The proposed system has been designed by using the following stages: pre-processing the sample doc/input doc, word tagging, pronoun replacement, sentence ranking, as well as summary. Pronoun replacement has been used to reduce the incidence of swinging pronouns in the performance review. We ranked sentences based on sentence frequency, numerical figures, and pronoun replacement. Checking the similarity between two sentences in order to exclude one since it has less duplication. Hereby, we’ve taken 3000 data as input from newspaper and book documents and learned the words to be appropriate with syntax. In addition, to evaluate the performance of the designed summarizer, the design system looked at the different documents. According to the assessment method, the recall, precision, and F-score were 0.70, 0.82 and 0.74, respectively, representing 70%, 82% and 74% recall, precision, and F-score. It has been found that the proper pronoun replacement was 72%. 展开更多
关键词 natural language processing Formatting Bangla text Summarizer Bengali language processing Word Tagging Pronoun Replacement Sentence Ranking
在线阅读 下载PDF
A Hybrid Query-Based Extractive Text Summarization Based on K-Means and Latent Dirichlet Allocation Techniques
16
作者 Sohail Muhammad Muzammil Khan Sarwar Shah Khan 《Journal on Artificial Intelligence》 2024年第1期193-209,共17页
Retrieving information from evolving digital data collection using a user’s query is always essential and needs efficient retrieval mechanisms that help reduce the required time from such massive collections.Large-sc... Retrieving information from evolving digital data collection using a user’s query is always essential and needs efficient retrieval mechanisms that help reduce the required time from such massive collections.Large-scale time consumption is certain to scan and analyze to retrieve the most relevant textual data item from all the documents required a sophisticated technique for a query against the document collection.It is always challenging to retrieve a more accurate and fast retrieval from a large collection.Text summarization is a dominant research field in information retrieval and text processing to locate the most appropriate data object as single or multiple documents from the collection.Machine learning and knowledge-based techniques are the two query-based extractive text summarization techniques in Natural Language Processing(NLP)which can be used for precise retrieval and are considered to be the best option.NLP uses machine learning approaches for both supervised and unsupervised learning for calculating probabilistic features.The study aims to propose a hybrid approach for query-based extractive text summarization in the research study.Text-Rank Algorithm is used as a core algorithm for the flow of an implementation of the approach to gain the required goals.Query-based text summarization of multiple documents using a hybrid approach,combining the K-Means clustering technique with Latent Dirichlet Allocation(LDA)as topic modeling technique produces 0.288,0.631,and 0.328 for precision,recall,and F-score,respectively.The results show that the proposed hybrid approach performs better than the graph-based independent approach and the sentences and word frequency-based approach. 展开更多
关键词 Extractive text summarization machine learning natural language processing K-MEANS latent dirichlet allocation
在线阅读 下载PDF
一种基于RAG的离线中文Text-to-SQL技术
17
作者 周学文 江荣 +1 位作者 许超俊 秦基尧 《网络安全与数据治理》 2025年第S1期55-59,共5页
在现代数据驱动的决策过程中,数据的重要性不言而喻。有效的数据管理和分析不仅能提升业务效率,还能为策略制定提供科学依据。在众多数据处理领域,自然语言处理与结构化查询语言之间的转换显得尤为重要。针对离线环境下,大语言模型无法... 在现代数据驱动的决策过程中,数据的重要性不言而喻。有效的数据管理和分析不仅能提升业务效率,还能为策略制定提供科学依据。在众多数据处理领域,自然语言处理与结构化查询语言之间的转换显得尤为重要。针对离线环境下,大语言模型无法自动完成模型的更新迭代,这在一定程度上限制了提供精确和详细信息的能力的问题,提出一种基于RAG的离线中文Text-to-SQL技术。首先,根据用户输入自然语言查询请求,通过RAG技术对请求解析,生成结构化信息;其次,根据解析后的信息检索相关的数据库表和字段;最后,通过大语言模型生成精确的SQL查询语句。这一技术的应用,不仅能帮助非专业用户更容易地访问和分析数据,还能够有效提高模型语义理解能力和生成SQL精度,同时防止数据泄露。因此,研究和开发高效的自然语言到SQL的离线处理方法,将对数据分析的普及和应用产生深远的影响。 展开更多
关键词 text-to-SQL 离线环境 RAG 自然语言处理 大语言模型
在线阅读 下载PDF
GLMTopic:A Hybrid Chinese Topic Model Leveraging Large Language Models
18
作者 Weisi Chen Walayat Hussain Junjie Chen 《Computers, Materials & Continua》 2025年第10期1559-1583,共25页
Topic modeling is a fundamental technique of content analysis in natural language processing,widely applied in domains such as social sciences and finance.In the era of digital communication,social scientists increasi... Topic modeling is a fundamental technique of content analysis in natural language processing,widely applied in domains such as social sciences and finance.In the era of digital communication,social scientists increasingly rely on large-scale social media data to explore public discourse,collective behavior,and emerging social concerns.However,traditional models like Latent Dirichlet Allocation(LDA)and neural topic models like BERTopic struggle to capture deep semantic structures in short-text datasets,especially in complex non-English languages like Chinese.This paper presents Generative Language Model Topic(GLMTopic)a novel hybrid topic modeling framework leveraging the capabilities of large language models,designed to support social science research by uncovering coherent and interpretable themes from Chinese social media platforms.GLMTopic integrates Adaptive Community-enhanced Graph Embedding for advanced semantic representation,Uniform Manifold Approximation and Projection-based(UMAP-based)dimensionality reduction,Hierarchical Density-Based Spatial Clustering of Applications with Noise(HDBSCAN)clustering,and large language model-powered(LLM-powered)representation tuning to generate more contextually relevant and interpretable topics.By reducing dependence on extensive text preprocessing and human expert intervention in post-analysis topic label annotation,GLMTopic facilitates a fully automated and user-friendly topic extraction process.Experimental evaluations on a social media dataset sourced from Weibo demonstrate that GLMTopic outperforms Latent Dirichlet Allocation(LDA)and BERTopic in coherence score and usability with automated interpretation,providing a more scalable and semantically accurate solution for Chinese topic modeling.Future research will explore optimizing computational efficiency,integrating knowledge graphs and sentiment analysis for more complicated workflows,and extending the framework for real-time and multilingual topic modeling. 展开更多
关键词 Topic modeling large language model deep learning natural language processing text mining
在线阅读 下载PDF
Upholding Academic Integrity amidst Advanced Language Models: Evaluating BiLSTM Networks with GloVe Embeddings for Detecting AI-Generated Scientific Abstracts
19
作者 Lilia-Eliana Popescu-Apreutesei Mihai-Sorin Iosupescu +1 位作者 Sabina Cristiana Necula Vasile-Daniel Pavaloaia 《Computers, Materials & Continua》 2025年第8期2605-2644,共40页
The increasing fluency of advanced language models,such as GPT-3.5,GPT-4,and the recently introduced DeepSeek,challenges the ability to distinguish between human-authored and AI-generated academic writing.This situati... The increasing fluency of advanced language models,such as GPT-3.5,GPT-4,and the recently introduced DeepSeek,challenges the ability to distinguish between human-authored and AI-generated academic writing.This situation is raising significant concerns regarding the integrity and authenticity of academic work.In light of the above,the current research evaluates the effectiveness of Bidirectional Long Short-TermMemory(BiLSTM)networks enhanced with pre-trained GloVe(Global Vectors for Word Representation)embeddings to detect AIgenerated scientific Abstracts drawn from the AI-GA(Artificial Intelligence Generated Abstracts)dataset.Two core BiLSTM variants were assessed:a single-layer approach and a dual-layer design,each tested under static or adaptive embeddings.The single-layer model achieved nearly 97%accuracy with trainable GloVe,occasionally surpassing the deeper model.Despite these gains,neither configuration fully matched the 98.7%benchmark set by an earlier LSTMWord2Vec pipeline.Some runs were over-fitted when embeddings were fine-tuned,whereas static embeddings offered a slightly lower yet stable accuracy of around 96%.This lingering gap reinforces a key ethical and procedural concern:relying solely on automated tools,such as Turnitin’s AI-detection features,to penalize individuals’risks and unjust outcomes.Misclassifications,whether legitimate work is misread as AI-generated or engineered text,evade detection,demonstrating that these classifiers should not stand as the sole arbiters of authenticity.Amore comprehensive approach is warranted,one which weaves model outputs into a systematic process supported by expert judgment and institutional guidelines designed to protect originality. 展开更多
关键词 AI-GA dataset bidirectional LSTM GloVe embeddings AI-generated text detection academic integrity deep learning OVERFITTING natural language processing
在线阅读 下载PDF
大语言模型幻觉检测方法综述 被引量:1
20
作者 李自拓 孙建彬 +5 位作者 陈广州 方馨悦 崔瑞靖 田植良 黄震 杨克巍 《计算机研究与发展》 北大核心 2026年第1期123-146,共24页
近年来,大语言模型(large language models,LLMs)在自然语言处理(natural language processing,NLP)等领域取得了显著进展,展现出强大的语言理解与生成能力。然而,在实际应用过程中,大语言模型仍然面临诸多挑战。其中,幻觉(hallucinati... 近年来,大语言模型(large language models,LLMs)在自然语言处理(natural language processing,NLP)等领域取得了显著进展,展现出强大的语言理解与生成能力。然而,在实际应用过程中,大语言模型仍然面临诸多挑战。其中,幻觉(hallucination)问题引起了学术界和工业界的广泛关注。如何有效检测大语言模型幻觉,成为确保其在文本生成等下游任务可靠、安全、可信应用的关键挑战。该研究着重对大语言模型幻觉检测方法进行综述:首先,介绍了大语言模型概念,进一步明确了幻觉的定义与分类,系统梳理了大语言模型从构建到部署应用全生命周期各环节的特点,并深入分析了幻觉的产生机制与诱因;其次,立足于实际应用需求,考虑到在不同任务场景下模型透明度的差异等因素,将幻觉检测方法划分为针对白盒模型和黑盒模型2类,并进行了重点梳理和深入对比;而后,分析总结了现阶段主流的幻觉检测基准,为后续开展幻觉检测奠定基础;最后,指出了大语言模型幻觉检测的各种潜在研究方法和新的挑战。 展开更多
关键词 幻觉检测 大语言模型 事实一致性 文本生成 自然语言处理
在线阅读 下载PDF
上一页 1 2 39 下一页 到第
使用帮助 返回顶部