Journal articles: 3,775 results found
Chinese DeepSeek: Performance of Various Oversampling Techniques on Public Perceptions Using Natural Language Processing
1
Authors: Anees Ara, Muhammad Mujahid, Amal Al-Rasheed, Shaha Al-Otaibi, Tanzila Saba. Computers, Materials & Continua, 2025, Issue 8, pp. 2717-2731 (15 pages)
DeepSeek, a Chinese open-source artificial intelligence (AI) model, has attracted considerable attention for its economical training and efficient inference. Trained with large-scale reinforcement learning without supervised fine-tuning as a preliminary step, DeepSeek demonstrates remarkable reasoning capabilities across a wide range of tasks. It is a prominent AI-driven chatbot that assists individuals in learning and enhances responses by generating insightful solutions to inquiries. Users hold divergent viewpoints on advanced models like DeepSeek, posting about both their merits and shortcomings across several social media platforms. This research presents a new framework for predicting public sentiment toward DeepSeek. To transform the unstructured data into a suitable form, we first collected DeepSeek-related tweets from Twitter and applied various preprocessing methods. We then annotated the tweets using the Valence Aware Dictionary and sEntiment Reasoner (VADER) methodology and the lexicon-driven TextBlob, and classified the sentiments in the cleaned data using the proposed hybrid model, which combines long short-term memory (LSTM) and bidirectional gated recurrent units (BiGRU), strengthened with multi-head attention, regularization, and dropout units. Topic modeling employing KMeans clustering and Latent Dirichlet Allocation (LDA) was used to analyze public behavior concerning DeepSeek. The results show that 82.5% of the tweets are positive, 15.2% negative, and 2.3% neutral under TextBlob, and 82.8% positive, 16.1% negative, and 1.2% neutral under VADER. The slight difference confirms that both analyses agree in their overall perceptions while capturing distinct language peculiarities. The results indicate that the proposed model surpasses previous state-of-the-art approaches.
Keywords: DeepSeek, prediction, natural language processing, deep learning, analysis, TextBlob, imbalanced data
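The annotation step described above (lexicon-based polarity labelling, then tallying class percentages) can be sketched as follows. This is a minimal illustrative stand-in, not the real VADER or TextBlob implementation: the tiny lexicon and the neutrality threshold are hypothetical.

```python
# Hypothetical mini-lexicon; real VADER uses ~7,500 valence-rated entries.
LEXICON = {
    "efficient": 1.0, "remarkable": 1.5, "insightful": 1.0,
    "slow": -1.0, "unreliable": -1.5, "buggy": -1.0,
}

def label_tweet(text, threshold=0.05):
    """Sum word valences and map the score to a sentiment class."""
    score = sum(LEXICON.get(w.strip(".,!?").lower(), 0.0) for w in text.split())
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"

def sentiment_distribution(tweets):
    """Percentage of positive / negative / neutral tweets, as reported in the study."""
    counts = {"positive": 0, "negative": 0, "neutral": 0}
    for t in tweets:
        counts[label_tweet(t)] += 1
    return {k: round(100.0 * v / len(tweets), 1) for k, v in counts.items()}
```

The study's reported splits (e.g. 82.5% positive under TextBlob) come from exactly this kind of tally, only with full lexicons over the collected tweet corpus.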
Deep Learning-Based Natural Language Processing Model and Optical Character Recognition for Detection of Online Grooming on Social Networking Services
2
Authors: Sangmin Kim, Byeongcheon Lee, Muazzam Maqsood, Jihoon Moon, Seungmin Rho. Computer Modeling in Engineering & Sciences, 2025, Issue 5, pp. 2079-2108 (30 pages)
The increased accessibility of social networking services (SNSs) has facilitated communication and information sharing among users. However, it has also heightened concerns about digital safety, particularly for children and adolescents, who are increasingly exposed to online grooming crimes. Early and accurate identification of grooming conversations is crucial to preventing long-term harm to victims. Research on grooming detection in South Korea remains limited, however: existing models are trained primarily on English text and fail to reflect the unique linguistic features of SNS conversations, leading to inaccurate classifications. To address these issues, this study proposes a novel framework that integrates optical character recognition (OCR) technology with KcELECTRA, a deep learning-based natural language processing (NLP) model that performs well on colloquial Korean. In the proposed framework, the KcELECTRA model is fine-tuned on an extensive dataset, including Korean social media conversations, Korean ethical verification data from AI-Hub, and Korean hate speech data from HuggingFace, to enable more accurate classification of text extracted from social media conversation images. Experimental results show that the proposed framework achieves an accuracy of 0.953, outperforming existing transformer-based models. Furthermore, the OCR component extracts text from images with high accuracy, demonstrating that the framework is effective for online grooming detection and can contribute to more accurate detection of grooming text and the prevention of grooming-related crimes.
Keywords: online grooming, KcELECTRA, natural language processing, optical character recognition, social networking service, text classification
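The framework's two-stage structure (OCR on a conversation screenshot, then text classification) amounts to composing two components. A minimal sketch, with the OCR engine and the fine-tuned classifier injected as callables since both names and the probability threshold here are assumptions, not details from the paper:

```python
def detect_grooming(image_bytes, ocr, classify, threshold=0.5):
    """Run OCR on a conversation screenshot, then score the extracted text.

    `ocr` and `classify` are injected stand-ins for the OCR engine and the
    fine-tuned KcELECTRA classifier described in the paper; `classify`
    returns a grooming probability in [0, 1].
    """
    text = ocr(image_bytes)
    if not text.strip():
        # Nothing recoverable from the image: nothing to classify.
        return {"text": text, "grooming": False, "score": 0.0}
    score = classify(text)
    return {"text": text, "grooming": score >= threshold, "score": score}
```

Keeping the stages decoupled like this lets either component (OCR engine or classifier) be swapped or fine-tuned independently.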
Methods and Challenges for Large Models in NLP Benchmarking
3
Author: 吴迪. 《黎明职业大学学报》, 2025, Issue 2, pp. 85-92 (8 pages)
To effectively evaluate the performance of large-scale pre-trained models (such as GPT, BERT, and T5), benchmarking has become increasingly important as a standardized evaluation method. This paper first reviews the main methods and datasets for NLP (natural language processing) benchmarking of large models (LLMs), analyzing the benchmark frameworks used for different tasks such as knowledge-based question answering, code generation, mathematics, and Chinese-language ability. It then examines the strengths and weaknesses of existing benchmarks, describing their role and shortcomings in model comparison, performance evaluation, and driving research forward, and discusses the challenges facing Chinese-language benchmarks (such as the characteristics of the Chinese language, Chinese datasets, traditional evaluation metrics, and insufficient interpretability). Finally, it proposes future directions for benchmarking, including introducing more challenging tasks, strengthening qualitative evaluation methods, and promoting multimodal, cross-domain benchmarks (such as the ARC-AGI task), with the aim of advancing the continued progress and greater intelligence of large NLP models.
Keywords: natural language processing (NLP), large models (LLMs), benchmarking, large-scale pre-trained models
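At its core, the benchmarking the survey discusses is scoring each model against a fixed set of (input, expected answer) items and comparing the results. A minimal exact-match sketch (real benchmarks add task-specific scoring, few-shot prompting, and normalization; the toy models in the usage are hypothetical):

```python
def benchmark_accuracy(model, items):
    """Exact-match accuracy of a model over benchmark (question, answer) pairs."""
    correct = sum(1 for question, gold in items if model(question) == gold)
    return correct / len(items)

def compare_models(models, items):
    """Score several models on the same benchmark, ranked best-first,
    the way leaderboard tables are built."""
    scores = ((name, benchmark_accuracy(fn, items)) for name, fn in models.items())
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Exact match is the simplest metric; the challenges the paper raises (Chinese language characteristics, interpretability) are precisely about where this kind of rigid scoring breaks down.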
Content Analysis of Eco-Environmental Access List Policies Based on Natural Language Processing (NLP) (Cited: 2)
4
Authors: 魏泽洋, 汪自书, 宫曼莉, 谢丹, 杨洋, 刘毅. 《环境工程技术学报》 (PKU Core), 2025, Issue 1, pp. 1-10 (10 pages)
The eco-environmental access list is the core instrument of the ecological environment zoning control system, achieving source prevention through spatial layout constraints, pollution discharge control, environmental risk prevention, and resource and energy efficiency control. Because access list policies involve large volumes of text, diverse control measures, and complex expressions, identifying the objects, methods, and intensity of control is an important foundation for supporting the implementation of zoning control policy. This study used unsupervised machine learning on natural language to mine policy vocabulary patterns in eco-environmental access lists and assign multi-dimensional quantitative labels to policy texts, and applied deep learning NLP models to classify the control measures. Hebei Province, one of the provinces in China with the most complete range of industries and the most complex resource and environmental problems, provides a typical and representative case of access control. Taking the industrial control measures in Hebei Province's eco-environmental access list as an example, 10 categories of policy keyword features and 64 main policy keywords were identified, covering 95% of the sentences containing the corresponding keywords in the full list. Twenty-four control-measure/industry classification labels were constructed, and the BERT, RoBERTa, and ALBERT deep learning models were applied and compared for classifying the policy texts, reaching maximum precision, recall, and F1 scores of 0.95, 0.79, and 0.86, respectively, indicating that the trained models identify access list policy content well. The results show that Hebei's access list still falls short in making control measures explicit, specific, and quantitative, and that refined industry controls, assessment-indicator content, and time-limit content need to be supplemented and elaborated. The proposed method has good application prospects; building on it with cutting-edge artificial intelligence methods is suggested to further improve automatic processing efficiency, dynamic analysis, and the ability to provide refined policy adjustment suggestions.
Keywords: ecological environment zoning control, eco-environmental access list, policy text, natural language processing (NLP)
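The precision, recall, and F1 figures reported for the BERT/RoBERTa/ALBERT classifiers are standard per-class metrics. A self-contained sketch of how they are computed for one label:

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Per-class precision, recall, and F1 over gold and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

With 24 control-measure/industry labels as in the study, these would be computed per label and then averaged (macro or weighted).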
Unlocking the Potential: A Comprehensive Systematic Review of ChatGPT in Natural Language Processing Tasks
5
Author: Ebtesam Ahmad Alomari. Computer Modeling in Engineering & Sciences (SCIE, EI), 2024, Issue 10, pp. 43-85 (43 pages)
As natural language processing (NLP) continues to advance, driven by the emergence of sophisticated large language models such as ChatGPT, there has been notable growth in research activity. This rapid uptake reflects increasing interest in the field and prompts critical inquiries into ChatGPT's applicability in the NLP domain. This review paper systematically investigates the role of ChatGPT in diverse NLP tasks, including information extraction, Named Entity Recognition (NER), event extraction, relation extraction, Part-of-Speech (PoS) tagging, text classification, sentiment analysis, emotion recognition, and text annotation. The novelty of this work lies in its comprehensive analysis of the existing literature, addressing a critical gap in understanding ChatGPT's adaptability, limitations, and optimal application. We employed a systematic stepwise approach following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework to direct the search process and identify relevant studies. Our review reveals ChatGPT's significant potential for enhancing various NLP tasks. Its adaptability in information extraction, sentiment analysis, and text classification showcases its ability to comprehend diverse contexts and extract meaningful details, while its flexibility in annotation tasks reduces manual effort and accelerates the annotation process, making it a valuable asset in NLP development and research. Furthermore, GPT-4 and prompt engineering emerge as complementary mechanisms, empowering users to guide the model and enhance overall accuracy. Despite this promising potential, challenges persist: ChatGPT's performance needs to be tested on more extensive datasets and diverse data structures, and its limitations in handling domain-specific language, along with the need for fine-tuning in specific applications, highlight the importance of further investigation.
Keywords: generative AI, large language model (LLM), natural language processing (NLP), ChatGPT, GPT (generative pre-trained transformer), GPT-4, sentiment analysis, NER, information extraction, annotation, text classification
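Using ChatGPT as an annotator, as the reviewed studies do, typically means constraining the model to a fixed label set and then mapping its free-text response back onto that set. A small sketch; the prompt wording and the manual-review fallback are illustrative assumptions, not prompts from the reviewed papers:

```python
def build_label_prompt(task, labels, text):
    """Build a constrained-label annotation prompt for an LLM."""
    return (
        f"Task: {task}\n"
        f"Allowed labels: {', '.join(labels)}\n"
        f"Text: {text}\n"
        "Respond with exactly one of the allowed labels."
    )

def parse_label(response, labels):
    """Map a raw model response back onto the allowed label set."""
    cleaned = response.strip().lower()
    for label in labels:
        if label.lower() == cleaned:
            return label
    return None  # off-label response: flag the item for manual review
```

The `None` path matters in practice: off-label responses are one of the reliability issues the review raises for LLM-based annotation.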
Literature classification and its applications in condensed matter physics and materials science by natural language processing
6
Authors: 吴思远, 朱天念, 涂思佳, 肖睿娟, 袁洁, 吴泉生, 李泓, 翁红明. Chinese Physics B (SCIE, EI, CAS, CSCD), 2024, Issue 5, pp. 117-123 (7 pages)
The exponential growth of the literature is constraining researchers' access to comprehensive information in related fields. While natural language processing (NLP) may offer an effective solution to literature classification, it remains hindered by the lack of labelled datasets. In this article, we introduce a novel method for generating literature classification models through semi-supervised learning, which can build a labelled dataset iteratively with limited human input. We apply this method to train NLP models for classifying literature related to several research directions, i.e., batteries, superconductors, topological materials, and artificial intelligence (AI) in materials science. The trained NLP 'battery' model, applied to a larger dataset distinct from the training and testing sets, achieves an F1 score of 0.738, indicating the accuracy and reliability of the scheme. Furthermore, our approach demonstrates that, even with insufficient data, the not-yet-well-trained model from the first few cycles can identify relationships among different research fields and facilitate the discovery and understanding of interdisciplinary directions.
Keywords: natural language processing, text mining, materials science
Sentiment Analysis of Low-Resource Language Literature Using Data Processing and Deep Learning
7
Authors: Aizaz Ali, Maqbool Khan, Khalil Khan, Rehan Ullah Khan, Abdulrahman Aloraini. Computers, Materials & Continua (SCIE, EI), 2024, Issue 4, pp. 713-733 (21 pages)
Sentiment analysis, a crucial task in discerning emotional tones within text, plays a pivotal role in understanding public opinion and user sentiment across diverse languages. While numerous scholars conduct sentiment analysis in widely spoken languages such as English, Chinese, Arabic, and Roman Arabic, resource-poor languages such as Urdu remain a challenge. Urdu is a uniquely crafted language whose script amalgamates elements from diverse languages, including Arabic, Persian, Pashto, Turkish, Punjabi, and Saraiki. Urdu literature, characterized by distinct character sets and linguistic features, presents an additional hurdle due to the lack of accessible datasets, rendering sentiment analysis a formidable undertaking. The limited availability of resources has fueled increased interest among researchers, prompting deeper exploration of Urdu sentiment analysis. This research is dedicated to Urdu-language sentiment analysis, employing deep learning models on an extensive dataset categorized into five labels: Positive, Negative, Neutral, Mixed, and Ambiguous. The primary objective is to discern sentiments and emotions in Urdu despite the absence of well-curated datasets. To tackle this challenge, a comprehensive Urdu dataset was first created by aggregating data from sources such as newspapers, articles, and social media comments, followed by thorough cleaning and preprocessing to ensure data quality. The study leverages two well-known deep learning models, Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN), for training and evaluating sentiment analysis performance, and explores hyperparameter tuning to optimize the models' efficacy. Precision, recall, and the F1-score are employed to assess effectiveness. The findings reveal that the RNN surpasses the CNN in Urdu sentiment analysis, achieving a significantly higher accuracy of 91%. This result accentuates the exceptional performance of the RNN, solidifying its status as a compelling option for sentiment analysis tasks in Urdu.
Keywords: Urdu sentiment analysis, convolutional neural networks, recurrent neural networks, deep learning, natural language processing, neural networks
Design Optimization Strategies for Museum Wayfinding Systems Based on NLP and SEM
8
Authors: 王朝伟, 郑刚强, 孙嘉伟, 王征. 《包装工程》 (PKU Core), 2025, Issue 16, pp. 472-483 (12 pages)
Objective: To construct a design optimization path for museum wayfinding systems based on natural language processing (NLP) and structural equation modeling (SEM), systematically reveal how key design factors affect visitor satisfaction, and propose generally applicable optimization strategies that improve overall system quality and user experience. Methods: Text mining was used to collect user reviews from multiple travel platforms; NLP word-frequency analysis combined with a co-occurrence matrix extracted visitors' focal concerns. Guided by user-experience theory and information-design principles, and supplemented by qualitative interviews, the core design categories were clarified and converted into measurement indicators. Exploratory factor analysis and principal component analysis extracted latent variables, and a structural equation model was built and validated to analyze the path effects of key factors on satisfaction. Results: The model fit was good, confirming that five exogenous variables (cultural function, information delivery, visual design, interactivity, and usability) have significant positive effects on satisfaction, with information delivery the most critical factor. Based on the path coefficients, a systematic optimization path covering the five design dimensions was proposed, clarifying the priority order and strategic direction for design intervention. Conclusion: An empirically grounded, satisfaction-oriented optimization framework for wayfinding system design is proposed, providing theoretical and methodological support for systematic design and scientific decision-making, extending the application of structural equation modeling in design research, and offering good transferability and practical guidance value.
Keywords: museum wayfinding system, natural language processing (NLP), structural equation modeling (SEM), design influencing factors, design optimization strategy
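The co-occurrence matrix used above to surface visitors' focal concerns counts how often two terms appear in the same review. A minimal sketch over whitespace-tokenized reviews (a real Chinese-text pipeline would first need word segmentation, e.g. with a tool such as jieba, plus stop-word filtering):

```python
from collections import Counter
from itertools import combinations

def cooccurrence_matrix(reviews):
    """Count, for each unordered term pair, how many reviews contain both."""
    counts = Counter()
    for review in reviews:
        # Deduplicate within a review so a pair is counted once per review.
        terms = sorted(set(review.split()))
        for a, b in combinations(terms, 2):
            counts[(a, b)] += 1
    return counts
```

High-count pairs (e.g. a sign-related term co-occurring with a complaint term) are the candidates that were then carried into interviews and the SEM measurement model.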
A Survey of Large Language Model-Based Data Augmentation Methods for NLP (Cited: 3)
9
Authors: 许德龙, 林民, 王玉荣, 张树钧. 《计算机科学与探索》 (PKU Core), 2025, Issue 6, pp. 1395-1413 (19 pages)
Large language models show great potential in natural language processing (NLP), but their training depends on large quantities of high-quality samples. In low-resource scenarios, as model scale keeps growing, the available data are insufficient for training to converge, which has spurred research on data augmentation. However, in the context of large NLP models, traditional augmentation methods suffer from limited applicability and data distortion, whereas LLM-based augmentation handles this challenge more effectively. This survey comprehensively examines current LLM-based data augmentation methods for NLP from an integrated perspective. It analyzes and summarizes traditional augmentation methods; categorizes the various LLM-based methods and discusses the applicable scope, advantages, and limitations of each; introduces evaluation methods for data augmentation in NLP; and, through comparative experiments and result analysis, discusses future research directions for LLM-based augmentation and offers forward-looking suggestions.
Keywords: data augmentation methods, large language models, natural language processing, deep learning, artificial intelligence
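The basic LLM-augmentation loop the survey covers is: for each labelled sample, ask a model for paraphrases and let each paraphrase inherit the original label. A minimal sketch with the LLM call injected as a callable (`paraphrase` is a hypothetical stand-in); duplicates are dropped, a common guard against the low-diversity outputs that cause the data-distortion problem mentioned above:

```python
def augment_dataset(samples, paraphrase, n_variants=2):
    """Grow a small labelled set with LLM paraphrases.

    samples    -- list of (text, label) pairs
    paraphrase -- callable (text, i) -> str, standing in for the LLM call
    """
    seen = {text for text, _ in samples}
    augmented = list(samples)
    for text, label in samples:
        for i in range(n_variants):
            variant = paraphrase(text, i)
            if variant not in seen:           # drop verbatim duplicates
                seen.add(variant)
                augmented.append((variant, label))  # paraphrase inherits label
    return augmented
```

In practice a semantic-similarity filter (not just exact-match dedup) is often added, since label drift in paraphrases is one of the limitations the survey discusses.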
Application of Natural Language Processing in Virtual Experience AI Interaction Design
10
Author: Ziqian Rong. Journal of Intelligent Learning Systems and Applications, 2024, Issue 4, pp. 403-417 (15 pages)
This paper investigates the application of natural language processing (NLP) in AI interaction design for virtual experiences. It analyzes the impact of various interaction methods on user experience, integrating Virtual Reality (VR) and Augmented Reality (AR) technologies to achieve more natural and intuitive interaction models through NLP techniques. Through experiments and data analysis across multiple technical models, the study proposes an innovative design solution based on natural language interaction and summarizes its advantages and limitations in immersive experiences.
Keywords: natural language processing, virtual reality, augmented reality, interaction design, user experience
Herding and investor sentiment after the cryptocurrency crash: evidence from Twitter and natural language processing
11
Author: Michael Cary. Financial Innovation, 2024, Issue 1, pp. 425-447 (23 pages)
Although the 2022 cryptocurrency market crash prompted despair among investors, the rallying cry "wagmi" ("We're all gonna make it.") emerged among cryptocurrency enthusiasts in the aftermath. Did cryptocurrency enthusiasts respond to this crash differently from traditional investors? Using natural language processing techniques applied to Twitter data, this study employed a difference-in-differences method to determine whether the crash had a differential effect on investor sentiment among cryptocurrency enthusiasts relative to more traditional investors. The results indicate that it did: cryptocurrency enthusiasts' tweets became more neutral and, surprisingly, less negative. This result appears to be driven primarily by a deliberate, collectivist effort to promote positivity within the cryptocurrency community ("wagmi"). Considering the more nuanced emotional content of tweets, cryptocurrency enthusiasts expressed less joy and surprise in the aftermath of the crash than traditional investors. Moreover, they tweeted more frequently after the crash, with a relative increase of approximately one tweet per day. An analysis of the specific textual content of tweets provides evidence of herding behavior among cryptocurrency enthusiasts.
Keywords: Bitcoin, cryptocurrency, herding, investor sentiment, natural language processing, sentiment analysis, Twitter
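The difference-in-differences logic above compares the change in the treated group's mean outcome (here, tweet sentiment of cryptocurrency enthusiasts, pre- vs. post-crash) against the change in the control group's mean (traditional investors). The simplest form of the estimator, without the regression controls the study would use:

```python
def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """DiD estimate: (treated change) minus (control change).

    Each argument is a list of outcome values (e.g. per-tweet sentiment
    scores) for one group in one period.
    """
    mean = lambda xs: sum(xs) / len(xs)
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change
```

Subtracting the control change nets out market-wide shifts that hit both groups, isolating the crash's differential effect on enthusiasts, the quantity the paper estimates.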
Modernization of Primary Education: Teachers' Core Concerns and Practical Difficulties, Analyzed with Natural Language Processing (NLP)
12
Authors: 杨黎, 宋乃庆, 谢路. 《教育与教学研究》, 2025, Issue 6, pp. 83-95 (13 pages)
The modernization of primary education is a key link in achieving high-quality development of basic education. Current research on the topic mostly focuses on macro-level theory and policy design, paying insufficient attention to teachers' actual experiences and opinions in practice. Based on opinion data from 6,942 primary school teachers across 25 provinces and municipalities in central, eastern, and western China, this study applies natural language processing (NLP) techniques and word-vector analysis models to quantitatively analyze the data, systematically mining teachers' core concerns and practical difficulties in the school modernization process. It provides policymakers with direct feedback from frontline educators and, on this basis, proposes countermeasures and suggestions for improving primary education modernization, offering a scientific basis and practical reference for both theoretical research and practical exploration.
Keywords: primary education, modernization, teacher perspective, natural language processing (NLP), word vector models
NLPShield: A Machine Unlearning-Based Defense against Textual Backdoor Attacks
13
Authors: 李炳佳, 熊熙. 《国外电子测量技术》, 2025, Issue 2, pp. 9-16 (8 pages)
In natural language processing (NLP), backdoor attacks have become a major threat to modern NLP applications, seriously undermining system security and reliability. Although various defense strategies have been proposed for text, existing methods struggle with complex attack scenarios when the defender has no access to the poisoned dataset and does not participate in the backdoor training process. This paper proposes NLPShield, a machine unlearning-based defense against textual backdoor attacks. Using only a small number of clean samples, the method defends effectively through two key stages: training on mislabeled samples and pruning of clean neurons. Experiments on the SST-2 and AGNews datasets show that, while maintaining high clean accuracy, NLPShield reduces attack success rates by 24.83% on average compared with state-of-the-art baseline defenses, demonstrating that it significantly improves defense against multiple backdoor attacks and effectively mitigates textual backdoor attacks.
Keywords: natural language processing, machine unlearning, backdoor attacks, defense
Research on NLP-Based Auditing of Base Station Inspection Work Orders
14
Authors: 韩龙刚, 马方明, 郭宝, 邱禹, 尹若玮. 《电信工程技术与标准化》, 2025, Issue 5, pp. 49-53 (5 pages)
Base station inspection work orders contain large amounts of text that traditional methods cannot process intelligently, so specialists still spend considerable time auditing them. To address this, this paper introduces natural language processing, focusing on how natural language inference can judge whether fields contradict one another, and conducts an empirical study by fine-tuning a pre-trained model. The results show that the method outperforms traditional manual approaches in both accuracy and processing speed and has high practical value. Finally, development trends in work-order auditing are surveyed and directions for further optimization and extension are proposed.
Keywords: natural language processing, work order auditing, inspection, multimodality
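The contradiction check described above reduces to running a natural language inference (NLI) model over pairs of work-order fields and flagging the pairs judged contradictory. A sketch with the fine-tuned NLI model injected as a callable (the three-way label convention follows standard NLI practice; the paper's exact interface is not specified):

```python
from itertools import combinations

def audit_work_order(fields, nli):
    """Flag contradictory field pairs in one inspection work order.

    nli(premise, hypothesis) is a stand-in for the fine-tuned inference
    model and returns 'contradiction', 'entailment', or 'neutral'.
    """
    return [
        (a, b)
        for a, b in combinations(fields, 2)
        if nli(a, b) == "contradiction"
    ]
```

Pairwise checking is quadratic in the number of fields, which is acceptable for work orders with tens of fields; the flagged pairs are what a human auditor would then review.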
An Intelligent Fault Prediction Method for Aviation Equipment Based on NLP-CBR
15
Authors: 周乐, 黄栋, 康吉祥. 《舰船电子工程》, 2025, Issue 4, pp. 124-127, 158 (5 pages)
Aviation equipment systems are highly complex and strongly interconnected, making fault locations difficult to judge and predict with traditional methods. To address this, the paper proposes an intelligent fault prediction method for aviation equipment based on natural language processing (NLP) and case-based reasoning (CBR). Features are first extracted from the source case base and the data cleaned; NLP is then used to obtain word-vector representations of the fault descriptions in the source cases; text similarity detection computes the similarity between the target case and each source case; and, combined with CBR, a combined prediction yields a concrete troubleshooting plan. Finally, an example analysis implemented in Python verifies the correctness of the model and the practicality of the method.
Keywords: aviation equipment, natural language processing, case-based reasoning
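The retrieval step of the NLP-CBR pipeline (find the source case whose fault-description vector is most similar to the target's) is typically cosine similarity over the word-vector representations. A self-contained sketch; how the vectors themselves are produced (the paper's NLP step) is abstracted away:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve_case(query_vec, case_base):
    """Return the (vector, fix) source case most similar to the target fault."""
    return max(case_base, key=lambda case: cosine(query_vec, case[0]))
```

CBR then reuses (and, if needed, adapts) the troubleshooting plan attached to the retrieved case, which is the "combined prediction" the abstract refers to.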
GLMTopic: A Hybrid Chinese Topic Model Leveraging Large Language Models
16
Authors: Weisi Chen, Walayat Hussain, Junjie Chen. Computers, Materials & Continua, 2025, Issue 10, pp. 1559-1583 (25 pages)
Topic modeling is a fundamental content-analysis technique in natural language processing, widely applied in domains such as the social sciences and finance. In the era of digital communication, social scientists increasingly rely on large-scale social media data to explore public discourse, collective behavior, and emerging social concerns. However, traditional models like Latent Dirichlet Allocation (LDA) and neural topic models like BERTopic struggle to capture deep semantic structure in short-text datasets, especially in complex non-English languages such as Chinese. This paper presents Generative Language Model Topic (GLMTopic), a novel hybrid topic modeling framework that leverages the capabilities of large language models and is designed to support social science research by uncovering coherent, interpretable themes from Chinese social media platforms. GLMTopic integrates Adaptive Community-enhanced Graph Embedding for advanced semantic representation, Uniform Manifold Approximation and Projection (UMAP)-based dimensionality reduction, Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN), and large language model-powered (LLM-powered) representation tuning to generate more contextually relevant and interpretable topics. By reducing dependence on extensive text preprocessing and on human expert intervention in post-analysis topic labelling, GLMTopic enables a fully automated, user-friendly topic extraction process. Experimental evaluations on a social media dataset sourced from Weibo demonstrate that GLMTopic outperforms LDA and BERTopic in coherence score and in usability with automated interpretation, providing a more scalable and semantically accurate solution for Chinese topic modeling. Future research will explore optimizing computational efficiency, integrating knowledge graphs and sentiment analysis into more complex workflows, and extending the framework to real-time and multilingual topic modeling.
Keywords: topic modeling, large language model, deep learning, natural language processing, text mining
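GLMTopic's four stages (embedding, dimensionality reduction, density-based clustering, LLM topic labelling) compose into a simple pipeline shape. A structural sketch with each stage injected as a callable; the callables are stand-ins for the components the abstract names (graph embedding, UMAP, HDBSCAN, LLM representation tuning), not their implementations:

```python
def topic_pipeline(docs, embed, reduce_dim, cluster, label_topic):
    """Compose embed -> reduce -> cluster -> label into one topic model run.

    embed       -- doc -> vector
    reduce_dim  -- list of vectors -> list of low-dimensional vectors
    cluster     -- list of vectors -> one cluster id per document
    label_topic -- list of member docs -> human-readable topic label
    """
    vectors = [embed(d) for d in docs]
    reduced = reduce_dim(vectors)
    assignments = cluster(reduced)
    topics = {}
    for cid in sorted(set(assignments)):
        members = [docs[i] for i, c in enumerate(assignments) if c == cid]
        topics[cid] = {"label": label_topic(members), "docs": members}
    return topics
```

The point of the last stage is what distinguishes GLMTopic from BERTopic-style pipelines: the cluster label is generated by an LLM from the member documents rather than assembled from top keywords.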
Multilingual Text Summarization in Healthcare Using Pre-Trained Transformer-Based Language Models
17
Authors: Josua Käser, Thomas Nagy, Patrick Stirnemann, Thomas Hanne. Computers, Materials & Continua, 2025, Issue 4, pp. 201-217 (17 pages)
We analyze the suitability of existing pre-trained transformer-based language models (PLMs) for abstractive text summarization of German technical healthcare texts. The study focuses on the multilingual capabilities of these models and their ability to perform abstractive summarization in the healthcare field. The research hypothesis was that large language models could produce high-quality abstractive summaries of German technical healthcare texts even when not specifically trained in that language. The research questions explore how transformer language models handle complex syntactic constructs, the difference in performance between models trained in English and in German, and the impact of translating the source text to English before summarizing. We evaluated four PLM approaches: GPT-3, a translation-based approach also utilizing GPT-3, a German-language model, and a domain-specific biomedical model. The evaluation considered informativeness, using three metrics based on Recall-Oriented Understudy for Gisting Evaluation (ROUGE), and the quality of the results, assessed manually on five aspects. The results show that text summarization models can be used in the German healthcare domain and that domain-independent language models achieved the best results. The study demonstrates that such models can simplify the search for pre-existing German knowledge in various domains.
Keywords: text summarization, pre-trained transformer-based language models, large language models, technical healthcare texts, natural language processing
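The ROUGE family used above measures n-gram overlap between a candidate summary and a reference. A self-contained ROUGE-1 (unigram) sketch; the study also uses other ROUGE variants (e.g. bigram- and longest-common-subsequence-based), which follow the same pattern:

```python
def rouge1(candidate, reference):
    """ROUGE-1 precision, recall, and F1 via clipped unigram overlap."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    # Clip each word's count so repeats are not over-credited.
    overlap = sum(min(cand.count(w), ref.count(w)) for w in set(ref))
    recall = overlap / len(ref) if ref else 0.0
    precision = overlap / len(cand) if cand else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

ROUGE rewards lexical overlap only, which is exactly why the study pairs it with manual quality assessment: an abstractive summary can be faithful yet share few words with the reference.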
A Critical Review of Methods and Challenges in Large Language Models
18
Authors: Milad Moradi, Ke Yan, David Colwell, Matthias Samwald, Rhona Asgari. Computers, Materials & Continua, 2025, Issue 2, pp. 1681-1698 (18 pages)
This critical review provides an in-depth analysis of Large Language Models (LLMs), encompassing their foundational principles, diverse applications, and advanced training methodologies. We critically examine the evolution from Recurrent Neural Networks (RNNs) to Transformer models, highlighting the significant advancements and innovations in LLM architectures. The review explores state-of-the-art techniques such as in-context learning and various fine-tuning approaches, with an emphasis on optimizing parameter efficiency. We also discuss methods for aligning LLMs with human preferences, including reinforcement learning frameworks and human feedback mechanisms, and evaluate the emerging technique of retrieval-augmented generation, which integrates external knowledge into LLMs. Additionally, we address the ethical considerations of deploying LLMs, stressing the importance of responsible and mindful application. By identifying current gaps and suggesting future research directions, this review provides a comprehensive and critical overview of the present state and potential advancements of LLMs, serving as an insightful guide for researchers and practitioners in artificial intelligence and offering a unified perspective on the strengths, limitations, and future prospects of LLMs.
Keywords: large language models, artificial intelligence, natural language processing, machine learning, generative artificial intelligence
The Development of Large Language Models in the Financial Field
19
Authors: Yanling Liu, Yun Li. Proceedings of Business and Economic Studies, 2025, Issue 2, pp. 49-54 (6 pages)
With the rapid development of natural language processing (NLP) and machine learning technology, the application of large language models (LLMs) in the financial field shows a significant growth trend. This paper systematically reviews the development status, main applications, challenges, and future directions of LLMs in finance. Financial large language models (FinLLMs) have been successfully applied to many scenarios, such as sentiment analysis, automated trading, and risk assessment, through deep learning architectures such as BERT and Llama combined with domain-data fine-tuning. However, issues such as data privacy, model interpretability, and ethical governance still constrain their widespread application. Future research should focus on improving model performance, addressing bias, strengthening privacy protection, and establishing a sound regulatory framework to ensure the healthy development of LLMs in the financial sector.
Keywords: large language model, fintech, natural language processing, ethics of artificial intelligence
Large Language Model-Driven Knowledge Discovery for Designing Advanced Micro/Nano Electrocatalyst Materials
20
Authors: Ying Shen, Shichao Zhao, Yanfei Lv, Fei Chen, Li Fu, Hassan Karimi-Maleh. Computers, Materials & Continua, 2025, Issue 8, pp. 1921-1950 (30 pages)
This review presents a comprehensive and forward-looking analysis of how Large Language Models (LLMs) are transforming knowledge discovery in the rational design of advanced micro/nano electrocatalyst materials. Electrocatalysis is central to sustainable energy and environmental technologies, but traditional catalyst discovery is often hindered by high complexity, fragmented knowledge, and inefficiency. LLMs, particularly those based on Transformer architectures, offer unprecedented capabilities for extracting, synthesizing, and generating scientific knowledge from vast unstructured textual corpora. This work provides the first structured synthesis of how LLMs have been leveraged across electrocatalysis tasks, including automated information extraction from the literature, text-based property prediction, hypothesis generation, synthesis planning, and knowledge graph construction. We comparatively analyze leading LLMs and domain-specific frameworks (e.g., CatBERTa, CataLM, CatGPT) in terms of methodology, application scope, performance metrics, and limitations. Through curated case studies across key electrocatalytic reactions (HER, OER, ORR, and CO2RR), we highlight emerging trends such as the growing use of embedding-based prediction, retrieval-augmented generation, and fine-tuned scientific LLMs. The review also identifies persistent challenges, including data heterogeneity, hallucination risks, the lack of standard benchmarks, and limited multimodal integration. Importantly, we articulate future research directions, such as the development of multimodal and physics-informed MatSci-LLMs, enhanced interpretability tools, and the integration of LLMs with self-driving laboratories for autonomous discovery. By consolidating fragmented advances and outlining a unified research roadmap, this review provides valuable guidance for both materials scientists and AI practitioners seeking to accelerate catalyst innovation through large language model technologies.
Keywords: large language models, electrocatalysis, nanomaterials, knowledge discovery, materials design, artificial intelligence, natural language processing