33,916 articles found
1. Gate-Attention and Dual-End Enhancement Mechanism for Multi-Label Text Classification (cited by: 1)
Authors: Jieren Cheng, Xiaolong Chen, Wenghang Xu, Shuai Hua, Zhu Tang, Victor S. Sheng. 《Computers, Materials & Continua》, SCIE, EI, 2023, No. 11, pp. 1779-1793 (15 pages)
In the realm of Multi-Label Text Classification (MLTC), the dual challenges of extracting rich semantic features from text and discerning inter-label relationships have spurred innovative approaches. Many studies in semantic feature extraction have turned to external knowledge to augment the model's grasp of textual content, often overlooking intrinsic textual cues such as label statistical features. In contrast, these endogenous insights naturally align with the classification task. In our paper, to complement this focus on intrinsic knowledge, we introduce a novel Gate-Attention mechanism. This mechanism integrates statistical features from the text itself into the semantic fabric, enhancing the model's capacity to understand and represent the data. Additionally, to address the intricate task of mining label correlations, we propose a Dual-end enhancement mechanism. This mechanism mitigates the information loss and erroneous transmission inherent in traditional long short-term memory propagation. We conducted extensive experiments on the AAPD and RCV1-2 datasets. These experiments confirm the efficacy of both the Gate-Attention mechanism and the Dual-end enhancement mechanism. Our final model outperforms the baseline model, attesting to its robustness. These findings underscore the importance of taking into account not just external knowledge but also the inherent intricacies of textual data when building MLTC models.
Keywords: multi-label text classification; feature extraction; label distribution information; sequence generation
2. AI-Generated Text Detection: A Comprehensive Review of Active and Passive Approaches
Authors: Lingyun Xiang, Nian Li, Yuling Liu, Jiayong Hu. 《Computers, Materials & Continua》, 2026, No. 3, pp. 201-229 (29 pages)
The rapid advancement of large language models (LLMs) has driven the pervasive adoption of AI-generated content (AIGC), while also raising concerns about misinformation, academic misconduct, biased or harmful content, and other risks. Detecting AI-generated text has thus become essential to safeguard the authenticity and reliability of digital information. This survey reviews recent progress in detection methods, categorizing approaches as passive or active based on their reliance on intrinsic textual features or embedded signals. Passive detection is further divided into surface linguistic feature-based and language model-based methods, whereas active detection encompasses watermarking-based and semantic retrieval-based approaches. This taxonomy enables systematic comparison of methodological differences in model dependency, applicability, and robustness. A key challenge for AI-generated text detection is that existing detectors are highly vulnerable to adversarial attacks, particularly paraphrasing, which substantially compromises their effectiveness. This gap highlights the need for future research on enhancing robustness and cross-domain generalization. By synthesizing current advances and limitations, this survey provides a structured reference for the field and outlines pathways toward more reliable and scalable detection solutions.
Keywords: AI-generated text detection; large language models; text classification; watermarking
3. A Unified Feature Selection Framework Combining Mutual Information and Regression Optimization for Multi-Label Learning
Author: Hyunki Lim. 《Computers, Materials & Continua》, 2026, No. 4, pp. 1262-1281 (20 pages)
High-dimensional data causes difficulties in machine learning due to high time consumption and large memory requirements. In a multi-label environment in particular, complexity grows with the number of labels. Moreover, an optimization problem that fully considers all dependencies between features and labels is difficult to solve. In this study, we propose a novel regression-based multi-label feature selection method that integrates mutual information to better exploit the underlying data structure. By incorporating mutual information into the regression formulation, the model captures not only linear relationships but also complex non-linear dependencies. The proposed objective function simultaneously considers three types of relationships: (1) feature redundancy, (2) feature-label relevance, and (3) inter-label dependency. These three quantities are computed using mutual information, allowing the formulation to capture nonlinear dependencies among variables. These relationships are key factors in multi-label feature selection, and our method expresses them within a unified formulation, enabling efficient optimization while accounting for all of them. To efficiently solve the proposed optimization problem under non-negativity constraints, we develop a gradient-based optimization algorithm with fast convergence. The experimental results on seven multi-label datasets show that the proposed method outperforms existing multi-label feature selection techniques.
Keywords: feature selection; multi-label learning; regression model optimization; mutual information
4. Multi-Label Classification Model Using Graph Convolutional Neural Network for Social Network Nodes
Authors: Junmin Lyu, Guangyu Xu, Feng Bao, Yu Zhou, Yuxin Liu, Siyu Lu, Wenfeng Zheng. 《Computer Modeling in Engineering & Sciences》, 2026, No. 2, pp. 1235-1256 (22 pages)
Graph neural networks (GNN) have shown strong performance in node classification tasks, yet most existing models rely on uniform or shared weight aggregation, lacking flexibility in modeling the varying strength of relationships among nodes. This paper proposes a novel graph coupling convolutional model that introduces an adaptive weighting mechanism to assign distinct importance to neighboring nodes based on their similarity to the central node. Unlike traditional methods, the proposed coupling strategy enhances the interpretability of node interactions while maintaining competitive classification performance. The model operates in the spatial domain, utilizing adjacency list structures for efficient convolution and addressing the limitations of weight sharing through a coupling-based similarity computation. Extensive experiments are conducted on five graph-structured datasets, including Cora, Citeseer, PubMed, Reddit, and BlogCatalog, as well as a custom topology dataset constructed from the Open University Learning Analytics Dataset (OULAD) educational platform. Results demonstrate that the proposed model achieves good classification accuracy while significantly reducing training time through direct second-order neighbor fusion and data preprocessing. Moreover, analysis of neighborhood order reveals that considering third-order neighbors offers limited accuracy gains but introduces considerable computational overhead, confirming the efficiency of first- and second-order convolution in practical applications. Overall, the proposed graph coupling model offers a lightweight, interpretable, and effective framework for multi-label node classification in complex networks.
Keywords: GNN; social networks nodes; multi-label classification model; graphic convolution neural network; coupling principle
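5. A Classification Method for Basic Research and Applied Research Papers Based on the BERT-TextCNN Model
Authors: 张萌萌, 钟永恒, 刘佳. 《科技管理研究》, 2026, No. 1, pp. 256-267 (12 pages)
This study aims to build an efficient and accurate classification model for determining whether an individual paper belongs to basic research or applied research. A BERT-TextCNN model incorporating semi-automatic annotation is constructed: the semi-automatic annotation strategy reduces the manual labeling workload and improves classification efficiency, BERT generates text vectors, and TextCNN extracts key features. The classification results for the quantum information field are then analyzed using bibliometric methods and the BERTopic model. The results show that the model achieves an F1 score of 0.896, an improvement of 2.1% and 7.9% over BERT and TextCNN respectively, and significantly outperforms large language models such as Baichuan4-Turbo, DeepSeek-V3, and GLM-4-Plus, with F1 gains of 12.2%, 13.1%, and 18.8% respectively. This both validates the advantage of fusing semantic representations with local features and effectively overcomes the "high recall, low precision" weakness of large language models in domain-specific classification. Applying the model to the quantum information field shows that basic research focuses on directions such as quantum states and entanglement and ion spin, while applied research concentrates on key distribution, quantum sensing, and network components. The study provides a new method for classifying scientific literature and has important application value for research evaluation and resource optimization.
Keywords: literature classification; deep learning; semi-automatic annotation; text mining; quantum information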
6. Context-Aware Spam Detection Using BERT Embeddings with Multi-Window CNNs
Authors: Sajid Ali, Qazi Mazhar Ul Haq, Ala Saleh Alluhaidan, Muhammad Shahid Anwar, Sadique Ahmad, Leila Jamel. 《Computer Modeling in Engineering & Sciences》, 2026, No. 1, pp. 1296-1310 (15 pages)
Spam emails remain one of the most persistent threats to digital communication, necessitating effective detection solutions that safeguard both individuals and organisations. We propose a spam email classification framework that uses Bidirectional Encoder Representations from Transformers (BERT) for contextual feature extraction and a multiple-window Convolutional Neural Network (CNN) for classification. BERT embeddings are used to identify semantic nuances in email content, and CNN filters extract discriminative n-gram patterns at various levels of detail, enabling accurate spam identification. The proposed model outperformed Word2Vec-based baselines on a sample of 5728 labelled emails, achieving an accuracy of 98.69%, AUC of 0.9981, F1 score of 0.9724, and MCC of 0.9639. Performance improves with medium kernel sizes of (6, 9) and compact multi-window CNN architectures. Cross-validation illustrates stability and generalization across folds. By balancing high recall with minimal false positives, our method provides a reliable and scalable deep learning solution for current spam detection. By combining contextual embedding and a neural architecture, this study develops a security analysis method.
Keywords: E-mail spam detection; BERT embedding; text classification; cybersecurity; CNN
7. Federated Multi-Label Feature Selection via Dual-Layer Hybrid Breeding Cooperative Particle Swarm Optimization with Manifold and Sparsity Regularization
Authors: Songsong Zhang, Huazhong Jin, Zhiwei Ye, Jia Yang, Jixin Zhang, Dongfang Wu, Xiao Zheng, Dingfeng Song. 《Computers, Materials & Continua》, 2026, No. 1, pp. 1141-1159 (19 pages)
Multi-label feature selection (MFS) is a crucial dimensionality reduction technique aimed at identifying informative features associated with multiple labels. However, traditional centralized methods face significant challenges in privacy-sensitive and distributed settings, often neglecting label dependencies and suffering from low computational efficiency. To address these issues, we introduce a novel framework, Fed-MFSDHBCPSO: federated MFS via a dual-layer hybrid breeding cooperative particle swarm optimization algorithm with manifold and sparsity regularization (DHBCPSO-MSR). Leveraging the federated learning paradigm, Fed-MFSDHBCPSO allows clients to perform local feature selection (FS) using DHBCPSO-MSR. Locally selected feature subsets are encrypted with differential privacy (DP) and transmitted to a central server, where they are securely aggregated and refined through secure multi-party computation (SMPC) until global convergence is achieved. Within each client, DHBCPSO-MSR employs a dual-layer FS strategy. The inner layer constructs sample and label similarity graphs, generates Laplacian matrices to capture the manifold structure between samples and labels, and applies L2,1-norm regularization to sparsify the feature subset, yielding an optimized feature weight matrix. The outer layer uses a hybrid breeding cooperative particle swarm optimization algorithm to further refine the feature weight matrix and identify the optimal feature subset. The updated weight matrix is then fed back to the inner layer for further optimization. Comprehensive experiments on multiple real-world multi-label datasets demonstrate that Fed-MFSDHBCPSO consistently outperforms both centralized and federated baseline methods across several key evaluation metrics.
Keywords: multi-label feature selection; federated learning; manifold regularization; sparse constraints; hybrid breeding optimization algorithm; particle swarm optimization algorithm; privacy protection
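The L2,1-norm regularization named in the abstract above can be illustrated with a minimal sketch. It assumes the usual definition (the sum of the Euclidean norms of the rows of the feature weight matrix); the function name `l21_norm` and the toy matrix are hypothetical and are not taken from the paper.

```python
import math


def l21_norm(weights):
    """Return the sum of the Euclidean (L2) norms of the rows of a matrix."""
    return sum(math.sqrt(sum(w * w for w in row)) for row in weights)


# Hypothetical weight matrix: 3 features (rows) x 2 labels (columns).
# The all-zero row stands for a feature that sparsification has dropped.
W = [[3.0, 4.0], [0.0, 0.0], [6.0, 8.0]]
print(l21_norm(W))
```

Because the penalty is computed row by row, driving it down tends to zero out whole rows, which is why it is commonly used to discard entire features rather than individual weights.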
8. Research on the Classification of Digital Cultural Texts Based on ASSC-TextRCNN Algorithm
Authors: Zixuan Guo, Houbin Wang, Sameer Kumar, Yuanfang Chen. 《Computers, Materials & Continua》, 2026, No. 3, pp. 2119-2145 (27 pages)
With the rapid development of digital culture, a large number of cultural texts are presented in digital and networked form. These texts are sparse, real-time, and non-standard in expression, which poses serious challenges to traditional classification methods. To cope with these problems, this paper proposes a new ASSC (ALBERT, SVD, Self-Attention and Cross-Entropy)-TextRCNN digital cultural text classification model. Based on the TextRCNN framework, the ALBERT pre-trained language model is introduced to improve the depth and accuracy of semantic embedding. Combined with the dual attention mechanism, the model's ability to capture and model potential key information in short texts is strengthened. Singular Value Decomposition (SVD) replaces the traditional max pooling operation, which reduces the feature loss rate and retains more key semantic information. The cross-entropy loss function is used to optimize the prediction results, making the model more robust in class distribution learning. The experimental results indicate that, in the digital cultural text classification task, the proposed ASSC-TextRCNN method achieves an 11.85% relative improvement in accuracy and an 11.97% relative increase in the F1 score compared with the baseline model, while the relative error rate decreases by 53.18%. This result validates the effectiveness of the proposed approach and offers a new technical route and methodological basis for the intelligent analysis and dissemination of digital cultural texts. It is significant for promoting the in-depth exploration and value realization of digital culture.
Keywords: text classification; natural language processing; TextRCNN model; ALBERT pre-training; singular value decomposition; cross-entropy loss function
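9. RNSQL: Text2SQL Generation Incorporating Reverse Normalization
Authors: 帖军, 范子琪, 孙翀, 郑禄, 朱柏尔. 《计算机应用与软件》, PKU Core, 2025, No. 9, pp. 31-37, 86 (8 pages)
Text2SQL is an important task in natural language processing research and plays a key role in intelligent question-answering systems; its core task is to automatically convert questions expressed in natural language into SQL queries. Current research focuses on improving the matching accuracy of SQL clause tasks but neglects the syntactic correctness of the generated SQL, and SQL generation involving multi-table joins still contains many errors. This paper therefore proposes a neural-network-based Text2SQL method that restructures the database schema through reverse normalization (denormalization) and focuses on the correctness of SQL syntax generation; it is called the Reverse Normalization SQL network (RNSQL). Theoretical analysis and experiments on the public Spider dataset show that RNSQL effectively improves the quality of the Text2SQL task.
Keywords: reverse normalization; semantic parsing; Text2SQL; slot filling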
10. Multi-label text classification model based on semantic embedding (cited by: 4)
Authors: Yan Danfeng, Ke Nan, Gu Chao, Cui Jianfei, Ding Yiqi. 《The Journal of China Universities of Posts and Telecommunications》, EI, CSCD, 2019, No. 1, pp. 95-104 (10 pages)
Text classification assigns a document to one or more classes or categories according to its content, making it easier for users to obtain data. Because text data is polysemous, multi-label classification can handle it more comprehensively, and multi-label text classification has become a key problem in data mining. To improve the performance of multi-label text classification, semantic analysis is embedded into the classification model to complete label correlation analysis, and the structure, objective function, and optimization strategy of this model are designed. Then, a convolutional neural network (CNN) model based on semantic embedding is introduced. Finally, the Zhihu dataset is used for evaluation. The results show that this model outperforms related work in terms of recall and area under curve (AUC) metrics.
Keywords: multi-label text classification; convolution neural network; semantic analysis
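11. Log Anomaly Detection Based on Transformer and Text-CNN (cited by: 1)
Authors: 尹春勇, 张小虎. 《计算机工程与科学》, PKU Core, 2025, No. 3, pp. 448-458 (11 pages)
Log data is one of the most important data resources in software systems, recording detailed information about system operation, and automated log anomaly detection is crucial for maintaining system security. With the wide application of large language models in natural language processing, many Transformer-based log anomaly detection methods have been proposed. Traditional Transformer-based methods, however, struggle to capture the local features of log sequences. To address this problem, LogTC, a log anomaly detection method based on Transformer and Text-CNN, is proposed. First, logs are converted into structured log data through rule matching, retaining the useful information in log statements. Second, log statements are divided into log sequences using fixed windows or session windows, depending on the characteristics of the logs. Third, the natural language processing technique Sentence-BERT is used to generate semantic representations of log statements. Finally, the semantic vectors of the log sequences are fed into the LogTC model for detection. Experimental results show that LogTC effectively detects anomalies in log data and achieves good results on two datasets.
Keywords: log anomaly detection; deep learning; word embedding; Transformer; Text-CNN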
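The abstract above mentions dividing log statements into sequences with fixed windows or session windows. A minimal sketch of those two partitioning strategies follows; it assumes non-overlapping fixed windows and sessions keyed by an identifier, and the function names and sample event IDs are hypothetical rather than taken from the LogTC paper.

```python
def fixed_windows(events, size):
    """Split a list of log events into consecutive, non-overlapping windows."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [events[i:i + size] for i in range(0, len(events), size)]


def session_windows(records):
    """Group (session_id, event) pairs by session id, keeping arrival order."""
    sessions = {}
    for session_id, event in records:
        sessions.setdefault(session_id, []).append(event)
    return sessions


# Hypothetical parsed log events (template IDs) and session-tagged records.
print(fixed_windows(["E1", "E2", "E3", "E4", "E5"], 2))
print(session_windows([("a", "E1"), ("b", "E2"), ("a", "E3")]))
```

Fixed windows suit logs without a natural grouping key, while session windows suit logs whose lines carry an identifier such as a block or request ID.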
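12. Design of a Coherence Imaging Spectroscopy Diagnostic System for the J-TEXT Tokamak
Authors: 聂林, 吴骏彬, 龙婷, 雷驰, 严伟, 李杨波, 张霄翼, J-TEXT Experimental Team. 《核聚变与等离子体物理》, PKU Core, 2025, No. 3, pp. 273-279 (7 pages)
Coherence imaging spectroscopy is a passive spectroscopic diagnostic that uses a high-speed camera to produce two-dimensional images of impurity ion flow velocity at the plasma boundary; it plays an important role in studying toroidal rotation and impurity ion distribution in the tokamak edge and divertor plasma. A coherence imaging spectroscopy diagnostic system based mainly on the C III line (464.88 nm) has been developed and deployed on the J-TEXT device. The system's optical field of view is designed to be 12° and primarily observes the high-field-side edge plasma region of J-TEXT. In terms of performance, the system has a time resolution of 2 ms and a spatial resolution of 11 mm in the vertical direction. The diagnostic system has completed experimental testing and has obtained key data on the plasma boundary, providing a new experimental tool for boundary physics research.
Keywords: coherence imaging spectroscopy diagnostic; toroidal velocity; J-TEXT tokamak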
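13. Chinese Short-Text Sentiment Classification: A Transformer-TextCNN Model with Enhanced Position Awareness
Authors: 李浩君, 王耀东, 汪旭辉. 《计算机工程与应用》, PKU Core, 2025, No. 11, pp. 216-226 (11 pages)
To address the insufficient capture of positional information and key features in current Chinese short-text sentiment classification models, a Transformer-TextCNN sentiment classification model with enhanced position awareness is proposed. BERT's learnable absolute position encoding and sinusoidal position encoding are used to strengthen the model's position awareness. The global context understanding of the Transformer is combined with the local feature capture of TextCNN to extract global and local features of Chinese short texts respectively, producing sentiment features that combine enhanced position awareness with feature collaboration and enabling accurate sentiment classification of Chinese short texts. Experimental results show that the model achieves an accuracy of 90.23% on a video bullet-comment dataset and 87.38% on the SMP2020 dataset. Compared with the best baseline model, accuracy improves by 1.98 and 0.44 percentage points on the video bullet-comment and SMP2020 datasets respectively, giving better results on Chinese short-text sentiment classification.
Keywords: text sentiment classification; BERT; Transformer; TextCNN; position encoding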
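14. A Sentiment Analysis Model for Chinese Review Texts Based on Self-Attention and TextCNN-BiLSTM (cited by: 4)
Authors: 龙宇, 李秋生. 《石河子大学学报(自然科学版)》, PKU Core, 2025, No. 1, pp. 111-121 (11 pages)
Most current sentiment classification methods for Chinese review texts cannot fully capture the global semantic information of a sentence and are limited in understanding long-distance semantic connections or sentiment shifts, which leads to low sentiment analysis accuracy. To address this problem, this paper proposes a text sentiment analysis method that combines Self-Attention with TextCNN-BiLSTM. The method first uses a text convolutional neural network (TextCNN) to extract local features and a bidirectional long short-term memory network (BiLSTM) to capture sequence information, thereby taking both global and local information into account. In the feature fusion stage, a self-attention mechanism dynamically fuses feature representations from different levels and weights features at different scales, strengthening the response of important features. Experimental results show that the proposed model achieves accuracies of 93.79% and 90.05% on a Chinese home-appliance product review corpus and the Tan Songbo hotel review corpus respectively, improvements of 0.69% to 3.59% and 4.44% to 11.70% over the baseline models, outperforming traditional sentiment analysis models based on Convolutional Neural Networks (CNN), BiLSTM, or CNN-BiLSTM.
Keywords: self-attention mechanism; Chinese review text; deep learning; sentiment analysis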
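15. A Chinese Text-to-SQL Model for Graduate Admissions Consultation (cited by: 1)
Authors: 王庆丰, 李旭, 姚春龙, 程腾腾. 《计算机工程》, PKU Core, 2025, No. 3, pp. 362-368 (7 pages)
Graduate admissions consultation is a representative question-answering scenario with a high frequency of queries over a short period. Existing admissions question-answering systems based on word vectors and similar methods return insufficiently precise answers and require the question bank to be updated every year. To address these problems, the RESDSQL model, based on text-to-Structured Query Language (Text-to-SQL) technology, is introduced; it converts a natural language question into an SQL statement and then queries a structured database to return the answer. High-frequency consultation questions from graduate admissions were collected, and question and SQL statement templates were built from the real admissions data of three universities; the dataset was constructed by filling in the templates, yielding 1501 training examples and 386 test examples. The RoBERTa model in RESDSQL was replaced with XLM-RoBERTa, which has stronger multilingual generation capability, and the T5 model was replaced with mT5; the models were then fine-tuned on the target-domain dataset and achieved high accuracy on admissions questions, with an execution accuracy of 0.95 and an exact match rate of 1 using the mT5-large model. Compared with the C3SQL method, which is based on the ChatGPT3.5 model with zero-shot prompting, the model is better in both performance and cost.
Keywords: Chinese text-to-Structured Query Language; natural language query; Chinese SQL statement generation; pre-trained model; Text-to-SQL dataset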
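The abstract above describes building the dataset by filling question and SQL templates. The sketch below illustrates that general idea only; the template wording, the `admissions` table, its columns, and the slot values are all hypothetical and do not come from the paper.

```python
from itertools import product

# Hypothetical paired templates sharing the same slots.
QUESTION_TEMPLATE = "How many students will {school} admit to {major} in {year}?"
SQL_TEMPLATE = (
    "SELECT quota FROM admissions "
    "WHERE school = '{school}' AND major = '{major}' AND year = {year}"
)


def fill_templates(schools, majors, years):
    """Return (question, sql) pairs for every combination of slot values."""
    pairs = []
    for school, major, year in product(schools, majors, years):
        slots = {"school": school, "major": major, "year": year}
        pairs.append((QUESTION_TEMPLATE.format(**slots), SQL_TEMPLATE.format(**slots)))
    return pairs


# Slot values here are trusted constants used only to generate training text.
pairs = fill_templates(["School A"], ["Computer Science", "Mathematics"], [2024])
for question, sql in pairs:
    print(question, "->", sql)
```

Filling both templates from the same slot dictionary keeps each question aligned with its SQL statement, which is what makes the generated pairs usable as supervised training examples.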
16. From text to image: challenges in integrating vision into ChatGPT for medical image interpretation
Authors: Shunsuke Koga, Wei Du. 《Neural Regeneration Research》, SCIE, CAS, 2025, No. 2, pp. 487-488 (2 pages)
Large language models (LLMs), such as ChatGPT developed by OpenAI, represent a significant advancement in artificial intelligence (AI), designed to understand, generate, and interpret human language by analyzing extensive text data. Their potential integration into clinical settings offers a promising avenue that could transform clinical diagnosis and decision-making processes in the future (Thirunavukarasu et al., 2023). This article aims to provide an in-depth analysis of LLMs' current and potential impact on clinical practices. Their ability to generate differential diagnosis lists underscores their potential as invaluable tools in medical practice and education (Hirosawa et al., 2023; Koga et al., 2023).
Keywords: image; diagnosis; text
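17. A Weibo Topic Clustering Analysis Method Based on Text2Vec_AE_KMeans
Authors: 万文桐, 黄润才. 《智能计算机与应用》, 2025, No. 5, pp. 82-89 (8 pages)
Traditional topic clustering methods model Weibo texts with static word vectors and cope poorly with the non-standard expressions and polysemy typical of Weibo texts, which degrades clustering quality and topic description. To address this, a Weibo topic clustering analysis method based on Text2Vec_AE_KMeans is proposed, combining deep text feature extraction with clustering. First, a Text2Vec pre-trained model, designed from the MacBert pre-trained model and the CoSENT sentence modeling method, produces semantic representations of Weibo topic texts, remedying the shortcomings of static word vectors in text feature modeling. Next, an AutoEncoder dimensionality-reduction network with nonlinear activation functions reduces the dimensionality of the high-dimensional nonlinear text features. Finally, the KMeans_C-TF-IDF algorithm is used for clustering analysis of Weibo texts, capturing topic distribution information at the level of clusters. On a real Weibo topic dataset, the proposed method performs well on clustering evaluation metrics compared with traditional static word vector modeling, and the generated topic information is readily identifiable.
Keywords: topic clustering analysis; CoSENT; Text2Vec; autoencoder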
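18. Resilience of the Global Home Textile Industry: Heimtextil 2025 Reaches a Record Exhibition Scale (cited by: 1)
Author: 钟梦夏. 《中国纺织》, 2025, No. 1, pp. 96-97 (2 pages)
From January 14 to 17, Heimtextil 2025, the Frankfurt international trade fair for home and contract textiles (hereafter "Heimtextil 2025"), was held at the Frankfurt exhibition center in Germany. The four-day fair brought together more than 3,000 exhibitors from 142 countries and regions and more than 50,000 visitors, with exhibitor numbers, visitor numbers, visitor satisfaction, and other figures setting new records.
Keywords: exhibition scale; home textile industry; Frankfurt exhibition; visitor satisfaction; text; textiles; He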
19. GSPT-CVAE: A New Controlled Long Text Generation Method Based on T-CVAE
Authors: Tian Zhao, Jun Tu, Puzheng Quan, Ruisheng Xiong. 《Computers, Materials & Continua》, 2025, No. 7, pp. 1351-1377 (27 pages)
To address incomplete characterization of text relations, poor guidance from potential representations, and low generation quality in controllable long text generation, this paper proposes a new GSPT-CVAE model (Graph Structured Processing, Single Vector, and Potential Attention Computing Transformer-Based Conditioned Variational Autoencoder model). The model obtains a more comprehensive representation of textual relations by graph-structured processing of the input text, and obtains a single vector representation by weighted merging of the vector sequences after graph-structured processing, yielding an effective potential representation. When the potential representation guides text generation, the model combines traditional embedding with potential attention calculation to make full use of that guidance and improve the controllability and effectiveness of text generation. The experimental results show that the model has excellent representation learning ability and can learn rich and useful textual relationship representations. The model also achieves satisfactory results in the effectiveness and controllability of text generation and can generate long texts that match the given constraints. The ROUGE-1 F1 score of this model is 0.243, the ROUGE-2 F1 score is 0.041, the ROUGE-L F1 score is 0.22, and the PPL-Word score is 34.303, which gives the GSPT-CVAE model a certain advantage over the baseline model. This paper also compares the model with the state-of-the-art generative models T5, GPT-4, Llama2, and others, and the experimental results show that the GSPT-CVAE model is competitive.
Keywords: controllable text generation; textual graph structuring; text relationships; potential characterization
20. OCR-Assisted Masked BERT for Homoglyph Restoration towards Multiple Phishing Text Downstream Tasks
Authors: Hanyong Lee, Ye-Chan Park, Jaesung Lee. 《Computers, Materials & Continua》, 2025, No. 12, pp. 4977-4993 (17 pages)
Restoring texts corrupted by visually perturbed homoglyph characters presents significant challenges to conventional Natural Language Processing (NLP) systems, primarily due to ambiguities arising from characters that appear visually similar yet differ semantically. Traditional text restoration methods struggle with these homoglyph perturbations due to limitations such as a lack of contextual understanding and difficulty in handling cases where one character maps to multiple candidates. To address these issues, we propose an Optical Character Recognition (OCR)-assisted masked Bidirectional Encoder Representations from Transformers (BERT) model specifically designed for homoglyph-perturbed text restoration. Our method integrates OCR preprocessing with a character-level BERT architecture, where OCR preprocessing transforms visually perturbed characters into their approximate alphabetic equivalents, significantly reducing multi-correspondence ambiguities. Subsequently, the character-level BERT leverages bidirectional contextual information to resolve remaining ambiguities by predicting intended characters based on surrounding semantic cues. Extensive experiments conducted on realistic phishing email datasets demonstrate that the proposed method significantly outperforms existing restoration techniques, including OCR-based, dictionary-based, and traditional BERT-based approaches, achieving a word-level restoration accuracy of up to 99.59% in fine-tuned settings. Additionally, our approach exhibits robust performance in zero-shot scenarios and maintains effectiveness under low-resource conditions. Further evaluations across multiple downstream tasks, such as part-of-speech tagging, chunking, toxic comment classification, and homoglyph detection under conditions of severe visual perturbation (up to 40%), confirm the method's generalizability and applicability. Our proposed hybrid approach, combining OCR preprocessing with character-level contextual modeling, represents a scalable and practical solution for mitigating visually adversarial text attacks, thereby enhancing the security and reliability of NLP systems in real-world applications.
Keywords: homoglyph attack; text restoration; token-level correction; character-level BERT; OCR-assisted NLP