Journal literature
834 articles found
Effective short text classification via the fusion of hybrid features for IoT social data (Cited by: 4)
1
Authors: Xiong Luo, Zhijian Yu, Zhigang Zhao, Wenbing Zhao, Jenq-Haur Wang. 《Digital Communications and Networks》 SCIE CSCD, 2022, Issue 6, pp. 942-954 (13 pages)
Nowadays short texts can be widely found in various social data in relation to the 5G-enabled Internet of Things (IoT). Short text classification is a challenging task due to its sparsity and the lack of context. Previous studies mainly tackle these problems by enhancing the semantic information or the statistical information individually. However, the improvement achieved by a single type of information is limited, while fusing various information may help to improve the classification accuracy more effectively. To fuse various information for short text classification, this article proposes a feature fusion method that integrates the statistical feature and the comprehensive semantic feature by using a weighting mechanism and deep learning models. In the proposed method, we apply Bidirectional Encoder Representations from Transformers (BERT) to generate word vectors on the sentence level automatically, and then obtain the statistical feature, the local semantic feature and the overall semantic feature using the Term Frequency-Inverse Document Frequency (TF-IDF) weighting approach, a Convolutional Neural Network (CNN) and a Bidirectional Gated Recurrent Unit (BiGRU). Then, the fusion feature is accordingly obtained for classification. Experiments are conducted on five popular short text classification datasets and a 5G-enabled IoT social dataset, and the results show that our proposed method effectively improves the classification performance.
Keywords: Information fusion; Short text classification; BERT; Bidirectional encoder representations from transformers; Deep learning; Social data
A Short Text Classification Model for Electrical Equipment Defects Based on Contextual Features (Cited by: 1)
2
Authors: LI Peipei, ZENG Guohui, HUANG Bo, YIN Ling, SHI Zhicai, HE Chuanpeng, LIU Wei, CHEN Yu. 《Wuhan University Journal of Natural Sciences》 CAS CSCD, 2022, Issue 6, pp. 465-475 (11 pages)
The defect information of substation equipment is usually recorded in the form of text. Due to the irregular spoken expressions of equipment inspectors, the defect information lacks sufficient contextual information and becomes more ambiguous. To solve the problem of sparse data lacking semantic features in the classification process, a short text classification model for defects in electrical equipment that fuses contextual features is proposed. The model uses bidirectional long short-term memory in short text classification to obtain the contextual semantics of short text data. Also, the attention mechanism is introduced to assign weights to different information in the context. Meanwhile, this model optimizes the convolutional neural network parameters with the help of the genetic algorithm for extracting salient features. According to the experimental results, the model can effectively realize the classification of power equipment defect text. In addition, the model was tested on an automotive parts repair dataset provided by the project partners, thus enabling the effective application of the method in specific industrial scenarios.
Keywords: short text classification; genetic algorithm; convolutional neural network; attention mechanism
Sentiment Analysis of Short Texts Based on Parallel DenseNet (Cited by: 1)
3
Authors: Luqi Yan, Jin Han, Yishi Yue, Liu Zhang, Yannan Qian. 《Computers, Materials & Continua》 SCIE EI, 2021, Issue 10, pp. 51-65 (15 pages)
Text sentiment analysis is a common problem in the field of natural language processing that is often resolved by using convolutional neural networks (CNNs). However, most of these CNN models focus only on learning local features while ignoring global features. In this paper, based on traditional densely connected convolutional networks (DenseNet), a parallel DenseNet is proposed to realize sentiment analysis of short texts. First, this paper proposes two novel feature extraction blocks that are based on DenseNet and a multiscale convolutional neural network. Second, this paper solves the problem of ignoring global features in traditional CNN models by combining the original features with features extracted by the parallel feature extraction block, and then sending the combined features into the final classifier. Last, a model based on parallel DenseNet is proposed that is capable of simultaneously learning both local and global features of short texts and shows better performance on six different databases compared to other basic models.
Keywords: Sentiment analysis; short texts; parallel DenseNet
Convolutional Deep Belief Network Based Short Text Classification on Arabic Corpus
4
Authors: Abdelwahed Motwakel, Badriyya B. Al-onazi, Jaber S. Alzahrani, Radwa Marzouk, Amira Sayed A. Aziz, Abu Sarwar Zamani, Ishfaq Yaseen, Amgad Atta Abdelmageed. 《Computer Systems Science & Engineering》 SCIE EI, 2023, Issue 6, pp. 3097-3113 (17 pages)
With a population of 440 million, Arabic language users form a rapidly growing language group on the web in terms of the number of Internet users. Eleven million monthly Twitter users were active and posted nearly 27.4 million tweets every day. Developing a classification system for the Arabic language requires understanding the syntactic framework of the words, and thereby manipulating and representing the words so that their classification is effective. In this view, this article introduces a Dolphin Swarm Optimization with Convolutional Deep Belief Network for Short Text Classification (DSOCDBN-STC) model on Arabic Corpus. The presented DSOCDBN-STC model mainly aims to classify Arabic short text in social media. The model encompasses preprocessing and word2vec word embedding at the preliminary stage. Besides, the DSOCDBN-STC model involves a CDBN-based classification model for Arabic short text. At last, the DSO technique is exploited for optimal modification of the hyperparameters related to the CDBN method. To establish the enhanced performance of the DSOCDBN-STC model, a wide range of simulations have been performed. The simulation results confirmed the superiority of the DSOCDBN-STC model over existing models, with an improved accuracy of 99.26%.
Keywords: Arabic text; short text classification; dolphin swarm optimization; deep learning
A Study on Short Text Matching Method Based on KS-BERT Algorithm
5
Authors: YANG Hao-wen, SUN Mei-feng. 《印刷与数字媒体技术研究》 CAS PKU Core, 2024, Issue 5, pp. 164-173 (10 pages)
To improve the accuracy of short text matching, a short text matching method with knowledge and structure enhancement for BERT (KS-BERT) was proposed in this study. This method first introduced external knowledge to the input text, and then sent the expanded text to both the context encoder BERT and the structure encoder GAT to capture the contextual relationship features and structural features of the input text. Finally, the match was determined based on the fusion result of the two features. Experimental results on the public datasets BQ_corpus and LCQMC showed that KS-BERT outperforms advanced models such as ERNIE 2.0. This study showed that knowledge enhancement and structure enhancement are two effective ways to improve BERT in short text matching. In BQ_corpus, ACC was improved by 0.2% and 0.3%, respectively, while in LCQMC, ACC was improved by 0.4% and 0.9%, respectively.
Keywords: Deep learning; short text matching; Graph attention network; Knowledge enhancement
Enhancing BERTopic with Pre-Clustered Knowledge: Reducing Feature Sparsity in Short Text Topic Modeling
6
Authors: Qian Wang, Biao Ma. 《Journal of Data Analysis and Information Processing》 2024, Issue 4, pp. 597-611 (15 pages)
Modeling topics in short texts presents significant challenges due to feature sparsity, particularly when analyzing content generated by large-scale online users. This sparsity can substantially impair semantic capture accuracy. We propose a novel approach that incorporates pre-clustered knowledge into the BERTopic model while reducing the l2 norm for low-frequency words. Our method effectively mitigates feature sparsity during cluster mapping. Empirical evaluation on the StackOverflow dataset demonstrates that our approach outperforms baseline models, achieving superior Macro-F1 scores. These results validate the effectiveness of our proposed feature sparsity reduction technique for short-text topic modeling.
Keywords: Topic model; BERTopic; short text; Feature sparsity; Cluster
Short Text Matching Algorithm Based on BERT
7
Authors: YU Yajian. 《外文科技期刊数据库(文摘版)自然科学》 2021, Issue 1, pp. 189-193 (5 pages)
Text matching is an important basic problem in natural language processing. It can be applied to a large number of NLP tasks, such as information retrieval, question answering systems, the retelling problem, dialogue systems and machine translation. These NLP tasks can be abstracted as text matching problems to a large extent. Unlike traditional approaches built on statistical text features, this paper proposes a text matching method based on BERT to address the low accuracy of traditional methods and their dependence on statistical features. On the task of short text matching, the F1 score of the BERT-based text matching model is 7% higher than that of the traditional matching algorithm. Experiments show that the pre-trained BERT model is superior to the traditional model in the task of short text matching.
Keywords: short text matching; BERT; natural language processing; deep learning
Research of Collaborative Filtering Recommendation Algorithm for Short Text (Cited by: 2)
8
Authors: Chunxu Chao, Shouning Qu, Tao Du. 《Journal of Computer and Communications》 2014, Issue 14, pp. 59-66 (8 pages)
Short text, based on the Web 2.0 platform, has developed rapidly in a relatively short time. Recommendation systems that analyze users' interests through short texts are becoming more and more important. Collaborative filtering is one of the most promising recommendation technologies. However, the existing collaborative filtering methods do not consider the drift of users' interests. This often leads to a big difference between the recommendation results and users' real demands. In this paper, building on the traditional collaborative filtering algorithm, a new personalized recommendation algorithm is proposed. It traces users' interests by using the Ebbinghaus forgetting curve. Experiments have been carried out, and the results demonstrate that the new algorithm helps discard users' outdated interests and discover their real-time interests for more accurate recommendation.
Keywords: short text; Personalized recommendation; Time weight function
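The forgetting-curve idea in record 8 can be illustrated with a small time-weighting sketch. The abstract does not give the paper's actual weight function, so everything here is an assumption for illustration: the exponential form `exp(-t / strength)`, the `strength` parameter (in days), and the function names `retention` and `time_weighted_score` are all hypothetical.

```python
import math


def retention(elapsed_days: float, strength: float = 7.0) -> float:
    """Assumed exponential forgetting-curve weight in (0, 1].

    Recent interactions get weights near 1, old ones decay toward 0.
    """
    if elapsed_days < 0:
        raise ValueError("elapsed_days must be non-negative")
    return math.exp(-elapsed_days / strength)


def time_weighted_score(ratings):
    """Retention-weighted mean of (rating, elapsed_days) pairs.

    Returns 0.0 for an empty input.
    """
    pairs = [(rating, retention(days)) for rating, days in ratings]
    total = sum(weight for _, weight in pairs)
    if total == 0:
        return 0.0
    return sum(rating * weight for rating, weight in pairs) / total
```

With such a weight, a rating given today dominates one given months ago, which is one simple way to discount outdated interests before computing user similarity.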
Short Text Classification Based on Improved ITC (Cited by: 1)
9
Authors: Liangliang Li, Shouning Qu. 《Journal of Computer and Communications》 2013, Issue 4, pp. 22-27 (6 pages)
Long text classification has achieved great results, but short text classification still needs to be improved. In this paper, we first describe why we select the ITC feature selection algorithm rather than the conventional TFIDF, and the advantages of ITC over TFIDF. We then summarize the flaws of the conventional ITC algorithm and present an improved ITC feature selection algorithm based on the characteristics of short text classification, combining the concepts of Documents Distribution Entropy and Position Distribution Weight. The improved ITC algorithm conforms to the actual situation of short text classification. The experimental results show that the performance based on the new algorithm was much better than that based on the traditional TFIDF and ITC.
Keywords: ITC; text classification; short text
Falcon: A Novel Chinese Short Text Classification Method
10
Authors: Haiming Li, Haining Huang, Xiang Cao, Jingu Qian. 《Journal of Computer and Communications》 2018, Issue 11, pp. 216-226 (11 pages)
In natural language processing, short text classification is still a hot research topic, with obvious problems of feature sparsity, high-dimensional text data and feature representation. In order to express text directly, a simple but new variation which employs low-dimensional one-hot encoding was proposed. In this paper, a DenseNet-based model was proposed for short text classification. Furthermore, feature diversity and reuse were implemented by the concat and average shuffle operations between ResNet and DenseNet to enlarge short text feature selection. Finally, some benchmarks were introduced to evaluate Falcon. In our experiments, the Falcon method obtained significant improvements over state-of-the-art models on most of the benchmarks, especially in error rate in the first experiment. To sum up, Falcon is an efficient and economical model that requires less computation to achieve high performance.
Keywords: short text classification; Word vector representation; One-hot; DenseNet networks; Convolutional neural networks
Enriching short text representation in microblog for clustering (Cited by: 14)
11
Authors: Jiliang TANG, Xufei WANG, Huiji GAO, Xia HU, Huan LIU. 《Frontiers of Computer Science》 SCIE EI CSCD, 2012, Issue 1, pp. 88-101 (14 pages)
Social media websites allow users to exchange short texts such as tweets via microblogs and user status in friendship networks. Their limited length, pervasive abbreviations, and coined acronyms and words exacerbate the problems of synonymy and polysemy, and bring about new challenges to data mining applications such as text clustering and classification. To address these issues, we dissect some potential causes and devise an efficient approach that enriches data representation by employing machine translation to increase the number of features from different languages. Then we propose a novel framework which performs multi-language knowledge integration and feature reduction simultaneously through matrix factorization techniques. The proposed approach is evaluated extensively in terms of effectiveness on two social media datasets from Facebook and Twitter. With its significant performance improvement, we further investigate potential factors that contribute to the improved performance.
Keywords: short texts; text representation; multi-language knowledge; matrix factorization; social media
Short text classification based on strong feature thesaurus (Cited by: 7)
12
Authors: Bing-kun WANG, Yong-feng HUANG, Wan-xia YANG, Xing LI. 《Journal of Zhejiang University-Science C (Computers and Electronics)》 SCIE EI, 2012, Issue 9, pp. 649-659 (11 pages)
Data sparseness, the evident characteristic of short text, has always been regarded as the main cause of the low accuracy in the classification of short texts using statistical methods. Intensive research has been conducted in this area during the past decade. However, most researchers failed to notice that ignoring the semantic importance of certain feature terms might also contribute to low classification accuracy. In this paper we present a new method to tackle the problem by building a strong feature thesaurus (SFT) based on latent Dirichlet allocation (LDA) and information gain (IG) models. By giving larger weights to feature terms in SFT, the classification accuracy can be improved. Specifically, our method appeared to be more effective with more detailed classification. Experiments on two short text datasets demonstrate that our approach achieved improvement compared with state-of-the-art methods including support vector machine (SVM) and Naive Bayes Multinomial.
Keywords: short text; Classification; Data sparseness; Semantic; Strong feature thesaurus (SFT); Latent Dirichlet allocation (LDA)
Short Text Mining Framework with Specific Design for Operation and Maintenance of Power Equipment (Cited by: 3)
13
Authors: Huifang Wang, Ziquan Liu, Yongjin Xu, Xiaoxiong Wei, Lixin Wang. 《CSEE Journal of Power and Energy Systems》 SCIE CSCD, 2021, Issue 6, pp. 1267-1277 (11 pages)
In order to recover the value of short texts in the operation and maintenance of power equipment, a short text mining framework with specific design is proposed. First, the process of the short text mining framework is summarized, in which the functions of all the processing modules are introduced. Then, according to the characteristics of short texts in the operation and maintenance of power equipment, the specific design for each module is proposed, which adapts the short text mining framework to a practical application. Finally, based on the framework with the specifically designed modules, two examples involving defect texts are given to illustrate the application of short text mining in the operation and maintenance of power equipment. The results of the examples show that the short text mining framework is suitable for operation and maintenance tasks for power equipment, and the specific design for each module is beneficial for the improvement of the application effect.
Keywords: Machine learning; natural language processing; operation and maintenance; power equipment; short text mining
PSLDA:a novel supervised pseudo document-based topic model for short texts
14
Authors: Mingtao SUN, Xiaowei ZHAO, Jingjing LIN, Jian JING, Deqing WANG, Guozhu JIA. 《Frontiers of Computer Science》 SCIE EI CSCD, 2022, Issue 6, pp. 71-80 (10 pages)
Various kinds of online social media applications, such as Twitter and Weibo, have brought a huge volume of short texts. However, mining semantic topics from short texts efficiently is still a challenging problem because of the sparseness of word occurrence and the diversity of topics. To address the above problems, we propose a novel supervised pseudo-document-based maximum entropy discrimination latent Dirichlet allocation model (PSLDA for short). Specifically, we first assume that short texts are generated from normal-size latent pseudo documents, and the topic distributions are sampled from the pseudo documents. In this way, the model will reduce the sparseness of word occurrence and the diversity of topics because it implicitly aggregates short texts to longer and higher-level pseudo documents. To make full use of labeled information in training data, we introduce labels into the model, and further propose a supervised topic model to learn the reasonable distribution of topics. Extensive experiments demonstrate that our proposed method achieves better performance compared with some state-of-the-art methods.
Keywords: supervised topic model; short text; pseudo-document
A Short Text Classification Model Based on Chinese Part-of-Speech Information and Mutual Learning
15
Authors: Yihe Deng, Zuxu Dai. 《国际计算机前沿大会会议论文集》 EI, 2023, Issue 2, pp. 330-343 (14 pages)
Short text classification is one of the common tasks in natural language processing. Short text contains less information, and there is still much room for improvement in the performance of short text classification models. This paper proposes a new short text classification model, ML-BERT, based on the idea of mutual learning. ML-BERT includes a BERT that only uses word vector information and a BERT that fuses word information and part-of-speech information, and introduces a transmission flag to control the information transfer between the two BERTs to simulate the mutual learning process between the two models. Experimental results show that the ML-BERT model obtains a MAF1 score of 93.79% on the THUCNews dataset. Compared with the representative models Text-CNN, Text-RNN and BERT, the MAF1 score improves by 8.11%, 6.69% and 1.69%, respectively.
Keywords: Natural language processing; Neural network; Chinese short text classification; BERT; Mutual deep learning
Deep Neural Semantic Network for Keywords Extraction on Short Text
16
Authors: Chundong She, Huanying You, Changhai Lin, Shaohua Liu, Boxiang Liang, Juan Jia, Xinglei Zhang, Yanming Qi. 《国际计算机前沿大会会议论文集》 2020, Issue 2, pp. 101-112 (12 pages)
Keyword extraction is a branch of natural language processing which plays an important role in many tasks, such as long text classification, automatic summarization, machine translation and dialogue systems. All of them need high-quality keywords as a starting point. In this paper, we propose a deep learning network called deep neural semantic network (DNSN) to solve the problem of short text keyword extraction. It maps short text and words to the same semantic space, obtains their semantic vectors at the same time, and then computes the similarity between short text and words to extract top-ranked words as keywords. Bidirectional Encoder Representations from Transformers is first used to obtain the initial semantic feature vectors of short text and words, and these initial vectors are then fed to a residual network to obtain the final semantic vectors of short text and words in the same vector space. Finally, the keywords are extracted by calculating the similarity between short text and words. Compared with existing baseline models including Frequency, Term Frequency-Inverse Document Frequency (TF-IDF) and Text-Rank, the proposed model is superior in Precision, Recall and F-score on the same batch of test data. In the best case, the precision, recall and F-score are 6.79%, 5.67% and 11.08% higher than the baseline model, respectively.
Keywords: Semantic similarity; Semantic network; short text; Keywords extraction
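The final step described in record 16, ranking candidate words by their similarity to the short text in a shared vector space, can be sketched as follows. This is only an illustration: the abstract does not name the similarity measure, so cosine similarity is an assumption, the vectors are toy values rather than BERT or residual-network outputs, and the names `cosine` and `top_keywords` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_keywords(text_vec, word_vecs, k=3):
    """Return the k words whose vectors are most similar to the text vector."""
    ranked = sorted(
        word_vecs,
        key=lambda word: cosine(text_vec, word_vecs[word]),
        reverse=True,
    )
    return ranked[:k]
```

In a real pipeline the vectors would come from the encoder, and only the ranking step would look like this.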
Short-Term Memory Capacity across Time and Language Estimated from Ancient and Modern Literary Texts. Study-Case: New Testament Translations
17
Authors: Emilio Matricciani. 《Open Journal of Statistics》 2023, Issue 3, pp. 379-403 (25 pages)
We study the short-term memory capacity of ancient readers of the original New Testament written in Greek, and of its translations to Latin and to modern languages. To model it, we consider the number of words between any two contiguous interpunctions, I_P, because this parameter can model how the human mind memorizes "chunks" of information. Since I_P can be calculated for any alphabetical text, we can perform experiments (otherwise impossible) with ancient readers by studying the literary works they used to read. The "experiments" compare the I_P of texts of a language/translation to those of another language/translation by measuring the minimum average probability of finding joint readers (those who can read both texts because of similar short-term memory capacity) and by defining an "overlap index". We also define the population of universal readers, people who can read any New Testament text in any language. Future work is vast, with many research tracks, because alphabetical literatures are very large and allow many experiments, such as comparing authors, translations or even texts written by artificial intelligence tools.
Keywords: Alphabetical Languages; Artificial Intelligence Writing; Greek; Latin; New Testament; Readers Overlap Probability; Short-Term Memory Capacity; Texts; Translation; Words Interval
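The parameter I_P in record 17 is defined in the abstract as the number of words between two contiguous interpunctions, so it can be computed for any alphabetical text. A minimal sketch follows; the set of punctuation marks treated as interpunctions and the function names `interpunction_counts` and `mean_ip` are assumptions, not taken from the paper.

```python
import re

# Assumed set of interpunction marks; the paper may use a different set.
_INTERPUNCTION = re.compile(r"[.,;:!?]+")


def interpunction_counts(text):
    """Word counts of each chunk between contiguous punctuation marks."""
    chunks = _INTERPUNCTION.split(text)
    return [len(chunk.split()) for chunk in chunks if chunk.strip()]


def mean_ip(text):
    """Average number of words per interpunction interval; 0.0 for empty text."""
    counts = interpunction_counts(text)
    return sum(counts) / len(counts) if counts else 0.0
```

Comparing the distribution of these counts across two translations is the kind of measurement the abstract's overlap index builds on.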
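Strategic Technological Innovation and Enterprises' Future Industry Layout
18
Authors: 王伟光, 吴传波. 《工业技术经济》 PKU Core, 2026, Issue 1, pp. 33-42 (10 pages)
Future industries and their priority layout are extremely important for the growth of new quality productive forces, but developing future industries requires frontier and disruptive technological innovation. Such highly uncertain innovation rarely yields good results when it relies solely on firms' own R&D investment, so appropriate government R&D support, that is, strategic technological innovation carrying a national-mission attribute, is necessary. This paper builds a dictionary of future industry layout, performs text analysis on the annual reports and patents of listed companies, constructs firm-level indicators of strategic technological innovation and future industry layout, and uses a two-way fixed effects model to examine the impact of strategic technological innovation on firms' future industry layout and its mechanisms. The results show that strategic technological innovation promotes firms' future industry layout through a cross-organizational resource agglomeration mechanism and an innovative talent agglomeration mechanism, that the effect is stronger for small firms, capital-intensive firms and firms in large and medium-sized cities, and that it suppresses firms' short-term performance to some extent. The paper helps explain, at the micro firm level, the role that strategic technological innovation plays in promoting future industry layout, and offers related policy recommendations.
Keywords: strategic technological innovation; future industries; cross-organizational resources; innovative talent; firm performance; new quality productive forces; text analysis; short-term performance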
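A Chinese FastText Short Text Classification Method Fusing TF-IDF and LDA (Cited by: 33)
19
Authors: 冯勇, 屈渤浩, 徐红艳, 王嵘冰, 张永刚. 《应用科学学报》 CAS CSCD PKU Core, 2019, Issue 3, pp. 378-388 (11 pages)
The FastText text classification model is fast and efficient, but applying it directly to Chinese short text classification yields low precision. To address this, a Chinese FastText short text classification method that fuses term frequency-inverse document frequency (TF-IDF) and latent Dirichlet allocation (LDA) is proposed. In the input stage of the FastText model, the method applies TF-IDF filtering to the dictionary produced by n-gram processing, uses an LDA model to analyze the topics of the corpus, and supplements the feature dictionary according to the results. As a result, the mean of the input word-sequence vectors is biased toward highly discriminative terms, which makes the model better suited to Chinese short text classification. Comparative experiments show that the proposed method achieves higher precision on Chinese short text classification.
Keywords: Chinese short text classification; FastText; term frequency-inverse document frequency; word vector; latent Dirichlet allocation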
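fastText Short Text Classification Fusing Category Feature Expansion and N-gram Subword Filtering (Cited by: 6)
20
Authors: 李志明, 孙艳, 何宜昊, 申利民. 《小型微型计算机系统》 CSCD PKU Core, 2022, Issue 8, pp. 1596-1601 (6 pages)
To improve the performance of the fastText short text classification model, this work studies two questions: how to obtain high-quality category features, and how to reduce the interference that N-gram subwords with low category-discrimination contribution cause when the model learns semantic features with high category-discrimination contribution. It proposes a TF-IDF-based LDA category feature extraction method to improve category feature quality, proposes an N-gram subword filtering method based on lexical information entropy to filter out subwords with low category-discrimination contribution, and builds the EF-fastText short text classification model, which focuses more on learning semantic features with high category-discrimination contribution. Experimental results show that both the TF-IDF-based LDA category feature extraction method and the lexical-information-entropy-based N-gram subword filtering method are effective in improving the performance of the EF-fastText model.
Keywords: short text classification; fastText; category features; lexical information entropy; N-gram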