Abstract
Aims: Existing text feature weighting methods rely on word frequency to determine word importance when vectorizing text. As a result, they cannot express text information accurately, which leads to loss of feature information during text representation and to low classification accuracy. Methods: A text classification algorithm based on task-optimized text representation learning is proposed. A weighted vector space model is designed that introduces weighting factors to weight each feature, combining word context information with task information. The Softmax regression algorithm is then used to iteratively optimize both the model parameters and the text representation, so that classification performance improves while the text representation model best suited to the task is obtained. Results: The feature-word weights learned from the classification task express the classification information of a text more accurately. Compared with other classification algorithms, the proposed WVSM-Softmax algorithm improves accuracy by about 0.8% to 8.7%. Conclusions: Softmax regression based on task-optimized text representation learning achieves better performance in text classification tasks.
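The alternating optimization described in the Methods can be illustrated with a small sketch. This is not the authors' WVSM-Softmax implementation: the abstract does not give the objective or the form of the weighting factors, so the code below assumes (hypothetically) one multiplicative weight per feature applied to raw term counts, a plain cross-entropy loss, and alternating full-batch gradient steps. All names (`train`, `predict`, `DOCS`, `LABELS`) and the toy data are illustrative only.

```python
import math

# Toy term-count vectors (hypothetical data): features 0 and 1 separate the
# two classes, features 2 and 3 are constant and carry no class information.
DOCS = [[3, 0, 1, 1], [2, 1, 1, 1], [0, 3, 1, 1], [1, 2, 1, 1]]
LABELS = [0, 0, 1, 1]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(x, w, theta, bias):
    """Class probabilities for one document under feature weights w."""
    z = [wj * xj for wj, xj in zip(w, x)]  # assumed weighted representation
    scores = [sum(t * zj for t, zj in zip(row, z)) + b
              for row, b in zip(theta, bias)]
    return softmax(scores)


def predict(x, w, theta, bias):
    """Index of the most probable class."""
    p = predict_proba(x, w, theta, bias)
    return p.index(max(p))


def train(docs, labels, num_classes, epochs=200, lr=0.1):
    """Alternate gradient steps on classifier parameters and feature weights."""
    n, d = len(docs), len(docs[0])
    w = [1.0] * d                                    # feature weights
    theta = [[0.0] * d for _ in range(num_classes)]  # Softmax parameters
    bias = [0.0] * num_classes
    for _ in range(epochs):
        # Step 1: update theta and bias with the feature weights held fixed.
        g_theta = [[0.0] * d for _ in range(num_classes)]
        g_bias = [0.0] * num_classes
        for x, y in zip(docs, labels):
            p = predict_proba(x, w, theta, bias)
            for k in range(num_classes):
                err = p[k] - (1.0 if k == y else 0.0)
                g_bias[k] += err / n
                for j in range(d):
                    g_theta[k][j] += err * w[j] * x[j] / n
        for k in range(num_classes):
            bias[k] -= lr * g_bias[k]
            for j in range(d):
                theta[k][j] -= lr * g_theta[k][j]
        # Step 2: update the feature weights with theta and bias held fixed.
        g_w = [0.0] * d
        for x, y in zip(docs, labels):
            p = predict_proba(x, w, theta, bias)
            for k in range(num_classes):
                err = p[k] - (1.0 if k == y else 0.0)
                for j in range(d):
                    g_w[j] += err * theta[k][j] * x[j] / n
        w = [wj - lr * g for wj, g in zip(w, g_w)]
    return w, theta, bias


if __name__ == "__main__":
    w, theta, bias = train(DOCS, LABELS, num_classes=2)
    print("learned feature weights:", w)
    print("predictions:", [predict(x, w, theta, bias) for x in DOCS])
```

In this toy setting the weights of the two discriminative features grow during training while the weights of the uninformative features stay close to their initial value, which is the qualitative behaviour the abstract attributes to task-learned feature weights. A real system would presumably add regularization and a richer initial representation; those choices are outside what the abstract states.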
Authors
YIN Xueting; WU Jiao; GU Xingquan; LIU Yaxuan (College of Sciences, China Jiliang University, Hangzhou 310018, China; College of Standardization, China Jiliang University, Hangzhou 310018, China)
Source
Journal of China University of Metrology, 2023, No. 1, pp. 110-119 (10 pages)
Funding
Technical Support Special Project of the State Administration for Market Regulation (No. 2021YJ005).