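Abstract: Assigning one or more category labels to a text on the basis of its retrieval keywords allows documents to be organized efficiently and classified automatically, and is an important way to uncover implicit relationships among documents and to promote knowledge dissemination and innovation. However, factors such as where the keywords are taken from, their part of speech, and whether the selection is comprehensive can lead to missing keyword semantic information and poor keyword recognition accuracy; these two problems are prominent obstacles to efficient and precise automatic document classification. To address them, the paper builds an automatic text classification system that combines TF-IDF (Term Frequency-Inverse Document Frequency) and GloVe (Global Vectors for Word Representation). The system first improves the TF-IDF algorithm by introducing a part-of-speech influence factor and a position weight coefficient, compensating for the shortcomings of traditional TF-IDF in keyword recognition and semantic analysis. It then uses the GloVe model to further expand the keyword set, bringing the accuracy and recall of automatic text classification to 92.6% and 90.9%, respectively. Finally, comparative experiments further verify the system's effectiveness on multi-class automatic text classification tasks.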
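The abstract does not give the paper's formulas, so the following is only a minimal sketch of one way a part-of-speech factor and a position weight could scale a TF-IDF score. The function name `weighted_tfidf`, the weight tables, and the choice to multiply the two factors are all assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter

# Hypothetical weights chosen only for illustration; the abstract gives no values.
POS_FACTOR = {"noun": 1.0, "verb": 0.6, "adj": 0.4}
POSITION_WEIGHT = {"title": 2.0, "body": 1.0}
DEFAULT_POS_FACTOR = 0.2  # assumed fallback for other parts of speech


def weighted_tfidf(docs):
    """Score terms with TF-IDF scaled by POS and position weights.

    docs: list of documents, each a list of (term, pos, position) tuples.
    Returns one dict per document mapping term -> weighted score.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter()
    for doc in docs:
        df.update({term for term, _, _ in doc})

    scores = []
    for doc in docs:
        tf = Counter(term for term, _, _ in doc)
        total = len(doc) or 1  # avoid division by zero for empty documents
        # Keep the largest (POS factor * position weight) seen for each term.
        boost = {}
        for term, pos, position in doc:
            w = POS_FACTOR.get(pos, DEFAULT_POS_FACTOR) * POSITION_WEIGHT.get(position, 1.0)
            boost[term] = max(boost.get(term, 0.0), w)
        doc_scores = {}
        for term, count in tf.items():
            # Smoothed inverse document frequency.
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0
            doc_scores[term] = (count / total) * idf * boost[term]
        scores.append(doc_scores)
    return scores


docs = [
    [("classification", "noun", "title"),
     ("classification", "noun", "body"),
     ("improve", "verb", "body")],
    [("profile", "noun", "body"),
     ("improve", "verb", "body")],
]
scores = weighted_tfidf(docs)
```

In this toy input, a noun that appears in the title and occurs in only one document outranks a verb shared by both documents, which is the kind of behaviour such weighting is meant to encourage.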
Abstract: In the data-driven era of internet and business environments, constructing accurate user profiles is paramount for personalized user understanding and classification. The traditional TF-IDF algorithm has limitations when evaluating the impact of words on classification results. This study therefore introduces an improved TF-IDF-K algorithm, which includes an equalization factor and is aimed at constructing user profiles by processing and analyzing user search records. Using the training and prediction capabilities of a Support Vector Machine (SVM), the approach enables the prediction of user demographic attributes. The experimental results demonstrate that the TF-IDF-K algorithm achieves a significant improvement in classification accuracy and reliability.