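1. Semi-Supervised Learning in Large Scale Text Categorization (cited by: 1)
Authors: Xu Zewen (许泽文), Li Jianqiang (李建强), Liu Bo (刘博), Bi Jing (毕敬), Li Rong (李蓉), Mao Rui (毛睿). Journal of Shanghai Jiaotong University (Science), EI-indexed, 2017, No. 3, pp. 291-302 (12 pages)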
The rapid development of the Internet brings a variety of original information, including text and audio. However, it is difficult to find the most useful knowledge rapidly and accurately because of its sheer volume. Automatic text classification technology based on machine learning can classify a large number of natural language documents into the corresponding subject categories according to their semantics, which helps readers grasp text information directly. By learning from a set of hand-labeled documents, we obtain the traditional supervised classifier for text categorization (TC). However, labeling all data by hand is labor intensive and time consuming. To solve this problem, some scholars proposed a semi-supervised learning method to train the classifier, but it is infeasible for the great variety and volume of Web data since it still needs some hand-labeled data. In 2012, Li et al. invented a fully automatic categorization approach for text (FACT) based on supervised learning, where no manual labeling effort is required. But automatically labeling all data can introduce noise into the experiment, so that the result cannot meet the accuracy requirement. We put forward a new idea: part of the data can be automatically tagged with high accuracy based on the semantics of the category name, then a semi-supervised approach is taken to train the classifier with both labeled and unlabeled data, and ultimately a precise classification of massive text data can be achieved. The empirical experiments show that the method outperforms the supervised support vector machine (SVM) in terms of both F1 performance and classification accuracy in most cases, which demonstrates the effectiveness of the semi-supervised algorithm in automatic TC.
Keywords: text data mining SEMI-SUPERVISED automatic tagging CLASSIFIER
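The idea in the abstract above (tag a high-confidence subset from the category names, then train on labeled and unlabeled data together) can be illustrated with a toy self-training loop. This is only a sketch under stated assumptions, not the authors' method: the seeding rule (exactly one category name appears in the document), the bag-of-words centroid classifier, the cosine `threshold`, and all function names are hypothetical choices made for illustration.

```python
from collections import Counter
import math


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real systems do more)."""
    return text.lower().split()


def seed_labels(docs, category_names):
    """Auto-tag a document only when exactly one category name occurs in it."""
    labels = {}
    for i, doc in enumerate(docs):
        tokens = set(tokenize(doc))
        hits = [c for c in category_names if c in tokens]
        if len(hits) == 1:
            labels[i] = hits[0]
    return labels


def build_centroids(docs, labels):
    """Sum the term counts of all documents currently labeled with each category."""
    centroids = {}
    for i, cat in labels.items():
        centroids.setdefault(cat, Counter()).update(tokenize(docs[i]))
    return centroids


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def self_train(docs, category_names, threshold=0.3, max_rounds=5):
    """Grow the labeled set by adopting confident predictions on unlabeled docs."""
    labels = seed_labels(docs, category_names)
    for _ in range(max_rounds):
        centroids = build_centroids(docs, labels)
        if not centroids:
            break  # nothing was seeded, so there is nothing to learn from
        added = False
        for i, doc in enumerate(docs):
            if i in labels:
                continue
            vec = Counter(tokenize(doc))
            best_cat, best_sim = max(
                ((cat, cosine(vec, cen)) for cat, cen in centroids.items()),
                key=lambda pair: pair[1],
            )
            if best_sim >= threshold:
                labels[i] = best_cat
                added = True
        if not added:
            break  # no new confident labels, so stop early
    return labels
```

Documents that never reach the threshold stay unlabeled, which mirrors the abstract's point that only the high-accuracy part of the data should be tagged automatically.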
2. A Comparative Study on Two Techniques of Reducing the Dimension of Text Feature Space
Authors: Yin Zhonghang, Wang Yongcheng, Cai Wei & Diao Qian (School of Electronic & Information Technology, Shanghai Jiaotong University, Shanghai 200030, P.R. China). Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2002, No. 1, pp. 87-92 (6 pages)
With the development of large scale text processing, the dimension of the text feature space has become larger and larger, which has added many difficulties to natural language processing. How to reduce the dimension has become a practical problem in the field. Here we present two clustering methods, i.e. concept association and concept abstract, to achieve the goal. The first refers to keyword clustering based on the co-occurrence of keywords in the same text, and the second refers to that in the same category. Then we compare the difference between them. Our experimental results show that both reduce the dimension of the text feature space efficiently.
Keywords: text data mining
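As a rough illustration of clustering keywords by co-occurrence in the same text, the sketch below merges keywords that appear together in at least `min_count` documents and maps each keyword to a cluster representative, so the feature dimension drops to the number of clusters. The merge rule, the `min_count` threshold, the union-find grouping, and the function names are assumptions made here; the abstract does not specify how the paper's concept association is computed.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs):
    """Count, for each keyword pair, how many documents contain both keywords."""
    counts = Counter()
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            counts[(a, b)] += 1
    return counts


def cluster_keywords(docs, min_count=2):
    """Map every keyword to the representative of its co-occurrence cluster."""
    parent = {kw: kw for doc in docs for kw in doc}

    def find(x):
        # Follow parent links to the cluster root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), n in cooccurrence_counts(docs).items():
        if n >= min_count:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                # Use the alphabetically smaller root as the representative.
                parent[max(root_a, root_b)] = min(root_a, root_b)

    return {kw: find(kw) for kw in parent}
```

Each document is passed as a list of keywords. Passing one merged keyword list per category instead of per text would approximate the second, category-level variant, again only as an assumption about how that variant might be set up.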
3. Analyzing corporate ESG reporting through data mining: evolutionary trends and strategic model
Authors: Ziyuan Xia, Anchen Sun, Xiaodong Cai, Saixing Zeng. Journal of Management Analytics, 2025, No. 4, pp. 634-664 (31 pages)
The environmental, social, and governance (ESG) report is globally recognized as a keystone in sustainable enterprise development. However, current literature has not yet summarized the development of topics and trends in ESG contexts in the twenty-first century. Therefore, we selected 1114 ESG reports from global firms in the technology industry to analyze the evolutionary trends of ESG topics by text mining. We discovered a homogenization effect toward low environmental, medium governance, and high social features in the evolution. We also designed a strategic framework to look closer into the dynamic changes of firms' within-industry representativeness and cross-sector distinctiveness, which demonstrates corporate social responsibility and sustainability. We found that companies are gradually converging toward the third quadrant, which indicates that firms contribute less to standing out within the industry and to professional distinctiveness in ESG reporting. Firms choose to imitate each other's ESG reports to mitigate uncertainty and enhance behavioral legitimacy.
Keywords: ESG information disclosure SUSTAINABILITY text data mining and analytics strategic model