Topic modeling is a family of statistical techniques for identifying the topics covered in a collection of texts. In this paper, topics were extracted with two topic-modeling methods: Latent Semantic Indexing (LSI) and Latent Dirichlet Allocation (LDA). The analysis was performed on a corpus of 1000 English-language academic papers obtained from the PLOS ONE website, covering Biology, Medicine, Physics, and Social Sciences. The objective was to verify whether these four academic fields were represented in the four topics obtained by topic modeling. The four topics obtained from LSI and LDA did not represent the four academic fields.
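The abstract does not say how agreement between topics and fields was measured. One common way to quantify it, shown here as a minimal sketch under stated assumptions, is to assign each document to its highest-weight topic and compute purity against the known field labels. The per-document topic weights (as an LSI or LDA implementation would produce) are assumed as input; the helper names and toy data are hypothetical.

```python
from collections import Counter, defaultdict


def dominant_topic(weights):
    """Return the index of the topic with the largest absolute weight.

    Absolute values are used because LSI components can be negative;
    for LDA (non-negative proportions) this is the ordinary maximum.
    """
    return max(range(len(weights)), key=lambda k: abs(weights[k]))


def topic_field_table(doc_topic_weights, fields):
    """Count the documents of each field assigned to each dominant topic."""
    table = defaultdict(Counter)
    for weights, field in zip(doc_topic_weights, fields):
        table[dominant_topic(weights)][field] += 1
    return table


def purity(doc_topic_weights, fields):
    """Fraction of documents whose field is the majority field of their topic."""
    table = topic_field_table(doc_topic_weights, fields)
    majority_total = sum(max(counts.values()) for counts in table.values())
    return majority_total / len(fields)


def majority_fields(doc_topic_weights, fields):
    """Set of fields that are the majority field of at least one topic."""
    table = topic_field_table(doc_topic_weights, fields)
    return {counts.most_common(1)[0][0] for counts in table.values()}


if __name__ == "__main__":
    # Hypothetical toy data: 6 documents x 4 topics, plus each document's field.
    weights = [
        [0.9, 0.1, 0.0, 0.0],
        [0.8, 0.1, 0.1, 0.0],
        [0.7, 0.2, 0.1, 0.0],
        [0.1, 0.8, 0.1, 0.0],
        [0.2, 0.7, 0.1, 0.0],
        [0.1, 0.1, 0.1, 0.7],
    ]
    fields = ["Biology", "Biology", "Medicine", "Physics", "Physics", "Social Sciences"]
    print(round(purity(weights, fields), 3))
    print(sorted(majority_fields(weights, fields)))
```

In the toy data, Medicine is not the majority field of any topic, so only three of the four fields are "represented". With four topics and four fields, a purity well below 1.0 or fewer than four distinct majority fields would be one way to express a finding like the paper's.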
Extending the Vector Space Model (VSM) and Latent Semantic Indexing (LSI), this paper proposes a text classification method based on a Concept Space. Information gain is used to acquire concepts from a large training set. The Concept Space is built by acquiring latent semantic indexing data, constructing a latent semantic space with LSI, and then adding class-basis vectors. Methods for calculating word similarity, text similarity, and the similarity between a text vector and a class-basis vector in the Concept Space are presented. Experimental results show that the Concept Space method outperforms the Vector Space Model. The paper also discusses future work on the problem of concept space learning.
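The abstract does not give the paper's formulas, so the following is a minimal sketch under stated assumptions, not the authors' method: information gain scores a binary term-presence feature for concept selection, the class-basis vector is assumed to be the centroid of a class's training vectors in the latent space, and cosine similarity is assumed for word, text, and text-to-class similarity. The LSI projection itself is not shown; the 2-D latent vectors below are hypothetical stand-ins, and all function names are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(docs, labels, term):
    """IG of a binary 'term present' feature; each doc is a set of terms."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(with_term) / n) * entropy(with_term) + (
        len(without_term) / n
    ) * entropy(without_term)
    return entropy(labels) - conditional


def cosine(u, v):
    """Cosine similarity; usable for word-word, text-text, or text-class vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def class_basis_vectors(vectors, labels):
    """Assumed class-basis vector: centroid of each class's latent vectors."""
    counts = Counter(labels)
    sums = {}
    for vec, y in zip(vectors, labels):
        acc = sums.setdefault(y, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
    return {y: [x / counts[y] for x in acc] for y, acc in sums.items()}


def classify(vector, basis):
    """Pick the class whose basis vector is most cosine-similar to the text."""
    return max(basis, key=lambda y: cosine(vector, basis[y]))


if __name__ == "__main__":
    docs = [{"gene", "cell"}, {"gene", "protein"}, {"quark", "energy"}, {"energy", "field"}]
    labels = ["bio", "bio", "phys", "phys"]
    # Rank candidate concept terms by information gain (highest first).
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab, key=lambda t: -information_gain(docs, labels, t))
    print(ranked[:2])
    # Hypothetical 2-D vectors standing in for the LSI projection of each doc.
    latent = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]]
    basis = class_basis_vectors(latent, labels)
    print(classify([0.8, 0.1], basis), classify([0.0, 1.0], basis))
```

Swapping the centroid for another construction of the class-basis vector only changes `class_basis_vectors`; the decision rule (highest cosine similarity to a class-basis vector) stays the same.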