Funding: Supported by the Science and Technology Plan Projects of Sichuan Province of China under Grant No. 2008GZ0003 and the Key Technologies R & D Program of Sichuan Province of China under Grant No. 2008SZ0100.
Abstract: Feature selection is an important topic in text classification. However, most existing feature selection methods are serial and too inefficient to apply to massive text data sets. To address this, a feature selection method based on a parallel collaborative evolutionary genetic algorithm is presented. The method uses a genetic algorithm to select feature subsets and exploits parallel collaborative evolution to improve time efficiency, so it can quickly obtain more representative feature subsets. Experimental results show that, in terms of accuracy and recall, the presented method outperforms the information gain, χ² statistic, and mutual information methods. With only one CPU, the presented method takes longer than these three methods, but it is faster once the parallel strategy is used.
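The genetic-algorithm step described in this abstract can be illustrated with a minimal serial sketch. It is not the authors' implementation: the abstract gives no operators or parameters, so the bitmask encoding, tournament selection, one-point crossover, elitism, the names `ga_select` and `toy_fitness`, and all default values are assumptions made for illustration. The parallel collaborative evolution is not reproduced here; in practice the fitness evaluations are the natural place to parallelize.

```python
import random

def tournament(pop, fitness, rng):
    """Pick the fitter of two randomly drawn individuals."""
    a, b = rng.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b

def ga_select(n_features, fitness, pop_size=20, generations=30,
              crossover_rate=0.8, mutation_rate=0.05, seed=0):
    """Evolve a 0/1 mask over features; fitness(mask) is maximized.

    All parameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        nxt = [best[:]]  # elitism: carry the best mask into the next generation
        while len(nxt) < pop_size:
            p1 = tournament(pop, fitness, rng)
            p2 = tournament(pop, fitness, rng)
            if n_features > 1 and rng.random() < crossover_rate:
                cut = rng.randint(1, n_features - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Flip each bit independently with probability mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            nxt.append(child)
        pop = nxt
        best = max(pop, key=fitness)
    return best

# Hypothetical toy fitness: features 0-2 are "informative"; every selected
# feature costs 0.1, so small informative subsets score highest.
INFORMATIVE = {0, 1, 2}

def toy_fitness(mask):
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in INFORMATIVE)
    return hits - 0.1 * sum(mask)

best_mask = ga_select(8, toy_fitness)
```

In a real text-classification setting, `toy_fitness` would be replaced by a classifier's validation score on the selected terms; that evaluation dominates the cost, which is why distributing it across CPUs pays off.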
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 71771034 and 72371049, the Science and Technology Program of Jieyang under Grant No. 2017xm0410, and the Dalian High Level Talents Innovation Support Plan under Grant No. 2021RD01.
Abstract: As the national Chinese medicine market develops, the trend of the Chinese medicinal materials price index (CMMPI) deserves attention. Predicting the future CMMPI trend plays a significant role in risk prevention, cultivation, and trade for farmers and investors. This study aims to design a high-precision model to predict the future trend of the CMMPI. The model incorporates environmental factors, such as weather conditions and air quality, that have a substantial impact on the growth of Chinese medicinal plants and on the supply of the Chinese medicinal materials market. Specifically, we collected multi-source heterogeneous data, including weather data, air quality data, and historical CMMPI data, to construct informative features. We then proposed a feature selection method based on a genetic algorithm and XGBoost. Finally, we fed the selected features into a bidirectional GRU deep learning model to predict the CMMPI trend. We collected 46 CMMPI datasets to test the proposed model. The results show that the proposed model predicts better than state-of-the-art methods and is particularly strong at long-term prediction (90 days). Taking the Yunnan and Xizang origin indexes as examples, the experimental results also show that weather and air quality data improve prediction performance, as these factors are known to influence the growth and market supply of Chinese medicinal materials.
Funding: Supported by the UGC, SERO, Hyderabad under FDP during the XI plan period, and by the UGC, New Delhi, for financial assistance under major research project Grant No. F-34-105/2008.
Abstract: Feature selection (FS) is the process of selecting the more informative features, and it is an important step in knowledge discovery. Not all features are important: some may be redundant, and others may be irrelevant or noisy. Conventional supervised FS methods evaluate feature subsets using an evaluation function or metric and keep only those features related to the decision classes of the data under consideration. However, in many data mining applications, decision class labels are unknown or incomplete, which makes unsupervised feature selection, where no class labels are provided, important. In this paper, we propose a new unsupervised quick reduct (QR) algorithm based on rough set theory. The quality of the reduced data is measured by classification performance, evaluated using the WEKA classifier tool. The method is compared with existing supervised methods, and the results demonstrate the efficiency of the proposed algorithm.
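For orientation, the following sketch shows the classical supervised QuickReduct procedure from rough set theory, which greedily adds the attribute that most increases the dependency degree of the decision attributes on the selected subset. It is not the unsupervised variant proposed in the abstract, whose details are not given there; the function names and the toy table are assumptions for illustration.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(set)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return list(blocks.values())

def dependency(rows, cond, dec):
    """Dependency degree: fraction of rows in the positive region of dec given cond."""
    if not rows:
        return 0.0
    dec_blocks = partition(rows, dec)
    pos = 0
    for block in partition(rows, cond):
        # A block belongs to the positive region if it lies inside one decision class.
        if any(block <= d for d in dec_blocks):
            pos += len(block)
    return pos / len(rows)

def quick_reduct(rows, cond_attrs, dec_attrs):
    """Greedily add attributes until the subset matches the full set's dependency."""
    reduct = []
    target = dependency(rows, cond_attrs, dec_attrs)
    while dependency(rows, reduct, dec_attrs) < target:
        best = max((a for a in cond_attrs if a not in reduct),
                   key=lambda a: dependency(rows, reduct + [a], dec_attrs))
        reduct.append(best)
    return reduct

# Hypothetical toy table: columns 0-2 are conditional attributes, column 3 is the decision.
rows = [
    (0, 0, 1, "no"),
    (0, 1, 0, "yes"),
    (1, 0, 1, "no"),
    (1, 1, 0, "yes"),
]
reduct = quick_reduct(rows, [0, 1, 2], [3])
```

In this toy table, column 1 alone determines the decision, so the greedy search stops after selecting it. An unsupervised variant has to replace the decision column with some label-free criterion, which the abstract does not specify.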
Funding: This study is supported by the National Natural Science Foundation of China (No. 41001310) and the Natural Science Foundation of China-Guangdong Joint Fund (U1301253). The authors greatly appreciate the Bureau of Land and Resource Yangshan for providing valuable land consolidation data.
Abstract: To select suitable sites for farmland consolidation projects, correlation analysis and evolutionary algorithms were used to optimize the evaluation of ecological, social, and economic factors, avoiding subjective selection and neglect of spatial relationships among land attributes. Multi-objective genetic algorithms (MOGA) were applied to select the best sites from the perspective of spatial relationships and land attribute evaluation. With carefully defined restrictions and variables, multi-objective optimization is able to select several suitable sites for farmland consolidation projects. The results of a case study in Yangshan, Guangdong, China, showed that the selected sites were in central and southern Yangshan, with the expected flat terrain and abundant water resources. An empirical experiment also demonstrated that the proposed method is able to provide well-selected sites for land consolidation projects.
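Multi-objective genetic algorithms commonly compare candidates by Pareto dominance rather than by a single score, which is why they can return several suitable sites at once. The sketch below shows only that comparison step. The abstract does not say which MOGA variant or objectives were used, so the three-objective scores (ecological, social, economic, all to be maximized) and the names `dominates` and `pareto_front` are assumptions.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly better on one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def pareto_front(scores):
    """Return indices of candidates that no other candidate dominates."""
    return [i for i, s in enumerate(scores)
            if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)]

# Hypothetical candidate sites scored as (ecological, social, economic).
sites = [
    (0.9, 0.2, 0.5),
    (0.4, 0.8, 0.6),
    (0.3, 0.7, 0.5),  # dominated by site 1
    (0.9, 0.2, 0.4),  # dominated by site 0
]
front = pareto_front(sites)
```

Sites 0 and 1 trade off ecological against social and economic value, so neither dominates the other and both remain on the front.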
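Abstract: This paper uses latent semantic indexing (LSI) and a genetic algorithm (GA) for text feature extraction. LSI is used to capture semantic relationships in the vector space model (VSM), and singular value decomposition (SVD) effectively reduces the dimensionality of the vector space. However, the text features obtained after this reduction still have to be kept at several hundred dimensions, so a genetic algorithm is applied to reduce the dimensionality further. Experimental results show that combining the two methods greatly reduces the dimensionality of the text vector space and improves classification accuracy.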
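The SVD step mentioned in this abstract can be written out in the standard LSI form. The symbols are assumptions, since the abstract defines none: $A$ is the $m \times n$ term-document matrix, $k$ is the number of retained singular values, and $d$ is a document's term vector.

```latex
% Rank-k truncated SVD of the term-document matrix
A \approx A_k = U_k \Sigma_k V_k^{\top},
\qquad U_k \in \mathbb{R}^{m \times k},\;
\Sigma_k \in \mathbb{R}^{k \times k},\;
V_k \in \mathbb{R}^{n \times k}

% Projection of a document vector d (m terms) into the k-dimensional LSI space
\hat{d} = \Sigma_k^{-1} U_k^{\top} d
```

According to the abstract, $k$ still has to stay at several hundred dimensions, which is why a genetic algorithm is then applied to the coordinates of $\hat{d}$ to reduce the dimensionality further.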