Identifying timber properties is important for safe application. Near-infrared spectroscopy (NIRS) is widely used because of its simplicity, efficiency, and positive environmental attributes. However, in its application, weak signals must be extracted from complex, overlapping, and changing information. This study focused on the stability of NIR modeling. Orthogonal Partial Least Squares (OPLS) and the Successive Projections Algorithm (SPA) eliminate noise and extract effective spectra, and an ensemble learning method, MIX-PLS, is applied to establish the model. The elastic modulus of timber is taken as an example: 201 wood samples of three species, Xylosma congesta (Lour.) Merr., Acer pictum subsp. mono, and Betula pendula, were divided into three groups to investigate modelling performance. The results show that OPLS can preprocess the near-infrared spectral information according to the target object in the presence of systematic error and reduce errors to a minimum. SPA finally selects 13 spectral bands, which simplifies the NIR spectral data and improves model accuracy. The Pearson correlation coefficient of calibration (Rc) and the Pearson correlation coefficient of prediction (Rp) of Mix Partial Least Squares (MIX-PLS) were 0.95 and 0.90, and the Root Mean Square Error of Calibration (RMSEC) and Root Mean Square Error of Prediction (RMSEP) were 2.075 and 6.001, respectively, which shows that the model generalizes well.
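The four figures reported above (Rc, Rp, RMSEC, RMSEP) are the Pearson correlation and the root mean square error between measured and predicted values, computed once on the calibration set and once on the prediction set. The following is a minimal sketch of those two formulas only; the function names and the sample values are made up for illustration and are not taken from the paper.

```python
import math


def pearson_r(measured, predicted):
    """Pearson correlation between measured and predicted values."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_m * var_p)


def rmse(measured, predicted):
    """Root mean square error between measured and predicted values."""
    squared = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    return math.sqrt(squared / len(measured))


# Hypothetical values, for illustration only (not data from the study).
measured = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 11.0, 15.0, 15.0]

r_value = pearson_r(measured, predicted)
rmse_value = rmse(measured, predicted)
print(f"R = {r_value:.4f}, RMSE = {rmse_value:.4f}")
```

Applying the same two functions to the calibration samples gives Rc and RMSEC, and to the held-out samples gives Rp and RMSEP.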
Documents contain a large amount of valuable knowledge on various subjects and, more recently, documents on the Internet have become available from various sources. Automatic, rapid, and accurate classification of these documents with less human interaction has therefore become necessary. In this paper, we introduce a new algorithm called the highest repetition of words in a text document (HRWiTD) to classify Arabic text automatically. The corpus is divided into a train set and a test set to which the proposed classification technique is applied. The train set is analyzed for learning, and the learning data is stored in the Learning Dataset file. For each word, the category in which the word has the highest repetition is assigned as the word's category in the Learning Dataset file. This file holds the non-duplicate words, each with its highest repetition value and its category, obtained from all texts in the train set. For each text in the test set, every word is assigned to a category by using the Learning Dataset file. The category that contains the largest number of words is assigned as the predicted category of the text. To evaluate the classification accuracy of the HRWiTD algorithm, the confusion matrix method is used. The HRWiTD algorithm has been applied to convergent samples from six categories of Arabic news at SPA (Saudi Press Agency). As a result, the accuracy of the HRWiTD algorithm is 86.84%. In addition, we used the same corpus with the most popular machine learning algorithms, namely C5.0, KNN, SVM, NB, and C4.5; their classification accuracies are 52.86%, 52.38%, 51.90%, 51.90%, and 30%, respectively. Thus, the HRWiTD algorithm gives better classification accuracy than the most popular machine learning algorithms on the selected domain.
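The procedure described above can be sketched roughly as follows. This is an illustrative reading of the abstract, not the authors' implementation: whitespace tokenization, skipping words absent from the learned table, breaking ties in favour of the category seen first, and all function names are assumptions, and the English toy corpus is made up.

```python
from collections import Counter, defaultdict


def train(labelled_texts):
    """Build the word -> category table (the "Learning Dataset")."""
    counts = defaultdict(Counter)  # word -> repetitions per category
    for text, category in labelled_texts:
        for word in text.split():  # assumption: whitespace tokenization
            counts[word][category] += 1
    # Each word gets the category in which it is repeated most often.
    # Assumption: ties go to the category encountered first.
    return {word: max(c, key=c.get) for word, c in counts.items()}


def predict(text, word_category):
    """Return the category that the largest number of words map to."""
    votes = Counter(
        word_category[w] for w in text.split() if w in word_category
    )  # assumption: unknown words are ignored
    if not votes:
        return None
    return max(votes, key=votes.get)


def accuracy(labelled_texts, word_category):
    """Share of correct predictions (the confusion-matrix diagonal)."""
    correct = sum(
        predict(text, word_category) == category
        for text, category in labelled_texts
    )
    return correct / len(labelled_texts)


# Hypothetical toy corpus, for illustration only.
train_set = [
    ("goal match team goal", "sports"),
    ("market bank stock market", "economy"),
    ("team bank", "sports"),
]
test_set = [
    ("team goal bank", "sports"),
    ("stock market team", "economy"),
]

model = train(train_set)
print(predict("team goal bank", model))
print(accuracy(test_set, model))
```

Storing only one category per word keeps the learned table small, at the cost of discarding how strongly a word is associated with the other categories.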
Funding: Supported financially by the China State Forestry Administration "948" project (2015-4-52) and the Heilongjiang Natural Science Foundation (C2017005).