Funding: The authors are grateful for the support of the Fundamental Research Funds of the Research Institute of Forest New Technology, CAF (CAFYBB2019SY039).
Abstract: This study aims to share near-infrared analysis models for the lignin and holocellulose content of pulp wood between two different batches of spectrometers, and proposes three combined algorithms: SPA-DS, MCUVE-DS and SiPLS-DS. The Successive Projection Algorithm (SPA), Monte Carlo Uninformative Variable Elimination (MCUVE) and Synergy Interval Partial Least Squares (SiPLS) are each used to reduce the adverse effect of redundant information when a full-spectrum DS model is transferred. These three algorithms can improve the accuracy and efficiency of model transfer and reduce the manpower and materials required for modeling. The results show that models built on the characteristic wavelengths selected by SPA, MCUVE and SiPLS are all greatly improved compared with full-spectrum modeling; SPA-PLS gives the best prediction, with RPD values above 6.5 for both components. The three wavelength selection methods were then combined with the DS algorithm to transfer the models between the two instruments. Among them, MCUVE combined with DS gives the best transfer. After model transfer, the RMSEP is 0.701 for lignin and 0.839 for holocellulose, an improvement over the 0.759 and 0.918 obtained with full-spectrum model transfer.
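The transfer step described above can be illustrated with a minimal NumPy sketch. It assumes that DS denotes direct standardization (as spelled out in the maize abstract in this listing) and that wavelength selection has already produced a list of column indices. The function name `fit_ds`, the array sizes, the `selected` indices and the simulated per-wavelength gain are all hypothetical and are not taken from the paper.

```python
import numpy as np

def fit_ds(master_std, slave_std):
    """Least-squares transfer matrix F such that slave_std @ F approximates master_std."""
    return np.linalg.pinv(slave_std) @ master_std

rng = np.random.default_rng(0)
n_std, n_wavelengths = 30, 20                 # hypothetical sizes
selected = np.array([1, 4, 7, 11, 15, 18])    # hypothetical indices from a selection step

# Simulated standards measured on both instruments; the slave instrument
# differs from the master by a per-wavelength gain (an assumption for the demo).
master_full = rng.normal(size=(n_std, n_wavelengths))
gain = rng.uniform(0.9, 1.1, size=n_wavelengths)
slave_full = master_full * gain

# Fit the transfer matrix on the selected wavelengths only.
F = fit_ds(master_full[:, selected], slave_full[:, selected])

# Correct new slave spectra so they can be fed to the master calibration model.
new_master = rng.normal(size=(5, n_wavelengths))
new_slave = new_master * gain
corrected = new_slave[:, selected] @ F
max_error = float(np.max(np.abs(corrected - new_master[:, selected])))
print(max_error)
```

Restricting the fit to the selected columns keeps the transfer matrix small (6 x 6 here instead of 20 x 20), which is one plausible reading of why removing redundant wavelengths helps the transfer; real instrument differences are not a pure gain, so the correction would not be exact in practice.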
Funding: Supported by the Beijing Natural Science Foundation (Grant No. 2232066) and the Open Project Foundation of the State Key Laboratory of Solid Lubrication (Grant LSL-2212).
Abstract: To address the problem of identifying multiple types of additives in lubricating oil, a method is proposed that selects mid-infrared spectral bands using the eXtreme Gradient Boosting (XGBoost) algorithm combined with the ant colony optimization (ACO) algorithm. XGBoost was trained and tested separately on three additives, T534 (alkyl diphenylamine), T308 (isooctyl acid thiophospholipid octadecylamine) and T306 (trimethylphenol phosphate), to screen for the optimal combination of spectral bands for each additive. ACO was used to optimize the XGBoost parameters and improve identification accuracy. For comparison, the support vector machine (SVM) and the hybrid bat algorithm (HBA) were also included, giving four models: ACO-XGBoost, ACO-SVM, HBA-XGBoost and HBA-SVM. The results showed that all four models could identify the three additives efficiently, with the ACO-XGBoost model achieving 100% recognition of all three. In addition, the generalizability of the ACO-XGBoost model was further demonstrated by predicting a lubricating oil containing the three additives prepared in the authors' laboratory and a collected sample of commercial oil currently in use.
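Abstract: In maize breeding, measuring kernel moisture content during the grain-filling stage usually requires threshing, with the 200 kernels from the middle of the ear taken as the test sample. To protect the parent lines and avoid destructive testing, this study proposes a general near-infrared spectroscopy model for quantitative analysis of kernel moisture during grain filling, intended for in-situ measurement in the field. First, a GA-IRIV-DS spectral data processing strategy is constructed. A two-stage wavelength selection that combines the genetic algorithm (GA) with iterative retention of information variables (IRIV) extracts the moisture-relevant variables from the spectra, reducing the dimensionality of the feature space while improving prediction accuracy. Direct standardization (DS) is then applied to reduce the difference between prediction samples and calibration samples: the spectra of kernels from the ear tip are corrected to the spectra of the middle 200 kernels, so that the moisture model applies to both sample types. On the basis of this strategy, a general moisture model based on partial least squares regression (PLSR) is built. Validation shows that the spectral difference after GA-IRIV-DS correction is reduced by 59.4%. To further verify the effectiveness of the strategy, the spectral features extracted by the combined GA+IRIVN wavelength selection are analyzed, and the resulting model is compared with PLSR moisture models built on the full spectrum and on several typical wavelength selection methods combined with DS. The results show that the GA-IRIVN-DS-PLSR model outperforms the full-spectrum and other models on both prediction sets: the coefficients of determination of prediction (R²) for the middle-kernel and ear-tip samples reach 0.9715 and 0.9012, and the root mean square errors of prediction (RMSEP) are 80.10% and 64.60% lower than with the full spectrum. This indicates that the near-infrared moisture model built on the GA-IRIVN-DS strategy has a degree of generalization ability and offers a feasible reference method for reducing sample destruction and improving measurement efficiency in maize breeding.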
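For readers unfamiliar with the figures of merit quoted above, the following sketch shows how R², RMSEP and a percentage reduction relative to a baseline are commonly computed. The function names and the four reference/predicted values are hypothetical illustration data, not measurements from the study.

```python
import math

def rmsep(y_true, y_pred):
    """Root mean square error of prediction."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline

# Hypothetical reference values and predictions (illustration only).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]

print(round(rmsep(y_true, y_pred), 4))      # about 0.1581
print(round(r_squared(y_true, y_pred), 4))  # 0.98
print(relative_reduction(2.0, 0.5))         # 75.0
```

A statement such as "RMSEP decreased by 80.10% relative to the full spectrum" corresponds to `relative_reduction(rmsep_full, rmsep_selected)` under this definition.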
Abstract: A multi-class Genetic Algorithm Neural Network (GANN) was used to predict the ripeness grades of oil palm fresh fruit from Near Infrared (NIR) spectral data. NIR spectral data provide sufficient information about the compound structure of samples from the near-infrared light that passes through them. The variables used in GANN modeling were new variables obtained by reducing the dimension of the original NIR spectral data with Principal Component Analysis (PCA). Three statistical measures, Mean Absolute Error (MAE), Root Mean Squared Error (RMSE) and the percentage (%) of good classification, were used to assess the adequacy of the GANN model. Based on the results, the GANN model was precise enough to serve as the calibration model for this multi-class problem.
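The PCA step that produces the input variables for the network can be sketched as follows. This is a generic SVD-based PCA on mean-centered spectra, not the authors' code; the function name `pca_scores`, the random stand-in spectra and the choice of five components are assumptions for illustration.

```python
import numpy as np

def pca_scores(spectra, n_components):
    """Project mean-centered spectra onto their first principal components."""
    centered = spectra - spectra.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_components].T
    explained = singular_values**2 / np.sum(singular_values**2)
    return scores, explained[:n_components]

rng = np.random.default_rng(1)
spectra = rng.normal(size=(40, 100))   # hypothetical: 40 samples x 100 wavelengths
scores, explained = pca_scores(spectra, n_components=5)

print(scores.shape)   # (40, 5)
```

The `scores` matrix, with one row per sample and one column per component, would replace the original spectra as classifier input; how many components to keep is a modeling choice that the abstract does not state.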
Funding: Supported by the National Natural Science Foundation of China (20835002).
Abstract: Consensus methods are promising tools for improving the reliability of quantitative models in near-infrared (NIR) spectroscopic analysis. A strategy for improving the performance of consensus methods in multivariate calibration of NIR spectra is proposed. In this approach, for each variable in the spectra reduced by uninformative variables elimination (UVE), a subset of non-collinear variables is generated using the successive projections algorithm (SPA). Sub-models are then built from the variable subsets and from calibration subsets determined by Monte Carlo (MC) re-sampling, and the sub-model that produces the minimal cross-validation error is selected as a member model. By repeating the MC re-sampling, a series of member models is built, and a consensus model is obtained by averaging all the member models. Since member models are built with the best variable subset and a randomly selected calibration subset, both the quality and the diversity of the member models are ensured for the consensus model. Two NIR spectral datasets of tobacco lamina are used to investigate the proposed method. The superiority of the method in both accuracy and reliability is demonstrated.
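The averaging idea behind a consensus model can be sketched as below. This is a simplified illustration under stated assumptions: ordinary least squares stands in for the calibration method, every member uses all variables (the UVE/SPA variable subsets and the cross-validation selection of the best sub-model are omitted), and the data, the function name `consensus_coefficients` and its parameters are hypothetical.

```python
import numpy as np

def consensus_coefficients(X, y, n_members=50, subset_fraction=0.7, seed=0):
    """Fit one linear model per Monte Carlo calibration subset and average them."""
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    subset_size = int(subset_fraction * n_samples)
    members = []
    for _ in range(n_members):
        # Random calibration subset drawn without replacement (MC re-sampling).
        idx = rng.choice(n_samples, size=subset_size, replace=False)
        coef, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        members.append(coef)
    members = np.array(members)
    return members.mean(axis=0), members

# Simulated calibration data (illustration only).
rng = np.random.default_rng(42)
true_coef = np.array([0.5, -1.0, 2.0])
X = rng.normal(size=(200, 3))
y = X @ true_coef + 0.05 * rng.normal(size=200)

consensus, members = consensus_coefficients(X, y)
y_hat = X @ consensus   # consensus prediction
```

For linear member models, averaging the coefficient vectors is equivalent to averaging the member predictions, which is why the sketch returns a single averaged coefficient vector.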