Funding: Supported by the China 973 Program (No. 2002CB312200), the National Natural Science Foundation of China (No. 60574019 and No. 60474045), the Key Technologies R&D Program of Zhejiang Province (No. 2005C21087) and the Academician Foundation of Zhejiang Province (No. 2005A1001-13).
Abstract: Key variable identification for classification is central to many troubleshooting problems in the process industries. Recursive feature elimination based on support vector machines (SVM-RFE) was recently proposed for feature selection in cancer diagnosis. In this paper, SVM-RFE is applied to key variable selection in fault diagnosis, and an accelerated SVM-RFE procedure based on a heuristic criterion is proposed. Data from the Tennessee Eastman process (TEP) simulator are used to evaluate the effectiveness of key variable selection with the accelerated SVM-RFE (A-SVM-RFE). A-SVM-RFE integrates computational speed and algorithm effectiveness into a consistent framework: it correctly identifies the key variables and is also computationally fast. A-SVM-RFE performs better than contribution charts combined with principal component analysis (PCA) and than two other SVM-RFE algorithms, which makes it better suited to industrial application.
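The basic SVM-RFE loop mentioned in this abstract can be sketched as follows. This is a minimal illustration only, assuming a linear SVM trained by plain subgradient descent and the usual squared-weight ranking criterion; the accelerated heuristic criterion of A-SVM-RFE is not described in the abstract and is not implemented here. The function names, hyperparameters, and synthetic data are all hypothetical.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=300):
    """Fit a linear soft-margin SVM by full-batch subgradient descent."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        active = y * (X @ w + b) < 1          # samples violating the margin
        grad_w = lam * w - (y[active, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def svm_rfe(X, y):
    """Return feature indices ordered from most to least important."""
    remaining = list(range(X.shape[1]))
    eliminated = []
    while len(remaining) > 1:
        w, _ = train_linear_svm(X[:, remaining], y)
        worst = int(np.argmin(w ** 2))        # ranking criterion: w_i^2
        eliminated.append(remaining.pop(worst))
    # The last survivor ranks first; the first feature removed ranks last.
    return remaining + eliminated[::-1]

# Synthetic two-class data (an assumption, not TEP data):
# only features 0 and 1 carry class information.
rng = np.random.default_rng(0)
y = np.where(np.arange(200) % 2 == 0, 1.0, -1.0)
X = rng.normal(size=(200, 6))
X[:, 0] += 2.0 * y
X[:, 1] += 2.0 * y

ranking = svm_rfe(X, y)
print(ranking)
```

Removing one variable per retraining is the slow, exhaustive variant; removing several low-ranked variables per pass is a common way to trade ranking resolution for speed.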
Funding: Supported by the National Key Research and Development Program of China (2016YFD0300602), the China Agricultural Research System (CARS-04-PS19), and the Chengdu Science and Technology Project (2020-YF09-00033-SN).
Abstract: Assessing the canopy nitrogen content (CNC) and canopy carbon content (CCC) of maize from hyperspectral remote sensing data permits estimating cropland productivity, protecting farmland ecology, and investigating the nitrogen and carbon cycles in the atmosphere. This study aimed to assess maize CNC and CCC using canopy hyperspectral information and uninformative variable elimination (UVE). Vegetation indices (VIs) and wavelet functions were adopted for estimating CNC and CCC under varying water and nitrogen regimes. Linear, nonlinear, and partial least squares (PLS) regression models were fitted to the VIs and wavelet functions to estimate CNC and CCC, and were evaluated for their prediction accuracy. UVE was used to eliminate uninformative variables, improve the prediction accuracy of the models, and simplify the PLS regression models (UVE-PLS). For estimating CNC and CCC, the normalized difference vegetation index (NDVI, based on red-edge and NIR wavebands) yielded the highest correlation coefficients (r > 0.88). PLS regression models showed the lowest root mean square error (RMSE) among all models. However, the PLS regression models required nine VIs and four wavelet functions, which increased their complexity. UVE was therefore used to retain the valid spectral parameters and optimize the PLS regression models. The UVE-PLS regression models improved validation accuracy and gave more accurate CNC and CCC estimates than the PLS regression models. Thus, canopy spectral reflectance integrated with UVE-PLS can accurately reflect maize leaf nitrogen and carbon status.
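A rough sketch of how UVE is commonly set up around PLS may help here. It assumes the classic formulation: very small random noise columns are appended to the predictors, PLS coefficients are recomputed with each sample left out in turn, and a variable is kept only if its stability (mean coefficient divided by its standard deviation) exceeds that of every noise column. The abstract gives no implementation details, so the NIPALS routine, the parameter values, and the synthetic data below are all assumptions.

```python
import numpy as np

def pls1_coefficients(X, y, n_components):
    """Regression coefficients of a PLS1 model fitted with NIPALS."""
    Xk = X - X.mean(axis=0)
    yk = y - y.mean()
    d = Xk.shape[1]
    W = np.zeros((d, n_components))
    P = np.zeros((d, n_components))
    q = np.zeros(n_components)
    for a in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)
        t = Xk @ w                       # scores
        tt = t @ t
        p = Xk.T @ t / tt                # X loadings
        q[a] = yk @ t / tt               # y loading
        Xk = Xk - np.outer(t, p)         # deflate X
        yk = yk - q[a] * t               # deflate y
        W[:, a], P[:, a] = w, p
    return W @ np.linalg.solve(P.T @ W, q)

def uve_select(X, y, n_components=2, noise_scale=1e-10, seed=0):
    """Keep variables whose stability beats every artificial noise variable."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    XZ = np.hstack([X, rng.normal(size=(n, d)) * noise_scale])
    # Leave-one-out coefficient vectors, one row per omitted sample.
    coefs = np.array([
        pls1_coefficients(np.delete(XZ, i, axis=0), np.delete(y, i), n_components)
        for i in range(n)
    ])
    stability = coefs.mean(axis=0) / coefs.std(axis=0)
    cutoff = np.abs(stability[d:]).max()
    selected = np.flatnonzero(np.abs(stability[:d]) > cutoff)
    return selected, stability[:d], cutoff

# Synthetic data (an assumption): only variables 0 and 1 drive the response.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=300)

selected, stability, cutoff = uve_select(X, y, n_components=1)
print(selected, cutoff)
```

Because the cutoff comes from random noise columns, an uninformative real variable can occasionally pass it by chance; published variants often use a quantile of the noise stabilities rather than the maximum.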
Funding: Supported by the National Natural Science Foundation of China (No. 21475068).
Abstract: Understanding the thermal stability of the proteins in human serum is essential, since human serum is an important source of pharmaceutical proteins. Near-infrared (NIR) spectroscopy was applied to investigate thermal changes in the secondary structure and hydration of human serum proteins. However, because serum is a multicomponent system, the overlap of the broad NIR bands makes structural analysis directly from the spectra of serum samples very difficult. Therefore, continuous wavelet transform (CWT) was used to improve the resolution of the NIR spectra, and the Monte Carlo-uninformative variable elimination (MC-UVE) method was applied to select the variables associated with the proteins for the structural analysis. The variables related to protein secondary structures (5956, 5867, 5815, 5747, 4525, 4401, 4359 and 4328 cm^-1) and those connected with water species (7074, 6951, 6827 and 6700 cm^-1) were selected. The thermal stability was then analyzed through the intensity variations of the selected variables with temperature from 30 °C to 80 °C. The spectral variables related to both α-helix and β-sheet change markedly around 60 °C, indicating the onset of thermal denaturation and the transition from α-helix to β-sheet. Moreover, an obvious change was found around 60 °C in the content of the water species S3, i.e., the water cluster containing three hydrogen bonds. The results demonstrate that MC-UVE can identify the protein-related NIR spectral variables, and that the water species may serve as a marker for investigating structural changes of proteins in biochemical systems.
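To illustrate why a continuous wavelet transform can sharpen overlapping bands, here is a small sketch under stated assumptions: the wavelet is a Mexican hat (Ricker) function and the "spectrum" is two synthetic Gaussian bands. The abstract does not name the wavelet or scale used, so both are hypothetical choices.

```python
import numpy as np

def ricker(scale, half_width_factor=5):
    """Sampled Mexican hat (Ricker) wavelet; `scale` is in sample units."""
    half = int(half_width_factor * scale)
    t = np.arange(-half, half + 1) / scale
    psi = (1.0 - t ** 2) * np.exp(-0.5 * t ** 2)
    # Force zero mean so that flat baselines map to zero.
    return (psi - psi.mean()) / np.sqrt(scale)

def cwt_row(signal, scale):
    """CWT coefficients of `signal` at one scale (same length as the input)."""
    return np.convolve(signal, ricker(scale), mode="same")

def band(x, centre, sigma):
    """Gaussian band used as a stand-in for an absorption band."""
    return np.exp(-0.5 * ((x - centre) / sigma) ** 2)

x = np.arange(201)
# Two strongly overlapping bands: the raw sum has a single maximum at x = 100.
spectrum = band(x, 90, 12) + band(x, 110, 12)
resolved = cwt_row(spectrum, scale=3)

print(spectrum[90], spectrum[100])   # raw spectrum: the centre is higher
print(resolved[90], resolved[100])   # CWT: a dip appears between the bands
```

The Mexican hat wavelet behaves like a smoothed negative second derivative, which is why the merged maximum splits into two features near the original band positions.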
Funding: Supported by the National Natural Science Foundation of China (60844007, 61178036, 21265006), the National Science and Technology Support Plan (2008BAD96B04), the Special Science and Technology Support Program for Foreign Science and Technology Cooperation Plan (2009BHB15200), and the Technological Expertise and Academic Leaders Training Plan of Jiangxi Province (2009DD00700).
Abstract: Variable selection is widely applied in visible-near infrared (Vis-NIR) spectroscopy analysis of the internal quality of fruits. Different spectral variable selection methods were compared for online quantitative analysis of soluble solids content (SSC) in navel oranges. Moving window partial least squares (MW-PLS), Monte Carlo uninformative variable elimination (MC-UVE), and wavelet transform (WT) combined with MC-UVE were used to select the spectral variables and develop calibration models for online analysis of SSC in navel oranges. The performance of these methods was compared for modeling the Vis-NIR data sets of navel orange samples. The results show that the WT-MC-UVE method gave the best calibration model, with a higher correlation coefficient (r) of 0.89 and a lower root mean square error of prediction (RMSEP) of 0.54 at 5 fruits per second. It is concluded that Vis-NIR spectroscopy coupled with WT-MC-UVE may be a fast and effective tool for online quantitative analysis of SSC in navel oranges.
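For reference, the two figures of merit quoted in this abstract, the correlation coefficient r and RMSEP, are usually computed as in the sketch below. The reference and predicted values are made-up numbers for illustration, not data from the study.

```python
import numpy as np

def rmsep(y_true, y_pred):
    """Root mean square error of prediction over a validation set."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def pearson_r(y_true, y_pred):
    """Pearson correlation between reference and predicted values."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])

# Hypothetical SSC reference values and model predictions.
reference = [10.2, 11.5, 12.1, 13.4, 14.0]
predicted = [10.6, 11.1, 12.5, 13.0, 14.4]

print(rmsep(reference, predicted))
print(pearson_r(reference, predicted))
```

RMSEP is expressed in the units of the measured property, so it is only comparable between models evaluated on the same validation samples.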
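Abstract: [Objective] Screening the characteristic near-infrared spectral bands of protein in whole wheat grain and building an optimized model allows rapid, non-destructive determination of whole-grain protein content and provides a basis for designing a portable field instrument for rapid measurement of wheat grain protein content. [Method] In 2012-2013, eight winter wheat cultivars with clearly different protein contents were grown under six treatments combining three nitrogen application rates and two irrigation levels to obtain a diverse sample set, and 176 wheat grain spectra were collected. Reflectance spectra of whole wheat grain, acquired with an ASD Field Spec Pro spectrometer over a total-reflection background surface, were converted to absorbance spectra using A = log(1/R). The absorbance spectra were pre-processed with Savitzky-Golay (S-G) smoothing, multiplicative scatter correction, and baseline correction to remove background noise, and cross-validated partial least squares regression was then used to compress the characteristic bands. Several characteristic-band selection methods were compared: uninformative variable elimination (UVE) with cross-validated PLS regression, the successive projections algorithm (SPA) with cross-validated PLS regression, UVE combined with SPA followed by cross-validated PLS regression, UVE combined with SPA followed by multiple linear regression (MLR), and UVE combined with SPA followed by stepwise multiple linear regression (SMLR). The selected bands were regressed against grain protein content measured by the Kjeldahl method to build and select the best prediction model for wheat grain protein. [Result] UVE removed the variables unrelated to wheat grain protein content and compressed the original grain spectrum from 1 621 bands to 717, retaining the protein information while achieving a first round of band selection. A comparison of the band selection algorithms SMLR, SPA, SPA + SMLR, and SPA + PLS + cross-validation (CV) showed that different methods yield different characteristic bands, and that the resulting models and their accuracy also differ markedly. When the bands screened by UVE were passed to SPA to remove collinearity among bands in the spectral matrix, and SMLR was then used to select the 15 characteristic bands contributing most to the protein information, the resulting model had a root mean square error of prediction (RMSEP) of 0.5898 and an R2 of 0.9410, the highest prediction accuracy. [Conclusion] UVE, SPA, and SMLR effectively compressed the whole-grain wheat spectral matrix. The prediction model built on the selected protein-related bands enables non-destructive, rapid determination of whole-grain protein content with reliable accuracy; the method is economical and effective, and it lays a foundation for band selection and development of a portable field instrument for measuring whole-grain wheat protein.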
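Two of the steps named in this abstract, the conversion A = log(1/R) and the successive projections algorithm (SPA), can be sketched briefly. The sketch assumes a base-10 logarithm (the abstract writes only "log") and a textbook form of SPA with a fixed starting band; the function names and the toy matrix are hypothetical.

```python
import numpy as np

def reflectance_to_absorbance(R):
    """Apparent absorbance A = log10(1/R); base 10 is an assumption."""
    return np.log10(1.0 / np.asarray(R, dtype=float))

def spa(X, n_select, start=0):
    """Successive projections: pick bands with minimal mutual collinearity."""
    Xp = X - X.mean(axis=0)              # centred working copy
    selected = [start]
    for _ in range(n_select - 1):
        v = Xp[:, selected[-1]]
        # Project every column onto the subspace orthogonal to the last pick.
        Xp = Xp - np.outer(v, v @ Xp) / (v @ v)
        norms = np.linalg.norm(Xp, axis=0)
        norms[selected] = -1.0           # never re-select a chosen band
        selected.append(int(np.argmax(norms)))
    return selected

# Toy matrix: band 1 is an exact multiple of band 0, band 2 is independent.
rng = np.random.default_rng(2)
X_demo = rng.normal(size=(20, 3))
X_demo[:, 1] = 2.0 * X_demo[:, 0]

print(reflectance_to_absorbance([0.1, 0.5]))
print(spa(X_demo, n_select=2, start=0))
```

In practice SPA is usually run from every possible starting band, and the candidate subset with the lowest validation error is retained.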