Funding: National Natural Science Foundation of China (62203118).
Abstract: Piezo actuators are widely used in ultra-precision fields because of their fast response and nanoscale step length. However, their hysteresis seriously degrades their accuracy and stability. Existing methods for fitting hysteresis loops fall into three classes: operator-based, differential-equation-based, and machine-learning-based. Operator-based and differential-equation-based methods have high modeling cost and high model complexity, while machine learning methods such as neural networks are opaque, so a physical model framework cannot be extracted from them directly. Therefore, the sparse identification of nonlinear dynamics (SINDy) algorithm is proposed to fit hysteresis loops. The SINDy algorithm is also improved: while it builds an orthogonal candidate database for modeling, the sparse regression model is simplified, and the Relay operator is introduced for piecewise fitting to resolve the distortion that SINDy exhibits when fitting singularities. The resulting Relay-SINDy algorithm is applied to fitting hysteresis loops and performs well in open-loop and closed-loop experiments. Compared with existing methods, it reduces modeling cost and model complexity and improves the modeling accuracy of the hysteresis loop.
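The sparse regression step at the core of SINDy can be sketched as follows. This is a minimal illustration of plain sequentially thresholded least squares on a toy system, not the Relay-SINDy method of the abstract; the function name `stlsq`, the toy dynamics dx/dt = -2x, the three-term candidate library, and the threshold value are all assumptions chosen for the example.

```python
import numpy as np

def stlsq(theta, dxdt, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares: fit, zero small terms, refit."""
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0                      # prune candidate terms with small weights
        big = ~small
        if not big.any():
            break
        # Refit using only the surviving candidate terms.
        xi[big] = np.linalg.lstsq(theta[:, big], dxdt, rcond=None)[0]
    return xi

# Toy data (assumed): x(t) = exp(-2t), so dx/dt = -2x exactly.
t = np.linspace(0.0, 2.0, 200)
x = np.exp(-2.0 * t)
dxdt = -2.0 * x

# Candidate library with columns [1, x, x^2].
theta = np.column_stack([np.ones_like(t), x, x ** 2])
xi = stlsq(theta, dxdt)
print(dict(zip(["1", "x", "x^2"], np.round(xi, 3))))
```

Only the coefficient on `x` should survive the thresholding, which is the sense in which the identified model stays readable as an equation rather than as an opaque fit.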
Funding: Supported by NSFC (11871263) and an NSF grant of Guangdong Province of China (No. 2017A030313012).
Abstract: This paper studies variable selection using the penalized likelihood method for distributed sparse regression with a large sample size n under a limited memory constraint, a problem of pressing importance in the big data era. A naive divide-and-conquer approach splits the data into N parts, runs each part on one of N machines, aggregates the results from all machines by averaging, and finally obtains the selected variables. However, this approach tends to select more noise variables, and the false discovery rate may not be well controlled. We improve it with a specially designed weighted average in the aggregation step. Although the alternating direction method of multipliers has been used in the literature to deal with massive data, our proposed method substantially reduces the computational burden and achieves lower mean squared error in most cases. Theoretically, we establish asymptotic properties of the resulting estimators for likelihood models with a diverging number of parameters. Under some regularity conditions, we establish oracle properties in the sense that our distributed estimator shares the same asymptotic efficiency as the estimator based on the full sample. Computationally, a distributed penalized likelihood algorithm is proposed to refine the results in the context of general likelihoods. The proposed method is evaluated by simulations and a real example.
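The naive divide-and-conquer baseline described above can be sketched as follows. This is only the simple-averaging baseline, not the weighted aggregation proposed in the paper; the Lasso penalty, the coordinate descent solver `lasso_cd`, the simulated data, and the value `lam=0.1` are assumptions made for illustration.

```python
import numpy as np

def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]       # partial residual without feature j
            rho = X[:, j] @ resid / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= X[:, j] * beta[j]       # restore the full residual
    return beta

# Simulated data (assumed): 3 true signals among 20 predictors.
rng = np.random.default_rng(0)
n, p, n_machines = 600, 20, 4
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]
X = rng.standard_normal((n, p))
y = X @ beta_true + rng.standard_normal(n)

# Each "machine" fits a Lasso on its own block of rows.
local = [
    lasso_cd(Xk, yk, lam=0.1)
    for Xk, yk in zip(np.array_split(X, n_machines), np.array_split(y, n_machines))
]

# Naive aggregation: plain average of the local estimates.
beta_avg = np.mean(local, axis=0)
selected = np.flatnonzero(beta_avg)
print("selected per machine:", [np.flatnonzero(b).tolist() for b in local])
print("selected after averaging:", selected.tolist())
```

A coefficient that is nonzero on any single machine generally stays nonzero after plain averaging, so the averaged support behaves like the union of the local supports. That is one way to see why the naive scheme tends to retain extra noise variables.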
Funding: NIH Grants RO1GM100474-04 and NIHRO1-GM072611-10 and NSF Grants DMS-1206464 and DMS-1406266.
Abstract: Statistical and machine learning theory has developed several conditions ensuring that popular estimators such as the Lasso or the Dantzig selector perform well in high-dimensional sparse regression, including the restricted eigenvalue, compatibility, and ℓq sensitivity properties. However, some of the central aspects of these conditions are not well understood. For instance, it is unknown whether these conditions can be checked efficiently on any given dataset. This is problematic, because they are at the core of the theory of sparse regression. Here we provide a rigorous proof that these conditions are NP-hard to check. This shows that the conditions are computationally infeasible to verify, and raises some questions about their practical applications. However, by taking an average-case perspective instead of the worst-case view of NP-hardness, we show that a particular condition, ℓq sensitivity, has certain desirable properties. This condition is weaker and more general than the others. We show that it holds with high probability in models where the parent population is well behaved, and that it is robust to certain data processing steps. These results are desirable, as they provide guidance about when the condition, and more generally the theory of sparse regression, may be relevant in the analysis of high-dimensional correlated observational data.
Funding: Project (2020YFC2008605) supported by the National Key Research and Development Project of China; Project (52072412) supported by the National Natural Science Foundation of China; Project (2021JJ30359) supported by the Natural Science Foundation of Hunan Province, China.
Abstract: Urban air pollution causes serious problems for physical and mental health, economic development, and environmental protection. Predicting the changes and trends of air pollution can provide a scientific basis for governance and prevention efforts. In this paper, we propose an interval prediction method that considers the spatio-temporal characteristics of PM2.5 signals from multiple stations. A K-nearest neighbor (KNN) algorithm interpolates signals lost during collection, transmission, and storage to ensure the continuity of the data. A graph generative network (GGN) is used to process time-series meteorological data with complex structures. The graph U-Nets framework is introduced into the GGN model to enhance its control over the graph generation process, which improves the efficiency and robustness of the model. In addition, sparse Bayesian regression is incorporated to mitigate the curse of dimensionality that affects traditional kernel density estimation (KDE) interval prediction. With the support of the sparse strategy, sparse Bayesian regression kernel density estimation (SBR-KDE) is very efficient in processing high-dimensional, large-scale data. PM2.5 data for spring, summer, autumn, and winter from 34 air quality monitoring sites in Beijing verify the accuracy, generalization, and superiority of the proposed model in interval prediction.
Funding: Partially supported by the National Natural Science Foundation of China under Grants No. 61071146 and No. 61171165; the Natural Science Foundation of Jiangsu Province under Grant No. BK2010488; sponsored by the Qing Lan Project and Project 333 "The Six Top Talents" of Jiangsu Province.
Abstract: A Single Image Super-Resolution (SISR) reconstruction method that uses clustered sparse representation and adaptive patch aggregation is proposed. First, we randomly extract image patch pairs from the training images and divide these patch pairs into different groups by K-means clustering. Then, we learn an over-complete sub-dictionary pair offline from the corresponding group of patch pairs. For a given low-resolution patch, we adaptively select one sub-dictionary to reconstruct the high-resolution patch online. In addition, non-local self-similarity and steering kernel regression constraints are integrated into patch aggregation to improve the quality of the recovered images. Experiments show that the proposed method achieves state-of-the-art performance in terms of both objective evaluation and visual perception.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 92152301).
Abstract: Machine learning of partial differential equations (PDEs) from data is a potential breakthrough for addressing the lack of physical equations in complex dynamic systems. Recently, sparse regression has emerged as an attractive approach. However, noise presents the biggest challenge in sparse regression for identifying equations, as the method relies on local derivative evaluations of noisy data. This study proposes a simple and general approach that significantly improves noise robustness by projecting the evaluated time derivative and partial differential terms into a subspace with less noise. This method enables accurate reconstruction of PDEs involving high-order derivatives, even from data with considerable noise. Additionally, we discuss and compare the effects of the proposed method based on the Fourier subspace and the POD (proper orthogonal decomposition) subspace. Generally, the latter yields better results since it preserves the maximum amount of information.
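The idea of projecting a noisily evaluated derivative onto a POD subspace can be sketched as follows. This is a generic illustration under stated assumptions, not the paper's exact procedure: the synthetic rank-2 field, the noise level, the finite-difference derivative, and the choice of `r = 2` retained modes are all made up for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic space-time field (assumed): two separable modes, rows = space, cols = time.
x = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
t = np.linspace(0.0, 2.0 * np.pi, 200)
dt = t[1] - t[0]
u_clean = np.outer(np.sin(x), np.cos(t)) + 0.5 * np.outer(np.sin(2 * x), np.cos(3 * t))
dudt_true = -np.outer(np.sin(x), np.sin(t)) - 1.5 * np.outer(np.sin(2 * x), np.sin(3 * t))

# Noisy measurements; finite differences in time amplify the noise.
u_noisy = u_clean + 0.1 * rng.standard_normal(u_clean.shape)
dudt_noisy = np.gradient(u_noisy, dt, axis=1)

# POD modes: leading left singular vectors of the noisy snapshot matrix.
r = 2
modes = np.linalg.svd(u_noisy, full_matrices=False)[0][:, :r]

# Orthogonal projection of the evaluated derivative onto the POD subspace.
dudt_proj = modes @ (modes.T @ dudt_noisy)

def rms(a):
    return np.sqrt(np.mean(a ** 2))

err_before = rms(dudt_noisy - dudt_true)
err_after = rms(dudt_proj - dudt_true)
print(f"RMS error before projection: {err_before:.3f}")
print(f"RMS error after projection:  {err_after:.3f}")
```

The projection discards the part of the noise that lies outside the span of the retained modes, so the error in the derivative term drops before it is ever passed to a sparse regression step.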
Funding: National Natural Science Foundation of China (Grant No. 12072261).
Abstract: Switching systems have been extensively studied in engineering, where sudden changes in parameter values and structural form have a significant impact on the operational performance of the system. It is therefore important to predict the behavior of a switching system, which requires accurate detection of mutation points and rapid reidentification of the model. However, little effort has been devoted to accurately locating the mutation points. In this paper, we propose a new measure for mutation detection, the threshold-based switching index, by analogy with the Lyapunov exponent. We give an algorithm for selecting the optimal threshold, which greatly reduces the additional data collection and the relative error of mutation detection. For system identification, considering the small amount of data available and the noise in the data, the abrupt sparse Bayesian regression (abrupt-SBR) method is proposed. This method captures model changes by updating the previously identified model, which requires less data and is more robust to noise than identifying the new model from scratch. We illustrate the application and effectiveness of the proposed methods with two representative dynamical systems. Our research contributes to the accurate prediction and possible control of switching system behavior.
Funding: National Natural Science Foundation of China (Grant No. 11671059).
Abstract: The problem of estimating high-dimensional Gaussian graphical models has gained much attention in recent years. Most existing methods can be considered one-step approaches, being either regression-based or likelihood-based. In this paper, we propose a two-step method for estimating the high-dimensional Gaussian graphical model. The first step serves as a screening step, in which many entries of the concentration matrix are identified as zeros and thus removed from further consideration. In the second step, we focus on the remaining entries of the concentration matrix and perform selection and estimation for the nonzero entries. Since the screening step effectively reduces the dimension of the parameter space, the accuracy of the estimated concentration matrix can potentially be improved. We show that the proposed method enjoys desirable asymptotic properties. Numerical comparisons with several existing methods indicate that the proposed method works well. We also apply the proposed method to a breast cancer microarray data set and obtain some biologically meaningful results.
Abstract: Neoadjuvant chemotherapy is a necessary treatment for breast cancer patients with large tumors. After this treatment, patients who achieve a pathologic Complete Response (pCR) usually have a more favorable prognosis than those who do not. Therefore, pCR is now considered the best prognosticator for patients receiving neoadjuvant chemotherapy. However, not all patients benefit from this treatment, so we need a way to predict which patients can achieve pCR. Various gene signatures of chemosensitivity in breast cancer have been identified, from which such predictors can be built. Nevertheless, many of them have a prediction accuracy of around 80%. Identifying gene signatures that can be used to build high-accuracy predictors is therefore a prerequisite for their clinical testing and application. Furthermore, elucidating the importance of each individual gene in a signature is another pressing need before such a signature can be tested in clinical settings. In this study, a Genetic Algorithm (GA) and Sparse Logistic Regression (SLR), along with the t-test, were employed to identify one signature. It had 28 probe sets selected by GA from the top 65 probe sets that were highly overexpressed between pCR and Residual Disease (RD), and it was used to build an SLR predictor of pCR (SLR-28). Tested on a training set (n = 81) and a validation set (n = 52), this predictor gave very precise predictions as measured by accuracy, specificity, sensitivity, positive predictive value, and negative predictive value, with the corresponding P values all zero. Furthermore, this predictor discovered 12 important genes in the 28-probe-set signature. Our findings also demonstrated that the most discriminative genes, measured by SLR as a group selected by GA, were not necessarily those with the smallest P values by t-test as individual genes, highlighting the ability of GA, as a multivariate technique, to capture interacting genes in pCR prediction. Our gene signature outperformed a signature found in one previous study, with a prediction accuracy of 92% vs 76%, demonstrating the potential of GA and SLR for identifying robust gene signatures in chemotherapy response prediction in breast cancer.