Funding: Supported in part by the National Key Research and Development Program of China (2019YFB1503700) and the Hunan Natural Science Foundation-Science and Education Joint Project (2019JJ70063).
Abstract: Noise from finite element simulation often causes the model to fall into a local optimum and to overfit during generator optimization. This paper therefore proposes a Gaussian Process Regression (GPR) model based on Conditional Likelihood Lower Bound Search (CLLBS) to optimize the generator design; the model filters the noise in the data and, combined with the CLLBS method, searches for the global optimum. The efficiency optimization of a 15 kW Permanent Magnet Synchronous Motor is taken as an example. First, elementary effect analysis is used to choose the sensitive variables, and an evolutionary algorithm is used to design the Latin hypercube sampling plan; the generator-converter system is then simulated on a co-simulation platform to obtain data. A Gaussian process regression model incorporating the conditional likelihood lower bound search is established and combined with the chi-square test to optimize the accuracy of the model globally. Second, once the model reaches the required accuracy, the Pareto frontier is obtained with the NSGA-II algorithm, taking the maximum output torque as a constraint. Last, the constrained optimization is transformed into an unconstrained problem by introducing the maximum constrained expected improvement (CEI) optimization method based on the re-interpolation model, which cross-validates the optimization results of the Gaussian process regression model. The above methods increase the generator efficiency by 0.76% and 0.5%, respectively, and the approach can be used for rapid modeling and multi-objective optimization of generator systems.
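To illustrate how a Gaussian process regression model can smooth noisy simulation outputs, the following is a minimal sketch of plain GPR with a fixed RBF kernel and a fixed noise variance. It is not the CLLBS procedure of the paper, which the abstract does not specify; the kernel, the hyperparameter values, the sine test function, and all function names are assumptions made for illustration.

```python
import math
import random

def rbf(a, b, length=1.0, var=1.0):
    # Squared-exponential kernel (assumed choice; the abstract names no kernel).
    return var * math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    # Lower-triangular factor L with A = L L^T for a symmetric positive definite A.
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_chol(L, b):
    # Solve (L L^T) x = b by forward then backward substitution.
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_fit(xs, ys, noise_var):
    # The noise variance on the diagonal is what lets the GP smooth noisy data
    # instead of interpolating it exactly.
    K = [[rbf(a, b) + (noise_var if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    return solve_chol(cholesky(K), ys)

def gp_predict(xs, weights, x_new):
    # Posterior mean of a zero-mean GP at x_new.
    return sum(w * rbf(x_new, x) for w, x in zip(weights, xs))

# Hypothetical noisy "simulation" data: sin(x) plus Gaussian noise.
rng = random.Random(0)
xs = [6.0 * i / 29 for i in range(30)]
ys = [math.sin(x) + rng.gauss(0.0, 0.1) for x in xs]
weights = gp_fit(xs, ys, noise_var=0.01)
```

In practice the kernel hyperparameters and the noise variance would be estimated from the data, for example by maximizing a likelihood-based criterion, rather than fixed as they are here.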
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 72331005, 71931004, 12171157, and 32030063, and the 111 Project under Grant No. B14019.
Abstract: Truncated data are commonly observed in economics, epidemiology, and other fields. The analysis of truncated data is challenging because the observed data are usually a biased sample of the target population due to truncation. Existing methods of handling truncated data largely depend on the conditional likelihood, which is the joint distribution of the data given that they are observed, and may be unreliable or suffer a loss of efficiency. In this paper, the authors develop a maximum full likelihood inference method for truncated data under a parametric model for the conditional distribution of the target variable given covariates. The distribution of the truncation variable is left unspecified. The authors establish the asymptotic normality of the maximum likelihood estimators (MLE) of various parameters and show that the likelihood ratio statistics have central chi-square limiting distributions. As a by-product, the proposed method provides a natural MLE of the total number of observed and unobserved data, which may shed light on the extent of truncation bias. A score test is provided to check the correctness of the assumed parametric model. The simulation results indicate that the proposed estimation method generally produces more reliable point and interval estimates. For illustration, the authors apply the proposed approaches to a breast cancer data set from the Rotterdam Tumor Bank.
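The truncation bias and the conditional-likelihood baseline mentioned in the abstract can be illustrated with a small sketch. It assumes a covariate-free exponential target variable and a uniform left-truncation variable, neither of which comes from the paper, and it shows the conditional likelihood approach rather than the authors' full likelihood method. Under these assumptions the conditional density of an observed value y given truncation time t is rate * exp(-rate * (y - t)), so the conditional MLE has a closed form.

```python
import random

def simulate_left_truncated(n_draws, rate, seed=0):
    # Draw (t, y) pairs and keep only those with y >= t (left truncation).
    rng = random.Random(seed)
    sample = []
    for _ in range(n_draws):
        y = rng.expovariate(rate)   # target variable (assumed exponential)
        t = rng.uniform(0.0, 2.0)   # truncation variable (assumed uniform)
        if y >= t:
            sample.append((t, y))
    return sample

def conditional_mle_rate(sample):
    # Maximizes the product of rate * exp(-rate * (y - t)) over observed pairs.
    return len(sample) / sum(y - t for t, y in sample)

def naive_mle_rate(sample):
    # Ignores truncation and treats the observed y as a random sample.
    return len(sample) / sum(y for _, y in sample)

sample = simulate_left_truncated(20000, rate=1.0, seed=2)
cond_rate = conditional_mle_rate(sample)
naive_rate = naive_mle_rate(sample)
```

Because long durations are over-represented among the observed pairs, the naive estimate understates the rate, while the conditional estimate corrects for the selection.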
Abstract: Non-responses leading to missing data are common in most studies and cause inefficient and biased statistical inferences if ignored. When faced with missing data, many studies employ a complete case analysis to estimate the parameters of the model. This, however, compromises the low bias and minimum variance expected of the estimates. Several classical and model-based techniques for imputing the missing values have been discussed in the literature. The Bayesian approach to missingness is deemed superior to the other techniques because it lends itself naturally to missing data settings, in which the missing values are treated as unobserved random variables whose distribution depends on the observed data. This paper examines the superiority of Bayesian imputation over Multiple Imputation with Chained Equations (MICE) when estimating logistic panel data models with single fixed effects. The study validates the superiority of conditional maximum likelihood estimates for the nonlinear binary choice logit panel model in the presence of missing observations. A Monte Carlo simulation was designed to determine the magnitude of the bias and root mean square errors (RMSE) arising from MICE and full Bayesian imputation. The simulation results show that the conditional maximum likelihood (ML) logit estimator presented in this paper is less biased and more efficient when Bayesian imputation is performed to handle non-responses.
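The conditional maximum likelihood idea for the fixed-effects logit model can be sketched for the simplest case of two time periods and one covariate, with complete data. Conditioning on y1 + y2 = 1 removes the individual effect, leaving an ordinary logit in the covariate difference x2 - x1. The data-generating process, sample size, and function names below are assumptions for illustration; the imputation step studied in the paper is not shown.

```python
import math
import random

def simulate_panel(n, beta, seed=0):
    # Two-period binary panel with an individual effect correlated with x.
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        a = rng.gauss(0.0, 1.0)  # unobserved fixed effect
        x = [rng.gauss(0.0, 1.0) + 0.5 * a for _ in range(2)]
        y = [1 if rng.random() < 1.0 / (1.0 + math.exp(-(a + beta * xt))) else 0
             for xt in x]
        rows.append((x, y))
    return rows

def conditional_logit_t2(rows, iters=25):
    # Only "switchers" (y1 + y2 == 1) contribute to the conditional likelihood:
    # P(y2 = 1 | y1 + y2 = 1) = 1 / (1 + exp(-beta * (x2 - x1))).
    data = [(x[1] - x[0], y[1]) for x, y in rows if y[0] + y[1] == 1]
    b = 0.0
    for _ in range(iters):
        grad = 0.0
        hess = 0.0
        for d, y2 in data:
            p = 1.0 / (1.0 + math.exp(-b * d))
            grad += (y2 - p) * d
            hess += p * (1.0 - p) * d * d
        b += grad / hess  # Newton step on the concave conditional log-likelihood
    return b

rows = simulate_panel(4000, beta=1.0, seed=1)
beta_hat = conditional_logit_t2(rows)
```

The fixed effect never appears in the estimating equations, which is why the estimator stays consistent even though the effect is correlated with the covariate.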
Abstract: In social network analysis, logistic regression models have been widely used to establish the relationship between the response variable and covariates. However, such models often require the network relationships to be mutually independent after controlling for a set of covariates. To assess the validity of this assumption, we propose test statistics, under the logistic regression setting, for three important social network drivers: reciprocity, centrality, and transitivity. The asymptotic distributions of these test statistics are obtained. Extensive simulation studies are also presented to demonstrate their finite sample performance and usefulness.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 11131002, 11271031, 71532001, 11525101, 71271210 and 714711730); the Business Intelligence Research Center at Peking University; the Center for Statistical Science at Peking University; the Fundamental Research Funds for the Central Universities; the Research Funds of Renmin University of China (Grant No. 16XNLF01); the Ministry of Education Humanities Social Science Key Research Institute in University Foundation (Grant No. 14JJD910002); the Center for Applied Statistics, School of Statistics, Renmin University of China; and the China Postdoctoral Science Foundation (Grant No. 2016M600155).
Abstract: In social network analysis, link prediction is a problem of fundamental importance. How to conduct comprehensive and principled link prediction that takes various kinds of network structure information into consideration is of great interest. To this end, we propose a dynamic logistic regression method. Specifically, we assume that a time series of network structures has been observed. The proposed model then dynamically predicts future links by studying the network structure in the past. To estimate the model, we find that standard maximum likelihood estimation (MLE) is computationally prohibitive. To solve this problem, we introduce a novel conditional maximum likelihood estimation (CMLE) method, which is computationally feasible for large-scale networks. We demonstrate the performance of the proposed method through extensive numerical studies.
Funding: Supported in part by the National Natural Science Foundation of China under Grant No. 11171006.
Abstract: The main purpose of this paper is to use capture-recapture data to estimate the population size when some covariate values are missing, possibly non-ignorably. A conditional likelihood method is adopted, with a sub-model describing various missing-data mechanisms. The derived estimator is proved to be asymptotically normal, and simulation studies based on a version of the EM algorithm show that it is approximately unbiased. The proposed method is applied to a real example, and the result is compared with previous ones.
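As a rough illustration of conditional-likelihood estimation of population size, the sketch below assumes a much simpler setting than the paper: no covariates, no missing values, and one common capture probability on every occasion. The capture count of each observed individual then follows a zero-truncated binomial distribution, the conditional MLE of the capture probability solves a moment equation, and the population size is estimated by dividing the number of observed individuals by the estimated probability of being captured at least once. All names and parameter values are hypothetical.

```python
import random

def simulate_captures(pop_size, occasions, p, seed=0):
    # Returns capture counts for individuals seen at least once.
    rng = random.Random(seed)
    counts = []
    for _ in range(pop_size):
        c = sum(1 for _ in range(occasions) if rng.random() < p)
        if c > 0:
            counts.append(c)
    return counts

def conditional_mle(counts, occasions):
    # For the zero-truncated binomial, the MLE solves
    #   mean(counts) = k * p / (1 - (1 - p)**k),
    # whose right-hand side increases in p, so bisection suffices.
    # Assumes mean(counts) lies strictly between 1 and k.
    mean_count = sum(counts) / len(counts)
    lo, hi = 1e-9, 1.0 - 1e-9
    for _ in range(100):
        mid = (lo + hi) / 2.0
        expected = occasions * mid / (1.0 - (1.0 - mid) ** occasions)
        if expected < mean_count:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def estimate_population(counts, occasions):
    p_hat = conditional_mle(counts, occasions)
    seen_prob = 1.0 - (1.0 - p_hat) ** occasions  # P(captured at least once)
    return len(counts) / seen_prob, p_hat

counts = simulate_captures(pop_size=2000, occasions=5, p=0.3, seed=3)
n_hat, p_hat = estimate_population(counts, occasions=5)
```

With covariates, the same idea would weight each observed individual by the inverse of an individual-specific capture probability; that extension and the missing-covariate sub-model are not shown here.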
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 11571024 and 11771032, and the Humanities and Social Science Foundation of the Ministry of Education of China under Grant No. 20YJCZH245.
Abstract: Linear regression models for interval-valued data have been widely studied. Most of the literature splits an interval into two real numbers, i.e., the left and right endpoints or the center and radius of the interval, and fits two separate real-valued linear regression models or one two-dimensional model. This paper focuses on bias-corrected and heteroscedasticity-adjusted modeling, imposing an order constraint on the endpoints of the response interval and using weighted linear least squares with an estimated covariance matrix, based on a generalized linear model for interval-valued data. A three-step estimation method is proposed. Theoretical results and numerical evaluations show that the proposed estimator is more efficient than previous estimators.
Abstract: The integer-valued generalized autoregressive conditional heteroskedastic (INGARCH) model is often used to describe data in biostatistics, such as the number of people infected with dengue fever, the daily seizure counts of an epileptic patient, and the number of cases of campylobacteriosis. Since the structure of such data is generally high-order and sparse, order shrinkage and selection for the model have attracted much attention. In this paper, we propose a penalized conditional maximum likelihood (PCML) method to solve this problem. The PCML method can effectively select significant orders and estimate the parameters simultaneously. Simulations and a real data analysis are carried out to illustrate the usefulness of the method.
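The shape of a penalized conditional likelihood objective can be sketched as follows. The abstract does not give the penalty or the exact model, so this sketch assumes a Poisson INARCH(p) model (no feedback from past intensities), with intensity lambda_t = omega + sum_i alpha_i * X_{t-i}, and an L1 penalty on the autoregressive coefficients. The function names, the scaling of the penalty by the number of likelihood terms, and the toy series are all illustrative assumptions.

```python
import math

def inarch_intensities(series, omega, alphas):
    # Conditional Poisson means for t = p, ..., len(series) - 1.
    p = len(alphas)
    return [omega + sum(a * series[t - i - 1] for i, a in enumerate(alphas))
            for t in range(p, len(series))]

def neg_cond_loglik(series, omega, alphas):
    # Negative Poisson log-likelihood conditional on the first p observations.
    p = len(alphas)
    lams = inarch_intensities(series, omega, alphas)
    return -sum(x * math.log(lam) - lam - math.lgamma(x + 1)
                for x, lam in zip(series[p:], lams))

def penalized_objective(series, omega, alphas, gamma):
    # L1 penalty (an assumed choice) shrinks small coefficients toward zero,
    # which is how order selection and estimation can happen together.
    n_terms = len(series) - len(alphas)
    penalty = n_terms * gamma * sum(abs(a) for a in alphas)
    return neg_cond_loglik(series, omega, alphas) + penalty

# Hypothetical count series and parameter values.
series = [1, 2, 0, 3, 1, 2, 4, 1]
omega, alphas = 1.0, [0.5, 0.0, 0.2]
base = neg_cond_loglik(series, omega, alphas)
```

Minimizing this objective over omega > 0 and nonnegative alphas, for a suitable gamma, would set some lag coefficients exactly to zero; the optimizer itself is omitted.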
Funding: Supported by the National Natural Science Foundation of China (Nos. 11871028, 11731015, 11901053) and the Natural Science Foundation of Jilin Province (No. 20180101216JC).
Abstract: The binomial autoregressive (BAR(1)) process is very useful for modeling integer-valued time series data defined on a finite range. The autoregressive coefficient is commonly assumed to be constant. To make the BAR(1) model more practical, this paper introduces a new random coefficient binomial autoregressive model that is driven by covariates. Basic probabilistic and statistical properties of this model are discussed. Conditional least squares and conditional maximum likelihood estimators of the model parameters are derived, and their asymptotic properties are obtained. The performance of these estimators is compared via a simulation study. An application to a real data example is also provided. The results show that the proposed model and methods perform well in both the simulations and the application.
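For orientation, the constant-coefficient BAR(1) baseline that the abstract starts from can be simulated and estimated in a few lines. The sketch assumes the standard binomial-thinning form X_t = alpha∘X_{t-1} + beta∘(n - X_{t-1}), whose conditional mean is n*beta + (alpha - beta)*X_{t-1}, so conditional least squares reduces to a simple linear regression of X_t on X_{t-1}. It does not implement the covariate-driven random coefficient model proposed in the paper, and the parameter values and function names are illustrative.

```python
import random

def binomial(n, p, rng):
    # Binomial draw as a sum of n Bernoulli trials (binomial thinning).
    return sum(1 for _ in range(n) if rng.random() < p)

def simulate_bar1(n_units, alpha, beta, length, seed=0):
    rng = random.Random(seed)
    # Start from the stationary marginal Bin(n, beta / (1 - alpha + beta)).
    x = binomial(n_units, beta / (1.0 - alpha + beta), rng)
    path = [x]
    for _ in range(length - 1):
        # Survivors among the x "ones" plus new arrivals among the n - x "zeros".
        x = binomial(x, alpha, rng) + binomial(n_units - x, beta, rng)
        path.append(x)
    return path

def cls_estimate(path, n_units):
    # Least squares fit of X_t = intercept + slope * X_{t-1} + error, with
    # intercept = n * beta and slope = alpha - beta.
    prev, curr = path[:-1], path[1:]
    m = len(prev)
    mean_prev = sum(prev) / m
    mean_curr = sum(curr) / m
    sxy = sum((a - mean_prev) * (b - mean_curr) for a, b in zip(prev, curr))
    sxx = sum((a - mean_prev) ** 2 for a in prev)
    slope = sxy / sxx
    intercept = mean_curr - slope * mean_prev
    beta_hat = intercept / n_units
    alpha_hat = slope + beta_hat
    return alpha_hat, beta_hat

path = simulate_bar1(n_units=20, alpha=0.7, beta=0.2, length=5000, seed=0)
alpha_hat, beta_hat = cls_estimate(path, n_units=20)
```

A covariate-driven version would replace the constant alpha and beta with functions of covariates at each time step, at which point the regression is no longer linear in the parameters.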