The F-test is the most popular test in the general linear model. However, there has been little discussion of its robustness under the singular linear model. In this paper, necessary and sufficient conditions for the robustness of the F-test statistic are given under general linear models and their partitioned models, allowing the design matrix to be rank-deficient and the covariance matrix of the error to be a nonnegative definite matrix depending on parameters. The main results of this paper subsume the existing findings for the general linear model with a positive definite covariance matrix. The use of the theorems is illustrated by an example.
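For reference, the classical F statistic that the abstract's robustness conditions concern can be computed from nested least-squares fits. The sketch below is a minimal illustration of the standard full-rank case only, not the paper's singular-model conditions; the function name and setup are illustrative assumptions.

```python
# Illustrative sketch: the classical F statistic for testing a nested
# (reduced) linear model against a full model via residual sums of squares.
# This assumes the standard full-rank setup, NOT the singular model
# treated in the paper.
import numpy as np

def f_statistic(X_full, X_reduced, y):
    """F = [(RSS_r - RSS_f)/q] / [RSS_f/(n - p)] for nested OLS fits."""
    n, p = X_full.shape
    q = p - X_reduced.shape[1]  # number of restrictions under H0
    rss = lambda X: np.sum((y - X @ np.linalg.lstsq(X, y, rcond=None)[0]) ** 2)
    rss_f, rss_r = rss(X_full), rss(X_reduced)
    return ((rss_r - rss_f) / q) / (rss_f / (n - p))
```

Under the null hypothesis this statistic follows an F distribution with (q, n − p) degrees of freedom; the paper characterizes when it remains valid despite a rank-deficient design and a nonnegative definite error covariance.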
Compositional data, i.e., relative information recorded as closed data summing to a constant such as 100%, plays an important role in machine learning and related fields. The linear regression model is the most widely used statistical technique for identifying relationships between random variables of interest, and maximum likelihood estimation (MLE) is the method of choice for estimating its parameters, which are useful for tasks such as prediction and analyzing the partial effects of independent variables. However, data quality is a significant challenge in machine learning, and many datasets contain missing observations whose recovery can be costly and time-consuming. The expectation-maximization (EM) algorithm has been proposed for such situations: it iteratively computes maximum likelihood (or maximum a posteriori, MAP) estimates of parameters in statistical models that depend on unobserved variables or data. Using the current parameter estimate, the expectation (E) step constructs the expected log-likelihood function; the maximization (M) step then finds the parameters that maximize this expected log-likelihood. This study examined how well the EM algorithm performed on a simulated compositional dataset with missing observations, using both ordinary least squares and robust least squares regression.
The efficacy of the EM algorithm was compared with two alternative imputation techniques, k-nearest neighbor (k-NN) imputation and mean imputation, in terms of Aitchison distances and covariance.
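The alternation between the E and M steps described above can be sketched for one simple case. The following is a minimal illustration, not the study's actual procedure: a linear model in which some responses are missing, where the E-step replaces each missing response with its conditional expectation (the current fitted value) and the M-step refits by ordinary least squares. The function name and initialization are assumptions for the example.

```python
# Minimal EM sketch for a linear model y = X beta + eps where some
# entries of y are missing (np.nan).  E-step: fill each missing y_i with
# the current fitted value x_i' beta.  M-step: refit beta by OLS on the
# completed response vector (the maximizer of the expected complete-data
# Gaussian log-likelihood).
import numpy as np

def em_linear_regression(X, y, max_iter=100, tol=1e-8):
    miss = np.isnan(y)
    # Crude start: mean-impute the missing responses.
    y_filled = np.where(miss, np.nanmean(y), y)
    beta = np.linalg.lstsq(X, y_filled, rcond=None)[0]
    for _ in range(max_iter):
        # E-step: expected missing responses given the current beta.
        y_filled[miss] = X[miss] @ beta
        # M-step: OLS on the completed data.
        beta_new = np.linalg.lstsq(X, y_filled, rcond=None)[0]
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```

With only the responses missing, this iteration converges to the OLS fit on the observed rows; the value of EM in the study's setting is that the same E/M template extends to harder patterns of missingness (e.g., missing covariates) where no such shortcut exists.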
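The Aitchison distance used above as a comparison metric is the Euclidean distance between compositions after a centred log-ratio (clr) transform. A minimal sketch, assuming strictly positive compositions:

```python
# Aitchison distance between two compositions via the centred log-ratio
# (clr) transform.  Assumes all parts are strictly positive.
import numpy as np

def clr(x):
    """Centred log-ratio transform: log(x) minus the mean of log(x)."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean()

def aitchison_distance(x, y):
    """Euclidean distance between clr-transformed compositions."""
    return np.linalg.norm(clr(x) - clr(y))
```

Because the clr transform is invariant to rescaling, the distance depends only on the relative information in each composition, which is what makes it the natural metric for closed data.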
Funding: Supported by the National Social Science Foundation of China (Grant No. 13CTJ012), the National Natural Science Foundation of China (Grant No. 11171058), the Zhejiang Provincial Natural Science Foundation of China (Grant No. LQ13A010002), the Guangdong Provincial Natural Science Foundation of China (Grant No. S2012040007622), and the National Statistical Science Research Project (Grant No. 2012LY129).