Abstract: The aim of this paper is to present a generalization of the Shapiro-Wilk W-test or Shapiro-Francia W'-test for application to two or more variables. It consists of calculating all the unweighted linear combinations of the variables and their W- or W'-statistics with Royston’s log-transformation and standardization, z<sub>ln(1-W)</sub> or z<sub>ln(1-W')</sub>. Because the probability of z<sub>ln(1-W)</sub> or z<sub>ln(1-W')</sub> is computed in the right tail, negative values are truncated to 0 before their squares are summed. Independence in the sequence of these half-normally distributed values is required for the test statistic to follow a chi-square distribution; this assumption is checked using the robust Ljung-Box test. One degree of freedom is lost for each value set to zero. With the new test defined in its two variants (Q-test and Q'-test), 50 random samples with 4 variables and 20 participants were generated, 20% following a multivariate normal distribution and 80% deviating from it. The new test was compared with Mardia’s, runs, and Royston’s tests. Central tendency differences in type II error and statistical power were tested using Friedman’s test, with pairwise comparisons by Wilcoxon’s test. Differences in the frequency of successes in statistical decision making were compared using Cochran’s Q test, with pairwise comparisons by McNemar’s test. Sensitivity, specificity and efficiency proportions were compared using McNemar’s Z test. The 50 generated samples were classified into five ordered categories of deviation from multivariate normality; the correlation between this variable and the p-value of each test was calculated using Spearman’s coefficient, and these correlations were compared. Family-wise error rate corrections were applied. The new test and Royston’s test were the best choices, with a very slight advantage of the Q-test over the Q'-test. Based on these promising results, further study and use of this new sensitive, specific and effective test are suggested.
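The aggregation step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' script: it assumes the standardized values z (one per linear combination) have already been computed elsewhere, and the names `chi2_sf`, `q_statistic` and `drop_df_for_zeroed` are hypothetical. The chi-square tail probability is computed with a standard recurrence for integer degrees of freedom so that only the Python standard library is needed.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)               # closed form for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))    # closed form for df = 1
    # Recurrence for the regularized upper incomplete gamma function:
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def q_statistic(z_values, drop_df_for_zeroed=True):
    """Sum of squares of right-tail z values, with negatives truncated to 0.

    z_values: precomputed standardized values (assumed input, one per
    linear combination of the variables).
    drop_df_for_zeroed: if True, one degree of freedom is removed for each
    value that was set to zero, as the abstract describes.
    """
    truncated = [max(z, 0.0) for z in z_values]
    q = sum(z * z for z in truncated)
    n_zeroed = sum(1 for z in truncated if z == 0.0)
    df = len(truncated) - n_zeroed if drop_df_for_zeroed else len(truncated)
    p = chi2_sf(q, df) if df > 0 else 1.0      # no information left if df == 0
    return q, df, p


if __name__ == "__main__":
    # Made-up z values, for illustration only.
    print(q_statistic([1.2, -0.4, 2.1, 0.3, -1.0]))
```

The independence check (Ljung-Box) and the computation of W or W' are deliberately left out; only the truncation, the sum of squares and the degrees-of-freedom rule are shown.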
Abstract: The computation of the multivariate normal integral over a complex subspace is a challenge, especially when the integration region is of a complex nature. Such integrals arise, for example, in the generalized Neyman-Pearson criterion, in conditional Bayesian problems of testing many hypotheses, and so on. Monte Carlo methods could be used for their computation, but the computation time grows unreasonably as the dimensionality of the integral increases. Therefore, a method is proposed for computing such integrals by series after reducing the dimensionality to one without loss of information. The calculation results are given.
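For orientation, the Monte Carlo baseline that this abstract contrasts itself with can be sketched in a few lines. The example below is not the series method of the paper; it is a plain simulation of one simple region, the positive quadrant of a standard bivariate normal with correlation `rho`, chosen because a closed form is known there (1/4 + arcsin(rho)/(2π)). The function names and defaults are assumptions made for illustration.

```python
import math
import random


def mc_orthant_probability(rho, n_draws=20000, seed=0):
    """Monte Carlo estimate of P(X > 0, Y > 0) for a standard bivariate normal."""
    rng = random.Random(seed)
    s = math.sqrt(1.0 - rho * rho)
    hits = 0
    for _ in range(n_draws):
        e1 = rng.gauss(0.0, 1.0)
        e2 = rng.gauss(0.0, 1.0)
        x = e1
        y = rho * e1 + s * e2      # Cholesky factor of [[1, rho], [rho, 1]]
        if x > 0.0 and y > 0.0:
            hits += 1
    return hits / n_draws


def exact_orthant_probability(rho):
    """Known closed form for the bivariate positive-quadrant probability."""
    return 0.25 + math.asin(rho) / (2.0 * math.pi)


if __name__ == "__main__":
    print(mc_orthant_probability(0.5), exact_orthant_probability(0.5))
```

The standard error of such an estimate shrinks only as the inverse square root of the number of draws, so small probabilities need many draws for a given relative accuracy, which is one reason alternatives to plain simulation are sought.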
Funding: National Outstanding Youth Science Foundation of China under Grant No. 598251005
Abstract: Based on De Morgan's laws and Boolean simplification, a recursive decomposition method is introduced in this paper to identify the main exclusive safe paths and failed paths of a network. The reliability or the reliability bound of a network can be conveniently expressed as the sum of the joint probabilities of these paths. Under the multivariate normal distribution assumption, a conditioned reliability index method is developed to evaluate the joint probabilities of the various exclusive safe paths and failed paths and, finally, the seismic reliability or the reliability bound of an electric power system. Examples given in the paper show that the method is very simple and provides accurate results in seismic reliability analysis.
Abstract: A new dimension-reduction graphical method for testing high-dimensional normality is developed by using the theory of spherical distributions and the idea of principal component analysis. The dimension reduction is realized by projecting high-dimensional data onto some selected eigenvector directions. The asymptotic statistical independence of the plotting functions on the selected eigenvector directions provides the principle for the new plot. A departure from multivariate normality of the raw data could be captured by at least one plot on the selected eigenvector directions. Acceptance regions associated with the plots are provided to enhance their interpretability. Monte Carlo studies and an illustrative example show that the proposed graphical method has competitive power performance and significantly improves on the existing graphical method in testing high-dimensional normality.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 11201452 and 11271346), the Specialized Research Fund for the Doctoral Program of Higher Education of China (Grant No. 20123402120017), and the Fundamental Research Funds for the Central Universities (Grant No. WK0010000052)
Abstract: In this paper, the Bayes estimator and the parametric empirical Bayes estimator (PEBE) of the mean vector of a multivariate normal distribution are obtained. The superiority of the PEBE over the minimum variance unbiased estimator (MVUE) and over a revised James-Stein estimator (RJSE) is investigated under the mean square error (MSE) criterion. Extensive simulations are conducted to show that the PEBE performs best among these three estimators under the MSE criterion.
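The kind of MSE comparison described here can be illustrated with a small simulation. The sketch below does not implement the paper's PEBE or its revised James-Stein estimator, whose forms are not given in the abstract; it uses the classical positive-part James-Stein estimator as a stand-in, with a single observation X ~ N_p(θ, I) so that the MVUE is X itself. All function names, the choice of θ and the number of replications are assumptions.

```python
import random


def james_stein(x, sigma2=1.0):
    """Classical positive-part James-Stein shrinkage toward the origin."""
    p = len(x)
    norm2 = sum(v * v for v in x)
    if norm2 == 0.0:
        return list(x)
    shrink = max(0.0, 1.0 - (p - 2) * sigma2 / norm2)
    return [shrink * v for v in x]


def simulate_mse(theta, n_reps=2000, seed=0):
    """Average total squared error of the MVUE (X itself) and of James-Stein."""
    rng = random.Random(seed)
    sse_mvue = 0.0
    sse_js = 0.0
    for _ in range(n_reps):
        x = [t + rng.gauss(0.0, 1.0) for t in theta]   # X ~ N_p(theta, I)
        js = james_stein(x)
        sse_mvue += sum((a - t) ** 2 for a, t in zip(x, theta))
        sse_js += sum((a - t) ** 2 for a, t in zip(js, theta))
    return sse_mvue / n_reps, sse_js / n_reps


if __name__ == "__main__":
    mse_mvue, mse_js = simulate_mse([1.0] * 10)
    print(mse_mvue, mse_js)
```

With p = 10 the simulated risk of the unbiased estimator should sit near p, while the shrinkage estimator should come out lower; an empirical Bayes estimator would be added to the same loop as a third competitor.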
Funding: Supported by the National Social Science Foundation of China (No. 09BTJ012) and the Scientific Research Fund of Hunan Provincial Education Department (No. 09c390); supported in part by an HKU Seed Funding Program for Basic Research (Project No. 2009-1115-9042) and a grant from the Hong Kong Research Grant Council General Research Fund (Project No. HKU779210M)
Abstract: Sampling from a truncated multivariate normal distribution (TMVND) constitutes the core computational module in fitting many statistical and econometric models. We propose two efficient methods, an iterative data augmentation (DA) algorithm and a non-iterative inverse Bayes formulae (IBF) sampler, to simulate TMVND and generalize them to multivariate normal distributions with linear inequality constraints. By creating a Bayesian incomplete-data structure, the posterior step of the DA algorithm directly generates random vector draws as opposed to single-element draws, resulting in an obvious computational advantage and easy coding with common statistical software packages such as S-PLUS, MATLAB and GAUSS. Furthermore, the DA provides a ready structure for implementing a fast EM algorithm to identify the mode of TMVND, which has many potential applications in statistical inference for constrained parameter problems. In addition, utilizing this mode as an intermediate result, the IBF sampling provides a novel alternative to Gibbs sampling and eliminates problems with convergence, including possible slow convergence due to high correlation between the components of a TMVND. The DA algorithm is applied to a linear regression model with constrained parameters and is illustrated with a published data set. Numerical comparisons show that the proposed DA algorithm and IBF sampler are more efficient than the Gibbs sampler and the accept-reject algorithm.
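To make the comparison concrete, the accept-reject baseline mentioned at the end of this abstract can be sketched as below. This is not the proposed DA algorithm or the IBF sampler; it is the naive method they are compared against, shown for a standard bivariate normal with correlation `rho` truncated to the positive quadrant. The function name, region and defaults are assumptions made for illustration.

```python
import math
import random


def sample_truncated_bvn(rho, n_samples=2000, seed=0):
    """Accept-reject draws from a bivariate normal truncated to x > 0, y > 0.

    Returns the accepted draws and the observed acceptance rate.
    """
    rng = random.Random(seed)
    s = math.sqrt(1.0 - rho * rho)
    samples = []
    n_proposed = 0
    while len(samples) < n_samples:
        e1 = rng.gauss(0.0, 1.0)
        e2 = rng.gauss(0.0, 1.0)
        x, y = e1, rho * e1 + s * e2   # untruncated proposal
        n_proposed += 1
        if x > 0.0 and y > 0.0:        # keep only draws inside the region
            samples.append((x, y))
    return samples, n_samples / n_proposed


if __name__ == "__main__":
    draws, rate = sample_truncated_bvn(0.5)
    print(len(draws), rate)
```

The acceptance rate equals the probability of the truncation region under the untruncated distribution, so the method slows down sharply when that region is small or the dimension is high, which is the inefficiency that motivates alternative samplers.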
Abstract: In a previous article, an R script was developed and divided into three parts to implement the multivariate normality (MVN) Q-test based on both the chi-square approximation and the bootstrap approach, using either the Shapiro-Wilk W statistic (QSWa and QSWb) or the Shapiro-Francia W’ statistic (QSFa and QSFb). Royston’s H-test was included as a supplementary MVN test. The aim of this study is to compare the hit rate and statistical power of the four Q-test variants and the H-test using 200 samples drawn from multivariate standard normal distributions and 200 samples from multivariate t-distributions with five degrees of freedom. The simulations vary in sample size (50, 75, 100, 125, 150, 200, 250, and 500), number of variables (from 2 to 6), and homogeneous inter-variable correlation (0, 0.3, 0.5, 0.7, and 0.9). The H-test outperformed QSWb and QSFb, but not QSWa in the multivariate normal samples or QSFa in the multivariate t-distribution samples. QSFb performed better than QSWb. It is concluded that the bootstrap approach is conservative under the null hypothesis of multivariate normality. However, when the assumption of independence is violated, the bootstrap approach is theoretically more appropriate than QSWa and QSFa. A 10% significance level is recommended for QSFb in terms of hit rate, but in terms of statistical power, only when rejecting the null hypothesis.
Abstract: In 2023, a multivariate normality test based on a chi-square approximation was developed. This method assumes independence among Gaussian random variables and defines the test statistic, denoted by Q, as the sum of squared values. This study aims to develop R scripts that implement the Q-test for multivariate normality using either the Shapiro-Wilk W statistic (QSWa) or the Shapiro-Francia W’ statistic (QSFa). A bootstrap version of the Q-test (QSWb and QSFb), which does not assume independence, is also included. Additionally, it incorporates Royston’s H-test. The use of the scripts is illustrated with a sample of 50 participants assessed on a variable across four yearly administrations. The sampling distribution generated by the bootstrap method differs from the chi-square distribution and corresponds to a generalized chi-square distribution, namely the distribution of a sum of squares of correlated variables. This distribution is less peaked and has a heavier right tail than the chi-square distribution. It is concluded that the bootstrap approach is conservative under the null hypothesis of multivariate normality; however, it is theoretically more appropriate than the chi-square approximation. To approximate the distributions of the two versions of the Q-test, it is recommended that the z or z’ values set to zero in the calculation of the Q statistic not be subtracted when determining the degrees of freedom in the chi-square approximation. Moreover, a significance level of 10% is suggested for the bootstrap approach, rather than the conventional 5%.
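The remark about the heavier right tail can be checked with a small simulation. The sketch below is a simplified illustration rather than the bootstrap procedure of the paper: it sums the squares of k plain standard normal variables (no truncation at zero) that share a common correlation `rho`, and compares the empirical 95th percentile with the independent case. The equicorrelation structure, the function names and the parameter values are assumptions.

```python
import math
import random


def simulate_sum_of_squares(k, rho, n_draws=20000, seed=0):
    """Draws of the sum of squares of k equicorrelated standard normals."""
    rng = random.Random(seed)
    a = math.sqrt(rho)
    b = math.sqrt(1.0 - rho)
    draws = []
    for _ in range(n_draws):
        g = rng.gauss(0.0, 1.0)    # shared factor inducing correlation rho
        total = 0.0
        for _ in range(k):
            z = a * g + b * rng.gauss(0.0, 1.0)   # unit variance
            total += z * z
        draws.append(total)
    return draws


def empirical_quantile(values, prob):
    """Simple order-statistic quantile."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, int(prob * len(ordered)))
    return ordered[index]


if __name__ == "__main__":
    q95_independent = empirical_quantile(simulate_sum_of_squares(4, 0.0), 0.95)
    q95_correlated = empirical_quantile(simulate_sum_of_squares(4, 0.7), 0.95)
    print(q95_independent, q95_correlated)
```

With rho = 0 the sum follows a chi-square distribution with k degrees of freedom; with positive correlation the mean is unchanged but the variance and the upper quantiles grow, so critical values taken from the chi-square table would be too small.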
Funding: This work is supported by the NNSF of China (Nos. 10071090 and 10271013)
Abstract: Suppose $Y \sim N(\beta, \sigma^2 I_n)$, where $\beta \in R^n$ and $\sigma^2 > 0$ are unknown. We study the admissibility of linear estimators of the mean vector under a quadratic loss function. A necessary and sufficient condition for a linear estimator to be admissible is given.
Funding: Supported by the Shanghai Leading Academic Discipline Project (Project No. B803)
Abstract: Suppose that an order restriction is imposed among several p-variate normal mean vectors. We are interested in testing the homogeneity of these mean vectors under this restriction. This problem is an extension of Sasabuchi, Tanaka and Tsukamoto's problem.
Funding: Supported by the 111 Project of China (No. B14019) and the National Natural Science Foundation of China (Grant No. 11671146)
Abstract: Suppose that we observe $y \mid \theta, \tau \sim N_p(X\theta, \tau^{-1} I_p)$, where θ is an unknown vector and τ is an unknown precision. Estimating the regression coefficient θ with known τ has been well studied. However, statistical properties such as admissibility in estimating θ with unknown τ are not well studied. Han [(2009). Topics in shrinkage estimation and in causal inference (PhD thesis). Wharton School, University of Pennsylvania] appears to be the first to consider the problem, developing sufficient conditions for the admissibility of estimating means of multivariate normal distributions with unknown variance. We generalise the sufficient conditions for admissibility and apply these results to the normal linear regression model. Two-level and three-level hierarchical models with unknown precision τ are investigated to determine when a standard class of hierarchical priors leads to admissible estimators of θ under the normalised squared error loss. One reason to consider this problem is the importance of admissibility in hierarchical prior selection, and we expect that our study could provide some reference for choosing hierarchical priors.
Abstract: This paper investigates and discusses the use of information divergence, through the widely used Kullback–Leibler (KL) divergence, under the multivariate (generalized) γ-order normal distribution (γ-GND). The behavior of the KL divergence, as far as its symmetry is concerned, is studied by calculating the divergence of the γ-GND over the Student’s multivariate t-distribution and vice versa. Certain special cases are also given and discussed. Furthermore, three symmetrized forms of the KL divergence, i.e., the Jeffreys distance, the geometric-KL distance and the harmonic-KL distance, are computed between two members of the γ-GND family, and the corresponding differences between those information distances are also discussed.
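The asymmetry of the KL divergence and its symmetrization by the Jeffreys distance can be illustrated in the simplest setting. The sketch below uses two ordinary univariate normal distributions, not the γ-GND family or the t-distribution treated in the paper, and relies on the well-known closed form for the KL divergence between normals. The function names are hypothetical, and the geometric and harmonic variants are omitted because their definitions are not given in the abstract.

```python
import math


def kl_normal(mu1, sigma1, mu2, sigma2):
    """KL divergence KL(N(mu1, sigma1^2) || N(mu2, sigma2^2))."""
    return (
        math.log(sigma2 / sigma1)
        + (sigma1 ** 2 + (mu1 - mu2) ** 2) / (2.0 * sigma2 ** 2)
        - 0.5
    )


def jeffreys_distance(mu1, sigma1, mu2, sigma2):
    """Symmetrized divergence: sum of the two directed KL divergences."""
    return kl_normal(mu1, sigma1, mu2, sigma2) + kl_normal(mu2, sigma2, mu1, sigma1)


if __name__ == "__main__":
    forward = kl_normal(0.0, 1.0, 1.0, 2.0)
    backward = kl_normal(1.0, 2.0, 0.0, 1.0)
    print(forward, backward, jeffreys_distance(0.0, 1.0, 1.0, 2.0))
```

The two directed divergences differ whenever the variances differ, whereas the Jeffreys distance is unchanged when the two distributions are swapped.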