The set of probability density functions is a convex subset of L1, and it does not have a linear space structure under the ordinary sum and multiplication by real constants. Moreover, difficulties arise when dealing with distances between densities: the usual distances are not invariant under the relevant transformations of densities. To overcome these limitations, Aitchison's ideas on compositional data analysis are used, generalizing perturbation and power transformation, as well as the Aitchison inner product, to operations on probability density functions with support on a finite interval. With these operations at hand, it is shown that the set of bounded probability density functions on finite intervals is a pre-Hilbert space. A Hilbert space of densities whose logarithm is square-integrable is obtained as the natural completion of the pre-Hilbert space.

Funding: the Dirección General de Investigación of the Spanish Ministry for Science and Technology, through the project BFM2003-05640/MATE, and the Departament d'Universitats, Recerca i Societat de la Informació.
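The generalized operations named in this abstract are not spelled out there; the following is a sketch in the notation most common in the compositional-data literature (the symbols \oplus and \odot and the interval length \eta are this note's assumptions, not quotations from the paper). For densities f and g on a finite interval I of length \eta:

    % Perturbation and powering of densities on a finite interval I, |I| = \eta:
    (f \oplus g)(x) = \frac{f(x)\,g(x)}{\int_I f(t)\,g(t)\,\mathrm{d}t},
    \qquad
    (\alpha \odot f)(x) = \frac{f(x)^{\alpha}}{\int_I f(t)^{\alpha}\,\mathrm{d}t},
    % and the inner product generalizing the Aitchison inner product:
    \langle f, g \rangle = \frac{1}{2\eta}\int_I\!\!\int_I
        \log\frac{f(x)}{f(y)}\,\log\frac{g(x)}{g(y)}\;\mathrm{d}x\,\mathrm{d}y .

Perturbation and powering play the roles of vector addition and scalar multiplication, and the inner product is insensitive to multiplying a density by a positive constant, which is why densities are identified up to scale, exactly as compositions are identified up to closure.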
Jean Aitchison is the Rupert Murdoch Professor of Language and Communication at the University of Oxford. Readers in China know her best for The Articulate Mammal. In fact, another of her books, The Seeds of Speech, on the origin and evolution of language, was imported at the same time by the Foreign Language Teaching and Research Press and is just as well written. The Seeds of Speech comes with an introductory guide by Chen Guohua, which not only gives a detailed account of the book's content and organization but also provides a wealth of background material on the origin of language, making the book much easier to read from cover to cover. Chen's guide also lays much of the groundwork for the present review: I can dispense with a comprehensive summary of the book and turn directly to the questions that interest me, without worrying that touching on one point means leaving out a thousand others.
Indicator kriging (IK) is a spatial interpolation technique devised for estimating a conditional cumulative distribution function at an unsampled location. The result is a discrete approximation, and its corresponding estimated probability density function can be viewed as a composition in the simplex. This fact suggests a compositional approach to IK which, by construction, avoids all of its standard drawbacks (predictions that are negative, not ordered, or larger than one). Here, a simple algorithm implementing the procedure is presented.

Funding: the Dirección General de Enseñanza Superior, Ministerio de Educación y Cultura (Spain) (BFM2003-05640 and MTM2006-03040); the Universitat de Girona (Spain) (BR01/03); and the Deutscher Akademischer Austauschdienst (Germany) (A/04/33586).
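As an illustration of why the compositional route avoids those drawbacks, here is a minimal sketch (assumed names and inputs, not the paper's algorithm): if log-ratio coordinates of the class probabilities are kriged coordinate-wise, the inverse transform returns a discrete pdf that is positive and sums to one by construction, so its cumulative sums form a valid, non-decreasing cdf bounded by one.

    import numpy as np

    def alr(p):
        """Additive log-ratio transform of a composition p, last part as reference.
        The data compositions would be transformed this way before kriging."""
        p = np.asarray(p, dtype=float)
        return np.log(p[:-1] / p[-1])

    def alr_inv(y):
        """Inverse alr: closure of exponentials, mapping coordinates back to the simplex."""
        z = np.exp(np.append(y, 0.0))  # the reference part has coordinate 0
        return z / z.sum()

    # y_hat stands for alr coordinates kriged at an unsampled location
    # (e.g., by coordinate-wise ordinary kriging); these numbers are illustrative only.
    y_hat = np.array([0.3, -0.1, 0.8])

    pdf = alr_inv(y_hat)   # positive, sums to 1 by construction
    cdf = np.cumsum(pdf)   # hence non-decreasing and bounded by 1
    print(pdf, cdf)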
Compositional data, i.e., vectors of relative information, are a crucial kind of data in machine learning and related fields. They are typically recorded as closed data, summing to a constant such as 100%. The linear regression model is the most widely used statistical technique for identifying relationships between underlying random variables of interest, and maximum likelihood estimation (MLE) is the method of choice for estimating its parameters, which are needed for tasks such as prediction and the analysis of the partial effects of the independent variables. However, data quality is a significant challenge in machine learning, and many datasets contain missing observations whose recovery can be costly and time-consuming. To address this issue, the expectation-maximization (EM) algorithm has been suggested for situations with missing data. The EM algorithm iteratively finds maximum likelihood or maximum a posteriori (MAP) estimates of the parameters of statistical models that depend on unobserved variables. Using the current parameter estimate, the expectation (E) step constructs the expected log-likelihood function; the maximization (M) step then finds the parameters that maximize the expected log-likelihood determined in the E step. This study examines how well the EM algorithm performs on a simulated compositional dataset with missing observations, using both ordinary least squares and robust least squares regression. The efficacy of the EM algorithm is compared with that of two alternative imputation techniques, k-nearest neighbour (k-NN) and mean imputation, in terms of Aitchison distances and covariances.
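A minimal sketch of the E and M steps just described, under the common assumption that log-ratio (e.g., ilr) coordinates of the compositions are multivariate normal; the function name and this NumPy-only implementation are choices of this note, not the study's code.

    import numpy as np

    def em_impute(X, n_iter=50):
        """EM imputation for data assumed multivariate normal (e.g., ilr-transformed
        compositions). X is an (n, d) array with np.nan marking missing entries."""
        X = np.asarray(X, dtype=float).copy()
        miss = np.isnan(X)
        # start from column-mean imputation
        col_means = np.nanmean(X, axis=0)
        X[miss] = np.take(col_means, np.where(miss)[1])
        n, d = X.shape
        mu = X.mean(axis=0)
        S = np.cov(X, rowvar=False, bias=True)
        for _ in range(n_iter):
            C = np.zeros((d, d))  # conditional-covariance correction for the M step
            for i in range(n):    # E step: fill each row's missing part
                m = miss[i]
                if not m.any():
                    continue
                o = ~m
                S_oo_inv = np.linalg.pinv(S[np.ix_(o, o)])
                S_mo = S[np.ix_(m, o)]
                # conditional mean of the missing entries given the observed ones
                X[i, m] = mu[m] + S_mo @ S_oo_inv @ (X[i, o] - mu[o])
                # conditional covariance enters the expected sufficient statistics
                C[np.ix_(m, m)] += S[np.ix_(m, m)] - S_mo @ S_oo_inv @ S_mo.T
            mu = X.mean(axis=0)   # M step: re-estimate mean and covariance
            S = np.cov(X, rowvar=False, bias=True) + C / n
        return X

Because the Aitchison distance between two compositions equals the Euclidean distance between their ilr coordinates, the quality of the imputations can be scored directly in the transformed space and set against k-NN or mean imputation, as the study does.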