The diameter distribution function (DDF) is a crucial tool for accurately predicting stand carbon storage (CS). The current key issue, however, is how to construct a high-precision DDF based on stand factors, site quality, and aridity index to predict stand CS in multi-species mixed forests with complex structures. This study used data from 70 survey plots of mixed broadleaf Populus davidiana and Betula platyphylla forests in the Mulan Rangeland State Forest, Hebei Province, China, to construct the DDF based on maximum likelihood estimation (MLE) and the finite mixture model (FMM). Ordinary least squares (OLS), linear seemingly unrelated regression (LSUR), and a back-propagation neural network (BPNN) were used to investigate the influences of stand factors, site quality, and aridity index on the shape and scale parameters of the DDF and to predict the stand CS of mixed broadleaf forests. The results showed that the FMM accurately described the stand-level diameter distribution of the mixed P. davidiana and B. platyphylla forests, whereas the Weibull function constructed by MLE was more accurate in describing the species-level diameter distribution. The combined variable of quadratic mean diameter (Dq), stand basal area (BA), and site quality improved the accuracy of the shape parameter models of the FMM; the combined variable of Dq, BA, and the De Martonne aridity index improved the accuracy of the scale parameter models. Compared to OLS and LSUR, the BPNN had higher accuracy in the re-parameterization process of the FMM. OLS, LSUR, and BPNN overestimated the CS of P. davidiana but underestimated the CS of B. platyphylla in the large diameter classes (DBH ≥ 18 cm). BPNN accurately estimated stand- and species-level CS, but it was more suitable for estimating stand-level CS than species-level CS, thereby providing a scientific basis for the optimization of stand structure and assessment of carbon sequestration capacity in mixed broadleaf forests.
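As a rough illustration of the species-level Weibull fit by maximum likelihood mentioned above, the profile-likelihood equation for the Weibull shape parameter can be solved by bisection, after which the scale follows in closed form. This is a minimal stdlib sketch on simulated diameter-like data, not the study's plot data:

```python
import math
import random

def weibull_mle(x, lo=0.05, hi=50.0, tol=1e-9):
    """MLE for the 2-parameter Weibull: solve the profile equation for the
    shape k by bisection (the profile function is monotone increasing in k),
    then recover the scale in closed form."""
    n = len(x)
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / n

    def g(k):
        pw = [v ** k for v in x]
        s = sum(pw)
        return sum(p * l for p, l in zip(pw, logs)) / s - 1.0 / k - mean_log

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            hi = mid
        else:
            lo = mid
    k = 0.5 * (lo + hi)
    scale = (sum(v ** k for v in x) / n) ** (1.0 / k)
    return k, scale

# Illustrative DBH-like sample from a known Weibull(shape=2, scale=15),
# generated by inverse-CDF sampling.
rng = random.Random(42)
sample = [15.0 * (-math.log(rng.random())) ** 0.5 for _ in range(2000)]
shape_hat, scale_hat = weibull_mle(sample)
```

With 2000 observations the recovered shape and scale land close to the generating values (2 and 15).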
Spatially Constrained Mixture Model (SCMM) is an image segmentation model that works within the maximum a-posteriori and Markov Random Field (MAP-MRF) framework and uses its own maximization step within this framework. This research proposes an improvement to the SCMM's maximization step for segmenting simulated brain Magnetic Resonance Images (MRIs). The improved model is named the Weighted Spatially Constrained Finite Mixture Model (WSCFMM). To compare the performance of SCMM and WSCFMM, simulated T1-weighted normal MRIs were segmented, a region of interest (ROI) was extracted from the segmented images, and the similarity between the extracted ROI and the ground truth (GT) was measured with the Jaccard and Dice similarity measures. By the Jaccard measure, WSCFMM showed an overall improvement of 4.72% over the SCMM, and by the Dice measure an overall improvement of 2.65%. Besides, WSCFMM significantly stabilized and reduced the execution time, showing an improvement of 83.71%. The study concludes that WSCFMM is a stable model and performs better than the SCMM in both noisy and noise-free environments.
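The Jaccard and Dice measures used above compare a segmented ROI against the ground truth as overlapping pixel sets. A minimal sketch on toy masks (the pixel coordinates are hypothetical):

```python
def jaccard(a, b):
    """Jaccard index of two binary masks given as sets of foreground pixels."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0

def dice(a, b):
    """Dice coefficient; algebraically equals 2J/(1+J) for Jaccard index J."""
    denom = len(a) + len(b)
    return 2 * len(a & b) / denom if denom else 1.0

roi = {(0, 0), (0, 1), (1, 0), (1, 1)}   # hypothetical segmented ROI
gt  = {(0, 1), (1, 0), (1, 1), (2, 1)}   # hypothetical ground truth
j, d = jaccard(roi, gt), dice(roi, gt)   # 3/5 and 3/4 for these masks
```

The identity d = 2j/(1+j) makes either measure recoverable from the other, which is why papers often report both only for readability.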
Since its first flight in 2007, the UAVSAR instrument of NASA has acquired a large number of fully Polarimetric SAR (PolSAR) data at very high spatial resolution. Small spatial features can be observed in this type of data, offering the opportunity to explore structures in the images. In general, structured scenes present multimodal or spiky histograms, and the finite mixture model has great advantages in modeling data with irregular histograms. In this paper, a type of important statistics called log-cumulants, which can be used to design parameter estimators or goodness-of-fit tests, is derived for the finite mixture model and compared with the log-cumulants of texture models. The results are applied to UAVSAR data analysis to determine which model is better for different land types.
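Sample log-cumulants of the kind used above are, to second and third order, simply central moments of the log-intensities. A minimal stdlib sketch (the data are illustrative, not UAVSAR samples):

```python
import math

def log_cumulants(x):
    """First three sample log-cumulants of positive data:
    mean, variance, and third central moment of ln(x)."""
    logs = [math.log(v) for v in x]
    n = len(logs)
    k1 = sum(logs) / n
    k2 = sum((l - k1) ** 2 for l in logs) / n
    k3 = sum((l - k1) ** 3 for l in logs) / n
    return k1, k2, k3

# Deterministic check data: intensities whose logs are exactly {0, 1, 2}
k1, k2, k3 = log_cumulants([1.0, math.e, math.e ** 2])
```

For these three points the log-cumulants are 1, 2/3, and 0; a symmetric log-histogram always gives a vanishing third log-cumulant, which is the property the diagram-of-log-cumulants tests exploit.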
Mixture models have become more popular in modelling compared to standard distributions. The mixing distribution plays a role in capturing the variability of the random variable in the conditional distribution. Studies have lately focused on finite mixture models as mixing distributions in the mixing mechanism. In the present work, we consider a Normal Variance-Mean mixture model. The mixing distribution is a finite mixture of two special cases of the Generalised Inverse Gaussian distribution with indexes -1/2 and -3/2. The parameters of the mixed model are obtained via the Expectation-Maximization (EM) algorithm, with an iterative scheme based on a representation of the normal equations. An application to some financial data is given.
The computational accuracy and efficiency of modeling the stress spectrum derived from bridge monitoring data significantly influence the fatigue life assessment of steel bridges. Therefore, determining the optimal stress spectrum model is crucial for further fatigue reliability analysis. This study investigates the performance of the REBMIX algorithm in modeling both univariate (stress range) and multivariate (stress range and mean stress) distributions of the rain-flow matrix for a steel arch bridge, using Akaike's Information Criterion (AIC) as a performance metric. Four types of finite mixture distributions (Normal, Lognormal, Weibull, and Gamma) are employed to model the stress range. Additionally, mixed distributions, including Normal-Normal, Lognormal-Normal, Weibull-Normal, and Gamma-Normal, are utilized to model the joint distribution of stress range and mean stress. The REBMIX algorithm estimates the number of components, component weights, and component parameters for each candidate finite mixture distribution. The results demonstrate that the REBMIX-based mixture parameter estimation approach effectively identifies the optimal distribution based on AIC values. Furthermore, the algorithm exhibits superior computational efficiency compared to traditional methods, making it highly suitable for practical applications.
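The AIC-based model comparison described above can be sketched for two single-component candidates with closed-form MLEs; this is a deliberately simplified stand-in for the REBMIX workflow, with simulated stress-range-like data:

```python
import math
import random

def aic_normal(x):
    """AIC of a Normal fit (closed-form MLE: sample mean and variance)."""
    n = len(x)
    mu = sum(x) / n
    var = sum((v - mu) ** 2 for v in x) / n
    ll = -0.5 * n * (math.log(2 * math.pi * var) + 1.0)
    return 2 * 2 - 2 * ll          # AIC = 2k - 2 ln L, k = 2 parameters

def aic_lognormal(x):
    """AIC of a Lognormal fit: Normal MLE on ln(x) plus the Jacobian term."""
    logs = [math.log(v) for v in x]
    n = len(logs)
    mu = sum(logs) / n
    var = sum((l - mu) ** 2 for l in logs) / n
    ll = -0.5 * n * (math.log(2 * math.pi * var) + 1.0) - sum(logs)
    return 2 * 2 - 2 * ll

# Simulated right-skewed "stress range" data (truly lognormal here)
rng = random.Random(7)
stress = [math.exp(rng.gauss(2.0, 0.8)) for _ in range(1000)]
scores = {"Normal": aic_normal(stress), "Lognormal": aic_lognormal(stress)}
best = min(scores, key=scores.get)
```

The candidate with the smallest AIC wins; extending the dictionary with further fitted families mirrors the paper's four-way comparison.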
Numerous clustering algorithms are valuable for pattern recognition in forest vegetation, with new ones continually being proposed. While some are well known, others are underutilized in vegetation science. This study compares the performance of practical iterative reallocation algorithms with model-based clustering algorithms on forest vegetation data from Virginia (United States), the Hyrcanian Forest (Asia), and European beech forests. Practical iterative reallocation algorithms were applied as non-hierarchical methods, and finite Gaussian mixture modeling was used as the model-based clustering method. Due to limitations on dimensionality in model-based clustering, principal coordinates analysis was employed to reduce the dataset's dimensions, and a log transformation was applied to the pseudo-species data to approach a normal distribution before calculating the Bray-Curtis dissimilarity. The findings indicate that reallocation of misclassified objects based on silhouette width (OPTSIL) with Flexible-β (-0.25) had the highest mean among the tested clustering algorithms, with Silhouette width 1 (REMOS1) with Flexible-β (-0.25) second, whereas model-based clustering performed poorly. Based on these results, OPTSIL with Flexible-β (-0.25) and REMOS1 with Flexible-β (-0.25) are recommended over model-based clustering for forest vegetation classification, particularly for the heterogeneous datasets common in forest vegetation community data.
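Silhouette width, on which the OPTSIL and REMOS1 reallocation criteria above are based, can be computed directly from pairwise distances. A small 1-D sketch with made-up points (real use would plug in Bray-Curtis dissimilarities):

```python
def silhouette_mean(points, labels):
    """Mean silhouette width for 1-D points with integer cluster labels."""
    n = len(points)
    scores = []
    for i in range(n):
        by_cluster = {}
        for j in range(n):
            if j == i:
                continue
            by_cluster.setdefault(labels[j], []).append(abs(points[i] - points[j]))
        own = by_cluster.get(labels[i])
        if not own:                       # singleton cluster: define s = 0
            scores.append(0.0)
            continue
        a = sum(own) / len(own)           # mean distance within own cluster
        b = min(sum(d) / len(d)           # nearest other cluster
                for lab, d in by_cluster.items() if lab != labels[i])
        scores.append((b - a) / max(a, b))
    return sum(scores) / n

pts = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
good = silhouette_mean(pts, [0, 0, 0, 1, 1, 1])   # matches the two groups
bad  = silhouette_mean(pts, [0, 1, 0, 1, 0, 1])   # deliberately scrambled
```

A partition that respects the gap scores near 1, while the scrambled labeling scores near or below 0; OPTSIL-style reallocation simply moves objects whenever the move raises this mean.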
Cyber losses, in terms of the number of records breached in cyber incidents, commonly feature a significant portion of zeros and distinct characteristics of mid-range and large losses, which make it hard to model the whole range of losses with a standard loss distribution. We tackle this modeling problem by proposing a three-component spliced regression model that can simultaneously model zeros, moderate losses, and large losses, and that allows heterogeneous effects across mixture components. To apply our proposed model to the Privacy Rights Clearinghouse (PRC) data breach chronology, we segment geographical groups using unsupervised cluster analysis, and utilize a covariate-dependent probability for zero losses, finite mixture distributions for the moderate body, and an extreme value distribution for large losses, capturing the heavy-tailed nature of the loss data. Parameters and coefficients are estimated with the Expectation-Maximization (EM) algorithm. Combined with our frequency model (a generalized linear mixed model) for data breaches, aggregate loss distributions are investigated, and applications to cyber insurance pricing and risk management are discussed.
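The three-component spliced structure above (zero mass, finite-mixture body, heavy tail) can be illustrated with a single-component lognormal body and a Pareto tail; all parameter values below are hypothetical, and the real model uses a mixture body and covariate-dependent weights:

```python
import math

def lognorm_cdf(x, mu, sigma):
    """Lognormal CDF via the error function (stdlib only)."""
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))

def spliced_cdf(x, p0, p1, mu, sigma, u, alpha):
    """CDF of a three-part spliced loss model: a point mass at zero (weight p0),
    a lognormal body truncated to (0, u] (weight p1), and a Pareto tail above
    the splice point u (weight p2 = 1 - p0 - p1)."""
    p2 = 1.0 - p0 - p1
    if x < 0.0:
        return 0.0
    cdf = p0                                        # point mass at zero
    if x > 0.0:                                     # truncated-lognormal body
        cdf += p1 * lognorm_cdf(min(x, u), mu, sigma) / lognorm_cdf(u, mu, sigma)
    if x > u:                                       # Pareto tail
        cdf += p2 * (1.0 - (u / x) ** alpha)
    return cdf

p0, p1, mu, sigma, u, alpha = 0.3, 0.5, 5.0, 1.2, 1.0e4, 1.5
at_zero = spliced_cdf(0.0, p0, p1, mu, sigma, u, alpha)     # mass of zeros
at_u    = spliced_cdf(u, p0, p1, mu, sigma, u, alpha)       # zeros + body
far_out = spliced_cdf(1e12, p0, p1, mu, sigma, u, alpha)    # tends to 1
```

Because each piece is renormalized to its own segment, the weights add to one by construction, which is what makes the spliced density a proper distribution.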
Normal Variance-Mean Mixtures (NVMM) provide a general framework for deriving models with desirable properties for modelling financial market variables such as exchange rates, equity prices, and interest rates measured over short time intervals, i.e. daily or weekly. Such data sets are characterized by non-normality: they are usually skewed and fat-tailed and exhibit excess kurtosis. The Generalised Hyperbolic distribution (GHD) introduced by Barndorff-Nielsen (1977), which arises as a Normal variance-mean mixture with a Generalised Inverse Gaussian (GIG) mixing distribution, nests a number of special and limiting case distributions. The Normal Inverse Gaussian (NIG) distribution is obtained when the Inverse Gaussian is the mixing distribution, i.e., when the index parameter of the GIG is -1/2. The NIG is very popular because of its analytical tractability. In the mixing mechanism, the mixing distribution characterizes the prior information of the random variable of the conditional distribution; considering finite mixture models is therefore one way of extending the work. The GIG is a three-parameter distribution and nests several special and limiting cases: when the index parameter is -1/2 it reduces to the Inverse Gaussian (IG) distribution, and other values of the index parameter yield further special and limiting cases related to the IG, called weighted inverse Gaussian distributions. In this work, we consider a finite mixture of two such weighted inverse Gaussian distributions, show that the mixture is also a weighted Inverse Gaussian distribution, and use it to construct an NVMM. Due to the complexity of the likelihood, direct maximization is difficult, so an EM-type algorithm is provided for maximum likelihood estimation of the parameters of the proposed model. We adopt an iterative scheme that is not based on an explicit solution of the normal equations; this subtle approach reduces the computational difficulty to designing an iterative scheme based on a representation of the normal equations. The algorithm is easily programmable, and we obtained monotonic convergence for the data sets used.
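The reduction of the GIG to the Inverse Gaussian at index -1/2, which the NIG construction above relies on, can be checked numerically using the closed form K_{1/2}(z) = sqrt(pi/(2z)) e^{-z} of the modified Bessel function. The GIG(lambda, delta, gamma) parameterization below is the common textbook one, an assumption on my part since the original formulas were lost to image placeholders:

```python
import math

def gig_pdf_half(x, delta, gamma):
    """GIG density with index lambda = -1/2:
    f(x) = (gamma/delta)^lambda / (2 K_lambda(delta*gamma))
           * x^(lambda-1) * exp(-(delta^2/x + gamma^2 x)/2),
    with the Bessel normalizer in closed form for |lambda| = 1/2."""
    z = delta * gamma
    k_half = math.sqrt(math.pi / (2.0 * z)) * math.exp(-z)
    return ((gamma / delta) ** -0.5 / (2.0 * k_half)
            * x ** -1.5 * math.exp(-0.5 * (delta ** 2 / x + gamma ** 2 * x)))

def ig_pdf(x, delta, gamma):
    """Inverse Gaussian density parameterized to match GIG(-1/2, delta, gamma):
    f(x) = delta / sqrt(2 pi x^3) * exp(-(gamma x - delta)^2 / (2x))."""
    return (delta / math.sqrt(2.0 * math.pi * x ** 3)
            * math.exp(-((gamma * x - delta) ** 2) / (2.0 * x)))

# The two densities agree pointwise, confirming the special case.
checks = [abs(gig_pdf_half(x, 2.0, 1.5) - ig_pdf(x, 2.0, 1.5))
          for x in (0.3, 1.0, 4.0)]
```

Expanding (gamma·x - delta)^2/(2x) gives gamma^2 x/2 - delta·gamma + delta^2/(2x), which is exactly the GIG exponent once the e^{delta·gamma} from the Bessel normalizer is absorbed.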
The particle Probability Hypothesis Density (particle-PHD) filter is a tractable approach to Random Finite Set (RFS) Bayes estimation, but it cannot directly derive target tracks. Most existing approaches add a data association step to solve this problem. This paper proposes an algorithm that does not need an association step. Our basic idea rests on the clustering algorithm of Finite Mixture Models (FMM): the intensity distribution is first derived by the particle-PHD filter, and then the clustering algorithm is applied to estimate the multitarget states and tracks jointly. The clustering process includes two steps, prediction and update; the key to the proposed algorithm is to use the predictions as the initial points and the convergent points as the estimates. Besides, Expectation-Maximization (EM) and Markov Chain Monte Carlo (MCMC) approaches are used for the FMM parameter estimation.
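The FMM clustering step above rests on EM for a Gaussian mixture. A minimal 1-D EM sketch, not the particle-PHD pipeline itself, with simulated two-target data; starting EM from predicted locations, as the paper does, corresponds to the mu1/mu2 initial values here:

```python
import math
import random

def em_gmm_1d(x, mu1, mu2, iters=100):
    """EM for a two-component 1-D Gaussian mixture (illustrative sketch)."""
    n = len(x)
    w, mu, var = [0.5, 0.5], [mu1, mu2], [1.0, 1.0]
    for _ in range(iters):
        # E-step: posterior responsibility of component 0 for each point
        r = []
        for v in x:
            p = [w[k] / math.sqrt(2 * math.pi * var[k])
                 * math.exp(-((v - mu[k]) ** 2) / (2 * var[k])) for k in (0, 1)]
            r.append(p[0] / (p[0] + p[1]))
        # M-step: re-estimate weights, means, and variances
        for k, rk in ((0, r), (1, [1 - t for t in r])):
            s = sum(rk)
            w[k] = s / n
            mu[k] = sum(t * v for t, v in zip(rk, x)) / s
            var[k] = sum(t * (v - mu[k]) ** 2 for t, v in zip(rk, x)) / s + 1e-9
    return w, mu, var

# Simulated intensity samples around two "targets" at -4 and +4
rng = random.Random(0)
data = ([rng.gauss(-4.0, 1.0) for _ in range(300)]
        + [rng.gauss(4.0, 1.0) for _ in range(300)])
w, mu, var = em_gmm_1d(data, -1.0, 1.0)
```

The converged component means are the joint state estimates; re-running with the next scan's predictions as initial points is what ties estimates across time into tracks.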
The classical risk process that is perturbed by diffusion is studied. Explicit expressions for the ruin probability and for the surplus distribution of the risk process at the time of ruin are obtained when the claim amount distribution is a finite mixture of exponential distributions or a Gamma(2, α) distribution.
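As background for the result above: for the unperturbed classical compound-Poisson risk process with a single exponential claim distribution, the ruin probability has a well-known closed form. This textbook special case, without the diffusion term or the exponential mixture, is sketched below:

```python
import math

def ruin_prob_exponential(u, lam, alpha, c):
    """Ruin probability for the classical compound-Poisson risk process with
    Exp(alpha) claims (mean 1/alpha), Poisson claim rate lam, premium rate c:
        psi(u) = (lam / (c * alpha)) * exp(-(alpha - lam / c) * u),
    valid under the net profit condition c > lam / alpha."""
    assert c > lam / alpha, "net profit condition violated"
    return (lam / (c * alpha)) * math.exp(-(alpha - lam / c) * u)

psi0 = ruin_prob_exponential(0.0, lam=1.0, alpha=1.0, c=1.25)  # 1/(1+theta) = 0.8
psi5 = ruin_prob_exponential(5.0, lam=1.0, alpha=1.0, c=1.25)
```

For a finite mixture of exponentials the ruin probability becomes a matching finite sum of such exponential terms, which is the structure the paper's explicit expressions extend to the diffusion-perturbed case.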
High-frequency financial data are characterized by non-normality: asymmetric, leptokurtic and fat-tailed behaviour. The normal distribution is therefore inadequate for capturing these characteristics, and various flexible distributions have been proposed. It is well known that mixture distributions produce flexible models with good statistical and probabilistic properties. In this work, a finite mixture of two special cases of the Generalized Inverse Gaussian distribution is constructed. Using this finite mixture as the mixing distribution of a Normal Variance Mean Mixture, we obtain a Normal Weighted Inverse Gaussian (NWIG) distribution. The second objective, therefore, is to construct and derive properties of the NWIG distribution. The maximum likelihood estimates of the proposed model's parameters are obtained via the EM algorithm, and three data sets are used for application. The results show that the proposed model is flexible and fits the data well.
This paper examines city growth patterns and the corresponding evolution of the city size distribution over long periods of time, using a simple New Economic Geography (NEG) model and urban population data from Canada. The main findings are twofold. First, there is a transition from sequential to parallel growth of cities: city growth follows a sequential mode in the stage of rapid urbanization, i.e., the cities with the best development conditions take the lead in growth, after which the cities with higher ranks become the fastest-growing; in the late stage of urbanization, city growth converges according to Gibrat's law and exhibits a parallel growth pattern. Second, the city size distribution has persistent structural characteristics: the city system self-organizes into multiple discrete size groups, city growth shows club convergence, and cities with similar development conditions eventually converge to similar sizes. The results will not only enhance our understanding of the urbanization process, but also provide a timely and clear policy reference for promoting healthy urbanization in developing countries.
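A standard way to summarize a city size distribution like the one studied above is the rank-size (Zipf) regression, whose slope is near -1 when Gibrat-style growth has run long enough. A minimal sketch on an idealized city system (the sizes are synthetic, not the Canadian data):

```python
import math

def rank_size_slope(sizes):
    """OLS slope of log(size) on log(rank); a value near -1 indicates
    Zipf's law for the city size distribution."""
    s = sorted(sizes, reverse=True)
    xs = [math.log(r) for r in range(1, len(s) + 1)]
    ys = [math.log(v) for v in s]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# A perfect Zipf city system: population proportional to 1/rank
slope = rank_size_slope([1000.0 / r for r in range(1, 51)])
```

Departures from -1, or kinks in the rank-size plot, are the empirical signature of the discrete size groups and club convergence the paper reports.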
For plant-wide processes with multiple operating conditions, the multimode feature poses challenges to conventional monitoring techniques. To solve this problem, this paper provides a novel local component based principal component analysis (LCPCA) approach for monitoring the status of a multimode process. LCPCA does not require prior knowledge of mode division; it is based purely on the process data. Firstly, LCPCA divides the process data into multiple local components using a finite Gaussian mixture model (FGMM). Then, the posterior probability is calculated to determine which local component each sample belongs to. After that, the local component information (such as mean and standard deviation) is used to standardize each sample of the local component. Finally, the standardized samples of all local components are combined to train the PCA monitoring model, and the two monitoring statistics T^(2) and SPE are used for monitoring multimode processes. Through a numerical example and the Tennessee Eastman (TE) process, the monitoring results demonstrate that LCPCA outperforms conventional PCA and LNS-PCA in fault detection rate.
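The posterior-probability assignment and local standardization steps described above can be sketched for a 1-D Gaussian mixture; the two "operating modes" below are hypothetical, and the real method applies this per variable before training PCA:

```python
import math

def posterior_and_standardize(x, comps):
    """Assign a sample to the local component with the highest posterior
    probability under a 1-D Gaussian mixture, then standardize it with that
    component's own mean and standard deviation (the LCPCA-style pre-step).
    comps is a list of (weight, mean, std) tuples."""
    dens = [w / (s * math.sqrt(2 * math.pi))
            * math.exp(-((x - m) ** 2) / (2 * s * s))
            for w, m, s in comps]
    total = sum(dens)
    post = [d / total for d in dens]          # Bayes posterior per component
    k = max(range(len(comps)), key=lambda i: post[i])
    _, m, s = comps[k]
    return k, post[k], (x - m) / s            # mode index, posterior, z-score

# Two hypothetical operating modes: (weight, mean, std)
modes = [(0.5, 0.0, 1.0), (0.5, 10.0, 2.0)]
k, p, z = posterior_and_standardize(9.0, modes)
```

After this per-mode standardization, samples from different operating conditions share a common scale, which is what lets a single PCA model with T^(2) and SPE limits cover all modes.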
Two-parameter gamma distributions are widely used in liability theory, lifetime data analysis, financial statistics, and other areas. Finite mixtures of gamma distributions are their natural extension, and they are particularly useful when the population is suspected of heterogeneity. These distributions are successfully employed in various applications, but many researchers falsely believe that the maximum likelihood estimator of the mixing distribution is consistent. As with finite mixtures of normal distributions, the likelihood function under finite gamma mixtures is unbounded: each observed value leads to a global maximum that is irrelevant to the true distribution. We apply a seemingly negligible penalty to the likelihood according to the shape parameters in the fitted model, and show that this penalty restores the consistency of the likelihood-based estimator of the mixing distribution under finite gamma mixture models. We present simulation results to validate the consistency conclusion and give an example to illustrate the key points.
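The unboundedness described above is easy to demonstrate: fix a gamma component's mean at a single observation and let the shape grow, and the log-density at that point (hence the likelihood) diverges. The linear shape penalty below is only a stand-in for the paper's penalty, to show the capping effect:

```python
import math

def gamma_logpdf(x, shape, scale):
    """Log-density of the Gamma(shape, scale) distribution."""
    return ((shape - 1) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))

# Degenerate fits: hold the component mean at one observation x0 (scale = x0/k)
# and increase the shape k; by Stirling the density at x0 grows like sqrt(k).
x0 = 3.0
shapes = (1.0, 10.0, 100.0, 1000.0)
ll = [gamma_logpdf(x0, k, x0 / k) for k in shapes]

# A penalty on the shape parameter restores boundedness (illustrative form)
pen = [l - 0.1 * k for l, k in zip(ll, shapes)]
```

The unpenalized sequence keeps increasing with the shape, while the penalized one turns over, which is the mechanism behind the consistency result.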
In this paper, we propose a novel performance monitoring and fault detection method, based on modified structure analysis and globality and locality preserving (MSAGL) projection, for non-Gaussian processes with multiple operating conditions. By using locality preserving projection to analyze the embedded geometrical manifold and independent component analysis to extract non-Gaussian features, MSAGL preserves both the global and local structures of the data simultaneously. Furthermore, the tradeoff parameter of MSAGL is tuned adaptively to find the projection direction optimal for revealing the hidden structural information. The validity and effectiveness of this approach are illustrated by applying the proposed technique to the Tennessee Eastman process simulation under multiple operating conditions. The results demonstrate the advantages of the proposed method over conventional eigendecomposition-based monitoring methods.
Funding: supported by the National Key Research and Development Program of China (No. 2022YFD2200503-02).
Funding: supported in part by the Shenzhen Science & Technology Program (grant number JSGG20150512145714247), the State Key Program of National Natural Science of China (grant number 61331016), and the National Key Research Plan of China (grant number 2016YFC0500201-07).
Funding: jointly supported by the Fundamental Research Funds for the Central Universities (Grant No. xzy012023075) and the Zhejiang Engineering Research Center of Intelligent Urban Infrastructure (Grant No. IUI2023-YB-12).
Funding: financially supported by the vice chancellor for research and technology of Urmia University.
Abstract: Normal Variance-Mean Mixtures (NVMM) provide a general framework for deriving models with desirable properties for modelling financial market variables such as exchange rates, equity prices, and interest rates measured over short time intervals, i.e., daily or weekly. Such data sets are characterized by non-normality: they are usually skewed, fat-tailed, and exhibit excess kurtosis. The Generalised Hyperbolic distribution (GHD) introduced by Barndorff-Nielsen (1977), which is a normal variance-mean mixture with a Generalised Inverse Gaussian (GIG) mixing distribution, nests a number of special and limiting case distributions. The Normal Inverse Gaussian (NIG) distribution is obtained when the Inverse Gaussian is the mixing distribution, i.e., when the index parameter of the GIG is -1/2. The NIG is very popular because of its analytical tractability. In the mixing mechanism, the mixing distribution characterizes the prior information of the random variable of the conditional distribution; considering finite mixture models is therefore one way of extending the work. The GIG is a three-parameter distribution, denoted GIG(λ, δ, γ), and nests several special and limiting cases: when λ = -1/2 it reduces to the Inverse Gaussian (IG) distribution, and other choices of the index parameter yield further special cases. These distributions are related to the IG and are called weighted inverse Gaussian distributions. In this work, we consider a finite mixture of two such weighted inverse Gaussian distributions, show that the mixture is itself a weighted inverse Gaussian distribution, and use it to construct an NVMM. Because of the complexity of the likelihood, direct maximization is difficult, so an EM-type algorithm is provided for maximum likelihood estimation of the parameters of the proposed model. We adopt an iterative scheme that is not based on an explicit solution to the normal equations; this subtle approach reduces the computational difficulty from solving the complicated quantities directly to designing an iterative scheme based on a representation of the normal equations. The algorithm is easily programmable, and we obtained monotonic convergence for the data sets used.
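The variance-mean mixing construction described above can be sampled directly: X = μ₀ + βZ + √Z·ε with ε ~ N(0,1) and Z from the mixing distribution. A minimal sketch with IG mixing (which yields the NIG) follows; the IG sampler is the standard Michael-Schucany-Haas method, and all parameter values are illustrative.

```python
import math
import random

def sample_ig(mu, lam, rng):
    """Inverse Gaussian IG(mu, lam) sampler (Michael-Schucany-Haas)."""
    y = rng.gauss(0.0, 1.0) ** 2
    x = (mu + mu * mu * y / (2 * lam)
         - (mu / (2 * lam)) * math.sqrt(4 * mu * lam * y + (mu * y) ** 2))
    return x if rng.random() <= mu / (mu + x) else mu * mu / x

def sample_nvmm(mu0, beta, mu_ig, lam, rng):
    """Normal variance-mean mixture X = mu0 + beta*Z + sqrt(Z)*N(0,1),
    with Z ~ IG(mu_ig, lam); this construction yields the NIG law."""
    z = sample_ig(mu_ig, lam, rng)
    return mu0 + beta * z + math.sqrt(z) * rng.gauss(0.0, 1.0)

rng = random.Random(42)
xs = [sample_nvmm(0.0, 1.0, 1.0, 2.0, rng) for _ in range(20000)]
sample_mean = sum(xs) / len(xs)
# E[X] = mu0 + beta * E[Z] = 0 + 1 * 1 = 1 for this parameter choice.
```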
Funding: Supported by the National Key Fundamental Research & Development Program of China (2007CB11006) and the Zhejiang Natural Science Foundation (R106745, Y1080422).
Abstract: The particle Probability Hypothesis Density (particle-PHD) filter is a tractable approach to Random Finite Set (RFS) Bayes estimation, but it cannot directly derive target tracks. Most existing approaches add a data association step to solve this problem. This paper proposes an algorithm that does not need the association step. Our basic idea is based on the clustering algorithm of Finite Mixture Models (FMM). The intensity distribution is first derived by the particle-PHD filter, and then the clustering algorithm is applied to estimate the multitarget states and tracks jointly. The clustering process includes two steps: prediction and update. The key to the proposed algorithm is to use the predictions as the initial points and the convergent points as the estimates. In addition, Expectation-Maximization (EM) and Markov Chain Monte Carlo (MCMC) approaches are used for the FMM parameter estimation.
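The FMM clustering step applied to the particle cloud amounts to EM for a Gaussian mixture. A minimal 1-D sketch follows; the data are synthetic, and the prediction-based initialization described above is replaced here by simple quantile seeds.

```python
import math
import random

def em_gmm(xs, k=2, iters=50):
    """Minimal 1-D EM for a k-component Gaussian mixture."""
    xs = sorted(xs)
    n = len(xs)
    mus = [xs[(2 * j + 1) * n // (2 * k)] for j in range(k)]  # quantile seeds
    sigs = [1.0] * k
    ws = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in xs:
            ps = [w * math.exp(-(x - m) ** 2 / (2 * s * s)) / s
                  for w, m, s in zip(ws, mus, sigs)]
            tot = sum(ps)
            resp.append([p / tot for p in ps])
        # M-step: reweight, recentre, rescale each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            ws[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            sigs[j] = math.sqrt(sum(r[j] * (x - mus[j]) ** 2
                                    for r, x in zip(resp, xs)) / nj) or 1e-6
    return ws, mus, sigs

random.seed(1)
cloud = ([random.gauss(0.0, 1.0) for _ in range(300)]
         + [random.gauss(5.0, 1.0) for _ in range(300)])
ws, mus, sigs = em_gmm(cloud)  # component means recover the two targets
```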
Abstract: The classical risk process perturbed by diffusion is studied. Explicit expressions for the ruin probability and for the surplus distribution of the risk process at the time of ruin are obtained when the claim amount distribution is a finite mixture of exponential distributions or a Gamma(2, α) distribution.
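For orientation, the unperturbed classical model with a single exponential claim distribution already has a well-known closed-form ruin probability, which expressions for mixtures of exponentials generalize. A quick numeric check of that baseline formula (the parameter values are illustrative):

```python
import math

def ruin_prob_exponential(u, lam, beta, c):
    """Cramér-Lundberg ruin probability for Exp(beta) claims (mean 1/beta),
    Poisson claim rate lam, premium rate c > lam/beta:
        psi(u) = (lam / (c * beta)) * exp(-(beta - lam / c) * u)."""
    return (lam / (c * beta)) * math.exp(-(beta - lam / c) * u)

# 10% safety loading: lam = 1, beta = 1, c = 1.1, so psi(0) = 1/1.1.
psi0 = ruin_prob_exponential(0.0, 1.0, 1.0, 1.1)
psi5 = ruin_prob_exponential(5.0, 1.0, 1.0, 1.1)
```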
Abstract: High-frequency financial data are characterized by non-normality: asymmetric, leptokurtic, and fat-tailed behaviour. The normal distribution is therefore inadequate for capturing these characteristics, and various flexible distributions have been proposed. It is well known that mixture distributions produce flexible models with good statistical and probabilistic properties. In this work, a finite mixture of two special cases of the Generalized Inverse Gaussian distribution is constructed. Using this finite mixture as the mixing distribution of a Normal Variance-Mean Mixture, we obtain a Normal Weighted Inverse Gaussian (NWIG) distribution. The second objective, therefore, is to construct the NWIG distribution and derive its properties. The maximum likelihood parameter estimates of the proposed model are obtained via the EM algorithm, and three data sets are used for application. The results show that the proposed model is flexible and fits the data well.
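The non-normality that motivates such models is usually diagnosed via sample skewness and excess kurtosis. A small sketch on synthetic data (lognormal draws stand in for skewed, fat-tailed returns; all numbers are illustrative):

```python
import math
import random

def skew_kurt(xs):
    """Sample skewness and excess kurtosis (a normal sample gives
    values near 0 and 0)."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

random.seed(7)
heavy = [math.exp(random.gauss(0.0, 0.8)) for _ in range(5000)]  # skewed, fat-tailed
normal = [random.gauss(0.0, 1.0) for _ in range(5000)]

skew_h, kurt_h = skew_kurt(heavy)
skew_n, kurt_n = skew_kurt(normal)
```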
Funding: Under the auspices of the Key Program of the Chinese Academy of Sciences (No. KZZD-EW-06-01).
Abstract: This paper examines city growth patterns and the corresponding evolution of the city size distribution over long periods of time, using a simple New Economic Geography (NEG) model and urban population data from Canada. The main findings are twofold. First, there is a transition from sequential to parallel growth of cities over long periods of time: city growth follows a sequential mode in the stage of rapid urbanization, i.e., the cities with the best development conditions take the lead in growth, after which the cities with the next-highest ranks become the fastest-growing; in the late stage of urbanization, city growth converges according to Gibrat's law and exhibits a parallel growth pattern. Second, the city size distribution is found to have persistent structural characteristics: the city system self-organizes into multiple discrete size groups; city growth shows club-convergence characteristics, and cities with similar development conditions eventually converge to similar sizes. These results not only enhance our understanding of the urbanization process, but also provide a timely and clear policy reference for promoting healthy urbanization in developing countries.
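Gibrat's law, the benchmark for the parallel-growth stage, says that each city's growth rate is independent of its current size, so log sizes diffuse with variance growing linearly in time. A minimal simulation sketch (city counts, horizons, and shock sizes are illustrative):

```python
import math
import random

def gibrat(sizes, periods, sigma=0.05):
    """Proportionate growth: each city's size is multiplied by an i.i.d.
    lognormal shock independent of its current size (Gibrat's law)."""
    for _ in range(periods):
        sizes = [s * math.exp(random.gauss(0.0, sigma)) for s in sizes]
    return sizes

random.seed(3)
start = [100.0] * 500                      # 500 identical cities
grown = gibrat(start, periods=200)

logs = [math.log(s) for s in grown]
mean_log = sum(logs) / len(logs)
var_log = sum((x - mean_log) ** 2 for x in logs) / len(logs)
# Under Gibrat, log sizes are ~ normal with variance growing linearly
# in time: here roughly 200 * 0.05**2 = 0.5.
```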
Funding: National Natural Science Foundation of China (61673279).
Abstract: For plant-wide processes with multiple operating conditions, the multimode feature poses challenges to conventional monitoring techniques. To solve this problem, this paper provides a novel local component based principal component analysis (LCPCA) approach for monitoring the status of a multimode process. LCPCA requires no prior knowledge of the mode division; it is based purely on the process data. First, LCPCA divides the process data into multiple local components using a finite Gaussian mixture model (FGMM). Then, the posterior probability is calculated to determine which local component each sample belongs to. After that, the local component information (such as its mean and standard deviation) is used to standardize each sample of that local component. Finally, the standardized samples of all local components are combined to train the PCA monitoring model. Based on this PCA model, two monitoring statistics, T^(2) and SPE, are used for monitoring multimode processes. Through a numerical example and the Tennessee Eastman (TE) process, the monitoring results demonstrate that LCPCA outperforms conventional PCA and LNS-PCA in fault detection rate.
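The key preprocessing step of this kind of approach, assigning a sample to the local component with the highest posterior probability and z-scoring it with that component's statistics, can be sketched in 1-D (the mode parameters below are invented for illustration):

```python
import math

def local_standardize(x, weights, means, stds):
    """Assign x to the local Gaussian component with the highest
    posterior probability, then standardize it with that component's
    mean and standard deviation (1-D sketch of the LCPCA preprocessing)."""
    post = [w * math.exp(-(x - m) ** 2 / (2 * s * s)) / s
            for w, m, s in zip(weights, means, stds)]
    j = max(range(len(post)), key=post.__getitem__)  # most probable mode
    return (x - means[j]) / stds[j], j

# Two operating modes centred at 0 and 10 (illustrative numbers);
# a sample at 10.4 should be assigned to the second mode.
z, mode = local_standardize(10.4, [0.5, 0.5], [0.0, 10.0], [1.0, 2.0])
```

The pooled, standardized samples from every mode would then feed a single PCA model.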
Funding: Supported by grants from One Thousand Talents at Yunnan University and a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada (Grant No. RGPIN-2014-03743).
Abstract: Two-parameter gamma distributions are widely used in reliability theory, lifetime data analysis, financial statistics, and other areas. Finite mixtures of gamma distributions are their natural extensions, and they are particularly useful when the population is suspected of heterogeneity. These distributions are successfully employed in various applications, but many researchers falsely believe that the maximum likelihood estimator of the mixing distribution is consistent. As with finite mixtures of normal distributions, the likelihood function under finite gamma mixtures is unbounded: each observed value can generate a global maximum that is irrelevant to the true distribution. We apply a seemingly negligible penalty to the likelihood according to the shape parameters in the fitted model, and we show that this penalty restores the consistency of the likelihood-based estimator of the mixing distribution under finite gamma mixture models. We present simulation results to validate the consistency conclusion, and we give an example to illustrate the key points.
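The unboundedness is easy to see numerically: fix a gamma component's mean at an observed value x and let the shape parameter grow; the density at x then diverges (like the square root of the shape, by Stirling's approximation), which is exactly the degeneracy a penalty on the shape parameters suppresses. A quick sketch:

```python
import math

def gamma_logpdf(x, shape, mean):
    """Log-density of Gamma(shape, scale = mean/shape), i.e. the mean is
    held fixed while the shape varies."""
    scale = mean / shape
    return ((shape - 1) * math.log(x) - x / scale
            - shape * math.log(scale) - math.lgamma(shape))

x = 2.0
# Density at the observation grows without bound as shape -> infinity
# when the component mean is pinned to that observation.
log_densities = [gamma_logpdf(x, k, mean=x) for k in (1.0, 10.0, 100.0, 10000.0)]
```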
Abstract: In this paper, we propose a novel performance monitoring and fault detection method based on modified structure analysis and globality and locality preserving (MSAGL) projection for non-Gaussian processes with multiple operating conditions. By using locality preserving projection to analyze the embedded geometrical manifold and independent component analysis to extract the non-Gaussian features, MSAGL preserves both the global and local structures of the data simultaneously. Furthermore, the trade-off parameter of MSAGL is tuned adaptively to find the projection direction optimal for revealing the hidden structural information. The validity and effectiveness of this approach are illustrated by applying the proposed technique to the Tennessee Eastman process simulation under multiple operating conditions. The results demonstrate the advantages of the proposed method over conventional eigendecomposition-based monitoring methods.