This study developed a hierarchical Bayesian (HB) model for local and regional flood frequency analysis in the Dongting Lake Basin, China. The annual maximum daily flows from 15 streamflow-gauged sites in the study area were analyzed with the HB model. The generalized extreme value (GEV) distribution was selected as the extreme flood distribution, and the GEV location and scale parameters were spatially modeled through a regression approach with the drainage area as a covariate. The Markov chain Monte Carlo (MCMC) method with Gibbs sampling was employed to calculate the posterior distribution in the HB model. The results showed that the proposed HB model provided satisfactory Bayesian credible intervals for flood quantiles, while the traditional delta method could not provide reliable uncertainty estimates for large flood quantiles, because its lower confidence bounds tended to decrease as the return periods increased. Furthermore, the HB model for regional analysis relaxed some restrictive assumptions of the traditional index flood method, such as the homogeneous region assumption and the scale invariance assumption. The HB model can also provide an uncertainty band for flood quantile prediction at a poorly gauged or ungauged site, whereas the index flood method with L-moments does not quantify this uncertainty directly. Therefore, the HB model is an effective method for implementing a flexible local and regional frequency analysis scheme and for quantifying the associated predictive uncertainty.
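As a rough illustration of the quantities in the flood frequency abstract, the sketch below computes a T-year flood quantile from GEV parameters whose location and scale depend on drainage area. The GEV quantile formula is standard. The log-linear link in `site_parameters`, its coefficient names, and the numeric values are assumptions made only for illustration and are not taken from the study.

```python
import math


def gev_return_level(mu, sigma, xi, T):
    """Flood quantile exceeded on average once every T years under GEV(mu, sigma, xi)."""
    if sigma <= 0 or T <= 1:
        raise ValueError("sigma must be positive and T must exceed 1")
    y = -math.log(1.0 - 1.0 / T)  # minus log of the non-exceedance probability
    if abs(xi) < 1e-9:
        # Gumbel limit of the GEV as the shape parameter tends to zero
        return mu - sigma * math.log(y)
    return mu + sigma / xi * (y ** (-xi) - 1.0)


def site_parameters(area_km2, a_mu, b_mu, a_sigma, b_sigma):
    """Hypothetical log-linear regression of GEV location and scale on drainage area."""
    log_area = math.log(area_km2)
    mu = math.exp(a_mu + b_mu * log_area)
    sigma = math.exp(a_sigma + b_sigma * log_area)
    return mu, sigma


# Made-up coefficients for a 5000 km^2 catchment (illustration only).
mu, sigma = site_parameters(5000.0, a_mu=1.0, b_mu=0.8, a_sigma=0.5, b_sigma=0.7)
q100 = gev_return_level(mu, sigma, xi=0.1, T=100)
print(f"100-year flood quantile: {q100:.1f}")
```

In a hierarchical Bayesian analysis this function would be evaluated for every posterior draw of the parameters, and the credible interval for the quantile would be read from the resulting sample.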
Small area estimation (SAE) tackles the problem of providing reliable estimates for small areas, i.e., subsets of the population for which sample information is not sufficient to warrant the use of a direct estimator. The hierarchical Bayesian approach to SAE problems offers several advantages over traditional SAE models, including the ability to account appropriately for the type of surveyed variable. In this paper, a number of model specifications for estimating small area counts are discussed and their relative merits are illustrated. We conducted a simulation study by reproducing, in a simplified form, the Italian Labour Force Survey and taking the Local Labour Markets as target areas. Simulated data were generated by assuming that the population characteristics of interest and the survey sampling design were known. In one set of experiments, employment and unemployment counts from census data were used; in others, population characteristics were varied. Results show persistent model failures for some standard Fay-Herriot specifications and for generalized linear Poisson models with a (log-)normal sampling stage, whilst unmatched and non-normal sampling stage models perform best in terms of bias, accuracy and reliability. The study also found, however, that every model performs noticeably better when sampling variances are treated as stochastic rather than assumed known, as is the general practice. Moreover, we address the issue of model determination to point out the limits and possible pitfalls of commonly used criteria for model selection and checking in the SAE context.
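For readers unfamiliar with the Fay-Herriot specification mentioned in the small area estimation abstract, the following minimal sketch shows the textbook area-level shrinkage step. It assumes the sampling variance is known, which is exactly the convention the abstract reports improving on. The function and argument names are illustrative and do not come from the paper.

```python
def fay_herriot_shrinkage(direct, synthetic, sampling_var, model_var):
    """Combine a direct survey estimate with a regression-synthetic estimate.

    direct       : direct estimate y_i for the area
    synthetic    : regression prediction x_i' beta for the area
    sampling_var : sampling variance D_i of the direct estimate (assumed known here)
    model_var    : between-area model variance A
    Returns the shrinkage estimate and the weight gamma_i = A / (A + D_i).
    """
    if sampling_var < 0 or model_var < 0 or sampling_var + model_var <= 0:
        raise ValueError("variances must be non-negative and not both zero")
    gamma = model_var / (model_var + sampling_var)
    return gamma * direct + (1.0 - gamma) * synthetic, gamma


# A noisy direct estimate is pulled halfway toward the synthetic one.
estimate, weight = fay_herriot_shrinkage(direct=10.0, synthetic=20.0,
                                         sampling_var=4.0, model_var=4.0)
print(estimate, weight)
```

The larger the sampling variance of an area, the smaller its weight, so areas with little sample information borrow more strength from the regression.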
This study investigates the variations associated with lateral lane locations and days of the week in the stochastic and dynamic transition of traffic regimes (DTTR). In the proposed analysis, hierarchical regression models fitted in a Bayesian framework were used to calibrate the transition probabilities that describe the DTTR. Datasets from two sites on a freeway facility in Jacksonville, Florida, were selected for the analysis. The traffic speed thresholds that define traffic regimes were estimated using the Gaussian mixture model (GMM). The GMM revealed that two and three regimes were adequate mixture components for estimating the traffic speed distributions for the Site 1 and Site 2 datasets, respectively. The results of the hierarchical regression models provide considerable evidence of heterogeneity in the DTTR associated with lateral lane locations. In particular, the hierarchical regressions reveal that the breakdown process is more affected by these variations than the other evaluated transition processes, with an estimated intra-class correlation (ICC) of about 73%. In the three-regime model, the transition from congestion onset/dissolution (COD) to the congested regime has the highest ICC, 49.4%, and the transition from the congested to the COD regime has the lowest, 1%. On the other hand, days of the week were not found to contribute to the variations in the DTTR (the highest ICC was 1.44%). These findings can be used in developing effective congestion countermeasures, particularly in applications of intelligent transportation systems such as dynamic lane-management strategies.
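The intra-class correlation reported in the traffic regime abstract is the share of total variance attributable to the grouping factor (here, lane location or day of the week). The sketch below shows that ratio. The abstract does not state how the within-group variance was defined, so the logistic latent-scale constant included here is only one common convention and is an assumption.

```python
import math

# Common convention for multilevel logistic models: residual variance on the
# latent scale is pi^2 / 3. Whether the study used it is not stated.
LOGISTIC_RESIDUAL_VAR = math.pi ** 2 / 3


def intraclass_correlation(var_between, var_within):
    """Fraction of total variance that lies between groups."""
    total = var_between + var_within
    if var_between < 0 or var_within < 0 or total <= 0:
        raise ValueError("variances must be non-negative and not both zero")
    return var_between / total


# A group-level variance three times the residual variance gives an ICC of 0.75.
print(intraclass_correlation(3.0, 1.0))
```

An ICC near zero, as reported for days of the week, means the grouping explains almost none of the variation, while a value near 0.73 means most of the variation sits between groups.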
A concept map is a diagram depicting relationships among concepts, and it is used as a knowledge representation tool in many knowledge domains. In this paper, we build on the modeling framework of Hui et al. (2008) to develop a concept map suitable for testing the empirical evidence for theories. We identify a theory by a set of core tenets, each asserting that one set of independent variables affects one dependent variable; moreover, every variable can have several operational definitions. Data consist of a selected sample of scientific articles from the empirical literature on the theory under investigation. Our "tenet map" is more complex than the original version in several respects. First, the links are two-layer: first-layer links connect variables that are related in the test of the theory at issue; second-layer links represent connections that are found to be statistically significant. Second, the matrix of link-formation probabilities of either layer is block-symmetric. Third, in addition to a form of censoring that resembles the pruning step of Hui et al., observed maps are subject to a further censoring related to second-layer links. Fourth, we perform a full Bayesian analysis instead of adopting the empirical Bayes approach. Lastly, we develop a three-stage model that accounts for dependence either of data or of parameters. The proposed methodology was motivated by the investigation of the empirical support for, and degree of consensus on, new economic theories of the firm. In this paper, the Transaction Cost Economics view is tested by a tenet map analysis. Both the two-stage and the multilevel models identify the same tenets as the most corroborated by empirical evidence, though the latter provides a more comprehensive and complex insight into the relationships between constructs.
Computations involved in the Bayesian approach to practical model selection problems are usually very difficult. Computational simplifications are sometimes possible, but are not generally applicable. There is a large literature on an information-theoretic methodology called Minimum Description Length (MDL). This paper describes how many of these techniques are either directly Bayesian in nature or very good objective approximations to Bayesian solutions. First, connections between the Bayesian approach and MDL are explored theoretically; thereafter, a few illustrations are provided to describe how MDL can give useful computational simplifications.
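One well-known point of contact between MDL and Bayesian model selection is the two-part code length, negative maximized log-likelihood plus (k/2) log n, which equals half the BIC and so approximates the negative log marginal likelihood for large n. The toy example below is not taken from the paper. It is a sketch that compares a fixed fair-coin model (k = 0) with a free-probability Bernoulli model (k = 1), and the function names are illustrative.

```python
import math


def _count_times_log(count, prob):
    """count * log(prob), with the convention 0 * log(0) = 0."""
    return 0.0 if count == 0 else count * math.log(prob)


def fixed_coin_length(heads, n):
    """Description length in nats under a fair coin; no parameter is encoded."""
    return -n * math.log(0.5)


def free_coin_length(heads, n):
    """Two-part description length: data given p_hat, plus (k/2) log n with k = 1."""
    p_hat = heads / n
    neg_log_lik = -(_count_times_log(heads, p_hat)
                    + _count_times_log(n - heads, 1.0 - p_hat))
    return neg_log_lik + 0.5 * math.log(n)


for heads in (50, 90):
    fixed = fixed_coin_length(heads, 100)
    free = free_coin_length(heads, 100)
    choice = "fair coin" if fixed <= free else "free probability"
    print(f"{heads}/100 heads: fixed={fixed:.2f}, free={free:.2f}, shorter code: {choice}")
```

With balanced data the extra parameter is not worth its coding cost, while with strongly unbalanced data the free model yields the shorter description, mirroring what a Bayes factor would favor.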
This paper deals with the statistical modeling of latent topic hierarchies in text corpora. The height of the topic tree is assumed to be fixed, while the number of topics on each level is unknown a priori and is inferred from data. Taking a nonparametric Bayesian approach to this problem, we propose a new probabilistic generative model based on the nested hierarchical Dirichlet process (nHDP) and present a Markov chain Monte Carlo sampling algorithm for inferring the topic tree structure as well as the word distribution of each topic and the topic distribution of each document. Our theoretical analysis and experimental results show that this model can produce a more compact hierarchical topic structure and capture more fine-grained topic relationships than the hierarchical latent Dirichlet allocation model.
Piecewise linear regression models are very flexible models for data. When a piecewise linear regression model is fitted to data, its parameters are generally unknown. This paper studies the problem of parameter estimation for piecewise linear regression models. The parameters are estimated with a Bayesian method, but the Bayes estimator cannot be found analytically. To overcome this problem, the reversible jump MCMC (Markov chain Monte Carlo) algorithm is proposed. The reversible jump MCMC algorithm generates a Markov chain that converges to the posterior distribution of the parameters of the piecewise linear regression model as its limiting distribution. The resulting Markov chain is used to calculate the Bayes estimator of the parameters of the piecewise linear regression model.
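Understanding how environmental filtering, biotic interactions, and neutral processes jointly shape species distributions and community structure is a central question in modern community ecology. However, traditional diversity indices, ordination analyses, and single-species distribution models (SDMs) cannot simultaneously integrate multidimensional information such as interspecific associations, environmental gradients, traits, and phylogeny, which limits their ability to resolve community assembly mechanisms. Joint species distribution models (JSDMs), and in particular the hierarchical modelling of species communities (HMSC) framework, provide a unified and flexible Bayesian tool for mechanistic inference at the community scale. This paper systematically reviews the statistical structure, mathematical principles, and inference mechanisms of HMSC, and builds a complete analysis workflow that runs from data organization, model specification, Markov chain Monte Carlo (MCMC) estimation, and model evaluation to ecological interpretation and prediction. Using bryophyte community data, we also wrote a companion "Step-by-Step Tutorial on Applying the Joint Species Distribution Model HMSC", which uses step-by-step explanations and runnable R code to help researchers quickly master the practical application of the method. On the theoretical side, the paper clarifies how HMSC integrates environmental gradients, species traits, phylogenetic relationships, and spatial structure within a unified Bayesian hierarchical framework, thereby separating the statistical signals of environmental filtering, biotic filtering, and dispersal limitation. On the methodological side, by analyzing the mathematical structure of the latent variable model, the paper clarifies the limits of residual correlations in ecological interpretation, which provides a theoretical basis for understanding species co-occurrence signals and for distinguishing environmental effects from unobserved factors; it also compares the advantages and applicability of HMSC with those of other mainstream JSDM tools and traditional community statistical methods. On the applied side, the paper reviews progress in forest, wetland, grassland, marine, urban, and microbial ecology, and shows the broad value of HMSC in conservation planning, invasive species risk assessment, co-occurrence network analysis, and scenario prediction; with the development of graphics processing unit acceleration, transfer learning, and large-scale high-dimensional data frameworks, HMSC can improve niche estimation and distribution prediction for rare species and make community modelling of hundreds of thousands of species possible. In summary, JSDMs and HMSC not only mark a methodological leap in ecological statistics from single-species prediction to the integration of multi-species, multidimensional information, but also provide an efficient, scalable tool platform that quantifies uncertainty for testing ecological theory, resolving community assembly mechanisms, and informing conservation decisions.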
A major challenge in spatial transcriptomics (ST) is resolving cellular composition, especially in technologies lacking single-cell resolution. The mixture of transcriptional signals within spatial spots complicates deconvolution and downstream analyses. To uncover the spatial heterogeneity of tissues, we introduce SvdRFCTD, a reference-free spatial transcriptomics deconvolution method that estimates the cell type proportions at each spot on the tissue. To fully capture the heterogeneity in the ST data, we combine SvdRFCTD with a Bayesian hierarchical negative binomial model with spatial effects incorporated in both the mean and the dispersion of the gene expression, which is used to explicitly model the generative mechanism of cell type proportions. By integrating spatial information and leveraging marker gene information, SvdRFCTD accurately estimates cell type proportions and uncovers complex spatial patterns. We demonstrate the ability of SvdRFCTD to identify cell types on simulated datasets. By applying SvdRFCTD to mouse brain and human pancreatic ductal adenocarcinoma datasets, we observe significant cellular heterogeneity within the tissue sections and successfully identify regions with high proportions of aggregated cell types, along with the spatial relationships between different cell types.
Yellow croaker (Larimichthys polyactis), one of the top four commercially important species in China, has two geographic subpopulations and has undergone profound changes during the last several decades. It is widely recognized that understanding its population dynamics is critically important for the sustainable management of this valuable fishery in China. The only two existing population dynamics models assessed the yellow croaker population using short time-series data, without considering geographical variations. In this study, Bayesian models with and without a hierarchical subpopulation structure were developed to explore the spatial heterogeneity of the population dynamics of yellow croaker from 1968 to 2015. Alternative hypotheses were constructed to test potential temporal patterns in the population dynamics. Substantial variations in population dynamics characteristics across space and time were found. The population growth rate was found to have increased since the late 1980s, and catchability more than doubled from 1981 to 2015. The East China Sea subpopulation grows faster but suffers from higher fishing pressure than the subpopulation in the Bohai Sea and Yellow Sea. According to the MSY-based reference points, the global population and both subpopulations have faced high risks of overfishing and of being overfished in recent years. More conservative management strategies that take subpopulations into account are imperative for the fishery management of yellow croaker in China. The methodology developed in this study could also be applied to the stock assessment and fishery management of other species, especially those with large spatial heterogeneity in their data.
Funding (Dongting Lake Basin flood frequency study): Supported by the National Natural Science Foundation of China (Grants No. 51779074 and 41371052), the Special Fund for the Public Welfare Industry of the Ministry of Water Resources of China (Grant No. 201501059), the National Key Research and Development Program of China (Grant No. 2017YFC0404304), the Jiangsu Water Conservancy Science and Technology Project (Grant No. 2017027), the Program for Outstanding Young Talents in Colleges and Universities of Anhui Province (Grant No. gxyq2018143), and the Natural Science Foundation of Wanjiang University of Technology (Grant No. WG18030).
Funding (latent topic hierarchy study): Project No. 60773180, supported by the National Natural Science Foundation of China.
Funding (SvdRFCTD spatial transcriptomics study): Supported by the Natural Science Foundation of China (grant numbers 12201219, 12271168, 12171229) and the Natural Science Foundation of Shanghai (grant numbers 23JS1400500, 23JS1400800, 22ZR1420500).
Funding (yellow croaker population dynamics study): The National Key R&D Program of China under contract No. 2017YFE0104400; the National Natural Science Foundation of China under contract No. 31772852; the Fundamental Research Funds for the Central Universities under contract Nos. 201512002 and 201562030.