A proper understanding of the global distribution of infectious diseases is an important part of disease management and policy making. However, the data are subject to complexities caused by heterogeneities across host classes and space-time epidemic processes. This paper proposes a Bayesian spatio-temporal model for modeling and mapping tuberculosis relative risks in space and time, and for identifying the risk factors associated with tuberculosis and the counties in Kenya with high tuberculosis relative risks. We used spatio-temporal Bayesian hierarchical models to study the pattern of tuberculosis relative risks in Kenya. Markov chain Monte Carlo methods, implemented via WinBUGS and R packages, were used for simulation and parameter estimation. The best-fitting model was selected using the Deviance Information Criterion proposed by Spiegelhalter and colleagues. Among the spatio-temporal models considered, the Knorr-Held models with space-time interaction types III and IV fit the data well, with type IV appearing better than type III. Tuberculosis risk varies among Kenyan counties, and counties with high tuberculosis relative risks cluster together. The prevalence of HIV is identified as the determinant of TB. We found clustering and heterogeneity of TB risk among high-rate counties, and the overall tuberculosis risk decreased slightly from 2002 to 2009. We propose that the Knorr-Held model with interaction type IV be used to model and map Kenyan tuberculosis relative risks. The space-time interaction of TB relative risk increases among rural counties that share boundaries with urban counties with high tuberculosis risk. This is due to the ability of the models to borrow strength from neighboring counties, such that nearby counties have similar risk. Although the approaches are less than ideal, we hope that our study provides a useful stepping stone in the development of spatial and spatio-temporal methodology for the statistical analysis of tuberculosis risk in Kenya.
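The model-selection step in the abstract above relies on the Deviance Information Criterion (DIC). As a minimal sketch, not the authors' code, the criterion can be computed from MCMC output as the posterior mean deviance plus the effective number of parameters. The function name and the numbers below are hypothetical; lower DIC indicates the preferred model.

```python
from statistics import mean


def dic(deviance_samples, deviance_at_posterior_mean):
    """Return (DIC, pD) from MCMC output.

    deviance_samples: deviance D(theta) evaluated at each posterior draw.
    deviance_at_posterior_mean: D(theta_bar), the deviance at the posterior
        mean of the parameters.
    """
    d_bar = mean(deviance_samples)              # posterior mean deviance
    p_d = d_bar - deviance_at_posterior_mean    # effective number of parameters
    return d_bar + p_d, p_d


# Hypothetical deviance values, for illustration only.
dic_value, p_d = dic([102.0, 98.0, 100.0], 97.0)
```

When comparing candidate space-time interaction structures, each model would be fitted separately and the one with the smallest DIC retained.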
Objective: To investigate the spatiotemporal patterns and socioeconomic factors influencing the incidence of tuberculosis (TB) in Guangdong Province between 2010 and 2019. Methods: Spatial and temporal variations in TB incidence were mapped using heat maps and hierarchical clustering. Socioenvironmental influencing factors were evaluated using a Bayesian spatiotemporal conditional autoregressive (ST-CAR) model. Results: The annual incidence of TB in Guangdong decreased from 91.85/100,000 in 2010 to 53.06/100,000 in 2019. Spatial hotspots were found in northeastern Guangdong, particularly in Heyuan, Shanwei, and Shantou, while Shenzhen, Dongguan, and Foshan had the lowest rates in the Pearl River Delta. The ST-CAR model showed that TB risk was lower with higher per capita gross domestic product (GDP) [relative risk (RR), 0.91; 95% confidence interval (CI): 0.86–0.98], a higher ratio of licensed physicians and physician (RR, 0.94; 95% CI: 0.90–0.98), and higher per capita public expenditure (RR, 0.94; 95% CI: 0.90–0.97), with a marginal effect of population density (RR, 0.86; 95% CI: 0.86–1.00). Conclusion: The incidence of TB in Guangdong varies spatially and temporally. Areas with poor economic conditions and insufficient healthcare resources are at an increased risk of TB infection. Strategies focusing on equitable health resource distribution and economic development are key to TB control.
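The relative risks quoted above are reported with 95% intervals. As a minimal sketch, assuming (as is common for count models of this kind, though the abstract does not say so) that the covariate effects are estimated on the log scale, a relative risk and its interval could be summarised from posterior draws of a coefficient as follows. The function name and the draws are hypothetical.

```python
import math
from statistics import median, quantiles


def relative_risk_summary(beta_draws):
    """Summarise posterior draws of a log-scale coefficient as a relative risk.

    Returns (median RR, 2.5% quantile, 97.5% quantile).
    """
    rr = sorted(math.exp(b) for b in beta_draws)
    # n=40 gives cut points at 2.5%, 5%, ..., 97.5%.
    cuts = quantiles(rr, n=40, method="inclusive")
    return median(rr), cuts[0], cuts[-1]


# Hypothetical draws, for illustration only; real MCMC output has thousands.
rr_med, rr_lo, rr_hi = relative_risk_summary([-0.2, -0.1, 0.0])
```

An RR below 1 with an upper bound below 1, such as the 0.91 (0.86–0.98) reported for per capita GDP, indicates lower risk as the covariate increases.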
This study developed a hierarchical Bayesian (HB) model for local and regional flood frequency analysis in the Dongting Lake Basin, China. The annual maximum daily flows from 15 streamflow-gauged sites in the study area were analyzed with the HB model. The generalized extreme value (GEV) distribution was selected as the extreme flood distribution, and the GEV location and scale parameters were spatially modeled through a regression approach with the drainage area as a covariate. The Markov chain Monte Carlo (MCMC) method with Gibbs sampling was employed to calculate the posterior distribution in the HB model. The results showed that the proposed HB model provided satisfactory Bayesian credible intervals for flood quantiles, whereas the traditional delta method could not provide reliable uncertainty estimates for large flood quantiles, because the lower confidence bounds tended to decrease as the return periods increased. Furthermore, the HB model for regional analysis relaxed some restrictive assumptions of the traditional index flood method, such as the homogeneous region assumption and the scale invariance assumption. The HB model can also provide an uncertainty band for flood quantile prediction at a poorly gauged or ungauged site, whereas the index flood method with L-moments does not provide this uncertainty directly. Therefore, the HB model is an effective method for implementing a flexible local and regional frequency analysis scheme and for quantifying the associated predictive uncertainty.
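The flood quantiles discussed above follow from the GEV distribution once its parameters are known. The sketch below is illustrative only and is not the authors' implementation: it evaluates the T-year flood quantile for given location, scale, and shape values. It assumes the sign convention in which a positive shape gives a heavy upper tail (some hydrology texts and libraries use the opposite sign), and the parameter values are hypothetical.

```python
import math


def gev_quantile(return_period, loc, scale, shape):
    """Flow exceeded on average once every `return_period` years."""
    if return_period <= 1:
        raise ValueError("return_period must be greater than 1")
    p = 1.0 - 1.0 / return_period       # non-exceedance probability
    y = -math.log(p)
    if abs(shape) < 1e-12:              # Gumbel limit as shape -> 0
        return loc - scale * math.log(y)
    return loc + scale / shape * (y ** (-shape) - 1.0)


def gev_cdf(x, loc, scale, shape):
    """GEV cumulative distribution function under the same convention."""
    z = (x - loc) / scale
    if abs(shape) < 1e-12:
        return math.exp(-math.exp(-z))
    t = 1.0 + shape * z
    if t <= 0.0:                        # x lies outside the support
        return 0.0 if shape > 0 else 1.0
    return math.exp(-t ** (-1.0 / shape))


# Hypothetical site parameters, for illustration only.
q100 = gev_quantile(100, loc=1000.0, scale=300.0, shape=0.1)
```

In a Bayesian analysis, applying `gev_quantile` to every posterior draw of the parameters and taking quantiles of the results gives a credible interval for the flood quantile.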
Small area estimation (SAE) tackles the problem of providing reliable estimates for small areas, i.e., subsets of the population for which sample information is not sufficient to warrant the use of a direct estimator. The hierarchical Bayesian approach to SAE problems offers several advantages over traditional SAE models, including the ability to account appropriately for the type of surveyed variable. In this paper, a number of model specifications for estimating small area counts are discussed and their relative merits are illustrated. We conducted a simulation study that reproduces the Italian Labour Force Survey in simplified form, taking the Local Labour Markets as target areas. Simulated data were generated by assuming that the population characteristics of interest and the survey sampling design were known. In one set of experiments, employment and unemployment counts from census data were used; in others, population characteristics were varied. The results show persistent model failures for some standard Fay-Herriot specifications and for generalized linear Poisson models with a (log-)normal sampling stage, whereas models with either an unmatched or a non-normal sampling stage perform best in terms of bias, accuracy and reliability. The study also found, however, that every model performs noticeably better when the sampling variances are treated as stochastic rather than assumed known, as is the general practice. Moreover, we address the issue of model determination and point out the limits and possible pitfalls of commonly used criteria for model selection and checking in the SAE context.
A concept map is a diagram depicting relationships among concepts, and is used as a knowledge representation tool in many knowledge domains. In this paper, we build on the modeling framework of Hui et al. (2008) to develop a concept map suitable for testing the empirical evidence for theories. We identify a theory by a set of core tenets, each asserting that one set of independent variables affects one dependent variable; moreover, every variable can have several operational definitions. The data consist of a selected sample of scientific articles from the empirical literature on the theory under investigation. Our "tenet map" is more complex than the original version in several respects. First, the links are two-layer: first-layer links connect variables that are related in the test of the theory at issue; second-layer links represent connections that are found to be statistically significant. Second, the matrix of link-formation probabilities for either layer is block-symmetric. Third, in addition to a form of censoring that resembles the Hui et al. pruning step, observed maps are subject to a further censoring related to second-layer links. Furthermore, we perform a full Bayesian analysis instead of adopting the empirical Bayes approach. Lastly, we develop a three-stage model that accounts for dependence either of data or of parameters. The proposed methodology was motivated by the investigation of the empirical support for, and degree of consensus on, new economic theories of the firm. In this paper, the Transaction Cost Economics view is tested by a tenet map analysis. Both the two-stage and the multilevel models identify the same tenets as the most corroborated by empirical evidence, though the latter provides more comprehensive and complex insight into the relationships between constructs.
This paper deals with the statistical modeling of latent topic hierarchies in text corpora. The height of the topic tree is assumed to be fixed, while the number of topics on each level is unknown a priori and is to be inferred from the data. Taking a nonparametric Bayesian approach to this problem, we propose a new probabilistic generative model based on the nested hierarchical Dirichlet process (nHDP) and present a Markov chain Monte Carlo sampling algorithm for inferring the topic tree structure as well as the word distribution of each topic and the topic distribution of each document. Our theoretical analysis and experimental results show that this model can produce a more compact hierarchical topic structure and capture more fine-grained topic relationships than the hierarchical latent Dirichlet allocation model.
This study investigates the variations associated with lateral lane locations and days of the week in the stochastic and dynamic transition of traffic regimes (DTTR). In the proposed analysis, hierarchical regression models fitted within a Bayesian framework were used to calibrate the transition probabilities that describe the DTTR. Datasets from two sites on a freeway facility located in Jacksonville, Florida, were selected for the analysis. The traffic speed thresholds that define traffic regimes were estimated using the Gaussian mixture model (GMM). The GMM revealed that two and three regimes were adequate mixture components for estimating the traffic speed distributions for the Site 1 and Site 2 datasets, respectively. The results of the hierarchical regression models show considerable evidence of heterogeneity in the DTTR associated with lateral lane locations. In particular, the hierarchical regressions reveal that the breakdown process is more affected by these variations than the other evaluated transition processes, with an estimated intra-class correlation (ICC) of about 73%. In the three-regime model, the transition from congestion on-set/dissolution (COD) to the congested regime has the highest ICC, 49.4%, and the transition from the congested to the COD regime has the lowest ICC, 1%. On the other hand, different days of the week are not found to contribute to the variations in the DTTR (the highest ICC was 1.44%). These findings can be used in developing effective congestion countermeasures, particularly in the application of intelligent transportation systems, such as dynamic lane-management strategies.
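The intra-class correlations reported above measure the share of total variance attributable to the grouping factor (lane location or day of the week). A minimal sketch follows, assuming the variance components have already been estimated. The abstract does not state which residual-variance convention the authors used; the latent-scale logistic value shown is a common choice for multilevel logistic models and is included here only as an assumption, and the numbers are hypothetical.

```python
import math

# Latent-scale residual variance often assumed for multilevel logistic models.
LOGISTIC_RESIDUAL_VAR = math.pi ** 2 / 3


def intraclass_correlation(group_var, residual_var):
    """Share of total variance attributable to the grouping factor."""
    if group_var < 0 or residual_var <= 0:
        raise ValueError("group_var must be >= 0 and residual_var must be > 0")
    return group_var / (group_var + residual_var)


# Hypothetical variance components, for illustration only.
icc_lane = intraclass_correlation(group_var=3.0, residual_var=1.0)
```

An ICC near 0, like the 1.44% reported for days of the week, means the grouping factor explains almost none of the variation.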
The computations involved in the Bayesian approach to practical model selection problems are usually very difficult. Computational simplifications are sometimes possible, but are not generally applicable. There is a large literature on an information-theoretic methodology called Minimum Description Length (MDL). This paper describes how many of these techniques are either directly Bayesian in nature or are very good objective approximations to Bayesian solutions. First, the connections between the Bayesian approach and MDL are explored theoretically; thereafter, a few illustrations are provided to show how MDL can give useful computational simplifications.
Piecewise linear regression models are very flexible models for data. When a piecewise linear regression model is fitted to data, its parameters are generally unknown. This paper studies the problem of parameter estimation for piecewise linear regression models. The parameters are estimated using the Bayesian method, but the Bayes estimator cannot be found analytically. To overcome this problem, the reversible jump MCMC (Markov chain Monte Carlo) algorithm is proposed. The reversible jump MCMC algorithm generates a Markov chain that converges to a limit distribution equal to the posterior distribution of the parameters of the piecewise linear regression model. The resulting Markov chain is used to calculate the Bayes estimator for the parameters of the piecewise linear regression model.
This paper presents an investigation into the spatio-temporal dynamics of Severe Acute Respiratory Syndrome (SARS) across the diverse health regions of Brazil from 2016 to 2024. Leveraging extensive datasets that include SARS cases, climate data, hospitalization records, and COVID-19 vaccination information, our study employs a Bayesian spatio-temporal generalized linear model to capture the intricate dependencies inherent in the data. The analysis reveals significant variations in the incidence of SARS cases over time, particularly within and between the pre-COVID-19, COVID-19, and post-COVID-19 eras. Our modeling approach accommodates explanatory variables such as humidity, temperature, and COVID-19 vaccine doses, providing a comprehensive understanding of the factors influencing SARS dynamics. Our modeling revealed unique temporal trends in SARS cases for each region, resembling neighborhood patterns. Low temperature and high humidity were linked to decreased cases, while in the COVID-19 era, temperature and vaccination coverage played significant roles. The findings contribute valuable insights into the spatial and temporal patterns of SARS in Brazil, offering a foundation for targeted public health interventions and preparedness strategies.
A major challenge in spatial transcriptomics (ST) is resolving cellular composition, especially in technologies lacking single-cell resolution. The mixture of transcriptional signals within spatial spots complicates deconvolution and downstream analyses. To uncover the spatial heterogeneity of tissues, we introduce SvdRFCTD, a reference-free spatial transcriptomics deconvolution method that estimates the cell type proportions at each spot on the tissue. To fully capture the heterogeneity in the ST data, we combine SvdRFCTD with a Bayesian hierarchical negative binomial model with spatial effects incorporated in both the mean and the dispersion of the gene expression, which is used to explicitly model the generative mechanism of cell type proportions. By integrating spatial information and leveraging marker gene information, SvdRFCTD accurately estimates cell type proportions and uncovers complex spatial patterns. We demonstrate the ability of SvdRFCTD to identify cell types on simulated datasets. By applying SvdRFCTD to mouse brain and human pancreatic ductal adenocarcinoma datasets, we observe significant cellular heterogeneity within the tissue sections and successfully identify regions with high proportions of aggregated cell types, along with the spatial relationships between different cell types.
Highway work zones are locations where severe traffic crashes tend to occur. Most of the extant research on work zone crash severity neglects the discrepancy in the injuries sustained by different drivers involved in the same crash. Admittedly, it is essential to analyse how crash-level factors affect the highest injury severity in a crash; but it is equally important to understand the driver-level factors contributing to injury severity in order to establish effective safety countermeasures that minimize drivers' injury severity. Thus, this research aims to identify the factors with significant impacts on driver injury severity in work zone crashes and to estimate their effects on each severity level. Data on 3880 drivers involved in 2134 work zone crashes were obtained from the Crash Report Sampling System (CRSS) database of the United States and used for the empirical investigation. A Bayesian hierarchical generalized ordered probit model is proposed for analysing driver injury severity. Model performance indices suggest that the proposed hierarchical model is superior to the generalized ordered probit model, and considerable within-crash correlation is found in the observed driver injury severity. The estimated parameters show that driver age and sex, alcohol use, vehicle age and type, speeding and speed limit, weather conditions, lighting conditions and crash type have significant effects on driver injury severity in work zone crashes. Marginal effects of the significant factors on each injury severity level are also estimated. Countermeasures are proposed on the basis of the results to reduce severe injuries sustained by drivers involved in work zone crashes.
As one of the top four commercially important species in China, yellow croaker (Larimichthys polyactis), which has two geographic subpopulations, has undergone profound changes during the last several decades. It is widely recognized that understanding its population dynamics is critically important for sustainable management of this valuable fishery in China. The only two existing population dynamics models assessed the yellow croaker population using short time-series data, without considering geographical variations. In this study, Bayesian models with and without hierarchical subpopulation structure were developed to explore the spatial heterogeneity of the population dynamics of yellow croaker from 1968 to 2015. Alternative hypotheses were constructed to test potential temporal patterns in yellow croaker's population dynamics. Substantial variations in population dynamics characteristics across space and time were found. The population growth rate was found to have increased since the late 1980s, and the catchability more than doubled from 1981 to 2015. The East China Sea subpopulation shows faster growth, but suffers from higher fishing pressure than that in the Bohai Sea and Yellow Sea. According to the MSY-based reference points, the global population and both subpopulations have had high risks of overfishing and of being overfished in recent years. More conservative management strategies that take subpopulations into account are imperative for the fishery management of yellow croaker in China. The methodology developed in this study could also be applied to the stock assessment and fishery management of other species, especially those with large spatial heterogeneity in their data.
Indirect approaches to the estimation of biomass factors are often applied to measure carbon flux in the forestry sector. An assumption underlying a country-level carbon stock estimate is the representativeness of these factors. Although intensive studies have been conducted to quantify biomass factors, each study typically covers a limited geographic area. The goal of this study was to employ a meta-analysis approach to develop regional biomass factors for Quercus mongolica forests in South Korea. The biomass factors of interest were the biomass conversion and expansion factor (BCEF), the biomass expansion factor (BEF) and the root-to-shoot ratio (RSR). Our objectives were to select the probability density functions (PDFs) that best fitted the three biomass factors and to quantify their means and uncertainties. A total of 12 scientific publications were selected as data sources based on a set of criteria. From these publications we chose 52 study sites spread across South Korea. The statistical model for the meta-analysis was a multilevel model with publication (data source) as the nesting factor, specified under the Bayesian framework. Gamma, Log-normal and Weibull PDFs were evaluated. The Log-normal PDF yielded the best quantitative and qualitative fit for the three biomass factors. However, a poor fit of the PDF to the long right tail of the observed BEF and RSR distributions was apparent. The median posterior estimates for the means (with 95% credible intervals) of BCEF, BEF and RSR across all 12 publications were 1.016 (0.800-1.299), 1.414 (1.304-1.560) and 0.260 (0.200-0.335), respectively. The Log-normal PDF proved useful for estimating the carbon stock of Q. mongolica forests on a regional scale and for uncertainty analysis based on Monte Carlo simulation.
Ore sorting is a preconcentration technology that can dramatically reduce energy and water usage and thereby improve the sustainability and profitability of a mining operation. In porphyry Cu deposits, Cu is the primary target, with ores usually containing secondary 'pay' metals such as Au and Mo, and gangue elements such as Fe and As. Secondary and deleterious materials vary in the type and strength of their correlation with Cu but, owing to sensing technology limitations, cannot be detected simultaneously via magnetic resonance (MR) ore sorting. Inferring the relationships between Cu and other elemental abundances is therefore particularly critical for mineral processing. The variations in metal grade relationships occur due to the transition into different geological domains. This raises two questions: how to define these geological domains, and how the metal grade relationship is influenced by them. In this paper, a linear relationship is assumed between Cu grade and other metal grades. We apply a Bayesian hierarchical (partial-pooling) model to quantify the linear relationships between Cu, Au, and Fe grades from geochemical bore core data. The hierarchical model was compared with two other models: a 'complete-pooling' model and a 'no-pooling' model. Mining blocks were split by spatial domain to construct the hierarchical model. The geochemical bore core data record metal grades measured by laboratory assay together with the spatial coordinates of the sample location. Two case studies from different porphyry Cu deposits were used to evaluate the performance of the hierarchical model. Markov chain Monte Carlo (MCMC) was used to sample the posterior parameters. Our results show that the Bayesian hierarchical model dramatically reduced the posterior predictive variance for metal grade regression compared to the no-pooling model. In addition, the posterior inference in the hierarchical model is insensitive to the choice of prior. The data are well represented in the posterior, which indicates a robust model. The results show that the spatial domain can be successfully utilised for metal grade regression. Uncertainty in estimating the relationship between pay metals and both secondary and gangue elements is quantified and shown to be reduced with partial pooling. Thus, the proposed Bayesian hierarchical model can offer a reliable and stable way to monitor the relationship between metal grades for ore sorting and other mineral processing options.
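The variance reduction attributed to partial pooling above can be illustrated with a deliberately simplified case. The sketch below is not the authors' regression model: it assumes a single group mean with known within-group and between-group variances (a normal-normal model), whereas the paper samples regression parameters with MCMC. All names and numbers are hypothetical.

```python
def partial_pool(group_mean, n_obs, within_var, grand_mean, between_var):
    """Posterior mean and variance of one group's mean under partial pooling.

    The estimate is a precision-weighted average of the group's own sample
    mean (the no-pooling estimate) and the grand mean (complete pooling).
    """
    precision_data = n_obs / within_var     # information from the group itself
    precision_prior = 1.0 / between_var     # information borrowed from other groups
    weight = precision_data / (precision_data + precision_prior)
    estimate = weight * group_mean + (1.0 - weight) * grand_mean
    posterior_var = 1.0 / (precision_data + precision_prior)
    return estimate, posterior_var


# Hypothetical domain with 4 samples, for illustration only.
estimate, posterior_var = partial_pool(
    group_mean=2.0, n_obs=4, within_var=1.0, grand_mean=0.0, between_var=0.25
)
no_pooling_var = 1.0 / 4    # within_var / n_obs
```

In this example the partially pooled variance (0.125) is half the no-pooling variance (0.25), and the estimate is shrunk halfway towards the grand mean. Groups with few samples are shrunk the most.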
The state of in situ stress is a crucial parameter in subsurface engineering, especially for critical projects such as nuclear waste repositories. As one of the two ISRM suggested methods, the overcoring (OC) method is widely used to estimate the full stress tensor in rock by independent regression analysis of the data from each OC test. However, such customary independent analysis of individual OC tests, known as no pooling, is liable to yield unreliable test-specific stress estimates because of the various sources of uncertainty involved in the OC method. To address this problem, a practical and no-cost solution is to incorporate into the OC data analysis the additional information contained in adjacent OC tests, which are usually available in OC measurement campaigns. Hence, this paper presents a Bayesian partial pooling (hierarchical) model for the combined analysis of adjacent OC tests. We performed five case studies using OC test data obtained at a nuclear waste repository research site in Sweden. The results demonstrate that partial pooling of adjacent OC tests does allow information to be borrowed across adjacent tests, and simultaneously yields improved stress tensor estimates with reduced uncertainties for all individual tests compared with analysing them independently (no pooling), particularly for tests whose no-pooling stress estimates are unreliable. A further model comparison shows that the partial pooling model also gives better predictive performance, and thus confirms that the information borrowed across adjacent OC tests is relevant and effective.
In genetic association studies of complex diseases, endo-phenotypes such as expression profiles, epigenetic data, or clinical intermediate phenotypes provide insight into the underlying biological path of the disease. In such situations, in order to establish the path from the gene to the disease, we have to decide whether the gene acts on the disease phenotype primarily through a specific endo-phenotype or whether the gene influences the disease through an unidentified path that is characterized by different intermediate phenotypes. Here, we address the question of whether a genetic locus, given its effect on an endo-phenotype, influences the trait of interest primarily through the path of the endo-phenotype. We propose a Bayesian approach that can evaluate the genetic association between the genetic locus and the phenotype of interest in the presence of the genetic effect on the endo-phenotype. Using simulation studies, we verify that our approach has the desired properties and compare it with a mediation approach. The proposed Bayesian approach is illustrated by an application to a genome-wide association study of childhood asthma (CAMP) that contains expression profiles.
Funding (Guangdong tuberculosis study): Supported by the Guangdong Provincial Clinical Research Center for Tuberculosis (No. 2020B1111170014).
Funding: Supported by the National Natural Science Foundation of China (Grants No. 51779074 and 41371052), the Special Fund for the Public Welfare Industry of the Ministry of Water Resources of China (Grant No. 201501059), the National Key Research and Development Program of China (Grant No. 2017YFC0404304), the Jiangsu Water Conservancy Science and Technology Project (Grant No. 2017027), the Program for Outstanding Young Talents in Colleges and Universities of Anhui Province (Grant No. gxyq2018143), and the Natural Science Foundation of Wanjiang University of Technology (Grant No. WG18030).
Abstract: This study developed a hierarchical Bayesian (HB) model for local and regional flood frequency analysis in the Dongting Lake Basin in China. The annual maximum daily flows from 15 streamflow-gauged sites in the study area were analyzed with the HB model. The generalized extreme value (GEV) distribution was selected as the extreme flood distribution, and the GEV location and scale parameters were spatially modeled through a regression approach with the drainage area as a covariate. The Markov chain Monte Carlo (MCMC) method with Gibbs sampling was employed to calculate the posterior distribution in the HB model. The results showed that the proposed HB model provided satisfactory Bayesian credible intervals for flood quantiles, while the traditional delta method could not provide reliable uncertainty estimates for large flood quantiles, because the lower confidence bounds tended to decrease as the return periods increased. Furthermore, the HB model for regional analysis allowed some restrictive assumptions of the traditional index flood method, such as the homogeneous region assumption and the scale invariance assumption, to be relaxed. The HB model can also provide an uncertainty band for flood quantile prediction at a poorly gauged or ungauged site, whereas the index flood method with L-moments does not express this uncertainty directly. Therefore, the HB model is an effective method for implementing a flexible local and regional frequency analysis scheme and for quantifying the associated predictive uncertainty.
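The flood quantiles discussed in this abstract follow from the GEV quantile function once location, scale, and shape values are available. The sketch below shows that calculation for a single parameter set, assuming the common convention in which a positive shape parameter gives a heavy upper tail; the function and argument names are hypothetical, and the regression of location and scale on drainage area described in the abstract is not reproduced here.

```python
import math


def gev_return_level(mu, sigma, xi, return_period):
    """Flood quantile exceeded on average once every `return_period` years.

    mu, sigma, xi: GEV location, scale (> 0) and shape parameters.
    Assumes the sign convention where xi > 0 means a heavy upper tail.
    """
    if sigma <= 0 or return_period <= 1:
        raise ValueError("need sigma > 0 and return_period > 1")
    # Non-exceedance probability of the annual maximum.
    p = 1.0 - 1.0 / return_period
    y = -math.log(p)
    if abs(xi) < 1e-12:
        # Gumbel limit as the shape parameter tends to zero.
        return mu - sigma * math.log(y)
    return mu + (sigma / xi) * (y ** (-xi) - 1.0)


# Applying the function to every posterior draw of (mu, sigma, xi) and
# taking percentiles of the results would give a credible interval
# for the quantile.
q100 = gev_return_level(mu=0.0, sigma=1.0, xi=0.0, return_period=100)
```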
Abstract: Small area estimation (SAE) tackles the problem of providing reliable estimates for small areas, i.e., subsets of the population for which sample information is not sufficient to warrant the use of a direct estimator. The hierarchical Bayesian approach to SAE problems offers several advantages over traditional SAE models, including the ability to account appropriately for the type of surveyed variable. In this paper, a number of model specifications for estimating small area counts are discussed and their relative merits are illustrated. We conducted a simulation study by reproducing the Italian Labour Force Survey in a simplified form and taking the Local Labour Markets as target areas. Simulated data were generated by assuming that the population characteristics of interest and the survey sampling design were known. In one set of experiments, employment and unemployment counts from census data were used; in others, population characteristics were varied. Results show persistent model failures for some standard Fay-Herriot specifications and for generalized linear Poisson models with a (log-)normal sampling stage, whilst models with either an unmatched or a nonnormal sampling stage perform best in terms of bias, accuracy and reliability. The study also found, however, that any model performs noticeably better when sampling variances are stochastically determined rather than assumed known, as is the general practice. Moreover, we address the issue of model determination to point out the limits and possible deceptions of commonly used criteria for model selection and checking in the SAE context.
Abstract: A concept map is a diagram depicting relationships among concepts and is used as a knowledge representation tool in many knowledge domains. In this paper, we build on the modeling framework of Hui et al. (2008) to develop a concept map suitable for testing the empirical evidence of theories. We identify a theory by a set of core tenets, each asserting that one set of independent variables affects one dependent variable; moreover, every variable can have several operational definitions. Data consist of a selected sample of scientific articles from the empirical literature on the theory under investigation. Our “tenet map” is more complex than the original version in several respects. First, the links are two-layer: first-layer links connect variables which are related in the test of the theory at issue; second-layer links represent connections which are found statistically significant. Second, either layer's matrix of link-formation probabilities is block-symmetric. In addition to a form of censoring which resembles the Hui et al. pruning step, observed maps are subject to a further censoring related to second-layer links. Furthermore, we perform a full Bayesian analysis instead of adopting the empirical Bayes approach. Lastly, we develop a three-stage model which accounts for dependence either of data or of parameters. The proposed methodology was motivated by the investigation of the empirical support for, and degree of consensus on, new economic theories of the firm. In this paper, the Transaction Cost Economics view is tested by a tenet map analysis. Both the two-stage and the multilevel models identify the same tenets as the most corroborated by empirical evidence, though the latter provides a more comprehensive and complex insight into relationships between constructs.
Funding: Project (No. 60773180) supported by the National Natural Science Foundation of China
Abstract: This paper deals with the statistical modeling of latent topic hierarchies in text corpora. The height of the topic tree is assumed to be fixed, while the number of topics on each level is unknown a priori and is inferred from the data. Taking a nonparametric Bayesian approach to this problem, we propose a new probabilistic generative model based on the nested hierarchical Dirichlet process (nHDP) and present a Markov chain Monte Carlo sampling algorithm for inferring the topic tree structure as well as the word distribution of each topic and the topic distribution of each document. Our theoretical analysis and experimental results show that this model can produce a more compact hierarchical topic structure and capture more fine-grained topic relationships than the hierarchical latent Dirichlet allocation model.
Abstract: This study investigates the variations associated with lateral lane locations and days of the week in the stochastic and dynamic transition of traffic regimes (DTTR). In the proposed analysis, hierarchical regression models fitted in a Bayesian framework were used to calibrate the transition probabilities that describe the DTTR. Datasets from two sites on a freeway facility located in Jacksonville, Florida, were selected for the analysis. The traffic speed thresholds that define traffic regimes were estimated using the Gaussian mixture model (GMM). The GMM revealed that two and three regimes were adequate mixture components for estimating the traffic speed distributions for the Site 1 and Site 2 datasets, respectively. The results of the hierarchical regression models show considerable evidence of heterogeneity in the DTTR associated with lateral lane locations. In particular, the hierarchical regressions reveal that the breakdown process is more affected by these variations than the other evaluated transition processes, with an estimated intra-class correlation (ICC) of about 73%. In the three-regime model, the transition from congestion onset/dissolution (COD) to the congested regime has the highest estimated ICC, 49.4%, and the lowest ICC, 1%, was observed for the transition from the congested to the COD regime. On the other hand, different days of the week are not found to contribute to the variations in the DTTR (the highest ICC was 1.44%). These findings can be used in developing effective congestion countermeasures, particularly in the application of intelligent transportation systems, such as dynamic lane-management strategies.
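The intra-class correlation figures quoted in this abstract are conventionally read as the share of total variance attributable to the grouping level (here, lane location or day of the week). A minimal sketch of that ratio follows, assuming a simple two-level variance decomposition; the abstract does not state how the study defined the within-group variance, and the function name is hypothetical.

```python
def intraclass_correlation(between_group_var, within_group_var):
    """Share of total variance that lies between groups (0 to 1)."""
    if between_group_var < 0 or within_group_var < 0:
        raise ValueError("variances must be non-negative")
    total = between_group_var + within_group_var
    if total == 0:
        raise ValueError("total variance must be positive")
    return between_group_var / total


# Illustrative variances only, not estimates from the study.
icc = intraclass_correlation(between_group_var=2.7, within_group_var=1.0)
```

A value near 1 means most of the variation is explained by group membership, while a value near 0 means the grouping adds little.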
Abstract: The computations involved in the Bayesian approach to practical model selection problems are usually very difficult. Computational simplifications are sometimes possible but are not generally applicable. There is a large literature on an information-theoretic methodology called Minimum Description Length (MDL). This paper describes how many of these techniques are either directly Bayesian in nature or are very good objective approximations to Bayesian solutions. First, connections between the Bayesian approach and MDL are explored theoretically; thereafter, a few illustrations are provided to show how MDL can give useful computational simplifications.
Abstract: Piecewise linear regression models are very flexible models for data. When piecewise linear regression models are fitted to data, the parameters are generally unknown. This paper studies the problem of parameter estimation for piecewise linear regression models. The parameters are estimated with a Bayesian method, but the Bayes estimator cannot be found analytically. To overcome this problem, the reversible jump MCMC (Markov chain Monte Carlo) algorithm is proposed. The reversible jump MCMC algorithm generates a Markov chain that converges to the posterior distribution of the parameters of the piecewise linear regression models. The resulting Markov chain is used to calculate the Bayes estimator for the parameters of the piecewise linear regression models.
Abstract: This paper investigates the spatio-temporal dynamics of Severe Acute Respiratory Syndrome (SARS) across the diverse health regions of Brazil from 2016 to 2024. Leveraging extensive datasets that include SARS cases, climate data, hospitalization records, and COVID-19 vaccination information, our study employs a Bayesian spatio-temporal generalized linear model to capture the intricate dependencies inherent in the data. The analysis reveals significant variations in the incidence of SARS cases over time, particularly within and between the pre-COVID-19, COVID-19, and post-COVID-19 eras. Our modeling approach accommodates explanatory variables such as humidity, temperature, and COVID-19 vaccine doses, providing a comprehensive understanding of the factors influencing SARS dynamics. Our modeling revealed unique temporal trends in SARS cases for each region, resembling neighborhood patterns. Low temperature and high humidity were linked to decreased cases, while in the COVID-19 era, temperature and vaccination coverage played significant roles. The findings contribute valuable insights into the spatial and temporal patterns of SARS in Brazil, offering a foundation for targeted public health interventions and preparedness strategies.
Funding: Supported by the Natural Science Foundation of China [grant numbers 12201219, 12271168, 12171229] and the Natural Science Foundation of Shanghai [grant numbers 23JS1400500, 23JS1400800, 22ZR1420500].
Abstract: A major challenge in spatial transcriptomics (ST) is resolving cellular composition, especially in technologies lacking single-cell resolution. The mixture of transcriptional signals within spatial spots complicates deconvolution and downstream analyses. To uncover the spatial heterogeneity of tissues, we introduce SvdRFCTD, a reference-free spatial transcriptomics deconvolution method, which estimates the cell type proportions at each spot on the tissue. To fully capture the heterogeneity in the ST data, we combine SvdRFCTD with a Bayesian hierarchical negative binomial model with spatial effects incorporated in both the mean and the dispersion of the gene expression, which is used to explicitly model the generative mechanism of cell type proportions. By integrating spatial information and leveraging marker gene information, SvdRFCTD accurately estimates cell type proportions and uncovers complex spatial patterns. We demonstrate the ability of SvdRFCTD to identify cell types on simulated datasets. By applying SvdRFCTD to mouse brain and human pancreatic ductal adenocarcinoma datasets, we observe significant cellular heterogeneity within the tissue sections and successfully identify regions with high proportions of aggregated cell types, along with the spatial relationships between different cell types.
Funding: Supported by the Open Fund of the Key Laboratory of Highway Engineering of Ministry of Education (Changsha University of Science & Technology) (Grant No. kfj230301).
Abstract: Highway work zones are locations where severe traffic crashes tend to occur. Most of the extant research on work zone crash severity neglects the differences in the injuries sustained by different drivers involved in the same crash. Admittedly, it is essential to analyse how crash-level factors relate to the highest injury severity in a crash; but it is equally important to understand the driver-level factors contributing to injury severity in order to establish effective safety countermeasures that minimize drivers' injury severity. Thus, this research aims to identify the factors with significant impacts on driver injury severity in work zone crashes and to estimate their effects on each severity level. Data on 3880 drivers involved in 2134 work zone crashes were obtained from the Crash Report Sampling System (CRSS) database of the United States and used for the empirical investigation. A Bayesian hierarchical generalized ordered probit model is advocated for analysing driver injury severity. Model performance indices suggest that the advocated hierarchical model is superior to the generalized ordered probit model, and considerable within-crash correlation is found across the observed driver injury severities. The estimated parameters show that driver age and sex, alcohol use, vehicle age and type, speeding and speed limit, weather conditions, lighting conditions and crash type have significant effects on driver injury severity in work zone crashes. Marginal effects of the significant factors on each injury severity level are also estimated. Countermeasures are proposed from the results to reduce severe injuries sustained by drivers involved in work zone crashes.
Funding: The National Key R&D Program of China under contract No. 2017YFE0104400; the National Natural Science Foundation of China under contract No. 31772852; the Fundamental Research Funds for the Central Universities under contract Nos 201512002 and 201562030.
Abstract: As one of the top four commercially important species in China, yellow croaker (Larimichthys polyactis), with two geographic subpopulations, has undergone profound changes during the last several decades. It is widely recognized that understanding its population dynamics is critically important for the sustainable management of this valuable fishery in China. The only two existing population dynamics models assessed the population of yellow croaker using short time-series data, without considering geographical variations. In this study, Bayesian models with and without hierarchical subpopulation structure were developed to explore the spatial heterogeneity of the population dynamics of yellow croaker from 1968 to 2015. Alternative hypotheses were constructed to test potential temporal patterns in yellow croaker's population dynamics. Substantial variations in population dynamics characteristics across space and time were found. The population growth rate was found to have increased since the late 1980s, and the catchability more than doubled from 1981 to 2015. The East China Sea subpopulation shows faster growth but suffers higher fishing pressure than that in the Bohai Sea and Yellow Sea. In recent years, the global population and the two subpopulations have all had high risks of overfishing and of being overfished according to the MSY-based reference points. More conservative management strategies that take subpopulations into consideration are imperative for the fishery management of yellow croaker in China. The methodology developed in this study could also be applied to the stock assessment and fishery management of other species, especially those with highly spatially heterogeneous data.
Abstract: Indirect approaches to estimation of biomass factors are often applied to measure carbon flux in the forestry sector. An assumption underlying a country-level carbon stock estimate is the representativeness of these factors. Although intensive studies have been conducted to quantify biomass factors, each study typically covers a limited geographic area. The goal of this study was to employ a meta-analysis approach to develop regional biomass factors for Quercus mongolica forests in South Korea. The biomass factors of interest were biomass conversion and expansion factor (BCEF), biomass expansion factor (BEF) and root-to-shoot ratio (RSR). Our objectives were to select probability density functions (PDFs) that best fitted the three biomass factors and to quantify their means and uncertainties. A total of 12 scientific publications were selected as data sources based on a set of criteria. From these publications we chose 52 study sites spread out across South Korea. The statistical model for the meta-analysis was a multilevel model with publication (data source) as the nesting factor specified under the Bayesian framework. Gamma, Log-normal and Weibull PDFs were evaluated. The Log-normal PDF yielded the best quantitative and qualitative fit for the three biomass factors. However, a poor fit of the PDF to the long right tail of observed BEF and RSR distributions was apparent. The median posterior estimates for means and 95% credible intervals for BCEF, BEF and RSR across all 12 publications were 1.016 (0.800-1.299), 1.414 (1.304-1.560) and 0.260 (0.200-0.335), respectively. The Log-normal PDF proved useful for estimating carbon stock of Q. mongolica forests on a regional scale and for uncertainty analysis based on Monte Carlo simulation.
Funding: This research was funded by the CSIRO ResearchPlus Science Leader Grant Program.
Abstract: Ore sorting is a preconcentration technology that can dramatically reduce energy and water usage and so improve the sustainability and profitability of a mining operation. In porphyry Cu deposits, Cu is the primary target, with ores usually containing secondary ‘pay’ metals such as Au and Mo, and gangue elements such as Fe and As. Secondary and deleterious materials vary in the type and strength of their correlation with Cu but, owing to sensing technology limitations, cannot be detected simultaneously via magnetic resonance (MR) ore sorting. Inferring the relationships between Cu and other elemental abundances is therefore particularly critical for mineral processing. The variations in metal grade relationships occur due to the transition into different geological domains. This raises two questions: how to define these geological domains, and how the metal grade relationship is influenced by them. In this paper, a linear relationship is assumed between Cu grade and other metal grades. We apply a Bayesian hierarchical (partial-pooling) model to quantify the linear relationships between Cu, Au, and Fe grades from geochemical bore core data. The hierarchical model was compared with two other models: a ‘complete-pooling’ model and a ‘no-pooling’ model. Mining blocks were split by spatial domain to construct the hierarchical model. Geochemical bore core data record metal grades measured by laboratory assay together with the spatial coordinates of the sample location. Two case studies from different porphyry Cu deposits were used to evaluate the performance of the hierarchical model. Markov chain Monte Carlo (MCMC) was used to sample the posterior parameters. Our results show that the Bayesian hierarchical model dramatically reduced the posterior predictive variance for metal grade regression compared to the no-pooling model. In addition, the posterior inference in the hierarchical model is insensitive to the choice of prior. The data are well represented in the posterior, which indicates a robust model. The results show that the spatial domain can be successfully utilised for metal grade regression. Uncertainty in estimating the relationship between pay metals and both secondary and gangue elements is quantified and shown to be reduced with partial pooling. Thus, the proposed Bayesian hierarchical model can offer a reliable and stable way to monitor the relationship between metal grades for ore sorting and other mineral processing options.
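The contrast between no pooling, complete pooling, and partial pooling in this abstract can be illustrated with the simplest hierarchical case. The sketch below is not the paper's regression model: it assumes a normal-normal model for group means with known within-group and between-group variances, and all names are hypothetical. It shows how a partial-pooling estimate sits between a group's own mean (no pooling) and the overall mean (complete pooling).

```python
def partial_pool_mean(group_mean, n_obs, sigma2, grand_mean, tau2):
    """Posterior mean of one group's mean under a normal-normal model.

    group_mean: sample mean of the group (the no-pooling estimate).
    n_obs:      number of observations in the group.
    sigma2:     within-group variance, assumed known.
    grand_mean: population-level mean (the complete-pooling estimate).
    tau2:       between-group variance, assumed known and positive.
    """
    data_precision = n_obs / sigma2   # information in the group's own data
    prior_precision = 1.0 / tau2      # information borrowed from other groups
    weight = data_precision / (data_precision + prior_precision)
    # Precision-weighted average: sparse groups shrink toward grand_mean.
    return weight * group_mean + (1.0 - weight) * grand_mean


# Illustrative values only: a sparsely sampled domain is pulled
# strongly toward the overall mean, a well-sampled one much less.
sparse = partial_pool_mean(10.0, n_obs=4, sigma2=4.0, grand_mean=0.0, tau2=1.0)
dense = partial_pool_mean(10.0, n_obs=400, sigma2=4.0, grand_mean=0.0, tau2=1.0)
```

In a full hierarchical model the variances are themselves estimated, typically by MCMC as in the abstract, rather than fixed in advance.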
Funding: Supported by the Guangdong Basic and Applied Basic Research Foundation (2023A1515011244).
Abstract: The state of in situ stress is a crucial parameter in subsurface engineering, especially for critical projects such as nuclear waste repositories. As one of the two ISRM suggested methods, the overcoring (OC) method is widely used to estimate full stress tensors in rock by independent regression analysis of the data from each OC test. However, this customary independent analysis of individual OC tests, known as no pooling, is liable to yield unreliable test-specific stress estimates because of the various sources of uncertainty involved in the OC method. To address this problem, a practical and no-cost solution is to incorporate into the OC data analysis the additional information contained in adjacent OC tests, which are usually available in OC measurement campaigns. Hence, this paper presents a Bayesian partial pooling (hierarchical) model for the combined analysis of adjacent OC tests. We performed five case studies using OC test data obtained at a nuclear waste repository research site in Sweden. The results demonstrate that partial pooling of adjacent OC tests does allow information to be borrowed across adjacent tests and yields improved stress tensor estimates, with reduced uncertainties, simultaneously for all individual tests compared with analysing them independently (no pooling), particularly for tests whose no-pooling stress estimates are unreliable. A further model comparison shows that the partial pooling model also gives better predictive performance, confirming that the information borrowed across adjacent OC tests is relevant and effective.
Abstract: In genetic association studies of complex diseases, endo-phenotypes such as expression profiles, epigenetic data, or clinical intermediate phenotypes provide insight into the underlying biological path of the disease. In such situations, in order to establish the path from the gene to the disease, we have to decide whether the gene acts on the disease phenotype primarily through a specific endo-phenotype or whether the gene influences the disease through an unidentified path characterized by different intermediate phenotypes. Here, we address the question of whether a genetic locus, given its effect on an endo-phenotype, influences the trait of interest primarily through the path of the endo-phenotype. We propose a Bayesian approach that can evaluate the genetic association between the genetic locus and the phenotype of interest in the presence of the genetic effect on the endo-phenotype. Using simulation studies, we verify that our approach has the desired properties and compare it with a mediation approach. The proposed Bayesian approach is illustrated by an application to a genome-wide association study for childhood asthma (CAMP) that contains expression profiles.