Funding: Supported by the National Natural Science Foundation of China and the Zhejiang Province Natural Science Fund
Abstract: In this paper, we consider a general form of the increments of a two-parameter Wiener process. Both the Csorgo-Revesz increments and a class of lag increments are special cases of this general form. Our results imply the theorem given by Csorgo and Revesz (1978), with some of their conditions removed.
Abstract: Accurate classification and prediction of future traffic conditions are essential for developing effective congestion mitigation strategies on highway systems. Speed distribution is one of the traffic stream parameters used to quantify traffic conditions. Previous studies have shown that a multi-modal probability distribution of speeds gives excellent results when congested and free-flow traffic conditions are evaluated simultaneously. However, most of these analytical studies do not incorporate influencing factors in characterizing these conditions. This study evaluates the impact of traffic occupancy on the multi-state speed distribution using Bayesian Dirichlet Process Mixtures of Generalized Linear Models (DPM-GLM). Further, the study estimates the speed cut-point values that separate traffic states into homogeneous groups, using a Bayesian change-point detection (BCD) technique. The study used one year of archived traffic data, collected in 2015 on Florida’s Interstate 295 freeway corridor. Information criteria revealed three traffic states, identified as free flow, transitional flow (congestion onset/offset), and congested flow. The DPM-GLM findings indicated that, in all estimated states, traffic speed decreases as traffic occupancy increases. A comparison across traffic states showed that traffic occupancy has more impact on the free-flow and congested states than on the transitional flow condition. With respect to estimating the threshold speed value, the BCD model produced promising results in characterizing levels of traffic congestion.
Funding: Project (No. 60773180) supported by the National Natural Science Foundation of China
Abstract: This paper deals with the statistical modeling of latent topic hierarchies in text corpora. The height of the topic tree is assumed to be fixed, while the number of topics on each level is unknown a priori and is inferred from data. Taking a nonparametric Bayesian approach to this problem, we propose a new probabilistic generative model based on the nested hierarchical Dirichlet process (nHDP) and present a Markov chain Monte Carlo sampling algorithm for inferring the topic tree structure as well as the word distribution of each topic and the topic distribution of each document. Our theoretical analysis and experimental results show that this model can produce a more compact hierarchical topic structure and capture more fine-grained topic relationships than the hierarchical latent Dirichlet allocation model.
Abstract: In this paper, Spike-and-Slab Dirichlet Process (SS-DP) priors are introduced and discussed for non-parametric Bayesian modeling and inference, especially in the context of mixture models. Specifying a spike-and-slab base measure for DP priors combines the merits of Dirichlet process and spike-and-slab priors and serves as a flexible approach to Bayesian model selection and averaging. Computationally, Bayesian Expectation-Maximization (BEM) is used to obtain MAP estimates. Two simulated examples, one in mixture modeling and one in time series analysis, demonstrate the models and the computational methodology.
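To make the idea of a spike-and-slab base measure for a DP prior more concrete, the following is a minimal sketch and not the paper's implementation. It assumes a truncated stick-breaking construction for the DP weights and a base measure that places probability `p_spike` on a point mass at zero and otherwise draws from a zero-mean Gaussian slab. The function names, the truncation level `k`, and all default values are hypothetical choices made for illustration; the BEM estimation step mentioned in the abstract is not shown.

```python
import random


def stick_breaking_weights(alpha, k, rng):
    """Truncated stick-breaking weights for a DP with concentration alpha.

    The last weight takes whatever stick remains, so the k weights sum to 1.
    """
    weights = []
    remaining = 1.0
    for _ in range(k - 1):
        frac = rng.betavariate(1.0, alpha)  # fraction of the remaining stick
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    weights.append(remaining)
    return weights


def draw_spike_and_slab(rng, p_spike=0.5, slab_sd=3.0):
    """One draw from the assumed base measure: spike at 0, Gaussian slab otherwise."""
    if rng.random() < p_spike:
        return 0.0
    return rng.gauss(0.0, slab_sd)


def sample_truncated_ss_dp(alpha=1.0, k=20, p_spike=0.5, slab_sd=3.0, seed=0):
    """Return (weights, atoms) of one truncated draw from the sketched SS-DP prior."""
    rng = random.Random(seed)
    weights = stick_breaking_weights(alpha, k, rng)
    atoms = [draw_spike_and_slab(rng, p_spike, slab_sd) for _ in range(k)]
    return weights, atoms


if __name__ == "__main__":
    weights, atoms = sample_truncated_ss_dp()
    zero_mass = sum(w for w, a in zip(weights, atoms) if a == 0.0)
    print("mass on atoms equal to zero:", round(zero_mass, 3))
```

Under these assumptions, atoms that fall on the spike correspond to components whose parameter is exactly zero, which is what lets a prior of this kind express model selection within a mixture.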
Abstract: Retrieving information from an evolving digital data collection in response to a user’s query is essential and requires efficient retrieval mechanisms that reduce the time needed to search such massive collections. Scanning and analyzing all the documents to retrieve the most relevant textual data item is time-consuming at scale and requires a sophisticated technique for querying the document collection. Achieving retrieval from a large collection that is both accurate and fast remains challenging. Text summarization is a dominant research field in information retrieval and text processing, used to locate the most appropriate data object, as single or multiple documents, from the collection. Machine learning and knowledge-based techniques are the two query-based extractive text summarization techniques in Natural Language Processing (NLP) that can be used for precise retrieval and are considered to be the best option. NLP uses machine learning approaches, both supervised and unsupervised, for calculating probabilistic features. This study proposes a hybrid approach for query-based extractive text summarization. The Text-Rank algorithm is used as the core algorithm of the implementation. Query-based text summarization of multiple documents using the hybrid approach, which combines the K-Means clustering technique with Latent Dirichlet Allocation (LDA) as the topic modeling technique, yields 0.288, 0.631, and 0.328 for precision, recall, and F-score, respectively. The results show that the proposed hybrid approach performs better than the graph-based independent approach and the sentence and word frequency-based approach.
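Abstract: To address the difficulty of identifying anomalous samples in real time in high-speed train operation data, and the way cluster structure changes dynamically as the data evolve, this paper proposes a posterior classification-based incremental clustering and anomaly detection method built on the Dirichlet process mixture model (Posterior Classification-based Incremental Dirichlet Process Mixture Model, PC-IDPMM). In the offline stage, the method builds a clustering model and identifies anomalous samples. In the online stage, it uses posterior probabilities to classify new samples quickly and extracts new structures through density-based clustering, thereby extending the model structure and updating its parameters. To verify model performance, experiments were conducted on measured data from the Guangzhou-Shenzhen high-speed railway. The results show that PC-IDPMM maintains the consistency of the cluster structure while stably updating the statistical features of the main clusters, reaching an AUC (Area Under the Curve) of 90.55% and outperforming several offline methods. In terms of computational efficiency, training time and memory consumption are reduced by about 85% and 80%, respectively, compared with the offline model. In addition, PC-IDPMM can issue real-time anomaly warnings based on data from a train's preceding stations, helping the dispatching system intervene early in a delay and reducing cumulative delay from 572 min to 320 min. The experimental results confirm the method's real-time performance and practical value in high-frequency data environments.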
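The online step described in this abstract, assigning a new sample by posterior probability or treating it as new structure, can be sketched as follows. This is an illustrative assumption and not the PC-IDPMM algorithm itself: it uses one-dimensional Gaussian clusters with a known shared standard deviation, Chinese-restaurant-process prior weights, and a Gaussian base measure for the "new cluster" option. The names `assign_sample`, `alpha`, `sd`, `base_mean`, and `base_sd` are hypothetical, and the density-clustering and parameter-update steps are omitted.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def assign_sample(x, clusters, alpha=1.0, sd=1.0, base_mean=0.0, base_sd=10.0):
    """Classify x against existing clusters or flag it as new structure.

    clusters: list of (count, mean) pairs summarizing the offline model.
    Returns (label, probs); label is a cluster index, or -1 when the
    'new cluster' option has the highest posterior probability.
    """
    n = sum(count for count, _ in clusters)
    # Existing clusters: prior weight proportional to size, times the likelihood.
    scores = [count / (n + alpha) * normal_pdf(x, mean, sd)
              for count, mean in clusters]
    # New cluster: prior weight proportional to alpha, times the predictive
    # density of x under the base measure (the cluster mean is integrated out).
    new_sd = math.sqrt(sd * sd + base_sd * base_sd)
    scores.append(alpha / (n + alpha) * normal_pdf(x, base_mean, new_sd))
    total = sum(scores)
    probs = [s / total for s in scores]
    best = max(range(len(probs)), key=probs.__getitem__)
    label = best if best < len(clusters) else -1
    return label, probs


if __name__ == "__main__":
    offline_clusters = [(50, 0.0), (30, 5.0)]  # made-up summary statistics
    for value in (0.2, 5.1, 30.0):
        label, _ = assign_sample(value, offline_clusters)
        print(value, "->", "new structure" if label == -1 else f"cluster {label}")
```

In this sketch a sample far from every existing cluster is dominated by the base-measure term and is labeled -1, which is the point at which a method of this kind would buffer the sample as a candidate anomaly or a seed for a new cluster.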