The logistic normal distribution has recently been adapted, via the transformation of multivariate Gaussian variables, to model the topical distribution of documents in the presence of correlations among topics. In this paper, we propose a probit normal alternative for modelling correlated topical structures. Our use of the probit model in the context of topic discovery is novel: most authors have so far concentrated solely on the logistic model, partly because of the formidable inefficiency of the multinomial probit model even for very small topical spaces. We circumvent the inefficiency of multinomial probit estimation by adapting the diagonal orthant multinomial probit to the topic-model context, which allows our topic modelling scheme to handle corpora with a large number of latent topics. An additional and very important benefit of our method is that, unlike the logistic normal model, whose non-conjugacy calls for sophisticated sampling schemes, our approach exploits the natural conjugacy inherent in the auxiliary formulation of the probit model to achieve greater simplicity. Applying the proposed scheme to a well-known Associated Press corpus not only uncovers a large number of meaningful topics but also captures compellingly intuitive correlations among certain topics. Moreover, the proposed approach lends itself to further scalability thanks to various existing high-performance algorithms and architectures capable of handling millions of documents.
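For context, the conjugacy mentioned in this abstract can be illustrated with the standard auxiliary-variable (data augmentation) form of a binary probit regression. This is a textbook sketch only, not the paper's diagonal orthant topic model; the symbols $y_i$, $x_i$, $z_i$, $\beta$, $b_0$, $B_0$ and the design matrix $X$ are assumptions introduced here for illustration.

```latex
\begin{align*}
z_i &= x_i^\top \beta + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, 1), \\
y_i &= \mathbb{1}[z_i > 0], \\
\beta &\sim \mathcal{N}(b_0, B_0) \quad \text{(assumed Gaussian prior)}, \\
\beta \mid z &\sim \mathcal{N}(b_1, B_1), \qquad
  B_1 = \left(B_0^{-1} + X^\top X\right)^{-1}, \quad
  b_1 = B_1\left(B_0^{-1} b_0 + X^\top z\right), \\
z_i \mid \beta, y_i &\sim
  \begin{cases}
    \mathcal{N}(x_i^\top \beta, 1) \text{ truncated to } (0, \infty), & y_i = 1, \\
    \mathcal{N}(x_i^\top \beta, 1) \text{ truncated to } (-\infty, 0], & y_i = 0.
  \end{cases}
\end{align*}
```

Conditional on the latent $z$, the model is an ordinary Gaussian linear regression, so both full conditionals are available in closed form and a Gibbs sampler needs no tuning; this is the kind of simplicity the abstract contrasts with the non-conjugate logistic normal model.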
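Objective: When competing risks are present, methods based on the restricted mean time lost (RMTL) require fewer model assumptions and offer a more intuitive interpretation. The between-group effect size is the RMTL difference (RMTLd), whose hypothesis test is constructed under large-sample theory; its performance in small samples is unknown. Methods: This paper examines the performance of RMTLd in small samples, develops several variable transformations of the RMTL to improve statistical performance in that setting, and uses Monte Carlo simulation to evaluate their type I error and power under different scenarios. Results: In small samples, the original RMTLd test shows inflated type I error, whereas the logistic transformation, one of the four transformations considered, maintains good statistical performance. Conclusion: When analysing small-sample competing-risks data, the logistic transformation of the RMTL is recommended for statistical analysis.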
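The quantities named in this abstract can be written out as follows. The definition of the RMTL as the area under the cumulative incidence function is standard; the specific logit-type transformation and its delta-method variance are an assumption made here for illustration, since the abstract does not state the exact form the authors use. The symbols $F_k$, $\tau$, $\theta$, $g$ and the group superscripts are likewise introduced here.

```latex
\begin{align*}
F_k(t) &= P(T \le t,\ \text{cause} = k)
  && \text{(cumulative incidence of cause } k\text{)}, \\
\mathrm{RMTL}_k(\tau) &= \int_0^{\tau} F_k(t)\, dt
  && \text{(mean time lost to cause } k \text{ before } \tau\text{)}, \\
\mathrm{RMTLd} &= \mathrm{RMTL}_k^{(1)}(\tau) - \mathrm{RMTL}_k^{(0)}(\tau)
  && \text{(difference between groups 1 and 0)}, \\
g(\theta) &= \log\frac{\theta}{\tau - \theta}, \quad 0 < \theta < \tau
  && \text{(assumed logit-type transformation)}, \\
\widehat{\mathrm{Var}}\, g(\hat\theta) &\approx
  \left(\frac{\tau}{\hat\theta\,(\tau - \hat\theta)}\right)^{2}
  \widehat{\mathrm{Var}}(\hat\theta)
  && \text{(delta method, with } \hat\theta = \widehat{\mathrm{RMTL}}_k(\tau)\text{)}.
\end{align*}
```

Because $\mathrm{RMTL}_k(\tau)$ is bounded between $0$ and $\tau$, a transformation of this kind maps the estimate onto the whole real line, which is one plausible reason a normal approximation could behave better in small samples.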