
Implementation of Bayesian Inference on MCDB Distributed System (cited by: 1)
Abstract: This paper describes how the Monte Carlo Database System (MCDB), a distributed database, can be used to implement Bayesian inference via Markov chain Monte Carlo (MCMC) over very large datasets. Bayesian linear regression, the latent Dirichlet allocation (LDA) topic model, and Dirichlet process clustering serve as examples. To implement an MCMC simulation in MCDB, a programmer uses SQL to specify the dependencies among variables and how they parameterize one another. The paper thus explores a simple scheme for building large-scale statistical learning systems in concise SQL, which, with the help of MCDB, handles parallelization and resource optimization automatically to achieve high-performance parallel processing.
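As a rough illustration of the kind of MCMC computation the abstract refers to, the following is a minimal Python sketch of a Gibbs sampler for one-variable Bayesian linear regression. It is not the paper's SQL/MCDB implementation: the priors, the function names `gibbs_linear_regression` and `make_data`, and the synthetic data are all assumptions made for illustration.

```python
import random


def gibbs_linear_regression(xs, ys, n_iter=2000, burn_in=500, seed=0):
    """Gibbs sampler for the model y_i ~ Normal(w * x_i, 1 / tau).

    Assumed priors (illustrative, not taken from the paper):
      w   ~ Normal(0, 1 / prior_prec)
      tau ~ Gamma(shape=a0, rate=b0)
    Returns the list of (w, tau) draws kept after burn-in.
    """
    rng = random.Random(seed)
    prior_prec, a0, b0 = 1e-2, 1.0, 1.0
    n = len(xs)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    w, tau = 0.0, 1.0
    samples = []
    for it in range(n_iter):
        # w | tau, data is Normal with precision prior_prec + tau * sxx.
        prec = prior_prec + tau * sxx
        mean = tau * sxy / prec
        w = rng.gauss(mean, prec ** -0.5)
        # tau | w, data is Gamma(a0 + n/2, rate = b0 + SSE/2).
        sse = sum((y - w * x) ** 2 for x, y in zip(xs, ys))
        # random.gammavariate takes a scale parameter, i.e. 1 / rate.
        tau = rng.gammavariate(a0 + n / 2.0, 1.0 / (b0 + sse / 2.0))
        if it >= burn_in:
            samples.append((w, tau))
    return samples


def make_data(n=200, slope=2.0, noise_sd=0.5, seed=1):
    """Synthetic data (hypothetical): y = slope * x + Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [slope * x + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


xs, ys = make_data()
samples = gibbs_linear_regression(xs, ys)
w_mean = sum(w for w, _ in samples) / len(samples)
print("posterior mean of slope:", w_mean)
```

Each sweep redraws every variable from its conditional distribution given the others. This is the dependency structure that, according to the abstract, a programmer would declare in SQL so that MCDB can parallelize the simulation over a large dataset.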
Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2013, No. 6, pp. 256-259, 287 (5 pages)
Funding: Supported by the National Natural Science Foundation of China (61272539)
Keywords: Bayesian inference; parallel algorithms; SQL; distributed system

