
A pre-training HMM classification method for symbolic sequences (cited by: 3)
Abstract The hidden Markov model (HMM) is a doubly stochastic probabilistic model that has been widely used for modeling sequence data. To address the difficulty of defining a distance measure in symbolic sequence classification, a new pre-training HMM classification method for symbolic sequences is proposed. First, a new sequence distance measure is defined based on the HMM state transition matrix. Second, to obtain state transition matrices for different sequences under shared HMM hidden states, a two-stage pre-training method is proposed: in the first (pre-training) stage, a single HMM is trained on all sequences to learn the hidden states they share; in the second stage, each sequence is trained with these shared states to obtain its own state transition matrix. Finally, a nearest neighbor classifier classifies the symbolic sequences using the proposed distance. Experiments on real-world sequence sets from three application domains, with comparisons against existing classification methods including subsequence-based methods and HMM variants, show that the proposed method achieves satisfactory classification accuracy with a lower feature dimension.
Authors Chen Bingxin (陈炳鑫); Chen Lifei (陈黎飞). College of Mathematics and Informatics, Fujian Normal University, Fuzhou, 350117, China; Digital Fujian Internet-of-Things Laboratory of Environmental Monitoring, Fujian Normal University, Fuzhou, 350117, China
Source Journal of Nanjing University (Natural Science) (《南京大学学报(自然科学版)》), indexed in CAS, CSCD, and the Peking University Core Journals list, 2021, No. 1, pp. 52-58 (7 pages)
Funding National Natural Science Foundation of China (U1805263, 61672157); Fujian Normal University Innovation Team Program (IRTL1704).
Keywords symbolic sequences; sequence distance measure; pre-training HMM; feature representation; classification
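
The two-stage procedure summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the abstract does not give the distance formula, so the Frobenius norm between transition matrices stands in for it; "shared hidden states" is read here as a shared emission matrix and initial distribution that stay frozen in the second stage; training is plain Baum-Welch with a small smoothing constant; and the names `forward_backward`, `em`, `transition_features`, and `nearest_neighbor_label` are hypothetical.

```python
import numpy as np

def forward_backward(seq, pi, A, B):
    """Scaled forward-backward pass for one integer-coded sequence.
    Returns state posteriors (T x K) and summed transition posteriors (K x K)."""
    T, K = len(seq), len(pi)
    alpha, beta, c = np.zeros((T, K)), np.ones((T, K)), np.zeros(T)
    alpha[0] = pi * B[:, seq[0]]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, seq[t]]
        c[t] = alpha[t].sum()
        alpha[t] /= c[t]
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[:, seq[t + 1]] * beta[t + 1])) / c[t + 1]
    gamma = alpha * beta
    xi = np.zeros((K, K))
    for t in range(T - 1):
        xi += A * np.outer(alpha[t], B[:, seq[t + 1]] * beta[t + 1]) / c[t + 1]
    return gamma, xi

def em(seqs, pi, A, B, n_iter=20, update_shared=True, eps=1e-6):
    """Baum-Welch iterations. With update_shared=False only the transition
    matrix A is re-estimated; pi and B (the shared states) stay fixed."""
    K, M = B.shape
    for _ in range(n_iter):
        xi_sum = np.full((K, K), eps)   # eps keeps every probability positive
        g0 = np.full(K, eps)
        counts = np.full((M, K), eps)   # expected symbol counts per state
        for s in seqs:
            gamma, xi = forward_backward(s, pi, A, B)
            xi_sum += xi
            g0 += gamma[0]
            np.add.at(counts, s, gamma)
        A = xi_sum / xi_sum.sum(axis=1, keepdims=True)
        if update_shared:
            pi = g0 / g0.sum()
            B = (counts / counts.sum(axis=0)).T
    return pi, A, B

def transition_features(seqs, n_states, n_symbols, seed=0):
    """Stage 1: pre-train one HMM on all sequences.
    Stage 2: refit only the transition matrix for each sequence."""
    rng = np.random.default_rng(seed)
    seqs = [np.asarray(s, dtype=int) for s in seqs]
    pi0 = np.full(n_states, 1.0 / n_states)
    A0 = rng.dirichlet(np.ones(n_states), size=n_states)
    B0 = rng.dirichlet(np.ones(n_symbols), size=n_states)
    pi, A, B = em(seqs, pi0, A0, B0, update_shared=True)
    return np.array([em([s], pi, A, B, update_shared=False)[1] for s in seqs])

def nearest_neighbor_label(query_A, train_As, train_labels):
    """1-NN under the Frobenius distance (an assumed stand-in for the paper's measure)."""
    dists = np.linalg.norm(train_As - query_A, axis=(1, 2))
    return train_labels[int(np.argmin(dists))]
```

Under these assumptions each sequence is represented by a K x K transition matrix, so the feature dimension depends only on the number of hidden states K and not on sequence length or alphabet size, which is consistent with the low feature dimension the abstract reports.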