Abstract
The hidden Markov model (HMM) is a doubly stochastic probabilistic model that has been widely used for modeling sequence data. To address the difficulty of defining a distance measure in symbolic sequence classification, a pre-training HMM method for classifying symbolic sequences is proposed. First, a new distance measure for sequences is defined on the state transition matrix of an HMM. Second, to obtain state transition matrices for different sequences over the same set of HMM hidden states, a two-stage pre-training method is proposed: in the first (pre-training) stage, a single HMM is trained on all sequences to learn the hidden states they share; in the second stage, each sequence is trained with these shared states to obtain its own state transition matrix. Finally, a nearest neighbor classifier classifies the symbolic sequences based on the proposed distance measure. Experiments were carried out on real-world sequence sets from three application domains, with comparisons against existing classification methods, including subsequence-based methods and HMM variants. The results show that the proposed method achieves higher classification accuracy with fewer features than the compared methods.
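The second and third steps of the abstract (per-sequence transition matrices over shared hidden states, then distance-based nearest neighbor classification) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes that the first stage has already produced a shared initial distribution `pi` and emission matrix `B` (hand-picked toy values here), that "training each sequence with the shared states" means running Baum-Welch updates on the transition matrix only, and that the distance is the Frobenius norm between transition matrices; the abstract does not give the exact distance definition. All function names are hypothetical.

```python
import math

def fit_transition_matrix(seq, pi, B, n_iter=20, eps=1e-6):
    """Baum-Welch updates for the transition matrix A only.

    pi (initial distribution) and B (emission matrix) are held fixed,
    standing in for the shared hidden states from the pre-training stage.
    """
    K, T = len(pi), len(seq)
    A = [[1.0 / K] * K for _ in range(K)]  # uniform starting point
    for _ in range(n_iter):
        # Scaled forward pass: each alpha[t] sums to 1, c[t] is its scale.
        alpha, c = [], []
        for t in range(T):
            if t == 0:
                a = [pi[j] * B[j][seq[0]] for j in range(K)]
            else:
                a = [sum(alpha[t - 1][i] * A[i][j] for i in range(K)) * B[j][seq[t]]
                     for j in range(K)]
            s = sum(a)
            alpha.append([x / s for x in a])
            c.append(s)
        # Scaled backward pass, using the same scaling factors.
        beta = [[1.0] * K for _ in range(T)]
        for t in range(T - 2, -1, -1):
            beta[t] = [sum(A[i][j] * B[j][seq[t + 1]] * beta[t + 1][j]
                           for j in range(K)) / c[t + 1] for i in range(K)]
        # Expected transition counts; eps avoids empty rows (an assumption).
        counts = [[eps] * K for _ in range(K)]
        for t in range(T - 1):
            for i in range(K):
                for j in range(K):
                    counts[i][j] += (alpha[t][i] * A[i][j] * B[j][seq[t + 1]]
                                     * beta[t + 1][j] / c[t + 1])
        A = [[v / sum(row) for v in row] for row in counts]
    return A

def matrix_distance(A1, A2):
    """Frobenius distance between two transition matrices (assumed measure)."""
    return math.sqrt(sum((x - y) ** 2
                         for r1, r2 in zip(A1, A2) for x, y in zip(r1, r2)))

def nearest_neighbor_label(query, train):
    """1-NN: return the label of the closest (matrix, label) pair in train."""
    return min(train, key=lambda item: matrix_distance(query, item[0]))[1]

# Toy shared parameters (assumed output of the pre-training stage).
pi = [0.5, 0.5]
B = [[0.9, 0.1], [0.1, 0.9]]  # B[state][symbol]

train_seqs = [([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], "sticky"),
              ([0, 1, 0, 1, 0, 1, 0, 1, 0, 1], "alternating")]
train = [(fit_transition_matrix(s, pi, B), label) for s, label in train_seqs]
query = fit_transition_matrix([1, 1, 1, 1, 0, 0, 0, 0, 0, 0], pi, B)
print(nearest_neighbor_label(query, train))
```

Because every sequence is described by a K x K transition matrix over the same K hidden states, the feature dimension depends only on the number of states, which is one plausible reading of the "fewer features" claim.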
Authors
Chen Bingxin (陈炳鑫)
Chen Lifei (陈黎飞)
Affiliations: College of Mathematics and Informatics, Fujian Normal University, Fuzhou, 350117, China; Digital Fujian Internet-of-Things Laboratory of Environmental Monitoring, Fujian Normal University, Fuzhou, 350117, China
Source
Journal of Nanjing University (Natural Science) (《南京大学学报(自然科学版)》)
Indexed in: CAS, CSCD, Peking University Core Journals (北大核心)
2021, Issue 1, pp. 52-58 (7 pages)
Funding
National Natural Science Foundation of China (U1805263, 61672157)
Fujian Normal University Innovation Team Funding Project (IRTL1704)
Keywords
symbolic sequences
sequence distance measure
pre-training HMM
feature representation
classification