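Abstract: Data stream classification studies the dynamic updating of models in open environments, aiming to detect and adapt to concept evolution in data streams that arrive in real time and change continuously. Most existing data stream classification methods assume that the number of classes in the stream is fixed and that sample labels can be obtained without restriction, which is unrealistic in real-world scenarios. To address this, the paper proposes an Active Learning Method for Concept Evolution Data Stream (ALM-CEDS). It defines an importance measure for base classifiers based on the sample standard deviation and proposes a sample prediction method based on weighted prediction probabilities, improving classification performance. It proposes a classifier update method based on a hybrid label query strategy, which updates the classifier using samples that are hard to distinguish and samples that represent the current data distribution. It also proposes a novel class detection method based on the micro-cluster q-nearest-neighbor silhouette coefficient, which quickly identifies novel classes in the data stream. Comparative experiments on 4 real data streams and 5 synthetic data streams show that ALM-CEDS outperforms 6 existing data stream learning methods in classification performance.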
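The weighted-prediction idea in this abstract can be illustrated with a small sketch. The abstract does not give the formula, so everything below is an assumption made for illustration: the hypothetical function `weighted_predict` takes one class-probability vector per base classifier, uses the population standard deviation of each vector as that classifier's importance (a peaked, confident vector gets a larger weight), and averages the vectors with those weights. It is not the ALM-CEDS definition, only one plausible reading of "importance based on standard deviation".

```python
from statistics import pstdev


def weighted_predict(prob_rows):
    """Combine base-classifier outputs into one prediction.

    prob_rows: list of probability vectors, one per base classifier,
    each of the same length (one entry per class).
    Returns (predicted_class_index, combined_probabilities).
    """
    # Assumed importance measure: spread of each classifier's probabilities.
    weights = [pstdev(row) for row in prob_rows]
    total = sum(weights)
    if total == 0:
        # Every classifier is perfectly uncertain: fall back to equal weights.
        weights = [1.0 / len(prob_rows)] * len(prob_rows)
    else:
        weights = [w / total for w in weights]

    n_classes = len(prob_rows[0])
    # Weighted average of the class probabilities across classifiers.
    combined = [
        sum(w * row[c] for w, row in zip(weights, prob_rows))
        for c in range(n_classes)
    ]
    label = max(range(n_classes), key=combined.__getitem__)
    return label, combined


if __name__ == "__main__":
    votes = [
        [0.90, 0.05, 0.05],  # confident classifier
        [0.40, 0.35, 0.25],  # nearly undecided classifier
        [0.20, 0.50, 0.30],  # moderately confident classifier
    ]
    print(weighted_predict(votes))
```

Under this assumption the confident first classifier dominates the combination, so the ensemble follows it even though another classifier prefers a different class.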
Funding: Supported by the National Natural Science Foundation of China (No. 41171346, No. 41071286), the Fundamental Research Funds for the Central Universities (No. 20102130103000005), and the National 973 Program of China (No. 2007CB714402‐5).
Abstract: Discriminant space defining area classes is an important conceptual construct for uncertainty characterization in area-class maps. Discriminant models were promoted as they can enhance consistency in area-class mapping and replicability in error modeling. As area classes are rarely completely separable in empirically realized discriminant space, where class inseparability becomes more complicated for change categorization, we seek to quantify uncertainty in area classes (and change classes) due to measurement errors and semantic discrepancy separately and hence assess their relative margins objectively. Experiments using real datasets were carried out, and a Bayesian method was used to obtain change maps. We found that there are large differences between uncertainty statistics referring to data classes and information classes. Therefore, uncertainty characterization in change categorization should be based on discriminant modeling of measurement errors and semantic mismatch analysis, enabling quantification of uncertainty due to partially random measurement errors and systematic categorical discrepancies, respectively.