With the rapid development of future networks, multimedia data such as web images has grown explosively, so an efficient image retrieval engine is necessary. Previous studies concentrate on single-concept image retrieval, which has limited practical usability. In practice, users typically issue multi-concept queries to Internet image retrieval systems, but existing approaches are often ineffective because they merely combine single-concept query techniques. Semantic-concept-based multi-concept image retrieval has therefore become an urgent problem. This paper proposes a novel Multi-Concept image Retrieval Model (MCRM) based on a multi-concept detector, which treats a multi-concept as a whole and learns each multi-concept directly from a rearranged multi-concept training set. The corresponding retrieval algorithm is then presented, and the log-likelihood function of predictions is maximized by the gradient descent approach. In addition, semantic correlations among single-concepts and multi-concepts are employed to improve retrieval performance: the semantic correlation probability is estimated with three correlation measures, and the visual evidence is expressed by Bayes' theorem and estimated by a Support Vector Machine (SVM). Experimental results on the Corel and IAPR data sets show that the approach outperforms state-of-the-art methods. Furthermore, the model is beneficial for multi-concept retrieval and for difficult retrieval tasks with few relevant images.
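The two ideas named in the abstract, relabeling the training set so that a multi-concept is learned as a single target and fitting the detector by maximizing a log-likelihood with gradient steps, can be illustrated with a minimal sketch. The abstract does not give the model form, so the logistic detector, the function names (`rearrange`, `fit`, `log_likelihood`), and the toy features used here are assumptions for illustration, not the MCRM algorithm itself. The update below ascends the log-likelihood, which is equivalent to gradient descent on its negative.

```python
import math


def rearrange(tag_sets, multi_concept):
    """Relabel images: 1 if an image carries every concept in the query, else 0."""
    wanted = set(multi_concept)
    return [1 if wanted <= set(tags) else 0 for tags in tag_sets]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, b, x):
    """Predicted probability that image features x match the multi-concept."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def log_likelihood(w, b, X, y):
    total = 0.0
    for x, t in zip(X, y):
        p = min(max(score(w, b, x), 1e-12), 1.0 - 1e-12)
        total += t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total


def fit(X, y, lr=0.5, steps=200):
    """Maximize the mean log-likelihood with plain full-batch gradient steps."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, t in zip(X, y):
            err = t - score(w, b, x)  # gradient of the log-likelihood w.r.t. the logit
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj + lr * gj / n for wj, gj in zip(w, grad_w)]
        b += lr * grad_b / n
    return w, b


# Hypothetical toy data: 2-D visual features and per-image concept tags.
X = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.9, 0.8], [0.1, 0.9]]
tags = [{"sky", "sea"}, {"sky"}, {"sea"}, set(), {"sky", "sea"}, {"sea"}]
y = rearrange(tags, ("sky", "sea"))
w, b = fit(X, y)
```

At query time, images would be ranked by `score(w, b, x)` for the detector trained on that multi-concept. A detector learned this way sees only joint positives, in contrast to combining separately trained single-concept scores.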
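Data stream classification is an important research task in data stream mining; its goal is to capture the changing class structure from massive, continuously changing data. At present, almost no framework can simultaneously handle the multi-class imbalance, concept drift, outliers, and high labeling cost that are common in data streams. To address this, an online active learning method for imbalanced data streams (OALM-IDS) is proposed. AdaBoost is an ensemble classification method that iteratively combines multiple weak classifiers into a strong classifier, and AdaBoost.M2 introduces the confidence of weak classifiers; such methods are commonly used on static data. A training-sample importance measure based on the imbalance ratio and an adaptive forgetting factor is defined, which makes AdaBoost.M2 applicable to imbalanced data streams and improves the performance of ensemble classifiers on them. An adaptive adjustment method for the margin threshold matrix is proposed, which optimizes the label request strategy. The degree of concept drift is incorporated into model construction, and an adaptive forgetting factor based on a concept drift index is defined, enabling model reconstruction after drift. Comparative experiments on 6 synthetic and 4 real data streams show that the proposed method achieves better classification performance than 5 other learning methods for imbalanced data streams.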
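The abstract gives no formulas for the importance measure or the label request strategy, so the following is only a generic sketch of two ingredients it names: class proportions tracked with a forgetting factor (from which an imbalance-based sample weight can be derived) and a margin-based decision on whether to request a label. The class `DecayedClassStats`, the function `should_request_label`, the fixed forgetting factor, and the single scalar threshold are all hypothetical simplifications; the paper adapts the forgetting factor to a concept drift index and uses a margin threshold matrix.

```python
class DecayedClassStats:
    """Exponentially decayed class proportions for an evolving data stream."""

    def __init__(self, n_classes, forgetting=0.9):
        # Start from a uniform estimate; the proportions always sum to 1.
        self.sizes = [1.0 / n_classes] * n_classes
        self.forgetting = forgetting

    def update(self, label):
        f = self.forgetting
        for k in range(len(self.sizes)):
            hit = 1.0 if k == label else 0.0
            self.sizes[k] = f * self.sizes[k] + (1.0 - f) * hit

    def importance(self, label):
        # Ratio of the largest class proportion to this class's proportion:
        # 1.0 for the current majority class, larger for rarer classes.
        return max(self.sizes) / max(self.sizes[label], 1e-12)


def should_request_label(probs, threshold):
    """Request a label when the top-two class probabilities are closer than threshold."""
    top, second = sorted(probs, reverse=True)[:2]
    return (top - second) < threshold


# Hypothetical stream: twenty majority-class labels followed by one minority label.
stats = DecayedClassStats(n_classes=2, forgetting=0.9)
for label in [0] * 20 + [1]:
    stats.update(label)
```

In a boosting setting, `stats.importance(label)` could scale the weight a newly labeled sample receives when the ensemble is updated, while `should_request_label` decides whether that label is requested at all. A smaller forgetting factor makes the proportions react faster to drift at the cost of noisier estimates.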
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 61370229, 61370178, 61272067), the National Key Technology R&D Program (Grant No. 2013BAH72B01), the MOE-China Mobile Research Fund (Grant No. MCM20130651), the Natural Science Foundation of GDP (Grant No. S2013010015178), and the Science-Technology Project of GDED (Grant No. 2012KJCX0037).