Abstract
With advances in machine learning and deep learning, multi-label classification techniques have matured. However, existing multi-label classification methods typically assume that complete data are readily available. This assumption often fails in real-world scenarios, where acquiring many data items incurs a cost. To handle incomplete data that must be acquired at a cost, a deep reinforcement learning based personalized multi-label classification framework (Reinforcement Learning based Personalized Multi-label Classification, RLPMC) is proposed, consisting of a feature encoder, a feature selector, and a multi-label classifier. First, to deal with missing values in incomplete data, a feature encoder based on set embedding converts variable-length data into fixed-length vectors, which are fed to the multi-label classifier and the feature selector. Second, a feature selector based on deep reinforcement learning learns a personalized feature acquisition strategy that balances feature acquisition cost against classification accuracy. Third, a multi-label classification method produces the classification from the selected features. Finally, multiple groups of experiments on synthetic and public datasets validate the effectiveness of the method.
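The two ideas the abstract leans on, encoding a variable number of observed features into a fixed-length vector and trading acquisition cost against accuracy, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the mean-pooling encoder, the use of Hamming loss as the accuracy term, and the `cost_weight` parameter are all assumptions introduced here for illustration.

```python
import random


def make_embeddings(num_features, dim, seed=0):
    """Hypothetical fixed random embedding per feature index (learned in practice)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_features)]


def encode_observed(observed, embeddings):
    """Encode a set of observed (index, value) pairs as a fixed-length vector.

    Each observed feature becomes [embedding, value * embedding]; the elements
    are mean-pooled, so the output length is 2 * dim regardless of how many
    features are observed, and the result does not depend on their order.
    """
    dim = len(embeddings[0])
    pooled = [0.0] * (2 * dim)
    if not observed:
        return pooled  # nothing observed yet: all-zero encoding
    for idx, value in observed.items():
        emb = embeddings[idx]
        element = emb + [value * e for e in emb]
        pooled = [p + x for p, x in zip(pooled, element)]
    return [p / len(observed) for p in pooled]


def hamming_loss(y_true, y_pred):
    """Fraction of labels predicted incorrectly (assumed accuracy measure)."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def episode_return(acquired, costs, y_true, y_pred, cost_weight):
    """Assumed return for one sample: classification quality minus weighted cost."""
    total_cost = sum(costs[i] for i in acquired)
    return -hamming_loss(y_true, y_pred) - cost_weight * total_cost


if __name__ == "__main__":
    emb = make_embeddings(num_features=5, dim=4)
    few = encode_observed({0: 1.5}, emb)
    many = encode_observed({0: 1.5, 3: -2.0, 4: 0.5}, emb)
    print(len(few), len(many))  # both encodings have the same length
```

Under these assumptions, a selector trained to maximize `episode_return` would stop acquiring features once the expected drop in Hamming loss no longer outweighs `cost_weight` times the next feature's cost, which is one way to read the cost-accuracy balance described above.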
Authors
ZHU Mengxiao (朱孟笑)
DUAN Haochen (段昊辰)
YUE Kun (岳昆)
ZHOU Feng (周锋)
ZHU Mengjie (朱孟杰)
Affiliations: School of Information Science and Engineering, North China University of Technology, Beijing 100144; School of Information Science and Engineering, Yunnan University, Kunming 650500; College of Medical Information Engineering, Jining Medical University, Jining 272067
Source
Computer & Digital Engineering (《计算机与数字工程》)
2025, Issue 8, pp. 2057-2062, 2088 (7 pages in total)
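Funding
Yunnan Key Laboratory of Intelligent Systems and Computing project (No. 202405AV340009)
North China University of Technology Yuxiu Innovation Project (No. 2024NCUTYXCX202)
North China University of Technology Scientific Research Start-up Fund project.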
Keywords
multi-label classification
incomplete data
feature acquisition strategies
deep reinforcement learning
feature encoding