Abstract
Frequent pattern mining (FPM) is one of the key tasks in graph data mining. Its objective is to extract patterns whose support exceeds a predefined threshold from large-scale graph data. However, because it relies on a single evaluation metric and ignores subjective preferences, FPM often yields results that match user needs poorly. To address this issue, a certified pseudo-label enhanced active learning framework for pattern interest evaluation (CPALF) is proposed. CPALF is designed to accurately predict users' subjective preferences for patterns from a small amount of user interaction. It employs an active learning strategy to collect user preferences efficiently through human-computer interaction and thereby model user interest. Because training a model on limited labeled data alone poses many challenges, CPALF further incorporates a semi-supervised learning mechanism that generates training samples with certified pseudo-labels from unlabeled data, which reduces the dependence on labeled data while significantly improving prediction performance. Experiments show that CPALF captures users' subjective preferences efficiently and achieves high prediction accuracy with only a small amount of labeled data.
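The interplay of the two mechanisms named in the abstract can be pictured with a minimal sketch. It is not the authors' implementation: the function name `split_unlabeled`, the pattern ids, the uncertainty-sampling rule, and the fixed confidence threshold are all illustrative assumptions, and the abstract does not say how CPALF actually certifies its pseudo-labels. Given a model's predicted probability that the user finds each unlabeled pattern interesting, the sketch sends the most ambiguous patterns to the user and pseudo-labels only the ones the model is very sure about.

```python
from typing import Dict, List, Tuple


def split_unlabeled(
    probs: Dict[str, float],
    n_queries: int = 1,
    confidence: float = 0.9,
) -> Tuple[List[str], Dict[str, int]]:
    """Split unlabeled patterns into user queries and pseudo-labeled samples.

    probs maps a (hypothetical) pattern id to the predicted probability
    that the user finds the pattern interesting.
    """
    # Active learning step (assumed here to be uncertainty sampling):
    # patterns whose probability is closest to 0.5 are the most ambiguous,
    # so they are the ones shown to the user.
    by_uncertainty = sorted(probs, key=lambda pid: abs(probs[pid] - 0.5))
    to_query = by_uncertainty[:n_queries]

    # Semi-supervised step (assumed here to be a simple confidence threshold):
    # remaining patterns get a pseudo-label only when the model is confident.
    pseudo_labels: Dict[str, int] = {}
    for pid, p in probs.items():
        if pid in to_query:
            continue
        if max(p, 1.0 - p) >= confidence:
            pseudo_labels[pid] = 1 if p >= 0.5 else 0
    return to_query, pseudo_labels


# Toy predictions for four hypothetical patterns.
predictions = {"p1": 0.97, "p2": 0.52, "p3": 0.05, "p4": 0.70}
queries, pseudo = split_unlabeled(predictions, n_queries=1, confidence=0.9)
print(queries)  # ['p2']
print(pseudo)   # {'p1': 1, 'p3': 0}
```

In this toy run, `p2` is queried because the model is nearly undecided about it, `p1` and `p3` receive pseudo-labels, and `p4` is left unlabeled because it is neither ambiguous enough to query first nor confident enough to trust.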
Authors
WANG Tian (王甜)
WANG Lu (王璐)
XIE Wenbo (谢文波)
WANG Xin (王欣)
School of Computer Science and Software Engineering, Southwest Petroleum University, Chengdu 610500
Source
Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》)
Peking University Core Journal (北大核心)
2025, Issue 8, pp. 699-713 (15 pages)
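Funding
National Natural Science Foundation of China, General Program (No. 62172102)
Natural Science Foundation of Sichuan Province (No. 2024NSFSC1464)
Sichuan Science and Technology Innovation Talent Fund (No. 2022JDRC0009)
Southwest Petroleum University Natural Science "Qihang Plan" (启航计划) Project (No. 2023QHZ010)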
Keywords
Active Learning
Human-Computer Interaction
Frequent Pattern Mining (FPM)
Certified Pseudo-label
Semi-supervised Learning