Missing value imputation algorithm based on accelerated diffusion model
Abstract To address the adverse effects of missing values in tabular data on downstream tasks, a missing value imputation method based on diffusion models was proposed. Because the original diffusion model is time-consuming during generation, an imputation method based on an accelerated diffusion model (PNDM_Tab) was designed. The forward process of the diffusion model was implemented by adding Gaussian noise, and pseudo numerical methods for diffusion models were used to accelerate the reverse process. A network structure combining U-Net with an attention mechanism was used to extract salient features from the data efficiently and to predict the noise accurately. To provide a supervised target during training, the training data were randomly masked to generate new missing entries. Imputation methods were compared on nine datasets, and the results showed that PNDM_Tab achieved the lowest root mean square error on six of them. The experimental results also showed that, compared with the original diffusion model, using the pseudo numerical methods in the reverse process reduced the number of sampling steps while maintaining generation performance.
Authors WANG Shengju; ZHANG Zan (School of Electronics and Control Engineering, Chang'an University, Xi'an 710064, China)
Source Journal of Zhejiang University (Engineering Science), Peking University Core Journal, 2025, No. 7, pp. 1471-1480, 1503 (11 pages)
Keywords tabular data; diffusion model; data imputation; attention mechanism; deep learning
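
The two training-side ideas named in the abstract, randomly masking observed entries to create supervised targets and adding Gaussian noise in the forward process, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the linear beta schedule, the mask ratio, and the zero-valued stand-in for the noise predictor are all assumptions, and the U-Net with attention and the pseudo numerical reverse sampler are not shown.

```python
import numpy as np

def make_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t); the linear schedule is an assumption."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def random_mask(observed, ratio, rng):
    """Hide a random fraction of the observed entries.

    Returns (cond_mask, target_mask): entries that stay visible to the model,
    and entries that are hidden and therefore have a known ground truth.
    """
    hide = (rng.random(observed.shape) < ratio) & observed
    return observed & ~hide, hide

def forward_noise(x0, t, alpha_bar, rng):
    """Standard Gaussian forward step: x_t = sqrt(a_t) x_0 + sqrt(1 - a_t) eps."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps

def masked_noise_loss(eps_pred, eps, target_mask):
    """Mean squared error of the predicted noise, on hidden entries only."""
    n = max(int(target_mask.sum()), 1)
    return float((((eps_pred - eps) ** 2) * target_mask).sum() / n)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 5))           # toy table: 8 rows, 5 numeric columns
observed = rng.random(x0.shape) > 0.1      # True where a value is actually present
cond_mask, target_mask = random_mask(observed, ratio=0.3, rng=rng)

alpha_bar = make_alpha_bar(num_steps=100)
xt, eps = forward_noise(x0, t=50, alpha_bar=alpha_bar, rng=rng)

# Placeholder for a trained noise predictor (hypothetical): always predicts zero.
eps_pred = np.zeros_like(eps)
loss = masked_noise_loss(eps_pred, eps, target_mask)
```

Restricting the loss to the artificially hidden entries is what gives training a supervised target: the truly missing entries have no ground truth, while the randomly masked ones do.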