Image generation for power personnel behaviors based on diffusion model with multimodal prompts
Abstract The distinctive and complex nature of power personnel behaviors makes image data scarce, which poses a challenge for data-driven behavior recognition. Building on the stable diffusion model, PoseNet, a multimodal conditional-control image generation model for power personnel behaviors, was established by fully integrating human skeleton, mask, and text description information and adding a keypoint loss function; the model can generate high-quality, controllable human body images. An image filter based on keypoint similarity was designed to remove erroneous and low-quality generated images. A two-stage training strategy was adopted, in which the model was pre-trained on generic data and then fine-tuned on private data to improve its performance. To reflect the characteristics of power personnel behaviors, a set of evaluation metrics for generated images that combines generic and specialized metrics was designed, and the image generation performance was analyzed under the different metrics. The experimental results showed that, compared with the mainstream human generation models ControlNet and HumanSD, the proposed model produced more accurate and more realistic results.
Authors ZHU Zhihang (朱志航); YAN Yunfeng (闫云凤); QI Donglian (齐冬莲) (College of Electrical Engineering, Zhejiang University, Hangzhou 310027, China; Hainan Institute of Zhejiang University, Sanya 572025, China)
Source Journal of Zhejiang University (Engineering Science), indexed as a Peking University Core Journal, 2026, No. 1, pp. 43-51, 70 (10 pages in total)
Keywords conditional image generation model; data augmentation; human body keypoint; image segmentation; diffusion model; deep learning
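The abstract mentions an image filter based on keypoint similarity but does not give its formula. The sketch below is one plausible reading, not the authors' implementation: it assumes a COCO-style object keypoint similarity (OKS) with a single uniform falloff constant, and it assumes that each generated image has already been passed through a pose estimator. The names `keypoint_similarity`, `filter_generated`, `kappa`, `threshold`, and the dictionary keys are all hypothetical.

```python
import math


def keypoint_similarity(pred, target, scale, kappa=0.1):
    """OKS-style similarity between two skeletons (assumed formulation).

    pred, target: equal-length lists of (x, y, visible) tuples.
    scale: object scale, e.g. the square root of the person box area.
    kappa: falloff constant, taken as uniform across keypoints here.
    """
    total, count = 0.0, 0
    for (px, py, _), (tx, ty, visible) in zip(pred, target):
        if not visible:
            continue  # only keypoints visible in the target are scored
        d2 = (px - tx) ** 2 + (py - ty) ** 2
        total += math.exp(-d2 / (2.0 * scale ** 2 * kappa ** 2))
        count += 1
    return total / count if count else 0.0


def filter_generated(samples, threshold=0.5):
    """Keep samples whose detected skeleton matches the conditioning one.

    Each sample is a dict with hypothetical keys:
      "condition": skeleton used as the generation prompt,
      "detected":  skeleton estimated from the generated image,
      "scale":     object scale for the similarity computation.
    """
    kept = []
    for sample in samples:
        score = keypoint_similarity(
            sample["detected"], sample["condition"], sample["scale"]
        )
        if score >= threshold:
            kept.append(sample)
    return kept
```

In this sketch, a generated image whose detected pose drifts far from the conditioning skeleton receives a score near zero and is discarded; the threshold value of 0.5 is illustrative only and would need tuning on real data.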