Funding: Supported by the National Natural Science Foundation of China under Grant No. 62502040 and the ZTE Industry-University-Institute Cooperation Funds under Grant No. IA20230700001.
Abstract: We propose the Mixed-Precision Multibranch Network (M+MNet) to compensate for the neglect of background information in image aesthetics assessment (IAA) while providing strategies for overcoming the trade-off between training cost and performance. First, two exponentially weighted pooling methods are used to selectively boost the extraction of background and salient information during downsampling. Second, we propose Corner Grid, an unsupervised data augmentation method that leverages the diffusive characteristics of convolution to force the network to seek more relevant background information. Third, we perform mixed-precision training by switching the precision format, thus significantly reducing the time and memory consumed by data representation and transmission. Most of our methods, although designed specifically for IAA tasks, have demonstrated generalizability to other IAA works. For performance verification, we develop a large-scale benchmark (the most comprehensive thus far) by comparing 17 methods with M+MNet on two representative datasets: the Aesthetic Visual Analysis (AVA) dataset and the FLICKR-Aesthetic Evaluation Subset (FLICKR-AES). M+MNet achieves state-of-the-art performance on all tasks.
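The abstract does not give the formula for the two exponentially weighted pooling methods, so the following is only a minimal sketch of one common form of the idea: a softmax-weighted average over each pooling window. The names `exp_weighted_pool`, `pool2d`, and the parameter `beta` are hypothetical and are not taken from the paper. Under this assumption, a positive `beta` favors large (salient) activations, a negative `beta` favors small ones, and `beta = 0` reduces to average pooling.

```python
import math

def exp_weighted_pool(window, beta):
    """Softmax-weighted average of the values in one pooling window.

    Assumed form (not from the paper): weight_i = exp(beta * x_i).
    """
    # Subtract the largest exponent for numerical stability.
    m = max(beta * v for v in window)
    weights = [math.exp(beta * v - m) for v in window]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, window)) / total

def pool2d(image, size, beta):
    """Apply exp_weighted_pool over non-overlapping size x size windows."""
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(0, rows - size + 1, size):
        out_row = []
        for c in range(0, cols - size + 1, size):
            window = [image[r + i][c + j]
                      for i in range(size) for j in range(size)]
            out_row.append(exp_weighted_pool(window, beta))
        out.append(out_row)
    return out
```

Calling `pool2d(feature_map, 2, beta)` with two different signs of `beta` would give two downsampled maps, one biased toward strong responses and one toward weak responses, which is one plausible reading of "selectively boosting" salient and background information.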
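Abstract: Objective: To better achieve lightweight human pose estimation and obtain higher detection performance within the very limited resources of a lightweight model, we propose LDHNet (lightweight high-resolution human estimation combined with densely connected network), a lightweight high-resolution human pose estimation network that builds on the high-resolution network (HRNet) and incorporates densely connected networks. Method: By redesigning the stage-branch structure of HRNet and proposing a new lightweight feature extraction module, we construct a lightweight and efficient feature extraction unit. We also make the feature fusion between branches lightweight, which further reduces model complexity. Together, these changes substantially reduce the model's parameter count and computational cost, meeting the lightweight design goal while preserving performance. Results: Experiments show that on the MPII (Max Planck Institute for Informatics) test set, compared with the top-down lightweight human pose estimation model LiteHRNet, LDHNet improves average prediction accuracy by 1.5% with only a small increase in parameters and computation, and by 0.9% over DiteHRNet, an improved variant of LiteHRNet. On the COCO (common objects in context) validation set, LDHNet improves average detection accuracy by 3.4% over LiteHRNet and by 2.3% over DiteHRNet. Compared with HRFormer, which incorporates the Transformer, LDHNet achieves similar detection performance with fewer parameters and less computation. LDHNet also performs stably in real-world scenes, and in the same environment its inference speed is higher than that of the baseline HRNet, LiteHRNet, and others. Conclusion: The model effectively achieves a lightweight design while maintaining prediction performance.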