Abstract
Intrinsic decomposition, the process of decomposing an image into reflectance and shading, is widely used in virtual and augmented reality tasks. Reflectance and shading often exhibit large gradients at object edges, and the intrinsic properties within the same object tend to be similar. This spatial coherence is closely related to semantic consistency because objects within the same semantic category often exhibit similar intrinsic properties. Therefore, incorporating semantic segmentation into a deep intrinsic decomposition framework helps the network distinguish between different object instances and understand high-level scene structures. To this end, we design an intrinsic decomposition network jointly trained with a dedicated semantic segmentation module, allowing semantic cues to enhance the decomposition of reflectance and shading. The semantic module provides guidance during training but is removed during inference, improving performance without increasing the inference cost. Additionally, to capture the global contextual dependencies critical for intrinsic decomposition, we adopt a Transformer-based backbone. The proposed backbone enables the model to associate distant regions with similar material properties, thereby maintaining consistency in reflectance and learning smooth illumination patterns across a scene. A convolutional decoder is also designed to output predictions with improved details. Experiments demonstrate that our approach achieves state-of-the-art performance in quantitative evaluations on the Intrinsic Images in the Wild (IIW) and Shading Annotations in the Wild (SAW) datasets.
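As a point of reference for the decomposition described above, the following minimal sketch assumes the commonly used model in which each pixel intensity is the product of a reflectance value and a shading value, so that the two layers become additive in the log domain. This model is an assumption made for illustration and is not stated in the abstract; the function names `compose` and `log_residual` are hypothetical, and the code does not represent the proposed network.

```python
import math

def compose(reflectance, shading):
    """Recombine reflectance and shading under the assumed model I = R * S (per pixel)."""
    return [[r * s for r, s in zip(r_row, s_row)]
            for r_row, s_row in zip(reflectance, shading)]

def log_residual(image, reflectance, shading):
    """Largest per-pixel |log I - (log R + log S)|.

    Assumes strictly positive values; the result is near zero when the
    two layers are consistent with the image under the assumed model.
    """
    worst = 0.0
    for i_row, r_row, s_row in zip(image, reflectance, shading):
        for i, r, s in zip(i_row, r_row, s_row):
            diff = math.log(i) - (math.log(r) + math.log(s))
            worst = max(worst, abs(diff))
    return worst

# Toy 2x2 grayscale example (hypothetical values): two materials with
# reflectance 0.8 and 0.2, lit by shading that falls off from left to right.
R = [[0.8, 0.8], [0.2, 0.2]]
S = [[1.0, 0.5], [1.0, 0.5]]
I = compose(R, S)
```

In this toy case the reflectance is constant within each row (one material per row) while the shading varies smoothly across columns, which mirrors the spatial coherence the abstract relies on.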
Funding
Supported by the Science and Technology Innovation 2030 Major Project of “New Generation Artificial Intelligence” (No. 2022ZD0115901) and the National Natural Science Foundation of China (No. 62332003).