Funding: Supported by the National Natural Science Foundation of China (Nos. U22B2055, 62273345, and 62222302), the Beijing Natural Science Foundation (No. L223003), and a Key R&D Project of Henan Province (No. 231111210300).
Abstract: Learning-based multiple view stereo has gained significant attention recently. However, most methods rely on direct network supervision using provided ground-truth depth, which poses three inherent problems: resolution-dependent ground-truth artifacts, excessively challenging training examples (with relatively featureless textures), and use of less-viewed reference pixels for supervision, all of which hinder network optimization. To alleviate these problems, we propose an accurate network supervision paradigm that includes a ground-truth mask, an entropy mask, and a consistency mask, which provide more accurate supervision signals to aid network optimization.
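The abstract names the three masks but does not define them, so the following is only a rough sketch of how per-pixel masks could gate a depth loss, not the authors' formulation. Everything in it is an assumption made for illustration: the function names, the thresholds, the choice of an L1 loss, treating zero depth as invalid ground truth, using Shannon entropy over depth hypotheses as the entropy criterion, and using a per-pixel view count as a stand-in for the consistency criterion.

```python
import math


def entropy_mask(probs, max_entropy):
    """Keep pixels whose depth distribution is confident (low entropy).

    probs: one list per pixel, holding probabilities over depth hypotheses.
    Assumption: high entropy marks ambiguous, featureless pixels.
    """
    mask = []
    for p in probs:
        h = -sum(q * math.log(q) for q in p if q > 0.0)
        mask.append(h <= max_entropy)
    return mask


def view_count_mask(view_counts, min_views):
    """Hypothetical consistency criterion: pixel seen in enough source views."""
    return [c >= min_views for c in view_counts]


def masked_l1_loss(pred, gt, *masks):
    """Mean absolute depth error over pixels that pass every mask."""
    keep = [all(flags) for flags in zip(*masks)]
    errors = [abs(p - g) for p, g, k in zip(pred, gt, keep) if k]
    return sum(errors) / len(errors) if errors else 0.0


# Four toy pixels (all values are made up).
pred = [1.0, 2.0, 3.0, 4.0]
gt = [1.5, 0.0, 3.0, 5.0]          # 0.0 is assumed to mean "no ground truth"
probs = [
    [0.90, 0.05, 0.05],
    [0.90, 0.05, 0.05],
    [1 / 3, 1 / 3, 1 / 3],         # uniform distribution: maximally uncertain
    [0.80, 0.10, 0.10],
]
view_counts = [3, 3, 3, 1]

gt_mask = [g > 0.0 for g in gt]
ent_mask = entropy_mask(probs, max_entropy=1.0)
con_mask = view_count_mask(view_counts, min_views=2)

loss = masked_l1_loss(pred, gt, gt_mask, ent_mask, con_mask)
```

In this toy input each mask rejects a different pixel, so only the first pixel contributes to the loss. Averaging over the surviving pixels rather than all pixels keeps the loss scale independent of how many pixels are masked out, which is one common design choice among several.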