Abstract
Monocular depth estimation still predicts edges and the farthest (maximum-depth) regions inaccurately. To address this, a monocular depth estimation method based on a Pyramid Split attention Network (PS-Net) was proposed. First, PS-Net builds on the Boundary-induced and Scene-aggregated Network (BS-Net) and adds a Pyramid Split Attention (PSA) module. This module processes the spatial information of multi-scale features and establishes long-range dependencies among multi-scale channel attentions, so the network can extract both the boundaries where the depth gradient changes sharply and the farthest regions. Then, the Mish function was used as the activation function in the decoder to further improve network performance. Finally, the network was trained and evaluated on the NYUD v2 (New York University Depth dataset v2) and iBims-1 (independent Benchmark images and matched scans v1) datasets. On iBims-1, the proposed network reduced the Directed Depth Error (DDE) by 1.42 percentage points compared with BS-Net, and the proportion of correctly predicted depth pixels reached 81.69%. These results indicate that the proposed network predicts depth with high accuracy.
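The abstract names two ingredients that are simple to state precisely: the Mish activation used in the decoder, Mish(x) = x · tanh(softplus(x)), and the iBims-1 directed-depth-error idea of checking whether each pixel is placed on the correct side of a reference plane. The sketch below illustrates both in plain Python. It is a minimal illustration under stated assumptions, not the authors' implementation. The helper names `softplus`, `mish` and `dde_correct_ratio` are hypothetical. The reference plane is assumed to sit at depth `d0 = 3.0` m, and ground-truth values of 0 are assumed to mark missing pixels. Only the "correctly predicted" share of DDE is computed here.

```python
import math
from typing import Sequence


def softplus(x: float) -> float:
    """Numerically stable ln(1 + e^x); avoids overflow for large |x|."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


def dde_correct_ratio(pred: Sequence[float], gt: Sequence[float], d0: float = 3.0) -> float:
    """Fraction of valid pixels placed on the correct side of a plane at depth d0.

    Hypothetical helper sketching the "correctly predicted" component of the
    iBims-1 directed depth error. Assumptions: depths are in metres, the plane
    is perpendicular to the optical axis, and gt == 0 marks a missing value.
    """
    valid = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not valid:
        raise ValueError("no valid ground-truth pixels")
    # A pixel is correct when prediction and ground truth agree on whether it
    # lies in front of the plane (depth < d0) or at/behind it (depth >= d0).
    correct = sum(1 for p, g in valid if (p < d0) == (g < d0))
    return correct / len(valid)


if __name__ == "__main__":
    print(round(mish(1.0), 4))
    # The last pixel has gt == 0, so only three pixels are scored; one of them
    # (pred 2.5 m vs. gt 3.5 m) falls on the wrong side of the 3 m plane.
    print(dde_correct_ratio([1.0, 2.5, 4.0, 5.0], [1.2, 3.5, 3.9, 0.0]))
```

Running the script prints 0.8651 and then 2/3 as a float. The stable softplus form matters for an activation function because `math.exp(x)` overflows for large positive inputs. In a real network the same formulas would be applied element-wise to tensors rather than to Python floats.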
Authors
LI Wenju
LI Mengying
CUI Liu
CHU Wanghui
ZHANG Yi
GAO Hui
Affiliations: School of Computer Science and Information Engineering, Shanghai Institute of Technology, Shanghai 201418, China; School of Art and Design, Shanghai Institute of Technology, Shanghai 201418, China
Source
Journal of Computer Applications
CSCD
Peking University Core Journal
2023, Issue 6, pp. 1736-1742 (7 pages)
Funding
National Natural Science Foundation of China (61903256, 61973307).
Keywords
depth estimation
Pyramid Split Attention (PSA)
Three-Dimensional (3D) scene
depth feature
supervised learning