Although deep neural networks(DNNs)have achieved great success in semantic segmentation tasks,it is still challenging for real-time applications.A large number of feature channels,parameters,and floating-point operati...Although deep neural networks(DNNs)have achieved great success in semantic segmentation tasks,it is still challenging for real-time applications.A large number of feature channels,parameters,and floating-point operations make the network sluggish and computationally heavy,which is not desirable for real-time tasks such as robotics and autonomous driving.Most approaches,however,usually sacrifice spatial resolution to achieve inference speed in real time,resulting in poor performance.In this paper,we propose a light-weight stage-pooling semantic segmentation network(SPSSN),which can efficiently reuse the paramount features from early layers at multiple stages,at different spatial resolutions.SPSSN takes input of full resolution 2048×1024 pixels,uses only 1.42×10~6 parameters,yields 69.4%m Io U accuracy without pre-training,and obtains an inference speed of 59 frames/s on the Cityscapes dataset.SPSSN can run directly on mobile devices in real time,due to its light-weight architecture.To demonstrate the effectiveness of the proposed network,we compare our results with those of state-of-the-art networks.展开更多
基金Project supported by the National Key R&D Program of China(No.2017YFB1300205)。
文摘Although deep neural networks(DNNs)have achieved great success in semantic segmentation tasks,it is still challenging for real-time applications.A large number of feature channels,parameters,and floating-point operations make the network sluggish and computationally heavy,which is not desirable for real-time tasks such as robotics and autonomous driving.Most approaches,however,usually sacrifice spatial resolution to achieve inference speed in real time,resulting in poor performance.In this paper,we propose a light-weight stage-pooling semantic segmentation network(SPSSN),which can efficiently reuse the paramount features from early layers at multiple stages,at different spatial resolutions.SPSSN takes input of full resolution 2048×1024 pixels,uses only 1.42×10~6 parameters,yields 69.4%m Io U accuracy without pre-training,and obtains an inference speed of 59 frames/s on the Cityscapes dataset.SPSSN can run directly on mobile devices in real time,due to its light-weight architecture.To demonstrate the effectiveness of the proposed network,we compare our results with those of state-of-the-art networks.