Dear Editor, This letter proposes an open-vocabulary 3D scene understanding model based on a vision-language model. By efficiently integrating 3D point cloud data, image data, and text data, our model overcomes the segmentation problem [1], [2] that traditional models face when dealing with unknown categories [3]. By learning the deep semantic mapping between vision and language, the network significantly improves its ability to recognize unlabeled categories and exceeds current state-of-the-art methods on the task of open-vocabulary scene understanding.
Funding: Supported by CAFUC (ZHMH 2022-005), Key Laboratory of Flight Techniques and Flight Safety (FZ2022ZZ06), and Flight Technology and Flight Safety of Civil Aviation Administration of China (FZ2022KF10).