Dear Editor, This letter proposes an open-vocabulary 3D scene understanding model based on a visual-language model. By efficiently integrating 3D point cloud data, image data, and text data, the model overcomes the segmentation problem [1], [2] that traditional models face when dealing with unknown categories [3]. By learning the deep semantic mapping between vision and language, the network significantly improves its ability to recognize unlabeled categories and exceeds current state-of-the-art methods on the task of open-vocabulary scene understanding.
Purpose – Conventional deep learning architectures for image super-resolution reconstruction suffer from difficult training and vanishing gradients. To solve these problems, this paper proposes a novel image super-resolution algorithm based on improved generative adversarial networks (GANs) with Wasserstein distance and gradient penalty. Design/methodology/approach – The proposed algorithm combines the conventional GAN architecture, the Wasserstein distance and the gradient penalty for the task of image super-resolution reconstruction (SRWGANs-GP). In addition, a novel perceptual loss function is designed for SRWGANs-GP to suit this task. The content loss is extracted from the deep model's feature maps: the mean square error (MSE) computed on these features enters the loss calculation of the generator. Findings – To validate the effectiveness and feasibility of the proposed algorithm, extensive comparative experiments are conducted on three common data sets, i.e. Set5, Set14 and BSD100. Experimental results show that the proposed SRWGANs-GP architecture has a stable error gradient and converges iteratively. Compared with the baseline deep models, the proposed GAN models significantly improve performance and efficiency for image super-resolution reconstruction. The MSE calculated on the deep model's feature maps is more advantageous for reconstructing contour and texture. Originality/value – Compared with the state-of-the-art algorithms, the proposed algorithm obtains better performance on image super-resolution and better reconstruction results on contour and texture.
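The abstract does not give the loss formulas, so the following is only a minimal sketch of how the ingredients it names could fit together, assuming the standard WGAN-GP critic objective (critic score on generated samples minus critic score on real samples, plus a penalty that pushes the critic's input-gradient norm toward 1) and a generator loss that adds a feature-map MSE content term. The linear critic, the penalty weight `lam=10.0`, the `content_weight` parameter and all function names are illustrative assumptions, not the authors' implementation.

```python
import math


def mse(a, b):
    """Mean squared error between two equally sized feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def linear_critic(w, b, x):
    """Toy critic D(x) = w . x + b (hypothetical stand-in for a deep critic)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def gradient_penalty(grad_norms, lam=10.0):
    """lam * mean((||grad|| - 1)^2) over the sampled points."""
    return lam * sum((g - 1.0) ** 2 for g in grad_norms) / len(grad_norms)


def critic_loss(w, b, real, fake, lam=10.0):
    """Assumed WGAN-GP critic loss: E[D(fake)] - E[D(real)] + penalty."""
    d_real = sum(linear_critic(w, b, x) for x in real) / len(real)
    d_fake = sum(linear_critic(w, b, x) for x in fake) / len(fake)
    # For a linear critic the gradient w.r.t. the input is w everywhere,
    # so every interpolated sample shares the same gradient norm ||w||.
    grad_norm = math.sqrt(sum(wi * wi for wi in w))
    return d_fake - d_real + gradient_penalty([grad_norm] * len(real), lam)


def generator_loss(w, b, fake, feat_fake, feat_real, content_weight=1.0):
    """Adversarial term -E[D(fake)] plus a feature-map MSE content loss."""
    d_fake = sum(linear_critic(w, b, x) for x in fake) / len(fake)
    content = sum(mse(f, r) for f, r in zip(feat_fake, feat_real)) / len(feat_fake)
    return -d_fake + content_weight * content


if __name__ == "__main__":
    real = [[1.0, 0.0]]   # stand-in for a high-resolution sample
    fake = [[0.0, 1.0]]   # stand-in for a generated sample
    print(critic_loss([3.0, 4.0], 0.0, real, fake))
    print(generator_loss([3.0, 4.0], 0.0, fake, [[1.0, 2.0]], [[1.0, 4.0]]))
```

In a real network the gradient norm would be obtained by automatic differentiation at points interpolated between real and generated images, and the feature vectors would come from a deep model's feature maps; the closed-form gradient here exists only because the toy critic is linear.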
Funding: supported by CAFUC (ZHMH 2022-005), Key Laboratory of Flight Techniques and Flight Safety (FZ2022ZZ06), and Flight Technology and Flight Safety of Civil Aviation Administration of China (FZ2022KF10).