4 articles found
Survey on path and view planning for UAVs
1
Authors: Xiaohui ZHOU, Zimu YI, Yilin LIU, Kai HUANG, Hui HUANG. Virtual Reality & Intelligent Hardware, 2020, Issue 1, pp. 56-69 (14 pages)
Background: In recent decades, unmanned aerial vehicles (UAVs) have developed rapidly and been widely applied in many domains, including photography, reconstruction, monitoring, and search and rescue. In such applications, one key issue is path and view planning, which tells UAVs exactly where to fly and how to search. Methods: With specific consideration for three popular UAV applications (scene reconstruction, environment exploration, and aerial cinematography), we present a survey that should assist researchers in positioning and evaluating their works in the context of existing solutions. Results/Conclusions: It should also help newcomers and practitioners in related fields quickly gain an overview of the vast literature. In addition to the current research status, we analyze and elaborate on advantages, disadvantages, and potential explorative trends for each application domain.
Keywords: Unmanned aerial vehicle; Path planning; View planning; Multi-view reconstruction; Autonomous exploration; Scene navigation; Obstacle avoidance; Drone cinematography; Camera control
Taming diffusion model for exemplar-based image translation
2
Authors: Hao Ma, Jingyuan Yang, Hui Huang. Computational Visual Media (CSCD), 2024, Issue 6, pp. 1031-1043 (13 pages)
Exemplar-based image translation involves converting semantic masks into photorealistic images that adopt the style of a given exemplar. However, most existing GAN-based translation methods fail to produce photorealistic results. In this study, we propose a new diffusion model-based approach for generating high-quality images that are semantically aligned with the input mask and resemble an exemplar in style. The proposed method trains a conditional denoising diffusion probabilistic model (DDPM) with a SPADE module to integrate the semantic map. We then use a novel contextual loss and an auxiliary color loss to guide the optimization process, resulting in images that are visually pleasing and semantically accurate. Experiments demonstrate that our method outperforms state-of-the-art approaches in terms of both visual quality and quantitative metrics.
Keywords: exemplar; image translation; denoising diffusion probabilistic model (DDPM)
CLIP-Flow: Decoding images encoded in CLIP space
3
Authors: Hao Ma, Ming Li, Jingyuan Yang, Or Patashnik, Dani Lischinski, Daniel Cohen-Or, Hui Huang. Computational Visual Media (CSCD), 2024, Issue 6, pp. 1157-1168 (12 pages)
This study introduces CLIP-Flow, a novel network for generating images from a given image or text. To effectively utilize the rich semantics contained in both modalities, we designed a semantics-guided methodology for image- and text-to-image synthesis. In particular, we adopted Contrastive Language-Image Pretraining (CLIP) as an encoder to extract semantics and StyleGAN as a decoder to generate images from such information. Moreover, to bridge the embedding space of CLIP and the latent space of StyleGAN, real NVP is employed and modified with activation normalization and invertible convolution. As the images and text in CLIP share the same representation space, text prompts can be fed directly into CLIP-Flow to achieve text-to-image synthesis. We conducted extensive experiments on several datasets to validate the effectiveness of the proposed image-to-image synthesis method. In addition, we tested on the public Multi-Modal CelebA-HQ dataset for text-to-image synthesis. Experiments validated that our approach can generate high-quality text-matching images and is comparable with state-of-the-art methods, both qualitatively and quantitatively.
Keywords: image-to-image; text-to-image; contrastive language-image pretraining (CLIP); flow; StyleGAN
Recurrent 3D attentional networks for end-to-end active object recognition (cited by: 1)
4
Authors: Min Liu, Yifei Shi, Lintao Zheng, Kai Xu, Hui Huang, Dinesh Manocha. Computational Visual Media (CSCD), 2019, Issue 1, pp. 91-103 (13 pages)
Active vision is inherently attention-driven: an agent actively selects views to attend in order to rapidly perform a vision task while improving its internal representation of the scene being observed. Inspired by the recent success of attention-based models in 2D vision tasks based on single RGB images, we address multi-view depth-based active object recognition using an attention mechanism, by use of an end-to-end recurrent 3D attentional network. The architecture takes advantage of a recurrent neural network to store and update an internal representation. Our model, trained with 3D shape datasets, is able to iteratively attend the best views targeting an object of interest for recognizing it. To realize 3D view selection, we derive a 3D spatial transformer network. It is differentiable, allowing training with backpropagation, and so achieving much faster convergence than the reinforcement learning employed by most existing attention-based models. Experiments show that our method, with only depth input, achieves state-of-the-art next-best-view performance both in terms of time taken and recognition accuracy.
Keywords: active object recognition; recurrent neural network; next-best-view; 3D attention