Funding: LHTD (20170003) and the Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ).
Abstract: Background: In recent decades, unmanned aerial vehicles (UAVs) have developed rapidly and have been widely applied in many domains, including photography, reconstruction, monitoring, and search and rescue. In such applications, a key issue is path and view planning, which tells UAVs exactly where to fly and how to search. Methods: With specific consideration of three popular UAV applications (scene reconstruction, environment exploration, and aerial cinematography), we present a survey that should help researchers position and evaluate their work in the context of existing solutions. Results/Conclusions: It should also help newcomers and practitioners in related fields quickly gain an overview of the vast literature. In addition to the current research status, we analyze the advantages, disadvantages, and potential directions for future exploration in each application domain.
Funding: Supported by the National Natural Science Foundation of China (Nos. 61975046 and 62375068).
Abstract: Metallic mesh is a transparent electromagnetic shielding film with a fine metal-line structure. However, defects can arise during production or in actual use, degrading its optoelectronic performance. Developing in situ non-destructive testing (NDT) devices for metallic mesh requires long working distances, a reflective optical path design, and miniaturization. Existing smartphone microscopes have short working distances and rely on transmission imaging, which is inadequate for industrial in situ inspection. To address these limitations, we propose a novel long-working-distance reflective smartphone microscopy (LD-RSM) system. LD-RSM consists of a 4f optical imaging system built from external optical components, combined with a smartphone. A beam splitter enables reflective imaging, with the illumination and imaging systems on the same side of the sample. The system achieves an optical resolution of 4.92 µm and a working distance of up to 22.23 mm. Additionally, we introduce dual-prior weighted robust principal component analysis (DW-RPCA) for defect detection. This approach leverages spectral filter fusion and the Hough transform to model different defect types, improving the accuracy and efficiency of defect identification. Combined with a double-threshold segmentation approach, DW-RPCA achieves pixel-level defect detection accuracies (F-values) of 0.856 and 0.848 on the square and circular metallic mesh datasets, respectively. Our work shows strong potential for in situ industrial product inspection.
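The abstract does not define its double-threshold segmentation step. One common reading is hysteresis thresholding, sketched below in plain Python under that assumption: pixels whose defect score reaches a high threshold seed the mask, and weaker pixels above a low threshold are kept only when they are 4-connected to a seed. The score map (a hypothetical stand-in for, say, the magnitude of an RPCA sparse component) and both thresholds are illustrative. The F-value is computed here as the F1 score (harmonic mean of pixel-level precision and recall), which is also an assumption about the metric's exact definition.

```python
from collections import deque


def double_threshold(score, low, high):
    """Hysteresis segmentation of a 2D score map given as a list of rows.

    Pixels with score >= high seed the mask; pixels with score >= low are
    added only if they are 4-connected to an already accepted pixel.
    """
    rows, cols = len(score), len(score[0])
    mask = [[False] * cols for _ in range(rows)]
    queue = deque()
    for r in range(rows):
        for c in range(cols):
            if score[r][c] >= high:
                mask[r][c] = True
                queue.append((r, c))
    # Grow the strong seeds into connected weak pixels (breadth-first).
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and not mask[nr][nc] and score[nr][nc] >= low):
                mask[nr][nc] = True
                queue.append((nr, nc))
    return mask


def f_value(pred, truth):
    """Pixel-level F1: harmonic mean of precision and recall."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            tp += int(p and t)
            fp += int(p and not t)
            fn += int(t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


score = [
    [0.1, 0.5, 0.9, 0.5],
    [0.1, 0.1, 0.1, 0.1],
    [0.5, 0.5, 0.1, 0.1],
]
mask = double_threshold(score, low=0.4, high=0.8)
# The weak responses in the bottom row are dropped: no strong seed reaches them.
truth = [
    [False, True, True, False],
    [False, False, False, False],
    [True, False, False, False],
]
print(f_value(mask, truth))
```

Compared with a single threshold, the two-level rule suppresses isolated weak responses while still recovering the faint continuation of a strong defect.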
Funding: Supported in part by the National Natural Science Foundation of China (U21B2023), the DEGP Innovation Team (2022KCXTD025), the Shenzhen Science and Technology Program (KQTD20210811090044003, RCJC20200714114435012), and the Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ).
Abstract: Exemplar-based image translation converts semantic masks into photorealistic images that adopt the style of a given exemplar. However, most existing GAN-based translation methods fail to produce photorealistic results. In this study, we propose a diffusion-model-based approach that generates high-quality images that are semantically aligned with the input mask and resemble the exemplar in style. The proposed method trains a conditional denoising diffusion probabilistic model (DDPM) with a SPADE module to integrate the semantic map. We then use a novel contextual loss and an auxiliary color loss to guide optimization, producing images that are visually pleasing and semantically accurate. Experiments demonstrate that our method outperforms state-of-the-art approaches in both visual quality and quantitative metrics.
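For readers unfamiliar with DDPMs, the sketch below shows the standard forward (noising) process and the simple noise-prediction loss that a conditional DDPM is typically trained with. It assumes the linear schedule from the original DDPM paper (T = 1000, beta from 1e-4 to 0.02); the abstract does not state which schedule this work uses. The denoising network itself, where the semantic mask would enter through SPADE, is deliberately omitted, and the toy three-pixel "image" is purely illustrative.

```python
import math
import random


def linear_beta_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (original DDPM defaults)."""
    return [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bar, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alpha_bar.append(prod)
    return alpha_bar


def q_sample(x0, t, alpha_bar, noise):
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) x_0, (1 - alpha_bar_t) I)
    in closed form, given a standard-normal noise vector."""
    a = alpha_bar[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]


def simple_loss(noise, predicted_noise):
    """Noise-prediction MSE. In a conditional model, the predictor would also
    receive the timestep and the semantic mask (for example via SPADE)."""
    return sum((e - p) ** 2 for e, p in zip(noise, predicted_noise)) / len(noise)


betas = linear_beta_schedule()
alpha_bar = cumulative_alphas(betas)
rng = random.Random(0)
x0 = [0.5, -0.25, 1.0]                  # a toy "image" of three pixels
noise = [rng.gauss(0.0, 1.0) for _ in x0]
x_t = q_sample(x0, t=500, alpha_bar=alpha_bar, noise=noise)
# A hypothetical perfect denoiser would recover the noise exactly:
print(simple_loss(noise, noise))        # 0.0
```

The closed-form `q_sample` is what makes training cheap: any timestep can be sampled directly from the clean image without simulating the whole chain.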
Funding: Supported in part by the National Natural Science Foundation of China (62161146005, U21B2023), the Shenzhen Science and Technology Program (KQTD20210811090044003, RCJC20200714114435012), and the Israel Science Foundation.
Abstract: This study introduces CLIP-Flow, a novel network for generating images from a given image or text. To effectively exploit the rich semantics of both modalities, we designed a semantics-guided methodology for image- and text-to-image synthesis. In particular, we adopted Contrastive Language-Image Pretraining (CLIP) as an encoder to extract semantics and StyleGAN as a decoder to generate images from this information. To bridge the embedding space of CLIP and the latent space of StyleGAN, we employ Real NVP, modified with activation normalization and invertible convolution. Because images and text share the same representation space in CLIP, text prompts can be fed directly into CLIP-Flow to achieve text-to-image synthesis. We conducted extensive experiments on several datasets to validate the effectiveness of the proposed image-to-image synthesis method. In addition, we evaluated text-to-image synthesis on the public Multi-Modal CelebA-HQ dataset. Experiments validated that our approach generates high-quality, text-matching images and is comparable with state-of-the-art methods, both qualitatively and quantitatively.
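The building blocks named in the abstract can be illustrated generically. The sketch below shows a Real NVP affine coupling layer and an activation-normalization (ActNorm) step on plain Python lists, each returning its log-determinant, plus the exact inverse of the coupling. It is not CLIP-Flow's architecture: the scale and shift functions are hypothetical stand-ins for small neural networks, the vector dimension is assumed to be even, and the invertible 1x1 convolution is omitted (between couplings it mixes dimensions, generalizing a fixed permutation, so that every dimension eventually gets transformed).

```python
import math


def actnorm_forward(x, scale, bias):
    """Activation normalization: per-dimension affine map y = s * x + b.
    Its log|det J| is sum(log|s|)."""
    y = [s * xi + b for xi, s, b in zip(x, scale, bias)]
    return y, sum(math.log(abs(s)) for s in scale)


def coupling_forward(x, scale_fn, shift_fn):
    """Real NVP affine coupling: the first half passes through unchanged and
    conditions an elementwise affine transform of the second half."""
    assert len(x) % 2 == 0, "this sketch assumes an even dimension"
    d = len(x) // 2
    x1, x2 = x[:d], x[d:]
    s, t = scale_fn(x1), shift_fn(x1)
    y2 = [xi * math.exp(si) + ti for xi, si, ti in zip(x2, s, t)]
    return x1 + y2, sum(s)  # triangular Jacobian, so log|det J| = sum(s)


def coupling_inverse(y, scale_fn, shift_fn):
    """Exact inverse: y1 equals x1, so s and t can be recomputed from it."""
    assert len(y) % 2 == 0, "this sketch assumes an even dimension"
    d = len(y) // 2
    y1, y2 = y[:d], y[d:]
    s, t = scale_fn(y1), shift_fn(y1)
    x2 = [(yi - ti) * math.exp(-si) for yi, si, ti in zip(y2, s, t)]
    return y1 + x2


# Hypothetical stand-ins for the networks that predict s and t.
def toy_scale(h):
    return [0.5 * math.tanh(v) for v in h]


def toy_shift(h):
    return [v * v - 1.0 for v in h]


z = [0.3, -1.2, 0.8, 2.0]
y, log_det = coupling_forward(z, toy_scale, toy_shift)
z_back = coupling_inverse(y, toy_scale, toy_shift)
```

Because the inverse never needs to invert `toy_scale` or `toy_shift`, those functions can be arbitrary networks; invertibility comes from the coupling structure alone.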
Funding: This work was partially supported by the National Natural Science Foundation of China under Grant Nos. 61372168, 61373071, 61372190, and 61331018; the Scientific Research Foundation for the Returned Overseas Chinese Scholars of the State Education Ministry of China; the Visual Computing Center at King Abdullah University of Science and Technology (KAUST); and the Open Funding Project of the State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, under Grant Nos. BUAA-VR-15KF-06 and BUAA-VR-14KF-10.
Abstract: In this paper, we survey recent approaches to blue-noise sampling and discuss their beneficial applications. We discuss sampling algorithms that use points as sampling primitives and classify them along various aspects, e.g., the sampling domain and the type of algorithm. We demonstrate several well-known applications that can be improved by recent blue-noise sampling techniques, as well as some new applications such as dynamic sampling and blue-noise remeshing.
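As a concrete reference point for point-based blue-noise sampling, the sketch below implements classical dart throwing (Poisson-disk sampling) in the unit square: a uniformly random candidate is accepted only if it keeps a minimum distance `radius` from every accepted sample. This is the baseline that many of the surveyed methods accelerate or improve, not a specific technique from the survey; the radius, attempt budget, and seed are illustrative. A background grid with cell size radius / sqrt(2) holds at most one sample per cell, which limits each distance check to a 5 x 5 block of cells.

```python
import math
import random


def dart_throwing(radius, attempts=20000, seed=0):
    """Poisson-disk (blue-noise) samples in the unit square by dart throwing."""
    rng = random.Random(seed)
    cell = radius / math.sqrt(2)  # a cell's diagonal equals radius
    grid = {}                     # (i, j) cell index -> accepted sample
    samples = []
    for _ in range(attempts):
        p = (rng.random(), rng.random())
        gi, gj = int(p[0] / cell), int(p[1] / cell)
        # A sample closer than radius can be at most 2 cells away per axis.
        conflict = any(
            math.dist(p, grid[(i, j)]) < radius
            for i in range(gi - 2, gi + 3)
            for j in range(gj - 2, gj + 3)
            if (i, j) in grid
        )
        if not conflict:
            grid[(gi, gj)] = p
            samples.append(p)
    return samples


points = dart_throwing(radius=0.1)
```

The minimum-distance constraint is what removes low-frequency clumping and gives the point set its blue-noise spectrum; the fixed attempt budget means the result is only approximately maximal.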
Funding: Supported by the National Natural Science Foundation of China (Nos. 61572507, 61622212, and 61532003); also supported by the China Scholarship Council.
Abstract: Active vision is inherently attention-driven: an agent actively selects views to attend to in order to rapidly perform a vision task while improving its internal representation of the observed scene. Inspired by the recent success of attention-based models in 2D vision tasks on single RGB images, we address multi-view depth-based active object recognition with an attention mechanism, using an end-to-end recurrent 3D attentional network. The architecture uses a recurrent neural network to store and update an internal representation. Trained on 3D shape datasets, our model iteratively attends to the best views of an object of interest in order to recognize it. To realize 3D view selection, we derive a 3D spatial transformer network. It is differentiable, which allows training with backpropagation and yields much faster convergence than the reinforcement learning employed by most existing attention-based models. Experiments show that our method, with depth input only, achieves state-of-the-art next-best-view performance in terms of both time taken and recognition accuracy.
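The key point in the abstract is that view selection becomes differentiable. One way to see why continuous view parameters help is sketched below under our own assumptions: a view is parameterized by azimuth and elevation on a sphere around the object, and the camera pose is a smooth function of those angles, so a network that regresses them can in principle be trained with gradients, provided the downstream rendering or sampling is differentiable too. This is not the authors' 3D spatial transformer; the function names and the z-up convention are hypothetical.

```python
import math


def view_on_sphere(azimuth, elevation, radius=1.0):
    """Camera center on a viewing sphere around the object (angles in radians)."""
    ce = math.cos(elevation)
    return (radius * ce * math.cos(azimuth),
            radius * ce * math.sin(azimuth),
            radius * math.sin(elevation))


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _normalize(a):
    n = math.sqrt(sum(x * x for x in a))
    return tuple(x / n for x in a)


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def look_at(eye, target=(0.0, 0.0, 0.0), up=(0.0, 0.0, 1.0)):
    """Rows (right, up, forward) of a world-to-camera rotation for a camera at
    `eye` looking at `target`. Undefined when forward is parallel to `up`
    (an elevation of exactly +/- pi/2)."""
    forward = _normalize(_sub(target, eye))
    right = _normalize(_cross(forward, up))
    true_up = _cross(right, forward)
    return right, true_up, forward


eye = view_on_sphere(azimuth=0.7, elevation=0.3, radius=2.0)
rotation = look_at(eye)
```

Every operation here (trigonometric functions, cross products, normalization) is smooth away from the poles, which is the property that lets gradient-based training replace sampled, non-differentiable view choices.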
Abstract: We propose an approach for automatically generating building models by assembling a set of boxes under a Manhattan-world assumption. The method first aligns the point cloud with a per-building local coordinate system and then fits axis-aligned planes to the point cloud through an iterative regularization process. The refined planes partition the space of the data into a series of compact cubic cells (candidate boxes) spanning the entire 3D space of the input data. We then approximate the target building by assembling a subset of these candidate boxes using a binary linear programming formulation. The objective function is designed to maximize both point cloud coverage and the compactness of the final model. Finally, all selected boxes are merged into a lightweight polygonal mesh model, which is suitable for interactive visualization of large-scale urban scenes. Experimental results and a comparison with state-of-the-art methods demonstrate the effectiveness of the proposed framework.
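The candidate generation and selection steps can be illustrated with a toy instance. The sketch below is a simplification under stated assumptions, not the authors' formulation: candidate boxes are the grid cells induced by axis-aligned plane positions, a point counts as covered if it lies inside any selected box, and compactness is approximated by a hypothetical penalty `mu` times the total selected volume. With only four candidates, the 0/1 program can be solved by exhaustive enumeration; realistic instances need an integer linear programming solver.

```python
import math
from itertools import combinations, product


def candidate_boxes(xs, ys, zs):
    """Cells of the grid induced by sorted plane positions along each axis."""
    return [((x0, y0, z0), (x1, y1, z1))
            for (x0, x1), (y0, y1), (z0, z1) in product(
                zip(xs, xs[1:]), zip(ys, ys[1:]), zip(zs, zs[1:]))]


def inside(p, box):
    lo, hi = box
    return all(l <= c <= h for c, l, h in zip(p, lo, hi))


def volume(box):
    lo, hi = box
    return math.prod(h - l for l, h in zip(lo, hi))


def select_boxes(points, boxes, mu):
    """Maximize (#points covered) - mu * (total selected volume) over all 0/1
    selections by brute force, standing in for an ILP solver."""
    best, best_score = (), float("-inf")
    for k in range(len(boxes) + 1):
        for subset in combinations(range(len(boxes)), k):
            covered = sum(1 for p in points
                          if any(inside(p, boxes[i]) for i in subset))
            score = covered - mu * sum(volume(boxes[i]) for i in subset)
            if score > best_score:
                best, best_score = subset, score
    return best, best_score


# Toy L-shaped footprint: no points fall in the cell x in [1, 2], y in [1, 2].
boxes = candidate_boxes([0, 1, 2], [0, 1, 2], [0, 1])
points = [(0.5, 0.5, 0.5), (0.2, 0.3, 0.9), (1.5, 0.5, 0.5),
          (1.7, 0.2, 0.1), (0.5, 1.5, 0.5)]
chosen, score = select_boxes(points, boxes, mu=0.5)
print(chosen, score)  # (0, 1, 2) 3.5
```

The empty cell (index 3) is left out because it adds volume without coverage; the three selected cells would then be merged into a single L-shaped mesh.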