Journal Articles
20 articles found
1. Generating animatable 3D cartoon faces from single portraits
Authors: Chuanyu Pan, Guowei Yang, Taijiang Mu, Yu-Kun Lai. Virtual Reality & Intelligent Hardware, EI, 2024, Issue 4, pp. 292-307 (16 pages)
Background: With the development of virtual reality (VR) technology, there is a growing need for customized 3D avatars. However, traditional methods for 3D avatar modeling are either time-consuming or fail to retain the similarity to the person being modeled. This study presents a novel framework for generating animatable 3D cartoon faces from a single portrait image. Methods: First, we transferred an input real-world portrait to a stylized cartoon image using StyleGAN. We then proposed a two-stage reconstruction method to recover a 3D cartoon face with detailed texture. Our two-stage strategy initially performs coarse estimation based on template models and subsequently refines the model by non-rigid deformation under landmark supervision. Finally, we proposed a semantic-preserving face-rigging method based on manually created templates and deformation transfer. Conclusions: Compared with prior art, the qualitative and quantitative results show that our method achieves better accuracy, aesthetics, and similarity. Furthermore, we demonstrated the capability of the proposed 3D model for real-time facial animation.
Keywords: 3D reconstruction, cartoon face reconstruction, face rigging, stylized reconstruction, virtual reality
2. MH-HMR: Human mesh recovery from monocular images via multi-hypothesis learning
Authors: Haibiao Xuan, Jinsong Zhang, Yu-Kun Lai, Kun Li. CAAI Transactions on Intelligence Technology, 2024, Issue 5, pp. 1263-1274 (12 pages)
Recovering 3D human meshes from monocular images is an inherently ill-posed and challenging task due to depth ambiguity, joint occlusion, and truncation. However, most existing approaches do not model such uncertainties, typically yielding a single reconstruction for one input. In contrast, the ambiguity of the reconstruction is embraced and the problem is considered as an inverse problem for which multiple feasible solutions exist. To address these issues, the authors propose a multi-hypothesis approach, multi-hypothesis human mesh recovery (MH-HMR), to efficiently model the multi-hypothesis representation and build strong relationships among the hypothetical features. Specifically, the task is decomposed into three stages: (1) generating a reasonable set of initial recovery results (i.e., multiple hypotheses) given a single colour image; (2) modelling intra-hypothesis refinement to enhance every single-hypothesis feature; and (3) establishing inter-hypothesis communication and regressing the final human meshes. Meanwhile, the authors take further advantage of the multiple hypotheses and the recovery process to achieve human mesh recovery from multiple uncalibrated views. Compared with state-of-the-art methods, the MH-HMR approach achieves superior performance and recovers more accurate human meshes on challenging benchmark datasets, such as Human3.6M and 3DPW, while demonstrating its effectiveness across a variety of settings. The code will be publicly available at https://cic.tju.edu.cn/faculty/likun/projects/MH-HMR.
Keywords: 3D computer vision, human reconstruction
3. FRNeRF: Fusion and regularization fields for dynamic view synthesis
Authors: Xinyi Jing, Tao Yu, Renyuan He, Yu-Kun Lai, Kun Li. Computational Visual Media, 2025, Issue 5, pp. 965-981 (17 pages)
Novel space-time view synthesis for monocular video is a highly challenging task: both static and dynamic objects usually appear in the video, but only a single view of the current scene is available, resulting in inaccurate synthesis results. To address this challenge, we propose FRNeRF, a novel space-time view synthesis method with a fusion regularization field. Specifically, we design a 2D-3D fusion regularization field for the original dynamic neural field, which helps reduce blurring of dynamic objects in the scene. In addition, we add image prior features to the hierarchical sampling to solve the problem that the traditional hierarchical sampling strategy cannot obtain sufficient sampling points during training. We evaluate our method extensively on multiple datasets and show the results of dynamic space-time view synthesis. Our method achieves state-of-the-art performance both qualitatively and quantitatively.
Keywords: neural radiance fields (NeRFs), space-time view synthesis, dynamic scene reconstruction, flow fields
4. A Survey on Human Performance Capture and Animation (Cited by 13)
Authors: Shihong Xia, Lin Gao, Yu-Kun Lai, Ming-Ze Yuan, Jinxiang Chai. Journal of Computer Science & Technology, SCIE EI CSCD, 2017, Issue 3, pp. 536-554 (19 pages)
With the rapid development of computing technology, three-dimensional (3D) human body models and their dynamic motions are widely used in the digital entertainment industry. Human performance mainly involves human body shapes and motions. Key research problems in human performance animation include how to capture and analyze the static geometric appearance and dynamic movement of human bodies, and how to simulate human body motions with physical effects. In this survey, following the main research directions of human body performance capture and animation, we summarize recent advances in key research topics, namely human body surface reconstruction, motion capture and synthesis, as well as physics-based motion simulation, and further discuss future research problems and directions. We hope this will help readers gain a comprehensive understanding of human performance capture and animation.
Keywords: human surface reconstruction, body motion capture, motion synthesis, physics-based motion simulation
5. Saliency guided local and global descriptors for effective action recognition (Cited by 10)
Authors: Ashwan Abdulmunem, Yu-Kun Lai, Xianfang Sun. Computational Visual Media, 2016, Issue 1, pp. 97-106 (10 pages)
This paper presents a novel framework for human action recognition based on salient object detection and a new combination of local and global descriptors. We first detect salient objects in video frames and only extract features for such objects. We then use a simple strategy to identify and process only those video frames that contain salient objects. Processing salient objects instead of all frames not only makes the algorithm more efficient, but more importantly also suppresses the interference of background pixels. We combine this approach with a new combination of local and global descriptors, namely 3D-SIFT and histograms of oriented optical flow (HOOF), respectively. The resulting saliency guided 3D-SIFT-HOOF (SGSH) feature is used along with a multi-class support vector machine (SVM) classifier for human action recognition. Experiments conducted on the standard KTH and UCF-Sports action benchmarks show that our new method outperforms competing state-of-the-art spatiotemporal feature-based human action recognition methods.
Keywords: action recognition, saliency detection, local and global descriptors, bag of visual words (BoVWs), classification
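As a rough illustration of the classification stage this abstract describes (not the authors' code: the descriptor dimensions, class count, and random features below are placeholders), concatenating a local and a global descriptor into one feature and training a multi-class SVM might look like:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Placeholder descriptors: in the paper these would be 3D-SIFT and HOOF
# histograms extracted only from salient regions of each video.
n_videos, d_sift, d_hoof = 120, 64, 32
sift_feats = rng.random((n_videos, d_sift))
hoof_feats = rng.random((n_videos, d_hoof))
labels = rng.integers(0, 6, size=n_videos)   # e.g., 6 KTH action classes

# The combined SGSH-style feature is a concatenation of the local
# (3D-SIFT) and global (HOOF) descriptors for each video.
combined = np.hstack([sift_feats, hoof_feats])

# Multi-class SVM on the combined feature (one-vs-rest decision shape).
clf = make_pipeline(StandardScaler(),
                    SVC(kernel="rbf", decision_function_shape="ovr"))
clf.fit(combined, labels)
pred = clf.predict(combined)
print(pred.shape)  # one predicted action label per video
```

The point of the sketch is only the pipeline shape: descriptor concatenation followed by a standard multi-class SVM.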
6. A survey on deep geometry learning: From a representation perspective (Cited by 15)
Authors: Yun-Peng Xiao, Yu-Kun Lai, Fang-Lue Zhang, Chunpeng Li, Lin Gao. Computational Visual Media, CSCD, 2020, Issue 2, pp. 113-133 (21 pages)
Researchers have achieved great success in dealing with 2D images using deep learning. In recent years, 3D computer vision and geometry deep learning have gained ever more attention. Many advanced techniques for 3D shapes have been proposed for different applications. Unlike 2D images, which can be uniformly represented by a regular grid of pixels, 3D shapes have various representations, such as depth images, multi-view images, voxels, point clouds, meshes, and implicit surfaces. The performance achieved in different applications largely depends on the representation used, and there is no single representation that works well for all applications. Therefore, in this survey, we review recent developments in deep learning for 3D geometry from a representation perspective, summarizing the advantages and disadvantages of different representations for different applications. We also present existing datasets in these representations and further discuss future research directions.
Keywords: 3D shape representation, geometry learning, neural networks, computer graphics
7. 3D indoor scene modeling from RGB-D data: a survey (Cited by 6)
Authors: Kang Chen, Yu-Kun Lai, Shi-Min Hu. Computational Visual Media, 2015, Issue 4, pp. 267-278 (12 pages)
3D scene modeling has long been a fundamental problem in computer graphics and computer vision. With the popularity of consumer-level RGB-D cameras, there is a growing interest in digitizing real-world indoor 3D scenes. However, modeling indoor 3D scenes remains a challenging problem because of the complex structure of interior objects and the poor quality of RGB-D data acquired by consumer-level sensors. Various methods have been proposed to tackle these challenges. In this survey, we provide an overview of recent advances in indoor scene modeling techniques, as well as public datasets and code libraries that can facilitate experiments and evaluation.
Keywords: RGB-D camera, 3D indoor scenes, geometric modeling, semantic modeling, survey
8. ClusterSLAM: A SLAM backend for simultaneous rigid body clustering and motion estimation (Cited by 6)
Authors: Jiahui Huang, Sheng Yang, Zishuo Zhao, Yu-Kun Lai, Shi-Min Hu. Computational Visual Media, EI CSCD, 2021, Issue 1, pp. 87-101 (15 pages)
We present a practical backend for stereo visual SLAM which can simultaneously discover individual rigid bodies and compute their motions in dynamic environments. While recent factor-graph-based state optimization algorithms have shown their ability to robustly solve SLAM problems by treating dynamic objects as outliers, the dynamic motions of those objects are rarely considered. In this paper, we exploit the consensus of 3D motions among landmarks extracted from the same rigid body for clustering, and identify static and dynamic objects in a unified manner. Specifically, our algorithm builds a noise-aware motion affinity matrix from landmarks, and uses agglomerative clustering to distinguish rigid bodies. Using decoupled factor graph optimization to revise their shapes and trajectories, we obtain an iterative scheme that updates both cluster assignments and motion estimates reciprocally. Evaluations on both synthetic scenes and KITTI demonstrate the capability of our approach, and further experiments considering online efficiency also show the effectiveness of our method for simultaneously tracking ego-motion and multiple objects.
Keywords: dynamic SLAM, motion segmentation, scene perception
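The clustering step the abstract sketches, building an affinity matrix over landmarks and cutting it with agglomerative clustering, can be illustrated with a toy example (the affinity values here are synthetic; the paper's noise-aware construction from landmark motions is not reproduced):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(1)

# Toy stand-in for a motion affinity matrix: landmarks 0-4 move
# together (one rigid body), landmarks 5-9 together (another).
n = 10
affinity = np.full((n, n), 0.1)
affinity[:5, :5] = 0.9
affinity[5:, 5:] = 0.9
affinity += rng.normal(0, 0.02, (n, n))
affinity = (affinity + affinity.T) / 2     # symmetrize
np.fill_diagonal(affinity, 1.0)

# Agglomerative clustering works on dissimilarities, so convert
# affinity (high = likely same body) into a distance.
dist = 1.0 - affinity
np.fill_diagonal(dist, 0.0)
Z = linkage(squareform(dist, checks=False), method="average")
clusters = fcluster(Z, t=2, criterion="maxclust")
print(clusters)  # two groups of landmarks = two rigid-body hypotheses
```

In the paper this grouping is then refined jointly with motion estimation; the sketch only shows the one-shot cut.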
9. Knowledge graph construction with structure and parameter learning for indoor scene design (Cited by 4)
Authors: Yuan Liang, Fei Xu, Song-Hai Zhang, Yu-Kun Lai, Taijiang Mu. Computational Visual Media, CSCD, 2018, Issue 2, pp. 123-137 (15 pages)
We consider the problem of learning a representation of both spatial relations and dependencies between objects for indoor scene design. We propose a novel knowledge graph framework based on the entity-relation model for representing facts in indoor scene design, and further develop a weakly supervised algorithm for extracting the knowledge graph representation from a small dataset using both structure and parameter learning. The proposed framework is flexible, transferable, and readable. We present a variety of computer-aided indoor scene design applications using this representation, to show the usefulness and robustness of the proposed framework.
Keywords: knowledge graph, scene design, structure learning, parameter learning
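A minimal sketch of the entity-relation representation of design facts mentioned above (the entity and relation names are invented for illustration; the paper learns its graph structure and parameters from data rather than hand-coding triples):

```python
# Facts as (head, relation, tail) triples, e.g., spatial relations and
# object dependencies in an indoor scene.
triples = {
    ("desk", "supports", "monitor"),
    ("chair", "in_front_of", "desk"),
    ("monitor", "depends_on", "desk"),
    ("lamp", "on_top_of", "desk"),
}

def objects_related_to(entity, relation):
    """All tails connected to `entity` by `relation`."""
    return sorted(t for h, r, t in triples if h == entity and r == relation)

print(objects_related_to("desk", "supports"))  # ['monitor']
```

Queries over such triples are what make the representation "readable" and usable by downstream scene-design tools.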
10. A Revisit of Shape Editing Techniques: From the Geometric to the Neural Viewpoint (Cited by 1)
Authors: Yu-Jie Yuan, Yu-Kun Lai, Tong Wu, Lin Gao, Ligang Liu. Journal of Computer Science & Technology, SCIE EI CSCD, 2021, Issue 3, pp. 520-554 (35 pages)
3D shape editing is widely used in a range of applications such as movie production, computer games, and computer-aided design. It is also a popular research topic in computer graphics and computer vision. In past decades, researchers have developed a series of editing methods to make the editing process faster, more robust, and more reliable. Traditionally, the deformed shape is determined by the optimal transformation and weights for an energy formulation. With the increasing availability of 3D shapes on the Internet, data-driven methods were proposed to improve the editing results. More recently, as deep neural networks became popular, many deep-learning-based editing methods have been developed in this field, which are naturally data-driven. We survey recent research, from the geometric viewpoint to emerging neural deformation techniques, and categorize the methods into organic shape editing methods and man-made model editing methods. Both traditional methods and recent neural-network-based methods are reviewed.
Keywords: mesh deformation, man-made model editing, deformation representation, optimization, deep learning
11. 3D computational modeling and perceptual analysis of kinetic depth effects (Cited by 1)
Authors: Meng-Yao Cui, Shao-Ping Lu, Miao Wang, Yong-Liang Yang, Yu-Kun Lai, Paul L. Rosin. Computational Visual Media, CSCD, 2020, Issue 3, pp. 265-277 (13 pages)
Humans have the ability to perceive kinetic depth effects, i.e., to perceive 3D shapes from 2D projections of rotating 3D objects. This process is based on a variety of visual cues such as lighting and shading effects. However, when such cues are weak or missing, perception can become faulty, as demonstrated by the famous silhouette illusion of the spinning dancer. Inspired by this, we establish objective and subjective evaluation models of rotated 3D objects by taking their projected 2D images as input. We investigate five different cues: ambient luminance, shading, rotation speed, perspective, and color difference between the objects and the background. In the objective evaluation model, we first apply 3D reconstruction algorithms to obtain an objective reconstruction quality metric, and then use quadratic stepwise regression analysis to determine the weights of the depth cues that best represent reconstruction quality. In the subjective evaluation model, we use a comprehensive user study to reveal correlations between reaction time and accuracy, rotation speed, and perspective. The two evaluation models are generally consistent, and potentially of benefit to interdisciplinary research into visual perception and 3D reconstruction.
Keywords: rotation, spinning, rotating
12. A review of image and video colorization: From analogies to deep learning (Cited by 1)
Authors: Shu-Yu Chen, Jia-Qi Zhang, You-You Zhao, Paul L. Rosin, Yu-Kun Lai, Lin Gao. Visual Informatics, EI, 2022, Issue 3, pp. 51-68 (18 pages)
Image colorization is a classic and important topic in computer graphics, where the aim is to add color to a monochromatic input image to produce a colorful result. In this survey, we present the history of colorization research in chronological order and summarize popular algorithms in this field. Early work on colorization mostly focused on developing techniques to improve colorization quality. In the last few years, researchers have considered more possibilities, such as combining colorization with NLP (natural language processing), and have focused more on industrial applications. To better control the color, various types of color control have been designed, such as providing reference images or color scribbles. We have created a taxonomy of colorization methods according to the input type, divided into grayscale, sketch-based, and hybrid. The pros and cons of each algorithm are discussed, and the algorithms are compared according to their main characteristics. Finally, we discuss how deep learning, and in particular generative adversarial networks (GANs), has changed this field.
Keywords: image colorization, sketch colorization, manga colorization
13. Benchmarking visual SLAM methods in mirror environments
Authors: Peter Herbert, Jing Wu, Ze Ji, Yu-Kun Lai. Computational Visual Media, SCIE EI CSCD, 2024, Issue 2, pp. 215-241 (27 pages)
Visual simultaneous localisation and mapping (vSLAM) finds applications in indoor and outdoor navigation that routinely subject it to visual complexities, particularly mirror reflections. The effect of mirror presence (time visible and average size in the frame) was hypothesised to impact localisation and mapping performance, with systems using direct techniques expected to perform worse. Thus, a dataset of image sequences recorded in mirror environments, MirrEnv, was collected and used to evaluate the performance of existing representative methods. RGBD ORB-SLAM3 and BundleFusion appear to show moderate degradation of absolute trajectory error with increasing mirror duration, whilst the remaining results did not show significantly degraded localisation performance. The mesh maps generated proved to be very inaccurate, with real and virtual reflections colliding in the reconstructions. A discussion is given of the likely sources of error and robustness in mirror environments, outlining future directions for validating and improving vSLAM performance in the presence of planar mirrors. The MirrEnv dataset is available at https://doi.org/10.17035/d.2023.0292477898.
Keywords: visual simultaneous localisation and mapping (vSLAM), mirror, localisation, mapping, reflection, dataset
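The absolute trajectory error (ATE) reported above is commonly computed as an RMSE of camera positions after rigidly aligning the estimated trajectory to ground truth; a minimal numpy sketch (not the benchmark's evaluation code) is:

```python
import numpy as np

def absolute_trajectory_error(gt, est):
    """RMSE of position error after rigidly aligning `est` to `gt`
    (both (N, 3) arrays of camera positions), in the spirit of the
    standard ATE metric used for vSLAM evaluation."""
    gt_c, est_c = gt - gt.mean(0), est - est.mean(0)
    H = est_c.T @ gt_c                       # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # best-fit rotation
    aligned = est_c @ R.T + gt.mean(0)
    return float(np.sqrt(np.mean(np.sum((gt - aligned) ** 2, axis=1))))

# Sanity check: a rotated and shifted copy of a trajectory has ~zero ATE.
t = np.linspace(0, 1, 50)
gt = np.stack([np.cos(t), np.sin(t), t], axis=1)
theta = 0.3
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
est = gt @ Rz.T + np.array([2.0, -1.0, 0.5])
print(round(absolute_trajectory_error(gt, est), 6))  # 0.0
```

Real evaluations additionally associate poses by timestamp and may fit scale; the sketch covers only the rotation-plus-translation alignment.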
14. NPRportrait 1.0: A three-level benchmark for non-photorealistic rendering of portraits
Authors: Paul L. Rosin, Yu-Kun Lai, David Mould, Ran Yi, Itamar Berger, Lars Doyle, Seungyong Lee, Chuan Li, Yong-Jin Liu, Amir Semmo, Ariel Shamir, Minjung Son, Holger Winnemöller. Computational Visual Media, SCIE EI CSCD, 2022, Issue 3, pp. 445-465 (21 pages)
Recently, there has been an upsurge of activity in image-based non-photorealistic rendering (NPR), and in particular portrait image stylisation, due to the advent of neural style transfer (NST). However, the state of performance evaluation in this field is poor, especially compared to the norms in the computer vision and machine learning communities. Unfortunately, the task of evaluating image stylisation is thus far not well defined, since it involves subjective, perceptual, and aesthetic aspects. To make progress towards a solution, this paper proposes a new structured, three-level benchmark dataset for the evaluation of stylised portrait images. Rigorous criteria were used for its construction, and its consistency was validated by user studies. Moreover, a new methodology has been developed for evaluating portrait stylisation algorithms, which makes use of the different benchmark levels as well as annotations provided by user studies regarding the characteristics of the faces. We evaluate a wide variety of image stylisation methods (both portrait-specific and general-purpose, covering both traditional NPR approaches and NST) using the new benchmark dataset.
Keywords: non-photorealistic rendering (NPR), image stylization, style transfer, portrait, evaluation, benchmark
15. FilterGNN: Image feature matching with cascaded outlier filters and linear attention
Authors: Jun-Xiong Cai, Tai-Jiang Mu, Yu-Kun Lai. Computational Visual Media, SCIE EI CSCD, 2024, Issue 5, pp. 873-884 (12 pages)
The cross-view matching of local image features is a fundamental task in visual localization and 3D reconstruction. This study proposes FilterGNN, a transformer-based graph neural network (GNN), aiming to improve the matching efficiency and accuracy of visual descriptors. Based on high matching sparseness and coarse-to-fine covisible area detection, FilterGNN utilizes cascaded optimal graph-matching filter modules to dynamically reject outlier matches. Moreover, we successfully adapted linear attention in FilterGNN with post-instance normalization support, which significantly reduces the complexity of complete graph learning from O(N²) to O(N). Experiments show that FilterGNN requires only 6% of the time cost and 33.3% of the memory cost of SuperGlue for large-scale inputs, and achieves competitive performance in various tasks, such as pose estimation, visual localization, and sparse 3D reconstruction.
Keywords: image matching, transformer, linear attention, visual localization, sparse reconstruction
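The O(N²) to O(N) reduction from linear attention comes from replacing the softmax with a kernel feature map and reordering the matrix products; a minimal numpy sketch of that trick (FilterGNN's actual formulation, including post-instance normalization, is more involved) is:

```python
import numpy as np

def elu_feature_map(x):
    # A common kernel feature map for linear attention: phi(x) = elu(x) + 1,
    # which keeps features positive so the normalizer is well defined.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    """O(N) attention: compute phi(K)^T V (a d x dv matrix) once,
    instead of the O(N^2) matrix phi(Q) phi(K)^T.
    Shapes: Q, K are (N, d); V is (N, dv)."""
    q, k = elu_feature_map(Q), elu_feature_map(K)
    kv = k.T @ V                   # (d, dv), cost O(N * d * dv)
    z = q @ k.sum(axis=0)          # per-query normalizer, shape (N,)
    return (q @ kv) / z[:, None]

rng = np.random.default_rng(0)
N, d = 6, 4
Q, K, V = rng.normal(size=(N, d)), rng.normal(size=(N, d)), rng.normal(size=(N, d))

# Same result as the explicit quadratic form phi(Q) phi(K)^T V, row-normalized:
q, k = elu_feature_map(Q), elu_feature_map(K)
A = q @ k.T
quadratic = (A @ V) / A.sum(axis=1, keepdims=True)
print(np.allclose(linear_attention(Q, K, V), quadratic))  # True
```

The two computations agree exactly by associativity of matrix multiplication; only their cost differs, linearly versus quadratically in the number of features N.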
16. STATE: Learning structure and texture representations for novel view synthesis
Authors: Xinyi Jing, Qiao Feng, Yu-Kun Lai, Jinsong Zhang, Yuanqiang Yu, Kun Li. Computational Visual Media, SCIE EI CSCD, 2023, Issue 4, pp. 767-786 (20 pages)
Novel viewpoint image synthesis is very challenging, especially from sparse views, due to large changes in viewpoint and occlusion. Existing image-based methods fail to generate reasonable results for invisible regions, while geometry-based methods have difficulties in synthesizing detailed textures. In this paper, we propose STATE, an end-to-end deep neural network, for sparse view synthesis by learning structure and texture representations. Structure is encoded as a hybrid feature field to predict reasonable structures for invisible regions while maintaining original structures for visible regions, and texture is encoded as a deformed feature map to preserve detailed textures. We propose a hierarchical fusion scheme with intra-branch and inter-branch aggregation, in which spatio-view attention allows multi-view fusion at the feature level to adaptively select important information by regressing pixel-wise or voxel-wise confidence maps. By decoding the aggregated features, STATE is able to generate realistic images with reasonable structures and detailed textures. Experimental results demonstrate that our method achieves qualitatively and quantitatively better results than state-of-the-art methods. Our method also enables texture and structure editing applications, benefiting from the implicit disentanglement of structure and texture. Our code is available at http://cic.tju.edu.cn/faculty/likun/projects/STATE.
Keywords: novel view synthesis, sparse views, spatio-view attention, structure representation, texture representation
17. Lesion region segmentation via weakly supervised learning
Authors: Ran Yi, Rui Zeng, Yang Weng, Minjing Yu, Yu-Kun Lai, Yong-Jin Liu. Quantitative Biology, CSCD, 2022, Issue 3, pp. 239-252 (14 pages)
Background: Image-based automatic diagnosis of field diseases can help increase crop yields and is of great importance. However, crop lesion regions tend to be scattered and of varying sizes; this, along with substantial intra-class variation and small inter-class variation, makes segmentation difficult. Methods: We propose a novel end-to-end system that requires only weak supervision from image-level labels for lesion region segmentation. First, a two-branch network is designed for joint disease classification and seed region generation. The generated seed regions are then used as input to the next segmentation stage, where we use an encoder-decoder network. Different from previous works that use only an encoder in the segmentation network, the encoder-decoder network is critical for our system to successfully segment images with small and scattered regions, which is the major challenge in image-based diagnosis of field diseases. We further propose a novel weakly supervised training strategy for the encoder-decoder semantic segmentation network, making use of the extracted seed regions. Results: Experimental results show that our system achieves better lesion region segmentation results than the state of the art. In addition to crop images, our method is also applicable to general scattered object segmentation. We demonstrate this by extending our framework to the PASCAL VOC dataset, on which it achieves comparable performance with the state-of-the-art DSRG (deep seeded region growing) method. Conclusion: Our method not only outperforms state-of-the-art semantic segmentation methods by a large margin on the lesion segmentation task, but also shows its capability to perform well on more general tasks.
Keywords: weakly supervised learning, lesion segmentation, disease detection, semantic segmentation, agriculture
18. 3D corrective nose reconstruction from a single image
Authors: Yanlong Tang, Yun Zhang, Xiaoguang Han, Fang-Lue Zhang, Yu-Kun Lai, Ruofeng Tong. Computational Visual Media, SCIE EI CSCD, 2022, Issue 2, pp. 225-237 (13 pages)
There is a steadily growing range of applications that can benefit from facial reconstruction techniques, leading to an increasing demand for the reconstruction of high-quality 3D face models. While it is an important and expressive part of the human face, the nose has received less attention than other expressive regions in the face reconstruction literature. When applying existing reconstruction methods to facial images, the reconstructed nose models are often inconsistent with the desired shape and expression. In this paper, we propose a coarse-to-fine 3D nose reconstruction and correction pipeline to build a nose model from a single image, where 3D and 2D nose curve correspondences are adaptively updated and refined. We first correct the reconstruction result coarsely using constraints from sparse 3D-2D landmark correspondences, and then heuristically update a dense 3D-2D curve correspondence based on the coarsely corrected result. A final refinement step corrects the shape based on the updated dense 3D-2D curve constraints. Experimental results show the advantages of our method for 3D nose reconstruction over existing methods.
Keywords: nose shape recovery, single image 3D reconstruction, contour correspondence, Laplacian deformation
19. MILI: Multi-person inference from a low-resolution image
Authors: Kun Li, Yunke Liu, Yu-Kun Lai, Jingyu Yang. Fundamental Research, CAS CSCD, 2023, Issue 3, pp. 434-441 (8 pages)
Existing multi-person reconstruction methods require the human bodies in the input image to occupy a considerable portion of the picture. However, low-resolution human subjects are ubiquitous due to the trade-off between the field of view and target distance given a limited camera resolution. In this paper, we propose an end-to-end multi-task framework for multi-person inference from a low-resolution image (MILI). To perceive more information from a low-resolution image, we use pair-wise images at high resolution and low resolution for training, and design a restoration network with a simple loss for better feature extraction from the low-resolution image. To address the occlusion problem in multi-person scenes, we propose an occlusion-aware mask prediction network to estimate the mask of each person during 3D mesh regression. Experimental results on both small-scale and large-scale scenes demonstrate that our method outperforms the state-of-the-art methods both quantitatively and qualitatively. The code is available at http://cic.tju.edu.cn/faculty/likun/projects/MILI.
Keywords: multi-person reconstruction, low-resolution human objects, end-to-end, multi-task learning, occlusion-aware prediction
20. RecStitchNet: Learning to stitch images with rectangular boundaries
Authors: Yun Zhang, Yu-Kun Lai, Lang Nie, Fang-Lue Zhang, Lin Xu. Computational Visual Media, SCIE EI CSCD, 2024, Issue 4, pp. 687-703 (17 pages)
Irregular boundaries in image stitching naturally occur due to freely moving cameras. To deal with this problem, existing methods focus on optimizing mesh warping to make boundaries regular using a traditional explicit solution. However, previous methods always depend on hand-crafted features (e.g., keypoints and line segments); thus, failures often happen in overlapping regions without distinctive features. In this paper, we address this problem by proposing RecStitchNet, a reasonable and effective network for image stitching with rectangular boundaries. Considering that both stitching and imposing rectangularity are non-trivial tasks in a learning-based framework, we propose a three-step progressive learning strategy, which not only simplifies the task, but gradually achieves a good balance between stitching and imposing rectangularity. In the first step, we perform initial stitching using a pre-trained state-of-the-art image stitching model, to produce initially warped stitching results without considering the boundary constraint. Then, we use a regression network with a comprehensive objective regarding mesh, perception, and shape to further encourage the stitched meshes to have rectangular boundaries with high content fidelity. Finally, we propose an unsupervised instance-wise optimization strategy to refine the stitched meshes iteratively, which can effectively improve the stitching results in terms of feature alignment, as well as boundary and structure preservation. Due to the lack of stitching datasets and the difficulty of label generation, we propose to generate a stitching dataset with rectangular stitched images as pseudo-ground-truth labels; the performance upper bound induced from it can be broken by our unsupervised refinement. Qualitative and quantitative results and evaluations demonstrate the advantages of our method over the state of the art.
Keywords: image stitching, boundaries, convolutional neural network