Funding: Supported by the National Key R&D Program of China (2022YFC3803600).
Abstract: Background: Three-dimensional (3D) shape representation using mesh data is essential in various applications, such as virtual reality and simulation technologies. Current methods that extract features from mesh edges or faces struggle with complex 3D models: edge-based approaches miss global context, and face-based methods overlook variations in adjacent areas, which limits overall precision. To address these issues, we propose the Feature Discrimination and Context Propagation Network (FDCPNet), a novel approach that synergistically integrates local and global features in mesh data. Methods: FDCPNet is composed of two modules: (1) the Feature Discrimination Module, which employs an attention mechanism to enhance the identification of key local features, and (2) the Context Propagation Module, which enriches key local features by integrating global contextual information, thereby providing a more detailed and comprehensive representation of crucial areas within the mesh model. Results: Experiments on popular datasets validated the effectiveness of FDCPNet, showing an improvement in classification accuracy over the baseline MeshNet. Furthermore, even with reduced numbers of mesh faces and limited training data, FDCPNet achieved promising results, demonstrating its robustness in scenarios of variable complexity.
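The abstract gives no implementation details beyond the module names, so the following is only an illustrative sketch of the kind of attention gating the Feature Discrimination Module describes, assuming per-face feature vectors such as those produced by a MeshNet-style backbone; the class name and dimensions are hypothetical, not the authors' implementation.

```python
import torch
import torch.nn as nn

class FeatureDiscriminationSketch(nn.Module):
    """Hypothetical attention gate that re-weights per-face mesh features.

    Assumes input of shape (batch, num_faces, channels), e.g. features from
    a MeshNet-style backbone. Illustrative only, not the authors' module.
    """
    def __init__(self, channels: int = 256):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(channels, channels // 4),
            nn.ReLU(inplace=True),
            nn.Linear(channels // 4, 1),
        )

    def forward(self, face_feats: torch.Tensor) -> torch.Tensor:
        # Per-face importance scores in (0, 1), emphasising key local features.
        weights = torch.sigmoid(self.score(face_feats))   # (B, F, 1)
        return face_feats * weights                       # gated local features

# Example: 2 meshes, 1024 faces each, 256-channel face features.
feats = torch.randn(2, 1024, 256)
gated = FeatureDiscriminationSketch(256)(feats)
print(gated.shape)  # torch.Size([2, 1024, 256])
```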
Funding: the National Key Research and Development Program of China, Grant/Award Number: 2020YFB1711704; the National Natural Science Foundation of China, Grant/Award Number: 62272337.
Abstract: As an essential field of multimedia and computer vision, 3D shape recognition has attracted much research attention in recent years. Multiview-based approaches have demonstrated their superiority in generating effective 3D shape representations. Typical methods extract multiview global features and aggregate them to generate 3D shape descriptors. However, there are two disadvantages: first, mainstream methods ignore the comprehensive exploration of local information in each view; second, many approaches roughly aggregate multiview features by adding or concatenating them, and the resulting loss of discriminative characteristics limits representation effectiveness. To address these problems, a novel architecture named region-based joint attention network (RJAN) is proposed. Specifically, the authors first design a hierarchical local information exploration module for view descriptor extraction, in which region-to-region and channel-to-channel relationships at different granularities are comprehensively explored and utilised to provide more discriminative characteristics for view feature learning. Subsequently, a novel relation-aware view aggregation module is designed to aggregate the multiview features for shape descriptor generation, considering the view-to-view relationships. Extensive experiments were conducted on three public databases: ModelNet40, ModelNet10, and ShapeNetCore55. RJAN achieves state-of-the-art performance on 3D shape classification and 3D shape retrieval, which demonstrates its effectiveness. The code has been released at https://github.com/slurrpp/RJAN.
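The abstract does not give the aggregation details, so the following is only a hedged sketch of the relation-aware idea: each view descriptor is re-weighted by its affinity to the other views before pooling, instead of plain addition or concatenation. Names and dimensions are hypothetical; the released code at the URL above is the authoritative reference.

```python
import torch
import torch.nn as nn

class RelationAwareAggregationSketch(nn.Module):
    """Hypothetical relation-aware pooling of multi-view descriptors."""
    def __init__(self, dim: int = 512):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)

    def forward(self, views: torch.Tensor) -> torch.Tensor:
        # views: (batch, num_views, dim), one global descriptor per rendered view.
        attn = torch.softmax(
            self.query(views) @ self.key(views).transpose(1, 2) / views.shape[-1] ** 0.5,
            dim=-1,
        )                                    # view-to-view relationships
        related = attn @ views               # each view enriched by the others
        return related.max(dim=1).values     # shape descriptor: (batch, dim)

descriptor = RelationAwareAggregationSketch(512)(torch.randn(4, 12, 512))
print(descriptor.shape)  # torch.Size([4, 512])
```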
Funding: Supported by the National Natural Science Foundation of China (61773272, 61976191); the Six Talent Peaks Project of Jiangsu Province, China (XYDXX-053); and the Suzhou Research Project of Technical Innovation, Jiangsu, China (SYG201711).
Abstract: Hand gesture recognition is a popular topic in computer vision and makes human-computer interaction more flexible and convenient. The representation of hand gestures is critical for recognition. In this paper, we propose a new method to measure the similarity between hand gestures and exploit it for hand gesture recognition. Our method uses depth maps of hand gestures captured by Kinect sensors, from which the 3D hand shapes can be segmented out of cluttered backgrounds. To extract the pattern of salient 3D shape features, we propose a new descriptor, 3D Shape Context, for 3D hand gesture representation. The 3D Shape Context information of each 3D point is obtained at multiple scales, because both local shape context and global shape distribution are necessary for recognition. The descriptions of all the 3D points constitute the hand gesture representation, and hand gesture recognition is performed via the dynamic time warping algorithm. Extensive experiments are conducted on multiple benchmark datasets. The experimental results verify that the proposed method is robust to noise, articulated variations, and rigid transformations. Our method outperforms state-of-the-art methods in comparisons of accuracy and efficiency.
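Dynamic time warping itself is a standard algorithm; the sketch below shows the textbook recurrence applied to two sequences of descriptors, with a Euclidean local cost chosen purely for illustration. How the paper orders its 3D Shape Context descriptors into sequences and which local cost it uses are not specified here; the function and variable names are hypothetical.

```python
import numpy as np

def dtw_distance(seq_a: np.ndarray, seq_b: np.ndarray) -> float:
    """Classic dynamic time warping between two descriptor sequences.

    seq_a: (m, d) and seq_b: (n, d), one d-dimensional descriptor per element
    (e.g. a shape-context histogram). Euclidean local cost is an assumption.
    """
    m, n = len(seq_a), len(seq_b)
    cost = np.full((m + 1, n + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])   # local match cost
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return float(cost[m, n])

# Nearest-neighbour recognition: assign the label of the closest template.
query = np.random.rand(30, 64)
templates = {"wave": np.random.rand(28, 64), "grab": np.random.rand(35, 64)}
print(min(templates, key=lambda k: dtw_distance(query, templates[k])))
```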
Funding: Supported by the Eleventh Five-Year Pre-research Project of China.
Abstract: A new 3D surface contouring and ranging system based on digital fringe projection and the phase-shifting technique is presented. Using the phase-shifting technique, a point cloud with high spatial resolution but limited accuracy can be generated. Stereo-pair images obtained from two cameras can be used to compute the 3D world coordinates of a point with the traditional active triangulation approach, but camera calibration is then crucial. A neural network is a well-known approach for approximating a nonlinear system without an explicit physical model; in this work it is trained so that the stereo vision system can calculate 3D world coordinates directly and camera calibration can be bypassed. The training set for the neural network consists of a variety of stereo-pair images and the corresponding 3D world coordinates. The pixel correspondence problem is solved by projecting color-coded fringes with different orientations, and the new color-coding method eliminates color imbalance. Once high-accuracy correspondence between the 2D images and the 3D points is acquired, a high-precision 3D point cloud can be recovered by the well-trained network. The obvious advantage of this approach is that high spatial resolution is obtained by the phase-shifting technique, while high-accuracy 3D object point coordinates are produced by the well-trained network, which is independent of the camera model and works for any type of camera. Experiments verified the performance of the method.
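The phase-shifting step mentioned above follows the standard N-step formula; a minimal sketch (textbook form, not tied to this system's specific hardware or pattern design) is:

```python
import numpy as np

def wrapped_phase(images: np.ndarray) -> np.ndarray:
    """Standard N-step phase-shifting formula.

    images: (N, H, W) stack of fringe images I_n = A + B*cos(phi + 2*pi*n/N).
    Returns the wrapped phase phi in (-pi, pi], which still needs unwrapping
    before triangulation or a learned coordinate mapping can be applied.
    """
    n_steps = images.shape[0]
    shifts = 2.0 * np.pi * np.arange(n_steps) / n_steps
    num = np.tensordot(np.sin(shifts), images, axes=1)   # sum_n I_n * sin(delta_n)
    den = np.tensordot(np.cos(shifts), images, axes=1)   # sum_n I_n * cos(delta_n)
    return -np.arctan2(num, den)
```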
Funding: This work was supported by a Grant-in-Aid for Scientific Research (C) (No. 17500119).
Abstract: This paper describes a multiple-camera method to reconstruct the 3D shape of a human foot. From a foot database, an initial 3D model of the foot, represented by a point cloud, is built. The shape parameters, which can characterize more than 92% of a foot, are defined using principal component analysis. Then, using active shape models, the initial 3D model is adapted to the real foot captured in multiple images by applying constraints on edge-point distance and color variance. We focus here on the experimental part, where we demonstrate the efficiency of the proposed method on a plastic foot model and on real human feet with various shapes. We also propose and compare different ways of texturing the foot, which is needed for reconstruction. Based on the results of these experiments, we propose two improvements to the accuracy of the final 3D shape. The first is densification of the point cloud used to represent the initial model and the foot database. The second concerns the projected patterns used to texture the foot. We conclude by showing the results obtained for a human foot, with an average computed shape error of only 1.06 mm.
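The shape model described is a standard PCA point-distribution model (a new foot is the mean shape plus a weighted sum of principal modes). A toy sketch on fake data, keeping the 92%-variance threshold quoted in the abstract, might look like the following; the foot database itself is not public, and all names here are illustrative.

```python
import numpy as np

def fit_pca_shape_model(shapes: np.ndarray, variance_kept: float = 0.92):
    """Minimal PCA shape model: shapes is (num_feet, 3*num_points),
    each row a flattened point cloud in correspondence."""
    mean = shapes.mean(axis=0)
    centered = shapes - mean
    # SVD of the data matrix gives the principal modes directly.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = np.cumsum(s**2) / np.sum(s**2)
    k = int(np.searchsorted(explained, variance_kept)) + 1
    return mean, vt[:k]                      # mean shape and k principal modes

def reconstruct(mean, modes, params):
    """New foot instance: x = mean + b @ P, with b the shape parameters."""
    return mean + params @ modes

shapes = np.random.rand(50, 3 * 200)         # 50 fake feet, 200 points each
mean, modes = fit_pca_shape_model(shapes)
foot = reconstruct(mean, modes, np.zeros(len(modes)))   # the mean foot
```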
Funding: Supported by the National Natural Science Foundation of China (62205226, 62075143); the National Postdoctoral Program for Innovative Talents of China (BX2021199); the General Financial Grant from the China Postdoctoral Science Foundation (2022M722290); the Key Science and Technology Research and Development Program of Jiangxi Province (20224AAC01011); and the Fundamental Research Funds for Central Universities (2022SCU12010).
Abstract: Depth measurement and three-dimensional (3D) imaging under complex reflection and transmission conditions are challenging, and even impossible, for traditional structured light techniques, owing to their reliance on point-to-point triangulation. Despite recent progress on this problem, there is still no efficient and general solution. Here, a Fourier dual-slice projection with depth-constrained localization is presented to separate and utilize different illumination and reflection components efficiently, which reduces the number of projection patterns per sequence from thousands to fifteen. Building on an established and proven position-invariant theorem, multi-scale parallel single-pixel imaging (MS-PSI) is then proposed, which removes the local-region assumption and enables dynamic 3D reconstruction. The methodology demonstrates previously unattained capabilities: (1) accurate depth measurement under interreflection and subsurface scattering; (2) dynamic measurement of time-varying high-dynamic-range scenes and through thin volumetric scattering media at 333 frames per second; and (3) two-layer 3D imaging of a semitransparent surface and the object hidden behind it. The experimental results confirm that the proposed method paves the way for dynamic 3D reconstruction under complex optical reflection and transmission conditions, benefiting imaging and sensing applications in advanced manufacturing, autonomous driving, and biomedical imaging.
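For context, the sketch below shows conventional Fourier single-pixel imaging, the basic mechanism that the dual-slice projection above accelerates: four phase-shifted sinusoidal patterns per spatial frequency yield one Fourier coefficient of the scene, and an inverse FFT recovers the image. This is the full-spectrum textbook version, not the fifteen-pattern scheme proposed in the paper, and the simulation is purely illustrative.

```python
import numpy as np

def fourier_single_pixel(scene: np.ndarray) -> np.ndarray:
    """Simulate textbook Fourier single-pixel imaging of a small scene."""
    h, w = scene.shape
    y, x = np.mgrid[0:h, 0:w]
    spectrum = np.zeros((h, w), dtype=complex)
    for u in range(h):
        for v in range(w):
            coeff = 0.0 + 0.0j
            for phase in (0.0, np.pi / 2, np.pi, 3 * np.pi / 2):
                # Sinusoidal fringe pattern at spatial frequency (u, v).
                pattern = 0.5 + 0.5 * np.cos(2 * np.pi * (u * y / h + v * x / w) + phase)
                measurement = np.sum(scene * pattern)   # single "bucket" value
                coeff += measurement * np.exp(1j * phase)
            spectrum[u, v] = coeff                      # one DFT coefficient
    return np.real(np.fft.ifft2(spectrum))              # recover the scene

tiny_scene = np.random.rand(8, 8)        # keep it tiny: 4*H*W patterns in total
recovered = fourier_single_pixel(tiny_scene)
```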
Funding: Supported by the National Natural Science Foundation of China (No. 62322210), the Beijing Municipal Natural Science Foundation for Distinguished Young Scholars (No. JQ21013), and the Beijing Municipal Science and Technology Commission (No. Z231100005923031).
Abstract: Various techniques have been developed to address the pressing need to create three-dimensional (3D) content for advanced applications such as virtual reality and augmented reality. However, the intricate nature of 3D shapes makes their representation and generation more challenging than for standard two-dimensional (2D) image data. Different types of representations have been proposed in the literature, including meshes, voxels, and implicit functions. Implicit representations have attracted considerable interest from researchers owing to the emergence of the radiance field representation, which allows the simultaneous reconstruction of both geometry and appearance. Subsequent work has successfully linked traditional signed distance fields to implicit representations, and more recently the triplane has made it possible to generate radiance fields using 2D content generators. Many articles have been published in these areas. This paper provides a comprehensive analysis of recent studies on implicit-representation-based 3D shape generation, classifying them by the representation and generation architecture employed. The attributes of each representation are examined in detail, and potential avenues for future research are suggested.
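As a concrete illustration of the implicit (signed distance field) representations surveyed here, a minimal coordinate-MLP SDF overfit to a sphere might look like the following. It is a toy example under stated assumptions, unrelated to any specific generative architecture discussed in the survey.

```python
import torch
import torch.nn as nn

class TinySDF(nn.Module):
    """Tiny coordinate MLP mapping a 3D point to a signed distance."""
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, xyz: torch.Tensor) -> torch.Tensor:
        return self.net(xyz).squeeze(-1)

model = TinySDF()
optim = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):                                   # fit the sphere's SDF
    pts = torch.rand(1024, 3) * 2 - 1                  # samples in [-1, 1]^3
    target = pts.norm(dim=-1) - 0.5                    # analytic SDF of a radius-0.5 sphere
    loss = (model(pts) - target).abs().mean()
    optim.zero_grad()
    loss.backward()
    optim.step()
# The zero level set {x : f(x) = 0} is the represented surface.
```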
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 61003137, 61202185, 61005018, and 91120005; the Fundamental Fund of Research of Northwestern Polytechnical University of China under Grant Nos. JC201202, JC201220, and JC20120237; the Natural Science Foundation of Shaanxi Province of China under Grant No. 2012JQ8037; the Open Fund of the State Key Lab of CAD&CG, Zhejiang University, China; and the Program for New Century Excellent Talents in University of China under Grant No. NCET-10-0079.
Abstract: Content-based shape retrieval techniques can facilitate 3D model resource reuse, 3D modeling, object recognition, and 3D content classification. Recently, more and more researchers have attempted to solve the problems of partial retrieval in the domains of computer graphics, vision, CAD, and multimedia. Unfortunately, there is little comprehensive discussion in the literature of state-of-the-art methods for partial shape retrieval. In this article we review the partial shape retrieval methods of the last decade and help novices grasp the latest developments in this field. We first give the definition of partial retrieval and discuss its desirable capabilities. Secondly, we classify the existing methods for partial shape retrieval into three classes according to several criteria, describe the main ideas and techniques of each class, and compare their advantages and limitations in detail. We also present several relevant 3D datasets and the corresponding evaluation metrics, which are necessary for evaluating partial retrieval performance. Finally, we discuss possible research directions for partial shape retrieval.
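As a simple illustration of how retrieval performance of the kind surveyed above is typically evaluated, the sketch below ranks database shapes by descriptor distance and computes precision@k on toy data. The descriptor itself (global or partial) and the relevance labels are left abstract; all names are hypothetical.

```python
import numpy as np

def retrieve(query_desc: np.ndarray, database: np.ndarray, k: int = 5) -> np.ndarray:
    """Rank database shapes by Euclidean distance to the query descriptor."""
    dists = np.linalg.norm(database - query_desc, axis=1)
    return np.argsort(dists)[:k]

def precision_at_k(retrieved: np.ndarray, relevant: set, k: int) -> float:
    """Fraction of the top-k retrieved shapes that are relevant to the query."""
    return len([i for i in retrieved[:k] if i in relevant]) / k

db = np.random.rand(100, 32)                  # 100 toy shape descriptors
ranked = retrieve(db[0], db, k=5)
print(precision_at_k(ranked, relevant={0, 3, 7}, k=5))
```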