Journal Articles
Found 7 articles
1. Dynamic Scene Graph Generation of Point Clouds with Structural Representation Learning (Cited by: 1)
Authors: Chao Qi, Jianqin Yin, Zhicheng Zhang, Jin Tang. Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2024, Issue 1, pp. 232-243 (12 pages).
Abstract: Scene graphs of point clouds help to understand object-level relationships in 3D space. Most graph generation methods work on 2D structured data and cannot be applied to unstructured 3D point cloud data. Existing point-cloud-based methods generate the scene graph from an additional graph structure that requires labor-intensive manual annotation. To address these problems, we explore a method that converts point clouds into structured data and generates graphs without given structures. Specifically, we cluster points with similar augmented features into groups and establish their relationships, resulting in an initial structural representation of the point cloud. In addition, we propose a Dynamic Graph Generation Network (DGGN) to judge the semantic labels of targets at different granularities. It dynamically splits and merges point groups, resulting in a scene graph with high precision. Experiments show that our method outperforms baseline methods, outputting reliable graphs that describe object-level relationships without additional manually labeled data.
Keywords: scene graph generation; structural representation; point cloud
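The grouping step described in the abstract above can be illustrated with off-the-shelf clustering. Below is a minimal sketch, assuming per-point feature vectors are already available; the DBSCAN choice and its parameters are illustrative stand-ins, not the authors' actual design.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def group_points(points, features, eps=0.5, min_samples=10):
    """Cluster points whose augmented features are similar.

    points:   (N, 3) xyz coordinates
    features: (N, D) per-point features (e.g., color, normals); illustrative
    Returns one group label per point; -1 marks ungrouped noise points.
    """
    # Concatenate geometry and features so that nearby points with
    # similar features fall into the same group.
    augmented = np.concatenate([points, features], axis=1)
    return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(augmented)

# Toy usage: 200 random points with random 6-D features.
pts = np.random.rand(200, 3)
feats = np.random.rand(200, 6)
print(np.unique(group_points(pts, feats)))
```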
2. DepthGAN: GAN-based depth generation from semantic layouts (Cited by: 1)
Authors: Yidi Li, Jun Xiao, Yiqun Wang, Zhengda Lu. Computational Visual Media (SCIE, EI, CSCD), 2024, Issue 3, pp. 505-522 (18 pages).
Abstract: Existing GAN-based generative methods are typically used for semantic image synthesis. We pose the question of whether GAN-based architectures can generate plausible depth maps, and find that existing methods have difficulty generating depth maps that reasonably represent 3D scene structure, due to the lack of global geometric correlations. Thus, we propose DepthGAN, a novel method for generating a depth map from a semantic layout, to aid the construction and manipulation of well-structured 3D scene point clouds. Specifically, we first build a feature generation model with a cascade of semantically aware transformer blocks to obtain depth features with global structural information. For our semantically aware transformer block, we propose a mixed attention module and a semantically aware layer normalization module to better exploit semantic consistency for depth feature generation. Moreover, we present a novel semantically weighted depth synthesis module, which generates adaptive depth intervals for the current scene; the final depth map is produced by a weighted combination of semantically aware depth weights over the different depth ranges, yielding a more accurate result. Extensive experiments on indoor and outdoor datasets demonstrate that DepthGAN achieves superior results, both quantitatively and visually, on the depth generation task.
Keywords: depth map generation; generative model; transformer; scene generation
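The weighted depth synthesis described above amounts to combining per-pixel scores over candidate depth intervals into a single expected depth. A minimal sketch of that combination follows, assuming the network already outputs per-pixel logits over K bins; the linear bin spacing is an assumption, whereas the paper generates adaptive intervals per scene.

```python
import numpy as np

def compose_depth(logits, d_min=0.5, d_max=10.0):
    """Combine per-bin weights into a single depth map.

    logits: (H, W, K) per-pixel scores over K depth intervals.
    Bin centers are spaced linearly here for simplicity.
    """
    H, W, K = logits.shape
    centers = np.linspace(d_min, d_max, K)               # (K,) candidate depths
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over bins
    return (weights * centers).sum(axis=-1)              # (H, W) expected depth

depth = compose_depth(np.random.randn(4, 4, 16))
print(depth.shape, depth.min(), depth.max())
```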
3. SinGRAV: Learning a Generative Radiance Volume from a Single Natural Scene
Authors: Yujie Wang, Xuelin Chen, Baoquan Chen. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2024, Issue 2, pp. 305-319 (15 pages).
Abstract: We present SinGRAV, an attempt to learn a generative radiance volume from multi-view observations of a single natural scene, in stark contrast to existing category-level 3D generative models that learn from images of many object-centric scenes. Inspired by SinGAN, we also learn the internal distribution of the input scene, which necessitates our key designs with respect to the scene representation and network architecture. Unlike popular multi-layer perceptron (MLP)-based architectures, we employ convolutional generators and discriminators, which inherently possess a spatial locality bias, to operate over voxelized volumes and learn the internal distribution over a plethora of overlapping regions. On the other hand, localizing the adversarial generators and discriminators over confined areas with limited receptive fields easily leads to highly implausible geometric structures in the spatial arrangement. Our remedy is to use a spatial inductive bias and joint discrimination on geometric clues in the form of 2D depth maps. This strategy effectively improves the spatial arrangement while incurring negligible additional computational cost. Experimental results demonstrate the ability of SinGRAV to generate plausible and diverse variations from a single scene, the merits of SinGRAV over state-of-the-art generative neural scene models, and the versatility of SinGRAV across a variety of applications. Code and data will be released to facilitate further research.
Keywords: generative model; neural radiance field; 3D scene generation
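A minimal sketch of the convolutional-generator idea follows, assuming PyTorch; the layer widths and the 4-channel (RGB + density) output are illustrative guesses, not the paper's architecture. The point is that stacked small 3D convolutions give the limited receptive field and spatial locality bias the abstract describes.

```python
import torch
import torch.nn as nn

class VoxelGenerator(nn.Module):
    """Fully convolutional 3D generator: small kernels keep the receptive
    field local, so the model learns the distribution of scene patches."""
    def __init__(self, noise_ch=32, feat_ch=64, out_ch=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(noise_ch, feat_ch, kernel_size=3, padding=1),
            nn.BatchNorm3d(feat_ch), nn.LeakyReLU(0.2),
            nn.Conv3d(feat_ch, feat_ch, kernel_size=3, padding=1),
            nn.BatchNorm3d(feat_ch), nn.LeakyReLU(0.2),
            nn.Conv3d(feat_ch, out_ch, kernel_size=3, padding=1),  # RGB + density
        )

    def forward(self, z):
        # z: (B, noise_ch, D, H, W) spatial noise volume
        return self.net(z)  # (B, out_ch, D, H, W) radiance volume

vol = VoxelGenerator()(torch.randn(1, 32, 16, 16, 16))
print(vol.shape)  # torch.Size([1, 4, 16, 16, 16])
```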
4. Recent advances in 3D Gaussian splatting (Cited by: 9)
Authors: Tong Wu, Yu-Jie Yuan, Ling-Xiao Zhang, Jie Yang, Yan-Pei Cao, Ling-Qi Yan, Lin Gao. Computational Visual Media (SCIE, EI, CSCD), 2024, Issue 4, pp. 613-642 (30 pages).
Abstract: The emergence of 3D Gaussian splatting (3DGS) has greatly accelerated rendering in novel view synthesis. Unlike neural implicit representations such as neural radiance fields (NeRFs), which represent a 3D scene with position- and viewpoint-conditioned neural networks, 3D Gaussian splatting utilizes a set of Gaussian ellipsoids to model the scene, so that efficient rendering can be accomplished by rasterizing the Gaussian ellipsoids into images. Apart from fast rendering, the explicit representation of 3D Gaussian splatting also facilitates downstream tasks such as dynamic reconstruction, geometry editing, and physical simulation. Considering the rapid changes and growing number of works in this field, we present a literature review of recent 3D Gaussian splatting methods, which can be roughly classified by functionality into 3D reconstruction, 3D editing, and other downstream applications. Traditional point-based rendering methods and the rendering formulation of 3D Gaussian splatting are also covered to aid understanding of this technique. This survey aims to help beginners quickly get started in this field, to provide experienced researchers with a comprehensive overview, and to stimulate future development of the 3D Gaussian splatting representation.
Keywords: 3D Gaussian splatting (3DGS); radiance field; novel view synthesis; 3D editing; scene generation
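For reference, the rendering formulation the survey covers is the standard front-to-back alpha compositing of depth-sorted Gaussians from the original 3DGS paper:

```latex
% Pixel color C: front-to-back alpha compositing over the N depth-sorted
% Gaussians covering the pixel; c_i is the i-th Gaussian's color and
% alpha_i its opacity after multiplying by the projected 2D Gaussian falloff.
C = \sum_{i=1}^{N} c_i \, \alpha_i \prod_{j=1}^{i-1} \left( 1 - \alpha_j \right)
```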
5. A Comprehensive Pipeline for Complex Text-to-Image Synthesis
Authors: Fei Fang, Fei Luo, Hong-Pan Zhang, Hua-Jian Zhou, Alix L. H. Chow, Chun-Xia Xiao. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2020, Issue 3, pp. 522-537 (16 pages).
Abstract: Synthesizing a complex scene image with multiple objects and a background according to a text description is a challenging problem. It requires solving several difficult tasks across the fields of natural language processing and computer vision. We model it as a combination of semantic entity recognition, object retrieval and recombination, and object status optimization. To reach a satisfactory result, we propose a comprehensive pipeline that converts the input text to its visual counterpart. The pipeline includes text processing, foreground object and background scene retrieval, image synthesis using constrained MCMC, and post-processing. First, we roughly divide the objects parsed from the input text into foreground objects and background scenes. Second, we retrieve the required foreground objects from a foreground object dataset segmented from the Microsoft COCO dataset, and retrieve an appropriate background scene image from a background image dataset extracted from the Internet. Third, to ensure the rationality of the foreground objects' positions and sizes in the image synthesis step, we design a cost function and use the Markov Chain Monte Carlo (MCMC) method as the optimizer to solve this constrained layout problem. Finally, to make the image look natural and harmonious, we use Poisson-based and relighting-based methods to blend the foreground objects and the background scene image in the post-processing step. Synthesized results and comparisons on the Microsoft COCO dataset show that our method outperforms several state-of-the-art methods based on generative adversarial networks (GANs) in the visual quality of the generated scene images.
Keywords: image synthesis; scene generation; text-to-image conversion; Markov Chain Monte Carlo (MCMC)
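The constrained layout step can be sketched as generic Metropolis-Hastings over object positions and sizes, as below; the cost function, proposal scale, and temperature here are illustrative assumptions, not the authors' exact design.

```python
import math
import random

def mcmc_layout(objects, cost_fn, iters=5000, step=0.05, temp=0.1):
    """Metropolis-Hastings search over object layouts.

    objects: list of dicts {'x', 'y', 'scale'} in normalized image coords.
    cost_fn: maps a layout to a scalar penalty (overlap, size priors, ...).
    """
    layout = [dict(o) for o in objects]
    cur_cost = cost_fn(layout)
    best, best_cost = layout, cur_cost
    for _ in range(iters):
        cand = [dict(o) for o in layout]
        o = random.choice(cand)                    # perturb one object
        key = random.choice(['x', 'y', 'scale'])
        o[key] = min(1.0, max(0.0, o[key] + random.gauss(0, step)))
        c = cost_fn(cand)
        # Accept improvements always, worse layouts with Boltzmann probability.
        if c < cur_cost or random.random() < math.exp((cur_cost - c) / temp):
            layout, cur_cost = cand, c
            if c < best_cost:
                best, best_cost = cand, c
    return best

# Toy cost: keep two objects at least 0.3 apart horizontally.
cost = lambda L: max(0.0, 0.3 - abs(L[0]['x'] - L[1]['x']))
print(mcmc_layout([{'x': 0.4, 'y': 0.5, 'scale': 0.2},
                   {'x': 0.5, 'y': 0.5, 'scale': 0.2}], cost, iters=2000))
```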
6. Learning group interaction for sports video understanding from a perspective of athlete
Authors: Rui He, Zehua Fu, Qingjie Liu, Yunhong Wang, Xunxun Chen. Frontiers of Computer Science (SCIE, EI, CSCD), 2024, Issue 4, pp. 175-188 (14 pages).
Abstract: Learning the interactions between small-group activities is a key step in understanding team sports videos. Recent research on team sports videos has largely taken the perspective of the audience rather than that of the athlete. For team sports videos such as volleyball and basketball videos, there are plenty of intra-team and inter-team relations. In this paper, a new task named Group Scene Graph Generation is introduced to better understand intra-team and inter-team relations in sports videos. To tackle this problem, a novel Hierarchical Relation Network is proposed. After all players in a video are finely divided into two teams, the features of the two teams' activities and interactions are enhanced by Graph Convolutional Networks and finally recognized to generate the Group Scene Graph. For evaluation, a Volleyball+ dataset is proposed, built on the Volleyball dataset with 9660 additional team activity labels. A baseline is set for better comparison, and our experimental results demonstrate the effectiveness of our method. Moreover, the idea of our method can be directly utilized in another video-based task, Group Activity Recognition; experiments show the superiority of our method and reveal the link between the two tasks. Finally, from the athlete's view, we present an interpretation that shows how to utilize the Group Scene Graph to analyze teams' activities and provide professional gaming suggestions.
Keywords: group scene graph; group activity recognition; scene graph generation; graph convolutional network; sports video understanding
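The Graph Convolutional Network step can be sketched as a single message-passing layer over player features, as below (assuming PyTorch); the fully connected intra-team adjacency is a toy assumption, not the paper's learned hierarchical structure.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One graph-convolution step: each player's feature is updated with
    a degree-normalized sum of its neighbors' features."""
    def __init__(self, dim):
        super().__init__()
        self.lin = nn.Linear(dim, dim)

    def forward(self, x, adj):
        # x: (N, dim) player features; adj: (N, N) adjacency matrix
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)  # avoid divide-by-zero
        return torch.relu(self.lin(adj @ x / deg))

# 12 players, two teams of 6, fully connected within each team.
x = torch.randn(12, 64)
adj = torch.block_diag(torch.ones(6, 6), torch.ones(6, 6))
print(GCNLayer(64)(x, adj).shape)  # torch.Size([12, 64])
```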
7. CAGNet: a context-aware graph neural network for detecting social relationships in videos
Authors: Fan Yu, Yaqun Fang, Zhixiang Zhao, Jia Bei, Tongwei Ren, Gangshan Wu. Visual Intelligence, 2024, Issue 1, pp. 259-271 (13 pages).
Abstract: Social relationships, such as parent-offspring and friends, are crucial and stable connections between individuals, especially at the person level, and are essential for accurately describing the semantics of videos. In this paper, we analogize such a task to scene graph generation, which we call video social relationship graph generation (VSRGG). It involves generating a social relationship graph for each video based on person-level relationships. We propose a context-aware graph neural network (CAGNet) for VSRGG, which effectively generates social relationship graphs through message passing, capturing the context of the video. Specifically, CAGNet detects persons in the video, generates an initial graph via relationship proposal, and extracts facial and body features to describe the detected individuals, as well as temporal features to describe their interactions. Then, CAGNet predicts pairwise relationships between individuals using graph message passing. Additionally, we construct a new dataset, VidSoR, to evaluate VSRGG, which contains 72 hours of video with 6276 person instances and 5313 relationship instances of eight relationship types. Extensive experiments show that CAGNet can make accurate predictions with a comparatively high mean recall (mRecall) when using only visual features.
Keywords: video analysis; social relationship detection; scene graph generation; message passing
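The pairwise prediction step after message passing can be sketched as classifying concatenated node embeddings into the eight relationship types, as below (assuming PyTorch); the module name and dimensions are hypothetical, not CAGNet's actual components.

```python
import torch
import torch.nn as nn

class PairwiseRelationHead(nn.Module):
    """Classify each person pair's relationship from the aggregated
    (face + body + temporal) node embeddings produced by message passing."""
    def __init__(self, dim=128, num_types=8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(),
            nn.Linear(dim, num_types),
        )

    def forward(self, nodes, pairs):
        # nodes: (N, dim) person embeddings; pairs: (P, 2) candidate pair indices
        pair_feat = torch.cat([nodes[pairs[:, 0]], nodes[pairs[:, 1]]], dim=1)
        return self.mlp(pair_feat)  # (P, num_types) relation logits

nodes = torch.randn(5, 128)
pairs = torch.tensor([[0, 1], [2, 4]])
print(PairwiseRelationHead()(nodes, pairs).shape)  # torch.Size([2, 8])
```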