A scene graph is an infrastructure of a virtual reality system that organizes the virtual scene with abstraction. It provides facilities for the rendering engine and should be integrated effectively, on demand, into a real-time system where large quantities of scene objects and resources can be manipulated and managed with high flexibility and reliability. We present a new scheme of multiple scene graphs to accommodate the features of rendering engines and distributed systems. Based upon that, other functions, e.g. block query, interactive editing, permission management, instance response, and "redo"/"undo", are implemented to satisfy various requirements. At the same time, our design is compatible with the popular C/S architecture and shows good concurrent performance. Above all, it is convenient to use for further development. Experimental results, including response times, demonstrate its good performance.
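The interactive editing with "redo"/"undo" mentioned above can be sketched as a command-pattern history over a minimal scene-graph node. All class and method names below are illustrative assumptions, not the paper's actual API:

```python
class SceneNode:
    """Minimal scene-graph node: a name plus child nodes."""
    def __init__(self, name):
        self.name = name
        self.children = []

    def add(self, child):
        self.children.append(child)

    def remove(self, child):
        self.children.remove(child)


class EditHistory:
    """Undo/redo stacks over reversible edits, stored as (do, undo) pairs."""
    def __init__(self):
        self._undo, self._redo = [], []

    def apply(self, do_fn, undo_fn):
        do_fn()
        self._undo.append((do_fn, undo_fn))
        self._redo.clear()            # a fresh edit invalidates the redo chain

    def undo(self):
        do_fn, undo_fn = self._undo.pop()
        undo_fn()
        self._redo.append((do_fn, undo_fn))

    def redo(self):
        do_fn, undo_fn = self._redo.pop()
        do_fn()
        self._undo.append((do_fn, undo_fn))


root = SceneNode("root")
lamp = SceneNode("lamp")
hist = EditHistory()
hist.apply(lambda: root.add(lamp), lambda: root.remove(lamp))
hist.undo()   # scene back to an empty root
hist.redo()   # lamp restored
```

In a distributed setting, each (do, undo) pair would additionally be serialized and replicated to other clients, which is beyond this sketch.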
Today, autonomous mobile robots are widely used in all walks of life. Autonomous navigation, as a basic capability of robots, has become a research hotspot. Classical navigation techniques, which rely on pre-built maps, struggle to cope with complex and dynamic environments. With the development of artificial intelligence, learning-based navigation technologies have emerged. Instead of relying on pre-built maps, the agent perceives the environment and makes decisions through visual observation, enabling end-to-end navigation. A key challenge is to enhance the generalization ability of the agent in unfamiliar environments. To tackle this challenge, it is necessary to endow the agent with spatial intelligence. Spatial intelligence refers to the ability of the agent to transform visual observations into insights, insights into understanding, and understanding into actions. To endow the agent with spatial intelligence, relevant research uses scene graphs to represent the environment. We refer to this method as scene graph-based object goal navigation. In this paper, we concentrate on the scene graph, offering a formal description and computational framework of object goal navigation. We provide a comprehensive summary of the methods for constructing and applying scene graphs. Additionally, we present experimental evidence that highlights the critical role of the scene graph in improving navigation success. This paper also delineates promising research directions, all aimed at sharpening the focus on scene graphs. Overall, this paper shows how the scene graph endows the agent with spatial intelligence, aiming to promote the importance of the scene graph in the field of intelligent navigation.
Scene graph prediction has emerged as a critical task in computer vision, focusing on transforming complex visual scenes into structured representations by identifying objects, their attributes, and the relationships among them. Extending this to 3D semantic scene graph (3DSSG) prediction introduces an additional layer of complexity because it requires processing point-cloud data to accurately capture the spatial and volumetric characteristics of a scene. A significant challenge in 3DSSG is the long-tailed distribution of object and relationship labels, which leaves certain classes severely underrepresented and leads to suboptimal performance in these rare categories. To address this, we propose a fusion prototypical network (FPN), which combines the strengths of conventional neural networks for 3DSSG with a prototypical network. The former are known for their ability to handle complex scene graph predictions, while the latter excels in few-shot learning scenarios. By leveraging this fusion, our approach enhances the overall prediction accuracy and substantially improves the handling of underrepresented labels. Through extensive experiments on the 3DSSG dataset, we demonstrate that the FPN achieves state-of-the-art performance in 3D scene graph prediction as a single model and effectively mitigates the impact of the long-tailed distribution, providing a more balanced and comprehensive understanding of complex 3D environments.
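The prototypical-network half of the fusion above can be sketched as follows: a class prototype is the mean of that class's support embeddings, and a query is assigned to the nearest prototype. The toy labels and 2-D vectors are illustrative, not the 3DSSG feature space:

```python
import math

def prototypes(support):
    """Class prototype = mean of that class's support embeddings.
    `support` maps a label to a list of equal-length feature vectors."""
    protos = {}
    for label, vecs in support.items():
        dim = len(vecs[0])
        protos[label] = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
    return protos

def classify(query, protos):
    """Assign the query embedding to the nearest prototype (Euclidean distance)."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(protos, key=lambda label: dist(query, protos[label]))

# Toy 2-D embeddings for two classes; real 3DSSG features would be learned.
support = {"chair": [[1.0, 0.0], [0.8, 0.2]],
           "lamp":  [[0.0, 1.0], [0.1, 0.9]]}
predicted = classify([0.9, 0.1], prototypes(support))
```

Because a prototype only needs a handful of support examples, rare tail classes can be classified without the large per-class sample counts a conventional softmax head requires.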
In this paper, a novel component-based scene graph is proposed, in which all objects in the scene are classified into different entities, and a scene can be represented as a hierarchical graph composed of the instances of entities. Each entity contains basic data and its operations, which are encapsulated into the entity component. The entity possesses certain behaviours, which are responses to rules and interactions defined by the high-level application. Such behaviours can be described by scripts or behaviour models. The component-based scene graph in this paper is more abstract and higher-level than traditional scene graphs. The contents of a scene can be extended flexibly by adding new entities and new entity components, and behaviour modifications can be obtained by modifying the model components or behaviour scripts. Its robustness and efficiency are verified by many examples implemented in the Virtual Scenario developed by Peking University.
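The entity-component idea above can be sketched minimally: an entity bundles basic-data components and behaviour components that respond to application-defined events. All names below are hypothetical, not the Virtual Scenario API:

```python
class Entity:
    """An entity is an identifier plus a bag of attached components."""
    def __init__(self, name):
        self.name = name
        self.components = {}

    def attach(self, component):
        self.components[type(component).__name__] = component

    def get(self, comp_name):
        return self.components.get(comp_name)


class Transform:
    """Basic-data component: position only, for brevity."""
    def __init__(self, x=0.0, y=0.0, z=0.0):
        self.x, self.y, self.z = x, y, z


class Behaviour:
    """Scripted response to an interaction event defined by the application."""
    def __init__(self, on_event):
        self.on_event = on_event


door = Entity("door")
door.attach(Transform(1.0, 0.0, 2.0))
door.attach(Behaviour(lambda evt: f"door reacts to {evt}"))
reply = door.get("Behaviour").on_event("push")
```

Extending the scene then means attaching new component types to entities, with no change to the entity class itself, which is the flexibility the abstract claims.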
Traffic scene captioning technology automatically generates one or more sentences to describe the content of traffic scenes by analyzing the input traffic scene images, ensuring road safety while providing an important decision-making function for sustainable transportation. In order to provide a comprehensive and reasonable description of complex traffic scenes, a traffic scene semantic captioning model with multi-stage feature enhancement is proposed in this paper. In general, the model follows an encoder-decoder structure. First, multilevel granularity visual features are used for feature enhancement during the encoding process, which enables the model to learn more detailed content in the traffic scene image. Second, the scene knowledge graph is applied to the decoding process, and the semantic features provided by the scene knowledge graph are used to enhance the features learned by the decoder again, so that the model can learn the attributes of objects in the traffic scene and the relationships between objects to generate more reasonable captions. This paper reports extensive experiments on the challenging MS-COCO dataset, evaluated by five standard automatic evaluation metrics. The results show that the proposed model improves significantly on all metrics compared with state-of-the-art methods, especially achieving a score of 129.0 on the CIDEr-D evaluation metric, which also indicates that the proposed model can effectively provide a more reasonable and comprehensive description of the traffic scene.
Background: In this study, we propose a novel 3D scene graph prediction approach for scene understanding from point clouds. Methods: The approach automatically organizes the entities of a scene in a graph, where objects are nodes and their relationships are modeled as edges. More specifically, we employ the DGCNN to capture the features of objects and their relationships in the scene. A Graph Attention Network (GAT) is introduced to exploit latent features obtained from the initial estimation to further refine the object arrangement in the graph structure. A loss function modified from cross-entropy with a variable weight is proposed to solve the multi-category problem in the prediction of objects and predicates. Results: Experiments reveal that the proposed approach performs favorably against state-of-the-art methods in terms of predicate classification and relationship prediction, and achieves comparable performance on object classification prediction. Conclusions: The 3D scene graph prediction approach can form an abstract description of the scene space from point clouds.
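The variable-weight cross-entropy described above can be sketched for a single sample as follows; the class names and weights are invented for illustration, and a real implementation would operate on logits over batches:

```python
import math

def weighted_cross_entropy(probs, target, weights):
    """Cross-entropy for one sample with a per-class weight, so that rare
    object/predicate classes contribute more to the loss. The weights here
    are illustrative; in practice they might track inverse class frequency."""
    return -weights[target] * math.log(probs[target])

probs = {"on": 0.7, "attached_to": 0.2, "leaning_against": 0.1}
weights = {"on": 0.5, "attached_to": 1.0, "leaning_against": 4.0}
loss_common = weighted_cross_entropy(probs, "on", weights)
loss_rare = weighted_cross_entropy(probs, "leaning_against", weights)
# The rare, up-weighted class incurs a much larger penalty when mispredicted.
```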
The Vision-Language Navigation (VLN) task is a cross-modality task that combines natural language processing and computer vision. This task requires the agent to move automatically to the destination according to the natural language instruction and the observed surrounding visual information. To make the best decision at every step during navigation, the agent should pay more attention to understanding the objects, the object attributes, and the object relationships. However, most current methods process all received textual and visual information equally. Therefore, this paper integrates more detailed semantic connections between visual and textual information through three pre-training tasks (object prediction, object attribute prediction, and object relationship prediction). The model learns better fusion representation and alignment between these two types of information to improve the success rate (SR) and generalization. The experiments show that, compared with the former baseline models, the SR on the unseen validation set (Val Unseen) increased by 7%, and the SR weighted by path length (SPL) increased by 7%; the SR on the test set (Test) increased by 4%, and the SPL increased by 3%.
The whole superconducting HT-7U Tokamak is a high-cost, large-scale, complicated device, and the assembly requirements of the HT-7U device are arduous and strict. At present, there has been no guiding principle for the assembly of the device, but assembly simulation can help engineers plan and make decisions in an intuitive and visual way before actual assembly. The primary problem to solve is which scheme is most suitable. Starting from the current research situation and technological progress of assembly simulation, this paper explains and analyzes four technological schemes of assembly simulation in common use. Finally, by comparing the technological issues and difficult points of simulation among the four feasible schemes, we identify the one most suitable for HT-7U assembly simulation.
Navigation is the only way to develop and utilize marine resources, and the promotion of seafarers' quality is the basic force of navigation, so navigation simulators play an important role in modern navigation education. Simulation research on union purchase operation is important for improving special operation training for the actual cargo handling of the union purchase. On the basis of the Cartesian coordinate system transformation algorithm, the algorithm model of the union purchase operation is constructed. On the basis of the three-dimensional (3D) rendering engine technology of OpenSceneGraph (OSG), the algorithm model for finding the space coordinates of the cargo point is established. The catenary equation model is used to optimize the scene appearance of the cargo wire. By combining the Qt signal mechanism with OSG, the simulation interaction of the union purchase operating system is realized. By acquiring the 3D coordinate values of each point, we fit and compare the trajectories of each point in the operation. The results show that the model has high interactivity and small error. The comparison of the states of the cargo wire before and after optimization shows that the optimized wire is more realistic, and its high fidelity meets the needs of operational training and simulation systems.
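The catenary model used above to shape the cargo wire follows y = a*cosh(x/a); a minimal sampling sketch, with an assumed parameter a and point count rather than the paper's actual values, looks like this:

```python
import math

def catenary_y(x, a):
    """Sag of a hanging wire at horizontal offset x from its lowest point:
    y = a * (cosh(x / a) - 1), with a = H / w (horizontal tension over
    weight per unit length)."""
    return a * (math.cosh(x / a) - 1.0)

def sampled_wire(span, a, n=21):
    """Sample n points across the span to render the wire's sag."""
    half = span / 2.0
    step = span / (n - 1)
    return [(-half + i * step, catenary_y(-half + i * step, a)) for i in range(n)]

pts = sampled_wire(span=10.0, a=8.0)
# A renderer would connect these points as a polyline, giving the wire a
# realistic droop instead of a straight segment between its endpoints.
```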
An ultra-massive distributed virtual environment generally consists of ultra-massive terrain data and a large quantity of objects and their attribute data, such as 2D/3D geometric models, audio/video, images, vectors, characteristics, etc. In this paper, we propose a novel method for constructing distributed scene graphs with high extensibility. This method can support highly concurrent interaction of clients and implement various tasks such as editing, querying, accessing, and motion controlling. Application experiments are performed to demonstrate its efficiency and soundness.
Scene graphs of point clouds help to understand object-level relationships in 3D space. Most graph generation methods work on 2D structured data and cannot be used for 3D unstructured point cloud data. Existing point-cloud-based methods generate the scene graph with an additional graph structure that requires labor-intensive manual annotation. To address these problems, we explore a method that converts point clouds into structured data and generates graphs without given structures. Specifically, we cluster points with similar augmented features into groups and establish their relationships, resulting in an initial structural representation of the point cloud. In addition, we propose a Dynamic Graph Generation Network (DGGN) to judge the semantic labels of targets of different granularity. It dynamically splits and merges point groups, resulting in a scene graph with high precision. Experiments show that our methods outperform other baseline methods: they output reliable graphs describing object-level relationships without additional manually labeled data.
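The step of clustering points with similar augmented features into groups could look roughly like the following greedy sketch; it is a stand-in under assumed 2-D features and a fixed threshold, not the DGGN's actual grouping rule:

```python
import math

def group_by_feature(features, threshold):
    """Greedy grouping: each point joins the first existing group whose mean
    feature lies within `threshold` (Euclidean); otherwise it starts a new
    group. A stand-in for the paper's feature-based clustering."""
    groups = []   # each entry: (member indices, running mean feature)
    for idx, feat in enumerate(features):
        for members, mean in groups:
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(feat, mean)))
            if d <= threshold:
                members.append(idx)
                k = len(members)
                for i in range(len(mean)):      # incremental mean update
                    mean[i] += (feat[i] - mean[i]) / k
                break
        else:
            groups.append(([idx], list(feat)))
    return [members for members, _ in groups]

feats = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
groups = group_by_feature(feats, threshold=1.0)
```

The split/merge refinement the abstract describes would then operate on these initial groups rather than on raw points.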
In healthcare facilities, including hospitals, pathogen transmission can lead to infectious disease outbreaks, highlighting the need for effective disinfection protocols. Although disinfection robots offer a promising solution, their deployment is often hindered by their inability to accurately recognize human activities within these environments. Although numerous studies have addressed Human Activity Recognition (HAR), few have utilized scene graph features that capture the relationships between objects in a scene. To address this gap, our study proposes a novel hybrid multi-classifier information fusion method that combines scene graph analysis with visual feature extraction for enhanced HAR in healthcare settings. We first extract scene graphs, complete with node and edge attributes, from images and use a graph classification network with a graph attention mechanism for activity recognition. Concurrently, we employ Swin Transformer and convolutional neural network models to extract visual features from the same images. The outputs from these three models are then integrated using a hybrid information fusion approach based on Dempster-Shafer theory and a weighted majority vote. Our method is evaluated on a newly compiled hospital activity dataset consisting of 5,770 images across 25 activity categories. The results demonstrate an accuracy of 90.59%, a recall of 90.16%, and a precision of 90.31%, outperforming existing HAR methods and showing its potential for practical applications in healthcare environments.
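The Dempster-Shafer part of the fusion can be sketched in its simplest form, restricted to singleton hypotheses where each classifier's normalized per-class scores act as belief masses; this is a simplified sketch, not the paper's full fusion pipeline:

```python
def dempster_combine(m1, m2):
    """Dempster's rule for two mass functions over singleton hypotheses:
    multiply agreeing masses and normalize away the conflicting mass."""
    classes = set(m1) | set(m2)
    combined = {c: m1.get(c, 0.0) * m2.get(c, 0.0) for c in classes}
    conflict = 1.0 - sum(combined.values())
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources fully disagree")
    return {c: v / (1.0 - conflict) for c, v in combined.items()}

# Two classifiers lean toward "handwashing"; fusion sharpens that belief.
graph_clf = {"handwashing": 0.6, "walking": 0.4}
visual_clf = {"handwashing": 0.7, "walking": 0.3}
fused = dempster_combine(graph_clf, visual_clf)
```

With three classifiers, as in the method above, the rule is applied pairwise since it is associative; the weighted majority vote would act as a second, independent fusion channel.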
Image paragraph generation aims to generate a long description composed of multiple sentences, which differs from traditional image captioning containing only one sentence. Most previous methods are dedicated to extracting rich features from image regions and ignore modelling the visual relationships. In this paper, we propose a novel method to generate a paragraph by modelling visual relationships comprehensively. First, we parse an image into a scene graph, where each node represents a specific object and each edge denotes the relationship between two objects. Second, we enrich the object features by implicitly encoding visual relationships through a graph convolutional network (GCN). We further explore high-order relations between different relation features using another graph convolutional network. In addition, we obtain the linguistic features by projecting the predicted object labels and their relationships into a semantic embedding space. With these features, we present an attention-based topic generation network to select relevant features and produce a set of topic vectors, which are then utilized to generate multiple sentences. We evaluate the proposed method on the Stanford image-paragraph dataset, which is currently the only available dataset for image paragraph generation, and our method achieves competitive performance in comparison with other state-of-the-art (SOTA) methods.
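The relationship-encoding GCN step can be sketched as plain neighbourhood averaging over a toy scene graph; real GCN layers add learned weight matrices and a nonlinearity, so this only shows the aggregation idea:

```python
def gcn_layer(features, edges):
    """One message-passing step: each node's new feature is the mean of its
    own feature and its neighbours' (identity weights, no nonlinearity)."""
    neighbours = {n: [n] for n in features}        # add a self-loop
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    dim = len(next(iter(features.values())))
    return {n: [sum(features[m][i] for m in nbrs) / len(nbrs)
                for i in range(dim)]
            for n, nbrs in neighbours.items()}

# A tiny scene graph: person -(holds)- cup, cup -(on)- table.
feats = {"person": [1.0, 0.0], "cup": [0.0, 1.0], "table": [0.0, 0.0]}
edges = [("person", "cup"), ("cup", "table")]
enriched = gcn_layer(feats, edges)
```

After one step, the "cup" feature already mixes information from both "person" and "table", which is how relationship context flows into object features before topic generation.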
Learning the activities and interactions of small groups is a key step in understanding team sports videos. Recent research on team sports videos largely takes the perspective of the audience rather than the athlete. In team sports videos such as volleyball and basketball videos, there are plenty of intra-team and inter-team relations. In this paper, a new task named Group Scene Graph Generation is introduced to better understand intra-team and inter-team relations in sports videos. To tackle this problem, a novel Hierarchical Relation Network is proposed. After all players in a video are finely divided into two teams, the features of the two teams' activities and interactions are enhanced by Graph Convolutional Networks and finally recognized to generate the Group Scene Graph. For evaluation, a Volleyball+ dataset is proposed, built on the Volleyball dataset with 9660 additional team activity labels. A baseline is set for comparison, and our experimental results demonstrate the effectiveness of our method. Moreover, the idea of our method can be directly applied to another video-based task, Group Activity Recognition. Experiments show the superiority of our method and reveal the link between the two tasks. Finally, from the athlete's view, we present an interpretation that shows how to utilize the Group Scene Graph to analyze teams' activities and provide professional gaming suggestions.
Social relationships, such as parent-offspring and friends, are crucial and stable connections between individuals, especially at the person level, and are essential for accurately describing the semantics of videos. In this paper, we analogize such a task to scene graph generation, which we call video social relationship graph generation (VSRGG). It involves generating a social relationship graph for each video based on person-level relationships. We propose a context-aware graph neural network (CAGNet) for VSRGG, which effectively generates social relationship graphs through message passing, capturing the context of the video. Specifically, CAGNet detects persons in the video, generates an initial graph via relationship proposal, and extracts facial and body features to describe the detected individuals, as well as temporal features to describe their interactions. Then, CAGNet predicts pairwise relationships between individuals using graph message passing. Additionally, we construct a new dataset, VidSoR, to evaluate VSRGG, which contains 72 h of video with 6276 person instances and 5313 relationship instances of eight relationship types. Extensive experiments show that CAGNet makes accurate predictions with a comparatively high mean recall (mRecall) when using only visual features.
Funding: Supported by the National Natural Science Foundation of China (Nos. 61173080, 61232014, 61472010, 61421062) and the National Key Technology Support Program of China (No. 2013BAK03B07).
Funding: Supported by the Major Science and Technology Project of Hubei Province of China (2022AAA009) and the Open Fund of Hubei Luojia Laboratory.
Funding: Supported by the Glocal University 30 Project Fund of Gyeongsang National University in 2025.
Funding: Project supported by the National Basic Research Program (973) of China (No. 2004CB719403) and the National Natural Science Foundation of China (Nos. 60573151 and 60473100).
Funding: Funded by (i) the Natural Science Foundation of China (NSFC) under Grant Nos. 61402397, 61263043, 61562093 and 61663046; (ii) the Open Foundation of the Key Laboratory in Software Engineering of Yunnan Province (No. 2020SE304); and (iii) the Practical Innovation Project of Yunnan University (Project Nos. 2021z34, 2021y128 and 2021y129).
Funding: Supported by the National Natural Science Foundation of China (61872024) and the National Key R&D Program of China under Grant 2018YFB2100603.
Funding: Supported by the National Natural Science Foundation of China (62006150), the Songjiang District Science and Technology Research Project (19SJKJGG83), and the Shanghai Young Science and Technology Talents Sailing Program (19YF1418400).
Funding: Supported by the National Natural Science Foundation of China (No. 60273044) and the Natural Science Foundation of Anhui Province (No. 01042201).
Funding: National Natural Science Foundation of China (No. 51579114); Natural Science Foundation of Fujian Province (No. 2018J01485); Project of Young and Middle-Aged Teacher Education of Fujian Province (No. JAT170309).
Abstract: Navigation is essential to developing and utilizing marine resources, and improving seafarers' competence is fundamental to navigation, so navigation simulators play an important role in modern navigation education. Simulation research on union purchase operation is important for improving specialized training in the actual cargo handling of the union purchase rig. Based on a Cartesian coordinate system transformation algorithm, an algorithmic model of the union purchase operation is constructed. Based on the three-dimensional (3D) rendering engine technology of OpenSceneGraph (OSG), a model for finding the space coordinates of the cargo point is established. A catenary equation model is used to optimize the scene appearance of the cargo wire. By combining the Qt signal mechanism with OSG, simulated interaction with the union purchase operating system is realized. By acquiring the 3D coordinate values of each point, we fit and compare the trajectories of each point during operation. The results show that the model is highly interactive and has small error. Comparing the states of the cargo wire before and after optimization shows that the optimized wire is more realistic, and its high fidelity meets the needs of operational training and simulation systems.
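The catenary model mentioned above describes a wire hanging under its own weight as y = a·cosh(x/a). A minimal sketch of how such a wire shape could be sampled for rendering, assuming equal-height supports and solving for the parameter a by bisection (the function name and the numeric values are illustrative, not taken from the paper):

```python
import math

def catenary_points(span, wire_len, n=5):
    """Sample n points of a catenary y = a*cosh(x/a) hanging between two
    supports at equal height, `span` apart, with total wire length
    `wire_len` > span.  The parameter a satisfies 2a*sinh(span/(2a)) =
    wire_len and is found by bisection (f below is decreasing in a)."""
    assert wire_len > span
    f = lambda a: 2.0 * a * math.sinh(span / (2.0 * a)) - wire_len
    lo, hi = span / 100.0, 1e6 * span      # brackets: f(lo) > 0 > f(hi)
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    a = (lo + hi) / 2.0
    xs = [-span / 2.0 + span * i / (n - 1) for i in range(n)]
    # shift vertically so the lowest point (x = 0) sits at y = 0
    return [(x, a * (math.cosh(x / a) - 1.0)) for x in xs]

pts = catenary_points(10.0, 11.0)          # 10 m span, 11 m of wire
print(max(y for _, y in pts))              # sag height at the supports
```

Sampling more points (larger `n`) yields a polyline smooth enough to render the drooping cargo wire instead of a straight segment.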
Funding: Supported by the National Basic Research Program of China (Grant No. 2004CB719403); the National High-Tech Research & Development Program of China (Grant Nos. 2006AA01Z334, 2007AA01Z318, 2009AA01Z324); the National Natural Science Foundation of China (Grant Nos. 60573151, 60703062, 60833007); and the Marine 908-03-01-10 Project.
Abstract: An ultra-massive distributed virtual environment generally consists of ultra-massive terrain data and a large quantity of objects and their attribute data, such as 2D/3D geometric models, audio/video, images, vectors, and characteristics. In this paper, we propose a novel method for constructing distributed scene graphs with high extensibility. The method supports highly concurrent client interaction and implements tasks such as editing, querying, accessing, and motion control. Application experiments demonstrate its efficiency and soundness.
Funding: This work was supported by the National Natural Science Foundation of China (Nos. 62173045 and 61673192); the Fundamental Research Funds for the Central Universities (No. 2020XD-A04-2); and the BUPT Excellent PhD Students Foundation (No. CX2021222).
Abstract: Scene graphs of point clouds help in understanding object-level relationships in 3D space. Most graph generation methods work on 2D structured data and cannot be applied to 3D unstructured point cloud data, while existing point-cloud-based methods generate the scene graph from an additional graph structure that requires labor-intensive manual annotation. To address these problems, we explore a method that converts point clouds into structured data and generates graphs without given structures. Specifically, we cluster points with similar augmented features into groups and establish their relationships, resulting in an initial structural representation of the point cloud. In addition, we propose a Dynamic Graph Generation Network (DGGN) to predict the semantic labels of targets at different granularities; it dynamically splits and merges point groups, resulting in a scene graph with high precision. Experiments show that our methods outperform baseline methods, outputting reliable graphs that describe object-level relationships without additional manually labeled data.
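The grouping step described above, clustering points with similar augmented features, could be sketched in a much-simplified form as a greedy centroid-based grouping. This is an illustrative stand-in, not the paper's actual algorithm; the function name, threshold, and feature values are assumptions:

```python
import math

def group_points(feats, thresh):
    """Greedily cluster feature vectors: each point joins the first group
    whose running centroid lies within `thresh` (Euclidean distance),
    otherwise it starts a new group.  Returns one group id per point."""
    centroids, counts, labels = [], [], []
    for f in feats:
        best = None
        for gid, c in enumerate(centroids):
            if math.dist(f, c) <= thresh:
                best = gid
                break
        if best is None:                    # start a new group
            centroids.append(list(f))
            counts.append(1)
            labels.append(len(centroids) - 1)
        else:                               # join and update the centroid
            counts[best] += 1
            centroids[best] = [(ci * (counts[best] - 1) + fi) / counts[best]
                               for ci, fi in zip(centroids[best], f)]
            labels.append(best)
    return labels

feats = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.05, 0.1)]
print(group_points(feats, thresh=1.0))     # two groups emerge
```

In the paper's setting the features would be augmented point descriptors rather than raw 2D coordinates, and the resulting groups seed the initial graph nodes.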
Funding: US National Science Foundation (NSF) via Grant No. 2038967. This research also received support from the Science Alliance at the University of Tennessee, Knoxville (UTK) via the Joint Directed Research and Development Program.
Abstract: In healthcare facilities, including hospitals, pathogen transmission can lead to infectious disease outbreaks, highlighting the need for effective disinfection protocols. Although disinfection robots offer a promising solution, their deployment is often hindered by an inability to accurately recognize human activities in these environments. Numerous studies have addressed Human Activity Recognition (HAR), but few have utilized scene graph features that capture the relationships between objects in a scene. To address this gap, our study proposes a novel hybrid multi-classifier information fusion method that combines scene graph analysis with visual feature extraction for enhanced HAR in healthcare settings. We first extract scene graphs, complete with node and edge attributes, from images and use a graph classification network with a graph attention mechanism for activity recognition. Concurrently, we employ Swin Transformer and convolutional neural network models to extract visual features from the same images. The outputs of these three models are then integrated by a hybrid information fusion approach based on Dempster-Shafer theory and a weighted majority vote. Our method is evaluated on a newly compiled hospital activity dataset consisting of 5,770 images across 25 activity categories. The results demonstrate an accuracy of 90.59%, a recall of 90.16%, and a precision of 90.31%, outperforming existing HAR methods and showing the method's potential for practical applications in healthcare environments.
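The Dempster-Shafer fusion step can be illustrated with Dempster's rule of combination, which merges two basic mass assignments while renormalizing away their conflict. The sketch below combines two toy mass functions over a small frame of activity hypotheses; the activity labels and mass values are invented for illustration and are not from the paper's dataset:

```python
def dempster_combine(m1, m2):
    """Dempster's rule of combination for two basic mass assignments,
    each a dict mapping frozenset hypotheses to masses that sum to 1:
    m12(A) = (1/(1-K)) * sum over B∩C=A of m1(B)m2(C),
    where K is the total mass assigned to conflicting (disjoint) pairs."""
    combined, conflict = {}, 0.0
    for b, mb in m1.items():
        for c, mc in m2.items():
            inter = b & c
            if inter:
                combined[inter] = combined.get(inter, 0.0) + mb * mc
            else:
                conflict += mb * mc
    if conflict >= 1.0:
        raise ValueError("total conflict; combination undefined")
    return {h: v / (1.0 - conflict) for h, v in combined.items()}

A, B = frozenset({"walking"}), frozenset({"cleaning"})
either = A | B                             # mass left uncommitted
m_graph = {A: 0.6, B: 0.1, either: 0.3}    # scene-graph classifier
m_visual = {A: 0.5, B: 0.2, either: 0.3}   # visual classifier
fused = dempster_combine(m_graph, m_visual)
print(max(fused, key=fused.get))           # the agreed-upon hypothesis
```

Because both classifiers lean toward "walking", the combined belief in that hypothesis rises above either input's; the paper additionally weights the fused result against a majority vote across its three classifiers.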
基金supported in part by National Natural Science Foundation of China(Nos.61721004,61976214,62076078 and 62176246).
Abstract: Image paragraph generation aims to generate a long description composed of multiple sentences, unlike traditional image captioning, which produces only one sentence. Most previous methods are dedicated to extracting rich features from image regions and ignore modelling the visual relationships. In this paper, we propose a novel method that generates a paragraph by modelling visual relationships comprehensively. First, we parse an image into a scene graph, where each node represents a specific object and each edge denotes the relationship between two objects. Second, we enrich the object features by implicitly encoding visual relationships through a graph convolutional network (GCN), and further explore high-order relations between different relation features using another GCN. In addition, we obtain linguistic features by projecting the predicted object labels and their relationships into a semantic embedding space. With these features, we present an attention-based topic generation network that selects relevant features and produces a set of topic vectors, which are then used to generate multiple sentences. We evaluate the proposed method on the Stanford image-paragraph dataset, currently the only available dataset for image paragraph generation, and it achieves competitive performance in comparison with other state-of-the-art (SOTA) methods.
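The relationship-encoding idea, enriching each object feature through its scene-graph neighbours, can be illustrated with a single weightless message-passing update, where each node's feature becomes the mean of itself and its neighbours. This is a simplified sketch; real GCN layers also apply a learned linear map and a nonlinearity, and the toy graph below is an assumption for illustration:

```python
def gcn_layer(feats, edges):
    """One weightless message-passing step over an undirected graph:
    each node's feature is replaced by the mean of its own feature and
    those of its neighbours (self-loop included)."""
    n = len(feats)
    d = len(feats[0])
    neigh = {i: [i] for i in range(n)}     # self-loop for each node
    for u, v in edges:
        neigh[u].append(v)
        neigh[v].append(u)
    out = []
    for i in range(n):
        acc = [0.0] * d
        for j in neigh[i]:
            for k in range(d):
                acc[k] += feats[j][k]
        out.append([a / len(neigh[i]) for a in acc])
    return out

# Toy scene graph: node 0 "man" -- node 1 "horse" -- node 2 "field"
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = [(0, 1), (1, 2)]
print(gcn_layer(feats, edges))
```

After one step, the "horse" node's feature already mixes information from both "man" and "field", which is what lets relationship context flow into the object features used for topic generation.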
Funding: National Natural Science Foundation of China (Grant No. U20B2069); Fundamental Research Funds for the Central Universities.
Abstract: Learning the activities and interactions of small groups is a key step in understanding team sports videos. Recent research on team sports videos has largely taken the perspective of the audience rather than the athlete, yet team sports videos such as volleyball and basketball videos contain plenty of intra-team and inter-team relations. In this paper, a new task named Group Scene Graph Generation is introduced to better understand intra-team and inter-team relations in sports videos, and a novel Hierarchical Relation Network is proposed to tackle it. After all players in a video are divided into two teams, the features of the two teams' activities and interactions are enhanced by Graph Convolutional Networks and finally recognized to generate the Group Scene Graph. For evaluation, a Volleyball+ dataset is proposed, built on the Volleyball dataset with 9660 additional team activity labels, and a baseline is set for comparison. Experimental results demonstrate the effectiveness of our method. Moreover, the idea behind our method can be directly applied to another video-based task, Group Activity Recognition; experiments show the superiority of our method and demonstrate the link between the two tasks. Finally, from the athlete's perspective, we present an interpretation that shows how the Group Scene Graph can be used to analyze team activities and provide professional gaming suggestions.
基金supported by the National Natural Science Foundation of China(No.62072232)the Fundamental Research Funds for the Central Universities(No.021714380026)the Collaborative Innovation Center of Novel Software Technology and Industrialization.
Abstract: Social relationships, such as parent-offspring and friendship, are crucial and stable person-level connections between individuals and are essential for accurately describing the semantics of videos. In this paper, we analogize this task to scene graph generation and call it video social relationship graph generation (VSRGG): generating a social relationship graph for each video based on person-level relationships. We propose a context-aware graph neural network (CAGNet) for VSRGG, which effectively generates social relationship graphs through message passing, capturing the context of the video. Specifically, CAGNet detects persons in the video, generates an initial graph via relationship proposal, and extracts facial and body features to describe the detected individuals, as well as temporal features to describe their interactions. CAGNet then predicts pairwise relationships between individuals using graph message passing. Additionally, we construct a new dataset, VidSoR, for evaluating VSRGG, which contains 72 hours of video with 6276 person instances and 5313 relationship instances of eight relationship types. Extensive experiments show that CAGNet makes accurate predictions with a comparatively high mean recall (mRecall) using only visual features.