Journal Articles
6 articles found
1. Visual Relationship Detection with Contextual Information (Cited: 1)
Authors: Yugang Li, Yongbin Wang, Zhe Chen, Yuting Zhu. Computers, Materials & Continua (SCIE, EI), 2020, Issue 6, pp. 1575-1589 (15 pages)
Understanding an image goes beyond recognizing and locating the objects in it; the relationships between objects are also very important in image understanding. Most previous methods have focused on recognizing local predictions of the relationships, but real-world image relationships are often determined by the surrounding objects and other contextual information. In this work, we employ this insight to propose a novel framework for visual relationship detection. The core of the framework is a relationship inference network, a recurrent structure designed to combine the global contextual information of the objects to infer the relationships in the image. Experimental results on Stanford VRD and Visual Genome demonstrate that the proposed method achieves good performance in both efficiency and accuracy. Finally, we demonstrate the value of visual relationships on two computer vision tasks: image retrieval and scene graph generation.
Keywords: visual relationship, deep learning, gated recurrent units, image retrieval, contextual information
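The abstract describes a recurrent relationship inference network built on gated recurrent units that folds contextual information into a relationship prediction. The paper's exact architecture is not given here, so the following is only a minimal pure-Python sketch of how a GRU cell can accumulate a sequence of (scalar) contextual features into a single hidden state; all weights and function names are illustrative, not the authors' implementation:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(h, x, wz, wr, wh):
    # One scalar GRU step: update gate z, reset gate r, candidate state.
    # Each weight argument is an (input weight, hidden weight) pair.
    z = sigmoid(wz[0] * x + wz[1] * h)
    r = sigmoid(wr[0] * x + wr[1] * h)
    h_cand = math.tanh(wh[0] * x + wh[1] * (r * h))
    return (1.0 - z) * h + z * h_cand

def infer_relation_state(context_feats, wz=(0.5, 0.5), wr=(0.5, 0.5), wh=(1.0, 1.0)):
    # Fold a sequence of contextual features into one hidden state,
    # which a downstream classifier could map to a relationship label.
    h = 0.0
    for x in context_feats:
        h = gru_step(h, x, wz, wr, wh)
    return h

h = infer_relation_state([0.2, 0.7, -0.1])
```

Because the candidate state is tanh-bounded and the update is a convex combination, the hidden state stays in (-1, 1) regardless of sequence length.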
2. WordleNet: A Visualization Approach for Relationship Exploration in Document Collection
Authors: Xu Wang, Zuowei Cui, Lei Jiang, Wenhuan Lu, Jie Li. Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2020, Issue 3, pp. 384-400 (17 pages)
Document collections contain not only rich semantic content but also a diverse range of relationships. We propose WordleNet, an approach to supporting effective relationship exploration in document collections. Existing approaches mainly focus on semantic similarity or a single category of relationships. By constructing a general definition of document relationships, our approach enables the flexible and real-time generation of document relationships that may not otherwise occur to human researchers and may give rise to interesting patterns among documents. Multiple novel visual components are integrated into our approach, whose effectiveness has been verified through a case study, a comparative study, and an eye-tracking experiment.
Keywords: document relationship, interaction techniques, text visualization, relationship visualization, visual analytics
3. Graph-based method for human-object interactions detection (Cited: 1)
Authors: XIA Li-min, WU Wei. Journal of Central South University (SCIE, EI, CAS, CSCD), 2021, Issue 1, pp. 205-218 (14 pages)
Human-object interaction (HOI) detection is a new branch of visual relationship detection, which plays an important role in the field of image understanding. Because of the complexity and diversity of image content, the detection of HOIs is still an onerous challenge. Unlike most current works on HOI detection, which rely only on the pairwise information of a human and an object, we propose a graph-based HOI detection method that models context and global structure information. Firstly, to better utilize the relations between humans and objects, the detected humans and objects are regarded as nodes to construct a fully connected undirected graph, and the graph is pruned to obtain an HOI graph that preserves only the edges connecting human and object nodes. Then, in order to obtain more robust features of human and object nodes, two different attention-based feature extraction networks are proposed, which model global and local contexts respectively. Finally, a graph attention network is introduced to pass messages between different nodes in the HOI graph iteratively and to detect the potential HOIs. Experiments on the V-COCO and HICO-DET datasets verify the effectiveness of the proposed method and show that it is superior to many existing methods.
Keywords: human-object interactions, visual relationship, context information, graph attention network
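The graph-construction step the abstract describes (a fully connected graph over detections, pruned so that only human-object edges survive) can be sketched directly. This is a toy illustration of that pruning rule, not the paper's code; the node format and category names are assumptions:

```python
def build_hoi_edges(nodes):
    """nodes: list of (id, category) where category is 'human' or an object class.
    Conceptually start from a fully connected undirected graph over all
    detections, then keep only edges that connect a human node to an
    object node (the pruning step described in the abstract)."""
    edges = []
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            a, b = nodes[i], nodes[j]
            # Keep the edge only if exactly one endpoint is a human.
            if 'human' in (a[1], b[1]) and a[1] != b[1]:
                edges.append((a[0], b[0]))
    return edges

dets = [(0, 'human'), (1, 'human'), (2, 'bicycle'), (3, 'cup')]
print(build_hoi_edges(dets))  # [(0, 2), (0, 3), (1, 2), (1, 3)]
```

Human-human and object-object pairs are discarded, so a subsequent message-passing network only reasons over candidate interactions.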
4. Stacked Attention Networks for Referring Expressions Comprehension
Authors: Yugang Li, Haibo Sun, Zhe Chen, Yudan Ding, Siqi Zhou. Computers, Materials & Continua (SCIE, EI), 2020, Issue 12, pp. 2529-2541 (13 pages)
Referring expressions comprehension is the task of locating the image region described by a natural language expression, which refers to the properties of the region or its relationships with other regions. Most previous work handles this problem by selecting the most relevant regions from a set of candidate regions; when there are many candidate regions in the set, these methods are inefficient. Inspired by the recent success of image captioning using deep learning methods, in this paper we propose a framework to understand referring expressions through multiple steps of reasoning. We present a model for referring expressions comprehension that selects the most relevant region directly from the image. The core of our model is a recurrent attention network, which can be seen as an extension of the Memory Network. The proposed model is capable of improving the results through multiple computational hops. We evaluate the proposed model on two referring expression datasets: Visual Genome and Flickr30k Entities. The experimental results demonstrate that the proposed model outperforms previous state-of-the-art methods in both accuracy and efficiency. We also conduct an ablation experiment to show that the performance of the model does not keep improving as the number of attention layers increases.
Keywords: stacked attention networks, referring expressions, visual relationship, deep learning
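The "multiple computational hops" idea from Memory Networks that the abstract builds on can be illustrated with a minimal pure-Python sketch: a query vector repeatedly attends over a memory of region features and is updated with the attended read. The vectors, weights, and residual update rule below are illustrative assumptions, not the paper's model:

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attention_hop(query, memory):
    # One hop: score every region feature against the query,
    # read out the attention-weighted sum, and update the query residually.
    scores = softmax([dot(query, m) for m in memory])
    d = len(query)
    read = [sum(w * m[k] for w, m in zip(scores, memory)) for k in range(d)]
    return [q + r for q, r in zip(query, read)]

def multi_hop(query, memory, hops=3):
    # Stacking hops lets later attention refine the evidence gathered earlier.
    for _ in range(hops):
        query = attention_hop(query, memory)
    return query

regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
q = multi_hop([0.2, 0.1], regions, hops=2)
```

The ablation result quoted in the abstract corresponds to varying `hops`: past some depth, extra hops stop helping.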
5. Learning Knowledge Enhanced Text-image Feature Selection Network for Chest X-ray Image Classification
Authors: Xinyue Gao, Xixi Wang, Bo Jiang, Xiao Wang, Jin Tang, Chuanfu Li. Machine Intelligence Research, 2025, Issue 5, pp. 941-955 (15 pages)
Discriminant feature representation of chest X-ray images is crucial for predicting diseases. Currently, large-scale language models predominantly utilize a linear classifier for disease prediction, which ignores the semantic correlations between different diseases and potentially leads to the omission of discriminative visual details. To this end, this work proposes a novel knowledge-enhanced text-image feature selection network (KT-FSN), comprising three main components: a multi-relationship image encoder module, a knowledge-enhanced text encoder module, and a text-image label prediction module. Specifically, the multi-relationship image encoder (MRIE) module captures the visual relationships among images and incorporates a multi-relation graph to fuse relevant image information, thereby enhancing the image features. Then, we develop a novel knowledge-enhanced text encoder (KETE) module based on a large-scale language model to learn disease label word embeddings guided by medical domain expertise. Additionally, it employs a graph convolutional network (GCN) to capture the co-occurrence and interdependence of different disease labels. Finally, we propose a novel text-image label prediction (TILP) module based on a transformer decoder, which adaptively selects discriminative image spatial features under the guidance of disease label word embeddings, ultimately leading to accurate disease prediction from chest X-ray images. Extensive experimental results on the publicly available ChestX-ray14 and CheXpert datasets validate the effectiveness and superiority of the proposed KT-FSN model. The source code will be available at https://github.com/GXY-20000/KT-FSN.
Keywords: disease prediction, language model, visual relationship, knowledge enhancement, semantic correlation
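The KETE module's use of a GCN over disease-label co-occurrence can be illustrated with one graph-convolution step: each label embedding is mixed with its co-occurring neighbours via a normalized adjacency (with self-loops), then linearly projected and passed through a ReLU. The toy graph, embeddings, and row-normalization choice below are assumptions for illustration, not the paper's configuration:

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def gcn_layer(adj, H, W):
    # One graph-convolution step over disease-label nodes:
    # add self-loops, row-normalize the adjacency, propagate, project, ReLU.
    with_loops = [[a + (1 if i == j else 0) for j, a in enumerate(row)]
                  for i, row in enumerate(adj)]
    A = [[x / sum(row) for x in row] for row in with_loops]
    Z = matmul(matmul(A, H), W)
    return [[max(0.0, x) for x in row] for row in Z]

# Toy co-occurrence graph over 3 disease labels (label 1 co-occurs with 0 and 2).
cooc = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # initial label embeddings
W = [[1.0, 0.0], [0.0, 1.0]]              # identity projection for clarity
H2 = gcn_layer(cooc, H, W)
print(H2[0])  # [0.5, 0.5]: label 0 now carries part of label 1's embedding
```

After one layer, each disease embedding encodes the interdependence of the labels it co-occurs with, which is what guides the downstream feature selection.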
6. Comprehensive Relation Modelling for Image Paragraph Generation (Cited: 2)
Authors: Xianglu Zhu, Zhang Zhang, Wei Wang, Zilei Wang. Machine Intelligence Research (EI, CSCD), 2024, Issue 2, pp. 369-382 (14 pages)
Image paragraph generation aims to generate a long description composed of multiple sentences, which differs from traditional image captioning containing only one sentence. Most previous methods are dedicated to extracting rich features from image regions and ignore modelling the visual relationships. In this paper, we propose a novel method to generate a paragraph by modelling visual relationships comprehensively. First, we parse an image into a scene graph, where each node represents a specific object and each edge denotes the relationship between two objects. Second, we enrich the object features by implicitly encoding visual relationships through a graph convolutional network (GCN). We further explore high-order relations between different relation features using another graph convolutional network. In addition, we obtain the linguistic features by projecting the predicted object labels and their relationships into a semantic embedding space. With these features, we present an attention-based topic generation network to select relevant features and produce a set of topic vectors, which are then utilized to generate multiple sentences. We evaluate the proposed method on the Stanford image-paragraph dataset, which is currently the only available dataset for image paragraph generation, and our method achieves competitive performance in comparison with other state-of-the-art (SOTA) methods.
Keywords: image paragraph generation, visual relationship, scene graph, graph convolutional network (GCN), long short-term memory
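The abstract's second step (enriching object features by implicitly encoding scene-graph relationships through a GCN) amounts to message passing along relation edges. This is a minimal sketch of a single undirected averaging pass; the node names, example relations, and averaging rule are illustrative assumptions, not the authors' network:

```python
def enrich_with_relations(obj_feats, edges):
    """obj_feats: {node: feature vector}; edges: list of (subject, object) relations.
    Each node averages its own feature with its scene-graph neighbours',
    in the spirit of one GCN message-passing step."""
    neigh = {n: [] for n in obj_feats}
    for s, o in edges:
        neigh[s].append(obj_feats[o])
        neigh[o].append(obj_feats[s])
    out = {}
    for n, f in obj_feats.items():
        vecs = [f] + neigh[n]
        out[n] = [sum(v[k] for v in vecs) / len(vecs) for k in range(len(f))]
    return out

feats = {'person': [1.0, 0.0], 'horse': [0.0, 1.0], 'field': [1.0, 1.0]}
rels = [('person', 'horse'), ('horse', 'field')]  # e.g. person-riding-horse, horse-on-field
enriched = enrich_with_relations(feats, rels)
print(enriched['person'])  # [0.5, 0.5]: the person feature now reflects the horse it relates to
```

The enriched features, together with the embedded object and relation labels, are what the topic generation network attends over when producing sentence topics.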