Funding: Supported by the National Natural Science Foundation of China (61991423, 62376269 and 62472464) and the Key Scientific and Technological Project of Henan Province (232102321068).
Abstract: Recently, a new paradigm has been emerging in synthetic aperture radar (SAR) three-dimensional (3D) imaging technology, in which imaging performance is enhanced by exploiting SAR visual semantics. By "SAR visual semantics" we mean primarily the conceptual structural information of a scene extracted directly from SAR images. Under this paradigm, a paramount open problem is what SAR visual semantics could be extracted, and how they could be used, at the different levels associated with different structural information. This work is a tentative attempt to tackle this what-and-how problem, and it consists mainly of two parts. The first part is a brief description of how three levels (low, middle, and high) of SAR visual semantics could be extracted and used in SAR tomography (TomoSAR), including an extension of SAR visual semantics analysis (e.g., facades and roofs) to sparse 3D points initially recovered via traditional TomoSAR methods. The second part is a case study on two open-source TomoSAR datasets that illustrates and validates the effectiveness and efficiency of exploiting SAR visual semantics in TomoSAR for box-like 3D building modeling. Owing to space limits, only the main steps of the methods involved are reported; we hope that this omission of technical details does not severely compromise the underlying key concepts and ideas.
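The sparse 3D points mentioned above are typically obtained by inverting a multi-baseline elevation model at each azimuth-range pixel. The following is a minimal, self-contained sketch of that inversion, not the paper's pipeline: the wavelength, slant range, baselines, and scatterer positions are toy assumptions, and classical matched-filter beamforming stands in for whichever TomoSAR estimator is actually used.

```python
import cmath
import math

# Each pixel is observed from N slightly displaced passes (baselines b_n);
# the measurement is a Fourier-like sampling of the elevation profile gamma(s).
WAVELENGTH = 0.031   # m, X-band (assumed)
RANGE = 6000.0       # m, slant range (assumed)
baselines = [-20.0 + 2.0 * n for n in range(21)]  # m, 21 passes (assumed)

def xi(b):
    # elevation (spatial) frequency associated with baseline b
    return 4.0 * math.pi * b / (WAVELENGTH * RANGE)

# Ground truth: two scatterers along elevation s (e.g., ground and facade).
true_s = [0.0, 25.0]
g = [sum(cmath.exp(1j * xi(b) * s) for s in true_s) for b in baselines]

def beamform(g, s_grid):
    """Classical matched-filter (beamforming) estimate of |gamma(s)|."""
    return [abs(sum(gn * cmath.exp(-1j * xi(b) * s)
                    for gn, b in zip(g, baselines))) / len(g)
            for s in s_grid]

s_grid = [0.5 * k for k in range(-20, 71)]   # -10 m .. 35 m
profile = beamform(g, s_grid)
# The two dominant peaks of `profile` sit near the true elevations;
# thresholding such profiles yields the sparse 3D point cloud that the
# semantic analysis (facades, roofs) then operates on.
```

With only 21 baselines the beamforming response has high sidelobes and limited elevation resolution, which is exactly why sparsity-based TomoSAR estimators and, here, semantic priors are brought in.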
Abstract: Scholarly communication of knowledge is predominantly document-based in digital repositories, and researchers find it tedious to automatically capture and process the semantics among related articles. Despite the present digital era of big data, there is a lack of visual representations of the knowledge in scholarly articles, and a time-saving approach to literature search and visual navigation is warranted. Most knowledge-display tools cannot cope with current big-data trends and fall short of the requirements of automatic knowledge representation, storage, and dynamic visualization. To address this limitation, the main aim of this paper is to model the visualization of unstructured data and explore the feasibility of visual navigation that gives researchers insight into the knowledge hidden in the scientific articles of digital repositories. Contemporary topics of research and practice, including the modifiable risk factors behind a dramatic increase in Alzheimer's disease and other forms of dementia, warrant deeper insight into the evidence-based knowledge available in the literature. The goal is to provide researchers with easy, visual traversal through a digital repository of research articles. This paper takes the first step by proposing a novel integrated model that uses knowledge maps and next-generation graph datastores to achieve semantic visualization with domain-specific knowledge, such as dementia risk factors. The model facilitates a deep conceptual understanding of the literature by automatically establishing visual relationships among the knowledge extracted from the big-data resources of research articles. It also serves as an automated tool for visual navigation through the knowledge repository, enabling faster identification of the dementia risk factors reported in scholarly articles. Further, it facilitates semantic visualization and domain-specific knowledge discovery, with their associations, from a large digital repository. In this study, the implementation of the proposed model in the Neo4j graph data repository, along with the results achieved, is presented as a proof of concept. Using scholarly research articles on dementia risk factors as a case study, automatic knowledge extraction, storage, intelligent search, and visual navigation are illustrated. The implementation of contextual knowledge and its relationships for visual exploration by researchers shows promising results in the knowledge discovery of dementia risk factors. Overall, this study demonstrates the significance of semantic visualization through the effective use of knowledge maps and paves the way for extending visual modeling capabilities in the future.
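The core of such a graph datastore model is linking article nodes to the risk-factor nodes they report, then querying those links. A minimal sketch, with the node labels, relationship type, and example facts all being illustrative assumptions rather than the paper's actual schema; a plain in-memory dict stands in for Neo4j so the idea stays self-contained:

```python
# Tiny in-memory stand-in for a Neo4j knowledge map. In Cypher the two
# operations below would roughly correspond to
#   MERGE (a:Article {title:$t})-[:REPORTS]->(r:RiskFactor {name:$rf})
# and
#   MATCH (a:Article)-[:REPORTS]->(r:RiskFactor {name:$rf}) RETURN a.title
graph = {"nodes": set(), "edges": set()}

def add_report(article_title, risk_factor):
    """Record that an article reports a dementia risk factor."""
    graph["nodes"].add(("Article", article_title))
    graph["nodes"].add(("RiskFactor", risk_factor))
    graph["edges"].add((article_title, "REPORTS", risk_factor))

def articles_reporting(risk_factor):
    """All articles linked to a given risk factor (the 'intelligent search')."""
    return sorted(a for a, rel, r in graph["edges"]
                  if rel == "REPORTS" and r == risk_factor)

# Hypothetical extracted facts (not from the study's corpus):
add_report("Smith 2020", "hypertension")
add_report("Smith 2020", "smoking")
add_report("Lee 2021", "hypertension")
```

Visual navigation then amounts to rendering this node-and-edge structure and letting the researcher expand neighborhoods interactively, which Neo4j's browser provides out of the box.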
Funding: Supported by the National Natural Science Foundation of China (62172109, 62072118), the National Science Foundation of Guangdong Province (2022A1515010322), the Guangdong Basic and Applied Basic Research Foundation (2021B1515120010), and the Huangpu International Sci&Tech Cooperation Foundation of Guangzhou (2021GH12).
Abstract: Background: Cross-modal retrieval has attracted widespread attention in many cross-media similarity-search applications, particularly image-text retrieval in the fields of computer vision and natural language processing. Recently, visual and semantic embedding (VSE) learning has shown promising improvements in image-text retrieval tasks. Most existing VSE models employ two unrelated encoders to extract features and then use complex methods to contextualize and aggregate these features into holistic embeddings. Despite recent advances, existing approaches still suffer from two limitations: (1) without considering intermediate interactions and adequate alignment between different modalities, these models cannot guarantee the discriminative ability of their representations; and (2) existing feature aggregators are susceptible to certain noisy regions, which may lead to unreasonable pooling coefficients and degrade the quality of the final aggregated features. Methods: To address these challenges, we propose a novel cross-modal retrieval model containing a well-designed alignment module and a novel multimodal fusion encoder, which aims to learn adequate alignment and interaction of the aggregated features so as to effectively bridge the modality gap. Results: Experiments on the Microsoft COCO and Flickr30k datasets demonstrated the superiority of our model over state-of-the-art methods.
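Limitation (2) above, noisy regions corrupting the pooling coefficients, can be made concrete with a toy example. The sketch below is illustrative only (the feature vectors and relevance scores are invented, and a simple score-gated softmax stands in for the paper's actual aggregator): plain mean pooling lets one noisy region dominate the holistic embedding, while down-weighting it via softmax coefficients preserves the signal.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def weighted_pool(features, scores):
    """Aggregate region features with softmax(scores) as pooling coefficients."""
    w = softmax(scores)
    dim = len(features[0])
    return [sum(w[i] * features[i][d] for i in range(len(features)))
            for d in range(dim)]

# Three informative regions plus one noisy region (toy 2-D features).
features = [[1.0, 0.0], [0.9, 0.1], [1.1, -0.1], [-5.0, 4.0]]  # last = noise
scores = [2.0, 2.0, 2.0, -3.0]   # assumed output of a learned relevance gate

pooled = weighted_pool(features, scores)                          # noise-robust
mean = [sum(f[d] for f in features) / len(features) for d in range(2)]  # naive
```

Here the mean-pooled first coordinate is dragged negative by the single noisy region, whereas the score-weighted pool stays close to the consensus of the informative regions, which is the behavior the proposed aggregator is designed to achieve with learned coefficients.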
Funding: Supported by the National Natural Science Foundation of China under Grant No. 52178034.
Abstract: The scarcity of bilingual parallel corpora limits the exploitation of state-of-the-art supervised translation technology. One research direction is to employ relations among multi-modal data to enhance performance. However, reliance on manually annotated multi-modal datasets results in a high cost of data labeling. In this paper, the topic semantics of images is proposed to alleviate this problem. First, topic-related images can be automatically collected from the Internet by search engines. Second, topic semantics is sufficient to encode the relations between multi-modal data such as texts and images. Specifically, we propose a visual topic semantic enhanced translation (VTSE) model that utilizes topic-related images to construct a cross-lingual and cross-modal semantic space, allowing the VTSE model to simultaneously integrate syntactic structure and semantic features. In this process, topic-similar texts and images are wrapped into groups so that the model can extract more robust topic semantics from a set of similar images and then further optimize the feature integration. The results show that our model outperforms competitive baselines by a large margin on the Multi30k and Ambiguous COCO datasets. Our model can use external images to bring gains to translation, improving data efficiency.
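The grouping step described above, pooling topic-similar images so that the extracted topic semantics is less sensitive to any single noisy image, can be sketched as a per-topic centroid of image features. The topics and feature vectors below are invented toy values, and simple averaging stands in for whatever learned aggregation VTSE actually uses:

```python
def topic_centroids(items):
    """items: list of (topic, feature_vector). Returns topic -> mean vector,
    i.e., one robust semantic vector per group of topic-similar images."""
    sums, counts = {}, {}
    for topic, vec in items:
        if topic not in sums:
            sums[topic] = list(vec)
            counts[topic] = 1
        else:
            sums[topic] = [s + v for s, v in zip(sums[topic], vec)]
            counts[topic] += 1
    return {t: [s / counts[t] for s in sums[t]] for t in sums}

# Hypothetical image features retrieved by a search engine for two senses
# of an ambiguous source word ("bank"): averaging within each topic group
# yields a vector that disambiguates better than any single image.
images = [
    ("bank_river", [0.9, 0.1]), ("bank_river", [1.1, -0.1]),
    ("bank_money", [0.0, 1.0]), ("bank_money", [0.2, 0.8]),
]
centroids = topic_centroids(images)
```

The translation model can then condition on the centroid of the group matching the source sentence's topic instead of on individual, possibly noisy, retrieved images.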
Funding: Supported by the National Natural Science Foundation of China (61991420, 61991421 and 61991424).
Abstract: Synthetic aperture radar (SAR) three-dimensional (3D) imaging enables the acquisition of more comprehensive information, making it a recent hotspot in radar imaging. Traditional 3D imaging methods have evolved from 2D and interferometric imaging, combining elevation-aperture extension with signal-processing techniques. Limitations inherent in this imaging mechanism, such as long acquisition times or complex systems, restrict its application. In recent years, the rapid development of artificial intelligence has led to swift advances in radar, injecting new vitality into SAR 3D imaging. SAR microwave vision 3D imaging theory, built upon these advanced technologies, has emerged as a new interdisciplinary field of radar imaging. This paper reviews the history and present situation of SAR 3D imaging and introduces SAR microwave vision. We establish a theoretical framework covering representation models, computational models, processing paradigms, and evaluation systems. Additionally, our research progress in this area is discussed, along with future prospects for SAR microwave vision 3D imaging.