The complexity of multi-domain access control policy integration makes it difficult to understand and manage policy conflict information. Policy information visualization technology can intuitively express the logical relations of complex information, which effectively improves the manageability of multi-domain policy integration. Based on the role-based access control model, this paper proposes two policy analysis methods: one for per-domain statistics of multi-domain policy integration conflicts, and one for inter-domain policy element levels and cross-domain element mapping. A corresponding visualization tool is also developed. We use the tree-maps algorithm to statistically analyze the quantity and type of policy integration conflicts. On that basis, the semantic substrates algorithm is applied to analyze inter-domain policy element levels and cross-domain role and permission mappings in detail. Experimental results show that tree-maps and semantic substrates can effectively analyze the conflicts of multi-domain policy integration and have good application value.
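As a rough illustration of the first, statistical stage, the sketch below counts conflicts by domain and type and lays the counts out as a tree-map. The conflict records, type names, and the use of the squarify package are illustrative assumptions, not the authors' tool.

```python
from collections import Counter
import matplotlib.pyplot as plt
import squarify  # pip install squarify; stand-in for the paper's tree-map layout

# Hypothetical conflict records: (source domain, conflict type) pairs.
conflicts = [
    ("domainA", "role-cycle"), ("domainA", "separation-of-duty"),
    ("domainB", "role-cycle"), ("domainB", "role-cycle"),
    ("domainC", "permission-overlap"),
]

# Aggregate quantity and type of conflicts per domain, as the tree-map view does.
counts = Counter(conflicts)
labels = [f"{dom}\n{kind}\n{n}" for (dom, kind), n in counts.items()]
sizes = list(counts.values())

squarify.plot(sizes=sizes, label=labels, alpha=0.8)
plt.axis("off")
plt.title("Policy integration conflicts by domain and type")
plt.show()
```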
The Internet of Vehicles (IoV) has become an important direction in the field of intelligent transportation, in which vehicle positioning is a crucial part, and Simultaneous Localization and Mapping (SLAM) technology plays a crucial role in vehicle localization and navigation. Traditional SLAM systems are designed for static environments and can suffer poor accuracy and robustness in dynamic environments where objects are in constant motion. To address this issue, a new real-time visual SLAM system called MG-SLAM has been developed. Based on ORB-SLAM2, MG-SLAM incorporates a dynamic target detection process that detects both known and unknown moving objects. A separate semantic segmentation thread segments dynamic target instances, with the Mask R-CNN algorithm run on the Graphics Processing Unit (GPU) to accelerate segmentation. To reduce computational cost, only keyframes are segmented to identify known dynamic objects. Additionally, a multi-view geometry method is adopted to detect unknown moving objects. The results demonstrate that MG-SLAM achieves substantially higher precision, improving from 0.2730 m to 0.0135 m, and its processing time is significantly reduced compared with other dynamic-scene SLAM algorithms, illustrating its efficacy in locating objects in dynamic scenes.
Funding: Natural Science Foundation of the Jiangsu Higher Education Institutions of China (Grant No. 22KJD440001) and the Changzhou Science & Technology Program (Grant No. CJ20220232).
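The unknown-object check rests on multi-view geometry: a static 3D point must satisfy the epipolar constraint between two views, so matched features with a large epipolar residual can be flagged as moving. A minimal sketch of this common formulation, with an illustrative fundamental matrix and threshold (MG-SLAM's exact criterion may differ):

```python
import numpy as np

def epipolar_residual(x1, x2, F):
    """Distance from x2 to the epipolar line F @ x1 (homogeneous pixel coords)."""
    l = F @ x1                      # epipolar line in image 2: (a, b, c)
    return abs(x2 @ l) / np.hypot(l[0], l[1])

def flag_dynamic(points1, points2, F, thresh_px=1.0):
    """Mark matches whose epipolar residual exceeds thresh_px as moving points."""
    return np.array([epipolar_residual(p1, p2, F) > thresh_px
                     for p1, p2 in zip(points1, points2)])

# Toy example with a hypothetical fundamental matrix and two matched points.
F = np.array([[0, -1e-4, 0.01], [1e-4, 0, -0.02], [-0.01, 0.02, 1.0]])
pts1 = np.array([[100.0, 120.0, 1.0], [300.0, 200.0, 1.0]])
pts2 = np.array([[101.0, 121.0, 1.0], [340.0, 260.0, 1.0]])
print(flag_dynamic(pts1, pts2, F))  # True entries are treated as dynamic outliers
```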
Medical visual question answering (Med-VQA) is a task that aims to answer clinical questions given a medical image. Existing literature generally treats it as a classic classification task based on interaction features of the image and question. However, such a paradigm ignores the valuable semantics of candidate answers as well as their relations. From a real-world dataset, we observe that: 1) the text of candidate answers has a strong intrinsic correlation with medical images; 2) subtle differences among multiple candidate answers are crucial for identifying the correct one. Therefore, we propose an answer semantics enhanced (ASE) method to integrate the semantics of answers and capture their subtle differences. Specifically, we enhance the semantic correlation of image-question-answer triplets by aligning images and question-answer tuples within the feature fusion module. Then, we devise a contrastive learning loss to highlight the semantic differences between the correct answer and other answers. Finally, extensive experiments demonstrate the effectiveness of our method.
Funding: National Natural Science Foundation of China (Nos. 62032013 and 62102074) and the Science and Technology Projects in Liaoning Province, China (No. 2023JH3/10200005).
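Such a contrastive objective can be realized as a softmax over the similarities between the fused image-question feature and each candidate-answer embedding, pushing the correct candidate above its competitors. A minimal InfoNCE-style sketch in PyTorch; the shapes, temperature, and encoder outputs are assumptions rather than the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def answer_contrastive_loss(fused, answer_embs, correct_idx, tau=0.07):
    """
    fused:       (B, d) image-question fusion features
    answer_embs: (B, K, d) embeddings of K candidate answers per sample
    correct_idx: (B,) index of the correct answer
    Pulls the fused feature toward the correct answer and away from the others.
    """
    fused = F.normalize(fused, dim=-1)
    answer_embs = F.normalize(answer_embs, dim=-1)
    logits = torch.einsum("bd,bkd->bk", fused, answer_embs) / tau
    return F.cross_entropy(logits, correct_idx)

# Toy usage with random features (B=2 samples, K=4 candidate answers, d=16).
loss = answer_contrastive_loss(torch.randn(2, 16), torch.randn(2, 4, 16),
                               torch.tensor([0, 2]))
print(loss.item())
```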
Visual simultaneous localization and mapping (SLAM) is crucial in robotics and autonomous driving. However, traditional visual SLAM faces challenges in dynamic environments. To address this issue, researchers have proposed semantic SLAM, which combines object detection, semantic segmentation, instance segmentation, and visual SLAM. Despite the growing body of literature on semantic SLAM, there is currently a lack of comprehensive research on the integration of object detection and visual SLAM. Therefore, this study gathers information from multiple databases and reviews the relevant literature using specific keywords, focusing on visual SLAM based on object detection. It first discusses the current research status and challenges in this field, highlighting methods for incorporating semantic information from object detection networks into odometry, loop-closure detection, and map construction. It also compares the characteristics and performance of various object-detection-based visual SLAM algorithms. Lastly, it provides an outlook on future research directions and emerging trends in visual SLAM. Research has shown that visual SLAM based on object detection offers significant improvements over traditional SLAM in dynamic point removal, data association, point cloud segmentation, and other techniques; it can improve the robustness and accuracy of the entire SLAM system and can run in real time. With the continuous optimization of algorithms and improvements in hardware, object-detection-based visual SLAM has great potential for development.
Funding: National Natural Science Foundation of China (No. 62063006); Natural Science Foundation of Guangxi Province (No. 2023GXNS-FAA026025); Innovation Fund of Chinese Universities Industry-University-Research (No. 2021RYC06005); Research Project for Young and Middle-aged Teachers in Guangxi Universities (No. 2020KY15013); Special Research Project of Hechi University (No. 2021GCC028); Project of Outstanding Thousand Young Teachers' Training in Higher Education Institutions of Guangxi; Guangxi Colleges and Universities Key Laboratory of AI and Information Processing (Hechi University), Education Department of Guangxi Zhuang Autonomous Region.
In recent years, there has been much interest in incorporating semantics into simultaneous localization and mapping (SLAM) systems. This paper presents an approach to generate an outdoor large-scale 3D dense semantic map based on binocular stereo vision. The inputs to the system are stereo color images from a moving vehicle. First, the dense 3D space around the vehicle is constructed, and the motion of the camera is estimated by visual odometry. Meanwhile, semantic segmentation is performed online through deep learning, and the semantic labels are also used to verify the feature matching in visual odometry. These three processes calculate the motion, depth, and semantic label of every pixel in the input views. Then, a voxel conditional random field (CRF) inference is introduced to fuse semantic labels into voxels. After that, we present a method to remove moving objects by incorporating the semantic labels, which improves the motion segmentation accuracy. The last step is to generate the dense 3D semantic map of an urban environment from an arbitrarily long image sequence. We evaluate our approach on the KITTI vision benchmark, and the results show that the proposed method is effective.
Funding: National Natural Science Foundation of China (Nos. 61473042 and 61105092) and the Beijing Higher Education Young Elite Teacher Project (No. YETP1215).
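For the label-fusion step, a drastically simplified stand-in for the voxel CRF is to accumulate the segmenter's per-observation class log-probabilities independently per voxel and take the argmax; the sketch below shows only that simplified fusion, not the CRF inference itself.

```python
import numpy as np

class VoxelLabelFusion:
    """Accumulate per-pixel semantic evidence into voxels via log-probabilities.
    A simplified independent-voxel stand-in for the paper's CRF inference."""

    def __init__(self, num_classes):
        self.num_classes = num_classes
        self.log_probs = {}  # voxel index (i, j, k) -> accumulated log-probs

    def update(self, voxel, class_probs):
        """Fuse one observation (softmax output of the segmenter) into a voxel."""
        logp = np.log(np.clip(class_probs, 1e-6, 1.0))
        self.log_probs[voxel] = self.log_probs.get(
            voxel, np.zeros(self.num_classes)) + logp

    def label(self, voxel):
        """Most likely class after fusing all observations of this voxel."""
        return int(np.argmax(self.log_probs[voxel]))

fusion = VoxelLabelFusion(num_classes=3)
fusion.update((4, 2, 1), np.array([0.2, 0.7, 0.1]))  # e.g. road/building/car
fusion.update((4, 2, 1), np.array([0.1, 0.8, 0.1]))
print(fusion.label((4, 2, 1)))  # -> 1
```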
Because of the importance of graphics and information within the domain of architecture, engineering and construction (AEC), an appropriate combination of visualization technology and information management technology is of the utmost importance in developing applications that properly support design and construction. Virtual environments, however, tend not to make this information available. The few applications that do present additional information furthermore tend to limit their scope to pure construction information and do not incorporate information from loosely related knowledge domains, such as cultural heritage or architectural history. We therefore investigated two of the newest developments in these domains, namely game engine technology and semantic web technology. This paper documents part of this research, containing a review and comparison of the most prominent game engines and documenting our architectural semantic web. A short test case illustrates how both can be combined to enhance information visualization for architectural design and construction.
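To make the combination concrete: an architectural semantic web stores building elements together with heritage annotations as RDF triples, which a game-engine front end can query when the user selects an element. A small sketch with rdflib, using an entirely hypothetical vocabulary and data:

```python
from rdflib import Graph, Literal, Namespace, RDF

# Hypothetical architectural vocabulary; the paper's actual ontology differs.
ARCH = Namespace("http://example.org/arch#")

g = Graph()
g.add((ARCH.WestFacade, RDF.type, ARCH.Wall))
g.add((ARCH.WestFacade, ARCH.builtIn, Literal(1911)))
g.add((ARCH.WestFacade, ARCH.heritageNote,
       Literal("Original brickwork, documented in the 1923 survey")))

# A game-engine client could run this query when the user selects a wall,
# overlaying heritage information inside the virtual environment.
q = """
SELECT ?note WHERE {
    ?elem a <http://example.org/arch#Wall> ;
          <http://example.org/arch#heritageNote> ?note .
}
"""
for row in g.query(q):
    print(row.note)
```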
A new scheme, a personalized image retrieval technique based on visual perception, is proposed in this letter; its motive is to narrow the semantic gap by directly perceiving the user's visual information. It uses a visual attention model to segment image regions and an eye-tracking technique to record fixations. Visual perception is obtained by analyzing the fixations within regions to measure gaze interest. Integrating visual perception into the attention model detects the Regions Of Interest (ROIs), whose features are extracted and analyzed; the interests are then fed back to optimize the retrieval results and construct user profiles.
Funding: National Natural Science Foundation of China (Nos. 60472036, 60431020, and 60402036); Natural Science Foundation of Beijing (No. 4042008); Ph.D. Foundation of the Ministry of Education (No. 20040005015).
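One straightforward way to turn recorded fixations into a gaze-interest measure is to accumulate fixation duration inside each attention-model region and normalize; regions above a threshold become ROIs. A sketch under that assumption, with illustrative masks and fixations:

```python
import numpy as np

def region_gaze_interest(fixations, region_masks):
    """
    fixations:    list of (x, y, duration_ms) eye-tracker samples
    region_masks: dict region_id -> boolean mask (H, W) from the attention model
    Returns normalized dwell time per region as a simple gaze-interest measure.
    """
    interest = {rid: 0.0 for rid in region_masks}
    for x, y, dur in fixations:
        for rid, mask in region_masks.items():
            if mask[int(y), int(x)]:
                interest[rid] += dur
                break
    total = sum(interest.values()) or 1.0
    return {rid: v / total for rid, v in interest.items()}

# Toy 100x100 image split into left/right regions, with three fixations.
left = np.zeros((100, 100), bool)
left[:, :50] = True
right = ~left
fix = [(10, 20, 300), (15, 30, 200), (80, 40, 100)]
print(region_gaze_interest(fix, {"left": left, "right": right}))
# Regions above an interest threshold would be treated as ROIs.
```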
A vast quantity of the art in existence today is inaccessible to individuals. If people want to know the different types of art that exist, how individual works are connected, and how works of art are interpreted and discussed in the context of other works, they must utilize means other than simply viewing the art. Therefore, this paper proposes a language to analyze, describe, and explore collections of visual art (LadeCA). LadeCA combines human interpretation and automatic analyses of images, allowing users to assess collections of visual art without viewing every image in them. This paper focuses on the lexical base of LadeCA. It also outlines how collections of visual art can be analyzed, described, and explored using a LadeCA vocabulary. Additionally, the relationship between LadeCA and indexing systems, such as ICONCLASS or AAT, is demonstrated, and ways in which LadeCA and indexing systems can complement each other are highlighted.
Scholarly communication of knowledge is predominantly document-based in digital repositories, and researchers find it tedious to automatically capture and process the semantics among related articles. Despite the present digital era of big data, there is a lack of visual representations of the knowledge present in scholarly articles, and a time-saving approach for literature search and visual navigation is warranted. The majority of knowledge display tools cannot cope with current big data trends and fall short of the requirements of automatic knowledge representation, storage, and dynamic visualization. To address this limitation, the main aim of this paper is to model the visualization of unstructured data and explore the feasibility of visual navigation that lets researchers gain insight into the knowledge hidden in the scientific articles of digital repositories. Contemporary topics of research and practice, including modifiable risk factors leading to a dramatic increase in Alzheimer's disease and other forms of dementia, warrant deeper insight into the evidence-based knowledge available in the literature. The goal is to provide researchers with an easy visual traversal through a digital repository of research articles. This paper takes the first step in proposing a novel integrated model using knowledge maps and next-generation graph datastores to achieve semantic visualization with domain-specific knowledge, such as dementia risk factors. The model facilitates a deep conceptual understanding of the literature by automatically establishing visual relationships among the knowledge extracted from the big data resources of research articles. It also serves as an automated tool for visual navigation through the knowledge repository for faster identification of dementia risk factors reported in scholarly articles. Further, it facilitates semantic visualization and domain-specific knowledge discovery from a large digital repository and its associations. In this study, the implementation of the proposed model in the Neo4j graph data repository, along with the results achieved, is presented as a proof of concept. Using scholarly research articles on dementia risk factors as a case study, automatic knowledge extraction, storage, intelligent search, and visual navigation are illustrated. The implementation of contextual knowledge and its relationships for visual exploration by researchers shows promising results in the knowledge discovery of dementia risk factors. Overall, this study demonstrates the significance of semantic visualization with the effective use of knowledge maps and paves the way for extending visual modeling capabilities in the future.
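A taste of how such a model maps onto Neo4j: extracted article/risk-factor pairs become nodes and relationships, and visual navigation reduces to graph traversals. The sketch below uses the official Python driver against an assumed local instance, with a deliberately minimal schema; the paper's actual model is richer.

```python
from neo4j import GraphDatabase  # pip install neo4j

# Hypothetical connection details; adjust to your own Neo4j instance.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # Store an extracted finding: an article reporting a dementia risk factor.
    session.run(
        "MERGE (a:Article {doi: $doi}) "
        "MERGE (r:RiskFactor {name: $factor}) "
        "MERGE (a)-[:REPORTS]->(r)",
        doi="10.1000/example", factor="hypertension",
    )
    # Visual navigation then reduces to traversals, e.g. all articles per factor.
    result = session.run(
        "MATCH (a:Article)-[:REPORTS]->(r:RiskFactor {name: $factor}) "
        "RETURN a.doi AS doi", factor="hypertension")
    print([record["doi"] for record in result])

driver.close()
```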
While traditional Convolutional Neural Network (CNN)-based semantic segmentation methods have proven effective, they often encounter significant computational challenges due to the requirement for dense pixel-level predictions, which complicates real-time implementation. To address this, we introduce an advanced real-time semantic segmentation strategy specifically designed for autonomous driving, utilizing the capabilities of Visual Transformers. By leveraging the self-attention mechanism inherent in Visual Transformers, our method enhances global contextual awareness, refining the representation of each pixel in relation to the overall scene. This enhancement is critical for quickly and accurately interpreting the complex elements within driving scenarios, a fundamental need for autonomous vehicles. Our experiments on the DriveSeg autonomous driving dataset indicate that our model surpasses traditional segmentation methods, achieving a significant 4.5% improvement in Mean Intersection over Union (mIoU) while maintaining real-time responsiveness. This paper not only underscores the potential for optimized semantic segmentation but also establishes a promising direction for real-time processing in autonomous navigation systems. Future work will focus on integrating this technique with other perception modules in autonomous driving to further improve the robustness and efficiency of self-driving perception frameworks, opening new pathways for research and practical applications in scenarios that require rapid and precise decision-making. Further experimentation and adaptation of this model could have broader implications for machine learning and computer vision, particularly in enhancing the interaction between automated systems and their dynamic environments.
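The global contextual awareness comes from self-attention: every patch token re-weights its representation against all others in the scene. A minimal single-head sketch of that mechanism (not the paper's full segmentation network):

```python
import torch
import torch.nn as nn

class PatchSelfAttention(nn.Module):
    """Single-head self-attention over patch tokens: each patch refines its
    representation against the whole scene. A minimal sketch only."""

    def __init__(self, dim):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x):                      # x: (B, N patches, dim)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1)            # global context: every patch attends to all
        return self.proj(attn @ v)

tokens = torch.randn(1, 196, 64)               # e.g. 14x14 patches of a driving frame
print(PatchSelfAttention(64)(tokens).shape)    # torch.Size([1, 196, 64])
```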
Efficient perception of the real world is a long-standing goal of computer vision. Modern visual computing techniques have succeeded in attaching semantic labels to thousands of daily objects and reconstructing dense depth maps of complex scenes. However, simultaneous semantic and spatial joint perception, so-called dense 3D semantic mapping, which estimates the 3D geometry of a scene and attaches semantic labels to that geometry, remains a challenging problem that, if solved, would make structured vision understanding and editing more widely accessible. Concurrently, progress in computer vision and machine learning has motivated us to pursue the capability of understanding and digitally reconstructing the surrounding world. Neural metric-semantic understanding is a new and rapidly emerging field that combines differentiable machine learning techniques with physical knowledge from computer vision, e.g., the integration of visual-inertial simultaneous localization and mapping (SLAM), mesh reconstruction, and semantic understanding. In this paper, we attempt to summarize the recent trends and applications of neural metric-semantic understanding. Starting with an overview of the underlying computer vision and machine learning concepts, we discuss critical aspects of such perception approaches. Specifically, our emphasis is on fully leveraging the joint semantic and 3D information. Later on, many important applications of this perception capability, such as novel view synthesis and semantic augmented reality (AR) content manipulation, are also presented. Finally, we conclude with a discussion of the technical implications of the technology under a 5G edge computing scenario.
Objective: With the development of large vision models, pretraining global visual features on multi-source unlabeled remote sensing imagery and then transferring and fine-tuning on local target tasks has become a new paradigm for domain adaptation in remote sensing. However, existing global pretraining strategies focus mainly on learning low-level, generic visual features and struggle to capture complex, high-level semantic associations. Moreover, the few labeled samples used during fine-tuning typically reflect only specific scenes of the target domain and cannot fully activate the domain knowledge in the global model that matches the target domain. Consequently, under the complex and variable cross-spatiotemporal domain shifts of remote sensing imagery, a large semantic gap remains between the global models obtained by existing methods and the target tasks. To address this challenge, this paper proposes a language-text-guided "global model pretraining, local model fine-tuning" domain adaptation framework. Method: Targeting the spatiotemporal heterogeneity of remote sensing data, the framework uses the large language and vision assistant LLaVA to generate textual descriptions of remote sensing images that contain spatiotemporal information such as season, geographic region, and land-cover distribution. Text-guided learning helps the global model mine the spatiotemporal distribution patterns of ground objects and strengthens the activation of relevant domain knowledge during local task fine-tuning. Results: Three groups of global-to-local cross-spatiotemporal domain-adaptive semantic segmentation experiments were set up on three different global pretraining strategies (contrastive-discriminative, masked-generative, and diffusion-generative) to verify the effectiveness of the proposed framework. Taking global→local (Changsha) as an example, language-text guidance improved performance over no text guidance by 8.7%, 4.4%, and 2.9% on the three pretraining strategies, respectively. The framework likewise improved performance on global→local (Xiangtan) and global→local (Wuhan). Conclusion: Language text has a positive effect on accurately understanding the semantic content of cross-spatiotemporal remote sensing imagery. Compared with text-free learning, the proposed framework significantly improves transfer performance.
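The paper guides pretraining with LLaVA-generated descriptions; one common way to realize such text guidance, assumed here purely for illustration, is a CLIP-style symmetric contrastive alignment between paired image and text features:

```python
import torch
import torch.nn.functional as F

def image_text_alignment_loss(img_feats, txt_feats, tau=0.07):
    """
    CLIP-style symmetric contrastive loss aligning remote-sensing image features
    with features of their generated spatiotemporal text descriptions.
    img_feats, txt_feats: (B, d) paired outputs of the two encoders (assumed).
    """
    img = F.normalize(img_feats, dim=-1)
    txt = F.normalize(txt_feats, dim=-1)
    logits = img @ txt.t() / tau                 # (B, B) similarity matrix
    targets = torch.arange(img.size(0))          # matching pairs lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Toy usage with random paired features (B=8 pairs, d=128).
print(image_text_alignment_loss(torch.randn(8, 128), torch.randn(8, 128)).item())
```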
[Objective] Semantic segmentation of high-resolution remote sensing imagery provides important data support for urban planning and land-use analysis by accurately extracting ground-object information. Current methods usually divide a remote sensing image into standard patches for multi-scale local segmentation and hierarchical inference, without fully exploiting the contextual prior knowledge in the image or the interaction among local features, which degrades inference and segmentation quality. [Method] To solve this problem, this paper proposes CATrans (Cross-scale Attention Transformer), a remote sensing image segmentation framework that combines a cross-scale attention module with a semantic visual Transformer to extract contextual prior knowledge and strengthen local feature representation and segmentation performance. First, the cross-scale attention module processes features in parallel along the spatial and channel dimensions, analyzing the dependencies between shallow and deep and between local and global features to improve attention to objects of different granularities in remote sensing imagery. Second, the semantic visual Transformer captures contextual semantic information through a spatial attention mechanism and models the dependencies among semantic information. [Results] Comparative experiments on the DeepGlobe, Inria Aerial, and LoveDA datasets show that CATrans outperforms existing segmentation algorithms such as WSDNet (Discrete Wavelet Smooth Network) and ISDNet (Integrating Shallow and Deep Network), achieving mean Intersection over Union (mIoU) scores of 76.2%, 79.2%, and 54.2% and mean F1 scores (mF1) of 86.5%, 87.8%, and 66.8%, with inference speeds of 38.1 FPS, 13.2 FPS, and 95.22 FPS, respectively. Compared with WSDNet, the strongest baseline in our comparison, mIoU and mF1 improve by 2.1%, 4.0%, and 5.3% and by 1.3%, 1.8%, and 5.6% on the three datasets, with a clear advantage in every land-cover class. [Conclusion] The method achieves efficient, high-accuracy semantic segmentation of high-resolution remote sensing imagery.
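As a loose sketch of the parallel spatial- and channel-attention idea in the cross-scale module, the PyTorch snippet below fuses a shallow (local) and an upsampled deep (global) feature map and gates the result along both dimensions; the paper's actual architecture differs in detail.

```python
import torch
import torch.nn as nn

class CrossScaleAttention(nn.Module):
    """Fuse a shallow (local) and a deep (global) feature map, then apply
    parallel channel and spatial attention. An illustrative sketch only."""

    def __init__(self, channels):
        super().__init__()
        self.channel_gate = nn.Sequential(        # channel branch (squeeze-excite style)
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.spatial_gate = nn.Sequential(        # spatial branch
            nn.Conv2d(channels, 1, kernel_size=7, padding=3), nn.Sigmoid())
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, shallow, deep):
        # Upsample the deep map to the shallow map's resolution before fusion.
        deep = nn.functional.interpolate(deep, size=shallow.shape[-2:],
                                         mode="bilinear", align_corners=False)
        x = self.fuse(torch.cat([shallow, deep], dim=1))
        return x * self.channel_gate(x) * self.spatial_gate(x)

out = CrossScaleAttention(32)(torch.randn(1, 32, 64, 64), torch.randn(1, 32, 16, 16))
print(out.shape)  # torch.Size([1, 32, 64, 64])
```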