Journal Articles
532 articles found
1. High-dimensional features of adaptive superpixels for visually degraded images (Cited: 1)
Authors: LIAO Feng-feng, CAO Ke-ye, ZHANG Yu-xiang, LIU Sheng. Optoelectronics Letters (EI), 2019, Issue 3, pp. 231-235 (5 pages)
Abstract: This study presents a novel and highly efficient superpixel algorithm, depth-fused adaptive superpixel (DFASP), which can generate accurate superpixels in a degraded image. In many applications, particularly in actual scenes, vision degradation such as motion blur, overexposure, and underexposure often occurs. Well-known color-based superpixel algorithms cannot produce accurate superpixels in degraded images because vision degradation makes color information ambiguous. To eliminate this ambiguity, we use both depth and color information to generate superpixels. We map the depth and color information to a high-dimensional feature space, then develop a fast multilevel clustering algorithm to produce superpixels. Furthermore, we design an adaptive mechanism that automatically adjusts the color and depth information during pixel clustering. Experimental results demonstrate that in terms of boundary recall, undersegmentation error, run time, and achievable segmentation accuracy, DFASP outperforms state-of-the-art superpixel methods.
Keywords: high-dimensional features, visually degraded images
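The abstract above describes clustering pixels in a joint color-depth feature space. As a rough illustration of that idea only, here is a toy k-means sketch in plain Python; the (gray, depth) feature layout, the data, and the parameters are illustrative assumptions, not the paper's multilevel algorithm:

```python
# Toy k-means over (gray, depth) pixel features -- a simplified stand-in
# for clustering in a high-dimensional color+depth space.
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Cluster feature vectors with plain k-means; returns labels and centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # assignment step: nearest center by Euclidean distance
        for i, p in enumerate(points):
            labels[i] = min(range(k), key=lambda c: math.dist(p, centers[c]))
        # update step: each center moves to the mean of its members
        for c in range(k):
            members = [p for i, p in enumerate(points) if labels[i] == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centers

# Pixels as (gray, depth) features: two well-separated groups
pixels = [(0.1, 0.2), (0.15, 0.25), (0.9, 0.8), (0.95, 0.85)]
labels, centers = kmeans(pixels, k=2)
```

The real method additionally reweights the color and depth dimensions adaptively during clustering, which this sketch omits.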
2. A Concise and Varied Visual Features-Based Image Captioning Model with Visual Selection
Authors: Alaa Thobhani, Beiji Zou, Xiaoyan Kui, Amr Abdussalam, Muhammad Asim, Naveed Ahmed, Mohammed Ali Alshara. Computers, Materials & Continua (SCIE, EI), 2024, Issue 11, pp. 2873-2894 (22 pages)
Abstract: Image captioning has gained increasing attention in recent years. Visual characteristics found in input images play a crucial role in generating high-quality captions. Prior studies have used visual attention mechanisms to dynamically focus on localized regions of the input image, improving the effectiveness of identifying relevant image regions at each step of caption generation. However, giving image captioning models the capability to select the most relevant visual features from the input image and attend to them can significantly improve the utilization of these features and, consequently, captioning network performance. In light of this, we present an image captioning framework that efficiently exploits the extracted representations of the image. Our framework comprises three key components: the Visual Feature Detector module (VFD), the Visual Feature Visual Attention module (VFVA), and the language model. The VFD module detects a subset of the most pertinent features from the local visual features, creating an updated visual features matrix. Subsequently, the VFVA directs its attention to the visual features matrix generated by the VFD, producing an updated context vector employed by the language model to generate an informative description. Integrating the VFD and VFVA modules introduces an additional layer of processing for the visual features, thereby contributing to enhanced captioning performance. Using the MS-COCO dataset, our experiments show that the proposed framework competes well with state-of-the-art methods, effectively leveraging visual representations to improve performance. The implementation code can be found here: https://github.com/althobhani/VFDICM (accessed on 30 July 2024).
Keywords: visual attention, image captioning, visual feature detector, visual feature visual attention
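The feature-selection step the abstract attributes to the VFD module can be illustrated with a minimal top-k filter. The scores and vectors below are hypothetical stand-ins for the module's learned relevance estimates, not the paper's actual detector:

```python
# Minimal sketch of keeping only the k most relevant local feature vectors,
# in the spirit of a visual feature detector. Scores are illustrative.
def select_top_k(features, scores, k):
    """Keep the k feature vectors with the highest relevance scores."""
    order = sorted(range(len(features)), key=lambda i: scores[i], reverse=True)
    keep = sorted(order[:k])            # preserve the original feature order
    return [features[i] for i in keep]

feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.2, 0.8]]
scores = [0.1, 0.9, 0.4, 0.7]
selected = select_top_k(feats, scores, k=2)
```

Downstream attention then operates on `selected` instead of the full feature matrix, which is what reduces redundancy in the full model.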
3. A Visual Indoor Localization Method Based on Efficient Image Retrieval (Cited: 1)
Authors: Mengyan Lyu, Xinxin Guo, Kunpeng Zhang, Liye Zhang. Journal of Computer and Communications, 2024, Issue 2, pp. 47-66 (20 pages)
Abstract: Indoor visual localization, which uses camera imagery to compute the user's pose, is a core component of Augmented Reality (AR) and Simultaneous Localization and Mapping (SLAM). Existing indoor localization technologies generally use scene-specific 3D representations or are trained on specific datasets, making it challenging to balance accuracy and cost when applied to new scenes. Addressing this issue, this paper proposes a universal indoor visual localization method based on efficient image retrieval. First, a Multi-Layer Perceptron (MLP) aggregates features from intermediate layers of a convolutional neural network to obtain a global representation of the image, ensuring accurate and rapid retrieval of reference images. Next, a new mechanism using Random Sample Consensus (RANSAC) resolves the relative pose ambiguity caused by essential matrix decomposition based on the five-point method. Finally, the absolute pose of the queried user image is computed, achieving indoor user pose estimation. The proposed method is simple, flexible, and generalizes well across scenes. Experimental results show a positioning error of 0.09 m and 2.14° on the 7Scenes dataset, and 0.15 m and 6.37° on the 12Scenes dataset, demonstrating the method's strong performance.
Keywords: visual indoor positioning, feature point matching, image retrieval, position calculation, five-point method
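The paper's RANSAC mechanism operates on essential-matrix hypotheses from the five-point method, which is too heavy to reproduce here; the hypothesize-and-verify loop itself is generic, though, and can be sketched on a toy 2-D line-fitting problem (the model, tolerance, and data are illustrative assumptions):

```python
# Generic RANSAC consensus loop, demonstrated on 2-D line fitting:
# repeatedly fit a model to a minimal sample and keep the hypothesis
# with the largest inlier set.
import random

def ransac_line(points, iters=200, tol=0.1, seed=1):
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)   # minimal sample
        if x1 == x2:
            continue
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        # verify: count points close to the hypothesized line
        inliers = [p for p in points
                   if abs(p[1] - (slope * p[0] + intercept)) < tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return best_inliers

pts = [(x, 2 * x + 1) for x in range(8)] + [(3, 10.0), (5, -4.0)]  # line + outliers
inliers = ransac_line(pts)
```

In the paper, the "minimal sample" is five point correspondences and the "model" is an essential matrix, but the consensus logic is the same.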
4. Structured Computational Modeling of Human Visual System for No-reference Image Quality Assessment
Authors: Wen-Han Zhu, Wei Sun, Xiong-Kuo Min, Guang-Tao Zhai, Xiao-Kang Yang. International Journal of Automation and Computing (EI, CSCD), 2021, Issue 2, pp. 204-218 (15 pages)
Abstract: Objective image quality assessment (IQA) plays an important role in various visual communication systems, automatically and efficiently predicting the perceived quality of images. The human eye is the ultimate evaluator of visual experience, so modeling the human visual system (HVS) is a core issue for objective IQA and visual experience optimization. Traditional black-box fitting models have low interpretability and give little guidance for experience optimization, while physiologically inspired models are hard to integrate into practical visual communication services due to their high computational complexity. To bridge the gap between signal distortion and visual experience, this paper proposes a novel perceptual no-reference (NR) IQA algorithm based on structural computational modeling of the HVS. Following the mechanisms of the human brain, visual signal processing is divided into a low-level, a middle-level, and a high-level visual layer, which conduct pixel, primitive, and global image information processing, respectively. Natural scene statistics (NSS) based features, deep features, and free-energy based features are extracted from these three layers, and support vector regression (SVR) aggregates the features into the final quality prediction. Extensive comparisons on three widely used benchmark IQA databases (LIVE, CSIQ, and TID2013) demonstrate that the proposed metric is highly competitive with or outperforms state-of-the-art NR IQA measures.
Keywords: image quality assessment (IQA), no-reference (NR), structural computational modeling, human visual system, visual feature extraction
5. Bag-of-visual-words model for artificial pornographic images recognition
Authors: 李芳芳, 罗四伟, 刘熙尧, 邹北骥. Journal of Central South University (SCIE, EI, CAS, CSCD), 2016, Issue 6, pp. 1383-1389 (7 pages)
Abstract: It is illegal to spread and transmit pornographic images over the internet, whether in real or artificial format. Traditional methods are designed to identify real pornographic images and are less efficient at dealing with artificial ones, so criminals turn to releasing artificial pornographic images in specific scenes, e.g., social networks. To efficiently identify artificial pornographic images, a novel bag-of-visual-words based approach is proposed. In the bag-of-words (BoW) framework, speeded-up robust feature (SURF) is first adopted for feature extraction; a visual vocabulary is then constructed through K-means clustering and images are represented by an improved BoW encoding method; finally the visual words are fed into a learning machine for training and classification. Unlike the traditional BoW method, the proposed method sets a weight on each visual word according to the number of features each cluster contains. Moreover, a non-binary encoding method and a cross-matching strategy are used to improve the discriminative power of the visual words. Experimental results indicate that the proposed method outperforms the traditional method.
Keywords: artificial pornographic image, bag-of-words (BoW), speeded-up robust feature (SURF) descriptors, visual vocabulary
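The abstract's per-word weighting by cluster size can be sketched in a few lines. The tiny vocabulary, the descriptors, and the inverse-cluster-size weighting rule below are illustrative assumptions, since the abstract does not give the exact formula:

```python
# Weighted bag-of-visual-words encoding sketch: each descriptor votes for
# its nearest visual word, and bins are re-weighted by cluster size
# (here: words from smaller clusters count for more, an idf-like choice).
import math

def encode_weighted_bow(descriptors, vocabulary, cluster_sizes):
    """Histogram of nearest-word votes, re-weighted by inverse cluster size."""
    hist = [0.0] * len(vocabulary)
    for d in descriptors:
        nearest = min(range(len(vocabulary)),
                      key=lambda i: math.dist(d, vocabulary[i]))
        hist[nearest] += 1.0
    total = sum(cluster_sizes)
    return [hist[i] * total / (1 + cluster_sizes[i]) for i in range(len(hist))]

vocab = [(0.0, 0.0), (1.0, 1.0)]   # two visual words (cluster centers)
sizes = [10, 2]                    # features per cluster at vocabulary-building time
descs = [(0.1, 0.0), (0.9, 1.0), (1.1, 0.9)]
vector = encode_weighted_bow(descs, vocab, sizes)
```

In the full pipeline the descriptors would be SURF vectors and the vocabulary would come from K-means, as the abstract describes.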
6. Multi-source image fusion algorithm based on fast weighted guided filter (Cited: 6)
Authors: WANG Jian, YANG Ke, REN Ping, QIN Chunxia, ZHANG Xiufei. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2019, Issue 5, pp. 831-840 (10 pages)
Abstract: In the last few years, guided image fusion algorithms have become increasingly popular; however, current algorithms cannot suppress halo artifacts. We propose an image fusion algorithm based on a fast weighted guided filter. First, the source images are separated into a series of high- and low-frequency components. Second, three visual features of the source image are extracted to construct a decision map model. Third, a fast weighted guided filter is proposed to optimize the result of the previous step and reduce the time complexity by exploiting the correlation among neighboring pixels. Finally, the resulting image is combined with the weight map to realize the fusion. The proposed algorithm is applied to multi-focus, visible-infrared, and multi-modal images, and the results show that it effectively suppresses halo artifacts in the merged images with higher efficiency, outperforming traditional methods in both subjective visual quality and objective evaluation.
Keywords: fast guided filter, image fusion, visual feature, decision map
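The guided filter that the paper speeds up and weights has a compact closed form. A minimal self-guided 1-D version is sketched below (He et al.'s basic formulation with simple clamped windows, not the paper's fast weighted variant):

```python
# Minimal 1-D guided filter: locally fit q = a*I + b in each window,
# where a = cov(I,p)/(var(I)+eps), then average the coefficients.
def box_mean(x, r):
    """Mean over a window of radius r, clamped at the borders."""
    n = len(x)
    return [sum(x[max(0, i - r):min(n, i + r + 1)]) /
            (min(n, i + r + 1) - max(0, i - r)) for i in range(n)]

def guided_filter_1d(guide, src, r=2, eps=1e-2):
    mean_i = box_mean(guide, r)
    mean_p = box_mean(src, r)
    corr_ip = box_mean([i * p for i, p in zip(guide, src)], r)
    corr_ii = box_mean([i * i for i in guide], r)
    a = [(cip - mi * mp) / (cii - mi * mi + eps)
         for cip, mi, mp, cii in zip(corr_ip, mean_i, mean_p, corr_ii)]
    b = [mp - ai * mi for mp, ai, mi in zip(mean_p, a, mean_i)]
    mean_a, mean_b = box_mean(a, r), box_mean(b, r)
    return [ma * i + mb for ma, i, mb in zip(mean_a, guide, mean_b)]

noisy = [0, 0, 0, 1, 1, 1, 0, 1, 1, 1]   # a step edge with one flipped sample
smoothed = guided_filter_1d(noisy, noisy)
```

Because the local model is linear in the guide, edges in the guide survive the smoothing, which is why guided filtering is attractive for fusion weight maps.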
7. Content-based retrieval based on binary vectors for 2-D medical images
Authors: 龚鹏, 邹亚东, 洪海. 《吉林大学学报(信息科学版)》 (Journal of Jilin University, Information Science Edition) (CAS), 2003, Supplement 1, pp. 127-130 (4 pages)
Abstract: In medical research and clinical diagnosis, automated or computer-assisted classification and retrieval methods are highly desirable to offset the high cost of manual classification and manipulation by medical experts. To facilitate decision-making in health care and related areas, this paper proposes a two-step content-based medical image retrieval algorithm. First, in the preprocessing step, image segmentation is performed to distinguish image objects, and on the basis of the ...
Keywords: content-based image retrieval, medical images, feature space, spatial relationship, visual information retrieval
8. Historical Arabic Images Classification and Retrieval Using Siamese Deep Learning Model
Authors: Manal M. Khayyat, Lamiaa A. Elrefaei, Mashael M. Khayyat. Computers, Materials & Continua (SCIE, EI), 2022, Issue 7, pp. 2109-2125 (17 pages)
Abstract: Classifying the visual features in images to retrieve a specific image is a significant problem in computer vision, especially when dealing with historical, faded, colored images. Many efforts have therefore tried to automate the classification operation and retrieve similar images accurately. To reach this goal, we developed a VGG19 deep convolutional neural network to automatically extract visual features from images. The distances among the extracted feature vectors are then measured and a similarity score is generated using a Siamese deep neural network. The Siamese model was first built and trained from scratch, but it did not yield high evaluation metrics; it was therefore rebuilt on the VGG19 pre-trained model, which produced better results. Afterward, three different distance metrics combined with the Sigmoid activation function were tested to find the most accurate way of measuring similarity among the retrieved images; the highest evaluation scores were obtained with the cosine distance metric. Moreover, a Graphics Processing Unit (GPU) was used instead of a Central Processing Unit (CPU), which further sped up both training and retrieval. After extensive experimentation, we reached a satisfactory solution, recording F-scores of 0.98 for classification and 0.99 for retrieval.
Keywords: visual feature vectors, deep learning models, distance methods, similar image retrieval
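The cosine metric the paper found most accurate reduces to a few lines once feature vectors are available. The query and gallery vectors below are toy stand-ins for VGG19 embeddings:

```python
# Cosine-similarity ranking sketch for feature-based image retrieval.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def rank_by_similarity(query, gallery):
    """Return gallery indices sorted from most to least similar."""
    sims = [cosine_similarity(query, g) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: sims[i], reverse=True)

query = [1.0, 0.0, 1.0]
gallery = [[0.9, 0.1, 1.1], [0.0, 1.0, 0.0], [1.0, 0.0, 0.9]]
ranking = rank_by_similarity(query, gallery)
```

Cosine similarity ignores vector magnitude, which is often desirable when embedding norms vary with image brightness or contrast.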
9. Sparse representation of global features of visual images in human primary visual cortex: Evidence from fMRI (Cited: 2)
Authors: ZHAO SongNian, YAO Li, JIN Zhen, XIONG XiaoYun, WU Xia, ZOU Qi, YAO GuoZheng, CAI XiaoHong, LIU YiJun. Chinese Science Bulletin (SCIE, EI, CAS), 2008, Issue 14, pp. 2165-2174 (10 pages)
Abstract: In fMRI experiments on object representation in visual cortex, we designed two types of stimuli: a gray face image with its line drawing, and an illusion with its corresponding completed illusion. Both pairs share the same global features while differing in minute details, so the fMRI results can be compared with each other. The first kind of stimulus was used in a block-design fMRI experiment, and the second in an event-related fMRI experiment. By comparing and analyzing visual cortex activity patterns and blood oxygenation level dependent (BOLD) fMRI signals, we obtained results showing invariance of the global features of visual images. A plausible explanation of this invariance involves the cooperation of synchronized responses to the global features of the visual image with a feedback of shape perception from higher cortex to cortex V1, namely the integration of global features and the embodiment of sparse representation and distributed population coding.
Keywords: visual cortex, visual imaging, fMRI representation, biology
10. Multi-label image classification for ground reconnaissance robots based on the GLF-ViT algorithm
Authors: 杨成山, 王明, 郭东兵, 赵爱军. 《火力与指挥控制》 (Fire Control & Command Control) (PKU Core), 2026, Issue 2, pp. 168-173 (6 pages)
Abstract: Existing multi-label image classification algorithms face challenges in ground reconnaissance robot tasks, including complex backgrounds, heavy noise, and large scale differences between targets, which limit the effectiveness of visual feature extraction. To address this, a global-local feature fusion algorithm based on the ViT model (GLF-ViT) is proposed: a self-attention mechanism selects high-response regions to strengthen local feature representation, which is combined with global features for cross-scale collaborative modeling. Experiments on the PASCAL VOC2012 dataset show that GLF-ViT effectively fuses global and local features and demonstrates clear advantages in visual feature extraction.
Keywords: multi-label image classification, ViT model, feature fusion, self-attention mechanism, feature extraction
11. A lightweight FPGA image pre-processing accelerator scheme for visual navigation
Authors: 薛仁魁, 张杰, 李斌, 李萌, 吴洋. 《中国科学院大学学报(中英文)》 (Journal of University of Chinese Academy of Sciences) (PKU Core), 2026, Issue 2, pp. 277-287 (11 pages)
Abstract: To meet the front-end acceleration needs of visual navigation imagery, an image pre-processing accelerator based on a lightweight, low-cost FPGA is proposed. Through an efficient pipeline design and parallel processing, the scheme integrates histogram equalization, FAST feature point detection, and time synchronization of multi-source sensor data, overcoming the difficulties of multi-function integration under limited hardware resources, real-time requirements, cost-performance balance, multi-sensor time synchronization, and hardware-software co-design. Implemented on a lightweight Xilinx Zynq-7000 series FPGA, the design greatly reduces image-processing latency at low cost: running at 160 MHz, it processes 1280×720 images at 150 frames per second, providing a low-cost, high-performance front-end acceleration solution for visual navigation.
Keywords: image accelerator, histogram equalization, feature point extraction, time synchronization, visual navigation, field-programmable gate array (FPGA)
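Histogram equalization, one of the functions the abstract says the FPGA pipeline integrates, is a CDF remapping. A software sketch of the math (not the streaming hardware implementation) on a tiny 8-bit "image" looks like:

```python
# Histogram equalization: remap each intensity through the image's
# cumulative distribution function to spread values over the full range.
def equalize(pixels, levels=256):
    n = len(pixels)
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    cdf, run = [], 0
    for h in hist:
        run += h
        cdf.append(run)
    cdf_min = next(c for c in cdf if c > 0)
    # standard remapping: (cdf(v) - cdf_min) / (n - cdf_min) * (levels - 1)
    return [round((cdf[p] - cdf_min) / (n - cdf_min) * (levels - 1))
            for p in pixels]

dark = [50, 50, 52, 52, 54, 54, 56, 60]   # low-contrast input
out = equalize(dark)
```

On hardware, the histogram and CDF would be accumulated in BRAM as pixels stream through, which is what makes the operation pipeline-friendly.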
12. Analysis of holistic visual road-environment features at pedestrian crosswalks
Authors: 任蔚溪, 陈雨人. 《交通与运输》 (Traffic & Transportation), 2026, Issue 1, pp. 88-93 (6 pages)
Abstract: Analyzing road-environment features at pedestrian crosswalks from the standpoint of visual perception is important for improving pedestrian safety. Based on street-view images, visual road-environment features were automatically extracted along three dimensions: semantic segmentation, color computation, and depth estimation. Hierarchical clustering divided crosswalk visual road environments into three types: nature-dominated low-saturation, building-dominated high-visual-complexity, and open-and-balanced. Results show mean pedestrian accident counts of 0.741, 0.457, and 0.380 for the three types respectively, with the open-and-balanced type being the safest. Compared with single visual features, the visual road-environment category correlates more strongly with pedestrian accidents and better reflects drivers' and pedestrians' holistic visual perception of the scene.
Keywords: pedestrian crosswalk, visual road environment, visual features, hierarchical clustering, street-view images
13. Brain functional network connectivity based on a visual task: visual information processing-related brain regions are significantly activated in the task state (Cited: 2)
Authors: Yan-li Yang, Hong-xia Deng, Gui-yang Xing, Xiao-luan Xia, Hai-fang Li. Neural Regeneration Research (SCIE, CAS, CSCD), 2015, Issue 2, pp. 298-307 (10 pages)
Abstract: It is not clear whether the methods used in functional brain-network research can be applied to explore the feature binding mechanism of visual perception. In this study, we investigated feature binding of color and shape in visual perception. Functional magnetic resonance imaging data were collected from 38 healthy volunteers at rest and while performing a visual perception task, to construct brain networks active during resting and task states. Results showed that brain regions involved in visual information processing were clearly activated during the task. The components were partitioned using a greedy algorithm, indicating that the visual network existed during the resting state. Z-values in the vision-related brain regions were calculated, confirming the dynamic balance of the brain network. Connectivity between brain regions was determined, showing that the occipital and lingual gyri were stable regions in the visual system network, the parietal lobe played a very important role in binding color and shape features, and the fusiform and inferior temporal gyri were crucial for processing color and shape information. These findings indicate that understanding visual feature binding and cognitive processes will help establish computational models of vision, improve image recognition technology, and provide a new theoretical mechanism for feature binding in visual perception.
Keywords: nerve regeneration, functional magnetic resonance imaging, resting state, task state, brain network, module division, feature binding, Fisher's Z transform, connectivity, visual stimuli, NSFC grants, neural regeneration
14. EGSNet: An Efficient Glass Segmentation Network Based on Multi-Level Heterogeneous Architecture and Boundary Awareness
Authors: Guojun Chen, Tao Cui, Yongjie Hou, Huihui Li. Computers, Materials & Continua (SCIE, EI), 2024, Issue 12, pp. 3969-3987 (19 pages)
Abstract: Existing glass segmentation networks have high computational complexity and large memory occupation, leading to high hardware requirements and long inference times, which is ill-suited to efficiency-critical real-time tasks such as autonomous driving. The inefficiency mainly comes from employing homogeneous modules to process features of different layers; these modules require computationally intensive convolutions and parameter-heavy weighting branches to accommodate the differences in information across layers. We propose an efficient glass segmentation network (EGSNet) based on a multi-level heterogeneous architecture and boundary awareness that balances performance and efficiency. EGSNet divides the feature layers from different stages into low-level understanding, semantic-level understanding, and global understanding with boundary guidance. Based on the information differences among the layers, we further propose the multi-angle collaborative enhancement (MCE) module, which extracts detailed information from shallow features, and the large-scale contextual feature extraction (LCFE) module, which captures semantic logic from deep features. Trained and evaluated on the glass segmentation datasets HSO (Home-Scene-Oriented) and Trans10k-stuff, EGSNet achieves the best efficiency and performance compared with advanced methods. On the HSO test set, the IoU, Fβ, MAE (mean absolute error), and BER (balance error rate) of EGSNet are 0.804, 0.847, 0.084, and 0.085, with only 27.15 GFLOPs. Experimental results show that EGSNet significantly improves the efficiency of glass segmentation while delivering better performance.
Keywords: image segmentation, multi-level heterogeneous architecture, feature differences
15. Improved Blending Attention Mechanism in Visual Question Answering
Authors: Siyu Lu, Yueming Ding, Zhengtong Yin, Mingzhe Liu, Xuan Liu, Wenfeng Zheng, Lirong Yin. Computer Systems Science & Engineering (SCIE, EI), 2023, Issue 10, pp. 1149-1161 (13 pages)
Abstract: Visual question answering (VQA) has attracted growing attention in computer vision and natural language processing, where the goal is to better integrate image features and text features. Analyzing all features can cause information redundancy and a heavy computational burden; attention mechanisms are a sensible remedy, but a single attention mechanism may attend to features incompletely. This paper improves on that approach and proposes a hybrid attention mechanism that combines spatial attention and channel attention. Because attention can discard some of the original features, a small portion of image features is added back as compensation. For text features, a self-attention mechanism is introduced to strengthen the internal structural features of sentences and improve the overall model. The results show that the attention mechanism and feature compensation add 6.1% accuracy to the multimodal low-rank bilinear pooling network.
Keywords: visual question answering, spatial attention mechanism, channel attention mechanism, image feature processing, text feature extraction
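The spatial-attention component of such a hybrid mechanism boils down to softmax-weighted pooling over region features. A generic sketch follows (the scoring is assumed given; the paper's actual layers and feature compensation are not reproduced):

```python
# Softmax attention pooling: turn per-region scores into weights that sum
# to one, then take the weighted sum of region features as the context.
import math

def attention_pool(region_feats, scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(region_feats[0])
    context = [sum(w * f[d] for w, f in zip(weights, region_feats))
               for d in range(dim)]
    return weights, context

feats = [[1.0, 0.0], [0.0, 1.0]]          # two region feature vectors
weights, context = attention_pool(feats, scores=[2.0, 0.0])
```

Channel attention applies the same weighting idea across feature channels instead of spatial regions, which is why the two can be combined.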
16. An edge-aware enhanced visual SLAM method for underground coal mines (Cited: 4)
Authors: 牟琦, 梁鑫, 郭媛婕, 王煜豪, 李占利. 《煤田地质与勘探》 (Coal Geology & Exploration) (PKU Core), 2025, Issue 3, pp. 231-242 (12 pages)
Abstract: [Objective] Underground coal mines commonly present feature-degraded scenes with low illumination, weak texture, and strong structural regularity, leaving visual SLAM (visual simultaneous localization and mapping) systems short of usable features or prone to high mismatch rates, which severely limits localization accuracy and robustness. [Methods] An edge-aware enhanced visual SLAM method is proposed. First, an edge-aware constrained low-light image enhancement module is built: an adaptive-scale gradient-domain guided filter refines the Retinex algorithm to produce images with clear texture and even illumination, markedly improving feature extraction under low and uneven lighting. Second, an edge-aware enhanced feature extraction and matching module is built into the visual odometry, where a point-line feature fusion strategy improves feature detectability and matching accuracy in weakly textured, structured scenes: line features are extracted with EDLines (edge drawing lines), point features with ORB (oriented FAST and rotated BRIEF), and matches are refined with GMS (grid-based motion statistics) and a ratio test. Finally, the method is compared against ORB-SLAM2 and ORB-SLAM3 on the TUM dataset and on a real underground coal-mine dataset, covering image enhancement, feature matching, and localization. [Results and conclusions] (1) On the TUM dataset, relative to ORB-SLAM2, the RMSE of the absolute and relative trajectory errors drops by 4%-38.46% and 8.62%-50% respectively; relative to ORB-SLAM3, by 0-61.68% and 3.63%-47.05%. (2) In real underground experiments, the estimated trajectory lies closer to the camera's reference trajectory. (3) The method effectively improves the accuracy and robustness of visual SLAM in feature-degraded underground scenes and provides a technical solution for applying visual SLAM in coal mines. Research on visual SLAM for feature-degraded underground scenes is of real significance for robotizing mobile mining equipment.
Keywords: visual SLAM, feature degradation, edge awareness, image enhancement, point-line feature fusion, TUM dataset
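The absolute-trajectory RMSE figures quoted above come from comparing estimated poses against a reference trajectory. The metric itself is simple; a sketch on toy 2-D positions (real evaluations use aligned 3-D poses) is:

```python
# Absolute trajectory error as RMSE of per-pose position errors between
# an estimated and a reference trajectory (toy 2-D positions here).
import math

def trajectory_rmse(estimated, reference):
    errs = [math.dist(e, r) for e, r in zip(estimated, reference)]
    return math.sqrt(sum(err * err for err in errs) / len(errs))

ref = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
est = [(0.0, 0.1), (1.0, -0.1), (2.0, 0.1)]
rmse = trajectory_rmse(est, ref)
```

Full benchmarks such as TUM additionally align the two trajectories (e.g., by a rigid transform) before computing this error, a step omitted here.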
17. A visual localization method for roadheaders based on anchor-mesh features in underground coal mines (Cited: 1)
Authors: 张旭辉, 迟云凯, 杜昱阳, 姜俊英, 杨文娟, 赵友军, 万继成, 王彦群, 田琛辉. 《煤田地质与勘探》 (Coal Geology & Exploration) (PKU Core), 2025, Issue 6, pp. 259-270 (12 pages)
Abstract: [Background] Accurate positioning of tunneling equipment underground is fundamental to automated, intelligent control of fully mechanized excavation faces, but long, narrow, enclosed roadways, poor lighting, and sparse texture limit traditional visual localization methods. A visual localization method for roadheaders based on anchor-mesh features is therefore proposed. [Methods] An image enhancement network with three branches of depthwise separable convolutions estimates the reflectance, illumination, and noise of an image separately, adjusting the illumination component while suppressing noise to yield evenly lit, clearly textured images and improve the localization system's adaptability to complex lighting. A line-feature extraction and matching method suited to anchor mesh is designed: EDLines (edge drawing lines) with an adaptive threshold strengthens anchor-mesh line extraction, and SSIM (structure similarity index measure) improves line-matching accuracy. A pose-estimation model that minimizes line-feature reprojection error, combined with pose-graph optimization, achieves accurate roadheader localization. An experimental platform was built, with quantitative experiments on image enhancement, line-feature processing, and localization. [Results and conclusions] The TSCR-NET image enhancement achieves higher PSNR and SSIM than MSRCR and Zero-DCE, and the line-feature method extracts significantly more features with higher matching accuracy than traditional algorithms, laying the groundwork for localization. In the localization experiments, TSCR-NET was compared with other line-feature based visual localization methods on the EuRoC dataset and in a real roadway: it outperformed PL-VINS on all nine EuRoC sequences, and in continuous tracking of the machine body over a 60 m anchor-mesh roadway its maximum error was 163 mm versus 213 mm for PL-VINS, a 23.5% reduction, while the RMSE dropped from 0.531 to 0.426, a 19.8% reduction. TSCR-NET thus offers higher accuracy and stability and is a useful reference for long-range pose detection of roadheaders in underground anchor-mesh roadway environments.
Keywords: roadheader, visual localization, image enhancement, line feature extraction and matching, motion estimation, anchor-mesh features, coal mine
18. An uncalibrated visual servoing method based on extended image features
Authors: 张淑珍, 成煜坤, 刘杨波, 查富生. 《系统仿真学报》 (Journal of System Simulation) (PKU Core), 2025, Issue 5, pp. 1210-1221
Abstract: To address traditional uncalibrated visual servoing's reliance on image Jacobian estimation and the coupling of motion across camera degrees of freedom, an uncalibrated visual servoing method based on extended image features is proposed, building on image-based uncalibrated visual servoing. By analyzing the relationship between image features and camera pose changes, the servoing process in image space is decomposed into four basic processes: translation, stretching, rotation, and scaling. By analyzing how image features change during servoing, extended image features, such as image centroid coordinates, relative line length, two-point distance, and orientation angle, supplement the meaning of traditional image features and are paired with the camera's individual degrees of freedom, so that robot motion is driven directly by image-feature error, achieving decoupled visual servoing that does not depend on the image Jacobian. Comparative simulations on the CoppeliaSim platform show that, versus traditional calibrated visual servoing, the proposed method reduces target image position error, camera position error, and attitude error by 88%, 94%, and 93% respectively, and physical experiments verify the algorithm's effectiveness.
Keywords: robot, uncalibrated visual servoing, extended image features, feature selection, motion decoupling
19. UAV visual place recognition based on dual-branch feature aggregation
Authors: 刘奇, 裴智翔, 惠乐, 何明一, 戴玉超. 《航空学报》 (Acta Aeronautica et Astronautica Sinica) (PKU Core), 2025, Issue 23, pp. 119-130
Abstract: UAVs that rely on global navigation satellite systems (GNSS) for navigation and positioning are vulnerable to failure from signal blocking or jamming. Visual place recognition (VPR) geolocates a UAV by matching its captured visual information against pre-built map data, providing reliable positioning in GNSS-denied environments, and has become a recent research focus. Traditional VPR methods rely on pretrained networks to extract global features for matching and retrieval; they are typically sensitive to changes in viewpoint, scale, and illumination, and tend to lose fine-grained information. A UAV visual geolocation method based on a dual-branch feature aggregation network is therefore proposed, combining a pretrained vision Transformer with a state-space model to extract more robust features. Specifically, a dual-branch feature extraction network integrating the DINOv2 and VMamba models combines ViT's global semantic understanding with the visual state-space model's local dynamic modeling for stronger generalization and detail awareness. In addition, an efficient feature-fusion framework inspired by the MLP-Mixer architecture enhances the multi-channel feature representations. Experiments on the same-view ALTO dataset and the cross-view VIGOR dataset show that the proposed method achieves high recall at top-1 and top-5 and outperforms existing methods, identifying matching images more effectively in both same-view and cross-view scenarios.
Keywords: UAV visual place recognition, visual matching localization, state-space model, dual-branch feature extraction, image retrieval
20. An image fusion network based on ambient-light awareness and hierarchical infrared feature guidance
Authors: 王爱侠, 胡傲杰, 闫爱云, 高尚, 金硕巍, 庞永恒. 《控制与决策》 (Control and Decision) (PKU Core), 2025, Issue 10, pp. 3177-3189 (13 pages)
Abstract: Combining the salient targets of infrared images with the rich texture detail of visible-light images can effectively raise the information entropy of fused images, providing important support for downstream vision tasks such as nighttime intelligent driving. However, mainstream fusion algorithms lack targeted treatment of the conflict between the low information entropy and high pixel intensity of visible-light images on poorly lit nighttime roads: algorithms that perform well under normal conditions can, under strong-light interference, only produce fused images that resemble the visible-light input and carry little entropy. An image fusion network resistant to adverse lighting interference is therefore proposed, combining information entropy with information-theoretic principles to strengthen the robustness and information retention of image fusion. First, an image fusion network with high robustness and excellent performance under normal lighting is designed; on top of it, an ambient-light awareness module analyzes the feature weights of low-entropy visible-light images under extreme lighting. Then, a hierarchically guided infrared edge-feature fusion module is designed to fully extract the effective feature information in infrared images. Experimental results show that the network fully exploits the features of visible and infrared images under adverse nighttime lighting and significantly improves fused-image quality in such conditions; compared with other mainstream algorithms, the proposed method produces fusion results containing richer and more effective information.
Keywords: image fusion, nighttime adverse-lighting awareness, infrared feature mining, high-level vision tasks