Journal Articles
524 articles found
A Concise and Varied Visual Features-Based Image Captioning Model with Visual Selection
1
Authors: Alaa Thobhani, Beiji Zou, Xiaoyan Kui, Amr Abdussalam, Muhammad Asim, Naveed Ahmed, Mohammed Ali Alshara — Computers, Materials & Continua (SCIE, EI), 2024, No. 11, pp. 2873-2894 (22 pages)
Image captioning has gained increasing attention in recent years. Visual characteristics found in input images play a crucial role in generating high-quality captions. Prior studies have used visual attention mechanisms to dynamically focus on localized regions of the input image, improving the effectiveness of identifying relevant image regions at each step of caption generation. However, providing image captioning models with the capability of selecting the most relevant visual features from the input image and attending to them can significantly improve the utilization of these features, and consequently the captioning network's performance. In light of this, we present an image captioning framework that efficiently exploits the extracted representations of the image. Our framework comprises three key components: the Visual Feature Detector module (VFD), the Visual Feature Visual Attention module (VFVA), and the language model. The VFD module is responsible for detecting a subset of the most pertinent features from the local visual features, creating an updated visual features matrix. Subsequently, the VFVA directs its attention to the visual features matrix generated by the VFD, resulting in an updated context vector employed by the language model to generate an informative description. Integrating the VFD and VFVA modules introduces an additional layer of processing for the visual features, thereby contributing to enhancing the image captioning model's performance. Using the MS-COCO dataset, our experiments show that the proposed framework competes well with state-of-the-art methods, effectively leveraging visual representations to improve performance. The implementation code can be found here: https://github.com/althobhani/VFDICM (accessed on 30 July 2024).
Keywords: visual attention; image captioning; visual feature detector; visual feature visual attention
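To make the VFD-to-VFVA flow concrete, here is a minimal PyTorch sketch of the two ideas: a learned scorer that keeps the top-k local features, followed by additive attention over the kept features conditioned on the language-model state. The module names, dimensions, and the linear scoring head are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TopKFeatureSelector(nn.Module):
    """VFD-like idea: score local features and keep only the top-k."""
    def __init__(self, dim, k):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # assumed scoring head
        self.k = k

    def forward(self, feats):  # feats: (B, N, D) local visual features
        scores = self.score(feats).squeeze(-1)        # (B, N)
        idx = scores.topk(self.k, dim=1).indices      # indices of kept features
        return torch.gather(
            feats, 1, idx.unsqueeze(-1).expand(-1, -1, feats.size(-1)))

class AdditiveAttention(nn.Module):
    """VFVA-like idea: attend over the selected features with the decoder state."""
    def __init__(self, dim, hid):
        super().__init__()
        self.wf = nn.Linear(dim, hid)
        self.wh = nn.Linear(dim, hid)
        self.v = nn.Linear(hid, 1)

    def forward(self, feats, h):  # h: (B, D) language-model hidden state
        e = self.v(torch.tanh(self.wf(feats) + self.wh(h).unsqueeze(1)))  # (B, K, 1)
        a = torch.softmax(e, dim=1)                                       # attention weights
        return (a * feats).sum(dim=1)  # context vector (B, D) for the decoder
```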
High-dimensional features of adaptive superpixels for visually degraded images (Cited: 1)
2
Authors: LIAO Feng-feng, CAO Ke-ye, ZHANG Yu-xiang, LIU Sheng — Optoelectronics Letters (EI), 2019, No. 3, pp. 231-235 (5 pages)
This study presents a novel and highly efficient superpixel algorithm, depth-fused adaptive superpixel (DFASP), which can generate accurate superpixels in a degraded image. In many applications, particularly in actual scenes, vision degradation such as motion blur, overexposure, and underexposure often occurs. Well-known color-based superpixel algorithms are incapable of producing accurate superpixels in degraded images because of the ambiguity of color information caused by vision degradation. To eliminate this ambiguity, we use depth and color information to generate superpixels. We map the depth and color information to a high-dimensional feature space. Then, we develop a fast multilevel clustering algorithm to produce superpixels. Furthermore, we design an adaptive mechanism to adjust the color and depth information automatically during pixel clustering. Experimental results demonstrate that in terms of boundary recall, undersegmentation error, run time, and achievable segmentation accuracy, DFASP is better than state-of-the-art superpixel methods.
Keywords: high-dimensional features; visually degraded images
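The core move of DFASP, clustering pixels in a joint color-depth feature space instead of color alone, can be sketched with off-the-shelf tools. The k-means stand-in below replaces the paper's fast multilevel clustering, and the fixed `depth_weight` stands in for its adaptive color/depth balancing; both simplifications are assumptions.

```python
import numpy as np
import cv2
from sklearn.cluster import MiniBatchKMeans

def depth_fused_superpixels(bgr, depth, n_segments=200, depth_weight=2.0):
    """Cluster pixels in a joint (color, depth, position) feature space."""
    h, w = depth.shape
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float32)
    feats = np.stack([lab[..., 0], lab[..., 1], lab[..., 2],
                      depth_weight * depth.astype(np.float32), xs, ys], axis=-1)
    feats = feats.reshape(-1, 6)
    feats /= feats.std(axis=0) + 1e-6  # put all dimensions on a common scale
    labels = MiniBatchKMeans(n_clusters=n_segments, random_state=0).fit_predict(feats)
    return labels.reshape(h, w)        # per-pixel superpixel labels
```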
A Visual Indoor Localization Method Based on Efficient Image Retrieval
3
Authors: Mengyan Lyu, Xinxin Guo, Kunpeng Zhang, Liye Zhang — Journal of Computer and Communications, 2024, No. 2, pp. 47-66 (20 pages)
Indoor visual localization, which uses camera imagery to compute the user's pose, is a core component of Augmented Reality (AR) and Simultaneous Localization and Mapping (SLAM). Existing indoor localization technologies generally use scene-specific 3D representations or are trained on specific datasets, making it challenging to balance accuracy and cost when applied to new scenes. Addressing this issue, this paper proposes a universal indoor visual localization method based on efficient image retrieval. Initially, a Multi-Layer Perceptron (MLP) is employed to aggregate features from intermediate layers of a convolutional neural network, obtaining a global representation of the image that ensures accurate and rapid retrieval of reference images. Subsequently, a new mechanism using Random Sample Consensus (RANSAC) is designed to resolve the relative pose ambiguity caused by the essential matrix decomposition based on the five-point method. Finally, the absolute pose of the queried user image is computed, thereby achieving indoor user pose estimation. The proposed method is characterized by its simplicity, flexibility, and excellent cross-scene generalization. Experimental results demonstrate a positioning error of 0.09 m and 2.14° on the 7Scenes dataset, and 0.15 m and 6.37° on the 12Scenes dataset, illustrating the method's strong performance.
Keywords: visual indoor positioning; feature point matching; image retrieval; position calculation; five-point method
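The pose-recovery step described here maps directly onto standard OpenCV calls. The sketch below estimates the essential matrix with RANSAC and resolves the four-fold decomposition ambiguity by cheirality via `cv2.recoverPose`; the paper designs its own RANSAC-based disambiguation, so treat this as a generic stand-in.

```python
import cv2
import numpy as np

def relative_pose(pts_query, pts_ref, K):
    """Relative pose between a query image and a retrieved reference image.

    pts_query / pts_ref: (N, 2) float32 matched keypoints; K: 3x3 intrinsics.
    """
    E, inliers = cv2.findEssentialMat(pts_query, pts_ref, K,
                                      method=cv2.RANSAC, prob=0.999, threshold=1.0)
    # recoverPose picks the physically valid (R, t) among the four candidates
    _, R, t, _ = cv2.recoverPose(E, pts_query, pts_ref, K, mask=inliers)
    return R, t  # rotation and unit-norm translation direction
```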
Structured Computational Modeling of Human Visual System for No-reference Image Quality Assessment
4
Authors: Wen-Han Zhu, Wei Sun, Xiong-Kuo Min, Guang-Tao Zhai, Xiao-Kang Yang — International Journal of Automation and Computing (EI, CSCD), 2021, No. 2, pp. 204-218 (15 pages)
Objective image quality assessment (IQA) plays an important role in various visual communication systems, as it can automatically and efficiently predict the perceived quality of images. The human eye is the ultimate evaluator of visual experience, so modeling of the human visual system (HVS) is a core issue for objective IQA and visual experience optimization. Traditional models based on black-box fitting have low interpretability and struggle to guide experience optimization effectively, while models based on physiological simulation are hard to integrate into practical visual communication services because of their high computational complexity. To bridge the gap between signal distortion and visual experience, in this paper we propose a novel perceptual no-reference (NR) IQA algorithm based on structural computational modeling of the HVS. Following the mechanism of the human brain, we divide visual signal processing into a low-level visual layer, a middle-level visual layer, and a high-level visual layer, which conduct pixel information processing, primitive information processing, and global image information processing, respectively. Natural scene statistics (NSS) based features, deep features, and free-energy based features are extracted from these three layers. Support vector regression (SVR) is employed to aggregate the features into the final quality prediction. Extensive experimental comparisons on three widely used benchmark IQA databases (LIVE, CSIQ, and TID2013) demonstrate that the proposed metric is highly competitive with or outperforms state-of-the-art NR IQA measures.
Keywords: image quality assessment (IQA); no-reference (NR); structural computational modeling; human visual system; visual feature extraction
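The final aggregation stage, concatenating the three layers' features and regressing quality with SVR, is straightforward to sketch with scikit-learn. The feature matrices and hyperparameters below are placeholders, not the paper's settings.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def train_quality_regressor(f_low, f_mid, f_high, mos):
    """Aggregate NSS, deep, and free-energy feature groups with SVR.

    f_low / f_mid / f_high: (n_images, d_i) features from the three visual
    layers; mos: subjective quality scores used as the regression target.
    """
    X = np.hstack([f_low, f_mid, f_high])
    model = make_pipeline(StandardScaler(),
                          SVR(kernel="rbf", C=10.0, epsilon=0.1))
    model.fit(X, mos)
    return model  # model.predict(X_new) yields predicted quality scores
```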
Bag-of-visual-words model for artificial pornographic images recognition
5
Authors: 李芳芳, 罗四伟, 刘熙尧, 邹北骥 — Journal of Central South University (SCIE, EI, CAS, CSCD), 2016, No. 6, pp. 1383-1389 (7 pages)
It is illegal to spread and transmit pornographic images over the internet, whether in real or artificial form. Traditional methods are designed to identify real pornographic images and are less effective on artificial images, so criminals have turned to releasing artificial pornographic images in specific venues, e.g., social networks. To efficiently identify artificial pornographic images, a novel bag-of-visual-words based approach is proposed in this work. In the bag-of-words (BoW) framework, speeded-up robust features (SURF) are first extracted, a visual vocabulary is then constructed through k-means clustering and images are represented by an improved BoW encoding method, and finally the visual words are fed into a learning machine for training and classification. Unlike the traditional BoW method, the proposed method sets a weight on each visual word according to the number of features that each cluster contains. Moreover, a non-binary encoding method and a cross-matching strategy are utilized to improve the discriminative power of the visual words. Experimental results indicate that the proposed method outperforms the traditional method.
Keywords: artificial pornographic image; bag-of-words (BoW); speeded-up robust feature (SURF) descriptors; visual vocabulary
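A minimal version of the described BoW pipeline (SURF descriptors, k-means vocabulary, histogram encoding) can be written with OpenCV and scikit-learn. Note that `cv2.xfeatures2d.SURF_create` requires an opencv-contrib build with non-free modules enabled; the per-word weighting and cross-matching refinements from the paper are omitted here.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(images, k=500):
    """K-means visual vocabulary over SURF descriptors (BoW steps 1-2)."""
    surf = cv2.xfeatures2d.SURF_create()  # swap in cv2.ORB_create() if non-free is unavailable
    descs = [surf.detectAndCompute(img, None)[1] for img in images]
    all_descs = np.vstack([d for d in descs if d is not None])
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit(all_descs)

def encode(img, km):
    """Normalized histogram of visual words for one image."""
    surf = cv2.xfeatures2d.SURF_create()
    _, d = surf.detectAndCompute(img, None)
    if d is None:
        return np.zeros(km.n_clusters, dtype=np.float32)
    words = km.predict(d)
    hist = np.bincount(words, minlength=km.n_clusters).astype(np.float32)
    return hist / (hist.sum() + 1e-9)  # feed these vectors to any classifier
```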
Multi-source image fusion algorithm based on fast weighted guided filter (Cited: 6)
6
Authors: WANG Jian, YANG Ke, REN Ping, QIN Chunxia, ZHANG Xiufei — Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2019, No. 5, pp. 831-840 (10 pages)
In recent years, guided image fusion algorithms have become increasingly popular, but current algorithms cannot eliminate halo artifacts. We propose an image fusion algorithm based on a fast weighted guided filter. Firstly, the source images are separated into a series of high- and low-frequency components. Secondly, three visual features of the source image are extracted to construct a decision-graph model. Thirdly, a fast weighted guided filter is proposed to optimize the result of the previous step and to reduce time complexity by exploiting the correlation among neighboring pixels. Finally, the resulting image is combined with the weight map to realize the fusion. The proposed algorithm is applied to multi-focus, visible-infrared, and multi-modal images, and the results show that it effectively suppresses halo artifacts in the merged images with higher efficiency, outperforming traditional methods in both subjective visual quality and objective evaluation.
Keywords: fast guided filter; image fusion; visual feature; decision map
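The decision-map refinement step can be approximated with OpenCV's plain guided filter (`cv2.ximgproc.guidedFilter`, from opencv-contrib); the paper's fast weighted variant is not available off the shelf, so the sketch below is a simplified stand-in.

```python
import cv2
import numpy as np

def refine_weight_map(src_gray, weight_map, radius=8, eps=0.01):
    """Edge-preserving refinement of a fusion decision map.

    The plain guided filter shown here smooths the weight map while
    keeping it aligned with edges of the source image.
    """
    g = src_gray.astype(np.float32) / 255.0
    w = weight_map.astype(np.float32)
    return cv2.ximgproc.guidedFilter(guide=g, src=w, radius=radius, eps=eps)

def fuse(img_a, img_b, w_a):
    """Weighted recombination of two source images (final fusion step)."""
    w = np.clip(w_a, 0.0, 1.0)[..., None]
    return (w * img_a + (1.0 - w) * img_b).astype(img_a.dtype)
```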
Content-based retrieval based on binary vectors for 2-D medical images
7
Authors: 龚鹏, 邹亚东, 洪海 — 《吉林大学学报(信息科学版)》 (CAS), 2003, No. S1, pp. 127-130 (4 pages)
In medical research and clinical diagnosis, automated or computer-assisted classification and retrieval methods are highly desirable to offset the high cost of manual classification and manipulation by medical experts. To facilitate decision-making in health care and related areas, this paper proposes a two-step content-based medical image retrieval algorithm. Firstly, in the preprocessing step, image segmentation is performed to distinguish image objects, and on the basis of the …
Keywords: content-based image retrieval; medical images; feature space; spatial relationship; visual information retrieval
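Retrieval over binary feature vectors of the kind this paper uses reduces to ranking by Hamming distance. The sketch below shows that generic step; the paper's specific encoding of segmented-object features is not reproduced.

```python
import numpy as np

def hamming_search(query_bits, db_bits, top_k=10):
    """Rank database images by Hamming distance between binary vectors.

    query_bits: (n_bits,) array of {0, 1}; db_bits: (n_images, n_bits).
    Returns the indices of the top_k closest images.
    """
    dists = np.count_nonzero(db_bits != query_bits, axis=1)
    return np.argsort(dists)[:top_k]
```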
Historical Arabic Images Classification and Retrieval Using Siamese Deep Learning Model
8
Authors: Manal M. Khayyat, Lamiaa A. Elrefaei, Mashael M. Khayyat — Computers, Materials & Continua (SCIE, EI), 2022, No. 7, pp. 2109-2125 (17 pages)
Classifying the visual features in images to retrieve a specific image is a significant problem in computer vision, especially for historical faded color images. There have therefore been many efforts to automate the classification operation and retrieve similar images accurately. To reach this goal, we developed a VGG19 deep convolutional neural network to extract the visual features from the images automatically. The distances among the extracted feature vectors are then measured and a similarity score is generated using a Siamese deep neural network. The Siamese model was first built and trained from scratch but did not achieve high evaluation metrics, so it was rebuilt from the pre-trained VGG19 deep learning model, which yielded higher metrics. Afterward, three different distance metrics combined with the sigmoid activation function were evaluated to find the most accurate method for measuring the similarities among the retrieved images; the highest evaluation scores were obtained with the cosine distance metric. Moreover, the code was run on a Graphics Processing Unit (GPU) instead of a Central Processing Unit (CPU), which further sped up both training and retrieval. After extensive experimentation, we reached a satisfactory solution, recording F-scores of 0.98 for classification and 0.99 for retrieval.
Keywords: visual feature vectors; deep learning models; distance methods; similar image retrieval
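The retrieval core, a shared pre-trained VGG19 encoder applied to both images of a Siamese pair and scored by cosine similarity, can be sketched in a few lines of PyTorch. The paper trains its Siamese network further; the frozen-encoder version below is an illustrative assumption.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Shared encoder of the Siamese pair: pretrained VGG19 convolutional features.
vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features.eval()

@torch.no_grad()
def similarity(img_a, img_b):
    """Cosine similarity between VGG19 features of two preprocessed images.

    img_a / img_b: (1, 3, 224, 224) ImageNet-normalized tensors.
    """
    fa = vgg(img_a).flatten(1)
    fb = vgg(img_b).flatten(1)
    return F.cosine_similarity(fa, fb).item()  # higher = more similar
```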
EGSNet: An Efficient Glass Segmentation Network Based on Multi-Level Heterogeneous Architecture and Boundary Awareness
9
Authors: Guojun Chen, Tao Cui, Yongjie Hou, Huihui Li — Computers, Materials & Continua (SCIE, EI), 2024, No. 12, pp. 3969-3987 (19 pages)
Existing glass segmentation networks have high computational complexity and large memory occupation, leading to high hardware requirements and long inference times, which is unsuitable for efficiency-seeking real-time tasks such as autonomous driving. The inefficiency mainly stems from employing homogeneous modules to process features of different layers; these modules require computationally intensive convolutions and weight-calculation branches with numerous parameters to accommodate the differences in information across layers. We propose an efficient glass segmentation network (EGSNet) based on a multi-level heterogeneous architecture and boundary awareness to balance performance and efficiency. EGSNet divides the feature layers from different stages into low-level understanding, semantic-level understanding, and global understanding with boundary guidance. Based on the information differences among layers, we further propose a multi-angle collaborative enhancement (MCE) module, which extracts detailed information from shallow features, and a large-scale contextual feature extraction (LCFE) module, which captures semantic logic from deep features. The models are trained and evaluated on the glass segmentation datasets HSO (Home-Scene-Oriented) and Trans10k-stuff, and EGSNet achieves the best efficiency and performance compared with advanced methods. On the HSO test set, the IoU, Fβ, MAE (mean absolute error), and BER (balance error rate) of EGSNet are 0.804, 0.847, 0.084, and 0.085, at only 27.15 GFLOPs (giga floating-point operations). Experimental results show that EGSNet significantly improves the efficiency of glass segmentation with better performance.
Keywords: image segmentation; multi-level heterogeneous architecture; feature differences
An Edge-Awareness-Enhanced Visual SLAM Method for Underground Coal Mines (Cited: 2)
10
Authors: 牟琦, 梁鑫, 郭媛婕, 王煜豪, 李占利 — 《煤田地质与勘探》 (Peking University Core), 2025, No. 3, pp. 231-242 (12 pages)
[Objective] Underground coal mines commonly present feature-degraded scenes with low illumination, weak texture, and strong structural regularity, leaving visual SLAM (visual simultaneous localization and mapping) systems with too few valid features or high mismatch rates, which severely limits localization accuracy and robustness. [Methods] An edge-awareness-enhanced visual SLAM method is proposed. First, a low-light image enhancement module with edge-aware constraints is built: a gradient-domain guided filter with adaptive scale optimizes the Retinex algorithm to obtain images with clear texture and uniform illumination, markedly improving feature extraction under low and uneven lighting. Second, an edge-awareness-enhanced feature extraction and matching module is built into the visual odometry, where a point-line feature fusion strategy improves the detectability and matching accuracy of features in weakly textured and structured scenes. Specifically, line features are extracted with the EDLines algorithm (edge drawing lines), point features with ORB (oriented FAST and rotated BRIEF), and matching is refined with grid-based motion statistics (GMS) and a ratio test. Finally, the method was compared comprehensively against ORB-SLAM2 and ORB-SLAM3 on the TUM dataset and a real underground coal-mine dataset, covering image enhancement, feature matching, and localization. [Results and Conclusions] (1) On the TUM dataset, relative to ORB-SLAM2 the proposed method reduces the RMSE of absolute and relative trajectory error by 4%-38.46% and 8.62%-50%, respectively; relative to ORB-SLAM3, by 0-61.68% and 3.63%-47.05%. (2) In real underground experiments, the estimated trajectory is closer to the camera's reference trajectory. (3) The method effectively improves the accuracy and robustness of visual SLAM in feature-degraded underground scenes and offers a practical technical solution for applying visual SLAM in coal mines. Research on visual SLAM for feature-degraded underground scenes is significant for advancing the robotization of mobile mining equipment.
Keywords: visual SLAM; feature degradation; edge awareness; image enhancement; point-line feature fusion; TUM dataset
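One ingredient of this pipeline, ORB point matching filtered by grid-based motion statistics (GMS), maps onto OpenCV's contrib API as below; the EDLines line features, the ratio test, and the Retinex-based enhancement are omitted from the sketch.

```python
import cv2

def orb_gms_matches(img1, img2, n_features=5000):
    """ORB point features filtered by GMS (requires opencv-contrib).

    Returns keypoints of both images and the GMS-filtered matches.
    """
    orb = cv2.ORB_create(nfeatures=n_features)
    kp1, d1 = orb.detectAndCompute(img1, None)
    kp2, d2 = orb.detectAndCompute(img2, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    raw = matcher.match(d1, d2)
    # matchGMS expects image sizes as (width, height)
    good = cv2.xfeatures2d.matchGMS(img1.shape[:2][::-1], img2.shape[:2][::-1],
                                    kp1, kp2, raw, withRotation=False)
    return kp1, kp2, good
```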
A Visual Localization Method for Roadheaders in Underground Coal Mines Based on Anchor-Mesh Features
11
Authors: 张旭辉, 迟云凯, 杜昱阳, 姜俊英, 杨文娟, 赵友军, 万继成, 王彦群, 田琛辉 — 《煤田地质与勘探》 (Peking University Core), 2025, No. 6, pp. 259-270 (12 pages)
[Background] Accurate positioning of underground excavation equipment is fundamental to the automated, intelligent guidance and control of fully mechanized excavation faces. However, long, narrow, enclosed roadways, poor illumination, and sparse texture limit conventional visual localization, so a visual localization method for roadheaders based on anchor-mesh features is proposed. [Methods] An image enhancement network with three-branch depthwise separable convolutions estimates the reflectance, illumination, and noise of the image separately, adjusting the illumination component while suppressing noise; this yields uniformly lit, texture-clear images and improves the adaptability of the visual localization system to complex lighting. A method tailored to extracting and matching anchor-mesh line features was designed: EDLines (edge drawing lines) with an adaptive threshold strengthens line-feature extraction, and the structural similarity index measure (SSIM) improves line-matching accuracy. A pose-estimation model that minimizes the line-feature reprojection error, combined with pose-graph optimization, achieves accurate roadheader localization. An experimental platform was built, with quantitative experiments on image enhancement, line-feature processing, and localization. [Results and Conclusions] The TSCR-NET image enhancement method attains higher PSNR and SSIM values than MSRCR and Zero-DCE. Compared with traditional algorithms, the line-feature processing markedly increases the number of extracted features and the matching accuracy, laying the foundation for the subsequent localization stage. In the localization experiments, TSCR-NET was compared with other line-feature-based visual localization methods on the EuRoC dataset and in a real roadway: it outperforms PL-VINS on all nine EuRoC sequences, and in continuous tracking of the machine body over a 60 m anchor-mesh roadway its maximum error is 163 mm, 23.5% lower than PL-VINS's 213 mm, while the RMSE drops from 0.531 to 0.426 (19.8% lower). TSCR-NET thus offers higher accuracy and stability and provides a useful reference for long-distance pose detection of roadheaders in anchor-mesh roadway environments.
Keywords: roadheader; visual localization; image enhancement; line feature extraction and matching; motion estimation; anchor-mesh features; coal mine
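The SSIM-based line matching can be sketched by comparing image patches cut around candidate lines, as below; the patch construction and the acceptance threshold are assumptions, not the paper's exact procedure.

```python
import numpy as np
from skimage.metrics import structural_similarity

def match_line_patches(patches_a, patches_b, threshold=0.6):
    """Greedy matching of line-neighborhood patches by SSIM.

    patches_a / patches_b: lists of equally sized grayscale (uint8) patches
    cut around candidate lines in the two images.
    """
    pairs = []
    for i, pa in enumerate(patches_a):
        scores = [structural_similarity(pa, pb, data_range=255)
                  for pb in patches_b]
        j = int(np.argmax(scores))
        if scores[j] >= threshold:          # accept only sufficiently similar pairs
            pairs.append((i, j, scores[j]))
    return pairs
```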
An Uncalibrated Visual Servoing Method Based on Extended Image Features
12
Authors: 张淑珍, 成煜坤, 刘杨波, 查富生 — 《系统仿真学报》 (Peking University Core), 2025, No. 5, pp. 1210-1221 (12 pages)
To address traditional uncalibrated visual servoing's dependence on image Jacobian estimation and the coupling among the camera's degrees of freedom, an uncalibrated visual servoing method based on extended image features is proposed, building on image-based uncalibrated visual servoing. By analyzing how image features relate to camera pose changes during servoing, the servo process in image space is decomposed into four basic processes: translation, stretching, rotation, and scaling. By analyzing how image features evolve during servoing, extended image features are used to supplement the meaning of traditional ones: the image centroid coordinates, relative line length, point-pair distance, and orientation angle are mapped to the camera's individual degrees of freedom, and the robot is driven directly by image feature errors, achieving decoupled visual servoing that does not rely on an image Jacobian. Comparative simulations on the CoppeliaSim platform show that, relative to traditional calibrated visual servoing, the proposed method reduces the target image position error, camera position error, and attitude error by 88%, 94%, and 93%, respectively; physical experiments further verify the algorithm's effectiveness.
Keywords: robot; uncalibrated visual servoing; extended image features; feature selection; motion decoupling
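The extended image features named in this abstract (image centroid, point-pair distance, orientation angle) are simple to compute from tracked points. The sketch below shows one plausible reading of the feature set; mapping each feature error to "its" camera degree of freedom, in place of an image Jacobian, is the decoupling idea.

```python
import numpy as np

def extended_features(pts):
    """Compute extended image features from tracked points.

    pts: (N, 2) image coordinates, N >= 2. Returns the centroid (drives
    translation), a point-pair distance (drives depth/zoom), and an
    orientation angle (drives rotation); a simplified reading of the paper.
    """
    centroid = pts.mean(axis=0)
    dist = np.linalg.norm(pts[0] - pts[1])
    v = pts[1] - pts[0]
    angle = np.arctan2(v[1], v[0])
    return centroid, dist, angle

# Servo-law sketch: drive each DOF by the error of its own feature,
# e.g. v_x = k * (centroid_x_desired - centroid_x), with gain k > 0.
```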
An Image Fusion Network Based on Ambient-Light Perception and Hierarchical Guidance by Infrared Features
13
Authors: 王爱侠, 胡傲杰, 闫爱云, 高尚, 金硕巍, 庞永恒 — 《控制与决策》 (Peking University Core), 2025, No. 10, pp. 3177-3189 (13 pages)
In infrared images, salient target objects combined with the rich texture details of visible-light images can effectively increase the information entropy of the fused image, supporting downstream vision tasks such as nighttime intelligent driving. However, mainstream fusion algorithms have not specifically addressed the conflict between the low information entropy and high pixel intensity of visible-light images on poorly lit nighttime roads; algorithms that perform well under normal conditions can, under strong light interference, only produce fused images that resemble the visible-light input and carry little information. This paper therefore proposes an image fusion network that resists interference from adverse lighting, combining information entropy and information-theoretic principles to strengthen the robustness and information retention of image fusion. First, a fusion network with high robustness and strong performance under normal lighting is designed; on top of it, an ambient-light perception module analyzes the feature weights of low-entropy visible-light images under extreme lighting. Then, a hierarchical guided-fusion module for infrared edge features is designed to fully extract the useful feature information in infrared images. Experiments show that the network fully exploits the features of visible and infrared images under adverse nighttime lighting and significantly improves fusion quality under such conditions; compared with other mainstream algorithms, its fusion results contain richer and more useful information.
Keywords: image fusion; nighttime adverse-illumination perception; infrared feature mining; high-level vision tasks
Research on Visual Communication of 3D Laser Film and Television Special-Effects Images
14
Author: 孙红娟 — 《激光杂志》 (Peking University Core), 2025, No. 8, pp. 146-151 (6 pages)
To address the poor visual communication of current film and television special-effects images, a visual communication method for 3D laser special-effects images is proposed. First, a 3D laser scanner collects 3D point-cloud data of the special-effects scene. Second, a neighborhood projection method is introduced to select feature points from the scene's point cloud. Finally, using the selected feature points, a feature-matching step converts the 3D laser point cloud into a high-quality 2D special-effects image, achieving visual communication of 3D laser special-effects imagery. Experimental results show that the proposed method is more accurate and performs better in practice.
Keywords: 3D laser point cloud data; neighborhood projection; feature point matching; visual communication; film and television special-effects images
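The point-cloud-to-image conversion at the heart of this method rests on projecting 3D laser points through a pinhole camera model. The sketch below shows that generic projection; the neighborhood-projection feature selection and the feature matching are not reproduced.

```python
import numpy as np

def project_points(points, K, R, t, h, w):
    """Project 3D laser points into a 2D image with a pinhole model.

    points: (N, 3) in the scanner/world frame; K: 3x3 intrinsics;
    R, t: extrinsics mapping world coordinates into the camera frame;
    h, w: image height and width. Returns in-bounds pixel coordinates.
    """
    cam = points @ R.T + t            # world -> camera frame
    cam = cam[cam[:, 2] > 0]          # keep points in front of the camera
    uv = cam @ K.T
    uv = uv[:, :2] / uv[:, 2:3]       # perspective division
    mask = ((uv[:, 0] >= 0) & (uv[:, 0] < w) &
            (uv[:, 1] >= 0) & (uv[:, 1] < h))
    return uv[mask]
```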
Sparse representation of global features of visual images in human primary visual cortex: Evidence from fMRI (Cited: 2)
15
Authors: ZHAO SongNian, YAO Li, JIN Zhen, XIONG XiaoYun, WU Xia, ZOU Qi, YAO GuoZheng, CAI XiaoHong, LIU YiJun — Chinese Science Bulletin (SCIE, EI, CAS), 2008, No. 14, pp. 2165-2174 (10 pages)
In fMRI experiments on object representation in visual cortex, we designed two types of stimuli: one is the gray face image and its line drawing, and the other is the illusion and its corresponding completed illusion. Both have the same global features with different minute details, so that the results of the fMRI experiments can be compared with each other. The first kind of visual stimulus was used in a block-design fMRI experiment, and the second in an event-related fMRI experiment. Comparing and analyzing the visual cortex activity patterns of interest and the blood oxygenation level dependent (BOLD) fMRI signals, we obtained results that show some invariance of the global features of visual images. A plausible explanation of this invariance mechanism involves the cooperation of the synchronized response to the global features of the visual image with feedback of shape perception from higher cortex to cortex V1, namely the integration of global features and the embodiment of sparse representation and a distributed population code.
Keywords: visual cortex; visual imaging; fMRI representation; biology
A Survey of Street-View-Image-Based Visual Place Recognition
16
Authors: 张暖, 王涛, 张艳, 魏毅博, 李镏文, 刘熠晨 — 《地球信息科学学报》 (Peking University Core), 2025, No. 8, pp. 1751-1779 (29 pages)
[Significance] Street View Image-based Visual Place Recognition (SV-VPR) is a geolocation technique based on visual feature information; its core task is to predict and precisely determine the geographic location of unknown places by analyzing the visual features of street view images. The technique must overcome appearance changes under different environmental conditions (e.g., day-night illumination differences, seasonal feature evolution) and viewpoint differences (e.g., the angular offset between vehicle-mounted cameras and satellite imagery), achieving accurate recognition through image feature similarity, geometric constraints, and related conditions. As a field at the intersection of computer vision and geographic information science, SV-VPR is closely related to visual localization, image retrieval, and SLAM, with important applications in autonomous UAV navigation, high-precision localization for autonomous driving, cyberspace geofencing, and augmented-reality scene fusion; it shows unique advantages where GPS signals are unavailable. [Analysis] This paper systematically reviews progress in SV-VPR: it first explains the basic concepts and taxonomy of image-based visual place recognition and of SV-VPR in particular; it then analyzes the key technical research in the field; it further surveys the dataset resources relevant to SV-VPR; it also organizes the evaluation methods and metric systems; finally, it discusses future research directions. [Purpose] The survey aims to give researchers a systematic account of the field's technical development to help them quickly grasp its current state, a comparative analysis of key techniques and evaluation methods to inform algorithm selection, and a forecast of frontier challenges and potential breakthroughs to inspire innovative research.
Keywords: street view images; visual place recognition; visual features; location prediction; precise positioning; visual localization; image retrieval
DINO-MSRA: A Novel Network Architecture for Cross-View Image Retrieval and Localization with UAV and Satellite Imagery
17
Authors: 平一凡, 卢俊, 郭海涛, 侯青峰, 朱坤, 桑泽豪, 刘彤 — 《地球信息科学学报》 (Peking University Core), 2025, No. 7, pp. 1608-1623 (16 pages)
[Objective] Cross-view image geolocalization infers the geographic position of a query image by matching it against reference images taken from different viewpoints that carry precise location information; it is widely used in real-world tasks such as UAV navigation and target localization. Most current deep-learning methods for UAV-satellite cross-view retrieval and localization rely on supervised learning, but the scarcity of high-quality annotated samples limits the generalization of supervised models; moreover, because existing methods do not model spatial-layout features, the pronounced domain gap between cross-view images remains hard to bridge. [Methods] This paper proposes DINO-MSRA, a new cross-view retrieval and localization architecture for UAV-satellite imagery. It first uses the DINOv2 foundation model fine-tuned with Conv-LoRA as the feature encoder, strengthening feature extraction with few trainable parameters. It then introduces a Mamba-based spatial-relation-aware feature aggregator (MSRA) that embeds spatial configuration features into the global descriptor, bringing a significant performance gain to cross-view matching and localization. The model is trained with the InfoNCE loss. [Results] Extensive comparison and ablation experiments on the University-1652 and SUES-200 datasets show that, for the UAV-localization and UAV-navigation tasks respectively, the method reaches R@1 accuracies of 95.14% and 97.29% on University-1652, improvements of 0.68% and 1.14% over the current best algorithm, CAMP; on SUES-200 at a 150 m altitude, R@1 reaches 97.2% and 98.75%, 1.8% and 2.5% above CAMP, with a parameter count well below existing algorithms, only 19.2% of Sample4Geo's. [Conclusions] DINO-MSRA surpasses state-of-the-art methods in cross-view image matching, achieving higher accuracy and faster inference, and demonstrating robustness and practical potential in challenging scenarios.
Keywords: cross-view image localization; vision foundation model; fine-tuning; feature aggregation; UAV imagery; satellite imagery
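The InfoNCE training objective used by DINO-MSRA has a compact standard form: each matched drone/satellite pair is a positive, and all other pairings in the batch act as negatives. A symmetric PyTorch sketch (the temperature value and the symmetrization are common choices, assumed here):

```python
import torch
import torch.nn.functional as F

def info_nce(drone_emb, sat_emb, temperature=0.07):
    """Symmetric InfoNCE over matched drone/satellite embedding pairs.

    drone_emb, sat_emb: (B, D) tensors; row i of each depicts the same place.
    """
    a = F.normalize(drone_emb, dim=-1)
    b = F.normalize(sat_emb, dim=-1)
    logits = a @ b.t() / temperature               # (B, B) scaled cosine similarities
    targets = torch.arange(a.size(0), device=a.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```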
Texture Feature Extraction for Blurred Laser Images Incorporating Visual Communication
18
Authors: 徐璐, 武文英, 黄泽军 — 《激光杂志》 (Peking University Core), 2025, No. 4, pp. 109-114 (6 pages)
Because of physical effects such as scattering and diffraction, or the limitations of imaging equipment, the texture boundaries of blurred laser images are indistinct, and their textures span multiple scales and orientations, making comprehensive and effective texture feature extraction particularly difficult. Fusing visual communication can combine the strengths of different visual features to describe the texture information of blurred laser images more completely; on this basis, a texture feature extraction method for blurred laser images incorporating visual communication is proposed. Blurred laser images are converted from RGB to the perceptually oriented CIE Lab color space to strengthen color information, and the MSRCR and CLAHE algorithms enhance the contrast of the blurred laser images while preserving their color. On this basis, the complex-valued nature of the Gabor filter is exploited to extract texture feature information at multiple scales and orientations. Experimental results show that the method preprocesses images well, with an average feature agreement above 0.98, an average structural similarity index of 0.84, and an average peak signal-to-noise ratio of 27.70 dB, indicating strong texture-feature-extraction performance.
Keywords: visual communication; blurred laser images; MSRCR algorithm; Gabor filter; texture feature extraction
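The preprocessing and extraction pipeline (Lab conversion, CLAHE contrast enhancement, multi-scale multi-orientation Gabor filtering) translates directly to OpenCV. The MSRCR step is skipped here, and the kernel parameters are assumptions:

```python
import cv2
import numpy as np

def enhance_and_gabor(bgr, scales=(9, 17, 31), n_orient=6):
    """CLAHE enhancement in Lab, then a multi-scale Gabor response stack.

    Returns an (H, W, len(scales) * n_orient) array of filter responses.
    """
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    l = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8)).apply(l)
    gray = l.astype(np.float32) / 255.0
    responses = []
    for ksize in scales:                       # one kernel size per scale
        for i in range(n_orient):              # evenly spaced orientations
            theta = i * np.pi / n_orient
            kern = cv2.getGaborKernel((ksize, ksize), sigma=ksize / 6.0,
                                      theta=theta, lambd=ksize / 2.0,
                                      gamma=0.5, psi=0)
            responses.append(cv2.filter2D(gray, cv2.CV_32F, kern))
    return np.stack(responses, axis=-1)
```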
A Sharpening Method for Visual Communication Images Based on Laser-Visual Data Fusion (Cited: 1)
19
Author: 俞杰 — 《激光杂志》 (Peking University Core), 2025, No. 3, pp. 127-132 (6 pages)
Images are one of the key information carriers in visual communication today, but factors such as acquisition equipment, compression technology, transmission environments, and display modes degrade the clarity of visual communication images and limit how well visual information is conveyed; a sharpening method for visual communication images based on laser-visual data fusion is therefore proposed. A LiDAR and a monocular camera capture information about the visual communication subject; laser data features (point-cloud density, point normals, and point curvature) and visual data features (color, texture, shape, and spatial-relation features) are extracted. On this basis, a laser-visual data fusion framework is built, the fusion scale between the laser and visual features is determined, and the laser data are projected onto the visual image, thereby sharpening the visual communication image. Experimental results show that images processed with the proposed method are clearer, with a maximum image information entropy of 35 bit, fully confirming its superior sharpening effect.
Keywords: visual communication images; data fusion; image enhancement; laser data features; sharpening; visual data features
Image-Text Emotion Classification Based on Visual Feature Enhancement and Bidirectional Interactive Fusion
20
Authors: 王露瑶, 胡慧君, 刘茂福 — 《计算机工程与科学》 (Peking University Core), 2025, No. 11, pp. 2056-2066 (11 pages)
Multimodal sentiment analysis, which predicts emotion from multimodal information such as text and images, is attracting growing attention. Compared with text, the visual modality, as an auxiliary modality, may contain more emotion-irrelevant confounding or redundant information, and existing studies do not fully consider the interaction and complementarity among perceptual modalities. To address these problems, the VFEBIF model for image-text emotion classification, based on visual feature enhancement and bidirectional interactive fusion, is proposed. Its fine-grained visual-feature-enhancement module uses the structured knowledge of scene graphs and CLIP-based screening to extract text keywords related to the visual semantics, thereby strengthening local visual features. In addition, a bidirectional interactive fusion module realizes inter-modal interaction in parallel and fuses multimodal features to mine the complementary information between modalities, yielding the emotion classification. Experiments on the two public datasets TumEmo and MVSA-Single show that VFEBIF outperforms most existing models and effectively improves emotion classification performance.
Keywords: multimodal sentiment analysis; image-text emotion classification; visual feature enhancement; bidirectional interactive fusion
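The CLIP-based keyword screening in the visual-enhancement module can be sketched with the Hugging Face CLIP API: score candidate scene-graph keywords against the image and keep the best-matching ones. The model checkpoint and the top-k value are assumptions.

```python
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def select_keywords(image, keywords, top_k=5):
    """Keep the scene-graph keywords that best match the image.

    image: a PIL.Image; keywords: list of candidate keyword strings.
    """
    inputs = proc(text=keywords, images=image, return_tensors="pt", padding=True)
    sims = model(**inputs).logits_per_image.squeeze(0)   # (n_keywords,) scores
    idx = sims.topk(min(top_k, len(keywords))).indices.tolist()
    return [keywords[i] for i in idx]
```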