Funding: Supported by the National Natural Science Foundation of China (Nos. U22A2034, 62177047), the High Caliber Foreign Experts Introduction Plan funded by MOST, and the Central South University Research Programme of Advanced Interdisciplinary Studies (No. 2023QYJC020).
Abstract: Image captioning has gained increasing attention in recent years. Visual characteristics found in input images play a crucial role in generating high-quality captions. Prior studies have used visual attention mechanisms to dynamically focus on localized regions of the input image, improving the effectiveness of identifying relevant image regions at each step of caption generation. However, providing image captioning models with the capability of selecting the most relevant visual features from the input image and attending to them can significantly improve the utilization of these features and, consequently, the performance of the captioning network. In light of this, we present an image captioning framework that efficiently exploits the extracted representations of the image. Our framework comprises three key components: the Visual Feature Detector module (VFD), the Visual Feature Visual Attention module (VFVA), and the language model. The VFD module is responsible for detecting a subset of the most pertinent features from the local visual features, creating an updated visual features matrix. Subsequently, the VFVA directs its attention to the visual features matrix generated by the VFD, producing an updated context vector employed by the language model to generate an informative description. Integrating the VFD and VFVA modules introduces an additional layer of processing for the visual features, thereby contributing to enhancing the image captioning model's performance. Using the MS-COCO dataset, our experiments show that the proposed framework competes well with state-of-the-art methods, effectively leveraging visual representations to improve performance. The implementation code can be found here: https://github.com/althobhani/VFDICM (accessed on 30 July 2024).
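For readers who want a concrete picture of the select-then-attend idea, the following is a minimal, hypothetical PyTorch sketch rather than the authors' released implementation (which is at the link above): a learned relevance score keeps the top-k region features, playing the VFD role, and additive attention conditioned on the decoder hidden state produces the context vector, playing the VFVA role. The dimensions, the top-k rule, and all module internals are assumptions.

```python
# Minimal sketch (not the authors' code): feature selection followed by attention
# over the selected features, conditioned on the decoder state.
import torch
import torch.nn as nn


class VFD(nn.Module):
    """Scores local visual features and keeps the k most relevant ones."""
    def __init__(self, feat_dim: int, k: int):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)  # learned relevance score per region
        self.k = k

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, regions, feat_dim)
        scores = self.score(feats).squeeze(-1)                    # (batch, regions)
        idx = scores.topk(self.k, dim=1).indices                  # top-k region indices
        idx = idx.unsqueeze(-1).expand(-1, -1, feats.size(-1))
        return feats.gather(1, idx)                               # (batch, k, feat_dim)


class VFVA(nn.Module):
    """Additive attention over the selected features, conditioned on the decoder state."""
    def __init__(self, feat_dim: int, hidden_dim: int, attn_dim: int):
        super().__init__()
        self.w_f = nn.Linear(feat_dim, attn_dim)
        self.w_h = nn.Linear(hidden_dim, attn_dim)
        self.v = nn.Linear(attn_dim, 1)

    def forward(self, feats: torch.Tensor, hidden: torch.Tensor) -> torch.Tensor:
        # feats: (batch, k, feat_dim); hidden: (batch, hidden_dim)
        e = self.v(torch.tanh(self.w_f(feats) + self.w_h(hidden).unsqueeze(1)))
        alpha = torch.softmax(e, dim=1)                           # attention weights
        return (alpha * feats).sum(dim=1)                         # context vector


# Toy usage: 36 region features of dimension 2048, decoder hidden size 512.
feats = torch.randn(2, 36, 2048)
context = VFVA(2048, 512, 256)(VFD(2048, k=12)(feats), torch.randn(2, 512))
print(context.shape)  # torch.Size([2, 2048])
```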
Funding: Supported by the Science Technology Department of Zhejiang Province (No. LGG19F020010), the Department of Education of Zhejiang Province (No. Y201329938), and the National Natural Science Foundation of China (No. 61876167).
Abstract: This study presents a novel and highly efficient superpixel algorithm, namely depth-fused adaptive superpixel (DFASP), which can generate accurate superpixels in a degraded image. In many applications, particularly in real scenes, vision degradation such as motion blur, overexposure, and underexposure often occurs. Well-known color-based superpixel algorithms are incapable of producing accurate superpixels in degraded images because of the ambiguity of color information caused by vision degradation. To eliminate this ambiguity, we use depth and color information to generate superpixels. We map the depth and color information to a high-dimensional feature space. Then, we develop a fast multilevel clustering algorithm to produce superpixels. Furthermore, we design an adaptive mechanism to adjust the color and depth information automatically during pixel clustering. Experimental results demonstrate that, whether measured by boundary recall, under-segmentation error, run time, or achievable segmentation accuracy, DFASP is better than state-of-the-art superpixel methods.
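As a rough illustration of clustering pixels jointly in color, depth, and position (not the DFASP algorithm itself; its fast multilevel clustering and adaptive weighting are not reproduced), the sketch below runs k-means over per-pixel feature vectors with fixed, assumed weights.

```python
# Illustrative sketch only: superpixel-like labels from clustering pixels in a joint
# color-depth-position space. Weights below are placeholder assumptions.
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def depth_color_superpixels(rgb, depth, n_segments=200, w_color=1.0, w_depth=2.0, w_xy=0.5):
    """Label map from k-means over [color, depth, position] features."""
    h, w, _ = rgb.shape
    ys, xs = np.mgrid[0:h, 0:w]
    feats = np.column_stack([
        w_color * rgb.reshape(-1, 3).astype(np.float32) / 255.0,
        w_depth * depth.reshape(-1, 1).astype(np.float32) / (depth.max() + 1e-6),
        w_xy * np.column_stack([xs.ravel(), ys.ravel()]).astype(np.float32) / max(h, w),
    ])
    labels = MiniBatchKMeans(n_clusters=n_segments, n_init=3, random_state=0).fit_predict(feats)
    return labels.reshape(h, w)

# Toy usage with random data standing in for a real degraded RGB-D frame.
rgb = np.random.randint(0, 256, (120, 160, 3), dtype=np.uint8)
depth = np.random.rand(120, 160).astype(np.float32)
print(depth_color_superpixels(rgb, depth).shape)  # (120, 160)
```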
Abstract: The task of indoor visual localization, which uses camera visual information to compute the user's pose, is a core component of Augmented Reality (AR) and Simultaneous Localization and Mapping (SLAM). Existing indoor localization technologies generally rely on scene-specific 3D representations or are trained on specific datasets, making it challenging to balance accuracy and cost when they are applied to new scenes. To address this issue, this paper proposes a universal indoor visual localization method based on efficient image retrieval. Initially, a Multi-Layer Perceptron (MLP) is employed to aggregate features from intermediate layers of a convolutional neural network, obtaining a global representation of the image; this ensures accurate and rapid retrieval of reference images. Subsequently, a new mechanism based on Random Sample Consensus (RANSAC) is designed to resolve the relative-pose ambiguity caused by decomposing the essential matrix estimated with the five-point method. Finally, the absolute pose of the queried user image is computed, thereby achieving indoor user pose estimation. The proposed indoor localization method is simple, flexible, and generalizes well across scenes. Experimental results show a positioning error of 0.09 m and 2.14° on the 7Scenes dataset, and 0.15 m and 6.37° on the 12Scenes dataset, illustrating the strong performance of the proposed method.
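The relative-pose step described above can be sketched with standard OpenCV calls; this is an assumed workflow rather than the paper's exact mechanism: the essential matrix is estimated from matched keypoints with RANSAC, and cv2.recoverPose applies the cheirality check that resolves the decomposition ambiguity. The intrinsic matrix and thresholds below are illustrative.

```python
# Sketch of the relative-pose step only (assumed workflow, not the paper's code).
import numpy as np
import cv2

def relative_pose(pts_query, pts_ref, K):
    """Return (R, t) of the query camera relative to the reference image."""
    E, inliers = cv2.findEssentialMat(pts_query, pts_ref, K,
                                      method=cv2.RANSAC, prob=0.999, threshold=1.0)
    E = E[:3]  # keep the first solution if several are stacked
    # recoverPose tests the four decompositions of E and keeps the one that
    # places the triangulated points in front of both cameras.
    _, R, t, _ = cv2.recoverPose(E, pts_query, pts_ref, K, mask=inliers)
    return R, t

# Toy usage with synthetic correspondences and an assumed pinhole intrinsic matrix.
K = np.array([[525.0, 0.0, 320.0], [0.0, 525.0, 240.0], [0.0, 0.0, 1.0]])
pts1 = np.random.rand(50, 2).astype(np.float32) * 640
pts2 = pts1 + np.random.randn(50, 2).astype(np.float32)
R, t = relative_pose(pts1, pts2, K)
print(R.shape, t.ravel())
```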
Funding: This work was supported by the National Natural Science Foundation of China (Nos. 61831015 and 61901260) and the Key Research and Development Program of China (No. 2019YFB1405902).
Abstract: Objective image quality assessment (IQA) plays an important role in various visual communication systems, as it can automatically and efficiently predict the perceived quality of images. The human eye is the ultimate evaluator of visual experience, so modeling the human visual system (HVS) is a core issue for objective IQA and visual experience optimization. Traditional models based on black-box fitting have low interpretability and are difficult to use for guiding experience optimization, while models based on physiological simulation are hard to integrate into practical visual communication services because of their high computational complexity. To bridge the gap between signal distortion and visual experience, in this paper we propose a novel perceptual no-reference (NR) IQA algorithm based on structural computational modeling of the HVS. Following the mechanism of the human brain, we divide visual signal processing into a low-level visual layer, a middle-level visual layer, and a high-level visual layer, which conduct pixel information processing, primitive information processing, and global image information processing, respectively. Natural scene statistics (NSS) based features, deep features, and free-energy based features are extracted from these three layers. Support vector regression (SVR) is employed to aggregate the features into the final quality prediction. Extensive experimental comparisons on three widely used benchmark IQA databases (LIVE, CSIQ, and TID2013) demonstrate that the proposed metric is highly competitive with or outperforms state-of-the-art NR IQA measures.
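The final aggregation stage lends itself to a small sketch: placeholder feature blocks stand in for the NSS, deep, and free-energy features of the three layers, and scikit-learn's SVR maps the concatenated vector to a quality score. This is an assumed, simplified interface, not the paper's feature extractors or training protocol.

```python
# Minimal sketch of the SVR aggregation stage with placeholder features.
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n_images = 200
# Placeholder blocks standing in for NSS, deep, and free-energy features.
nss_feats = rng.normal(size=(n_images, 36))
deep_feats = rng.normal(size=(n_images, 128))
free_energy_feats = rng.normal(size=(n_images, 4))
X = np.hstack([nss_feats, deep_feats, free_energy_feats])
y = rng.uniform(0, 100, size=n_images)  # stand-in for subjective quality scores (MOS)

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.5))
model.fit(X[:150], y[:150])
print(model.predict(X[150:155]))  # predicted quality for held-out images
```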
Funding: Projects (41001260, 61173122, 61573380) supported by the National Natural Science Foundation of China; Project (11JJ5044) supported by the Hunan Provincial Natural Science Foundation of China.
Abstract: It is illegal to spread and transmit pornographic images over the internet, whether in real or artificial form. Traditional methods are designed to identify real pornographic images and are less effective at dealing with artificial images. Therefore, criminals turn to releasing artificial pornographic images in specific scenes, e.g., in social networks. To efficiently identify artificial pornographic images, a novel bag-of-visual-words based approach is proposed in this work. In the bag-of-words (BoW) framework, the speeded-up robust feature (SURF) is first adopted for feature extraction, a visual vocabulary is then constructed through K-means clustering and images are represented by an improved BoW encoding method, and finally the visual words are fed into a learning machine for training and classification. Different from the traditional BoW method, the proposed method sets a weight on each visual word according to the number of features that each cluster contains. Moreover, a non-binary encoding method and a cross-matching strategy are utilized to improve the discriminative power of the visual words. Experimental results indicate that the proposed method outperforms the traditional method.
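A compact sketch of the weighted, non-binary encoding described above is given below; SURF extraction is replaced by placeholder descriptors (SURF lives in the OpenCV contrib/nonfree build), and the cluster-size weighting shown is an assumed, simplified form of the paper's rule.

```python
# Sketch of a weighted bag-of-visual-words encoding with placeholder descriptors.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
train_desc = rng.normal(size=(5000, 64))     # stand-in for 64-D SURF descriptors
k = 100                                      # vocabulary size

vocab = KMeans(n_clusters=k, n_init=4, random_state=0).fit(train_desc)
cluster_sizes = np.bincount(vocab.labels_, minlength=k)
word_weights = cluster_sizes / cluster_sizes.sum()   # weight per visual word

def encode(image_desc: np.ndarray) -> np.ndarray:
    """Non-binary, weighted histogram of visual words for one image."""
    words = vocab.predict(image_desc)
    hist = np.bincount(words, minlength=k).astype(np.float64)   # counts, not 0/1
    hist *= word_weights                                        # apply per-word weights
    return hist / (np.linalg.norm(hist) + 1e-12)

print(encode(rng.normal(size=(300, 64))).shape)  # (100,)
```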
Funding: Supported by the National Natural Science Foundation of China (61472324, 61671383) and the Shaanxi Key Industry Innovation Chain Project (2018ZDCXL-G-12-2, 2019ZDLGY14-02-02).
Abstract: In the last few years, guided image fusion algorithms have become increasingly popular. However, current algorithms cannot eliminate halo artifacts. We propose an image fusion algorithm based on a fast weighted guided filter. First, the source images are separated into a series of high- and low-frequency components. Second, three visual features of the source image are extracted to construct a decision graph model. Third, a fast weighted guided filter is proposed to optimize the result obtained in the previous step and to reduce the time complexity by considering the correlation among neighboring pixels. Finally, the image obtained in the previous step is combined with the weight map to realize the fusion. The proposed algorithm is applied to multi-focus, visible-infrared, and multi-modal images, and the final results show that it effectively removes the halo artifacts of the merged images with higher efficiency, outperforming the traditional method in terms of both subjective visual quality and objective evaluation.
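To make the weight-map idea concrete, here is an illustrative fusion sketch under simplified assumptions: a plain (unweighted) guided filter refines a Laplacian-energy decision map for two grayscale sources. The paper's fast weighted guided filter and its three visual features are not reproduced.

```python
# Illustrative weight-map fusion sketch; not the paper's fast weighted guided filter.
import numpy as np
import cv2

def guided_filter(I, p, r=8, eps=1e-3):
    """Standard guided filter (He et al.); box filters implemented with cv2.blur."""
    mean_I, mean_p = cv2.blur(I, (r, r)), cv2.blur(p, (r, r))
    cov_Ip = cv2.blur(I * p, (r, r)) - mean_I * mean_p
    var_I = cv2.blur(I * I, (r, r)) - mean_I * mean_I
    a = cov_Ip / (var_I + eps)
    b = mean_p - a * mean_I
    return cv2.blur(a, (r, r)) * I + cv2.blur(b, (r, r))

def fuse(img_a, img_b, r=8):
    """Weight-map fusion of two grayscale images in [0, 1]."""
    # Decision map: which image has the stronger local detail (Laplacian energy).
    sal_a = cv2.blur(np.abs(cv2.Laplacian(img_a, cv2.CV_32F)), (7, 7))
    sal_b = cv2.blur(np.abs(cv2.Laplacian(img_b, cv2.CV_32F)), (7, 7))
    w = (sal_a >= sal_b).astype(np.float32)
    w = np.clip(guided_filter(img_a, w, r), 0.0, 1.0)   # edge-aware refinement
    return w * img_a + (1.0 - w) * img_b

# Toy usage with random arrays standing in for two source images.
a = np.random.rand(128, 128).astype(np.float32)
b = np.random.rand(128, 128).astype(np.float32)
print(fuse(a, b).shape)  # (128, 128)
```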
Abstract: In medical research and clinical diagnosis, automated or computer-assisted classification and retrieval methods are highly desirable to offset the high cost of manual classification and manipulation by medical experts. To facilitate decision-making in health care and related areas, this paper proposes a two-step content-based medical image retrieval algorithm. Firstly, in the preprocessing step, image segmentation is performed to distinguish image objects, and on the basis of the ...
Funding: The authors would like to thank the Deanship of Scientific Research at Umm Al-Qura University for supporting this work (Grant Code: 22UQU4400271DSR01).
Abstract: Classifying the visual features in images to retrieve a specific image is a significant problem in the computer vision field, especially when dealing with historical faded color images. Thus, there have been many efforts to automate the classification operation and retrieve similar images accurately. To reach this goal, we developed a VGG19 deep convolutional neural network to extract the visual features from the images automatically. Then, the distances among the extracted feature vectors are measured and a similarity score is generated using a Siamese deep neural network. The Siamese model was first built and trained from scratch, but it did not yield high evaluation metrics; thus, we rebuilt it on top of the pre-trained VGG19 model to obtain higher evaluation metrics. Afterward, three different distance metrics combined with the Sigmoid activation function were tested to find the most accurate method for measuring the similarities among the retrieved images; the highest evaluation scores were obtained with the Cosine distance metric. Moreover, a Graphics Processing Unit (GPU) was utilized to run the code instead of the Central Processing Unit (CPU), which further optimized execution by speeding up both training and retrieval. After extensive experimentation, we reached a satisfactory solution, recording F-scores of 0.98 and 0.99 for classification and retrieval, respectively.
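A hypothetical Keras sketch of the scoring model follows: a shared VGG19 backbone embeds both images, and the cosine similarity of the embeddings is squashed by a sigmoid to give the similarity score. The embedding size, the head, and the training setup are assumptions rather than the authors' configuration.

```python
# Hypothetical Siamese scoring model: shared VGG19 embedding + cosine similarity + sigmoid.
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG19

# weights="imagenet" downloads the pre-trained weights; set weights=None to skip.
backbone = VGG19(weights="imagenet", include_top=False, pooling="avg",
                 input_shape=(224, 224, 3))
embed = tf.keras.Sequential([backbone, layers.Dense(256)])  # shared embedding branch

img_a = layers.Input((224, 224, 3))
img_b = layers.Input((224, 224, 3))
ea, eb = embed(img_a), embed(img_b)

# Cosine similarity between the two embeddings, then a sigmoid similarity score.
cos = layers.Dot(axes=1, normalize=True)([ea, eb])
score = layers.Dense(1, activation="sigmoid")(cos)

siamese = Model([img_a, img_b], score)
siamese.compile(optimizer="adam", loss="binary_crossentropy")
siamese.summary()
```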
Abstract: Existing glass segmentation networks have high computational complexity and large memory occupation, leading to high hardware requirements and time overheads for model inference, which is not conducive to efficiency-seeking real-time tasks such as autonomous driving. The inefficiency of these models is mainly due to employing homogeneous modules to process features of different layers. These modules require computationally intensive convolutions and weight-calculation branches with numerous parameters to accommodate the differences in information across layers. We propose an efficient glass segmentation network (EGSNet) based on a multi-level heterogeneous architecture and boundary awareness to balance model performance and efficiency. EGSNet divides the feature layers from different stages into low-level understanding, semantic-level understanding, and global understanding with boundary guidance. Based on the information differences among the layers, we further propose the multi-angle collaborative enhancement (MCE) module, which extracts detailed information from shallow features, and the large-scale contextual feature extraction (LCFE) module, which captures semantic logic through deep features. The models are trained and evaluated on the glass segmentation datasets HSO (Home-Scene-Oriented) and Trans10k-stuff, respectively, and EGSNet achieves the best efficiency and performance compared with advanced methods. On the HSO test set, the IoU, Fβ, MAE (Mean Absolute Error), and BER (Balance Error Rate) of EGSNet are 0.804, 0.847, 0.084, and 0.085, and the GFLOPs (Giga Floating Point Operations Per Second) are only 27.15. Experimental results show that EGSNet significantly improves the efficiency of the glass segmentation task with better performance.
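The reported numbers rely on standard segmentation metrics; the sketch below shows how IoU, MAE, and BER are commonly computed from a predicted probability map and a binary ground-truth mask (the exact evaluation protocol and the Fβ weighting used for EGSNet are not reproduced here).

```python
# Common segmentation metrics (IoU, MAE, BER) under assumed thresholding at 0.5.
import numpy as np

def segmentation_metrics(pred: np.ndarray, gt: np.ndarray, thr: float = 0.5):
    """pred: predicted probability map in [0, 1]; gt: binary ground-truth mask."""
    p = pred >= thr
    g = gt.astype(bool)
    iou = (p & g).sum() / max((p | g).sum(), 1)
    mae = np.abs(pred - gt).mean()
    tp, tn = (p & g).sum(), (~p & ~g).sum()
    pos, neg = max(g.sum(), 1), max((~g).sum(), 1)
    ber = 1.0 - 0.5 * (tp / pos + tn / neg)     # balance error rate
    return iou, mae, ber

# Toy usage with random data standing in for a prediction and its ground truth.
pred = np.random.rand(64, 64)
gt = (np.random.rand(64, 64) > 0.5).astype(np.float32)
print(segmentation_metrics(pred, gt))
```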
Abstract: [Objective] Underground coal mines commonly present feature-degraded scenes with low illumination, weak texture, and structured layouts, leaving visual SLAM (visual simultaneous localization and mapping) systems with too few effective features or high mismatch rates, which severely limits localization accuracy and robustness. [Methods] An edge-aware enhanced visual SLAM method is proposed. First, an edge-aware-constrained low-light image enhancement module is constructed: the Retinex algorithm is optimized with an adaptive-scale gradient-domain guided filter to obtain images with clear texture and uniform illumination, significantly improving feature extraction under low and uneven lighting. Second, an edge-aware enhanced feature extraction and matching module is built into the visual odometry, where a point-line feature fusion strategy improves feature detectability and matching accuracy in weakly textured and structured scenes. Specifically, line features are extracted with the edge drawing lines algorithm (EDLines), point features with the oriented FAST and rotated BRIEF algorithm (ORB), and accurate matching is performed with grid-based motion statistics (GMS) and a ratio-test matching algorithm. Finally, the method is comprehensively compared with ORB-SLAM2 and ORB-SLAM3 on the TUM dataset and a real underground coal-mine dataset, covering image enhancement, feature matching, and localization. [Results and Conclusions] The results show that: (1) On the TUM dataset, compared with ORB-SLAM2, the root mean square errors of the absolute and relative trajectory errors are reduced by 4%-38.46% and 8.62%-50%, respectively; compared with ORB-SLAM3, they are reduced by 0-61.68% and 3.63%-47.05%, respectively. (2) In the underground coal-mine experiments, the trajectory estimated by the proposed method is closer to the camera's reference trajectory. (3) The method effectively improves the accuracy and robustness of visual SLAM in feature-degraded underground scenes and provides a technical solution for applying visual SLAM in coal mines. Research on visual SLAM for feature-degraded underground scenes is significant for advancing the robotization of mobile equipment in underground coal mines.
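One piece of the matching pipeline can be sketched with OpenCV: ORB point features filtered by a ratio test. EDLines line extraction and GMS filtering are omitted here (the OpenCV contrib modules offer related tools such as cv2.ximgproc.createFastLineDetector and cv2.xfeatures2d.matchGMS); the parameters are illustrative assumptions, and this is not the paper's implementation.

```python
# Sketch of ORB point-feature matching with a ratio test (one stage of a SLAM front end).
import numpy as np
import cv2

def orb_ratio_matches(img1, img2, ratio=0.75):
    """Return ratio-test-filtered ORB matches between two grayscale images."""
    orb = cv2.ORB_create(nfeatures=2000)
    k1, d1 = orb.detectAndCompute(img1, None)
    k2, d2 = orb.detectAndCompute(img2, None)
    if d1 is None or d2 is None:
        return []
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    knn = matcher.knnMatch(d1, d2, k=2)
    good = []
    for pair in knn:
        # Keep a match only if it is clearly better than the second-best candidate.
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            good.append(pair[0])
    return good

# Toy usage with random "images"; real use would pass enhanced low-light frames.
img1 = np.random.randint(0, 256, (480, 640), dtype=np.uint8)
img2 = np.random.randint(0, 256, (480, 640), dtype=np.uint8)
print(len(orb_ratio_matches(img1, img2)))
```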
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 60371045 and 60628101) and the National High Technology Research and Development Program of China (Grant No. 2006AA01Z132).
Abstract: In fMRI experiments on object representation in the visual cortex, we designed two types of stimuli: one is a gray face image and its line drawing, and the other is an illusion and its corresponding completed illusion. Both have the same global features with different minute details, so that the results of the fMRI experiments can be compared with each other. The first kind of visual stimulus was used in a block-design fMRI experiment, and the second was used in an event-related fMRI experiment. By comparing and analyzing the activity patterns of the visual cortex regions of interest and the blood oxygenation level dependent (BOLD) fMRI signal, we obtained results showing some invariance of the global features of visual images. A plausible explanation of this invariance mechanism involves the cooperation of a synchronized response to the global features of the visual image with feedback of shape perception from higher cortex to cortex V1, namely the integration of global features and the embodiment of sparse representation and a distributed population code.