Journal Articles
35 articles found
1. A Novelty Framework in Image-Captioning with Visual Attention-Based Refined Visual Features
Authors: Alaa Thobhani, Beiji Zou, Xiaoyan Kui, Amr Abdussalam, Muhammad Asim, Mohammed ELAffendi, Sajid Shah. Computers, Materials & Continua, 2025, Issue 3, pp. 3943-3964 (22 pages).
Image captioning, the task of generating descriptive sentences for images, has advanced significantly with the integration of semantic information. However, traditional models still rely on static visual features that do not evolve with the changing linguistic context, which can hinder the ability to form meaningful connections between the image and the generated captions. This limitation often leads to captions that are less accurate or descriptive. In this paper, we propose a novel approach to enhance image captioning by introducing dynamic interactions where visual features continuously adapt to the evolving linguistic context. Our model strengthens the alignment between visual and linguistic elements, resulting in more coherent and contextually appropriate captions. Specifically, we introduce two innovative modules: the Visual Weighting Module (VWM) and the Enhanced Features Attention Module (EFAM). The VWM adjusts visual features using partial attention, enabling dynamic reweighting of the visual inputs, while the EFAM further refines these features to improve their relevance to the generated caption. By continuously adjusting visual features in response to the linguistic context, our model bridges the gap between static visual features and dynamic language generation. We demonstrate the effectiveness of our approach through experiments on the MS-COCO dataset, where our method outperforms state-of-the-art techniques in terms of caption quality and contextual relevance. Our results show that dynamic visual-linguistic alignment significantly enhances image captioning performance.
Keywords: image-captioning, visual attention, deep learning, visual features
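The dynamic reweighting of region features that the abstract attributes to the VWM can be illustrated with a generic additive-attention step: score each visual region against the current decoder state, normalize with a softmax, and form a weighted context vector. This is a NumPy sketch under assumed shapes, not the paper's actual module; the projection matrices and dimensions below are illustrative placeholders.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def reweight_visual_features(V, h, W_v, W_h, w):
    """Reweight region features V (k x d) given a decoder state h (m,).

    score_i = w . tanh(W_v V_i + W_h h)  -- a generic additive-attention form;
    the attended feature is the softmax-weighted sum of the region features.
    """
    scores = np.array([w @ np.tanh(W_v @ v + W_h @ h) for v in V])
    alpha = softmax(scores)          # attention weights over the k regions
    context = alpha @ V              # (d,) dynamically reweighted visual feature
    return alpha, context

rng = np.random.default_rng(0)
k, d, m, a = 5, 8, 6, 4              # regions, feature dim, state dim, attention dim
V = rng.normal(size=(k, d))
h = rng.normal(size=m)
alpha, context = reweight_visual_features(
    V, h, rng.normal(size=(a, d)), rng.normal(size=(a, m)), rng.normal(size=a))
print(alpha.sum())                   # attention weights sum to 1
```

Because the weights depend on `h`, the attended visual feature changes at every decoding step, which is the "visual features adapt to the linguistic context" idea in miniature.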
2. A Novel CAPTCHA Recognition System Based on Refined Visual Attention
Authors: Zaid Derea, Beiji Zou, Xiaoyan Kui, Monir Abdullah, Alaa Thobhani, Amr Abdussalam. Computers, Materials & Continua, 2025, Issue 4, pp. 115-136 (22 pages).
Improving website security to prevent malicious online activities is crucial, and CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) has emerged as a key strategy for distinguishing human users from automated bots. Text-based CAPTCHAs, designed to be easily decipherable by humans yet challenging for machines, are a common form of this verification. However, advancements in deep learning have facilitated the creation of models adept at recognizing these text-based CAPTCHAs with surprising efficiency. In our comprehensive investigation into CAPTCHA recognition, we have tailored the renowned UpDown image captioning model specifically for this purpose. Our approach innovatively combines an encoder to extract both global and local features, significantly boosting the model's capability to identify complex details within CAPTCHA images. For the decoding phase, we have adopted a refined attention mechanism, integrating enhanced visual attention with dual layers of Long Short-Term Memory (LSTM) networks to elevate CAPTCHA recognition accuracy. Our rigorous testing across four varied datasets, including those from Weibo, BoC, Gregwar, and Captcha 0.3, demonstrates the versatility and effectiveness of our method. The results not only highlight the efficiency of our approach but also offer profound insights into its applicability across different CAPTCHA types, contributing to a deeper understanding of CAPTCHA recognition technology.
Keywords: text-based CAPTCHA recognition, refined visual attention, web security, computer vision
3. A Dual-Layer Attention Based CAPTCHA Recognition Approach with Guided Visual Attention
Authors: Zaid Derea, Beiji Zou, Xiaoyan Kui, Alaa Thobhani, Amr Abdussalam. Computer Modeling in Engineering & Sciences, 2025, Issue 3, pp. 2841-2867 (27 pages).
Enhancing website security is crucial to combat malicious activities, and CAPTCHA (Completely Automated Public Turing tests to tell Computers and Humans Apart) has become a key method to distinguish humans from bots. While text-based CAPTCHAs are designed to challenge machines while remaining human-readable, recent advances in deep learning have enabled models to recognize them with remarkable efficiency. In this regard, we propose a novel two-layer visual attention framework for CAPTCHA recognition that builds on traditional attention mechanisms by incorporating Guided Visual Attention (GVA), which sharpens focus on relevant visual features. We have specifically adapted the well-established image captioning task to address this need. Our approach utilizes the first-level attention module as guidance for the second-level attention component, incorporating two LSTM (Long Short-Term Memory) layers to enhance CAPTCHA recognition. Our extensive evaluation across four diverse datasets (Weibo, BoC (Bank of China), Gregwar, and Captcha 0.3) shows the adaptability and efficacy of our method. Our approach demonstrated impressive performance, achieving an accuracy of 96.70% for BoC and 95.92% for Weibo. These results underscore the effectiveness of our method in accurately recognizing and processing CAPTCHA datasets, showcasing its robustness, reliability, and ability to handle varied challenges in CAPTCHA recognition.
Keywords: text-based CAPTCHA, image recognition, guided visual attention, web security, computer vision
4. Microstructure recognition of steels by machine learning based on visual attention mechanism
Authors: Xing-yu Chen, Lin Cheng, Cheng-yang Hu, Yu-peng Zhang, Kai-ming Wu. Journal of Iron and Steel Research International (SCIE, EI, CAS, CSCD), 2024, Issue 4, pp. 909-923 (15 pages).
U-Net has achieved good performance on small-scale datasets through skip connections that merge the features of low-level and high-level layers, and it has been widely utilized in biomedical image segmentation as well as, more recently, microstructure image segmentation of materials. Three representative visual attention mechanism modules, namely squeeze-and-excitation networks, the convolutional block attention module, and the extended calibration algorithm, were introduced into the traditional U-Net architecture to further improve prediction accuracy. It is found that, compared with the original U-Net architecture, the evaluation index of the improved U-Net architecture is significantly improved for the microstructure segmentation of steels with the ferrite/martensite composite microstructure, the pearlite/ferrite composite microstructure, and the complex martensite/austenite island/bainite microstructure, which demonstrates the advantage of utilizing the visual attention mechanism in microstructure segmentation. The reasons for the accuracy improvement are discussed based on feature map analysis.
Keywords: microstructure recognition, steel, machine learning, visual attention mechanism, visualization
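The squeeze-and-excitation module named in this abstract has a compact core: global-average-pool each channel ("squeeze"), pass the pooled vector through a small bottleneck ("excitation"), and rescale the channels by the resulting weights. The NumPy sketch below illustrates that mechanism only; it is not the paper's implementation, and the weight matrices are random placeholders.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(x, W1, W2):
    """Squeeze-and-excitation over a feature map x of shape (C, H, W)."""
    z = x.mean(axis=(1, 2))                      # squeeze: per-channel global average
    s = sigmoid(W2 @ np.maximum(W1 @ z, 0.0))    # excitation: FC -> ReLU -> FC -> sigmoid
    return x * s[:, None, None]                  # rescale each channel by its weight

rng = np.random.default_rng(1)
C, H, W, r = 8, 4, 4, 2                          # channels, height, width, reduction ratio
x = rng.normal(size=(C, H, W))
out = se_block(x, rng.normal(size=(C // r, C)), rng.normal(size=(C, C // r)))
print(out.shape)                                 # same shape, channels rescaled
```

In a U-Net, such a block is typically dropped in after a convolutional stage so the network can emphasize the channels most useful for the segmentation target.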
5. AdaFI-FCN: an adaptive feature integration fully convolutional network for predicting driver's visual attention
Authors: Bowen Shi, Weihua Dong, Zhicheng Zhan. Geo-Spatial Information Science (CSCD), 2024, Issue 4, pp. 1309-1325 (17 pages).
Visual Attention Prediction (VAP) is widely applied in GIS research, such as navigation task identification and driver assistance systems. Previous studies commonly used color information to detect the visual saliency of natural scene images. However, these studies rarely considered adaptive feature integration for different geospatial scenes in specific tasks. To better predict visual attention during driving tasks, in this paper we first propose an Adaptive Feature Integration Fully Convolutional Network (AdaFI-FCN) that uses Scene-Adaptive Weights (SAW) to integrate RGB-D, motion, and semantic features. Quantitative comparison results on the DR(eye)VE dataset show that the proposed framework achieved the best accuracy and robustness compared with state-of-the-art models (AUC-Judd = 0.971, CC = 0.767, KL = 1.046, SIM = 0.579). In addition, the experimental results of an ablation study demonstrated the positive effect of the SAW method on prediction robustness in response to scene changes. The proposed model has the potential to benefit adaptive VAP research in universal geospatial scenes, such as AR-aided navigation, indoor navigation, and street-view image reading.
Keywords: Visual Attention Prediction (VAP), feature integration, Fully Convolutional Network (FCN), driving environment, deep learning
6. Hierarchical Visual Attention Model for Saliency Detection Inspired by Avian Visual Pathways (Cited by 9)
Authors: Xiaohua Wang, Haibin Duan. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2019, Issue 2, pp. 540-552 (13 pages).
Visual attention is a mechanism that enables the visual system to detect potentially important objects in complex environments. Most computational visual attention models are designed with inspiration from mammalian visual systems. However, electrophysiological and behavioral evidence indicates that avian species are animals with high visual capability that can process complex information accurately in real time. Therefore, the visual system of avian species, especially the nuclei related to the visual attention mechanism, is investigated in this paper. A hierarchical visual attention model is then proposed for saliency detection. In the first hierarchy, the optic tectum neuron responses are computed and self-information is used to compute primary saliency maps. In the second hierarchy, the "winner-take-all" network in the tecto-isthmal projection is simulated and final saliency maps are estimated with regularized random walks ranking. Comparison results verify that the proposed model, which can define the focus of attention accurately, outperforms several state-of-the-art models. This study provides insights into the relationship between the visual attention mechanism and the avian visual pathways. The computational visual attention model may reveal the underlying neural mechanism of the nuclei for biological visual attention.
Keywords: avian visual pathways, bio-inspired, saliency detection, visual attention
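The self-information step used in the first hierarchy has a simple intuition: rare feature responses carry more information, so saliency at a location can be scored as -log p(feature). The histogram-based NumPy sketch below is an illustrative stand-in for that idea, not the paper's neuron-response model; the bin count is an arbitrary choice.

```python
import numpy as np

def self_information_saliency(img, bins=16):
    """Per-pixel saliency as -log probability of the pixel's intensity bin."""
    hist, _ = np.histogram(img, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()                          # empirical intensity distribution
    idx = np.clip((img * bins).astype(int), 0, bins - 1)
    return -np.log(p[idx] + 1e-12)                 # rare intensities -> high saliency

img = np.full((8, 8), 0.5)
img[4, 4] = 0.95                                   # one rare bright pixel
sal = self_information_saliency(img)
print(np.unravel_index(sal.argmax(), sal.shape))   # (4, 4): the rare pixel is most salient
```

A winner-take-all stage, as in the second hierarchy, would then simply pick the argmax of such a map as the focus of attention.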
7. A Visual Attention Model for Robot Object Tracking (Cited by 3)
Authors: Jin-Kui Chu, Rong-Hua Li, Qing-Ying Li, Hong-Qing Wang (School of Mechanical Engineering, Dalian University of Technology, Dalian 116024, PRC). International Journal of Automation and Computing (EI), 2010, Issue 1, pp. 39-46 (8 pages).
Inspired by human behaviors, a robot object tracking model is proposed on the basis of a visual attention mechanism, which fits the theory of topological perception. The model integrates image-driven, bottom-up attention and object-driven, top-down attention, whereas previous attention models have mostly focused on either bottom-up or top-down attention alone. Through the bottom-up component, the whole scene is segmented into the ground region and the salient regions. Guided by a top-down strategy, which is achieved by a topological graph, the object regions are separated from the salient regions; the salient regions other than the object regions are the barrier regions. To evaluate the model, a mobile robot platform was developed, on which experiments were implemented. The experimental results indicate that processing an image with a resolution of 752 × 480 pixels takes less than 200 ms and that the object regions are unabridged. The analysis obtained by comparing the proposed model with an existing model demonstrates that the proposed model has advantages in robot object tracking in terms of speed and efficiency.
Keywords: object tracking, visual attention, topological perception, salient regions, weighted similarity equation
8. Visual attention and clustering-based automatic selection of landmarks using single camera (Cited by 1)
Authors: CHUHO Yi, YONGMIN Shin, JUNGWON Cho. Journal of Central South University (SCIE, EI, CAS), 2014, Issue 9, pp. 3525-3533 (9 pages).
An improved method with better selection capability using a single camera is presented, in comparison with a previous method. To improve performance, two methods were applied to landmark selection in an unfamiliar indoor environment. First, a modified visual attention method was proposed to automatically select a candidate region as a more useful landmark. In visual attention, candidate landmark regions were selected using different characteristics of ambient color and intensity in the image. Then, the more useful landmarks were selected by combining the candidate regions using clustering. As generally implemented, automatic landmark selection by vision-based simultaneous localization and mapping (SLAM) results in many useless landmarks, because image features that are distinguished from the surrounding environment are detected repeatedly. These useless landmarks create a serious problem for the SLAM system because they complicate data association. To address this, a method was proposed in which the robot initially collects landmarks through automatic detection while traversing the entire area where it performs SLAM, and then selects only those landmarks that exhibit high rarity through clustering, which enhances system performance. Experimental results show that this method of automatic landmark selection results in the selection of high-rarity landmarks. The average error of SLAM decreases by 52% compared with conventional methods, and the accuracy of data association increases.
Keywords: simultaneous localization and mapping, automatic landmark selection, visual attention, clustering
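The rarity criterion described above, keeping only landmarks whose appearance occurs infrequently, can be sketched by grouping descriptors and scoring each landmark by the size of its group: landmarks whose descriptors recur are discarded as ambiguous for data association. The grid-quantization "clustering" below is a deliberately crude, assumed stand-in for whatever clustering algorithm the authors actually used.

```python
import numpy as np
from collections import Counter

def select_rare_landmarks(descriptors, cell=1.0, max_cluster_size=1):
    """Keep landmarks that fall in sparsely populated descriptor cells.

    Quantizing descriptors onto a grid is a crude stand-in for clustering:
    landmarks sharing a cell are treated as repeated appearance, hence
    unreliable for data association, and are dropped.
    """
    keys = [tuple(np.floor(d / cell).astype(int)) for d in descriptors]
    sizes = Counter(keys)
    return [i for i, k in enumerate(keys) if sizes[k] <= max_cluster_size]

# three near-duplicate descriptors and one distinctive one
descs = np.array([[0.1, 0.2], [0.15, 0.25], [0.12, 0.18], [5.0, 5.0]])
print(select_rare_landmarks(descs))   # [3]: only the distinctive landmark survives
```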
9. Human Visual Attention Mechanism-Inspired Point-and-Line Stereo Visual Odometry for Environments with Uneven Distributed Features (Cited by 1)
Authors: Chang Wang, Jianhua Zhang, Yan Zhao, Youjie Zhou, Jincheng Jiang. Chinese Journal of Mechanical Engineering (SCIE, EI, CAS, CSCD), 2023, Issue 3, pp. 191-204 (14 pages).
Visual odometry is critical in visual simultaneous localization and mapping for robot navigation. However, the pose estimation performance of most current visual odometry algorithms degrades in scenes with unevenly distributed features because dense features occupy excessive weight. Herein, a new human visual attention mechanism for point-and-line stereo visual odometry, called point-line-weight-mechanism visual odometry (PLWM-VO), is proposed to describe scene features in a global and balanced manner. A weight-adaptive model based on region partition and region growth is generated for the human visual attention mechanism, where sufficient attention is assigned to position-distinctive objects (sparse features in the environment). Furthermore, the sum of absolute differences algorithm is used to improve the accuracy of initialization for line features. Compared with the state-of-the-art method (ORB-VO), PLWM-VO shows a 36.79% reduction in absolute trajectory error on the KITTI and EuRoC datasets. Although the time consumption of PLWM-VO is higher than that of ORB-VO, online test results indicate that PLWM-VO satisfies the real-time demand. The proposed algorithm not only significantly improves the environmental adaptability of visual odometry, but also quantitatively demonstrates the superiority of the human visual attention mechanism.
Keywords: visual odometry, human visual attention mechanism, environmental adaptability, unevenly distributed features
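The sum-of-absolute-differences step mentioned for line-feature initialization is a standard block-matching criterion: slide a reference patch along a search range and keep the offset with the lowest absolute-difference cost. A minimal 1-D sketch with made-up patch data follows; how PLWM-VO applies SAD to stereo line matching is not reproduced here.

```python
import numpy as np

def sad_match(patch, row, x_lo, x_hi):
    """Find the column in `row` whose window best matches `patch` by SAD."""
    w = patch.size
    costs = [np.abs(row[x:x + w] - patch).sum() for x in range(x_lo, x_hi - w + 1)]
    best = int(np.argmin(costs))
    return x_lo + best, costs[best]

row = np.array([9., 9., 1., 2., 3., 9., 9., 9.])   # scanline to search
patch = np.array([1., 2., 3.])                     # reference window
x, cost = sad_match(patch, row, 0, len(row))
print(x, cost)   # best alignment at column 2 with zero cost
```

In stereo matching the same search runs along the epipolar line of the other image, and the winning offset gives the disparity used to initialize the feature's depth.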
10. Visual attention based model for target detection in large-field images (Cited by 1)
Authors: Lining Gao, Fukun Bi, Jian Yang. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2011, Issue 1, pp. 150-156 (7 pages).
It is of great significance to rapidly detect targets in large-field remote sensing images with limited computational resources. Employing relevant achievements on visual attention from perceptual psychology, this paper proposes a hierarchical attention-based model for target detection. Specifically, at the pre-attention stage, before obtaining salient regions, a fast computational approach is applied to build a saliency map. After that, the focus of attention (FOA) can be quickly obtained to indicate the salient objects. Then, at the attention stage, under FOA guidance, the high-level visual features of the region of interest are extracted in parallel. Finally, at the post-attention stage, by integrating these parallel and independent visual attributes, a decision-template-based classifier fusion strategy is proposed to discriminate the task-related targets from the other extracted salient objects. For comparison, experiments on ship detection are conducted to validate the effectiveness and feasibility of the proposed model.
Keywords: target detection, visual attention, salient region, classifier fusion
11. Temporal continuity of visual attention for future gaze prediction in immersive virtual reality (Cited by 1)
Authors: Zhiming HU, Sheng LI, Meng GAI. Virtual Reality & Intelligent Hardware, 2020, Issue 2, pp. 142-152 (11 pages).
Background: Eye-tracking technology is receiving increased attention in the field of virtual reality. Specifically, future gaze prediction is crucial for pre-computation in many applications such as gaze-contingent rendering, advertisement placement, and content-based design. To explore future gaze prediction, it is necessary to analyze the temporal continuity of visual attention in immersive virtual reality. Methods: In this paper, the concept of temporal continuity of visual attention is presented. Subsequently, an autocorrelation function method is proposed to evaluate the temporal continuity. Thereafter, the temporal continuity is analyzed in both free-viewing and task-oriented conditions. Results: In free-viewing conditions, the analysis of a free-viewing gaze dataset indicates that the temporal continuity performs well only within a short time interval. A task-oriented game scene condition was created to collect users' gaze data, and an analysis of the collected data finds that the temporal continuity performs similarly to the free-viewing conditions. Temporal continuity can be applied to future gaze prediction: when it is good, users' current gaze positions can be directly utilized to predict their gaze positions in the near future. Conclusions: The current gaze's future prediction performance is further evaluated in both free-viewing and task-oriented conditions, and we find that the current gaze can be efficiently applied to the task of short-term future gaze prediction. Long-term gaze prediction remains to be explored.
Keywords: temporal continuity, visual attention, autocorrelation analysis, gaze prediction, virtual reality
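The autocorrelation measure used above to quantify temporal continuity can be illustrated directly: high autocorrelation at lag t means the current gaze position still predicts the position t samples later. A NumPy sketch on synthetic 1-D gaze traces follows; the paper's exact formulation may differ.

```python
import numpy as np

def autocorrelation(x, lag):
    """Normalized autocorrelation of a 1-D signal at a given lag."""
    x = x - x.mean()
    denom = (x ** 2).sum()
    return float((x[:-lag] * x[lag:]).sum() / denom) if lag else 1.0

rng = np.random.default_rng(2)
smooth_gaze = np.cumsum(rng.normal(size=500))   # slowly drifting gaze trace
noisy_gaze = rng.normal(size=500)               # uncorrelated gaze trace
print(autocorrelation(smooth_gaze, 1) > autocorrelation(noisy_gaze, 1))  # True
```

When the lag-1 autocorrelation stays near 1, the cheapest short-term gaze predictor is exactly what the paper concludes: reuse the current gaze position.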
12. Traffic danger detection by visual attention model of sparse sampling
Authors: 夏利民, 刘涛, 谭论正. Journal of Central South University (SCIE, EI, CAS, CSCD), 2015, Issue 10, pp. 3916-3924 (9 pages).
A method to detect traffic dangers based on a visual attention model with sparse sampling is proposed. A hemispherical sparse sampling model is used to reduce the amount of computation, which increases detection speed. A Bayesian probability model and a Gaussian kernel function are applied to calculate the saliency of traffic videos. A multiscale saliency method is used, with the final saliency taken as the average over all scales, which markedly increases detection rates. Detection results on several typical traffic dangers show that the proposed method achieves higher detection rates and speed, meeting the requirements of real-time detection of traffic dangers.
Keywords: traffic dangers, visual attention model, sparse sampling, Bayesian probability model, multiscale saliency
13. A video structural similarity quality metric based on a joint spatial-temporal visual attention model
Authors: Hua ZHANG, Xiang TIAN, Yao-wu CHEN. Journal of Zhejiang University-Science A (Applied Physics & Engineering) (SCIE, EI, CAS, CSCD), 2009, Issue 12, pp. 1696-1704 (9 pages).
Objective video quality assessment plays a very important role in multimedia signal processing. Several extensions of the structural similarity (SSIM) index cannot predict the quality of a video sequence effectively. In this paper we propose a structural similarity quality metric for videos based on a joint spatial-temporal visual attention model. The model acquires the motion-attended region and the distortion-attended region by computing motion features and distortion contrast. It mimics the visual attention shifting between the two attended regions and takes bursts of error into account by introducing non-linear weighting functions that give a much higher weighting factor to extremely damaged frames. The metric based on this model renders the final objective quality rating of the whole video sequence and is validated using the 50 Hz video sequences of the Video Quality Experts Group Phase I test database.
Keywords: quality assessment, structural similarity (SSIM) index, attended region, visual attention shift
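The SSIM index this metric extends compares luminance, contrast, and structure between a reference and a distorted image. Below is a global (single-window) SSIM sketch in NumPy using the standard constants for 8-bit images; production implementations compute it over local windows and average, which is not shown here.

```python
import numpy as np

def ssim_global(x, y, L=255.0):
    """Single-window SSIM between two images (standard K1=0.01, K2=0.03)."""
    C1, C2 = (0.01 * L) ** 2, (0.03 * L) ** 2    # stabilizing constants
    mx, my = x.mean(), y.mean()                  # luminance terms
    vx, vy = x.var(), y.var()                    # contrast terms
    cov = ((x - mx) * (y - my)).mean()           # structure term
    return ((2 * mx * my + C1) * (2 * cov + C2)) / \
           ((mx ** 2 + my ** 2 + C1) * (vx + vy + C2))

rng = np.random.default_rng(3)
img = rng.uniform(0, 255, size=(16, 16))
noisy = np.clip(img + rng.normal(0, 25, size=img.shape), 0, 255)
print(ssim_global(img, img))          # identical images score 1.0
print(ssim_global(img, noisy) < 1.0)  # distortion lowers the score
```

A video metric like the one above then pools per-frame SSIM values, and the paper's non-linear weighting amounts to letting the worst frames dominate that pooling.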
14. Effective Video Summarization Approach Based on Visual Attention
Authors: Hilal Ahmad, Habib Ullah Khan, Sikandar Ali, Syed Ijaz Ur Rahman, Fazli Wahid, Hizbullah Khattak. Computers, Materials & Continua (SCIE, EI), 2022, Issue 4, pp. 1427-1442 (16 pages).
Video summarization is applied to reduce redundancy and develop a concise representation of key frames in a video; more recently, video summaries have been produced through visual attention modeling. In these schemes, the frames that stand out visually are extracted as key frames based on human attention modeling theories. Schemes for modeling visual attention have proven to be effective for video summaries. Nevertheless, the high computational cost of such techniques restricts their usability in everyday situations. In this context, we propose a method based on a key frame extraction (KFE) technique built on an efficient and accurate visual attention model. The computational effort is minimized by utilizing dynamic visual highlighting based on the temporal gradient instead of traditional optical flow techniques. In addition, an efficient technique using the discrete cosine transform is utilized for static visual salience. The dynamic and static visual attention metrics are merged by means of a non-linear weighted fusion technique. Results of the system are compared with some existing state-of-the-art techniques in terms of accuracy. The experimental results of our proposed model indicate its efficiency and high standard in terms of key frame extraction.
Keywords: KFE, video summarization, visual saliency, visual attention model
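The computational shortcut described above, a temporal gradient instead of optical flow, plus a non-linear fusion of dynamic and static maps, can be sketched as follows. The static map and fusion exponent below are placeholder assumptions; the authors' DCT-based static saliency and their exact fusion function are not reproduced.

```python
import numpy as np

def temporal_gradient_saliency(prev_frame, frame):
    """Dynamic saliency as the absolute inter-frame intensity change."""
    return np.abs(frame.astype(float) - prev_frame.astype(float))

def fuse(static_map, dynamic_map, gamma=2.0):
    """Non-linear weighted fusion: emphasize locations strong in either map."""
    s = static_map / (static_map.max() + 1e-9)
    d = dynamic_map / (dynamic_map.max() + 1e-9)
    return (s ** gamma + d ** gamma) / 2.0

prev_frame = np.zeros((6, 6))
frame = np.zeros((6, 6))
frame[2, 3] = 200.0                                   # one moving pixel
dyn = temporal_gradient_saliency(prev_frame, frame)
fused = fuse(np.ones((6, 6)), dyn)
print(np.unravel_index(fused.argmax(), fused.shape))  # (2, 3): motion location wins
```

Frames whose fused saliency peaks highest relative to their neighbors are then natural key-frame candidates.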
15. Hand gesture tracking algorithm based on visual attention
Authors: 冯志全, 徐涛, 吕娜, 唐好魁, 蒋彦, 梁丽伟. Journal of Beijing Institute of Technology (EI, CAS), 2016, Issue 4, pp. 491-501 (11 pages).
During most of the interaction process, the operator focuses on the tracked 3D hand gesture model at the "interaction points" in the collision detection scene, such as "grasp" and "release", and on objects in the scene, without paying attention to the tracked 3D hand gesture model throughout the whole procedure. Thus, in this paper, a visual attention distribution model of the operator during "grasp", "translation", "release", and other basic operation procedures is first studied, and a 3D hand gesture tracking algorithm based on this distribution model is proposed. Using the algorithm, in periods with a low degree of visual attention, a pre-stored 3D hand gesture animation can be used to directly visualize the 3D hand gesture model in the interactive scene; in periods with a high degree of visual attention, an existing frame-by-frame tracking approach can be adopted to obtain the 3D gesture model. The results demonstrate that the proposed method can achieve real-time tracking of 3D hand gestures with an effective improvement in the efficiency, fluency, and availability of 3D hand gesture interaction.
Keywords: visual attention, 3D hand gesture tracking, hand gesture interaction
16. Entropy-based guidance and predictive modelling of pedestrians' visual attention in urban environment (Cited by 1)
Authors: Qixu Xie, Li Zhang. Building Simulation (SCIE, EI, CSCD), 2024, Issue 10, pp. 1659-1674 (16 pages).
Selective visual attention determines what pedestrians notice and ignore in the urban environment. If consistency exists between different individuals' visual attention, designers can modify designs using the underlying mechanisms to better meet user needs. However, the mechanism of pedestrians' visual attention remains poorly understood, and it is challenging to forecast which positions will attract pedestrians more in the urban environment. To address this gap, we employed 360° video and immersive virtual reality to simulate walking scenarios and recorded eye movements in 138 participants. Our findings reveal a remarkable consistency in fixation distribution across individuals, exceeding both chance and orientation bias. One driver of this consistency emerges as a strategy of information maximization, with participants tending to fixate on areas of higher local entropy. Additionally, we built the first eye movement dataset for panoramic videos of diverse urban walking scenes, and developed a predictive model to forecast pedestrians' visual attention with supervised deep learning. The predictive model aids designers in better understanding how pedestrians will visually interact with the urban environment during the design phase.
Keywords: visual attention, pedestrian, eye-tracking, local entropy, deep learning, urban ergonomics
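The local-entropy driver of fixations reported above can be computed directly: take the Shannon entropy of the intensity histogram within each image patch, which is higher where the scene is more visually informative. The patch size and bin count in this NumPy sketch are arbitrary choices, not the paper's settings.

```python
import numpy as np

def local_entropy(img, patch=4, bins=8):
    """Shannon entropy of the intensity histogram in each non-overlapping patch."""
    h, w = img.shape
    out = np.zeros((h // patch, w // patch))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            block = img[i*patch:(i+1)*patch, j*patch:(j+1)*patch]
            p, _ = np.histogram(block, bins=bins, range=(0.0, 1.0))
            p = p / p.sum()
            p = p[p > 0]                         # drop empty bins before the log
            out[i, j] = -(p * np.log2(p)).sum()
    return out

rng = np.random.default_rng(4)
img = np.full((8, 8), 0.5)                       # flat, low-information region
img[0:4, 4:8] = rng.uniform(size=(4, 4))         # textured, high-information patch
ent = local_entropy(img)
print(np.unravel_index(ent.argmax(), ent.shape))  # (0, 1): the textured patch
```

Under the information-maximization account, a fixation-prediction baseline is simply to rank patches by this entropy map.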
17. A Concise and Varied Visual Features-Based Image Captioning Model with Visual Selection
Authors: Alaa Thobhani, Beiji Zou, Xiaoyan Kui, Amr Abdussalam, Muhammad Asim, Naveed Ahmed, Mohammed Ali Alshara. Computers, Materials & Continua (SCIE, EI), 2024, Issue 11, pp. 2873-2894 (22 pages).
Image captioning has gained increasing attention in recent years. Visual characteristics found in input images play a crucial role in generating high-quality captions. Prior studies have used visual attention mechanisms to dynamically focus on localized regions of the input image, improving the effectiveness of identifying relevant image regions at each step of caption generation. However, providing image captioning models with the capability of selecting the most relevant visual features from the input image and attending to them can significantly improve the utilization of these features. Consequently, this leads to enhanced captioning network performance. In light of this, we present an image captioning framework that efficiently exploits the extracted representations of the image. Our framework comprises three key components: the Visual Feature Detector module (VFD), the Visual Feature Visual Attention module (VFVA), and the language model. The VFD module is responsible for detecting a subset of the most pertinent features from the local visual features, creating an updated visual features matrix. Subsequently, the VFVA directs its attention to the visual features matrix generated by the VFD, resulting in an updated context vector employed by the language model to generate an informative description. Integrating the VFD and VFVA modules introduces an additional layer of processing for the visual features, thereby contributing to enhancing the image captioning model's performance. Using the MS-COCO dataset, our experiments show that the proposed framework competes well with state-of-the-art methods, effectively leveraging visual representations to improve performance. The implementation code can be found here: https://github.com/althobhani/VFDICM (accessed on 30 July 2024).
Keywords: visual attention, image captioning, visual feature detector, visual feature visual attention
18. Wayfinding design in transportation architecture: are saliency models or designer visual attention a good predictor of passenger visual attention? (Cited by 3)
Authors: Ran Xu, Haishan Xia, Mei Tian. Frontiers of Architectural Research (CSCD), 2020, Issue 4, pp. 726-738 (13 pages).
In transportation architecture, wayfinding quality is a crucial factor in determining transfer efficiency and level of service. When developing architectural design concepts, designers often employ their own visual attention to imagine where passengers will look. A saliency model is a software program that can predict human visual attention. This research examined whether a saliency model or designer visual attention is a good predictor of passenger visual attention during wayfinding inside transportation architecture. Using a remote eye-tracking system, the eye movements of 29 participants watching 100 still images depicting different indoor scenes of transportation architecture were recorded and transformed into saliency maps to illustrate participants' visual attention. Participants were categorized as either "designers" or "laypeople" based on their architectural design expertise. Similarities were compared among the "designers'" visual attention, saliency model predictions, and "laypeople's" visual attention. The results showed that while the "designers'" visual attention was the best predictor of that of "laypeople", followed by the saliency models, a single designer's visual attention was not a good predictor. This divergence in visual attention highlights the limitations of designers in predicting passenger wayfinding behavior and implies that integrating a saliency model into practice can be beneficial for wayfinding design.
Keywords: transportation architecture design, passenger wayfinding, path choice, visual attention, eye fixation
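Comparing one attention map against another, as this study does between designers, laypeople, and saliency models, is commonly scored with Pearson's correlation coefficient (CC) between the two maps. The NumPy sketch below shows that metric; whether this particular study used CC or another similarity measure is not stated in the abstract.

```python
import numpy as np

def cc(map_a, map_b):
    """Pearson correlation between two attention/saliency maps."""
    a = (map_a - map_a.mean()) / (map_a.std() + 1e-12)
    b = (map_b - map_b.mean()) / (map_b.std() + 1e-12)
    return float((a * b).mean())

rng = np.random.default_rng(5)
fixations = rng.uniform(size=(10, 10))                       # ground-truth attention map
good_pred = fixations + rng.normal(0, 0.05, size=(10, 10))   # close prediction
bad_pred = rng.uniform(size=(10, 10))                        # unrelated prediction
print(cc(fixations, good_pred) > cc(fixations, bad_pred))    # True
```

CC is symmetric and scale-invariant, which makes it convenient for ranking predictors (e.g. a saliency model vs. a single designer's map) against the same ground-truth fixation map.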
19. EDVAM: a 3D eye-tracking dataset for visual attention modeling in a virtual museum
Authors: Yunzhan ZHOU, Tian FENG, Shihui SHUAI, Xiangdong LI, Lingyun SUN, Henry Been-Lirn DUH. Frontiers of Information Technology & Electronic Engineering (SCIE, EI, CSCD), 2022, Issue 1, pp. 101-112 (12 pages).
Predicting visual attention facilitates an adaptive virtual museum environment and provides a context-aware and interactive user experience. Explorations toward the development of a visual attention mechanism using eye-tracking data have so far been limited to 2D cases, and researchers have yet to approach this topic in a 3D virtual environment and from a spatiotemporal perspective. We present the first 3D Eye-tracking Dataset for Visual Attention modeling in a virtual Museum, known as EDVAM. In addition, a deep learning model is devised and tested with EDVAM to predict a user's subsequent visual attention from previous eye movements. This work provides a reference for visual attention modeling and context-aware interaction in the context of virtual museums.
Keywords: visual attention, virtual museums, eye-tracking datasets, gaze detection, deep learning
20. Effect of visual attention guidance by camera work in visualization using dome display
Authors: Tetsuro Ogi, Takeshi Yokota. International Journal of Modeling, Simulation, and Scientific Computing (EI), 2018, Issue 3, pp. 42-54 (13 pages).
Dome displays are expected to serve as effective visualization environments for modeling and simulation owing to their framelessness and highly immersive sensation. However, since users of a dome display can freely view the projected image in arbitrary directions, it is difficult to share information among viewers. In this research, to solve this problem, the effect of camera work on visual attention guidance in the dome environment was examined. As a visualization system, DomePlayer, which can express camera-work effects based on a camera-work description language, was developed. From the results of evaluation experiments using this system, the constraint conditions of camera work in the dome environment were derived and the effect of visual attention guidance by camera work was evaluated.
Keywords: visual attention guidance, dome display, 360° image, camera work, VR sickness