Funding: Supported by the Science and Technology Support Project of the Sichuan Science and Technology Department (2018SZ0357) and China Scholarship.
Abstract: A person’s eye gaze can effectively express that person’s intentions. Gaze estimation is therefore an important approach in intelligent manufacturing for analyzing a person’s intentions. Many gaze estimation methods regress the direction of the gaze by analyzing images of the eyes, also known as eye patches. However, constructing a person-independent model that estimates an accurate gaze direction for every person is very difficult because of individual differences. In this paper, we hypothesize that the difference in the appearance of each of a person’s eyes is related to the difference in the corresponding gaze directions. Based on this hypothesis, a differential eyes’ appearances network (DEANet) is trained on public datasets to predict the gaze differences of pairwise eye patches belonging to the same individual. The proposed DEANet is based on a Siamese neural network (SNNet) framework with two identical branches, each of which is fed by a multi-stream architecture. The two branches share the same weights and extract the features of the patches; the features are then concatenated to obtain the difference of the gaze directions. Once the differential gaze model is trained, a new person’s gaze direction can be estimated when a few calibrated eye patches for that person are provided. Because person-specific calibrated eye patches are involved in the testing stage, the estimation accuracy is improved. Furthermore, the problem of requiring a large amount of data to train a person-specific model is effectively avoided. A reference grid strategy is also proposed to select a few references as some of DEANet’s inputs directly based on the estimation values, thereby further improving the estimation accuracy. Experiments on public datasets show that the proposed approach outperforms state-of-the-art methods.
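As a rough illustration of the differential idea, the sketch below pairs a query eye patch with person-specific calibrated references and predicts the gaze as the reference gaze plus the estimated difference. The backbone, patch size, and averaging rule are illustrative assumptions, not the published DEANet architecture.

```python
# Minimal PyTorch sketch of a Siamese differential gaze network.
# All layer sizes and the 36x60 patch resolution are assumptions.
import torch
import torch.nn as nn

class DifferentialGazeNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Shared-weight branch: a small CNN encoder for a 36x60 grayscale eye patch.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(64 * 9 * 15, 128), nn.ReLU(),
        )
        # Head maps the concatenated pair features to a gaze difference (yaw, pitch).
        self.head = nn.Linear(2 * 128, 2)

    def forward(self, patch_a, patch_b):
        fa, fb = self.encoder(patch_a), self.encoder(patch_b)
        return self.head(torch.cat([fa, fb], dim=1))

@torch.no_grad()
def estimate_gaze(model, query_patch, ref_patches, ref_gazes):
    """Predict the query gaze as reference gaze + predicted difference,
    averaged over a few person-specific calibrated references."""
    preds = [g + model(query_patch, r).squeeze(0)
             for r, g in zip(ref_patches, ref_gazes)]
    return torch.stack(preds).mean(dim=0)
```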
Funding: Supported by the National Natural Science Foundation of China (No. 61932003) and the Fundamental Research Funds for the Central Universities.
Abstract: In recent years, deep learning techniques have been used to estimate gaze, a significant task in computer vision and human-computer interaction. Previous studies have made significant achievements in predicting 2D or 3D gaze from monocular face images. This study presents a deep neural network for 2D gaze estimation on mobile devices. It achieves state-of-the-art 2D gaze-point regression error while significantly improving gaze classification error on quadrant divisions of the display. To this end, an efficient attention-based module that correlates and fuses the contextual features of the left and right eyes is first proposed to improve gaze-point regression performance. Subsequently, through a unified perspective on gaze estimation, metric learning for gaze classification on quadrant divisions is incorporated as additional supervision. Consequently, both the gaze-point regression and quadrant classification performances are improved. Experiments demonstrate that the proposed method outperforms existing gaze estimation methods on the GazeCapture and MPIIFaceGaze datasets.
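The following minimal PyTorch sketch illustrates the general shape of such a design: cross-attention correlates the two eyes’ features, and a quadrant-classification head adds auxiliary supervision. For simplicity it uses a plain cross-entropy term rather than the paper’s metric-learning formulation; the layer sizes, token layout, and loss weight are all assumptions.

```python
# Illustrative sketch (not the paper's exact module): cross-attention fusion
# of left/right eye features plus an auxiliary quadrant-classification loss.
import torch
import torch.nn as nn

class EyeFusionGazeHead(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        # Cross-attention lets each eye's tokens attend to the other eye's context.
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.point_head = nn.Linear(2 * dim, 2)      # 2D gaze point (x, y)
        self.quadrant_head = nn.Linear(2 * dim, 4)   # which display quadrant

    def forward(self, left_feat, right_feat):
        # left_feat / right_feat: (batch, tokens, dim) from per-eye backbones.
        l2r, _ = self.attn(left_feat, right_feat, right_feat)
        r2l, _ = self.attn(right_feat, left_feat, left_feat)
        fused = torch.cat([l2r.mean(dim=1), r2l.mean(dim=1)], dim=1)
        return self.point_head(fused), self.quadrant_head(fused)

def joint_loss(point_pred, quad_logits, gaze_xy, lam=0.1):
    # Quadrant label derived from the signs of the (origin-centred) gaze point;
    # cross-entropy stands in here for the paper's metric-learning supervision.
    quad = (gaze_xy[:, 0] > 0).long() + 2 * (gaze_xy[:, 1] > 0).long()
    return nn.functional.smooth_l1_loss(point_pred, gaze_xy) + \
           lam * nn.functional.cross_entropy(quad_logits, quad)
```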
Funding: Supported by the National Natural Science Foundation of China (61772328).
Abstract: Gaze information is important for finding the region of interest (ROI), which indicates where the next action will happen. Supervised gaze estimation does not work on EPIC-Kitchens owing to the lack of ground truth. In this paper, we develop an unsupervised gaze estimation method that helps with egocentric action anticipation. We adopt the gaze map as a feature representation and input it into a multi-modality network jointly with red-green-blue (RGB), optical flow, and object features. We explore the method on the EGTEA dataset. The estimated gaze map is further optimized with dilation and a Gaussian filter, masked onto the original RGB frame, and encoded as the important gaze modality. Our results outperform the strong baseline Rolling-Unrolling LSTMs (RULSTM), with top-5 accuracy reaching 34.31% on the seen test set (S1) and 22.07% on the unseen test set (S2), improvements of 0.58% and 0.87%, respectively.
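A minimal sketch of the gaze-map post-processing step described above is given below, assuming OpenCV: the raw map is dilated, smoothed with a Gaussian filter, and used to mask the RGB frame. The kernel sizes are illustrative assumptions, not the paper’s settings.

```python
# Sketch of the gaze-map post-processing: dilate, Gaussian-smooth, then mask
# the RGB frame so the fixated region dominates the gaze modality.
import cv2
import numpy as np

def gaze_modality(rgb_frame: np.ndarray, gaze_map: np.ndarray) -> np.ndarray:
    """rgb_frame: HxWx3 uint8 image; gaze_map: HxW float map in [0, 1]."""
    mask = (gaze_map * 255).astype(np.uint8)
    mask = cv2.dilate(mask, np.ones((15, 15), np.uint8))  # grow the fixation region
    mask = cv2.GaussianBlur(mask, (31, 31), 0)            # soften the boundary
    weight = mask.astype(np.float32) / 255.0
    # Weight the RGB frame by the smoothed gaze map to emphasise the ROI.
    return (rgb_frame.astype(np.float32) * weight[..., None]).astype(np.uint8)
```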
Funding: Supported by the Key Project of the National Language Commission (No. ZDI145-110), the Academic Research Projects of Beijing Union University (No. ZK20202514), the Key Laboratory Project (No. YYZN-2024-6), and the Project for the Construction and Support of High-Level Innovative Teams in Beijing Municipal Institutions (No. BPHR20220121).
Abstract: Gaze estimation, a crucial non-verbal communication cue, has achieved remarkable progress through convolutional neural networks. However, accurate gaze prediction in unconstrained environments, particularly under extreme head poses, partial occlusions, and abnormal lighting, remains challenging. Existing models often struggle to focus effectively on discriminative ocular features, leading to suboptimal performance. To address these limitations, this paper proposes a novel dual-branch gaze estimation algorithm with Gaussian mixture distribution heatmaps and a dynamic adaptive loss function (DMGDL). Gaussian mixture distribution heatmaps centered on pupil positions serve as spatial attention guides, enabling the model to prioritize ocular regions. Additionally, a dual-branch network architecture is designed to extract features separately for the yaw and pitch angles, enhancing flexibility and mitigating cross-angle interference. A dynamic adaptive loss function is further formulated to address discontinuities in angle estimation, improving robustness and convergence stability. Experimental evaluations on three benchmark datasets demonstrate that DMGDL outperforms state-of-the-art methods, achieving a mean angular error of 3.98° on the Max Planck Institute for Informatics face gaze (MPIIFaceGaze) dataset, 10.21° on the physically unconstrained gaze estimation in the wild (Gaze360) dataset, and 6.14° on the real-time eye gaze estimation in natural environments (RT-GENE) dataset, exhibiting superior generalization and robustness.
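The heatmap-as-attention idea can be sketched as follows: one Gaussian per pupil centre forms a mixture heatmap that spatially reweights image features. The standard deviation and the equal mixture weights are assumptions for illustration, not the paper’s parameterization.

```python
# Sketch of a Gaussian-mixture attention heatmap built from pupil centres.
import torch

def pupil_mixture_heatmap(h, w, pupils, sigma=8.0):
    """pupils: list of (x, y) pupil centres in pixel coordinates."""
    ys = torch.arange(h).view(h, 1).float()
    xs = torch.arange(w).view(1, w).float()
    heat = torch.zeros(h, w)
    for (px, py) in pupils:  # equal-weight mixture over both eyes
        heat += torch.exp(-((xs - px) ** 2 + (ys - py) ** 2) / (2 * sigma ** 2))
    return heat / heat.max()

# Usage: multiply a feature map (C, H, W) by the heatmap as spatial attention.
feat = torch.randn(64, 96, 160)
attn = pupil_mixture_heatmap(96, 160, pupils=[(48, 40), (112, 40)])
attended = feat * attn.unsqueeze(0)
```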
Funding: Supported by the National Natural Science Foundation of China (61772468, 62172368) and the Fundamental Research Funds for the Provincial Universities of Zhejiang (RF-B2019001).
Abstract: Background: Eye-tracking technology for mobile devices has made significant progress. However, owing to limited computing capacity and the complexity of the context, conventional image feature-based techniques cannot extract features accurately, which degrades performance. Methods: This study proposes a novel approach that combines appearance-based and feature-based eye-tracking methods. Face and eye region detection was conducted to obtain inputs for the appearance model, which detects the feature points. The feature points were used to generate feature vectors, such as the corner-center-to-pupil-center vector, from which the gaze fixation coordinates were calculated. Results: To identify the feature vectors with the best performance, we compared different vectors under different image resolutions and illumination conditions; the best average gaze fixation accuracy, a visual angle of 1.93°, was achieved at an image resolution of 96×48 pixels with light sources illuminating from the front of the eye. Conclusions: Compared with current methods, our method improved the accuracy of gaze fixation and was more usable.
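As a sketch of the feature-vector mapping stage, the snippet below fits a second-order polynomial from corner-center-to-pupil-center vectors to screen coordinates using calibration samples. The polynomial order and the least-squares calibration are assumptions for illustration, not the paper’s exact procedure.

```python
# Sketch: map a 2D eye feature vector to a screen fixation point via a
# polynomial regression fitted on calibration samples.
import numpy as np

def poly_features(v):
    """v: Nx2 feature vectors -> Nx6 second-order polynomial design matrix."""
    x, y = v[:, 0], v[:, 1]
    return np.stack([np.ones_like(x), x, y, x * y, x ** 2, y ** 2], axis=1)

def fit_gaze_mapping(vectors, screen_pts):
    """vectors: Nx2 corner-to-pupil vectors; screen_pts: Nx2 fixation targets."""
    coeffs, *_ = np.linalg.lstsq(poly_features(vectors), screen_pts, rcond=None)
    return coeffs  # 6x2 mapping matrix

def predict_fixation(coeffs, vector):
    """vector: single 2D feature vector -> predicted 1x2 screen coordinate."""
    return poly_features(vector[None, :]) @ coeffs
```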
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 52027806, 52435005, 92248304, 52075191).
Abstract: As humans and robots work more closely together than ever, anthropomorphic robotic arms with intuitive human-robot interaction interfaces have drawn massive attention for improving the quality of robot-assisted manipulation. In pursuit of this, we designed a dedicated 7-degrees-of-freedom (DoF) anthropomorphic robotic arm with three compact differential joints and a head-mounted gaze tracker enabling head-pose-tracked 3D gaze estimation. Two key challenges were addressed to achieve accurate robot-assisted manipulation of the object indicated by the direction of human gaze. First, a novel predictive pupil feature was proposed for 3D gaze estimation. Unlike most existing features, which are subject to the common paraxial approximation assumption, the proposed feature accounts for light refraction at the two corneal surfaces with a more realistic eye model, significantly improving 3D gaze estimation accuracy when the eyeball rotates at large angles. Second, a novel optimization-based approach was developed to efficiently compensate for the posture errors of the designed 7-DoF anthropomorphic robotic arm for accurate manipulation. Compared with existing Jacobian-based or optimization-based approaches that use nominal joint values as the initial iterate, the proposed approach computes an optimal initial iterate and achieves faster convergence for real-time posture error compensation. With posture errors compensated in real time and 3D gaze estimated accurately, users can intuitively command accurate robot-assisted manipulation with their eyes. The proposed system was successfully tested on five healthy subjects.
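The posture-compensation idea can be sketched as follows: compute an initial iterate from a linearized least-squares correction rather than starting from the nominal joint values, then refine by minimizing the pose error. The forward-kinematics placeholder, the finite-difference Jacobian, and the solver choice are assumptions for illustration only, not the authors’ formulation.

```python
# Conceptual sketch of optimization-based posture-error compensation with a
# computed initial iterate. A real system would substitute the 7-DoF arm's
# kinematic model for the linear placeholder below.
import numpy as np
from scipy.optimize import minimize

def forward_kinematics(q):
    # Placeholder FK: returns a 6-vector pose (position + orientation)
    # for 7 joint angles q; fixed seed keeps the stand-in model consistent.
    A = np.random.default_rng(0).normal(size=(6, 7))
    return A @ q

def compensate(q_nominal, target_pose):
    # Initial iterate: linearise FK around the nominal joints (finite-difference
    # Jacobian) and solve the least-squares correction, instead of iterating
    # from q_nominal itself.
    J = np.column_stack([(forward_kinematics(q_nominal + 1e-5 * e) -
                          forward_kinematics(q_nominal)) / 1e-5
                         for e in np.eye(7)])
    dq0, *_ = np.linalg.lstsq(J, target_pose - forward_kinematics(q_nominal),
                              rcond=None)
    # Refine: minimise the squared pose error from the computed initial iterate.
    err = lambda q: np.sum((forward_kinematics(q) - target_pose) ** 2)
    return minimize(err, q_nominal + dq0, method="L-BFGS-B").x
```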