Gaze estimation, a crucial non-verbal communication cue, has achieved remarkable progress through convolutional neural networks. However, accurate gaze prediction in unconstrained environments, particularly under extreme head poses, partial occlusions, and abnormal lighting, remains challenging. Existing models often struggle to focus on discriminative ocular features, leading to suboptimal performance. To address these limitations, this paper proposes dual-branch gaze estimation with Gaussian mixture distribution heatmaps and dynamic adaptive loss function (DMGDL), a novel dual-branch gaze estimation algorithm. By introducing Gaussian mixture distribution heatmaps centered on pupil positions as spatial attention guides, the model is enabled to prioritize ocular regions. Additionally, a dual-branch network architecture is designed to separately extract features for yaw and pitch angles, enhancing flexibility and mitigating cross-angle interference. A dynamic adaptive loss function is further formulated to address discontinuities in angle estimation, improving robustness and convergence stability. Experimental evaluations on three benchmark datasets demonstrate that DMGDL outperforms state-of-the-art methods, achieving a mean angular error of 3.98° on the Max Planck Institute for Informatics face gaze (MPIIFaceGaze) dataset, 10.21° on the physically unconstrained gaze estimation in the wild (Gaze360) dataset, and 6.14° on the real-time eye gaze estimation in natural environments (RT-Gene) dataset, exhibiting superior generalization and robustness.
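As an illustration of the heatmap idea in the abstract above, the following minimal sketch builds a Gaussian mixture heatmap with one isotropic component per pupil. The function name, grid size, pupil coordinates, and sigma are assumptions chosen for illustration; the abstract does not state how DMGDL parameterises its heatmaps.

```python
import math

def gaussian_mixture_heatmap(height, width, centers, sigma=2.0):
    """Sum one isotropic Gaussian per (row, col) center and scale the peak to 1.

    All parameters here are illustrative assumptions, not values from the paper.
    """
    heat = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            heat[r][c] = sum(
                math.exp(-((r - cr) ** 2 + (c - cc) ** 2) / (2 * sigma ** 2))
                for cr, cc in centers
            )
    peak = max(max(row) for row in heat)
    return [[v / peak for v in row] for row in heat]

# Hypothetical pupil positions (row, col) on a small 9 x 16 grid.
heat = gaussian_mixture_heatmap(9, 16, [(4, 4), (4, 11)])
```

Multiplying a feature map element-wise by such a heatmap is one simple way to emphasise ocular regions; whether the paper applies the heatmap this way is not stated in the abstract.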
Dear Editor, Dorsal pontine lesions may cause a variety of complex neuro-ophthalmic deficits, including horizontal gaze palsy (HGP), internuclear ophthalmoplegia, one-and-a-half syndrome, abducens nerve palsy, skew deviation, or any combination of these. Here we present a rare case of an adult patient who developed multiple complicated clinical manifestations after surgical removal of a pontine cavernous hemangioma (PCH). Our case highlights that a single pontine lesion may involve complicated neural pathways and result in complicated symptoms and signs, in which abducens nerve palsy or skew deviation is easily missed when combined with HGP.
Orthogonal time frequency space (OTFS) modulation is a novel modulation scheme that can effectively cope with the high Doppler spread caused by high mobility. Since it modulates data in the delay-Doppler (DD) domain and makes full use of the sparsity of that domain, it has been widely studied for the design of efficient channel estimation and signal detection schemes. In this paper, we design a novel superimposed pilot pattern with a transition band, which replaces the guard zero-symbols of the traditional embedded pilot (EP) scheme, and perform a two-stage channel estimation. In the first stage, we exploit the dispersion characteristics of the OTFS signal in the DD domain and use a threshold decision to obtain a coarse channel estimate. In the second stage, we use the coarse estimate for iterative signal detection and accurate channel estimation. During the second stage, we make full use of the sparsity of the channel in the DD domain, remodel the received signal as a sparse channel vector multiplied by a channel coefficient matrix, and introduce a Doppler index segmentation factor (DISF) to subdivide the Doppler index and thereby handle fractional Doppler. Simulations show that the proposed scheme has higher spectral efficiency than the traditional EP scheme and a lower peak-to-average power ratio (PAPR) than the traditional superimposed pilot scheme.
In GNSS-denied environments, signals of opportunity (SOP) offer an efficient and passive solution for navigation and positioning by utilizing ambient signals. Nevertheless, conventional SOP techniques face significant challenges in real-time processing, especially under sub-Nyquist sampling conditions, due to high data acquisition rates and off-grid errors. To address this, this paper proposes the signal reconstruction and kernel sparse encoding (SRKSE) model, a novel general framework for high-precision parameter estimation. By combining compressed sensing with a deep unfolding network, the SRKSE model not only achieves robust signal reconstruction but also effectively reduces quantization errors. Key innovations of SRKSE include dual cross-attention mechanisms for enhanced feature extraction, sinc sparse kernel encoding to minimize quantization errors, and a custom loss function for balanced optimization. With these advancements, SRKSE achieves up to a 650-fold improvement in time of arrival (TOA) estimation accuracy while operating at just 1% of the Nyquist sampling rate. SRKSE surpasses both conventional and deep learning-based techniques in accuracy and efficiency, especially under sub-Nyquist sampling conditions. Simulations and real-world experiments confirm the reliability and potential of SRKSE for real-time applications in IoT and wireless communication.
The growing use of lithium-ion batteries in electric transportation and grid-scale storage systems has intensified the need for accurate and highly generalizable state-of-health (SOH) estimation. Conventional approaches often suffer from reduced accuracy under dynamically uncertain state-of-charge (SOC) operating ranges and heterogeneous aging stresses. This study presents a unified SOH estimation framework that integrates physics-informed modeling, subspace identification, and Transformer-based learning. A reduced-order model is derived from simplified electrochemical dynamics, providing an interpretable and computationally efficient representation of battery behavior. Subspace identification across a wide SOC and SOH range yields degradation-sensitive features, which the Transformer uses to capture long-range aging dynamics via multi-head self-attention. Experiments on LiFePO4 cells under joint-cell training show consistently accurate SOH estimation, with a maximum error of 1.39%, demonstrating the framework's effectiveness in decoupling SOC and SOH effects. In cross-cell validation, where training and validation are performed on different cells, the model maintains a maximum error of 2.06%, confirming strong generalization to unseen aging trajectories. Comparative experiments on LiFePO4 and public LiCoO2 datasets confirm the framework's cross-chemistry applicability. By extracting low-dimensional, physically interpretable features via subspace identification, the framework significantly reduces training cost while maintaining high SOH estimation accuracy, outperforming conventional data-driven models that lack physical guidance.
This study presents a novel method for estimating the depth of a single underwater source in shallow water using vector sensors. The approach leverages the depth distribution of the broadband Stokes parameters to estimate source depth accurately. Unlike traditional matched field processing (MFP) and matched mode processing (MMP), the proposed approach can estimate source depth directly from the data received by the sensors without requiring complete environmental information. First, the broadband Stokes parameters (BSP) are established using normal mode theory. Then the nonstationary phase approximation is used to simplify the theoretical derivation, which is necessary when dealing with broadband integrals. Additionally, the range terms of the BSP are eliminated by normalization. Analysis of the depth distribution of the normalized broadband Stokes parameters (NBSP) shows that the NBSP exhibit extreme values at the source depth, which can be used for source depth estimation. The proposed depth estimation method is therefore based on searching for the peaks of the NBSP. Simulations show that this method is effective in relatively simple shallow water environments. Finally, the effects of source range, frequency bandwidth, sound speed profile (SSP), water depth, and signal-to-noise ratio (SNR) are studied. The findings indicate that the proposed method can accurately estimate the source depth when the SNR is greater than -5 dB and does not need to consider model mismatch issues. Additionally, variations in environmental parameters have minimal impact on estimation accuracy. Compared to MFP, the proposed method requires a higher SNR but demonstrates superior robustness against fluctuations in environmental parameters.
Considering the impact of terminal impact time constraints and the state information of maneuvering targets on the guidance accuracy in multi-UAV cooperative guidance, this paper proposes an impact time cooperative control guidance law (ITCCG) that combines the optimal error dynamics with an improved adaptive cubature Kalman filter (IACKF) algorithm. First, a terminal impact time feedback term is introduced into proportional navigation guidance based on the relative virtual guidance model, and terminal time control is achieved through optimal error dynamics. Then, the Huber loss function is used to reduce the impact of measurement outliers, and diagonal decomposition is applied to address the issue of non-positive definite matrices that cannot undergo Cholesky decomposition. Finally, the ITCCG and IACKF algorithms are combined to achieve multi-UAV time-cooperative guidance based on maneuvering target state estimation. Simulation results show that the proposed algorithm effectively reduces the target state estimation error and achieves cooperative guidance within the desired time frame.
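The role of the Huber loss mentioned in the abstract above can be seen in a minimal scalar sketch. It shows only the standard Huber function and its influence (derivative), with a threshold `delta` chosen arbitrarily; how the IACKF embeds this loss in the filter update is not described in the abstract and is not reproduced here.

```python
def huber(residual, delta=1.0):
    """Standard Huber loss: quadratic for small residuals, linear for large ones."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)

def huber_influence(residual, delta=1.0):
    """Derivative of the Huber loss; it saturates at +/- delta for outliers."""
    return max(-delta, min(delta, residual))

# A hypothetical innovation sequence containing one large outlier.
innovations = [0.2, -0.4, 8.0]
clipped = [huber_influence(e) for e in innovations]
```

Because the influence saturates, the outlier 8.0 contributes no more to the update than a residual of 1.0 would, which is the intuition behind using the loss to limit the effect of measurement outliers.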
The 6D pose estimation of objects is of great significance for the intelligent assembly and sorting of industrial parts. In industrial robot production scenarios, the 6D pose estimation of industrial parts mainly faces two challenges: one is the loss of information and the interference caused by occlusion and stacking in sorting scenarios, and the other is the difficulty of feature extraction due to the weak texture of industrial parts. To address these problems, this paper proposes an attention-based pixel-level voting network for 6D pose estimation of weakly textured industrial parts, namely CB-PVNet. On the one hand, the voting scheme can predict the keypoints of affected pixels, which improves the accuracy of keypoint localization even in scenarios such as weak texture and partial occlusion. On the other hand, the attention mechanism can extract features of interest from the object while suppressing irrelevant features of the surroundings. Extensive comparative experiments were conducted on both public datasets (LINEMOD, Occlusion LINEMOD, and T-LESS) and self-made datasets. The experimental results indicate that CB-PVNet achieves ADD(-S) accuracy comparable to the state of the art using only RGB images while ensuring real-time performance. Additionally, we conducted robot grasping experiments in the real world. The balance between accuracy and computational efficiency makes the method well suited for applications in industrial automation.
(Quasi-)closed-form results for the statistical properties of unmanned aerial vehicle (UAV) air-to-ground channels are derived for the first time using a novel spatial-vector-based method from a three-dimensional (3-D) arbitrary-elevation one-cylinder model. The derived results include a closed-form expression for the space-time correlation function and quasi-closed-form expressions for the space-Doppler power spectrum density, the level crossing rate, and the average fading duration, which are shown to be generalizations of those previously obtained from the two-dimensional (2-D) one-ring model and the 3-D low-elevation one-cylinder model for terrestrial mobile-to-mobile channels. The close agreement between the theoretical results and both the simulations and the measurements validates the utility of the derived channel statistics. Based on the derived expressions, the impacts of some parameters on the channel characteristics are investigated in an effective, efficient, and explicable way, which leads to a general guideline on manual parameter estimation from the measurement description.
Accurate estimation of photovoltaic (PV) parameters is essential for optimizing solar module performance and enhancing resource efficiency in renewable energy systems. This study presents a process innovation by introducing, for the first time, the Triangulation Topology Aggregation Optimizer (TTAO) integrated with parallel computing to address PV parameter estimation challenges. The effectiveness and robustness of TTAO are rigorously evaluated using two standard benchmark datasets (KC200GT and R.T.C. France solar cells) and a real-world dataset (Poly70W solar module) under single-, double-, and triple-diode configurations. Results show that TTAO consistently achieves superior accuracy by producing the lowest RMSE values and faster convergence compared to state-of-the-art metaheuristic algorithms. In addition, the integration of parallel computing significantly enhances computational efficiency, reducing execution time by up to 85% without compromising accuracy. Validation using real-world data further demonstrates TTAO's adaptability and practical relevance in renewable energy systems, effectively bridging the gap between theoretical modeling and real-world implementation for PV system monitoring and optimization, and contributing to climate mitigation through improved solar energy performance.
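To make the RMSE objective in the abstract above concrete, here is a minimal sketch of the fitness function that a metaheuristic would minimise for the single-diode configuration. It uses the standard single-diode equation; the parameter values, the temperature, and the synthetic data points are hypothetical, and the TTAO search itself is not shown because the abstract does not describe it.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
Q_E = 1.602176634e-19   # elementary charge, C

def sdm_current(v, i, iph, i0, rs, rsh, n, temp_k=298.15):
    """Right-hand side of the single-diode model, evaluated at a measured (v, i)."""
    vt = K_B * temp_k / Q_E  # thermal voltage
    return iph - i0 * (math.exp((v + i * rs) / (n * vt)) - 1.0) - (v + i * rs) / rsh

def rmse(params, data, temp_k=298.15):
    """Root mean square error between modelled and measured current."""
    iph, i0, rs, rsh, n = params
    sq = [(sdm_current(v, i, iph, i0, rs, rsh, n, temp_k) - i) ** 2 for v, i in data]
    return math.sqrt(sum(sq) / len(sq))

# Hypothetical parameters (Iph, I0, Rs, Rsh, n); Rs = 0 keeps the model explicit
# so that synthetic "measurements" can be generated directly.
true_params = (0.76, 3e-7, 0.0, 50.0, 1.5)
data = [(v, sdm_current(v, 0.0, *true_params)) for v in (0.0, 0.2, 0.4)]
perturbed = (0.70, 3e-7, 0.0, 50.0, 1.5)
```

An optimizer would call `rmse(candidate, data)` for each candidate parameter vector; the double- and triple-diode configurations add further exponential terms to the same residual.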
We investigated the impact of convexity and isoperimetric deficits on the accuracy of sectional area estimates of tree stems using traditional methods (caliper, tape, formulas based on stem diameter and circumference). In two complementary experiments, the use of photographs to estimate cross-sectional areas was first validated, then the use of a caliper and diameter tape was computer-simulated. The results indicated that the photographic method offers high precision, with mean relative errors below 0.1%, minimal deviation, and no significant bias, whereas the traditional methods led to substantial and systematic errors, with deviations from circularity and convexity significantly increasing the errors in area estimation.
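The systematic error of tape and caliper estimates can be illustrated with a small worked example. The sketch below assumes an elliptical cross-section (a convex shape, so it shows only the isoperimetric deficit, not the convexity deficit) and uses Ramanujan's perimeter approximation; the semi-axes are hypothetical and the study's own simulation procedure is not reproduced.

```python
import math

def ellipse_perimeter(a, b):
    """Ramanujan's approximation to the perimeter of an ellipse with semi-axes a, b."""
    return math.pi * (3 * (a + b) - math.sqrt((3 * a + b) * (a + 3 * b)))

def area_from_tape(circumference):
    """Area implied by a girth tape reading, assuming a circular section."""
    return circumference ** 2 / (4 * math.pi)

def area_from_caliper(d1, d2):
    """Area from the mean of two perpendicular caliper diameters, assuming a circle."""
    d = (d1 + d2) / 2
    return math.pi * d ** 2 / 4

a, b = 12.0, 8.0                      # hypothetical semi-axes in cm
true_area = math.pi * a * b           # exact area of the ellipse
tape_area = area_from_tape(ellipse_perimeter(a, b))
caliper_area = area_from_caliper(2 * a, 2 * b)
tape_error_pct = 100 * (tape_area - true_area) / true_area
caliper_error_pct = 100 * (caliper_area - true_area) / true_area
```

For any non-circular section the tape estimate overshoots the true area, as the isoperimetric inequality requires, and the arithmetic-mean caliper estimate overshoots as well; both errors vanish only when a equals b.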
A scheme is proposed based on a Mach-Zehnder interferometer with high phase sensitivity, utilizing a two-mode squeezed coherent state, generated by four-wave mixing, as input. The phase sensitivity of this scheme easily surpasses the Heisenberg limit when intensity difference detection is applied. Under phase-matching conditions, the quantum Cramér-Rao bound significantly exceeds the Heisenberg limit. Additionally, the scheme exhibits robustness against photon loss. When compared with the modified SU(1,1) interferometer with two coherent state inputs, this approach demonstrates superior measurement sensitivity, evaluated through various detection methods and the quantum Cramér-Rao bound. This work holds potential applications in quantum metrology.
Accurate time delay estimation of target echo signals is a critical component of underwater target localization. In active sonar systems, echo signal processing is vulnerable to the effects of reverberation and noise in the maritime environment. This paper proposes a novel method for estimating target time delay using multi-bright spot echoes, assuming the target's size and depth are known. To enhance the extraction of geometric features from the target echoes and mitigate the impact of reverberation and noise, the proposed approach employs the fractional order Fourier transform-frequency sliced wavelet transform to extract multi-bright spot echoes. Using the highlighting model theory and the target size information, an observation matrix is constructed to represent multi-angle incident signals and obtain the theoretical scattered echo signals from different angles. To estimate the target's time delay accurately, waveform similarity coefficients and mean square error values between the theoretical return signals and the received signals are computed across various incident angles and time delays. Simulation results show that, compared to the conventional matched filter, the proposed algorithm reduces the relative error by 65.9% to 91.5% at a signal-to-noise ratio of -25 dB, and by 66.7% to 88.9% at a signal-to-reverberation ratio of -10 dB. This algorithm provides a new approach for the precise localization of submerged targets in shallow water environments.
Accurate parameter extraction of photovoltaic (PV) models plays a critical role in enabling precise performance prediction, optimal system sizing, and effective operational control under diverse environmental conditions. While a wide range of metaheuristic optimisation techniques have been applied to this problem, many existing methods are hindered by slow convergence rates, susceptibility to premature stagnation, and reduced accuracy when applied to complex multi-diode PV configurations. These limitations can lead to suboptimal modelling, reducing the efficiency of PV system design and operation. In this work, we propose an enhanced hybrid optimisation approach, the modified Spider Wasp Optimization (mSWO) with Opposition-Based Learning algorithm, which integrates the exploration and exploitation capabilities of the Spider Wasp Optimization (SWO) metaheuristic with the diversity-enhancing mechanism of Opposition-Based Learning (OBL). The hybridisation is designed to dynamically expand the search space coverage, avoid premature convergence, and improve both convergence speed and precision in high-dimensional optimisation tasks. The mSWO algorithm is applied to three well-established PV configurations: the single diode model (SDM), the double diode model (DDM), and the triple diode model (TDM). Real experimental current-voltage (I-V) datasets from a commercial PV module under standard test conditions (STC) are used for evaluation. Comparative analysis is conducted against eighteen advanced metaheuristic algorithms, including BSDE, RLGBO, GWOCS, MFO, EO, TSA, and SCA. Performance metrics include minimum, mean, and maximum root mean square error (RMSE), standard deviation (SD), and convergence behaviour over 30 independent runs. The results reveal that mSWO consistently delivers superior accuracy and robustness across all PV models, achieving the lowest RMSE values of 0.000986022 (SDM), 0.000982884 (DDM), and 0.000982529 (TDM), with minimal SD values, indicating remarkable repeatability. Convergence analyses further show that mSWO reaches optimal solutions more rapidly and with fewer oscillations than all competing methods, with the performance gap widening as model complexity increases. These findings demonstrate that mSWO provides a scalable, computationally efficient, and highly reliable framework for PV parameter extraction. Its adaptability to models of growing complexity suggests strong potential for broader applications in renewable energy systems, including performance monitoring, fault detection, and intelligent control, thereby contributing to the optimisation of next-generation solar energy solutions.
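The Opposition-Based Learning step named in the abstract above can be sketched in a few lines. The sketch shows the basic OBL rule (reflecting a candidate through the centre of its bounds) and a keep-the-best selection; the helper names, bounds, and toy cost function are hypothetical, and how mSWO schedules this step inside the Spider Wasp update is not stated in the abstract.

```python
def opposite(x, lower, upper):
    """Basic opposition-based learning: reflect each coordinate within its bounds."""
    return [lo + hi - xi for xi, lo, hi in zip(x, lower, upper)]

def obl_select(population, lower, upper, cost):
    """Evaluate each candidate and its opposite, then keep the best len(population)."""
    candidates = population + [opposite(x, lower, upper) for x in population]
    return sorted(candidates, key=cost)[:len(population)]

# Toy example with a hypothetical quadratic cost whose minimum is at the origin.
lower, upper = [0.0, 0.0], [1.0, 10.0]
population = [[0.9, 9.0], [0.6, 2.0]]
survivors = obl_select(population, lower, upper, cost=lambda x: x[0] ** 2 + x[1] ** 2)
```

In a PV setting the cost would be the RMSE between the modelled and measured I-V curve, and the bounds would be the admissible ranges of the diode-model parameters.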
A person's eye gaze can effectively express that person's intentions. Thus, gaze estimation is an important approach in intelligent manufacturing to analyze a person's intentions. Many gaze estimation methods regress the direction of the gaze by analyzing images of the eyes, also known as eye patches. However, it is very difficult to construct a person-independent model that can estimate an accurate gaze direction for every person due to individual differences. In this paper, we hypothesize that the difference in the appearance of each of a person's eyes is related to the difference in the corresponding gaze directions. Based on this hypothesis, a differential eyes' appearances network (DEANet) is trained on public datasets to predict the gaze differences of pairwise eye patches belonging to the same individual. Our proposed DEANet is based on a Siamese neural network (SNNet) framework which has two identical branches. A multi-stream architecture is fed into each branch of the SNNet. Both branches of the DEANet, which share the same weights, extract the features of the patches; then the features are concatenated to obtain the difference of the gaze directions. Once the differential gaze model is trained, a new person's gaze direction can be estimated when a few calibrated eye patches for that person are provided. Because person-specific calibrated eye patches are involved in the testing stage, the estimation accuracy is improved. Furthermore, the problem of requiring a large amount of data when training a person-specific model is effectively avoided. A reference grid strategy is also proposed in order to select a few references as some of the DEANet's inputs directly based on the estimation values, thereby further improving the estimation accuracy. Experiments on public datasets show that our proposed approach outperforms the state-of-the-art methods.
Gaze estimation has become an important field of image and information processing. Estimating gaze from full-face images using convolutional neural networks (CNNs) has achieved good accuracy. However, estimating gaze from eye images is very challenging because eye images contain less information than full-face images, and it remains important because eye-image-based methods have wider applications. In this paper, we propose the discretization-gaze network (DGaze-Net) to improve monocular three-dimensional (3D) gaze estimation accuracy through feature discretization and an attention mechanism. The gaze predictor of DGaze-Net is optimized based on feature discretization. By discretizing the gaze angle into K bins, a classification constraint is added to the gaze predictor. In the gaze predictor, a binned classification is applied to the gaze angle before regression against the real gaze angle to improve gaze estimation accuracy. In addition, the attention mechanism is applied to the backbone to enhance its ability to extract eye features related to gaze. The proposed method is validated on three gaze datasets and achieves encouraging gaze estimation accuracy.
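The idea of discretizing the gaze angle into K bins can be sketched as follows. The angle range, the value of K, and the softmax-expectation step used to turn bin scores back into a continuous angle are assumptions made for illustration; the abstract does not specify how DGaze-Net defines its bins or combines classification with regression.

```python
import math

def angle_to_bin(angle_deg, lo=-90.0, hi=90.0, k=90):
    """Map a continuous angle to one of k equal-width bins (classification target)."""
    width = (hi - lo) / k
    idx = int((angle_deg - lo) // width)
    return min(max(idx, 0), k - 1)  # clamp so that angle == hi falls in the last bin

def expected_angle(logits, lo=-90.0, hi=90.0):
    """Softmax-weighted mean of bin centers: a continuous angle from bin scores."""
    k = len(logits)
    width = (hi - lo) / k
    m = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    centers = [lo + (i + 0.5) * width for i in range(k)]
    return sum(p / total * c for p, c in zip(exps, centers))

# Hypothetical network output that strongly favours the bin containing 11 degrees.
logits = [0.0] * 90
logits[angle_to_bin(11.0)] = 50.0
predicted = expected_angle(logits)
```

A training loss built this way could combine a cross-entropy term on the bin index with a regression term on the expected angle, so the classification acts as the constraint described in the abstract.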
In recent years, deep learning techniques have been used to estimate gaze, a significant task in computer vision and human-computer interaction. Previous studies have made significant achievements in predicting 2D or 3D gazes from monocular face images. This study presents a deep neural network for 2D gaze estimation on mobile devices. It achieves state-of-the-art 2D gaze point regression error, while significantly improving gaze classification error on quadrant divisions of the display. To this end, an efficient attention-based module that correlates and fuses the left and right eye contextual features is first proposed to improve gaze point regression performance. Subsequently, through a unified perspective for gaze estimation, metric learning for gaze classification on quadrant divisions is incorporated as additional supervision. Consequently, both gaze point regression and quadrant classification performances are improved. The experiments demonstrate that the proposed method outperforms existing gaze estimation methods on the GazeCapture and MPIIFaceGaze datasets.
Gaze information is important for finding the region of interest (ROI), which indicates where the next action will happen. Supervised gaze estimation does not work on EPIC-Kitchens because ground truth is lacking. In this paper, we develop an unsupervised gaze estimation method that helps with egocentric action anticipation. We adopt the gaze map as a feature representation and feed it into a multi-modality network jointly with red-green-blue (RGB), optical flow, and object features. We explore the method on the EGTEA dataset. The estimated gaze map is further optimized with dilation and a Gaussian filter, masked onto the original RGB frame, and encoded as the gaze modality. Our results outperform the strong baseline Rolling-Unrolling LSTMs (RULSTM), with top-5 accuracy reaching 34.31% on the seen test set (S1) and 22.07% on the unseen test set (S2). The accuracy is improved by 0.58% and 0.87%, respectively.
Two-dimensional endoscopic images are susceptible to interferences such as specular reflections and monotonous texture illumination, hindering accurate three-dimensional lesion reconstruction by surgical robots. This study proposes a novel end-to-end disparity estimation model to address these challenges. Our approach combines a Pseudo-Siamese neural network architecture with pyramid dilated convolutions, integrating multi-scale image information to enhance robustness against lighting interferences. This study introduces a Pseudo-Siamese structure-based disparity regression model that simplifies left-right image comparison, improving accuracy and efficiency. The model was evaluated using a dataset of stereo endoscopic videos captured by the Da Vinci surgical robot, comprising simulated silicone heart sequences and real heart video data. Experimental results demonstrate significant improvement in the network's resistance to lighting interference without substantially increasing parameters. Moreover, the model exhibited faster convergence during training, contributing to overall performance enhancement. This study advances endoscopic image processing accuracy and has potential implications for surgical robot applications in complex environments.
Gaze estimation is one of the most promising technologies for supporting indoor monitoring and interaction systems. However, previous gaze estimation techniques generally work only in a controlled laboratory environment because they require a number of high-resolution eye images. This makes them unsuitable for welfare and healthcare facilities with the following challenging characteristics: 1) users' continuous movements, 2) various lighting conditions, and 3) a limited amount of available data. To address these issues, we introduce a multi-view multi-modal head-gaze estimation system that translates the user's head orientation into the gaze direction. The proposed system captures the user using multiple cameras with depth and infrared modalities to train more robust gaze estimators under the aforementioned conditions. To this end, we implemented a deep learning pipeline that can handle different types and combinations of data. The proposed system was evaluated using data collected from 10 volunteer participants to analyze how the use of single or multiple cameras and modalities affects the performance of head-gaze estimators. Through various experiments, we found that 1) the infrared modality provides more useful features than the depth modality, 2) multi-view multi-modal approaches provide better accuracy than single-view single-modal approaches, and 3) the proposed estimators achieve a high inference efficiency that can be used in real-time applications.
Funding (DMGDL gaze estimation study): supported by the Key Project of the National Language Commission (No. ZDI145-110), the Academic Research Projects of Beijing Union University (No. ZK20202514), the Key Laboratory Project (No. YYZN-2024-6), and the Project for the Construction and Support of High-Level Innovative Teams in Beijing Municipal Institutions (No. BPHR20220121).
Abstract: Dear Editor, Dorsal pontine lesions may cause a variety of complex neuro-ophthalmic deficits, including horizontal gaze palsy (HGP), internuclear ophthalmoplegia, one-and-a-half syndrome, abducens nerve palsy, skew deviation, or any combination of these. Here we present a rare case of an adult patient who developed multiple complicated clinical manifestations after surgical removal of a pontine cavernous hemangioma (PCH). Our case highlights that a single pontine lesion may involve complicated neural pathways and result in complicated symptoms and signs, in which abducens nerve palsy or skew deviation is easily missed when combined with HGP.
Funding: Supported by the National Natural Science Foundation (NNSF) of China under Grant 62001351, the Foundation of National Key Laboratory of Electromagnetic Environment (6142403220202), and the Stability Support Fund for Basic Military Industrial Research Institutes (A240104130).
Abstract: Orthogonal time frequency space (OTFS) modulation is a novel modulation scheme that copes effectively with the large Doppler spread caused by high mobility. Because it modulates data in the delay-Doppler (DD) domain and exploits the sparsity of that domain, it has been widely studied for the design of efficient channel estimation and signal detection schemes. In this paper, we design a novel superimposed pilot pattern with a transition band, which replaces the guard zero-symbols of the traditional embedded pilot (EP), and perform a two-stage channel estimation. In the first stage, we exploit the dispersion characteristics of the OTFS signal in the DD domain and use a threshold decision to obtain a coarse channel estimate. In the second stage, we use the coarse estimate for iterative signal detection and accurate channel estimation. During the second stage, we exploit the sparsity of the channel in the DD domain, remodel the received signal as a sparse channel vector multiplied by a channel coefficient matrix, and introduce a Doppler index segmentation factor (DISF) that subdivides the Doppler index to handle fractional Doppler. Simulations show that the proposed scheme has higher spectral efficiency than the traditional EP scheme and a lower peak-to-average power ratio (PAPR) than the traditional superimposed pilot scheme.
Funding: National Key Laboratory of Unmanned Aerial Vehicle Technology (No. 202408); Key Laboratory of Smart Earth (No. KF2023ZD01-05).
Abstract: In GNSS-denied environments, signals of opportunity (SOP) offer an efficient and passive solution for navigation and positioning by utilizing ambient signals. Nevertheless, conventional SOP techniques face significant challenges in real-time processing, especially under sub-Nyquist sampling conditions, due to high data acquisition rates and off-grid errors. To address this, this paper proposes the signal reconstruction and kernel sparse encoding (SRKSE) model, a novel general framework for high-precision parameter estimation. By combining compressed sensing with a deep unfolding network, the SRKSE model not only achieves robust signal reconstruction but also effectively reduces quantization errors. Key innovations of SRKSE include dual cross-attention mechanisms for enhanced feature extraction, sinc sparse kernel encoding to minimize quantization errors, and a custom loss function for balanced optimization. With these advancements, SRKSE achieves up to a 650-fold improvement in time of arrival (TOA) estimation accuracy while operating at just 1% of the Nyquist sampling rate. SRKSE surpasses both conventional and deep learning-based techniques in accuracy and efficiency, especially under sub-Nyquist sampling conditions. Simulations and real-world experiments confirm the reliability and potential of SRKSE for real-time applications in IoT and wireless communication.
Funding: Supported by the National Natural Science Foundation of China (No. 52207228), the Beijing Natural Science Foundation, China (No. 3224070), and the National Natural Science Foundation of China (No. 52077208).
Abstract: The growing use of lithium-ion batteries in electric transportation and grid-scale storage systems has intensified the need for accurate and highly generalizable state-of-health (SOH) estimation. Conventional approaches often suffer from reduced accuracy under dynamically uncertain state-of-charge (SOC) operating ranges and heterogeneous aging stresses. This study presents a unified SOH estimation framework that integrates physics-informed modeling, subspace identification, and Transformer-based learning. A reduced-order model is derived from simplified electrochemical dynamics, providing an interpretable and computationally efficient representation of battery behavior. Subspace identification across a wide SOC and SOH range yields degradation-sensitive features, which the Transformer uses to capture long-range aging dynamics via multi-head self-attention. Experiments on LiFePO4 cells under joint-cell training show consistently accurate SOH estimation, with a maximum error of 1.39%, demonstrating the framework’s effectiveness in decoupling SOC and SOH effects. In cross-cell validation, where training and validation are performed on different cells, the model maintains a maximum error of 2.06%, confirming strong generalization to unseen aging trajectories. Comparative experiments on LiFePO4 and public LiCoO2 datasets confirm the framework’s cross-chemistry applicability. By extracting low-dimensional, physically interpretable features via subspace identification, the framework significantly reduces training cost while maintaining high SOH estimation accuracy, outperforming conventional data-driven models that lack physical guidance.
Funding: Project supported by the National Natural Science Foundation of China (Grant Nos. 12274348 and 12004335) and the National Key Research and Development Program of China (Grant No. 2024YFC2813800).
Abstract: This study presents a novel method for estimating the depth of a single underwater source in shallow water using vector sensors. The approach leverages the depth distribution of the broadband Stokes parameters to estimate source depth accurately. Unlike traditional matched field processing (MFP) and matched mode processing (MMP), the proposed approach estimates source depth directly from the data received by the sensors without requiring complete environmental information. First, the broadband Stokes parameters (BSP) are established using normal mode theory. Then the nonstationary phase approximation is used to simplify the theoretical derivation, which is necessary when dealing with broadband integrals. Additionally, the range terms of the BSP are eliminated by normalization. Analysis of the depth distribution of the normalized broadband Stokes parameters (NBSP) shows that the NBSP exhibit extreme values at the source depth, so the proposed depth estimation method searches for the peaks of the NBSP. Simulations show that this method is effective in relatively simple shallow water environments. Finally, the effects of source range, frequency bandwidth, sound speed profile (SSP), water depth, and signal-to-noise ratio (SNR) are studied. The findings indicate that the proposed method can accurately estimate the source depth when the SNR is greater than -5 dB and does not need to consider model mismatch issues. Additionally, variations in environmental parameters have minimal impact on estimation accuracy. Compared to MFP, the proposed method requires a higher SNR but demonstrates superior robustness against fluctuations in environmental parameters.
Funding: Supported by the Fundamental Research Funds for the Central Universities of China (FRF-TP-24-058A), with additional support from the National Key Laboratory of Helicopter Aeromechanics (2024-ZSJ-LB-02-02).
Abstract: Considering the influence of terminal impact time constraints and the state information of maneuvering targets on guidance accuracy in multi-UAV cooperative guidance, this paper proposes an impact time cooperative control guidance law (ITCCG) that combines optimal error dynamics with an improved adaptive cubature Kalman filter (IACKF) algorithm. First, a terminal impact time feedback term is introduced into proportional navigation guidance based on the relative virtual guidance model, and terminal time control is achieved through optimal error dynamics. Then, the Huber loss function is used to reduce the impact of measurement outliers, and diagonal decomposition is applied to handle non-positive definite matrices that cannot undergo Cholesky decomposition. Finally, the ITCCG and IACKF algorithms are combined to achieve multi-UAV time-cooperative guidance based on maneuvering target state estimation. Simulation results show that the proposed algorithm effectively reduces the target state estimation error and achieves cooperative guidance within the desired time frame.
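The abstract above mentions the Huber loss as the means of reducing the influence of measurement outliers. As a rough illustration of why it helps, the sketch below evaluates the standard Huber function on a few residuals. The threshold `delta` and the sample residuals are assumptions, and how the authors embed the loss in the cubature Kalman filter is not described in the abstract.

```python
import numpy as np

def huber_loss(residual, delta=1.0):
    """Quadratic for |r| <= delta, linear beyond it, so large outliers grow slowly."""
    r = np.abs(np.asarray(residual, dtype=float))
    quadratic = 0.5 * r ** 2
    linear = delta * (r - 0.5 * delta)
    return np.where(r <= delta, quadratic, linear)

# Hypothetical measurement residuals; the last one plays the role of an outlier.
residuals = np.array([0.0, 0.5, -1.0, 3.0, -10.0])
losses = huber_loss(residuals, delta=1.0)
```

With a squared-error loss the residual of -10 would contribute 50, whereas here it contributes 9.5, which is the property that limits the pull of outliers on the estimate.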
Funding: Supported by the Knowledge Innovation Program of Wuhan-Shuguang Project (Grant No. 2023010201020443), the School-Level Scientific Research Project Funding Program of Jianghan University (Grant No. 2022XKZX33), and the Natural Science Foundation of Hubei Province (Grant No. 2024AFB466).
Abstract: The 6D pose estimation of objects is of great significance for the intelligent assembly and sorting of industrial parts. In industrial robot production scenarios, the 6D pose estimation of industrial parts faces two main challenges: the loss of information and the interference caused by occlusion and stacking in sorting scenarios, and the difficulty of feature extraction due to the weak texture of industrial parts. To address these problems, this paper proposes CB-PVNet, an attention-based pixel-level voting network for 6D pose estimation of weakly textured industrial parts. On the one hand, the voting scheme can predict the keypoints of affected pixels, which improves the accuracy of keypoint localization even under weak texture and partial occlusion. On the other hand, the attention mechanism extracts the relevant features of the object while suppressing irrelevant features of the surroundings. Extensive comparative experiments were conducted on public datasets (LINEMOD, Occlusion LINEMOD, and T-LESS) and on self-made datasets. The experimental results indicate that CB-PVNet achieves ADD(-S) accuracy comparable to the state of the art using only RGB images while ensuring real-time performance. Additionally, we conducted robot grasping experiments in the real world. The balance between accuracy and computational efficiency makes the method well suited for applications in industrial automation.
Funding: Supported in part by the National Key Research and Development Program of China (2021YFB2900501), in part by the Shaanxi Science and Technology Innovation Team (2023-CX-TD-03), in part by the Science and Technology Program of Shaanxi Province (2021GXLH-Z-038), in part by the Natural Science Foundation of Hunan Province (2023JJ40607 and 2023JJ50045), in part by the Scientific Research Foundation of Hunan Provincial Education Department (23B0713 and 24B0603), and in part by the National Natural Science Foundation of China (62401371, 62101275, and 62372070).
Abstract: (Quasi-)closed-form results for the statistical properties of unmanned aerial vehicle (UAV) air-to-ground channels are derived for the first time using a novel spatial-vector-based method from a three-dimensional (3-D) arbitrary-elevation one-cylinder model. The derived results include a closed-form expression for the space-time correlation function and quasi-closed-form expressions for the space-Doppler power spectrum density, the level crossing rate, and the average fading duration, which are shown to generalize those previously obtained from the two-dimensional (2-D) one-ring model and the 3-D low-elevation one-cylinder model for terrestrial mobile-to-mobile channels. The close agreement between the theoretical results and both the simulations and the measurements validates the utility of the derived channel statistics. Based on the derived expressions, the impacts of some parameters on the channel characteristics are investigated in an effective, efficient, and explicable way, which leads to a general guideline for manual parameter estimation from the measurement description.
Funding: Funded by the Malaysian Ministry of Higher Education through the Fundamental Research Grant Scheme (FRGS/1/2024/ICT02/UCSI/02/1).
Abstract: Accurate estimation of photovoltaic (PV) parameters is essential for optimizing solar module performance and enhancing resource efficiency in renewable energy systems. This study presents a process innovation by introducing, for the first time, the Triangulation Topology Aggregation Optimizer (TTAO) integrated with parallel computing to address PV parameter estimation challenges. The effectiveness and robustness of TTAO are rigorously evaluated using two standard benchmark datasets (KC200GT and R.T.C. France solar cells) and a real-world dataset (Poly70W solar module) under single-, double-, and triple-diode configurations. Results show that TTAO consistently achieves superior accuracy, producing the lowest RMSE values and converging faster than state-of-the-art metaheuristic algorithms. In addition, the integration of parallel computing significantly enhances computational efficiency, reducing execution time by up to 85% without compromising accuracy. Validation using real-world data further demonstrates TTAO’s adaptability and practical relevance in renewable energy systems, bridging the gap between theoretical modeling and real-world implementation for PV system monitoring and optimization and contributing to climate mitigation through improved solar energy performance.
Abstract: We investigated the impact of convexity and isoperimetric deficits on the accuracy of sectional area estimates of tree stems obtained with traditional methods (caliper, tape, and formulas based on stem diameter and circumference). In two complementary experiments, the use of photographs to estimate cross-sectional areas was first validated, and the use of a caliper and a diameter tape was then computer-simulated. The results indicated that the photographic method offers high precision, with mean relative errors below 0.1%, minimal deviation, and no significant bias, whereas the traditional methods led to substantial and systematic errors, with deviations from circularity and convexity significantly increasing the errors in area estimation.
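The systematic error of the tape method follows from the isoperimetric inequality: treating a measured girth P as the circumference of a circle gives the area P²/(4π), which can only equal or exceed the true area of a convex outline. The sketch below illustrates this with an elliptical outline. It is not the simulation from the study; the ellipse dimensions, the function names, and the normalized deficit 1 - 4πA/P² (one common form of the isoperimetric deficit) are assumptions.

```python
import math

def polygon_area_perimeter(points):
    """Shoelace area and perimeter of a closed polygon given as (x, y) vertices."""
    doubled_area = 0.0
    perimeter = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        doubled_area += x0 * y1 - x1 * y0
        perimeter += math.hypot(x1 - x0, y1 - y0)
    return abs(doubled_area) / 2.0, perimeter

def tape_area(girth):
    """Area implied by a diameter tape, which assumes a circular cross-section."""
    return girth ** 2 / (4.0 * math.pi)

def ellipse_outline(a, b, n=2000):
    """Hypothetical stem outline: an ellipse with semi-axes a and b (in cm)."""
    return [(a * math.cos(2 * math.pi * k / n), b * math.sin(2 * math.pi * k / n))
            for k in range(n)]

true_area, girth = polygon_area_perimeter(ellipse_outline(20.0, 15.0))
relative_error = (tape_area(girth) - true_area) / true_area
deficit = 1.0 - 4.0 * math.pi * true_area / girth ** 2
```

For this mildly elliptical outline the tape overestimates the area by roughly 3%, and the overestimate grows as the outline departs further from a circle.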
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 12104190, 12104189, and 12204312), the Natural Science Foundation of Jiangsu Province (Grant No. BK20210874), the General Project of Natural Science Research in Colleges and Universities of Jiangsu Province (Grant No. 20KJB140008), the Jiangxi Provincial Natural Science Foundation (Grant Nos. 20224BAB211014 and 20232BAB201042), and the Key Laboratory of TianQin Project (Sun Yat-sen University).
Abstract: A scheme with high phase sensitivity is proposed based on a Mach-Zehnder interferometer that takes as input a two-mode squeezed coherent state generated by four-wave mixing. The phase sensitivity of this scheme easily surpasses the Heisenberg limit when intensity difference detection is applied. Under phase-matching conditions, the quantum Cramér-Rao bound significantly exceeds the Heisenberg limit. Additionally, the scheme exhibits robustness against photon loss. Compared with the modified SU(1,1) interferometer with two coherent state inputs, this approach demonstrates superior measurement sensitivity, as evaluated through various detection methods and the quantum Cramér-Rao bound. This work holds potential applications in quantum metrology.
Funding: Supported by the State Key Laboratory of Acoustics and Marine Information, Chinese Academy of Sciences (SKL A202507).
Abstract: Accurate time delay estimation of target echo signals is a critical component of underwater target localization. In active sonar systems, echo signal processing is vulnerable to reverberation and noise in the maritime environment. This paper proposes a novel method for estimating target time delay using multi-bright spot echoes, assuming the target’s size and depth are known. To enhance the extraction of geometric features from the target echoes and mitigate the impact of reverberation and noise, the proposed approach employs the fractional order Fourier transform-frequency sliced wavelet transform to extract multi-bright spot echoes. Using the highlight model theory and the target size information, an observation matrix is constructed to represent multi-angle incident signals and obtain the theoretical scattered echo signals from different angles. To estimate the target’s time delay accurately, waveform similarity coefficients and mean square error values between the theoretical return signals and the received signals are computed across various incident angles and time delays. Simulation results show that, compared to the conventional matched filter, the proposed algorithm reduces the relative error by 65.9% to 91.5% at a signal-to-noise ratio of -25 dB, and by 66.7% to 88.9% at a signal-to-reverberation ratio of -10 dB. This algorithm provides a new approach for the precise localization of submerged targets in shallow water environments.
Funding: Funded by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2025R442), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: Accurate parameter extraction of photovoltaic (PV) models plays a critical role in enabling precise performance prediction, optimal system sizing, and effective operational control under diverse environmental conditions. While a wide range of metaheuristic optimisation techniques have been applied to this problem, many existing methods are hindered by slow convergence rates, susceptibility to premature stagnation, and reduced accuracy when applied to complex multi-diode PV configurations. These limitations can lead to suboptimal modelling, reducing the efficiency of PV system design and operation. In this work, we propose an enhanced hybrid optimisation approach, the modified Spider Wasp Optimization (mSWO) with Opposition-Based Learning algorithm, which integrates the exploration and exploitation capabilities of the Spider Wasp Optimization (SWO) metaheuristic with the diversity-enhancing mechanism of Opposition-Based Learning (OBL). The hybridisation is designed to dynamically expand the search space coverage, avoid premature convergence, and improve both convergence speed and precision in high-dimensional optimisation tasks. The mSWO algorithm is applied to three well-established PV configurations: the single diode model (SDM), the double diode model (DDM), and the triple diode model (TDM). Real experimental current-voltage (I-V) datasets from a commercial PV module under standard test conditions (STC) are used for evaluation. Comparative analysis is conducted against eighteen advanced metaheuristic algorithms, including BSDE, RLGBO, GWOCS, MFO, EO, TSA, and SCA. Performance metrics include minimum, mean, and maximum root mean square error (RMSE), standard deviation (SD), and convergence behaviour over 30 independent runs. The results reveal that mSWO consistently delivers superior accuracy and robustness across all PV models, achieving the lowest RMSE values of 0.000986022 (SDM), 0.000982884 (DDM), and 0.000982529 (TDM), with minimal SD values, indicating remarkable repeatability. Convergence analyses further show that mSWO reaches optimal solutions more rapidly and with fewer oscillations than all competing methods, with the performance gap widening as model complexity increases. These findings demonstrate that mSWO provides a scalable, computationally efficient, and highly reliable framework for PV parameter extraction. Its adaptability to models of growing complexity suggests strong potential for broader applications in renewable energy systems, including performance monitoring, fault detection, and intelligent control, thereby contributing to the optimisation of next-generation solar energy solutions.
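Opposition-Based Learning, which the abstract above names as the diversity-enhancing component, is usually described as evaluating each candidate x together with its opposite point lower + upper - x and keeping the better ones. The sketch below shows one such generic step. It is not the mSWO algorithm: the function name, the sphere objective, the bounds, and the population size are assumptions chosen for illustration.

```python
import numpy as np

def opposition_step(population, lower, upper, objective):
    """Evaluate each candidate and its opposite point; keep the best half (minimisation)."""
    opposite = lower + upper - population           # reflection within the search bounds
    combined = np.vstack([population, opposite])
    scores = np.array([objective(x) for x in combined])
    keep = np.argsort(scores)[: len(population)]    # indices of the lowest scores
    return combined[keep], scores[keep]

def sphere(x):
    """Hypothetical stand-in objective; a PV study would use the RMSE of the I-V fit."""
    return float(np.sum(x ** 2))

rng = np.random.default_rng(0)
lower = np.array([-5.0, -5.0])
upper = np.array([5.0, 10.0])
population = rng.uniform(lower, upper, size=(6, 2))
new_population, new_scores = opposition_step(population, lower, upper, sphere)
```

Because the opposite of a point inside the bounds also lies inside the bounds, the step never produces infeasible candidates, and the best score after the step cannot be worse than the best score before it.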
Funding: Supported by the Science and Technology Support Project of Sichuan Science and Technology Department (2018SZ0357) and China Scholarship.
Abstract: A person’s eye gaze can effectively express that person’s intentions. Thus, gaze estimation is an important approach in intelligent manufacturing for analyzing a person’s intentions. Many gaze estimation methods regress the direction of the gaze by analyzing images of the eyes, also known as eye patches. However, individual differences make it very difficult to construct a person-independent model that estimates an accurate gaze direction for every person. In this paper, we hypothesize that the difference in the appearance of each of a person’s eyes is related to the difference in the corresponding gaze directions. Based on this hypothesis, a differential eyes’ appearances network (DEANet) is trained on public datasets to predict the gaze differences of pairwise eye patches belonging to the same individual. The proposed DEANet is based on a Siamese neural network (SNNet) framework with two identical branches. A multi-stream architecture is fed into each branch of the SNNet. Both branches of the DEANet share the same weights and extract the features of the patches; the features are then concatenated to obtain the difference of the gaze directions. Once the differential gaze model is trained, a new person’s gaze direction can be estimated when a few calibrated eye patches for that person are provided. Because person-specific calibrated eye patches are involved in the testing stage, the estimation accuracy is improved. Furthermore, the need for a large amount of data to train a person-specific model is avoided. A reference grid strategy is also proposed to select a few references as some of the DEANet’s inputs directly based on the estimation values, thereby further improving the estimation accuracy. Experiments on public datasets show that the proposed approach outperforms the state-of-the-art methods.
Funding: Supported by the Major Science and Technology Special Plan Projects of Yunnan (No. 202002AD080001).
Abstract: Gaze estimation has become an important field of image and information processing. Estimating gaze from full-face images using convolutional neural networks (CNNs) has achieved good accuracy. Estimating gaze from eye images is much more challenging because eye images contain less information than full-face images, yet it remains vital because eye-image-based methods have wider applications. In this paper, we propose the discretization-gaze network (DGaze-Net) to improve monocular three-dimensional (3D) gaze estimation accuracy through feature discretization and an attention mechanism. The gaze predictor of DGaze-Net is optimized based on feature discretization. By discretizing the gaze angle into K bins, a classification constraint is added to the gaze predictor. In the gaze predictor, a binned classification is applied to the gaze angle before regression against the real gaze angle to improve gaze estimation accuracy. In addition, the attention mechanism is applied to the backbone to enhance its ability to extract gaze-related eye features. The proposed method is validated on three gaze datasets and achieves encouraging gaze estimation accuracy.
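Discretizing a gaze angle into K bins, as the abstract above describes, turns a continuous target into a class label that a classification loss can supervise. The sketch below shows only that labelling step. It is not the DGaze-Net code: the angle range of -90° to 90°, the choice K = 36, and the function names are assumptions.

```python
import numpy as np

def angle_to_bin(angle_deg, k_bins, lo=-90.0, hi=90.0):
    """Map angles in [lo, hi] to integer bin labels in 0 .. k_bins - 1."""
    width = (hi - lo) / k_bins
    index = np.floor((np.asarray(angle_deg, dtype=float) - lo) / width).astype(int)
    return np.clip(index, 0, k_bins - 1)   # the upper edge hi falls into the last bin

def bin_center(index, k_bins, lo=-90.0, hi=90.0):
    """Angle at the middle of each bin, useful for turning a class back into degrees."""
    width = (hi - lo) / k_bins
    return lo + (np.asarray(index) + 0.5) * width

# Hypothetical yaw angles in degrees.
angles = np.array([-90.0, -12.3, 0.0, 45.0, 90.0])
labels = angle_to_bin(angles, k_bins=36)
centers = bin_center(labels, k_bins=36)
```

With 36 bins of 5° each, the bin center is never more than 2.5° from the true angle, which is why a regression term on the real angle is still needed for fine accuracy.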
Funding: The National Natural Science Foundation of China, No. 61932003, and the Fundamental Research Funds for the Central Universities.
Abstract: In recent years, deep learning techniques have been used to estimate gaze, a significant task in computer vision and human-computer interaction. Previous studies have made significant achievements in predicting 2D or 3D gazes from monocular face images. This study presents a deep neural network for 2D gaze estimation on mobile devices. It achieves state-of-the-art 2D gaze point regression error while significantly improving gaze classification error on quadrant divisions of the display. To this end, an efficient attention-based module that correlates and fuses the left and right eye contextual features is first proposed to improve gaze point regression performance. Subsequently, through a unified perspective for gaze estimation, metric learning for gaze classification on quadrant divisions is incorporated as additional supervision. Consequently, both gaze point regression and quadrant classification performance are improved. The experiments demonstrate that the proposed method outperforms existing gaze estimation methods on the GazeCapture and MPIIFaceGaze datasets.
Funding: Supported by the National Natural Science Foundation of China (61772328).
Abstract: Gaze information is important for finding the region of interest (ROI), which indicates where the next action will happen. Supervised gaze estimation does not work on EPIC-Kitchens for lack of ground truth. In this paper, we develop an unsupervised gaze estimation method that helps with egocentric action anticipation. We adopt the gaze map as a feature representation and input it into a multi-modality network jointly with red-green-blue (RGB), optical flow, and object features. We explore the method on the EGTEA dataset. The estimated gaze map is further optimized with dilation and a Gaussian filter, masked onto the original RGB frame, and encoded as the gaze modality. Our results outperform the strong baseline Rolling-Unrolling LSTMs (RULSTM), with top-5 accuracy reaching 34.31% on the seen test set (S1) and 22.07% on the unseen test set (S2), improvements of 0.58% and 0.87%, respectively.
Funding: Supported by the Sichuan Science and Technology Program (2023YFSY0026, 2023YFH0004) and by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korean government (MSIT) (No. RS-2022-00155885, Artificial Intelligence Convergence Innovation Human Resources Development (Hanyang University ERICA)).
Abstract: Two-dimensional endoscopic images are susceptible to interference such as specular reflections and monotonous texture illumination, which hinders accurate three-dimensional lesion reconstruction by surgical robots. This study proposes a novel end-to-end disparity estimation model to address these challenges. Our approach combines a Pseudo-Siamese neural network architecture with pyramid dilated convolutions, integrating multi-scale image information to enhance robustness against lighting interference. The Pseudo-Siamese structure-based disparity regression model simplifies left-right image comparison, improving accuracy and efficiency. The model was evaluated using a dataset of stereo endoscopic videos captured by the Da Vinci surgical robot, comprising simulated silicone heart sequences and real heart video data. Experimental results demonstrate a significant improvement in the network’s resistance to lighting interference without a substantial increase in parameters. Moreover, the model converged faster during training, contributing to the overall performance improvement. This study advances endoscopic image processing accuracy and has potential implications for surgical robot applications in complex environments.
Funding: This work was supported by the Basic Research Program through the National Research Foundation of Korea (NRF) grant funded by the Korea Government (MSIT) under Grant 2019R1F1A1045329 and Grant 2020R1A4A1017775.
Abstract: Gaze estimation is one of the most promising technologies for supporting indoor monitoring and interaction systems. However, previous gaze estimation techniques generally work only in a controlled laboratory environment because they require a number of high-resolution eye images. This makes them unsuitable for welfare and healthcare facilities, which have the following challenging characteristics: 1) users’ continuous movements, 2) various lighting conditions, and 3) a limited amount of available data. To address these issues, we introduce a multi-view multi-modal head-gaze estimation system that translates the user’s head orientation into the gaze direction. The proposed system captures the user with multiple cameras using depth and infrared modalities to train gaze estimators that are more robust under the aforementioned conditions. To this end, we implemented a deep learning pipeline that can handle different types and combinations of data. The proposed system was evaluated using data collected from 10 volunteer participants to analyze how the use of single or multiple cameras and modalities affects the performance of head-gaze estimators. Through various experiments, we found that 1) an infrared modality provides more useful features than a depth modality, 2) multi-view multi-modal approaches provide better accuracy than single-view single-modal approaches, and 3) the proposed estimators achieve an inference efficiency high enough for real-time applications.