Funding: the Australian Research Council Discovery Projects funding scheme (DP190102181, DP210101465).
Abstract: Accurate detection of driver fatigue is essential for improving road safety. This study investigates the effectiveness of using multimodal physiological signals for fatigue detection while incorporating uncertainty quantification to enhance the reliability of predictions. Physiological signals, including Electrocardiogram (ECG), Galvanic Skin Response (GSR), and Electroencephalogram (EEG), were transformed into image representations and analyzed using pretrained deep neural networks. The extracted features were classified through a feedforward neural network, and prediction reliability was assessed using uncertainty quantification techniques such as Monte Carlo Dropout (MCD), model ensembles, and combined approaches. Evaluation metrics included standard measures (sensitivity, specificity, precision, and accuracy) along with uncertainty-aware metrics such as uncertainty sensitivity and uncertainty precision. Across all evaluations, ECG-based models consistently demonstrated strong performance. The findings indicate that combining multimodal physiological signals, Transfer Learning (TL), and uncertainty quantification can significantly improve both the accuracy and trustworthiness of fatigue detection systems. This approach supports the development of more reliable driver assistance technologies aimed at preventing fatigue-related accidents.
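The Monte Carlo Dropout technique mentioned above can be sketched in a few lines: dropout is kept stochastic at inference, several forward passes are averaged, and the spread across passes serves as the uncertainty estimate. The architecture and names below are illustrative stand-ins, not the authors' actual model.

```python
import torch
import torch.nn as nn

class FatigueClassifier(nn.Module):
    """Hypothetical feedforward head over CNN-extracted features."""
    def __init__(self, in_dim=128, hidden=64, n_classes=2, p=0.3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Dropout(p),  # kept active at inference for MC Dropout
            nn.Linear(hidden, n_classes),
        )

    def forward(self, x):
        return self.net(x)

def mc_dropout_predict(model, x, n_samples=30):
    """Run several stochastic forward passes with dropout enabled;
    return mean class probabilities and their per-class std (uncertainty)."""
    model.train()  # train mode keeps dropout stochastic
    with torch.no_grad():
        probs = torch.stack(
            [torch.softmax(model(x), dim=-1) for _ in range(n_samples)]
        )
    return probs.mean(0), probs.std(0)

features = torch.randn(4, 128)  # stand-in for pretrained-network features
model = FatigueClassifier()
mean_p, std_p = mc_dropout_predict(model, features)
```

A high `std_p` for a sample would flag a prediction as unreliable; ensembling, the other technique named in the abstract, replaces the repeated dropout passes with passes through independently trained models but aggregates the same way.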
Funding: supported by the National Key R&D Program of China (2023YFB2504400).
Abstract: Within the domain of low-level vision, enhancing low-light images and removing sand-dust from single images are both critical tasks. These challenges are particularly pronounced in real-world applications such as autonomous driving, surveillance systems, and remote sensing, where adverse lighting and environmental conditions often degrade image quality. Various neural network models, including MLPs, CNNs, GANs, and Transformers, have been proposed to tackle these challenges, with Vision KAN models showing particular promise. However, existing models, including the Vision KAN models, use deterministic neural networks that do not address the uncertainties inherent in these processes. To overcome this, we introduce the Uncertainty-Aware Kolmogorov-Arnold Network (UAKAN), a novel structure that integrates KAN with uncertainty estimation. Our approach uniquely employs Tokenized KANs for sampling within the encoder and decoder layers of a U-Net architecture, enhancing the network's ability to learn complex representations. Furthermore, for aleatoric uncertainty, we propose an uncertainty-coupling-certainty module that couples uncertainty distribution learning and residual learning in a feature-fusion manner. For epistemic uncertainty, we propose a feature selection mechanism for spatial- and pixel-dimension uncertainty modeling, which captures and models uncertainty by learning the uncertainty contained between feature maps. Notably, our uncertainty-aware framework enables the model to produce both high-quality enhanced images and reliable uncertainty maps, which are crucial for downstream applications requiring confidence estimation. Through comparative and ablation studies on our synthetic SLLIE6K dataset, designed for low-light enhancement and sand-dust removal, we validate the effectiveness and theoretical robustness of our methodology.
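The aleatoric-uncertainty idea described above (learning an uncertainty distribution alongside the restored image) is commonly realized as a heteroscedastic Gaussian likelihood: the network predicts both the enhanced image and a per-pixel log-variance map, and the loss weights residuals by the predicted variance. The sketch below shows that generic formulation under assumed names; it is not UAKAN's actual module.

```python
import torch
import torch.nn as nn

class UncertainEnhancer(nn.Module):
    """Toy restorer with a mean head (enhanced image) and a log-variance
    head (per-pixel aleatoric uncertainty map). Illustrative only."""
    def __init__(self, ch=3):
        super().__init__()
        self.backbone = nn.Conv2d(ch, 16, 3, padding=1)  # stand-in for the U-Net
        self.mean_head = nn.Conv2d(16, ch, 3, padding=1)
        self.logvar_head = nn.Conv2d(16, ch, 3, padding=1)

    def forward(self, x):
        h = torch.relu(self.backbone(x))
        return self.mean_head(h), self.logvar_head(h)

def heteroscedastic_nll(pred, logvar, target):
    # Gaussian NLL: 0.5 * [ (y - mu)^2 / sigma^2 + log sigma^2 ], pixel-averaged.
    # High predicted variance down-weights the residual but is itself penalized.
    return (0.5 * ((target - pred) ** 2 * torch.exp(-logvar) + logvar)).mean()

x = torch.rand(2, 3, 32, 32)  # degraded input (e.g. low-light or sand-dust)
y = torch.rand(2, 3, 32, 32)  # clean reference
model = UncertainEnhancer()
mean, logvar = model(x)
loss = heteroscedastic_nll(mean, logvar, y)
```

After training, `torch.exp(logvar)` serves as the per-pixel uncertainty map the abstract refers to, usable by downstream consumers to discount unreliable regions.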
Funding: supported by the National Natural Science Foundation of China (No. 62376004), the Natural Science Foundation of Anhui Province (No. 2208085J18), and the University Synergy Innovation Program of Anhui Province (No. GXXT-2022-033).
Abstract: Text-to-image person retrieval, a fine-grained cross-modal retrieval problem, aims to search for person images from an image library that match a given textual caption. Existing text-to-image person retrieval methods usually use fixed-point embeddings to express the semantics of the two modalities and perform multi-granularity alignment between modalities in the embedding space. However, owing to the inherent one-to-many correspondence between images and texts, fixed-point embedding methods often struggle to adequately capture this relationship, leading to erroneous retrieval results. To address this problem, we propose a novel uncertainty-aware coarse-to-fine alignment method, which first maps fixed-point embeddings to probability distributions and then aligns the two modalities in terms of distributions and sampling points at coarse-to-fine granularity, for accurate text-to-image person retrieval. Specifically, we first introduce two contrastive learning tasks, distribution contrast learning and point contrast learning, to achieve coarse-grained inter-modal alignment in an uncertainty-aware manner. The distribution contrast learning task ensures that distributions with the same identity are as similar as possible across modalities through distribution-based contrastive learning. The point contrast learning task performs contrastive learning over inter-modal and intra-modal sampling points, which not only models rich and diverse cross-modal associations but also optimizes the learning of the distributions. To meet the fine-grained association requirements of text-to-image person retrieval, we design an uncertainty-aware attribute-masking language reconstruction task, which achieves fine-grained alignment by randomly masking attribute words in the text and reconstructing them via inter-modal sample-point interactions. Extensive experiments on two public datasets demonstrate the superior performance of our method.
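The core move described above (replacing a fixed-point embedding with a probability distribution and contrasting sampling points drawn from it) can be sketched with a Gaussian head and the reparameterization trick. Everything below, including the shared head, the sample-flattening choice in the InfoNCE loss, and all dimensions, is an assumption for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProbabilisticEmbedding(nn.Module):
    """Maps a fixed-point feature to a diagonal Gaussian and draws
    sampling points via reparameterization. Illustrative sketch."""
    def __init__(self, dim=256):
        super().__init__()
        self.mu = nn.Linear(dim, dim)
        self.logvar = nn.Linear(dim, dim)

    def forward(self, feat, n_samples=5):
        mu, logvar = self.mu(feat), self.logvar(feat)
        std = torch.exp(0.5 * logvar)
        eps = torch.randn(n_samples, *mu.shape)
        samples = mu.unsqueeze(0) + eps * std.unsqueeze(0)  # reparameterization
        return mu, logvar, samples

def point_contrastive_loss(img_samples, txt_samples, temperature=0.07):
    """InfoNCE over matched image/text sampling points, flattened so each
    drawn point of a pair is a positive for its counterpart."""
    a = F.normalize(img_samples.flatten(0, 1), dim=-1)
    b = F.normalize(txt_samples.flatten(0, 1), dim=-1)
    logits = a @ b.t() / temperature
    labels = torch.arange(a.size(0))
    return F.cross_entropy(logits, labels)

img_feat, txt_feat = torch.randn(8, 256), torch.randn(8, 256)
head = ProbabilisticEmbedding()
_, _, img_s = head(img_feat)
_, _, txt_s = head(txt_feat)
loss = point_contrastive_loss(img_s, txt_s)
```

Because each caption yields several stochastic points rather than one vector, one text can sit close to multiple matching images, which is exactly the one-to-many correspondence the fixed-point formulation struggles with.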
Abstract: Event cameras provide high temporal resolution, high dynamic range, and low latency, offering significant advantages over conventional frame-based cameras. In this work, we introduce an uncertainty-aware refinement network called URNet for event-based stereo depth estimation. Our approach features a local-global refinement module that effectively captures fine-grained local details and long-range global context. Additionally, we introduce a Kullback-Leibler (KL) divergence-based uncertainty modeling method to enhance prediction reliability. Extensive experiments on the DSEC dataset demonstrate that URNet consistently outperforms state-of-the-art (SOTA) methods in both qualitative and quantitative evaluations.
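A KL-divergence-based uncertainty term of the kind named above is typically built from the closed-form KL between two Gaussians, e.g. treating each pixel's depth as a Gaussian and penalizing divergence from a reference distribution. The sketch below shows that generic building block; the pairing of "predicted" vs. "coarse reference" distributions is an assumption, not URNet's exact loss.

```python
import torch

def gaussian_kl(mu_p, logvar_p, mu_q, logvar_q):
    """Closed-form KL( N(mu_p, var_p) || N(mu_q, var_q) ) for diagonal
    Gaussians, computed element-wise and averaged over pixels."""
    var_p, var_q = torch.exp(logvar_p), torch.exp(logvar_q)
    kl = 0.5 * (logvar_q - logvar_p + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)
    return kl.mean()

pred_mu = torch.rand(1, 1, 16, 16)        # refined per-pixel depth estimate
pred_logvar = torch.zeros_like(pred_mu)   # predicted per-pixel log-variance
ref_mu = torch.rand_like(pred_mu)         # hypothetical coarse depth before refinement
ref_logvar = torch.zeros_like(pred_mu)
loss = gaussian_kl(pred_mu, pred_logvar, ref_mu, ref_logvar)
```

The term is zero exactly when the two distributions coincide and grows with both mean disagreement and variance mismatch, so minimizing it pulls the refined depth distribution toward the reference while the learned variance supplies the reliability map.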