Abstract: Objective: Deep learning is increasingly employed in gastroenterology (GI) endoscopy computer-aided diagnostics for polyp segmentation and multi-class disease detection. Real-world deployment requires high accuracy, clinically relevant explanations, strong calibration, domain generalization, and efficiency. Current Convolutional Neural Network (CNN) and transformer models compromise border precision and global context, generate attention maps that fail to align with expert reasoning, degrade under cross-center shifts, and exhibit inadequate calibration, which diminishes clinical trust. Methods: HMA-DER is a hierarchical multi-attention architecture that uses dilation-enhanced residual blocks and an explainability-aware Cognitive Alignment Score (CAS) regularizer to directly align attribution maps with reasoning signals from experts. The framework adds robustness-oriented components and an evaluation protocol covering accuracy, macro-averaged F1 score, Area Under the Receiver Operating Characteristic Curve (AUROC), calibration (Expected Calibration Error (ECE), Brier Score), explainability (CAS, insertion/deletion AUC), cross-dataset transfer, and throughput. Results: HMA-DER achieves Dice Similarity Coefficient scores of 89.5% and 86.0% on Kvasir-SEG and CVC-ClinicDB, beating the strongest baseline by +1.9 and +1.7 points. It achieves 86.4% and 85.3% macro-F1 and 94.0% and 93.4% AUROC on HyperKvasir and GastroVision, exceeding the baseline by +1.4/+1.6 macro-F1 and +1.2/+1.1 AUROC. An ablation study shows that hierarchical attention gives the largest gain (+3.0), followed by CAS regularization (+2–3), dilation (+1.5–2.0), and residual connections (+2–3). Cross-dataset validation demonstrates competitive zero-shot transfer (e.g., KS→CVC Dice 82.7%), whereas multi-dataset training narrows the domain gap, yielding an 88.1% primary-metric average. HMA-DER's mixed-precision inference can process 155 images per second, which helps with calibration. Conclusion: HMA-DER strikes a balance between accuracy, explainability, robustness, and efficiency for reliable GI computer-aided diagnosis in real-world clinical settings.
Funding: Funded by the Centre for Advanced Modelling and Geospatial Information Systems (CAMGIS), Faculty of Engineering and IT, University of Technology Sydney, and by the Ongoing Research Funding Program (ORF-2025-14), King Saud University, Riyadh, Saudi Arabia, under Project ORF-2025-.
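The HMA-DER abstract reports Dice and ECE without defining them. The sketch below uses the standard textbook definitions and is not the authors' evaluation code. The function names, the equal-width binning, and the default of 10 bins are assumptions made for illustration.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for flattened binary (0/1) masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[int], n_bins: int = 10
) -> float:
    """Equal-width-bin ECE: sample-weighted mean of |accuracy - confidence|."""
    n = len(confidences)
    if n == 0 or n != len(correct):
        raise ValueError("need equally sized, non-empty inputs")
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Bins are [lo, hi); a confidence of exactly 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(o for _, o in members) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece
```

A lower ECE means the predicted confidences track the observed accuracy more closely. A higher Dice means the predicted mask overlaps the reference mask more.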
Abstract: Face liveness detection is essential for securing biometric authentication systems against spoofing attacks, including printed photos, replay videos, and 3D masks. This study systematically evaluates pre-trained CNN models (DenseNet201, VGG16, InceptionV3, ResNet50, VGG19, MobileNetV2, Xception, and InceptionResNetV2), leveraging transfer learning and fine-tuning to enhance liveness detection performance. The models were trained and tested on the NUAA and Replay-Attack datasets, with cross-dataset generalization validated on SiW-MV2 to assess real-world adaptability. Performance was evaluated using accuracy, precision, recall, FAR, FRR, HTER, and specialized spoof detection metrics (APCER, NPCER, ACER). Fine-tuning significantly improved detection accuracy, with DenseNet201 achieving the highest performance (98.5% on NUAA, 97.71% on Replay-Attack), while MobileNetV2 proved the most efficient model for real-time applications (latency: 15 ms, memory usage: 45 MB, energy consumption: 30 mJ). A statistical significance analysis (paired t-tests, confidence intervals) validated these improvements. Cross-dataset experiments identified DenseNet201 and MobileNetV2 as the most generalizable architectures, with DenseNet201 achieving 86.4% accuracy on Replay-Attack when trained on NUAA, demonstrating robust feature extraction and adaptability. In contrast, ResNet50 generalized less well, struggling with dataset variability and complex spoofing attacks. These findings suggest that MobileNetV2 is well suited to low-power applications, while DenseNet201 is ideal for high-security environments requiring superior accuracy. This research provides a framework for improving real-time face liveness detection, enhancing biometric security, and guiding future advances in AI-driven anti-spoofing techniques.
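The face liveness abstract names its error metrics but does not define them. The sketch below uses their commonly used definitions and is not the study's own code. It assumes that HTER is the mean of FAR and FRR, that APCER is the worst-case acceptance rate over attack types, that NPCER is the rejection rate of genuine samples, and that ACER is the mean of APCER and NPCER. The `LIVE` label and the function name are hypothetical.

```python
from collections import defaultdict

LIVE = "live"  # hypothetical label for genuine (bona fide) samples


def spoof_metrics(samples):
    """Compute spoof-detection error rates.

    samples: iterable of (true_type, predicted_live) pairs, where true_type
    is LIVE or an attack name such as "print" or "replay".
    """
    attack_total = defaultdict(int)
    attack_accepted = defaultdict(int)
    live_total = 0
    live_rejected = 0
    for true_type, predicted_live in samples:
        if true_type == LIVE:
            live_total += 1
            live_rejected += 0 if predicted_live else 1
        else:
            attack_total[true_type] += 1
            attack_accepted[true_type] += 1 if predicted_live else 0
    if live_total == 0 or not attack_total:
        raise ValueError("need at least one live and one attack sample")

    far = sum(attack_accepted.values()) / sum(attack_total.values())
    frr = live_rejected / live_total
    # Assumed convention: APCER is the worst case over attack types.
    apcer = max(attack_accepted[a] / attack_total[a] for a in attack_total)
    npcer = frr  # genuine samples wrongly classified as attacks
    return {
        "FAR": far,
        "FRR": frr,
        "HTER": (far + frr) / 2,
        "APCER": apcer,
        "NPCER": npcer,
        "ACER": (apcer + npcer) / 2,
    }
```

With a single attack type, APCER equals FAR and ACER equals HTER. The two differ only when attack types have different acceptance rates.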