Funding: Supported by the National Natural Science Foundation of China (No. 62203163), the Scientific Research Project of Hunan Provincial Education Department (No. 24A0519), the Hunan Provincial Natural Science Foundation (No. 2025JJ60407), and the Postgraduate Scientific Research Innovation Project of Hunan Province (No. CX2024100).
Abstract: Accurate detection of smoke and fire sources is critical for early fire warning and environmental monitoring. However, conventional detection approaches are highly susceptible to noise, illumination variations, and complex environmental conditions, which often reduce detection accuracy and real-time performance. To address these limitations, we propose Lightweight and Precise YOLO (LP-YOLO), a high-precision detection framework that integrates a self-attention mechanism with a feature pyramid, built upon YOLOv8. First, to overcome the restricted receptive field and parameter redundancy of conventional Convolutional Neural Networks (CNNs), we design an enhanced backbone based on Wavelet Convolutions (WTConv), which expands the receptive field through multifrequency convolutional processing. Second, a Bidirectional Feature Pyramid Network (BiFPN) is employed to achieve bidirectional feature fusion, enhancing the representation of smoke features across scales. Third, to mitigate the challenge of ambiguous object boundaries, we introduce the Frequency-aware Feature Fusion (FreqFusion) module, in which the Adaptive Low-Pass Filter (ALPF) reduces intra-class inconsistencies, the offset generator refines boundary localization, and the Adaptive High-Pass Filter (AHPF) recovers high-frequency details lost during down-sampling. Experimental evaluations demonstrate that LP-YOLO significantly outperforms the baseline YOLOv8, achieving an improvement of 9.3% in mAP@50 and 9.2% in F1-score. Moreover, the model is 56.6% and 32.4% smaller than YOLOv7-tiny and EfficientDet, respectively, while maintaining real-time inference speed at 238 frames per second (FPS). Validation on multiple benchmark datasets, including D-Fire, FIRESENSE, and BoWFire, further confirms its robustness and generalization ability, with detection accuracy consistently exceeding 82%. These results highlight the potential of LP-YOLO as a practical solution with high accuracy, robustness, and real-time performance for smoke and fire source detection.
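The bidirectional fusion step mentioned in this abstract can be illustrated with a small sketch. It assumes the "fast normalized fusion" rule from the original BiFPN formulation, in which each input feature map gets a learnable non-negative weight and the weights are normalized by their sum. The abstract does not say which fusion variant LP-YOLO uses, so the function name `fast_normalized_fusion`, the `eps` value, and the toy inputs are illustrative assumptions rather than the authors' implementation.

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse same-shaped 2D feature maps with normalized non-negative weights.

    Hypothetical helper: in a real BiFPN the inputs come from different
    pyramid levels and are resized to a common resolution first, and the
    weights are learned during training.
    """
    if len(features) != len(weights):
        raise ValueError("need exactly one weight per feature map")
    # ReLU keeps each weight non-negative so the normalization stays stable.
    clipped = [max(0.0, w) for w in weights]
    total = sum(clipped) + eps
    rows, cols = len(features[0]), len(features[0][0])
    fused = [[0.0] * cols for _ in range(rows)]
    for fmap, w in zip(features, clipped):
        for r in range(rows):
            for c in range(cols):
                fused[r][c] += (w / total) * fmap[r][c]
    return fused


# Toy inputs standing in for a top-down feature and a lateral feature.
top_down = [[1.0, 2.0], [3.0, 4.0]]
lateral = [[3.0, 2.0], [1.0, 0.0]]
fused = fast_normalized_fusion([top_down, lateral], weights=[1.0, 1.0])
```

With equal weights the result is close to the element-wise mean of the two maps. A negative weight is clipped to zero, so that input drops out of the fusion.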
Funding: Supported by the National Natural Science Foundation of China (60802061, 11426087), the Key Project of Science and Technology of the Education Department of Henan Province (14A120009), the Program of Henan Province Young Scholar (2013GGJS-027), and the Research Foundation of Henan University (2013YBZR016).
Abstract: Two key challenges for a product image classification system are classification precision and classification time. In some categories, the classification precision of the latest techniques is still low. In this paper, we propose a local texture descriptor termed fan refined local binary pattern, which captures more detailed information by integrating the spatial distribution into the local binary pattern feature. We compare our approach with different methods on a subset of product images from Amazon/eBay and parts of PI100, and the experimental results demonstrate that our proposed approach is superior to existing methods. The highest classification precision is increased by 21% and the average classification time is reduced by two-thirds.
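For readers unfamiliar with the local binary pattern feature that this descriptor builds on, here is a minimal sketch of the basic 8-neighbour LBP. It is not the fan refined variant, whose details the abstract does not give. The function names, the clockwise bit order, and the toy patch are illustrative assumptions.

```python
def lbp_code(image, r, c):
    """Return the basic 8-neighbour LBP code of the interior pixel (r, c)."""
    center = image[r][c]
    # Neighbours are visited clockwise, starting at the top-left pixel.
    # The bit order is a convention chosen for this sketch.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        # A neighbour at least as bright as the center sets its bit.
        if image[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """Histogram of LBP codes over all interior pixels (256 bins)."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist


patch = [[5, 9, 1],
         [4, 6, 7],
         [2, 3, 8]]
code = lbp_code(patch, 1, 1)   # neighbours 9, 7, 8 set bits 1, 3, 4
hist = lbp_histogram(patch)
```

The histogram of codes is what typically serves as the texture feature. Because it discards where in the image each code occurs, the abstract's descriptor adds spatial distribution on top of it.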
Funding: Supported by the Ph.D. Research Startup Project of Minnan Normal University (KJ2021020), the National Natural Science Foundation of China (12090020 and 12090025), and the Zhejiang Provincial Natural Science Foundation of China (LSD19H180005).
Abstract: Accurate pancreas segmentation is critical for the diagnosis and management of diseases of the pancreas. It is challenging to precisely delineate the pancreas due to its high variation in volume, shape, and location. In recent years, coarse-to-fine methods have been widely used to alleviate the class imbalance issue and improve pancreas segmentation accuracy. However, cascaded methods can be computationally intensive, and the refined results depend significantly on the performance of the coarse segmentation. To balance segmentation accuracy and computational efficiency, we propose a Discriminative Feature Attention Network for pancreas segmentation, which effectively highlights pancreas features and improves segmentation accuracy without explicit pancreas localization. The final segmentation is obtained by applying a simple yet effective post-processing step. Two separate experiments, on the public NIH pancreas CT dataset and the abdominal BTCV multi-organ dataset, show the effectiveness of our method for 2D pancreas segmentation. We obtained an average Dice Similarity Coefficient (DSC) of 82.82 ± 6.09%, an average Jaccard Index (JI) of 71.13 ± 8.30%, and an average Symmetric Average Surface Distance (ASD) of 1.69 ± 0.83 mm on the NIH dataset. Compared with existing deep learning-based pancreas segmentation methods, our experimental results achieve the best average DSC and JI values.
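The two overlap metrics reported in this abstract have standard definitions: DSC = 2|P ∩ T| / (|P| + |T|) and JI = |P ∩ T| / |P ∪ T| for a predicted mask P and a ground-truth mask T. The sketch below computes both for small binary masks. The function name, the empty-mask convention, and the toy masks are assumptions for illustration, not the authors' evaluation code.

```python
def dice_and_jaccard(pred, truth):
    """Return (DSC, JI) for two binary masks given as nested lists of 0/1."""
    p = [v for row in pred for v in row]
    t = [v for row in truth for v in row]
    if len(p) != len(t):
        raise ValueError("masks must have the same number of pixels")
    inter = sum(1 for a, b in zip(p, t) if a and b)
    p_sum = sum(1 for a in p if a)
    t_sum = sum(1 for b in t if b)
    union = p_sum + t_sum - inter
    if union == 0:
        # Both masks empty: scored as perfect agreement here (a convention
        # assumed for this sketch; evaluation toolkits differ on this case).
        return 1.0, 1.0
    return 2.0 * inter / (p_sum + t_sum), inter / union


pred = [[0, 1, 1],
        [0, 1, 0]]
truth = [[0, 1, 0],
         [0, 1, 1]]
dsc, ji = dice_and_jaccard(pred, truth)   # 2 shared pixels, 3 per mask
```

For a single pair of masks the two metrics are tied by JI = DSC / (2 − DSC). The identity does not carry over to per-case averages, so a reported mean JI need not equal the value computed from the mean DSC.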
Funding: Supported by the National Key Research and Development of China (No. 2022YFB2503400).
Abstract: Semantic segmentation for mixed scenes of aerial remote sensing and road traffic is one of the key technologies for the visual perception of flying cars. State-of-the-Art (SOTA) semantic segmentation methods have made remarkable achievements in both fine-grained segmentation and real-time performance. However, they still struggle with the huge differences in scale and semantic categories that arise in mixed scenes of aerial remote sensing and road traffic, and there is little related research. To address this issue, this paper proposes a semantic segmentation model specifically for mixed datasets of aerial remote sensing and road traffic scenes. First, a novel decoding-recoding multi-scale feature iterative refinement structure is proposed, which uses the re-integration and continuous enhancement of multi-scale information to handle the huge scale differences between cross-domain scenes, while using a fully convolutional structure to meet lightweight and real-time requirements. Second, a well-designed cross-window attention mechanism, combined with a global information integration decoding block, forms an enhanced global context perception that captures the long-range dependencies and multi-scale global context information of different scenes, thereby achieving fine-grained semantic segmentation. The proposed method is tested on a large-scale mixed dataset of aerial remote sensing and road traffic scenes. The results confirm that it can effectively handle the large scale differences in cross-domain scenes. Its segmentation accuracy surpasses that of the SOTA methods while meeting real-time requirements.
Abstract: This paper presents a novel approach for camera pose refinement based on neural radiance fields (NeRF) by introducing semantic feature consistency to enhance robustness. NeRF has been successfully applied to camera pose estimation by inverting the rendering process given an observed RGB image and an initial pose estimate. However, previous methods adopted only photometric consistency for pose optimization, which is prone to becoming trapped in local minima. To address this problem, we introduce semantic feature consistency into the existing framework. Specifically, we utilize high-level features extracted from a convolutional neural network (CNN) pre-trained for image recognition, and maintain consistency of such features between observed and rendered images during the optimization procedure. Unlike the color values at each pixel, these features contain rich semantic information shared within local regions and can be more robust to appearance changes from different viewpoints. Since it is computationally expensive to render a full image with NeRF for CNN feature extraction, we propose an efficient way to estimate the features of individually rendered pixels by projecting them to a nearby reference image and interpolating its feature maps. Extensive experiments show that our method greatly outperforms the baseline method on both synthetic objects and real-world large indoor scenes, increasing the accuracy of pose estimation by over 6.4%.
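The final step described in this abstract, reading a feature vector at the sub-pixel location where a rendered pixel lands in a reference image, can be sketched with plain bilinear interpolation. The abstract does not state which interpolation scheme the authors use, so bilinear sampling is an assumption here, as are the function name `bilinear_sample`, the border clamping, and the toy feature map. Projecting the pixel into the reference view is taken as already done and is not shown.

```python
import math


def bilinear_sample(feature_map, x, y):
    """Interpolate a feature vector at continuous (x, y) = (column, row).

    feature_map[row][col] is a list of channel values. Coordinates outside
    the map are clamped to the border (an assumption of this sketch).
    """
    height, width = len(feature_map), len(feature_map[0])
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    ax, ay = x - x0, y - y0   # fractional offsets inside the cell
    sampled = []
    for k in range(len(feature_map[0][0])):
        top = (1 - ax) * feature_map[y0][x0][k] + ax * feature_map[y0][x1][k]
        bottom = (1 - ax) * feature_map[y1][x0][k] + ax * feature_map[y1][x1][k]
        sampled.append((1 - ay) * top + ay * bottom)
    return sampled


# A 2x2 reference feature map with two channels per location.
fmap = [[[0.0, 10.0], [1.0, 20.0]],
        [[2.0, 30.0], [3.0, 40.0]]]
center_feature = bilinear_sample(fmap, 0.5, 0.5)   # average of all four cells
```

Sampling this way costs four feature lookups per pixel, whatever the image size. That matches the abstract's motivation of avoiding a full NeRF render followed by a CNN pass.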