Images taken in dim environments frequently exhibit insufficient brightness, noise, color shifts, and loss of detail, all of which pose significant challenges for dark image enhancement. Current approaches, while effective at global illumination modeling, often struggle to suppress noise and preserve structural details simultaneously, especially under heterogeneous lighting. Furthermore, misalignment between the luminance and color channels makes accurate enhancement harder. To address these difficulties, we introduce a single-stage framework, M2ATNet, built on a multi-scale multi-attention and Transformer architecture. First, to address texture blurring and residual noise, we design a multi-scale multi-attention denoising (MMAD) module, which is applied separately to the luminance and color channels to strengthen structure and texture modeling. Second, to solve the misalignment of the luminance and color channels, we introduce the multi-channel feature fusion Transformer (CFFT) module, which recovers dark details and corrects color shifts through cross-channel alignment and deep feature interaction. To make training more stable and efficient, we also combine multiple loss functions into a hybrid loss term. We extensively evaluate the proposed method on standard datasets, including LOL-v1, LOL-v2, DICM, LIME, and NPE. Evaluations in terms of numerical metrics and visual quality demonstrate that M2ATNet consistently outperforms existing advanced approaches. Ablation studies further confirm the critical roles played by the MMAD and CFFT modules in preserving detail and visual fidelity under challenging illumination-deficient conditions.
Low-light image enhancement has been one of the most active research areas in computer vision in recent years. During low-light image enhancement, loss of image detail and increased noise occur inevitably and degrade the quality of the enhanced images. To alleviate this problem, this study proposes a low-light image enhancement model, called the RetinexNet model, based on Retinex theory. The model is composed of an image decomposition module and a brightness enhancement module. In the decomposition module, a convolutional block attention module (CBAM) is incorporated to enhance the feature representation capacity of the network, focusing on crucial features and suppressing irrelevant ones. A multi-feature fusion denoising module is designed within the brightness enhancement module to avoid feature loss during downsampling. The proposed model outperforms existing algorithms in terms of the PSNR and SSIM metrics on the publicly available LOL and MIT-Adobe FiveK datasets, and gives superior results in terms of the NIQE metric on the publicly available LIME dataset.
Infrared and visible light image fusion integrates feature information from two different modalities into a single fused image to obtain more comprehensive information. In low-light scenarios, however, the illumination degradation of visible light images makes it difficult for existing fusion methods to extract texture detail from the scene, and relying solely on the target saliency information provided by infrared images is far from sufficient. To address this challenge, this paper proposes a lightweight infrared and visible light image fusion method based on low-light enhancement, named LLE-Fuse. The method improves on the MobileOne Block, using an Edge-MobileOne Block embedded with the Sobel operator to perform feature extraction and downsampling on the source images. The resulting intermediate features at different scales are then fused by a cross-modal attention fusion module. In addition, the Contrast Limited Adaptive Histogram Equalization (CLAHE) algorithm is used to enhance both the infrared and visible light images, guiding the network to learn low-light enhancement through an enhancement loss. After training, the Edge-MobileOne Block is optimized into a direct connection structure similar to MobileNetV1 through structural reparameterization, effectively reducing computational resource consumption. Finally, in extensive experimental comparisons, our method achieved improvements of 4.6%, 40.5%, 156.9%, 9.2%, and 98.6% in the evaluation metrics Standard Deviation (SD), Visual Information Fidelity (VIF), Entropy (EN), and Spatial Frequency (SF), respectively, compared to the best results of the compared algorithms, while being only 1.5 ms/it slower than the fastest method.
Low-light images often have defects such as low visibility, low contrast, high noise, and high color distortion compared with well-exposed images. If the low-light region of an image is enhanced directly, the noise will inevitably blur the whole image. Moreover, according to the retina-and-cortex (Retinex) theory of color vision, the reflectivity of different image regions may differ, which limits the enhancement performance of applying uniform operations to the entire image. Therefore, we design a Hierarchical Flow Learning (HFL) framework, which consists of a Hierarchical Image Network (HIN) and a normalized invertible Flow Learning Network (FLN). HIN extracts hierarchical structural features from low-light images, while FLN maps the distribution of normally exposed images to a Gaussian distribution using the learned hierarchical features of low-light images. At test time, the invertibility of FLN allows enhanced images to be inferred from low-light inputs. Specifically, the HIN extracts as much image information as possible at three scales (local, regional, and global) using a Triple-branch Hierarchical Fusion Module (THFM) and a Dual-Dconv Cross Fusion Module (DCFM). The THFM aggregates regional and global features to enhance the overall brightness and quality of low-light images by perceiving and extracting more structural information, whereas the DCFM uses the properties of the activation function and local features to enhance images at the pixel level, reducing noise and improving contrast. The model is trained using a negative log-likelihood loss function. Qualitative and quantitative experimental results demonstrate that HFL handles many types of quality degradation in low-light images better than state-of-the-art solutions. The HFL model enhances low-light images with better visibility, less noise, and improved contrast, making it suitable for practical scenarios such as autonomous driving, medical imaging, and nighttime surveillance. It outperforms these solutions on the LOL-v1 benchmark dataset, with PSNR = 27.26 dB, SSIM = 0.93, and LPIPS = 0.10. The source code of HFL is available at https://github.com/Smile-QT/HFL-for-LIE.
Recently, a multitude of techniques that fuse deep learning with Retinex theory have been applied to low-light image enhancement, yielding remarkable outcomes. Due to the intricate nature of imaging scenarios, including fluctuating noise levels and unpredictable environmental elements, these techniques do not fully resolve these challenges. We introduce a strategy that builds upon Retinex theory and integrates a novel deep network architecture, merging the Convolutional Block Attention Module (CBAM) with the Transformer. Our model is capable of detecting more prominent features across both channel and spatial domains. We have conducted extensive experiments across several datasets, namely LOLv1, LOLv2-real, and LOLv2-sync. The results show that our approach surpasses other methods on critical metrics such as Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM). Moreover, we have visually assessed images enhanced by various techniques and used perceptual metrics such as LPIPS for comparison, and the experimental data show that our approach also excels visually over other methods.
Enhancing low-light images with color distortion and uneven multi-light source distribution presents challenges. Most advanced methods for low-light image enhancement are based on the Retinex model using deep learning. Retinexformer introduces channel self-attention mechanisms in the IG-MSA. However, it fails to effectively capture long-range spatial dependencies, leaving room for improvement. Based on the Retinexformer deep learning framework, we designed the Retinexformer+ network. The “+” signifies our advancements in extracting long-range spatial dependencies. We introduced multi-scale dilated convolutions in illumination estimation to expand the receptive field. These convolutions effectively capture the weakening semantic dependency between pixels as distance increases. In illumination restoration, we used Unet++ with multi-level skip connections to better integrate semantic information at different scales. The designed Illumination Fusion Dual Self-Attention (IF-DSA) module embeds multi-scale dilated convolutions to achieve spatial self-attention. This module captures long-range spatial semantic relationships within acceptable computational complexity. Experimental results on the Low-Light (LOL) dataset show that Retinexformer+ outperforms other State-Of-The-Art (SOTA) methods in both quantitative and qualitative evaluations, with the computational complexity increased to an acceptable 51.63 GFLOPS. On the LOL_v1 dataset, Retinexformer+ shows an increase of 1.15 in Peak Signal-to-Noise Ratio (PSNR) and a decrease of 0.39 in Root Mean Square Error (RMSE). On the LOL_v2_real dataset, the PSNR increases by 0.42 and the RMSE decreases by 0.18. Experimental results on the Exdark dataset show that Retinexformer+ can effectively enhance real-scene images and maintain their semantic information.
Visible-infrared object detection leverages the day-night stable object perception capability of infrared images to enhance detection robustness in low-light environments by fusing the complementary information of visible and infrared images. However, the inherent differences in the imaging mechanisms of the visible and infrared modalities make effective cross-modal fusion challenging. Furthermore, constrained by the physical characteristics of sensors and thermal diffusion effects, infrared images generally suffer from blurred object contours and missing details, making it difficult to extract object features effectively. To address these issues, we propose an infrared-visible image fusion network that realizes multimodal information fusion of infrared and visible images through a carefully designed multiscale fusion strategy. First, we design an adaptive gray-radiance enhancement (AGRE) module to strengthen the detail representation in infrared images, improving their usability in complex lighting scenarios. Next, we introduce a channel-spatial feature interaction (CSFI) module, which achieves efficient complementarity between the RGB and infrared (IR) modalities via dynamic channel switching and a spatial attention mechanism. Finally, we propose a multi-scale enhanced cross-attention fusion (MSECA) module, which optimizes the fusion of multi-level features through dynamic convolution and gating mechanisms and captures long-range complementary relationships of cross-modal features on a global scale, thereby enhancing the expressiveness of the fused features. Experiments on the KAIST, M3FD, and FLIR datasets demonstrate that our method delivers outstanding performance in daytime and nighttime scenarios. On the KAIST dataset, the miss rate drops to 5.99%, and further to 4.26% in night scenes. On the FLIR and M3FD datasets, it achieves AP50 scores of 79.4% and 88.9%, respectively.
Under low-illumination conditions, the quality of image signals deteriorates significantly, typically characterized by a peak signal-to-noise ratio (PSNR) below 10 dB, which severely limits the usability of the images. Supervised methods, which use paired high-light and low-light images as training sets, can raise the PSNR to around 20 dB, significantly improving image quality. However, such data is challenging to obtain. In recent years, unsupervised low-light image enhancement (LIE) methods based on the Retinex framework have been proposed, but they generally lag behind supervised methods by 5–10 dB in performance. In this paper, we introduce the Denoising-Distilled Retinex (DDR) method, an unsupervised approach that integrates denoising priors into a Retinex-based training framework. By explicitly incorporating denoising, the DDR method addresses the noise and artifacts in low-light images, thereby enhancing the performance of the Retinex framework. The model achieved a PSNR of 19.82 dB on the LOL dataset, which is comparable to the performance of supervised methods. Furthermore, by applying knowledge distillation, the DDR method optimizes the model for real-time processing of low-light images, achieving a processing speed of 199.7 fps without incurring additional computational costs. While the DDR method has demonstrated superior performance in terms of image quality and processing speed, there is still room for improvement in robustness across different color spaces and under highly resource-constrained conditions. Future research will focus on enhancing the model’s generalizability and adaptability to address these challenges. Our rigorous testing on public datasets further substantiates the DDR method’s state-of-the-art performance in both image quality and processing speed.
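The PSNR figures quoted in the abstract above (below 10 dB for raw low-light input, around 20 dB after enhancement) follow the standard definition PSNR = 10·log10(MAX² / MSE). The following is a minimal sketch of that formula only, assuming 8-bit pixel values flattened into plain Python lists; the function name `psnr` and the sample pixel values are illustrative and do not come from the DDR paper.

```python
import math


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical example: a well-exposed patch and a version ten times darker.
reference = [100, 150, 200, 250]
dark = [10, 15, 20, 25]
print(f"PSNR of the dark patch: {psnr(reference, dark):.2f} dB")
```

A severely underexposed patch such as this one lands well under 10 dB, which matches the regime the abstract describes for unprocessed low-light images.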
Low-light image enhancement methods have limitations in addressing issues such as color distortion, lack of vibrancy, and uneven light distribution, and they often require paired training data. To address these issues, we propose a two-stage unsupervised low-light image enhancement algorithm called Retinex and Exposure Fusion Network (RFNet), which overcomes the over-enhancement of the high dynamic range and under-enhancement of the low dynamic range seen in existing enhancement algorithms. By training with unpaired low-light and regular-light images, the algorithm better manages the challenges posed by complex real-world environments. In the first stage, we design a multi-scale feature extraction module based on Retinex theory that extracts details and structural information at different scales to generate high-quality illumination and reflection images. In the second stage, an exposure image generator is designed using the camera response function to produce exposure images containing more dark features, and the generated images are fused with the original input images to complete the low-light image enhancement. Experiments show the effectiveness and rationality of each module designed in this paper. The method reconstructs the details of contrast and color distribution, outperforms current state-of-the-art methods in both qualitative and quantitative metrics, and shows excellent performance in real-world scenes.
Computed tomography (CT) reconstruction with a well-registered prior magnetic resonance imaging (MRI) image can improve reconstruction results for low-dose CT, because well-registered CT and MRI images have similar structures. However, in clinical settings, the CT image of a patient does not always match the prior MRI image because of breathing and patient movement during CT scanning. To improve image quality in this case, this paper proposes multi-group dataset expansion. In our method, multi-group CT-MRI datasets are formed by expanding CT-MRI datasets. These expanded datasets can also be used by most existing CT-MRI algorithms and improve the reconstructed image quality when the CT image of a patient is not registered with the prior MRI image. In the experiments, we evaluate the performance of the algorithm using multi-group CT-MRI datasets in several unregistered situations. Experiments show that when the CT and prior MRI images are not registered, the reconstruction results obtained with multi-group dataset expansion are better than those obtained without it.
A new image enhancement algorithm based on Retinex theory is proposed to address the poor visual quality of images captured in low-light conditions. First, the image is converted from the RGB color space to the HSV color space to obtain the V channel. Next, the illumination is estimated on the V channel separately by guided filtering and by a variational framework, and the two estimates are combined into a new illumination using the average gradient. The new reflectance is calculated from the V channel and the new illumination. Then a new V channel, obtained by multiplying the new illumination and reflectance, is processed with contrast limited adaptive histogram equalization (CLAHE). Finally, the new image in HSV space is converted back to RGB space to obtain the enhanced image. Experimental results show that the proposed method achieves better subjective and objective quality than existing methods.
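The color-space round trip in the abstract above (RGB to HSV, modify only V, convert back) can be illustrated per pixel with the Retinex relation V = R × L. This is a rough sketch under stated assumptions, not the paper's method: the illumination values passed in are hypothetical inputs, the function name `enhance_pixel` is made up for illustration, and the guided filtering, variational estimation, average-gradient fusion, and CLAHE steps are all omitted.

```python
import colorsys


def enhance_pixel(rgb, illumination, target_illumination):
    """Re-light one RGB pixel (floats in [0, 1]) by editing only its HSV V channel.

    illumination: assumed estimate of the scene illumination at this pixel.
    target_illumination: assumed brighter illumination to render the pixel under.
    """
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    # Retinex decomposition on the V channel: V = reflectance * illumination.
    reflectance = v / max(illumination, 1e-6)
    # Recompose with the new illumination and clamp to the valid range.
    new_v = min(1.0, reflectance * target_illumination)
    # Hue and saturation are left untouched, so the color is not shifted.
    return colorsys.hsv_to_rgb(h, s, new_v)


# Hypothetical dark pixel, re-lit from illumination 0.25 to 0.75.
dark_pixel = (0.2, 0.1, 0.05)
bright_pixel = enhance_pixel(dark_pixel, 0.25, 0.75)
print(bright_pixel)
```

Because only V changes, the channel ratios of the pixel are preserved, which is the usual motivation for working in HSV rather than scaling R, G, and B independently.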
The COVID-19 pandemic has devastated our daily lives, leaving horrific repercussions in its aftermath. Due to its rapid spread, it was difficult for medical personnel to diagnose cases in such large numbers. COVID-19 is typically diagnosed via a nasal polymerase chain reaction (PCR) test, but PCR results take a few hours to a few days. The PCR test is also expensive, although the government may bear the cost in certain places. Furthermore, subsets of the population resist invasive testing such as swabs. Therefore, chest X-rays or Computerized Tomography (CT) scans are preferred in most cases; importantly, they are non-invasive, inexpensive, and provide a faster response time. Recent advances in Artificial Intelligence (AI), in combination with state-of-the-art methods, have allowed for the diagnosis of COVID-19 using chest X-rays. This article proposes a method for classifying COVID-19 as positive or negative on a decentralized dataset, based on the Federated learning scheme. To build a progressive global COVID-19 classification model, two edge devices train the model on their respective local datasets, using a 3-layered custom Convolutional Neural Network (CNN) model that can be deployed from the server. The two edge devices then communicate their learned parameters and weights to the server, which aggregates them and updates the global model. The proposed model is trained using an image dataset available on Kaggle. The Kaggle collection contains more than 13,000 X-ray images, of which 9000 Normal and COVID-19 positive images are used. Each edge node holds a different number of images: edge node 1 has 3200 images, while edge node 2 has 5800. There is no association between the datasets of the nodes in the network, so each node has access to a separate image collection that is uncorrelated with the other. With the proposed algorithm and dataset, the diagnosis of COVID-19 becomes considerably more efficient, and the findings we have obtained are encouraging.
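The server-side aggregation step described above can be sketched as a sample-size-weighted average of the parameters reported by each edge node. Note the assumption: the abstract only says the server "aggregates and updates" the global model, so the FedAvg-style weighting below is one common choice rather than the authors' confirmed rule. The function name `federated_average` and the toy parameter vectors are illustrative; only the node sizes (3200 and 5800 images) come from the abstract.

```python
def federated_average(client_weights, client_sizes):
    """Average per-client parameter vectors, weighted by local dataset size."""
    if len(client_weights) != len(client_sizes) or not client_weights:
        raise ValueError("need one dataset size per client")
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Toy two-parameter models reported by the two edge nodes (hypothetical values).
node1_weights = [0.0, 1.0]
node2_weights = [9.0, 1.0]
# Dataset sizes taken from the abstract: 3200 and 5800 images.
global_weights = federated_average([node1_weights, node2_weights], [3200, 5800])
print(global_weights)
```

With this weighting, node 2 contributes 5800/9000 of the global model, so a node with more local data pulls the shared parameters further toward its own solution.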
A phase-aware cross-modal framework is presented that synthesizes UWF_FA from non-invasive UWF_RI for diabetic retinopathy (DR) stratification. A curated cohort of 1198 patients (2915 UWF_RI and 17,854 UWF_FA images) with strict registration quality supports training across three angiographic phases (initial, mid, final). The generator is based on a modified pix2pixHD with an added Gradient Variance Loss to better preserve microvasculature, and is evaluated using MAE, PSNR, SSIM, and MS-SSIM on held-out pairs. Quantitatively, the mid phase achieves the lowest MAE (98.76±42.67), while SSIM remains high across phases. Expert review shows substantial agreement (Cohen's κ=0.78–0.82) and Turing-style misclassification of 50%–70% of synthetic images as real, indicating strong perceptual realism. For downstream DR stratification, fusing multi-phase synthetic UWF_FA with UWF_RI in a Swin Transformer classifier yields significant gains over a UWF_RI-only baseline, with the full-phase setting (Set D) reaching AUC=0.910 and accuracy=0.829. These results support synthetic UWF_FA as a scalable, non-invasive complement to dye-based angiography that enhances screening accuracy while avoiding injection-related risks.
Medical image segmentation, i.e., labeling structures of interest in medical images, is crucial for disease diagnosis and treatment in radiology. In reversible data hiding in medical images (RDHMI), segmentation consists of only two regions: the focal and nonfocal regions. The focal region mainly contains information for diagnosis, while the nonfocal region serves as the monochrome background. The traditional segmentation methods currently used in RDHMI are inaccurate for complex medical images, and manual segmentation is time-consuming, poorly reproducible, and operator-dependent. State-of-the-art deep learning (DL) models would bring key benefits, but the lack of domain-specific labels for existing medical datasets makes them impossible to apply. To address this problem, this study provides labels for existing medical datasets based on a hybrid segmentation approach to facilitate the implementation of DL segmentation models in this domain. First, an initial segmentation based on a 3×3 kernel is performed to analyze identified contour pixels before classifying pixels into focal and nonfocal regions. Then, several human expert raters evaluate the generated labels and classify them as accurate or inaccurate. The inaccurate labels undergo manual segmentation by medical practitioners and are scored based on a hierarchical voting scheme before being assigned to the proposed dataset. To ensure reliability and integrity in the proposed dataset, we evaluate the accurate automated labels against labels manually segmented by medical practitioners using five assessment metrics: dice coefficient, Jaccard index, precision, recall, and accuracy. The experimental results show that labels in the proposed dataset are consistent with the subjective judgment of human experts, with an average accuracy score of 94% and dice coefficient scores between 90% and 99%. The study further proposes a ResNet-UNet with a concatenated spatial and channel squeeze and excitation (scSE) architecture for semantic segmentation to validate and illustrate the usefulness of the proposed dataset. The results demonstrate the superior performance of the proposed architecture in accurately separating the focal and nonfocal regions compared to state-of-the-art architectures. Dataset information is released under the following URL: https://www.kaggle.com/lordamoah/datasets (accessed on 31 March 2025).
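The five assessment metrics named in the abstract above all derive from the pixel-level confusion counts of a predicted mask against a reference mask. The sketch below uses their standard definitions on flattened binary masks (1 = focal, 0 = nonfocal); the function name `segmentation_metrics` and the six-pixel example masks are illustrative and are not taken from the proposed dataset.

```python
def segmentation_metrics(pred, truth):
    """Dice, Jaccard, precision, recall, and accuracy for flat binary masks."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("masks must be non-empty and of equal length")
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)

    def ratio(num, den):
        # Return 0.0 when a metric is undefined (empty denominator).
        return num / den if den else 0.0

    return {
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "jaccard": ratio(tp, tp + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "accuracy": ratio(tp + tn, len(pred)),
    }


# Hypothetical automated label versus a practitioner's manual label.
automated = [1, 1, 0, 0, 1, 0]
manual = [1, 0, 0, 0, 1, 1]
scores = segmentation_metrics(automated, manual)
print(scores)
```

Dice and Jaccard are monotonically related (Dice = 2J / (1 + J)), so they rank labels identically; reporting both is mainly a convention that eases comparison with other work.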
Recent years have seen a surge of interest in object detection on remote sensing images for applications such as surveillance and management. However, challenges like small object detection, scale variation, and the presence of closely packed objects in these images hinder accurate detection. Additionally, motion blur further complicates the identification of such objects. To address these issues, we propose an enhanced YOLOv9 with a transformer head (YOLOv9-TH). The model introduces an additional prediction head for detecting objects of varying sizes and swaps the original prediction heads for transformer heads to leverage self-attention mechanisms. We further improve YOLOv9-TH using several strategies, including data augmentation, multi-scale testing, multi-model integration, and the introduction of an additional classifier. The cross-stage partial (CSP) method and the ghost convolution hierarchical graph (GCHG) are combined to improve detection accuracy by better utilizing feature maps, widening the receptive field, and precisely extracting multi-scale objects. Additionally, we incorporate the E-SimAM attention mechanism to address low-resolution feature loss. Extensive experiments on the VisDrone2021 and DIOR datasets demonstrate the effectiveness of YOLOv9-TH, showing a good improvement in mAP compared to the best existing methods. YOLOv9-TH-e achieved an mAP50 of 54.2% on the VisDrone2021 dataset and an mAP of 92.3% on the DIOR dataset. The results confirm the model’s robustness and suitability for real-world applications, particularly for small object detection in remote sensing images.
The analysis of Android malware shows that this threat is constantly increasing and poses a real danger to mobile devices, since traditional approaches such as signature-based detection are no longer effective against its continuously advancing sophistication. To resolve this problem, efficient and flexible malware detection tools are needed. This work examines the possibility of employing deep CNNs to detect Android malware by transforming network traffic into image data representations. The dataset used in this study is CIC-AndMal2017, which contains 20,000 instances of network traffic across five distinct malware categories: (a) Trojan, (b) Adware, (c) Ransomware, (d) Spyware, and (e) Worm. These network traffic features are converted to image formats for deep learning and processed in a CNN framework that includes the VGG16 pre-trained model. Our approach achieved high performance, with an accuracy of 0.92, accuracy of 99.1%, precision of 98.2%, recall of 99.5%, and F1 score of 98.7%. Subsequent improvements to the classification model through changes within the VGG19 framework raised the classification rate to 99.25%. The results make clear that CNNs are a very effective way to classify Android malware, providing greater accuracy than conventional techniques. The success of this approach also shows the applicability of deep learning to mobile security and points toward future work on real-time detection systems and deeper learning techniques to counter the growing number of emerging threats.
Low-light images suffer from low quality due to poor lighting conditions, noise pollution, and improper camera settings. To enhance low-light images, most existing methods rely on normal-light images for guidance, but suitable normal-light images are difficult to collect. In contrast, a self-supervised method breaks free from the reliance on normal-light data, resulting in more convenience and better generalization. Existing self-supervised methods primarily focus on illumination adjustment and design pixel-based adjustment methods, which leaves remnants of other degradations, uneven brightness, and artifacts. In response, this paper proposes a self-supervised enhancement method, termed SLIE. It can handle multiple degradations, including illumination attenuation, noise pollution, and color shift, all in a self-supervised manner. Illumination attenuation is estimated based on physical principles and local neighborhood information. Noise removal and color-shift correction are realized solely with noisy images and images with color shifts. As a result, the comprehensive and fully self-supervised approach achieves better adaptability and generalization. It is applicable to various low-light conditions and can reproduce the original color of scenes in natural light. Extensive experiments conducted on four public datasets demonstrate the superiority of SLIE over thirteen state-of-the-art methods. Our code is available at https://github.com/hanna-xu/SLIE.
This paper presents a large-gathering dataset of images extracted from publicly filmed videos by 24 cameras installed on the premises of Masjid Al-Nabvi, Madinah, Saudi Arabia. The dataset consists of raw and processed images reflecting a highly challenging and unconstrained environment. The methodology for building the dataset consists of four core phases: acquisition of videos, extraction of frames, localization of face regions, and cropping and resizing of detected face regions. The raw images in the dataset consist of a total of 4613 frames obtained from video sequences. The processed images in the dataset consist of the face regions of 250 persons extracted from the raw images to ensure the authenticity of the presented data. The dataset further contains 8 images for each of the 250 subjects (persons), for a total of 2000 images. It portrays a highly unconstrained and challenging environment with human faces of varying sizes and pixel quality (resolution). Since the face regions in video sequences are severely degraded due to various unavoidable factors, the dataset can be used as a benchmark to test and evaluate face detection and recognition algorithms for research purposes. We have also gathered and displayed records of the presence of subjects who appear in the presented frames in a temporal context. These records can also be used as a temporal benchmark for tracking, finding persons, activity monitoring, and crowd counting in large crowd scenarios.
Aiming at the scale adaptation of automatic driving target detection algorithms in low illumination environments and the shortcomings in target occlusion processing,this paper proposes a YOLO-LKSDS automatic driving d...Aiming at the scale adaptation of automatic driving target detection algorithms in low illumination environments and the shortcomings in target occlusion processing,this paper proposes a YOLO-LKSDS automatic driving detection model.Firstly,the Contrast-Limited Adaptive Histogram Equalisation(CLAHE)image enhancement algorithm is improved to increase the image contrast and enhance the detailed features of the target;then,on the basis of the YOLOv5 model,the Kmeans++clustering algorithm is introduced to obtain a suitable anchor frame,and SPPELAN spatial pyramid pooling is improved to enhance the accuracy and robustness of the model for multi-scale target detection.Finally,an improved SEAM(Separated and Enhancement Attention Module)attention mechanism is combined with the DIOU-NMS algorithm to optimize the model’s performance when dealing with occlusion and dense scenes.Compared with the original model,the improved YOLO-LKSDS model achieves a 13.3%improvement in accuracy,a 1.7%improvement in mAP,and 240,000 fewer parameters on the BDD100K dataset.In order to validate the generalization of the improved algorithm,we selected the KITTI dataset for experimentation,which shows that YOLOv5’s accuracy improves by 21.1%,recall by 36.6%,and mAP50 by 29.5%,respectively,on the KITTI dataset.The deployment of this paper’s algorithm is verified by an edge computing platform,where the average speed of detection reaches 24.4 FPS while power consumption remains below 9 W,demonstrating high real-time capability and energy efficiency.展开更多
Funding: Funded by the National Natural Science Foundation of China, grant numbers 52374156 and 62476005.
Abstract: Images taken in dim environments frequently exhibit issues such as insufficient brightness, noise, color shifts, and loss of detail. These problems pose significant challenges to dark image enhancement tasks. Current approaches, while effective in global illumination modeling, often struggle to simultaneously suppress noise and preserve structural details, especially under heterogeneous lighting. Furthermore, misalignment between luminance and color channels introduces additional challenges to accurate enhancement. In response to these difficulties, we introduce a single-stage framework, M2ATNet, built on multi-scale multi-attention and the Transformer architecture. First, to address texture blurring and residual noise, we design a multi-scale multi-attention denoising module (MMAD), which is applied separately to the luminance and color channels to enhance structural and texture modeling capabilities. Second, to solve the non-alignment of the luminance and color channels, we introduce the multi-channel feature fusion Transformer (CFFT) module, which recovers dark details and corrects color shifts through cross-channel alignment and deep feature interaction. To guide the model to learn more stably and efficiently, we also fuse multiple types of loss functions into a hybrid loss term. We extensively evaluate the proposed method on various standard datasets, including LOL-v1, LOL-v2, DICM, LIME, and NPE. Evaluations in terms of numerical metrics and visual quality demonstrate that M2ATNet consistently outperforms existing advanced approaches. Ablation studies further confirm the critical roles played by the MMAD and CFFT modules in detail preservation and visual fidelity under challenging illumination-deficient environments.
Abstract: Low-light image enhancement has been one of the most active research areas in computer vision in recent years. During low-light image enhancement, loss of image details and increased noise occur inevitably, degrading the quality of the enhanced images. To alleviate this problem, a low-light image enhancement model based on Retinex theory, called RetinexNet, is proposed in this study. The model is composed of an image decomposition module and a brightness enhancement module. In the decomposition module, a convolutional block attention module (CBAM) is incorporated to enhance the feature representation capacity of the network, focusing on crucial features and suppressing irrelevant ones. A multifeature fusion denoising module is designed within the brightness enhancement module, circumventing the issue of feature loss during downsampling. The proposed model outperforms existing algorithms in terms of the PSNR and SSIM metrics on the publicly available datasets LOL and MIT-Adobe FiveK, and gives superior results in terms of the NIQE metric on the publicly available dataset LIME.
Funding: This research was sponsored by the Xinjiang Uygur Autonomous Region Tianshan Talent Programme Project (2023TCLJ02) and the Natural Science Foundation of Xinjiang Uygur Autonomous Region (2022D01C349).
Abstract: Infrared and visible light image fusion technology integrates feature information from two different modalities into a fused image to obtain more comprehensive information. However, in low-light scenarios, the illumination degradation of visible light images makes it difficult for existing fusion methods to extract texture detail information from the scene. In such cases, relying solely on the target saliency information provided by infrared images is far from sufficient. To address this challenge, this paper proposes a lightweight infrared and visible light image fusion method based on low-light enhancement, named LLE-Fuse. The method improves on the MobileOne Block, using an Edge-MobileOne Block embedded with the Sobel operator to perform feature extraction and downsampling on the source images. The intermediate features obtained at different scales are then fused by a cross-modal attention fusion module. In addition, the Contrast Limited Adaptive Histogram Equalization (CLAHE) algorithm is used for image enhancement of both infrared and visible light images, guiding the network model to learn low-light enhancement capabilities through an enhancement loss. Upon completion of network training, the Edge-MobileOne Block is optimized into a direct connection structure similar to MobileNetV1 through structural reparameterization, effectively reducing computational resource consumption. Finally, after extensive experimental comparisons, our method achieved improvements of 4.6%, 40.5%, 156.9%, 9.2%, and 98.6% in the evaluation metrics Standard Deviation (SD), Visual Information Fidelity (VIF), Entropy (EN), and Spatial Frequency (SF), respectively, compared to the best results of the compared algorithms, while being only 1.5 ms/it slower in computation speed than the fastest method.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 61971078, 61501070), the Scientific Research Foundation of Chongqing University of Technology (Grant No. 0121230236), and the Science and Technology Research Program of Chongqing Municipal Education Commission (Grant No. KJ202301165).
Abstract: Low-light images often have defects such as low visibility, low contrast, high noise, and high color distortion compared with well-exposed images. If the low-light region of an image is enhanced directly, the noise will inevitably blur the whole image. Besides, according to the retina-and-cortex (retinex) theory of color vision, the reflectivity of different image regions may differ, limiting the enhancement performance of applying uniform operations to the entire image. Therefore, we design a Hierarchical Flow Learning (HFL) framework, which consists of a Hierarchical Image Network (HIN) and a normalized invertible Flow Learning Network (FLN). HIN extracts hierarchical structural features from low-light images, while FLN maps the distribution of normally exposed images to a Gaussian distribution using the learned hierarchical features of low-light images. In subsequent testing, the reversibility of FLN allows enhanced low-light images to be inferred. Specifically, the HIN extracts as much image information as possible from three scales (local, regional, and global) using a Triple-branch Hierarchical Fusion Module (THFM) and a Dual-Dconv Cross Fusion Module (DCFM). The THFM aggregates regional and global features to enhance the overall brightness and quality of low-light images by perceiving and extracting more structure information, whereas the DCFM uses the properties of the activation function and local features to enhance images at the pixel level to reduce noise and improve contrast. In addition, the model was trained using a negative log-likelihood loss function. Qualitative and quantitative experimental results demonstrate that our HFL can better handle many quality degradation types in low-light images compared with state-of-the-art solutions. The HFL model enhances low-light images with better visibility, less noise, and improved contrast, making it suitable for practical scenarios such as autonomous driving, medical imaging, and nighttime surveillance. On the benchmark dataset LOL-v1, HFL reaches PSNR = 27.26 dB, SSIM = 0.93, and LPIPS = 0.10, outperforming these solutions. The source code of HFL is available at https://github.com/Smile-QT/HFL-for-LIE.
Abstract: Recently, a multitude of techniques that fuse deep learning with Retinex theory have been applied to low-light image enhancement, yielding remarkable outcomes. However, because of the intricate nature of imaging scenarios, including fluctuating noise levels and unpredictable environmental elements, these techniques do not fully resolve the associated challenges. We introduce a strategy that builds upon Retinex theory and integrates a novel deep network architecture, merging the Convolutional Block Attention Module (CBAM) with the Transformer. Our model is capable of detecting more prominent features across both channel and spatial domains. We have conducted extensive experiments across several datasets, namely LOLv1, LOLv2-real, and LOLv2-sync. The results show that our approach surpasses other methods when evaluated against critical metrics such as Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM). Moreover, we have visually assessed images enhanced by various techniques and used perceptual metrics such as LPIPS for comparison, and the experimental data demonstrate that our approach also outperforms other methods visually.
Funding: Supported by the Key Laboratory of Forensic Science and Technology at College of Sichuan Province (2023YB04).
Abstract: Enhancing low-light images with color distortion and uneven multi-light source distribution presents challenges. Most advanced methods for low-light image enhancement are based on the Retinex model using deep learning. Retinexformer introduces channel self-attention mechanisms in the IG-MSA. However, it fails to effectively capture long-range spatial dependencies, leaving room for improvement. Based on the Retinexformer deep learning framework, we designed the Retinexformer+ network. The “+” signifies our advancements in extracting long-range spatial dependencies. We introduced multi-scale dilated convolutions in illumination estimation to expand the receptive field. These convolutions effectively capture the weakening semantic dependency between pixels as distance increases. In illumination restoration, we used Unet++ with multi-level skip connections to better integrate semantic information at different scales. The designed Illumination Fusion Dual Self-Attention (IF-DSA) module embeds multi-scale dilated convolutions to achieve spatial self-attention. This module captures long-range spatial semantic relationships within acceptable computational complexity. Experimental results on the Low-Light (LOL) dataset show that Retinexformer+ outperforms other State-Of-The-Art (SOTA) methods in both quantitative and qualitative evaluations, with the computational complexity increased to an acceptable 51.63 G FLOPS. On the LOL_v1 dataset, Retinexformer+ shows an increase of 1.15 in Peak Signal-to-Noise Ratio (PSNR) and a decrease of 0.39 in Root Mean Square Error (RMSE). On the LOL_v2_real dataset, the PSNR increases by 0.42 and the RMSE decreases by 0.18. Experimental results on the Exdark dataset show that Retinexformer+ can effectively enhance real-scene images and maintain their semantic information.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 62302086), the Natural Science Foundation of Liaoning Province (Grant No. 2023-MSBA-070), and the Fundamental Research Funds for the Central Universities (Grant No. N2317005).
Abstract: Visible-infrared object detection leverages the day-night stable object perception capability of infrared images to enhance detection robustness in low-light environments by fusing the complementary information of visible and infrared images. However, the inherent differences in the imaging mechanisms of the visible and infrared modalities make effective cross-modal fusion challenging. Furthermore, constrained by the physical characteristics of sensors and thermal diffusion effects, infrared images generally suffer from blurred object contours and missing details, making it difficult to extract object features effectively. To address these issues, we propose an infrared-visible image fusion network that realizes multimodal information fusion of infrared and visible images through a carefully designed multiscale fusion strategy. First, we design an adaptive gray-radiance enhancement (AGRE) module to strengthen the detail representation in infrared images, improving their usability in complex lighting scenarios. Next, we introduce a channel-spatial feature interaction (CSFI) module, which achieves efficient complementarity between the RGB and infrared (IR) modalities via dynamic channel switching and a spatial attention mechanism. Finally, we propose a multi-scale enhanced cross-attention fusion (MSECA) module, which optimizes the fusion of multi-level features through dynamic convolution and gating mechanisms and captures long-range complementary relationships of cross-modal features on a global scale, thereby enhancing the expressiveness of the fused features. Experiments on the KAIST, M3FD, and FLIR datasets demonstrate that our method delivers outstanding performance in daytime and nighttime scenarios. On the KAIST dataset, the miss rate drops to 5.99%, and further to 4.26% in night scenes. On the FLIR and M3FD datasets, it achieves AP50 scores of 79.4% and 88.9%, respectively.
Funding: This work was made possible by support from the Guangxi Natural Science Foundation (Grant No. 2024GXNSFAA010484) and the National Natural Science Foundation of China (No. 62466013).
Abstract: Under low-illumination conditions, the quality of image signals deteriorates significantly, typically characterized by a peak signal-to-noise ratio (PSNR) below 10 dB, which severely limits the usability of the images. Supervised methods, which use paired high-light and low-light images as training sets, can raise the PSNR to around 20 dB, significantly improving image quality. However, such data is challenging to obtain. In recent years, unsupervised low-light image enhancement (LIE) methods based on the Retinex framework have been proposed, but they generally lag behind supervised methods by 5–10 dB in performance. In this paper, we introduce the Denoising-Distilled Retinex (DDR) method, an unsupervised approach that integrates denoising priors into a Retinex-based training framework. By explicitly incorporating denoising, the DDR method addresses the challenges of noise and artifacts in low-light images, thereby enhancing the performance of the Retinex framework. The model achieved a PSNR of 19.82 dB on the LOL dataset, which is comparable to the performance of supervised methods. Furthermore, by applying knowledge distillation, the DDR method optimizes the model for real-time processing of low-light images, achieving a processing speed of 199.7 fps without incurring additional computational costs. While the DDR method has demonstrated superior performance in terms of image quality and processing speed, there is still room for improvement in robustness across different color spaces and under highly resource-constrained conditions. Future research will focus on enhancing the model’s generalizability and adaptability to address these challenges. Our rigorous testing on public datasets further substantiates the DDR method’s state-of-the-art performance in both image quality and processing speed.
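The PSNR figures quoted in the abstract above (below 10 dB, around 20 dB, 19.82 dB) follow the standard definition PSNR = 10·log10(MAX² / MSE). The sketch below is a minimal illustration of that formula only, not the authors' evaluation code; the function name `psnr` and the flat-list pixel representation are assumptions made for the example.

```python
import math


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences.

    `reference` and `test` are flat sequences of pixel intensities
    (a hypothetical representation chosen to keep the sketch dependency-free).
    """
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between the two signals.
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Because the scale is logarithmic, a gain of 10 dB corresponds to a tenfold reduction in mean squared error.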
Funding: Supported by the National Key Research and Development Program Topics (Grant No. 2021YFB4000905), the National Natural Science Foundation of China (Grant Nos. 62101432 and 62102309), and in part by the Shaanxi Natural Science Fundamental Research Program Project (No. 2022JM-508).
Abstract: Low-light image enhancement methods have limitations in addressing issues such as color distortion, lack of vibrancy, and uneven light distribution, and often require paired training data. To address these issues, we propose a two-stage unsupervised low-light image enhancement algorithm called Retinex and Exposure Fusion Network (RFNet), which can overcome the over-enhancement of the high dynamic range and the under-enhancement of the low dynamic range in existing enhancement algorithms. By training with unpaired low-light and regular-light images, the algorithm can better manage the challenges brought about by complex environments in real-world scenarios. In the first stage, we design a multi-scale feature extraction module based on Retinex theory, capable of extracting details and structural information at different scales to generate high-quality illumination and reflection images. In the second stage, an exposure image generator is designed through the camera response mechanism function to acquire exposure images containing more dark features, and the generated images are fused with the original input images to complete the low-light image enhancement. Experiments show the effectiveness and rationality of each module designed in this paper. The method reconstructs the details of contrast and color distribution, outperforms the current state-of-the-art methods in both qualitative and quantitative metrics, and shows excellent performance in the real world.
Funding: The National Natural Science Foundation of China (No. 813716234), the National Basic Research Program (973) of China (No. 2010CB834302), and the Shanghai Jiao Tong University Medical Engineering Cross Research Funds (Nos. YG2013MS30 and YG2014ZD05).
Abstract: Computed tomography (CT) reconstruction with a well-registered prior magnetic resonance imaging (MRI) image can improve reconstruction results with low-dose CT, because well-registered CT and MRI images have similar structures. However, in clinical settings, the CT image of a patient does not always match the prior MRI image because of breathing and patient movement during CT scanning. To improve the image quality in this case, multi-group dataset expansion is proposed in this paper. In our method, multi-group CT-MRI datasets are formed by expanding CT-MRI datasets. These expanded datasets can also be used by most existing CT-MRI algorithms and improve the reconstructed image quality when the CT image of a patient is not registered with the prior MRI image. In the experiments, we evaluate the performance of the algorithm by using multi-group CT-MRI datasets in several unregistered situations. Experiments show that when the CT and prior MRI images are not registered, the reconstruction results obtained with multi-group dataset expansion are better than those obtained without it.
Funding: Supported by the China Scholarship Council, the Postgraduate Research & Practice Innovation Program of Jiangsu Province (No. KYCX17_0776), and the Natural Science Foundation of NUPT (No. NY214039).
Abstract: A new image enhancement algorithm based on Retinex theory is proposed to address the poor visual quality of images captured in low-light conditions. First, an image is converted from the RGB color space to the HSV color space to obtain the V channel. Next, two illumination estimates are computed on the V channel, one by guided filtering and one by a variational framework, and combined into a new illumination using the average gradient. The new reflectance is calculated from the V channel and the new illumination. Then a new V channel, obtained by multiplying the new illumination and reflectance, is processed with contrast limited adaptive histogram equalization (CLAHE). Finally, the new image in HSV space is converted back to RGB space to obtain the enhanced image. Experimental results show that the proposed method has better subjective and objective quality than existing methods.
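The V-channel pipeline described above can be illustrated with a heavily simplified sketch. It is not the authors' method: the guided-filter and variational illumination estimates are replaced here by a plain box-filter mean, the illumination is brightened with a gamma curve (the `gamma` value is an arbitrary assumption), and the CLAHE step is omitted. The function names `box_mean` and `enhance_rgb` are hypothetical. Only the overall structure (RGB to HSV, Retinex split V = L · R, recombination, HSV to RGB) follows the abstract.

```python
import colorsys


def box_mean(v, radius=1):
    """Crude illumination estimate: local mean of V over a (2r+1) x (2r+1) window.

    Stand-in for the guided-filter / variational estimates named in the abstract.
    """
    h, w = len(v), len(v[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [v[j][i]
                    for j in range(max(0, y - radius), min(h, y + radius + 1))
                    for i in range(max(0, x - radius), min(w, x + radius + 1))]
            out[y][x] = sum(vals) / len(vals)
    return out


def enhance_rgb(image, gamma=0.5, eps=1e-6):
    """Enhance an image given as rows of (r, g, b) floats in [0, 1]."""
    # Step 1: RGB -> HSV, keep the V channel.
    hsv = [[colorsys.rgb_to_hsv(*px) for px in row] for row in image]
    v = [[px[2] for px in row] for row in hsv]
    # Step 2: estimate the illumination L from V.
    illum = box_mean(v)
    out = []
    for y, row in enumerate(hsv):
        new_row = []
        for x, (hue, sat, val) in enumerate(row):
            light = max(illum[y][x], eps)
            # Step 3: Retinex decomposition V = L * R, so R = V / L.
            reflectance = val / light
            # Step 4: brighten the illumination (gamma < 1) and recombine.
            new_v = min(1.0, (light ** gamma) * reflectance)
            # Step 5: HSV -> RGB with hue and saturation untouched.
            new_row.append(colorsys.hsv_to_rgb(hue, sat, new_v))
        out.append(new_row)
    return out
```

Working on V alone leaves hue and saturation unchanged, which is the usual motivation for doing Retinex-style enhancement in HSV rather than on the RGB channels independently.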
Funding: Supported by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2023R66), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: The COVID-19 pandemic has devastated our daily lives, leaving horrific repercussions in its aftermath. Due to its rapid spread, it was quite difficult for medical personnel to diagnose it in such large numbers. COVID-19 is typically diagnosed via a nasal polymerase chain reaction (PCR) test, whose results take a few hours to a few days. The PCR test is expensive, although the government may bear the expense in certain places. Furthermore, subsets of the population resist invasive testing such as swabs. Therefore, chest X-rays or Computerized Tomography (CT) scans are preferred in most cases; more importantly, they are non-invasive, inexpensive, and provide a faster response time. Recent advances in Artificial Intelligence (AI), in combination with state-of-the-art methods, have allowed for the diagnosis of COVID-19 using chest X-rays. This article proposes a method for classifying COVID-19 as positive or negative on a decentralized dataset, based on the Federated Learning scheme. In order to build a progressive global COVID-19 classification model, two edge devices are employed to train the model on their respective localized datasets, and a 3-layered custom Convolutional Neural Network (CNN) model, which can be deployed from the server, is used in the training process. These two edge devices then communicate their learned parameters and weights to the server, which aggregates them and updates the global model. The proposed model is trained using an image dataset that can be found on Kaggle. There are more than 13,000 X-ray images in the Kaggle database collection, from which 9000 Normal and COVID-19 positive images are used. Each edge node possesses a different number of images: edge node 1 has 3200 images, while edge node 2 has 5800. There is no association between the datasets of the various nodes in the network, so each node has access to a separate image collection that has no correlation with the others. The diagnosis of COVID-19 has become considerably more efficient with the suggested algorithm and dataset, and the findings that we have obtained are quite encouraging.
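The server-side aggregation step described above can be sketched as follows. The abstract does not say how the server combines the two edge models, so this example assumes sample-size-weighted averaging (the common FedAvg rule); the function name `federated_average` and the flat-list parameter layout are likewise assumptions for illustration.

```python
def federated_average(client_weights, client_sizes):
    """Combine per-client parameter vectors into one global vector.

    client_weights: list of flat parameter lists, one per edge node.
    client_sizes:   number of local training images per node, used as the
                    averaging weight (an assumption, not stated in the abstract).
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one size per client")
    n_params = len(client_weights[0])
    if any(len(w) != n_params for w in client_weights):
        raise ValueError("all clients must share the same parameter layout")
    total = sum(client_sizes)
    # Each global parameter is the size-weighted mean of the client values.
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Two edge nodes holding 3200 and 5800 images, as in the abstract.
global_params = federated_average([[1.0, 0.0], [0.0, 1.0]], [3200, 5800])
```

With these sizes, node 2 contributes 5800/9000 of each averaged parameter, so the global model leans toward the node that trained on more data.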
Funding: Funded by the Deanship of Research and Graduate Studies at King Khalid University through the Large Research Project under grant number RGP2/417/46.
Abstract: A phase-aware cross-modal framework is presented that synthesizes UWF_FA from non-invasive UWF_RI for diabetic retinopathy (DR) stratification. A curated cohort of 1198 patients (2915 UWF_RI and 17,854 UWF_FA images) with strict registration quality supports training across three angiographic phases (initial, mid, final). The generator is based on a modified pix2pixHD with an added Gradient Variance Loss to better preserve microvasculature, and is evaluated using MAE, PSNR, SSIM, and MS-SSIM on held-out pairs. Quantitatively, the mid phase achieves the lowest MAE (98.76±42.67), while SSIM remains high across phases. Expert review shows substantial agreement (Cohen's κ=0.78–0.82) and Turing-style misclassification of 50%–70% of synthetic images as real, indicating strong perceptual realism. For downstream DR stratification, fusing multi-phase synthetic UWF_FA with UWF_RI in a Swin Transformer classifier yields significant gains over a UWF_RI-only baseline, with the full-phase setting (Set D) reaching AUC=0.910 and accuracy=0.829. These results support synthetic UWF_FA as a scalable, non-invasive complement to dye-based angiography that enhances screening accuracy while avoiding injection-related risks.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 62072250, 61772281, 61702235, U1636117, U1804263, 62172435, 61872203 and 61802212), the Zhongyuan Science and Technology Innovation Leading Talent Project of China (Grant No. 214200510019), the Suqian Municipal Science and Technology Plan Project in 2020 (S202015), the Plan for Scientific Talent of Henan Province (Grant No. 2018JR0018), the Opening Project of Guangdong Provincial Key Laboratory of Information Security Technology (Grant No. 2020B1212060078), and the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD) Fund.
Abstract: Medical image segmentation, i.e., labeling structures of interest in medical images, is crucial for disease diagnosis and treatment in radiology. In reversible data hiding in medical images (RDHMI), segmentation consists of only two regions: the focal and nonfocal regions. The focal region mainly contains information for diagnosis, while the nonfocal region serves as the monochrome background. The traditional segmentation methods currently used in RDHMI are inaccurate for complex medical images, and manual segmentation is time-consuming, poorly reproducible, and operator-dependent. Implementing state-of-the-art deep learning (DL) models would bring key benefits, but the lack of domain-specific labels for existing medical datasets makes this impossible. To address this problem, this study provides labels for existing medical datasets based on a hybrid segmentation approach to facilitate the implementation of DL segmentation models in this domain. First, an initial segmentation based on a 3×3 kernel is performed to analyze identified contour pixels before classifying pixels into focal and nonfocal regions. Then, several human expert raters evaluate the generated labels and classify them as accurate or inaccurate. The inaccurate labels undergo manual segmentation by medical practitioners and are scored based on a hierarchical voting scheme before being assigned to the proposed dataset. To ensure reliability and integrity in the proposed dataset, we evaluate the accurate automated labels against labels manually segmented by medical practitioners using five assessment metrics: dice coefficient, Jaccard index, precision, recall, and accuracy. The experimental results show that labels in the proposed dataset are consistent with the subjective judgment of human experts, with an average accuracy score of 94% and dice coefficient scores between 90% and 99%. The study further proposes a ResNet-UNet with concatenated spatial and channel squeeze and excitation (scSE) architecture for semantic segmentation to validate and illustrate the usefulness of the proposed dataset. The results demonstrate the superior performance of the proposed architecture in accurately separating the focal and nonfocal regions compared to state-of-the-art architectures. Dataset information is released under the following URL: https://www.kaggle.com/lordamoah/datasets (accessed on 31 March 2025).
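The five assessment metrics named above all derive from the same pixel-level confusion counts. The sketch below shows the standard definitions for a binary focal/nonfocal mask; it is an illustration under the assumption that masks are flat sequences of 0/1 values, and the function name `mask_metrics` is hypothetical rather than taken from the study.

```python
def mask_metrics(pred, truth):
    """Dice, Jaccard, precision, recall and accuracy for two binary masks.

    pred, truth: equal-length sequences of 0/1 (1 = focal region).
    Ratios with a zero denominator are reported as 0.0.
    """
    if len(pred) != len(truth) or not pred:
        raise ValueError("masks must be non-empty and of equal length")
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    tn = sum(1 for p, t in zip(pred, truth) if not p and not t)

    def ratio(num, den):
        return num / den if den else 0.0

    return {
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "jaccard": ratio(tp, tp + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
    }


scores = mask_metrics([1, 1, 0, 0], [1, 0, 1, 0])
```

Dice and Jaccard are monotonically related (Dice = 2J / (1 + J)), so they rank segmentations identically; reporting both mainly eases comparison with prior work.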
Abstract: Recent years have seen a surge of interest in object detection on remote sensing images for applications such as surveillance and management. However, challenges such as small object detection, scale variation, and the presence of closely packed objects in these images hinder accurate detection. Additionally, motion blur further complicates the identification of such objects. To address these issues, we propose an enhanced YOLOv9 with a transformer head (YOLOv9-TH). The model introduces an additional prediction head for detecting objects of varying sizes and swaps the original prediction heads for transformer heads to leverage self-attention mechanisms. We further improve YOLOv9-TH using several strategies, including data augmentation, multi-scale testing, multi-model integration, and the introduction of an additional classifier. The cross-stage partial (CSP) method and the ghost convolution hierarchical graph (GCHG) are combined to improve detection accuracy by better utilizing feature maps, widening the receptive field, and precisely extracting multi-scale objects. Additionally, we incorporate the E-SimAM attention mechanism to address low-resolution feature loss. Extensive experiments on the VisDrone2021 and DIOR datasets demonstrate the effectiveness of YOLOv9-TH, showing good improvement in mAP compared to the best existing methods. YOLOv9-TH-e achieved an mAP50 of 54.2% on the VisDrone2021 dataset and an mAP of 92.3% on the DIOR dataset. The results confirm the model’s robustness and suitability for real-world applications, particularly for small object detection in remote sensing images.
Funding: Funded by the Deanship of Scientific Research at Princess Nourah bint Abdulrahman University, through the Research Funding Program, Grant No. (FRP-1443-15).
Abstract: The analysis of Android malware shows that this threat is constantly increasing and is a real danger to mobile devices, since traditional approaches such as signature-based detection are no longer effective against its continuously advancing sophistication. To resolve this problem, efficient and flexible malware detection tools are needed. This work examines the possibility of employing deep CNNs to detect Android malware by transforming network traffic into image data representations. The dataset used in this study is CIC-AndMal2017, which contains 20,000 instances of network traffic across five distinct malware categories: (a) Trojan, (b) Adware, (c) Ransomware, (d) Spyware, and (e) Worm. These network traffic features are converted to image formats for deep learning and fed to a CNN framework that includes the VGG16 pre-trained model. Our approach achieved high performance, with an accuracy of 0.92, accuracy of 99.1%, precision of 98.2%, recall of 99.5%, and F1 score of 98.7%. Subsequent improvements to the classification model through changes within the VGG19 framework raised the classification rate to 99.25%. The results make it clear that CNNs are a very effective way to classify Android malware, providing greater accuracy than conventional techniques. The success of this approach also shows the applicability of deep learning in mobile security and points to future work on real-time detection systems and deeper learning techniques to counter the growing number of emerging threats.
Funding: Supported by the National Natural Science Foundation of China (62276192).
Abstract: Low-light images suffer from low quality due to poor lighting conditions, noise pollution, and improper camera settings. To enhance low-light images, most existing methods rely on normal-light images for guidance, but collecting suitable normal-light images is difficult. In contrast, a self-supervised method breaks free from the reliance on normal-light data, resulting in more convenience and better generalization. Existing self-supervised methods primarily focus on illumination adjustment and design pixel-based adjustment methods, leaving remnants of other degradations, uneven brightness, and artifacts. In response, this paper proposes a self-supervised enhancement method, termed SLIE. It can handle multiple degradations, including illumination attenuation, noise pollution, and color shift, all in a self-supervised manner. Illumination attenuation is estimated based on physical principles and local neighborhood information. Noise removal and color-shift correction are realized solely with noisy images and images with color shifts. Finally, the comprehensive and fully self-supervised approach achieves better adaptability and generalization. It is applicable to various low-light conditions and can reproduce the original color of scenes in natural light. Extensive experiments conducted on four public datasets demonstrate the superiority of SLIE over thirteen state-of-the-art methods. Our code is available at https://github.com/hanna-xu/SLIE.
Funding: This research was supported by the Deanship of Scientific Research, Islamic University of Madinah, Madinah (KSA), under Tammayuz program Grant Number 1442/505.
Abstract: This paper presents a large gathering dataset of images extracted from publicly filmed videos by 24 cameras installed on the premises of Masjid Al-Nabvi, Madinah, Saudi Arabia. The dataset consists of raw and processed images reflecting a highly challenging and unconstrained environment. The methodology for building the dataset consists of four core phases: acquisition of videos, extraction of frames, localization of face regions, and cropping and resizing of detected face regions. The raw images in the dataset consist of a total of 4613 frames obtained from video sequences. The processed images in the dataset consist of the face regions of 250 persons extracted from the raw images to ensure the authenticity of the presented data. The dataset contains 8 images for each of the 250 subjects (persons), for a total of 2000 images. It portrays a highly unconstrained and challenging environment with human faces of varying sizes and pixel quality (resolution). Since the face regions in video sequences are severely degraded due to various unavoidable factors, the dataset can be used as a benchmark to test and evaluate face detection and recognition algorithms for research purposes. We have also gathered and displayed records of the presence of subjects who appear in the presented frames in a temporal context. These records can also be used as a temporal benchmark for tracking, finding persons, activity monitoring, and crowd counting in large crowd scenarios.
Funding: Supported by the Key R&D Program of Shaanxi Province (No. 2025CYYBXM-078).
Abstract: To address the poor scale adaptation of autonomous driving target detection algorithms in low-illumination environments and their shortcomings in handling target occlusion, this paper proposes the YOLO-LKSDS autonomous driving detection model. First, the Contrast-Limited Adaptive Histogram Equalisation (CLAHE) image enhancement algorithm is improved to increase image contrast and enhance the detailed features of the target. Then, on the basis of the YOLOv5 model, the K-means++ clustering algorithm is introduced to obtain suitable anchor boxes, and SPPELAN spatial pyramid pooling is improved to enhance the accuracy and robustness of the model for multi-scale target detection. Finally, an improved SEAM (Separated and Enhancement Attention Module) attention mechanism is combined with the DIOU-NMS algorithm to optimize the model’s performance in occluded and dense scenes. Compared with the original model, the improved YOLO-LKSDS model achieves a 13.3% improvement in accuracy, a 1.7% improvement in mAP, and 240,000 fewer parameters on the BDD100K dataset. To validate the generalization of the improved algorithm, we selected the KITTI dataset for experimentation, which shows that YOLOv5’s accuracy improves by 21.1%, recall by 36.6%, and mAP50 by 29.5% on the KITTI dataset. The deployment of the algorithm is verified on an edge computing platform, where the average detection speed reaches 24.4 FPS while power consumption remains below 9 W, demonstrating high real-time capability and energy efficiency.
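The DIOU-NMS step mentioned above can be illustrated with a minimal sketch of the standard formulation, in which the usual IoU is penalized by the squared distance between box centres divided by the squared diagonal of the smallest enclosing box. This is not the paper's improved pipeline; the function names, the `(x1, y1, x2, y2)` box layout, and the default threshold of 0.5 are assumptions made for illustration.

```python
def diou(a, b):
    """Distance-IoU between two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b

    # Plain IoU from the intersection and union areas.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union > 0 else 0.0

    # Squared distance between the two box centres.
    d2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both.
    c2 = (max(ax2, bx2) - min(ax1, bx1)) ** 2 + (max(ay2, by2) - min(ay1, by1)) ** 2

    return iou - d2 / c2 if c2 > 0 else iou


def diou_nms(boxes, scores, threshold=0.5):
    """Return indices of kept boxes, highest score first.

    A candidate is suppressed when its DIoU with an already kept box
    exceeds `threshold` (an assumed default, not taken from the paper).
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if diou(boxes[best], boxes[i]) <= threshold]
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(diou_nms(boxes, scores))
```

Because the centre-distance penalty lowers the score of overlapping boxes whose centres are far apart, such boxes are less likely to be suppressed than under plain IoU, which is the usual motivation for using this variant in occluded and dense scenes.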