Deployable mechanisms with good deployment performance, strong expansibility, and low weight have attracted much attention because of their potential in aerospace. A basic deployable pyramid unit with good deployability and expandability is proposed to construct a sizeable deployable mechanism. Firstly, the folding principle and expansion method of the basic unit are proposed. The configuration synthesis method of adding constraint chains to a spatial closed-loop mechanism is used to synthesize the basic unit. Then, the degree of freedom of the basic unit is analyzed using screw theory and the link dismantling method. Next, three-dimensional models of the pyramid unit, expansion unit, and array unit are established, and a folding motion simulation analysis is carried out. Based on the number of components, weight reduction rate, and deployable rate, the performance characteristics of the three types of mechanisms are described in detail. Finally, prototypes of the pyramid unit, combination unit, and expansion unit are developed to further verify the correctness of the configuration synthesis based on the pyramid. The proposed deployable mechanism provides a reference for the design and application of antennas with a large aperture, high deployable rate, and low weight. It has good application prospects in the aerospace field.
The ubiquity of mobile devices has driven advancements in mobile object detection. However, challenges in multi-scale object detection in open, complex environments persist due to limited computational resources. Traditional approaches such as network compression, quantization, and lightweight design often sacrifice accuracy or the robustness of feature representation. This article introduces the Fast Multi-scale Channel Shuffling Network (FMCSNet), a novel lightweight detection model optimized for mobile devices. FMCSNet integrates a fully convolutional Multilayer Perceptron (MLP) module, offering global perception without significantly increasing parameters and effectively bridging the gap between CNNs and Vision Transformers. FMCSNet balances computation and accuracy mainly through two key modules: the ShiftMLP module, which includes a shift operation and an MLP module, and a Partial group Convolutional (PGConv) module, which reduces computation while enhancing information exchange between channels. With a computational complexity of 1.4G FLOPs and 1.3M parameters, FMCSNet outperforms CNN-based and DWConv-based ShuffleNetv2 by 1% and 4.5% mAP on the Pascal VOC 2007 dataset, respectively. Additionally, FMCSNet achieves a mAP of 30.0 (0.5:0.95 IoU threshold) with only 2.5G FLOPs and 2.0M parameters. It achieves 32 FPS on low-performance i5-series CPUs, meeting real-time detection requirements. The PGConv module’s adaptability across scenarios further highlights FMCSNet as a promising solution for real-time mobile object detection.
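The abstract does not say how FMCSNet exchanges information between channel groups, so the following is only a minimal sketch of the generic channel shuffle used in ShuffleNet-style networks (reshape into groups, transpose, flatten). The function name `channel_shuffle` and the flat-list representation of channels are assumptions made for illustration.

```python
def channel_shuffle(channels, groups):
    """Interleave channels across groups (reshape -> transpose -> flatten).

    `channels` is a flat list standing in for the channel axis of a feature map.
    This is the generic ShuffleNet-style operation, not FMCSNet's actual code.
    """
    n = len(channels)
    if groups <= 0 or n % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = n // groups
    # View the list as a (groups, per_group) grid and read it column by column.
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


if __name__ == "__main__":
    print(channel_shuffle(list(range(6)), groups=2))
```

After the shuffle, each group of consecutive channels contains members from every original group, which is what lets a following group convolution mix information across groups.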
Single Image Super-Resolution (SISR) seeks to reconstruct high-resolution (HR) images from low-resolution (LR) inputs, thereby enhancing visual fidelity and the perception of fine details. While Transformer-based models such as SwinIR, Restormer, and HAT have recently achieved impressive results in super-resolution tasks by capturing global contextual information, these methods often suffer from substantial computational and memory overhead, which limits their deployment on resource-constrained edge devices. To address these challenges, we propose a novel lightweight super-resolution network, termed Binary Attention-Guided Information Distillation (BAID), which integrates frequency-aware modeling with a binary attention mechanism to significantly reduce computational complexity and parameter count while maintaining strong reconstruction performance. The network combines a high–low frequency decoupling strategy with a local–global attention sharing mechanism, enabling efficient compression of redundant computations through binary attention guidance. At the core of the architecture lies the Attention-Guided Distillation Block (AGDB), which retains the strengths of the information distillation framework while introducing a sparse binary attention module to enhance both inference efficiency and feature representation. Extensive ×4 super-resolution experiments on four standard benchmarks (Set5, Set14, BSD100, and Urban100) demonstrate that BAID achieves Peak Signal-to-Noise Ratio (PSNR) values of 32.13, 28.51, 27.47, and 26.15, respectively, with only 1.22 million parameters and 26.1 G floating-point operations (FLOPs), outperforming other state-of-the-art lightweight methods such as the Information Multi-Distillation Network (IMDN) and the Residual Feature Distillation Network (RFDN). These results highlight the proposed model’s ability to deliver high-quality image reconstruction while offering strong deployment efficiency, making it well-suited for image restoration tasks in resource-limited environments.
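For readers unfamiliar with the PSNR figures quoted above, the metric can be sketched as follows. This is a minimal illustration on flat lists of pixel values, assuming an 8-bit peak value of 255; the abstract does not state the evaluation protocol (colour space, border cropping), so those details are deliberately left out.

```python
import math


def psnr(reference, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists.

    max_value=255.0 assumes 8-bit images; this is an assumption, not a detail
    taken from the abstract.
    """
    if not reference or len(reference) != len(reconstructed):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    # A uniform error of 10 grey levels on an 8-bit image.
    print(round(psnr([100] * 4, [110] * 4), 2))
```

Because the scale is logarithmic, a gain of a few tenths of a dB already corresponds to a measurable drop in mean squared error.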
Background: Medical imaging advancements are constrained by fundamental trade-offs between acquisition speed, radiation dose, and image quality, forcing clinicians to work with noisy, incomplete data. Existing reconstruction methods either compromise on accuracy with iterative algorithms or suffer from limited generalizability with task-specific deep learning approaches. Methods: We present LDM-PIR, a lightweight physics-conditioned diffusion multi-model for medical image reconstruction that addresses key challenges in magnetic resonance imaging (MRI), CT, and low-photon imaging. Unlike traditional iterative methods, which are computationally expensive, or task-specific deep learning approaches, which lack generalizability, LDM-PIR integrates three innovations: (1) a physics-conditioned diffusion framework that embeds acquisition operators (Fourier/Radon transforms) and noise models directly into the reconstruction process; (2) a multi-model architecture that unifies denoising, inpainting, and super-resolution via shared weight conditioning; and (3) a lightweight design (2.1M parameters) enabling rapid inference (0.8 s/image on GPU). Through self-supervised fine-tuning with measurement consistency losses, the model adapts to new imaging modalities using fewer annotated samples. Results: LDM-PIR achieves state-of-the-art performance on fastMRI (peak signal-to-noise ratio (PSNR): 34.04 for single-coil/31.50 for multi-coil) and on the Lung Image Database Consortium and Image Database Resource Initiative dataset (28.83 PSNR under Poisson noise). Clinical evaluations demonstrate superior preservation of anatomical structures, with SSIM improvements of 8.8% for single-coil and 4.36% for multi-coil MRI over uDPIR. Conclusion: LDM-PIR offers a flexible, efficient, and scalable solution for medical image reconstruction, addressing the challenges of noise, undersampling, and modality generalization. The model’s lightweight design allows for rapid inference, while its self-supervised fine-tuning capability minimizes reliance on large annotated datasets, making it suitable for real-world clinical applications.
Detecting small forest fire targets in unmanned aerial vehicle (UAV) images is difficult, as flames typically cover only a very limited portion of the visual scene. This study proposes the Context-guided Compact Lightweight Network (CCLNet), an end-to-end lightweight model designed to detect small forest fire targets while ensuring efficient inference on devices with constrained computational resources. CCLNet employs a three-stage network architecture built around three key modules. The C3F-Convolutional Gated Linear Unit (C3F-CGLU) performs selective local feature extraction while preserving fine-grained high-frequency flame details. The Context-Guided Feature Fusion Module (CGFM) replaces plain concatenation with triplet-attention interactions to emphasize subtle flame patterns. The Lightweight Shared Convolution with Separated Batch Normalization Detection (LSCSBD) head reduces parameters through shared convolutions while maintaining scale-specific statistics through separated batch normalization. We build TF-11K, an 11,139-image dataset combining 9139 self-collected UAV images from subtropical forests and 2000 re-annotated frames from the FLAME dataset. On TF-11K, CCLNet attains 85.8% mean Average Precision (mAP)@0.5, 45.5% mAP@[0.5:0.95], 87.4% precision, and 79.1% recall with 2.21 M parameters and 5.7 giga floating-point operations (GFLOPs). The ablation study confirms that each module contributes to both accuracy and efficiency. Cross-dataset evaluation on DFS yields 77.5% mAP@0.5 and 42.3% mAP@[0.5:0.95], indicating good generalization to unseen scenes. These results suggest that CCLNet offers a practical balance between accuracy and speed for small-target forest fire monitoring with UAVs.
Current you only look once (YOLO)-based models face the challenge of excessive parameters and computational complexity in printed circuit board (PCB) defect detection. To solve this problem, we propose a new method that combines the lightweight mobile vision transformer network (MobileViT) with the convolutional block attention module (CBAM) mechanism and a new regression loss function. The method needs fewer computational resources, making it more suitable for embedded edge detection devices. Meanwhile, the new loss function improves the positioning accuracy of the bounding box and enhances the robustness of the model. In addition, experiments on public datasets demonstrate that the improved model achieves an average accuracy of 87.9% across six typical defect detection tasks while reducing computational costs by nearly 90%. It significantly reduces the model's computational requirements while maintaining accuracy, ensuring reliable performance for edge deployment.
Semantic segmentation of eye images is a complex task with important applications in human–computer interaction, cognitive science, and neuroscience. Achieving real-time, accurate, and robust segmentation is crucial for computationally limited portable devices such as augmented reality and virtual reality devices. With the rapid advancements in deep learning, many network models have been developed specifically for eye image segmentation. Some methods divide the segmentation process into multiple stages to keep the model small and rely on post-processing techniques to improve segmentation accuracy. These approaches significantly increase the inference time. Other networks adopt more complex encoding and decoding modules to achieve end-to-end output, which requires substantial computation. Therefore, balancing the model’s size, accuracy, and computational complexity is essential. To address these challenges, we propose a lightweight asymmetric UNet architecture and a projection loss function. We utilize ResNet-3 layer blocks to enhance feature extraction efficiency in the encoding stage. In the decoding stage, we employ regular convolutions and skip connections to upscale the feature maps from the latent space to the original image size, balancing the model size and segmentation accuracy. In addition, we leverage the geometric features of the eye region and design a projection loss function to further improve the segmentation accuracy without adding any inference cost. We validate our approach on the OpenEDS2019 dataset for virtual reality and achieve state-of-the-art performance with 95.33% mean intersection over union (mIoU). Our model has only 0.63M parameters and runs at 350 FPS, which are 68% and 200% of those of the state-of-the-art model RITNet, respectively.
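The mIoU figure reported above averages the per-class intersection over union. A minimal sketch on flattened label maps follows; the function name and the choice to skip classes absent from both prediction and ground truth are assumptions, since the abstract does not describe the exact evaluation script.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection over union for two flattened label maps.

    Classes that appear in neither map are skipped (an assumed convention).
    """
    if len(pred) != len(target):
        raise ValueError("label maps must have the same length")
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:
            ious.append(inter / union)
    if not ious:
        raise ValueError("no class present in either map")
    return sum(ious) / len(ious)


if __name__ == "__main__":
    # One of four pixels is mislabelled as class 0 instead of class 1.
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```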
Accurately identifying crop pests and diseases ensures agricultural productivity and safety. Although current YOLO-based detection models offer real-time capabilities, their conventional convolutional layers involve high computational redundancy and a fixed receptive field, making it challenging to capture local details and global semantics simultaneously in complex scenarios. This leads to significant issues such as missed detections of small targets and heightened sensitivity to background interference. To address these challenges, this paper proposes a lightweight adaptive detection network, StarSpark-AdaptiveNet (SSANet), which optimizes features through a dual-module collaborative mechanism. Specifically, the StarNet module utilizes depthwise separable convolutions (DW-Conv) and dynamic star operations to establish multi-stage feature extraction pathways, enhancing local detail perception within a lightweight framework. Moreover, the Multi-scale Adaptive Spatial Attention Gate (MASAG) module integrates cross-layer feature fusion and dynamic weight allocation to capture multi-scale global contextual information, effectively suppressing background noise. These modules jointly form a “local enhancement-global calibration” bidirectional optimization mechanism, significantly improving the model’s adaptability to complex disease patterns. Furthermore, the proposed Scale-based Dynamic Loss (SD Loss) dynamically adjusts the weight of scale and localization losses, improving regression stability and localization accuracy, especially for small targets. Experiments on the eggplant fruit disease dataset demonstrate that SSANet achieves an mAP50 of 83.9% and a detection speed of 273.5 FPS with only 2.11 M parameters and a computational cost of 5.1 GFLOPs, outperforming the baseline YOLO11 model by reducing parameters by 18.1%, increasing mAP50 by 1.3%, and improving inference speed by 9.1%. Ablation studies further confirm the effectiveness and complementarity of the modules. SSANet offers a high-accuracy, low-cost solution suitable for real-time pest and disease detection in crops, facilitating edge device deployment and promoting precision agriculture.
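The parameter saving that makes depthwise separable convolution attractive for lightweight networks can be checked with simple arithmetic. The sketch below compares weight counts for a single layer; the layer sizes (3×3 kernel, 64 input and 128 output channels) are illustrative assumptions and are not taken from SSANet, and bias terms are ignored.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depthwise (k x k per input channel) plus pointwise (1 x 1) pair."""
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


if __name__ == "__main__":
    k, c_in, c_out = 3, 64, 128  # hypothetical layer sizes
    std = standard_conv_params(k, c_in, c_out)
    dws = depthwise_separable_params(k, c_in, c_out)
    print(std, dws, round(dws / std, 3))
```

The ratio equals 1/c_out + 1/k², so for 3×3 kernels the separable form needs roughly one ninth of the weights once the output width is large.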
In recent years, the country has invested significant manpower and material resources to prevent traffic accidents, particularly those caused by fatigued driving. Current studies mainly concentrate on driver physiological signals, driving behavior, and vehicle information. However, most of these approaches are computationally intensive and inconvenient for real-time detection. Therefore, this paper designs a network that is accurate, fast, and lightweight, and proposes an algorithm for facial fatigue detection based on multi-feature fusion. Specifically, the face detection model takes YOLOv8 (You Only Look Once version 8) as the basic framework and replaces its backbone network with MobileNetv3. To focus on the significant regions in the image, CPCA (Channel Prior Convolution Attention) is adopted to enhance the network’s capacity for feature extraction. Meanwhile, the network training phase employs the Focal-EIOU (Focal and Efficient Intersection Over Union) loss function. Together, these changes make the network lightweight and increase the accuracy of target detection. Finally, the Dlib toolkit is employed to annotate 68 facial feature points. This study establishes an evaluation metric for facial fatigue and develops a novel fatigue detection algorithm to assess the driver’s condition. A series of comparative experiments were carried out on a self-built dataset. The proposed method’s mAP (mean Average Precision) values for object detection and fatigue detection are 96.71% and 95.75%, respectively, and the detection speed is 47 FPS (Frames Per Second). The method balances the trade-off between computational complexity and model accuracy. Furthermore, it can be deployed on the NVIDIA Jetson Orin NX and quickly detect the driver’s state while maintaining a high degree of accuracy. It contributes to the development of automobile safety systems and helps reduce the occurrence of traffic accidents.
To address the low detection accuracy caused by the varying scales of apple leaf disease spots and their similarity to the background, this paper proposes a multi-scale lightweight network (MSL-Net). Firstly, a multiplexed aggregated feature extraction network is proposed using the residual bottleneck block (RES-Bottleneck) and middle partial-convolution (MP-Conv) to capture multi-scale spatial features and enhance the focus on disease features for better differentiation between disease targets and background information. Secondly, a lightweight feature fusion network is designed using scale-fuse concatenation (SF-Cat) and the triple-scale sequence feature fusion (TSSF) module to merge multi-scale feature maps comprehensively. Depthwise convolution (DWConv) and GhostNet lighten the network, while the cross stage partial bottleneck with 3 convolutions ghost-normalization attention module (C3-GN) reduces missed detections by suppressing irrelevant background information. Finally, soft non-maximum suppression (Soft-NMS) is used in the post-processing stage to reduce misdetection of dense disease sites. The results show that MSL-Net improves mean average precision at an intersection over union of 0.5 (mAP@0.5) by 2.0% over the baseline you only look once version 5s (YOLOv5s) and reduces parameters by 44% and computation by 27%, outperforming other state-of-the-art (SOTA) models overall. The method also shows excellent performance compared to the latest research.
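To illustrate why Soft-NMS helps with densely packed targets, here is a minimal sketch of the Gaussian variant: instead of deleting boxes that overlap the current best detection, it decays their scores. The abstract does not state which Soft-NMS variant or which hyperparameters MSL-Net uses, so `sigma`, `score_threshold`, and the `(x1, y1, x2, y2)` box format are assumptions.

```python
import math


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian Soft-NMS: returns (box, score) pairs in selection order."""
    candidates = list(zip(boxes, scores))
    kept = []
    while candidates:
        # Pick the highest-scoring remaining box.
        best = max(range(len(candidates)), key=lambda i: candidates[i][1])
        best_box, best_score = candidates.pop(best)
        kept.append((best_box, best_score))
        # Decay the scores of the rest according to their overlap with it.
        rescored = []
        for box, score in candidates:
            score *= math.exp(-(iou(best_box, box) ** 2) / sigma)
            if score >= score_threshold:
                rescored.append((box, score))
        candidates = rescored
    return kept


if __name__ == "__main__":
    demo_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    for box, score in soft_nms(demo_boxes, [0.9, 0.8, 0.7]):
        print(box, round(score, 3))
```

In the demo, the second box overlaps the first heavily, so its score is reduced but it is not discarded, whereas hard NMS with a typical IoU threshold of 0.5 would remove it outright.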
Accurate cloud classification plays a crucial role in aviation safety, climate monitoring, and localized weather forecasting. Current research has focused on machine learning techniques, particularly deep learning-based models, for cloud type identification. However, traditional approaches such as convolutional neural networks (CNNs) encounter difficulties in capturing global contextual information. In addition, they are computationally expensive, which restricts their usability in resource-limited environments. To tackle these issues, we present the Cloud Vision Transformer (CloudViT), a lightweight model that integrates CNNs with Transformers. The integration enables an effective balance between local and global feature extraction. Specifically, CloudViT comprises two innovative modules: Feature Extraction (E_Module) and Downsampling (D_Module). These modules significantly reduce the number of model parameters and the computational complexity while maintaining translation invariance and enhancing contextual comprehension. Overall, CloudViT has 0.93×10^6 parameters, more than ten times fewer than the SOTA (State-of-the-Art) model CloudNet. Comprehensive evaluations conducted on the HBMCD and SWIMCAT datasets showcase the outstanding performance of CloudViT. It achieves classification accuracies of 98.45% and 100%, respectively. Moreover, the efficiency and scalability of CloudViT make it an ideal candidate for deployment in mobile cloud observation systems, enabling real-time cloud image classification. The proposed hybrid architecture of CloudViT offers a promising approach for advancing ground-based cloud image classification. It holds significant potential for both optimizing performance and facilitating practical deployment scenarios.
Lightweight convolutional neural networks (CNNs) have simple structures but struggle to comprehensively and accurately extract important semantic information from images. While attention mechanisms can enhance CNNs by learning distinctive representations, most existing spatial and hybrid attention methods focus on local regions with extensive parameters, making them unsuitable for lightweight CNNs. In this paper, we propose a self-attention mechanism tailored for lightweight networks, namely the brief self-attention module (BSAM). BSAM consists of the brief spatial attention (BSA) and advanced channel attention blocks. Unlike conventional self-attention methods with many parameters, our BSA block improves the performance of lightweight networks by effectively learning global semantic representations. Moreover, BSAM can be seamlessly integrated into lightweight CNNs for end-to-end training, maintaining the network’s lightweight and mobile characteristics. We validate the effectiveness of the proposed method on image classification tasks using the Food-101, Caltech-256, and Mini-ImageNet datasets.
Automatic segmentation of landslides from remote sensing imagery is challenging because traditional machine learning and early CNN-based models often fail to generalize across heterogeneous landscapes, where segmentation maps contain sparse and fragmented landslide regions under diverse geographical conditions. To address these issues, we propose a lightweight dual-stream siamese deep learning framework that integrates optical and topographical data fusion with an adaptive decoder, guided multimodal fusion, and deep supervision. The framework is built upon the synergistic combination of cross-attention, gated fusion, and sub-pixel upsampling within a unified dual-stream architecture specifically optimized for landslide segmentation, enabling efficient context modeling and robust feature exchange between modalities. The decoder captures long-range context at deeper levels using lightweight cross-attention and refines spatial details at shallower levels through attention-gated skip fusion, enabling precise boundary delineation and fewer false positives. The gated fusion further enhances multimodal integration of optical and topographical cues, and the deep supervision stabilizes training and improves generalization. Moreover, to mitigate checkerboard artifacts, a learnable sub-pixel upsampling is devised to replace the traditional transposed convolution. Despite its compact design with fewer parameters, the model consistently outperforms state-of-the-art baselines. Experiments on two benchmark datasets, Landslide4Sense and Bijie, confirm the effectiveness of the framework. On the Bijie dataset, it achieves an F1-score of 0.9110 and an intersection over union (IoU) of 0.8839. These results highlight its potential for accurate large-scale landslide inventory mapping and real-time disaster response. The implementation is publicly available at https://github.com/mishaown/DiGATe-UNet-LandSlide-Segmentation (accessed on 3 November 2025).
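Sub-pixel upsampling typically pairs a convolution that produces C·r² channels with a fixed rearrangement of those channels into an image r times larger in each dimension. The sketch below shows only the rearrangement step on nested lists; the learnable convolution is omitted, and the channel ordering follows the convention used by common deep learning frameworks, which is an assumption about the linked implementation rather than something the abstract states.

```python
def pixel_shuffle(x, r):
    """Rearrange a [C*r*r][H][W] nested list into [C][H*r][W*r].

    Output pixel (c, y*r + i, w*r + j) is read from input channel
    c*r*r + i*r + j at position (y, w).
    """
    if r <= 0 or len(x) % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = len(x) // (r * r)
    h, w = len(x[0]), len(x[0][0])
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):
            for j in range(r):
                src = x[c * r * r + i * r + j]
                for row in range(h):
                    for col in range(w):
                        out[c][row * r + i][col * r + j] = src[row][col]
    return out


if __name__ == "__main__":
    # Four 1x1 channels become one 2x2 image.
    print(pixel_shuffle([[[1]], [[2]], [[3]], [[4]]], r=2))
```

Because every output pixel is written exactly once, this rearrangement avoids the uneven kernel overlap that causes checkerboard artifacts in transposed convolutions.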
To address the potential information noise introduced during the generation of ghost feature maps in GhostNet, this paper proposes a novel lightweight neural network model called ResghostNet. The model constructs the Resghost Module by combining residual connections and Adaptive-SE Blocks, which enhances the quality of the generated feature maps through direct propagation of the original input information and selection of important channels before the cheap operations. Specifically, ResghostNet introduces residual connections on the basis of the Ghost Module to optimize the information flow, and designs a weight self-attention mechanism combined with SE blocks to enhance feature expression capabilities in the cheap operations. Experimental results on the ImageNet dataset show that, compared to GhostNet, ResghostNet achieves higher accuracy while reducing the number of parameters by 52%. Although the computational complexity increases, the model’s inference speed becomes faster by optimizing the usage strategy of GPU cache memory. ResghostNet is optimized in terms of classification accuracy and the number of model parameters, and shows great potential for edge computing devices.
Unauthorized operations of unmanned aerial vehicles (UAVs), referred to as “black flights”, pose a significant danger to public safety, and existing low-altitude object detection algorithms encounter difficulties in balancing detection precision and speed. Additionally, their accuracy is insufficient, particularly for small objects in complex environments. To solve these problems, we propose a lightweight feature-enhanced convolutional neural network that detects low-altitude flying objects with high precision in real time, providing guidance information for suppressing black-flying UAVs. The proposed network consists of three modules. A lightweight and stable feature extraction module reduces the computational load and stably extracts more low-level features, an enhanced feature processing module significantly improves the feature extraction ability of the model, and an accurate detection module integrates low-level and advanced features to improve the multiscale detection accuracy in complex environments, particularly for small objects. The proposed method achieves a detection speed of 147 frames per second (FPS) and a mean average precision (mAP) of 90.97% on a dataset composed of flying objects, indicating its potential for low-altitude object detection. Furthermore, evaluation results based on Microsoft Common Objects in Context (MS COCO) indicate that the proposed method is also applicable to object detection in general.
To address the difficulty of identifying apple diseases in the natural environment and the low application rate of deep learning recognition networks, a lightweight ResNet (LW-ResNet) model for apple disease recognition is proposed. Based on the deep residual network ResNet18, a multi-scale feature extraction layer is constructed with group convolution to compress the model and improve the extraction of lesion features of different sizes. The identity mapping structure is improved to reduce information loss, and the efficient channel attention module (ECANet) is introduced to suppress noise from complex backgrounds. The experimental results show that the average precision, recall, and F1-score of LW-ResNet on the test set are 97.80%, 97.92%, and 97.85%, respectively. The parameter memory is 2.32 MB, which is 94% less than that of ResNet18. Compared with the classic lightweight networks SqueezeNet and MobileNetV2, LW-ResNet has obvious advantages in recognition performance, speed, parameter memory requirement, and time complexity. The proposed model offers low computational cost, low storage cost, strong real-time performance, high identification accuracy, and strong practicability, and can meet the needs of real-time apple leaf disease identification on resource-constrained devices.
Accurately identifying defect patterns in wafer maps can help engineers find abnormal failure factors in production lines. During the wafer testing stage, deep learning methods are widely used in wafer defect detection due to their powerful feature extraction capabilities. However, most current wafer defect pattern classification models have high complexity and slow detection speed, which makes them difficult to apply in the actual wafer production process. In addition, data imbalance in the wafer dataset seriously affects the training results of the model. To reduce the complexity of the deep model without affecting the wafer feature expression, this paper adjusts the structure of the dense block in the PeleeNet network and proposes a lightweight network, WM-PeleeNet, based on the PeleeNet module. In addition, to reduce the impact of data imbalance on model training, this paper proposes a wafer data augmentation method based on a convolutional autoencoder that adds random Gaussian noise to the hidden layer. The proposed method achieves an average accuracy of 95.4% on the WM-811K wafer dataset with only 173.643 KB of parameters and 316.194 M FLOPs, and takes only 22.99 s to detect 1000 wafer pictures. Compared with the original PeleeNet network without optimization, the number of parameters and FLOPs are reduced by 92.68% and 58.85%, respectively. Data augmentation on the minority class wafer maps improves the average classification accuracy by 1.8% on the WM-811K dataset. At the same time, the recognition accuracy of minority classes such as the Scratch pattern and the Donut pattern is significantly improved.
The diagnosis of COVID-19 requires chest computed tomography (CT). High-resolution CT images can provide more diagnostic information to help doctors better diagnose the disease, so it is of clinical importance to study super-resolution (SR) algorithms that improve the resolution of CT images. However, most existing SR algorithms are developed on natural images and are not suitable for medical images, and most of them improve the reconstruction quality by increasing the network depth, which is not suitable for machines with limited resources. To alleviate these issues, we propose a residual feature attentional fusion network for lightweight chest CT image super-resolution (RFAFN). Specifically, we design a contextual feature extraction block (CFEB) that can extract CT image features more efficiently and accurately than ordinary residual blocks. In addition, we propose a feature-weighted cascading strategy (FWCS) based on attentional feature fusion blocks (AFFB) to utilize the high-frequency detail information extracted by CFEB as much as possible by selectively fusing feature information from adjacent levels. Finally, we suggest a global hierarchical feature fusion strategy (GHFFS), which can utilize the hierarchical features more effectively than dense concatenation by progressively aggregating the feature information at various levels. Numerous experiments show that our method performs better than most state-of-the-art (SOTA) methods on the COVID-19 chest CT dataset. In detail, the peak signal-to-noise ratio (PSNR) is 0.11 dB and 0.47 dB higher on CTtest1 and CTtest2 at ×3 SR compared to the suboptimal method, while the number of parameters and multi-adds are reduced by 22K and 0.43G, respectively. Our method can better recover chest CT image quality with fewer computational resources and effectively assist in COVID-19 diagnosis.
Significant progress has been made in computational imaging (CI), in which deep convolutional neural networks (CNNs) have demonstrated that sparse speckle patterns can be reconstructed. However, because of the limited “local” kernel size of the convolutional operator, the performance of CNNs is limited for spatially dense patterns such as generic face images. Here, we propose a “non-local” model, termed the Speckle-Transformer (SpT) UNet, for speckle feature extraction of generic face images. Notably, the lightweight SpT UNet is highly efficient and achieves strong comparative performance, with a Pearson correlation coefficient (PCC) and structural similarity measure (SSIM) exceeding 0.989 and 0.950, respectively.
As the use of facial attributes continues to expand, research into facial age estimation is also developing. Because face images are easily affected by factors such as illumination and occlusion, estimating age from faces is challenging. Considering the complexity of the environment and the limited computing ability of devices, this paper proposes a face age estimation algorithm based on a lightweight convolutional neural network. Building on the Soft Stagewise Regression Network (SSR-Net), this paper employs the Center Symmetric Local Binary Pattern (CSLBP) method to obtain a feature image and then combines the face image and the feature image as the network input. Adding feature images to the convolutional neural network can improve accuracy and increase the robustness of the network model. Experimental results on the IMDB-WIKI and MORPH 2 datasets show that the proposed lightweight convolutional neural network reduces model complexity and increases the accuracy of face age estimation.
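The CSLBP feature image mentioned above can be illustrated with a minimal sketch. It assumes the common 8-neighbour formulation in which each of the four center-symmetric neighbour pairs contributes one bit; the function name `cs_lbp`, the neighbour ordering, and the `threshold` parameter are illustrative choices, not details taken from the paper.

```python
def cs_lbp(image, threshold=0.0):
    """Return a center-symmetric LBP code map for a 2-D list of grey levels.

    Border pixels are skipped, so the output is 2 rows and 2 columns smaller.
    """
    # 8 neighbours in circular order; neighbour i is opposite neighbour i + 4.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    rows, cols = len(image), len(image[0])
    codes = [[0] * (cols - 2) for _ in range(rows - 2)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            code = 0
            for i in range(4):
                dr1, dc1 = offsets[i]
                dr2, dc2 = offsets[i + 4]
                diff = image[r + dr1][c + dc1] - image[r + dr2][c + dc2]
                if diff > threshold:
                    code |= 1 << i  # one bit per center-symmetric pair
            codes[r - 1][c - 1] = code
    return codes


if __name__ == "__main__":
    patch = [[9, 9, 9],
             [5, 5, 5],
             [1, 1, 1]]
    print(cs_lbp(patch))
```

Because only four comparisons are made per pixel, each code fits in 4 bits (values 0 to 15), which keeps the extra input channel cheap to compute.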
Funding: Supported by the National Natural Science Foundation of China (Grant No. 52075467), the Jiangsu Provincial Natural Science Foundation of China (Grant No. BK20220649), the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (Grant No. 23KJB460010), and the Jiangsu Provincial Key R&D Project (Grant No. BE2022062).
Abstract: Deployable mechanisms with good deployable performance, strong expansibility, and low weight have attracted much attention because of their potential in aerospace. A basic deployable pyramid unit with good deployability and expandability is proposed for constructing large deployable mechanisms. First, the folding principle and expansion method of the basic unit are proposed. The basic unit is synthesized using the configuration synthesis method of adding constraint chains to a spatial closed-loop mechanism. Then, the degree of freedom of the basic unit is analyzed using screw theory and the link dismantling method. Next, three-dimensional models of the pyramid unit, expansion unit, and array unit are established, and folding motion simulations are carried out. The performance characteristics of the three types of mechanisms are described in detail in terms of the number of components, weight reduction rate, and deployable rate. Finally, prototypes of the pyramid unit, combination unit, and expansion unit are developed to further verify the correctness of the pyramid-based configuration synthesis. The proposed deployable mechanism provides a reference for the design and application of lightweight antennas with a large aperture and high deployable rate, and it has good application prospects in the aerospace field.
Funding: Funded by the National Natural Science Foundation of China under Grant No. 62371187 and the Open Program of Hunan Intelligent Rehabilitation Robot and Auxiliary Equipment Engineering Technology Research Center under Grant No. 2024JS101.
Abstract: The ubiquity of mobile devices has driven advancements in mobile object detection. However, challenges in multi-scale object detection in open, complex environments persist due to limited computational resources. Traditional approaches such as network compression, quantization, and lightweight design often sacrifice accuracy or the robustness of feature representation. This article introduces the Fast Multi-scale Channel Shuffling Network (FMCSNet), a novel lightweight detection model optimized for mobile devices. FMCSNet integrates a fully convolutional Multilayer Perceptron (MLP) module that offers global perception without significantly increasing parameters, effectively bridging the gap between CNNs and Vision Transformers. FMCSNet balances computation and accuracy mainly through two key modules: the ShiftMLP module, which comprises a shift operation and an MLP module, and the Partial group Convolutional (PGConv) module, which reduces computation while enhancing information exchange between channels. With a computational complexity of 1.4G FLOPs and 1.3M parameters, FMCSNet outperforms CNN-based and DWConv-based ShuffleNetv2 by 1% and 4.5% mAP on the Pascal VOC 2007 dataset, respectively. Additionally, FMCSNet achieves a mAP of 30.0 (0.5:0.95 IoU threshold) with only 2.5G FLOPs and 2.0M parameters. It achieves 32 FPS on low-performance i5-series CPUs, meeting real-time detection requirements. The adaptability of the PGConv module across scenarios further highlights FMCSNet as a promising solution for real-time mobile object detection.
Funding: Funded by the Project of Sichuan Provincial Department of Science and Technology under 2025JDKP0150 and the Fundamental Research Funds for the Central Universities under 25CAFUC03093.
Abstract: Single Image Super-Resolution (SISR) seeks to reconstruct high-resolution (HR) images from low-resolution (LR) inputs, thereby enhancing visual fidelity and the perception of fine details. While Transformer-based models such as SwinIR, Restormer, and HAT have recently achieved impressive results in super-resolution tasks by capturing global contextual information, these methods often suffer from substantial computational and memory overhead, which limits their deployment on resource-constrained edge devices. To address these challenges, we propose a novel lightweight super-resolution network, termed Binary Attention-Guided Information Distillation (BAID), which integrates frequency-aware modeling with a binary attention mechanism to significantly reduce computational complexity and parameter count while maintaining strong reconstruction performance. The network combines a high–low frequency decoupling strategy with a local–global attention sharing mechanism, enabling efficient compression of redundant computations through binary attention guidance. At the core of the architecture lies the Attention-Guided Distillation Block (AGDB), which retains the strengths of the information distillation framework while introducing a sparse binary attention module to enhance both inference efficiency and feature representation. Extensive ×4 super-resolution experiments on four standard benchmarks (Set5, Set14, BSD100, and Urban100) demonstrate that BAID achieves Peak Signal-to-Noise Ratio (PSNR) values of 32.13, 28.51, 27.47, and 26.15, respectively, with only 1.22 million parameters and 26.1 G Floating-Point Operations (FLOPs), outperforming other state-of-the-art lightweight methods such as the Information Multi-Distillation Network (IMDN) and the Residual Feature Distillation Network (RFDN). These results highlight the proposed model’s ability to deliver high-quality image reconstruction while offering strong deployment efficiency, making it well suited for image restoration tasks in resource-limited environments.
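The PSNR values reported above follow the standard definition, PSNR = 10·log10(MAX² / MSE). The sketch below shows that definition on plain nested lists; the function name `psnr` and the default peak value of 255 (8-bit images) are assumptions for illustration. Super-resolution papers often evaluate on the luminance channel with cropped borders, a protocol this sketch does not reproduce.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2-D images."""
    flat_ref = [p for row in reference for p in row]
    flat_est = [p for row in estimate for p in row]
    if len(flat_ref) != len(flat_est):
        raise ValueError("images must contain the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(flat_ref, flat_est)) / len(flat_ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    hr = [[10, 20], [30, 40]]
    sr = [[11, 19], [31, 41]]
    print(f"{psnr(hr, sr):.2f} dB")
```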
Abstract: Background: Medical imaging advancements are constrained by fundamental trade-offs between acquisition speed, radiation dose, and image quality, forcing clinicians to work with noisy, incomplete data. Existing reconstruction methods either compromise on accuracy with iterative algorithms or suffer from limited generalizability with task-specific deep learning approaches. Methods: We present LDM-PIR, a lightweight physics-conditioned diffusion multi-model for medical image reconstruction that addresses key challenges in magnetic resonance imaging (MRI), CT, and low-photon imaging. Unlike traditional iterative methods, which are computationally expensive, or task-specific deep learning approaches, which lack generalizability, LDM-PIR integrates three innovations: a physics-conditioned diffusion framework that embeds acquisition operators (Fourier/Radon transforms) and noise models directly into the reconstruction process; a multi-model architecture that unifies denoising, inpainting, and super-resolution via shared weight conditioning; and a lightweight design (2.1M parameters) enabling rapid inference (0.8 s/image on GPU). Through self-supervised fine-tuning with measurement consistency losses, it adapts to new imaging modalities using fewer annotated samples. Results: LDM-PIR achieves state-of-the-art performance on fastMRI (peak signal-to-noise ratio (PSNR): 34.04 for single-coil/31.50 for multi-coil) and on the Lung Image Database Consortium and Image Database Resource Initiative dataset (28.83 PSNR under Poisson noise). Clinical evaluations demonstrate superior preservation of anatomical structures, with SSIM improvements of 8.8% for single-coil and 4.36% for multi-coil MRI over uDPIR. Conclusion: LDM-PIR offers a flexible, efficient, and scalable solution for medical image reconstruction, addressing the challenges of noise, undersampling, and modality generalization. The model’s lightweight design allows for rapid inference, while its self-supervised fine-tuning capability minimizes reliance on large annotated datasets, making it suitable for real-world clinical applications.
Funding: Funded by the Natural Science Foundation of Hunan Province (Grant No. 2025JJ80352) and the National Natural Science Foundation Project of China (Grant No. 32271879).
Abstract: Detecting small forest fire targets in unmanned aerial vehicle (UAV) images is difficult, as flames typically cover only a very limited portion of the visual scene. This study proposes the Context-guided Compact Lightweight Network (CCLNet), an end-to-end lightweight model designed to detect small forest fire targets while ensuring efficient inference on devices with constrained computational resources. CCLNet employs a three-stage network architecture with three key modules. The C3F-Convolutional Gated Linear Unit (C3F-CGLU) performs selective local feature extraction while preserving fine-grained high-frequency flame details. The Context-Guided Feature Fusion Module (CGFM) replaces plain concatenation with triplet-attention interactions to emphasize subtle flame patterns. The Lightweight Shared Convolution with Separated Batch Normalization Detection (LSCSBD) head reduces parameters through shared convolutions while separated batch normalization maintains scale-specific statistics. We build TF-11K, an 11,139-image dataset combining 9139 self-collected UAV images from subtropical forests and 2000 re-annotated frames from the FLAME dataset. On TF-11K, CCLNet attains 85.8% mean Average Precision (mAP)@0.5, 45.5% mAP@[0.5:0.95], 87.4% precision, and 79.1% recall with 2.21 M parameters and 5.7 giga floating-point operations (GFLOPs). The ablation study confirms that each module contributes to both accuracy and efficiency. Cross-dataset evaluation on DFS yields 77.5% mAP@0.5 and 42.3% mAP@[0.5:0.95], indicating good generalization to unseen scenes. These results suggest that CCLNet offers a practical balance between accuracy and speed for small-target forest fire monitoring with UAVs.
Funding: Supported by the National Natural Science Foundation of China (Nos. 62373215, 62373219 and 62073193), the Natural Science Foundation of Shandong Province (No. ZR2023MF100), the Key Projects of the Ministry of Industry and Information Technology (No. TC220H057-2022), and the Independently Developed Instrument Funds of Shandong University (No. zy20240201).
Abstract: Current you only look once (YOLO)-based models face the challenge of excessive parameters and computational complexity in printed circuit board (PCB) defect detection. To solve this problem, we propose a new method that combines the lightweight mobile vision transformer (MobileViT) network with the convolutional block attention module (CBAM) mechanism and a new regression loss function. The method needs fewer computational resources, making it more suitable for embedded edge detection devices. Meanwhile, the new loss function improves the positioning accuracy of the bounding box and enhances the robustness of the model. Experiments on public datasets demonstrate that the improved model achieves an average accuracy of 87.9% across six typical defect detection tasks while reducing computational costs by nearly 90%. It significantly reduces the model's computational requirements while maintaining accuracy, ensuring reliable performance for edge deployment.
Funding: Supported by the HFIPS Director’s Foundation (YZJJ202207-TS), the National Natural Science Foundation of China (82371931), the Natural Science Foundation of Anhui Province (2008085MC69), the Natural Science Foundation of Hefei City (2021033), the General Scientific Research Project of Anhui Provincial Health Commission (AHWJ2021b150), the Collaborative Innovation Program of Hefei Science Center, CAS (2021HSC-CIP013), and the Anhui Province Key Research and Development Project (202204295107020004).
Abstract: Semantic segmentation of eye images is a complex task with important applications in human–computer interaction, cognitive science, and neuroscience. Real-time, accurate, and robust segmentation algorithms are crucial for computationally limited portable devices such as augmented reality and virtual reality headsets. With the rapid advancements in deep learning, many network models have been developed specifically for eye image segmentation. Some methods divide the segmentation process into multiple stages to miniaturize the model and rely on post-processing techniques to improve segmentation accuracy; these approaches significantly increase the inference time. Other networks adopt more complex encoding and decoding modules to achieve end-to-end output, which requires substantial computation. Therefore, balancing the model’s size, accuracy, and computational complexity is essential. To address these challenges, we propose a lightweight asymmetric UNet architecture and a projection loss function. We utilize ResNet-3 layer blocks to enhance feature extraction efficiency in the encoding stage. In the decoding stage, we employ regular convolutions and skip connections to upscale the feature maps from the latent space to the original image size, balancing the model size and segmentation accuracy. In addition, we leverage the geometric features of the eye region and design a projection loss function to further improve the segmentation accuracy without adding any inference cost. We validate our approach on the OpenEDS2019 dataset for virtual reality and achieve state-of-the-art performance with 95.33% mean intersection over union (mIoU). Our model has only 0.63M parameters and runs at 350 FPS, which are 68% and 200% of the corresponding figures for the state-of-the-art model RITNet, respectively.
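The mIoU metric quoted above averages the per-class intersection over union between the predicted and ground-truth label maps. A minimal sketch follows; the function name `mean_iou` and the choice to skip classes absent from both maps are assumptions, and benchmark toolkits may handle absent classes differently.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection over union for two 2-D integer label maps."""
    ious = []
    for cls in range(num_classes):
        inter = 0
        union = 0
        for p_row, t_row in zip(pred, target):
            for p, t in zip(p_row, t_row):
                in_pred = p == cls
                in_target = t == cls
                inter += in_pred and in_target
                union += in_pred or in_target
        if union:  # skip classes that appear in neither map
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    prediction = [[0, 1], [1, 1]]
    ground_truth = [[0, 1], [0, 1]]
    print(round(mean_iou(prediction, ground_truth, num_classes=2), 4))
```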
Funding: Supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (NRF-2022R1A2C2012243).
Abstract: Accurately identifying crop pests and diseases ensures agricultural productivity and safety. Although current YOLO-based detection models offer real-time capabilities, their conventional convolutional layers involve high computational redundancy and a fixed receptive field, making it difficult to capture local details and global semantics simultaneously in complex scenarios. This leads to missed detections of small targets and heightened sensitivity to background interference. To address these challenges, this paper proposes a lightweight adaptive detection network, StarSpark-AdaptiveNet (SSANet), which optimizes features through a dual-module collaborative mechanism. Specifically, the StarNet module utilizes depthwise separable convolutions (DW-Conv) and dynamic star operations to establish multi-stage feature extraction pathways, enhancing local detail perception within a lightweight framework. Moreover, the Multi-scale Adaptive Spatial Attention Gate (MASAG) module integrates cross-layer feature fusion and dynamic weight allocation to capture multi-scale global contextual information, effectively suppressing background noise. These modules jointly form a “local enhancement-global calibration” bidirectional optimization mechanism, significantly improving the model’s adaptability to complex disease patterns. Furthermore, the proposed Scale-based Dynamic Loss (SD Loss) dynamically adjusts the weights of the scale and localization losses, improving regression stability and localization accuracy, especially for small targets. Experiments on the eggplant fruit disease dataset demonstrate that SSANet achieves an mAP50 of 83.9% and a detection speed of 273.5 FPS with only 2.11 M parameters and a computational cost of 5.1 GFLOPs, outperforming the baseline YOLO11 model by reducing parameters by 18.1%, increasing mAP50 by 1.3%, and improving inference speed by 9.1%. Ablation studies further confirm the effectiveness and complementarity of the modules. SSANet offers a high-accuracy, low-cost solution suitable for real-time pest and disease detection in crops, facilitating edge device deployment and promoting precision agriculture.
Funding: Supported by the Science and Technology Bureau of Xi’an project (24KGDW0049), the Key Research and Development Program of Shaanxi (2023-YBGY-264), and the Key Research and Development Program of Guangxi (GK-AB20159032).
Abstract: In recent years, the country has spent significant workforce and material resources to prevent traffic accidents, particularly those caused by fatigued driving. Current studies mainly concentrate on driver physiological signals, driving behavior, and vehicle information. However, most of these approaches are computationally intensive and inconvenient for real-time detection. Therefore, this paper designs a network that combines precision, speed, and low weight, and proposes an algorithm for facial fatigue detection based on multi-feature fusion. Specifically, the face detection model takes YOLOv8 (You Only Look Once version 8) as the basic framework and replaces its backbone network with MobileNetv3. To focus on the significant regions in the image, CPCA (Channel Prior Convolution Attention) is adopted to enhance the network’s capacity for feature extraction. Meanwhile, the network training phase employs the Focal-EIOU (Focal and Efficient Intersection Over Union) loss function, which increases the accuracy of target detection. Finally, the Dlib toolkit is employed to annotate 68 facial feature points. This study establishes an evaluation metric for facial fatigue and develops a novel fatigue detection algorithm to assess the driver’s condition. A series of comparative experiments were carried out on a self-built dataset. The proposed method’s mAP (mean Average Precision) values for object detection and fatigue detection are 96.71% and 95.75%, respectively, and the detection speed is 47 FPS (Frames Per Second). The method balances computational complexity against model accuracy. Furthermore, it can be deployed on the NVIDIA Jetson Orin NX and quickly detect the driver’s state while maintaining a high degree of accuracy. It contributes to the development of automobile safety systems and helps reduce the occurrence of traffic accidents.
Abstract: To address the low detection accuracy caused by the varying scales of apple leaf disease spots and their similarity to the background, this paper proposes a multi-scale lightweight network (MSL-Net). First, a multiplexed aggregated feature extraction network is built from the residual bottleneck block (RES-Bottleneck) and middle partial-convolution (MP-Conv) to capture multi-scale spatial features and enhance the focus on disease features, so that disease targets are better differentiated from background information. Second, a lightweight feature fusion network is designed using the scale-fuse concatenation (SF-Cat) and triple-scale sequence feature fusion (TSSF) modules to merge multi-scale feature maps comprehensively. Depthwise convolution (DWConv) and GhostNet lighten the network, while the cross stage partial bottleneck with 3 convolutions ghost-normalization attention module (C3-GN) reduces missed detections by suppressing irrelevant background information. Finally, soft non-maximum suppression (Soft-NMS) is used in the post-processing stage to reduce misdetection of dense disease sites. The results show that MSL-Net improves mean average precision at an intersection over union of 0.5 (mAP@0.5) by 2.0% over the baseline you only look once version 5s (YOLOv5s) while reducing parameters by 44% and computation by 27%, outperforming other state-of-the-art (SOTA) models overall. The method also performs well compared with the latest research.
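Soft-NMS, used in the post-processing stage above, decays the scores of overlapping boxes instead of discarding them, which helps when true targets are densely packed. The sketch below implements the Gaussian-decay variant on boxes given as `(x1, y1, x2, y2)` tuples. The function names, `sigma=0.5`, and the score threshold are illustrative assumptions and not settings reported for MSL-Net.

```python
import math


def box_iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian Soft-NMS; returns (box_index, final_score) in selection order."""
    candidates = [(score, idx) for idx, score in enumerate(scores)]
    kept = []
    while candidates:
        best = max(candidates)  # highest remaining score
        candidates.remove(best)
        best_score, best_idx = best
        if best_score < score_threshold:
            break
        kept.append((best_idx, best_score))
        # Decay every remaining score by its overlap with the selected box.
        candidates = [
            (s * math.exp(-(box_iou(boxes[best_idx], boxes[i]) ** 2) / sigma), i)
            for s, i in candidates
        ]
    return kept


if __name__ == "__main__":
    demo_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    demo_scores = [0.9, 0.8, 0.7]
    for idx, score in soft_nms(demo_boxes, demo_scores):
        print(idx, round(score, 3))
```

A box that fully overlaps the selected one keeps a reduced score rather than being removed, so a later confidence threshold decides whether it survives.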
Funding: Funded by the Innovation and Development Special Project of China Meteorological Administration (CXFZ2022J038, CXFZ2024J035), the Sichuan Science and Technology Program (No. 2023YFQ0072), the Key Laboratory of Smart Earth (No. KF2023YB03-07), and the Automatic Software Generation and Intelligent Service Key Laboratory of Sichuan Province (CUIT-SAG202210).
Abstract: Accurate cloud classification plays a crucial role in aviation safety, climate monitoring, and localized weather forecasting. Current research focuses on machine learning techniques, particularly deep learning-based models, for cloud type identification. However, traditional approaches such as convolutional neural networks (CNNs) have difficulty capturing global contextual information. In addition, they are computationally expensive, which restricts their usability in resource-limited environments. To tackle these issues, we present the Cloud Vision Transformer (CloudViT), a lightweight model that integrates CNNs with Transformers. The integration enables an effective balance between local and global feature extraction. Specifically, CloudViT comprises two innovative modules: Feature Extraction (E_Module) and Downsampling (D_Module). These modules significantly reduce the number of model parameters and the computational complexity while maintaining translation invariance and enhancing contextual comprehension. Overall, CloudViT has 0.93×10^(6) parameters, more than ten times fewer than the SOTA (State-of-the-Art) model CloudNet. Comprehensive evaluations on the HBMCD and SWIMCAT datasets showcase the outstanding performance of CloudViT, which achieves classification accuracies of 98.45% and 100%, respectively. Moreover, the efficiency and scalability of CloudViT make it an ideal candidate for deployment in mobile cloud observation systems, enabling real-time cloud image classification. The proposed hybrid architecture of CloudViT offers a promising approach for advancing ground-based cloud image classification. It holds significant potential for both optimizing performance and facilitating practical deployment.
Abstract: Lightweight convolutional neural networks (CNNs) have simple structures but struggle to comprehensively and accurately extract important semantic information from images. While attention mechanisms can enhance CNNs by learning distinctive representations, most existing spatial and hybrid attention methods focus on local regions and require many parameters, making them unsuitable for lightweight CNNs. In this paper, we propose a self-attention mechanism tailored for lightweight networks, namely the brief self-attention module (BSAM). BSAM consists of the brief spatial attention (BSA) block and an advanced channel attention block. Unlike conventional self-attention methods with many parameters, our BSA block improves the performance of lightweight networks by effectively learning global semantic representations. Moreover, BSAM can be seamlessly integrated into lightweight CNNs for end-to-end training, maintaining the network’s lightweight and mobile characteristics. We validate the effectiveness of the proposed method on image classification tasks using the Food-101, Caltech-256, and Mini-ImageNet datasets.
Funding: Funded by the National Natural Science Foundation of China, grant number 62262045; the Fundamental Research Funds for the Central Universities, grant number 2023CDJYGRH-YB11; and the Open Funding of SUGON Industrial Control and Security Center, grant number CUIT-SICSC-2025-03.
Abstract: Automatic segmentation of landslides from remote sensing imagery is challenging because traditional machine learning and early CNN-based models often fail to generalize across heterogeneous landscapes, where segmentation maps contain sparse and fragmented landslide regions under diverse geographical conditions. To address these issues, we propose a lightweight dual-stream siamese deep learning framework that integrates optical and topographical data fusion with an adaptive decoder, guided multimodal fusion, and deep supervision. The framework is built upon the synergistic combination of cross-attention, gated fusion, and sub-pixel upsampling within a unified dual-stream architecture specifically optimized for landslide segmentation, enabling efficient context modeling and robust feature exchange between modalities. The decoder captures long-range context at deeper levels using lightweight cross-attention and refines spatial details at shallower levels through attention-gated skip fusion, enabling precise boundary delineation and fewer false positives. The gated fusion further enhances multimodal integration of optical and topographical cues, and the deep supervision stabilizes training and improves generalization. Moreover, to mitigate checkerboard artifacts, a learnable sub-pixel upsampling is devised to replace the traditional transposed convolution. Despite its compact design with fewer parameters, the model consistently outperforms state-of-the-art baselines. Experiments on two benchmark datasets, Landslide4Sense and Bijie, confirm the effectiveness of the framework. On the Bijie dataset, it achieves an F1-score of 0.9110 and an intersection over union (IoU) of 0.8839. These results highlight its potential for accurate large-scale landslide inventory mapping and real-time disaster response. The implementation is publicly available at https://github.com/mishaown/DiGATe-UNet-LandSlide-Segmentation (accessed on 3 November 2025).
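Sub-pixel upsampling, mentioned above as the replacement for transposed convolution, usually means a convolution that produces C·r² channels followed by a fixed rearrangement of those channels into an r-times larger grid. The sketch below shows only the rearrangement step on nested lists, using the channel ordering that PyTorch's `PixelShuffle` follows; the learnable convolution that precedes it is omitted, and nothing here is taken from the linked repository.

```python
def pixel_shuffle(feature_maps, scale):
    """Rearrange a (C * scale^2, H, W) nested list into (C, H * scale, W * scale)."""
    block = scale * scale
    if len(feature_maps) % block != 0:
        raise ValueError("channel count must be divisible by scale squared")
    out_channels = len(feature_maps) // block
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    out = [[[0] * (width * scale) for _ in range(height * scale)]
           for _ in range(out_channels)]
    for c in range(out_channels):
        for i in range(scale):          # row offset inside each output block
            for j in range(scale):      # column offset inside each output block
                src = feature_maps[c * block + i * scale + j]
                for y in range(height):
                    for x in range(width):
                        out[c][y * scale + i][x * scale + j] = src[y][x]
    return out


if __name__ == "__main__":
    # Four 1x1 channels become one 2x2 map.
    print(pixel_shuffle([[[1]], [[2]], [[3]], [[4]]], scale=2))
```

Every output pixel is written exactly once from a distinct input value, so the rearrangement itself cannot produce the uneven overlap that causes checkerboard patterns in strided transposed convolutions.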
Funding: Funded by the Science and Technology Innovation Project, grant No. ZZKY20222304.
Abstract: To address the potential information noise introduced when ghost feature maps are generated in GhostNet, this paper proposes a novel lightweight neural network model called ResghostNet. The model constructs the Resghost Module by combining residual connections and Adaptive-SE Blocks, which enhances the quality of the generated feature maps through direct propagation of the original input information and selection of important channels before the cheap operations. Specifically, ResghostNet introduces residual connections into the Ghost Module to optimize the information flow, and designs a weight self-attention mechanism combined with SE blocks to enhance feature expression in the cheap operations. Experimental results on the ImageNet dataset show that, compared with GhostNet, ResghostNet achieves higher accuracy while reducing the number of parameters by 52%. Although the computational complexity increases, optimizing the usage strategy of GPU cache memory makes the model’s inference faster. ResghostNet is optimized in terms of classification accuracy and the number of model parameters, and shows great potential for edge computing devices.
Funding: Supported by the National Natural Science Foundation of China (52075027) and the Fundamental Research Funds for the Central Universities (2020XJJD03).
Abstract: Unauthorized operations of unmanned aerial vehicles (UAVs), referred to as “black flights”, pose a significant danger to public safety, and existing low-altitude object detection algorithms have difficulty balancing detection precision and speed. Additionally, their accuracy is insufficient, particularly for small objects in complex environments. To solve these problems, we propose a lightweight feature-enhanced convolutional neural network that detects low-altitude flying objects with high precision in real time, providing guidance information for suppressing black-flying UAVs. The proposed network consists of three modules. A lightweight and stable feature extraction module reduces the computational load and stably extracts more low-level features, an enhanced feature processing module significantly improves the feature extraction ability of the model, and an accurate detection module integrates low-level and advanced features to improve multiscale detection accuracy in complex environments, particularly for small objects. The proposed method achieves a detection speed of 147 frames per second (FPS) and a mean average precision (mAP) of 90.97% on a dataset composed of flying objects, indicating its potential for low-altitude object detection. Furthermore, evaluation results on Microsoft Common Objects in Context (MS COCO) indicate that the proposed method is also applicable to general object detection.
Funding: Funded by the Science and Technology Development Program of Jilin Province (20190301024NY) and the Precision Agriculture and Big Data Engineering Research Center of Jilin Province (2020C005).
Abstract: To address the difficulty of identifying apple diseases in the natural environment and the low application rate of deep learning recognition networks, a lightweight ResNet (LW-ResNet) model for apple disease recognition is proposed. Based on the deep residual network ResNet18, a multi-scale feature extraction layer is constructed with group convolution to compress the model and improve the extraction of lesion features of different sizes. The identity mapping structure is improved to reduce information loss, and the efficient channel attention module (ECANet) is introduced to suppress noise from complex backgrounds. The experimental results show that the average precision, recall, and F1-score of LW-ResNet on the test set are 97.80%, 97.92%, and 97.85%, respectively. The parameter memory is 2.32 MB, which is 94% less than that of ResNet18. Compared with the classic lightweight networks SqueezeNet and MobileNetV2, LW-ResNet has clear advantages in recognition performance, speed, parameter memory requirement, and time complexity. The proposed model offers low computational cost, low storage cost, strong real-time performance, high identification accuracy, and strong practicability, and can meet the needs of real-time apple leaf disease identification on resource-constrained devices.
Funding: Supported by a project jointly funded by the Beijing Municipal Education Commission and the Municipal Natural Science Foundation under grant KZ202010005004.
Abstract: Accurately identifying defect patterns in wafer maps can help engineers find abnormal failure factors in production lines. During the wafer testing stage, deep learning methods are widely used in wafer defect detection because of their powerful feature extraction capabilities. However, most current wafer defect pattern classification models have high complexity and slow detection speed, which makes them difficult to apply in the actual wafer production process. In addition, data imbalance in the wafer dataset seriously affects the training results of the model. To reduce the complexity of the deep model without affecting the wafer feature expression, this paper adjusts the structure of the dense block in the PeleeNet network and proposes a lightweight network, WM-PeleeNet, based on the PeleeNet module. In addition, to reduce the impact of data imbalance on model training, this paper proposes a wafer data augmentation method based on a convolutional autoencoder that adds random Gaussian noise to the hidden layer. The proposed method achieves an average accuracy of 95.4% on the WM-811K wafer dataset with only 173.643 KB of parameters and 316.194 M FLOPs, and takes only 22.99 s to detect 1000 wafer pictures. Compared with the original PeleeNet network without optimization, the number of parameters and FLOPs are reduced by 92.68% and 58.85%, respectively. Data augmentation on the minority-class wafer maps improves the average classification accuracy by 1.8% on the WM-811K dataset. At the same time, the recognition accuracy of minority classes such as the Scratch and Donut patterns is significantly improved.
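The augmentation idea described above (encode a minority-class sample, perturb the hidden representation with Gaussian noise, decode) can be sketched independently of any particular autoencoder. In the sketch below, `encode` and `decode` are hypothetical callables standing in for a trained convolutional autoencoder, and `sigma`, `n_copies`, and the toy linear codec in the demo are illustrative assumptions rather than the paper's settings.

```python
import random


def augment_with_latent_noise(sample, encode, decode, n_copies=3, sigma=0.1, seed=0):
    """Create synthetic variants of one sample by adding noise to its latent code.

    encode: maps a sample to a flat list of latent values (hypothetical).
    decode: maps a flat latent list back to sample space (hypothetical).
    """
    rng = random.Random(seed)
    latent = encode(sample)
    augmented = []
    for _ in range(n_copies):
        # Independent zero-mean Gaussian noise on every latent value.
        noisy = [z + rng.gauss(0.0, sigma) for z in latent]
        augmented.append(decode(noisy))
    return augmented


if __name__ == "__main__":
    # Toy stand-ins for a trained encoder and decoder.
    toy_encode = lambda x: [2.0 * v for v in x]
    toy_decode = lambda z: [v / 2.0 for v in z]
    for variant in augment_with_latent_noise([1.0, 0.0, 1.0], toy_encode, toy_decode):
        print([round(v, 3) for v in variant])
```

Perturbing the latent code rather than the pixels tends to yield variants that still resemble the original defect pattern, since the decoder maps nearby codes to plausible samples.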
Funding: Supported by the General Project of the Natural Science Foundation of Hebei Province of China (H2019201378), the Foundation of the President of Hebei University (XZJJ201917), and the Special Project for Cultivating Scientific and Technological Innovation Ability of University and Middle School Students of Hebei Province (2021H060306).
Abstract: The diagnosis of COVID-19 requires chest computed tomography (CT). High-resolution CT images provide more diagnostic information and help doctors diagnose the disease, so it is clinically important to study super-resolution (SR) algorithms that improve the resolution of CT images. However, most existing SR algorithms are designed for natural images and are not suitable for medical images, and most of them improve reconstruction quality by increasing network depth, which is not suitable for machines with limited resources. To alleviate these issues, we propose a residual feature attentional fusion network for lightweight chest CT image super-resolution (RFAFN). Specifically, we design a contextual feature extraction block (CFEB) that extracts CT image features more efficiently and accurately than ordinary residual blocks. In addition, we propose a feature-weighted cascading strategy (FWCS) based on attentional feature fusion blocks (AFFB) to make full use of the high-frequency detail extracted by CFEB by selectively fusing feature information from adjacent levels. Finally, we propose a global hierarchical feature fusion strategy (GHFFS), which uses hierarchical features more effectively than dense concatenation by progressively aggregating feature information at various levels. Extensive experiments show that our method performs better than most state-of-the-art (SOTA) methods on the COVID-19 chest CT dataset. In detail, the peak signal-to-noise ratio (PSNR) is 0.11 dB and 0.47 dB higher on CTtest1 and CTtest2 at ×3 SR compared to the suboptimal method, while the number of parameters and multi-adds are reduced by 22K and 0.43G, respectively. Our method can better recover chest CT image quality with fewer computational resources and effectively assist in the diagnosis of COVID-19.
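For readers unfamiliar with the PSNR metric quoted in the super-resolution abstract above, the following is a minimal sketch of the standard definition, PSNR = 10 log10(MAX^2 / MSE). The function name `psnr`, the 8-bit peak value of 255, and the toy images are assumptions for illustration. The paper's exact evaluation protocol (colour channel, border cropping) is not stated in the abstract.

```python
import math


def psnr(reference, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2D images."""
    flat_ref = [p for row in reference for p in row]
    flat_rec = [p for row in reconstructed for p in row]
    if len(flat_ref) != len(flat_rec):
        raise ValueError("images must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(flat_ref, flat_rec)) / len(flat_ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy 2x2 "images", for illustration only.
ground_truth = [[10, 20], [30, 40]]
estimate = [[12, 18], [30, 41]]
print(psnr(ground_truth, estimate))
```

Because PSNR is logarithmic in the mean squared error, a gain of 0.11 dB corresponds to an MSE that is roughly 2.5% lower, all else being equal.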
Funding: Funding support from the Science and Technology Commission of Shanghai Municipality (Grant No. 21DZ1100500), the Shanghai Frontiers Science Center Program (2021-2025 No. 20), and the Zhangjiang National Innovation Demonstration Zone (Grant No. ZJ2019ZD-005); supported by a fellowship from the China Postdoctoral Science Foundation (2020M671169) and the International Postdoctoral Exchange Program from the Administrative Committee of Post-Doctoral Researchers of China ([2020]33).
Abstract: Significant progress has been made in computational imaging (CI), in which deep convolutional neural networks (CNNs) have demonstrated that sparse speckle patterns can be reconstructed. However, because of the limited “local” kernel size of the convolutional operator, the performance of CNNs is limited for spatially dense patterns such as generic face images. Here, we propose a “non-local” model, termed the Speckle-Transformer (SpT) UNet, for speckle feature extraction of generic face images. Notably, the lightweight SpT UNet achieves high efficiency and strong comparative performance, with a Pearson correlation coefficient (PCC) and structural similarity measure (SSIM) exceeding 0.989 and 0.950, respectively.
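The Pearson correlation coefficient (PCC) reported in the speckle-imaging abstract above measures the linear agreement between a reconstructed image and its ground truth. The sketch below implements the textbook definition on flattened pixel lists. The function name `pearson` and the sample values are assumptions for illustration, not the paper's evaluation code.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("PCC is undefined for a constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical flattened pixel intensities of a target and a reconstruction.
target = [0.1, 0.4, 0.9, 0.3]
reconstruction = [0.2, 0.5, 0.8, 0.3]
print(pearson(target, reconstruction))
```

A PCC of 1 means the reconstruction is an exact positive linear function of the target, so values above 0.989 indicate very close agreement up to brightness and contrast scaling.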
Funding: This work was funded by the Foundation of the Liaoning Educational Committee under Grant No. 2019LNJC03.
Abstract: As the use of facial attributes continues to expand, research into facial age estimation is also developing. Because face images are easily affected by factors such as illumination and occlusion, estimating age from faces is challenging. In view of the complexity of the environment and the limited computing ability of devices, this paper proposes a face age estimation algorithm based on a lightweight convolutional neural network. To improve face age estimation based on the Soft Stagewise Regression Network (SSR-Net) and facial images, this paper employs the Center Symmetric Local Binary Pattern (CSLBP) method to obtain a feature image and then combines the face image and the feature image as the network input. Adding feature images to the convolutional neural network improves both the accuracy and the robustness of the model. Experimental results on the IMDB-WIKI and MORPH 2 datasets show that the proposed lightweight convolutional neural network method reduces model complexity and increases the accuracy of face age estimation.
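The CSLBP feature image mentioned in the age-estimation abstract above can be sketched as follows. This is a minimal version of the commonly described center-symmetric local binary pattern over a 3x3 neighbourhood, in which each of the four opposite neighbour pairs contributes one bit. The function name `cslbp`, the `threshold` default, and the bit ordering are assumptions; the paper's exact variant is not given in the abstract.

```python
def cslbp(image, threshold=0):
    """Compute a CSLBP code (0..15) for every interior pixel of a 2D image.

    `image` is a list of equal-length rows of grey levels. The output is
    two rows and two columns smaller because border pixels are skipped.
    """
    height, width = len(image), len(image[0])
    # Eight neighbours in circular order; offsets[i + 4] is opposite offsets[i].
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = [[0] * (width - 2) for _ in range(height - 2)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            code = 0
            for i in range(4):
                dr_a, dc_a = offsets[i]
                dr_b, dc_b = offsets[i + 4]
                diff = image[r + dr_a][c + dc_a] - image[r + dr_b][c + dc_b]
                if diff > threshold:
                    code |= 1 << i  # set bit i for this neighbour pair
            codes[r - 1][c - 1] = code
    return codes


# Toy 3x3 patch: bright top and right edge, dark bottom and left edge.
patch = [[9, 9, 9],
         [0, 5, 9],
         [0, 0, 0]]
print(cslbp(patch))
```

Because only four pairs are compared, the code has 16 possible values instead of the 256 of the ordinary LBP, which keeps the feature image compact when it is stacked with the face image as network input.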