Lightweight convolutional neural networks (CNNs) have simple structures but struggle to comprehensively and accurately extract important semantic information from images. While attention mechanisms can enhance CNNs by learning distinctive representations, most existing spatial and hybrid attention methods focus on local regions with extensive parameters, making them unsuitable for lightweight CNNs. In this paper, we propose a self-attention mechanism tailored for lightweight networks, namely the brief self-attention module (BSAM). BSAM consists of the brief spatial attention (BSA) and advanced channel attention blocks. Unlike conventional self-attention methods with many parameters, our BSA block improves the performance of lightweight networks by effectively learning global semantic representations. Moreover, BSAM can be seamlessly integrated into lightweight CNNs for end-to-end training, maintaining the network's lightweight and mobile characteristics. We validate the effectiveness of the proposed method on image classification tasks using the Food-101, Caltech-256, and Mini-ImageNet datasets.
The ubiquity of mobile devices has driven advancements in mobile object detection. However, challenges in multi-scale object detection in open, complex environments persist due to limited computational resources. Traditional approaches like network compression, quantization, and lightweight design often sacrifice accuracy or feature representation robustness. This article introduces the Fast Multi-scale Channel Shuffling Network (FMCSNet), a novel lightweight detection model optimized for mobile devices. FMCSNet integrates a fully convolutional Multilayer Perceptron (MLP) module, offering global perception without significantly increasing parameters, effectively bridging the gap between CNNs and Vision Transformers. FMCSNet balances computation and accuracy mainly through two key modules: the ShiftMLP module, which includes a shift operation and an MLP module, and a Partial group Convolutional (PGConv) module, which reduces computation while enhancing information exchange between channels. With a computational complexity of 1.4G FLOPs and 1.3M parameters, FMCSNet outperforms CNN-based and DWConv-based ShuffleNetv2 by 1% and 4.5% mAP on the Pascal VOC 2007 dataset, respectively. Additionally, FMCSNet achieves a mAP of 30.0 (0.5:0.95 IoU threshold) with only 2.5G FLOPs and 2.0M parameters. It achieves 32 FPS on low-performance i5-series CPUs, meeting real-time detection requirements. The PGConv module's adaptability across scenarios further highlights FMCSNet as a promising solution for real-time mobile object detection.
Current you only look once (YOLO)-based models face the challenge of excessive parameters and computational complexity in printed circuit board (PCB) defect detection. To solve this problem, we propose a new method that combines the lightweight network mobile vision transformer (MobileViT) with the convolutional block attention module (CBAM) mechanism and a new regression loss function. This method needs fewer computational resources, making it more suitable for embedded edge detection devices. Meanwhile, the new loss function improves the positioning accuracy of the bounding box and enhances the robustness of the model. In addition, experiments on public datasets demonstrate that the improved model achieves an average accuracy of 87.9% across six typical defect detection tasks, while reducing computational costs by nearly 90%. It significantly reduces the model's computational requirements while maintaining accuracy, ensuring reliable performance for edge deployment.
Semantic segmentation of eye images is a complex task with important applications in human–computer interaction, cognitive science, and neuroscience. Achieving real-time, accurate, and robust segmentation algorithms is crucial for computationally limited portable devices such as those used for augmented reality and virtual reality. With the rapid advancements in deep learning, many network models have been developed specifically for eye image segmentation. Some methods divide the segmentation process into multiple stages to miniaturize the model parameters and enhance the output through post-processing techniques to improve segmentation accuracy. These approaches significantly increase the inference time. Other networks adopt more complex encoding and decoding modules to achieve end-to-end output, which requires substantial computation. Therefore, balancing the model's size, accuracy, and computational complexity is essential. To address these challenges, we propose a lightweight asymmetric UNet architecture and a projection loss function. We utilize ResNet-3 layer blocks to enhance feature extraction efficiency in the encoding stage. In the decoding stage, we employ regular convolutions and skip connections to upscale the feature maps from the latent space to the original image size, balancing the model size and segmentation accuracy. In addition, we leverage the geometric features of the eye region and design a projection loss function to further improve the segmentation accuracy without adding any inference computational cost. We validate our approach on the OpenEDS2019 dataset for virtual reality and achieve state-of-the-art performance with 95.33% mean intersection over union (mIoU). Our model has only 0.63M parameters and runs at 350 FPS, which are 68% and 200% of those of the state-of-the-art model RITNet, respectively.
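As a reading aid for the mIoU figure quoted above, the following is a minimal sketch of how mean intersection over union is commonly computed from per-pixel class labels. The function name `mean_iou` and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration; the abstract does not describe the authors' evaluation code.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection over union for two flat sequences of class labels.

    Illustrative only: classes that appear in neither sequence are skipped
    (an assumed convention, not taken from the abstract).
    """
    ious = []
    for c in range(num_classes):
        # Pixels labelled c in both prediction and ground truth.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels labelled c in at least one of the two.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Four pixels, two classes: IoU is 1/2 for class 0 and 2/3 for class 1.
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

For real images the label maps would be flattened first; a vectorized implementation would normally replace the Python loops.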
Accurately identifying crop pests and diseases ensures agricultural productivity and safety. Although current YOLO-based detection models offer real-time capabilities, their conventional convolutional layers involve high computational redundancy and a fixed receptive field, making it challenging to capture local details and global semantics simultaneously in complex scenarios. This leads to significant issues such as missed detections of small targets and heightened sensitivity to background interference. To address these challenges, this paper proposes a lightweight adaptive detection network, StarSpark-AdaptiveNet (SSANet), which optimizes features through a dual-module collaborative mechanism. Specifically, the StarNet module utilizes depthwise separable convolutions (DW-Conv) and dynamic star operations to establish multi-stage feature extraction pathways, enhancing local detail perception within a lightweight framework. Moreover, the Multi-scale Adaptive Spatial Attention Gate (MASAG) module integrates cross-layer feature fusion and dynamic weight allocation to capture multi-scale global contextual information, effectively suppressing background noise. These modules jointly form a "local enhancement-global calibration" bidirectional optimization mechanism, significantly improving the model's adaptability to complex disease patterns. Furthermore, the proposed Scale-based Dynamic Loss (SD Loss) dynamically adjusts the weight of scale and localization losses, improving regression stability and localization accuracy, especially for small targets. Experiments on the eggplant fruit disease dataset demonstrate that SSANet achieves an mAP50 of 83.9% and a detection speed of 273.5 FPS with only 2.11 M parameters and 5.1 GFLOPs computational cost, outperforming the baseline YOLO11 model by reducing parameters by 18.1%, increasing mAP50 by 1.3%, and improving inference speed by 9.1%. Ablation studies further confirm the effectiveness and complementarity of the modules. SSANet offers a high-accuracy, low-cost solution suitable for real-time pest and disease detection in crops, facilitating edge device deployment and promoting precision agriculture.
In recent years, the country has invested significant human and material resources to prevent traffic accidents, particularly those caused by fatigued driving. Current studies mainly concentrate on driver physiological signals, driving behavior, and vehicle information. However, most of these approaches are computationally intensive and inconvenient for real-time detection. Therefore, this paper designs a network that balances precision, speed, and light weight, and proposes an algorithm for facial fatigue detection based on multi-feature fusion. Specifically, the face detection model takes YOLOv8 (You Only Look Once version 8) as the basic framework and replaces its backbone network with MobileNetv3. To focus on the significant regions in the image, CPCA (Channel Prior Convolution Attention) is adopted to enhance the network's capacity for feature extraction. Meanwhile, the network training phase employs the Focal-EIOU (Focal and Efficient Intersection Over Union) loss function, which makes the network lightweight and increases the accuracy of target detection. Finally, the Dlib toolkit is employed to annotate 68 facial feature points. This study establishes an evaluation metric for facial fatigue and develops a novel fatigue detection algorithm to assess the driver's condition. A series of comparative experiments were carried out on a self-built dataset. The proposed method's mAP (mean Average Precision) values for object detection and fatigue detection are 96.71% and 95.75%, respectively, and the detection speed is 47 FPS (Frames Per Second). This method balances the trade-off between computational complexity and model accuracy. Furthermore, it can be ported to the NVIDIA Jetson Orin NX and quickly detect the driver's state while maintaining a high degree of accuracy. It contributes to the development of automobile safety systems and reduces the occurrence of traffic accidents.
To address the low detection accuracy caused by the varying scales of apple leaf disease spots and their similarity to the background, this paper proposes a multi-scale lightweight network (MSL-Net). Firstly, a multiplexed aggregated feature extraction network is proposed using the residual bottleneck block (RES-Bottleneck) and middle partial-convolution (MP-Conv) to capture multi-scale spatial features and enhance focus on disease features for better differentiation between disease targets and background information. Secondly, a lightweight feature fusion network is designed using the scale-fuse concatenation (SF-Cat) and triple-scale sequence feature fusion (TSSF) modules to merge multi-scale feature maps comprehensively. Depthwise convolution (DWConv) and GhostNet lighten the network, while the cross stage partial bottleneck with 3 convolutions ghost-normalization attention module (C3-GN) reduces missed detections by suppressing irrelevant background information. Finally, soft non-maximum suppression (Soft-NMS) is used in the post-processing stage to mitigate misdetection of dense disease sites. The results show that MSL-Net improves mean average precision at an intersection over union of 0.5 (mAP@0.5) by 2.0% over the baseline you only look once version 5s (YOLOv5s) and reduces parameters by 44% and computation by 27%, outperforming other state-of-the-art (SOTA) models overall. This method also shows excellent performance compared to the latest research.
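The Soft-NMS post-processing step mentioned above can be sketched as follows. This is a generic Gaussian-decay variant written for illustration, not the MSL-Net implementation; the function names, the `sigma` value, and the `score_thresh` cutoff are all assumptions. Instead of discarding every box that overlaps the current best detection, each remaining score is multiplied by exp(-IoU²/sigma), so heavily overlapping boxes are down-weighted rather than removed outright.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS sketch; returns kept (box, score) pairs in selection order."""
    candidates = list(zip(boxes, scores))
    kept = []
    while candidates:
        # Pick the highest-scoring remaining box.
        best = max(range(len(candidates)), key=lambda i: candidates[i][1])
        box, score = candidates.pop(best)
        if score < score_thresh:
            break  # everything left scores even lower
        kept.append((box, score))
        # Decay the scores of the remaining boxes according to their overlap.
        candidates = [
            (b, s * math.exp(-(iou(box, b) ** 2) / sigma)) for b, s in candidates
        ]
    return kept


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
    for box, score in soft_nms(boxes, [0.9, 0.8, 0.7]):
        print(box, round(score, 4))
```

In the usage example, the second box overlaps the first heavily, so its score is decayed and it is selected last, after the distant third box, rather than being suppressed entirely.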
Accurate cloud classification plays a crucial role in aviation safety, climate monitoring, and localized weather forecasting. Current research has focused on machine learning techniques, particularly deep learning-based models, for cloud type identification. However, traditional approaches such as convolutional neural networks (CNNs) encounter difficulties in capturing global contextual information. In addition, they are computationally expensive, which restricts their usability in resource-limited environments. To tackle these issues, we present the Cloud Vision Transformer (CloudViT), a lightweight model that integrates CNNs with Transformers. The integration enables an effective balance between local and global feature extraction. Specifically, CloudViT comprises two innovative modules: Feature Extraction (E_Module) and Downsampling (D_Module). These modules significantly reduce the number of model parameters and the computational complexity while maintaining translation invariance and enhancing contextual comprehension. Overall, CloudViT has 0.93×10^6 parameters, more than ten times fewer than the SOTA (State-of-the-Art) model CloudNet. Comprehensive evaluations conducted on the HBMCD and SWIMCAT datasets showcase the outstanding performance of CloudViT. It achieves classification accuracies of 98.45% and 100%, respectively. Moreover, the efficiency and scalability of CloudViT make it an ideal candidate for deployment in mobile cloud observation systems, enabling real-time cloud image classification. The proposed hybrid architecture of CloudViT offers a promising approach for advancing ground-based cloud image classification. It holds significant potential for both optimizing performance and facilitating practical deployment scenarios.
Unauthorized operations of unmanned aerial vehicles (UAVs), referred to as "black flights", pose a significant danger to public safety, and existing low-altitude object detection algorithms encounter difficulties in balancing detection precision and speed. Additionally, their accuracy is insufficient, particularly for small objects in complex environments. To solve these problems, we propose a lightweight feature-enhanced convolutional neural network able to detect low-altitude flying objects with high precision in real time, providing guidance information to suppress black-flying UAVs. The proposed network consists of three modules. A lightweight and stable feature extraction module is used to reduce the computational load and stably extract more low-level features, an enhanced feature processing module significantly improves the feature extraction ability of the model, and an accurate detection module integrates low-level and advanced features to improve the multiscale detection accuracy in complex environments, particularly for small objects. The proposed method achieves a detection speed of 147 frames per second (FPS) and a mean average precision (mAP) of 90.97% on a dataset composed of flying objects, indicating its potential for low-altitude object detection. Furthermore, evaluation results based on Microsoft Common Objects in Context (MS COCO) indicate that the proposed method is also applicable to object detection in general.
To address the difficulty of identifying apple diseases in the natural environment and the low application rate of deep learning recognition networks, a lightweight ResNet (LW-ResNet) model for apple disease recognition is proposed. Based on the deep residual network (ResNet18), a multi-scale feature extraction layer is constructed with group convolution to compress the model and improve the extraction of lesion features of different sizes. The identity mapping structure is improved to reduce information loss, and the efficient channel attention module (ECANet) is introduced to suppress noise from complex backgrounds. The experimental results show that the average precision, recall, and F1-score of LW-ResNet on the test set are 97.80%, 97.92%, and 97.85%, respectively. The parameter memory is 2.32 MB, which is 94% less than that of ResNet18. Compared with the classic lightweight networks SqueezeNet and MobileNetV2, LW-ResNet has clear advantages in recognition performance, speed, parameter memory requirement, and time complexity. The proposed model offers low computational cost, low storage cost, strong real-time performance, high identification accuracy, and strong practicability, and can meet the needs of real-time apple leaf disease identification on resource-constrained devices.
Accurately identifying defect patterns in wafer maps can help engineers find abnormal failure factors in production lines. During the wafer testing stage, deep learning methods are widely used in wafer defect detection due to their powerful feature extraction capabilities. However, most current wafer defect pattern classification models have high complexity and slow detection speed, which makes them difficult to apply in the actual wafer production process. In addition, there is a data imbalance in the wafer dataset that seriously affects the training results of the model. To reduce the complexity of the deep model without affecting the wafer feature expression, this paper adjusts the structure of the dense block in the PeleeNet network and proposes a lightweight network, WM-PeleeNet, based on the PeleeNet module. In addition, to reduce the impact of data imbalance on model training, this paper proposes a wafer data augmentation method based on a convolutional autoencoder that adds random Gaussian noise to the hidden layer. The proposed method achieves an average accuracy of 95.4% on the WM-811K wafer dataset with only 173.643 KB of parameters and 316.194 M FLOPs, and takes only 22.99 s to process 1000 wafer images. Compared with the original PeleeNet network without optimization, the number of parameters and FLOPs are reduced by 92.68% and 58.85%, respectively. Data augmentation on the minority-class wafer maps improves the average classification accuracy by 1.8% on the WM-811K dataset. At the same time, the recognition accuracy of minority classes such as the Scratch and Donut patterns is significantly improved.
A Light-Weight Simple Network Management Protocol (LW-SNMP) for wireless sensor networks is proposed. It is a hierarchical network management system comprising a sink manager, cluster proxies, and node agents. Considering the resource limitations of the sensor nodes, we design entirely new management messages, data types, and a management information base. The management messages between the cluster proxy and node agents are delivered as normal data packets. The experimental results show that LW-SNMP can meet the management demands of resource-limited wireless sensor networks and performs better than the traditional Simple Network Management Protocol (SNMP) in stability, memory efficiency, and extensibility.
The diagnosis of COVID-19 requires chest computed tomography (CT). High-resolution CT images can provide more diagnostic information to help doctors better diagnose the disease, so it is of clinical importance to study super-resolution (SR) algorithms that improve the resolution of CT images. However, most existing SR algorithms are developed on natural images and are not well suited to medical images, and most of them improve the reconstruction quality by increasing the network depth, which is not suitable for machines with limited resources. To alleviate these issues, we propose a residual feature attentional fusion network for lightweight chest CT image super-resolution (RFAFN). Specifically, we design a contextual feature extraction block (CFEB) that can extract CT image features more efficiently and accurately than ordinary residual blocks. In addition, we propose a feature-weighted cascading strategy (FWCS) based on attentional feature fusion blocks (AFFB) to utilize the high-frequency detail information extracted by CFEB as much as possible by selectively fusing adjacent-level feature information. Finally, we propose a global hierarchical feature fusion strategy (GHFFS), which can utilize the hierarchical features more effectively than dense concatenation by progressively aggregating the feature information at various levels. Extensive experiments show that our method performs better than most state-of-the-art (SOTA) methods on the COVID-19 chest CT dataset. In detail, the peak signal-to-noise ratio (PSNR) is 0.11 dB and 0.47 dB higher on CTtest1 and CTtest2 at ×3 SR compared to the suboptimal method, while the number of parameters and multi-adds are reduced by 22K and 0.43G, respectively. Our method can better recover chest CT image quality with fewer computational resources and effectively assist in COVID-19 diagnosis.
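To make the PSNR gains above easier to interpret, here is a minimal sketch of the standard definition, PSNR = 10·log10(MAX² / MSE), applied to flat pixel sequences. The function name `psnr` and the default `max_value` of 255 (8-bit images) are assumptions for illustration; the abstract does not state the bit depth or the evaluation code used.

```python
import math


def psnr(reference, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    max_value is the largest possible pixel value (assumed 255 for 8-bit data).
    """
    if len(reference) != len(reconstructed) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between the two images.
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    # Every pixel off by one grey level gives an MSE of 1.
    print(psnr([10, 20, 30], [11, 21, 31]))
```

Because the scale is logarithmic, a gain of 0.47 dB corresponds to roughly a 10% reduction in mean squared error.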
Significant progress has been made in computational imaging (CI), in which deep convolutional neural networks (CNNs) have demonstrated that sparse speckle patterns can be reconstructed. However, due to the limited "local" kernel size of the convolutional operator, the performance of CNNs is limited for spatially dense patterns such as generic face images. Here, we propose a "non-local" model, termed the Speckle-Transformer (SpT) UNet, for speckle feature extraction of generic face images. Notably, the lightweight SpT UNet shows high efficiency and strong comparative performance, with the Pearson correlation coefficient (PCC) and structural similarity measure (SSIM) exceeding 0.989 and 0.950, respectively.
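For readers unfamiliar with the PCC metric reported above, the sketch below computes the Pearson correlation coefficient between two flattened images using only the standard library. The function name `pearson` is hypothetical, and this is the textbook definition rather than the authors' evaluation script.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Covariance and standard deviations (the common 1/n factors cancel).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0 or std_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (std_x * std_y)


if __name__ == "__main__":
    ground_truth = [0.1, 0.4, 0.9, 0.3]
    reconstruction = [0.2, 0.5, 0.8, 0.3]
    print(pearson(ground_truth, reconstruction))
```

A value close to 1 means the reconstruction varies linearly with the ground truth; the metric is insensitive to global brightness and contrast offsets, which is why it is usually reported alongside SSIM.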
As the use of facial attributes continues to expand, research into facial age estimation is also developing. Because face images are easily affected by factors including illumination and occlusion, facial age estimation is a challenging process. In view of the complexity of the environment and the limited computing ability of devices, this paper proposes a face age estimation algorithm based on a lightweight convolutional neural network. To improve face age estimation based on the Soft Stagewise Regression Network (SSR-Net) and facial images, this paper employs the Center Symmetric Local Binary Pattern (CSLBP) method to obtain a feature image and then combines the face image and the feature image as network input data. Adding feature images to the convolutional neural network can improve the accuracy as well as the robustness of the network model. The experimental results on the IMDB-WIKI and MORPH 2 datasets show that the lightweight convolutional neural network method proposed in this paper reduces model complexity and increases the accuracy of face age estimation.
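The CSLBP feature image mentioned above can be illustrated with a small sketch. In the usual formulation, each pixel's eight neighbours are grouped into four centre-symmetric pairs, and each pair contributes one bit depending on whether the difference of its two values exceeds a threshold, giving a 4-bit code (0 to 15). The neighbour ordering, the `threshold` default, and the function name `cs_lbp` below are assumptions for illustration; the paper's exact settings are not given in the abstract.

```python
def cs_lbp(image, threshold=0.0):
    """Centre-symmetric LBP codes for the interior pixels of a 2D grayscale image.

    image is a list of rows; the result has two fewer rows and columns because
    border pixels lack a full 3x3 neighbourhood.
    """
    height, width = len(image), len(image[0])
    # Four pairs of opposite neighbours as (dy, dx) offsets (assumed ordering).
    pairs = [((-1, -1), (1, 1)), ((-1, 0), (1, 0)), ((-1, 1), (1, -1)), ((0, 1), (0, -1))]
    codes = [[0] * (width - 2) for _ in range(height - 2)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            code = 0
            for bit, ((dy1, dx1), (dy2, dx2)) in enumerate(pairs):
                # Set the bit when the first neighbour is sufficiently brighter.
                if image[y + dy1][x + dx1] - image[y + dy2][x + dx2] > threshold:
                    code |= 1 << bit
            codes[y - 1][x - 1] = code
    return codes


if __name__ == "__main__":
    patch = [[9, 9, 9],
             [0, 5, 9],
             [0, 0, 0]]
    print(cs_lbp(patch))
```

Note that the centre pixel itself never enters the comparison, which halves the code length relative to the classic 8-bit LBP and makes the descriptor less sensitive to noise in flat regions.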
In the field of agricultural information, the identification and prediction of rice leaf disease have always been a focus of research, and deep learning (DL) technology is currently a hot research topic in the field of pattern recognition. The research and development of high-efficiency, high-quality, and low-cost automatic identification methods for rice diseases that can replace humans is an important means of dealing with the current situation from a technical perspective. This paper mainly focuses on the problem of the huge number of parameters in Convolutional Neural Network (CNN) models and proposes a recognition model that combines a multi-scale convolution module with a neural network model based on the Visual Geometry Group (VGG) network. The accuracy and loss on the training set and the test set are used to evaluate the performance of the model. The test accuracy of this model is 97.1%, which is 5.87% higher than that of VGG. Furthermore, the memory requirement is 26.1M, only 1.6% of that of VGG. Experimental results show that this model performs better in terms of accuracy, recognition speed, and memory size.
With the continuous development of medical informatics and digital diagnosis, the deep learning-based classification of tuberculosis (TB) cases from computed tomography (CT) images of the lung is an important guiding aid in clinical diagnosis and treatment. Due to its potential application in medical image classification, this task has received extensive research attention. Existing neural network techniques still face challenges in extracting the global contextual information of images and in the network complexity required for image classification. To address these issues, this paper proposes a lightweight medical image classification network based on a combination of Transformer and convolutional neural network (CNN) for the classification of TB cases from lung CT. The method mainly consists of a fusion of the CNN module and the Transformer module, exploiting the advantages of both to achieve more accurate classification. On the one hand, the CNN branch supplements the Transformer branch with basic local feature information at the low level; on the other hand, at the middle and high levels of the model, the CNN branch can also provide the Transformer architecture with different local and global feature information, enhancing the ability of the model to obtain feature information and improving the accuracy of image classification. A shortcut is used in each module of the network to solve the problem of poor model results due to gradient divergence and to optimize the effectiveness of TB classification. The proposed lightweight model can mitigate the long training time of TB classification from lung CT and improve the speed of classification. The proposed method was validated on a CT image dataset provided by the First Hospital of Lanzhou University. The experimental results show that the proposed lightweight classification network for TB based on lung CT images can fully extract the feature information of the input images and obtain high-accuracy classification results.
Convolutional neural networks (CNNs) are widely used in image classification tasks, but their increasing model size and computation make them challenging to implement on embedded systems with constrained hardware resources. To address this issue, the MobileNetV1 network was developed, which employs depthwise convolution to reduce network complexity. MobileNetV1 employs a stride of 2 in several convolutional layers to decrease the spatial resolution of feature maps, thereby lowering computational costs. However, this stride setting can lead to a loss of spatial information, particularly affecting the detection and representation of smaller objects or finer details in images. To maintain the trade-off between complexity and model performance, a lightweight convolutional neural network with hierarchical multi-scale feature fusion based on the MobileNetV1 network is proposed. The network consists of two main subnetworks. The first subnetwork uses a depthwise dilated separable convolution (DDSC) layer to learn imaging features with fewer parameters, which results in a lightweight and computationally inexpensive network. Furthermore, the depthwise dilated convolution in the DDSC layer effectively expands the field of view of the filters, allowing them to incorporate a larger context. The second subnetwork is a hierarchical multi-scale feature fusion (HMFF) module that uses a parallel multi-resolution branch architecture to process the input feature map and extract the multi-scale feature information of the input image. Experimental results on the CIFAR-10, Malaria, and KvasirV1 datasets demonstrate that the proposed method is efficient, reducing the network parameters and computational cost by 65.02% and 39.78%, respectively, while maintaining the network performance compared to the MobileNetV1 baseline.
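The two claims above, that depthwise separable convolution needs fewer parameters and that dilation widens the field of view at no parameter cost, follow from simple arithmetic, sketched below. The function names are hypothetical, biases are ignored, and the layer sizes in the usage example are arbitrary; none of these values come from the paper.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a k x k depthwise convolution followed by a 1 x 1 pointwise one."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def effective_kernel_size(k, dilation):
    """Spatial extent covered by a k x k kernel with the given dilation rate."""
    return k + (k - 1) * (dilation - 1)


if __name__ == "__main__":
    c_in, c_out, k = 32, 64, 3
    print(standard_conv_params(c_in, c_out, k))        # 18432
    print(depthwise_separable_params(c_in, c_out, k))  # 2336
    print(effective_kernel_size(k, dilation=2))        # 5
```

For this example layer the separable form uses about one eighth of the weights, and a dilation rate of 2 lets a 3×3 filter cover a 5×5 window while still holding only nine weights per channel.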
Deployable mechanisms with good deployment performance, strong expandability, and light weight have attracted much attention because of their potential in aerospace. A basic deployable pyramid unit with good deployability and expandability is proposed to construct a large deployable mechanism. Firstly, the folding principle and expansion method of the basic unit are proposed. The configuration synthesis method of adding constraint chains to a spatial closed-loop mechanism is used to synthesize the basic unit. Then, the degree of freedom of the basic unit is analyzed using screw theory and the link dismantling method. Next, three-dimensional models of the pyramid unit, expansion unit, and array unit are established, and a folding motion simulation analysis is carried out. Based on the number of components, weight reduction rate, and deployable rate, the performance characteristics of the three types of mechanisms are described in detail. Finally, prototypes of the pyramid unit, combination unit, and expansion unit are developed to further verify the correctness of the pyramid-based configuration synthesis. The proposed deployable mechanism provides a reference for the design and application of antennas with a large aperture, high deployable rate, and light weight. It has good application prospects in the aerospace field.
In minimally invasive surgery, endoscopes or laparoscopes equipped with miniature cameras and tools are used to enter the human body for therapeutic purposes through small incisions or natural cavities. However, in clinical operating environments, endoscopic images often suffer from challenges such as low texture, uneven illumination, and non-rigid structures, which affect feature observation and extraction. This can severely impact surgical navigation or clinical diagnosis due to missing feature points in endoscopic images, leading to treatment and postoperative recovery issues for patients. To address these challenges, this paper introduces, for the first time, a Cross-Channel Multi-Modal Adaptive Spatial Feature Fusion (ASFF) module based on the lightweight architecture of EfficientViT. Additionally, a novel lightweight feature extraction and matching network based on attention mechanism is proposed. This network dynamically adjusts attention weights for cross-modal information from grayscale images and optical flow images through a dual-branch Siamese network. It extracts static and dynamic information features ranging from low-level to high-level, and from local to global, ensuring robust feature extraction across different widths, noise levels, and blur scenarios. Global and local matching are performed through a multi-level cascaded attention mechanism, with cross-channel attention introduced to simultaneously extract low-level and high-level features. Extensive ablation experiments and comparative studies are conducted on the HyperKvasir, EAD, M2caiSeg, CVC-ClinicDB, and UCL synthetic datasets. Experimental results demonstrate that the proposed network improves upon the baseline EfficientViT-B3 model by 75.4% in accuracy (Acc), while also enhancing runtime performance and storage efficiency. When compared with the complex DenseDescriptor feature extraction network, the difference in Acc is less than 7.22%, and IoU calculation results on specific datasets outperform complex dense models. Furthermore, this method increases the F1 score by 33.2% and accelerates runtime by 70.2%. It is noteworthy that the speed of CMMCAN surpasses that of comparative lightweight models, with feature extraction and matching performance comparable to existing complex models but with faster speed and higher cost-effectiveness.
Abstract: Lightweight convolutional neural networks (CNNs) have simple structures but struggle to comprehensively and accurately extract important semantic information from images. While attention mechanisms can enhance CNNs by learning distinctive representations, most existing spatial and hybrid attention methods focus on local regions with extensive parameters, making them unsuitable for lightweight CNNs. In this paper, we propose a self-attention mechanism tailored for lightweight networks, namely the brief self-attention module (BSAM). BSAM consists of the brief spatial attention (BSA) and advanced channel attention blocks. Unlike conventional self-attention methods with many parameters, our BSA block improves the performance of lightweight networks by effectively learning global semantic representations. Moreover, BSAM can be seamlessly integrated into lightweight CNNs for end-to-end training, maintaining the network's lightweight and mobile characteristics. We validate the effectiveness of the proposed method on image classification tasks using the Food-101, Caltech-256, and Mini-ImageNet datasets.
Funding: Funded by the National Natural Science Foundation of China under Grant No. 62371187; the Open Program of Hunan Intelligent Rehabilitation Robot and Auxiliary Equipment Engineering Technology Research Center under Grant No. 2024JS101.
Abstract: The ubiquity of mobile devices has driven advancements in mobile object detection. However, challenges in multi-scale object detection in open, complex environments persist due to limited computational resources. Traditional approaches like network compression, quantization, and lightweight design often sacrifice accuracy or feature representation robustness. This article introduces the Fast Multi-scale Channel Shuffling Network (FMCSNet), a novel lightweight detection model optimized for mobile devices. FMCSNet integrates a fully convolutional Multilayer Perceptron (MLP) module, offering global perception without significantly increasing parameters, effectively bridging the gap between CNNs and Vision Transformers. FMCSNet balances computation and accuracy mainly through two key modules: the ShiftMLP module, which includes a shift operation and an MLP module, and a Partial group Convolutional (PGConv) module, which reduces computation while enhancing information exchange between channels. With a computational complexity of 1.4G FLOPs and 1.3M parameters, FMCSNet outperforms CNN-based and DWConv-based ShuffleNetv2 by 1% and 4.5% mAP on the Pascal VOC 2007 dataset, respectively. Additionally, FMCSNet achieves a mAP of 30.0 (0.5:0.95 IoU threshold) with only 2.5G FLOPs and 2.0M parameters. It achieves 32 FPS on low-performance i5-series CPUs, meeting real-time detection requirements. The adaptability of the PGConv module across scenarios further highlights FMCSNet as a promising solution for real-time mobile object detection.
Funding: Supported by the National Natural Science Foundation of China (Nos. 62373215, 62373219 and 62073193); the Natural Science Foundation of Shandong Province (No. ZR2023MF100); the Key Projects of the Ministry of Industry and Information Technology (No. TC220H057-2022); the Independently Developed Instrument Funds of Shandong University (No. zy20240201).
Abstract: Current you only look once (YOLO)-based models face the challenge of excessive parameters and computational complexity in printed circuit board (PCB) defect detection. To solve this problem, we propose a new method that combines the lightweight network mobile vision transformer (MobileViT) with the convolutional block attention module (CBAM) mechanism and a new regression loss function. This method needs fewer computational resources, making it more suitable for embedded edge detection devices. Meanwhile, the new loss function improves the positioning accuracy of the bounding box and enhances the robustness of the model. In addition, experiments on public datasets demonstrate that the improved model achieves an average accuracy of 87.9% across six typical defect detection tasks, while reducing computational costs by nearly 90%. It significantly reduces the model's computational requirements while maintaining accuracy, ensuring reliable performance for edge deployment.
Funding: Supported by the HFIPS Director's Foundation (YZJJ202207-TS); the National Natural Science Foundation of China (82371931); the Natural Science Foundation of Anhui Province (2008085MC69); the Natural Science Foundation of Hefei City (2021033); the General Scientific Research Project of Anhui Provincial Health Commission (AHWJ2021b150); the Collaborative Innovation Program of Hefei Science Center, CAS (2021HSC-CIP013); the Anhui Province Key Research and Development Project (202204295107020004).
Abstract: Semantic segmentation of eye images is a complex task with important applications in human-computer interaction, cognitive science, and neuroscience. Achieving real-time, accurate, and robust segmentation algorithms is crucial for computationally limited portable devices such as those used for augmented reality and virtual reality. With the rapid advancements in deep learning, many network models have been developed specifically for eye image segmentation. Some methods divide the segmentation process into multiple stages to achieve model parameter miniaturization while enhancing output through post-processing techniques to improve segmentation accuracy. These approaches significantly increase the inference time. Other networks adopt more complex encoding and decoding modules to achieve end-to-end output, which requires substantial computation. Therefore, balancing the model's size, accuracy, and computational complexity is essential. To address these challenges, we propose a lightweight asymmetric UNet architecture and a projection loss function. We utilize ResNet-3 layer blocks to enhance feature extraction efficiency in the encoding stage. In the decoding stage, we employ regular convolutions and skip connections to upscale the feature maps from the latent space to the original image size, balancing the model size and segmentation accuracy. In addition, we leverage the geometric features of the eye region and design a projection loss function to further improve the segmentation accuracy without adding any inference computational cost. We validate our approach on the OpenEDS2019 dataset for virtual reality and achieve state-of-the-art performance with 95.33% mean intersection over union (mIoU). Our model has only 0.63M parameters and runs at 350 FPS, which are 68% and 200% of the state-of-the-art model RITNet, respectively.
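The mIoU metric quoted above averages the per-class intersection over union between predicted and ground-truth label maps. The following is a minimal sketch of that standard definition on flattened label lists; the function name `mean_iou` and the choice to skip classes absent from both prediction and target are illustrative assumptions, not details taken from the paper.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection over union for two flat lists of class labels.

    Assumption: classes that appear in neither list are skipped rather
    than counted as IoU = 0 or IoU = 1.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    # No class present at all: the metric is undefined.
    return sum(ious) / len(ious) if ious else float("nan")


# Toy example with two classes: IoU is 1/2 for class 0 and 2/3 for class 1.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```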
Funding: Supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (NRF-2022R1A2C2012243).
Abstract: Accurately identifying crop pests and diseases ensures agricultural productivity and safety. Although current YOLO-based detection models offer real-time capabilities, their conventional convolutional layers involve high computational redundancy and a fixed receptive field, making it challenging to capture local details and global semantics simultaneously in complex scenarios. This leads to significant issues such as missed detections of small targets and heightened sensitivity to background interference. To address these challenges, this paper proposes a lightweight adaptive detection network, StarSpark-AdaptiveNet (SSANet), which optimizes features through a dual-module collaborative mechanism. Specifically, the StarNet module utilizes depthwise separable convolutions (DW-Conv) and dynamic star operations to establish multi-stage feature extraction pathways, enhancing local detail perception within a lightweight framework. Moreover, the Multi-scale Adaptive Spatial Attention Gate (MASAG) module integrates cross-layer feature fusion and dynamic weight allocation to capture multi-scale global contextual information, effectively suppressing background noise. These modules jointly form a "local enhancement-global calibration" bidirectional optimization mechanism, significantly improving the model's adaptability to complex disease patterns. Furthermore, the proposed Scale-based Dynamic Loss (SD Loss) dynamically adjusts the weight of scale and localization losses, improving regression stability and localization accuracy, especially for small targets. Experiments on the eggplant fruit disease dataset demonstrate that SSANet achieves an mAP50 of 83.9% and a detection speed of 273.5 FPS with only 2.11 M parameters and a computational cost of 5.1 GFLOPs, outperforming the baseline YOLO11 model by reducing parameters by 18.1%, increasing mAP50 by 1.3%, and improving inference speed by 9.1%. Ablation studies further confirm the effectiveness and complementarity of the modules. SSANet offers a high-accuracy, low-cost solution suitable for real-time pest and disease detection in crops, facilitating edge device deployment and promoting precision agriculture.
Funding: Supported by the Science and Technology Bureau of Xi'an project (24KGDW0049); the Key Research and Development Program of Shaanxi (2023-YBGY-264); the Key Research and Development Program of Guangxi (GK-AB20159032).
Abstract: In recent years, the country has spent significant workforce and material resources to prevent traffic accidents, particularly those caused by fatigued driving. Current studies mainly concentrate on driver physiological signals, driving behavior, and vehicle information. However, most of these approaches are computationally intensive and inconvenient for real-time detection. Therefore, this paper designs a network that combines precision, speed, and light weight, and proposes an algorithm for facial fatigue detection based on multi-feature fusion. Specifically, the face detection model takes YOLOv8 (You Only Look Once version 8) as the basic framework and replaces its backbone network with MobileNetv3. To focus on the significant regions in the image, CPCA (Channel Prior Convolution Attention) is adopted to enhance the network's capacity for feature extraction. Meanwhile, the network training phase employs the Focal-EIOU (Focal and Efficient Intersection Over Union) loss function, which makes the network lightweight and increases the accuracy of target detection. Finally, the Dlib toolkit is employed to annotate 68 facial feature points. This study establishes an evaluation metric for facial fatigue and develops a novel fatigue detection algorithm to assess the driver's condition. A series of comparative experiments were carried out on a self-built dataset. The proposed method's mAP (mean Average Precision) values for object detection and fatigue detection are 96.71% and 95.75%, respectively, and the detection speed is 47 FPS (Frames Per Second). This method balances computational complexity against model accuracy. Furthermore, it can be ported to the NVIDIA Jetson Orin NX and quickly detect the driver's state while maintaining a high degree of accuracy. It contributes to the development of automobile safety systems and reduces the occurrence of traffic accidents.
Abstract: To address the low detection accuracy caused by the varying scales of apple leaf disease spots and their similarity to the background, this paper proposes a multi-scale lightweight network (MSL-Net). Firstly, a multiplexed aggregated feature extraction network is proposed using the residual bottleneck block (RES-Bottleneck) and middle partial-convolution (MP-Conv) to capture multi-scale spatial features and enhance focus on disease features for better differentiation between disease targets and background information. Secondly, a lightweight feature fusion network is designed using scale-fuse concatenation (SF-Cat) and the triple-scale sequence feature fusion (TSSF) module to merge multi-scale feature maps comprehensively. Depthwise convolution (DWConv) and GhostNet lighten the network, while the cross stage partial bottleneck with 3 convolutions ghost-normalization attention module (C3-GN) reduces missed detections by suppressing irrelevant background information. Finally, soft non-maximum suppression (Soft-NMS) is used in the post-processing stage to reduce misdetection of dense disease sites. The results show that MSL-Net improves mean average precision at an intersection over union of 0.5 (mAP@0.5) by 2.0% over the baseline you only look once version 5s (YOLOv5s) while reducing parameters by 44% and computation by 27%, outperforming other state-of-the-art (SOTA) models overall. This method also shows excellent performance compared to the latest research.
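Soft-NMS, mentioned above as the post-processing step, lowers the scores of boxes that overlap a selected box instead of discarding them outright, which helps when targets are densely packed. The sketch below uses the Gaussian decay variant; the abstract does not say which variant or which hyperparameters MSL-Net uses, so `sigma`, `score_threshold`, and the function names are assumptions for illustration only.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian Soft-NMS: returns (box, score) pairs in selection order.

    sigma and score_threshold are assumed values, not taken from the paper.
    """
    candidates = list(zip(boxes, scores))
    kept = []
    while candidates:
        # Select the remaining box with the highest (possibly decayed) score.
        best = max(range(len(candidates)), key=lambda i: candidates[i][1])
        best_box, best_score = candidates.pop(best)
        kept.append((best_box, best_score))
        # Decay the scores of the remaining boxes by their overlap with it.
        rescored = []
        for box, score in candidates:
            decayed = score * math.exp(-(iou(best_box, box) ** 2) / sigma)
            if decayed >= score_threshold:
                rescored.append((box, decayed))
        candidates = rescored
    return kept


# Two heavily overlapping boxes and one separate box.
result = soft_nms(
    [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)],
    [0.9, 0.8, 0.7],
)
```

In this toy run the overlapping second box is kept with a reduced score rather than removed, which is the behavioral difference from hard NMS.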
Funding: Funded by the Innovation and Development Special Project of China Meteorological Administration (CXFZ2022J038, CXFZ2024J035); the Sichuan Science and Technology Program (No. 2023YFQ0072); the Key Laboratory of Smart Earth (No. KF2023YB03-07); the Automatic Software Generation and Intelligent Service Key Laboratory of Sichuan Province (CUIT-SAG202210).
Abstract: Accurate cloud classification plays a crucial role in aviation safety, climate monitoring, and localized weather forecasting. Current research has focused on machine learning techniques, particularly deep learning-based models, for cloud type identification. However, traditional approaches such as convolutional neural networks (CNNs) encounter difficulties in capturing global contextual information. In addition, they are computationally expensive, which restricts their usability in resource-limited environments. To tackle these issues, we present the Cloud Vision Transformer (CloudViT), a lightweight model that integrates CNNs with Transformers. The integration enables an effective balance between local and global feature extraction. Specifically, CloudViT comprises two innovative modules: Feature Extraction (E_Module) and Downsampling (D_Module). These modules significantly reduce the number of model parameters and the computational complexity while maintaining translation invariance and enhancing contextual comprehension. Overall, CloudViT has 0.93×10^6 parameters, more than ten times fewer than the SOTA (State-of-the-Art) model CloudNet. Comprehensive evaluations conducted on the HBMCD and SWIMCAT datasets showcase the outstanding performance of CloudViT. It achieves classification accuracies of 98.45% and 100%, respectively. Moreover, the efficiency and scalability of CloudViT make it an ideal candidate for deployment in mobile cloud observation systems, enabling real-time cloud image classification. The proposed hybrid architecture of CloudViT offers a promising approach for advancing ground-based cloud image classification. It holds significant potential for both optimizing performance and facilitating practical deployment scenarios.
Funding: Supported by the National Natural Science Foundation of China (52075027); the Fundamental Research Funds for the Central Universities (2020XJJD03).
Abstract: Unauthorized operations of unmanned aerial vehicles (UAVs), referred to as "black flights", pose a significant danger to public safety, and existing low-altitude object detection algorithms encounter difficulties in balancing detection precision and speed. Additionally, their accuracy is insufficient, particularly for small objects in complex environments. To solve these problems, we propose a lightweight feature-enhanced convolutional neural network able to perform high-precision detection of low-altitude flying objects in real time, providing guidance information to suppress black-flying UAVs. The proposed network consists of three modules. A lightweight and stable feature extraction module is used to reduce the computational load and stably extract more low-level features, an enhanced feature processing module significantly improves the feature extraction ability of the model, and an accurate detection module integrates low-level and advanced features to improve the multiscale detection accuracy in complex environments, particularly for small objects. The proposed method achieves a detection speed of 147 frames per second (FPS) and a mean average precision (mAP) of 90.97% on a dataset composed of flying objects, indicating its potential for low-altitude object detection. Furthermore, evaluation results based on Microsoft Common Objects in Context (MS COCO) indicate that the proposed method is also applicable to object detection in general.
Funding: Funded by the Science and Technology Development Program of Jilin Province (20190301024NY); the Precision Agriculture and Big Data Engineering Research Center of Jilin Province (2020C005).
Abstract: To address the difficulty of identifying apple diseases in the natural environment and the low application rate of deep learning recognition networks, a lightweight ResNet (LW-ResNet) model for apple disease recognition is proposed. Based on the deep residual network (ResNet18), a multi-scale feature extraction layer is constructed with group convolution to compress the model and improve the extraction of lesion features of different sizes. The identity mapping structure is improved to reduce information loss. The efficient channel attention module (ECANet) is introduced to suppress noise from complex backgrounds. The experimental results show that the average precision, recall, and F1-score of LW-ResNet on the test set are 97.80%, 97.92%, and 97.85%, respectively. The parameter memory is 2.32 MB, which is 94% less than that of ResNet18. Compared with the classic lightweight networks SqueezeNet and MobileNetV2, LW-ResNet has obvious advantages in recognition performance, speed, parameter memory requirement, and time complexity. The proposed model has the advantages of low computational cost, low storage cost, strong real-time performance, high identification accuracy, and strong practicability, and can meet the needs of real-time identification of apple leaf disease on resource-constrained devices.
Funding: Supported by a project jointly funded by the Beijing Municipal Education Commission and Municipal Natural Science Foundation under grant KZ202010005004.
Abstract: Accurately identifying defect patterns in wafer maps can help engineers find abnormal failure factors in production lines. During the wafer testing stage, deep learning methods are widely used in wafer defect detection due to their powerful feature extraction capabilities. However, most current wafer defect pattern classification models have high complexity and slow detection speed, which makes them difficult to apply in the actual wafer production process. In addition, data imbalance in the wafer dataset seriously affects the training results of the model. To reduce the complexity of the deep model without affecting the wafer feature expression, this paper adjusts the structure of the dense block in the PeleeNet network and proposes a lightweight network, WM-PeleeNet, based on the PeleeNet module. In addition, to reduce the impact of data imbalance on model training, this paper proposes a wafer data augmentation method based on a convolutional autoencoder that adds random Gaussian noise to the hidden layer. The proposed method has an average accuracy of 95.4% on the WM-811K wafer dataset with only 173.643 KB of parameters and 316.194 M FLOPs, and takes only 22.99 s to detect 1000 wafer pictures. Compared with the original PeleeNet network without optimization, the number of parameters and FLOPs are reduced by 92.68% and 58.85%, respectively. Data augmentation on the minority class wafer maps improves the average classification accuracy by 1.8% on the WM-811K dataset. At the same time, the recognition accuracy of minority classes such as the Scratch pattern and Donut pattern is significantly improved.
Funding: Supported by the Fundamental Research Funds for the Central Universities under Grant No. 2009JBM007; supported by the National Natural Science Foundation of China under Grants No. 60802016, 60833002 and 60972010.
Abstract: A Light-Weight Simple Network Management Protocol (LW-SNMP) for wireless sensor networks is proposed. It is a hierarchical network management system comprising a sink manager, cluster proxies, and node agents. Considering the resource limitations of the sensor nodes, we design entirely new management messages, data types, and a management information base. The management messages between the cluster proxy and node agents are delivered as normal data packets. The experimental results show that LW-SNMP can meet the management demands of resource-limited wireless sensor networks and performs better in stability, memory efficiency, and extensibility than the traditional Simple Network Management Protocol (SNMP).
Funding: Supported by the General Project of Natural Science Foundation of Hebei Province of China (H2019201378); the Foundation of the President of Hebei University (XZJJ201917); the Special Project for Cultivating Scientific and Technological Innovation Ability of University and Middle School Students of Hebei Province (2021H060306).
Abstract: The diagnosis of COVID-19 requires chest computed tomography (CT). High-resolution CT images can provide more diagnostic information to help doctors better diagnose the disease, so it is of clinical importance to study super-resolution (SR) algorithms that improve the resolution of CT images. However, most existing SR algorithms are developed on natural images and are not suitable for medical images, and most of them improve the reconstruction quality by increasing the network depth, which is not suitable for machines with limited resources. To alleviate these issues, we propose a residual feature attentional fusion network for lightweight chest CT image super-resolution (RFAFN). Specifically, we design a contextual feature extraction block (CFEB) that can extract CT image features more efficiently and accurately than ordinary residual blocks. In addition, we propose a feature-weighted cascading strategy (FWCS) based on attentional feature fusion blocks (AFFB) to utilize the high-frequency detail information extracted by CFEB as much as possible by selectively fusing adjacent-level feature information. Finally, we suggest a global hierarchical feature fusion strategy (GHFFS), which can utilize the hierarchical features more effectively than dense concatenation by progressively aggregating the feature information at various levels. Numerous experiments show that our method performs better than most state-of-the-art (SOTA) methods on the COVID-19 chest CT dataset. In detail, the peak signal-to-noise ratio (PSNR) is 0.11 dB and 0.47 dB higher on CTtest1 and CTtest2 at ×3 SR compared to the suboptimal method, while the number of parameters and multi-adds are reduced by 22K and 0.43G, respectively. Our method can better recover chest CT image quality with fewer computational resources and effectively assist in COVID-19 diagnosis.
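The PSNR gains quoted above are measured in decibels using the standard definition PSNR = 10 · log10(MAX² / MSE), where MSE is the mean squared error between the reference and reconstructed images. A minimal sketch of that definition follows; the 8-bit peak value of 255 and the nested-list image representation are assumptions for illustration, since the abstract does not state the evaluation protocol.

```python
import math


def psnr(reference, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2-D images.

    Images are nested lists of pixel intensities; max_value=255 assumes
    8-bit data, which is an illustrative choice.
    """
    ref = [p for row in reference for p in row]
    rec = [p for row in reconstructed for p in row]
    if len(ref) != len(rec) or not ref:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(ref, rec)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# A uniform error of 5 intensity levels gives MSE = 25.
value = psnr([[100, 100], [100, 100]], [[105, 95], [105, 95]])
```

Because the scale is logarithmic, a difference of a few tenths of a dB, as reported in the abstract, corresponds to a small relative change in MSE.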
Funding: Funding support from the Science and Technology Commission of Shanghai Municipality (Grant No. 21DZ1100500); the Shanghai Frontiers Science Center Program (2021-2025 No. 20); the Zhangjiang National Innovation Demonstration Zone (Grant No. ZJ2019ZD-005); supported by a fellowship from the China Postdoctoral Science Foundation (2020M671169); the International Postdoctoral Exchange Program from the Administrative Committee of Post-Doctoral Researchers of China ([2020]33).
Abstract: Significant progress has been made in computational imaging (CI), in which deep convolutional neural networks (CNNs) have demonstrated that sparse speckle patterns can be reconstructed. However, due to the limited "local" kernel size of the convolutional operator, the performance of CNNs is limited for spatially dense patterns such as generic face images. Here, we propose a "non-local" model, termed the Speckle-Transformer (SpT) UNet, for speckle feature extraction of generic face images. It is worth noting that the lightweight SpT UNet shows high efficiency and strong comparative performance, with the Pearson Correlation Coefficient (PCC) and structural similarity measure (SSIM) exceeding 0.989 and 0.950, respectively.
Funding: This work was funded by the foundation of Liaoning Educational Committee under Grant No. 2019LNJC03.
Abstract: As the use of facial attributes continues to expand, research into facial age estimation is also developing. Because face images are easily affected by factors including illumination and occlusion, face age estimation is a challenging process. In view of the complexity of the environment and the limitations of device computing ability, this paper proposes a face age estimation algorithm based on a lightweight convolutional neural network. Building on the Soft Stagewise Regression Network (SSR-Net) and facial images, this paper employs the Center Symmetric Local Binary Pattern (CSLBP) method to obtain a feature image and then combines the face image and the feature image as network input data. Adding feature images to the convolutional neural network can improve the accuracy as well as increase the robustness of the network model. The experimental results on the IMDB-WIKI and MORPH 2 datasets show that the lightweight convolutional neural network method proposed in this paper reduces model complexity and increases the accuracy of face age estimation.
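The CSLBP feature image mentioned above is built by comparing pixels that lie opposite each other around a center pixel, giving a 4-bit code per location in a 3×3 neighborhood. The sketch below follows the commonly described formulation; the neighbor ordering, the `threshold` default, and the function names are assumptions for illustration and may differ from what the paper implements.

```python
def cs_lbp_code(patch, threshold=0.0):
    """4-bit center-symmetric LBP code (0..15) for a 3x3 patch.

    Neighbors are read clockwise from the top-left corner; bit i is set
    when neighbor i exceeds its opposite neighbor i+4 by more than threshold.
    """
    neighbors = [
        patch[0][0], patch[0][1], patch[0][2], patch[1][2],
        patch[2][2], patch[2][1], patch[2][0], patch[1][0],
    ]
    code = 0
    for i in range(4):
        if neighbors[i] - neighbors[i + 4] > threshold:
            code |= 1 << i
    return code


def cs_lbp_image(image, threshold=0.0):
    """CS-LBP feature image for the interior pixels of a 2-D grayscale image."""
    height, width = len(image), len(image[0])
    return [
        [
            cs_lbp_code([row[x - 1:x + 2] for row in image[y - 1:y + 2]], threshold)
            for x in range(1, width - 1)
        ]
        for y in range(1, height - 1)
    ]


# A patch that is bright on the top/right and dark on the bottom/left.
edge_code = cs_lbp_code([[9, 9, 9], [0, 5, 9], [0, 0, 0]])
```

The center pixel itself is not used, which is what distinguishes the center-symmetric variant from the ordinary 8-bit LBP and halves the code length.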
Funding: Supported by National key research and development program sub-topics [2018YFF0213606-03 (Mu Y., Hu T.L., Gong H., Li S.J. and Sun Y.H.) http://www.most.gov.cn]; Jilin Province Science and Technology Development Plan focuses on research and development projects [20200402006NC (Mu Y., Hu T.L., Gong H. and Li S.J.) http://kjt.jl.gov.cn]; Science and technology support project for key industries in southern Xinjiang [2018DB001 (Gong H. and Li S.J.) http://kjj.xjbt.gov.cn]; Key technology R&D project of Changchun Science and Technology Bureau of Jilin Province [21ZGN29 (Mu Y., Bao H.P., Wang X.B.) http://kjj.changchun.gov.cn].
Abstract: In the field of agricultural information, the identification and prediction of rice leaf disease have always been a focus of research, and deep learning (DL) technology is currently a hot research topic in the field of pattern recognition. The research and development of high-efficiency, high-quality, and low-cost automatic identification methods for rice diseases that can replace humans is an important means of dealing with the current situation from a technical perspective. This paper mainly focuses on the problem of the huge number of parameters in Convolutional Neural Network (CNN) models and proposes a recognition model that combines a multi-scale convolution module with a neural network model based on Visual Geometry Group (VGG). The accuracy and loss on the training set and the test set are used to evaluate the performance of the model. The test accuracy of this model is 97.1%, an increase of 5.87% over VGG. Furthermore, the memory requirement is 26.1M, only 1.6% of that of VGG. Experimental results show that this model performs better in terms of accuracy, recognition speed, and memory size.
Abstract: With the continuous development of medical informatics and digital diagnosis, the classification of tuberculosis (TB) cases from computed tomography (CT) images of the lung based on deep learning is an important guiding aid in clinical diagnosis and treatment. Due to its potential application in medical image classification, this task has received extensive research attention. Existing neural network techniques still face challenges in extracting global contextual information from images and in network complexity when performing image classification. To address these issues, this paper proposes a lightweight medical image classification network based on a combination of Transformer and convolutional neural network (CNN) for the classification of TB cases from lung CT. The method mainly consists of a fusion of the CNN module and the Transformer module, exploiting the advantages of both to accomplish a more accurate classification task. On the one hand, the CNN branch supplements the Transformer branch with basic local feature information at the low level; on the other hand, in the middle and high levels of the model, the CNN branch can also provide the Transformer architecture with different local and global feature information to enhance the ability of the model to obtain feature information and improve the accuracy of image classification. A shortcut is used in each module of the network to solve the problem of poor model results due to gradient divergence and to optimize the effectiveness of TB classification. The proposed lightweight model addresses the problem of long training time in the TB classification of lung CT and improves the speed of classification. The proposed method was validated on a CT image dataset provided by the First Hospital of Lanzhou University. The experimental results show that the proposed lightweight classification network for TB based on CT medical images of lungs can fully extract the feature information of the input images and obtain high-accuracy classification results.
Abstract: Convolutional neural networks (CNNs) are widely used in image classification tasks, but their increasing model size and computation make them challenging to implement on embedded systems with constrained hardware resources. To address this issue, the MobileNetV1 network was developed, which employs depthwise convolution to reduce network complexity. MobileNetV1 employs a stride of 2 in several convolutional layers to decrease the spatial resolution of feature maps, thereby lowering computational costs. However, this stride setting can lead to a loss of spatial information, particularly affecting the detection and representation of smaller objects or finer details in images. To maintain the trade-off between complexity and model performance, a lightweight convolutional neural network with hierarchical multi-scale feature fusion based on the MobileNetV1 network is proposed. The network consists of two main subnetworks. The first subnetwork uses a depthwise dilated separable convolution (DDSC) layer to learn imaging features with fewer parameters, which results in a lightweight and computationally inexpensive network. Furthermore, depthwise dilated convolution in the DDSC layer effectively expands the field of view of filters, allowing them to incorporate a larger context. The second subnetwork is a hierarchical multi-scale feature fusion (HMFF) module that uses a parallel multi-resolution branch architecture to process the input feature map in order to extract the multi-scale feature information of the input image. Experimental results on the CIFAR-10, Malaria, and KvasirV1 datasets demonstrate that the proposed method is efficient, reducing the network parameters and computational cost by 65.02% and 39.78%, respectively, while maintaining the network performance compared to the MobileNetV1 baseline.
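The two ideas behind the DDSC layer described above can be checked with simple arithmetic: a depthwise separable convolution replaces one k×k×C_in×C_out kernel with a k×k depthwise kernel per input channel plus a 1×1 pointwise convolution, and dilation widens the area a kernel covers without adding weights. The sketch below computes both quantities; the layer sizes in the example are hypothetical and bias terms are ignored, so the numbers illustrate the mechanism rather than reproduce the paper's 65.02% figure.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard kernel x kernel convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depthwise (kernel x kernel per input channel) convolution
    followed by a 1x1 pointwise convolution (bias ignored)."""
    return kernel * kernel * c_in + c_in * c_out


def effective_kernel_size(kernel, dilation):
    """Side length of the area covered by a dilated kernel."""
    return kernel + (kernel - 1) * (dilation - 1)


# Hypothetical layer: 3x3 kernel, 64 input channels, 128 output channels.
standard = standard_conv_params(3, 64, 128)          # 73728 weights
separable = depthwise_separable_params(3, 64, 128)   # 8768 weights
reduction = 1 - separable / standard                 # fraction of weights saved

# Dilation 2 lets the same 3x3 kernel cover a 5x5 area.
covered = effective_kernel_size(3, 2)
```

Dilation leaves the parameter count unchanged, which is why combining it with depthwise separable convolution enlarges the field of view at no extra weight cost.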
Funding: Supported by National Natural Science Foundation of China (Grant No. 52075467), Jiangsu Provincial Natural Science Foundation of China (Grant No. BK20220649), Natural Science Foundation of the Jiangsu Higher Education Institutions of China (Grant No. 23KJB460010), and Jiangsu Provincial Key R&D Project (Grant No. BE2022062).
Abstract: Deployable mechanisms with good deployment performance, strong expandability, and low weight have attracted much attention because of their potential in aerospace. A basic deployable pyramid unit with good deployability and expandability is proposed for constructing large deployable mechanisms. First, the folding principle and expansion method of the basic unit are proposed. The basic unit is synthesized using a configuration synthesis method that adds constraint chains to a spatial closed-loop mechanism. Then, the degree of freedom of the basic unit is analyzed using screw theory and the link dismantling method. Next, three-dimensional models of the pyramid unit, expansion unit, and array unit are established, and a folding motion simulation is carried out. The performance characteristics of the three types of mechanisms are described in detail in terms of the number of components, weight reduction rate, and deployable rate. Finally, prototypes of the pyramid unit, combination unit, and expansion unit are developed to further verify the correctness of the pyramid-based configuration synthesis. The proposed deployable mechanism provides a reference for the design and application of antennas with a large aperture, high deployable rate, and low weight, and it has good application prospects in the aerospace field.
Funding: This work was supported by the Science and Technology Cooperation Special Project of Shijiazhuang (SJZZXA23005).
Abstract: In minimally invasive surgery, endoscopes or laparoscopes equipped with miniature cameras and tools enter the human body through small incisions or natural cavities for therapeutic purposes. However, in clinical operating environments, endoscopic images often suffer from low texture, uneven illumination, and non-rigid structures, which hinder feature observation and extraction. Missing feature points in endoscopic images can severely impair surgical navigation or clinical diagnosis, leading to treatment and postoperative recovery issues for patients. To address these challenges, this paper introduces, for the first time, a Cross-Channel Multi-Modal Adaptive Spatial Feature Fusion (ASFF) module based on the lightweight architecture of EfficientViT. Additionally, a novel lightweight feature extraction and matching network based on an attention mechanism is proposed. This network dynamically adjusts attention weights for cross-modal information from grayscale images and optical flow images through a dual-branch Siamese network. It extracts static and dynamic information features ranging from low-level to high-level and from local to global, ensuring robust feature extraction across different widths, noise levels, and blur scenarios. Global and local matching are performed through a multi-level cascaded attention mechanism, with cross-channel attention introduced to extract low-level and high-level features simultaneously. Extensive ablation experiments and comparative studies are conducted on the HyperKvasir, EAD, M2caiSeg, CVC-ClinicDB, and UCL synthetic datasets. Experimental results demonstrate that the proposed network improves upon the baseline EfficientViT-B3 model by 75.4% in accuracy (Acc), while also enhancing runtime performance and storage efficiency. Compared with the complex DenseDescriptor feature extraction network, the difference in Acc is less than 7.22%, and the IoU results on specific datasets outperform those of complex dense models. Furthermore, this method increases the F1 score by 33.2% and accelerates runtime by 70.2%. Notably, CMMCAN is faster than the comparative lightweight models, and its feature extraction and matching performance is comparable to that of existing complex models at higher speed and better cost-effectiveness.
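The abstract does not give the fusion formula, so the following is only a minimal sketch of the general idea behind adaptive spatial feature fusion: two same-sized feature maps are blended with per-location weights that are normalized by a softmax. The function names, the use of plain nested lists, and the assumption that the weight logits are supplied from elsewhere (in a real network they would be learned) are all hypothetical and not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a short list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_adaptive(gray_feat, flow_feat, gray_logits, flow_logits):
    """Blend two same-sized 2D feature maps with per-location softmax weights.

    gray_feat / flow_feat: features from the grayscale and optical-flow branches.
    gray_logits / flow_logits: unnormalized weights for each location
    (assumed given here; a trained model would predict them).
    """
    fused = []
    for g_row, f_row, gl_row, fl_row in zip(gray_feat, flow_feat,
                                            gray_logits, flow_logits):
        row = []
        for g, f, gl, fl in zip(g_row, f_row, gl_row, fl_row):
            w_gray, w_flow = softmax([gl, fl])  # weights sum to 1 per location
            row.append(w_gray * g + w_flow * f)
        fused.append(row)
    return fused


if __name__ == "__main__":
    gray = [[1.0, 2.0]]
    flow = [[3.0, 4.0]]
    # Equal logits weight both branches equally at every location.
    print(fuse_adaptive(gray, flow, [[0.0, 0.0]], [[0.0, 0.0]]))
```

Because the weights are computed per location, the fused map can lean on the grayscale branch in well-textured regions and on the optical-flow branch where motion cues are more informative, which is one plausible reading of "dynamically adjusts attention weights for cross-modal information".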