As batteries become increasingly essential for energy storage technologies, battery prognosis and diagnosis remain central to ensuring reliable operation and effective management, as well as to aiding the in-depth investigation of degradation mechanisms. However, dynamic operating conditions, cell-to-cell inconsistencies, and the limited availability of labeled data pose significant challenges to accurate and robust prognosis and diagnosis. Herein, we introduce a time-series-decomposition-based ensembled lightweight learning model (TELL-Me), which employs a synergistic dual-module framework to facilitate accurate and reliable forecasting. The feature module formulates features with physical implications and sheds light on battery aging mechanisms, while the gradient module monitors capacity degradation rates and captures the aging trend. TELL-Me achieves high accuracy in end-of-life prediction using minimal historical data from a single battery, without requiring an offline training dataset, and demonstrates impressive generality and robustness across various operating conditions and battery types. Additionally, by correlating feature contributions with degradation mechanisms across different datasets, TELL-Me is endowed with a diagnostic ability that not only enhances prediction reliability but also provides critical insights into the design and optimization of next-generation batteries.
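As a rough illustration of the gradient-module idea, the sketch below extrapolates a cell's end of life from a short capacity history by smoothing the series and estimating the recent fade rate. The moving-average smoothing, window size, and 80% end-of-life threshold are illustrative assumptions, not the actual TELL-Me pipeline.

```python
import numpy as np

def estimate_eol_cycle(capacity, nominal_capacity, eol_ratio=0.8, window=30):
    """Extrapolate the end-of-life (EOL) cycle from a short capacity history.

    capacity: 1-D array of measured capacities, one value per cycle.
    nominal_capacity: rated capacity; EOL is reached at eol_ratio * nominal.
    window: number of recent cycles used to estimate the fade rate.
    """
    capacity = np.asarray(capacity, dtype=float)
    # Crude trend extraction: a moving average stands in for the
    # time-series decomposition used by the paper's feature module.
    k = min(window, len(capacity))
    trend = np.convolve(capacity, np.ones(k) / k, mode="valid")

    # "Gradient module" analogue: average fade per cycle over the recent trend.
    recent = trend[-window:] if len(trend) >= window else trend
    fade_per_cycle = np.polyfit(np.arange(len(recent)), recent, 1)[0]
    if fade_per_cycle >= 0:
        return None  # no measurable degradation yet

    eol_capacity = eol_ratio * nominal_capacity
    cycles_left = (eol_capacity - trend[-1]) / fade_per_cycle
    current_cycle = len(capacity) - 1
    return current_cycle + int(max(cycles_left, 0.0))

# Example: a synthetic, linearly fading cell.
caps = 2.0 - 0.001 * np.arange(200) + 0.002 * np.random.randn(200)
print(estimate_eol_cycle(caps, nominal_capacity=2.0))
```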
Real-time detection remains difficult in road surface damage detection. This paper proposes an improved lightweight model based on you only look once version 5 (YOLOv5). First, this paper fully utilizes the convolutional neural network (CNN) + ghost bottleneck (G_bneck) architecture to reduce redundant feature maps. Next, we upgrade the original upsampling algorithm to content-aware reassembly of features (CARAFE), which increases the receptive field. Finally, we replace the spatial pyramid pooling fast (SPPF) module with the basic receptive field block (Basic RFB) pooling module and add dilated convolution. Comparative experiments show that the number of parameters and the model size of the improved algorithm are reduced by nearly half compared with YOLOv5s, the frame rate per second (FPS) is increased by 3.25 times, and the mean average precision (mAP@0.5:0.95) is 8%-17% higher than that of other lightweight algorithms.
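The ghost bottleneck relies on generating part of the feature maps with cheap depthwise operations. A minimal PyTorch sketch of a Ghost-style convolution is given below; the ratio and kernel sizes follow common GhostNet practice and are assumptions, not the exact configuration of this model.

```python
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    """Ghost-style convolution: a few 'intrinsic' feature maps from a normal
    convolution, plus cheap depthwise 'ghost' maps, concatenated together."""

    def __init__(self, in_ch, out_ch, kernel_size=1, ratio=2, dw_size=3):
        super().__init__()
        init_ch = out_ch // ratio            # intrinsic channels
        ghost_ch = out_ch - init_ch          # cheap ghost channels
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, init_ch, kernel_size, padding=kernel_size // 2, bias=False),
            nn.BatchNorm2d(init_ch),
            nn.ReLU(inplace=True),
        )
        self.cheap = nn.Sequential(
            nn.Conv2d(init_ch, ghost_ch, dw_size, padding=dw_size // 2,
                      groups=init_ch, bias=False),  # depthwise: one cheap map per intrinsic map
            nn.BatchNorm2d(ghost_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)

x = torch.randn(1, 32, 64, 64)
print(GhostConv(32, 64)(x).shape)  # torch.Size([1, 64, 64, 64])
```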
Lightweight deep learning models are increasingly required in resource-constrained environments such as mobile devices and the Internet of Medical Things (IoMT). Multi-head convolution with channel attention can facilitate learning activations relevant to different kernel sizes within a multi-head convolutional layer. Therefore, this study investigates the capability of novel lightweight models incorporating residual multi-head convolution with channel attention (ResMHCNN) blocks to classify medical images. We introduced three novel lightweight deep learning models (BT-Net, LCC-Net, and BC-Net) that utilize the ResMHCNN block as their backbone. These models were cross-validated and tested on three publicly available medical image datasets: a brain tumor dataset from Figshare consisting of T1-weighted magnetic resonance imaging slices of meningioma, glioma, and pituitary tumors; the LC25000 dataset, which includes microscopic images of lung and colon cancers; and the BreaKHis dataset, containing benign and malignant breast microscopic images. The lightweight models achieved accuracies of 96.9% for 3-class brain tumor classification using BT-Net and 99.7% for 5-class lung and colon cancer classification using LCC-Net. For 2-class breast cancer classification, BC-Net achieved an accuracy of 96.7%. The parameter counts of the proposed lightweight models (LCC-Net, BC-Net, and BT-Net) are 0.528, 0.226, and 1.154 million, respectively. The presented lightweight models, featuring ResMHCNN blocks, may be effectively employed for accurate medical image classification. In the future, these models might be tested for viability in resource-constrained systems such as mobile devices and IoMT platforms.
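A hedged sketch of what a residual multi-head convolution block with channel attention might look like is given below; the kernel sizes, SE-style attention, and reduction ratio are assumptions rather than the published ResMHCNN definition.

```python
import torch
import torch.nn as nn

class ResMHCNNBlock(nn.Module):
    """Sketch of a residual multi-head convolution block with channel attention.
    Parallel 'heads' with different kernel sizes are concatenated, reweighted by
    squeeze-and-excitation style channel attention, and added back to the input."""

    def __init__(self, channels, kernel_sizes=(1, 3, 5), reduction=4):
        super().__init__()
        head_ch = channels // len(kernel_sizes)
        self.heads = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, head_ch, k, padding=k // 2, bias=False),
                nn.BatchNorm2d(head_ch),
                nn.ReLU(inplace=True),
            )
            for k in kernel_sizes
        ])
        merged = head_ch * len(kernel_sizes)
        self.project = nn.Conv2d(merged, channels, 1, bias=False)
        self.attn = nn.Sequential(           # channel attention (SE-style)
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        y = torch.cat([head(x) for head in self.heads], dim=1)
        y = self.project(y)
        y = y * self.attn(y)                 # reweight channels
        return torch.relu(y + x)             # residual connection

print(ResMHCNNBlock(48)(torch.randn(2, 48, 32, 32)).shape)
```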
Mineral identification is foundational to geological survey research, mineral resource exploration, and mining engineering. Considering the diversity of mineral types and the challenge of achieving high recognition accuracy for similar features, this study introduces a mineral detection method based on YOLOv8-SBI. This work enhances feature extraction by integrating spatial pyramid pooling-fast (SPPF) with the simplified self-attention module (SimAM), significantly improving the precision of mineral feature detection. In the feature fusion network, a weighted bidirectional feature pyramid network is employed for advanced cross-channel feature integration, effectively reducing feature redundancy. Additionally, Inner-Intersection over Union (Inner-IoU) is used as the loss function to improve the average localization quality of anchor boxes. Experimental results show that the YOLOv8-SBI model achieves an accuracy of 67.9%, a recall of 74.3%, a mAP@0.5 of 75.8%, and a mAP@0.5:0.95 of 56.7%, with a real-time detection speed of 244.2 frames per second. Compared with YOLOv8, YOLOv8-SBI achieves a 15.4% increase in accuracy, a 28.5% increase in recall, and increases of 28.1% and 20.9% in mAP@0.5 and mAP@0.5:0.95, respectively. Furthermore, relative to other models such as YOLOv3, YOLOv5, YOLOv6, YOLOv8, YOLOv9, and YOLOv10, YOLOv8-SBI has a smaller parameter size of only 3.01×10^6. This highlights an optimal balance between detection accuracy and speed, thereby offering robust technical support for intelligent mineral classification.
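SimAM is a parameter-free attention mechanism, which is one reason it suits lightweight detectors. The sketch below follows the commonly published SimAM formulation; the regularization constant is the usual default and an assumption here.

```python
import torch
import torch.nn as nn

class SimAM(nn.Module):
    """Parameter-free SimAM-style attention: each activation is weighted by an
    energy term derived from how much it deviates from its channel mean."""

    def __init__(self, e_lambda=1e-4):
        super().__init__()
        self.e_lambda = e_lambda

    def forward(self, x):
        b, c, h, w = x.shape
        n = h * w - 1
        d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)   # squared deviation
        v = d.sum(dim=(2, 3), keepdim=True) / n             # per-channel variance
        e_inv = d / (4 * (v + self.e_lambda)) + 0.5         # inverse energy
        return x * torch.sigmoid(e_inv)

print(SimAM()(torch.randn(1, 16, 20, 20)).shape)
```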
Efficient three-dimensional (3D) building reconstruction from drone imagery often faces data acquisition, storage, and computational challenges because of its reliance on dense point clouds. In this study, we introduced a novel method for efficient and lightweight 3D building reconstruction from drone imagery using line clouds and sparse point clouds. Our approach eliminates the need to generate dense point clouds and thus significantly reduces the computational burden by reconstructing 3D models directly from sparse data. We addressed the limitations of line clouds for plane detection and reconstruction with a new algorithm that projects 3D line clouds onto a 2D plane, clusters the projections to identify potential planes, and refines them using sparse point clouds to ensure accurate and efficient model reconstruction. Extensive qualitative and quantitative experiments demonstrated the effectiveness of our method and its superiority over existing techniques in terms of simplicity and efficiency.
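The core of the plane-detection step is projecting 3D line segments onto a 2D plane and clustering the projections. The toy sketch below illustrates that idea with midpoint-plus-orientation features and DBSCAN; it is an illustrative simplification, not the paper's algorithm.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_vertical_planes(lines_3d, eps=0.5, min_samples=5):
    """Project 3-D line segments onto the ground plane and cluster the
    projections to hypothesize building walls.

    lines_3d: (N, 6) array of segments [x1, y1, z1, x2, y2, z2].
    Returns a cluster label per segment (-1 = noise).
    """
    lines_3d = np.asarray(lines_3d, dtype=float)
    p1, p2 = lines_3d[:, :2], lines_3d[:, 3:5]        # drop z: project to XY plane
    midpoints = 0.5 * (p1 + p2)
    d = p2 - p1
    angles = np.arctan2(d[:, 1], d[:, 0]) % np.pi     # undirected line orientation
    # Encode orientation on a circle so 0 and pi count as the same direction.
    features = np.column_stack([midpoints,
                                np.cos(2 * angles), np.sin(2 * angles)])
    return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(features)

segments = np.random.rand(200, 6) * 10
print(np.unique(cluster_vertical_planes(segments)))
```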
The task of student action recognition in the classroom is to precisely capture and analyze the actions of students in classroom videos, providing a foundation for intelligent and accurate teaching. However, the complex nature of the classroom environment adds challenges and difficulties to student action recognition. In this article, considering that students are prone to occlusion and that classroom computing resources are restricted in real classroom scenarios, a lightweight multi-modal fusion action recognition approach is put forward. The proposed method enhances the accuracy of student action recognition while reducing the number of model parameters and the computation amount, thereby achieving more efficient and accurate recognition. In the feature extraction stage, the method fuses the keypoint heatmap with the RGB (Red-Green-Blue color model) image. To fully utilize the unique information of different modalities for feature complementarity, a Feature Fusion Module (FFE) is introduced. The FFE encodes and fuses the unique features of the two modalities during feature extraction. This fusion strategy not only achieves fusion and complementarity between modalities but also improves overall model performance. Furthermore, to reduce the computational load and parameter scale of the model, we use keypoint information to crop the RGB images, and the first three stages of the lightweight feature extraction network X3D are used to extract dual-branch features. These measures significantly reduce the computational load and parameter scale: the model has 1.40 million parameters and a computation amount of 5.04 billion floating-point operations (GFLOPs), achieving an efficient lightweight design. On the Student Classroom Action Dataset (SCAD), the accuracy of the model is 88.36%. On NTU 60 (the Nanyang Technological University RGB+D dataset with 60 categories), the accuracies on X-Sub (the people in the training set differ from those in the test set) and X-View (the viewpoints of the training and test sets differ) are 95.76% and 98.82%, respectively. On NTU 120 (the Nanyang Technological University RGB+D dataset with 120 categories), the accuracies on X-Sub and X-Set (the viewpoints of the training and test sets differ) are 91.97% and 93.45%, respectively. The model thus achieves a balance among accuracy, computation amount, and the number of parameters.
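The sketch below illustrates the general idea of fusing a keypoint-heatmap modality with RGB: render Gaussian heatmaps and concatenate them with the image before a shared convolution. This naive early fusion stands in for the paper's FFE, whose exact design is not reproduced here; channel counts and sigma are assumptions.

```python
import torch
import torch.nn as nn

def keypoint_heatmaps(keypoints, height, width, sigma=2.0):
    """Render one Gaussian heatmap per keypoint, turning pose keypoints into an
    image-like modality that can be fused with RGB.

    keypoints: (K, 2) tensor of (x, y) pixel coordinates.
    Returns a (K, height, width) tensor.
    """
    ys = torch.arange(height).view(1, height, 1).float()
    xs = torch.arange(width).view(1, 1, width).float()
    kx = keypoints[:, 0].view(-1, 1, 1)
    ky = keypoints[:, 1].view(-1, 1, 1)
    return torch.exp(-((xs - kx) ** 2 + (ys - ky) ** 2) / (2 * sigma ** 2))

# Minimal early-fusion stem: concatenate RGB and heatmap channels, then apply
# one shared convolution.
rgb = torch.randn(1, 3, 112, 112)
kpts = torch.tensor([[30.0, 40.0], [70.0, 80.0], [55.0, 20.0]])
heat = keypoint_heatmaps(kpts, 112, 112).unsqueeze(0)        # (1, 3, 112, 112)
stem = nn.Conv2d(3 + heat.shape[1], 32, kernel_size=3, padding=1)
fused = stem(torch.cat([rgb, heat], dim=1))
print(fused.shape)  # torch.Size([1, 32, 112, 112])
```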
There is no unified planning standard for unstructured roads, and the morphological structures of these roads are complex and varied. It is important to maintain a balance between accuracy and speed for unstructured road extraction models. Unstructured road extraction algorithms based on deep learning have problems such as high model complexity, high computational cost, and the inability to adapt to current edge computing devices. Therefore, it is best to use lightweight network models. Considering the need for lightweight models and the characteristics of unstructured roads with different pattern shapes, such as blocks and strips, a TMB (Triple Multi-Block) feature extraction module is proposed, and the overall structure of the TMBNet network is described. The TMB module was compared with SS-nbt, Non-bottleneck-1D, and other modules via experiments. The feasibility and effectiveness of the TMB module design were proven through experiments and visualizations. The comparison experiment using multiple convolution kernel categories proved that the TMB module can improve the segmentation accuracy of the network. The comparison with different semantic segmentation networks demonstrates that the TMBNet network has advantages in terms of unstructured road extraction.
To avoid colliding with trees during operation, a lawn mower robot must detect them. Existing tree detection methods suffer from low detection accuracy (missed detections) and the lack of a lightweight model. In this study, a tree dataset was constructed on the basis of a real lawn environment. Based on the theory of channel-incremental depthwise convolution and residual suppression, the Embedded-A module is proposed, which expands the depth of the feature map twice to form a residual structure and improve the lightweight degree of the model. Based on residual fusion theory, the Embedded-B module is proposed, which improves the accuracy of feature-map downsampling through depthwise convolution and pooling fusion. The Embedded YOLO object detection network is formed by stacking the embedded modules and fusing feature maps of different resolutions. Experimental results on the test set show that the Embedded YOLO tree detection algorithm achieves average precision values of 84.17% and 69.91% for trunk and spherical trees, respectively, and a mean average precision of 77.04%. The number of convolution parameters is 1.78×10^6, and the calculation amount is 3.85 billion floating-point operations. The weight file is 7.11 MB, and the detection speed reaches 179 frames/s. This study provides a theoretical basis for the lightweight application of deep-learning-based object detection algorithms on lawn mower robots.
Significant advancements have been achieved in the field of Single Image Super-Resolution (SISR) through the utilization of Convolutional Neural Networks (CNNs) to attain state-of-the-art performance. Recent efforts have explored the incorporation of Transformers to augment network performance in SISR. However, the high computational cost of Transformers makes them less suitable for deployment on lightweight devices. Moreover, the majority of enhancements for CNNs rely predominantly on small spatial convolutions, thereby neglecting the potential advantages of large kernel convolution. In this paper, the authors propose a Multi-Perception Large Kernel convNet (MPLKN) which delves into the exploration of large kernel convolution. Specifically, the authors have architected a Multi-Perception Large Kernel (MPLK) module aimed at extracting multi-scale features and employ a stepwise feature fusion strategy to seamlessly integrate these features. In addition, to enhance the network's capacity for nonlinear spatial information processing, the authors have designed a Spatial-Channel Gated Feed-forward Network (SCGFN) that is capable of adapting to feature interactions across both spatial and channel dimensions. Experimental results demonstrate that MPLKN outperforms other lightweight image super-resolution models while maintaining a minimal number of parameters and FLOPs.
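Large-kernel depthwise convolutions are the usual way to obtain large receptive fields cheaply. The sketch below shows a multi-scale variant with parallel depthwise branches and a 1x1 fusion; the kernel sizes and fusion details are assumptions, not the exact MPLK module.

```python
import torch
import torch.nn as nn

class MultiScaleLargeKernel(nn.Module):
    """Sketch of a multi-perception large-kernel idea: parallel depthwise
    convolutions with different (large) kernel sizes capture multi-scale
    context cheaply, then a 1x1 convolution fuses them."""

    def __init__(self, channels, kernel_sizes=(3, 7, 11)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, k, padding=k // 2,
                      groups=channels, bias=False)            # depthwise keeps cost low
            for k in kernel_sizes
        ])
        self.fuse = nn.Conv2d(channels * len(kernel_sizes), channels, 1)

    def forward(self, x):
        feats = [branch(x) for branch in self.branches]
        return x + self.fuse(torch.cat(feats, dim=1))          # residual fusion

print(MultiScaleLargeKernel(32)(torch.randn(1, 32, 48, 48)).shape)
```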
Vision-based relative pose estimation plays a pivotal role in various space missions. Deep learning enhances monocular spacecraft pose estimation, but high computational demands necessitate model simplification for onboard systems. In this paper, we aim to achieve an optimal balance between accuracy and computational efficiency. We present a Perspective-n-Point (PnP) based method for spacecraft pose estimation, leveraging lightweight neural networks to localize semantic keypoints and reduce computational load. Since the accuracy of keypoint localization is closely related to the heatmap resolution, we devise an efficient upsampling module to increase the resolution of heatmaps with minimal overhead. Furthermore, the heatmaps predicted by lightweight models tend to show high-level noise. To tackle this issue, we propose a weighting strategy based on the statistical characteristics of the predicted semantic keypoints and substantially improve the pose estimation accuracy. The experiments carried out on the SPEED dataset underscore the prospects of our method for engineering applications. We dramatically reduce the model parameters to 0.7 M, merely 2.5% of that required by the top-performing method, and achieve lower pose estimation error and better real-time performance.
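A common way to implement heatmap-based PnP pose estimation is to take each heatmap's peak as a keypoint and its peak value as a confidence. The sketch below gates low-confidence keypoints before calling OpenCV's solvePnP; this thresholding is a simple stand-in for the statistical weighting strategy described above, and the threshold value is an assumption.

```python
import numpy as np
import cv2

def keypoints_from_heatmaps(heatmaps):
    """Extract one (x, y) location and a confidence per heatmap channel.
    heatmaps: (K, H, W) array. The peak value is used as the confidence."""
    K, H, W = heatmaps.shape
    flat = heatmaps.reshape(K, -1)
    idx = flat.argmax(axis=1)
    conf = flat.max(axis=1)
    xy = np.stack([idx % W, idx // W], axis=1).astype(np.float64)
    return xy, conf

def pose_from_heatmaps(heatmaps, model_points, camera_matrix, conf_thresh=0.3):
    """Confidence-gated PnP: keep only keypoints with a strong heatmap peak,
    then solve for the pose.

    model_points: (K, 3) 3-D keypoints of the spacecraft model, in the same
    order as the heatmap channels."""
    image_points, conf = keypoints_from_heatmaps(heatmaps)
    keep = conf > conf_thresh
    if keep.sum() < 6:                       # PnP needs enough correspondences
        return None
    ok, rvec, tvec = cv2.solvePnP(
        model_points[keep].astype(np.float64),
        image_points[keep].reshape(-1, 1, 2),
        camera_matrix.astype(np.float64),
        distCoeffs=None,
    )
    return (rvec, tvec) if ok else None
```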
To address the challenges of low accuracy and insufficient real-time performance in dynamic object detection for UAV surveillance, this paper proposes a novel tracking framework that integrates a lightweight improved YOLOv5s model with adaptive motion compensation. A UAV-view dynamic feature enhancement strategy is innovatively introduced, and a lightweight detection network combining attention mechanisms and multi-scale fusion is constructed. The robustness of tracking under motion blur scenarios is also optimized. Experimental results demonstrate that the proposed method achieves a mAP@0.5 of 68.2% on the VisDrone dataset and reaches an inference speed of 32 FPS on the NVIDIA Jetson TX2 platform. This significantly improves the balance between accuracy and efficiency in complex scenes, offering reliable technical support for real-time applications such as emergency response.
Underwater imaging is frequently influenced by factors such as illumination, scattering, and refraction, which can result in low image contrast and blurriness. Moreover, the presence of numerous small, overlapping targets reduces detection accuracy. To address these challenges, green channel images are first preprocessed to rectify color bias while improving contrast and clarity. Second, the YOLO-DBS network, which employs deformable convolution, is proposed to enhance feature learning from blurry underwater images. The ECA attention mechanism is also introduced to strengthen feature focus. Moreover, a bidirectional feature pyramid network is utilized for efficient multilayer feature fusion while removing nodes that contribute minimally to detection performance. In addition, the SIoU loss function, which considers factors such as angular error and distance deviation, is incorporated into the network. Validation on the RUOD dataset demonstrates that YOLO-DBS achieves approximately a 3.1% improvement in mAP@0.5 over YOLOv8n and surpasses YOLOv9-tiny by 1.3%. YOLO-DBS also reduces the parameter count by 32% relative to YOLOv8n, demonstrating superior performance for real-time detection on underwater observation platforms.
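ECA is a very light form of channel attention: global average pooling followed by a 1-D convolution across channels, with no dimensionality reduction. A sketch is given below; kernel_size=3 is a typical default and an assumption here.

```python
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient Channel Attention sketch: global average pooling, a 1-D
    convolution over the channel axis, and a sigmoid gate."""

    def __init__(self, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size,
                              padding=kernel_size // 2, bias=False)

    def forward(self, x):
        b, c, _, _ = x.shape
        y = x.mean(dim=(2, 3))                      # (B, C) global average pool
        y = self.conv(y.unsqueeze(1)).squeeze(1)    # 1-D conv across channels
        return x * torch.sigmoid(y).view(b, c, 1, 1)

print(ECA()(torch.randn(2, 64, 32, 32)).shape)
```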
In this paper, a novel method for ultra-lightweight convolutional neural network (CNN) design based on neural architecture search (NAS) and knowledge distillation (KD) is proposed. It realizes the automatic construction of a space target inverse synthetic aperture radar (ISAR) image recognition model that is both ultra-lightweight and highly accurate. The method introduces NAS into radar image recognition for the first time, which addresses the time-consuming and labor-intensive problems in the manual design of the space target ISAR image automatic recognition model (STIIARM). On this basis, the NAS model's knowledge is transferred to a student model with lower computational complexity by the flow of the solution procedure (FSP) distillation method. Thus, the decline in recognition accuracy caused by directly compressing the model's structural parameters can be effectively avoided, and an ultra-lightweight STIIARM can be obtained. In the method, the Inverted Linear Bottleneck (ILB) and Inverted Residual Block (IRB) are first taken as each block's basic structure in the CNN, and the expansion ratio, output filter size, number of IRBs, and convolution kernel size are set as the search parameters to construct a hierarchical decomposition search space. Then, the recognition accuracy and computational complexity are taken as the objective function and constraint conditions, respectively, and the global optimization model of the CNN architecture search is established. Next, the simulated annealing (SA) algorithm is used as the search strategy to directly search out a lightweight and highly accurate STIIARM. After that, based on the three principles of similar block structure, the same corresponding channel number, and minimum computational complexity, a more lightweight student model is designed, and the FSP matrix pairing between the NAS model and the student model is completed. Finally, by minimizing the loss between the FSP matrix pairs of the NAS model and the student model, the student model's weight adjustment is completed, and the ultra-lightweight and highly accurate STIIARM is obtained. The proposed method's effectiveness is verified by simulation experiments on an ISAR image dataset of five types of space targets.
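FSP distillation matches "flow" matrices, i.e. Gram matrices between pairs of feature maps, between teacher and student. The sketch below shows the FSP matrix and the corresponding loss, under the assumption that each paired feature map already shares the same spatial size (in practice, pooling may be needed first).

```python
import torch

def fsp_matrix(feat_a, feat_b):
    """Flow-of-solution-procedure (FSP) matrix between two feature maps of the
    same spatial size: the channel-wise Gram matrix averaged over positions.

    feat_a: (B, C1, H, W), feat_b: (B, C2, H, W) -> (B, C1, C2)."""
    b, c1, h, w = feat_a.shape
    c2 = feat_b.shape[1]
    a = feat_a.reshape(b, c1, h * w)
    bb = feat_b.reshape(b, c2, h * w)
    return torch.bmm(a, bb.transpose(1, 2)) / (h * w)

def fsp_loss(teacher_pairs, student_pairs):
    """Mean squared error between paired teacher and student FSP matrices.
    Each list element is a (feat_in, feat_out) tuple; pairing them so the FSP
    matrices have matching shapes is the caller's responsibility."""
    loss = 0.0
    for (t_in, t_out), (s_in, s_out) in zip(teacher_pairs, student_pairs):
        loss = loss + torch.mean((fsp_matrix(t_in, t_out) - fsp_matrix(s_in, s_out)) ** 2)
    return loss / len(teacher_pairs)

# Toy shapes: teacher and student share channel counts at the paired taps.
t = [(torch.randn(2, 16, 32, 32), torch.randn(2, 32, 32, 32))]
s = [(torch.randn(2, 16, 32, 32), torch.randn(2, 32, 32, 32))]
print(fsp_loss(t, s))
```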
Weather phenomenon recognition plays an important role in the field of meteorology. Nowadays, weather radars and weather sensors are widely used for weather recognition. However, given the high cost of deploying and maintaining these devices, it is difficult to apply them to intensive weather phenomenon recognition. Moreover, advanced machine learning models such as Convolutional Neural Networks (CNNs) have shown a lot of promise in meteorology, but these models require intensive computation and large memory, which makes them difficult to use in practice. Lightweight models are often used to solve such problems; however, they often incur significant performance losses. To this end, after taking a deep dive into a large number of lightweight models and summarizing their shortcomings, we propose a novel lightweight CNN model constructed from new building blocks. The experimental results show that the proposed model has performance comparable to mainstream non-lightweight models while consuming 25 times less memory. This memory reduction is even better than that of existing lightweight models.
Cephalopod identification is a formidable task that involves hand inspection and close observation by a malacologist. Manual observation and identification take time and always depend on the involvement of experts. A system is proposed to alleviate this challenge that uses transfer learning techniques to classify cephalopods automatically. In the proposed method, only lightweight pre-trained networks are chosen to enable IoT in the task of cephalopod recognition. First, the efficiency of the chosen models is determined by evaluating their performance and comparing the findings. Second, the models are fine-tuned by adding dense layers and tweaking hyperparameters to improve classification accuracy. The models also employ a well-tuned Rectified Adam optimizer to increase the accuracy rates. Third, Rectified Adam with Gradient Centralisation (RAdamGC) is proposed and used in the fine-tuned models to reduce the training time. The framework enables an Internet of Things (IoT) or embedded device to perform the classification tasks by embedding a suitable lightweight pre-trained network. The fine-tuned models MobileNetV2, InceptionV3, and NASNet Mobile achieved classification accuracies of 89.74%, 87.12%, and 89.74%, respectively. The findings indicate that the fine-tuned models can classify different kinds of cephalopods. The results also demonstrate a significant reduction in training time with RAdamGC.
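Gradient centralization simply removes the mean from each weight gradient before the optimizer step. The sketch below applies it on top of PyTorch's built-in RAdam as a rough approximation of an RAdamGC-style optimizer; it is not the paper's exact implementation, and the toy classifier head is a stand-in for a fine-tuned backbone.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.optim import RAdam  # Rectified Adam, available in recent PyTorch releases

def centralize_gradients(model):
    """Gradient centralization: subtract each weight gradient's mean, computed
    over all dimensions except the output dimension, before optimizer.step()."""
    for p in model.parameters():
        if p.grad is not None and p.grad.dim() > 1:
            dims = tuple(range(1, p.grad.dim()))
            p.grad -= p.grad.mean(dim=dims, keepdim=True)

# Toy usage with a small classifier head.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 5))
optimizer = RAdam(model.parameters(), lr=1e-3)
x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 5, (8,))
loss = F.cross_entropy(model(x), y)
loss.backward()
centralize_gradients(model)
optimizer.step()
print(float(loss))
```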
Hemerocallis citrina Baroni is rich in nutritional value, with a clear trend of increasing market demand, and it is a pillar industry for rural economic development. Hemerocallis citrina Baroni exhibits rapid growth and a short harvest cycle, lacks a consistent maturity identification standard, and relies heavily on manual labor. To address these issues, a new method for detecting the maturity of Hemerocallis citrina Baroni, called LTCB YOLOv7, is introduced. First, the layer aggregation network and transition module are made more efficient through the incorporation of Ghost convolution, a lightweight technique that streamlines the model architecture and reduces model parameters and computational workload. Second, a coordinate attention mechanism is introduced between the feature extraction and feature fusion networks, which enhances model precision and compensates for the performance decline caused by the lightweight design. Finally, a bidirectional feature pyramid network with weighted connections replaces the Concatenate function in the feature fusion network; this modification enables the integration of information across different stages, gradually improving overall model performance. The experimental results show that the improved LTCB YOLOv7 algorithm for Hemerocallis citrina Baroni maturity detection reduces the number of model parameters and floating-point operations by about 1.7 million and 7.3 G, respectively, and compresses the model volume by about 3.5 MB. This refinement improves precision and recall by approximately 0.58% and 0.18%, respectively, while the average precision metrics mAP@0.5 and mAP@0.5:0.95 improve by about 1.61% and 0.82%, respectively. Furthermore, the algorithm achieves real-time detection at 96.15 FPS. The proposed LTCB YOLOv7 algorithm performs strongly in detecting the maturity of Hemerocallis citrina Baroni, effectively balancing model complexity and performance, and establishes a standardized approach for maturity detection of Hemerocallis citrina Baroni for identification and harvesting purposes.
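Coordinate attention factorizes attention into height-wise and width-wise pooled descriptors, which keeps positional information while staying cheap. A sketch is given below; the reduction ratio and the use of ReLU in place of the original activation are assumptions.

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Sketch of coordinate attention: pool separately along height and width,
    encode the two direction-aware descriptors with a shared 1x1 convolution,
    then produce per-direction gates."""

    def __init__(self, channels, reduction=8):
        super().__init__()
        mid = max(8, channels // reduction)
        self.shared = nn.Sequential(
            nn.Conv2d(channels, mid, 1, bias=False),
            nn.BatchNorm2d(mid),
            nn.ReLU(inplace=True),
        )
        self.to_h = nn.Conv2d(mid, channels, 1)
        self.to_w = nn.Conv2d(mid, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        pooled_h = x.mean(dim=3, keepdim=True)                       # (B, C, H, 1)
        pooled_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)   # (B, C, W, 1)
        y = self.shared(torch.cat([pooled_h, pooled_w], dim=2))      # (B, mid, H+W, 1)
        y_h, y_w = torch.split(y, [h, w], dim=2)
        gate_h = torch.sigmoid(self.to_h(y_h))                       # (B, C, H, 1)
        gate_w = torch.sigmoid(self.to_w(y_w.permute(0, 1, 3, 2)))   # (B, C, 1, W)
        return x * gate_h * gate_w

print(CoordinateAttention(32)(torch.randn(1, 32, 40, 40)).shape)
```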
In the field of infrared and visible image fusion, researchers have put forward increasingly complex fusion networks in pursuit of better fusion metrics, which has led to a growing number of parameters in fusion models. Additionally, most fusion models rarely address the issue of preserving background details in images, although these details are vital to subsequent advanced visual tasks such as image analysis and recognition. In response to these limitations, this paper proposes a novel image fusion algorithm called the lightweight multi-scale hierarchical dense fusion network (LMHFusion). Concisely, we propose a lightweight multi-scale encoder that extracts multi-scale features from input images through four encoding blocks with different receptive fields. Then, a designed hierarchical dense connection method is employed to concatenate features at distinct scales. Unlike traditional manual fusion strategies, our fusion network is learnable and has adjustable weights. Moreover, we have specially designed a histogram equalization loss to train LMHFusion; this new loss produces fused images that contain both prominent structures and rich details. Comparative analysis of LMHFusion against twelve other representative fusion models shows that LMHFusion is more suitable for resource-constrained scenarios while also enhancing the quality and visual effects of fused images. Our model is nearly 5000 times smaller in size than RFN-Nest.
In printed circuit board (PCB) manufacturing, surface defects can significantly affect product quality. To address the performance degradation, high false detection rates, and missed detections caused by complex backgrounds in current intelligent inspection algorithms, this paper proposes CG-YOLOv8, a lightweight and improved model based on YOLOv8n for PCB surface defect detection. The proposed method optimizes the network architecture and compresses parameters to reduce model complexity while maintaining high detection accuracy, thereby enhancing the capability of identifying diverse defects under complex conditions. Specifically, a cascaded multi-receptive field (CMRF) module replaces the SPPF module in the backbone to improve feature perception, and an inverted residual mobile block (IRMB) is integrated into the C2f module to further enhance performance. Additionally, conventional convolution layers are replaced with Grouped Spatial Convolution (GSConv) to reduce computational cost, and a lightweight Convolutional Block Attention Module based Convolution (CBAMConv) module is introduced after GSConv to preserve accuracy through attention mechanisms. The detection head is also optimized by removing the medium and large-scale detection layers, thereby enhancing the model's ability to detect small-scale defects and further reducing complexity. Experimental results show that, compared with the original YOLOv8n, the proposed CG-YOLOv8 reduces the parameter count by 53.9%, improves mAP@0.5 by 2.2%, and increases precision and recall by 2.0% and 1.8%, respectively. These improvements demonstrate that CG-YOLOv8 offers an efficient and lightweight solution for PCB surface defect detection.
With the explosion in the number of meteoroids and orbital debris in terrestrial space in recent years, the detection environment for spacecraft has become more complex. This makes it hard for most current machine-learning-based detection methods to overcome two difficulties: handling scale changes of the targets in the image and accelerating the detection rate on high-resolution images. To overcome these two challenges, we propose a novel noncooperative target detection method using a deep convolutional neural network framework. First, a specific spacecraft simulation dataset of over one thousand images is built to train and test our detection model. The depthwise separable convolution structure is applied and combined with the residual network module to improve the network's backbone. To account for the different shapes of the spacecraft in the dataset, a prior-box generation method based on the K-means clustering algorithm is designed for each detection head at a different scale. Finally, a comprehensive loss function is presented that considers category confidence, box parameters, and box confidence. The experimental results verify that the proposed method is robust against varying degrees of luminance change and can suppress the interference caused by Gaussian noise and background complexity. The mean average precision of our proposed method reaches 93.28%, and the global loss value is 13.252. The comparative experiments show that, under the same epochs and batch size, our method runs about 20% faster than YOLOv3, the detection accuracy is increased by about 12%, and the model size is reduced by nearly 50%.
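Anchor (prior-box) generation with K-means over ground-truth box shapes typically uses 1 - IoU as the distance. The sketch below shows this standard procedure; the number of anchors, initialization, and stopping rule are assumptions, not this paper's exact configuration.

```python
import numpy as np

def iou_wh(boxes, centers):
    """IoU between boxes and anchor centers given only (width, height),
    i.e. assuming shared top-left corners, as in YOLO anchor clustering."""
    inter = np.minimum(boxes[:, None, 0], centers[None, :, 0]) * \
            np.minimum(boxes[:, None, 1], centers[None, :, 1])
    union = boxes[:, 0:1] * boxes[:, 1:2] + centers[None, :, 0] * centers[None, :, 1] - inter
    return inter / union

def kmeans_anchors(box_wh, k=9, iters=100, seed=0):
    """K-means over ground-truth box shapes with distance = 1 - IoU, the usual
    way YOLO-style prior boxes are generated."""
    rng = np.random.default_rng(seed)
    centers = box_wh[rng.choice(len(box_wh), k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(iou_wh(box_wh, centers), axis=1)   # nearest = highest IoU
        new_centers = np.array([
            box_wh[assign == i].mean(axis=0) if np.any(assign == i) else centers[i]
            for i in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers[np.argsort(centers.prod(axis=1))]          # sort by area

# Toy usage with random box shapes (width, height) in pixels.
wh = np.abs(np.random.randn(500, 2)) * 40 + 10
print(kmeans_anchors(wh, k=6))
```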
Pointwise convolution is usually utilized to expand or squeeze features in modern lightweight deep models. However, it accounts for most of the overall computational cost (usually more than 90%). This paper proposes a novel Poker module that expands features by taking advantage of cheap depthwise convolution. As a result, the Poker module can greatly reduce the computational cost while generating a large number of effective features to guarantee performance. The proposed module is standardized and can be employed wherever feature expansion is needed. By varying the stride and the number of channels, different kinds of bottlenecks are designed to plug the proposed Poker module into the network, so a lightweight model can be easily assembled. Experiments conducted on benchmarks reveal the effectiveness of the proposed Poker module, and our PokerNet models reduce the computational cost by 7.1%-15.6%. PokerNet models achieve comparable or even higher recognition accuracy than previous state-of-the-art (SOTA) models on the ImageNet ILSVRC2012 classification dataset. Code is available at https://github.com/diaomin/pokernet.
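Expanding channels with a grouped (depthwise) convolution instead of a pointwise one is the cheap-expansion idea the Poker module builds on. The sketch below shows that general mechanism; it is not the Poker module itself, and the expansion factor and kernel size are assumptions.

```python
import torch
import torch.nn as nn

class DepthwiseExpand(nn.Module):
    """Sketch of expanding channels with depthwise convolution instead of a
    pointwise (1x1) convolution: with groups equal to the input channels, each
    input channel cheaply produces `expand` output maps."""

    def __init__(self, in_ch, expand=4, kernel_size=3):
        super().__init__()
        self.dw = nn.Conv2d(in_ch, in_ch * expand, kernel_size,
                            padding=kernel_size // 2, groups=in_ch, bias=False)
        self.bn = nn.BatchNorm2d(in_ch * expand)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.dw(x)))

x = torch.randn(1, 16, 56, 56)
expanded = DepthwiseExpand(16, expand=4)(x)
print(expanded.shape)  # torch.Size([1, 64, 56, 56])

# For comparison, a 1x1 pointwise expansion needs in_ch * out_ch weights,
# while the depthwise version above needs only out_ch * k * k.
```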
基金supported by the National Natural Science Foundation of China(22379021 and 22479021)。
文摘As batteries become increasingly essential for energy storage technologies,battery prognosis,and diagnosis remain central to ensure reliable operation and effective management,as well as to aid the in-depth investigation of degradation mechanisms.However,dynamic operating conditions,cell-to-cell inconsistencies,and limited availability of labeled data have posed significant challenges to accurate and robust prognosis and diagnosis.Herein,we introduce a time-series-decomposition-based ensembled lightweight learning model(TELL-Me),which employs a synergistic dual-module framework to facilitate accurate and reliable forecasting.The feature module formulates features with physical implications and sheds light on battery aging mechanisms,while the gradient module monitors capacity degradation rates and captures aging trend.TELL-Me achieves high accuracy in end-of-life prediction using minimal historical data from a single battery without requiring offline training dataset,and demonstrates impressive generality and robustness across various operating conditions and battery types.Additionally,by correlating feature contributions with degradation mechanisms across different datasets,TELL-Me is endowed with the diagnostic ability that not only enhances prediction reliability but also provides critical insights into the design and optimization of next-generation batteries.
基金supported by the Shanghai Sailing Program,China (No.20YF1447600)the Research Start-Up Project of Shanghai Institute of Technology (No.YJ2021-60)+1 种基金the Collaborative Innovation Project of Shanghai Institute of Technology (No.XTCX2020-12)the Science and Technology Talent Development Fund for Young and Middle-Aged Teachers at Shanghai Institute of Technology (No.ZQ2022-6)。
文摘There is a problem of real-time detection difficulty in road surface damage detection. This paper proposes an improved lightweight model based on you only look once version 5(YOLOv5). Firstly, this paper fully utilized the convolutional neural network(CNN) + ghosting bottleneck(G_bneck) architecture to reduce redundant feature maps. Afterwards, we upgraded the original upsampling algorithm to content-aware reassembly of features(CARAFE) and increased the receptive field. Finally, we replaced the spatial pyramid pooling fast(SPPF) module with the basic receptive field block(Basic RFB) pooling module and added dilated convolution. After comparative experiments, we can see that the number of parameters and model size of the improved algorithm in this paper have been reduced by nearly half compared to the YOLOv5s. The frame rate per second(FPS) has been increased by 3.25 times. The mean average precision(m AP@0.5: 0.95) has increased by 8%—17% compared to other lightweight algorithms.
基金supported by the Institute of Information&Communications Technology Planning&Evaluation(IITP)-Innovative Human Resource Development for Local Intellectualization program grant funded by the Korea government(MSIT)(IITP-2025-RS-2023-00259678)by INHA UNIVERSITY Research Grant.
文摘Lightweight deep learning models are increasingly required in resource-constrained environments such as mobile devices and the Internet of Medical Things(IoMT).Multi-head convolution with channel attention can facilitate learning activations relevant to different kernel sizes within a multi-head convolutional layer.Therefore,this study investigates the capability of novel lightweight models incorporating residual multi-head convolution with channel attention(ResMHCNN)blocks to classify medical images.We introduced three novel lightweight deep learning models(BT-Net,LCC-Net,and BC-Net)utilizing the ResMHCNN block as their backbone.These models were crossvalidated and tested on three publicly available medical image datasets:a brain tumor dataset from Figshare consisting of T1-weighted magnetic resonance imaging slices of meningioma,glioma,and pituitary tumors;the LC25000 dataset,which includes microscopic images of lung and colon cancers;and the BreaKHis dataset,containing benign and malignant breast microscopic images.The lightweight models achieved accuracies of 96.9%for 3-class brain tumor classification using BT-Net,and 99.7%for 5-class lung and colon cancer classification using LCC-Net.For 2-class breast cancer classification,BC-Net achieved an accuracy of 96.7%.The parameter counts for the proposed lightweight models—LCC-Net,BC-Net,and BT-Net—are 0.528,0.226,and 1.154 million,respectively.The presented lightweight models,featuring ResMHCNN blocks,may be effectively employed for accurate medical image classification.In the future,these models might be tested for viability in resource-constrained systems such as mobile devices and IoMT platforms.
基金supported by the National Natural Science Foundation of China(42202175).
文摘Mineral identification is foundational to geological survey research,mineral resource exploration,and mining engineering.Considering the diversity of mineral types and the challenge of achieving high recognition accuracy for similar features,this study introduces a mineral detection method based on YOLOv8-SBI.This work enhances feature extraction by integrating spatial pyramid pooling-fast(SPPF)with the simplified self-attention module(SimAM),significantly improving the precision of mineral feature detection.In the feature fusion network,a weighted bidirectional feature pyramid network is employed for advanced cross-channel feature integration,effectively reducing feature redundancy.Additionally,Inner-Intersection Over Union(InnerIOU)is used as the loss function to improve the average quality localization performance of anchor boxes.Experimental results show that the YOLOv8-SBI model achieves an accuracy of 67.9%,a recall of 74.3%,a mAP@0.5 of 75.8%,and a mAP@0.5:0.95 of 56.7%,with a real-time detection speed of 244.2 frames per second.Compared to YOLOv8,YOLOv8-SBI demonstrates a significant improvement with 15.4%increase in accuracy,28.5%increase in recall,and increases of 28.1%and 20.9%in mAP@0.5 and mAP@0.5:0.95,respectively.Furthermore,relative to other models,such as YOLOv3,YOLOv5,YOLOv6,YOLOv8,YOLOv9,and YOLOv10,YOLOv8-SBI has a smaller parameter size of only 3.01×10^(6).This highlights the optimal balance between detection accuracy and speed,thereby offering robust technical support for intelligent mineral classification.
基金Supported by the Guangdong Major Project of Basic and Applied Basic Research (2023B0303000016)the National Natural Science Foundation of China (U21A20515)。
文摘Efficient three-dimensional(3D)building reconstruction from drone imagery often faces data acquisition,storage,and computational challenges because of its reliance on dense point clouds.In this study,we introduced a novel method for efficient and lightweight 3D building reconstruction from drone imagery using line clouds and sparse point clouds.Our approach eliminates the need to generate dense point clouds,and thus significantly reduces the computational burden by reconstructing 3D models directly from sparse data.We addressed the limitations of line clouds for plane detection and reconstruction by using a new algorithm.This algorithm projects 3D line clouds onto a 2D plane,clusters the projections to identify potential planes,and refines them using sparse point clouds to ensure an accurate and efficient model reconstruction.Extensive qualitative and quantitative experiments demonstrated the effectiveness of our method,demonstrating its superiority over existing techniques in terms of simplicity and efficiency.
基金supported by the National Natural Science Foundation of China under Grant 62107034the Major Science and Technology Project of Yunnan Province(202402AD080002)Yunnan International Joint R&D Center of China-Laos-Thailand Educational Digitalization(202203AP140006).
文摘The task of student action recognition in the classroom is to precisely capture and analyze the actions of students in classroom videos,providing a foundation for realizing intelligent and accurate teaching.However,the complex nature of the classroom environment has added challenges and difficulties in the process of student action recognition.In this research article,with regard to the circumstances where students are prone to be occluded and classroom computing resources are restricted in real classroom scenarios,a lightweight multi-modal fusion action recognition approach is put forward.This proposed method is capable of enhancing the accuracy of student action recognition while concurrently diminishing the number of parameters of the model and the Computation Amount,thereby achieving a more efficient and accurate recognition performance.In the feature extraction stage,this method fuses the keypoint heatmap with the RGB(Red-Green-Blue color model)image.In order to fully utilize the unique information of different modalities for feature complementarity,a Feature Fusion Module(FFE)is introduced.The FFE encodes and fuses the unique features of the two modalities during the feature extraction process.This fusion strategy not only achieves fusion and complementarity between modalities,but also improves the overall model performance.Furthermore,to reduce the computational load and parameter scale of the model,we use keypoint information to crop RGB images.At the same time,the first three networks of the lightweight feature extraction network X3D are used to extract dual-branch features.These methods significantly reduce the computational load and parameter scale.The number of parameters of the model is 1.40 million,and the computation amount is 5.04 billion floating-point operations per second(GFLOPs),achieving an efficient lightweight design.In the Student Classroom Action Dataset(SCAD),the accuracy of the model is 88.36%.In NTU 60(Nanyang Technological University Red-Green-Blue-Depth RGB+Ddataset with 60 categories),the accuracies on X-Sub(The people in the training set are different from those in the test set)and X-View(The perspectives of the training set and the test set are different)are 95.76%and 98.82%,respectively.On the NTU 120 dataset(Nanyang Technological University Red-Green-Blue-Depth dataset with 120 categories),RGB+Dthe accuracies on X-Sub and X-Set(the perspectives of the training set and the test set are different)are 91.97%and 93.45%,respectively.The model has achieved a balance in terms of accuracy,computation amount,and the number of parameters.
基金Supported by National Natural Science Foundation of China(Grant Nos.62261160575,61991414,61973036)Technical Field Foundation of the National Defense Science and Technology 173 Program of China(Grant Nos.20220601053,20220601030)。
文摘There is no unified planning standard for unstructured roads,and the morphological structures of these roads are complex and varied.It is important to maintain a balance between accuracy and speed for unstructured road extraction models.Unstructured road extraction algorithms based on deep learning have problems such as high model complexity,high computational cost,and the inability to adapt to current edge computing devices.Therefore,it is best to use lightweight network models.Considering the need for lightweight models and the characteristics of unstructured roads with different pattern shapes,such as blocks and strips,a TMB(Triple Multi-Block)feature extraction module is proposed,and the overall structure of the TMBNet network is described.The TMB module was compared with SS-nbt,Non-bottleneck-1D,and other modules via experiments.The feasibility and effectiveness of the TMB module design were proven through experiments and visualizations.The comparison experiment,using multiple convolution kernel categories,proved that the TMB module can improve the segmentation accuracy of the network.The comparison with different semantic segmentation networks demonstrates that the TMBNet network has advantages in terms of unstructured road extraction.
基金the National Natural Science Foundation of China (No.51275223)。
文摘To avoid colliding with trees during its operation,a lawn mower robot must detect the trees.Existing tree detection methods suffer from low detection accuracy(missed detection)and the lack of a lightweight model.In this study,a dataset of trees was constructed on the basis of a real lawn environment.According to the theory of channel incremental depthwise convolution and residual suppression,the Embedded-A module is proposed,which expands the depth of the feature map twice to form a residual structure to improve the lightweight degree of the model.According to residual fusion theory,the Embedded-B module is proposed,which improves the accuracy of feature-map downsampling by depthwise convolution and pooling fusion.The Embedded YOLO object detection network is formed by stacking the embedded modules and the fusion of feature maps of different resolutions.Experimental results on the testing set show that the Embedded YOLO tree detection algorithm has 84.17%and 69.91%average precision values respectively for trunk and spherical tree,and 77.04% mean average precision value.The number of convolution parameters is 1.78×10^(6),and the calculation amount is 3.85 billion float operations per second.The size of weight file is 7.11MB,and the detection speed can reach 179 frame/s.This study provides a theoretical basis for the lightweight application of the object detection algorithm based on deep learning for lawn mower robots.
文摘Significant advancements have been achieved in the field of Single Image Super-Resolution(SISR)through the utilization of Convolutional Neural Networks(CNNs)to attain state-of-the-art performance.Recent efforts have explored the incorporation of Transformers to augment network performance in SISR.However,the high computational cost of Transformers makes them less suitable for deployment on lightweight devices.Moreover,the majority of enhancements for CNNs rely predominantly on small spatial convolutions,thereby neglecting the potential advantages of large kernel convolution.In this paper,the authors propose a Multi-Perception Large Kernel convNet(MPLKN)which delves into the exploration of large kernel convolution.Specifically,the authors have architected a Multi-Perception Large Kernel(MPLK)module aimed at extracting multi-scale features and employ a stepwise feature fusion strategy to seamlessly integrate these features.In addition,to enhance the network's capacity for nonlinear spatial information processing,the authors have designed a Spatial-Channel Gated Feed-forward Network(SCGFN)that is capable of adapting to feature interactions across both spatial and channel dimensions.Experimental results demonstrate that MPLKN outperforms other lightweight image super-resolution models while maintaining a minimal number of parameters and FLOPs.
基金co-supported by the National Natural Science Foundation of China(Nos.12302252 and 12472189)the Research Program of National University of Defense Technology,China(No.ZK24-31).
文摘Vision-based relative pose estimation plays a pivotal role in various space missions.Deep learning enhances monocular spacecraft pose estimation,but high computational demands necessitate model simplification for onboard systems.In this paper,we aim to achieve an optimal balance between accuracy and computational efficiency.We present a Perspective-n-Point(PnP)based method for spacecraft pose estimation,leveraging lightweight neural networks to localize semantic keypoints and reduce computational load.Since the accuracy of keypoint localization is closely related to the heatmap resolution,we devise an efficient upsampling module to increase the resolution of heatmaps with minimal overhead.Furthermore,the heatmaps predicted by the lightweight models tend to show high-level noise.To tackle this issue,we propose a weighting strategy by analyzing the statistical characteristics of predicted semantic keypoints and substantially improve the pose estimation accuracy.The experiments carried out on the SPEED dataset underscore the prospect of our method in engineering applications.We dramatically reduce the model parameters to 0.7 M,merely 2.5%of that required by the top-performing method,and achieve lower pose estimation error and better real-time performance.
文摘To address the challenges of low accuracy and insufficient real-time performance in dynamic object detection for UAV surveillance,this paper proposes a novel tracking framework that integrates a lightweight improved YOLOv5s model with adaptive motion compensation.A UAV-view dynamic feature enhancement strategy is innovatively introduced,and a lightweight detection network combining attention mechanisms and multi-scale fusion is constructed.The robustness of tracking under motion blur scenarios is also optimized.Experimental results demonstrate that the proposed method achieves a mAP@0.5 of 68.2%on the VisDrone dataset and reaches an inference speed of 32 FPS on the NVIDIA Jetson TX2 platform.This significantly improves the balance between accuracy and efficiency in complex scenes,offering reliable technical support for real-time applications such as emergency response.
基金funded by the Jilin City Science and Technology Innovation Development Plan Project(No.20240302014)the Jilin Provincial Department of Educa-tion Science and Technology Research Project(No.JJKH 20250879KJ)the Jilin Province Science and Tech-nology Development Plan Project(No.YDZJ202401640 ZYTS).
文摘Underwater imaging is frequently influenced by factors such as illumination,scattering,and refraction,which can result in low image contrast and blurriness.Moreover,the presence of numerous small,overlapping targets reduces detection accuracy.To address these challenges,first,green channel images are preprocessed to rectify color bias while improving contrast and clarity.Se-cond,the YOLO-DBS network that employs deformable convolution is proposed to enhance feature learning from underwater blurry images.The ECA attention mechanism is also introduced to strengthen feature focus.Moreover,a bidirectional feature pyramid net-work is utilized for efficient multilayer feature fusion while removing nodes that contribute minimally to detection performance.In addition,the SIoU loss function that considers factors such as angular error and distance deviation is incorporated into the network.Validation on the RUOD dataset demonstrates that YOLO-DBS achieves approximately 3.1%improvement in mAP@0.5 compared with YOLOv8n and surpasses YOLOv9-tiny by 1.3%.YOLO-DBS reduces parameter count by 32%relative to YOLOv8n,thereby demonstrating superior performance in real-time detection on underwater observation platforms.
文摘In this paper,a novel method of ultra-lightweight convolution neural network(CNN)design based on neural architecture search(NAS)and knowledge distillation(KD)is proposed.It can realize the automatic construction of the space target inverse synthetic aperture radar(ISAR)image recognition model with ultra-lightweight and high accuracy.This method introduces the NAS method into the radar image recognition for the first time,which solves the time-consuming and labor-consuming problems in the artificial design of the space target ISAR image automatic recognition model(STIIARM).On this basis,the NAS model’s knowledge is transferred to the student model with lower computational complexity by the flow of the solution procedure(FSP)distillation method.Thus,the decline of recognition accuracy caused by the direct compression of model structural parameters can be effectively avoided,and the ultralightweight STIIARM can be obtained.In the method,the Inverted Linear Bottleneck(ILB)and Inverted Residual Block(IRB)are firstly taken as each block’s basic structure in CNN.And the expansion ratio,output filter size,number of IRBs,and convolution kernel size are set as the search parameters to construct a hierarchical decomposition search space.Then,the recognition accuracy and computational complexity are taken as the objective function and constraint conditions,respectively,and the global optimization model of the CNN architecture search is established.Next,the simulated annealing(SA)algorithm is used as the search strategy to search out the lightweight and high accuracy STIIARM directly.After that,based on the three principles of similar block structure,the same corresponding channel number,and the minimum computational complexity,the more lightweight student model is designed,and the FSP matrix pairing between the NAS model and student model is completed.Finally,by minimizing the loss between the FSP matrix pairs of the NAS model and student model,the student model’s weight adjustment is completed.Thus the ultra-lightweight and high accuracy STIIARM is obtained.The proposed method’s effectiveness is verified by the simulation experiments on the ISAR image dataset of five types of space targets.
基金This paper is supported by the following funds:National Key R&D Program of China(2018YFF01010100)National natural science foundation of China(61672064)+1 种基金Basic Research Program of Qinghai Province under Grants No.2020-ZJ-709Advanced information network Beijing laboratory(PXM2019_014204_500029).
文摘Weather phenomenon recognition plays an important role in the field of meteorology.Nowadays,weather radars and weathers sensor have been widely used for weather recognition.However,given the high cost in deploying and maintaining the devices,it is difficult to apply them to intensive weather phenomenon recognition.Moreover,advanced machine learning models such as Convolutional Neural Networks(CNNs)have shown a lot of promise in meteorology,but these models also require intensive computation and large memory,which make it difficult to use them in reality.In practice,lightweight models are often used to solve such problems.However,lightweight models often result in significant performance losses.To this end,after taking a deep dive into a large number of lightweight models and summarizing their shortcomings,we propose a novel lightweight CNNs model which is constructed based on new building blocks.The experimental results show that the model proposed in this paper has comparable performance with the mainstream non-lightweight model while also saving 25 times of memory consumption.Such memory reduction is even better than that of existing lightweight models.
文摘Cephalopods identification is a formidable task that involves hand inspection and close observation by a malacologist.Manual observation and iden-tification take time and are always contingent on the involvement of experts.A system is proposed to alleviate this challenge that uses transfer learning techni-ques to classify the cephalopods automatically.In the proposed method,only the Lightweight pre-trained networks are chosen to enable IoT in the task of cephalopod recognition.First,the efficiency of the chosen models is determined by evaluating their performance and comparing thefindings.Second,the models arefine-tuned by adding dense layers and tweaking hyperparameters to improve the classification of accuracy.The models also employ a well-tuned Rectified Adam optimizer to increase the accuracy rates.Third,Adam with Gradient Cen-tralisation(RAdamGC)is proposed and used infine-tuned models to reduce the training time.The framework enables an Internet of Things(IoT)or embedded device to perform the classification tasks by embedding a suitable lightweight pre-trained network.Thefine-tuned models,MobileNetV2,InceptionV3,and NASNet Mobile have achieved a classification accuracy of 89.74%,87.12%,and 89.74%,respectively.Thefindings have indicated that thefine-tuned models can classify different kinds of cephalopods.The results have also demonstrated that there is a significant reduction in the training time with RAdamGC.
Funding: funded by the Shanxi Provincial Science and Technology Department Surface Project (Grant No. 202303021211330), the Innovation Platform Project of the Science and Technology Innovation Program of Higher Education Institutions in Shanxi Province (Grant No. 2022P009), the Shanxi Province Basic Research Program (Grant No. 202303021212244), the Datong City, Shanxi Province Key Research & Development (Agriculture) Program (Grant Nos. 2023006 and 2023015), and the 2024 Basic Research Program of Shanxi Province (Free Exploration Category) (Grant No. 202403021221181).
Abstract: Hemerocallis citrina Baroni is rich in nutritional value, shows a clear trend of increasing market demand, and is a pillar industry for rural economic development. It grows rapidly and has a short harvest cycle, yet it lacks a consistent maturity identification standard and its harvesting relies heavily on manual labor. To address these issues, a new method for detecting the maturity of Hemerocallis citrina Baroni, called LTCB YOLOv7, is introduced. First, the layer aggregation network and transition module are made more efficient through the incorporation of Ghost convolution, a lightweight technique that streamlines the model architecture and reduces the number of parameters and the computational workload. Second, a coordinate attention mechanism is inserted between the feature extraction and feature fusion networks, which improves precision and compensates for the performance decline caused by the lightweight design. Finally, a bi-directional feature pyramid network with weighted connections replaces the Concatenate operation in the feature fusion network, enabling the integration of information across different stages and a gradual improvement of the overall model performance. The experimental results show that the improved LTCB YOLOv7 algorithm reduces the number of model parameters by about 1.7 million and the floating-point operations by about 7.3 G, while the model size is compressed by about 3.5 MB. Precision and recall are enhanced by approximately 0.58% and 0.18%, respectively, and the average precision metrics mAP@0.5 and mAP@0.5:0.95 improve by about 1.61% and 0.82%, respectively. Furthermore, the algorithm achieves a real-time detection speed of 96.15 FPS. The proposed LTCB YOLOv7 algorithm performs well in detecting the maturity of Hemerocallis citrina Baroni, effectively balancing model complexity and performance, and establishes a standardized approach to maturity detection for identification and harvesting.
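As a concrete illustration of the Ghost convolution used above to lighten the aggregation and transition modules, the following PyTorch sketch implements a generic Ghost convolution layer. The kernel sizes, ghost ratio, and activation are assumptions, not the exact LTCB YOLOv7 settings.

```python
# Generic Ghost convolution: a small "primary" convolution produces intrinsic
# features, and a cheap depthwise convolution generates additional "ghost"
# features, which are concatenated to form the full output.
import math
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 1,
                 ratio: int = 2, dw_size: int = 3):
        super().__init__()
        init_ch = math.ceil(out_ch / ratio)          # intrinsic feature maps
        ghost_ch = init_ch * (ratio - 1)             # cheap "ghost" feature maps
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, init_ch, kernel_size, padding=kernel_size // 2, bias=False),
            nn.BatchNorm2d(init_ch), nn.SiLU(inplace=True))
        self.cheap = nn.Sequential(                  # depthwise conv generates ghosts
            nn.Conv2d(init_ch, ghost_ch, dw_size, padding=dw_size // 2,
                      groups=init_ch, bias=False),
            nn.BatchNorm2d(ghost_ch), nn.SiLU(inplace=True))
        self.out_ch = out_ch

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        primary = self.primary(x)
        ghosts = self.cheap(primary)
        return torch.cat([primary, ghosts], dim=1)[:, :self.out_ch]
```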
Funding: supported by the National Key Laboratory of Air-based Information Perception and Fusion and the Aeronautical Science Foundation of China (Grant No. 20220001068001).
Abstract: In the field of infrared and visible image fusion, researchers have proposed increasingly complex fusion networks in pursuit of better fusion metrics, which has led to a growing number of parameters in fusion models. In addition, most fusion models rarely address the preservation of background details, even though these details are vital for subsequent high-level visual tasks such as image analysis and recognition. In response to these limitations, this paper proposes a novel image fusion algorithm called the lightweight multi-scale hierarchical dense fusion network (LMHFusion). Specifically, we propose a lightweight multi-scale encoder that extracts multi-scale features from the input images through four encoding blocks with different receptive fields. A hierarchical dense connection scheme is then employed to concatenate features at different scales. Unlike traditional hand-crafted fusion strategies, our fusion network is learnable with adjustable weights. Moreover, we design a histogram equalization loss to train LMHFusion; this loss produces fused images that contain both prominent structures and rich details. A comparative analysis of LMHFusion against twelve representative fusion models shows that LMHFusion is better suited to resource-constrained scenarios while also enhancing the quality and visual effect of the fused images. Our model is nearly 5000 times smaller than RFN-Nest.
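The encoder design described above, four encoding blocks with different receptive fields joined by dense connections, can be sketched as follows. The channel counts, kernel sizes, and connection pattern are illustrative assumptions, not the published LMHFusion architecture.

```python
# Sketch of a lightweight multi-scale encoder: four branches with growing
# receptive fields, each also seeing the outputs of the previous branches.
import torch
import torch.nn as nn

class MultiScaleEncoder(nn.Module):
    def __init__(self, in_ch: int = 1, ch: int = 16):
        super().__init__()
        self.blocks = nn.ModuleList()
        for i, k in enumerate((3, 5, 7, 9)):           # receptive fields 3x3 ... 9x9
            self.blocks.append(nn.Sequential(
                nn.Conv2d(in_ch + i * ch, ch, k, padding=k // 2),
                nn.ReLU(inplace=True)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = []
        for block in self.blocks:
            # Hierarchical dense connection: concatenate the input with all
            # previously produced scale features before the next block.
            inp = torch.cat([x] + feats, dim=1) if feats else x
            feats.append(block(inp))
        return torch.cat(feats, dim=1)                 # stacked multi-scale features
```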
Funding: funded by the Joint Funds of the National Natural Science Foundation of China (U2341223) and the Beijing Municipal Natural Science Foundation (No. 4232067).
Abstract: In printed circuit board (PCB) manufacturing, surface defects can significantly affect product quality. To address the performance degradation, high false detection rates, and missed detections caused by complex backgrounds in current intelligent inspection algorithms, this paper proposes CG-YOLOv8, a lightweight model based on YOLOv8n for PCB surface defect detection. The proposed method optimizes the network architecture and compresses parameters to reduce model complexity while maintaining high detection accuracy, thereby enhancing the capability to identify diverse defects under complex conditions. Specifically, a cascaded multi-receptive field (CMRF) module replaces the SPPF module in the backbone to improve feature perception, and an inverted residual mobile block (IRMB) is integrated into the C2f module to further enhance performance. Additionally, conventional convolution layers are replaced with grouped spatial convolution (GSConv) to reduce computational cost, and a lightweight Convolutional Block Attention Module based convolution (CBAMConv) module is introduced after GSConv to preserve accuracy through attention mechanisms. The detection head is also optimized by removing the medium- and large-scale detection layers, which improves the model's ability to detect small defects and further reduces complexity. Experimental results show that, compared with the original YOLOv8n, CG-YOLOv8 reduces the parameter count by 53.9%, improves mAP@0.5 by 2.2%, and increases precision and recall by 2.0% and 1.8%, respectively. These improvements demonstrate that CG-YOLOv8 offers an efficient and lightweight solution for PCB surface defect detection.
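For reference, the following is a minimal PyTorch sketch of a GSConv-style layer of the kind used above to replace conventional convolutions: a dense convolution to half the output channels, a depthwise convolution on that result, concatenation, and a channel shuffle. The kernel sizes and activation are assumptions, not the exact CG-YOLOv8 implementation.

```python
# GSConv-style layer: half the output comes from a dense convolution, half from
# a cheap depthwise convolution, and a channel shuffle interleaves the two.
import torch
import torch.nn as nn

class GSConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3, stride: int = 1):
        super().__init__()
        half = out_ch // 2                     # out_ch is assumed to be even
        self.dense = nn.Sequential(
            nn.Conv2d(in_ch, half, kernel_size, stride, kernel_size // 2, bias=False),
            nn.BatchNorm2d(half), nn.SiLU(inplace=True))
        self.depthwise = nn.Sequential(
            nn.Conv2d(half, half, 5, 1, 2, groups=half, bias=False),
            nn.BatchNorm2d(half), nn.SiLU(inplace=True))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1 = self.dense(x)
        x2 = self.depthwise(x1)
        out = torch.cat([x1, x2], dim=1)
        # Channel shuffle so dense and depthwise features are interleaved.
        b, c, h, w = out.shape
        return out.view(b, 2, c // 2, h, w).transpose(1, 2).reshape(b, c, h, w)
```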
Funding: supported by the National Natural Science Foundation of China (No. 61473100).
Abstract: With the rapid growth in the number of meteoroids and pieces of orbital debris in terrestrial space in recent years, the detection environment of spacecraft has become more complex. This makes it difficult for most current machine-learning-based detection methods to overcome two challenges: handling the scale variation of targets in the image and accelerating detection on high-resolution images. To overcome these two challenges, we propose a novel non-cooperative target detection method built on a deep convolutional neural network framework. First, a spacecraft simulation dataset of over one thousand images is built to train and test our detection model. A depthwise separable convolution structure is applied and combined with residual network modules to improve the network's backbone. To account for the different shapes of the spacecraft in the dataset, a prior-box generation method based on the K-means clustering algorithm is designed for each detection head at a different scale. Finally, a comprehensive loss function is presented that considers category confidence, box parameters, and box confidence. The experimental results verify that the proposed method is robust against varying degrees of luminance change and can suppress the interference caused by Gaussian noise and background complexity. The mean average precision of the proposed method reaches 93.28%, and the global loss value is 13.252. Comparative experiments show that, under the same number of epochs and batch size, the detection time of our method is reduced by about 20% compared with YOLOv3, the detection accuracy is increased by about 12%, and the model size is reduced by nearly 50%.
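The prior-box generation step described above can be illustrated with a short sketch that clusters ground-truth box sizes with K-means using a 1 - IoU distance. The box format, number of clusters, and stopping rule are assumptions, not the paper's exact procedure.

```python
# K-means clustering over (width, height) pairs of ground-truth boxes to
# produce prior boxes, using 1 - IoU as the distance measure.
import numpy as np

def iou_wh(boxes: np.ndarray, clusters: np.ndarray) -> np.ndarray:
    """IoU between boxes and cluster centers, both (width, height) pairs
    assumed to share the same top-left corner."""
    inter = np.minimum(boxes[:, None, 0], clusters[None, :, 0]) * \
            np.minimum(boxes[:, None, 1], clusters[None, :, 1])
    area_b = (boxes[:, 0] * boxes[:, 1])[:, None]
    area_c = (clusters[:, 0] * clusters[:, 1])[None, :]
    return inter / (area_b + area_c - inter)

def kmeans_priors(boxes: np.ndarray, k: int = 9, iters: int = 100, seed: int = 0) -> np.ndarray:
    """Cluster (width, height) pairs into k prior boxes with a 1 - IoU distance."""
    rng = np.random.default_rng(seed)
    clusters = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(iou_wh(boxes, clusters), axis=1)   # highest IoU = nearest
        new = np.array([boxes[assign == i].mean(axis=0) if np.any(assign == i)
                        else clusters[i] for i in range(k)])
        if np.allclose(new, clusters):
            break
        clusters = new
    # Sort by area so small/medium/large priors map naturally onto detection heads.
    return clusters[np.argsort(clusters[:, 0] * clusters[:, 1])]
```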
Funding: supported by the National Natural Science Foundation of China (Nos. 61525306, 61633021, 61721004, 61806194, U1803261 and 61976132), the Major Project for New Generation of AI (No. 2018AAA0100400), the Beijing Nova Program (No. Z201100006820079), the Shandong Provincial Key Research and Development Program (No. 2019JZZY010119), and CAS-AIR.
Abstract: Pointwise convolution is usually used to expand or squeeze features in modern lightweight deep models; however, it accounts for most of the overall computational cost (usually more than 90%). This paper proposes a novel Poker module that expands features by taking advantage of cheap depthwise convolution. As a result, the Poker module greatly reduces the computational cost while generating a large number of effective features to maintain performance. The proposed module is standardized and can be employed wherever feature expansion is needed. By varying the stride and the number of channels, different kinds of bottlenecks are designed to plug the Poker module into a network, so a lightweight model can be easily assembled. Experiments conducted on benchmarks demonstrate the effectiveness of the proposed Poker module: PokerNet models reduce the computational cost by 7.1%-15.6% and achieve comparable or even higher recognition accuracy than previous state-of-the-art (SOTA) models on the ImageNet ILSVRC2012 classification dataset. Code is available at https://github.com/diaomin/pokernet.
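The core idea above, replacing an expensive pointwise expansion with a cheap depthwise expansion, can be sketched as follows. This is an illustrative reading of the idea rather than the authors' Poker module; the reference implementation is in the linked repository.

```python
# Channel expansion via a depthwise convolution with a channel multiplier,
# instead of a 1x1 pointwise expansion. Illustrative only, not the Poker module.
import torch
import torch.nn as nn

class DepthwiseExpansion(nn.Module):
    def __init__(self, in_ch: int, expansion: int = 4, kernel_size: int = 3):
        super().__init__()
        out_ch = in_ch * expansion
        # groups=in_ch makes this a depthwise conv with a channel multiplier,
        # costing roughly expansion * k^2 * in_ch * H * W multiply-adds,
        # versus expansion * in_ch^2 * H * W for a 1x1 pointwise expansion.
        self.expand = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size, padding=kernel_size // 2,
                      groups=in_ch, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU6(inplace=True))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.expand(x)
```

For typical channel counts (in_ch well above k^2), the depthwise expansion is far cheaper than the pointwise one, which is the saving the abstract attributes to avoiding pointwise expansion.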