Abstract: Vehicle detection is an important component of intelligent transportation systems and autonomous driving. However, real traffic scenes contain many uncertain factors, which lead to low accuracy and poor real-time performance in vehicle detection models. To address this problem, a fast and accurate vehicle detection algorithm, YOLOv8-DEL, is proposed. The backbone network is restructured by replacing the C2f module with the DGCST (dynamic group convolution shuffle transformer) module, enhancing feature extraction capability while making the network lighter. An added P2 detection layer enables the model to localize and detect small objects more precisely, and Efficient RepGFPN is adopted for multi-scale feature fusion to enrich feature information and improve the model's feature representation capability. By combining the advantages of GroupNorm and shared convolution, a lightweight shared-convolution detection head is designed, which effectively reduces the parameter count and increases detection speed while maintaining accuracy. Compared with YOLOv8, the proposed YOLOv8-DEL improves mAP@0.5 by 4.8 and 1.2 percentage points on the BDD100K and KITTI datasets, respectively, with real-time detection speeds (208.6 FPS and 216.4 FPS), achieving a more favorable trade-off between detection accuracy and speed.
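The paper's implementation is not shown here, but the parameter saving from a shared-convolution detection head is easy to illustrate with simple weight counting. The following sketch uses hypothetical channel sizes and head depth (two 3x3 convolutions at 128 channels, three FPN levels), which are assumptions for illustration rather than the paper's actual configuration:

```python
def conv_params(c_in, c_out, k, bias=True):
    """Parameter count of a single k x k convolution layer."""
    return k * k * c_in * c_out + (c_out if bias else 0)

# Hypothetical head: two 3x3 convs with 128 channels, applied at 3 FPN levels.
per_head = 2 * conv_params(128, 128, 3)
separate = 3 * per_head   # one scale-specific head per level
shared = per_head         # one head whose weights are reused at every level

print(separate, shared)   # the shared head stores one third of the weights
```

Because the shared head sees features at every scale, GroupNorm (whose statistics do not depend on batch or scale-specific running averages) is a natural normalization choice in such a design.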
Fund: Funded by the National Natural Science Foundation of China under Grant No. 62371187 and the Open Program of Hunan Intelligent Rehabilitation Robot and Auxiliary Equipment Engineering Technology Research Center under Grant No. 2024JS101.
Abstract: The ubiquity of mobile devices has driven advancements in mobile object detection. However, challenges in multi-scale object detection in open, complex environments persist due to limited computational resources. Traditional approaches such as network compression, quantization, and lightweight design often sacrifice accuracy or the robustness of feature representations. This article introduces the Fast Multi-scale Channel Shuffling Network (FMCSNet), a novel lightweight detection model optimized for mobile devices. FMCSNet integrates a fully convolutional multilayer perceptron (MLP) module that offers global perception without significantly increasing the parameter count, effectively bridging the gap between CNNs and Vision Transformers. FMCSNet strikes a balance between computation and accuracy mainly through two key modules: the ShiftMLP module, consisting of a shift operation and an MLP, and the Partial group Convolution (PGConv) module, which reduces computation while enhancing information exchange between channels. With a computational complexity of 1.4G FLOPs and 1.3M parameters, FMCSNet outperforms CNN-based and DWConv-based ShuffleNetv2 by 1% and 4.5% mAP on the Pascal VOC 2007 dataset, respectively. Additionally, FMCSNet achieves an mAP of 30.0 (0.5:0.95 IoU threshold) with only 2.5G FLOPs and 2.0M parameters, and it runs at 32 FPS on low-performance i5-series CPUs, meeting real-time detection requirements. The adaptability of the PGConv module across scenarios further highlights FMCSNet as a promising solution for real-time mobile object detection.
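The channel-shuffle operation that grouped designs like PGConv rely on (originally from ShuffleNet) is simple enough to sketch in plain Python. The snippet below models channels as a flat list; `partial_apply` is a hypothetical helper illustrating the general "operate on a fraction of the channels, pass the rest through" idea behind partial convolution, not the paper's actual implementation:

```python
def channel_shuffle(channels, groups):
    """ShuffleNet-style channel shuffle: view the channel list as a
    (groups x per_group) grid, transpose it, and flatten, so that
    channels from different groups become interleaved."""
    per_group = len(channels) // groups
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]

def partial_apply(channels, fraction, op):
    """Apply `op` only to the first `fraction` of channels and pass the
    rest through untouched -- the cost-saving idea behind partial
    convolution (a shuffle afterwards lets untouched channels mix in)."""
    k = int(len(channels) * fraction)
    return [op(c) for c in channels[:k]] + channels[k:]

# With 2 groups, channels [0, 1, 2, 3] interleave to [0, 2, 1, 3]:
print(channel_shuffle([0, 1, 2, 3], 2))          # → [0, 2, 1, 3]
print(partial_apply([1, 2, 3, 4], 0.5, lambda x: x * 10))  # → [10, 20, 3, 4]
```

In a real network the list elements would be whole feature maps and `op` a grouped convolution; the indexing logic is identical.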
Fund: Supported by the Shanghai Sailing Program, China (No. 20YF1447600), the Research Start-Up Project of Shanghai Institute of Technology (No. YJ2021-60), the Collaborative Innovation Project of Shanghai Institute of Technology (No. XTCX2020-12), and the Science and Technology Talent Development Fund for Young and Middle-Aged Teachers at Shanghai Institute of Technology (No. ZQ2022-6).
Abstract: Real-time detection is difficult in road surface damage detection. This paper proposes an improved lightweight model based on You Only Look Once version 5 (YOLOv5). First, the model fully utilizes the convolutional neural network (CNN) + ghosting bottleneck (G_bneck) architecture to reduce redundant feature maps. Next, the original upsampling algorithm is upgraded to content-aware reassembly of features (CARAFE), which enlarges the receptive field. Finally, the spatial pyramid pooling fast (SPPF) module is replaced with the basic receptive field block (Basic RFB) pooling module, and dilated convolution is added. Comparative experiments show that the parameter count and model size of the improved algorithm are reduced by nearly half compared with YOLOv5s, the frame rate (FPS) is increased by a factor of 3.25, and the mean average precision (mAP@0.5:0.95) is 8% to 17% higher than that of other lightweight algorithms.
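The receptive-field gain from dilated convolution follows from a standard formula: a k x k convolution with dilation d covers an effective window of k + (k-1)(d-1). As a quick sketch (generic arithmetic, not tied to the paper's exact RFB configuration), the following compares a stack of plain 3x3 convolutions with an RFB-style stack using increasing dilations:

```python
def effective_kernel(k, dilation):
    """Effective window size of a k x k convolution with the given dilation."""
    return k + (k - 1) * (dilation - 1)

def receptive_field(layers):
    """Receptive field of a stack of stride-1 conv layers.

    layers: list of (kernel_size, dilation) pairs, applied in order.
    Each layer grows the receptive field by (effective_kernel - 1).
    """
    rf = 1
    for k, d in layers:
        rf += effective_kernel(k, d) - 1
    return rf

# Three plain 3x3 convs vs. 3x3 convs with dilations 1, 2, 3:
print(receptive_field([(3, 1)] * 3))              # → 7
print(receptive_field([(3, 1), (3, 2), (3, 3)]))  # → 13
```

Dilation nearly doubles the receptive field here at identical parameter count and FLOPs, which is why it suits lightweight detectors.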
Abstract: Background Despite the recent progress in 3D point cloud processing using deep convolutional neural networks, the inability to extract local features remains a challenging problem. In addition, existing methods consider only the spatial domain in the feature extraction process. Methods In this paper, we propose a spectral and spatial aggregation convolutional network (S²ANet), which combines spectral and spatial features for point cloud processing. First, we calculate the local frequency of the point cloud in the spectral domain. Then, we use the local frequency to group points and provide a spectral aggregation convolution module to extract the features of the points grouped by the local frequency. We simultaneously extract the local features in the spatial domain to supplement the final features. Results S²ANet was applied in several point cloud analysis tasks; it achieved state-of-the-art classification accuracies of 93.8%, 88.0%, and 83.1% on the ModelNet40, ShapeNetCore, and ScanObjectNN datasets, respectively. For indoor scene segmentation, training and testing were performed on the S3DIS dataset, and the mean intersection over union was 62.4%. Conclusions The proposed S²ANet can effectively capture the local geometric information of point clouds, thereby improving accuracy on various tasks.
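The abstract does not give the paper's spectral-domain definition of local frequency, but the grouping step it describes can be sketched with a crude spatial proxy: treat the mean distance to a point's nearest neighbours as a frequency stand-in (dense regions vary quickly, sparse regions slowly) and bucket points by it. Both the proxy and the threshold below are assumptions for illustration, not the paper's method:

```python
import math

def knn_mean_dist(points, idx, k):
    """Mean distance from points[idx] to its k nearest neighbours --
    a simple spatial proxy for 'local frequency'."""
    dists = sorted(
        math.dist(points[idx], q) for j, q in enumerate(points) if j != idx
    )
    return sum(dists[:k]) / k

def group_by_frequency(points, k, threshold):
    """Split point indices into low- and high-frequency groups, so that
    each group can be fed to its own aggregation convolution."""
    low, high = [], []
    for i in range(len(points)):
        (low if knn_mean_dist(points, i, k) <= threshold else high).append(i)
    return low, high

# A tight cluster near the origin plus one isolated point:
pts = [(0, 0, 0), (0.1, 0, 0), (0, 0.1, 0), (5, 5, 5)]
print(group_by_frequency(pts, 2, 1.0))  # → ([0, 1, 2], [3])
```

The point of grouping is that points with similar local behaviour share one convolution, so the learned filters do not have to average over very different geometric regimes.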
Abstract: In few-shot learning tasks, traditional backbone convolutional networks lose feature information when extracting image features because multiple convolution layers overlook fine details, resulting in low image classification accuracy. To address this problem, a few-shot image classification model based on two-stage feature-space enhancement is proposed. First, the model introduces a median-enhanced spatial and channel attention block (MESC) in the lower layers of ResNet12 (residual network); then, it introduces a spatial group-wise enhance (SGE) module in the middle and upper layers of ResNet12 to improve the semantic feature learning ability of the convolutional neural network, enabling the model to effectively extract key information from feature maps. The model improves classification performance by enhancing the feature representation of limited training samples and strengthens robustness to noise. Results show that on the CUB-200-2011 (California Institute of Technology-University of California at San Diego birds) dataset, the classification accuracy under the 5-way 1-shot and 5-way 5-shot settings is about 5.15% and 1.92% higher, respectively, than that of the distribution propagation graph network (DPGN) model; on the tieredImageNet (tiered ImageNet) dataset, the accuracy under these two settings is about 1.04% and 0.55% higher than that of DPGN. The model improves the performance of few-shot image classification tasks.
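The defining ingredient of the MESC block, per its name, is using the median as a pooling statistic. Classic spatial attention (e.g. CBAM) pools each spatial position with the mean and max across channels; the median is a robust alternative that is less distorted by a few extreme activations or noise. The sketch below shows only that median-descriptor step on nested lists, with the subsequent convolution-plus-sigmoid gating omitted; the block's full structure is an assumption, not taken from the paper:

```python
from statistics import median

def median_spatial_descriptor(feature_map):
    """Per-position median across channels, usable alongside mean/max
    pooling to build a spatial attention map.

    feature_map: channels x height x width nested lists.
    Returns a height x width map.
    """
    c = len(feature_map)
    h, w = len(feature_map[0]), len(feature_map[0][0])
    return [[median(feature_map[ch][i][j] for ch in range(c))
             for j in range(w)]
            for i in range(h)]

# 3 channels, 1 x 2 spatial grid; an outlier (9) barely moves the median:
fm = [[[1, 10]], [[2, 20]], [[9, 30]]]
print(median_spatial_descriptor(fm))  # → [[2, 20]]
```

In a full attention block this map would be concatenated with the mean and max maps, passed through a small convolution and a sigmoid, and multiplied back onto the feature map.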