Abstract: Significant advancements have been achieved in Single Image Super-Resolution (SISR) through the use of Convolutional Neural Networks (CNNs), which attain state-of-the-art performance. Recent efforts have explored incorporating Transformers to further improve SISR performance. However, the high computational cost of Transformers makes them less suitable for deployment on lightweight devices. Moreover, most CNN enhancements rely predominantly on small spatial convolutions, neglecting the potential advantages of large kernel convolution. In this paper, the authors propose a Multi-Perception Large Kernel convNet (MPLKN) that explores large kernel convolution. Specifically, the authors design a Multi-Perception Large Kernel (MPLK) module to extract multi-scale features and employ a stepwise feature fusion strategy to integrate them. In addition, to enhance the network's capacity for nonlinear spatial information processing, the authors design a Spatial-Channel Gated Feed-forward Network (SCGFN) that adapts to feature interactions across both the spatial and channel dimensions. Experimental results demonstrate that MPLKN outperforms other lightweight image super-resolution models while maintaining a minimal number of parameters and FLOPs.
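To make the module design above more concrete, the following PyTorch sketch shows one way a multi-perception large-kernel block with stepwise fusion and a spatial-channel gated feed-forward network could be structured. The kernel sizes, expansion ratio, and gating layout are illustrative assumptions, not the authors' exact MPLK/SCGFN design.

```python
import torch
import torch.nn as nn

class MultiPerceptionLargeKernel(nn.Module):
    """Illustrative multi-scale large-kernel block with stepwise fusion.
    Kernel sizes and the fusion order are assumptions for demonstration."""
    def __init__(self, dim, kernel_sizes=(3, 7, 11)):
        super().__init__()
        # One depthwise branch per perception scale.
        self.branches = nn.ModuleList([
            nn.Conv2d(dim, dim, k, padding=k // 2, groups=dim)
            for k in kernel_sizes
        ])
        # Stepwise fusion: each step merges the running result with the next branch.
        self.fuse = nn.ModuleList([
            nn.Conv2d(2 * dim, dim, 1) for _ in range(len(kernel_sizes) - 1)
        ])

    def forward(self, x):
        feats = [b(x) for b in self.branches]
        out = feats[0]
        for f, fuse in zip(feats[1:], self.fuse):
            out = fuse(torch.cat([out, f], dim=1))  # merge one scale at a time
        return out + x  # residual connection


class SpatialChannelGatedFFN(nn.Module):
    """Feed-forward block gated along both spatial and channel dimensions (sketch)."""
    def __init__(self, dim, expansion=2):
        super().__init__()
        hidden = dim * expansion
        self.proj_in = nn.Conv2d(dim, hidden * 2, 1)
        self.dwconv = nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden)  # spatial mixing
        self.proj_out = nn.Conv2d(hidden, dim, 1)

    def forward(self, x):
        value, gate = self.proj_in(x).chunk(2, dim=1)
        return self.proj_out(self.dwconv(value) * torch.sigmoid(gate)) + x
```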
Funding: Supported, in part, by the National Natural Science Foundation of China under grant number 62272236; in part, by the Natural Science Foundation of Jiangsu Province under grant numbers BK20201136 and BK20191401; and in part, by the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD) fund.
Abstract: Robust watermarking requires finding features that remain invariant under multiple attacks to ensure correct extraction. Deep learning is extremely powerful at extracting features, and watermarking algorithms based on deep learning have attracted widespread attention. Most existing methods use 3×3 small kernel convolutions to extract image features and embed the watermark. However, the effective receptive fields of small kernel convolutions are extremely confined, so the pixels that each watermark can affect are restricted, limiting watermarking performance. To address these problems, we propose a watermarking network based on large kernel convolution and adaptive weight assignment for loss functions. It uses large-kernel depth-wise convolution to extract features and learn large-scale image information, and then projects the watermark into a high-dimensional space by 1×1 convolution to achieve adaptability in the channel dimension. As a result, the modification introduced by the embedded watermark on the cover image is spread over more pixels. Because the magnitudes and convergence rates of the individual loss functions differ, an adaptive loss weight assignment strategy is proposed so that the weights participate in network training and are adjusted dynamically. Further, a high-frequency wavelet loss is proposed, by which the watermark is restricted to the low-frequency wavelet sub-bands, thereby enhancing robustness against image compression. Experimental results show that the peak signal-to-noise ratio (PSNR) of the encoded image reaches 40.12 dB, the structural similarity (SSIM) reaches 0.9721, and the watermark is robust against various types of noise.
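As an illustration of the adaptive loss weight assignment idea, the sketch below uses the common learnable-uncertainty formulation, in which each loss carries a trainable scale that is optimized jointly with the network; the paper's exact weighting rule may differ.

```python
import torch
import torch.nn as nn

class AdaptiveLossWeights(nn.Module):
    """Learnable per-loss weights trained jointly with the network (a sketch).

    Follows the common homoscedastic-uncertainty formulation
    L = sum_i exp(-s_i) * L_i + s_i, where the s_i are free parameters;
    this is an assumption, not necessarily the paper's exact rule.
    """
    def __init__(self, num_losses):
        super().__init__()
        self.log_vars = nn.Parameter(torch.zeros(num_losses))

    def forward(self, losses):
        total = 0.0
        for s, loss in zip(self.log_vars, losses):
            total = total + torch.exp(-s) * loss + s  # small s -> large effective weight
        return total

# Usage sketch: register the weighter's parameters with the same optimizer as the
# encoder/decoder so the weights adapt dynamically during training, e.g.
#   weighter = AdaptiveLossWeights(num_losses=3)
#   total_loss = weighter([image_loss, message_loss, wavelet_loss])
```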
Abstract: In recent years, deep learning has been introduced into the field of single-pixel imaging (SPI), garnering significant attention. However, conventional networks still exhibit limitations in preserving image details. To address this issue, we integrate Large Kernel Convolution (LKconv) into the U-Net framework, proposing an enhanced network structure named the U-LKconv network, which significantly improves the recovery of image details even under low sampling conditions.
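A minimal sketch of how a large-kernel convolution block could replace a plain convolution block inside a U-Net encoder or decoder is given below; the kernel size, normalization, and activation are assumptions rather than the published U-LKconv design.

```python
import torch.nn as nn

class LKConvBlock(nn.Module):
    """Large-kernel replacement for a plain U-Net conv block (illustrative).

    A depthwise large-kernel convolution captures a wide receptive field cheaply,
    and a 1x1 convolution mixes channels; the actual U-LKconv block may differ.
    """
    def __init__(self, in_ch, out_ch, kernel_size=13):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 1),                         # channel projection
            nn.Conv2d(out_ch, out_ch, kernel_size,
                      padding=kernel_size // 2, groups=out_ch),  # depthwise large kernel
            nn.BatchNorm2d(out_ch),
            nn.GELU(),
        )

    def forward(self, x):
        return self.block(x)
```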
Funding: Supported in part by the National Natural Science Foundation of China under Grant No. 61976127, the Shandong Provincial Natural Science Foundation under Grant No. ZR2024MF030, the Taishan Scholar Program of Shandong Province of China under Grant No. tsqn202306150, and the Key Research and Development Program of Shandong Province of China under Grant No. 2025CXPT096.
Abstract: Graph convolutional networks (GCNs) have become a dominant approach for skeleton-based action recognition. Although GCNs have made significant progress in modeling skeletons as spatial-temporal graphs, they often require stacking multiple graph convolution layers to capture long-distance relationships among nodes. This stacking not only increases the computational burden but also raises the risk of over-smoothing, which can lead to the neglect of crucial local action features. To address this issue, we propose a novel multi-scale adaptive large kernel graph convolutional network (MSLK-GCN) that aggregates local and global spatio-temporal correlations while maintaining computational efficiency. The core components of the network are two multi-scale large kernel graph convolution (LKGC) modules, a multi-channel adaptive graph convolution (MAGC) module, and a multi-scale temporal self-attention convolution (MSTC) module. The LKGC module adaptively focuses on active motion regions by combining a large convolution kernel with a gating mechanism, effectively capturing long-distance dependencies within the skeleton sequence. Meanwhile, the MAGC module dynamically learns the relationships between different joints by adjusting the connection weights between nodes. To further strengthen the capture of temporal dynamics, the MSTC module aggregates temporal information by integrating Efficient Channel Attention (ECA) with multi-scale convolution. In addition, we use a multi-stream fusion strategy to make full use of the different skeleton modalities, including bone, joint, joint motion, and bone motion. Exhaustive experiments on three scale-varying datasets, i.e., NTU-60, NTU-120, and NW-UCLA, demonstrate that MSLK-GCN achieves state-of-the-art performance with fewer parameters.
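The multi-stream strategy can be illustrated with a short sketch that derives the four skeleton modalities and fuses per-stream scores; equal fusion weights are assumed here, and the paper's preprocessing and weighting may differ.

```python
import torch

def make_streams(joints, parents):
    """Derive the four skeleton modalities used for multi-stream fusion (sketch).

    joints: (batch, frames, num_joints, 3) coordinate tensor; parents: list giving
    each joint's parent index (root mapped to itself). Motion streams are one
    frame shorter; padding conventions are left out of this sketch.
    """
    bones = joints - joints[:, :, parents, :]       # bone = joint - parent joint
    joint_motion = joints[:, 1:] - joints[:, :-1]   # temporal difference of joints
    bone_motion = bones[:, 1:] - bones[:, :-1]      # temporal difference of bones
    return {"joint": joints, "bone": bones,
            "joint_motion": joint_motion, "bone_motion": bone_motion}

def fuse_scores(score_list):
    """Equal-weight late fusion of per-stream (batch, num_classes) scores."""
    return torch.stack(score_list).sum(dim=0).argmax(dim=1)
```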
Funding: Supported by the Ministry of Education Humanities and Social Science Research Project (No. 23YJAZH034), the Postgraduate Research and Practice Innovation Program of Jiangsu Province (Nos. SJCX24_2147, SJCX24_2148), the National Computer Basic Education Research Project in Higher Education Institutions (Nos. 2024-AFCEC-056, 2024-AFCEC-057), and the Enterprise Collaboration Project (Nos. Z421A22349, Z421A22304, Z421A210045).
Abstract: Printed circuit boards (PCBs) provide stable connections between electronic components. However, a defective printed circuit board may cause the entire equipment system to malfunction, resulting in incalculable losses. It is therefore crucial to detect defective printed circuit boards during the production process. Traditional detection methods have low accuracy in detecting subtle defects against complex backgrounds. To improve the detection accuracy of surface defects on industrial printed circuit boards, this paper proposes a residual large kernel network based on YOLOv5 (You Only Look Once version 5) for PCB surface defect detection, called YOLO-RLC (You Only Look Once-Residual Large Kernel). A deep large-kernel backbone is built to expand the effective field of view and capture global information more efficiently, with 1×1 convolutions used to balance the depth of the model and re-parameterization methods used to improve feature extraction efficiency. The neck network introduces a bidirectional weighted feature fusion network, combined with a new noise filter and feature enhancement extractor, to eliminate noise generated by information fusion and recalibrate information from different channels, improving the quality of deep features. The aspect ratio of the bounding box is simplified to alleviate the issue of specificity values. After training and testing on the PCB defect dataset, our method achieves an average accuracy of 97.3% (mAP50) over multiple experiments, which is 4.1% higher than YOLOv5-S, with an average accuracy of 97.6% and a frame rate of 76.7 FPS. The comparative analysis also demonstrates the superior performance and feasibility of YOLO-RLC in PCB defect detection.
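The re-parameterization mentioned above is commonly realized by folding a parallel small-kernel branch into the large-kernel convolution at inference time. The sketch below shows that standard merge (batch normalization is assumed to be already folded into both convolutions, and both branches are assumed to share stride, groups, and channel counts); the specific branches used in YOLO-RLC are not detailed in the abstract.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def merge_parallel_kernels(large_conv: nn.Conv2d, small_conv: nn.Conv2d) -> nn.Conv2d:
    """Fold a parallel small-kernel branch into a large-kernel conv for inference.

    Structural re-parameterization: zero-pad the small kernel to the large
    kernel's size, then sum the weights and biases into a single convolution.
    """
    k_large = large_conv.kernel_size[0]
    k_small = small_conv.kernel_size[0]
    pad = (k_large - k_small) // 2
    merged = nn.Conv2d(large_conv.in_channels, large_conv.out_channels,
                       k_large, padding=k_large // 2,
                       groups=large_conv.groups, bias=True)
    # Pad the small kernel spatially so both weight tensors have the same shape.
    merged.weight.data = large_conv.weight.data + F.pad(small_conv.weight.data, [pad] * 4)
    b_large = large_conv.bias.data if large_conv.bias is not None else torch.zeros(large_conv.out_channels)
    b_small = small_conv.bias.data if small_conv.bias is not None else torch.zeros(small_conv.out_channels)
    merged.bias.data = b_large + b_small
    return merged
```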
Funding: Supported by the National Natural Science Foundation of China (No. 62176034), the Science and Technology Research Program of Chongqing Municipal Education Commission (No. KJZD-M202300604), and the Natural Science Foundation of Chongqing (Nos. cstc2021jcyj-msxmX0518, 2023NSCQ-MSX1781).
Abstract: Automatic crack detection on cement pavement has benefited greatly from the rapid development of deep learning, with convolutional neural networks (CNNs) playing an important role in this field. However, as the performance of crack detection on cement pavement improves, the depth and width of network structures increase significantly, which requires more computing power and storage space. This limitation hampers the practical deployment of crack detection models on various platforms, particularly portable devices such as small mobile devices. To solve these problems, we propose a dual-encoder network architecture that extracts more comprehensive crack feature information and combines cross-fusion modules with coordinate attention mechanisms for more efficient feature fusion. First, we use small-channel convolutions to construct a shallow feature extraction module (SFEM) that extracts low-level crack information from cement pavement images, in order to obtain more information about cracks in the shallow features of the images. In addition, we construct a large kernel atrous convolution (LKAC) module to enhance crack information; it incorporates a coordinate attention mechanism to filter out non-crack information and uses large kernel atrous convolutions with different kernels, whose different receptive fields extract more detailed edge and context information. Finally, the three-stage feature maps output by the shallow feature extraction module are cross-fused with the two-stage feature maps output by the large kernel atrous convolution module, so that shallow features and detailed edge features are fully fused to obtain the final crack prediction map. We evaluate our method on three public crack datasets: DeepCrack, CFD, and Crack500. Experimental results on the DeepCrack dataset demonstrate the effectiveness of our method compared to state-of-the-art crack detection methods, achieving a Precision (P) of 87.2%, a Recall (R) of 87.7%, and an F-score (F1) of 87.4%. Thanks to our lightweight crack detection model, the parameter count in real-world detection scenarios has been reduced to less than 2M. This advancement also provides technical support for portable scene detection.
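To illustrate the multi-dilation idea behind an LKAC-style module, the sketch below runs parallel depthwise large-kernel atrous convolutions with different dilation rates and fuses them with a 1×1 convolution; the kernel size and dilation rates are assumptions, and the coordinate attention filtering used in the paper is omitted.

```python
import torch
import torch.nn as nn

class LargeKernelAtrousBlock(nn.Module):
    """Parallel large-kernel atrous (dilated) convolutions with different
    receptive fields, fused by a 1x1 convolution (an illustrative sketch)."""
    def __init__(self, channels, kernel_size=7, dilations=(1, 2, 3)):
        super().__init__()
        # Padding d * (kernel_size // 2) keeps the spatial size unchanged for odd kernels.
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, kernel_size,
                      padding=d * (kernel_size // 2), dilation=d, groups=channels)
            for d in dilations
        ])
        self.fuse = nn.Conv2d(channels * len(dilations), channels, 1)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1)) + x
```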