In the textile industry, the presence of defects on the fabric surface is an essential factor in determining fabric quality, so identifying fabric defects is a crucial part of the production process. Traditional fabric defect detection algorithms can only handle specific materials and specific defect types; in addition, their detection efficiency is low and their results are relatively poor. Deep learning-based methods have many advantages in fabric defect detection; however, they are less effective at identifying multi-scale defects and defects with complex shapes. Therefore, we propose an effective algorithm for fabric defect detection, namely multilayer feature extraction combined with deformable convolution (MFDC). In MFDC, multilayer feature extraction fuses low-level location features with high-level classification features through a top-down architecture with lateral connections, improving the detection of multi-scale fabric defects. On this basis, a deformable convolution is added to address the algorithm's weak detection of irregularly shaped defects. RoI Align and Cascade R-CNN are also integrated to improve the algorithm's adaptability to materials with complex patterned backgrounds. The experimental results show that MFDC achieves good detection results for both multi-scale fabric defects and defects with complex shapes, at the expense of a small increase in detection time.
Deep convolutional neural networks have made great progress in semantic segmentation. However, because the geometry of the convolution kernel is fixed, standard convolutional neural networks are limited in their ability to model geometric transformations. Therefore, a deformable convolution is introduced to enhance the adaptability of convolutional networks to spatial transformations. In addition, because of the pooling layers in their architecture, deep convolutional neural networks cannot adequately segment local objects at the output layer. To overcome this shortcoming, the coarse segmentation predictions from the network's output layer are processed by fully connected conditional random fields to improve segmentation quality. The proposed method can easily be trained end-to-end using standard backpropagation. Finally, the proposed method is tested on the ISPRS dataset. The results show that it effectively overcomes the influence of the complex structure of the segmented objects and obtains state-of-the-art accuracy on the ISPRS Vaihingen 2D semantic labeling dataset.
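To make the contrast with a fixed kernel geometry concrete, here is a minimal pure-Python sketch of a 3x3 deformable convolution evaluated at a single output location. It is not taken from any of the listed papers: the function names, the list-of-lists image, and the layout of `offsets` (one `(dy, dx)` pair per kernel tap) are assumptions made for illustration. In a real layer the offsets would be predicted by an additional convolution branch.

```python
import math

def bilinear_sample(img, y, x):
    """Sample a 2D list-of-lists image at fractional (y, x); outside is zero."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                # Bilinear weight of the integer neighbor (yy, xx).
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                value += weight * img[yy][xx]
    return value

def deformable_conv_at(img, kernel, offsets, cy, cx):
    """Response of a 3x3 deformable convolution at output location (cy, cx).

    kernel:  3x3 list of weights.
    offsets: 3x3 list of (dy, dx) pairs, one per kernel tap (hypothetical
             layout chosen for this sketch).
    """
    out = 0.0
    for i in range(3):
        for j in range(3):
            dy, dx = offsets[i][j]
            # Regular grid position plus the fractional offset for this tap.
            out += kernel[i][j] * bilinear_sample(
                img, cy + i - 1 + dy, cx + j - 1 + dx)
    return out

# 4x4 ramp image: img[y][x] = x + 10 * y
img = [[x + 10 * y for x in range(4)] for y in range(4)]
ones = [[1.0] * 3 for _ in range(3)]
zero_offsets = [[(0.0, 0.0)] * 3 for _ in range(3)]
shifted = [[(0.0, 0.5)] * 3 for _ in range(3)]

regular = deformable_conv_at(img, ones, zero_offsets, 1, 1)
deformed = deformable_conv_at(img, ones, shifted, 1, 1)
```

With all offsets at zero the function reduces to an ordinary 3x3 convolution; shifting every tap by half a pixel to the right samples between grid points, which is how the receptive field can follow an object's shape.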
This paper presents CW-HRNet, a high-resolution, lightweight crack segmentation network designed to address challenges in complex scenes with slender, deformable, and blurred crack structures. The model incorporates two key modules: Constrained Deformable Convolution (CDC), which stabilizes geometric alignment by applying a tanh limiter and a learnable scaling factor to the predicted offsets, and the Wavelet Frequency Enhancement Module (WFEM), which decomposes features using Haar wavelets to preserve low-frequency structures while enhancing high-frequency boundaries and textures. Evaluations on the CrackSeg9k benchmark demonstrate CW-HRNet's superior performance: it achieves 82.39% mIoU with only 7.49M parameters and 10.34 GFLOPs, outperforming HrSegNet-B48 by 1.83% in segmentation accuracy with minimal complexity overhead. The model also shows strong cross-dataset generalization, achieving 60.01% mIoU and 66.22% F1 on Asphalt3k without fine-tuning. These results highlight CW-HRNet's favorable accuracy-efficiency trade-off for real-world crack segmentation tasks.
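The offset constraint described for CDC can be illustrated with a short sketch. The abstract only states that a tanh limiter and a learnable scaling factor are applied to the predicted offsets; the exact composition used below (scale multiplied by tanh of the raw offset) and the name `constrain_offsets` are assumptions, not the authors' implementation.

```python
import math

def constrain_offsets(raw_offsets, scale):
    """Bound each raw offset to the interval [-scale, scale].

    Assumed form: scale * tanh(raw). `scale` stands in for the learnable
    scaling factor; here it is just a plain float.
    """
    return [scale * math.tanh(r) for r in raw_offsets]

raw = [-50.0, -0.1, 0.0, 0.1, 50.0]
bounded = constrain_offsets(raw, 2.0)
```

Small offsets pass through almost unchanged (tanh is close to the identity near zero), while very large predictions saturate at the scale, so a single bad prediction cannot move a sampling point arbitrarily far from the regular grid.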
The intensive application of deep learning in medical image processing has advanced research on automatic retinal vessel segmentation. To overcome the limitation that traditional U-shaped vessel segmentation networks fail to extract sufficient features from fundus images, we propose a novel network (DSeU-net) based on deformable convolution and a squeeze-excitation residual module. The deformable convolution dynamically adjusts the receptive field for feature extraction of retinal vessels. The squeeze-excitation residual module scales the weights of the low-level features so that the network efficiently learns the complex relationships among different feature layers. We validate DSeU-net on three public retinal vessel segmentation datasets, DRIVE, CHASEDB1, and STARE, and the experimental results demonstrate the satisfactory segmentation performance of the network.
Pulmonary nodules represent an early manifestation of lung cancer. However, pulmonary nodules constitute only a small portion of the overall image, which makes image interpretation difficult for physicians and can lead to false positives or missed detections. To solve these problems, the YOLOv8 network is enhanced by adding deformable convolution and atrous spatial pyramid pooling (ASPP), along with a coordinate attention (CA) mechanism. This allows the network to focus on small targets while expanding the receptive field without losing resolution. At the same time, context information about the target is gathered, and feature expression is enhanced by attention modules in different directions. The method effectively improves positioning accuracy and achieves good results on the LUNA16 dataset. Compared with other detection algorithms, it improves the accuracy of pulmonary nodule detection to a certain extent.
Flooding and heavy rainfall under extreme weather conditions pose significant challenges to target detection algorithms. Traditional methods often struggle with image blurring, dynamic noise interference, and variations in target scale. Convolutional neural network (CNN)-based target detection approaches face notable limitations in such adverse weather scenarios, primarily because their fixed geometric sampling structures hinder adaptation to complex backgrounds and dynamically changing object appearances. To address these challenges, this paper proposes an optimized YOLOv9 model incorporating an improved deformable convolutional network (DCN) enhanced with a multi-scale dilated attention (MSDA) mechanism. Specifically, the DCN module improves the model's adaptability to target deformation and noise interference by adaptively adjusting the sampling grid positions, and it also integrates feature amplitude modulation to further improve robustness. Additionally, the MSDA module is introduced to capture contextual features across multiple scales, addressing the target occlusion and scale variation commonly encountered in flood-affected environments. Experimental evaluations are conducted on the ISE-UFDS and UA-DETRAC datasets. The results demonstrate that the proposed model significantly outperforms state-of-the-art methods in key evaluation metrics, including precision, recall, F1-score, and mean Average Precision (mAP). Notably, the model exhibits superior robustness and generalization performance under simulated severe weather conditions, offering reliable technical support for disaster emergency response systems. This study contributes to enhancing the accuracy and real-time capabilities of flood early warning systems, thereby supporting more effective disaster mitigation strategies.
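For readers comparing the detection abstracts in this listing, the relationship between precision, recall, and F1-score can be written down in a few lines. This is the standard definition rather than anything specific to the paper above; the function name and the example counts are made up for illustration.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical counts: 8 correct detections, 2 false alarms, 8 missed targets.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
```

Because F1 is a harmonic mean, it stays low whenever either precision or recall is low, which is why it is usually reported alongside both.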
Background: Exploring correspondences across multiview images is the basis of various computer vision tasks. However, most existing methods have limited accuracy under challenging conditions. Method: To learn more robust and accurate correspondences, we propose DSD-MatchingNet for local feature matching. First, we develop a deformable feature extraction module to obtain multilevel feature maps, which harvest contextual information from dynamic receptive fields. The dynamic receptive fields provided by the deformable convolution network ensure that our method obtains dense and robust correspondences. Second, we use sparse-to-dense matching with symmetry of correspondence to implement accurate pixel-level matching, which enables our method to produce more accurate correspondences. Result: Experiments show that DSD-MatchingNet achieves better performance on the image matching benchmark as well as on the visual localization benchmark. Specifically, our method achieved 91.3% mean matching accuracy on the HPatches dataset and 99.3% visual localization recall on the Aachen Day-Night dataset.
University laboratories are complex environments with heavy personnel flow, and irregular behavior by personnel can easily create safety risks. Monitoring with mainstream detection algorithms suffers from low detection accuracy and slow speed. As a result, the management of personnel behavior currently relies mainly on institutional constraints, education and training, on-site supervision, and similar measures, which are time-consuming and ineffective. Given this situation, this paper proposes an improved You Only Look Once version 7 (YOLOv7) model to detect irregular behaviors of laboratory personnel quickly while maintaining high detection accuracy. First, to better capture the shape features of the target, deformable convolutional networks (DCN) replace the traditional convolution in the backbone of the model to improve detection accuracy and speed. Second, to enhance the extraction of important features and suppress useless ones, this paper proposes a new convolutional block attention module_efficient channel attention (CBAM_E), embedded in the neck network, to improve the model's ability to extract features from complex scenes. Finally, to reduce the influence of the angle factor and improve bounding box regression accuracy, this paper proposes a new α-SCYLLA intersection over union (α-SIoU) loss in place of the complete intersection over union (CIoU), which improves regression accuracy while increasing convergence speed. Comparison experiments on public and self-built datasets show that the improved algorithm outperforms the original algorithm on all evaluation indexes, with increases of 2.92% in precision, 4.14% in recall, 0.0356 in the weighted harmonic mean, and 3.60% in mAP@0.5, together with a reduction in the number of parameters and in complexity. Compared with mainstream algorithms, the improved algorithm has higher detection accuracy, faster convergence, and better recognition in practice, indicating its effectiveness and its potential for practical application in laboratory scenarios.
In the age of smart technology, the widespread use of small liquid crystal displays (LCDs) necessitates pre-market defect detection to ensure quality and reduce the incidence of defective products. Manual inspection is both time-consuming and labor-intensive. Existing methods struggle to accurately detect small targets, such as point defects, and to handle defects with significant scale variations, such as line defects, especially against complex backgrounds. To address these challenges, this paper presents the YOLO-DEI (Deep Enhancement Information) model, which integrates DCNv2 (deformable convolution) into the backbone network to enhance feature extraction under geometric transformations. The model also includes the CEG (Contextual Enhancement Group) module to optimize feature aggregation during extraction, improving performance without increasing computational load. Furthermore, the proposed IGF (Information Guide Fusion) module refines feature fusion in the neck network, preserving both spatial and channel information. Experimental results indicate that the YOLO-DEI model increases precision by 2.9%, recall by 13.3%, and mean Average Precision (mAP50) by 12.9%, all while maintaining comparable parameter counts and computational costs. These significant improvements in defect detection performance highlight the model's potential for practical applications in ensuring LCD quality.
Background: Document images such as statistical reports and scientific journals are widely used in information technology. Accurate detection of table areas in document images is an essential prerequisite for tasks such as information extraction. However, because of the diversity in the shapes and sizes of tables, existing table detection methods adapted from general object detection algorithms have not yet achieved satisfactory results. Incorrect detection results might lead to the loss of critical information. Methods: We therefore propose a novel end-to-end trainable deep network combined with a self-supervised pretraining transformer for feature extraction to minimize incorrect detections. To better handle table areas of different shapes and sizes, we add a dual-branch context content attention module (DCCAM) to the high-dimensional features to extract context content information, thereby enhancing the network's ability to learn shape features. For feature fusion at different scales, we replace the original 3×3 convolution with a multilayer residual module, which contains enhanced gradient flow information to improve feature representation and extraction. Results: We evaluated our method on public document datasets and compared it with previous methods; it achieved state-of-the-art results on evaluation metrics such as recall and F1-score. https://github.com/Yong Z-Lee/TD-DCCAM.
Retinal vessel segmentation in fundus images plays an essential role in the screening, diagnosis, and treatment of many diseases. Acquired fundus images generally suffer from uneven illumination, high noise, and complex structure, which makes vessel segmentation very challenging. Previous retinal vessel segmentation methods mainly use convolutional neural networks based on U-Net models, and they have many limitations, such as the loss of microvascular details at the ends of vessels. We address the limitations of convolution by introducing the transformer into retinal vessel segmentation, and propose a hybrid method based on modulated deformable convolution and the transformer, named DT-Net. First, multi-scale image features are extracted by deformable convolution and multi-head self-attention (MHSA). Second, image information is recovered and vessel morphology is refined by the proposed transformer decoder block. Finally, local prediction results are obtained from the side output layer. A hybrid loss function further improves the accuracy of vessel segmentation. Experimental results show that our method obtains good segmentation performance in terms of specificity (SP), sensitivity (SE), accuracy (ACC), area under the curve (AUC), and F1-score on three publicly available fundus datasets: DRIVE, STARE, and CHASE_DB1.
To address the shortcomings of current fatigue detection methods, such as low accuracy or poor real-time performance, a fatigue detection method based on multi-feature fusion is proposed. First, the HOG face detection algorithm and the KCF target tracking algorithm are integrated to quickly track detected faces and extract continuous, stable target faces for more efficient extraction, and a deformable convolutional neural network is introduced to identify the state of the extracted eyes and mouth. Then, a head pose algorithm is introduced to detect the driver's head in real time and obtain head state information. Finally, a multi-feature fusion fatigue detection method is proposed based on the state of the eyes, mouth, and head. According to the experimental results, the proposed method can detect the driver's fatigue state in real time, with high accuracy and good robustness compared with current fatigue detection algorithms.
Document images often contain various page components and complex logical structures, which make document layout analysis challenging. Most deep learning-based document layout analysis methods adopt convolutional neural networks (CNNs) as feature extraction networks. In this paper, a hybrid spatial-channel attention network (HSCA-Net) is proposed to improve feature extraction by introducing an attention mechanism that explores more salient properties within document pages. HSCA-Net consists of a spatial attention module (SAM), a channel attention module (CAM), and a purpose-designed lateral attention connection. CAM adaptively adjusts channel feature responses by emphasizing selective information, depending on the contribution of the features of each channel. SAM guides the CNN to focus on informative content and capture global context information among page objects. The lateral attention connection incorporates SAM and CAM into a multiscale feature pyramid network and thus retains the original feature information. The effectiveness and adaptability of HSCA-Net are evaluated through multiple experiments on publicly available datasets such as PubLayNet, ICDAR-POD, and Article Regions. Experimental results demonstrate that HSCA-Net achieves state-of-the-art performance on the document layout analysis task.
The DC-YOLO model was designed to improve the efficiency of appraising the danger class of buildings and to avoid manual intervention, thereby making the appraisal results more objective. It is an automated method, based on deep learning and target detection algorithms, for appraising the danger class of building masonry components. Specifically, it (1) adopts K-means clustering to obtain the number and size of the prior boxes; (2) expands the grid size to improve the identification of small targets; and (3) introduces deformable convolution to adapt to the irregular shape of cracks in masonry components. The experimental results show that, compared with the conventional method, the DC-YOLO model improves recognition rates for the various targets to different extents and achieves good precision, recall, and F1 values, which indicates good performance in classifying the danger classes of building masonry components.
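Step (1) above, clustering box sizes to obtain prior boxes, can be sketched as follows. The abstract does not say which distance measure was used; this sketch assumes the 1 - IoU distance that is common in YOLO-style anchor clustering, and all names (`iou_wh`, `kmeans_anchors`) and the sample boxes are hypothetical. The number of clusters `k` is passed in by the caller; choosing it (the "number" of prior boxes) would typically involve comparing results for several values.

```python
def iou_wh(box, centroid):
    """IoU of two (width, height) boxes aligned at a shared corner."""
    inter = min(box[0], centroid[0]) * min(box[1], centroid[1])
    union = box[0] * box[1] + centroid[0] * centroid[1] - inter
    return inter / union

def kmeans_anchors(boxes, k, iterations=100):
    """Cluster (width, height) pairs using distance d = 1 - IoU."""
    # Deterministic start: k boxes spread evenly through the area-sorted list.
    ordered = sorted(boxes, key=lambda b: b[0] * b[1])
    step = len(ordered) / k
    centroids = [ordered[int(step * i + step / 2)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            # Highest IoU is the same as smallest 1 - IoU distance.
            best = max(range(k), key=lambda c: iou_wh(box, centroids[c]))
            clusters[best].append(box)
        updated = []
        for c, members in enumerate(clusters):
            if members:
                updated.append((sum(b[0] for b in members) / len(members),
                                sum(b[1] for b in members) / len(members)))
            else:
                updated.append(centroids[c])  # keep centroid of an empty cluster
        if updated == centroids:
            break  # assignments have stabilized
        centroids = updated
    return sorted(centroids, key=lambda b: b[0] * b[1])

# Hypothetical labeled box sizes in pixels: three small, three large.
boxes = [(10, 12), (11, 13), (12, 11), (100, 40), (110, 45), (95, 42)]
anchors = kmeans_anchors(boxes, k=2)
```

Using IoU instead of Euclidean distance keeps large boxes from dominating the clustering, since the measure depends on relative overlap rather than on absolute pixel differences.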
Background: Recurrent recovery is a common method for video super-resolution (VSR) that models the correlation between frames via hidden states. However, applying this structure in real-world scenarios can lead to unsatisfactory artifacts. We found that, in real-world VSR training, the use of unknown and complex degradation better simulates the real-world degradation process. Methods: Based on this, we propose the RealFuVSR model, which simulates real-world degradation and mitigates artifacts caused by the VSR. Specifically, we propose a multiscale feature extraction (MSF) module that extracts and fuses features from multiple scales, thereby facilitating the elimination of hidden state artifacts. To improve the accuracy of the hidden state alignment information, RealFuVSR uses an advanced optical flow-guided deformable convolution. Moreover, a cascaded residual upsampling module is used to eliminate noise caused by the upsampling process. Results: The experiments demonstrate that the RealFuVSR model not only recovers high-quality videos but also outperforms the state-of-the-art RealBasicVSR and RealESRGAN models.
Space-time video super-resolution (STVSR) aims to reconstruct high-resolution, high-frame-rate videos from their low-resolution, low-frame-rate counterparts. Recent approaches use end-to-end deep learning models to achieve STVSR. They first interpolate intermediate frame features between given frames, then perform local and global refinement over the feature sequence, and finally increase the spatial resolution of these features. However, in the most important phase, feature interpolation, they capture spatial-temporal information only from the most adjacent frame features, ignoring the long-term spatial-temporal correlations between multiple neighbouring frames that are needed to restore variable-speed object movements and maintain long-term motion continuity. In this paper, we propose a novel long-term temporal feature aggregation network (LTFA-Net) for STVSR. Specifically, we design a long-term mixture of experts (LTMoE) module for feature interpolation. LTMoE contains multiple experts that extract mutual and complementary spatial-temporal information from multiple consecutive adjacent frame features; their outputs are then combined with different weights, produced by several gating nets, to obtain the interpolation results. Next, we perform local and global feature refinement using the Locally-temporal Feature Comparison (LFC) module and a bidirectional deformable ConvLSTM layer, respectively. Experimental results on two standard benchmarks, Adobe240 and GoPro, indicate the effectiveness and superiority of our approach over the state of the art.
The YOLOv5 algorithm is widely used in edge computing systems for object detection. However, the limited computing resources of embedded devices and the large model size of existing deep learning-based methods make real-time object detection on edge devices difficult. To address this issue, we propose a smaller, less computationally intensive, and more accurate algorithm for object detection. Multi-scale Feature Fusion-YOLO (MFF-YOLO) is built on top of the YOLOv5s framework but contains substantial improvements over it. First, we design the MFF module to improve the feature propagation path in the feature pyramid, which further integrates the semantic information from different paths of feature layers. Then, a large convolution-kernel module is used in the bottleneck. This structure enlarges the receptive field and preserves shallow semantic information, which overcomes the performance limitation arising from uneven propagation in Feature Pyramid Networks (FPN). In addition, a multi-branch downsampling method based on depthwise separable convolutions and a bottleneck structure with deformable convolutions are designed to reduce the complexity of the backbone network and minimize the real-time performance loss caused by increased model complexity. The experimental results on the PASCAL VOC and MS COCO datasets show that, compared with YOLOv5s, MFF-YOLO reduces the number of parameters by 7% and the number of floating-point operations (FLOPs) by 11.8%. The mAP@0.5 improves by 3.7% and 5.5%, and the mAP@0.5:0.95 improves by 6.5% and 6.2%, respectively. Furthermore, compared with YOLOv7-tiny, PP-YOLO-tiny, and other mainstream methods, MFF-YOLO achieves better results on multiple indicators.
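The reason depthwise separable convolutions reduce backbone complexity can be checked with simple arithmetic. The sketch below counts weights only (biases are ignored), and the channel sizes in the example are illustrative assumptions rather than values from MFF-YOLO.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (no bias)."""
    return k * k * c_in * c_out

def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise

# Illustrative layer: 64 input channels, 128 output channels, 3 x 3 kernel.
full = standard_conv_params(64, 128, 3)
separable = depthwise_separable_params(64, 128, 3)
ratio = separable / full
```

The ratio equals 1/c_out + 1/k^2, so for a 3x3 kernel the separable form needs a bit more than one ninth of the weights of the standard convolution.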
Funding (MFDC fabric defect detection study): supported in part by the National Science Foundation of China under Grant 62001236 and in part by the Natural Science Foundation of the Jiangsu Higher Education Institutions of China under Grant 20KJA520003.
Funding (deformable convolution semantic segmentation study on ISPRS Vaihingen): National Key Research and Development Program of China (No. 2017YFC0405806).
Funding (DSeU-net study): Beijing Natural Science Foundation (No. IS23112); Beijing Institute of Technology Research Fund Program for Young Scholars (No. 6120220236).
Funding: Financially supported by the National Key R&D Program of China (No. 2022YFC3090603) and the R&D Program of Beijing Municipal Education Commission (No. KZ202211417049).
Abstract: Flooding and heavy rainfall under extreme weather conditions pose significant challenges to target detection algorithms. Traditional methods often struggle with image blurring, dynamic noise interference, and variations in target scale. Convolutional neural network (CNN)-based target detection approaches face notable limitations in such adverse weather scenarios, primarily because their fixed geometric sampling structures hinder adaptation to complex backgrounds and dynamically changing object appearances. To address these challenges, this paper proposes an optimized YOLOv9 model incorporating an improved deformable convolutional network (DCN) enhanced with a multi-scale dilated attention (MSDA) mechanism. Specifically, the DCN module enhances the model's adaptability to target deformation and noise interference by adaptively adjusting the sampling grid positions, while also integrating feature amplitude modulation to further improve robustness. Additionally, the MSDA module is introduced to capture contextual features across multiple scales, effectively addressing the target occlusion and scale variation commonly encountered in flood-affected environments. Experimental evaluations are conducted on the ISE-UFDS and UA-DETRAC datasets. The results demonstrate that the proposed model significantly outperforms state-of-the-art methods in key evaluation metrics, including precision, recall, F1-score, and mean average precision (mAP). Notably, the model exhibits superior robustness and generalization performance under simulated severe weather conditions, offering reliable technical support for disaster emergency response systems. This study contributes to enhancing the accuracy and real-time capabilities of flood early warning systems, thereby supporting more effective disaster mitigation strategies.
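The two ideas named in this abstract, shifting the sampling grid and modulating feature amplitude, can be illustrated with a minimal pure-Python sketch of one output value of a 3x3 modulated deformable convolution. This is not the paper's improved DCN: in a real network the offsets and masks are predicted by extra convolution layers, whereas here they are passed in by hand, and the names `bilinear` and `deform_conv_at` are hypothetical.

```python
import math


def bilinear(feature, y, x):
    """Bilinearly sample a 2D list at a fractional (y, x); pixels outside count as 0."""
    h, w = len(feature), len(feature[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= yi < h and 0 <= xi < w:
                weight = (1 - abs(y - yi)) * (1 - abs(x - xi))
                value += weight * feature[yi][xi]
    return value


def deform_conv_at(feature, kernel, offsets, masks, cy, cx):
    """One output value of a 3x3 modulated deformable convolution centred at (cy, cx).

    offsets[k] = (dy, dx) shifts the k-th sampling point off the regular grid;
    masks[k] in [0, 1] scales (modulates) the amplitude of that sample.
    Both are supplied by the caller in this sketch (assumption).
    """
    out = 0.0
    k = 0
    for ky in (-1, 0, 1):
        for kx in (-1, 0, 1):
            dy, dx = offsets[k]
            sample = bilinear(feature, cy + ky + dy, cx + kx + dx)
            out += kernel[ky + 1][kx + 1] * masks[k] * sample
            k += 1
    return out
```

With all offsets at zero and all masks at one, the function reduces to an ordinary 3x3 convolution, which is a convenient sanity check; non-zero offsets move each tap to a fractional position that is read by bilinear interpolation.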
Funding: Supported by the National Natural Science Foundation of China under Grants 61872241, 62077037 and 62272298, and in part by the Shanghai Municipal Science and Technology Major Project under Grant 2021SHZDZX0102.
Abstract: Background: Exploring correspondences across multiview images is the basis of various computer vision tasks. However, most existing methods have limited accuracy under challenging conditions. Method: To learn more robust and accurate correspondences, we propose DSD-MatchingNet for local feature matching. First, we develop a deformable feature extraction module to obtain multilevel feature maps, which harvest contextual information from dynamic receptive fields. The dynamic receptive fields provided by the deformable convolution network ensure that our method obtains dense and robust correspondences. Second, we utilize sparse-to-dense matching with symmetry of correspondence to implement accurate pixel-level matching, which enables our method to produce more accurate correspondences. Result: Experiments show that DSD-MatchingNet achieves better performance on the image matching benchmark as well as on the visual localization benchmark. Specifically, our method achieved 91.3% mean matching accuracy on the HPatches dataset and 99.3% visual localization recall on the Aachen Day-Night dataset.
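One common way to enforce symmetry of correspondence is a mutual nearest-neighbour check: a pair is kept only if each descriptor is the other's closest match. The sketch below assumes that reading; the abstract does not spell out how DSD-MatchingNet defines symmetry, and the function name `mutual_matches` and the use of squared Euclidean distance are illustrative choices.

```python
def mutual_matches(desc_a, desc_b):
    """Return (i, j) pairs where desc_a[i] and desc_b[j] are each other's nearest neighbour."""

    def sq_dist(u, v):
        # Squared Euclidean distance between two descriptors of equal length.
        return sum((p - q) ** 2 for p, q in zip(u, v))

    def nearest(query, pool):
        # Index of the descriptor in `pool` closest to `query`.
        return min(range(len(pool)), key=lambda idx: sq_dist(query, pool[idx]))

    a_to_b = [nearest(d, desc_b) for d in desc_a]
    b_to_a = [nearest(d, desc_a) for d in desc_b]
    # Keep a match only if it survives in both directions.
    return [(i, j) for i, j in enumerate(a_to_b) if b_to_a[j] == i]
```

The check discards one-sided matches, for example when two descriptors in the first image both pick the same descriptor in the second image, only the closer one is kept.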
Funding: This study was supported by the National Natural Science Foundation of China (No. 61861007), the Guizhou Provincial Department of Education Innovative Group Project (QianJiaohe KY[2021]012), and the Guizhou Science and Technology Plan Project (Guizhou Science Support [2023] General 412).
Abstract: University laboratories are complex environments with heavy personnel traffic, so irregular behavior by personnel can easily create safety risks. Monitoring with mainstream detection algorithms suffers from low detection accuracy and slow speed. As a result, the management of personnel behavior currently relies mainly on institutional constraints, education and training, and on-site supervision, which is time-consuming and ineffective. Given this situation, this paper proposes an improved You Only Look Once version 7 (YOLOv7) model that quickly detects irregular behaviors of laboratory personnel while ensuring high detection accuracy. First, to better capture the shape features of the target, deformable convolutional networks (DCN) replace the traditional convolution in the backbone of the model to improve detection accuracy and speed. Second, to enhance the extraction of important features and suppress useless ones, this paper proposes a new convolutional block attention module_efficient channel attention (CBAM_E), embedded in the neck network, to improve the model's ability to extract features from complex scenes. Finally, to reduce the influence of the angle factor on bounding-box regression accuracy, this paper proposes a new α-SCYLLA intersection over union (α-SIoU) to replace the complete intersection over union (CIoU), which improves regression accuracy while increasing convergence speed. Comparison experiments on public and homemade datasets show that the improved algorithm outperforms the original algorithm on all evaluation indexes, with increases of 2.92% in precision, 4.14% in recall, 0.0356 in the weighted harmonic mean, and 3.60% in mAP@0.5, together with a reduction in the number of parameters and complexity. Compared with mainstream algorithms, the improved algorithm has higher detection accuracy, faster convergence, and better recognition in practice, indicating its effectiveness and its potential for practical application in laboratory scenarios.
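Two of the quantities reported above can be made concrete with a short sketch: plain intersection over union (IoU), the base quantity that CIoU and SIoU extend with additional penalty terms, and the harmonic mean of precision and recall. The sketch assumes the equally weighted case (F1) and axis-aligned boxes in `(x1, y1, x2, y2)` form; it does not implement α-SIoU, whose exact definition is not given in the abstract.

```python
def iou(box_a, box_b):
    """Plain IoU of two axis-aligned boxes given as (x1, y1, x2, y2), x1 < x2 and y1 < y2."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Width and height of the overlap are clamped at zero for disjoint boxes.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


def f1_score(precision, recall):
    """Equally weighted harmonic mean of precision and recall (assumed weighting)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For instance, two 2x2 boxes that overlap in a 1x1 square share one unit of area out of seven in their union, so their IoU is 1/7.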
Abstract: In the age of smart technology, the widespread use of small liquid crystal displays (LCDs) necessitates pre-market defect detection to ensure quality and reduce the incidence of defective products. Manual inspection is both time-consuming and labor-intensive. Existing methods struggle to accurately detect small targets, such as point defects, and to handle defects with significant scale variations, such as line defects, especially against complex backgrounds. To address these challenges, this paper presents the YOLO-DEI (Deep Enhancement Information) model, which integrates DCNv2 (deformable convolution) into the backbone network to enhance feature extraction under geometric transformations. The model also includes the CEG (Contextual Enhancement Group) module to optimize feature aggregation during extraction, improving performance without increasing computational load. Furthermore, the proposed IGF (Information Guide Fusion) module refines feature fusion in the neck network, preserving both spatial and channel information. Experimental results indicate that the YOLO-DEI model increases precision by 2.9%, recall by 13.3%, and mean average precision (mAP50) by 12.9%, all while maintaining comparable parameter counts and computational costs. These significant improvements in defect detection performance highlight the model's potential for practical applications in ensuring LCD quality.
Abstract: Background: Document images such as statistical reports and scientific journals are widely used in information technology. Accurate detection of table areas in document images is an essential prerequisite for tasks such as information extraction. However, because of the diversity in the shapes and sizes of tables, existing table detection methods adapted from general object detection algorithms have not yet achieved satisfactory results. Incorrect detection results might lead to the loss of critical information. Methods: We therefore propose a novel end-to-end trainable deep network combined with a self-supervised pretraining transformer for feature extraction to minimize incorrect detections. To better deal with table areas of different shapes and sizes, we added a dual-branch context content attention module (DCCAM) to high-dimensional features to extract context content information, thereby enhancing the network's ability to learn shape features. For feature fusion at different scales, we replaced the original 3×3 convolution with a multilayer residual module, which contains enhanced gradient flow information to improve the feature representation and extraction capability. Results: We evaluated our method on public document datasets and compared it with previous methods; it achieved state-of-the-art results in terms of evaluation metrics such as recall and F1-score. https://github.com/Yong Z-Lee/TD-DCCAM.
Funding: Supported in part by the National Natural Science Foundation of China under Grant 61972267, the National Natural Science Foundation of Hebei Province under Grant F2018210148, and the University Science Research Project of Hebei Province under Grant ZD2021334.
Abstract: Retinal vessel segmentation in fundus images plays an essential role in the screening, diagnosis, and treatment of many diseases. Acquired fundus images generally suffer from uneven illumination, high noise, and complex structure, which makes vessel segmentation very challenging. Previous methods of retinal vascular segmentation mainly use convolutional neural networks built on U Network (U-Net) models, and they have many limitations and shortcomings, such as the loss of microvascular details at the ends of vessels. We address the limitations of convolution by introducing the transformer into retinal vessel segmentation, and propose a hybrid method based on modulated deformable convolution and the transformer, named DT-Net. First, multi-scale image features are extracted by deformable convolution and multi-head self-attention (MHSA). Second, image information is recovered and vessel morphology is refined by the proposed transformer decoder block. Finally, the local prediction results are obtained by the side output layer. A hybrid loss function further improves the accuracy of vessel segmentation. Experimental results show that our method obtains good segmentation performance in terms of specificity (SP), sensitivity (SE), accuracy (ACC), area under the curve (AUC), and F1-score on three publicly available fundus datasets: DRIVE, STARE, and CHASE_DB1.
Abstract: To address the shortcomings of current fatigue detection methods, such as low accuracy or poor real-time performance, a fatigue detection method based on multi-feature fusion is proposed. First, the HOG face detection algorithm and the KCF target tracking algorithm are integrated to track detected faces quickly and extract continuous, stable target faces, and a deformable convolutional neural network is introduced to identify the state of the extracted eyes and mouth. Then, a head pose algorithm is introduced to detect the driver's head in real time and obtain the driver's head state information. Finally, a multi-feature fusion fatigue detection method is proposed based on the state of the eyes, mouth, and head. According to the experimental results, the proposed method can detect the driver's fatigue state in real time with high accuracy and good robustness compared with current fatigue detection algorithms.
Abstract: Document images often contain various page components and complex logical structures, which make the document layout analysis task challenging. Most deep learning-based document layout analysis methods adopt convolutional neural networks (CNNs) as the feature extraction networks. In this paper, a hybrid spatial-channel attention network (HSCA-Net) is proposed to improve feature extraction capability by introducing an attention mechanism to explore more salient properties within document pages. HSCA-Net consists of a spatial attention module (SAM), a channel attention module (CAM), and a designed lateral attention connection. CAM adaptively adjusts channel feature responses by emphasizing selective information, depending on the contribution of the features of each channel. SAM guides CNNs to focus on informative contents and capture global context information among page objects. The lateral attention connection incorporates SAM and CAM into a multiscale feature pyramid network and thus retains the original feature information. The effectiveness and adaptability of HSCA-Net are evaluated through multiple experiments on publicly available datasets such as PubLayNet, ICDAR-POD, and Article Regions. Experimental results demonstrate that HSCA-Net achieves state-of-the-art performance on the document layout analysis task.
Funding: This work was supported by the National Key Research and Development Plan of China (2016YFC0801408) and the Graduate Science and Technology Innovation Project of Shandong University of Science and Technology (SDKDYC180344).
Abstract: The DC-YOLO model was designed to improve the efficiency of appraising the danger class of buildings and to avoid manual intervention, thereby making the appraisal results more objective. It is an automated method, based on deep learning and target detection algorithms, for appraising the danger class of building masonry components. Specifically, it (1) adopts K-means clustering to obtain the quantity and size of the prior boxes; (2) expands the grid size to improve the identification of small targets; and (3) introduces deformable convolution to adapt to the irregular shape of cracks in masonry components. The experimental results show that, compared with the conventional method, the DC-YOLO model achieves better recognition rates for the various targets to different extents and performs well in precision, recall, and F1 value, which indicates good performance in classifying the danger classes of building masonry components.
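Step (1) above, clustering to obtain prior-box sizes, is often done in YOLO-style detectors by running k-means on the labelled box widths and heights with 1 − IoU as the distance. The sketch below assumes that variant; the abstract does not state which distance DC-YOLO uses. The names `wh_iou` and `kmeans_anchors` are hypothetical, and the deterministic "first k boxes" initialisation is a simplification for reproducibility.

```python
def wh_iou(a, b):
    """IoU of two boxes given as (width, height), both anchored at the same corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(boxes, k, iterations=50):
    """Cluster (width, height) pairs into k prior boxes, assigning by highest IoU."""
    centroids = list(boxes[:k])  # deterministic init for the sketch; random init is more usual
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            # Highest IoU is equivalent to smallest (1 - IoU) distance.
            best = max(range(k), key=lambda c: wh_iou(box, centroids[c]))
            clusters[best].append(box)
        updated = []
        for c in range(k):
            members = clusters[c]
            if members:
                mean_w = sum(w for w, _ in members) / len(members)
                mean_h = sum(h for _, h in members) / len(members)
                updated.append((mean_w, mean_h))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid unchanged
        if updated == centroids:
            break  # assignments have stabilised
        centroids = updated
    return sorted(centroids)
```

The number of prior boxes (the "quantity" in the abstract) would be chosen separately, for example by repeating the clustering for several values of `k` and comparing the average IoU between boxes and their assigned centroid.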
Funding: Supported by the Open Project of the Ministry of Industry and Information Technology Key Laboratory of Performance and Reliability Testing and Evaluation for Basic Software and Hardware.
Abstract: Background: Recurrent recovery is a common method for video super-resolution (VSR) that models the correlation between frames via hidden states. However, applying this structure in real-world scenarios can produce unsatisfactory artifacts. We found that, in real-world VSR training, using unknown and complex degradation better simulates the real-world degradation process. Methods: Based on this, we propose the RealFuVSR model, which simulates real-world degradation and mitigates artifacts caused by VSR. Specifically, we propose a multiscale feature extraction (MSF) module that extracts and fuses features from multiple scales, thereby facilitating the elimination of hidden-state artifacts. To improve the accuracy of the hidden-state alignment information, RealFuVSR uses an advanced optical flow-guided deformable convolution. Moreover, a cascaded residual upsampling module is used to eliminate noise caused by the upsampling process. Results: The experiments demonstrate that the RealFuVSR model not only recovers high-quality videos but also outperforms the state-of-the-art RealBasicVSR and RealESRGAN models.
Abstract: Space-time video super-resolution (STVSR) aims to reconstruct high-resolution, high-frame-rate videos from their low-resolution, low-frame-rate counterparts. Recent approaches use end-to-end deep learning models to achieve STVSR. They first interpolate intermediate frame features between given frames, then perform local and global refinement over the feature sequence, and finally increase the spatial resolution of these features. However, in the most important phase, feature interpolation, they capture spatial-temporal information only from the most adjacent frame features and ignore long-term spatial-temporal correlations between multiple neighbouring frames, which are needed to restore variable-speed object movements and maintain long-term motion continuity. In this paper, we propose a novel long-term temporal feature aggregation network (LTFA-Net) for STVSR. Specifically, we design a long-term mixture of experts (LTMoE) module for feature interpolation. LTMoE contains multiple experts that extract mutual and complementary spatial-temporal information from multiple consecutive adjacent frame features; several gating nets then combine these with different weights to obtain the interpolation results. Next, we perform local and global feature refinement using the Locally-temporal Feature Comparison (LFC) module and a bidirectional deformable ConvLSTM layer, respectively. Experimental results on two standard benchmarks, Adobe240 and GoPro, indicate the effectiveness and superiority of our approach over the state of the art.
Funding: Supported by the Natural Science Foundation of Shandong Province (Nos. ZR2023LZH017 and ZR2024MF066), the Natural Science Foundation of Hebei Province (No. F2022511001), the Key Funding from National Natural Science Foundation of China (No. 92067206), and the National Natural Science Foundation of China (No. 62471493).
Abstract: The YOLOv5 algorithm is widely used in edge computing systems for object detection. However, the limited computing resources of embedded devices and the large model size of existing deep learning-based methods make real-time object detection on edge devices difficult. To address this issue, we propose a smaller, less computationally intensive, and more accurate algorithm for object detection. Multi-scale Feature Fusion-YOLO (MFF-YOLO) is built on top of the YOLOv5s framework but contains substantial improvements to it. First, we design the MFF module to improve the feature propagation path in the feature pyramid, which further integrates the semantic information from different paths of feature layers. Then, a large convolution-kernel module is used in the bottleneck. This structure enlarges the receptive field and preserves shallow semantic information, which overcomes the performance limitation arising from uneven propagation in Feature Pyramid Networks (FPN). In addition, a multi-branch downsampling method based on depthwise separable convolutions and a bottleneck structure with deformable convolutions are designed to reduce the complexity of the backbone network and minimize the real-time performance loss caused by the increased model complexity. The experimental results on the PASCAL VOC and MS COCO datasets show that, compared with YOLOv5s, MFF-YOLO reduces the number of parameters by 7% and the number of floating-point operations (FLOPs) by 11.8%. The mAP@0.5 improves by 3.7% and 5.5%, and the mAP@0.5:0.95 improves by 6.5% and 6.2%, respectively. Furthermore, compared with YOLOv7-tiny, PP-YOLO-tiny, and other mainstream methods, MFF-YOLO achieves better results on multiple indicators.
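The parameter savings that motivate depthwise separable convolutions can be checked with simple arithmetic. The sketch below counts weights only (biases and normalisation layers are ignored), and the layer sizes in the usage note are hypothetical, not taken from MFF-YOLO.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard kernel x kernel convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depthwise kernel x kernel conv followed by a 1x1 pointwise conv."""
    depthwise = kernel * kernel * c_in  # one spatial filter per input channel
    pointwise = c_in * c_out            # 1x1 conv that mixes channels
    return depthwise + pointwise
```

For a hypothetical 3x3 layer with 64 input and 128 output channels, the standard convolution has 73,728 weights and the depthwise separable version 8,768, a ratio of 1/128 + 1/9, or roughly 12%.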