Deep learning has recently been widely applied to object detection in Synthetic Aperture Radar (SAR) imagery. Current algorithms based on Convolutional Neural Networks (CNNs) often achieve good accuracy at the cost of complex model structures and large numbers of parameters, which makes real-time, accurate detection of multi-scale targets challenging. To address these problems, this study proposes a lightweight real-time SAR ship detector based on the detection transformer (LSD-DETR). First, a lightweight backbone network, LCNet, containing a stem module and an inverted residual structure, is constructed to balance inference speed and detection accuracy. Second, a transformer encoder with Cascaded Group Attention (CGA Encoder) is designed to enrich the feature information of small targets in SAR images, making the detection of small ships more precise. Third, an efficient cross-scale feature fusion pyramid module (C3Het-FPN) is proposed; built from lightweight units (C3Het) and the weighted bidirectional feature pyramid (BiFPN) structure, it fuses multi-scale features adaptively with fewer parameters. Ablation and comparative experiments demonstrate the effectiveness of LSD-DETR. The model has 8.8 M parameters (only 20.6% of DETR), runs at 43.1 FPS, and reaches a mean average precision (mAP50) of 97.3% on the SSDD dataset and 93.4% on the HRSID dataset. Compared with advanced methods, LSD-DETR attains superior precision with fewer parameters, enabling accurate real-time detection of multi-scale ships in SAR images.
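The weighted BiFPN fusion that C3Het-FPN builds on can be illustrated with the "fast normalized fusion" rule introduced with BiFPN in EfficientDet: each input feature map receives a learnable non-negative weight, and the weighted sum is normalized by the total weight. This is a minimal sketch, assuming the inputs have already been resized to a common shape; the function name and the `eps` default are illustrative, and this is not the authors' C3Het-FPN code.

```python
import numpy as np

def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse same-shape feature maps as sum_i w_i * F_i / (eps + sum_j w_j).

    A ReLU clamps the (learnable) weights to be non-negative, and eps keeps
    the denominator away from zero when all weights are close to zero.
    """
    w = np.maximum(np.asarray(weights, dtype=float), 0.0)
    w = w / (w.sum() + eps)
    fused = np.zeros_like(np.asarray(features[0], dtype=float))
    for wi, feat in zip(w, features):
        fused += wi * np.asarray(feat, dtype=float)
    return fused

# Example: fuse a lateral input with a top-down feature at the same pyramid level.
p4_in = 3.0 * np.ones((2, 4, 4))
p4_td = np.ones((2, 4, 4))
p4_out = fast_normalized_fusion([p4_in, p4_td], weights=[1.0, 1.0])
```

Compared with a softmax over the weights, this normalization avoids exponentials, and a weight driven negative during training simply switches its input off.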
Rock classification plays a crucial role in fields such as geology, engineering, and environmental studies, and deep learning AI (artificial intelligence) methods have high potential to significantly improve the accuracy and efficiency of this task. This paper explores two cutting-edge AI techniques, Mask DINO and Mask R-CNN (convolutional neural network), for identifying rock weathering grades and rock types. The results show that Mask DINO, a Detection Transformer (DETR)-based model, outperforms Mask R-CNN for both purposes: Mask DINO achieved F1 scores of 91% and 86% in weathering-grade detection and rock-type detection, respectively, compared with 84% and 75% for Mask R-CNN. These findings underscore the substantial potential of DETR algorithms such as Mask DINO for automatic classification of both rock type and weathering state. Although the study examines only two AI models, the data processing and other techniques developed here may serve as a foundation for future advances in the field. With these techniques, logging personnel can obtain valuable references to aid their work, ultimately contributing to the advancement of geology and related fields.
Recent advances in low-cost cameras have enabled surveillance in many developing towns in India, but the resulting videos are of low quality. Counting vehicles in such videos is nevertheless necessary to avoid traffic congestion and to let drivers plan their routes more precisely, while detecting vehicles in low-quality video is highly challenging for vision-based methods. This research makes a careful attempt to use low-quality videos to describe traffic in Salem, a town in India that most available sources have not studied. A Detection Transformer (DETR) model is used for object (vehicle) detection: vehicles are predicted in a rush-hour traffic video using a set of loss functions that perform bipartite matching between predicted and ground-truth objects. Every frame of the footage carries a date and time, which are detected and extracted with Tesseract Optical Character Recognition (OCR). The extracted date and time are combined with the number of objects detected by the DETR model, producing a timestamped vehicle report. A Transformer Timeseries Prediction Model (TTPM) is proposed to predict future vehicle density; in it, the regular NLP layers are removed and the temporal encoding layer is modified. The proposed TTPM outperforms existing models, with an RMSE of 4.313 and an MAE of 3.812.
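The bipartite matching step can be illustrated with a tiny exhaustive version: every ground-truth vehicle is assigned to a distinct prediction so that the total cost (low class probability plus box distance) is minimal. This sketch assumes a simplified cost with only a class term and an L1 box term (the GIoU term is dropped) and hypothetical weights `w_cls=1`, `w_l1=5`; practical DETR implementations solve the same assignment with the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`) rather than by enumeration.

```python
from itertools import permutations

def match_cost(pred, target, w_cls=1.0, w_l1=5.0):
    """Simplified DETR-style pair cost: -w_cls * p(target class) + w_l1 * L1 box distance.

    pred   = (class_probs: dict, box: (cx, cy, w, h))
    target = (label: str, box: (cx, cy, w, h))
    """
    probs, p_box = pred
    label, t_box = target
    l1 = sum(abs(a - b) for a, b in zip(p_box, t_box))
    return -w_cls * probs.get(label, 0.0) + w_l1 * l1

def bipartite_match(preds, targets):
    """Exhaustive minimum-cost one-to-one assignment; needs len(preds) >= len(targets)."""
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(len(preds)), len(targets)):
        cost = sum(match_cost(preds[p], targets[t]) for t, p in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    # (prediction index, target index) pairs; unmatched predictions count as "no object".
    return [(p, t) for t, p in enumerate(best_perm)]

preds = [
    ({"car": 0.9, "bus": 0.1}, (0.2, 0.2, 0.1, 0.1)),
    ({"car": 0.2, "bus": 0.8}, (0.7, 0.7, 0.2, 0.2)),
    ({"car": 0.5, "bus": 0.5}, (0.5, 0.5, 0.3, 0.3)),
]
targets = [("bus", (0.7, 0.7, 0.2, 0.2)), ("car", (0.2, 0.21, 0.1, 0.1))]
pairs = bipartite_match(preds, targets)
```

Enumeration grows factorially with the number of queries, which is why it only serves to show what the matcher optimizes.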
This paper focuses on improving the detection performance of spectrum sensing in cognitive radio (CR) networks in complicated electromagnetic environments. Some existing fast spectrum-sensing algorithms cannot extract specific features of the licensed users' (LUs') signals, so they cannot be applied in this situation without knowledge of the noise power; on the other hand, some algorithms that do extract specific features are too complex. This paper proposes an algorithm based on cyclostationary feature detection and Hilbert transform theory. Compared with the conventional cyclostationary feature detection algorithm, this approach is more flexible: it can adjust its computational complexity to the current electromagnetic environment by changing the number of samples and the step size of the cyclic frequency. Simulation results indicate that the approach flexibly detects the features of the received signal and provides satisfactory detection performance compared with existing approaches at low signal-to-noise ratio (SNR).
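The complexity trade-off described above can be seen in a plain cyclic-autocorrelation scan: the estimator is evaluated once per candidate cyclic frequency, so its cost is roughly (number of samples) × (number of grid points). This is a minimal sketch of the conventional estimator only, assuming normalized frequencies in cycles per sample and a noise-free test tone; the Hilbert-transform stage of the proposed algorithm is not reproduced, and the function names are illustrative.

```python
import cmath
import math

def cyclic_autocorrelation(x, alpha, tau=0):
    """Estimate R_x^alpha(tau) = mean_n x[n + tau] * conj(x[n]) * exp(-j 2 pi alpha n)."""
    n_terms = len(x) - tau
    acc = 0j
    for n in range(n_terms):
        acc += complex(x[n + tau]) * complex(x[n]).conjugate() * cmath.exp(-2j * math.pi * alpha * n)
    return acc / n_terms

def scan_cyclic_frequencies(x, step, alpha_max=0.5, tau=0):
    """Return (alpha, |R|) for the strongest non-zero cyclic frequency on a grid.

    Cost grows with len(x) * (alpha_max / step): a shorter record or a coarser
    step lowers complexity at the price of resolution.
    """
    alphas = [k * step for k in range(1, int(round(alpha_max / step)))]
    mags = [abs(cyclic_autocorrelation(x, a, tau)) for a in alphas]
    best = max(range(len(alphas)), key=mags.__getitem__)
    return alphas[best], mags[best]

# A real sinusoid at f0 = 0.1 cycles/sample has a cyclic feature at alpha = 2 * f0.
x = [math.cos(2 * math.pi * 0.1 * n) for n in range(1000)]
alpha_hat, magnitude = scan_cyclic_frequencies(x, step=0.01)
```

Stationary noise has no such non-zero cyclic features, which is what lets cyclostationary detectors separate a structured signal from noise of unknown power.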
A fingerprint authentication system verifies users' identities according to the characteristics of their fingerprints. However, such systems have security and privacy problems; for example, artificial fingerprints can trick the system into granting access under real users' identities. A fingerprint liveness detection algorithm is therefore needed to prevent illegal users from accessing private information. This paper proposes a new software-based liveness detection approach using multi-scale local phase quantization (LPQ) and principal component analysis (PCA). The feature vectors of a fingerprint are constructed through multi-scale LPQ, and PCA is introduced to reduce their dimensionality and obtain more effective features. Finally, a model is trained with a support vector machine (SVM) classifier, and the liveness of a fingerprint is determined using the trained model. Experimental results demonstrate that the proposed method can detect the liveness of users' fingerprints with high recognition accuracy. The study also confirms that multi-resolution analysis is a useful method for texture feature extraction in fingerprint liveness detection.
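The feature pipeline can be sketched as follows, assuming the standard LPQ definition: local Fourier coefficients at four low frequencies are computed in a sliding window, and the signs of their real and imaginary parts are packed into an 8-bit code whose histogram is the texture feature. The window sizes 3, 5, 7, the omission of LPQ's optional decorrelation step, and all function names are illustrative assumptions; the SVM stage is left out because its settings are not given.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def lpq_histogram(img, win=3):
    """Normalized 256-bin LPQ histogram for one window size (no decorrelation)."""
    img = np.asarray(img, dtype=float)
    r = (win - 1) // 2
    a = 1.0 / win                                   # lowest non-zero frequency in the window
    coords = np.arange(-r, r + 1)
    ys, xs = np.meshgrid(coords, coords, indexing="ij")
    patches = sliding_window_view(img, (win, win))  # (H - win + 1, W - win + 1, win, win)
    code = np.zeros(patches.shape[:2], dtype=np.int64)
    bit = 0
    for ux, uy in [(a, 0.0), (0.0, a), (a, a), (a, -a)]:
        kernel = np.exp(-2j * np.pi * (ux * xs + uy * ys))
        resp = np.einsum("ijkl,kl->ij", patches, kernel)  # local Fourier coefficient
        for part in (resp.real, resp.imag):
            code |= (part > 0).astype(np.int64) << bit    # one bit per sign
            bit += 1
    hist = np.bincount(code.ravel(), minlength=256).astype(float)
    return hist / hist.sum()

def multiscale_lpq(img, windows=(3, 5, 7)):
    """Concatenate LPQ histograms computed at several window sizes."""
    return np.concatenate([lpq_histogram(img, w) for w in windows])

def pca_fit_transform(features, n_components):
    """Project rows of `features` onto their top principal components (via SVD)."""
    x = np.asarray(features, dtype=float)
    mean = x.mean(axis=0)
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    components = vt[:n_components]
    return (x - mean) @ components.T, mean, components
```

The reduced vectors returned by `pca_fit_transform` would then be the inputs to the SVM classifier.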
At present, coal-mine in-pit personnel positioning systems in China can neither effectively verify the uniqueness of in-pit personnel nor identify and eliminate attendance violations such as one person holding multiple cards or having one's card swiped by others. This research therefore introduces a uniqueness detection system and method for in-pit coal-mine personnel, integrated into the in-pit personnel positioning system, and establishes a system mode based on face recognition + positioning-card recognition + release after automatic detection. Because in-pit personnel wear helmets and their faces are prone to be stained, the study proposes preprocessing face images with the Mallat algorithm based on the 2D wavelet transform, and extracting three face features (miner's lamp, eyes, and mouth) with an algorithm based on the generalized symmetry transform. Tests were carried out on 40 clean face images without helmets and 40 lightly stained face images, and the results were compared with those of a face feature extraction method based on gray-scale transformation and edge detection. The results show that the proposed method accurately detects face features in both cases, with a face-feature detection accuracy of 97.5% for helmeted, lightly stained faces.
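One level of Mallat's 2D decomposition splits an image into a smoothed approximation band and three detail bands by filtering and downsampling rows, then columns. The sketch below assumes orthonormal Haar filters because the wavelet family and number of levels are not stated in the abstract; function names are illustrative. The inverse is included to show that the decomposition loses nothing, so detail bands can be modified (for example, attenuated to suppress stains) before reconstruction.

```python
import numpy as np

S = np.sqrt(2.0)

def haar_dwt2(img):
    """One level of Mallat's 2-D decomposition with orthonormal Haar filters.

    Returns (approx, (detail_h, detail_v, detail_d)); both image sides must be even.
    """
    img = np.asarray(img, dtype=float)
    lo = (img[:, 0::2] + img[:, 1::2]) / S   # low-pass along x, then downsample
    hi = (img[:, 0::2] - img[:, 1::2]) / S   # high-pass along x, then downsample
    approx = (lo[0::2] + lo[1::2]) / S
    detail_h = (lo[0::2] - lo[1::2]) / S     # vertical change: horizontal edges
    detail_v = (hi[0::2] + hi[1::2]) / S     # horizontal change: vertical edges
    detail_d = (hi[0::2] - hi[1::2]) / S     # diagonal detail
    return approx, (detail_h, detail_v, detail_d)

def haar_idwt2(approx, details):
    """Exact inverse of haar_dwt2."""
    detail_h, detail_v, detail_d = details
    rows, cols = approx.shape
    lo = np.empty((2 * rows, cols))
    hi = np.empty((2 * rows, cols))
    lo[0::2], lo[1::2] = (approx + detail_h) / S, (approx - detail_h) / S
    hi[0::2], hi[1::2] = (detail_v + detail_d) / S, (detail_v - detail_d) / S
    img = np.empty((2 * rows, 2 * cols))
    img[:, 0::2], img[:, 1::2] = (lo + hi) / S, (lo - hi) / S
    return img
```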
This paper presents an implementation of a multiscale edge detection algorithm on a multiprocessor using SynDEx, a programming environment that generates optimized distributed real-time executives. The implementation runs on three TMS320C40 processors and achieves a speedup of 2.2 relative to a single processor.
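Using the standard definitions of speedup S_p = T_1 / T_p and parallel efficiency E_p = S_p / p, the reported figure translates into an efficiency for p = 3:

```latex
S_3 = \frac{T_1}{T_3} = 2.2, \qquad
E_3 = \frac{S_3}{3} = \frac{2.2}{3} \approx 0.73
```

In other words, the three-processor implementation realizes about 73% of the ideal linear speedup of 3.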
To enhance the robustness and real-time performance of robotic-arm visual servoing in complex environments such as space, this study proposes DiFA-DETR, a modified object detection framework based on the DETR architecture. The model incorporates an improved ResNet50 Bottleneck structure with spatial dimensionality reduction and sparse interaction mechanisms, together with a redesigned self-attention module featuring downsampling optimization and adaptive feature enhancement. A custom-annotated satellite component dataset was constructed to train and evaluate the system. Experimental results show that DiFA-DETR achieves an AP50 of 79.9%, outperforming existing DETR variants while reducing computational complexity by 31.9% and nearly doubling inference speed. The method was further validated in a ground-based visual servoing system using an industrial robotic arm and camera setup, where it successfully tracked satellite targets under dynamic motion while maintaining millimeter-level positioning accuracy. These results confirm the feasibility and effectiveness of the method for future space robotic applications that require precise tracking and fast response.
The end-to-end object detection Transformer (DETR) established the Transformer architecture as a paradigm for object detection. Its end-to-end detection process and set-prediction formulation have made it one of the most popular network architectures in recent years, and much work has improved upon it. However, DETR and its variants require substantial memory and computation, and their large parameter counts hinder deployment. To address this issue, a greedy pruning (GP) algorithm is proposed and applied to the denoising DETR variant (DN-DETR) to eliminate redundant parameters in its Transformer architecture. Because the multi-head attention (MHA) and feed-forward network (FFN) modules play different roles in the Transformer, a modular greedy pruning (MGP) algorithm is further proposed that treats the two modules separately, applying to each its own optimal strategy and parameters. The algorithm is validated on the COCO 2017 dataset. Compared with the Transformer architecture of DN-DETR, the model obtained with MGP has 49% fewer parameters and 44% fewer floating-point operations (FLOPs), while its mean average precision (mAP) rises from 44.1% to 45.3%.
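The abstract does not state the pruning criterion, so the sketch below assumes a generic one: a caller-supplied `score` function (hypothetical; in practice something like validation mAP of the network with only the kept units) evaluated on the remaining units. Each greedy step removes the unit whose removal hurts the score least. The modular variant simply runs this loop separately per module, with its own budget and score, mirroring the MHA/FFN split described above.

```python
def greedy_prune(units, score, n_remove):
    """Remove n_remove units one at a time, each time dropping the unit whose
    removal leaves the highest score for the remaining set."""
    kept = set(units)
    removed = []
    for _ in range(n_remove):
        # sorted() makes tie-breaking deterministic
        victim = max(sorted(kept), key=lambda u: score(kept - {u}))
        kept.remove(victim)
        removed.append(victim)
    return removed

def modular_greedy_prune(modules, budgets, scores):
    """Prune each module (e.g. "mha" heads, "ffn" neurons) with its own budget and score."""
    return {name: greedy_prune(units, scores[name], budgets[name])
            for name, units in modules.items()}

# Toy additive importances stand in for a real evaluation of the pruned network.
head_imp = {"h0": 0.9, "h1": 0.1, "h2": 0.5, "h3": 0.3}
ffn_imp = {"n0": 0.2, "n1": 0.8, "n2": 0.05}
result = modular_greedy_prune(
    {"mha": list(head_imp), "ffn": list(ffn_imp)},
    {"mha": 2, "ffn": 1},
    {"mha": lambda s: sum(head_imp[u] for u in s),
     "ffn": lambda s: sum(ffn_imp[u] for u in s)},
)
```

Each greedy step costs one score evaluation per remaining unit, which is why separating the modules (smaller candidate sets, module-specific budgets) also keeps the search cheaper.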
Current casting surface defect detection algorithms recognize small defects poorly and struggle to balance detection performance against detection time. An improved framework for casting defect detection is proposed based on the DEtection TRansformer (DETR) algorithm, using ResNet with an efficient channel attention (ECA-Net) module as the backbone. On top of the original architecture, dynamic anchor boxes, an improved multi-scale deformable attention module, and the SIoU loss function are introduced to make the transformer more sensitive to input location and scale, which effectively improves small-defect detection. Recognition performance was studied on a self-built casting defect dataset. The improved DETR recognizes two defect types, sand inclusion and notch, with 97.561% accuracy, and its detection rate is 65.854% and 17.073% higher than those of the original DETR and You Only Look Once (YOLO) v5, respectively. The work verifies that transformer-based detection algorithms are applicable to casting defect detection and provides new ideas for similar application scenarios.
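The ECA module mentioned above reweights channels using a single 1-D convolution across the globally pooled channel descriptor, with a kernel size chosen adaptively from the channel count (the ECA-Net rule with γ = 2, b = 1). The sketch below assumes a (C, H, W) feature map and a hand-supplied kernel in place of learned weights; function names are illustrative and this is not the paper's backbone code.

```python
import math
import numpy as np

def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive kernel size k = |(log2(C) + b) / gamma|, bumped up to the next odd value."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1

def eca(x, kernel):
    """Efficient channel attention on a (C, H, W) map with a shared 1-D kernel."""
    kernel = np.asarray(kernel, dtype=float)
    c, k = x.shape[0], len(kernel)
    pooled = x.mean(axis=(1, 2))                  # global average pooling -> (C,)
    padded = np.pad(pooled, (k // 2, k // 2))     # zero padding keeps C outputs
    logits = np.array([padded[i:i + k] @ kernel for i in range(c)])
    weights = 1.0 / (1.0 + np.exp(-logits))       # sigmoid gate per channel
    return x * weights[:, None, None]
```

Because each channel only looks at its k neighbours, the module adds just k parameters, which is the property that makes it attractive for a lightweight backbone.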
Precise and timely extraction of railway signals is crucial for building railway electronic maps. This paper introduces a real-time detection approach for dynamically adjusting railway signals based on an enhanced Real-Time DEtection TRansformer (RT-DETR). First, a vision Transformer with a Dynamically Quantifiable Sampling Attention Mechanism (DQSAM) is integrated into the ResNet50 backbone of RT-DETR, improving the model's efficiency and accuracy on intricate visual tasks. Second, an ultra-lightweight Dynamic Grouping upSampler (DyGSample) is inserted into the efficient hybrid encoder as its upsampling component; it upsamples feature maps without increasing the computational burden and improves the model's resolution and ability to capture detail. In addition, to address the deep network and high running cost, a new bounding-box similarity loss, a rotated intersection over union based on minimum point distance, is adopted; it accounts for the factors considered by existing loss functions (overlapping and non-overlapping regions, center-point distance, and width and height deviations) while simplifying the computation. The resulting lightweight, fast, high-precision signal detection model raises detection accuracy from 90.21% to 97.45%, demonstrating the effectiveness of the improved RT-DETR model for railway signal extraction.
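The "minimum point distance" loss appears to build on MPDIoU, which subtracts the normalized squared distances between the top-left corners and between the bottom-right corners of the predicted and ground-truth boxes from the IoU. This sketch covers only the axis-aligned form with boxes as (x1, y1, x2, y2) and normalization by the input image size; the rotated variant used in the paper is not reproduced, and the function name is illustrative.

```python
def mpdiou_loss(pred, gt, img_w, img_h):
    """1 - MPDIoU for axis-aligned boxes (x1, y1, x2, y2) in an img_w x img_h image."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    inter_w = max(0.0, min(px2, gx2) - max(px1, gx1))
    inter_h = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = inter_w * inter_h
    union = (px2 - px1) * (py2 - py1) + (gx2 - gx1) * (gy2 - gy1) - inter
    iou = inter / union
    norm = img_w ** 2 + img_h ** 2
    d1 = (px1 - gx1) ** 2 + (py1 - gy1) ** 2   # top-left corner distance
    d2 = (px2 - gx2) ** 2 + (py2 - gy2) ** 2   # bottom-right corner distance
    return 1.0 - (iou - d1 / norm - d2 / norm)
```

Two corner distances are enough to penalize center offset and width/height mismatch at once, and the corner terms still give a useful gradient when the boxes do not overlap (IoU = 0).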
Shape matching plays an important role in computer vision and graphics applications such as shape retrieval, object detection, image editing, and image retrieval. However, detecting shapes in cluttered images remains challenging because of incomplete edges and changing perspective. This paper proposes a novel approach that efficiently identifies a queried shape in a cluttered image. The core idea is to obtain the transformation from the queried shape to the cluttered image by summarizing all point-to-point transformations between them. To do so, a point-based shape descriptor, the pyramid of arc-length descriptor (PAD), is used to identify point pairs between the queried shape and the image that have similar local shapes. The transformations between the identified point pairs are then calculated based on PAD. Finally, all transformations are summarized in a 4D transformation histogram, and its main cluster is located. The method handles both closed shapes and open curves and is resistant to partial occlusion. Experiments show that it robustly detects shapes in images with partial occlusions, fragile edges, and cluttered backgrounds.
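The voting step can be sketched independently of the descriptor. Assuming each matched point carries a local scale and orientation (a hypothetical stand-in for what PAD provides), every point pair yields a similarity transform (scale, rotation, tx, ty); these are quantized into a 4D histogram, and the fullest bin is averaged. Bin sizes and function names are illustrative, and wrap-around at ±π across bin edges is ignored for brevity.

```python
import math
from collections import defaultdict

def pair_transform(q_pt, q_scale, q_angle, i_pt, i_scale, i_angle):
    """Similarity transform (s, theta, tx, ty) mapping a query point onto an image point."""
    s = i_scale / q_scale
    theta = (i_angle - q_angle + math.pi) % (2 * math.pi) - math.pi
    c, sn = math.cos(theta), math.sin(theta)
    tx = i_pt[0] - s * (c * q_pt[0] - sn * q_pt[1])
    ty = i_pt[1] - s * (sn * q_pt[0] + c * q_pt[1])
    return s, theta, tx, ty

def dominant_transform(transforms, bin_sizes=(0.1, math.radians(10), 5.0, 5.0)):
    """Vote (log s, theta, tx, ty) into a 4-D histogram; average the fullest bin."""
    bins = defaultdict(list)
    for s, theta, tx, ty in transforms:
        key = (round(math.log(s) / bin_sizes[0]), round(theta / bin_sizes[1]),
               round(tx / bin_sizes[2]), round(ty / bin_sizes[3]))
        bins[key].append((s, theta, tx, ty))
    members = max(bins.values(), key=len)
    n = len(members)
    s_mean = math.exp(sum(math.log(m[0]) for m in members) / n)
    theta_mean = math.atan2(sum(math.sin(m[1]) for m in members),
                            sum(math.cos(m[1]) for m in members))
    return (s_mean, theta_mean,
            sum(m[2] for m in members) / n, sum(m[3] for m in members) / n), n

# Eight consistent pairs (s = 2, theta = 30 deg, t = (10, -4)) plus three outliers.
c30, s30 = math.cos(math.radians(30)), math.sin(math.radians(30))
pairs = []
for i in range(8):
    q = (float(i), float(i * i % 5))
    p = (2.0 * (c30 * q[0] - s30 * q[1]) + 10.0, 2.0 * (s30 * q[0] + c30 * q[1]) - 4.0)
    pairs.append(pair_transform(q, 1.0, 0.3 * i, p, 2.0, 0.3 * i + math.radians(30)))
pairs += [(0.5, math.radians(-80), 50.0, 50.0), (1.0, 0.0, -30.0, 12.0),
          (3.0, math.radians(120), 0.0, 0.0)]
(best_s, best_theta, best_tx, best_ty), votes = dominant_transform(pairs)
```

Outlier pairs scatter across the histogram, while correct pairs pile into one bin, which is what makes the voting tolerant of clutter and partial occlusion.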
A concise fractional Fourier transform (CFRFT) is proposed to detect linear frequency-modulated (LFM) signals at low signal-to-noise ratio (SNR). Using chirp multiplication and the Fourier transform (FT), the CFRFT rotates the frequency axis of the time-frequency plane to obtain the signal spectrum at different angles. Because an LFM signal appears as a straight line in the time-frequency plane, the CFRFT concentrates its energy into a peak at the corresponding angle, improving the detection SNR so that low-SNR LFM signals can be detected. The location of the peak is related to the parameters of the LFM signal. Numerical simulations and experimental results show that the method efficiently detects LFM signals masked by noise and accurately estimates their parameters. Compared with the conventional fractional Fourier transform (FRFT), the CFRFT reduces transform complexity and improves real-time LFM detection performance.
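The energy-concentration idea can be illustrated with a plain dechirp-and-FFT search. Multiplying by the conjugate chirp at the correct rate turns the LFM signal into a single tone, whose FFT peak reveals the start frequency; at a wrong rate the energy stays spread. This is a simplified stand-in for the CFRFT, not its actual kernel: the grid of candidate chirp rates plays the role of the rotation angle, and the signal model and function name are assumptions.

```python
import numpy as np

def chirp_rate_search(x, rates):
    """Dechirp x with each candidate rate k, FFT, and return (k, f0, peak) at the largest peak.

    Assumes x[n] ~ exp(j*2*pi*(f0*n + 0.5*k*n**2)), frequencies in cycles per sample.
    """
    n = np.arange(len(x))
    freqs = np.fft.fftfreq(len(x))
    # One row per candidate rate: magnitude spectrum after removing that chirp.
    spectra = np.array([np.abs(np.fft.fft(x * np.exp(-1j * np.pi * k * n**2)))
                        for k in rates])
    r, b = np.unravel_index(np.argmax(spectra), spectra.shape)
    return rates[r], freqs[b], spectra[r, b]

N = 512
n = np.arange(N)
x = np.exp(2j * np.pi * (0.125 * n + 0.5 * 1e-4 * n**2))
k_hat, f0_hat, peak = chirp_rate_search(x, np.linspace(0.0, 2e-4, 11))
```

Both parameters are read off the peak location, matching the abstract's point that the peak position encodes the LFM parameters; the cost scales with the number of candidate rates times one FFT each.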
Real-time, accurate traffic light status recognition provides reliable data for the decision-making and control systems of autonomous vehicles. To address problems such as traffic lights occupying only a small part of a visual sensor's field of view and the complexity of recognition scenarios, we propose an end-to-end traffic light status recognition method, ResNeSt50-CBAM-DINO (RC-DINO). First, we cleaned the Tsinghua-Tencent traffic lights (TTTL) dataset and fused it with Shanghai Jiao Tong University's traffic light dataset (S2TLD) to form a Chinese urban traffic light dataset (CUTLD). Second, we combined the split-attention residual network ResNeSt50 with the convolutional block attention module (CBAM) to extract more salient traffic light features. Finally, RC-DINO and mainstream recognition algorithms were trained and analyzed on CUTLD. Compared with the original DINO, RC-DINO improved average precision (AP), AP at intersection over union (IoU) = 0.5 (AP50), AP for small objects (APs), average recall (AR), and balanced F score (F1-score) by 3.1%, 1.6%, 3.4%, 0.9%, and 0.9%, respectively, and could recognize the status of partially occluded traffic lights to some extent. These results indicate that RC-DINO improves recognition performance and robustness, making it well suited to traffic light status recognition.
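CBAM applies channel attention (a shared two-layer MLP over average- and max-pooled channel descriptors) followed by spatial attention (a convolution over channel-wise mean and max maps). This minimal sketch assumes a (C, H, W) feature map and hand-supplied weights in place of learned ones; the shapes of `w1` (C/r × C) and `w2` (C × C/r) encode the reduction ratio r, biases are omitted, and the function names are illustrative rather than the RC-DINO code.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """Shared MLP over average- and max-pooled channel descriptors -> (C,) gate."""
    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)
    return _sigmoid(mlp(x.mean(axis=(1, 2))) + mlp(x.max(axis=(1, 2))))

def spatial_attention(x, kernel):
    """k x k convolution over stacked channel-wise mean and max maps -> (H, W) gate."""
    k = kernel.shape[-1]
    p = k // 2
    desc = np.stack([x.mean(axis=0), x.max(axis=0)])             # (2, H, W)
    padded = np.pad(desc, ((0, 0), (p, p), (p, p)))
    windows = sliding_window_view(padded, (k, k), axis=(1, 2))   # (2, H, W, k, k)
    return _sigmoid(np.einsum("chwij,cij->hw", windows, kernel))

def cbam(x, w1, w2, kernel):
    """Channel attention followed by spatial attention on a (C, H, W) feature map."""
    x = x * channel_attention(x, w1, w2)[:, None, None]
    return x * spatial_attention(x, kernel)[None, :, :]
```

The spatial gate is what lets the network emphasize the few pixels a distant traffic light occupies, which is the small-object problem the abstract targets.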
Funding (LSD-DETR SAR ship detection study): National Natural Science Foundation of China (No. U24A20589); National Key Research and Development Program of China (No. 2023YFB3905504); Innovation Team of the Ministry of Education of China (No. 8091B042227); Innovation Group of Sichuan Natural Science Foundation (No. 2023NSFSC1974).
Funding (Mask DINO rock classification study): supported by the Construction Industry Council (Grant No. CICR/01/22) and the General Research Fund (Grant No. 17206822) of the Research Grants Council (Hong Kong).
Funding (cognitive radio spectrum sensing study): National Basic Research Program of China (973 Program, No. 2013CB329003); National Natural Science Foundation of China (No. 91438205); China Postdoctoral Science Foundation (No. 2011M500664); Open Research Fund Program of the Key Laboratory for Spacecraft TT&C and Communication, Ministry of Education, China (No. CTTC-FX201305).
Funding (fingerprint liveness detection study): supported by the NSFC (U1536206, 61232016, U1405254, 61373133, 61502242), BK20150925, and the PAPD fund.
Funding (coal-mine personnel uniqueness detection study): financial support from the National Natural Science Foundation of China (No. 51134024) and the National High Technology Research and Development Program of China (No. 2012AA062203) is gratefully acknowledged.
Funding (DiFA-DETR visual servoing study): supported by the Foundation of National Key Laboratory of Human Factors Engineering, China (No. HFNKL2023 WWO5).
Funding (greedy pruning of DN-DETR study): Shanghai Municipal Commission of Economy and Information Technology, China (No. 202301054).
Funding: The authors acknowledge financial support from the National Natural Science Foundation of China (No. 51405002), the Anhui Provincial Natural Science Foundation (No. 2108085ME173), the open funds of the Anhui Province Key Laboratory of Metallurgical Engineering & Resources Recycling (No. SKF20-05), the Opening Project of the Engineering Technology Research Center of Anhui Education Department for Energy Saving and Pollutant Control in Metallurgical Process, and the Opening Project of the Anhui Engineering Laboratory for Intelligent Applications and Security of Industrial Internet (Grant No. IASII21-03).
Abstract: Current casting surface defect detection algorithms suffer from poor recognition of small defects and an imbalance between detection performance and detection time. An improved algorithmic framework for casting defect detection is proposed based on the DEtection TRansformer (DETR) algorithm. The algorithm uses ResNet with an efficient channel attention (ECA-Net) module as the backbone network. In addition, dynamic anchor boxes, an improved multi-scale deformable attention module, and the SIoU loss function are introduced into the original architecture. These make the Transformer structure more sensitive to input location information and object scale, which effectively improves small-defect detection performance. The recognition performance of the algorithm was evaluated on a self-built casting defect dataset. The improved DETR algorithm achieves 97.561% accuracy in recognizing two defect types, sand inclusion and notch. Its detection rate is improved by 65.854% and 17.073% compared with the original DETR and You Only Look Once version 5 (YOLOv5), respectively. This work verifies the applicability of Transformer-based object detection algorithms to casting defect detection tasks and provides new ideas for similar application scenarios.
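ECA-Net's channel attention is compact enough to sketch directly. In the published ECA-Net design, each channel is first summarised by global average pooling. A 1-D convolution of odd size k then mixes neighbouring channels, and a sigmoid gate rescales the input. The kernel size k is chosen adaptively from the channel count. This NumPy version takes the convolution weights as an argument (they are learned in practice). It illustrates the module only and is not the authors' backbone code.

```python
import math
import numpy as np

def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive kernel size from ECA-Net: nearest odd value of |log2(C)/gamma + b/gamma|."""
    t = int(abs(math.log2(channels) / gamma + b / gamma))
    return t if t % 2 == 1 else t + 1

def eca(x, weights):
    """Efficient channel attention on a (C, H, W) feature map.

    weights: 1-D conv kernel of odd length k shared across all channels
    (learned in practice; passed in here for illustration).
    """
    c = x.shape[0]
    y = x.mean(axis=(1, 2))                    # global average pooling -> (C,)
    k = len(weights)
    y_pad = np.pad(y, k // 2)                  # zero padding keeps output length C
    # Local cross-channel interaction: each channel sees its k nearest neighbours.
    z = np.array([y_pad[i:i + k] @ weights for i in range(c)])
    attn = 1.0 / (1.0 + np.exp(-z))            # sigmoid gate per channel
    return x * attn[:, None, None]
```

Unlike squeeze-and-excitation, ECA avoids dimensionality reduction. It adds only k parameters per attention block, which suits a backbone where detection time matters.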
Funding: Supported by the Hunan Provincial Natural Science Foundation of China (Nos. 2025JJ70018 and 2025JJ70057) and the Hunan Provincial Key Research and Development Program (No. 2024JK2065).
Abstract: Precise and timely extraction of railway signals is crucial for building railway electronic maps. This paper introduces a novel real-time detection approach for dynamically adjusting railway signals, based on an enhanced Real-Time DEtection TRansformer (RT-DETR) model. First, a vision Transformer with a Dynamically Quantifiable Sampling Attention Mechanism (DQSAM) is integrated into the ResNet50 backbone of the RT-DETR framework, improving the model's efficiency and accuracy on intricate visual tasks. Second, an ultra-lightweight and effective Dynamic Grouping upSampler (DyGSample) is inserted into the efficient hybrid encoder as its upsampling component. This operator upsamples feature maps without increasing the computational burden and improves the model's resolution and its ability to capture detail. In addition, to address the depth of the network and its high operating cost, this paper adopts a new bounding-box similarity loss based on rotated intersection over union with minimum point distance. The loss accounts for all factors considered by existing loss functions, namely overlapping or non-overlapping regions, center-point distance, and width and height deviation, while simplifying the calculation. The resulting model is a lightweight, fast, high-precision signal detector that operates in real time. Its detection accuracy improves from 90.21% to 97.45%, demonstrating the performance and effectiveness of the improved real-time dynamic adjustment RT-DETR model for railway signal extraction.
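The loss described here appears to build on MPDIoU, an IoU loss based on minimum point distance. A sketch of the axis-aligned MPDIoU loss follows. It assumes boxes in (x1, y1, x2, y2) form and the published normalisation by the squared image width plus squared image height. The rotated variant mentioned in the abstract is not reproduced, and the function name is illustrative.

```python
def mpdiou_loss(pred, gt, img_w, img_h):
    """Axis-aligned MPDIoU loss (rotation NOT handled in this sketch).

    Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2; img_w and img_h are
    the input image size used to normalise the corner distances.
    """
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = (px2 - px1) * (py2 - py1) + (gx2 - gx1) * (gy2 - gy1) - inter
    iou = inter / union
    norm = img_w ** 2 + img_h ** 2
    d1 = (px1 - gx1) ** 2 + (py1 - gy1) ** 2   # squared top-left corner distance
    d2 = (px2 - gx2) ** 2 + (py2 - gy2) ** 2   # squared bottom-right corner distance
    return 1.0 - (iou - d1 / norm - d2 / norm)
```

Two corner distances jointly encode overlap, center offset, and width/height deviation. This is how such a loss can cover the same factors as earlier IoU variants with a simpler computation.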
Funding: Supported by the Research Grants Council of the Hong Kong Special Administrative Region, under the RGC General Research Fund (Project No. CUHK 14217516).
Abstract: Shape matching plays an important role in various computer vision and graphics applications such as shape retrieval, object detection, image editing, and image retrieval. However, detecting shapes in cluttered images is still quite challenging due to incomplete edges and changing perspective. In this paper, we propose a novel approach that can efficiently identify a queried shape in a cluttered image. The core idea is to acquire the transformation from the queried shape to the cluttered image by summarising all point-to-point transformations between the queried shape and the image. To do so, we adopt a point-based shape descriptor, the pyramid of arc-length descriptor (PAD), to identify point pairs between the queried shape and the image that have similar local shapes. We then calculate the transformations between the identified point pairs based on PAD. Finally, we summarise all transformations in a 4D transformation histogram and search for the main cluster. Our method can handle both closed shapes and open curves and is resistant to partial occlusions. Experiments show that our method robustly detects shapes in images in the presence of partial occlusions, fragile edges, and cluttered backgrounds.
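The final summarising step can be sketched as Hough-style voting in a 4-D histogram. The sketch assumes each matched point carries a position, a tangent angle, and a local scale, standing in for the information PAD provides. From every matched pair it derives a similarity transform (log-scale, rotation, x-translation, y-translation), quantises that transform into a bin, and reports the fullest bin. The tuple layout, bin sizes, and function names are illustrative assumptions. The paper's descriptor matching itself is not reproduced.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Point = Tuple[float, float, float, float]      # (x, y, tangent angle in radians, local scale)
Transform = Tuple[float, float, float, float]  # (log-scale, rotation, tx, ty)

def pair_transform(p: Point, q: Point) -> Transform:
    """Similarity transform mapping shape point p onto image point q."""
    s = q[3] / p[3]
    theta = (q[2] - p[2]) % (2 * math.pi)
    c, n = math.cos(theta), math.sin(theta)
    tx = q[0] - s * (c * p[0] - n * p[1])
    ty = q[1] - s * (n * p[0] + c * p[1])
    return math.log(s), theta, tx, ty

def vote(pairs: List[Tuple[Point, Point]],
         bin_sizes=(0.1, math.pi / 18, 5.0, 5.0)) -> Tuple[Transform, int]:
    """Quantise every pairwise transform into a 4-D bin; return the main cluster."""
    hist: Counter = Counter()
    members: Dict[tuple, List[Transform]] = {}
    for p, q in pairs:
        t = pair_transform(p, q)
        key = tuple(math.floor(v / b) for v, b in zip(t, bin_sizes))
        hist[key] += 1
        members.setdefault(key, []).append(t)
    key, count = hist.most_common(1)[0]
    # Average the transforms in the winning bin as the cluster estimate
    # (angles lie in [0, 2*pi), so a bin never straddles the wrap-around).
    ts = members[key]
    mean = tuple(sum(col) / len(ts) for col in zip(*ts))
    return mean, count
```

Correct matches agree on one transform and pile up in a single bin, while spurious matches scatter across the histogram. This is what makes the voting tolerant of occlusion and clutter.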
Funding: Supported by the National Natural Science Foundation of China (11434012).
Abstract: A concise fractional Fourier transform (CFRFT) is proposed to detect linear frequency-modulated (LFM) signals with a low signal-to-noise ratio (SNR). Using chirp multiplication and the Fourier transform (FT), the frequency axis in the time-frequency plane of the CFRFT is rotated to obtain the spectrum of the signal at different angles. Because an LFM signal is distributed as a straight line in the time-frequency plane, the CFRFT concentrates its energy into a peak at the corresponding angle and improves the detection SNR, so that LFM signals with low SNR can be detected. Meanwhile, the location of the peak is related to the parameters of the LFM signal. Numerical simulations and experimental results show that the proposed method can efficiently detect LFM signals masked by noise and accurately estimate their parameters. Compared with the conventional fractional Fourier transform (FRFT), the CFRFT reduces transform complexity and improves the real-time detection performance for LFM signals.
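The chirp-multiplication-plus-FT idea can be illustrated with a simple dechirp search. For a candidate chirp rate k, multiplying the signal by exp(-jπkt²) turns a matching LFM component into a pure tone, and its FFT concentrates the energy in one bin. Scanning k and keeping the strongest peak therefore estimates both the chirp rate and the start frequency. This is a simplified stand-in written for illustration, not the CFRFT as the authors define it. It assumes a complex LFM of the form exp(j(2πf0·t + πk·t²)) and a user-chosen grid of candidate rates.

```python
import numpy as np

def chirp_search(x, fs, rates):
    """Dechirp-and-FFT search for a single LFM component.

    x: complex samples; fs: sample rate in Hz; rates: candidate chirp rates (Hz/s).
    For each candidate rate k the signal is multiplied by exp(-j*pi*k*t^2);
    when k matches the true rate the product is a single tone at the start
    frequency, giving a sharp FFT peak. Returns (rate, start_freq, peak_magnitude).
    """
    n = len(x)
    t = np.arange(n) / fs
    freqs = np.fft.fftfreq(n, d=1 / fs)
    best = (None, None, -1.0)
    for k in rates:
        spectrum = np.abs(np.fft.fft(x * np.exp(-1j * np.pi * k * t ** 2)))
        i = int(np.argmax(spectrum))
        if spectrum[i] > best[2]:
            best = (k, freqs[i], spectrum[i])
    return best
```

The coherent peak grows with the number of samples N, whereas noise in each bin grows only with √N. This is the SNR gain that lets a chirp buried in noise stand out.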
Funding: Supported by the National Key R&D Program of China (2021YFB2501200), the Key Program of the National Natural Science Foundation of China (52131204), and the Shaanxi Province Key Research and Development Program (2022GY-300).
Abstract: Real-time and accurate traffic light status recognition can provide reliable data support for the decision-making and control systems of autonomous vehicles. Traffic lights occupy only a small part of a visual sensor's field of view, and recognition scenarios are complex. To address these problems, we propose an end-to-end traffic light status recognition method, ResNeSt50-CBAM-DINO (RC-DINO). First, we cleaned the Tsinghua-Tencent traffic lights (TTTL) dataset and fused it with Shanghai Jiao Tong University's traffic light dataset (S2TLD) to form a Chinese urban traffic light dataset (CUTLD). Second, we combined the residual network with split-attention module-50 (ResNeSt50) and the convolutional block attention module (CBAM) to extract more salient traffic light features. Finally, the proposed RC-DINO and mainstream recognition algorithms were trained and analyzed on CUTLD. Compared with the original DINO, RC-DINO improved five metrics: average precision (AP) by 3.1%, AP at intersection over union (IoU) = 0.5 (AP50) by 1.6%, AP for small objects (APs) by 3.4%, average recall (AR) by 0.9%, and balanced F score (F1-Score) by 0.9%. It also showed some ability to recognize the status of partially occluded traffic lights. These results indicate that RC-DINO improves recognition performance and robustness, making it more suitable for traffic light status recognition tasks.
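CBAM is a published module with a fixed structure. Channel attention comes first: a shared two-layer MLP processes the average-pooled and max-pooled channel descriptors. Spatial attention follows: a 7×7 convolution runs over the channel-wise mean and max maps. A minimal NumPy sketch of that structure follows. The weights are passed in rather than learned, and how the authors wire CBAM into ResNeSt50 is not shown.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """x: (C, H, W); w1: (C//r, C) and w2: (C, C//r) form the shared MLP."""
    avg = x.mean(axis=(1, 2))
    mx = x.max(axis=(1, 2))
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)   # Linear -> ReLU -> Linear
    return sigmoid(mlp(avg) + mlp(mx))             # (C,)

def spatial_attention(x, kernel):
    """kernel: (2, k, k) filter over stacked channel-wise mean and max maps."""
    maps = np.stack([x.mean(axis=0), x.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    p = k // 2
    padded = np.pad(maps, ((0, 0), (p, p), (p, p)))   # zero "same" padding
    h, w = maps.shape[1:]
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)                               # (H, W)

def cbam(x, w1, w2, kernel):
    """Channel attention first, then spatial attention, each applied multiplicatively."""
    x = x * channel_attention(x, w1, w2)[:, None, None]
    return x * spatial_attention(x, kernel)[None, :, :]
```

Both gates lie in (0, 1), so CBAM only re-weights features. It emphasises informative channels and locations, such as a small lit lamp, without changing the tensor shape.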