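Abstract: To address the large variation in object scale and the dense distribution of targets in traffic object detection, an efficient multiscale YOLO model, FMP-YOLO (fast multiscale powerful-YOLO), is proposed based on YOLO (you only look once) v8s. In the backbone network, a Faster Block module designed around partial convolution (PConv) and fast Fourier convolution (FFC) is introduced; it reduces redundant computation and memory access, increases inference speed, and alleviates the limited receptive field. In the aggregation network, an improved group shuffle convolution (GSConv) replaces standard convolution to better capture features at different scales and to further reduce the model's parameter count and computational cost. Powerful-IoU is combined with soft non-maximum suppression (SoftNMS) to replace the original non-maximum suppression (NMS) algorithm, which mitigates the loss of feature-learning capacity caused by the reduced parameter count and improves model accuracy. Experiments on the SODA10M and MS COCO datasets show that the improved model outperforms the original model: parameter count and computational cost fall by about 40%, and mAP rises by 1.7% and 1.4%, respectively. FMP-YOLO outperforms other classic models in size and accuracy and is highly practical.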
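The score-decay idea behind SoftNMS mentioned above can be sketched as follows. This is a minimal illustration, not the FMP-YOLO implementation: it uses plain IoU rather than Powerful-IoU (whose formula the abstract does not give), and the names `iou`, `soft_nms`, `sigma`, and `score_thresh` are assumptions chosen for the example.

```python
import math

def iou(a, b):
    """Plain IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS: decay the scores of overlapping boxes instead of deleting them.

    Returns a list of (index, rescored) pairs in selection order.
    Assumption: plain IoU stands in for the overlap measure used in the paper.
    """
    remaining = [(i, s) for i, s in enumerate(scores)]
    kept = []
    while remaining:
        # Select the highest-scoring box still in play.
        best = max(remaining, key=lambda t: t[1])
        remaining.remove(best)
        kept.append(best)
        # Decay every other score by exp(-IoU^2 / sigma); drop near-zero scores.
        decayed = []
        for i, s in remaining:
            s *= math.exp(-(iou(boxes[best[0]], boxes[i]) ** 2) / sigma)
            if s >= score_thresh:
                decayed.append((i, s))
        remaining = decayed
    return kept
```

With boxes `(0, 0, 10, 10)`, `(1, 1, 11, 11)`, and `(20, 20, 30, 30)` scored 0.9, 0.8, and 0.7, the heavily overlapping second box is retained but its score falls to about 0.32, so it ranks below the distant third box. In dense scenes this keeps genuinely adjacent objects that hard NMS would discard.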
Funding: Funded by the National Natural Science Foundation of China (No. 62076032).
Abstract: Referring expression comprehension (REC) aims to locate the specific image region described by a natural-language expression. Existing two-stage methods generate multiple candidate proposals in the first stage and then select one of them as the grounding result in the second stage. However, the number of candidate proposals generated in the first stage far exceeds the number of ground-truth objects, and the recall of critical objects is inadequate, which severely limits overall network performance. To address these issues, the authors propose a method termed Separate Non-Maximum Suppression (Sep-NMS) for two-stage REC. Sep-NMS models information from the two stages both independently and collaboratively, improving comprehension and identification of the target objects overall. Specifically, the authors propose a Ref-Relatedness module that rigorously filters referent proposals and reduces their redundancy. A CLIP-Relatedness module built on robust multimodal pre-trained encoders assesses the relevance between language and proposals to improve the recall of critical objects. The authors state that they are the first to use a multimodal pre-training model for proposal filtering in the first stage. In addition, an Information Fusion module combines the multimodal information across the two stages so that the available information is fully utilised. Extensive experiments demonstrate that the approach achieves performance competitive with previous state-of-the-art methods. The datasets used are publicly available: RefCOCO and RefCOCO+ (https://doi.org/10.1007/978-3-319-46475-6_5) and RefCOCOg (https://doi.org/10.1109/CVPR.2016.9).
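The general idea of filtering first-stage proposals by their relevance to the expression can be sketched as below. This is a hypothetical illustration, not the authors' CLIP-Relatedness module: it assumes the expression and each proposal have already been embedded into a shared vector space by some multimodal encoder, and the names `cosine`, `filter_proposals`, and `top_k` are invented for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def filter_proposals(text_emb, proposal_embs, top_k=2):
    """Return indices of the top_k proposals most similar to the expression.

    Assumption: embeddings come from a shared text-image space; how they are
    produced is outside this sketch.
    """
    scored = sorted(
        ((cosine(text_emb, p), i) for i, p in enumerate(proposal_embs)),
        key=lambda t: t[0],
        reverse=True,
    )
    return [i for _, i in scored[:top_k]]
```

Ranking by relevance to the expression, rather than by detector confidence alone, is one way a low-confidence but relevant proposal can survive the first stage, which is the recall problem the abstract describes.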
Funding: The National Grid Corporation Headquarters Science and Technology Project: Key Technology Research, Equipment Development and Engineering Demonstration of Artificial Smart Drived Electric Vehicle Smart Travel Service (No. 52020118000G).
Abstract: Unmanned aerial vehicle (UAV) photography has become the main method of power system inspection; however, automated fault detection remains a major challenge. Conventional algorithms have difficulty processing all the detected objects on power transmission lines simultaneously. Deep-learning-based object detection offers a new approach to fault detection. However, the traditional non-maximum suppression (NMS) algorithm fails to delete redundant annotations when an object receives two labels, such as insulator and damper. In this study, we propose an area-based non-maximum suppression (A-NMS) algorithm to solve the problem of one object having multiple labels. The A-NMS algorithm is used in the fusion stage of cropping detection to detect small objects. Experiments show that A-NMS with cropping detection achieves a mean average precision of 88.58% and a recall of 91.23% on the aerial image datasets and realizes multi-object fault detection in aerial images.
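The failure mode described above, where one object keeps two differently labelled boxes, can be reproduced with a small greedy NMS sketch. This is not the authors' A-NMS, whose area-based rule the abstract does not specify; it only contrasts per-class suppression with a class-agnostic variant. The names `iou`, `greedy_nms`, and `per_class` are assumptions for the example.

```python
def iou(a, b):
    """Plain IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def greedy_nms(dets, iou_thresh=0.5, per_class=True):
    """Greedy NMS over detections given as (box, score, label) tuples.

    per_class=True suppresses only boxes sharing the kept box's label;
    per_class=False suppresses any sufficiently overlapping box.
    Returns the kept detections, highest score first.
    """
    kept = []
    for det in sorted(dets, key=lambda d: d[1], reverse=True):
        box, _, label = det
        suppressed = any(
            iou(box, k[0]) > iou_thresh and (not per_class or k[2] == label)
            for k in kept
        )
        if not suppressed:
            kept.append(det)
    return kept

# One physical object detected twice under different labels, plus a separate object.
dets = [
    ((0, 0, 10, 10), 0.9, "insulator"),
    ((0, 0, 10, 11), 0.6, "damper"),
    ((50, 50, 60, 60), 0.8, "damper"),
]
```

Calling `greedy_nms(dets)` keeps all three detections because the two overlapping boxes carry different labels, while `greedy_nms(dets, per_class=False)` drops the lower-scoring duplicate and keeps two.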