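To address the low detection accuracy, false detections, and missed detections that arise when vehicles and pedestrians in road traffic scenes are small or occluded, a road object detection algorithm, RO-YOLOv9, is proposed. A small-object detection layer is added to strengthen the algorithm's feature learning for small objects. A bidirectional and adaptive scale fusion feature pyramid network (BiASF-FPN) is designed to optimize multi-scale feature fusion, so that the algorithm effectively captures detailed information on objects from small to large scales. An OR-RepN4 module is proposed, which uses a reparameterization strategy to simplify the complex network structure and increase inference speed. The Shape-NWD (shape neighborhood weighted decomposition) loss function is introduced; it focuses on the shape and size of the bounding box and uses the normalized Gaussian Wasserstein distance to smooth the regression, achieving scale invariance and reducing detection errors for small and occluded objects. Experimental results show that, on the optimized SODA10M and BDD100K datasets, RO-YOLOv9 reaches an mAP@0.5 (mean average precision) of 68.1% and 56.8%, respectively, which is 5.6 and 4.4 percentage points higher than YOLOv9, with detection frame rates of 55.3 frames/s and 54.2 frames/s, respectively, balancing detection accuracy and detection speed.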
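The normalized Gaussian Wasserstein distance mentioned in the abstract can be illustrated with a minimal sketch. This is the generic NWD between two axis-aligned boxes, not the paper's Shape-NWD loss, whose shape-weighting terms are not given in the abstract. Each box `(cx, cy, w, h)` is modeled as a 2D Gaussian with mean `(cx, cy)` and covariance `diag(w²/4, h²/4)`. The function names and the normalizing constant `c` are assumptions chosen for illustration.

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Normalized Gaussian Wasserstein distance between two boxes.

    Boxes are (cx, cy, w, h). The constant c is a dataset-dependent
    scale; 12.8 is an illustrative assumption, not a value from the paper.
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    # Squared 2-Wasserstein distance between the two box Gaussians.
    w2_squared = (
        (cxa - cxb) ** 2
        + (cya - cyb) ** 2
        + ((wa - wb) / 2) ** 2
        + ((ha - hb) / 2) ** 2
    )
    # Exponential normalization maps the distance into (0, 1].
    return math.exp(-math.sqrt(w2_squared) / c)


def nwd_loss(box_a, box_b, c=12.8):
    """Regression loss: 0 for identical boxes, approaching 1 as they diverge."""
    return 1.0 - nwd(box_a, box_b, c)


# Two small boxes of equal size that do not overlap at all.
pred = (10.0, 10.0, 4.0, 4.0)
target = (13.0, 14.0, 4.0, 4.0)
similarity = nwd(pred, target)
```

Unlike IoU, which is zero for any pair of non-overlapping boxes, this similarity still varies smoothly with the center offset, which is why a distance of this kind is attractive for small objects.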
With advances in object detection technology, especially deep neural networks, many related tasks, such as medical applications and industrial automation, have achieved great success. However, detecting objects with multiple aspect ratios and scales remains a key problem. This paper proposes a top-down and bottom-up feature pyramid network (TDBU-FPN), which combines multi-scale feature representation with anchor generation at multiple aspect ratios. First, to build the multi-scale feature maps, a number of fully convolutional layers are placed after the backbone. Second, to link neighboring feature maps, top-down and bottom-up flows are adopted: the top-down flow, implemented by deconvolution, introduces context information, and the bottom-up flow, implemented by pooling, supplements suboriginal information. Third, the problem of adapting to different object aspect ratios is tackled by placing many anchor shapes with different aspect ratios on each multi-scale feature map. The proposed method is evaluated on the pattern analysis, statistical modeling and computational learning visual object classes (PASCAL VOC) dataset and reaches an accuracy of 79%, a 1.8% improvement, at a detection speed of 23 fps.
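The third step, anchors with several aspect ratios on every feature map, can be sketched as follows. This is a minimal illustration, not the TDBU-FPN implementation: the function name, the pyramid sizes, strides, base scales, and aspect ratios are all hypothetical values chosen for the example.

```python
import math


def generate_anchors(feature_size, stride, scale, ratios=(0.5, 1.0, 2.0)):
    """Return (cx, cy, w, h) anchors for one square feature map.

    One anchor per aspect ratio (width / height) is centered on every
    cell; each keeps the same area, scale * scale.
    """
    anchors = []
    for row in range(feature_size):
        for col in range(feature_size):
            # Cell center mapped back to input-image coordinates.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for ratio in ratios:
                w = scale * math.sqrt(ratio)
                h = scale / math.sqrt(ratio)
                anchors.append((cx, cy, w, h))
    return anchors


# Hypothetical three-level pyramid: (feature map size, stride, base scale).
pyramid = [(38, 8, 32.0), (19, 16, 64.0), (10, 32, 128.0)]
all_anchors = [
    anchor
    for size, stride, scale in pyramid
    for anchor in generate_anchors(size, stride, scale)
]
```

Finer feature maps carry the smaller base scale, so small objects are matched on high-resolution levels and large objects on coarse ones, while the ratio set covers tall, square, and wide shapes at every level.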
Funding: Supported by the Program of Introducing Talents of Discipline to Universities (111 Plan) of China (B14010) and the National Natural Science Foundation of China (31727901).