Funding: Supported in part by the Shaanxi Natural Science Basic Research Program (2022JM-298), the National Natural Science Foundation of China (52172324), the Shaanxi Provincial Key Research and Development Program (2021SF-483), and the Science and Technology Project of the Shaanxi Provincial Transportation Department (21-202K, 20-38T).
Abstract: With the rapid development of intelligent traffic information monitoring technology, accurate identification of vehicles, pedestrians, and other objects on the road has become particularly important. Therefore, to improve the recognition and classification accuracy of image objects in complex traffic scenes, this paper proposes a method that redefines and refines semantic segmentation in image boundary regions. First, the SegNet semantic segmentation model is used to obtain coarse classification features of road objects; the simple linear iterative clustering (SLIC) algorithm then produces an over-segmented image, which determines the classification of the pixels within each superpixel region and optimizes the segmentation of boundaries and small regions in the road image. Finally, the edge recovery ability of the conditional random field (CRF) is used to refine the image boundary. The experimental results show that, compared with FCN-8s and SegNet, the pixel accuracy of the proposed algorithm improves by 2.33% and 0.57%, respectively, and that, compared with U-Net, the proposed algorithm performs better on multi-target segmentation.
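As a rough illustration of the superpixel relabeling idea described in this abstract, the sketch below assigns every pixel the majority class of its SLIC superpixel. It is a minimal sketch, not the paper's implementation: the CRF refinement step is omitted, and the function name, class count, and SLIC parameters are illustrative assumptions.

```python
# Minimal sketch of superpixel majority-vote refinement of a coarse segmentation.
# Assumptions: `image` is an RGB uint8 array and `coarse_labels` is a per-pixel
# class map produced by a semantic segmentation network (e.g. SegNet).
import numpy as np
from skimage.segmentation import slic

def refine_with_superpixels(image, coarse_labels, n_segments=600, compactness=10.0):
    """Reassign every pixel the majority class of its SLIC superpixel."""
    segments = slic(image, n_segments=n_segments, compactness=compactness, start_label=0)
    refined = coarse_labels.copy()
    for sp_id in np.unique(segments):
        mask = segments == sp_id
        # Majority vote over the coarse labels inside this superpixel.
        majority = np.bincount(coarse_labels[mask].ravel()).argmax()
        refined[mask] = majority
    return refined

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.integers(0, 255, size=(120, 160, 3), dtype=np.uint8)
    coarse = rng.integers(0, 5, size=(120, 160))   # 5 hypothetical classes
    refined = refine_with_superpixels(image, coarse)
    print(refined.shape, np.unique(refined))
```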
Funding: Funded by (i) the Natural Science Foundation of China (NSFC) under Grant Nos. 61402397, 61263043, 61562093, and 61663046; (ii) the Open Foundation of the Key Laboratory in Software Engineering of Yunnan Province, No. 2020SE304; and (iii) the Practical Innovation Project of Yunnan University, Project Nos. 2021z34, 2021y128, and 2021y129.
Abstract: Traffic scene captioning technology automatically generates one or more sentences describing the content of a traffic scene by analyzing the input traffic scene images, ensuring road safety while providing an important decision-making function for sustainable transportation. To provide a comprehensive and reasonable description of complex traffic scenes, a traffic scene semantic captioning model with multi-stage feature enhancement is proposed in this paper. In general, the model follows an encoder-decoder structure. First, multilevel granularity visual features are used for feature enhancement during encoding, which enables the model to learn more detailed content in the traffic scene image. Second, a scene knowledge graph is applied during decoding, and the semantic features it provides are used to enhance the features learned by the decoder again, so that the model can learn the attributes of objects in the traffic scene and the relationships between them to generate more reasonable captions. This paper reports extensive experiments on the challenging MS-COCO dataset, evaluated with five standard automatic evaluation metrics; the results show that the proposed model improves significantly on all metrics compared with state-of-the-art methods, in particular achieving a score of 129.0 on the CIDEr-D metric, which indicates that the proposed model can effectively provide a more reasonable and comprehensive description of the traffic scene.
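The abstract's two enhancement stages can be pictured, very loosely, as (1) fusing visual features from several granularity levels in the encoder and (2) letting the decoder attend over knowledge-graph node embeddings. The PyTorch sketch below is a hypothetical illustration under those assumptions; the module names, dimensions, and fusion scheme are not taken from the paper.

```python
# Hypothetical sketch: multi-level visual fusion + knowledge-graph-enhanced decoding step.
import torch
import torch.nn as nn

class MultiLevelFusion(nn.Module):
    """Project visual features from several granularity levels and average them."""
    def __init__(self, in_dims, d_model=512):
        super().__init__()
        self.projs = nn.ModuleList(nn.Linear(d, d_model) for d in in_dims)

    def forward(self, feats):                          # list of (B, N_i, d_i) tensors
        projected = [p(f).mean(dim=1) for p, f in zip(self.projs, feats)]
        return torch.stack(projected, dim=0).mean(dim=0)      # (B, d_model)

class SemanticEnhancedDecoderStep(nn.Module):
    """One decoding step that attends over knowledge-graph node embeddings."""
    def __init__(self, d_model=512, vocab_size=10000):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
        self.rnn = nn.GRUCell(d_model * 2, d_model)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, word_emb, hidden, visual_ctx, kg_nodes):
        # Attend from the decoder state over knowledge-graph node embeddings.
        sem_ctx, _ = self.attn(hidden.unsqueeze(1), kg_nodes, kg_nodes)
        x = torch.cat([word_emb + visual_ctx, sem_ctx.squeeze(1)], dim=-1)
        hidden = self.rnn(x, hidden)
        return self.out(hidden), hidden

if __name__ == "__main__":
    B, d = 2, 512
    fusion = MultiLevelFusion(in_dims=[256, 512, 1024])
    feats = [torch.randn(B, 36, 256), torch.randn(B, 36, 512), torch.randn(B, 36, 1024)]
    visual_ctx = fusion(feats)
    step = SemanticEnhancedDecoderStep()
    logits, h = step(torch.randn(B, d), torch.zeros(B, d), visual_ctx, torch.randn(B, 20, d))
    print(logits.shape)   # torch.Size([2, 10000])
```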
Abstract: With the intelligent development of road traffic control and management, higher requirements have been put forward for the accuracy and effectiveness of traffic data. How to collect and integrate data for traffic scenes has gained importance in this field as various processing technologies have emerged, and a great deal of research has been carried out, ranging from theoretical studies to engineering applications.
Funding: The work was supported by the National Natural Science Foundation of PRC (No. 60574033) and the National Key Fundamental Research & Development Programs (973) of PRC (No. 2001CB309403).
Abstract: Segmentation of moving objects in a video sequence is a basic task in computer vision applications. However, shadows extracted along with the objects can result in large errors in object localization and recognition. In this paper, we propose a moving shadow detection method based on edge information, which can effectively detect the cast shadow of a moving vehicle in a traffic scene. Once shadows are confirmed to be present in a frame, the shadow removal algorithm proposed in this paper is executed to separate the shadow from the foreground. The shadow elimination algorithm first removes the boundary of the cast shadow while preserving object edges; second, it reconstructs coarse object shapes from the edge information of the objects; and finally, it extracts the cast shadow by subtracting the moving object from the change detection mask and performs further processing. The proposed method has been tested on images taken under different shadow orientations, vehicle colors, and vehicle sizes, and the results show that shadows can be successfully eliminated and good video segmentation obtained.
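The final step of the pipeline, subtracting the reconstructed moving object from the change detection mask and cleaning up the result, can be sketched as follows. This is a simplified stand-in, not the authors' algorithm: the background model, thresholds, kernel sizes, and function names are illustrative assumptions.

```python
# Simplified sketch: cast shadow = change-detection mask minus object mask,
# followed by light morphological clean-up.
import cv2
import numpy as np

def extract_cast_shadow(frame_gray, background_gray, object_mask,
                        diff_thresh=25, kernel_size=5):
    """Return a binary mask of the cast shadow region."""
    # Change detection mask: pixels that differ from the background model.
    diff = cv2.absdiff(frame_gray, background_gray)
    change_mask = (diff > diff_thresh).astype(np.uint8) * 255

    # Cast shadow = changed pixels that are not part of the object itself.
    shadow = cv2.subtract(change_mask, object_mask)

    # Further processing: remove small speckles and fill small holes.
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    shadow = cv2.morphologyEx(shadow, cv2.MORPH_OPEN, kernel)
    shadow = cv2.morphologyEx(shadow, cv2.MORPH_CLOSE, kernel)
    return shadow

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    bg = rng.integers(0, 255, size=(120, 160), dtype=np.uint8)
    frame = bg.copy()
    frame[40:80, 50:90] = 30                     # darker region: object + shadow
    obj = np.zeros_like(bg)
    obj[40:80, 50:70] = 255                      # hypothetical object mask
    print(extract_cast_shadow(frame, bg, obj).sum() > 0)
```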
Abstract: To improve the detection accuracy of vehicle detection models in complex traffic scenes, YOLOv8n (you only look once version 8 nano) is taken as the baseline model, and a Neck-ARW neck structure with a composite backbone (comprising an auxiliary detection branch, RepBlock modules, and weighted skip feature connections) is designed to reduce the information loss along the network depth caused by the information bottleneck. A RepBlock structural re-parameterization module is introduced, using a multi-branch structure during training to improve the feature extraction capability of the model; a P2 detection layer is added to capture more detailed features of small objects and enrich the feature information flow for small objects within the network; a Dynamic Head self-attention detection head is adopted, fusing scale-aware, spatial-aware, and task-aware self-attention mechanisms into a unified framework to improve detection performance; and the layer-adaptive magnitude-based pruning (LAMP) algorithm is applied to remove redundant parameters, yielding the YOLO-NPDL (Neck-ARW, P2, Dynamic Head, LAMP) vehicle detection model. Using the UA-DETRAC (University at Albany detection and tracking) dataset, experiments on the RepBlock embedding position, comparisons of different neck structures, pruning experiments, ablation experiments, and model performance comparisons are conducted to verify the mean average precision of the YOLO-NPDL model. The results show that embedding the RepBlock module in both the auxiliary detection branch and the neck backbone yields better feature extraction for multi-scale objects and retains more detail during training, at the cost of increased parameter count and computation; with the Neck-ARW neck structure, the model's mAP50 and mAP50-95 increase by 1.1% and 1.7%, respectively, while the parameter count decreases by about 17.9%, indicating a better structure; at a pruning rate of 1.3, the parameter count and computation decrease by about 38.0% and 24.0%, respectively, with fewer redundant channels and a more compact structure; compared with YOLOv8n, with essentially the same number of parameters, YOLO-NPDL increases recall by 2.7%, mAP50 by 2.7% to 94.7%, and mAP50-95 by 6.4% to 79.7%; and compared with widely used YOLO-series models, YOLO-NPDL achieves higher detection accuracy with fewer parameters. In real complex traffic scenarios such as distant objects, rainy weather, and night scenes, YOLO-NPDL shows no obvious false or missed detections, detects more distant small vehicles, and delivers better detection results.
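The LAMP step mentioned above scores each weight by its squared magnitude divided by the sum of squared magnitudes of the weights in the same layer that are at least as large, then prunes the globally lowest-scoring weights. A self-contained sketch of that scoring rule is given below; it illustrates the pruning criterion only, not the YOLO-NPDL training pipeline, and the toy layers and sparsity level are assumptions.

```python
# Sketch of layer-adaptive magnitude-based pruning (LAMP) scoring and global masking.
import torch

def lamp_scores(weight: torch.Tensor) -> torch.Tensor:
    """LAMP score per weight for a single layer."""
    flat = weight.detach().abs().flatten() ** 2
    sorted_vals, order = torch.sort(flat)                        # ascending
    # Tail sums: sum of squared magnitudes of weights >= current one.
    tail = torch.flip(torch.cumsum(torch.flip(sorted_vals, [0]), 0), [0])
    scores_sorted = sorted_vals / tail
    scores = torch.empty_like(flat)
    scores[order] = scores_sorted
    return scores.view_as(weight)

def global_lamp_masks(weights, sparsity=0.5):
    """Prune the `sparsity` fraction of weights with the lowest LAMP scores."""
    all_scores = torch.cat([lamp_scores(w).flatten() for w in weights])
    k = int(sparsity * all_scores.numel())
    threshold = torch.kthvalue(all_scores, max(k, 1)).values
    return [(lamp_scores(w) > threshold).float() for w in weights]

if __name__ == "__main__":
    torch.manual_seed(0)
    layers = [torch.randn(32, 16), torch.randn(64, 32)]          # toy weight matrices
    for w, m in zip(layers, global_lamp_masks(layers, sparsity=0.5)):
        print(tuple(w.shape), "kept fraction:", round(m.mean().item(), 3))
```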
Abstract: Although deep-learning-based object detection has made progress in traffic-scene applications, the trade-off between accuracy and speed for multi-object detection in complex traffic scenes remains a challenge, and most accuracy-improving methods are parameter-intensive, greatly increasing the number of model parameters. To address this problem, a sparse-parameter model based on YOLOv8 is proposed, which improves recall and detection accuracy while reducing the parameter count. First, the Simple Attention Mechanism (SimAM) is used to build a stronger backbone for feature extraction. Second, a Lightweight Content-Aware ReAssembly of Features (L-CARAFE) module is proposed to replace the upsampling operation, aggregating contextual information over a larger receptive field. Finally, multiple decoupled heads with sparse parameters improve detection accuracy while reducing the parameter count. Given the complexity of traffic scenes, the model's effectiveness is verified on the KITTI dataset and its generalization on the COCO dataset. The model substantially improves recall and mean Average Precision (mAP) on both public datasets: on KITTI, the nano model improves recall and mAP by 3.1% and 0.9%, respectively, with a parameter count of 2.95, and the small model reaches 60.6% mAP@0.5 on COCO.
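SimAM, referenced above, is a parameter-free attention block that reweights each activation by an energy term computed from its deviation from the channel-wise spatial mean. The sketch below follows the commonly used formulation; e_lambda and the toy feature-map shape are illustrative assumptions, and no YOLOv8 integration is shown.

```python
# Minimal sketch of the parameter-free SimAM attention block.
import torch
import torch.nn as nn

class SimAM(nn.Module):
    """Parameter-free attention: weight each activation by its inverse energy."""
    def __init__(self, e_lambda: float = 1e-4):
        super().__init__()
        self.e_lambda = e_lambda

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        n = h * w - 1
        # Squared deviation of each activation from its channel-wise spatial mean.
        d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)
        # Channel-wise spatial variance estimate.
        v = d.sum(dim=(2, 3), keepdim=True) / n
        e_inv = d / (4 * (v + self.e_lambda)) + 0.5
        return x * torch.sigmoid(e_inv)

if __name__ == "__main__":
    attn = SimAM()
    feat = torch.randn(1, 64, 40, 40)      # hypothetical backbone feature map
    print(attn(feat).shape)                # torch.Size([1, 64, 40, 40])
```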