Abstract
To address dense occlusion, similar appearance, and nonlinear motion, which degrade the conventional YOLOX-ByteTrack framework, this paper proposes an improved pedestrian multi-object tracking algorithm based on YOLOX-ByteTrack. A Transformer is introduced in the feature extraction stage: self-attention models global contextual dependencies, and a cross-attention layer is designed to fuse multi-scale features, which significantly improves the discriminability and robustness of the target representations. Experiments on the MOT20 dataset show that the improved algorithm raises MOTA by 0.9%, IDF1 by 1.2%, and HOTA by 1.3%, while FPS drops by only 0.2, so performance remains close to real time. Analysis shows that the Transformer, by dynamically capturing long-range semantic associations between targets, effectively alleviates missed detections under occlusion and identity confusion, particularly in dense scenes and under complex motion, and offers a new paradigm for end-to-end joint optimization of detection and tracking.
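The attention-based fusion described in the abstract can be illustrated with a minimal sketch. The abstract does not give the layer design, so everything below is an assumption made for illustration: the token counts, the feature dimension, the residual connection, and the omission of learned query/key/value projections. It shows only the generic mechanism, scaled dot-product self-attention on one feature scale followed by cross-attention in which that scale queries a coarser one, and is not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    Learned projection matrices are omitted for brevity (an assumption
    of this sketch, not a statement about the paper's layers).
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)   # (n_queries, n_keys)
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ values, weights


rng = np.random.default_rng(0)
d = 16                                   # hypothetical feature dimension
fine = rng.standard_normal((64, d))      # e.g. an 8x8 feature map as 64 tokens
coarse = rng.standard_normal((16, d))    # e.g. a 4x4 feature map as 16 tokens

# Self-attention: every fine-scale token attends to all fine-scale tokens,
# which is how global context is gathered within one scale.
context, _ = attention(fine, fine, fine)

# Cross-attention: fine-scale tokens query the coarse scale, then the
# result is added back through a residual connection (assumed here).
cross, weights = attention(context, coarse, coarse)
fused = context + cross

print(fused.shape, weights.shape)
```

In this sketch the fused output keeps the fine-scale resolution (one row per fine token), while each row of `weights` records how strongly that token draws on each coarse-scale token.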
Authors
薛元杰
李雅红
XUE Yuanjie; LI Yahong (Lvliang Vocational and Technical College, Lvliang, Shanxi 032300, China)
Source
《自动化应用》
2025, No. 16, pp. 112-114 (3 pages)
Automation Application