Funding: Supported in part by Multimedia University under the Research Fellow Grant MMUI/250008; in part by Telekom Research & Development Sdn Bhd under Grants RDTC/241149 and RDTC/231095; and by the Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2025R140), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: Multiple Object Tracking (MOT) is essential for applications such as autonomous driving, surveillance, and analytics; however, challenges such as occlusion, low-resolution imaging, and identity switches remain persistent. We propose HAMOT, a hierarchical adaptive multi-object tracker that addresses these challenges with a novel, unified framework. Unlike previous methods that rely on isolated components, HAMOT incorporates a Swin Transformer-based Adaptive Enhancement (STAE) module, comprising Scene-Adaptive Transformer Enhancement and Confidence-Adaptive Feature Refinement, to improve detection under low-visibility conditions. The hierarchical Dynamic Graph Neural Network with Temporal Attention (DGNN-TA) models both short- and long-term associations, and the Adaptive Unscented Kalman Filter with Gated Recurrent Unit (AUKF-GRU) ensures accurate motion prediction. The novel Graph-Based Density-Aware Clustering (GDAC) improves occlusion recovery by adapting to scene density, preserving identity integrity. This integrated approach enables adaptive responses to complex visual scenarios, achieving strong performance across all evaluation metrics: a Higher Order Tracking Accuracy (HOTA) of 67.05%, a Multiple Object Tracking Accuracy (MOTA) of 82.4%, an ID F1 Score (IDF1) of 83.1%, and a total of 1052 Identity Switches (IDSW) on MOT17; 66.61% HOTA, 78.3% MOTA, 82.1% IDF1, and a total of 748 IDSW on MOT20; and 66.4% HOTA, 92.32% MOTA, and 68.96% IDF1 on DanceTrack. With fixed thresholds, the full HAMOT model (all six components) achieves real-time functionality at 24 FPS on MOT17 using an RTX 3090, ensuring robustness and scalability for real-world MOT applications.
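For readers unfamiliar with the reported metrics, the following minimal sketch shows how MOTA and IDF1 are conventionally computed from error counts. The function names and the counts in the usage note are illustrative assumptions, not values from the paper; HOTA is omitted because it requires averaging over localization thresholds.

```python
def mota(false_negatives: int, false_positives: int, id_switches: int, num_gt: int) -> float:
    """MOTA = 1 - (FN + FP + IDSW) / GT.

    num_gt is the total number of ground-truth boxes summed over all frames.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """IDF1 = 2 * IDTP / (2 * IDTP + IDFP + IDFN)."""
    denom = 2 * idtp + idfp + idfn
    return 2 * idtp / denom if denom else 0.0
```

With made-up counts, `mota(100, 50, 10, 1000)` gives 0.84 and `idf1(80, 10, 30)` gives 0.8. Note that MOTA can be negative when the errors outnumber the ground-truth boxes.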
Abstract: Multi-Object Tracking (MOT) represents a fundamental but computationally demanding task in computer vision, with particular challenges arising in occluded and densely populated environments. While contemporary tracking systems have demonstrated considerable progress, persistent limitations, notably frequent occlusion-induced identity switches and tracking inaccuracies, continue to impede reliable real-world deployment. This work introduces an advanced tracking framework that enhances association robustness through a two-stage matching paradigm combining spatial and appearance features. The proposed framework employs: (1) a Height Modulated and Scale Adaptive Spatial Intersection-over-Union (HMSIoU) metric for improved spatial correspondence estimation across variable object scales and partial occlusions; (2) a feature extraction module generating discriminative appearance descriptors for identity maintenance; and (3) a recovery association mechanism for refining matches between unassociated tracks and detections. Comprehensive evaluation on the standard MOT17 and MOT20 benchmarks demonstrates significant improvements in tracking consistency, with state-of-the-art performance across key metrics including HOTA (64), MOTA (80.7), IDF1 (79.8), and IDs (1379). These results substantiate the efficacy of our Cue-Tracker framework in complex real-world scenarios characterized by occlusions and crowd interactions.
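The two-stage idea (spatial matching first, then appearance matching for whatever is left) can be sketched as below. This is an illustration under stated assumptions, not the Cue-Tracker implementation: plain IoU stands in for HMSIoU, whose formula the abstract does not give; greedy matching, the dictionary layout, and both thresholds are hypothetical choices.

```python
import math


def iou(a, b):
    """Plain IoU of two boxes given as (x1, y1, x2, y2); a stand-in for HMSIoU."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def cosine(u, v):
    """Cosine similarity of two appearance vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def greedy_match(track_ids, det_ids, score, threshold):
    """Pair tracks and detections in descending score order, keeping scores >= threshold."""
    pairs = sorted(((score(t, d), t, d) for t in track_ids for d in det_ids), reverse=True)
    used_t, used_d, matches = set(), set(), []
    for s, t, d in pairs:
        if s < threshold:
            break  # all remaining pairs score lower
        if t in used_t or d in used_d:
            continue
        matches.append((t, d))
        used_t.add(t)
        used_d.add(d)
    return matches


def two_stage_associate(tracks, dets, iou_thr=0.5, app_thr=0.7):
    """tracks/dets: lists of dicts with 'box' and 'feat'. Returns (track_idx, det_idx) pairs."""
    t_ids, d_ids = list(range(len(tracks))), list(range(len(dets)))
    # Stage 1: spatial matching.
    stage1 = greedy_match(t_ids, d_ids,
                          lambda t, d: iou(tracks[t]["box"], dets[d]["box"]), iou_thr)
    done_t = {t for t, _ in stage1}
    done_d = {d for _, d in stage1}
    left_t = [t for t in t_ids if t not in done_t]
    left_d = [d for d in d_ids if d not in done_d]
    # Stage 2: recover unmatched tracks and detections by appearance only.
    stage2 = greedy_match(left_t, left_d,
                          lambda t, d: cosine(tracks[t]["feat"], dets[d]["feat"]), app_thr)
    return stage1 + stage2
```

In this sketch the second stage ignores position entirely, so a track that has drifted far from its detection can still be recovered when its appearance descriptor is similar enough.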
Abstract: In dense pedestrian tracking, frequent object occlusions and close distances between objects make it difficult to estimate object trajectories accurately. In this study, a conditional random field tracking model is established by using a visual long short-term memory network in three-dimensional (3D) space, with motion estimation performed jointly on object trajectory segments. Object visual field information is added to the long short-term memory network to improve the accuracy of motion-related object pair selection and motion estimation. To address the uncertainty in the length and interval of trajectory segments, a multimode long short-term memory network is proposed for object motion estimation. The tracking performance is evaluated using the PETS2009 dataset. The experimental results show that the proposed method achieves better performance than tracking methods based on independent motion estimation.
Abstract: A literature analysis has shown that object search, recognition, and tracking systems are becoming increasingly popular. However, such systems do not achieve high practical results in analyzing small moving living objects ranging from 8 to 14 mm. This article examines methods and tools for recognizing and tracking the class of small moving objects, such as ants. To fulfill those aims, a customized You Only Look Once Ants Recognition (YOLO_AR) Convolutional Neural Network (CNN) has been trained to recognize Messor structor ants in the laboratory using the LabelImg object marker tool. The proposed model is an extension of the You Only Look Once v4 (YOLOv4) 512×512 model with an additional Self Regularized Non-Monotonic (Mish) activation function. Additionally, a scalable solution for continuous object recognition and tracking was implemented. This solution is based on the OpenDataCam system, with extended Object Tracking modules that allow for tracking and counting objects that have crossed a custom boundary line. During the study, the methods of the alignment algorithm for finding the trajectory of moving objects were modified. I discovered that the Hungarian algorithm showed better results in tracking small objects than the k-dimensional tree (k-d tree) matching algorithm used in OpenDataCam. Remarkably, such an algorithm showed better results with the implemented YOLO_AR model due to the lack of False Positives (FP). Therefore, I provided a new tracker module with a Hungarian matching algorithm verified on the Multiple Object Tracking (MOT) benchmark. Furthermore, additional customization parameters for parsing and filtering object recognition and tracking results were added, such as boundary angle threshold (BAT) and past frames trajectory prediction (PFTP). Experimental tests on a mobile device confirmed the results of the study. During the experiment, parameters such as the quality of recognition and tracking of moving objects, the PFTP and BAT, and the configuration parameters of the neural network and boundary line model were analyzed. The results showed that the proposed methods increased tracking accuracy by 50%. The study results confirmed the relevance of the topic and the effectiveness of the implemented methods and tools.
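To illustrate why a globally optimal assignment can beat per-object nearest-neighbour lookup, here is a minimal sketch of minimum-cost matching between track positions and detections. It is not the article's tracker module: exhaustive search over permutations is swapped in for the Hungarian algorithm so that the example stays dependency-free. Both return the same minimum-cost assignment, but the Hungarian algorithm does so in polynomial time. The function name and the sample coordinates are hypothetical.

```python
import itertools
import math


def optimal_assignment(tracks, detections):
    """Return (pairs, total_cost) minimising the summed Euclidean distance.

    tracks and detections are lists of (x, y) points. This brute-force search
    is only practical for a handful of objects and assumes
    len(tracks) <= len(detections).
    """
    if len(tracks) > len(detections):
        raise ValueError("sketch assumes no more tracks than detections")
    best_cost, best_perm = math.inf, ()
    # Each permutation assigns detection perm[i] to track i.
    for perm in itertools.permutations(range(len(detections)), len(tracks)):
        cost = sum(math.dist(tracks[i], detections[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return list(enumerate(best_perm)), best_cost
```

For tracks at `(0, 0)` and `(3, 0)` and detections at `(2, 0)` and `(-10, 0)`, both tracks have the same nearest detection. Serving tracks in order would cost 2 + 13 = 15, while the optimal assignment pairs the first track with the far detection for a total cost of 10 + 1 = 11.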
Funding: The National Natural Science Foundation of China (No. 51775331).
Abstract: In this study, a multi-object tracking (MOT) scheme based on a light detection and ranging (LiDAR) sensor was proposed to overcome imprecise velocity observations in object occlusion scenarios. By applying real-time velocity estimation, a modified unscented Kalman filter (UKF) was proposed for the state estimation of a target object. The proposed method can reduce the calculation cost by obviating unscented transformations. Additionally, combining the advantages of a two-reference-point selection scheme based on a center point and a corner point, a reference point switching approach was introduced to improve tracking accuracy and consistency. The state estimation capability of the proposed UKF was verified by comparing it with the standard UKF in single-target tracking simulations. Moreover, the performance of the proposed MOT system was evaluated using real traffic datasets.
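The abstract does not specify the modified UKF, so the sketch below is only a point of reference: a plain linear Kalman filter with a constant-velocity model in one dimension, showing the predict/update recursion that any such tracker builds on. The class name, the diagonal process noise `q`, the measurement noise `r`, and the initial covariance `p0` are all assumptions chosen for illustration.

```python
class ConstantVelocityKF:
    """1-D linear Kalman filter with state [position, velocity]."""

    def __init__(self, pos=0.0, vel=0.0, p0=100.0, q=1e-3, r=1.0):
        self.pos, self.vel = pos, vel
        # Covariance matrix entries: [[a, b], [c, d]].
        self.a, self.b, self.c, self.d = p0, 0.0, 0.0, p0
        self.q, self.r = q, r

    def predict(self, dt):
        """Propagate state and covariance by dt: P <- F P F^T + Q."""
        self.pos += self.vel * dt
        a, b, c, d = self.a, self.b, self.c, self.d
        # F = [[1, dt], [0, 1]]; Q = q * I is an assumed, simplified noise model.
        self.a = a + dt * (b + c) + dt * dt * d + self.q
        self.b = b + dt * d
        self.c = c + dt * d
        self.d = d + self.q

    def update(self, z):
        """Fuse a position measurement z (H = [1, 0])."""
        s = self.a + self.r                 # innovation variance
        k0, k1 = self.a / s, self.c / s     # Kalman gain
        y = z - self.pos                    # innovation
        self.pos += k0 * y
        self.vel += k1 * y
        a, b, c, d = self.a, self.b, self.c, self.d
        # P <- (I - K H) P
        self.a = (1 - k0) * a
        self.b = (1 - k0) * b
        self.c = c - k1 * a
        self.d = d - k1 * b
```

Feeding the filter noise-free positions 1, 2, ..., 20 at unit time steps (calling `predict(1.0)` then `update(k)`) drives the velocity estimate toward 1, even though the filter starts from zero velocity and observes position only.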
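Abstract: Objective: To address identity switches caused by blurred pedestrian features and the loss of tracking accuracy caused by occlusion between targets in complex scenes, the AIoU-Tracker multi-object tracking algorithm is proposed. Methods: First, a dedicated AIoU (adaptive intersection over union) regression loss function is designed for the detection head of the backbone network. It measures box quality in terms of three aspects (overlap area, center-point distance, and aspect ratio) and mitigates identity switches caused by the insufficient discriminability of blurred pedestrian features. Second, a simple and effective hierarchical association strategy is proposed: after high-score and low-score detection boxes have each been associated, the embedding information around the detection boxes that failed to associate is exploited for a further round of association, which improves association accuracy under occlusion. Results: In a series of comparative experiments against the FairMOT tracker, the proposed AIoU-Tracker raises HOTA (higher order tracking accuracy) from 58.3% to 59.8%, IDF1 (ID F1 score) from 72.6% to 73.1%, and MOTA (multi-object tracking accuracy) from 69.3% to 74.4% on the MOT16 dataset; on the MOT17 dataset, HOTA rises from 59.3% to 59.9% and IDF1 from 72.3% to 72.9%. Conclusion: The proposed feature-balanced tracking method achieves a better balance among bounding-box size features, heatmap features, and center-point offset features during training and testing, making multi-object tracking results more accurate.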
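The abstract does not give the AIoU formula. As a stand-in, the sketch below implements the well-known Complete IoU (CIoU), which combines the same three ingredients: overlap area, center-point distance, and aspect ratio. It should be read as an illustration of that family of regression losses, not as the AIoU loss itself; the function names are assumptions.

```python
import math


def ciou(box, gt):
    """Complete IoU for boxes (x1, y1, x2, y2) with positive width and height."""
    w, h = box[2] - box[0], box[3] - box[1]
    wg, hg = gt[2] - gt[0], gt[3] - gt[1]

    # Overlap term: plain IoU.
    inter_w = max(0.0, min(box[2], gt[2]) - max(box[0], gt[0]))
    inter_h = max(0.0, min(box[3], gt[3]) - max(box[1], gt[1]))
    inter = inter_w * inter_h
    iou = inter / (w * h + wg * hg - inter)

    # Distance term: squared center distance over squared enclosing-box diagonal.
    rho2 = (((box[0] + box[2]) - (gt[0] + gt[2])) ** 2
            + ((box[1] + box[3]) - (gt[1] + gt[3])) ** 2) / 4.0
    cw = max(box[2], gt[2]) - min(box[0], gt[0])
    ch = max(box[3], gt[3]) - min(box[1], gt[1])
    c2 = cw ** 2 + ch ** 2

    # Aspect-ratio term.
    v = (4.0 / math.pi ** 2) * (math.atan(wg / hg) - math.atan(w / h)) ** 2
    alpha = v / (1.0 - iou + v) if v > 0 else 0.0

    return iou - rho2 / c2 - alpha * v


def ciou_loss(box, gt):
    """Regression loss: 0 for a perfect match, larger for worse boxes."""
    return 1.0 - ciou(box, gt)
```

For two 2×2 squares offset by one unit in each direction, `ciou((0, 0, 2, 2), (1, 1, 3, 3))` equals 1/7 − 1/9: the IoU of 1/7 is reduced by the center-distance penalty, while the aspect-ratio penalty is zero because both boxes are square.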