Funding: This work was supported by the National Program on Key Basic Research Project (No. 2014CB744903), the National Natural Science Foundation of China (Nos. 61673270 and 61973212), and the Key Technology Research Program of Sichuan Provincial Department of Science and Technology (No. 2020YFSY0027).
Abstract: Multiple object tracking (MOT) in unmanned aerial vehicle (UAV) videos has attracted attention. Because of the UAV's observation perspective, objects are relatively small and their scale changes dramatically. Moreover, most MOT algorithms for UAV videos cannot run in real time because they follow the tracking-by-detection paradigm. We propose a feature-aligned attention network (FAANet), which mainly consists of a channel and spatial attention module and a feature-aligned aggregation module. We also improve real-time performance using the joint-detection-embedding paradigm and a structural re-parameterization technique. We validate the approach with extensive experiments on the UAV detection and tracking benchmark, achieving a new state-of-the-art 44.0 MOTA and 64.6 IDF1 at 38.24 frames per second on a single 1080Ti graphics processing unit.
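The abstract does not say how FAANet applies structural re-parameterization. As a hedged illustration of one common building block of that technique, the sketch below folds a batch-normalization step into the preceding convolution so that inference needs a single affine operation. Scalars stand in for one output channel, and the function names and parameters are assumptions for illustration, not the authors' code.

```python
import math


def conv_then_bn(x, weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Training-time form: a (scalar) convolution followed by batch norm."""
    y = weight * x + bias
    return gamma * (y - mean) / math.sqrt(var + eps) + beta


def fold_bn_into_conv(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Return an equivalent (weight, bias) pair with batch norm folded in.

    Hypothetical helper: real layers apply the same scaling per output channel.
    """
    scale = gamma / math.sqrt(var + eps)
    return weight * scale, (bias - mean) * scale + beta


# The folded layer reproduces the two-step output with one multiply-add.
w, b = fold_bn_into_conv(0.5, 0.1, gamma=1.2, beta=-0.3, mean=0.05, var=0.8)
fused_output = w * 2.0 + b
```

Because the folded form is algebraically identical, the speed-up comes purely from removing an operation at inference time, not from any change in the learned function.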
Funding: Supported by the National Natural Science Foundation of China (60835004, 60775047, 60872130) and the National High Technology Research and Development Program of China (863 Program) (2007AA04Z244, 2008AA04Z214).
Abstract: An object model-based tracking method is useful for tracking multiple objects, but the main difficulties are modeling objects reliably and tracking them via those models across successive frames. An effective tracking method using object models is proposed to track multiple objects in a real-time visual surveillance system. First, objects are detected with an adaptive kernel density estimation method that uses an adaptive bandwidth and features combining colour and gradient. Second, object models are built to describe motion, shape and colour features. A matching matrix is then formed to analyze tracking situations. If objects are tracked under occlusion, the optimal "visual" object is found to represent the occluded object, and the posterior probability of each pixel is used to decide which pixels update the object models. Extensive experiments show that this method improves the accuracy and validity of object tracking even under occlusion and can be used in real-time visual surveillance systems.
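The abstract does not define the matching matrix, so the following is only a minimal sketch of the general idea, assuming axis-aligned boxes and a binary overlap test. Rows are existing tracks, columns are new detections, and the row and column sums separate the tracking situations. The function names and the four situation labels are illustrative assumptions.

```python
def overlap(a, b):
    """Intersection area of two boxes given as (x1, y1, x2, y2)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)


def matching_matrix(tracks, detections):
    """Binary matrix: entry [i][j] is 1 when track i overlaps detection j."""
    return [[1 if overlap(t, d) > 0 else 0 for d in detections] for t in tracks]


def classify(matrix, n_detections):
    """Label each track and find unmatched detections (assumed categories)."""
    col_sums = [sum(row[j] for row in matrix) for j in range(n_detections)]
    result = {"tracked": [], "lost": [], "occluded": [], "new": []}
    for i, row in enumerate(matrix):
        hits = [j for j, v in enumerate(row) if v]
        if not hits:
            result["lost"].append(i)       # no detection supports this track
        elif any(col_sums[j] > 1 for j in hits):
            result["occluded"].append(i)   # several tracks share one detection
        else:
            result["tracked"].append(i)    # clean one-to-one match
    result["new"] = [j for j, s in enumerate(col_sums) if s == 0]
    return result


tracks = [(0, 0, 10, 10), (8, 0, 18, 10), (50, 50, 60, 60), (100, 100, 110, 110)]
detections = [(2, 0, 16, 10), (51, 51, 61, 61), (200, 200, 210, 210)]
situations = classify(matching_matrix(tracks, detections), len(detections))
```

Here the first two tracks merge into one detection and are flagged as occluded, which is the case where the abstract's occlusion handling would take over.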
Funding: Funded by the Jiangsu Agriculture Science and Technology Innovation Fund (Grant No. CX(21)3058), the Xuzhou Key Research and Development Project (Modern Agriculture) (Grant No. KC21135), and the International Science and Technology Cooperation Program of Jiangsu Province (Grant No. BZ2023013).
Abstract: To address the difficulty of recognizing the health status of yellow feather broilers in large-scale broiler farms and the low recognition rate of current models, this paper proposes a novel machine vision method for precise tracking of multiple broilers. Broilers can be tracked in the breeding environment so that their behaviors and health status can be analyzed further. An improved YOLOv3 (You Only Look Once v3) algorithm was used as the detector of the Deep SORT (Simple Online and Realtime Tracking) algorithm to realize multiple object tracking of yellow feather broilers in the flat breeding chamber. The backbone of YOLOv3 was replaced with MobileNetV2 to improve the inference speed of the detection module, and the DRSN (Deep Residual Shrinkage Network) was integrated with MobileNetV2 to enhance the feature extraction capability of the network. Moreover, in view of the slight change in the individual size of the yellow feather broiler, the feature fusion network was redesigned by combining it with the attention mechanism to enable adaptive learning of the objects' multi-scale features. Compared with traditional YOLOv3, the improved YOLOv3 achieves 93.2% mAP (mean Average Precision) and 29 fps (frames per second), representing high-precision real-time detection performance. Furthermore, compared with tracking based on the traditional YOLOv3 object detector, the MOTA (Multiple Object Tracking Accuracy) increases from 51% to 54% and the IDSW (Identity Switch) count decreases by 62.2%. The proposed algorithm can provide a technical reference for analyzing the behavioral perception and health status of broilers in the flat breeding environment.
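For readers unfamiliar with the metric, MOTA in the standard CLEAR MOT definition is one minus the ratio of all errors (misses, false positives and identity switches) to the number of ground-truth objects. The sketch below computes it; the counts are made-up illustrative values and are not taken from the paper.

```python
def mota(false_negatives, false_positives, id_switches, num_ground_truth):
    """Multiple Object Tracking Accuracy (CLEAR MOT definition).

    All counts are summed over every frame of the sequence.
    """
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_ground_truth


# Hypothetical counts, chosen only to show the arithmetic.
baseline = mota(false_negatives=200, false_positives=100, id_switches=50,
                num_ground_truth=1000)
fewer_switches = mota(false_negatives=200, false_positives=100, id_switches=20,
                      num_ground_truth=1000)
```

Because identity switches are usually far rarer than misses and false positives, a large relative drop in IDSW can coexist with a modest MOTA gain, which is consistent with the pattern the abstract reports.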
Funding: This work is supported by the National Natural Science Foundation of China (NSFC, No. 61340046), the National High Technology Research and Development Program of China (863 Program, No. 2006AA04Z247), the Scientific and Technical Innovation Commission of Shenzhen Municipality (JCYJ20130331144631730, JCYJ20130331144716089), and the Specialized Research Fund for the Doctoral Program of Higher Education (No. 20130001110011).
Abstract: Indoor multi-tracking is more challenging than outdoor tracking because of frequent occlusion, view truncation, severe scale change and pose variation, all of which make target representation and data association unreliable and ambiguous. Discriminative and reliable target representation is therefore vital for accurate data association in multi-tracking. Previous works typically combine many features to increase discriminative power, but this is prone to error accumulation and unnecessary computational cost, and can even increase ambiguity. Moreover, the reliability of the same feature may vary considerably between scenes, especially for today's widespread network cameras, which are installed in varied and complex indoor scenes, so fixed feature selection schemes cannot meet general requirements. To handle these problems, we first propose a scene-adaptive hierarchical data association scheme, which adaptively selects the features that represent targets most reliably in the given scene and gradually combines features only up to the minimum needed to discriminate ambiguous targets. Second, we propose a novel depth-invariant part-based appearance model using RGB-D data, which makes the appearance model robust to scale change, partial occlusion and view truncation. The introduction of RGB-D data increases the diversity of features, providing more feature types for selection in data association and enhancing the final multi-tracking performance. We validate the scene-adaptive feature selection scheme, the hierarchical data association scheme and the RGB-D based appearance modeling scheme in various indoor scenes, demonstrating the method's effectiveness and efficiency in improving multi-tracking performance.
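The idea of combining features only up to the minimum needed can be sketched as a loop that adds one feature at a time, most reliable first, and stops as soon as the best candidate is clearly separated from the runner-up. This is an illustrative sketch, not the authors' scheme: the feature ordering, the distance functions and the `min_gap` threshold are all assumptions.

```python
def associate(target, candidates, features, min_gap=0.2):
    """Pick the best candidate using as few features as possible.

    `features` is a list of (name, distance_fn) pairs, assumed to be
    pre-sorted by reliability for the current scene.
    Returns (best_candidate_index, names_of_features_used).
    """
    if not candidates or not features:
        return None, []
    totals = [0.0] * len(candidates)
    used = []
    best = 0
    for name, distance in features:
        used.append(name)
        for i, cand in enumerate(candidates):
            totals[i] += distance(target, cand)
        ranked = sorted(range(len(candidates)), key=lambda i: totals[i])
        best = ranked[0]
        if len(ranked) == 1:
            break
        # Mean per-feature gap between the best and second-best candidate.
        gap = (totals[ranked[1]] - totals[best]) / len(used)
        if gap >= min_gap:
            break  # unambiguous: no need to compute further features
    return best, used


features = [
    ("pos", lambda t, c: abs(t["pos"] - c["pos"])),
    ("color", lambda t, c: abs(t["color"] - c["color"])),
]
target = {"pos": 0.0, "color": 0.9}
ambiguous = [{"pos": 0.1, "color": 0.1}, {"pos": 0.15, "color": 0.85}]
clear = [{"pos": 0.1, "color": 0.1}, {"pos": 5.0, "color": 0.85}]
```

With the `ambiguous` candidates, position alone cannot separate the two, so the colour feature is added and the second candidate wins; with the `clear` candidates, position suffices and colour is never computed.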
Abstract: In dense pedestrian tracking, frequent occlusions and close distances between objects make it difficult to estimate object trajectories accurately. In this study, a conditional random field tracking model is established using a visual long short-term memory network in three-dimensional (3D) space, with motion estimation performed jointly on object trajectory segments. Object visual field information is added to the long short-term memory network to improve the accuracy of motion-related object pair selection and motion estimation. To address the uncertainty in the length and interval of trajectory segments, a multimode long short-term memory network is proposed for object motion estimation. Tracking performance is evaluated on the PETS2009 dataset. The experimental results show that the proposed method outperforms tracking methods based on independent motion estimation.
Funding: This work was supported in part by the Beijing Natural Science Foundation (L191004), the National Natural Science Foundation of China under No. 61720106007 and No. 61872047, the Beijing Nova Program under No. Z201100006820124, the Funds for Creative Research Groups of China under No. 61921003, and the 111 Project (B18008).
Abstract: In this paper, we provide a new approach for intelligent traffic transportation in intelligent vehicular networks, which aims at collecting vehicles' locations, trajectories and other key driving parameters to meet the time-critical requirements of autonomous driving. The key to our method is a multi-vehicle tracking framework for the traffic monitoring scenario. The proposed framework is composed of three modules: multi-vehicle detection, multi-vehicle association and miss-detected vehicle tracking. For the first module, we integrate a self-attention mechanism into a detector based on key point estimation for better detection. For the second module, we use multi-dimensional information to improve robustness, including vehicle re-identification (Re-ID) features, historical trajectory information and spatial position information. For the third module, we re-track occluded vehicles that were missed by the first detection module. In addition, we use asymmetric convolution and depth-wise separable convolution to reduce the model's parameters and speed it up. Extensive experimental results show the effectiveness of the proposed multi-vehicle tracking framework.
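To make the parameter saving from depth-wise separable convolution concrete, the sketch below counts weights for a standard convolution and for its depth-wise separable replacement. Bias terms are ignored, and the layer sizes are arbitrary examples rather than values from the paper.

```python
def standard_conv_params(kernel, in_channels, out_channels):
    """Weights in a standard kernel x kernel convolution (no bias)."""
    return kernel * kernel * in_channels * out_channels


def depthwise_separable_params(kernel, in_channels, out_channels):
    """Weights in a depth-wise conv followed by a 1x1 point-wise conv (no bias)."""
    depthwise = kernel * kernel * in_channels  # one spatial filter per input channel
    pointwise = in_channels * out_channels     # 1x1 conv that mixes channels
    return depthwise + pointwise


# Example layer: 3x3 kernel, 64 input channels, 128 output channels.
full = standard_conv_params(3, 64, 128)
separable = depthwise_separable_params(3, 64, 128)
ratio = separable / full  # equals 1/out_channels + 1/kernel**2
```

For this example layer the separable form needs 8,768 weights instead of 73,728, roughly 12% of the original.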
Abstract: The identification and classification of collective human activities are gaining momentum as significant themes in machine learning, with many potential applications emerging. Representing collective human behavior is especially crucial in applications such as assessing security conditions and preventing crowd congestion. This paper investigates the capability of deep neural network (DNN) algorithms to implement our carefully engineered pipeline for crowd analysis. The pipeline includes three principal stages that cover crowd analysis challenges. First, individuals are detected using the You Only Look Once (YOLO) model for human detection and a Kalman filter for multiple human tracking. Second, the density map and crowd count for a given location are generated using bounding boxes from the human detector. Finally, to classify crowds as normal or abnormal, individual activities are identified with pose estimation. The proposed system builds an effective collective representation of the crowd from its individuals and captures significant changes in the crowd in terms of activity changes. Experimental results on the MOT20 and SDHA datasets demonstrate that the proposed system is robust and efficient. The framework achieves improved recognition and detection of people, with a mean average precision of 99.0% and a real-time speed of 0.6 ms non-maximum suppression (NMS) per image on the SDHA dataset, and a mean average precision of 95.3% with 1.5 ms NMS per image on MOT20.
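The second stage, deriving a density map and crowd count from detector bounding boxes, can be illustrated with a coarse grid that counts box centres per cell. The abstract does not specify how the density map is built, so the grid approach, the function name and the cell size below are assumptions for illustration only.

```python
def density_grid(boxes, width, height, cell):
    """Count bounding-box centres per grid cell.

    `boxes` are (x1, y1, x2, y2) in pixels; `cell` is the cell side in pixels.
    Returns a list of rows, each a list of per-cell person counts.
    """
    cols = -(-width // cell)   # ceiling division
    rows = -(-height // cell)
    grid = [[0] * cols for _ in range(rows)]
    for x1, y1, x2, y2 in boxes:
        cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
        if cx < 0 or cy < 0:
            continue  # centre lies outside the image
        col = min(int(cx // cell), cols - 1)
        row = min(int(cy // cell), rows - 1)
        grid[row][col] += 1
    return grid


boxes = [(0, 0, 10, 10), (5, 5, 15, 15), (90, 90, 100, 100)]
grid = density_grid(boxes, width=100, height=100, cell=50)
crowd_count = sum(sum(row) for row in grid)
```

The crowd count is the sum over all cells, and unusually high per-cell counts point to where congestion is forming.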
Funding: This work was supported by the National Natural Science Foundation of China (Nos. 62002359 and 61836015) and the Beijing Advanced Discipline Fund (No. 115200S001).
Abstract: In recent years, simultaneous localization and mapping in dynamic environments (dynamic SLAM) has attracted significant attention from both academia and industry. Some pioneering work on this technique has expanded the potential of robotic applications. Compared to standard SLAM under the static world assumption, dynamic SLAM divides features into static and dynamic categories and leverages each type of feature properly. Therefore, dynamic SLAM can provide more robust localization for intelligent robots that operate in complex dynamic environments. Additionally, to meet the demands of some high-level tasks, dynamic SLAM can be integrated with multiple object tracking. This article presents a survey of dynamic SLAM from the perspective of feature choices and discusses the advantages and disadvantages of different visual features.
Funding: Supported by the National Key R&D Program of China (2019YFB1600703), the Chinese National Natural Science Foundation (Grant Nos. 72001161 and 52172348), and the Fundamental Research Funds for the Central Universities.
Abstract: This paper presented a real-time millimeter wave radar-based system for tracking vehicle trajectories in a wide area (continuously along a roadway with essentially no length limit in practice). The trajectory tracking results were first validated for single vehicle trajectory tracking using Real-Time Kinematic positioning technology based on the Beidou satellite navigation system. The validation showed that the vehicle positions were captured with a mean lateral offset of −0.284 m and a mean longitudinal offset of −0.352 m. The mean estimated speeds differed from the ground truth by only −0.048 km/h. The trajectory tracking results were also validated through multi-object tracking using video data from an unmanned aerial vehicle (UAV). Compared with the UAV video footage, the millimeter wave-based system was found to correctly capture around 92% of the total number of vehicles. For the correctly captured vehicles, their positions were found to be within 0.99 m (about a quarter of the width of a regular traffic lane) of the ground truth. In this paper, we also openly share the entire validated dataset, which has recently been published online as the TJRD TS platform. Additionally, a demonstration of using the dataset to detect aggressive driving behaviors, such as speeding, was presented. It is expected that the open dataset will help researchers and practitioners further explore the behaviors of road users, track the dynamics of safety risks and congestion formation, and evaluate incident impacts in a more microscopic but comprehensive way.
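As a rough illustration of how trajectory data of this kind could be used to flag speeding, the sketch below derives speeds from consecutive timestamped positions along the road and reports the times at which a limit is exceeded. The trajectory format, the function names and the speed limit are assumptions for illustration and do not describe the TJRD TS data schema.

```python
def speeds_kmh(trajectory):
    """Speed for each consecutive pair of (time_s, position_m) samples.

    Returns a list of (time_s, speed_kmh), stamped with the later sample's time.
    """
    result = []
    for (t0, x0), (t1, x1) in zip(trajectory, trajectory[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order samples
        result.append((t1, abs(x1 - x0) / dt * 3.6))  # m/s to km/h
    return result


def speeding_times(trajectory, limit_kmh):
    """Timestamps at which the estimated speed exceeds the limit."""
    return [t for t, v in speeds_kmh(trajectory) if v > limit_kmh]


# Hypothetical trajectory: time in seconds, longitudinal position in metres.
trajectory = [(0.0, 0.0), (1.0, 20.0), (2.0, 45.0), (3.0, 65.0)]
flagged = speeding_times(trajectory, limit_kmh=80.0)
```

In this made-up trajectory the vehicle covers 25 m in the second interval, about 90 km/h, so only the sample at 2.0 s is flagged against an 80 km/h limit.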