Multiple object tracking: A literature review

Abstract

Multiple Object Tracking (MOT) has gained increasing attention due to its academic and commercial potential. Although different approaches have been proposed to tackle this problem, it still remains challenging due to factors like abrupt appearance changes and severe object occlusions. In this work, we contribute the first comprehensive and most recent review on this problem. We inspect the recent advances in various aspects and propose some interesting directions for future research. To the best of our knowledge, there has not been any extensive review on this topic in the community. We endeavor to provide a thorough review on the development of this problem in recent decades. The main contributions of this review are fourfold: 1) Key aspects in an MOT system, including formulation, categorization, key principles, evaluation of MOT are discussed; 2) Instead of enumerating individual works, we discuss existing approaches according to various aspects, in each of which methods are divided into different groups and each group is discussed in detail for the principles, advances and drawbacks; 3) We examine experiments of existing publications and summarize results on popular datasets to provide quantitative and comprehensive comparisons. By analyzing the results from different perspectives, we have verified some basic agreements in the field; and 4) We provide a discussion about issues of MOT research, as well as some interesting directions which will become potential research effort in the future.


游日山 | Original | About 15 min read | Machine Learning | Multi-object Tracking
DETRs Beat YOLOs on Real-time Object Detection

DOI: 10.48550/arXiv.2304.08069

Abstract

Recently, end-to-end transformer-based detectors (DETRs) have achieved remarkable performance. However, the high computational cost of DETRs limits their practical application and prevents them from fully exploiting the advantage of no post-processing, such as non-maximum suppression (NMS). In this paper, we first analyze the negative impact of NMS on the accuracy and speed of existing real-time object detectors, and establish an end-to-end speed benchmark. To solve the above problems, we propose a Real-Time DEtection TRansformer (RT-DETR), the first real-time end-to-end object detector to our best knowledge. Specifically, we design an efficient hybrid encoder to efficiently process multi-scale features by decoupling the intra-scale interaction and cross-scale fusion, and propose IoU-aware query selection to further improve performance by providing higher quality initial object queries to the decoder. In addition, our proposed detector supports flexible adjustment of the inference speed by using different decoder layers without the need for retraining, which facilitates the practical application in various real-time scenarios. Our RT-DETR-L achieves 53.0% AP on COCO val2017 and 114 FPS on T4 GPU, while RT-DETR-X achieves 54.8% AP and 74 FPS, outperforming the state-of-the-art YOLO detectors of the same scale in both speed and accuracy. Furthermore, our RT-DETR-R50 achieves 53.1% AP and 108 FPS, outperforming DINO-DeformableDETR-R50 by 2.2% AP in accuracy and by about 21 times in FPS.
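
For readers unfamiliar with the post-processing step that RT-DETR avoids, the following is a minimal sketch of greedy non-maximum suppression (NMS) in plain Python. It is not code from the paper: the function names `iou` and `nms`, the `(x1, y1, x2, y2)` box format, and the default threshold of 0.5 are all assumptions chosen for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first.

    The threshold value is an illustrative assumption, not a setting
    taken from the paper.
    """
    # Visit boxes from the most to the least confident.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical detections: the first two boxes overlap heavily.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

The result depends on the hand-picked threshold: raising it to 0.7 in this example would keep all three boxes. This sensitivity, together with the extra pairwise comparisons, is the kind of cost the abstract refers to when it discusses the impact of NMS on accuracy and speed.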


游日山 | Original | About 8 min read | Machine Learning | Multi-object Tracking
Multi-object tracking via deep feature fusion and association analysis

DOI: 10.1016/j.engappai.2023.106527

Abstract

We describe a tracking-by-detection framework for multi-object tracking (MOT). It first detects the objects of interest in each frame of the video, then identifies their association with the objects detected in the previous frame. A deep association network is described to perform object feature matching between any two frames to infer the degree of association between objects, and then a similarity matrix loss is used to calculate the association between each object in different frames to achieve accurate tracking. The novelty of the work lies in the design of a multi-scale fusion strategy by gradually concatenating sub-networks of low-resolution feature maps in parallel to the main network of high-resolution feature maps, in the construction of a deeper backbone network which can enhance the semantic information of object features, and in the use of a siamese network for training on a pair of discontinuous frames. The main advantage of our framework is that it avoids missing detection and partial detection. It is particularly suitable for solving the problem of object ID switches caused by occlusion and by objects entering and leaving the scene. Our method is evaluated and demonstrated on the publicly available MOT15, MOT16, MOT17 and MOT20 benchmark datasets. Compared with the state-of-the-art methods, our method achieves better tracking performance, and is therefore more suited for MOT tasks.
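
The association step of a tracking-by-detection pipeline can be sketched as follows. This is not the paper's method: the learned deep association network is replaced here by plain box IoU as the similarity score, and matching is done greedily. The names `associate`, `min_iou`, and the sample boxes are hypothetical and exist only for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def associate(tracks, detections, min_iou=0.3):
    """Greedily match current detections to tracks from the previous frame.

    tracks: dict mapping track id -> last known box.
    detections: list of boxes in the current frame.
    Returns (matches, unmatched), where matches maps detection index ->
    track id and unmatched lists detection indices with no track.
    IoU stands in for the learned similarity (an assumption of this sketch).
    """
    # Score every (track, detection) pair, best pairs first.
    pairs = sorted(
        ((iou(tbox, dbox), tid, di)
         for tid, tbox in tracks.items()
         for di, dbox in enumerate(detections)),
        key=lambda p: p[0],
        reverse=True,
    )
    matches, used_tracks = {}, set()
    for score, tid, di in pairs:
        if score < min_iou:
            break  # remaining pairs are too dissimilar to link
        if tid in used_tracks or di in matches:
            continue  # each track and each detection is used at most once
        matches[di] = tid
        used_tracks.add(tid)
    unmatched = [di for di in range(len(detections)) if di not in matches]
    return matches, unmatched


# Hypothetical frame: two existing tracks, three new detections.
tracks = {1: (0, 0, 10, 10), 2: (20, 20, 30, 30)}
detections = [(21, 21, 31, 31), (1, 0, 11, 10), (50, 50, 60, 60)]
matches, unmatched = associate(tracks, detections)
```

Unmatched detections would typically start new tracks, which is where objects entering the scene are handled. A purely geometric score like IoU tends to fail under occlusion, which is the motivation for learning the similarity from appearance features as the abstract describes.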


游日山 | Original | About 16 min read | Machine Learning | Multi-object Tracking