Abstract
Object detection under high-speed motion remains challenging because severe motion blur degrades spatial appearance and limits the effectiveness of single-frame detectors. Although temporal modeling is widely explored as a way to improve performance, its behavior under extreme motion blur is not yet fully characterized. In this work, we conduct an experimental study comparing a single-frame YOLOv8n detector with a temporally enhanced variant that incorporates multi-frame inputs and frame-difference cues within the backbone. Results on a highly blurred table tennis dataset show that, although the temporally enhanced model and the single-frame baseline achieve similar average precision (AP50 ≈ 0.52), they exhibit markedly different failure modes. Quantitative analysis yields a Jaccard index of only 0.43 between the detection outcomes of the two models, indicating pronounced complementarity. By exploiting this behavioral divergence through a simple ensemble strategy, we increase AP50 from 0.52 to 0.73. These findings suggest that, under extreme blur, temporal modeling can induce complementary detection behavior beyond improving individual detector accuracy, offering an alternative perspective for designing robust detection systems in highly degraded visual environments.
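To make the complementarity measure concrete, the following minimal Python sketch computes a Jaccard index between the detection outcomes of two models. The abstract does not state which sets the index is computed over, so the sketch assumes each model is summarized by the set of ground-truth instances it detects correctly. The function name, the variable names, and the instance IDs are hypothetical and are not taken from the study. The union shown at the end is only an upper bound on what a combination of the two models could recover, not the ensemble strategy used in this work.

```python
def jaccard_index(a: set, b: set) -> float:
    """Return |A ∩ B| / |A ∪ B|; defined here as 1.0 when both sets are empty."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


# Hypothetical IDs of ground-truth objects that each model detects correctly.
# These values are illustrative only and are not data from the study.
single_frame_hits = {1, 2, 3, 4, 5, 6}
temporal_hits = {4, 5, 6, 7, 8, 9, 10}

# A low index means the two models succeed on largely different instances.
overlap = jaccard_index(single_frame_hits, temporal_hits)

# Instances recovered by at least one model: an upper bound for any ensemble.
combined_hits = single_frame_hits | temporal_hits

print(f"Jaccard index: {overlap:.2f}")
print(f"Instances covered by either model: {len(combined_hits)}")
```

Under this assumption, two detectors with similar individual accuracy can still have a low Jaccard index, which is the situation in which combining their outputs has the most room to help.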