Funding: The support of the National Natural Science Foundation of China (Grant No. 52127809, author Z.W, http://www.nsfc.gov.cn/; No. 51625501, author Z.W, http://www.nsfc.gov.cn/) is greatly appreciated.
Abstract: Label assignment refers to determining positive/negative labels for each sample to supervise the training process. Existing Siamese-based trackers primarily use fixed label assignment strategies according to human prior knowledge; thus, they can be sensitive to predefined hyperparameters and fail to fit the spatial and scale variations of samples. In this study, we first develop a novel dynamic label assignment (DLA) module to handle the diverse data distributions and adaptively distinguish the foreground from the background based on the statistical characteristics of the target in visual object tracking. The core of the DLA module is a two-step selection mechanism. The first step selects candidate samples according to the Euclidean distance between training samples and ground truth, and the second step selects positive/negative samples based on the mean and standard deviation of the candidate samples. The proposed approach is general-purpose and can be easily integrated into anchor-based and anchor-free trackers for optimal sample-label matching. According to extensive experimental findings, Siamese-based trackers with DLA modules can refine target locations and outperform baseline trackers on OTB100, VOT2019, UAV123 and LaSOT. In particular, DLA-SiamRPN++ improves SiamRPN++ by 1% AUC and DLA-SiamCAR improves SiamCAR by 2.5% AUC on OTB100. Furthermore, hyperparameter analysis experiments show that the DLA module hardly increases spatio-temporal complexity; the proposed approach maintains the same speed as the original tracker without additional overhead.
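The two-step selection described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say which per-sample statistic is thresholded or how the mean and standard deviation are combined, so the sketch assumes a quality score such as IoU with the ground-truth box and a threshold of mean plus standard deviation. The function name `dynamic_label_assignment` and the parameter `k` are hypothetical.

```python
import math
from statistics import mean, pstdev


def dynamic_label_assignment(points, scores, gt_center, k=9):
    """Two-step selection sketch (illustrative; not the authors' code).

    points    : list of (x, y) sample locations
    scores    : per-sample quality values, e.g. IoU with the ground-truth box
                (assumed; the abstract does not name the statistic)
    gt_center : (x, y) centre of the ground-truth box
    k         : number of candidates kept in step 1 (hypothetical parameter)
    Returns a list of 0/1 labels, one per sample.
    """
    # Step 1: keep the k samples closest to the ground truth (Euclidean distance).
    dists = [math.dist(p, gt_center) for p in points]
    order = sorted(range(len(points)), key=lambda i: dists[i])
    candidates = order[:k]

    # Step 2: threshold the candidates by their own mean and standard deviation.
    cand_scores = [scores[i] for i in candidates]
    threshold = mean(cand_scores) + pstdev(cand_scores)  # assumed form: mean + std

    labels = [0] * len(points)
    for i in candidates:
        if scores[i] >= threshold:
            labels[i] = 1  # positive; every other sample stays negative
    return labels
```

For example, with points `[(0, 0), (1, 0), (2, 0), (3, 0), (10, 10)]`, scores `[0.9, 0.6, 0.3, 0.2, 0.95]`, `gt_center=(0, 0)` and `k=4`, only the first sample is labeled positive: the last sample has the highest score but is too far from the ground truth to become a candidate. Because the threshold is computed per target, it adapts to each target's score distribution instead of relying on a fixed cutoff.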
Funding: Supported by the National Key R&D Program of China under Grant No. 2022YFB3305700 and the Shanghai Science Innovation Action Plan under Grant No. 21511104302.
Abstract: Object detection is a challenging yet crucial task in computer vision. Despite significant advancements, modern detectors still struggle with task alignment between localization and classification. In this paper, Global Collaborative Learning (GCL) is introduced to address these challenges from often-overlooked perspectives. First, the essence of GCL is reflected in the label assignment of the detector. Adjusting the loss function to transform samples with strong localization yet weak classification into high-quality samples in both tasks provides more effective training signals, enabling the model to capture key consistent features. Second, the spirit of GCL is embodied in the head design. By enabling global feature interaction within the decoupled head, the approach ensures that final predictions are made more comprehensively and robustly, thereby preventing the two independent branches from converging to suboptimal solutions for their respective tasks. Extensive experiments on the challenging MS COCO and CrowdHuman datasets demonstrate that the proposed GCL method substantially enhances performance and generalization capabilities.
Funding: Supported by the Zhejiang Provincial Natural Science Foundation of China (No. LGF22F020014), the National Key Research and Development Program of China (No. 2020YFB1707700), and the National Natural Science Foundation of China (Nos. 62036009 and U1909203).
Abstract: Early dental caries detection by endoscope can prevent complications such as pulpitis and apical infection. However, automatically identifying dental caries remains challenging due to the uncertainty in size, contrast, low saliency, and high interclass similarity of dental caries. To address these problems, we propose the Global Feature Detector (GFDet), which integrates the proposed Feature Selection Pyramid Network (FSPN) and Adaptive Assignment-Balanced Mechanism (AABM). Specifically, FSPN performs upsampling with the semantic information of adjacent feature layers to mitigate the semantic information loss caused by sharp channel reduction, and enhances discriminative features by aggregating fine-grained details and high-level semantics. In addition, a new label assignment mechanism is proposed that enables the model to select more high-quality samples as positive samples, which addresses the problem of small objects being easily ignored. Meanwhile, we have built an endoscopic dataset for caries detection, consisting of 1318 images labeled by five dentists. In experiments on the collected dataset, the F1-score of our model is 75.6%, which outperforms the state-of-the-art models by 7.1%.