Funding: We would like to acknowledge the Research and Consulting Centre (RCC), University of Benghazi, Libya for funding this work.
Abstract: Stratified sampling is often used in opinion polls to reduce standard errors, and it is known as a variance-reduction technique in sampling theory. The most common resampling approach is bootstrapping the dataset with replacement. A main purpose of this work is to investigate extensions of resampling methods in classification problems, specifically with decision trees, using a family of stratification models to improve prediction accuracy by aggregating classifiers built on perturbed datasets. We use bagging as a method of estimating a good decision boundary under a family of stratification models. The overall conclusion is that for decision trees, un-stratified bootstrapping with bagging can yield lower error rates than other sampling strategies on simulated datasets. Based on the results of these experiments, a possible explanation for why un-stratified sampling performs best is that bagging is itself a method of stratification.
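The comparison the abstract describes can be made concrete with a small experiment. The sketch below, which is not the paper's experimental setup, contrasts plain (un-stratified) bootstrap resampling with class-stratified resampling inside a bagged ensemble of decision trees; the simulated dataset, tree settings, and ensemble size are illustrative assumptions.

```python
# Minimal sketch: un-stratified vs. class-stratified bootstrap inside bagging.
# Not the authors' code; dataset and hyperparameters are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def bootstrap_indices(y, stratified):
    n = len(y)
    if not stratified:
        return rng.integers(0, n, size=n)  # plain bootstrap over all rows
    idx = []
    for c in np.unique(y):  # resample within each class, preserving proportions
        members = np.flatnonzero(y == c)
        idx.append(rng.choice(members, size=len(members), replace=True))
    return np.concatenate(idx)

def bagged_error(stratified, n_trees=100):
    votes = np.zeros((len(y_te), 2))
    for _ in range(n_trees):
        b = bootstrap_indices(y_tr, stratified)
        tree = DecisionTreeClassifier().fit(X_tr[b], y_tr[b])
        votes[np.arange(len(y_te)), tree.predict(X_te)] += 1
    return np.mean(votes.argmax(axis=1) != y_te)  # majority-vote test error

print("un-stratified bagging error:", bagged_error(stratified=False))
print("stratified bagging error:   ", bagged_error(stratified=True))
```

The intuition the abstract offers maps onto this sketch: averaging over many plain bootstrap replicates already smooths the class composition across the ensemble, so forcing stratification inside each replicate adds little and can remove useful perturbation.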
Funding: Supported by the National Key Research and Development Program of China under Grant No. 2021YFA1003003 and the National Natural Science Foundation of China under Grant Nos. 61936002 and T2225012.
Abstract: The long-tailed data distribution poses an enormous challenge for training neural networks in classification. A classification network can be decoupled into a feature extractor and a classifier. This paper takes a semi-discrete optimal transport (OT) perspective to analyze the long-tailed classification problem, where the feature space is viewed as a continuous source domain and the classifier weights are viewed as a discrete target domain. The classifier in effect finds a cell decomposition of the feature space, with each cell corresponding to one class. An imbalanced training set causes the more frequent classes to have larger-volume cells, which means that the classifier's decision boundary is biased towards the less frequent classes, reducing classification performance in the inference phase. Therefore, we propose a novel OT dynamic softmax loss, which dynamically adjusts the decision boundary in the training phase to avoid overfitting on the tail classes. In addition, our method incorporates the supervised contrastive loss so that the feature space satisfies the uniform distribution condition. Extensive and comprehensive experiments demonstrate that our method achieves state-of-the-art performance on multiple long-tailed recognition benchmarks, including CIFAR-LT, ImageNet-LT, iNaturalist 2018, and Places-LT.
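To illustrate the mechanism the abstract describes, the hedged PyTorch sketch below adds per-class offsets to the logits; in the semi-discrete OT picture these offsets play the role of dual potentials that shift each class's decision cell. The specific update rule (a dual-ascent step nudging each cell's softmax mass toward a target class proportion) is our assumption for illustration, not the authors' published loss.

```python
# Hedged sketch of an OT-style dynamic softmax: per-class logit offsets act
# like semi-discrete OT dual potentials, reshaping decision cells so head
# classes do not swallow tail classes. Illustrative only; the offset update
# is an assumption, not the paper's exact formulation.
import torch
import torch.nn.functional as F

class DynamicSoftmaxLoss(torch.nn.Module):
    def __init__(self, class_counts, lr=0.01):
        super().__init__()
        freq = torch.as_tensor(class_counts, dtype=torch.float)
        self.target = freq / freq.sum()                # desired cell "volumes"
        self.offsets = torch.zeros(len(class_counts))  # dual-potential-like shifts
        self.lr = lr

    def forward(self, logits, labels):
        shifted = logits + self.offsets.to(logits.device)
        with torch.no_grad():
            # current softmax mass captured by each class's cell on this batch
            mass = F.softmax(shifted, dim=1).mean(dim=0).cpu()
            # dual-ascent step: grow under-target cells, shrink over-target ones
            self.offsets += self.lr * (self.target - mass)
        return F.cross_entropy(shifted, labels)

# Usage with hypothetical long-tailed class counts:
loss_fn = DynamicSoftmaxLoss(class_counts=[5000, 500, 50])
logits = torch.randn(8, 3, requires_grad=True)
labels = torch.randint(0, 3, (8,))
loss = loss_fn(logits, labels)
loss.backward()
```

Because the offsets enter the loss additively, they move the boundaries between cells without changing the feature extractor's gradients directly, which matches the decoupled feature-extractor/classifier view taken in the abstract.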