Funding: Sponsored by the National Natural Science Foundation of China (Grant No. 62271302) and the Shanghai Municipal Natural Science Foundation (Grant No. 20ZR1423500).
Abstract: Large amounts of labeled data are usually needed to train deep neural networks in medical image studies, particularly in medical image classification. However, in semi-supervised medical image analysis, labeled data is very scarce due to patient privacy concerns. For researchers, obtaining high-quality labeled images is exceedingly challenging because it involves manual annotation and clinical understanding. In addition, skin datasets are highly suitable for medical image classification studies due to the inter-class relationships and the inter-class similarities of skin lesions. In this paper, we propose a model called Coalition Sample Relation Consistency (CSRC), a consistency-based method that leverages Canonical Correlation Analysis (CCA) to capture the intrinsic relationships between samples. Because traditional consistency-based models focus only on the consistency of predictions, we additionally explore the similarity between features by using CCA. We enforce feature relation consistency on top of traditional models, encouraging the model to learn more meaningful information from unlabeled data. Finally, because cross-entropy is not well suited as the supervised loss for imbalanced datasets (i.e., ISIC 2017 and ISIC 2018), we improve the supervised loss to achieve better classification accuracy. Our study shows that this model performs better than many semi-supervised methods.
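The idea of relation consistency can be illustrated with a minimal sketch. It is not the CSRC method: the abstract states that CSRC uses CCA, whereas this sketch swaps in plain cosine similarity as a simpler stand-in, and all function names are hypothetical. It builds a sample-to-sample similarity matrix for the same batch under two perturbations and penalizes the difference between the two matrices.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def relation_matrix(features):
    """Pairwise similarity between every two samples in a batch.
    Assumption: cosine similarity stands in for the CCA-based relation."""
    return [[cosine_similarity(u, v) for v in features] for u in features]

def relation_consistency_loss(features_a, features_b):
    """Mean squared difference between the relation matrices of two
    perturbed views of the same batch."""
    ra, rb = relation_matrix(features_a), relation_matrix(features_b)
    n = len(ra)
    return sum((ra[i][j] - rb[i][j]) ** 2
               for i in range(n) for j in range(n)) / (n * n)

# Two views whose sample relations agree (second view is only rescaled).
view_a = [[1.0, 0.0], [0.0, 1.0]]
view_b = [[2.0, 0.0], [0.0, 3.0]]
# A view in which both samples collapse onto the same direction.
view_c = [[1.0, 0.0], [1.0, 0.0]]

print(relation_consistency_loss(view_a, view_b))
print(relation_consistency_loss(view_a, view_c))
```

In this toy setting the loss stays at zero when only the feature scale changes and grows when the relations between samples change, which is the behavior a relation-consistency term is meant to reward on unlabeled data.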
Abstract: The volume of social media data on the Internet is constantly growing, which has created a substantial research field for data analysts. The diversity of articles, posts, and comments on news websites and social networks is staggering. Nevertheless, most researchers focus on Twitter posts, which have a specific format and a length restriction, and the majority of these posts are written in English. As relatively few works have paid attention to sentiment analysis in the Russian and Kazakh languages, this article thoroughly analyzes news posts in the Kazakhstan media space. The collected datasets include texts labeled according to three sentiment classes: positive, negative, and neutral. The datasets are highly imbalanced, with a significant predominance of the positive class. To deal with this issue, three resampling techniques are applied to the datasets: undersampling, oversampling, and the synthetic minority oversampling technique (SMOTE). Subsequently, the texts are vectorized with the TF-IDF metric and classified with seven machine learning (ML) algorithms: naïve Bayes, support vector machine, logistic regression, k-nearest neighbors, decision tree, random forest, and XGBoost. Experimental results reveal that oversampling and SMOTE combined with logistic regression, decision tree, and random forest achieve the best classification scores. These models are effectively employed in the developed social analytics platform.
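Two of the three resampling techniques named above, random oversampling and random undersampling, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `resample`, the toy posts, and the class sizes are hypothetical. SMOTE is not shown because it interpolates new points in feature space and therefore runs after vectorization.

```python
import random
from collections import Counter, defaultdict

def resample(texts, labels, strategy="over", seed=0):
    """Balance classes by random oversampling ("over") or random
    undersampling ("under"). Hypothetical helper for illustration."""
    if strategy not in ("over", "under"):
        raise ValueError("strategy must be 'over' or 'under'")
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for text, label in zip(texts, labels):
        by_class[label].append(text)
    sizes = [len(items) for items in by_class.values()]
    target = max(sizes) if strategy == "over" else min(sizes)
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        if strategy == "over":
            # Keep every original, then draw extra copies with replacement.
            chosen = items + rng.choices(items, k=target - len(items))
        else:
            # Keep a random subset without replacement.
            chosen = rng.sample(items, target)
        out_texts.extend(chosen)
        out_labels.extend([label] * target)
    return out_texts, out_labels

# Toy dataset dominated by the positive class, as in the abstract.
labels = ["positive"] * 6 + ["negative"] * 2 + ["neutral"] * 1
texts = [f"post {i}" for i in range(len(labels))]

over_texts, over_labels = resample(texts, labels, "over")
under_texts, under_labels = resample(texts, labels, "under")
print(Counter(over_labels))
print(Counter(under_labels))
```

Oversampling keeps all the original posts at the cost of duplicated minority texts, while undersampling discards most of the majority class. Resampling should be applied to the training split only, so that duplicated samples do not leak into evaluation.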
Abstract: This paper is motivated by the interest in finding significant movements in financial stock prices. However, when profitable opportunities are scarce, predicting these cases is difficult. In previous work, we introduced evolving decision rules (EDR) to detect financial opportunities. The objective of EDR is to classify the minority class (positive cases) in imbalanced environments. EDR provides a range of classifications to find the best balance between not making mistakes and not missing opportunities. The goals of this paper are: 1) to show that EDR produces a range of solutions to suit the investor's preferences, and 2) to analyze the factors that benefit the performance of EDR. A series of experiments was performed. EDR was tested using a data set from the London Financial Market. To analyze the behaviour of EDR, another experiment was carried out using three artificial data sets whose solutions have different levels of complexity. Finally, an illustrative example was provided to show how a bigger collection of rules is able to classify more positive cases in imbalanced data sets. Experimental results show that: 1) EDR offers a range of solutions to fit the risk guidelines of different types of investors, and 2) a bigger collection of rules is able to classify more positive cases in imbalanced environments.
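The trade-off between "not making mistakes" (precision) and "not missing opportunities" (recall) can be illustrated with a toy rule set. This is not the EDR algorithm, whose rule-evolution procedure the abstract does not describe. It is a minimal sketch that assumes a sample is flagged as positive when any rule in the collection fires, and the indicator names, thresholds, and data are hypothetical.

```python
def predict_positive(rules, x):
    """Flag a sample as a positive case if any rule in the collection fires."""
    return any(rule(x) for rule in rules)

def precision_recall(rules, samples):
    """Precision and recall of a rule collection on (features, is_positive) pairs."""
    tp = fp = fn = 0
    for x, is_positive in samples:
        flagged = predict_positive(rules, x)
        if flagged and is_positive:
            tp += 1
        elif flagged:
            fp += 1
        elif is_positive:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical imbalanced data: two positive cases among six samples.
samples = [
    ({"momentum": 0.9, "volatility": 0.2}, True),
    ({"momentum": 0.4, "volatility": 0.8}, True),
    ({"momentum": 0.7, "volatility": 0.1}, False),
    ({"momentum": 0.1, "volatility": 0.9}, False),
    ({"momentum": 0.2, "volatility": 0.3}, False),
    ({"momentum": 0.3, "volatility": 0.2}, False),
]

# Hypothetical rules; a larger collection covers more positive cases.
rules = [
    lambda x: x["momentum"] > 0.8,
    lambda x: x["volatility"] > 0.7,
]

for k in range(1, len(rules) + 1):
    print(k, precision_recall(rules[:k], samples))
```

Because the collection acts as a disjunction, adding a rule can never lower recall, although it may admit false positives. An investor with low risk tolerance would prefer the smaller, more precise collection, and one who fears missing opportunities would prefer the larger one.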
Funding: Acknowledgements. We would like to express our gratitude to both the associate editor and the anonymous reviewers for their constructive comments, which greatly improved the quality of our manuscript. This work was supported by the National Natural Science Foundation of China (Grant No. 61501229) and the Fundamental Research Funds for the Central Universities (NS2015091, NS2014067, NJ20160013).
Abstract: In the class-imbalanced learning scenario, traditional machine learning algorithms that optimize overall accuracy tend to achieve poor classification performance, especially for the minority class, which is the class of greatest interest. To solve this problem, many effective approaches have been proposed. Among them, bagging ensemble methods that integrate under-sampling techniques have demonstrated better performance than some alternatives, including bagging ensemble methods integrated with over-sampling techniques and cost-sensitive methods. Although these under-sampling techniques promote diversity among the generated base classifiers through random partitioning or sampling of the majority class, they take no measure to ensure individual classification performance, which limits the achievable ensemble performance. On the other hand, evolutionary under-sampling (EUS), a novel under-sampling technique, has been successfully applied to search for the best majority-class subset for training a well-performing nearest neighbor classifier. Inspired by EUS, in this paper we introduce it into the under-sampling bagging framework and propose an EUS-based bagging ensemble method, EUS-Bag, by designing a new fitness function that considers three factors to make EUS better suited to the framework. With our fitness function, EUS-Bag can generate a set of accurate and diverse base classifiers. To verify the effectiveness of EUS-Bag, we conduct a series of comparison experiments on 22 two-class imbalanced classification problems. Experimental results measured using recall, geometric mean, and AUC all demonstrate its superior performance.
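The under-sampling bagging framework that EUS-Bag builds on can be sketched as follows. This is the plain random variant, not EUS-Bag itself: the evolutionary search and the three-factor fitness function are not specified in the abstract and are not reproduced here. The helper names and the toy data are hypothetical. The base learner is a nearest neighbor classifier, as mentioned above, and the geometric mean is computed as the square root of sensitivity times specificity.

```python
import math
import random
from collections import Counter

def nearest_neighbor_predict(train, x):
    """1-NN over (features, label) pairs using squared Euclidean distance."""
    return min(train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x)))[1]

def undersampled_bags(data, n_bags, seed=0):
    """Build balanced training sets: all minority samples plus an equally
    sized random subset of the majority class (two-class data assumed)."""
    rng = random.Random(seed)
    counts = Counter(label for _, label in data)
    minority_label = min(counts, key=counts.get)
    minority = [s for s in data if s[1] == minority_label]
    majority = [s for s in data if s[1] != minority_label]
    return [minority + rng.sample(majority, len(minority)) for _ in range(n_bags)]

def bagging_predict(bags, x):
    """Majority vote over one 1-NN classifier per bag."""
    votes = Counter(nearest_neighbor_predict(bag, x) for bag in bags)
    return votes.most_common(1)[0][0]

def geometric_mean(y_true, y_pred, positive=1):
    """G-mean: square root of sensitivity times specificity."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)

# Toy two-class data: 20 majority points near the origin, 3 minority points near (5, 5).
majority = [((i * 0.1, j * 0.1), 0) for i in range(5) for j in range(4)]
minority = [((5.0, 5.0), 1), ((5.2, 4.8), 1), ((4.9, 5.1), 1)]
bags = undersampled_bags(majority + minority, n_bags=5)

print(bagging_predict(bags, (5.1, 4.9)))
print(bagging_predict(bags, (0.1, 0.0)))
```

Each bag here draws its majority subset at random, which gives diversity but no guarantee about the quality of any single base classifier. That is the gap the abstract says EUS-Bag addresses by searching for majority subsets with a fitness function.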