For imbalanced datasets, the focus of classification is to identify samples of the minority class. The performance of current data mining algorithms is not good enough for processing imbalanced datasets. The synthetic minority over-sampling technique (SMOTE) is specifically designed for learning from imbalanced datasets; it generates synthetic minority class examples by interpolating between nearby minority class examples. However, SMOTE suffers from the overgeneralization problem. The density-based spatial clustering of applications with noise (DBSCAN) algorithm is not rigorous when dealing with samples near the borderline. We optimize the DBSCAN algorithm for this problem to make the clustering more reasonable. This paper integrates the optimized DBSCAN and SMOTE, and proposes a density-based synthetic minority over-sampling technique (DSMOTE). First, the optimized DBSCAN is used to divide the samples of the minority class into three groups: core samples, borderline samples and noise samples. The noise samples of the minority class are then removed so that more effective samples can be synthesized. To make full use of the information in core samples and borderline samples, different strategies are used to over-sample each group. Experiments show that DSMOTE achieves better results than SMOTE and Borderline-SMOTE in terms of precision, recall and F-value.
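The pipeline described in this abstract (partition the minority class by density, drop noise, then interpolate) can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the paper: it uses the standard DBSCAN definitions of core, border and noise points rather than the authors' optimized variant, it applies the same plain SMOTE interpolation to core and borderline samples although the paper uses different strategies for each, and the names `dbscan_roles` and `smote_interpolate`, the toy data, and the values of `eps`, `min_pts` and `k` are all hypothetical.

```python
import math
import random


def dbscan_roles(points, eps, min_pts):
    """Label each point 'core', 'border' or 'noise' (standard DBSCAN definitions)."""
    # Each eps-neighborhood includes the point itself.
    neighbors = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps]
        for p in points
    ]
    core = {i for i, nb in enumerate(neighbors) if len(nb) >= min_pts}
    roles = []
    for i, nb in enumerate(neighbors):
        if i in core:
            roles.append("core")
        elif any(j in core for j in nb):
            roles.append("border")  # not dense itself, but within eps of a core point
        else:
            roles.append("noise")
    return roles


def smote_interpolate(seeds, pool, k, n_new, rng):
    """Create n_new synthetic points, each on the segment between a seed
    and one of its k nearest neighbors in pool (plain SMOTE interpolation)."""
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(seeds)
        nearest = sorted(
            (q for q in pool if q != x), key=lambda q: math.dist(x, q)
        )[:k]
        nb = rng.choice(nearest)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


# Hypothetical minority-class samples: a dense cluster, one edge point, one outlier.
minority = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.25, 0.1), (5.0, 5.0)]
roles = dbscan_roles(minority, eps=0.16, min_pts=4)

# Remove noise before over-sampling so the outlier never seeds a synthetic point.
kept = [p for p, r in zip(minority, roles) if r != "noise"]
new_points = smote_interpolate(kept, kept, k=2, n_new=4, rng=random.Random(0))
```

Because the outlier at (5.0, 5.0) is labeled noise and dropped, every synthetic point stays inside the region spanned by the remaining samples. This is the kind of overgeneralization that noise removal is meant to limit.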
Developments in biomedical science and signal processing technologies have led electroencephalography (EEG) signals to be widely used in the diagnosis of brain disease and in the field of Brain-Computer Interface (BCI). The collected EEG signals are processed using machine learning algorithms (Random Forest and Naive Bayes) and deep learning algorithms (Recurrent Neural Network (RNN), Neural Network (NN) and Long Short Term Memory (LSTM)) to determine a person's current mood. These algorithms are applied to the data set to find out what the person is feeling at a particular moment. This thesis aims to identify which of the following moods (happy, surprised, disgust, fear, anger and sadness) a person is in at a given instant, with the aim of obtaining the result with the least time delay as the mood changes. The accuracy of the output varies with the algorithm used and the time taken to process the data, which makes it possible to compare the reliability and dependability of one algorithm against another prior to practical implementation. The data sets used were class-imbalanced, and overfitting occurred as a result. This problem was handled by generating artificial data with the SMOTE oversampling technique.
Funding: supported by the National Key Research and Development Program of China (2018YFB1003700), the Scientific and Technological Support Project (Society) of Jiangsu Province (BE2016776), the "333" project of Jiangsu Province (BRA2017228, BRA2017401), and the Talent Project in Six Fields of Jiangsu Province (2015-JNHB-012).