Imbalanced multiclass datasets pose challenges for machine learning algorithms. They often contain minority classes that are important for accurate predictions. However, when the data is sparsely distributed and overlaps with data points from other classes, it introduces noise. As a result, existing resampling methods may fail to preserve the original data patterns, further disrupting data quality and reducing model performance. This paper introduces Neighbor Displacement-based Enhanced Synthetic Oversampling (NDESO), a hybrid method that integrates a data displacement strategy with a resampling technique to achieve data balance. It begins by computing the average distance of noisy data points to their neighbors and adjusting their positions toward the center before applying random oversampling. Extensive evaluations compare 14 alternatives on nine classifiers across synthetic and 20 real-world datasets with varying imbalance ratios. This evaluation was structured into two distinct test groups. First, the effects of k-neighbor variations and distance metrics are evaluated, followed by a comparison of resampled data distributions against alternatives, and finally, determining the most suitable oversampling technique for data balancing. Second, the overall performance of the NDESO algorithm was assessed, focusing on G-mean and statistical significance. The results demonstrate that our method is robust to a wide range of variations in these parameters, and the overall performance achieves an average G-mean score of 0.90, which is among the highest. Additionally, it attains the lowest mean rank of 2.88, indicating statistically significant improvements over existing approaches. This advantage underscores its potential for effectively handling data imbalance in practical scenarios.
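The displacement-then-oversample idea described above can be sketched as follows. This is an illustrative reading, not the authors' implementation: the `noise_factor` threshold and the half-way displacement toward the neighborhood centroid are assumptions.

```python
import numpy as np

def ndeso_sketch(X_min, k=5, noise_factor=1.5, n_target=None, rng=None):
    """Sketch: minority points whose mean distance to their k nearest
    minority neighbors is unusually large are pulled toward the
    neighborhood centroid, then random oversampling balances the class."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X_min, dtype=float).copy()
    n = len(X)
    # Pairwise distances within the minority class (self excluded).
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    idx = np.argsort(d, axis=1)[:, :k]                   # k nearest neighbors
    mean_d = np.take_along_axis(d, idx, axis=1).mean(axis=1)
    noisy = mean_d > noise_factor * np.median(mean_d)    # assumed threshold
    # Displace noisy points half-way toward their neighborhood centroid.
    centroids = X[idx].mean(axis=1)
    X[noisy] = 0.5 * (X[noisy] + centroids[noisy])
    # Random oversampling up to the target count.
    if n_target is None or n_target <= n:
        return X
    extra = X[rng.integers(0, n, n_target - n)]
    return np.vstack([X, extra])
```

A far-away minority point is pulled substantially closer to the cluster before any duplicates are drawn, which is the mechanism the abstract credits for preserving data patterns.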
This paper deeply explores oversampling technology and its applications in biomedical signal detection. It first expounds on the significance of oversampling technology in biomedical signal detection, and then analyzes the application strategies of oversampling technology in this field. On this basis, it details the specific applications of oversampling technology in electrophysiological signal detection, biomedical imaging signal processing, and other biomedical signal detection tasks, and verifies its effectiveness through practical case analysis, aiming to provide useful references for relevant researchers.
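The SNR benefit that motivates oversampling in signal acquisition can be illustrated with a generic sketch (not tied to any system in the paper): sampling R times faster than the target rate and averaging each block of R samples attenuates white noise by roughly √R.

```python
import numpy as np

def oversample_and_average(signal_fn, t_target, oversample_ratio, noise_std, rng):
    """Acquire `oversample_ratio` noisy sub-samples around each target
    instant and average them; white noise shrinks by ~sqrt(ratio)."""
    dt = t_target[1] - t_target[0]
    # Sub-sample instants centered on each target instant.
    offsets = (np.arange(oversample_ratio) - (oversample_ratio - 1) / 2) \
              * dt / oversample_ratio
    t_fine = t_target[:, None] + offsets[None, :]
    noisy = signal_fn(t_fine) + rng.normal(0.0, noise_std, t_fine.shape)
    return noisy.mean(axis=1)
```

With a ratio of 16, residual noise on a slowly varying signal drops to about a quarter of the raw noise level, which is why oversampling ADCs are attractive for low-amplitude biomedical signals.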
BACKGROUND: Postoperative delirium, particularly prevalent in elderly patients after abdominal cancer surgery, presents significant challenges in clinical management. AIM: To develop a synthetic minority oversampling technique (SMOTE)-based model for predicting postoperative delirium in elderly abdominal cancer patients. METHODS: In this retrospective cohort study, we analyzed data from 611 elderly patients who underwent abdominal malignant tumor surgery at our hospital between September 2020 and October 2022. The incidence of postoperative delirium was recorded for 7 d post-surgery. Patients were divided into delirium and non-delirium groups based on the occurrence of postoperative delirium or not. A multivariate logistic regression model was used to identify risk factors and develop a predictive model for postoperative delirium. The SMOTE technique was applied to enhance the model by oversampling the delirium cases. The model's predictive accuracy was then validated. RESULTS: In our study involving 611 elderly patients with abdominal malignant tumors, multivariate logistic regression analysis identified significant risk factors for postoperative delirium. These included the Charlson comorbidity index, American Society of Anesthesiologists classification, history of cerebrovascular disease, surgical duration, perioperative blood transfusion, and postoperative pain score. The incidence rate of postoperative delirium in our study was 22.91%. The original predictive model (P1) exhibited an area under the receiver operating characteristic curve of 0.862. In comparison, the SMOTE-based logistic early warning model (P2), which utilized the SMOTE oversampling algorithm, showed a slightly lower but comparable area under the curve of 0.856, suggesting no significant difference in performance between the two predictive approaches. CONCLUSION: This study confirms that the SMOTE-enhanced predictive model for postoperative delirium in elderly abdominal tumor patients shows performance equivalent to that of traditional methods, effectively addressing data imbalance.
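The core SMOTE step used in the study can be sketched in a few lines (a generic illustration with assumed parameter names, not the clinical pipeline itself): each synthetic minority sample is a random interpolation between a minority point and one of its k nearest minority neighbors.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, rng=None):
    """Synthesize n_new minority samples by linear interpolation
    between each chosen seed point and one of its k nearest
    minority-class neighbors."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X_min, dtype=float)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]               # k nearest neighbors
    base = rng.integers(0, len(X), n_new)           # seed points
    pick = nn[base, rng.integers(0, k, n_new)]      # one neighbor each
    lam = rng.random((n_new, 1))                    # interpolation weight
    return X[base] + lam * (X[pick] - X[base])
```

Because every synthetic point lies on a segment between two real minority points, the new samples stay inside the observed attribute ranges, which is why SMOTE tends to enrich rather than distort the minority region.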
Recently, anomaly detection (AD) in streaming data has gained significant attention among research communities due to its applicability in finance, business, healthcare, education, etc. Recent developments in deep learning (DL) models have proved helpful in the detection and classification of anomalies. This article designs an oversampling with optimal deep learning-based streaming data classification (OS-ODLSDC) model. The aim of the OS-ODLSDC model is to recognize and classify the presence of anomalies in streaming data. The proposed OS-ODLSDC model initially undergoes a preprocessing step. Since streaming data is unbalanced, the support vector machine Synthetic Minority Over-sampling Technique (SVM-SMOTE) is applied for the oversampling process. Besides, the OS-ODLSDC model employs bidirectional long short-term memory (BiLSTM) for AD and classification. Finally, the root mean square propagation (RMSProp) optimizer is applied for optimal hyperparameter tuning of the BiLSTM model. To ensure the promising performance of the OS-ODLSDC model, a wide-ranging experimental analysis is performed using three benchmark datasets: CICIDS 2018, KDD-Cup 1999, and NSL-KDD.
Rockburst is a common geological disaster in underground engineering, which seriously threatens the safety of personnel, equipment, and property. Utilizing machine learning models to evaluate the risk of rockburst is gradually becoming a trend. In this study, integrated algorithms under the Gradient Boosting Decision Tree (GBDT) framework were used to evaluate and classify rockburst intensity. First, a total of 301 rockburst data samples were obtained from a case database, and the data were preprocessed using the synthetic minority over-sampling technique (SMOTE). Then, rockburst evaluation models including GBDT, eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), and Categorical Features Gradient Boosting (CatBoost) were established, and the optimal hyperparameters of the models were obtained through random search and five-fold cross-validation. Afterwards, the optimal hyperparameter configurations were used to fit the evaluation models, and these models were analyzed on the test set. To evaluate performance, metrics including accuracy, precision, recall, and F1-score were selected for analysis and comparison with other machine learning models. Finally, the trained models were used to conduct rockburst risk assessment on rock samples from a mine in Shanxi Province, China, providing theoretical guidance for the mine's safe production work. The models under the GBDT framework perform well in the evaluation of rockburst levels, and the proposed methods can provide a reliable reference for rockburst risk level analysis and safety management.
Imbalanced datasets are common in practical applications, and oversampling methods using fuzzy rules have been shown to enhance the classification performance of imbalanced data by taking into account the relationships between data attributes. However, the creation of fuzzy rules typically depends on expert knowledge, which may not fully leverage the label information in training data and may be subjective. To address this issue, a novel fuzzy rule oversampling approach is developed based on the learning vector quantization (LVQ) algorithm. In this method, the label information of the training data is utilized to determine the antecedent part of If-Then fuzzy rules by dynamically dividing attribute intervals using LVQ. Subsequently, fuzzy rules are generated and adjusted to calculate rule weights. The number of new samples to be synthesized for each rule is then computed, and samples from the minority class are synthesized based on the newly generated fuzzy rules. This results in the establishment of a fuzzy rule oversampling method based on LVQ. To evaluate the effectiveness of this method, comparative experiments are conducted on 12 publicly available imbalanced datasets with five other sampling techniques in combination with the support vector machine. The experimental results demonstrate that the proposed method can significantly enhance the classification algorithm across seven performance indicators, including boosts of 2.15% to 12.34% in Accuracy, 6.11% to 27.06% in G-mean, and 4.69% to 18.78% in AUC. These results show that the proposed method is capable of more efficiently improving the classification performance of imbalanced data.
According to the oversampling imaging characteristics, an infrared small target detection method based on deep learning is proposed. A 7-layer deep convolutional neural network (CNN) is designed to automatically extract small target features and suppress clutters in an end-to-end manner. The input of the CNN is an original oversampling image while the output is a clutter-suppressed feature map. The CNN contains only convolution and non-linear operations, and the resolution of the output feature map is the same as that of the input image. The L1-norm loss function is used, and a mass of training data is generated to train the network effectively. Results show that compared with several baseline methods, the proposed method improves the signal-to-clutter ratio gain and background suppression factor by 3–4 orders of magnitude, and has more powerful target detection performance.
Due to the anonymity of blockchain, frequent security incidents and attacks occur through it, among which the Ponzi scheme smart contract is a classic type of fraud resulting in huge economic losses. Machine learning-based methods are believed to be promising for detecting Ethereum Ponzi schemes. However, there are still some flaws in current research, e.g., insufficient feature extraction of Ponzi scheme smart contracts, without considering class imbalance. In addition, there is room for improvement in detection precision. Aiming at the above problems, this paper proposes an Ethereum Ponzi scheme detection scheme through opcode context analysis and the adaptive boosting (AdaBoost) algorithm. Firstly, this paper uses the n-gram algorithm to extract more comprehensive contract opcode features and combine them with contract account features, which helps to improve the feature extraction effect. Meanwhile, adaptive synthetic sampling (ADASYN) is introduced to deal with class-imbalanced data and integrated with the AdaBoost classifier. Finally, this paper uses the improved AdaBoost classifier for the identification of Ponzi scheme contracts. Experimentally, this paper tests our model on real-world smart contracts and compares it with representative methods in terms of F1-score and precision. Moreover, this article compares and discusses state-of-the-art methods with our method in four aspects: data acquisition, data preprocessing, feature extraction, and classifier design. Both the experiments and the discussion validate the effectiveness of our model.
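The n-gram feature extraction step above can be sketched with the standard library; the opcode names below are illustrative examples, not taken from the paper.

```python
from collections import Counter

def opcode_ngrams(opcodes, n=2):
    """Count n-grams over a contract's opcode sequence; the resulting
    counts can then be combined with account features for a classifier."""
    return Counter(tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))
```

For example, `opcode_ngrams(["PUSH1", "MSTORE", "PUSH1", "MSTORE"], 2)` yields the bigram `("PUSH1", "MSTORE")` with count 2 and `("MSTORE", "PUSH1")` with count 1, capturing opcode context rather than isolated opcode frequencies.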
Learning from imbalanced data is one of the most challenging problems in binary classification, and this problem has gained more importance in recent years. When the class distribution is imbalanced, classical machine learning algorithms tend to move strongly towards the majority class and disregard the minority. Therefore, the accuracy may be high, but the model cannot recognize data instances in the minority class well enough to classify them, leading to many misclassifications. Different methods have been proposed in the literature to handle the imbalance problem, but most are complicated and tend to simulate unnecessary noise. In this paper, we propose a simple oversampling method based on the multivariate Gaussian distribution and K-means clustering, called GK-Means. The new method aims to avoid generating noise and to control imbalances both between and within classes. Various experiments have been carried out with six classifiers and four oversampling methods. Experimental results on different imbalanced datasets show that the proposed GK-Means outperforms other oversampling methods and improves classification performance as measured by F1-score and Accuracy.
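The cluster-then-sample idea behind GK-Means can be sketched as follows. This is a hedged reading, not the authors' exact algorithm: the farthest-point initialization and the covariance regularizer are assumptions made to keep the sketch stable.

```python
import numpy as np

def gk_means_sketch(X_min, n_new, k=2, iters=20, rng=None):
    """Cluster the minority class with k-means, then draw synthetic
    samples from a multivariate Gaussian fitted to each cluster."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X_min, dtype=float)
    # Farthest-point initialization (an assumption, for stability).
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d = np.min(np.linalg.norm(
            X[:, None, :] - np.asarray(centers)[None, :, :], axis=-1), axis=1)
        centers.append(X[np.argmax(d)])
    centers = np.asarray(centers, dtype=float)
    for _ in range(iters):                           # plain Lloyd iterations
        labels = np.argmin(np.linalg.norm(
            X[:, None, :] - centers[None, :, :], axis=-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    out = []
    for j in rng.integers(0, k, n_new):              # sample cluster Gaussians
        pts = X[labels == j]
        cov = np.cov(pts.T) + 1e-6 * np.eye(X.shape[1])
        out.append(rng.multivariate_normal(pts.mean(axis=0), cov))
    return np.asarray(out)
```

Sampling per-cluster Gaussians keeps synthetic points near existing minority modes, which is how the method avoids the between- and within-class noise the abstract mentions.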
An efficient single-carrier symbol synchronization method is proposed in this paper, which can work under a very low oversampling rate. This method is based on frequency-aliasing squared timing recovery assisted by pilot symbols and a time-domain filter. With frequency-aliasing squared timing recovery with pilots, it is possible to estimate the timing error at oversampling rates of less than 2. The time-domain filter simultaneously performs matched filtering and arbitrary interpolation. Because of pilot assistance, timing error estimation can be free from aliasing and self-noise, so our method has good performance. Compared with traditional time-domain methods requiring an oversampling rate above 2, this method can be adapted to any rational oversampling rate, including rates less than 2. Moreover, compared with symbol synchronization in the frequency domain, which can also operate at a low oversampling rate, our method saves the complicated operation of conversion between the time and frequency domains. With its low oversampling rate and resource-saving filter, this method is suitable for ultra-high-speed communication systems on resource-restricted hardware. The paper carries out simulation and implementation for a 64QAM system. The simulation results show that the loss is very low (less than 0.5 dB), and the real-time implementation on a field programmable gate array (FPGA) also works well.
Traditional ECG acquisition systems lack flexibility. To improve the flexibility of ECG acquisition and the signal-to-noise ratio of the ECG, a new ECG acquisition system was designed based on a DAQ card and LabVIEW, and oversampling was implemented in LabVIEW. The analog signal conditioning circuit was also improved. The results indicated that the system could detect the ECG signal accurately with a high signal-to-noise ratio and that the signal processing methods could be adjusted easily. The new system can therefore satisfy many kinds of ECG acquisition, and it provides a flexible experimental platform for exploring new ECG acquisition methods.
A reconfigurable intelligent surface (RIS) aided massive multiple-input multiple-output (MIMO) system is considered, where the base station employs a large antenna array with low-cost and low-power 1-bit analog-to-digital converters (ADCs). To compensate for the performance loss caused by the coarse quantization, oversampling is applied at the receiver. The main challenge for the acquisition of cascaded channel state information in such a system is to handle the distortion caused by the 1-bit quantization and the sample correlation caused by oversampling. In this work, the Bussgang decomposition is applied to deal with the coarse quantization, and a Markov chain is developed to characterize the banded structure of the oversampling filter. An approximate message-passing based algorithm is proposed for the estimation of the cascaded channels. Simulation results demonstrate that our proposed 1-bit systems with oversampling can approach the 2-bit systems in terms of mean square error performance while consuming much less power at the receiver.
HIV and AIDS have continued to be a major public health concern, and hence one of the epidemics that the world resolved to end by 2030, as highlighted in the Sustainable Development Goals (SDGs). A colossal amount of effort has been taken to reduce new HIV infections, but a significant number of new infections are still reported. HIV prevalence is more skewed towards key populations, which include female sex workers (FSW), men who have sex with men (MSM), and people who inject drugs (PWID). The study design was retrospective and focused on key populations enrolled in a comprehensive HIV and AIDS programme run by the Kenya Red Cross Society from July 2019 to June 2021. Individuals who were either lost to follow-up, defaulted (dropped out, transferred out, or relocated), or died were classified as attrition, while those who were active and alive by the end of the study were classified as retention. The study used density analysis to determine the spatial differences in key population attrition across the 19 targeted counties, and used Kilifi county as an example to map attrition cases in smaller administrative areas (sub-county level). The study used the synthetic minority oversampling technique-nominal continuous (SMOTE-NC) to balance the datasets, since the cases of attrition were much fewer than retention. A random survival forests model was then fitted to the balanced dataset. The model correctly identified attrition cases using the predicted ensemble mortality, and their survival times using the estimated Kaplan-Meier survival function. The predictive performance of the model was strong and well above random chance, with concordance indices greater than 0.75.
The identification of high-quality marine shale gas reservoirs has always been a key task in the exploration and development stage. However, due to the seriously nonlinear relationship between logging curve responses and high-quality reservoirs, rapid identification of high-quality reservoirs has long suffered from low accuracy. This study proposes a combination of an oversampling method and the random forest algorithm to improve the identification accuracy of high-quality reservoirs based on logging data. The oversampling method is used to balance the number of samples of different types, and the random forest algorithm is used to establish a high-precision reservoir identification model. From the perspective of prediction effect, the reservoir identification method that combines the oversampling method and the random forest algorithm increased the accuracy of reservoir identification from the 44% achieved by other machine learning algorithms to 78%, a significant improvement. This research can improve the identifiability of high-quality marine shale gas reservoirs, guide the drilling of horizontal wells, and provide tangible help for the precise formulation of marine shale gas development plans.
Load deviations between the output of ultra-supercritical (USC) coal-fired power units and automatic generation control (AGC) commands can adversely affect the safe and stable operation of these units and grid load dispatching. Data-driven diagnostic methods often fail to account for the imbalanced distribution of data samples, leading to reduced classification performance in diagnosing load deviations in USC units. To address the class imbalance issue in USC load deviation datasets, this study proposes a diagnostic method based on the multi-label natural neighbor boundary oversampling technique (MLNaNBDOS). The method is articulated in three phases. Initially, the traditional binary oversampling strategy is improved by constructing a binary multi-label relationship for the load deviations in coal-fired units. Subsequently, an adaptive adjustment of the oversampling factor is implemented to determine the oversampling weight for each sample class. Finally, the generation of new instances is refined by dynamically evaluating the similarity between new cases and natural neighbors through a random factor, ensuring precise control over the instance generation process. In comparisons with nine benchmark methods across three imbalanced USC load deviation datasets, the proposed method demonstrates superior performance on several key evaluation metrics, including Micro-F1, Micro-G-mean, and Hamming Loss, with average values of 0.8497, 0.9150, and 0.1503, respectively. These results substantiate the effectiveness of the proposed method in accurately diagnosing the sources of load deviations in USC units.
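The adaptive oversampling factor in the second phase can be illustrated with a deliberately simple stand-in (my assumption, not the MLNaNBDOS formula): weight each class by its sample deficit relative to the largest class, so rarer classes receive proportionally more synthetic instances.

```python
import numpy as np

def oversampling_weights(counts):
    """Illustrative per-class oversampling weights: proportional to each
    class's deficit relative to the largest class; weights sum to 1 when
    any class is under-represented, and are all zero when balanced."""
    counts = np.asarray(counts, dtype=float)
    deficit = counts.max() - counts
    total = deficit.sum()
    return deficit / total if total > 0 else np.zeros_like(counts)
```

For class counts of 100, 50, and 50, the majority class gets weight 0 and each minority class gets weight 0.5 of the synthetic budget.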
Oversampling is widely used in practical applications of digital signal processing. As the fractional Fourier transform has been developed and applied in signal processing fields, it is necessary to consider the oversampling theorem in the fractional Fourier domain. In this paper, the oversampling theorem in the fractional Fourier domain is analyzed. The fractional Fourier spectral relation between the original oversampled sequence and its subsequences is derived first, and then the expression for exact reconstruction of the missing samples in terms of the subsequences is obtained. Moreover, by taking a chirp signal as an example, it is shown that reconstruction of the missing samples in the oversampled signal is suitable in the fractional Fourier domain for a signal whose time-frequency distribution has minimum support in the fractional Fourier domain.
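In the classical special case (the fractional order reduced to the ordinary Fourier transform), reconstructing the missing samples of a 2x-oversampled sequence from one subsequence can be sketched with FFT zero-padding; the paper's fractional-domain result generalizes this idea.

```python
import numpy as np

def reconstruct_from_subsequence(x_even, factor=2):
    """Recover a bandlimited, factor-x oversampled sequence from its
    decimated subsequence by zero-padding its spectrum (ordinary
    Fourier special case; exact when the signal occupies less than
    the subsequence's Nyquist band)."""
    n = len(x_even)
    X = np.fft.fft(x_even)
    half = n // 2
    Xp = np.zeros(n * factor, dtype=complex)
    Xp[:half] = X[:half]          # positive-frequency bins
    Xp[-half:] = X[-half:]        # negative-frequency bins
    # ifft normalizes by the longer length, so rescale by the factor.
    return np.real(np.fft.ifft(Xp)) * factor
```

For a signal whose spectrum fits inside a quarter of the original sampling rate, the even-indexed subsequence determines the odd-indexed (missing) samples exactly.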
Learning from imbalanced data is a challenging task in a wide range of applications, which attracts significant research effort from the machine learning and data mining community. As a natural approach to this issue, oversampling balances the training samples through replicating existing samples or synthesizing new samples. In general, synthesization outperforms replication by supplying additional information on the minority class. However, the additional information needs to follow the same normal distribution as the training set, which further constrains the new samples within the predefined range of the training set. In this paper, we present the Wiener process oversampling (WPO) technique, which brings a physical phenomenon into sample synthesization. WPO constructs a robust decision region by expanding the attribute ranges in the training set while keeping the same normal distribution. The satisfactory performance of WPO can be achieved with much lower computational complexity. In addition, by integrating WPO with ensemble learning, the WPOBoost algorithm outperforms many prevalent imbalance-learning solutions.
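The Wiener-process idea can be sketched as follows (parameter names and scales are assumptions, not the paper's specification): each synthetic sample is a replicated minority point plus the endpoint of a Brownian path, so attribute ranges can expand slightly while the added noise stays normally distributed.

```python
import numpy as np

def wpo_sketch(X_min, n_new, t=1.0, steps=10, scale=0.1, rng=None):
    """Replicate minority points and perturb each replica with the
    endpoint of a Wiener process: W(t) ~ N(0, sigma^2 * t), built as a
    sum of independent Gaussian increments over `steps` time steps."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X_min, dtype=float)
    base = X[rng.integers(0, len(X), n_new)]
    dt = t / steps
    sigma = scale * X.std(axis=0)                  # per-feature scale
    increments = rng.normal(0.0, 1.0, (n_new, steps, X.shape[1]))
    w_t = sigma * np.sqrt(dt) * increments.sum(axis=1)
    return base + w_t
```

Unlike interpolation-based oversampling, the Gaussian endpoint can land slightly outside the observed attribute range, which is the range-expansion property the abstract highlights.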
Automatic protocol mining is a promising approach for inferring accurate and complete API protocols. However, just as with any data-mining technique, this approach requires sufficient training data (object usage scenarios). Existing approaches resolve the problem by analyzing more programs, which may cause significant runtime overhead. In this paper, we propose an inheritance-based oversampling approach for object usage scenarios (OUSs). Our technique is based on the inheritance relationship in object-oriented programs. Given an object-oriented program p, generally, the OUSs that can be collected from a run of p are not more than the objects used during the run. With our technique, a maximum of n times more OUSs can be achieved, where n is the average number of super-classes of all general OUSs. To investigate the effect of our technique, we implement it in our previous prototype tool, ISpec Miner, and use the tool to mine protocols from several real-world programs. Experimental results show that our technique can collect 1.95 times more OUSs than general approaches. Additionally, accurate and complete API protocols are more likely to be achieved. Furthermore, our technique can mine API protocols for classes never even used in programs, which are valuable for validating software architectures, program documentation, and understanding. Although our technique will introduce some runtime overhead, it is trivial and acceptable.
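The inheritance-based multiplication of usage scenarios can be shown with a toy sketch (an illustration of the idea, not ISpec Miner's implementation): one observed scenario also counts for every superclass that, together with its ancestors, declares all the methods that were called.

```python
def oversampled_scenarios(obj, calls):
    """For one observed object-usage scenario (obj, calls), emit a
    scenario for every class in obj's MRO whose own hierarchy declares
    all the called methods."""
    out = []
    for cls in type(obj).__mro__:
        mro = cls.__mro__
        if all(any(name in vars(base) for base in mro) for name in calls):
            out.append((cls.__name__, tuple(calls)))
    return out
```

If class `B` inherits method `f` from class `A` and only `f` is called, the single observation yields scenarios for both `B` and `A`, which is exactly the "n times more OUSs" effect the abstract describes.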
Delirium, a complex neurocognitive syndrome, frequently emerges following surgery, presenting diverse manifestations and considerable obstacles, especially among the elderly. This editorial delves into the intricate phenomenon of postoperative delirium (POD), shedding light on a study that explores POD in elderly individuals undergoing abdominal malignancy surgery. The study examines pathophysiology and predictive determinants, offering valuable insights into this challenging clinical scenario. Employing the synthetic minority oversampling technique, a predictive model is developed, incorporating critical risk factors such as comorbidity index, anesthesia grade, and surgical duration. There is an urgent need for accurate risk factor identification to mitigate POD incidence. While specific to elderly patients with abdominal malignancies, the findings contribute significantly to understanding delirium pathophysiology and prediction. Further research is warranted to establish standardized predictive models for enhanced generalizability.
Funding: Supported by the Discipline Advancement Program of Shanghai Fourth People's Hospital, No. SY-XKZT-2020-2013.
Abstract: BACKGROUND: Postoperative delirium, particularly prevalent in elderly patients after abdominal cancer surgery, presents significant challenges in clinical management. AIM: To develop a synthetic minority oversampling technique (SMOTE)-based model for predicting postoperative delirium in elderly abdominal cancer patients. METHODS: In this retrospective cohort study, we analyzed data from 611 elderly patients who underwent abdominal malignant tumor surgery at our hospital between September 2020 and October 2022. The incidence of postoperative delirium was recorded for 7 d post-surgery. Patients were divided into delirium and non-delirium groups based on whether postoperative delirium occurred. A multivariate logistic regression model was used to identify risk factors and develop a predictive model for postoperative delirium. The SMOTE technique was applied to enhance the model by oversampling the delirium cases. The model’s predictive accuracy was then validated. RESULTS: In our study involving 611 elderly patients with abdominal malignant tumors, multivariate logistic regression analysis identified significant risk factors for postoperative delirium. These included the Charlson comorbidity index, American Society of Anesthesiologists classification, history of cerebrovascular disease, surgical duration, perioperative blood transfusion, and postoperative pain score. The incidence of postoperative delirium in our study was 22.91%. The original predictive model (P1) exhibited an area under the receiver operating characteristic curve of 0.862. In comparison, the SMOTE-based logistic early warning model (P2), which utilized the SMOTE oversampling algorithm, showed a slightly lower but comparable area under the curve of 0.856, suggesting no significant difference in performance between the two predictive approaches. CONCLUSION: This study confirms that the SMOTE-enhanced predictive model for postoperative delirium in elderly abdominal tumor patients shows performance equivalent to that of traditional methods, effectively addressing data imbalance.
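The SMOTE step described above can be sketched in a few lines: each synthetic minority sample is an interpolation between an existing minority sample and one of its k nearest minority neighbors. This is an illustrative minimal version only, not the study's actual implementation; all names are hypothetical.

```python
import random

def smote_like(minority, n_new, k=3, seed=0):
    """Illustrative SMOTE-style interpolation: each synthetic point lies on
    the segment between a minority sample and one of its k nearest
    minority-class neighbors (Euclidean distance)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbors of x, excluding x itself
        neighbors = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbors)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic

minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_like(minority, n_new=5)
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region the minority class already occupies.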
Abstract: Recently, anomaly detection (AD) in streaming data has gained significant attention among research communities due to its applicability in finance, business, healthcare, education, etc. Recent developments in deep learning (DL) models have proven helpful in the detection and classification of anomalies. This article designs an oversampling with optimal deep learning-based streaming data classification (OS-ODLSDC) model. The aim of the OS-ODLSDC model is to recognize and classify the presence of anomalies in streaming data. The proposed OS-ODLSDC model initially undergoes a preprocessing step. Since streaming data is unbalanced, the support vector machine-Synthetic Minority Over-sampling Technique (SVM-SMOTE) is applied for the oversampling process. Besides, the OS-ODLSDC model employs bidirectional long short-term memory (BiLSTM) for AD and classification. Finally, the root mean square propagation (RMSProp) optimizer is applied for optimal hyperparameter tuning of the BiLSTM model. To ensure the promising performance of the OS-ODLSDC model, a wide-ranging experimental analysis is performed using three benchmark datasets: CICIDS 2018, KDD-Cup 1999, and NSL-KDD.
Funding: Project (52161135301) supported by the International Cooperation and Exchange Program of the National Natural Science Foundation of China; Project (202306370296) supported by the China Scholarship Council.
Abstract: Rockburst is a common geological disaster in underground engineering that seriously threatens the safety of personnel, equipment, and property. Using machine learning models to evaluate rockburst risk is gradually becoming a trend. In this study, integrated algorithms under the Gradient Boosting Decision Tree (GBDT) framework were used to evaluate and classify rockburst intensity. First, a total of 301 rockburst data samples were obtained from a case database, and the data were preprocessed using the synthetic minority over-sampling technique (SMOTE). Then, rockburst evaluation models including GBDT, eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), and Categorical Features Gradient Boosting (CatBoost) were established, and the optimal hyperparameters of the models were obtained through random grid search and five-fold cross-validation. Afterwards, the optimal hyperparameter configuration was used to fit the evaluation models, which were then analyzed on the test set. To evaluate performance, metrics including accuracy, precision, recall, and F1-score were selected for analysis and comparison with other machine learning models. Finally, the trained models were used to conduct rockburst risk assessment on rock samples from a mine in Shanxi Province, China, providing theoretical guidance for the mine's safe production. The models under the GBDT framework perform well in the evaluation of rockburst levels, and the proposed methods can provide a reliable reference for rockburst risk level analysis and safety management.
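The five-fold cross-validation used above for hyperparameter search can be sketched as a plain index split: shuffle once, deal indices into k disjoint folds, and rotate which fold is held out. This is a generic sketch (the study's actual split and the sample count of 301 are taken from the abstract; everything else is hypothetical).

```python
import random

def k_fold_indices(n_samples, k=5, seed=42):
    """Shuffle sample indices once, then yield k disjoint (train, test)
    index lists that together cover every sample exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # round-robin into k folds
    splits = []
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        splits.append((train, test))
    return splits

splits = k_fold_indices(301, k=5)  # 301 rockburst samples, as in the study
```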
Funding: Funded by the National Science Foundation of China (62006068); the Hebei Natural Science Foundation (A2021402008); the Natural Science Foundation of Scientific Research Projects of Higher Education in Hebei Province (ZD2020185, QN2020188); and the 333 Talent Supported Project of Hebei Province (C20221026).
Abstract: Imbalanced datasets are common in practical applications, and oversampling methods using fuzzy rules have been shown to enhance the classification performance of imbalanced data by taking into account the relationships between data attributes. However, the creation of fuzzy rules typically depends on expert knowledge, which may not fully leverage the label information in the training data and may be subjective. To address this issue, a novel fuzzy rule oversampling approach is developed based on the learning vector quantization (LVQ) algorithm. In this method, the label information of the training data is utilized to determine the antecedent part of If-Then fuzzy rules by dynamically dividing attribute intervals using LVQ. Subsequently, fuzzy rules are generated and adjusted to calculate rule weights. The number of new samples to be synthesized for each rule is then computed, and samples from the minority class are synthesized based on the newly generated fuzzy rules. This results in the establishment of a fuzzy rule oversampling method based on LVQ. To evaluate the effectiveness of this method, comparative experiments are conducted on 12 publicly available imbalanced datasets against five other sampling techniques, in combination with the support vector machine. The experimental results demonstrate that the proposed method can significantly enhance the classification algorithm across seven performance indicators, including a boost of 2.15% to 12.34% in Accuracy, 6.11% to 27.06% in G-mean, and 4.69% to 18.78% in AUC. These results show that the proposed method can more efficiently improve the classification performance of imbalanced data.
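The LVQ step at the heart of the interval-division idea above is a one-line prototype update: pull the winning prototype toward a sample of the same class, push it away otherwise. This sketches only the classical LVQ1 rule, not the paper's full fuzzy-rule pipeline; names are hypothetical.

```python
def lvq_update(prototype, x, same_class, lr=0.1):
    """Classical LVQ1 update for the winning prototype: move toward x when
    the class labels match, away from x when they differ."""
    sign = 1.0 if same_class else -1.0
    return tuple(p + sign * lr * (xi - p) for p, xi in zip(prototype, x))

p = lvq_update((0.0, 0.0), (1.0, 1.0), same_class=True)   # pulled toward x
q = lvq_update((0.0, 0.0), (1.0, 1.0), same_class=False)  # pushed away
```

Iterating this update over labeled samples is what lets the prototypes (and hence the attribute intervals derived from them) track the label structure of the training data.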
Funding: Supported by the National Key Research and Development Program of China (2016YFB0500901); the Natural Science Foundation of Shanghai (18ZR1437200); and the Satellite Mapping Technology and Application National Key Laboratory of the Geographical Information Bureau (KLSMTA-201709).
Abstract: According to the oversampling imaging characteristics, an infrared small target detection method based on deep learning is proposed. A 7-layer deep convolutional neural network (CNN) is designed to automatically extract small target features and suppress clutter in an end-to-end manner. The input of the CNN is an original oversampling image, while the output is a clutter-suppressed feature map. The CNN contains only convolution and non-linear operations, and the resolution of the output feature map is the same as that of the input image. The L1-norm loss function is used, and a large amount of training data is generated to train the network effectively. Results show that, compared with several baseline methods, the proposed method improves the signal-to-clutter ratio gain and background suppression factor by 3-4 orders of magnitude, and has more powerful target detection performance.
Funding: This work was supported by the National Key R&D Program of China (Grant Nos. 2020YFB1005900, 2022YFB3305802).
Abstract: Due to the anonymity of blockchain, frequent security incidents and attacks occur through it, among which the Ponzi scheme smart contract is a classic type of fraud resulting in huge economic losses. Machine learning-based methods are believed to be promising for detecting Ethereum Ponzi schemes. However, there are still some flaws in current research, e.g., insufficient feature extraction of Ponzi scheme smart contracts and failure to consider class imbalance. In addition, there is room for improvement in detection precision. Aiming at the above problems, this paper proposes an Ethereum Ponzi scheme detection scheme based on opcode context analysis and the adaptive boosting (AdaBoost) algorithm. First, this paper uses the n-gram algorithm to extract more comprehensive contract opcode features and combines them with contract account features, which helps to improve the feature extraction effect. Meanwhile, adaptive synthetic sampling (ADASYN) is introduced to deal with the class-imbalanced data and is integrated with the AdaBoost classifier. Finally, this paper uses the improved AdaBoost classifier for the identification of Ponzi scheme contracts. Experimentally, this paper tests our model on real-world smart contracts and compares it with representative methods in terms of F1-score and precision. Moreover, this article compares and discusses state-of-the-art methods and our method in four aspects: data acquisition, data preprocessing, feature extraction, and classifier design. Both the experiments and the discussion validate the effectiveness of our model.
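The n-gram opcode feature extraction mentioned above amounts to counting consecutive opcode tuples, so that contextual patterns rather than isolated opcodes become features. A minimal sketch (the opcode sequence below is a made-up example, not taken from a real contract):

```python
from collections import Counter

def opcode_ngrams(opcodes, n=2):
    """Count consecutive opcode n-grams in a contract's opcode sequence;
    the resulting counts serve as context-aware features."""
    return Counter(tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))

seq = ["PUSH1", "MSTORE", "PUSH1", "MSTORE", "CALLVALUE"]
bigrams = opcode_ngrams(seq, n=2)
```

In a full pipeline these counts would be vectorized (e.g., over a fixed n-gram vocabulary) and concatenated with account features before training the classifier.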
Abstract: Learning from imbalanced data is one of the greatest challenges in binary classification, and this problem has gained more importance in recent years. When the class distribution is imbalanced, classical machine learning algorithms tend to move strongly towards the majority class and disregard the minority. Therefore, the accuracy may be high, but the model cannot recognize data instances in the minority class, leading to many misclassifications. Different methods have been proposed in the literature to handle the imbalance problem, but most are complicated and tend to introduce unnecessary noise. In this paper, we propose a simple oversampling method based on the multivariate Gaussian distribution and K-means clustering, called GK-Means. The new method aims to avoid generating noise and to control imbalance both between and within classes. Various experiments have been carried out with six classifiers and four oversampling methods. Experimental results on different imbalanced datasets show that the proposed GK-Means outperforms other oversampling methods and improves classification performance as measured by F1-score and Accuracy.
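The core idea of combining clustering with Gaussian sampling can be sketched as follows: fit a per-dimension Gaussian to each minority cluster and draw synthetic points from it, so new samples stay near the cluster rather than bridging into other classes. This is a rough single-cluster, diagonal-covariance sketch of the idea, not the GK-Means algorithm itself; names and data are hypothetical.

```python
import random

def gaussian_oversample(cluster_points, n_new, seed=0):
    """Fit a per-dimension Gaussian (mean, std) to one minority cluster and
    draw n_new synthetic samples from it."""
    rng = random.Random(seed)
    dims = list(zip(*cluster_points))          # transpose: one tuple per dimension
    means = [sum(d) / len(d) for d in dims]
    stds = [(sum((v - m) ** 2 for v in d) / len(d)) ** 0.5
            for d, m in zip(dims, means)]
    return [tuple(rng.gauss(m, s) for m, s in zip(means, stds))
            for _ in range(n_new)]

cluster = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1)]
samples = gaussian_oversample(cluster, n_new=10)
```

The full method would additionally use a multivariate (non-diagonal) Gaussian per K-means cluster and apportion the number of new samples across clusters to control within-class imbalance.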
Abstract: An efficient single-carrier symbol synchronization method is proposed in this paper, which can work under a very low oversampling rate. The method is based on frequency-aliasing squared timing recovery assisted by pilot symbols and a time-domain filter. With pilot-assisted frequency-aliasing squared timing recovery, it is possible to estimate the timing error at oversampling rates below 2. The time-domain filter simultaneously performs matched filtering and arbitrary interpolation. Because of the pilot assistance, timing error estimation is free from aliasing and self-noise, so the method performs well. Compared with traditional time-domain methods requiring an oversampling rate above 2, this method can be adapted to any rational oversampling rate, including rates below 2. Moreover, compared with symbol synchronization in the frequency domain, which can also operate under a low oversampling rate, our method avoids the complicated conversion between the time and frequency domains. With its low oversampling rate and resource-saving filter, this method is suitable for ultra-high-speed communication systems on resource-restricted hardware. The paper carries out simulation and implementation for a 64QAM system. The simulation results show that the loss is very low (less than 0.5 dB), and the real-time implementation on a field programmable gate array (FPGA) also works well.
Abstract: Traditional ECG acquisition systems lack flexibility. To improve the flexibility of ECG acquisition and the signal-to-noise ratio of the ECG, a new acquisition system was designed based on a DAQ card and LabVIEW, with oversampling implemented in LabVIEW, and the analog signal conditioning circuit was improved. The results indicated that the system could detect ECG signals accurately with a high signal-to-noise ratio and that the signal processing methods could be adjusted easily. The new system can therefore satisfy many kinds of ECG acquisition needs and serves as a flexible experimental platform for exploring new ECG acquisition methods.
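Why oversampling raises the SNR here can be shown with a toy experiment: averaging N noisy readings of the same instant cuts the noise standard deviation by roughly sqrt(N). The numbers below are illustrative only, not the DAQ-card configuration from the paper.

```python
import random
import statistics

def oversample_average(true_value, noise_sd, factor, seed=0):
    """Take `factor` noisy readings of the same instantaneous value and
    return their mean; the averaged noise shrinks by ~sqrt(factor)."""
    rng = random.Random(seed)
    readings = [true_value + rng.gauss(0.0, noise_sd) for _ in range(factor)]
    return sum(readings) / factor

# averaged estimates scatter far less than single readings
singles = [oversample_average(1.0, 0.5, 1, seed=s) for s in range(200)]
averaged = [oversample_average(1.0, 0.5, 64, seed=s) for s in range(200)]
```

With a 64x oversampling-and-average factor the residual noise is about one eighth of the single-reading noise, which is the effect the LabVIEW implementation exploits.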
Abstract: A reconfigurable intelligent surface (RIS)-aided massive multiple-input multiple-output (MIMO) system is considered, where the base station employs a large antenna array with low-cost, low-power 1-bit analog-to-digital converters (ADCs). To compensate for the performance loss caused by the coarse quantization, oversampling is applied at the receiver. The main challenge in acquiring cascaded channel state information in such a system is to handle the distortion caused by the 1-bit quantization and the sample correlation caused by oversampling. In this work, the Bussgang decomposition is applied to deal with the coarse quantization, and a Markov chain is developed to characterize the banded structure of the oversampling filter. An approximate message-passing based algorithm is proposed for the estimation of the cascaded channels. Simulation results demonstrate that the proposed 1-bit systems with oversampling can approach 2-bit systems in terms of mean square error performance while consuming much less power at the receiver.
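The 1-bit quantization that drives the whole problem is simply a sign operation per (real) sample; the Bussgang decomposition then models its output as a scaled linear term plus uncorrelated distortion. The sketch below shows only this front-end step, not the message-passing estimator.

```python
def one_bit_adc(samples):
    """1-bit quantization: keep only the sign of each real-valued sample
    (complex samples would quantize I and Q branches separately)."""
    return [1.0 if s >= 0 else -1.0 for s in samples]

q = one_bit_adc([0.3, -1.2, 0.0, 2.5])
```

All amplitude information is discarded by this step, which is why oversampling (extra temporal samples) is needed to recover estimation accuracy.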
Abstract: HIV and AIDS have continued to be a major public health concern, and hence one of the epidemics that the world resolved to end by 2030, as highlighted in the Sustainable Development Goals (SDGs). A colossal amount of effort has gone into reducing new HIV infections, but a significant number of new infections are still reported. HIV prevalence is more skewed towards key populations, who include female sex workers (FSW), men who have sex with men (MSM), and people who inject drugs (PWID). The study design was retrospective and focused on key populations enrolled in a comprehensive HIV and AIDS programme by the Kenya Red Cross Society from July 2019 to June 2021. Individuals who were either lost to follow-up, defaulted (dropped out, transferred out, or relocated), or died were classified as attrition, while those who were active and alive by the end of the study were classified as retention. The study used density analysis to determine the spatial differences of key population attrition in the 19 targeted counties, and used Kilifi County as an example to map attrition cases in smaller administrative areas (sub-county level). The study used the synthetic minority oversampling technique-nominal continuous (SMOTE-NC) to balance the datasets, since the cases of attrition were much fewer than those of retention. A random survival forest model was then fitted to the balanced dataset. The model correctly identified attrition cases using the predicted ensemble mortality, and their survival time using the estimated Kaplan-Meier survival function. The predictive performance of the model was strong and far better than random chance, with concordance indices greater than 0.75.
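The Kaplan-Meier survival function used above has a simple product-limit form: at each distinct event time, multiply the running survival estimate by (1 - deaths/at-risk). Below is a textbook sketch of that estimator (not the fitted random survival forest); the toy times and event flags are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of S(t) at each distinct event time.
    events: 1 = attrition observed, 0 = censored (still retained)."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    at_risk, s, curve = len(times), 1.0, []
    i = 0
    while i < len(order):
        t = times[order[i]]
        deaths = sum(1 for j in order[i:] if times[j] == t and events[j] == 1)
        n_t = sum(1 for j in order[i:] if times[j] == t)
        if deaths:
            s *= 1.0 - deaths / at_risk
            curve.append((t, s))
        at_risk -= n_t
        i += n_t
    return curve

curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
```

Censored individuals (event flag 0) leave the risk set without reducing the survival estimate, which is what distinguishes this from a naive attrition fraction.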
Funding: This project was funded by the Laboratory for Marine Geology, Qingdao National Laboratory for Marine Science and Technology (MGQNLM-KF202004); the China Postdoctoral Science Foundation (2021M690161, 2021T140691); the Postdoctoral Funded Project in Hainan Province (General Program); the Chinese Academy of Sciences Special Research Assistant Project; and the Open Fund of the Key Laboratory of Exploration Technologies for Oil and Gas Resources (Yangtze University), Ministry of Education (Nos. K2021-03, K2021-08).
Abstract: The identification of high-quality marine shale gas reservoirs has always been a key task in the exploration and development stage. However, due to the strongly nonlinear relationship between logging curve responses and high-quality reservoirs, rapid identification of high-quality reservoirs has long suffered from low accuracy. This study proposes a combination of an oversampling method and the random forest algorithm to improve the identification accuracy of high-quality reservoirs based on logging data. The oversampling method is used to balance the number of samples of different types, and the random forest algorithm is used to establish a high-precision reservoir identification model. In terms of prediction effect, the reservoir identification method combining oversampling with random forest increased the accuracy of reservoir identification from the 44% seen in other machine learning algorithms to 78%, a significant improvement. This research can improve the identifiability of high-quality marine shale gas reservoirs, guide the drilling of horizontal wells, and provide tangible help for the precise formulation of marine shale gas development plans.
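The sample-balancing step described above can be as simple as replicating minority-class rows until every class matches the majority count. A minimal sketch consistent with that description (the random forest itself is out of scope here; labels and data are made up):

```python
import random

def random_oversample(X, y, seed=0):
    """Replicate minority-class samples (with replacement) until every
    class has as many rows as the majority class."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    target = max(len(rows) for rows in by_class.values())
    Xb, yb = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for r in rows + extra:
            Xb.append(r)
            yb.append(label)
    return Xb, yb

Xb, yb = random_oversample([[0], [1], [2], [9]],
                           ["poor", "poor", "poor", "good"])
```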
Funding: Supported by the National Natural Science Foundation of China (Grant No. 62173050); the Shenzhen Municipal Science and Technology Innovation Committee (Grant No. KCXFZ20211020165004006); the Natural Science Foundation of Hunan Province of China (Grant No. 2023JJ30051); the Hunan Provincial Graduate Student Research Innovation Project (Grant No. QL20230214); the Major Scientific and Technological Innovation Platform Project of Hunan Province (2024JC1003); and the Hunan Provincial University Students’ Energy Conservation and Emission Reduction Innovation and Entrepreneurship Education Center (Grant No. 2019-10).
Abstract: Load deviations between the output of ultra-supercritical (USC) coal-fired power units and automatic generation control (AGC) commands can adversely affect the safe and stable operation of these units and grid load dispatching. Data-driven diagnostic methods often fail to account for the imbalanced distribution of data samples, leading to reduced classification performance when diagnosing load deviations in USC units. To address the class imbalance issue in USC load deviation datasets, this study proposes a diagnostic method based on the multi-label natural neighbor boundary oversampling technique (MLNaNBDOS). The method is articulated in three phases. Initially, the traditional binary oversampling strategy is improved by constructing a binary multi-label relationship for the load deviations in coal-fired units. Subsequently, an adaptive adjustment of the oversampling factor is implemented to determine the oversampling weight for each sample class. Finally, the generation of new instances is refined by dynamically evaluating the similarity between new cases and natural neighbors through a random factor, ensuring precise control over the instance generation process. In comparisons with nine benchmark methods across three imbalanced USC load deviation datasets, the proposed method demonstrates superior performance on several key evaluation metrics, including Micro-F1, Micro-G-mean, and Hamming Loss, with average values of 0.8497, 0.9150, and 0.1503, respectively. These results substantiate the effectiveness of the proposed method in accurately diagnosing the sources of load deviations in USC units.
Funding: Supported partially by the National Natural Science Foundation of China for Distinguished Young Scholars (Grant No. 60625104); the National Natural Science Foundation of China (Grant Nos. 60890072, 60572094); and the National Basic Research Program of China (Grant No. 2009CB724003).
Abstract: Oversampling is widely used in practical applications of digital signal processing. As the fractional Fourier transform has been developed and applied in signal processing fields, it is necessary to consider the oversampling theorem in the fractional Fourier domain. In this paper, the oversampling theorem in the fractional Fourier domain is analyzed. The fractional Fourier spectral relation between the original oversampled sequence and its subsequences is derived first, and then the expression for exact reconstruction of the missing samples in terms of the subsequences is obtained. Moreover, by taking a chirp signal as an example, it is shown that reconstruction of the missing samples in the oversampled signal is suitable in the fractional Fourier domain for signals whose time-frequency distribution has minimum support in that domain.
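For reference, the fractional Fourier transform underlying the theorem above has the standard integral form below (one common normalization; $\alpha$ is the transform-order angle, $\alpha \neq n\pi$). This is the textbook definition, not a result specific to the paper.

\[
X_\alpha(u) = \int_{-\infty}^{\infty} x(t)\, K_\alpha(t,u)\, dt,
\qquad
K_\alpha(t,u) = \sqrt{\frac{1 - j\cot\alpha}{2\pi}}\,
\exp\!\left( j\,\frac{t^2 + u^2}{2}\cot\alpha \;-\; j\,t u \csc\alpha \right).
\]

For $\alpha = \pi/2$ this reduces to the ordinary Fourier transform, which is why a chirp (whose time-frequency support is a rotated line) can attain minimum support at some intermediate $\alpha$.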
Funding: This research was partially supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA06030200); the National Natural Science Foundation of China (Grant Nos. M1552006, 61403369, 61272427, and 61363030); the Xinjiang Uygur Autonomous Region Science and Technology Project (201230123); the Beijing Key Lab of Intelligent Telecommunication Software and Multimedia (ITSM201502); and the Guangxi Key Laboratory of Trusted Software (kx201418).
Abstract: Learning from imbalanced data is a challenging task in a wide range of applications, which attracts significant research effort from the machine learning and data mining communities. As a natural approach to this issue, oversampling balances the training samples by replicating existing samples or synthesizing new ones. In general, synthesis outperforms replication by supplying additional information on the minority class. However, the additional information needs to follow the same normal distribution as the training set, which further constrains the new samples within the predefined range of the training set. In this paper, we present the Wiener process oversampling (WPO) technique, which brings a physical phenomenon into sample synthesis. WPO constructs a robust decision region by expanding the attribute ranges in the training set while keeping the same normal distribution. The satisfactory performance of WPO can be achieved with much lower computational complexity. In addition, by integrating WPO with ensemble learning, the WPOBoost algorithm outperforms many prevalent imbalanced-learning solutions.
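The physical process WPO borrows from is easy to simulate: a Wiener (Brownian motion) path is a running sum of independent Gaussian increments with variance proportional to the step size, which is why perturbations drawn from it widen attribute ranges while preserving normality. The sketch below shows only path generation, not the full WPO sampler; parameter names are hypothetical.

```python
import random

def wiener_path(n_steps, dt=0.01, x0=0.0, seed=0):
    """Simulate a Wiener process: each increment is Gaussian with mean 0
    and standard deviation sqrt(dt), independent of the past."""
    rng = random.Random(seed)
    path = [x0]
    for _ in range(n_steps):
        path.append(path[-1] + rng.gauss(0.0, dt ** 0.5))
    return path

path = wiener_path(100)
```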
基金supported by the Scientific Research Project of the Education Department of Hubei Province,China(No.Q20181508)the Youths Science Foundation of Wuhan Institute of Technology(No.k201622)+5 种基金the Surveying and Mapping Geographic Information Public Welfare Scientific Research Special Industry(No.201412014)the Educational Commission of Hubei Province,China(No.Q20151504)the National Natural Science Foundation of China(Nos.41501505,61502355,61502355,and 61502354)the China Postdoctoral Science Foundation(No.2015M581887)the Key Program of Higher Education Institutions of Henan Province,China(No.17A520040)and the Natural Science Foundation of Henan Province,China(No.162300410177)
Abstract: Automatic protocol mining is a promising approach for inferring accurate and complete API protocols. However, just as with any data-mining technique, this approach requires sufficient training data (object usage scenarios). Existing approaches resolve the problem by analyzing more programs, which may cause significant runtime overhead. In this paper, we propose an inheritance-based oversampling approach for object usage scenarios (OUSs). Our technique is based on the inheritance relationship in object-oriented programs. Given an object-oriented program p, generally, the OUSs that can be collected from a run of p are not more than the objects used during the run. With our technique, a maximum of n times more OUSs can be achieved, where n is the average number of super-classes of all general OUSs. To investigate the effect of our technique, we implement it in our previous prototype tool, ISpec Miner, and use the tool to mine protocols from several real-world programs. Experimental results show that our technique can collect 1.95 times more OUSs than general approaches. Additionally, accurate and complete API protocols are more likely to be achieved. Furthermore, our technique can mine API protocols for classes never even used in programs, which are valuable for validating software architectures, program documentation, and understanding. Although our technique will introduce some runtime overhead, it is trivial and acceptable.
Abstract: Delirium, a complex neurocognitive syndrome, frequently emerges following surgery, presenting diverse manifestations and considerable obstacles, especially among the elderly. This editorial delves into the intricate phenomenon of postoperative delirium (POD), shedding light on a study that explores POD in elderly individuals undergoing abdominal malignancy surgery. The study examines pathophysiology and predictive determinants, offering valuable insights into this challenging clinical scenario. Employing the synthetic minority oversampling technique, a predictive model is developed, incorporating critical risk factors such as comorbidity index, anesthesia grade, and surgical duration. There is an urgent need for accurate risk factor identification to mitigate POD incidence. While specific to elderly patients with abdominal malignancies, the findings contribute significantly to the understanding of delirium pathophysiology and prediction. Further research is warranted to establish standardized predictive models for enhanced generalizability.