Sonic Hedgehog Medulloblastoma(SHH-MB)is one of the four primary molecular subgroups of Medulloblastoma.It is estimated to be responsible for nearly one-third of allMB cases.Using transcriptomic and DNA methylation pr...Sonic Hedgehog Medulloblastoma(SHH-MB)is one of the four primary molecular subgroups of Medulloblastoma.It is estimated to be responsible for nearly one-third of allMB cases.Using transcriptomic and DNA methylation profiling techniques,new developments in this field determined four molecular subtypes for SHH-MB.SHH-MB subtypes show distinct DNAmethylation patterns that allow their discrimination fromoverlapping subtypes and predict clinical outcomes.Class overlapping occurs when two or more classes share common features,making it difficult to distinguish them as separate.Using the DNA methylation dataset,a novel classification technique is presented to address the issue of overlapping SHH-MBsubtypes.Penalizedmultinomial regression(PMR),Tomek links(TL),and singular value decomposition(SVD)were all smoothly integrated into a single framework.SVD and group lasso improve computational efficiency,address the problem of high-dimensional datasets,and clarify class distinctions by removing redundant or irrelevant features that might lead to class overlap.As a method to eliminate the issues of decision boundary overlap and class imbalance in the classification task,TL enhances dataset balance and increases the clarity of decision boundaries through the elimination of overlapping samples.Using fivefold cross-validation,our proposed method(TL-SVDPMR)achieved a remarkable overall accuracy of almost 95%in the classification of SHH-MB molecular subtypes.The results demonstrate the strong performance of the proposed classification model among the various SHH-MB subtypes given a high average of the area under the curve(AUC)values.Additionally,the statistical significance test indicates that TL-SVDPMR is more accurate than both SVM and random forest algorithms in classifying the overlapping SHH-MB subtypes,highlighting its importance for precision medicine applications.Our findings emphasized the success of combining SVD,TL,and PMRtechniques to improve the classification performance for biomedical applications with many features and overlapping subtypes.展开更多
Most modern technologies,such as social media,smart cities,and the internet of things(IoT),rely on big data.When big data is used in the real-world applications,two data challenges such as class overlap and class imba...Most modern technologies,such as social media,smart cities,and the internet of things(IoT),rely on big data.When big data is used in the real-world applications,two data challenges such as class overlap and class imbalance arises.When dealing with large datasets,most traditional classifiers are stuck in the local optimum problem.As a result,it’s necessary to look into new methods for dealing with large data collections.Several solutions have been proposed for overcoming this issue.The rapid growth of the available data threatens to limit the usefulness of many traditional methods.Methods such as oversampling and undersampling have shown great promises in addressing the issues of class imbalance.Among all of these techniques,Synthetic Minority Oversampling TechniquE(SMOTE)has produced the best results by generating synthetic samples for the minority class in creating a balanced dataset.The issue is that their practical applicability is restricted to problems involving tens of thousands or lower instances of each.In this paper,we have proposed a parallel mode method using SMOTE and MapReduce strategy,this distributes the operation of the algorithm among a group of computational nodes for addressing the aforementioned problem.Our proposed solution has been divided into three stages.Thefirst stage involves the process of splitting the data into different blocks using a mapping function,followed by a pre-processing step for each mapping block that employs a hybrid SMOTE algo-rithm for solving the class imbalanced problem.On each map block,a decision tree model would be constructed.Finally,the decision tree blocks would be com-bined for creating a classification model.We have used numerous datasets with up to 4 million instances in our experiments for testing the proposed scheme’s cap-abilities.As a result,the Hybrid SMOTE appears to have good scalability within the framework proposed,and it also cuts down the processing time.展开更多
Effectively handling imbalanced datasets remains a fundamental challenge in computational modeling and machine learning,particularly when class overlap significantly deteriorates classification performance.Traditional...Effectively handling imbalanced datasets remains a fundamental challenge in computational modeling and machine learning,particularly when class overlap significantly deteriorates classification performance.Traditional oversampling methods often generate synthetic samples without considering density variations,leading to redundant or misleading instances that exacerbate class overlap in high-density regions.To address these limitations,we propose Wasserstein Generative Adversarial Network Variational Density Estimation WGAN-VDE,a computationally efficient density-aware adversarial resampling framework that enhances minority class representation while strategically reducing class overlap.The originality of WGAN-VDE lies in its density-aware sample refinement,ensuring that synthetic samples are positioned in underrepresented regions,thereby improving class distinctiveness.By applying structured feature representation,targeted sample generation,and density-based selection mechanisms strategies,the proposed framework ensures the generation of well-separated and diverse synthetic samples,improving class separability and reducing redundancy.The experimental evaluation on 20 benchmark datasets demonstrates that this approach outperforms 11 state-of-the-art rebalancing techniques,achieving superior results in F1-score,Accuracy,G-Mean,and AUC metrics.These results establish the proposed method as an effective and robust computational approach,suitable for diverse engineering and scientific applications involving imbalanced data classification and computational modeling.展开更多
Class imbalance can substantially affect classification tasks using traditional classifiers,especially when identifying instances of minority categories.In addition to class imbalance,other challenges can also hinder ...Class imbalance can substantially affect classification tasks using traditional classifiers,especially when identifying instances of minority categories.In addition to class imbalance,other challenges can also hinder accurate classification.Researchers have explored various approaches to mitigate the effects of class imbalance.However,most studies focus only on processing correlations within a single category of samples.This paper introduces an ensemble framework called Inter-and Intra-Class Overlapping Ensemble(llCOE),which incorporates two sampling methods.The first method,which is based on classification hardness undersampling,targets majority category samples by using simple samples as the foundation for classification and improving performance by focusing on samples near classification boundaries.The second method addresses the issue of overfitting minority category samples in undersampling and ensemble learning.To mitigate this,an adaptive augment hybrid sampling method is proposed,which enhances the classification boundary of samples and reduces overfitting.This paper conducts multiple experiments on 15 public datasets and concludes that the IlCOE ensemble framework outperforms other ensemble learning algorithms in classifying imbalanced data.展开更多
基金funded by the Deanship of Graduate Studies and Scientific Research at Jouf University under grant No.(DGSSR-2024-02-01137).
文摘Sonic Hedgehog Medulloblastoma(SHH-MB)is one of the four primary molecular subgroups of Medulloblastoma.It is estimated to be responsible for nearly one-third of allMB cases.Using transcriptomic and DNA methylation profiling techniques,new developments in this field determined four molecular subtypes for SHH-MB.SHH-MB subtypes show distinct DNAmethylation patterns that allow their discrimination fromoverlapping subtypes and predict clinical outcomes.Class overlapping occurs when two or more classes share common features,making it difficult to distinguish them as separate.Using the DNA methylation dataset,a novel classification technique is presented to address the issue of overlapping SHH-MBsubtypes.Penalizedmultinomial regression(PMR),Tomek links(TL),and singular value decomposition(SVD)were all smoothly integrated into a single framework.SVD and group lasso improve computational efficiency,address the problem of high-dimensional datasets,and clarify class distinctions by removing redundant or irrelevant features that might lead to class overlap.As a method to eliminate the issues of decision boundary overlap and class imbalance in the classification task,TL enhances dataset balance and increases the clarity of decision boundaries through the elimination of overlapping samples.Using fivefold cross-validation,our proposed method(TL-SVDPMR)achieved a remarkable overall accuracy of almost 95%in the classification of SHH-MB molecular subtypes.The results demonstrate the strong performance of the proposed classification model among the various SHH-MB subtypes given a high average of the area under the curve(AUC)values.Additionally,the statistical significance test indicates that TL-SVDPMR is more accurate than both SVM and random forest algorithms in classifying the overlapping SHH-MB subtypes,highlighting its importance for precision medicine applications.Our findings emphasized the success of combining SVD,TL,and PMRtechniques to improve the classification performance for biomedical applications with many features and overlapping subtypes.
文摘Most modern technologies,such as social media,smart cities,and the internet of things(IoT),rely on big data.When big data is used in the real-world applications,two data challenges such as class overlap and class imbalance arises.When dealing with large datasets,most traditional classifiers are stuck in the local optimum problem.As a result,it’s necessary to look into new methods for dealing with large data collections.Several solutions have been proposed for overcoming this issue.The rapid growth of the available data threatens to limit the usefulness of many traditional methods.Methods such as oversampling and undersampling have shown great promises in addressing the issues of class imbalance.Among all of these techniques,Synthetic Minority Oversampling TechniquE(SMOTE)has produced the best results by generating synthetic samples for the minority class in creating a balanced dataset.The issue is that their practical applicability is restricted to problems involving tens of thousands or lower instances of each.In this paper,we have proposed a parallel mode method using SMOTE and MapReduce strategy,this distributes the operation of the algorithm among a group of computational nodes for addressing the aforementioned problem.Our proposed solution has been divided into three stages.Thefirst stage involves the process of splitting the data into different blocks using a mapping function,followed by a pre-processing step for each mapping block that employs a hybrid SMOTE algo-rithm for solving the class imbalanced problem.On each map block,a decision tree model would be constructed.Finally,the decision tree blocks would be com-bined for creating a classification model.We have used numerous datasets with up to 4 million instances in our experiments for testing the proposed scheme’s cap-abilities.As a result,the Hybrid SMOTE appears to have good scalability within the framework proposed,and it also cuts down the processing time.
基金supported by Ongoing Research Funding Program(ORF-2025-488)King Saud University,Riyadh,Saudi Arabia.
文摘Effectively handling imbalanced datasets remains a fundamental challenge in computational modeling and machine learning,particularly when class overlap significantly deteriorates classification performance.Traditional oversampling methods often generate synthetic samples without considering density variations,leading to redundant or misleading instances that exacerbate class overlap in high-density regions.To address these limitations,we propose Wasserstein Generative Adversarial Network Variational Density Estimation WGAN-VDE,a computationally efficient density-aware adversarial resampling framework that enhances minority class representation while strategically reducing class overlap.The originality of WGAN-VDE lies in its density-aware sample refinement,ensuring that synthetic samples are positioned in underrepresented regions,thereby improving class distinctiveness.By applying structured feature representation,targeted sample generation,and density-based selection mechanisms strategies,the proposed framework ensures the generation of well-separated and diverse synthetic samples,improving class separability and reducing redundancy.The experimental evaluation on 20 benchmark datasets demonstrates that this approach outperforms 11 state-of-the-art rebalancing techniques,achieving superior results in F1-score,Accuracy,G-Mean,and AUC metrics.These results establish the proposed method as an effective and robust computational approach,suitable for diverse engineering and scientific applications involving imbalanced data classification and computational modeling.
基金supported by the National Natural Science Foundation of China(No.62173158)the National Key Research and Development Program of China(No.2019YFC0119600)the Major Science and Technology Program of Hainan Province(No.ZDKJ202004).
文摘Class imbalance can substantially affect classification tasks using traditional classifiers,especially when identifying instances of minority categories.In addition to class imbalance,other challenges can also hinder accurate classification.Researchers have explored various approaches to mitigate the effects of class imbalance.However,most studies focus only on processing correlations within a single category of samples.This paper introduces an ensemble framework called Inter-and Intra-Class Overlapping Ensemble(llCOE),which incorporates two sampling methods.The first method,which is based on classification hardness undersampling,targets majority category samples by using simple samples as the foundation for classification and improving performance by focusing on samples near classification boundaries.The second method addresses the issue of overfitting minority category samples in undersampling and ensemble learning.To mitigate this,an adaptive augment hybrid sampling method is proposed,which enhances the classification boundary of samples and reduces overfitting.This paper conducts multiple experiments on 15 public datasets and concludes that the IlCOE ensemble framework outperforms other ensemble learning algorithms in classifying imbalanced data.