Background:In the field of genetic diagnostics,DNA sequencing is an important tool because the depth and complexity of this field have major implications in light of the genetic architectures of diseases and the ident...Background:In the field of genetic diagnostics,DNA sequencing is an important tool because the depth and complexity of this field have major implications in light of the genetic architectures of diseases and the identification of risk factors associated with genetic disorders.Methods:Our study introduces a novel two-tiered analytical framework to raise the precision and reliability of genetic data interpretation.It is initiated by extracting and analyzing salient features from DNA sequences through a CNN-based feature analysis,taking advantage of the power inherent in Convolutional neural networks(CNNs)to attain complex patterns and minute mutations in genetic data.This study embraces an elite collection of machine learning classifiers interweaved through a stern voting mechanism,which synergistically joins the predictions made from multiple classifiers to generate comprehensive and well-balanced interpretations of the genetic data.Results:This state-of-the-art method was further tested by carrying out an empirical analysis on a variants'dataset of DNA sequences taken from patients affected by breast cancer,juxtaposed with a control group composed of healthy people.Thus,the integration of CNNs with a voting-based ensemble of classifiers returned outstanding outcomes,with performance metrics accuracy,precision,recall,and F1-scorereaching the outstanding rate of 0.88,outperforming previous models.Conclusions:This dual accomplishment underlines the transformative potential that integrating deep learning techniques with ensemble machine learning might provide in real added value for further genetic diagnostics and prognostics.These results from this study set a new benchmark in the accuracy of disease diagnosis through DNA sequencing and promise future studies on improved personalized medicine and healthcare approaches with precise genetic information.展开更多
Malware attacks on Windows machines pose significant cybersecurity threats,necessitating effective detection and prevention mechanisms.Supervised machine learning classifiers have emerged as promising tools for malwar...Malware attacks on Windows machines pose significant cybersecurity threats,necessitating effective detection and prevention mechanisms.Supervised machine learning classifiers have emerged as promising tools for malware detection.However,there remains a need for comprehensive studies that compare the performance of different classifiers specifically for Windows malware detection.Addressing this gap can provide valuable insights for enhancing cybersecurity strategies.While numerous studies have explored malware detection using machine learning techniques,there is a lack of systematic comparison of supervised classifiers for Windows malware detection.Understanding the relative effectiveness of these classifiers can inform the selection of optimal detection methods and improve overall security measures.This study aims to bridge the research gap by conducting a comparative analysis of supervised machine learning classifiers for detecting malware on Windows systems.The objectives include Investigating the performance of various classifiers,such as Gaussian Naïve Bayes,K Nearest Neighbors(KNN),Stochastic Gradient Descent Classifier(SGDC),and Decision Tree,in detecting Windows malware.Evaluating the accuracy,efficiency,and suitability of each classifier for real-world malware detection scenarios.Identifying the strengths and limitations of different classifiers to provide insights for cybersecurity practitioners and researchers.Offering recommendations for selecting the most effective classifier for Windows malware detection based on empirical evidence.The study employs a structured methodology consisting of several phases:exploratory data analysis,data preprocessing,model training,and evaluation.Exploratory data analysis involves understanding the dataset’s characteristics and identifying preprocessing requirements.Data preprocessing includes cleaning,feature encoding,dimensionality reduction,and optimization to prepare the data for training.Model training utilizes various supervised classifiers,and their performance is evaluated using metrics such as accuracy,precision,recall,and F1 score.The study’s outcomes comprise a comparative analysis of supervised machine learning classifiers for Windows malware detection.Results reveal the effectiveness and efficiency of each classifier in detecting different types of malware.Additionally,insights into their strengths and limitations provide practical guidance for enhancing cybersecurity defenses.Overall,this research contributes to advancing malware detection techniques and bolstering the security posture of Windows systems against evolving cyber threats.展开更多
Cross entropy is a measure in machine learning and deep learning that assesses the difference between predicted and actual probability distributions. In this study, we propose cross entropy as a performance evaluation...Cross entropy is a measure in machine learning and deep learning that assesses the difference between predicted and actual probability distributions. In this study, we propose cross entropy as a performance evaluation metric for image classifier models and apply it to the CT image classification of lung cancer. A convolutional neural network is employed as the deep neural network (DNN) image classifier, with the residual network (ResNet) 50 chosen as the DNN archi-tecture. The image data used comprise a lung CT image set. Two classification models are built from datasets with varying amounts of data, and lung cancer is categorized into four classes using 10-fold cross-validation. Furthermore, we employ t-distributed stochastic neighbor embedding to visually explain the data distribution after classification. Experimental results demonstrate that cross en-tropy is a highly useful metric for evaluating the reliability of image classifier models. It is noted that for a more comprehensive evaluation of model perfor-mance, combining with other evaluation metrics is considered essential. .展开更多
针对利用海量数据构建分类模型时训练数据规模大、训练时间长且碳排放量大的问题,提出面向低能耗高性能的分类器两阶段数据选择方法TSDS(Two-Stage Data Selection)。首先,通过修正余弦相似度确定聚类中心,并将样本数据进行基于不相似...针对利用海量数据构建分类模型时训练数据规模大、训练时间长且碳排放量大的问题,提出面向低能耗高性能的分类器两阶段数据选择方法TSDS(Two-Stage Data Selection)。首先,通过修正余弦相似度确定聚类中心,并将样本数据进行基于不相似点的分裂层次聚类;其次,对聚类结果按数据分布自适应抽样以组成高质量的子样本集;最后,利用子样本集在分类模型上训练,在加速训练过程的同时提升模型精度。在Spambase、Bupa和Phoneme等6个数据集上构建支持向量机(SVM)和多层感知机(MLP)分类模型,验证TSDS的性能。实验结果表明在样本数据压缩比达到85.00%的情况下,TSDS能将分类模型准确率提升3~10个百分点,同时加速模型训练,使训练SVM分类器的能耗平均降低93.76%,训练MLP分类器的能耗平均降低75.41%。可见,TSDS在大数据场景的分类任务上既能缩短训练时间和减少能耗,又能提升分类器性能,从而助力实现“双碳”目标。展开更多
文摘Background:In the field of genetic diagnostics,DNA sequencing is an important tool because the depth and complexity of this field have major implications in light of the genetic architectures of diseases and the identification of risk factors associated with genetic disorders.Methods:Our study introduces a novel two-tiered analytical framework to raise the precision and reliability of genetic data interpretation.It is initiated by extracting and analyzing salient features from DNA sequences through a CNN-based feature analysis,taking advantage of the power inherent in Convolutional neural networks(CNNs)to attain complex patterns and minute mutations in genetic data.This study embraces an elite collection of machine learning classifiers interweaved through a stern voting mechanism,which synergistically joins the predictions made from multiple classifiers to generate comprehensive and well-balanced interpretations of the genetic data.Results:This state-of-the-art method was further tested by carrying out an empirical analysis on a variants'dataset of DNA sequences taken from patients affected by breast cancer,juxtaposed with a control group composed of healthy people.Thus,the integration of CNNs with a voting-based ensemble of classifiers returned outstanding outcomes,with performance metrics accuracy,precision,recall,and F1-scorereaching the outstanding rate of 0.88,outperforming previous models.Conclusions:This dual accomplishment underlines the transformative potential that integrating deep learning techniques with ensemble machine learning might provide in real added value for further genetic diagnostics and prognostics.These results from this study set a new benchmark in the accuracy of disease diagnosis through DNA sequencing and promise future studies on improved personalized medicine and healthcare approaches with precise genetic information.
基金This researchwork is supported by Princess Nourah bint Abdulrahman University Researchers Supporting Project Number(PNURSP2024R411),Princess Nourah bint Abdulrahman University,Riyadh,Saudi Arabia.
文摘Malware attacks on Windows machines pose significant cybersecurity threats,necessitating effective detection and prevention mechanisms.Supervised machine learning classifiers have emerged as promising tools for malware detection.However,there remains a need for comprehensive studies that compare the performance of different classifiers specifically for Windows malware detection.Addressing this gap can provide valuable insights for enhancing cybersecurity strategies.While numerous studies have explored malware detection using machine learning techniques,there is a lack of systematic comparison of supervised classifiers for Windows malware detection.Understanding the relative effectiveness of these classifiers can inform the selection of optimal detection methods and improve overall security measures.This study aims to bridge the research gap by conducting a comparative analysis of supervised machine learning classifiers for detecting malware on Windows systems.The objectives include Investigating the performance of various classifiers,such as Gaussian Naïve Bayes,K Nearest Neighbors(KNN),Stochastic Gradient Descent Classifier(SGDC),and Decision Tree,in detecting Windows malware.Evaluating the accuracy,efficiency,and suitability of each classifier for real-world malware detection scenarios.Identifying the strengths and limitations of different classifiers to provide insights for cybersecurity practitioners and researchers.Offering recommendations for selecting the most effective classifier for Windows malware detection based on empirical evidence.The study employs a structured methodology consisting of several phases:exploratory data analysis,data preprocessing,model training,and evaluation.Exploratory data analysis involves understanding the dataset’s characteristics and identifying preprocessing requirements.Data preprocessing includes cleaning,feature encoding,dimensionality reduction,and optimization to prepare the data for training.Model training utilizes various supervised classifiers,and their performance is evaluated using metrics such as accuracy,precision,recall,and F1 score.The study’s outcomes comprise a comparative analysis of supervised machine learning classifiers for Windows malware detection.Results reveal the effectiveness and efficiency of each classifier in detecting different types of malware.Additionally,insights into their strengths and limitations provide practical guidance for enhancing cybersecurity defenses.Overall,this research contributes to advancing malware detection techniques and bolstering the security posture of Windows systems against evolving cyber threats.
文摘Cross entropy is a measure in machine learning and deep learning that assesses the difference between predicted and actual probability distributions. In this study, we propose cross entropy as a performance evaluation metric for image classifier models and apply it to the CT image classification of lung cancer. A convolutional neural network is employed as the deep neural network (DNN) image classifier, with the residual network (ResNet) 50 chosen as the DNN archi-tecture. The image data used comprise a lung CT image set. Two classification models are built from datasets with varying amounts of data, and lung cancer is categorized into four classes using 10-fold cross-validation. Furthermore, we employ t-distributed stochastic neighbor embedding to visually explain the data distribution after classification. Experimental results demonstrate that cross en-tropy is a highly useful metric for evaluating the reliability of image classifier models. It is noted that for a more comprehensive evaluation of model perfor-mance, combining with other evaluation metrics is considered essential. .
文摘针对利用海量数据构建分类模型时训练数据规模大、训练时间长且碳排放量大的问题,提出面向低能耗高性能的分类器两阶段数据选择方法TSDS(Two-Stage Data Selection)。首先,通过修正余弦相似度确定聚类中心,并将样本数据进行基于不相似点的分裂层次聚类;其次,对聚类结果按数据分布自适应抽样以组成高质量的子样本集;最后,利用子样本集在分类模型上训练,在加速训练过程的同时提升模型精度。在Spambase、Bupa和Phoneme等6个数据集上构建支持向量机(SVM)和多层感知机(MLP)分类模型,验证TSDS的性能。实验结果表明在样本数据压缩比达到85.00%的情况下,TSDS能将分类模型准确率提升3~10个百分点,同时加速模型训练,使训练SVM分类器的能耗平均降低93.76%,训练MLP分类器的能耗平均降低75.41%。可见,TSDS在大数据场景的分类任务上既能缩短训练时间和减少能耗,又能提升分类器性能,从而助力实现“双碳”目标。