Funding: Supported in part by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science and ICT (No. 2021R1A2C2012574), and in part by the Institute of Information & Communications Technology Planning & Evaluation (IITP) Information Technology Research Center (ITRC) grant funded by the Korea government (Ministry of Science and ICT) (IITP-2025-RS-2023-00259967).
Abstract: Machine learning-based Android malware detection often suffers from concept drift, where models trained on historical data fail to generalize to evolving threats. This paper proposes SCAN (Structural Clustering with Adaptive thresholds for iNtelligent Android malware detection), a hybrid intelligent framework designed to mitigate concept drift without retraining. SCAN integrates Gaussian Mixture Model (GMM)-based clustering with cluster-wise adaptive thresholding and supervised classifiers tailored to each cluster. A key challenge in clustering-based malware detection is cluster-wise class imbalance, where clusters contain disproportionate distributions of benign and malicious samples. SCAN addresses this issue through adaptive thresholding, which dynamically adjusts the decision boundary of each cluster according to its malicious-to-benign ratio. In the final training stage, four supervised learning algorithms are applied within the GMM-defined clusters: Random Forest (RF), Support Vector Machine (SVM), k-NN, and XGBoost. We train SCAN on Android applications collected from 2014 to 2017 and test it on applications from 2018 to 2023. Experimental results demonstrate that SCAN combined with RF consistently achieves superior performance, with both average accuracy and average F1-score exceeding 91%. These findings confirm SCAN's robustness to concept drift and highlight its potential as a sustainable and intelligent solution for long-term Android malware detection in the real world.
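The cluster-wise adaptive thresholding step described in this abstract can be illustrated with a minimal sketch. The abstract does not give SCAN's actual thresholding formula, so the rule below is an assumption: each cluster's threshold is set to its fraction of malicious training samples, clipped to a safe range. This is a common heuristic for imbalanced data and is not taken from the paper. The cluster ids are assumed to come from a fitted GMM and the scores from a per-cluster classifier; the names `cluster_thresholds`, `classify`, `floor`, and `ceiling` are hypothetical.

```python
from collections import defaultdict


def cluster_thresholds(cluster_ids, labels, floor=0.05, ceiling=0.95):
    """Map each cluster id to a decision threshold derived from its class mix.

    Assumed rule (not from the paper): threshold = fraction of malicious
    training samples in the cluster, clipped to [floor, ceiling].
    Labels are 0 (benign) or 1 (malicious).
    """
    counts = defaultdict(lambda: [0, 0])  # cluster id -> [benign, malicious]
    for cluster, label in zip(cluster_ids, labels):
        counts[cluster][label] += 1

    thresholds = {}
    for cluster, (benign, malicious) in counts.items():
        fraction = malicious / (benign + malicious)
        thresholds[cluster] = min(ceiling, max(floor, fraction))
    return thresholds


def classify(score, cluster_id, thresholds, default=0.5):
    """Return 1 (malicious) when the score reaches the cluster's threshold."""
    return int(score >= thresholds.get(cluster_id, default))


# Toy training data: cluster 0 is mostly benign, cluster 1 mostly malicious.
train_clusters = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
train_labels = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
thresholds = cluster_thresholds(train_clusters, train_labels)

# The same classifier score leads to different decisions in different clusters.
print(thresholds)
print(classify(0.5, 0, thresholds), classify(0.5, 1, thresholds))
```

Under this assumed rule, a score of 0.5 is flagged as malicious in the mostly benign cluster and passed as benign in the mostly malicious one. This compensates for a classifier whose scores lean toward each cluster's majority class. Whether SCAN shifts thresholds in this direction is not stated in the abstract.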
Funding: Supported through the Ongoing Research Funding Program (ORF-2025-498), King Saud University, Riyadh, Saudi Arabia.
Abstract: Android smartphones have become an integral part of daily life, which makes them targets for ransomware attacks. Such attacks encrypt user data and demand payment to recover it. Conventional detection mechanisms, such as signature-based and heuristic techniques, often fail to detect new and polymorphic ransomware samples. To address this challenge, we employed various ensemble classifiers, such as Random Forest, Gradient Boosting, and Bagging, as well as AutoML models. We aimed to show how AutoML can automate model selection, feature engineering, and hyperparameter optimization, minimizing manual effort while matching or improving on the performance of traditional approaches. We evaluated this framework on a publicly available dataset from the Kaggle repository that contains features of Android ransomware network traffic. The dataset comprises 392,024 flow records divided into eleven classes: ten for ransomware types, including SVpeng, PornDroid, Koler, WannaLocker, and Lockerpin, and one for regular traffic. We applied a three-step procedure to select the most relevant features: filter, wrapper, and embedded methods. The Bagging classifier reached an accuracy of 99.84%, and the FLAML AutoML framework was slightly more accurate at 99.85%. This indicates that AutoML can match or slightly improve on manually configured models with minimal human effort. Our findings indicate that AutoML is an efficient, scalable, and flexible method for detecting Android ransomware, and that it will facilitate the development of next-generation intrusion detection systems.
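As a rough illustration of the Bagging classifier mentioned in this abstract, the sketch below implements bootstrap aggregation from scratch: each base learner is trained on a bootstrap resample of the training flows, and predictions are combined by majority vote. Everything specific here is an assumption for illustration and not from the paper: the base learner (a one-feature decision stump), the binary benign/ransomware labels, the two toy features (packets per second, mean packet size), and the function names.

```python
import random
from collections import Counter


def train_stump(rows, labels):
    """Fit a one-feature threshold rule with the fewest training errors."""
    best = None  # (errors, feature, threshold, label_if_low, label_if_high)
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            for low, high in ((0, 1), (1, 0)):
                errors = sum(
                    (high if row[feature] > threshold else low) != label
                    for row, label in zip(rows, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, feature, threshold, low, high)
    _, feature, threshold, low, high = best
    return lambda row: high if row[feature] > threshold else low


def bagging_fit(rows, labels, n_estimators=15, seed=0):
    """Train each stump on a bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    n = len(rows)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx]))
    return models


def bagging_predict(models, row):
    """Combine the base learners' predictions by majority vote."""
    votes = Counter(model(row) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical flow records: (packets per second, mean packet size in bytes).
rows = [(10, 500), (12, 480), (9, 510), (11, 495),   # regular traffic
        (80, 505), (95, 490), (88, 500), (90, 485)]  # ransomware traffic
labels = [0, 0, 0, 0, 1, 1, 1, 1]

models = bagging_fit(rows, labels)
print(bagging_predict(models, (5, 500)), bagging_predict(models, (120, 500)))
```

Averaging over resamples reduces the variance of an unstable base learner, which is the usual motivation for bagging. A real experiment on the eleven-class dataset would need a multi-class base learner and a held-out test split.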