Author Profiling (AP) is a subsection of digital forensics that focuses on the detection of the author’s personalinformation, such as age, gender, occupation, and education, based on various linguistic features, e.g....Author Profiling (AP) is a subsection of digital forensics that focuses on the detection of the author’s personalinformation, such as age, gender, occupation, and education, based on various linguistic features, e.g., stylistic,semantic, and syntactic. The importance of AP lies in various fields, including forensics, security, medicine, andmarketing. In previous studies, many works have been done using different languages, e.g., English, Arabic, French,etc.However, the research on RomanUrdu is not up to the mark.Hence, this study focuses on detecting the author’sage and gender based on Roman Urdu text messages. The dataset used in this study is Fire’18-MaponSMS. Thisstudy proposed an ensemble model based on AdaBoostM1 and Random Forest (AMBRF) for AP using multiplelinguistic features that are stylistic, character-based, word-based, and sentence-based. The proposed model iscontrasted with several of the well-known models fromthe literature, including J48-Decision Tree (J48),Na飗e Bays(NB), K Nearest Neighbor (KNN), and Composite Hypercube on Random Projection (CHIRP), NB-Updatable,RF, and AdaboostM1. The overall outcome shows the better performance of the proposed AdaboostM1 withRandom Forest (ABMRF) with an accuracy of 54.2857% for age prediction and 71.1429% for gender predictioncalculated on stylistic features. Regarding word-based features, age and gender were considered in 50.5714% and60%, respectively. On the other hand, KNN and CHIRP show the weakest performance using all the linguisticfeatures for age and gender prediction.展开更多
This study explores the area of Author Profiling(AP)and its importance in several industries,including forensics,security,marketing,and education.A key component of AP is the extraction of useful information from text...This study explores the area of Author Profiling(AP)and its importance in several industries,including forensics,security,marketing,and education.A key component of AP is the extraction of useful information from text,with an emphasis on the writers’ages and genders.To improve the accuracy of AP tasks,the study develops an ensemble model dubbed ABMRF that combines AdaBoostM1(ABM1)and Random Forest(RF).The work uses an extensive technique that involves textmessage dataset pretreatment,model training,and assessment.To evaluate the effectiveness of several machine learning(ML)algorithms in classifying age and gender,including Composite Hypercube on Random Projection(CHIRP),Decision Trees(J48),Na飗e Bayes(NB),K Nearest Neighbor,AdaboostM1,NB-Updatable,RF,andABMRF,they are compared.The findings demonstrate thatABMRFregularly beats the competition,with a gender classification accuracy of 71.14%and an age classification accuracy of 54.29%,respectively.Additional metrics like precision,recall,F-measure,Matthews Correlation Coefficient(MCC),and accuracy support ABMRF’s outstanding performance in age and gender profiling tasks.This study demonstrates the usefulness of ABMRF as an ensemble model for author profiling and highlights its possible uses in marketing,law enforcement,and education.The results emphasize the effectiveness of ensemble approaches in enhancing author profiling task accuracy,particularly when it comes to age and gender identification.展开更多
Phishing attacks pose a significant security threat by masquerading as trustworthy entities to steal sensitive information,a problem that persists despite user awareness.This study addresses the pressing issue of phis...Phishing attacks pose a significant security threat by masquerading as trustworthy entities to steal sensitive information,a problem that persists despite user awareness.This study addresses the pressing issue of phishing attacks on websites and assesses the performance of three prominent Machine Learning(ML)models—Artificial Neural Networks(ANN),Convolutional Neural Networks(CNN),and Long Short-Term Memory(LSTM)—utilizing authentic datasets sourced from Kaggle and Mendeley repositories.Extensive experimentation and analysis reveal that the CNN model achieves a better accuracy of 98%.On the other hand,LSTM shows the lowest accuracy of 96%.These findings underscore the potential of ML techniques in enhancing phishing detection systems and bolstering cybersecurity measures against evolving phishing tactics,offering a promising avenue for safeguarding sensitive information and online security.展开更多
基金the support of Prince Sultan University for the Article Processing Charges(APC)of this publication。
文摘Author Profiling (AP) is a subsection of digital forensics that focuses on the detection of the author’s personalinformation, such as age, gender, occupation, and education, based on various linguistic features, e.g., stylistic,semantic, and syntactic. The importance of AP lies in various fields, including forensics, security, medicine, andmarketing. In previous studies, many works have been done using different languages, e.g., English, Arabic, French,etc.However, the research on RomanUrdu is not up to the mark.Hence, this study focuses on detecting the author’sage and gender based on Roman Urdu text messages. The dataset used in this study is Fire’18-MaponSMS. Thisstudy proposed an ensemble model based on AdaBoostM1 and Random Forest (AMBRF) for AP using multiplelinguistic features that are stylistic, character-based, word-based, and sentence-based. The proposed model iscontrasted with several of the well-known models fromthe literature, including J48-Decision Tree (J48),Na飗e Bays(NB), K Nearest Neighbor (KNN), and Composite Hypercube on Random Projection (CHIRP), NB-Updatable,RF, and AdaboostM1. The overall outcome shows the better performance of the proposed AdaboostM1 withRandom Forest (ABMRF) with an accuracy of 54.2857% for age prediction and 71.1429% for gender predictioncalculated on stylistic features. Regarding word-based features, age and gender were considered in 50.5714% and60%, respectively. On the other hand, KNN and CHIRP show the weakest performance using all the linguisticfeatures for age and gender prediction.
文摘This study explores the area of Author Profiling(AP)and its importance in several industries,including forensics,security,marketing,and education.A key component of AP is the extraction of useful information from text,with an emphasis on the writers’ages and genders.To improve the accuracy of AP tasks,the study develops an ensemble model dubbed ABMRF that combines AdaBoostM1(ABM1)and Random Forest(RF).The work uses an extensive technique that involves textmessage dataset pretreatment,model training,and assessment.To evaluate the effectiveness of several machine learning(ML)algorithms in classifying age and gender,including Composite Hypercube on Random Projection(CHIRP),Decision Trees(J48),Na飗e Bayes(NB),K Nearest Neighbor,AdaboostM1,NB-Updatable,RF,andABMRF,they are compared.The findings demonstrate thatABMRFregularly beats the competition,with a gender classification accuracy of 71.14%and an age classification accuracy of 54.29%,respectively.Additional metrics like precision,recall,F-measure,Matthews Correlation Coefficient(MCC),and accuracy support ABMRF’s outstanding performance in age and gender profiling tasks.This study demonstrates the usefulness of ABMRF as an ensemble model for author profiling and highlights its possible uses in marketing,law enforcement,and education.The results emphasize the effectiveness of ensemble approaches in enhancing author profiling task accuracy,particularly when it comes to age and gender identification.
文摘Phishing attacks pose a significant security threat by masquerading as trustworthy entities to steal sensitive information,a problem that persists despite user awareness.This study addresses the pressing issue of phishing attacks on websites and assesses the performance of three prominent Machine Learning(ML)models—Artificial Neural Networks(ANN),Convolutional Neural Networks(CNN),and Long Short-Term Memory(LSTM)—utilizing authentic datasets sourced from Kaggle and Mendeley repositories.Extensive experimentation and analysis reveal that the CNN model achieves a better accuracy of 98%.On the other hand,LSTM shows the lowest accuracy of 96%.These findings underscore the potential of ML techniques in enhancing phishing detection systems and bolstering cybersecurity measures against evolving phishing tactics,offering a promising avenue for safeguarding sensitive information and online security.