Dialectal Arabic text classifcation(DA-TC)provides a mechanism for performing sentiment analysis on recent Arabic social media leading to many challenges owing to the natural morphology of the Arabic language and its ...Dialectal Arabic text classifcation(DA-TC)provides a mechanism for performing sentiment analysis on recent Arabic social media leading to many challenges owing to the natural morphology of the Arabic language and its wide range of dialect variations.Te availability of annotated datasets is limited,and preprocessing of the noisy content is even more challenging,sometimes resulting in the removal of important cues of sentiment from the input.To overcome such problems,this study investigates the applicability of using transfer learning based on pre-trained transformer models to classify sentiment in Arabic texts with high accuracy.Specifcally,it uses the CAMeLBERT model fnetuned for the Multi-Domain Arabic Resources for Sentiment Analysis(MARSA)dataset containing more than 56,000 manually annotated tweets annotated across political,social,sports,and technology domains.Te proposed method avoids extensive use of preprocessing and shows that raw data provides better results because they tend to retain more linguistic features.Te fne-tuned CAMeLBERT model produces state-of-the-art accuracy of 92%,precision of 91.7%,recall of 92.3%,and F1-score of 91.5%,outperforming standard machine learning models and ensemble-based/deep learning techniques.Our performance comparisons against other pre-trained models,namely AraBERTv02-twitter and MARBERT,show that transformer-based architectures are consistently the best suited when dealing with noisy Arabic texts.Tis work leads to a strong remedy for the problems in Arabic sentiment analysis and provides recommendations on easy tuning of the pre-trained models to adapt to challenging linguistic features and domain-specifc tasks.展开更多
基金funded by the Deanship of Scientific Research at Imam Mohammad Ibn Saud Islamic University(IMSIU)(grant number IMSIU-DDRSP2504).
文摘Dialectal Arabic text classifcation(DA-TC)provides a mechanism for performing sentiment analysis on recent Arabic social media leading to many challenges owing to the natural morphology of the Arabic language and its wide range of dialect variations.Te availability of annotated datasets is limited,and preprocessing of the noisy content is even more challenging,sometimes resulting in the removal of important cues of sentiment from the input.To overcome such problems,this study investigates the applicability of using transfer learning based on pre-trained transformer models to classify sentiment in Arabic texts with high accuracy.Specifcally,it uses the CAMeLBERT model fnetuned for the Multi-Domain Arabic Resources for Sentiment Analysis(MARSA)dataset containing more than 56,000 manually annotated tweets annotated across political,social,sports,and technology domains.Te proposed method avoids extensive use of preprocessing and shows that raw data provides better results because they tend to retain more linguistic features.Te fne-tuned CAMeLBERT model produces state-of-the-art accuracy of 92%,precision of 91.7%,recall of 92.3%,and F1-score of 91.5%,outperforming standard machine learning models and ensemble-based/deep learning techniques.Our performance comparisons against other pre-trained models,namely AraBERTv02-twitter and MARBERT,show that transformer-based architectures are consistently the best suited when dealing with noisy Arabic texts.Tis work leads to a strong remedy for the problems in Arabic sentiment analysis and provides recommendations on easy tuning of the pre-trained models to adapt to challenging linguistic features and domain-specifc tasks.