Dialectal Arabic text classification (DA-TC) provides a mechanism for performing sentiment analysis on recent Arabic social media, which poses many challenges owing to the rich morphology of the Arabic language and its wide range of dialect variations. The availability of annotated datasets is limited, and preprocessing of the noisy content is even more challenging, sometimes removing important sentiment cues from the input. To overcome these problems, this study investigates the applicability of transfer learning based on pre-trained transformer models to classify sentiment in Arabic texts with high accuracy. Specifically, it fine-tunes the CAMeLBERT model on the Multi-Domain Arabic Resources for Sentiment Analysis (MARSA) dataset, which contains more than 56,000 manually annotated tweets across the political, social, sports, and technology domains. The proposed method avoids extensive preprocessing and shows that raw data yields better results because it retains more linguistic features. The fine-tuned CAMeLBERT model achieves state-of-the-art accuracy of 92%, precision of 91.7%, recall of 92.3%, and F1-score of 91.5%, outperforming standard machine learning models and ensemble-based/deep learning techniques. Performance comparisons against other pre-trained models, namely AraBERTv02-twitter and MARBERT, show that transformer-based architectures are consistently the best suited for noisy Arabic texts. This work thus offers a strong remedy for the problems of Arabic sentiment analysis and provides recommendations on efficiently tuning pre-trained models to adapt to challenging linguistic features and domain-specific tasks.
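The abstract reports accuracy, precision, recall, and F1 for the fine-tuned classifier. As a minimal illustrative sketch (not the authors' implementation, which fine-tunes CAMeLBERT on MARSA), these per-class metrics can be computed from gold and predicted labels as follows; the function name and the label values are hypothetical:

```python
def classification_metrics(y_true, y_pred, positive="pos"):
    """Accuracy plus precision/recall/F1 for one target class,
    computed from parallel lists of gold and predicted labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1
```

In a multi-class setting such as MARSA's sentiment labels, the per-class values would then be averaged (macro or weighted) to obtain the single figures the abstract reports.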
Existing expert recommendation algorithms ignore the influence of the sentiment expressed in user reviews on the representation of an expert's expertise, which leads to low recommendation accuracy. To address this problem, a deep semantically enhanced expert recommendation algorithm based on bidirectional encoder representations from transformers with multi-head attention (BERT-MHA) is proposed. Built on a pre-trained BERT model, the algorithm fuses the MHA mechanism to automatically adjust the sentiment attention weights of user reviews over the questions an expert has previously answered, yielding a dynamic expertise representation; this is combined with static expertise to semantically enrich the expert's profile text and represent the expert's comprehensive expertise. An attention mechanism identifies the features of user questions, and a multi-layer perceptron models the non-linear interaction between comprehensive expertise and user questions to predict the matching score of candidate experts. Comparative experiments with different parameter configurations and different algorithms on data from the haodf.com website show that the proposed algorithm clearly outperforms the alternatives on accuracy (ACC) and area under the curve (AUC), effectively improving the accuracy of expert recommendation in online Q&A communities.
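The BERT-MHA algorithm above weights review representations by attention before pooling them into a dynamic expertise vector. A minimal single-head sketch of such attention pooling, assuming plain-list vectors in place of the paper's learned BERT embeddings (all names here are illustrative, not from the paper):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_pool(query, keys, values):
    """Single-head scaled dot-product attention pooling: scores each
    review (key) against the expert query, softmax-normalizes the
    scores, and returns the weights plus the weighted sum of values."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    pooled = [sum(w * v[i] for w, v in zip(weights, values))
              for i in range(len(values[0]))]
    return weights, pooled
```

The paper's full model uses several such heads in parallel and feeds the pooled expertise vector, joined with the static expertise, into a multi-layer perceptron to score expert-question matches.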
In the era of big data, recommendation algorithms effectively mitigate information overload and have proven especially valuable for job recommendation. However, person-job recommendation for university graduates faces cold-start and data-sparsity challenges and must jointly consider major, internship experience, and employment intentions. This paper proposes a hybrid recommendation model based on bidirectional encoder representation from Transformers (BERT) with a dual-path cold-start/warm-start recommendation strategy. The cold-start path uses the BERT model to compute the similarity between job and student embedding vectors, addressing the lack of historical data for new users; the warm-start path builds on existing user-behavior data and applies a weighted-average fusion strategy to combine the job-similarity and user-similarity score matrices, improving recommendation precision. A user-satisfaction survey shows that when 3 to 10 jobs are recommended, more than 70% of respondents found the recommendations matched their expectations or aroused sufficient interest, validating that the system meets the employment-service needs of graduates.
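The two paths above reduce to two simple operations: cosine similarity between BERT embedding vectors (cold start) and a weighted average of two score matrices (warm start). A minimal sketch, assuming plain-list vectors and an illustrative fusion weight `alpha` (the paper's actual weighting is not specified here):

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors,
    e.g. a job posting and a student profile encoded by BERT."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def fuse_scores(job_sim, user_sim, alpha=0.5):
    """Warm-start fusion: element-wise weighted average of the
    job-similarity and user-similarity score matrices."""
    return [[alpha * j + (1 - alpha) * u for j, u in zip(jr, ur)]
            for jr, ur in zip(job_sim, user_sim)]
```

In the cold-start path each new student would be ranked against jobs by `cosine` alone; once behavior data accumulates, `fuse_scores` blends the two evidence sources for the warm-start path.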
Funding: This work was funded by the Deanship of Scientific Research at Imam Mohammad Ibn Saud Islamic University (IMSIU) (grant number IMSIU-DDRSP2504).