28 articles found
RUSAS: Roman Urdu Sentiment Analysis System
1
Authors: Kazim Jawad, Muhammad Ahmad, Majdah Alvi, Muhammad Bux Alvi. 《Computers, Materials & Continua》, SCIE, EI, 2024, No. 4, pp. 1463-1480 (18 pages)
Sentiment analysis, a field of Natural Language Processing (NLP), attempts to analyze and identify the sentiments in opinionated text data. People share their judgments, reactions, and feedback on the internet using various languages. Urdu is one of them, and it is frequently used worldwide. Urdu-speaking people prefer to communicate on social media in Roman Urdu (RU), an English scripting style with the Urdu language dialect. Researchers have developed versatile lexical resources for feature-rich comprehensive languages, but limited linguistic resources are available to facilitate the sentiment classification of Roman Urdu. This effort encompasses extracting subjective expressions in Roman Urdu and determining the implied opinionated text polarity. The primary sources of the dataset are Daraz (an e-commerce platform), Google Maps, and manual effort. The contributions of this study include a Bilingual Roman Urdu Language Detector (BRULD) and a Roman Urdu Spelling Checker (RUSC). These integrated modules accept the user input, detect the text language, correct the spellings, categorize the sentiments, and return the input sentence’s orientation with a sentiment intensity score. The developed system gradually gains strength with each input experience. The results show that the language detector gives an accuracy of 97.1% on a closed-domain dataset, with an overall sentiment classification accuracy of 94.3%.
Keywords: Roman Urdu sentiment analysis; Roman Urdu language detector; Roman Urdu spelling checker; Flask
LKMT: Linguistics Knowledge-Driven Multi-Task Neural Machine Translation for Urdu and English
2
Authors: Muhammad Naeem Ul Hassan, Zhengtao Yu, Jian Wang, Ying Li, Shengxiang Gao, Shuwan Yang, Cunli Mao. 《Computers, Materials & Continua》, SCIE, EI, 2024, No. 10, pp. 951-969 (19 pages)
Thanks to the strong representation capability of pre-trained language models, supervised machine translation models have achieved outstanding performance. However, the performance of these models drops sharply when the scale of the parallel training corpus is limited. Because a pre-trained language model has a strong ability for monolingual representation, the key challenge for machine translation is to construct an in-depth relationship between the source and target languages by injecting lexical and syntactic information into pre-trained language models. To alleviate the dependence on the parallel corpus, we propose a Linguistics Knowledge-Driven Multi-Task (LKMT) approach to inject part-of-speech and syntactic knowledge into pre-trained models, thus enhancing machine translation performance. On the one hand, we integrate part-of-speech and dependency labels into the embedding layer and exploit a large-scale monolingual corpus to update all parameters of the pre-trained language model, thus ensuring the updated language model contains potential lexical and syntactic information. On the other hand, we leverage an extra self-attention layer to explicitly inject linguistic knowledge into the machine translation model enhanced by the pre-trained language model. Experiments on the benchmark dataset show that our proposed LKMT approach improves Urdu-English translation accuracy by 1.97 points and English-Urdu translation accuracy by 2.42 points, highlighting the effectiveness of our LKMT framework. Detailed ablation experiments confirm the positive impact of part-of-speech and dependency parsing on machine translation.
Keywords: Urdu NMT (neural machine translation); Urdu natural language processing; Urdu linguistic features; low-resource language; linguistic features; pretrained model
Offline Urdu Nastaleeq Optical Character Recognition Based on Stacked Denoising Autoencoder (Cited by: 2)
3
Authors: Ibrar Ahmad, Xiaojie Wang, Ruifan Li, Shahid Rasheed. 《China Communications》, SCIE, CSCD, 2017, No. 1, pp. 146-157 (12 pages)
Offline Urdu Nastaleeq text recognition has long been a serious problem due to the script's highly cursive nature. To avoid character segmentation problems, many researchers are shifting focus towards segmentation-free, ligature-based recognition approaches. The majority of the prevalent ligature-based recognition systems rely heavily on hand-engineered feature extraction techniques. However, such techniques are more error-prone and may often lead to a loss of useful information that can hardly be recovered later by any manual features. Most of the prevalent Urdu Nastaleeq text recognition systems were trained and tested on small sets. This paper proposes the use of a stacked denoising autoencoder for automatic feature extraction directly from the raw pixel values of ligature images. Such deep learning networks have not been applied to the recognition of Urdu text thus far. Different stacked denoising autoencoders have been trained on 178,573 ligatures with 3,732 classes from the un-degraded (noise-free) UPTI (Urdu Printed Text Image) data set. Subsequently, the trained networks are validated and tested on degraded versions of the UPTI data set. The experimental results demonstrate accuracies in the range of 93% to 96%, which are better than those of existing Urdu OCR systems for such a large dataset of ligatures.
Keywords: offline printed ligature recognition; Urdu Nastaleeq; denoising autoencoder; deep learning; classification
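As a rough illustration of the denoising-autoencoder idea named in this abstract, the usual single-layer textbook formulation is sketched below. The symbols ($q_D$ for the corruption process, $W, b, W', b'$ for the weights, $s$ for the activation) are standard notation assumed here; the abstract does not state the paper's noise model, loss, or layer sizes.

```latex
\tilde{x} \sim q_D(\tilde{x} \mid x), \qquad
h = s(W\tilde{x} + b), \qquad
\hat{x} = s(W'h + b'), \qquad
\mathcal{L}(x, \hat{x}) = \lVert x - \hat{x} \rVert_2^2
```

The input $x$ (here, raw ligature pixels) is corrupted to $\tilde{x}$, and the network is trained to reconstruct the clean $x$. In a stacked variant, each layer is typically trained on the hidden representation $h$ of the layer below, with a classifier over the ligature classes placed on top.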
Roman Urdu News Headline Classification Empowered with Machine Learning (Cited by: 2)
4
Authors: Rizwan Ali Naqvi, Muhammad Adnan Khan, Nauman Malik, Shazia Saqib, Tahir Alyas, Dildar Hussain. 《Computers, Materials & Continua》, SCIE, EI, 2020, No. 11, pp. 1221-1236 (16 pages)
Roman Urdu has been used for text messaging over the Internet for years, especially in the Indo-Pak subcontinent. People from the subcontinent may speak the same Urdu language, but they might use different scripts for writing. Communication on social media that writes Urdu in Roman characters is now considered the most typical standard of communication in the Indian subcontinent, which makes it a valuable source of information. English text classification is a solved problem, but there have been only a few efforts to examine the rich information supply of Roman Urdu. This is due to the numerous complexities involved in processing Roman Urdu data, including the non-availability of a tagged corpus, the lack of a set of rules, and the lack of standardized spellings. A large amount of Roman Urdu news data is available on mainstream news websites and on social media websites such as Facebook and Twitter, but meaningful information can only be extracted if the data is in a structured format. We have developed a Roman Urdu news headline classifier, which helps to classify news into relevant categories on which further analysis and modeling can be done. The classifier assigns news to five categories (health, business, technology, sports, international). First, we develop the news dataset using scraping tools; then, after preprocessing, we compare the results of different machine learning algorithms: Logistic Regression (LR), Multinomial Naïve Bayes (MNB), Long Short-Term Memory (LSTM), and Convolutional Neural Network (CNN). After this, we use a phonetic algorithm to control lexical variation and test news from different websites. The preliminary results suggest that a more accurate classification can be accomplished by monitoring noise inside the data and by classifying the news. After applying the above-mentioned machine learning algorithms, the results show that the Multinomial Naïve Bayes classifier gives the best accuracy, 90.17%, which is attributed to the noisy lexical variation in the data.
Keywords: Roman Urdu; news headline classification; long short-term memory; recurrent neural network; logistic regression; multinomial naïve Bayes; random forest; k neighbor; gradient boosting classifier
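To make the best-performing method in this abstract concrete, here is a minimal sketch of a multinomial Naïve Bayes headline classifier with Laplace smoothing. It is not the authors' implementation: the function names `train_mnb` and `predict_mnb`, the whitespace tokenisation, and the four toy headlines are all assumptions made for illustration, and only two of the five categories are shown.

```python
import math
from collections import Counter, defaultdict


def train_mnb(docs, labels, alpha=1.0):
    """Fit multinomial Naive Bayes on whitespace-tokenised headlines."""
    word_counts = defaultdict(Counter)  # per-class token counts
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    class_counts = Counter(labels)
    return {
        "alpha": alpha,
        "vocab": vocab,
        "word_counts": word_counts,
        "totals": {c: sum(word_counts[c].values()) for c in class_counts},
        "log_prior": {c: math.log(k / len(labels)) for c, k in class_counts.items()},
    }


def predict_mnb(model, text):
    """Return the class with the highest Laplace-smoothed log-probability."""
    # Tokens never seen in training are ignored.
    tokens = [t for t in text.lower().split() if t in model["vocab"]]
    alpha, vocab_size = model["alpha"], len(model["vocab"])
    scores = {}
    for c, log_prior in model["log_prior"].items():
        denom = model["totals"][c] + alpha * vocab_size
        scores[c] = log_prior + sum(
            math.log((model["word_counts"][c][t] + alpha) / denom) for t in tokens
        )
    return max(scores, key=scores.get)


# Invented toy headlines, for illustration only (not from the paper's dataset).
headlines = [
    "team ne cricket match jeet liya",
    "hockey team final match haar gayi",
    "hospital mein dengue ke mareez",
    "doctor ne dengue vaccine ki hidayat di",
]
categories = ["sports", "sports", "health", "health"]

model = train_mnb(headlines, categories)
print(predict_mnb(model, "cricket final match"))
print(predict_mnb(model, "dengue hospital"))
```

Because Roman Urdu spellings are not standardized, a real system would likely normalise spelling variants before counting tokens, which is the role the abstract assigns to its phonetic algorithm.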
Sentiment Analysis of Roman Urdu on E-Commerce Reviews Using Machine Learning (Cited by: 1)
5
Authors: Bilal Chandio, Asadullah Shaikh, Maheen Bakhtyar, Mesfer Alrizq, Junaid Baber, Adel Sulaiman, Adel Rajab, Waheed Noor. 《Computer Modeling in Engineering & Sciences》, SCIE, EI, 2022, No. 6, pp. 1263-1287 (25 pages)
Sentiment analysis has been widely studied for various languages such as English and French. However, Roman Urdu sentiment analysis still requires more attention from peer researchers due to the lack of off-the-shelf Natural Language Processing (NLP) solutions. The primary objective of this study is to investigate diverse machine learning methods for the sentiment analysis of Roman Urdu data, which is very informal in nature and needs to be lexically normalized. To mitigate this challenge, we propose a fine-tuned Support Vector Machine (SVM) powered by a Roman Urdu stemmer. In our proposed scheme, the corpus data is first cleaned to remove anomalies from the text. After this initial pre-processing, each user review is stemmed. The input text is transformed into a feature vector using the bag-of-words model. Subsequently, the SVM is used to classify and detect user sentiment. Our proposed scheme is based on a dictionary-based Roman Urdu stemmer, which is intended to standardize the text so as to minimize its complexity. The efficacy of our proposed model is also empirically evaluated with diverse experimental configurations, so as to fine-tune the hyper-parameters and achieve superior performance. Moreover, a series of experiments is conducted on diverse machine learning and deep learning models to compare their performance with that of our proposed model. We also introduce the Roman Urdu e-commerce dataset (RUECD), which contains 26K+ user reviews annotated by a group of experts and is the largest available Roman Urdu dataset. The experiments show that the newly generated dataset is quite challenging and requires more attention from peer researchers working on Roman Urdu sentiment analysis.
Keywords: sentiment analysis; Roman Urdu; machine learning; SVM
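The preprocessing steps in this abstract (dictionary-based normalisation followed by a bag-of-words vector) can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's stemmer: the `VARIANTS` dictionary, the helper names `normalise` and `bag_of_words`, and the two sample reviews are hypothetical.

```python
from collections import Counter

# Hypothetical lookup table mapping spelling variants to one canonical form.
VARIANTS = {
    "achha": "acha",
    "accha": "acha",
    "bohat": "bahut",
    "bohot": "bahut",
}


def normalise(text, lexicon=VARIANTS):
    """Lower-case, split on whitespace, and map known variants to a canonical form."""
    return [lexicon.get(token, token) for token in text.lower().split()]


def bag_of_words(tokens, vocab):
    """Count how often each vocabulary word occurs in the token list."""
    counts = Counter(tokens)
    return [counts[word] for word in vocab]


# Two invented reviews that differ only in spelling.
review_a = normalise("product bohat achha hai")
review_b = normalise("Product bahut acha hai")

vocab = sorted(set(review_a) | set(review_b))
vector_a = bag_of_words(review_a, vocab)
vector_b = bag_of_words(review_b, vocab)
print(vocab, vector_a, vector_b)
```

After normalisation both reviews map to the same feature vector, which is the kind of standardization the abstract describes; vectors like these would then be the input to a classifier such as an SVM.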
Urdu Ligature Recognition System: An Evolutionary Approach
6
Authors: Naila Habib Khan, Awais Adnan, Abdul Waheed, Mahdi Zareei, Abdallah Aldosary, Ehab Mahmoud Mohamed. 《Computers, Materials & Continua》, SCIE, EI, 2021, No. 2, pp. 1347-1367 (21 pages)
Cursive text recognition for Arabic script-based languages like Urdu is extremely complicated due to the script's diverse and complex characteristics. Evolutionary approaches such as genetic algorithms have been used in the past for various optimization and pattern recognition tasks, reporting exceptional results. The proposed Urdu ligature recognition system uses a genetic algorithm for optimization and recognition. Overall, the proposed system comprises pre-processing, segmentation, feature extraction, hierarchical clustering, classification rules, and genetic algorithm optimization and recognition. The pre-processing stage removes noise from the sentence images, whereas in segmentation the sentences are segmented into ligature components. Fifteen features are extracted from each of the segmented ligature images. Intra-feature hierarchical clustering is then applied, producing clustered data. Next, classification rules are used to represent the clustered data. The genetic algorithm performs optimization using multi-level sorting of the clustered data to improve the classification rules used for recognizing Urdu ligatures. Experiments conducted on the benchmark UPTI dataset yield promising results for the proposed system, achieving a recognition rate of 96.72%.
Keywords: classification rules; genetic algorithm; intra-feature hierarchical clustering; ligature recognition; Urdu script
Recognition of Urdu Handwritten Alphabet Using Convolutional Neural Network (CNN)
7
Authors: Gulzar Ahmed, Tahir Alyas, Muhammad Waseem Iqbal, Muhammad Usman Ashraf, Ahmed Mohammed Alghamdi, Adel A. Bahaddad, Khalid Ali Almarhabi. 《Computers, Materials & Continua》, SCIE, EI, 2022, No. 11, pp. 2967-2984 (18 pages)
Handwritten character recognition systems are used in every field of life nowadays, including shopping malls, banks, educational institutes, etc. Urdu is the national language of Pakistan, and it is the fourth most spoken language in the world. However, it is still challenging to recognize Urdu handwritten characters owing to their cursive nature. Our paper presents a Convolutional Neural Network (CNN) model for Urdu handwritten alphabet recognition (UHAR) of offline and online characters. Our research contributes an Urdu handwritten dataset (UHDS) to empower future work in this field. For offline systems, optical readers are used for extracting the alphabets, while diagonal-based extraction methods are implemented in online systems. Moreover, our research tackles the lack of comprehensive and standard Urdu alphabet datasets for research in Urdu text recognition. To this end, we collected 1,000 handwritten samples for each alphabet, a total of 38,000 samples from the 12 to 25 age group, to train our CNN model using online and offline mediums. Subsequently, we carried out detailed character recognition experiments, as detailed in the results. The proposed CNN model outperformed previously published approaches.
Keywords: Urdu handwritten text recognition; handwritten dataset; convolutional neural network; artificial intelligence; machine learning; deep learning
Translation of English Language into Urdu Language Using LSTM Model
8
Authors: Sajadul Hassan Kumhar, Syed Immamul Ansarullah, Akber Abid Gardezi, Shafiq Ahmad, Abdelaty Edrees Sayed, Muhammad Shafiq. 《Computers, Materials & Continua》, SCIE, EI, 2023, No. 2, pp. 3899-3912 (14 pages)
English-to-Urdu machine translation is still in its infancy and lacks simple translation methods that provide adequate English-to-Urdu translation. In order to make knowledge available to the masses, there should be mechanisms and tools in place to make things understandable by translating from source language to target language in an automated fashion. Machine translation has achieved this goal with encouraging results. When decoding the source text into the target language, the translator checks all the characteristics of the text. To achieve machine translation, rule-based, computational, hybrid, and neural machine translation approaches have been proposed to automate the work. In this research work, a neural machine translation approach is employed to translate English text into Urdu, using a Long Short-Term Memory (LSTM) encoder-decoder. The steps required to perform the translation task include preprocessing, tokenization, grammar and sentence structure analysis, word embeddings, training data preparation, encoder-decoder models, and output text generation. The results show that the model used in this work performs better in translation. The results were evaluated using bilingual research metrics and showed that the test and training data yielded the highest scores for sequences with an effective length of ten (10).
Keywords: machine translation; Urdu language; word embedding
An Analytical Study of Sociolinguistic Variations in Urdu Language
9
Authors: Saba Aftab, Moeen Khan Zai, Adeena Aftab. 《Advances in Social Behavior Research》, 2023, No. 2, pp. 20-26 (7 pages)
This paper discusses sociolinguistic variation in the Urdu language. It draws on interviews conducted with senior citizens who are Urdu speakers in Pakistan; each interview was based on four questions designed to identify variation in Urdu. The questions concerned the value of the Urdu language in today’s world and the lexical and dialectal changes that have taken place in Urdu over time. The interviews were semi-structured and used a range of simple, modern tools. The authors state that the study will be helpful for Pakistan's younger generation, because past studies did not focus solely on the sociolinguistics of Urdu, and that it guides them to use their native or national language with its original accents and vocabulary, which reflect their culture and the purity of the Urdu language.
Keywords: language variations; dialect; accent; culture; Urdu language
A Convolutional Neural Network Based Optical Character Recognition for Purely Handwritten Characters and Digits
10
Authors: Syed Atir Raza, Muhammad Shoaib Farooq, Uzma Farooq, Hanen Karamti, Tahir Khurshaid, Imran Ashraf. 《Computers, Materials & Continua》, 2025, No. 8, pp. 3149-3173 (25 pages)
Urdu, a prominent subcontinental language, serves as a versatile means of communication. However, its handwritten expressions present challenges for optical character recognition (OCR). While various OCR techniques have been proposed, most of them focus on recognizing printed Urdu characters and digits. To the best of our knowledge, very little research has focused solely on pure Urdu handwriting recognition, and the results of the methods proposed so far are often inadequate. In this study, we introduce a novel approach to recognizing purely handwritten Urdu digits and characters using Convolutional Neural Networks (CNN). Our proposed method utilizes convolutional layers to extract important features from input images and classifies them using fully connected layers, enabling efficient and accurate detection of Urdu handwritten digits and characters. We implemented the proposed technique on a large publicly available dataset of Urdu handwritten digits and characters. The findings demonstrate that the CNN model achieves an accuracy of 98.30% and an F1 score of 88.6%, indicating its effectiveness in detecting and classifying Urdu handwritten digits and characters. These results have far-reaching implications for various applications, including document analysis, text recognition, and language understanding, which have previously been unexplored in the context of Urdu handwriting data. This work lays a solid foundation for future research and development in Urdu language detection and processing, opening up new opportunities for advancement in this field.
Keywords: image processing; natural language processing; handwritten Urdu characters; optical character recognition; deep learning; feature extraction; classification
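The abstract above reports an accuracy (98.30%) well above its F1 score (88.6%). One common reason for such a gap is class imbalance combined with macro-averaging, and the sketch below shows how the two metrics can diverge. This is a generic illustration, not the paper's evaluation code: the abstract does not say how F1 was averaged, so macro-averaging is an assumption, and the labels are invented.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro-averaging)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Invented imbalanced labels: nine samples of class "a", one of class "b".
y_true = ["a"] * 9 + ["b"]
y_pred = ["a"] * 10  # the rare class is never predicted

print(accuracy(y_true, y_pred))
print(macro_f1(y_true, y_pred))
```

Here most predictions are correct, so accuracy is high, but the rare class contributes an F1 of zero and pulls the macro average down.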
An Analysis of the Relationship Between Language and Ethnicity in Pakistan (Cited by: 10)
11
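Authors: 满在江, 谢妍, 艾佳. 《徐州师范大学学报(哲学社会科学版)》, PKU Core, 2011, No. 3, pp. 16-20 (5 pages)
Pakistan is a multilingual country with two official languages. Urdu, one of these official languages, is spoken as a mother tongue by very few people, yet it is very widely used as a language of communication. The reasons for this are closely tied to Islamic national identity. The establishment of Urdu as the national language was an inevitable result of Muslim nationalist consciousness, and it is also related to the language policies adopted by the ruling class to maintain national unity. That English and Urdu are both official languages of Pakistan is a result of colonial rule, while the fact that Urdu still cannot completely replace English is closely linked to social development, especially the globalization of English. The former conflict between Urdu and Bengali and the eventual secession of East Pakistan further demonstrate the important role language plays in the awakening of national consciousness. Urdu is a product of Pakistani national consciousness; at the same time, as a symbol of Islamic national identity, it has promoted and strengthened national cohesion and, to some extent, eased the divisions and conflicts over national unity and religious belief in a multi-ethnic, multilingual country.
Keywords: Pakistan; Urdu; national identity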
Four Topics on the History of Chaghatai: Year of Death, Residence, Han Subjects, and Appanage (Cited by: 4)
12
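Author: 党宝海. 《西域研究》, CSSCI, PKU Core, 2019, No. 3, pp. 58-70, 157 (14 pages)
Chaghatai died in 1242, most likely in the middle of that year, seven months after Ögedei. Establishing the year of Chaghatai's death helps clarify the order in which the main manuscripts of the great 14th-century historical work Jami' al-tawarikh (《史集》) were compiled and revised. Chaghatai's residence, Quyas (忽牙思), lay east of the Kegan River (克干河) and north of the Ili River, in present-day Huocheng County, Xinjiang. The "Great Ordo" (大斡耳朵) on Chaghatai Khanate coins is the same place as the Ulugh-ef of Persian-language histories, and was most likely the town of Qutlugh (忽都鲁) newly built by Chaghatai. There was no shortage of Han officials around Chaghatai. After Taiyuan Prefecture became Chaghatai's appanage (投下), many local Han people moved to the parts of the Western Regions and Central Asia under his direct rule. Chaghatai recognized the authority of the great families that already held administrative power in Taiyuan Prefecture to govern the area. At the same time, he dispatched darughachi and other supervising officials there, and sent envoys to seek out skilled local people and summon them to his residence in the Western Regions.
Keywords: Chaghatai; year of death; Quyas; Great Ordo; Taiyuan Prefecture; Han people; appanage officials; local governance; relationships among Jami' al-tawarikh manuscripts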
A Study of the Effect of Chinese Word-Formation Productivity on Urdu Speakers' Acquisition of Chinese Vocabulary (Cited by: 2)
13
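Authors: 顾介鑫, 朱苏琼. 《语言文字应用》, CSSCI, PKU Core, 2017, No. 3, pp. 60-69 (10 pages)
The productivity of Chinese word-formation rules has been shown to affect the cognitive processing of words by native speakers, but whether it affects the acquisition of Chinese vocabulary as a second language remains to be studied. Through a Chinese word-naming experiment with Urdu speakers, this paper analyzes, from the two perspectives of qualitative productivity and quantitative productivity, whether Chinese word-formation productivity affects Urdu speakers' acquisition of Chinese vocabulary. The experiment found that naming latencies for words produced by productive word-formation rules were shorter than those for words produced by unproductive rules. Naming latencies increased in the order modifier-head, verb-object, coordinate, complement, and subject-predicate compounds, which runs counter to the order of quantitative productivity of Chinese compounding found in corpus studies. After ruling out transfer effects from the Urdu mother tongue, the paper argues that the qualitative productivity of Chinese word formation does affect Urdu speakers' learning of Chinese vocabulary, whereas quantitative productivity does not.
Keywords: Chinese word-formation productivity; qualitative; quantitative; second-language vocabulary acquisition; Urdu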
Neural Machine Translation Models with Attention-Based Dropout Layer (Cited by: 1)
14
Authors: Huma Israr, Safdar Abbas Khan, Muhammad Ali Tahir, Muhammad Khuram Shahzad, Muneer Ahmad, Jasni Mohamad Zain. 《Computers, Materials & Continua》, SCIE, EI, 2023, No. 5, pp. 2981-3009 (29 pages)
In bilingual translation, attention-based Neural Machine Translation (NMT) models are used to achieve synchrony between input and output sequences and the notion of alignment. NMT models have obtained state-of-the-art performance for several language pairs. However, there has been little work exploring useful architectures for Urdu-to-English machine translation. We conducted extensive Urdu-to-English translation experiments using Long Short-Term Memory (LSTM), Bidirectional Recurrent Neural Networks (Bi-RNN), Statistical Recurrent Unit (SRU), Gated Recurrent Unit (GRU), Convolutional Neural Network (CNN), and Transformer models. Experimental results show that Bi-RNN and LSTM with an attention mechanism, trained iteratively with a scalable data set, make precise predictions on unseen data. The trained models yielded competitive results, achieving 62.6% and 61% accuracy and BLEU scores of 49.67 and 47.14, respectively. From a qualitative perspective, the translations of the test sets were examined manually, and it was observed that the trained models tend to produce repetitive output more frequently. The attention scores produced by Bi-RNN and LSTM showed clear alignment, while GRU showed incorrect translation of words, poor alignment, and a lack of clear structure. Therefore, we considered refining the attention-based models by defining an additional attention-based dropout layer. Attention dropout fixes alignment errors and minimizes translation errors at the word level. After empirical demonstration and comparison with their counterparts, we found an improvement in the quality of the resulting translation system and a decrease in perplexity and over-translation score. The proposed model was also evaluated on Arabic-English and Persian-English datasets. We empirically concluded that adding an attention-based dropout layer helps improve GRU, SRU, and Transformer translation and is considerably more efficient in translation quality and speed.
Keywords: natural language processing; neural machine translation; word embedding; attention; perplexity; selective dropout; regularization; Urdu; Persian; Arabic; BLEU
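Several abstracts on this page, including the one above, report BLEU scores. As a rough guide to what the metric measures, here is a simplified sentence-level sketch: clipped n-gram precision for n = 1 to 4, combined by geometric mean and scaled by a brevity penalty. It is an illustration only; the function names are assumptions, the example sentences are invented English, and published scores are normally computed at corpus level with an established tool rather than with code like this.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Simplified single-reference BLEU on whitespace tokens, without smoothing."""
    cand, ref = candidate.split(), reference.split()
    log_precision = 0.0
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # no n-grams of this order, or none matched
        log_precision += math.log(clipped / total) / max_n
    # Brevity penalty: penalise candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(log_precision)


reference = "the cat sat on the mat"
print(sentence_bleu(reference, reference))
print(sentence_bleu("the cat sat on a mat", reference))
print(sentence_bleu("a dog ran in circles", reference))
```

The score lies between 0 and 1 here; papers usually report it multiplied by 100, as with the 49.67 and 47.14 quoted above.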
A Review and Assessment of Sixty Years of Research on Indian Urdu Literature in New China
15
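Author: 蔡晶. 《外语教学》, CSSCI, PKU Core, 2015, No. 4, pp. 90-94 (5 pages)
Research on Urdu literature in New China began with a small number of translations and introductions in the 1950s. After the 1980s, the number of translations and introductions increased and their scope widened, and surveys of the history of Indian Urdu literature, along with various comprehensive studies and case studies, gradually got under way. Sixty years of Urdu literature research in New China has not lacked discoveries and innovation, but shortcomings remain, such as a limited research community and an insufficiently broad outlook.
Keywords: India; Urdu literature; state of research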
Sentiment Analysis of Low-Resource Language Literature Using Data Processing and Deep Learning
16
Authors: Aizaz Ali, Maqbool Khan, Khalil Khan, Rehan Ullah Khan, Abdulrahman Aloraini. 《Computers, Materials & Continua》, SCIE, EI, 2024, No. 4, pp. 713-733 (21 pages)
Sentiment analysis, a crucial task in discerning emotional tones within text, plays a pivotal role in understanding public opinion and user sentiment across diverse languages. While numerous scholars conduct sentiment analysis in widely spoken languages such as English, Chinese, Arabic, Roman Arabic, and more, grappling with resource-poor languages such as Urdu remains a challenge. Urdu is a uniquely crafted language, characterized by a script that amalgamates elements from diverse languages, including Arabic, Parsi, Pashtu, Turkish, Punjabi, Saraiki, and more. Urdu literature, characterized by distinct character sets and linguistic features, presents an additional hurdle owing to the lack of accessible datasets, which makes sentiment analysis a formidable undertaking. The limited availability of resources has fueled increased interest among researchers, prompting a deeper exploration into Urdu sentiment analysis. This research is dedicated to Urdu language sentiment analysis, employing sophisticated deep learning models on an extensive dataset categorized into five labels: Positive, Negative, Neutral, Mixed, and Ambiguous. The primary objective is to discern sentiments and emotions within the Urdu language, despite the absence of well-curated datasets. To tackle this challenge, the initial step involves the creation of a comprehensive Urdu dataset by aggregating data from various sources such as newspapers, articles, and social media comments. After data collection, a thorough process of cleaning and preprocessing is implemented to ensure the quality of the data. The study leverages two well-known deep learning models, namely Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN), for both training and evaluating sentiment analysis performance. Additionally, the study explores hyperparameter tuning to optimize the models’ efficacy. Evaluation metrics such as precision, recall, and the F1-score are employed to assess the effectiveness of the models. The research findings reveal that RNN surpasses CNN in Urdu sentiment analysis, achieving a significantly higher accuracy rate of 91%. This result accentuates the exceptional performance of RNN, solidifying its status as a compelling option for conducting sentiment analysis tasks in the Urdu language.
Keywords: Urdu sentiment analysis; convolutional neural networks; recurrent neural network; deep learning; natural language processing; neural networks
Deep Learning and Machine Learning-Based Model for Conversational Sentiment Classification
17
Authors: Sami Ullah, Muhammad Ramzan Talib, Toqir A. Rana, Muhammad Kashif Hanif, Muhammad Awais. 《Computers, Materials & Continua》, SCIE, EI, 2022, No. 8, pp. 2323-2339 (17 pages)
In the current era of the internet, people use online media for conversation, discussion, chatting, and other similar purposes. Analysis of such material, in which more than one person is involved, poses distinct challenges compared to other text analysis tasks. There are several approaches to identifying users’ emotions from conversational text in English; however, regional or low-resource languages have been neglected. Urdu is one of them: despite being used by millions of users across the globe, to the best of our knowledge there exists no work on dialogue analysis in the Urdu language. Therefore, in this paper, we propose a model which utilizes deep learning and machine learning approaches for the classification of users’ emotions from text. To accomplish this task, we first created a dataset for the Urdu language with the help of existing English language datasets for dialogue analysis. After that, we preprocessed the data and selected dialogues with common emotions. Once the dataset was prepared, we used different deep learning and machine learning techniques for the classification of emotion, tuning the algorithms to the Urdu language datasets. The experimental evaluation has shown encouraging results, with 67% accuracy on the Urdu dialogue datasets; more than 10,000 dialogues are classified into five emotions, i.e., joy, fear, anger, sadness, and neutral. We believe that this is the first effort at emotion detection from conversational text in the Urdu language domain.
Keywords: dialogue analysis; conversational opinion mining; sentiment analysis; sentiment analysis in Urdu language; deep learning; machine learning
Impacts of Foreigners on the Gulf Arab Vernacular: The Case of Indian Immigrants in the United Arab Emirates
18
Author: Mohammed Salisu. 《Cultural and Religious Studies》, 2020, No. 6, pp. 337-345 (9 pages)
The activities of Indians in the Gulf Cooperation Council (GCC) States continue to be a major area of attention for the governments and citizens of the sub-region. The GCC States have had to contend with varied numbers of foreign nationals who continue to troop into their territories in search of economic opportunities. The United Arab Emirates (UAE), a federation of seven emirates, has over the years been a major destination for foreigners. This article identifies various categories of Indian immigrants in the UAE, their areas of activity, and how they have influenced the vernacular of the citizens of the federal monarchical state. It begins by highlighting India-UAE relations prior to the independence in 1971 of what is now the United Arab Emirates. Second, it characterizes the relations between Indians and Emiratis after 1971. With selected examples, the article reveals the impact the Hindi/Urdu languages have had on the vernacular of Emiratis.
Keywords: Indians; vernacular; United Arab Emirates; locals; impact; Hindi/Urdu
Chinese-Urdu Neural Machine Translation Incorporating Urdu Part-of-Speech Sequence Prediction (Cited by: 1)
19
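Authors: 陈欢欢, 王剑, Muhammad Naeem Ul Hassan. 《计算机工程与科学》, CSCD, PKU Core, 2024, No. 3, pp. 518-524 (7 pages)
Many research teams have studied machine translation for the low-resource languages of South and Southeast Asia in depth. However, for Urdu, the official language of Pakistan, scarce data resources and the large gap between Urdu and Chinese mean that research on machine translation methods targeted at the Chinese-Urdu pair is very rare. To address this, a Transformer-based Chinese-Urdu neural machine translation model that incorporates Urdu part-of-speech sequences is proposed. First, a Transformer is used to predict the part-of-speech sequence of the target language, Urdu. The predictions of the translation model and of the part-of-speech sequence model are then combined for joint prediction, thereby integrating linguistic knowledge into the translation model. Experiments on an existing small-scale Chinese-Urdu dataset show that the proposed method improves the BLEU score by 0.13 over the baseline model, which the authors describe as a fairly clear improvement.
Keywords: Transformer; neural machine translation; Urdu; part-of-speech sequence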
Implementation of a Pre-Normalization Transformer for Urdu-English Machine Translation (Cited by: 14)
20
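Authors: 高巍, 陈子祥, 李大舟, 李耀松. 《小型微型计算机系统》, CSCD, PKU Core, 2020, No. 11, pp. 2286-2291 (6 pages)
With the rapid development of artificial intelligence, neural-network-based machine translation has received growing attention. However, owing to limited data resources, translation quality for low-resource languages with this approach is not ideal. Urdu is widely used as an official language of India and Pakistan, so building a translation model between Urdu and English is of real significance. Based on the encoder-decoder framework, this paper proposes a pre-normalization Transformer model for Urdu-English machine translation. The model adds a pre-normalization layer to the baseline Transformer, keeping the data distribution consistent while avoiding vanishing gradients. The experiments use BLEU as the evaluation metric. They show that, using a small Urdu-English parallel corpus, the proposed pre-normalization Transformer model achieves good results, with some improvement in BLEU over the baseline Transformer.
Keywords: machine translation; Urdu; pre-normalization Transformer; encoder-decoder; BLEU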
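For readers unfamiliar with the term, the difference between the baseline and a pre-normalization layer can be sketched as follows. This assumes the paper's "pre-normalization" matches the commonly described pre-LN arrangement; the abstract does not give the exact formulation. Here $x_l$ is the input to sub-layer $l$, $F_l$ is the sub-layer (self-attention or feed-forward), and $\mathrm{LN}$ is layer normalization.

```latex
\text{post-norm (baseline):}\quad x_{l+1} = \mathrm{LN}\bigl(x_l + F_l(x_l)\bigr)
\qquad
\text{pre-norm:}\quad x_{l+1} = x_l + F_l\bigl(\mathrm{LN}(x_l)\bigr)
```

In the pre-norm form the residual path from $x_l$ to $x_{l+1}$ is an identity, so gradients can pass through it without going through a normalization step, which is consistent with the abstract's remark about avoiding vanishing gradients.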