The China-US Million Book Digital Library Project (Million Book Project) is an international cooperation program between China and the US. However, one million digitized books are not considered the ultimate goal of the project, but a first step towards universal access to human knowledge. In particular, the library faces four challenges concerning new ways to analyze, process, operate on, visualize, and interact with its digital media resources. To tackle these challenges, the North China Centre of the Million Book Project (in the Chinese Academy of Sciences) has initiated several innovative research projects in areas such as multimedia content analysis and retrieval, bilingual services, multimodal information presentation, and knowledge-based organization and services. In this keynote speech, we briefly review our work in these areas and argue that, through technological cooperation on these innovative research topics, the project will develop a top-level digital library platform for the million-book library.
In the field of natural language processing (NLP), the advancement of neural machine translation has paved the way for cross-lingual research. Yet most NLP studies have evaluated their proposed language models on well-refined datasets. We investigate whether a machine translation approach is suitable for multilingual analysis of unrefined datasets, in particular chat messages on Twitch. To address this question, we collected a dataset of 7,066,854 chat messages from English streams and 3,365,569 from Korean streams. We employed several machine learning classifiers and neural networks with two different types of embedding: word-sequence embedding and the final layer of a pre-trained language model. For English data, the accuracy difference between the original English and the English-to-Korean translation was relatively large, ranging from 3% to 12%; for Korean data (Korean versus Korean-to-English), it ranged from 0% to 2%. These results imply that translation from a low-resource language (e.g., Korean) into a high-resource language (e.g., English) yields higher performance than translation in the reverse direction. Several implications and limitations of the results are also discussed; for instance, we suggest that translating from resource-poor languages makes it feasible to use the tools of resource-rich languages for further analysis.
The Bibliotheca Alexandrina (BA) has been developing and using a workflow for turning printed books into digital books as its contribution to building a Universal Digital Library. This workflow consists of multiple phases: scanning, image processing, OCR, digital archiving, document encoding, and publishing. Over the past couple of years, the BA has defined procedures and special techniques for scanning, processing, OCR, and publishing, especially for Arabic books. The workflow has been automated, which allows the different phases to be governed and has made possible the production of 18,000 books so far. The BA has also designed and implemented a framework for encoding digital books that supports publishing, as well as a software system for managing the creation, maintenance, and publishing of the overall digital repository.
Out-of-vocabulary (OOV) term translation plays an important role in natural language processing. Although many researchers have endeavored to solve OOV term translation problems, no existing method offers definitions or context information for OOV terms, and no existing method focuses on cross-language definition retrieval for OOV terms. Moreover, it has always been difficult to evaluate the correctness of an OOV term translation without domain-specific knowledge and correct references. Our English definition ranking method differentiates the types of OOV terms and applies different methods for translation extraction. It also extracts multilingual context information and monolingual definitions of OOV terms. In addition, we propose a novel cross-language definition retrieval system for OOV terms, as well as an automatic re-evaluation method to assess the correctness of OOV translations and definitions. Our methods achieve high performance compared with existing methods.
The rise of social networking has enabled the development of Internet-accessible digital documents in many languages. Such documents need to be evaluated through Cross-Language Text Summarization (CLTS), which involves processing source documents written in disparate languages and generating summaries from them. Cross-language document processing generates target documents from sources in disparate languages, and the documents need to be processed with contextual semantic data using a decoding scheme. This paper presents a multilingual, cross-language model for abstractive summarization of documents, called Hidden Markov Model LSTM Reinforcement Learning (HMMlstmRL). First, the model uses a Hidden Markov Model to compute keywords among the cross-language words for clustering. In the second stage, bidirectional long short-term memory (BiLSTM) networks are used for keyword extraction in the cross-language process. Finally, HMMlstmRL uses a voting concept in reinforcement learning to identify and extract the keywords. The performance of HMMlstmRL is 2% better than that of the conventional bidirectional LSTM model.
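The voting step above is described only at a high level. As a hedged illustration, the snippet below shows plain majority voting over keyword candidates. It assumes each stage (for example the HMM and BiLSTM stages, plus a hypothetical third extractor) returns a set of candidate keywords. The function `vote_keywords` and its `min_votes` parameter are hypothetical, and the reinforcement-learning component of HMMlstmRL is not modeled.

```python
from collections import Counter


def vote_keywords(candidate_sets, min_votes=None):
    """Return keywords proposed by at least `min_votes` extractors.

    candidate_sets: one set of candidate keywords per (hypothetical) extractor.
    By default a strict majority of extractors must agree.
    """
    if min_votes is None:
        min_votes = len(candidate_sets) // 2 + 1  # strict majority
    counts = Counter()
    for candidates in candidate_sets:
        counts.update(set(candidates))  # each extractor votes once per keyword
    return sorted(k for k, c in counts.items() if c >= min_votes)  # deterministic order


hmm_stage = {"market", "tariff", "summit"}
bilstm_stage = {"market", "tariff", "election"}
third_stage = {"market", "summit", "trade"}  # hypothetical extra extractor
print(vote_keywords([hmm_stage, bilstm_stage, third_stage]))  # → ['market', 'summit', 'tariff']
```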
In view of the study of finance and economics information, we research real-time financial news posted on authoritative sites in the world's major advanced economies. By analyzing massive volumes of financial news from different information sources and in different languages, we propose a basic theoretical model and algorithm for financial news that supports intelligent collection, quick access, deduplication, correction, and integration with the news' backgrounds. Furthermore, we can identify connections between financial news and readers' interests. This enables a real-time, on-demand financial news feed and provides a theoretical basis and verification for scientific problems in the real-time processing of massive information. Finally, a simulation experiment shows that the multilingual financial news matching technology distinguishes similar financial news in different languages better than the traditional method.
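The deduplication step above can be illustrated with a standard near-duplicate check. The sketch below assumes the articles have already been translated or mapped into a common language. It uses Jaccard similarity over word trigrams ("shingles"), a generic technique and not the model proposed in the abstract. The names `shingles`, `jaccard`, and `find_duplicates` and the 0.5 threshold are illustrative choices.

```python
import re


def shingles(text, n=3):
    """Set of lowercased word n-grams; short texts become a single shingle."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < n:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B|; 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_duplicates(articles, threshold=0.5, n=3):
    """Return index pairs (i, j) whose shingle sets are at least `threshold` similar."""
    sets = [shingles(t, n) for t in articles]
    pairs = []
    for i in range(len(sets)):
        for j in range(i + 1, len(sets)):
            if jaccard(sets[i], sets[j]) >= threshold:
                pairs.append((i, j))
    return pairs


news = [
    "Central bank raises interest rates by a quarter point",
    "The central bank raises interest rates by a quarter point",
    "Oil prices fall as demand weakens",
]
print(find_duplicates(news))  # → [(0, 1)]
```

The pairwise loop is quadratic in the number of articles. A system handling massive news streams would more typically use an approximate index such as MinHash with locality-sensitive hashing, but the similarity notion is the same.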
Funding: This work was funded by the Open Foundation of the Key Laboratory of Cyberspace Security, Ministry of Education [KLCS20240210].
With the rapid growth of the Internet and social media, information is widely disseminated in multimodal forms, such as text and images, in which discriminatory content can manifest in various ways. Discrimination detection techniques for multilingual and multimodal data can identify potential discriminatory behavior and help foster a more equitable and inclusive cyberspace. However, existing methods often struggle with complex contexts and multilingual environments. To address these challenges, this paper proposes a detection method that uses an image encoder and a multilingual text encoder to extract features from each modality separately. The method continuously updates a historical feature memory bank, aggregates the Top-K most similar samples, and uses a Gated Recurrent Unit (GRU) to integrate current and historical features. This generates enhanced feature representations with stronger semantic expressiveness, which improve the model's ability to capture discriminatory signals. Experimental results demonstrate that the proposed method achieves superior discriminative power and detection accuracy in multilingual and multimodal contexts, offering a reliable and effective solution for identifying discriminatory content.
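The memory-bank and GRU fusion steps described above can be sketched in plain Python. The sketch rests on several assumptions that the abstract does not state:

- features are plain lists;
- the bank is a fixed-capacity FIFO queue of raw current features;
- Top-K aggregation is the unweighted mean of the K most cosine-similar entries;
- the GRU takes the current feature as input and the aggregated history as hidden state.

All names (`FeatureMemoryBank`, `GRUCell`, `enhance`) are hypothetical, and the weights are random rather than trained.

```python
import math
import random
from collections import deque


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class FeatureMemoryBank:
    """Fixed-capacity FIFO bank of past features (oldest evicted first)."""

    def __init__(self, capacity):
        self.items = deque(maxlen=capacity)

    def update(self, feature):
        self.items.append(list(feature))

    def topk_mean(self, query, k):
        """Unweighted mean of the k stored features most similar to `query`."""
        if not self.items:
            return [0.0] * len(query)  # no history yet
        ranked = sorted(self.items, key=lambda f: cosine(query, f), reverse=True)[:k]
        return [sum(col) / len(ranked) for col in zip(*ranked)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class GRUCell:
    """One GRU step; x = current feature, h = aggregated history. Biases omitted."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)

        def mat():
            return [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]

        self.Wz, self.Uz = mat(), mat()  # update gate
        self.Wr, self.Ur = mat(), mat()  # reset gate
        self.Wh, self.Uh = mat(), mat()  # candidate state

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.Wz, x), matvec(self.Uz, h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.Wr, x), matvec(self.Ur, h))]
        rh = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b) for a, b in zip(matvec(self.Wh, x), matvec(self.Uh, rh))]
        # h' = (1 - z) * h + z * candidate
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def enhance(current, bank, gru, k=3):
    """Fuse the current feature with its Top-K history, then store it in the bank."""
    history = bank.topk_mean(current, k)
    fused = gru.step(current, history)
    bank.update(current)
    return fused


bank = FeatureMemoryBank(capacity=100)
gru = GRUCell(dim=4, seed=0)
for feat in ([0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1], [0.1, 0.2, 0.3, 0.5]):
    enhanced = enhance(feat, bank, gru, k=2)
print([round(v, 3) for v in enhanced])
```

Whether the bank stores raw or fused features is a design choice made here; the abstract does not say. The gate convention `(1 - z) * h + z * candidate` is one of two common forms; some libraries swap the roles of `z`.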
In this modern era, digital/social media platforms and video games are growing daily. People of all ages are becoming dependent on them. They have many positive aspects, but there are drawbacks as well, one of which is cyberbullying. Cyberbullying is a form of bullying that uses technological platforms to bully others. It affects victims mentally, emotionally, and physically, with effects that include low self-esteem, violent behavior, despair, increased stress and anxiety, depression, self-harm, and suicide. Findings from this research study indicate that it affects young people more, impacting their emotional development and overall safety. Real-time cyberbullying detection identifies abuse and protects the target from further abuse and its effects. This study helps determine the seriousness of the issue and the vulnerabilities that individuals can exploit to bully others. Additionally, it helps explain how the various features of cyberbullying detection contribute to developing a strong, trustworthy system and a healthy online community. Natural Language Processing (NLP) models assess textual content and analyze hashtags and comments. Similarly, image content is analyzed using Optical Character Recognition (OCR), which converts text in images into a machine-readable format for further examination. Deep Neural Network models are also used, such as Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM), and Bidirectional LSTM (BLSTM). CNN is used for text and image classification, LSTM for learning long-term dependencies, and BLSTM expands the network's input by encoding data in both forward and backward directions. Classifiers such as Support Vector Machine (SVM) and Naïve Bayes also help detect cyberbullying. A working cyberbullying detection system can detect cyberbullying on multiple platforms, and a deeper understanding of each machine learning algorithm allows one to build models that improve on their predecessors. With models developed for different attributes achieving high accuracy, cyberbullying detection systems contribute to a healthier online community.
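The bidirectional encoding mentioned above can be sketched with a scalar tanh RNN cell standing in for an LSTM cell, a simplification made here for brevity. One pass runs left to right. A second pass, with its own weights, runs right to left. The two hidden states are then paired at each time step. Function names and weight values are illustrative only.

```python
import math


def rnn_pass(seq, w_in, w_rec):
    """Scalar tanh RNN over `seq`; returns the hidden state at every step."""
    h, states = 0.0, []
    for x in seq:
        h = math.tanh(w_in * x + w_rec * h)
        states.append(h)
    return states


def bidirectional(seq, fwd=(0.8, 0.5), bwd=(0.6, 0.4)):
    """Pair forward and backward hidden states per time step.

    fwd and bwd are separate (input weight, recurrent weight) pairs,
    mirroring the separate parameters of the two directions in a BLSTM.
    """
    forward = rnn_pass(seq, *fwd)
    backward = rnn_pass(seq[::-1], *bwd)[::-1]  # realign to original order
    return list(zip(forward, backward))


print(bidirectional([1.0, 0.0, 0.0]))
```

For the input `[1.0, 0.0, 0.0]`, the backward state at the last position is `0.0` because the backward pass has seen only the final element there. The forward state at that position still carries information from the first element. Each output position therefore summarizes context from both sides.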
This editorial critically evaluated the recent study by AlMousa et al, which examined the impact of the Arabic version of the American Academy of Orthopedic Surgeons Foot and Ankle Outcomes Questionnaire (AAOS-FAOQ) on postoperative quality of life and recovery in Arabic-speaking patients with traumatic foot and ankle injuries. Orthopedic research is marked by systemic linguistic exclusion: English-language journals dominated most publications, and non-English-speaking populations faced the dual barriers of trial underrepresentation and semantic distortion (e.g., mistranslations of terms like "joint instability" in Arabic). In this context, AlMousa et al's work highlighted the transformative potential of culturally adapted methodologies. Their rigorous four-stage adaptation framework validated the Arabic AAOS-FAOQ as a reliable tool, enhancing ecological validity and reducing bias in patient-reported outcomes. However, limitations such as regional specificity (Gulf-centric sampling) and a short follow-up period (4 months) underscored broader challenges in non-English research: redundant studies, prolonged hospital stays for patients with limited English proficiency, and underrepresentation of certain ethnic groups in trials. To dismantle linguistic hegemony, we proposed semantic reconstruction (e.g., integrating culturally specific indicators like "prayer posture"), dialect-aware neural translation, and World Health Organization-led terminology standardization. In line with these proposed solutions, AlMousa et al's study exemplified how language-sensitive adaptations could bridge equity gaps. Future efforts would need to balance cultural specificity with cross-study comparability through AI-driven multilingual databases and policy mandates for cultural adaptation roadmaps.
Recently, Brexit has attracted rising public attention. The UK and the European Union (EU) would both exert an enormous amount of influence on each other. One important issue raised by the EU that deserves attention is language policy. In this paper, I develop an analysis that introduces the basics of multilingualism and its negative effects during the implementation process. The paper focuses on the current language policy: how multilingualism works, and the challenges confronting the EU's official language institutions. Chief among these is whether to maintain or abandon the present form of institutional multilingualism in order to balance promoting diversity against achieving efficiency.
Teachers' beliefs about multilingual awareness in target language learning play a significant role in shaping learners' attitudes toward language awareness, and they affect both learners' linguistic behavior and teachers' teaching practice. Therefore, the present study aimed to explore English teachers' beliefs about Inner Mongolian university students' multilingual awareness in L3 learning, and the teachers' own teaching practice, in the Chinese EFL context. One hundred English teachers from six universities in Inner Mongolia, China, participated in this investigation. The data were collected through a questionnaire and teacher interviews. The results indicate that English teachers hold positive attitudes toward multilingual awareness in general. However, Mongolian and Han teachers differ in their beliefs, and there are discrepancies between English teachers' beliefs about multilingual awareness and their teaching practice. These discrepancies stem from the social-cultural environment, family language policy, teacher identity, learning experience, teaching materials, and, more importantly, teachers' lack of awareness of fostering learners' multilingual awareness. The present research highlights the necessity of raising teachers' awareness of cultivating multilingual awareness in future teacher development. It also emphasizes the significance of exploring the potential cognitive advantages of multilingualism in promoting L3 learning and developing English learners' multilingual competence in the EFL context in China.
Funding: This work was supported by the MOE Project of Humanities and Social Sciences for Young Researchers (Project No.: 16YJC740023) and the Project of Humanities and Social Sciences in Universities and Colleges in Guangdong Province (Project No.: 2016WTSCX033). The authors also acknowledge support from the Chinese MOE Research Project of Humanities and Social Science (Project No.: 16JJD740006), conducted by the Center for Linguistics and Applied Linguistics, Guangdong University of Foreign Studies.
Languages and linguistic resources travel from one locality to another, adapting to the norms, customs, and regulations of the new locality. This process involves translocalization, which emphasizes the movement of linguistic resources against the backdrop of globalization and the combination or reframing of resources from different localities. This research explores the extent to which translocalization is reflected in the linguistic landscapes of three distinct commercial areas in Guangzhou, China. It goes on to discuss how translocalization works together with social rescaling to bring about the movement of linguistic resources and to produce the distinct linguistic landscapes of the three commercial areas. It concludes that some languages or linguistic resources, such as English, pinyin, and traditional Chinese script, are transported into local contexts for the purpose of rescaling. Other languages or dialects, like Cantonese, might gradually lose their rescaling function while retaining their function of indexing local identity and solidarity. This study calls for more attention to local resources and contexts in linguistic landscape studies, and it argues for the indexical function of linguistic resources in social rescaling and city planning.
Automatic speech recognition (ASR) is vital for very low-resource languages because it can help mitigate the threat of extinction. Chaha is one such low-resource language. It suffers from resource insufficiency, and some of its phonological, morphological, and orthographic features challenge ASR development. Considering these challenges, this study is the first endeavor to analyze the characteristics of the language, prepare a speech corpus, and develop different ASR systems for it. A small 3-hour read speech corpus was prepared and transcribed. Speech recognizers based on basic and rounded phone units were explored using multilingual deep neural network (DNN) modeling methods. The experimental results demonstrated that all basic phone and rounded phone unit-based multilingual models outperformed the corresponding unilingual models, with relative performance improvements of 5.47% to 19.87% and 5.74% to 16.77%, respectively. The rounded phone unit-based multilingual models outperformed the equivalent basic phone unit-based models, with relative performance improvements of 0.95% to 4.98%. Overall, we found that multilingual DNN modeling methods are highly effective for developing Chaha speech recognizers. Both basic and rounded phone acoustic units are convenient for building a Chaha ASR system. However, the rounded phone unit-based models outperform the corresponding basic phone unit-based models and recognize speech faster. Hence, rounded phone units are the most suitable acoustic units for developing Chaha ASR systems.
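The abstract reports relative rather than absolute improvements without naming the underlying metric. Assuming the metric is an error rate such as word error rate (WER), a relative improvement is the error reduction divided by the baseline error. The function name and the numbers below are hypothetical, not results from the study.

```python
def relative_improvement(baseline_error, new_error):
    """Relative error reduction in percent: 100 * (baseline - new) / baseline."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error


# Hypothetical error rates, not taken from the study:
print(relative_improvement(20.0, 16.0))  # → 20.0
```

In this example, an absolute drop of 4 points from a 20% baseline is a 20% relative improvement. This is why relative figures can look much larger than the underlying change in error rate.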
Abstract: This paper presents an overview of some cross-linguistic influences in multilingual language acquisition at both the syntactic and lexical levels, drawing on Gabryś-Barker's new book.
Abstract: Inspired by the Chinese notion of "governing by non-action" (无为而治), this paper stresses the nature of language learners, who always bring their multilingual competence into the language learning process. Based on a thorough account of the nature of language learners, the paper also offers constructive implications for future teaching practice.
Abstract: In the age of the Internet, social media connect us all at our fingertips. People are linked through different social media. The social network Twitter allows people to tweet their thoughts on any particular event or a specific political body, which provides a diverse range of political insights. This paper addresses text processing of a multilingual dataset including Urdu, English, and Roman Urdu. It explores machine learning solutions for sentiment analysis: training models, collecting data on the government from Twitter, applying sentiment analysis, and providing a Python library that classifies text sentiment. The training data contained tweets in three languages: English (200k), Urdu (200k), and Roman Urdu (11k). Five different classification models are applied to determine sentiment, and finally the use of an ensemble technique to build on the acquired results is explored. The Logistic Regression model performed best with an accuracy of 75%, followed by the Linear Support Vector Classifier and Stochastic Gradient Descent models, both with 74% accuracy. Lastly, the Multinomial Naïve Bayes and Complement Naïve Bayes models both achieved 73% accuracy.
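One of the five classifiers named above, Multinomial Naïve Bayes, can be sketched from scratch in a few lines. This is a minimal illustration, not the authors' Python library: it assumes whitespace tokenization and Laplace smoothing, and the `majority_vote` helper shows hard voting as one possible ensemble technique, since the abstract does not say which ensemble was used. The class name, helper name, and toy Roman Urdu/English training data are hypothetical.

```python
import math
from collections import Counter, defaultdict


class MultinomialNaiveBayes:
    """Minimal multinomial Naive Bayes over whitespace tokens with Laplace smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # additive (Laplace) smoothing constant

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class (for priors)
        self.token_counts = defaultdict(Counter)  # token frequencies per class
        for text, label in zip(texts, labels):
            self.token_counts[label].update(text.lower().split())
        self.vocab = {tok for counts in self.token_counts.values() for tok in counts}
        self.total_tokens = {c: sum(cnt.values()) for c, cnt in self.token_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            score = math.log(doc_count / n_docs)  # log prior
            for tok in text.lower().split():
                count = self.token_counts[label][tok]  # 0 for unseen tokens
                score += math.log((count + self.alpha) / (self.total_tokens[label] + self.alpha * v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def majority_vote(predictions):
    """Hard-voting ensemble: most common label; ties go to the label seen first."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical toy data mixing English and Roman Urdu tokens.
train_texts = ["great policy good work", "good government acha kaam",
               "bad decision terrible", "bura faisla bad policy"]
train_labels = ["positive", "positive", "negative", "negative"]
model = MultinomialNaiveBayes().fit(train_texts, train_labels)
print(model.predict("good policy"))  # positive
print(model.predict("bura faisla"))  # negative
```

Scoring in log space avoids numerical underflow when many token probabilities are multiplied; in a real ensemble, `majority_vote` would receive one predicted label per trained classifier.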
Abstract: Translation is a significant communicative activity with a long history. Translation studies are developing along two turns: the macro turn and the micro turn. Edwin Gentzler examined these two turns in depth in his work "Macro- and Micro-Turns in Translation Studies". His study in the American context can offer Chinese translation studies some new perspectives. In China, translation is a more frequent activity within the domestic context.
Abstract: The China-US Million Book Digital Library Project (Million Book Project) is an international cooperation program between China and the US. However, the one million digitized books are not considered the ultimate goal of the project, but rather a first step towards universal access to human knowledge. In particular, the library faces four challenges concerning new ways to analyze, process, operate, visualize, and interact with its digital media resources. To tackle these challenges, the North China Centre of the Million Book Project (in the Chinese Academy of Sciences) has initiated several innovative research projects in areas such as multimedia content analysis and retrieval, bilingual services, multimodal information presentation, and knowledge-based organization and services. In this keynote speech, we briefly review our work in these areas and argue that, through technological cooperation on these innovative research topics, the project will develop a top-level digital library platform for the million-book library.
Funding: This work was supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2021-0-00358, AI·Big data based Cyber Security Orchestration and Automated Response Technology Development).
Abstract: In the field of natural language processing (NLP), advances in neural machine translation have paved the way for cross-lingual research. Yet most NLP studies have evaluated their proposed language models on well-refined datasets. We investigate whether a machine translation approach is suitable for multilingual analysis of unrefined datasets, in particular, chat messages on Twitch. To address this question, we collected a dataset of 7,066,854 and 3,365,569 chat messages from English and Korean streams, respectively. We employed several machine learning classifiers and neural networks with two different types of embedding: word-sequence embedding and the final layer of a pre-trained language model. The results indicate that the accuracy difference between the English data and the English-to-Korean data was relatively large, ranging from 3% to 12%, whereas for the Korean data (Korean and Korean-to-English) it ranged from 0% to 2%. These results imply that translation from a low-resource language (e.g., Korean) into a high-resource language (e.g., English) yields higher performance than translation in the opposite direction. Several implications and limitations of the results are also discussed; for instance, we suggest the feasibility of translating from resource-poor languages in order to use the tools of resource-rich languages in further analysis.
Abstract: The Bibliotheca Alexandrina (BA) has been developing and using a workflow for turning printed books into digital books as its contribution to building a Universal Digital Library. The workflow consists of multiple phases, namely scanning, image processing, OCR, digital archiving, document encoding, and publishing. Over the past couple of years, the BA has defined procedures and special techniques for scanning, processing, OCR, and publishing, especially for Arabic books. The workflow has been automated, allowing each phase to be governed and making it possible to produce 18,000 books so far. The BA has also designed and implemented a framework for encoding digital books that supports publishing, as well as a software system for managing the creation, maintenance, and publishing of the overall digital repository.
Abstract: Out-of-vocabulary (OOV) term translation plays an important role in natural language processing. Although many researchers have endeavored to solve OOV term translation problems, no existing method offers definition or context information for OOV terms. Furthermore, no existing method focuses on cross-language definition retrieval for OOV terms. Moreover, it has always been difficult to evaluate the correctness of an OOV term translation without domain-specific knowledge and correct references. Our English definition ranking method differentiates the types of OOV terms and applies different methods for translation extraction. It also extracts multilingual context information and monolingual definitions of OOV terms. In addition, we propose a novel cross-language definition retrieval system for OOV terms. Finally, we propose an automatic re-evaluation method to assess the correctness of OOV translations and definitions. Our methods achieve high performance compared with existing methods.
Abstract: The rise of social networking enables the development of Internet-accessible digital documents in several languages. Such documents need to be evaluated through Cross-Language Text Summarization (CLTS), which generates summaries from source documents in disparate languages. Cross-language document processing generates target documents from sources in disparate languages, and the documents must be processed with contextual semantic data using a decoding scheme. This paper presents a multilingual, cross-language document processing model for abstractive summarization, called Hidden Markov Model LSTM Reinforcement Learning (HMMlstmRL). First, the model uses a Hidden Markov Model to compute keywords among the cross-language words for clustering. In the second stage, bidirectional long short-term memory networks are used for keyword extraction in the cross-language process. Finally, HMMlstmRL uses the concept of voting in reinforcement learning to identify and extract the keywords. The performance of the proposed HMMlstmRL is 2% better than that of the conventional bidirectional LSTM model.
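The abstract does not specify how the Hidden Markov Model computes keywords. A minimal sketch, assuming a two-state tagging HMM in which each token is either a keyword (`KEY`) or not (`OTHER`), decoded with the Viterbi algorithm in log space. All state names and probabilities below are hypothetical, and the sketch covers only this first stage, not the BiLSTM or reinforcement-learning stages of HMMlstmRL.

```python
import math


def viterbi(tokens, states, start_p, trans_p, emit_p, unseen=1e-6):
    """Most likely hidden-state sequence for `tokens` under a discrete HMM (log space)."""
    if not tokens:
        return []

    def emit(state, tok):
        # Unseen tokens get a small floor probability instead of zero.
        return math.log(emit_p[state].get(tok, unseen))

    # best[t][s]: log-probability of the best path ending in state s at position t
    best = [{s: math.log(start_p[s]) + emit(s, tokens[0]) for s in states}]
    back = [{}]  # back[t][s]: predecessor state on that best path
    for t in range(1, len(tokens)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + emit(s, tokens[t])
            back[t][s] = prev
    # Trace back from the highest-scoring final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(tokens) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Hypothetical parameters for a keyword / non-keyword tagger.
STATES = ("KEY", "OTHER")
START = {"KEY": 0.3, "OTHER": 0.7}
TRANS = {"KEY": {"KEY": 0.4, "OTHER": 0.6}, "OTHER": {"KEY": 0.3, "OTHER": 0.7}}
EMIT = {
    "KEY": {"summarization": 0.4, "multilingual": 0.4, "the": 0.01},
    "OTHER": {"the": 0.5, "of": 0.4, "summarization": 0.01, "multilingual": 0.01},
}

tokens = ["the", "multilingual", "summarization"]
tags = viterbi(tokens, STATES, START, TRANS, EMIT)
print(tags)  # ['OTHER', 'KEY', 'KEY']
print([tok for tok, tag in zip(tokens, tags) if tag == "KEY"])  # ['multilingual', 'summarization']
```

Tokens tagged `KEY` would then be the keyword candidates passed on to later stages; in practice the start, transition, and emission probabilities would be estimated from annotated data rather than set by hand.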
Funding: the National Social Science Foundation of China (Nos. 15CTQ028 and 14@ZH036); the Social Science Foundation of Beijing (No. 15SHA002); the Young Faculty Research Fund of Beijing Foreign Studies University (No. 2015JT008).
Abstract: For the study of financial and economic information, we investigate real-time financial news posted on authoritative sites in the world's major advanced economies. By analyzing massive volumes of financial news from different information sources and in different languages, we propose a basic theoretical model and algorithm for financial news that supports intelligent collection, fast access, deduplication, correction, and integration with the background of each news item. Furthermore, we identify connections between financial news and readers' interests, so that we can deliver a real-time, on-demand financial news feed and provide a theoretical basis and verification for the scientific problems of real-time processing of massive information. Finally, a simulation experiment shows that the multilingual financial news matching technique distinguishes similar financial news in different languages better than the traditional method.
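The abstract lists deduplication among the model's capabilities but does not describe the method. A minimal sketch of one common technique, near-duplicate detection with word shingles (contiguous word k-grams) and Jaccard similarity, assuming same-language input; it does not implement the paper's cross-language matching. The function names, shingle size `k`, `threshold`, and sample headlines are hypothetical.

```python
def shingles(text: str, k: int = 3) -> set:
    """Set of k-word shingles (contiguous word k-grams) from lowercased text."""
    words = text.lower().split()
    if len(words) < k:
        # Short texts become a single shingle so they can still be compared.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """|A intersection B| / |A union B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(articles, threshold=0.5, k=3):
    """Keep each article unless it is a near-duplicate of one already kept."""
    kept, kept_shingles = [], []
    for text in articles:
        sh = shingles(text, k)
        if all(jaccard(sh, other) < threshold for other in kept_shingles):
            kept.append(text)
            kept_shingles.append(sh)
    return kept


news = [
    "Central bank raises interest rates by 25 basis points",
    "Central bank raises interest rates by 25 basis points today",
    "Oil prices fall as demand weakens",
]
print(deduplicate(news))  # keeps the first and third headlines
```

Here the first two headlines share 7 of 8 distinct shingles (Jaccard 0.875), so the second is dropped. This pairwise comparison is quadratic in the number of articles; a system handling massive news streams would more likely approximate Jaccard similarity with MinHash and locality-sensitive hashing.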