Improving website security to prevent malicious online activities is crucial, and CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) has emerged as a key strategy for distinguishing human users from automated bots. Text-based CAPTCHAs, designed to be easily decipherable by humans yet challenging for machines, are a common form of this verification. However, advancements in deep learning have facilitated the creation of models adept at recognizing these text-based CAPTCHAs with surprising efficiency. In our comprehensive investigation into CAPTCHA recognition, we have tailored the renowned UpDown image captioning model specifically for this purpose. Our approach innovatively combines an encoder to extract both global and local features, significantly boosting the model's capability to identify complex details within CAPTCHA images. For the decoding phase, we have adopted a refined attention mechanism, integrating enhanced visual attention with dual layers of Long Short-Term Memory (LSTM) networks to elevate CAPTCHA recognition accuracy. Our rigorous testing across four varied datasets, including those from Weibo, BoC, Gregwar, and Captcha 0.3, demonstrates the versatility and effectiveness of our method. The results not only highlight the efficiency of our approach but also offer profound insights into its applicability across different CAPTCHA types, contributing to a deeper understanding of CAPTCHA recognition technology.
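The global-plus-local encoding described above can be sketched minimally as follows (an illustrative simplification in numpy, not the authors' actual encoder; the feature-map shape and the mean-pooling choice are assumptions):

```python
import numpy as np

def encode(feature_map):
    """Toy global+local CAPTCHA encoder.

    feature_map: (H, W, C) grid of convolutional features, assumed to come
    from any CNN backbone (the 6x18x32 shape below is illustrative).
    Returns a global summary vector (C,) and local vectors (H*W, C).
    """
    h, w, c = feature_map.shape
    local = feature_map.reshape(h * w, c)   # one vector per spatial cell
    global_vec = local.mean(axis=0)         # mean-pooled image summary
    return global_vec, local

fmap = np.random.default_rng(0).normal(size=(6, 18, 32))
global_vec, local_vecs = encode(fmap)
```

A decoder would then attend over `local_vecs` while conditioning on `global_vec` at each character step.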
Enhancing website security is crucial to combat malicious activities, and CAPTCHA (Completely Automated Public Turing tests to tell Computers and Humans Apart) has become a key method to distinguish humans from bots. While text-based CAPTCHAs are designed to challenge machines while remaining human-readable, recent advances in deep learning have enabled models to recognize them with remarkable efficiency. In this regard, we propose a novel two-layer visual attention framework for CAPTCHA recognition that builds on traditional attention mechanisms by incorporating Guided Visual Attention (GVA), which sharpens focus on relevant visual features. We have specifically adapted the well-established image captioning task to address this need. Our approach utilizes the first-level attention module as guidance to the second-level attention component, incorporating two LSTM (Long Short-Term Memory) layers to enhance CAPTCHA recognition. Our extensive evaluation across four diverse datasets, Weibo, BoC (Bank of China), Gregwar, and Captcha 0.3, shows the adaptability and efficacy of our method. Our approach demonstrated impressive performance, achieving an accuracy of 96.70% for BoC and 95.92% for Weibo. These results underscore the effectiveness of our method in accurately recognizing and processing CAPTCHA datasets, showcasing its robustness, reliability, and ability to handle varied challenges in CAPTCHA recognition.
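The first-level-guides-second-level idea can be illustrated with a small sketch (hypothetical weight matrices and dot-product scoring stand in for the learned GVA modules, and the two LSTM layers are omitted):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def guided_attention(features, query, W1, W2):
    """Two-level attention: the first-level context guides the second pass.

    features: (N, D) local image features; query: (D,) decoder state;
    W1, W2: (D, D) score matrices (random below, learned in a real model).
    """
    a1 = softmax(features @ W1 @ query)           # first-level weights over regions
    ctx1 = a1 @ features                          # coarse visual context
    a2 = softmax(features @ W2 @ (query + ctx1))  # second pass, guided by ctx1
    return a2 @ features, a1, a2

rng = np.random.default_rng(1)
feats = rng.normal(size=(10, 8))
ctx, a1, a2 = guided_attention(feats, rng.normal(size=8),
                               rng.normal(size=(8, 8)), rng.normal(size=(8, 8)))
```

In the full framework, the refined context `ctx` would feed the character-decoding LSTM at each time step.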
Automated and accurate movie genre classification is crucial for content organization, recommendation systems, and audience targeting in the film industry. Although most existing approaches focus on audiovisual features such as trailers and posters, text-based classification remains underexplored despite its accessibility and semantic richness. This paper introduces the Genre Attention Model (GAM), a deep learning architecture that integrates transformer models with a hierarchical attention mechanism to extract and leverage contextual information from movie plots for multi-label genre classification. In order to assess its effectiveness, we evaluate multiple transformer-based models, including Bidirectional Encoder Representations from Transformers (BERT), A Lite BERT (ALBERT), Distilled BERT (DistilBERT), Robustly Optimized BERT Pretraining Approach (RoBERTa), Efficiently Learning an Encoder that Classifies Token Replacements Accurately (ELECTRA), XLNet, and Decoding-enhanced BERT with Disentangled Attention (DeBERTa). Experimental results demonstrate the superior performance of the DeBERTa-based GAM, which employs a two-tier hierarchical attention mechanism: word-level attention highlights key terms, while sentence-level attention captures critical narrative segments, ensuring a refined and interpretable representation of movie plots. Evaluated on three benchmark datasets, Trailers12K, Large Movie Trailer Dataset-9 (LMTD-9), and MovieLens37K, GAM achieves micro-average precision scores of 83.63%, 83.32%, and 83.34%, respectively, surpassing state-of-the-art models. Additionally, GAM is computationally efficient, requiring just 6.10 Giga Floating Point Operations (GFLOPs), making it a scalable and cost-effective solution. These results highlight the growing potential of text-based deep learning models in genre classification and GAM's effectiveness in improving predictive accuracy while maintaining computational efficiency. With its robust performance, GAM offers a versatile and scalable framework for content recommendation, film indexing, and media analytics, providing an interpretable alternative to traditional audiovisual-based classification techniques.
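The two-tier word/sentence attention can be sketched as follows (random context vectors stand in for GAM's learned parameters; this illustrates the mechanism, not the published model):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def hierarchical_attention(doc, u_word, u_sent):
    """Word-level then sentence-level attention over a plot synopsis.

    doc: list of (n_words_i, D) arrays, one per sentence;
    u_word, u_sent: (D,) context vectors (learned in GAM, random here).
    Returns a single (D,) document vector for the genre classifier.
    """
    sent_vecs = []
    for words in doc:
        a = softmax(words @ u_word)    # which words matter in this sentence
        sent_vecs.append(a @ words)
    sents = np.stack(sent_vecs)
    b = softmax(sents @ u_sent)        # which sentences matter in the plot
    return b @ sents

rng = np.random.default_rng(0)
plot = [rng.normal(size=(n, 16)) for n in (5, 8, 3)]  # 3 sentences of word vectors
doc_vec = hierarchical_attention(plot, rng.normal(size=16), rng.normal(size=16))
```

A multi-label sigmoid head over `doc_vec` would then produce one probability per genre.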
The development of science and technology has made it not only possible but very convenient for people living in different parts of the world to communicate with each other, thus bringing forth a new form of communication: computer-mediated communication (CMC). Text-based CMC is one of the most popular forms of CMC in which people send instant messages to others in different settings. Since this mode of interaction combines features of both the written and spoken language (Greenfield & Subrahmanyam, 2003), it is of great interest whether it follows the same sequential rules as telephone conversations. However, compared to telephone conversations, computer-mediated communication has received much less attention, let alone text-based CMC. The existing body of literature mostly focuses on content analysis and linguistic features but neglects the sequential organization of such interaction (Paolillo, 1999; Greenfield & Subrahmanyam, 2003; Herring, 1999). In light of this, this paper examines the opening moves of instant message exchanges among Chinese adults in an attempt to find out the unique features characterizing the way they open an online chat. The framework chosen for data analysis was the sequential model proposed by Schegloff for American telephone openings.
It has been suggested that text-based computer-mediated communication (CMC) can help learners use the target language both in classrooms and in social contexts. It is therefore necessary to investigate the effect of text-based CMC on learners' communicative competence by conducting a systematic review. The findings implied that text-based CMC settings allowed learners to interact, and that this interaction provided learners with more opportunities to develop their communicative competence in the target language.
Questions can be classified from different perspectives: grammatical form, communicative value, content orientation, and cognitive level. In language pedagogy, text-based questioning, as both an attention-drawing device and a form of learning task, serves text instruction. Therefore, language teachers preparing text-based questioning should take into consideration all dimensions of questions, especially cognitive requirement and communicative character. In addition, interaction between learner, text, and the world can be achieved by the adoption of both about-the-text and beyond-the-text questions in teachers' text-based question construction.
Recently, with the spread of online services involving websites, attackers have the opportunity to expose these services to malicious actions. To protect these services, the Completely Automated Public Turing Test to Tell Computers and Humans Apart (CAPTCHA) has been proposed. Since many Arabic countries have developed their online services in Arabic, Arabic text-based CAPTCHA has been introduced to improve usability for their users. Moreover, there exists a visual cryptography (VC) technique which can be exploited to enhance the security of text-based CAPTCHA by encrypting a CAPTCHA image into two shares and decrypting it by asking the user to stack them on each other. However, as yet, this technique has not been applied to Arabic text-based CAPTCHA. Therefore, this paper aims to implement an Arabic printed and handwritten text-based CAPTCHA scheme based on the VC technique. To evaluate this scheme, experimental studies are conducted, and the results show that the implemented scheme offers reasonable security and usability levels compared with text-based CAPTCHA alone.
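The two-share idea can be illustrated with a classic 2-out-of-2 visual cryptography construction (a standard Naor-Shamir-style sketch; the paper's exact pixel-expansion pattern may differ):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_shares(img):
    """Split a binary image (1 = black) into two 2-out-of-2 VC shares.

    Each pixel expands into a 1x2 subpixel pair: white pixels get the same
    random pattern in both shares, black pixels get complementary patterns.
    """
    h, w = img.shape
    s1 = np.zeros((h, 2 * w), dtype=int)
    s2 = np.zeros((h, 2 * w), dtype=int)
    patterns = np.array([[1, 0], [0, 1]])
    for i in range(h):
        for j in range(w):
            p = patterns[rng.integers(2)]
            s1[i, 2 * j:2 * j + 2] = p
            s2[i, 2 * j:2 * j + 2] = (1 - p) if img[i, j] else p
    return s1, s2

def stack(s1, s2):
    """Stacking transparencies = pixelwise OR of the black subpixels."""
    return s1 | s2

secret = np.array([[1, 0],
                   [0, 1]])
s1, s2 = make_shares(secret)
revealed = stack(s1, s2)
```

Either share alone has exactly one black subpixel per pair regardless of the secret, so it leaks nothing; only stacking distinguishes fully black pairs (secret black) from half-black pairs (secret white).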
This study aimed at comparing the level of social presence generated in a voice-based chat room and a text-based forum when learners tried to build personal relationships and form an online community for learning on an online language course in China. A mixed-methods approach was taken for the study, drawing on data from questionnaires to find out about student perception of social presence, and postings of text messages and audio messages in the communication of the student learning process to search for students' projected social presence in terms of affective, interactive, and cohesive features. Interviews were also conducted to supplement additional information with the hope of forming a complete picture of social presence in the reality of an online learning environment. The text-based forum and the voice-based chat room were found to have a different impact on student social presence. In terms of student perception, most students were more likely to get to know peers in the text-based forum and thus developed a sense of community in their learning process of the online course. Yet they believed that the voice-based chat room had the advantage of helping them with course learning. In the actual interaction, the voice-based chat room was more interactive although the text-based forum was more affective and cohesive. But in terms of the affective category, the problem with the existing framework in the literature was that no prosodic features were included. Therefore, in future more research is needed to probe the relationship between prosodic sound features and social presence, and the present theoretical framework must be extended. In interviews, students explained that in the voice-based chat room prosodic features led to higher peer awareness, which further reinforced this need.
This paper records the author's comments and reflection on reading Siegel's (2013) article on L2 listening instruction. It first summarizes major arguments put forward in Siegel's (2013) article, and then proceeds to review commonly used approaches to L2 listening instruction, after which it examines the discrepancy between research and practice. In the end, the author gives some suggestions for further research.
Aim: The aim of this study was to explore patients' preferences for forms of patient education material, including leaflets, podcasts, and videos; that is, to determine what forms of information, besides that provided verbally by healthcare personnel, patients prefer following visits to hospital. Methods: The study was a mixed-methods study, using a survey design with primarily quantitative items but with a qualitative component. A survey was distributed to patients over 18 years between May and July 2020, and 480 patients chose to respond. Results: Text-based patient education material (leaflets) is the form that patients have the most experience with and was preferred by 86.46% of respondents; however, 50.21% and 31.67% of respondents would also like to receive patient education material in video and podcast formats, respectively. Furthermore, several respondents wrote about the need for different forms of patient education material, depending on the subject of the supplementary information. Conclusion: This study provides an overview of patient preferences regarding forms of patient education material. The results show that the majority of respondents prefer to use combinations of written, audio, and video material, thus applying and co-constructing a multimodal communication system, from which they select and apply different modes of communication from different sources simultaneously.
Storyboards comprising key illustrations and images help filmmakers to outline ideas, key moments, and story events when filming movies. Inspired by this, we introduce the first contextual benchmark dataset, Script-to-Storyboard (Sc2St), composed of storyboards to explicitly express story structures in the movie domain, and propose the contextual retrieval task to facilitate movie story understanding. The Sc2St dataset contains fine-grained and diverse texts, annotated semantic keyframes, and coherent storylines in storyboards, unlike existing movie datasets. The contextual retrieval task takes as input a multi-sentence movie script summary with keyframe history and aims to retrieve a future keyframe described by a corresponding sentence to form the storyboard. Compared to classic text-based visual retrieval tasks, this requires capturing the context from the description (script) and keyframe history. We benchmark existing text-based visual retrieval methods on the new dataset and propose a recurrent-based framework with three variants for effective context encoding. Comprehensive experiments demonstrate that our methods compare favourably to existing methods; ablation studies validate the effectiveness of the proposed context encoding approaches.
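The context-aware retrieval step can be sketched as follows (the keyframe history is summarized by a simple mean here rather than the proposed recurrent encoder, and all embeddings are assumed to be given by pretrained encoders):

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def retrieve_next_keyframe(sentence_emb, history_embs, candidates, alpha=0.5):
    """Pick the candidate keyframe best matching sentence + story context.

    sentence_emb: (D,) embedding of the sentence to visualize;
    history_embs: (T, D) embeddings of keyframes already in the storyboard;
    candidates: (K, D) embeddings of the candidate gallery;
    alpha: weight of the sentence versus the history context.
    """
    if len(history_embs):
        context = history_embs.mean(axis=0)  # mean pooling stands in for the RNN
    else:
        context = np.zeros_like(sentence_emb)
    query = alpha * sentence_emb + (1 - alpha) * context
    scores = np.array([cosine(query, c) for c in candidates])
    return int(scores.argmax()), scores

rng = np.random.default_rng(0)
gallery = rng.normal(size=(5, 4))
best, scores = retrieve_next_keyframe(gallery[2].copy(), np.empty((0, 4)),
                                      gallery, alpha=1.0)
```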
Social perception refers to how individuals interpret and understand the social world. It is a foundational area of theory and measurement within the social sciences, particularly in communication, political science, psychology, and sociology. Classical models include the Stereotype Content Model (SCM), Dual Perspective Model (DPM), and Semantic Differential (SD). Extensive research has been conducted on these models. However, their interrelationships are still difficult to define using conventional comparison methods, which often lack efficiency, validity, and scalability. To tackle this challenge, we employ a text-based computational approach to quantitatively represent each theoretical dimension of the models. Specifically, we map key content dimensions into a shared semantic space using word embeddings and automate the selection of over 500 contrasting word pairs based on semantic differential theory. The results suggest that social perception can be organized around two fundamental components: subjective evaluation (e.g., how good or likable someone is) and objective attributes (e.g., power or competence). Furthermore, we validate this computational approach with the widely used Rosenberg's 64 personality traits, demonstrating improvements in predictive performance over previous methods, with increases of 19%, 13%, and 4% for the SD, DPM, and SCM dimensions, respectively. By enabling scalable and interpretable comparisons across these models, our findings would facilitate both theoretical integration and practical applications.
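The contrasting-word-pair construction can be illustrated on a toy embedding space (the vectors and word pairs below are invented for demonstration; the study uses real word embeddings and over 500 pairs):

```python
import numpy as np

# Toy 2-d "embeddings": dim 0 ~ valence, dim 1 ~ potency.
# All vectors and pairs here are invented for demonstration only.
emb = {
    "good":     np.array([1.0, 0.1]),
    "bad":      np.array([-1.0, -0.1]),
    "strong":   np.array([0.1, 1.0]),
    "weak":     np.array([-0.1, -1.0]),
    "friendly": np.array([0.8, 0.0]),
    "dominant": np.array([0.1, 0.9]),
}

def axis(pairs):
    """Average the difference vectors of contrasting pairs into one axis."""
    v = np.mean([emb[p] - emb[n] for p, n in pairs], axis=0)
    return v / np.linalg.norm(v)

def project(word, ax):
    """Score a trait word along an axis (positive = nearer the '+' pole)."""
    return float(emb[word] @ ax)

evaluation = axis([("good", "bad")])    # subjective-evaluation axis
potency = axis([("strong", "weak")])    # objective-attribute axis
```

Projecting trait words onto such axes is what lets the models' dimensions be compared in one shared semantic space.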
Automated essay scoring (AES) systems have gained significant importance in educational settings, offering a scalable, efficient, and objective method for evaluating student essays. However, developing AES systems for Arabic poses distinct challenges due to the language's complex morphology, diglossia, and the scarcity of annotated datasets. This paper presents a hybrid approach to Arabic AES by combining text-based, vector-based, and embedding-based similarity measures to improve essay scoring accuracy while minimizing the training data required. Using a large Arabic essay dataset categorized into thematic groups, the study conducted four experiments to evaluate the impact of feature selection, data size, and model performance. Experiment 1 established a baseline using a non-machine-learning approach, selecting the top-N correlated features to predict essay scores. The subsequent experiments employed 5-fold cross-validation. Experiment 2 showed that combining embedding-based, text-based, and vector-based features in a Random Forest (RF) model achieved an R2 of 88.92% and an accuracy of 83.3% within a 0.5-point tolerance. Experiment 3 further refined the feature selection process, demonstrating that 19 correlated features yielded optimal results, improving R2 to 88.95%. In Experiment 4, an optimal data-efficiency training approach was introduced, where training data portions increased from 5% to 50%. The study found that using just 10% of the data achieved near-peak performance, with an R2 of 85.49%, emphasizing an effective trade-off between performance and computational costs. These findings highlight the potential of the hybrid approach for developing scalable Arabic AES systems, especially in low-resource environments, addressing linguistic challenges while ensuring efficient data usage.
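The three similarity families that would feed such a Random Forest can be illustrated as follows (a sketch of representative measures between an essay and a reference answer, not the paper's exact feature set):

```python
import numpy as np
from collections import Counter

def jaccard(a, b):
    """Text-based similarity: overlap of token sets."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def tf_cosine(a, b):
    """Vector-based similarity: cosine over term-frequency vectors."""
    ca, cb = Counter(a.split()), Counter(b.split())
    vocab = sorted(set(ca) | set(cb))
    va = np.array([ca[w] for w in vocab], dtype=float)
    vb = np.array([cb[w] for w in vocab], dtype=float)
    denom = np.linalg.norm(va) * np.linalg.norm(vb)
    return float(va @ vb / denom) if denom else 0.0

def embedding_cosine(a, b, emb):
    """Embedding-based similarity: cosine of mean word vectors.

    emb is any word -> vector lookup; the two-word table below is a toy.
    """
    va = np.mean([emb[w] for w in a.split() if w in emb], axis=0)
    vb = np.mean([emb[w] for w in b.split() if w in emb], axis=0)
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))

toy_emb = {"good": np.array([1.0, 0.0]), "great": np.array([0.9, 0.1])}
sim = embedding_cosine("good", "great", toy_emb)
```

Each essay-reference pair yields one value per measure, and those values form the feature vector the regressor is trained on.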
Funding: supported by the National Natural Science Foundation of China (Nos. U22A2034, 62177047), the High Caliber Foreign Experts Introduction Plan funded by MOST, and the Central South University Research Programme of Advanced Interdisciplinary Studies (No. 2023QYJC020).
Funding: the authors would like to thank the Deanship of Graduate Studies and Scientific Research at Qassim University for financial support (QU-APC-2025).
Funding: Supported by RCUK grant CAMERA (EP/M023281/1, EP/T022523/1), the Centre for Augmented Reasoning (CAR) at the Australian Institute for Machine Learning, and a gift from Adobe.
Abstract: Storyboards comprising key illustrations and images help filmmakers outline ideas, key moments, and story events when filming movies. Inspired by this, we introduce Script-to-Storyboard (Sc2St), the first contextual benchmark dataset composed of storyboards that explicitly express story structures in the movie domain, and propose a contextual retrieval task to facilitate movie story understanding. Unlike existing movie datasets, the Sc2St dataset contains fine-grained and diverse texts, annotated semantic keyframes, and coherent storylines in storyboards. The contextual retrieval task takes as input a multi-sentence movie script summary together with a keyframe history, and aims to retrieve the future keyframe described by the corresponding sentence so as to form the storyboard. Compared with classic text-based visual retrieval tasks, this requires capturing context from both the description (script) and the keyframe history. We benchmark existing text-based visual retrieval methods on the new dataset and propose a recurrent framework with three variants for effective context encoding. Comprehensive experiments demonstrate that our methods compare favourably with existing methods; ablation studies validate the effectiveness of the proposed context-encoding approaches.
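The core of the contextual retrieval task described above is scoring candidate keyframes against a context vector fused from the script sentence and the keyframe history. The following is a minimal sketch of that scoring loop; the paper's actual framework uses recurrent encoders, whereas here a simple mean-pooling fusion stands in, and all embeddings are random placeholders rather than real script or keyframe features.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_context(query_vec, history_vecs):
    """Fuse the query sentence embedding with the keyframe history.
    The paper's framework is recurrent; mean pooling is a simplified
    stand-in that keeps this sketch dependency-free."""
    if history_vecs:
        hist = np.mean(history_vecs, axis=0)
        ctx = 0.5 * query_vec + 0.5 * hist
    else:
        ctx = query_vec
    return ctx / np.linalg.norm(ctx)

def retrieve(ctx, candidates):
    """Rank candidate keyframe embeddings by cosine similarity
    and return the index of the best match plus all scores."""
    sims = [float(ctx @ (c / np.linalg.norm(c))) for c in candidates]
    return int(np.argmax(sims)), sims

# Placeholder embeddings: one query sentence, three history
# keyframes, five candidate future keyframes.
query = rng.normal(size=8)
history = [rng.normal(size=8) for _ in range(3)]
candidates = [rng.normal(size=8) for _ in range(5)]

ctx = encode_context(query, history)
best, sims = retrieve(ctx, candidates)
print(0 <= best < len(candidates), len(sims))
```

In the full task this retrieval step would be repeated sentence by sentence, with each retrieved keyframe appended to the history before the next query is encoded.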
Abstract: Social perception refers to how individuals interpret and understand the social world. It is a foundational area of theory and measurement within the social sciences, particularly in communication, political science, psychology, and sociology. Classical models include the Stereotype Content Model (SCM), the Dual Perspective Model (DPM), and the Semantic Differential (SD). Although these models have been researched extensively, their interrelationships remain difficult to define using conventional comparison methods, which often lack efficiency, validity, and scalability. To tackle this challenge, we employ a text-based computational approach to quantitatively represent each theoretical dimension of the models. Specifically, we map key content dimensions into a shared semantic space using word embeddings and automate the selection of over 500 contrasting word pairs based on semantic differential theory. The results suggest that social perception can be organized around two fundamental components: subjective evaluation (e.g., how good or likable someone is) and objective attributes (e.g., power or competence). Furthermore, we validate this computational approach on Rosenberg's widely used 64 personality traits, demonstrating improvements in predictive performance over previous methods, with increases of 19%, 13%, and 4% for the SD, DPM, and SCM dimensions, respectively. By enabling scalable and interpretable comparisons across these models, our findings facilitate both theoretical integration and practical application.
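The mapping step described above — turning contrasting word pairs into semantic-differential axes in an embedding space — can be sketched as follows. The toy three-dimensional vectors below are illustrative placeholders, not the paper's actual embedding model, and the pair lists are hypothetical examples of the evaluation and potency dimensions.

```python
import numpy as np

# Toy word vectors standing in for real pretrained embeddings
# (the abstract does not specify which embedding model is used).
EMB = {
    "good":      np.array([0.9, 0.1, 0.0]),
    "bad":       np.array([-0.9, 0.1, 0.0]),
    "likable":   np.array([0.8, 0.2, 0.1]),
    "unlikable": np.array([-0.8, 0.2, 0.1]),
    "powerful":  np.array([0.1, 0.9, 0.0]),
    "powerless": np.array([0.1, -0.9, 0.0]),
    "friendly":  np.array([0.7, 0.0, 0.2]),
}

def semantic_axis(pairs):
    """Average the difference vectors of contrasting word pairs
    and normalise, yielding one semantic-differential axis."""
    diffs = [EMB[pos] - EMB[neg] for pos, neg in pairs]
    axis = np.mean(diffs, axis=0)
    return axis / np.linalg.norm(axis)

def project(word, axis):
    """Score a word on an axis via cosine similarity."""
    v = EMB[word]
    return float(v @ axis / np.linalg.norm(v))

evaluation = semantic_axis([("good", "bad"), ("likable", "unlikable")])
potency = semantic_axis([("powerful", "powerless")])

# "friendly" loads on evaluation but not on potency.
print(round(project("friendly", evaluation), 2))  # → 0.96
print(round(project("friendly", potency), 2))     # → 0.0
```

With 500+ automatically selected pairs rather than a handful, the same projection gives each trait word a coordinate on every theoretical dimension, which is what makes the cross-model comparison scalable.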
Funding: Funded by the Deanship of Graduate Studies and Scientific Research at Jouf University under grant No. DGSSR-2024-02-01264.
Abstract: Automated essay scoring (AES) systems have gained significant importance in educational settings, offering a scalable, efficient, and objective method for evaluating student essays. However, developing AES systems for Arabic poses distinct challenges owing to the language's complex morphology, diglossia, and the scarcity of annotated datasets. This paper presents a hybrid approach to Arabic AES that combines text-based, vector-based, and embedding-based similarity measures to improve scoring accuracy while minimizing the training data required. Using a large Arabic essay dataset categorized into thematic groups, the study conducted four experiments to evaluate the impact of feature selection, data size, and model performance. Experiment 1 established a baseline using a non-machine-learning approach, selecting the top-N correlated features to predict essay scores. The subsequent experiments employed 5-fold cross-validation. Experiment 2 showed that combining embedding-based, text-based, and vector-based features in a Random Forest (RF) model achieved an R² of 88.92% and an accuracy of 83.3% within a 0.5-point tolerance. Experiment 3 further refined the feature selection process, demonstrating that 19 correlated features yielded optimal results, improving R² to 88.95%. Experiment 4 introduced a data-efficient training approach in which the training portion was increased from 5% to 50%; using just 10% of the data achieved near-peak performance (R² of 85.49%), highlighting an effective trade-off between performance and computational cost. These findings underscore the potential of the hybrid approach for developing scalable Arabic AES systems in low-resource environments, addressing linguistic challenges while ensuring efficient data usage.
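The hybrid feature set described above pairs each student essay with a reference answer and computes several similarity scores that a downstream model (RF in the paper) consumes. A minimal, dependency-free sketch of two such features follows; the function names, the English example sentences, and the exact similarity choices (Jaccard for text-based, term-frequency cosine for vector-based) are illustrative assumptions, not the paper's exact measures.

```python
import math
from collections import Counter

def jaccard(a_tokens, b_tokens):
    """Text-based similarity: overlap of token sets."""
    a, b = set(a_tokens), set(b_tokens)
    return len(a & b) / len(a | b)

def cosine(counter_a, counter_b):
    """Vector-based similarity: cosine over term-frequency vectors."""
    dot = sum(counter_a[t] * counter_b[t] for t in counter_a)
    na = math.sqrt(sum(v * v for v in counter_a.values()))
    nb = math.sqrt(sum(v * v for v in counter_b.values()))
    return dot / (na * nb)

def features(student_essay, model_answer):
    """Build the per-essay feature dict fed to the scoring model."""
    sa, sb = student_essay.split(), model_answer.split()
    return {
        "text_sim": jaccard(sa, sb),
        "vector_sim": cosine(Counter(sa), Counter(sb)),
        # An embedding-based feature would add cosine similarity over
        # averaged word embeddings; omitted to stay dependency-free.
    }

f = features("the water cycle moves water", "water moves in the water cycle")
print(round(f["text_sim"], 2), round(f["vector_sim"], 2))  # → 0.8 0.94
```

In the paper's pipeline these per-essay feature vectors would then be filtered down to the most score-correlated subset (19 features in Experiment 3) before training the Random Forest regressor.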