Funding: Supported by the Vocational Education Research Project from the China Commercial Technicians Association, “Research on the Construction and Application of AI-Enabled English Testing System for Civil Aviation Ground Services” (20ZSJYB20250420); and by the Education Science Planning Projects (Higher Education Special Program) from the Guangdong Provincial Department of Education, “Research on the Evaluation System of Digital Competence in Curriculum Ideology and Politics for Higher Vocational Teachers in Guangdong under the Background of Educational Digitalization” (2024GXJK877) and “Digital Empowerment for High-Quality Development in Guangdong: An Innovative Study on Cultivating Interdisciplinary Foreign Language Talents” (2023GXJK691).
Abstract: The integration of Communicative Language Testing (CLT) principles with AI-driven automated assessment poses a significant challenge in professional language testing. Addressing this issue within the specific context of Civil Aviation Ground Service English, this study explores pathways for their logical reconciliation. Through conceptual analysis and theoretical deduction, with a focus on human-AI interaction scenarios, we demonstrate that the synergy between CLT and AI stems from a shared focus on competency measurement. Key findings reveal that: (1) standardized competency dimensions in CLT can be operationalized into data-processable formats for AI; (2) within professional contexts, AI algorithms can be tailored using authentic service corpora to meet CLT’s demand for situational authenticity; and (3) a division of labor based on competency level, in which AI handles standardized scoring of lower-order competencies and human-AI collaboration assesses higher-order competencies, effectively resolves the tension between CLT’s dynamic communication and AI’s static algorithms. Ultimately, the study constructs a three-dimensional integration framework encompassing “professional register,” “competency level,” and “human-AI division of labor,” offering a theoretical model for CLT-AI integration and a practical blueprint for innovating Civil Aviation Ground Service English assessment.
Abstract: Large Language Models (LLMs) are increasingly applied in the field of code translation. However, existing evaluation methodologies suffer from two major limitations: (1) the high overlap between test data and pretraining corpora, which introduces significant bias in performance evaluation; and (2) mainstream metrics focus primarily on surface-level accuracy, failing to uncover the underlying factors that constrain model capabilities. To address these issues, this paper presents TCode (Translation-Oriented Code Evaluation benchmark), a complexity-controllable, contamination-free benchmark dataset for code translation, alongside a dedicated static feature sensitivity evaluation framework. The dataset is carefully designed to control complexity along multiple dimensions, including syntactic nesting and expression intricacy, enabling both broad coverage and fine-grained differentiation of sample difficulty. This design supports precise evaluation of model capabilities across a wide spectrum of translation challenges. The proposed evaluation framework introduces a correlation-driven analysis mechanism based on static program features, enabling predictive modeling of translation success from two perspectives: Code Form Complexity (e.g., code length and character density) and Semantic Modeling Complexity (e.g., syntactic depth, control-flow nesting, and type system complexity). Empirical evaluations across representative LLMs, including Qwen2.5-72B and Llama3.3-70B, demonstrate that although state-of-the-art models achieve over 80% compilation success on simple samples, their accuracy drops sharply to below 40% on complex cases. Further correlation analysis indicates that Semantic Modeling Complexity alone is correlated with up to 60% of the variance in translation success, with static program features exhibiting nonlinear threshold effects that highlight clear capability boundaries. This study departs from the traditional accuracy-centric evaluation paradigm and, for the first time, systematically characterizes the capabilities of large language models in translation tasks through the lens of static program features. The findings provide actionable insights for model refinement and training strategy development.
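The correlation-driven analysis described in the TCode abstract can be illustrated with a minimal sketch. This is not the paper's framework: the feature extractors, the sample snippets, and the pass/fail outcomes below are all hypothetical, and brace counting is only a crude stand-in for control-flow nesting (it would miscount braces inside string literals or comments). The sketch computes one Code Form feature and one Semantic Modeling feature per sample, then reports the Pearson correlation between nesting depth and translation success; the squared coefficient is the share of variance the feature is correlated with.

```python
import math


def code_length(src: str) -> int:
    """Code Form Complexity proxy (assumed): count of non-whitespace characters."""
    return sum(1 for ch in src if not ch.isspace())


def max_nesting_depth(src: str) -> int:
    """Semantic Modeling Complexity proxy (assumed): deepest brace nesting."""
    depth = deepest = 0
    for ch in src:
        if ch == "{":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == "}":
            depth = max(depth - 1, 0)
    return deepest


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical samples: (source snippet, 1 if the translation compiled, else 0).
samples = [
    ("int f() { return 1; }", 1),
    ("int g(int x) { if (x) { return x; } return 0; }", 1),
    ("void h() { for (;;) { if (a) { while (b) { c(); } } } }", 0),
    ("void k() { if (a) { if (b) { if (c) { if (d) { e(); } } } } }", 0),
]

depths = [max_nesting_depth(src) for src, _ in samples]
lengths = [code_length(src) for src, _ in samples]
outcomes = [ok for _, ok in samples]

r = pearson_r(depths, outcomes)
print(f"nesting depth vs. success: r = {r:.3f}, r^2 = {r * r:.3f}")
```

A linear coefficient like this cannot capture the nonlinear threshold effects the abstract mentions; detecting those would need binned success rates or a different model.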
Abstract: In this paper, some basic theories and terms of language testing, such as types of language tests and qualities of good tests, are briefly discussed. Then CET3 (College English Test Band 3) in Beijing is analyzed from three perspectives: design principles, item formats, and test content. Finally, some suggestions are given for improving this test.
Abstract: Speaking, as a productive skill, is a priority for many foreign-language learners. They often evaluate their success in language learning on the basis of how much they feel their spoken proficiency has improved. Consequently, the testing of oral skills can hardly be neglected in college English examinations. The communicative testing theory of the 1970s greatly influenced language testing, especially oral tests. This essay briefly explores the theory of communicative language testing, discusses the methods of the TOEFL oral test and the college English oral test, and proposes ways to further improve the latter.
Abstract: This article mainly discusses CET English writing tests from the perspective of language testing. Writing tests, which are designed to test language proficiency, have direct and integrative characteristics. Writing requires candidates to use language accurately, fluently, and appropriately.
Abstract: This paper investigates whether the CET-4 writing section has a positive effect in terms of Communicative Language Testing. A questionnaire survey was used to collect data. It is concluded that the CET-4 writing section currently has a certain harmful effect on the English learning of non-English-major college students. Further reform and improvement of the scoring criteria and the task design, from the perspective of communicative language testing, should be the focus of future work.
Abstract: Language testing is an important link in language teaching. This paper elaborates in detail on two important criteria of language tests, reliability and validity, in order to help language teachers set and evaluate tests more scientifically.
Abstract: This paper presents some points of view on communicative language ability and on issues in developing an effective communicative language test. Different considerations are given at different stages, in terms of both validity and reliability. All of the processes take into account the characteristics of communicative language tests and the special demands of communicative language ability. Unlike other kinds of tests, a communicative language test pays much more attention to context, precision of measurement, and criteria.
Abstract: Speaking skill is an important component of students’ communicative competence. Testing this skill is indispensable because it provides useful feedback and motivates students. Where direct face-to-face oral testing is impossible because a large number of candidates are involved, a language lab or computer can be used to carry out various testing tasks to evaluate students’ oral communicative competence.
Funding: Partially supported by the National Institutes of Health’s National Center for Complementary and Integrative Health under grant number R01AT009457, the National Institute on Aging under grant number R01AG078154, and the National Cancer Institute under grant number R01CA287413.
Abstract: Background: Large language models (LLMs) have shown promise in educational applications, but their performance on high-stakes admissions tests, such as the Dental Admission Test (DAT), remains unclear. Understanding the capabilities and limitations of these models is critical for determining their suitability in test preparation. Methods: This study evaluated the ability of 16 LLMs, including general-purpose models (e.g., GPT-3.5, GPT-4, GPT-4o, GPT-o1, Google’s Bard, mistral-large, and Claude), domain-specific fine-tuned models (e.g., DentalGPT, MedGPT, and BioGPT), and open-source models (e.g., Llama2-7B, Llama2-13B, Llama2-70B, Llama3-8B, and Llama3-70B), to answer questions from a sample DAT. Quantitative analysis was performed to assess model accuracy in different sections, and qualitative thematic analysis by subject matter experts examined specific challenges encountered by the models. Results: GPT-4o and GPT-o1 outperformed others in text-based questions assessing knowledge and comprehension, with GPT-o1 achieving perfect scores in the natural sciences (NS) and reading comprehension (RC) sections. Open-source models such as Llama3-70B also performed competitively in RC tasks. However, all models, including GPT-4o, struggled substantially with perceptual ability (PA) items, highlighting a persistent limitation in handling image-based tasks requiring visual-spatial reasoning. Fine-tuned medical models (e.g., DentalGPT, MedGPT, and BioGPT) demonstrated moderate success in text-based tasks but underperformed in areas requiring critical thinking and reasoning. Thematic analysis identified key challenges, including difficulties with stepwise problem-solving, transferring knowledge, comprehending intricate questions, and hallucinations, particularly on advanced items. Conclusions: While LLMs show potential for reinforcing factual knowledge and supporting learners, their limitations in handling higher-order cognitive tasks and image-based reasoning underscore the need for judicious integration with instructor-led guidance and targeted practice. This study provides valuable insights into the capabilities and limitations of current LLMs in preparing prospective dental students and highlights pathways for future innovations to improve performance across all cognitive skills assessed by the DAT.
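The per-section quantitative analysis mentioned in the DAT abstract amounts to grouping graded answers by model and test section. The sketch below shows one way to do that tabulation; the record layout, the model labels (`model_a`, `model_b`), and the graded answers are hypothetical placeholders and do not reproduce any result from the study. Only the section codes NS, RC, and PA come from the abstract.

```python
from collections import defaultdict


def accuracy_by_section(records):
    """Return {(model, section): accuracy} from (model, section, is_correct) records."""
    counts = defaultdict(lambda: [0, 0])  # (model, section) -> [correct, total]
    for model, section, is_correct in records:
        tally = counts[(model, section)]
        tally[0] += int(is_correct)
        tally[1] += 1
    return {key: correct / total for key, (correct, total) in counts.items()}


# Hypothetical graded answers, invented purely for illustration.
records = [
    ("model_a", "NS", True),
    ("model_a", "NS", True),
    ("model_a", "RC", True),
    ("model_a", "PA", False),
    ("model_a", "PA", True),
    ("model_b", "NS", True),
    ("model_b", "NS", False),
    ("model_b", "RC", True),
    ("model_b", "PA", False),
    ("model_b", "PA", False),
]

scores = accuracy_by_section(records)
for (model, section), acc in sorted(scores.items()):
    print(f"{model} {section}: {acc:.0%}")
```

Keeping the key as a `(model, section)` pair makes it easy to compare one section across models, which is the comparison the abstract's results are organized around.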
Funding: Aeronautical Science Foundation of China (20095551025).
Abstract: Because it directly expresses the patterns and ideas of an individual application domain, a domain-specific modeling language (DSML) is more and more frequently used to build models instead of a combination of one or more general constructs. Based on the profile mechanism of Unified Modeling Language (UML) 2.2, a DSML is presented for modeling simulation testing systems of avionic software (STSAS). To define the syntax, semantics, and notions of the DSML, the domain model of the STSAS is given first, from which the domain concepts and the relationships among them are generalized; the domain model is then mapped into a UML meta-model, named the UML-STSAS profile. Taking a flight control system (FCS) as the system under test (SUT), we design the corresponding STSAS. The results indicate that extending UML to the simulation testing domain can model STSAS effectively and precisely.
Abstract: Testing speaking ability offers plenty of scope for meeting the criteria for communicative testing. The article describes the model of CLA, analyzes the basic factors involved in speaking competence, discusses what constitutes a communicative language test of speaking, and suggests some factors that should be taken into consideration when designing such a test.
Abstract: The 37th Language Testing Research Colloquium (LTRC 2015) was held at the Eaton Chelsea Hotel in Toronto, Canada, on March 16-20, 2015. The first two days, March 16-17, were pre-conference workshop days, and March 18-20 were the three main conference days. More than 300 participants from 27 countries and regions joined the conference. The top numbers of the
Abstract: This paper offers some preliminary reflections on language testing from the perspective of professional training for test writers. Its aims are to equip test writers with basic testing theories, to make tests valid and reliable, to help improve educational reform and language teaching, and to ensure that tests fully conform to the standards set by the syllabus.
Abstract: This paper focuses primarily on the major developments in language testing since the 1980s. Critical to testing is the concept of language proficiency or ability, which consists of two components: language competence (or language knowledge) and strategic competence. A communicative approach to testing is needed to keep pace with the new teaching concept and to measure a learner’s communicative competence. Communicative language tests are intended to measure people’s ability to use language communicatively in a variety of real-life situations.
Abstract: Starting from a brief presentation of the various views of language, this article elaborates on the influence of these views on language teaching and language testing. This provides background knowledge on language teaching and language testing.
Abstract: Language testing and language teaching are said to be inseparably interconnected and to act together. Language testing has been continually improving itself in response to the requirements of language teaching, so as to improve testees’ language ability. From the prescientific movement to the newest communicative movement, the four main approaches to testing each have their pros and cons in achieving reliability and validity. From them, we can see that the communicative approach is bound to become the milestone of language