This study examines the reliability and validity of AI-generated scoring for continuation writing tasks. By comparing GPT-4 with eight experienced human raters across 21 student responses, it evaluates the AI's consistency, severity, and alignment with human scoring criteria. Results show that the AI exhibits high self-consistency and adapts effectively to different scoring roles (e.g., teacher vs. high-stakes rater). However, AI scores were more lenient than those of the human raters and reflected divergent evaluation focuses: the AI prioritized narrative coherence and emotional depth, while teachers emphasized linguistic accuracy and richness of detail. The findings suggest AI's potential as a supplementary assessment tool offering rapid, holistic feedback, but highlight the need for further calibration to align with educational standards. Implications include exploring hybrid evaluation models that leverage the strengths of both AI and human raters to achieve more equitable, efficient, and pedagogically meaningful writing assessments.
Artificial Intelligence (AI) constitutes a rapidly evolving set of technologies that offer significant economic, environmental, and societal benefits. However, the application of AI systems may also pose considerable risks and inflict harm, whether material or immaterial (including physical, psychological, societal, or economic harm), to public interests and fundamental rights protected under Union law.