This study investigates the role of generative large language models (GLLMs) in supporting complex selection and evaluation tasks within the academic paper review process. Using empirical data from management journal submissions, we compared the performance of six leading GLLMs (Claude 3.5, GPT-4o, Gemini 2.5, DeepSeek-R3, Moonshot-V1 (Kimi), and Qwen-Long) against human editors and reviewers. The results show that, at the editorial screening stage, GLLMs can help editors identify manuscripts with low publication potential, with aggregated model scores closely matching human editorial decisions. At the review stage, comments generated by the union of any three of the six GLLMs can cover over 61% of the issues raised by human reviewers and are rated as superior by management professors. These findings demonstrate that GLLMs can complement human judgment in multi-stage, knowledge-intensive decision processes, improving both the efficiency and quality of academic paper reviews. The study expands the application boundaries of generative AI in management research evaluation and offers practical insights for integrating GLLMs into scholarly review workflows.
Funding: Supported by the National Natural Science Foundation of China [grant number 72372102] for the project "Research on the coordination mechanism of digital transformation strategies between leading firms and follower firms in business ecosystem".
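To make the "any three of six models cover over 61% of issues" statistic concrete, the following minimal Python sketch shows one way such coverage could be computed. All issue sets and per-model coverage values below are invented placeholders, and the matching procedure between model comments and human-raised issues is an assumption; the abstract does not specify how comments were mapped to issues.

```python
from itertools import combinations

# Hypothetical data: each model's comments mapped to the set of
# human-reviewer issue IDs they cover (placeholder values only).
model_issue_coverage = {
    "Claude 3.5":  {1, 2, 4, 7},
    "GPT-4o":      {1, 3, 5, 8},
    "Gemini 2.5":  {2, 3, 6},
    "DeepSeek-R3": {1, 4, 6, 9},
    "Moonshot-V1": {2, 5, 7, 10},
    "Qwen-Long":   {3, 6, 8, 9},
}
human_issues = set(range(1, 11))  # ten issues raised by human reviewers

# For every 3-model subset of the six, compute the fraction of
# human-raised issues covered by the union of that subset's comments.
coverages = []
for trio in combinations(model_issue_coverage, 3):
    union = set().union(*(model_issue_coverage[m] for m in trio))
    coverages.append(len(union & human_issues) / len(human_issues))

# A claim that "any three" models exceed a threshold corresponds to the
# minimum coverage across all C(6,3) = 20 subsets.
print(f"minimum coverage over {len(coverages)} trios: {min(coverages):.0%}")
```

Under this reading, the paper's 61% figure would be a lower bound over all twenty three-model combinations rather than an average, which is the stronger of the two interpretations consistent with the abstract's wording.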