Abstract
With the rapid growth in the number of academic theses, traditional expert-based manual review is increasingly challenged by inefficiency, subjectivity, and inconsistency, making it inadequate for large-scale, high-quality evaluation. This paper proposes and implements an automatic degree-thesis quality assessment system based on machine learning. The system is built around a comprehensive multi-dimensional evaluation index system covering structure, content, language, format, and standardization, from which more than 40 features are extracted. A Stacking ensemble learning framework integrates Random Forest, XGBoost, and LightGBM models, with the SMOTE algorithm used to address class imbalance. The entire workflow is automated end to end on a Python technology stack, including PDF text parsing (PyPDF2, pdfplumber), preprocessing (jieba), feature engineering, model training and evaluation, and API services (Flask). Experimental results show that the system achieves an evaluation accuracy above 85% on the test set and effectively distinguishes theses of different quality levels. Beyond offering a technically feasible way to improve review efficiency and reduce labor costs, the study provides new ideas and tool support for educational quality assurance and research management.
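The abstract names SMOTE as the remedy for class imbalance but gives no details. As a rough illustration of the underlying idea only, the sketch below generates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbours. It is not taken from the paper: the function name `smote_oversample`, the toy feature values, and the choice of `k` are all assumptions made for this example, and a real system would more likely rely on a library implementation.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    minority: list of equal-length numeric feature lists (hypothetical thesis features).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base sample.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbor = minority[rng.choice(others[:k])]
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Toy data (assumed): 3 minority-class theses vs. 8 majority-class ones,
# each described by two made-up feature values.
minority = [[0.20, 0.30], [0.25, 0.35], [0.30, 0.28]]
majority_count = 8
new_points = smote_oversample(minority, majority_count - len(minority), k=2)
balanced_minority = minority + new_points
print(len(balanced_minority))
```

In practice, oversampling of this kind is applied to the training split only, so that synthetic samples do not leak into the test set used to report accuracy.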
Authors
吴怡
李灿
WU Yi; LI Can (School of Artificial Intelligence and Information Technology, Nanjing University of Chinese Medicine, Nanjing 210023, China)
Source
《无线互联科技》
2025, No. 22, pp. 52-56 (5 pages)
Wireless Internet Science and Technology
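Funding
Education and Teaching Research Project of Nanjing University of Chinese Medicine; project title: Construction and Optimization of a Quality Assurance System for University Degree Theses; project No. NZYJY2024-Z-18.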
Keywords
thesis quality assessment
machine learning
feature engineering
ensemble learning
natural language processing