Missing data analysis in cognitive diagnostic models: Random forest threshold imputation method (cited by: 4)
Abstract Handling missing data in cognitive diagnostic assessment is a research topic of great concern to both theorists and practitioners. Because random forest imputation (RFI) does not rely on assumptions about the missing data mechanism, this study improves the existing RFI method and proposes a new method that uses a person-fit index (RCI) to determine the imputation threshold: random forest threshold imputation (RFTI). Simulation studies show that RFTI achieves clearly higher imputation accuracy than RFI. Compared with RFI and the EM method, RFTI shows clear advantages in the pattern correct classification rate and the marginal correct classification rate of examinees' attributes, and the advantage is more pronounced under missing-not-at-random and mixed missing mechanisms and when the proportion of missing data is high. For item parameter estimation, however, RFTI has no advantage over the EM method.

In recent years, interest in cognitive diagnostic assessments (CDAs), a new form of test, has increased drastically. Because of the specific design of these tests, missing data is an inevitable problem in CDAs. Proper handling of missing data in CDAs is important for providing accurate diagnostic feedback to students and teachers. With the use of machine learning in education, relevant advances have been made in missing data imputation. Research has shown that machine learning techniques have more desirable features for missing data imputation than traditional approaches. The random forest algorithm has been extended into the random forest imputation (RFI) method for handling missing data in CDAs. The method takes the characteristics of the data into consideration rather than assuming a particular missing data mechanism. RFI is a new non-parametric method that makes full use of the available response information and the characteristics of response patterns to impute missing data. Building on the strengths of RFI in classification and prediction and on its non-reliance on the type of missing data mechanism, we improved it and proposed the new random forest threshold imputation (RFTI) method. RFTI can be used to impute missing responses in the widely used DINA (Deterministic Inputs, Noisy "And" Gate) model. This research proposed applying the Response Conformity Index (RCI) in missing data imputation to set the imputation threshold, and developing a treatment of missing responses in CDAs that does not rely entirely on imputation.

Two simulation studies were conducted to compare the performance of the proposed method with that of traditional methods. Study 1 began by introducing the theoretical background and algorithmic implementation of RFTI. RFTI and RFI were then compared in terms of imputation accuracy for data with different proportions of missingness (10%, 20%, 30%, 40%, 50%) and different missing data mechanisms (MIXED, MNAR, MAR, MCAR). The purpose was to confirm the necessity of including RCI during imputation. Study 2 investigated the performance of RFTI, RFI, and the EM algorithm in handling missing data under different conditions. The manipulated design factors were identical to those in Study 1. We evaluated RFTI in terms of its accuracy in estimating examinees' attributes and the item parameters. We also compared RFTI against RFI and the traditionally better-performing EM algorithm under the various design conditions to explore the advantages of RFTI and the conditions under which it should be used.

Results of Study 1 showed that RFTI, compared with RFI, improved accuracy when the imputation threshold was one. Across the design conditions, the imputation rate and accuracy of RFTI were also better. Study 2 showed that RFTI outperformed the other methods (RFI, EM algorithm) in accurately assessing the attribute pattern and the marginal attributes. This advantage was affected by the missing data mechanism and the proportion of missing data. Notably, RFTI performed particularly well relative to the other methods for mixed-mechanism or MNAR data, and when the proportion of missing data was higher than 30%. However, RFTI was no better than the other methods in the accuracy of item parameter estimates. In most conditions, the EM algorithm provided the most accurate parameter estimates.

In sum, we propose a method for imputing missing data in CDAs by applying machine learning methods within measurement models. The advantage of the new method is confirmed by its accurate assessment of the attribute pattern and marginal attributes under the DINA model. Theoretically, the current study provides a missing data imputation approach with fewer assumptions, which extends the traditional methods for imputing missing data within the CDA framework. Moreover, we investigate how to estimate students' attribute patterns accurately from responses to only a few items. This sheds light on imputing data that are missing because of particular designs in assessment or teaching.
Authors YOU Xiaofeng (游晓锋); YANG Jianqin (杨建芹); QIN Chunying (秦春影); LIU Hongyun (刘红云) (School of Mathematics and Information Science, Nanchang Normal University, Nanchang 330022, China; Beijing Key Laboratory of Applied Experimental Psychology, Beijing Normal University, Beijing 100875, China; Faculty of Psychology, Beijing Normal University, Beijing 100875, China)
Source Acta Psychologica Sinica (《心理学报》; indexed in CSSCI, CSCD, and the Peking University Core Journals list), 2023, No. 7, pp. 1192-1206 (15 pages)
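Funding Key Science and Technology Project of the Jiangxi Provincial Department of Education (GJJ212601); Nanchang Key Laboratory of Intelligent Technology for Educational Big Data (2020-NCZDSY-012); National Natural Science Foundation of China (32071091).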
Keywords missing data; cognitive diagnostic assessment; random forest threshold imputation; random forest imputation; expectation-maximization (EM) algorithm
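
As an illustration of two quantities the abstract names, the sketch below shows the standard DINA item response function and the two classification-accuracy measures used to evaluate attribute recovery (pattern-level and per-attribute correct classification rates). It is a minimal sketch, not the authors' code: the function names, the Q-matrix row, and the slip and guess values are hypothetical, and it does not implement RFTI or the RCI, whose definitions the abstract does not give.

```python
def dina_prob(alpha, q, slip, guess):
    """Probability of a correct response under the DINA model.

    alpha: examinee's binary attribute pattern, e.g. (1, 0, 1)
    q:     binary row of the Q-matrix for the item (required attributes)
    slip, guess: item parameters (hypothetical values in the demo below)
    """
    # eta is True only if every attribute the item requires is mastered.
    eta = all(a >= r for a, r in zip(alpha, q))
    return 1.0 - slip if eta else guess


def pattern_ccr(true_patterns, est_patterns):
    """Share of examinees whose entire attribute pattern is recovered."""
    hits = sum(t == e for t, e in zip(true_patterns, est_patterns))
    return hits / len(true_patterns)


def attribute_ccr(true_patterns, est_patterns):
    """Share of examinees classified correctly, attribute by attribute."""
    n = len(true_patterns)
    k = len(true_patterns[0])
    return [
        sum(t[j] == e[j] for t, e in zip(true_patterns, est_patterns)) / n
        for j in range(k)
    ]


# Hypothetical demo data: four examinees, three attributes.
true_patterns = [(1, 1, 0), (0, 1, 0), (1, 0, 1), (0, 0, 0)]
est_patterns = [(1, 1, 0), (0, 1, 1), (1, 0, 1), (1, 0, 0)]
```

Under these made-up values, an item requiring the first two attributes with slip 0.1 and guess 0.2 gives `dina_prob((1, 1, 0), (1, 1, 0), 0.1, 0.2)` a success probability of 1 minus the slip, while an examinee lacking the second attribute falls back to the guess value. Any imputation method, RFTI included, would be judged by comparing the estimated patterns it leads to against the true ones with measures like `pattern_ccr` and `attribute_ccr`.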
