Plausible-Value Methods for Ability Estimation in Matrix Sampling Designs

The Method of Plausible Value for Group-Level Estimation with Multiple Matrix Sampling


Abstract: Matrix sampling is one of the most efficient data collection methods in large-scale educational assessment. Using simulated data, this study examined the precision and robustness of plausible values (PV) and of the traditional MLE, WLE, and EAP methods in estimating population parameters of student ability under a balanced incomplete block (BIB) matrix sampling design. The results showed that PV gave the most precise and robust estimates of the population mean and standard deviation; EAP tended to underestimate and MLE and WLE tended to overestimate, with precision and robustness far below those of PV. Total sample size had little effect on the estimates, whereas the number of items in each booklet had a considerably larger effect.

To broaden subject-area coverage and reduce individual testing time, large-scale educational assessments often use multiple matrix sampling. Because scores are reported to the government and the public, attention centers on population statistics, so reducing error in population estimates becomes important. Researchers therefore use plausible values (PV) to account for uncertainty about the latent traits. This simulation study compared PV with traditional methods (MLE, WLE, and EAP) for group-level estimation (mean and standard deviation) under different matrix sampling conditions. The results can provide evidence to support the reporting of student performance in large-scale assessments.

Student response data were simulated for tests of different lengths and samples of different sizes. The independent variables were the number of items in each form (three levels: 8, 16, and 24) and the sample size (three levels: 490, 980, and 4900). Three control variables were held constant: the total number of items (56 dichotomous items), the distribution and range of item difficulty (-3, 3), and the distribution and range of ability (-3, 3) (Wu, 2005). A balanced incomplete block (BIB) design was used for sampling, and the Rasch model was used in the data analysis. EAP, MLE, WLE, and five PVs were computed for each student, and the sample mean and standard deviation were computed for each of these sets. Two statistical indices, ABS and RMSD, were used to compare the accuracy and robustness of PV with those of the traditional estimation methods.
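The per-student estimators compared in the study can be sketched for a single Rasch-scored booklet as follows. This is a minimal illustration, not the authors' implementation: operational programs such as PISA draw plausible values from a posterior whose prior is an estimated population (conditioning) model, whereas this sketch assumes a fixed N(0, 1) prior. The item difficulties `DIFFS`, the grid `GRID`, and all function names are hypothetical, and the ABS and RMSD formulas are assumed definitions because the abstract does not spell them out. MLE uses damped Newton-Raphson and is undefined for zero or perfect scores; WLE solves Warm's weighted score equation by bisection; EAP is the posterior mean on the grid; each plausible value is a random draw from that same posterior.

```python
import math
import random

# Hypothetical booklet of 8 Rasch items with difficulties spread evenly over (-3, 3);
# the study's real item parameters are not given in the abstract.
DIFFS = [-2.625 + 0.75 * k for k in range(8)]

# Hypothetical quadrature grid for theta, from -6 to 6 in steps of 0.05.
GRID = [-6.0 + 0.05 * k for k in range(241)]


def p_correct(theta, b):
    """Rasch probability of a correct response to an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(b - theta))


def loglik(theta, x, diffs):
    """Rasch log-likelihood of a 0/1 response vector x (numerically stable form)."""
    return sum(
        -math.log1p(math.exp(b - theta)) if xi else -math.log1p(math.exp(theta - b))
        for xi, b in zip(x, diffs)
    )


def score_info(theta, x, diffs):
    """Score, test information I, and J = dI/dtheta for the Rasch model."""
    ps = [p_correct(theta, b) for b in diffs]
    score = sum(x) - sum(ps)
    info = sum(p * (1 - p) for p in ps)
    j = sum(p * (1 - p) * (1 - 2 * p) for p in ps)
    return score, info, j


def mle(x, diffs, iters=100):
    """MLE by damped Newton-Raphson; returns None for zero or perfect scores."""
    if sum(x) in (0, len(x)):
        return None  # the likelihood has no finite maximum
    theta = 0.0
    for _it in range(iters):
        score, info, _ = score_info(theta, x, diffs)
        step = max(-1.0, min(1.0, score / info))  # clip steps to at most 1 logit
        theta += step
        if abs(step) < 1e-10:
            break
    return theta


def wle(x, diffs, lo=-10.0, hi=10.0):
    """Warm's WLE: root of score + J / (2 * info), found by bisection."""
    def weighted_score(theta):
        score, info, j = score_info(theta, x, diffs)
        return score + j / (2 * info)

    for _it in range(100):
        mid = 0.5 * (lo + hi)
        if weighted_score(mid) > 0:
            lo = mid  # weighted score is decreasing, so the root lies above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def posterior(x, diffs, grid=GRID):
    """Normalized posterior weights on the grid under an assumed N(0, 1) prior."""
    logpost = [loglik(t, x, diffs) - 0.5 * t * t for t in grid]
    top = max(logpost)
    w = [math.exp(lp - top) for lp in logpost]  # subtract the max to avoid underflow
    total = sum(w)
    return [wi / total for wi in w]


def eap(x, diffs, grid=GRID):
    """EAP estimate (posterior mean) and posterior standard deviation."""
    w = posterior(x, diffs, grid)
    mean = sum(wi * t for wi, t in zip(w, grid))
    var = sum(wi * (t - mean) ** 2 for wi, t in zip(w, grid))
    return mean, math.sqrt(var)


def plausible_values(x, diffs, n_pv=5, rng=None, grid=GRID):
    """Draw n_pv plausible values at random from the discretized posterior."""
    rng = rng or random.Random(0)
    return rng.choices(grid, weights=posterior(x, diffs, grid), k=n_pv)


def abs_bias(estimates, true_value):
    """Assumed ABS: absolute difference between the mean estimate and the truth."""
    return abs(sum(estimates) / len(estimates) - true_value)


def rmsd(estimates, true_value):
    """Root mean squared deviation of replicated estimates from the truth."""
    return math.sqrt(sum((e - true_value) ** 2 for e in estimates) / len(estimates))
```

With these hypothetical items, a student who answers the six easiest items correctly gets estimates ordered MLE > WLE > EAP > 0: WLE pulls the MLE slightly inward, while EAP shrinks it strongly toward the prior mean. For population statistics, the mean and SD would be computed separately within each of the five PV sets and the five results averaged.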
The results indicated that PV was the most accurate and robust method, with estimates close to the true values; even under unfavorable conditions, when the total number of examinees or the number of items in each form was especially low, PV still provided good estimates. EAP, MLE, and WLE estimated the population mean about as well as PV, but they were biased when used to estimate the population standard deviation: EAP underestimated the population variance, while MLE and WLE both overestimated it, even with the largest numbers of examinees and items. This bias did not diminish as the sample size increased, but it decreased as the number of items increased, indicating that adding items does more than adding examinees to improve the precision and stability of the estimates. The current study considered only the simplest matrix sampling design and the Rasch model; future studies should consider more complex designs as well as two- or three-parameter models. Furthermore, sample size and the number of items are two basic factors influencing population parameter estimation; other factors, such as test length, item difficulty, and item type, should also be controlled before drawing further inferences.
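The direction of these biases follows from the law of total variance. The derivation below is a standard sketch rather than one taken from the paper; it assumes that the IRT model and the prior used to form the posterior p(θ | x) are both correct, and it treats the MLE as θ̂ = θ + e with E[e | θ] ≈ 0 and Var(e | θ) ≈ I(θ)^{-1}, where I(θ) is the test information. WLE behaves like the MLE here, since its own sampling error also adds to the observed variance.

```latex
% x = a student's response vector, I(theta) = test information
\begin{aligned}
\operatorname{Var}(\theta)
  &= \operatorname{Var}\!\big(\mathrm{E}[\theta \mid \mathbf{x}]\big)
   + \mathrm{E}\big[\operatorname{Var}(\theta \mid \mathbf{x})\big] \\
\operatorname{Var}(\hat\theta_{\mathrm{EAP}})
  &= \operatorname{Var}(\theta) - \mathrm{E}\big[\operatorname{Var}(\theta \mid \mathbf{x})\big]
   \;\le\; \operatorname{Var}(\theta) \\
\operatorname{Var}(\hat\theta_{\mathrm{MLE}})
  &\approx \operatorname{Var}(\theta) + \mathrm{E}\big[I(\theta)^{-1}\big]
   \;\ge\; \operatorname{Var}(\theta) \\
\operatorname{Var}(\tilde\theta_{\mathrm{PV}})
  &= \operatorname{Var}\!\big(\mathrm{E}[\theta \mid \mathbf{x}]\big)
   + \mathrm{E}\big[\operatorname{Var}(\theta \mid \mathbf{x})\big]
   = \operatorname{Var}(\theta)
\end{aligned}
```

Both gap terms, E[Var(θ | x)] and E[I(θ)^{-1}], are per-student measurement-error quantities: they shrink as each booklet contains more items but not as more students are sampled, which is consistent with the finding that the number of items mattered more than the sample size.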

Authors: Huang Huijing, Xin Tao, Li Zhen

Institution: Institute of Developmental Psychology, Beijing Normal University

Source: Journal of Psychological Science (《心理科学》), 2012, No. 5, pp. 1233-1239 (7 pages); indexed in CSSCI, CSCD, and the Peking University core journal list

Funding: Supported by the Program for New Century Excellent Talents in University, Ministry of Education (NCET-07-0097)

Keywords: large-scale assessment; multiple matrix sampling; plausible value

Classification number: B841 [Philosophy and Religion / Basic Psychology]
