Funding: Supported by the National Key R&D Program of China under Grant Nos. 2022YFA1003800 and 2022YFA1003703; the National Natural Science Foundation of China under Grant Nos. 12531011, 12231011, and 12471255; the Natural Science Foundation of Shanghai under Grant No. 23ZR1419400; the Fundamental Research Funds for the Central Universities under Grant No. 63253110; the China Postdoctoral Science Foundation General Funding Program under Grant No. 2025M773079; and the 2025 Annual Planning Project of the Commerce Statistical Society of China under Grant No. 2025STY115.
Abstract: Model checking evaluates whether a statistical model faithfully captures the underlying data-generating process. Classical tests, such as local-smoothing and empirical-process methods, break down in high dimensions. More recent approaches compare out-of-sample predictiveness using flexible machine-learning fitting procedures, yielding algorithm-agnostic tests, yet they require large labeled samples. The authors introduce a prediction-powered, semi-supervised framework that: 1) imputes responses for the unlabeled data via a pretrained model; 2) corrects the resulting imputation bias with a rectifier calibrated on the labeled data; and 3) adaptively balances the two components through a data-driven power-tuning parameter. Building on algorithm-agnostic out-of-sample predictiveness comparisons, the proposed method integrates unlabeled information to enhance power. Theoretical analyses and numerical results demonstrate that the proposed test controls the Type I error and substantially improves power over fully supervised counterparts, even when the imputation model is misspecified.
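The three-step construction described in the abstract parallels prediction-powered inference: an imputed predictiveness comparison on the unlabeled sample, a labeled-sample rectifier for the imputation bias, and a tuning weight that trades the two off. The sketch below is only an illustration of that idea, not the authors' exact procedure; the squared-error loss, the function name pp_predictiveness_gap, the argument names, and the specific variance-based form of the power-tuning parameter lam are all assumptions made for the example.

```python
import numpy as np

def pp_predictiveness_gap(y_lab, f_null_lab, f_ml_lab, g_lab,
                          f_null_unlab, f_ml_unlab, g_unlab):
    """Illustrative prediction-powered estimate of a predictiveness gap.

    y_lab        : labeled responses
    f_null_lab   : null (parametric) model predictions on labeled covariates
    f_ml_lab     : flexible ML predictions on labeled covariates
    g_lab        : pretrained imputation-model predictions on labeled covariates
    *_unlab      : the corresponding predictions on the unlabeled covariates
    """
    # Predictiveness gap under squared-error loss (null minus ML):
    # positive values indicate the null model fits worse.
    def gap(y, f_null, f_ml):
        return (y - f_null) ** 2 - (y - f_ml) ** 2

    gap_unlab = gap(g_unlab, f_null_unlab, f_ml_unlab)   # imputed, unlabeled part
    gap_lab_true = gap(y_lab, f_null_lab, f_ml_lab)      # labeled part, true responses
    gap_lab_imp = gap(g_lab, f_null_lab, f_ml_lab)       # labeled part, imputed responses

    # Data-driven power-tuning parameter: one plausible plug-in choice that
    # downweights the imputed component when it correlates poorly with the
    # labeled-sample gap (an assumption, not the paper's exact formula).
    lam = np.clip(np.cov(gap_lab_true, gap_lab_imp)[0, 1]
                  / max(np.var(gap_lab_imp), 1e-12), 0.0, 1.0)

    # Power-tuned estimator: imputed component plus labeled-sample rectifier.
    theta_hat = lam * gap_unlab.mean() + (gap_lab_true - lam * gap_lab_imp).mean()

    # Standard error treating the labeled and unlabeled samples as independent.
    se = np.sqrt(lam ** 2 * gap_unlab.var(ddof=1) / len(gap_unlab)
                 + (gap_lab_true - lam * gap_lab_imp).var(ddof=1) / len(y_lab))
    return theta_hat, se
```

In this sketch, the null model would be rejected at level alpha when theta_hat / se exceeds the corresponding standard normal quantile; the paper's contribution concerns when such a semi-supervised comparison controls the Type I error and gains power from the unlabeled sample, even if the imputation model g is misspecified.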