Abstract: This paper presents Bayes estimation and empirical Bayes estimation of causal effects in a counterfactual model. It also gives three kinds of prior distributions for the replaceability assumptions. The experiment shows that empirical Bayes estimation performs better than the other estimators when it is not known which assumption is true.
Abstract: Causal inference is a powerful modeling tool for explanatory analysis, which might enable current machine learning to become explainable. How to marry causal inference with machine learning to develop explainable artificial intelligence (XAI) algorithms is one of the key steps toward artificial intelligence 2.0. With the aim of bringing knowledge of causal inference to scholars of machine learning and artificial intelligence, we invited researchers working on causal inference to write this survey from different aspects of causal inference. This survey includes the following sections: "Estimating average treatment effect: A brief review and beyond" from Dr. Kun Kuang, "Attribution problems in counterfactual inference" from Prof. Lian Li, "The Yule–Simpson paradox and the surrogate paradox" from Prof. Zhi Geng, "Causal potential theory" from Prof. Lei Xu, "Discovering causal information from observational data" from Prof. Kun Zhang, "Formal argumentation in causal reasoning and explanation" from Profs. Beishui Liao and Huaxin Huang, "Causal inference with complex experiments" from Prof. Peng Ding, "Instrumental variables and negative controls for observational studies" from Prof. Wang Miao, and "Causal inference with interference" from Dr. Zhichao Jiang.
Abstract: Determining the causal effect of special education is a critical topic when making educational policy that focuses on student achievement. However, current special education research faces challenges from persistent selection bias and complex confounding. Bayesian Additive Regression Trees (BART) is employed in this study to provide a flexible estimate of academic performance. Targeted Maximum Likelihood Estimation (TMLE) is also integrated into the BART model, supporting doubly robust estimation of the special education effect. This study extracted survey data from the Early Childhood Longitudinal Study, Kindergarten Class (ECLS-K), to estimate the causal impact of special education status on students' combined mathematics and reading achievement scores. The results of the BART-TMLE model show that children receiving special education services scored approximately 9 points lower on average on combined math and reading than their peers who did not receive these services, even after adjusting for a considerable number of covariates. The estimated negative treatment effect persists after controlling for observed covariates that are closely correlated with the combined test score. The negative effect likely reflects unobserved factors, such as the underlying severity of learning disabilities, parent involvement and other latent traits, which are the actual factors that determine placement in special education, rather than indicating that special education services are ineffective. The achievement gap in academic performance reflects the current observable status of special education. The estimated effect could be improved by future research that incorporates educational domain knowledge, allowing the model to be constructed more accurately.
Funding: Supported by the National Natural Science Foundation of China (No. 11771032 and No. 11971045) and the Natural Science Foundation of Beijing (No. 1202001); supported by the National Natural Science Foundation of China (No. 11871001, No. 12131006 and No. 11971001) and the Fundamental Research Funds for the Central Universities (2019NTSS18).
Abstract: Doubly robust (DR) methods, which employ both the propensity score and outcome models, are widely used to estimate the causal effect of a treatment and generally outperform methods that use only the propensity score or only the outcome model. However, without appropriately chosen working models, DR estimators may lose substantial efficiency. In this paper, based on the augmented inverse probability weighting procedure, we derive a new estimating equation for the causal effect by combining estimating equations. The estimator obtained by solving the new estimating equation remains doubly robust and can improve efficiency when the conditional mean working model is misspecified. We further establish the large-sample properties of the proposed estimator under some regularity conditions. Through simulation experiments and a real data analysis, we illustrate that the proposed method is competitive with existing methods, in line with what the asymptotic theory implies.
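For orientation, the following is a minimal sketch of the standard augmented inverse probability weighting (AIPW) estimator that the abstract takes as its starting point. It is not the paper's combined estimating equation. Everything in it is an assumption made for illustration: the simulated data with one covariate and a true average treatment effect of 2, and the helper names `simulate`, `fit_linear`, `fit_logistic` and `aipw_ate`.

```python
import math
import random


def simulate(n, seed=0):
    """Hypothetical data: one confounder x, true average treatment effect 2."""
    rng = random.Random(seed)
    xs, ts, ys = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        e = 1.0 / (1.0 + math.exp(-0.5 * x))  # true propensity score
        t = 1 if rng.random() < e else 0
        y = 1.0 + 2.0 * t + 1.5 * x + rng.gauss(0.0, 1.0)
        xs.append(x)
        ts.append(t)
        ys.append(y)
    return xs, ts, ys


def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def fit_logistic(xs, ts, iters=25):
    """Newton-Raphson fit of P(T=1 | x) = sigmoid(a + b*x); returns (a, b)."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, t in zip(xs, ts):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            g0 += t - p          # gradient of the log-likelihood
            g1 += (t - p) * x
            h00 += w             # entries of the (negated) Hessian
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def aipw_ate(xs, ts, ys):
    """Standard AIPW estimate of the average treatment effect."""
    a, b = fit_logistic(xs, ts)
    treated = [(x, y) for x, t, y in zip(xs, ts, ys) if t == 1]
    control = [(x, y) for x, t, y in zip(xs, ts, ys) if t == 0]
    a1, b1 = fit_linear(*zip(*treated))  # outcome model, treated arm
    a0, b0 = fit_linear(*zip(*control))  # outcome model, control arm
    total = 0.0
    for x, t, y in zip(xs, ts, ys):
        e = 1.0 / (1.0 + math.exp(-(a + b * x)))
        m1, m0 = a1 + b1 * x, a0 + b0 * x
        # Outcome-model contrast plus inverse-probability-weighted residuals.
        total += (m1 - m0) + t * (y - m1) / e - (1 - t) * (y - m0) / (1.0 - e)
    return total / len(xs)


if __name__ == "__main__":
    xs, ts, ys = simulate(5000)
    print("AIPW estimate of the ATE:", aipw_ate(xs, ts, ys))
```

In this simulation both working models are correctly specified, so the sketch only shows the mechanics of the estimator. The efficiency loss under a misspecified working model, which motivates the paper, would need a different simulation design.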
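Abstract: Objective: To use the CAUSALTRT procedure developed by SAS to estimate causal effects with three classes of estimation methods. Methods: Using the SmokingWeight dataset, with smoking cessation as the treatment variable, weight change as the outcome variable, and other factors as confounders, the average treatment effect (ATE) was estimated by augmented inverse probability weighting (AIPW), and the average treatment effect for the treated (ATT) was estimated by regression adjustment (REGADJ). Results: The ATE and ATT of smoking cessation on weight change were 3.209 (95% CI: 2.232 to 4.187) and 3.276 (95% CI: 2.332 to 4.219), respectively. Conclusion: CAUSALTRT can produce different causal effect estimates, but in application one must consider whether its underlying assumptions are satisfied and heed its caveats.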
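To make the ATE/ATT distinction concrete, here is a minimal regression-adjustment sketch in Python. It is not the CAUSALTRT implementation and does not use the SmokingWeight data. The simulated data (one covariate, with a treatment effect that grows with the covariate so that ATE and ATT differ) and the names `simulate`, `fit_linear` and `regadj_effects` are assumptions for illustration only.

```python
import math
import random


def simulate(n, seed=0):
    """Hypothetical data: the effect 2 + x varies with x, so ATE != ATT."""
    rng = random.Random(seed)
    xs, ts, ys = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        e = 1.0 / (1.0 + math.exp(-x))  # units with larger x are treated more often
        t = 1 if rng.random() < e else 0
        y = 1.0 + x + t * (2.0 + x) + rng.gauss(0.0, 1.0)
        xs.append(x)
        ts.append(t)
        ys.append(y)
    return xs, ts, ys


def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def regadj_effects(xs, ts, ys):
    """Regression adjustment: returns (ATE, ATT) from per-arm outcome models."""
    a1, b1 = fit_linear([x for x, t in zip(xs, ts) if t == 1],
                        [y for y, t in zip(ys, ts) if t == 1])
    a0, b0 = fit_linear([x for x, t in zip(xs, ts) if t == 0],
                        [y for y, t in zip(ys, ts) if t == 0])
    # Predicted individual contrast m1(x) - m0(x) for every unit.
    diffs = [(a1 + b1 * x) - (a0 + b0 * x) for x in xs]
    treated_diffs = [d for d, t in zip(diffs, ts) if t == 1]
    ate = sum(diffs) / len(diffs)                   # average over everyone
    att = sum(treated_diffs) / len(treated_diffs)   # average over the treated only
    return ate, att


if __name__ == "__main__":
    ate, att = regadj_effects(*simulate(5000))
    print("ATE:", ate, "ATT:", att)
```

The two estimands differ only in the population over which the predicted contrast is averaged. In this simulated setting the treated units have larger covariate values on average, so the ATT exceeds the ATE.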
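Abstract: Counterfactual prediction and selection bias are major challenges in causal effect estimation. To effectively characterize the complex confounding distribution of latent covariates while improving the generalization of counterfactual prediction, a Reweighted adversarial Variational AutoEncoder Network (RVAENet) model is proposed for industrial causal effect estimation applications. To debias the confounding distribution, the model borrows the idea of domain adaptation and uses an adversarial learning mechanism to balance the distribution of the representations learned from the latent variables obtained by a variational autoencoder (VAE). On this basis, samples are reweighted with learned sample propensity weights, further narrowing the distribution difference between treatment-group and control-group samples. Experimental results show that, in two scenarios of a real-world industrial dataset, the area under the uplift curve (AUUC) of the proposed model improves on that of TEDVAE (Treatment Effect with Disentangled VAE) by 15.02% and 16.02%, respectively; on public datasets, the proposed model generally achieves the best results for average treatment effect (ATE) and precision in estimation of heterogeneous effect (PEHE).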
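As a reading aid for the two public-dataset metrics named above, the sketch below computes them under their commonly used definitions, which may differ in detail from those in the paper: PEHE as the root mean squared error of individual effect estimates (some authors report the squared version), and ATE error as the absolute difference between the estimated and true average effects. The function names `pehe` and `ate_error` are hypothetical. Both metrics require the true individual effects, which are available only on synthetic or semi-synthetic benchmarks.

```python
import math


def pehe(tau_true, tau_hat):
    """Root mean squared error between true and estimated individual effects."""
    n = len(tau_true)
    return math.sqrt(sum((h - t) ** 2 for t, h in zip(tau_true, tau_hat)) / n)


def ate_error(tau_true, tau_hat):
    """Absolute error of the estimated average treatment effect."""
    n = len(tau_true)
    return abs(sum(tau_hat) / n - sum(tau_true) / n)


if __name__ == "__main__":
    # Individual errors of opposite sign cancel in the ATE but not in PEHE.
    tau_true = [1.0, 1.0]
    tau_hat = [0.0, 2.0]
    print(ate_error(tau_true, tau_hat))  # 0.0
    print(pehe(tau_true, tau_hat))       # 1.0
```

The toy example shows why the two metrics are usually reported together: a model can recover the average effect exactly while still misestimating every individual effect.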
Funding: Supported by the National Natural Science Foundation of China (72071187, 11671374, 71731010, 71921001) and the Fundamental Research Funds for the Central Universities (WK3470000017, WK2040000027).