Conditional dependence learning with high-dimensional conditioning variables


Abstract: Conditional dependence plays a crucial role in various statistical procedures, including variable selection, network analysis and causal inference. However, there remains a paucity of relevant research in the context of high-dimensional conditioning variables, a common challenge in the era of big data. To address this issue, many existing studies impose certain model structures, yet high-dimensional conditioning variables often introduce spurious correlations in these models. In this paper, we systematically study the estimation biases inherent in widely used measures of conditional dependence when spurious variables are present in high-dimensional settings. We discuss the estimation inconsistency both intuitively and theoretically, demonstrating that conditional dependence can be either overestimated or underestimated in different scenarios. To mitigate these biases and attain consistency, we introduce a measure of high-dimensional conditional dependence based on data splitting and refitting techniques. A conditional independence test is also developed using the newly advocated measure, with a tuning-free asymptotic null distribution. Furthermore, the proposed test is applied to constructing high-dimensional network graphs in graphical modeling. The superior performance of the newly proposed methods is demonstrated both theoretically and through simulation studies. We also apply the method to construct gene-gene networks from a breast invasive carcinoma dataset, yielding interesting discoveries that merit further scientific exploration.
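The abstract does not give the formula of the proposed measure. The following is therefore only a minimal sketch of the generic data-splitting-and-refitting idea, not the authors' estimator. It rests on several assumptions:

- Dependence is measured by the linear partial correlation.
- Conditioning variables are chosen by a hypothetical marginal-correlation screen (`screen`, keeping the top `k` columns of `Z` for each of X and Y). This screen stands in for whatever selection step is actually used, such as a lasso fit.
- `naive_estimate` selects and estimates on the same sample.
- `refitted_estimate` selects on one half of the data, estimates on the other half, swaps the halves and averages the two estimates.

All function names and parameters are illustrative.

```python
import numpy as np


def residualize(v, Z):
    """Residuals of v after least-squares regression on an intercept and the columns of Z."""
    design = np.column_stack([np.ones(len(v)), Z])
    coef, *_ = np.linalg.lstsq(design, v, rcond=None)
    return v - design @ coef


def partial_corr(x, y, Z):
    """Sample partial correlation of x and y given Z (correlation of the two residual vectors)."""
    rx, ry = residualize(x, Z), residualize(y, Z)
    return float(np.corrcoef(rx, ry)[0, 1])


def screen(x, y, Z, k):
    """Hypothetical stand-in for variable selection: the union of the k columns of Z
    with the largest absolute sample correlation with x, and likewise with y."""
    Zc = (Z - Z.mean(axis=0)) / Z.std(axis=0)
    score_x = np.abs(Zc.T @ (x - x.mean()))
    score_y = np.abs(Zc.T @ (y - y.mean()))
    return np.union1d(np.argsort(score_x)[-k:], np.argsort(score_y)[-k:])


def naive_estimate(x, y, Z, k):
    """Select and estimate on the same sample; spuriously selected columns can shift the estimate."""
    S = screen(x, y, Z, k)
    return partial_corr(x, y, Z[:, S])


def refitted_estimate(x, y, Z, k, rng):
    """Select on one random half, estimate on the other half, swap the roles, and average."""
    idx = rng.permutation(len(x))
    half_a, half_b = idx[: len(x) // 2], idx[len(x) // 2:]
    estimates = []
    for sel, fit in ((half_a, half_b), (half_b, half_a)):
        S = screen(x[sel], y[sel], Z[sel], k)                    # selection half
        estimates.append(partial_corr(x[fit], y[fit], Z[fit][:, S]))  # refitting half
    return sum(estimates) / 2
```

For example, suppose `x` and `y` are both driven by `Z[:, 0]` and are otherwise independent. Then the refitted estimate should be close to zero when the sample size is large relative to the number of selected columns. Comparing it with `naive_estimate` on simulated data is one way to see how in-sample selection can move the estimate. Because the selection half and the estimation half are disjoint, variables picked up by chance in one half carry no in-sample fit advantage in the other.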

Authors: Jianxin Bi, Xingdong Feng, Jingyuan Liu

Affiliations: Department of Statistics and Data Science in School of Economics; School of Statistics and Management; MOE Key Laboratory of Econometrics

Source: Science China Mathematics (中国科学(数学英文版)), 2025, Issue 8, pp. 1779-1806 (28 pages)

Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 12271456, 12371270 and 71988101), the Ministry of Education Research in the Humanities and Social Sciences (Grant No. 22YJA910002), and the Shanghai Science and Technology Development Funds (Grant No. 23JC1402100).

Keywords: conditional dependence; high-dimensional data; refitted cross-validation; data splitting; graphical modeling

Classification number: R195 [Medicine and Health > Health Statistics]


