Abstract
Conditional dependence plays a crucial role in various statistical procedures, including variable selection, network analysis and causal inference. However, relevant research remains scarce when the conditioning variables are high-dimensional, a common challenge in the era of big data. To address this issue, many existing studies impose certain model structures, yet high-dimensional conditioning variables often introduce spurious correlations into these models. In this paper, we systematically study the estimation biases of widely used measures of conditional dependence when spurious variables are present in high-dimensional settings. We discuss this estimation inconsistency both intuitively and theoretically, showing that conditional dependence can be either overestimated or underestimated depending on the scenario. To mitigate these biases and attain consistency, we introduce a measure of high-dimensional conditional dependence based on data splitting and refitting techniques. We also develop a conditional independence test based on the newly advocated measure, whose asymptotic null distribution is tuning-free. Furthermore, the proposed test is applied to constructing high-dimensional network graphs in graphical modeling. The superior performance of the newly proposed methods is demonstrated both theoretically and through simulation studies. We also use the method to construct gene-gene networks from a breast invasive carcinoma dataset, yielding interesting discoveries that merit further scientific exploration.
Funding
Supported by the National Natural Science Foundation of China (Grant Nos. 12271456, 12371270 and 71988101), the Ministry of Education Research in the Humanities and Social Sciences (Grant No. 22YJA910002) and Shanghai Science and Technology Development Funds (Grant No. 23JC1402100).