Funding: Funded by the Hunan Provincial Natural Science Foundation of China (Grant No. 2025JJ70105) and the Hunan Provincial College Students' Innovation and Entrepreneurship Training Program (Project No. S202411342056). The article processing charge (APC) was funded by Project No. 2025JJ70105.
Abstract: With the widespread use of social media, the propagation of health-related rumors has become a significant public health threat. Existing methods for detecting health rumors predominantly rely on external knowledge or propagation structures, and only a few recent approaches attempt causal inference; none of these has yet effectively integrated causal discovery with domain-specific knowledge graphs. In this study, we found that combining causal discovery with domain-specific knowledge graphs can effectively identify implicit pseudo-causal logic embedded in texts, which holds significant potential for health rumor detection. To this end, we propose CKDG, a dual-graph fusion framework based on causal logic and medical knowledge graphs. CKDG constructs a weighted causal graph to capture the implicit causal relationships in the text and introduces a medical knowledge graph to verify semantic consistency, thereby improving its ability to identify misused professional terminology and pseudoscientific claims. In experiments on a dataset of 8430 health rumors, CKDG achieved an accuracy of 91.28% and an F1 score of 90.38%, improvements of 5.11% and 3.29%, respectively, over the best baseline. Our results indicate that the integrated use of causal discovery and domain-specific knowledge graphs offers significant advantages for health rumor detection systems. The method not only improves detection performance but also makes model decisions more transparent and credible by tracing causal chains and the sources of knowledge conflicts. We anticipate that this work will provide key technological support for the development of trustworthy health-information filtering systems, thereby improving the reliability of public health information on social media.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 60972150 and 10926197.
Abstract: Detecting and clarifying cause-effect relationships among variables is an important problem in time series analysis. Traditional causality inference methods have a salient limitation: the model must be linear with Gaussian noise. Although additive model regression can effectively infer the nonlinear causal relationships of additive nonlinear time series, it requires the contemporaneous causal relationships among variables to be linear, and it is not always valid for testing conditional independence relations. This paper provides a nonparametric method that employs both mutual information and conditional mutual information to identify the causal structure of a class of nonlinear time series models, extending additive nonlinear time series to nonlinear structural vector autoregressive models. An algorithm is developed to learn the contemporaneous and the lagged causal relationships of variables. Simulations demonstrate the effectiveness of the proposed method.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 60972150, 10926197, and 61201323.
Abstract: Detecting and clarifying cause-effect relationships among variables is an important problem in time series analysis. This paper provides a method that employs both mutual information and conditional mutual information to identify the causal structure of multivariate time series causal graphical models. A three-step procedure is developed to learn the contemporaneous and the lagged causal relationships of time series causal graphs. Unlike conventional constraint-based algorithms, the proposed algorithm does not assume any particular distribution and is nonparametric. These properties are especially appealing for inferring time series causal graphs when no prior knowledge about the data model is available. Simulations and a case analysis demonstrate the effectiveness of the method.
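The role of mutual information (MI) and conditional mutual information (CMI) in the abstract above can be illustrated with a minimal sketch. It is not the paper's three-step procedure: it uses simple plug-in estimates on discrete data, and the function names and the toy chain X -> Y -> Z are assumptions made here for illustration. The point is only that X and Z are dependent (MI > 0) yet become independent once Y is conditioned on (CMI = 0), which is the kind of evidence a constraint-based method uses to drop the direct edge between X and Z.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in nats from paired discrete samples."""
    n = len(xs)
    c_xy = Counter(zip(xs, ys))
    c_x = Counter(xs)
    c_y = Counter(ys)
    # p(x,y) * log( p(x,y) / (p(x) p(y)) ), written with raw counts
    return sum((c / n) * math.log(c * n / (c_x[x] * c_y[y]))
               for (x, y), c in c_xy.items())


def conditional_mutual_information(xs, ys, zs):
    """Plug-in estimate of I(X;Y|Z) in nats from discrete samples."""
    n = len(xs)
    c_xyz = Counter(zip(xs, ys, zs))
    c_xz = Counter(zip(xs, zs))
    c_yz = Counter(zip(ys, zs))
    c_z = Counter(zs)
    # p(x,y,z) * log( p(x,y,z) p(z) / (p(x,z) p(y,z)) ), written with raw counts
    return sum((c / n) * math.log(c * c_z[z] / (c_xz[(x, z)] * c_yz[(y, z)]))
               for (x, y, z), c in c_xyz.items())


def chain_samples():
    """Toy data for the chain X -> Y -> Z (hypothetical example).

    Each variable copies its parent with probability 0.8; the sample
    counts are exactly proportional to p(x) p(y|x) p(z|y).
    """
    rows = []
    for x in (0, 1):
        for y in (0, 1):
            for z in (0, 1):
                weight = (4 if y == x else 1) * (4 if z == y else 1)
                rows.extend([(x, y, z)] * weight)
    return rows


rows = chain_samples()
xs = [r[0] for r in rows]
ys = [r[1] for r in rows]
zs = [r[2] for r in rows]

mi_xz = mutual_information(xs, zs)                      # marginal dependence
cmi_xz_given_y = conditional_mutual_information(xs, zs, ys)  # vanishes given Y
```

With real, finite samples the CMI estimate is never exactly zero, so a practical procedure would need a significance test or threshold; that choice is outside the scope of this sketch.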
Funding: Supported by the National Natural Science Foundation of China Joint Fund for Enterprise Innovation Development (U23B2029), the National Natural Science Foundation of China (62076167, 61772020), the Key Scientific Research Project of Higher Education Institutions in Henan Province (24A520058, 24A520060, 23A520022), and the Postgraduate Education Reform and Quality Improvement Project of Henan Province (YJS2024AL053).
Abstract: In cross-domain named entity recognition, large language models face a scarcity of labeled data in specific domains, and the entity bias that arises from differences in entity information between domains makes them prone to spurious correlations when handling domain-specific entities. To solve this problem, this paper proposes a cross-domain named entity recognition method based on causal graph structure enhancement. A causal learning and intervention module captures cross-domain invariant causal structural representations between the feature representations of text sequences and annotation sequences, which improves the use of causal structural features by large language models in the target domains and thus effectively alleviates the entity bias triggered by spurious correlations. Meanwhile, a semantic feature fusion module effectively combines the semantic information of the source and target domains. The results show improvements of 2.47% and 4.12% in the political and medical domains, respectively, compared with the benchmark model, as well as excellent performance in small-sample scenarios, which demonstrates the effectiveness of causal graph structural enhancement in improving the accuracy of cross-domain entity recognition and reducing spurious correlations.
Abstract: Fault diagnostics is important for the safe operation of nuclear power plants (NPPs). In recent years, data-driven approaches have been proposed and implemented to tackle the problem, e.g., neural networks, fuzzy and neuro-fuzzy approaches, support vector machines, K-nearest neighbor classifiers, and inference methodologies. Among these methods, the dynamic uncertain causality graph (DUCG) has proved effective in many practical cases. However, the causal graph construction behind the DUCG is complicated and, in many cases, includes redundant symptoms beyond those needed to correctly classify the fault. In this paper, we propose a method to simplify causal graph construction in an automatic way. The method consists of transforming the expert knowledge-based DUCG into a fuzzy decision tree (FDT) by extracting from the DUCG a fuzzy rule base that summarizes the symptoms used as the basis of the FDT. A genetic algorithm (GA) is then used to optimize the FDT by performing a wrapper search around it: the set of symptoms selected during the iterative search is taken as the best set of symptoms for diagnosing the faults that can occur in the system. The effectiveness of the approach is shown on a DUCG model initially built to diagnose 23 faults, originally using 262 symptoms, of Unit-1 in the Ningde NPP of the China Guangdong Nuclear Power Corporation. The results show that the FDT, with GA-optimized symptoms and diagnosis strategy, can drive the construction of the DUCG and lower the computational burden without loss of diagnostic accuracy.
Funding: Supported by the Medical and Health Research Program of Zhejiang Province (No. 2015KYB128) and the Zhejiang Provincial Natural Science Foundation (No. LQ15H030004), China.
Abstract: Jaundice is a common and complex clinical symptom potentially occurring in hepatology, general surgery, pediatrics, infectious diseases, gynecology, and obstetrics, and it is fairly difficult to distinguish the cause of jaundice in clinical practice, especially for general practitioners in less developed regions. Through collaboration between physicians and artificial intelligence engineers, a comprehensive knowledge base relevant to jaundice was created based on demographic information, symptoms, physical signs, laboratory tests, imaging diagnosis, medical histories, and risk factors. A diagnostic modeling and reasoning system using the dynamic uncertain causality graph was then proposed. A modularized modeling scheme was presented to reduce the complexity of model construction, providing multiple perspectives and arbitrary granularity for disease causality representations. A "chaining" inference algorithm and a weighted logic operation mechanism were employed to guarantee the exactness and efficiency of diagnostic reasoning under incomplete and uncertain information. Moreover, the causal interactions among diseases and symptoms intuitively demonstrated the reasoning process in a graphical manner. Verification was performed using 203 randomly pooled clinical cases, and the accuracy was 99.01% with laboratory tests in the model and 84.73% without them. The solutions were more explicable and convincing than those of common methods such as Bayesian networks, further increasing the objectivity of clinical decision-making. The promising results indicate that the model could potentially be used in intelligent diagnosis and help decrease public health expenditure.
Abstract: Aim: To improve the causal diagnosis method presented by Bandekar and to propose a new method of ordering the root faults according to fault possibility by means of numerical calculation. Methods: Based on the causal graph, a fuzzy causal diagnosis method was developed using a fuzzified threshold value and a fuzzy discrimination matrix, and the fault possibility of each element in the root fault candidate set (RFCS) was obtained. Results and Conclusion: The elements of the RFCS can be ordered by fault possibility, which makes locating the fault much easier. The diagnosis speed of this method is quite high, and, thanks to the fuzzified threshold value and the fuzzy discrimination matrix, the result is more robust to noise and to poor parameter choices.
Funding: Supported by an internal fund from Macao Polytechnic University (RP/FCSD-02/2022).
Abstract: Objective: Chronic fatigue syndrome (CFS) is a prevalent symptom of post-coronavirus disease 2019 (COVID-19) and is associated with unclear disease mechanisms. The herbal medicine Qingjin Yiqi granules (QJYQ) constitute a clinically approved formula for treating post-COVID-19; however, its potential as a drug target for treating CFS remains largely unknown. This study aimed to identify novel causal factors for CFS and to elucidate the potential targets and pharmacological mechanisms of action of QJYQ in treating CFS. Methods: This prospective cohort analysis included 4,212 adults aged ≥65 years who were followed up for 7 years, with 435 incident CFS cases. Causal modeling and multivariate logistic regression analysis were performed to identify the potential causal determinants of CFS. A proteome-wide, two-sample Mendelian randomization (MR) analysis was employed to explore the proteins associated with the identified causal factors of CFS, which may serve as potential drug targets. Furthermore, we performed a virtual screening analysis to assess the binding affinity between the bioactive compounds in QJYQ and CFS-associated proteins. Results: Among 4,212 participants (47.5% men) with a median age of 69 years (interquartile range: 69–70 years) enrolled in 2004, 435 developed CFS by 2011. Causal graph analysis with multivariate logistic regression identified frequent cough (odds ratio: 1.74, 95% confidence interval [CI]: 1.15–2.63) and insomnia (odds ratio: 2.59, 95% CI: 1.77–3.79) as novel causal factors of CFS. Proteome-wide MR analysis revealed that the upregulation of endothelial cell-selective adhesion molecule (ESAM) was causally linked to both chronic cough (odds ratio: 1.019, 95% CI: 1.012–1.026, P = 2.75 × 10^(−5)) and insomnia (odds ratio: 1.015, 95% CI: 1.008–1.022, P = 4.40 × 10^(−8)) in CFS. The major bioactive compounds of QJYQ, ginsenoside Rb2 (docking score: −6.03) and RG4 (docking score: −6.15), bound to ESAM with high affinity based on virtual screening. Conclusions: Our integrated analytical framework combining epidemiological, genetic, and in silico data provides a novel strategy for elucidating complex disease mechanisms, such as those of CFS, and for informing the modes of action of traditional Chinese medicines, such as QJYQ. Further validation in animal models is warranted to confirm the potential pharmacological effects of QJYQ on ESAM and its potential as a treatment for CFS.
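As a reading aid for the odds ratios reported above, the following sketch shows the standard relationship between a logistic regression coefficient and an odds ratio with a Wald-type 95% confidence interval. It assumes the reported intervals are of that form (the abstract does not say so), and the function names are hypothetical. Only the published figures (for example 1.74, 1.15–2.63) are taken from the source; the recovered coefficient and standard error are approximations derived from them.

```python
import math


def coef_from_odds_ratio(odds_ratio, ci_low, ci_high, z=1.96):
    """Recover an approximate log-odds coefficient and its standard error.

    Assumes a Wald interval: exp(beta - z*se) to exp(beta + z*se).
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return beta, se


def odds_ratio_with_ci(beta, se, z=1.96):
    """Convert a log-odds coefficient and standard error to (OR, low, high)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Frequent cough, as reported in the abstract: OR 1.74, 95% CI 1.15-2.63.
beta_cough, se_cough = coef_from_odds_ratio(1.74, 1.15, 2.63)
or_cough, low_cough, high_cough = odds_ratio_with_ci(beta_cough, se_cough)
```

The round trip reproduces the reported interval to within rounding, which is consistent with (but does not prove) a Wald interval on the log-odds scale.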
Funding: Supported by the National Natural Science Foundation of China (No. U1736218) and the National Key R&D Program of China (No. 2018YFB0804704), and partially supported by CNCERT/CC.
Abstract: To combat increasingly sophisticated cyber attacks, the security community has proposed and deployed a large body of threat detection approaches to discover malicious behaviors on host systems and attack payloads in network traffic. Several studies have begun to focus on threat detection methods based on provenance data from host-level event tracing. Meanwhile, with the significant development of big data and artificial intelligence technologies, large-scale graph computing has become widely used. Various studies therefore try to bridge the gap between threat detection based on host log provenance data and graph algorithms, and propose threat detection algorithms based on the system provenance graph. These approaches usually generate the system provenance graph by tagging and tracking system events, and then leverage the characteristics of the graph to conduct threat detection and attack investigation. To understand in depth the correctness, effectiveness, and efficiency of different graph-based threat detection algorithms, we focus on mainstream threat detection methods based on provenance graphs. From a large number of studies, we select and implement 5 state-of-the-art threat detection approaches as evaluation objects for further analysis. We collect about 40 GB of host-level raw log data in a real-world IT environment, and simulate 6 types of cyber attack scenarios in an isolated environment to obtain malicious provenance data for our evaluation datasets. The cross-sectional comparison and longitudinal assessment explain in detail which attack scenarios each detection approach detects well, and why. Our empirical evaluation provides a solid foundation for improving threat detection approaches.
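The idea of building a provenance graph from system events and then using it for attack investigation can be sketched as follows. This is a generic illustration, not one of the 5 approaches evaluated in the paper: the event tuple format, the operation names, and the sample events are all hypothetical, and timestamps are ignored for brevity. Edges follow the direction of information flow, and a backward trace from an alert node collects everything that could have influenced it.

```python
from collections import defaultdict, deque


def build_provenance_graph(events):
    """Map each node to the set of nodes it directly depends on.

    Each event is a hypothetical (subject_process, operation, object) tuple.
    """
    parents = defaultdict(set)
    for subject, operation, obj in events:
        if operation == "read":
            # Information flows from the object into the reading process.
            parents[subject].add(obj)
        elif operation in ("write", "exec", "fork"):
            # Information or control flows from the process to the object.
            parents[obj].add(subject)
    return parents


def backward_trace(parents, alert_node):
    """Breadth-first search over dependencies: all ancestors of alert_node."""
    seen = set()
    queue = deque([alert_node])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


# Hypothetical event log for illustration only.
EVENTS = [
    ("bash", "exec", "curl"),
    ("curl", "write", "/tmp/payload"),
    ("payload_proc", "read", "/tmp/payload"),
    ("payload_proc", "write", "/etc/passwd"),
    ("sshd", "read", "/etc/ssh/config"),
]

graph = build_provenance_graph(EVENTS)
suspects = backward_trace(graph, "/etc/passwd")
```

Here `suspects` contains the chain leading to the modified file and excludes the unrelated `sshd` activity. Real systems must also respect event ordering and cope with dependency explosion, which this sketch does not attempt.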
Funding: Supported by the National Natural Science Foundation of China (61573266).
Abstract: Finding causality from observed data alone is a fundamental problem in science. The most basic form of this problem is to determine whether X leads to Y or Y leads to X when two variables X and Y are jointly observed. In statistics, path analysis is used to describe the direct dependence among a set of variables, but in practice the causal order between variables is usually unknown, and ignoring the direction of the causal path prevents researchers from analyzing or using causal models. In this study, we propose a method for estimating causality from observed data. First, the observed variables are cleaned and the valid variables are retained. Then, a direct linear non-Gaussian acyclic model (DirectLiNGAM) is used to estimate the causal order K between variables. The third step is to estimate the adjacency matrix B of the causal relationships based on K. Next, since B is not convenient for model interpretation, we use adaptive lasso to prune the causal paths and variables. A causal path graph and a recursive model are then established. Finally, we test and debug the recursive model, obtain a causal model with good fit, and estimate the direct, indirect, and total effects between causal variables. This approach avoids assigning the causal order of variables arbitrarily. It also differs from approaches in which researchers examine their own understanding of a model by generating some form of simulated data. The simple and relatively non-smooth statistical learning method used in this study has clear advantages in the field of interpretable machine learning.
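The last step of the pipeline above, estimating direct, indirect, and total effects from the adjacency matrix B, can be sketched for a linear recursive model x = Bx + e, where B[i][j] is the direct effect of variable j on variable i. For an acyclic graph, the total effect is the sum of products of coefficients along all directed paths, that is B + B^2 + ... + B^(n-1). The matrix below is a made-up example, not a result from the paper, and the helper names are hypothetical.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def total_effects(B):
    """Total effects for an acyclic linear model x = Bx + e.

    Sums B^1 .. B^(n-1); longer paths cannot exist in a DAG on n nodes.
    """
    n = len(B)
    total = [[0.0] * n for _ in range(n)]
    power = [row[:] for row in B]
    for _ in range(n - 1):
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
        power = matmul(power, B)
    return total


def indirect_effects(B):
    """Indirect effect = total effect minus direct effect."""
    total = total_effects(B)
    n = len(B)
    return [[total[i][j] - B[i][j] for j in range(n)] for i in range(n)]


# Hypothetical pruned adjacency matrix with causal order x0, x1, x2:
# x0 -> x1 (0.5), x1 -> x2 (2.0), x0 -> x2 (1.0).
B = [
    [0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0],
    [1.0, 2.0, 0.0],
]

total = total_effects(B)
indirect = indirect_effects(B)
```

In this example the effect of x0 on x2 splits into a direct part (1.0) and an indirect part through x1 (0.5 × 2.0 = 1.0), giving a total effect of 2.0.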