Funding: This work was funded by the Scientific and Technological Innovation Project of the China Academy of Chinese Medical Sciences (CI2021A04706, CI2021B003), the National Key Research and Development Program of China (2023YFC3503404, 2017YFC1700406-2, 2018YFC1704306), and the Independent Selection Project of the China Academy of Chinese Medical Sciences (Z0643, Z0723).
Abstract: Real-world clinical evaluation of traditional Chinese medicine (RWCE-TCM) is a method for comprehensively evaluating the clinical effects of TCM, with the aim of investigating the causal relationship between TCM interventions and clinical outcomes. This study explored data science and causal learning methods for transforming real-world data (RWD) into reliable real-world evidence, aiming to provide an innovative approach to RWCE-TCM. We propose a 10-step data science methodology to address the challenges posed by the diverse and complex data in RWCE-TCM. Its key steps include data integration and data warehouse construction; high-dimensional feature selection; interpretable statistical machine learning algorithms; complex network and graph network analysis; knowledge mining techniques such as natural language processing and machine learning; observational study design; and the use of artificial intelligence tools to build an intelligent engine for translational analysis. The goal is to establish methods for clinical positioning, screening of applicable populations, and mining the structural associations of characteristic TCM therapies. In addition, the study applies the principles of real-world research and causal learning methods to TCM clinical data. We constructed a multidimensional clinical knowledge map of “disease-syndrome-symptom-prescription-medicine” to deepen understanding of the patterns of TCM diagnosis and treatment, clarify its distinctive therapies, and uncover information conducive to individualized treatment. Causal inference on observational data can address confounding bias and reduce the influence of individual heterogeneity, promoting the transformation of TCM RWD into reliable clinical evidence. Intelligent data science improves the efficiency and accuracy of implementing RWCE-TCM. The proposed methodology can handle complex data, ensure high-quality RWD acquisition and analysis, and provide in-depth insights into the clinical benefits of TCM. It supports the intelligent translation and demonstration of TCM RWD, advances data-driven translational analysis based on causal learning, and opens a new path for RWCE-TCM.
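The abstract notes that causal inference on observational data can address confounding bias. As one illustration (not the authors' pipeline, which the abstract does not specify), the sketch below applies inverse probability weighting (IPW), a standard confounding adjustment, to a hypothetical toy cohort. The `Record` fields and all numbers are invented, and propensity scores are assumed to have been estimated beforehand; IPW is only valid if that propensity model is correct and every patient has a non-zero chance of receiving either treatment.

```python
from dataclasses import dataclass


@dataclass
class Record:
    treated: bool       # hypothetical: received the TCM intervention
    outcome: float      # hypothetical clinical outcome score (higher is better)
    propensity: float   # P(treated | covariates), assumed already estimated


def naive_difference(records: list[Record]) -> float:
    """Unadjusted difference in mean outcomes (biased under confounding)."""
    treated = [r.outcome for r in records if r.treated]
    control = [r.outcome for r in records if not r.treated]
    return sum(treated) / len(treated) - sum(control) / len(control)


def ipw_ate(records: list[Record]) -> float:
    """Horvitz-Thompson inverse-probability-weighted estimate of the ATE."""
    n = len(records)
    treated_term = sum(r.outcome / r.propensity for r in records if r.treated)
    control_term = sum(r.outcome / (1.0 - r.propensity) for r in records if not r.treated)
    return (treated_term - control_term) / n


# Toy cohort: sicker patients (propensity 0.8) are treated more often and
# have worse baseline outcomes; the built-in effect is +1 in both strata.
cohort = (
    [Record(True, 1.0, 0.8)] * 4 + [Record(False, 0.0, 0.8)]
    + [Record(True, 5.0, 0.2)] + [Record(False, 4.0, 0.2)] * 4
)
print(f"naive: {naive_difference(cohort):.2f}, IPW: {ipw_ate(cohort):.2f}")
```

In this toy cohort the naive comparison makes the treatment look harmful (about −1.4) because sicker patients are treated more often, whereas IPW recovers the built-in effect of +1.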
Funding: 2022 University Research Priorities, No. 2022AH051989.
Abstract: In multi-label learning, the label-specific feature learning framework can effectively mitigate the curse of dimensionality caused by high-dimensional data, improving both the classification performance and the robustness of the model. Most existing label-specific feature learning methods measure label correlation with cosine similarity, which is symmetric, whereas correlations between labels are generally asymmetric. Moreover, existing methods consider only the private (label-specific) features of each label during classification and ignore the features that labels have in common. To address these issues, this paper proposes Causality-driven Common and Label-specific Features Learning (CCSF). First, the causal learning algorithm GSBN is used to compute the asymmetric correlations between labels. Then, during optimization, the ℓ2,1-norm and the ℓ1-norm are used to select the common and the label-specific features, respectively. Finally, CCSF is compared with six state-of-the-art algorithms on nine datasets, and the experimental results demonstrate its effectiveness.
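The two norms play different roles in feature selection. Assuming a linear model whose weight matrix W has one row per feature and one column per label (an assumption; the abstract does not give CCSF's objective or its GSBN-based correlation term), the ℓ2,1-norm sums the Euclidean norms of the rows and the ℓ1-norm sums the absolute entries. Their proximal operators, as used by proximal-gradient solvers, show the effect: group soft-thresholding zeroes whole rows, dropping a feature for all labels at once (so surviving rows are shared), while entry-wise soft-thresholding zeroes individual weights, giving each label its own feature subset.

```python
import math


def l21_norm(W: list[list[float]]) -> float:
    """Sum of the Euclidean norms of the rows (one row per feature)."""
    return sum(math.sqrt(sum(w * w for w in row)) for row in W)


def l1_norm(W: list[list[float]]) -> float:
    """Sum of the absolute values of all entries."""
    return sum(abs(w) for row in W for w in row)


def prox_l21(W: list[list[float]], t: float) -> list[list[float]]:
    """Row-wise group soft-thresholding: proximal operator of t * ||W||_{2,1}."""
    out = []
    for row in W:
        norm = math.sqrt(sum(w * w for w in row))
        scale = max(0.0, 1.0 - t / norm) if norm > 0 else 0.0
        out.append([scale * w for w in row])  # whole row shrinks or vanishes
    return out


def prox_l1(W: list[list[float]], t: float) -> list[list[float]]:
    """Entry-wise soft-thresholding: proximal operator of t * ||W||_1."""
    return [[math.copysign(max(0.0, abs(w) - t), w) for w in row] for row in W]


W = [[3.0, 4.0], [0.3, 0.4], [0.0, 2.0]]  # toy weights: 3 features x 2 labels
print(prox_l21(W, 1.0))  # the weak second feature is removed for both labels
print(prox_l1(W, 1.0))   # small entries are zeroed one by one
```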
Funding: Supported by the Anhui Postdoctoral Scientific Research Program Foundation (2022B579).
Abstract: Many digital platforms employ free-content promotion strategies to cope with the high uncertainty surrounding digital content products. However, the diversity of digital content products and users' heterogeneous content preferences may blur the impact of platform promotions across users and products. Free-content promotion strategies should therefore be adapted so as to allocate marketing resources optimally and increase revenue. This study develops personalized free-content promotion strategies based on individual-level heterogeneous treatment effects and explores the sources of this heterogeneity, focusing on the moderating effect of variables related to user engagement. To this end, we use data from a randomized field experiment provided by a leading Chinese e-book platform. We employ a framework that combines machine learning with econometric causal inference methods to estimate individual treatment effects and analyze their potential mechanisms. The analysis shows that, on average, free-content promotions lead to a significant increase in consumer payments. However, the higher a user's level of engagement, the smaller the payment lift caused by promotions, because more-engaged users are more strongly affected by the cannibalization effect of free content. This study introduces a novel causal research design to help platforms improve their marketing strategies.
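The study estimates individual-level effects with a combined machine-learning and econometric framework that the abstract does not detail. A much simpler estimator conveys the idea of effect heterogeneity in a randomized experiment: within each user segment, the difference in mean payment between promoted and non-promoted users estimates that segment's conditional average treatment effect. The segment names and payments below are invented toy values chosen only to exercise the code, not data or results from the study.

```python
from collections import defaultdict
from statistics import mean


def lift_by_segment(rows):
    """Difference in mean payment (treated minus control) within each segment.

    rows: iterable of (segment, treated, payment). Under random assignment this
    is an unbiased estimate of each segment's conditional average treatment
    effect. Every segment needs at least one treated and one control row.
    """
    groups = defaultdict(lambda: {True: [], False: []})
    for segment, treated, payment in rows:
        groups[segment][bool(treated)].append(payment)
    return {seg: mean(g[True]) - mean(g[False]) for seg, g in groups.items()}


# Invented toy rows (segment, got_free_content, payment).
rows = [
    ("low_engagement", True, 3), ("low_engagement", True, 5),
    ("low_engagement", False, 1), ("low_engagement", False, 1),
    ("high_engagement", True, 10), ("high_engagement", True, 12),
    ("high_engagement", False, 10), ("high_engagement", False, 10),
]
print(lift_by_segment(rows))
```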
Funding: Project supported by the National Major Science and Technology Projects of China (No. 2022YFB3303302), the National Natural Science Foundation of China (Nos. 61977012 and 62207007), and the Central Universities Project in China at Chongqing University (Nos. 2021CDJYGRH011 and 2020CDJSK06PT14).
Abstract: Artificial intelligence generated content (AIGC) has become an indispensable tool for producing large-scale content in various forms, such as images, owing to AI's strong capabilities in imitation and generation. However, interpretability and controllability remain challenging: existing AI methods often struggle to produce images that are both flexible and controllable while respecting the causal relationships within the images. To address this issue, we develop a novel method for causal controllable image generation (CCIG) that combines causal representation learning with bidirectional generative adversarial networks (GANs). This approach enables humans to control image attributes while preserving the rationality and interpretability of the generated images, and it also supports the generation of counterfactual images. The key to CCIG is a causal structure learning module that learns the causal relationships between image attributes and is jointly optimized with the encoder, generator, and joint discriminator of the image generation module. In this way, we learn causal representations in the latent space of images and use causal intervention operations to control image generation. We conduct extensive experiments on the real-world CelebA dataset, and the results demonstrate the effectiveness of CCIG.
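CCIG's causal structure learning module and latent-space interventions are not specified in the abstract. The sketch below uses a hand-written linear structural causal model over hypothetical attributes (`age`, `gray_hair`, `smile`, with made-up weights) to show what a causal intervention does: a do-operation fixes one attribute, its descendants change accordingly, and upstream or unrelated attributes stay put. Reusing the same exogenous noise under the intervention yields counterfactual attribute values; in an image generator these would then be decoded into an image.

```python
def evaluate_scm(order, weights, noise, do=None):
    """Evaluate a linear structural causal model over image attributes.

    order:   attribute names in a topological order of the causal graph
    weights: weights[child][parent] = linear effect of parent on child
    noise:   exogenous term for every attribute
    do:      optional {attribute: value} hard interventions
    """
    do = do or {}
    values = {}
    for var in order:
        if var in do:
            values[var] = do[var]  # the intervention cuts all incoming edges
        else:
            parents = weights.get(var, {})
            values[var] = noise[var] + sum(w * values[p] for p, w in parents.items())
    return values


# Hypothetical graph: age -> gray_hair; smile is causally unrelated.
order = ["age", "gray_hair", "smile"]
weights = {"gray_hair": {"age": 0.5}}
noise = {"age": 2.0, "gray_hair": 0.1, "smile": 1.0}

factual = evaluate_scm(order, weights, noise)
# Same noise, but do(age = 4.0): the counterfactual attributes.
counterfactual = evaluate_scm(order, weights, noise, do={"age": 4.0})
print(factual)
print(counterfactual)
```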
Abstract: This study investigates the persistent academic impacts of Head Start, a federally funded early childhood intervention program, using data from the Early Childhood Longitudinal Study, Kindergarten Cohort (ECLS-K). Bayesian Additive Regression Trees (BART) serve as the primary methodology, and average, conditional, and individual-level treatment effects on children's mathematics achievement are estimated. BART estimates a negative Average Treatment Effect (ATE) of −1.5421, with increasingly large adverse effects for children of higher Socioeconomic Status (SES), suggesting diminishing marginal returns. This finding demonstrates BART's ability to detect nonlinear moderation patterns that conventional models can miss. It also implies that Head Start and other preschool interventions will yield greater policy returns when targeted at low-SES children, enabling a more efficient and equitable distribution of public funds. For comparison, Causal Forest estimates a larger negative ATE (−2.4340) and identifies SES as the dominant moderator, while Propensity Score Matching gives a more conservative estimate (−1.2606) without accounting for effect heterogeneity. These findings underscore the utility of BART in estimating subtle, SES-dependent effects of Head Start and suggest the potential value of more targeted intervention strategies guided by adaptive causal inference.
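For the Propensity Score Matching comparison, a minimal sketch of 1-nearest-neighbour matching with replacement is shown below. It assumes the propensity scores are already estimated and targets the average treatment effect on the treated (ATT), a common PSM estimand; the abstract does not state which estimand or matching rule the study used, and the units are invented.

```python
def psm_att(units):
    """ATT via 1-nearest-neighbour matching (with replacement) on a
    pre-estimated propensity score.

    units: list of (propensity, treated, outcome).
    """
    treated = [(p, y) for p, t, y in units if t]
    controls = [(p, y) for p, t, y in units if not t]
    if not treated or not controls:
        raise ValueError("need at least one treated and one control unit")
    effects = []
    for p, y in treated:
        # The control with the closest propensity score stands in for the counterfactual.
        _, y_match = min(controls, key=lambda c: abs(c[0] - p))
        effects.append(y - y_match)
    return sum(effects) / len(effects)


# Invented toy units (propensity, attended_program, test_score).
units = [
    (0.20, True, 55.0), (0.70, True, 60.0),
    (0.25, False, 50.0), (0.65, False, 52.0), (0.90, False, 70.0),
]
print(psm_att(units))
```

The result is a single pooled number, which is why PSM in this form says nothing about how the effect varies with SES; that requires estimating effects within subgroups or with a flexible model such as BART or a causal forest.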
Funding: Supported by the National Natural Science Foundation of China under Grant 71671103.
Abstract: It is desirable to obtain the joint probability distribution (JPD) over a set of random variables from local data, avoiding the laborious work of collecting statistical data at the scale of all variables. Much work has addressed the case in which all variables lie in a known directed acyclic graph (DAG). However, steady directed cyclic graphs (DCGs) may arise when modules containing local data are simply combined, where a module consists of a child variable and its parent variables. So far, the physical and statistical meaning of steady DCGs has remained unclear. This paper illustrates the physical and statistical meaning of steady DCGs and presents a method to calculate the JPD from local data, given that all variables lie in a known single-valued Dynamic Uncertain Causality Graph (S-DUCG), thereby defining a new Bayesian network with steady DCGs. Here, “single-valued” means that only the causes of the true state of a variable are specified, while the false state is the complement of the true state.
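The abstract builds on the known-DAG case, in which the JPD is the product of the local modules (each child's distribution given its parents). The sketch below enumerates that product for binary variables using the single-valued convention, P(false | parents) = 1 − P(true | parents). It does not implement the paper's S-DUCG method for steady DCGs, which the abstract does not detail; the two-variable network and its probabilities are hypothetical.

```python
from itertools import product


def joint_distribution(modules, order):
    """Enumerate the JPD of binary variables in a known DAG.

    modules: child -> (parents, table), where table maps a tuple of parent
             states to P(child is True | parents)
    order:   all variables; fixes the order of states in each result key
    Returns a dict from state tuples (aligned with `order`) to probabilities.
    """
    joint = {}
    for states in product([True, False], repeat=len(order)):
        assignment = dict(zip(order, states))
        prob = 1.0
        for child in order:
            parents, table = modules[child]
            p_true = table[tuple(assignment[p] for p in parents)]
            # Single-valued convention: the false state is the complement of the true state.
            prob *= p_true if assignment[child] else 1.0 - p_true
        joint[states] = prob
    return joint


# Hypothetical modules: A has no parents; B is a child of A.
modules = {
    "A": ((), {(): 0.3}),
    "B": (("A",), {(True,): 0.9, (False,): 0.2}),
}
joint = joint_distribution(modules, ["A", "B"])
print(joint)
```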
Funding: Funded by the European Union's Horizon 2020 Research and Innovation programme under Grant Agreement No. 957755, project SENDER: Sustainable consumer engagement and demand response.
Abstract: Demand-side flexibility is crucial for balancing supply and demand as renewable energy sources are increasingly integrated into the energy mix and heating and transport systems become increasingly electrified. Historically, this balancing has been managed on the supply side. However, the shift towards renewable energy sources limits the controllability of traditional fossil-fuel plants, increasing the importance of demand response (DR) techniques for achieving the required flexibility. Aggregators participating in flexibility markets need to forecast accurately the flexibility they can offer, a task complicated by numerous influencing variables. Using a top-down approach, this study addresses the problem of forecasting electricity demand in the presence of flexibility from thermostatically controlled loads. We propose a hybrid model that combines data-driven probabilistic estimation of electricity consumption with a disaggregation of consumption that identifies the fraction attributable to thermal loads; this flexible fraction is then simulated with a virtual battery model. The technique is applied to a synthetic dataset that simulates the response of a European neighborhood to demand response interventions. The results demonstrate the model's ability to accurately predict both the reduction in electricity demand during DR events and the subsequent rebound in consumption. The model achieves a mean absolute percentage error (MAPE) below 17.0%, comparable to the accuracy obtained without flexibility. The results are also compared with those of a direct data-driven approach, demonstrating the validity and effectiveness of our model.
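The abstract does not give the virtual battery equations. Assuming a generic first-order virtual battery, where x is the stored flexibility (energy) of the thermostatically controlled loads and u is the deviation of their power from baseline, the sketch below updates x_{k+1} = (1 − leak·dt)·x_k + u_k·dt and clips u to power limits [p_min, p_max] and to the energy limit |x| ≤ capacity. It also computes MAPE, the metric quoted in the abstract. All function names, parameter names, and values are hypothetical.

```python
def mape(actual, forecast):
    """Mean absolute percentage error in percent; actual values must be non-zero."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def simulate_virtual_battery(requested, capacity, p_min, p_max, dt=1.0, leak=0.0):
    """Generic first-order virtual battery (an assumed model, not the paper's).

    requested: desired power deviations from baseline (negative = curtailment)
    Returns (delivered deviations, energy states after each step).
    Assumes p_min <= 0 <= p_max and 0 <= leak * dt <= 1.
    """
    if p_min > 0 or p_max < 0 or not 0.0 <= leak * dt <= 1.0:
        raise ValueError("invalid power limits or leak rate")
    x = 0.0
    delivered, states = [], []
    for u_req in requested:
        u = min(max(u_req, p_min), p_max)  # respect power limits
        decayed = (1.0 - leak * dt) * x
        # Clip power so that the energy state stays within [-capacity, capacity].
        u = min(max(u, (-capacity - decayed) / dt), (capacity - decayed) / dt)
        x = decayed + u * dt
        delivered.append(u)
        states.append(x)
    return delivered, states


# Three curtailment steps (a DR event), then three steps of recovery.
delivered, states = simulate_virtual_battery(
    [-2, -2, -2, 1, 1, 1], capacity=3.0, p_min=-2.5, p_max=2.5)
print(delivered, states)
print(mape([100.0, 200.0], [110.0, 180.0]))
```

With these values the second curtailment step is partly cut and the third is cut to zero because the stored flexibility is exhausted; the positive deviations afterwards restore the energy state, which is how a virtual battery represents the post-event rebound.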