Funding: National Natural Science Foundation of China, Grant/Award Number: 72271237; Building World-class Universities of Renmin University of China, Grant/Award Number: 21XNF037.
Abstract: Deep learning has become increasingly popular in omics data analysis. Recent work incorporating variable selection into deep learning has greatly enhanced model interpretability. However, because deep learning requires a large sample size, existing methods may yield uncertain findings when the dataset is small, as is common in omics data analysis. With the explosion and availability of omics data from multiple populations and studies, existing methods naively pool them into one dataset to increase the sample size while ignoring that variable structures can differ across datasets, which may lead to inaccurate variable selection. We propose a penalized integrative deep neural network (PIN) to simultaneously select important variables from multiple datasets. PIN directly aggregates multiple datasets as input and accommodates both homogeneity and heterogeneity among the datasets within an integrative analysis framework. Results from extensive simulation studies and applications of PIN to gene expression datasets from older adults with different cognitive statuses or ovarian cancer patients at different stages demonstrate that PIN outperforms existing methods, with considerably improved performance across multiple datasets. The source code is freely available on GitHub (rucliyang/PINFunc). We anticipate that the proposed PIN method will promote the identification of disease-related important variables based on multiple studies/datasets from diverse origins.
Funding: Supported by the National Key Research and Development Program of China (2022YFE0141100 and 2023YFB3003005).
Abstract: Machine-learning interatomic potentials have revolutionized materials modeling at the atomic scale. Thanks to them, it is now possible to perform simulations of ab initio quality over very large time and length scales. More recently, various universal machine-learning models have been proposed as an out-of-the-box approach that avoids the need to train and validate a specific potential for each material of interest. In this paper, we review and evaluate four different universal machine-learning interatomic potentials (uMLIPs), all based on graph neural network architectures, which have demonstrated transferability from one chemical system to another. The evaluation procedure relies on data both from a recent verification study of density-functional-theory implementations and from the Materials Project. Through this comprehensive evaluation, we aim to provide guidance to materials scientists in selecting suitable models for their specific research problems, offer recommendations for model selection and optimization, and stimulate discussion on potential areas for improvement in current machine-learning methodologies in materials science.