Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 32070557 and 32270673) and the Huazhong Agricultural University Scientific & Technological Self-innovation Foundation, China (Grant No. 2014RC020).
Abstract: Multilocus genome-wide association studies have become the state-of-the-art tool for dissecting the genetic architecture of complex and multiomic traits. However, most existing multilocus methods require relatively long computational time when analyzing large datasets. To address this issue, we proposed a fast mrMLM method, namely the best linear unbiased prediction multilocus random-SNP-effect mixed linear model (BLUPmrMLM). First, genome-wide single-marker scanning in mrMLM was replaced in BLUPmrMLM by vectorized Wald tests based on the best linear unbiased prediction (BLUP) values of marker effects and their variances. Then, adaptive best subset selection (ABESS) was used to identify potentially associated markers on each chromosome, reducing the computational time needed to estimate marker effects via empirical Bayes. Finally, shared memory and parallel computing schemes were used to reduce the computational time further. In simulation studies, BLUPmrMLM outperformed GEMMA, EMMAX, mrMLM, and FarmCPU, as well as the control method (BLUPmrMLM with ABESS removed), in terms of computational time, power, accuracy in estimating quantitative trait nucleotide positions and effects, false positive rate, false discovery rate, false negative rate, and F1 score. In the reanalysis of two large rice datasets, BLUPmrMLM significantly reduced the computational time and identified more previously reported genes than the aforementioned methods. This study provides an excellent multilocus method for the analysis of large-scale and multiomic datasets. The software mrMLM v5.1 is available at BioCode (https://ngdc.cncb.ac.cn/biocode/tool/BT007388) or GitHub (https://github.com/YuanmingZhang65/mrMLM).
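As a rough illustration of the Wald tests mentioned in the abstract above, the following sketch computes a generic one-degree-of-freedom Wald statistic for every marker directly from an effect estimate and its variance, without refitting a model per marker. It is not the BLUPmrMLM implementation: the function name `wald_tests` and the input numbers are hypothetical, and the effect estimates and variances are assumed to be already available (for example, from a BLUP fit). With an array library the same arithmetic would apply to whole vectors at once.

```python
import math


def wald_tests(effects, variances):
    """Return a (statistic, p_value) pair for each marker.

    Generic Wald test, assumed here for illustration:
    W = effect^2 / variance, compared with a chi-square
    distribution with 1 degree of freedom.
    """
    results = []
    for b, v in zip(effects, variances):
        if v <= 0:
            raise ValueError("variance must be positive")
        w = b * b / v
        # For 1 df, P(chi2 > w) = P(|Z| > sqrt(w)) = erfc(sqrt(w / 2)).
        p = math.erfc(math.sqrt(w / 2.0))
        results.append((w, p))
    return results


# Hypothetical effect estimates and variances for three markers.
effects = [0.0, 0.5, -1.2]
variances = [0.04, 0.0625, 0.09]
for w, p in wald_tests(effects, variances):
    print(f"W = {w:.2f}, p = {p:.3g}")
```

Markers with small p-values in such a scan would then be candidates for a subsequent selection step; the threshold and the selection procedure are outside this sketch.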
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 31871242, U1602261, 31701071, 21873034, and 31571268), the Huazhong Agricultural University Scientific & Technological Self-innovation Foundation, China (Grant No. 2014RC020), and the State Key Laboratory of Cotton Biology Open Fund, China (Grant No. CB2019B01).
Abstract: Previous studies have reported that some important loci are missed in single-locus genome-wide association studies (GWAS), especially because of the large phenotypic error in field experiments. To solve this issue, multi-locus GWAS methods have been recommended. However, only a few software packages for multi-locus GWAS are available. Therefore, we developed an R package named mrMLM v4.0.2. This software integrates the mrMLM, FASTmrMLM, FASTmrEMMA, pLARmEB, pKWmEB, and ISIS EM-BLASSO methods developed by our lab. mrMLM v4.0.2 has four components: dataset input, parameter setting, software running, and result output. The fread function in data.table is used to read datasets quickly, especially large ones, and the doParallel package is used to run computations in parallel on multiple CPUs. In addition, the graphical user interface software mrMLM.GUI v4.0.2, built upon Shiny, is also available. To confirm the correctness of these programs, all the methods in mrMLM v4.0.2 and three widely used methods were applied to real and simulated datasets. The results confirm the superior performance of mrMLM v4.0.2 over the other methods currently available. False positive rates are effectively controlled, albeit with a less stringent significance threshold. mrMLM v4.0.2 is publicly available as open-source software at BioCode (https://bigd.big.ac.cn/biocode/tools/BT007077) or CRAN (https://cran.r-project.org/web/packages/mrMLM.GUI/index.html).