Funding: Supported by the National Natural Science Foundation of China (32070557, 32470657, and 32270673).
Abstract: Current methods used in genome-wide association studies frequently lack power owing to their inability to detect heterogeneous associations and rare and multiallelic variants. To address these issues, quantile regression is integrated with a three (compressed) variance component multi-locus random-SNP-effect mixed linear model (3VmrMLM) to propose q3VmrMLM for detecting heterogeneous quantitative trait nucleotides (QTNs) and QTN-by-environment interactions (QEIs); a haplotype-based extension, q3VmrMLM-Hap, is then designed for identifying multiallelic haplotypes and rare variants. In Monte Carlo simulation studies, q3VmrMLM had higher power than 3VmrMLM, the sequence kernel association test (SKAT), and the integrated quantile rank test (iQRAT). In a re-analysis of 10 traits in 1439 rice hybrids, 261 known genes were identified only by q3VmrMLM and q3VmrMLM-Hap, whereas 175 known genes were detected by both the new and existing methods. Of all the significant QTNs with known genes, q3VmrMLM (179: 140 with variance heterogeneity and 157 with quantile-effect heterogeneity) found more heterogeneous QTNs than 3VmrMLM (123), SKAT (27), and iQRAT (29); q3VmrMLM-Hap (121) mapped more low-frequency (<0.05) QTNs than q3VmrMLM (51), 3VmrMLM (43), SKAT (11), and iQRAT (12); and q3VmrMLM-Hap (12), q3VmrMLM (16), and 3VmrMLM (12) had similar power in identifying gene-by-environment interactions. Together, all significant and suggested QTNs achieved the highest predictive accuracy (r = 0.9045). In conclusion, this study describes a new and complementary approach to mining genes and unraveling the genetic architecture of complex traits in crops.
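To make "heterogeneous association" concrete, the following is a minimal sketch and not the q3VmrMLM algorithm itself. It assumes a single hypothetical marker coded 0/1 whose allele changes the trait variance but not the trait mean. With one binary regressor, the quantile-regression slope at level tau reduces to the difference between the two genotype groups' tau-th quantiles (up to the sample-quantile convention), so no solver is needed. All function names, the sample size, and the simulated effect are illustrative assumptions.

```python
import random
import statistics


def empirical_quantile(values, tau):
    """Return the tau-th empirical quantile using linear interpolation."""
    ordered = sorted(values)
    pos = tau * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac


def quantile_effects(genotypes, phenotypes, taus=(0.1, 0.5, 0.9)):
    """Quantile-specific effect of a 0/1-coded marker.

    For a single binary regressor, the quantile-regression slope at tau is
    the difference between the two groups' tau-th quantiles.
    """
    g0 = [y for g, y in zip(genotypes, phenotypes) if g == 0]
    g1 = [y for g, y in zip(genotypes, phenotypes) if g == 1]
    return {
        tau: empirical_quantile(g1, tau) - empirical_quantile(g0, tau)
        for tau in taus
    }


def simulate(n=4000, seed=1):
    """Hypothetical marker: no mean shift, but allele 1 doubles the residual SD."""
    rng = random.Random(seed)
    genotypes = [rng.randint(0, 1) for _ in range(n)]
    phenotypes = [rng.gauss(0.0, 2.0 if g == 1 else 1.0) for g in genotypes]
    return genotypes, phenotypes


genotypes, phenotypes = simulate()
effects = quantile_effects(genotypes, phenotypes)

# A mean-based comparison sees almost nothing at this marker ...
mean_g0 = statistics.mean([y for g, y in zip(genotypes, phenotypes) if g == 0])
mean_g1 = statistics.mean([y for g, y in zip(genotypes, phenotypes) if g == 1])
mean_effect = mean_g1 - mean_g0

# ... while the lower and upper quantiles move in opposite directions.
print("mean effect:", round(mean_effect, 3))
for tau, effect in effects.items():
    print("tau =", tau, "effect:", round(effect, 3))
```

In this simulated setting the mean difference stays near zero while the effects at tau = 0.1 and tau = 0.9 have opposite signs, which is the kind of signal a mean-based scan can miss and a quantile-based one can pick up.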
Funding: Supported by the National Natural Science Foundation of China (32470657 and 32270673).
Abstract: Genetic dissection and breeding by design for polygenic traits remain substantial challenges. To address these challenges, it is important to identify as many genes as possible, including key regulatory genes. Here, we developed a genome-wide scanning plus machine learning framework, integrated with advanced computational techniques, to propose a novel algorithm named Fast3VmrMLM. This algorithm aims to enhance the identification of abundant and key genes for polygenic traits in the era of big data and artificial intelligence. The algorithm was extended to identify haplotype (Fast3VmrMLM-Hap) and molecular (Fast3VmrMLM-mQTL) variants. In simulation studies, Fast3VmrMLM outperformed existing methods in detecting dominant, small, and rare variants, requiring only 3.30 and 5.43 h (20 threads) to analyze the 18K rice and UK Biobank-scale datasets, respectively. Fast3VmrMLM identified more known (211) and candidate (384) genes for 14 traits in the 18K rice dataset than FarmCPU (100 known genes). Additionally, it identified 26 known and 24 candidate genes for seven yield-related traits in a maize NC II design; Fast3VmrMLM-mQTL identified two known soybean genes near structural variants. We demonstrated that this novel two-step framework outperformed genome-wide scanning alone. In breeding by design, a genetic network constructed via machine learning using all known and candidate genes identified in this study revealed 21 key genes associated with rice yield-related traits. All associated markers yielded high prediction accuracies in rice (0.7443) and maize (0.8492), enabling the development of superior hybrid combinations. A new breeding-by-design strategy based on the identified key genes was also proposed. This study provides an effective method for gene mining and breeding by design.
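The general shape of a "scan first, then learn" pipeline can be sketched as follows. This is not Fast3VmrMLM: the abstract does not specify the scanning statistic or the machine learning model, so the sketch assumes a marginal-correlation scan for step 1 and substitutes a plain coordinate-descent lasso for step 2. The simulated data, the marker indices, and the tuning values `keep` and `lam` are all hypothetical.

```python
import random


def standardize(v):
    """Center a vector and scale it to unit variance."""
    m = sum(v) / len(v)
    c = [a - m for a in v]
    sd = (sum(a * a for a in c) / len(c)) ** 0.5
    return [a / sd for a in c]


def scan(markers, y, keep):
    """Step 1: keep the markers with the largest |marginal correlation|.

    Inputs are standardized, so the mean product is the Pearson correlation.
    """
    def abs_corr(j):
        return abs(sum(a * b for a, b in zip(markers[j], y)) / len(y))

    ranked = sorted(range(len(markers)), key=abs_corr, reverse=True)
    return sorted(ranked[:keep])


def lasso(columns, y, lam, sweeps=100):
    """Step 2 (assumed stand-in): coordinate-descent lasso on standardized columns."""
    n = len(y)
    beta = [0.0] * len(columns)
    resid = list(y)
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            # Correlation of column j with the residual after adding its own fit back.
            rho = sum(c * (r + beta[j] * c) for c, r in zip(col, resid)) / n
            new = max(abs(rho) - lam, 0.0) * (1 if rho > 0 else -1)  # soft threshold
            if new != beta[j]:
                delta = new - beta[j]
                resid = [r - delta * c for c, r in zip(col, resid)]
                beta[j] = new
    return beta


# Hypothetical data: 300 individuals, 200 markers coded 0/1/2, three causal markers.
rng = random.Random(7)
n, p = 300, 200
causal = {10: 1.0, 50: -0.9, 120: 0.8}
raw = [[rng.randint(0, 1) + rng.randint(0, 1) for _ in range(n)] for _ in range(p)]
trait = [
    sum(effect * raw[j][i] for j, effect in causal.items()) + rng.gauss(0.0, 1.0)
    for i in range(n)
]

markers = [standardize(col) for col in raw]
y = standardize(trait)

candidates = scan(markers, y, keep=20)  # step 1: genome-wide scan
beta = lasso([markers[j] for j in candidates], y, lam=0.1)  # step 2: joint model
selected = [j for j, b in zip(candidates, beta) if b != 0.0]
print("candidates after scan:", candidates)
print("selected after lasso:", selected)
```

The scan keeps the problem small, and the joint model in step 2 can then drop candidates whose marginal signal is explained by other markers. The values `keep=20` and `lam=0.1` are arbitrary choices for this toy data and would need tuning on real data.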