Funding: Supported by the National Natural Science Foundation of China (32470657 and 32270673).
Abstract: Genetic dissection and breeding by design for polygenic traits remain substantial challenges. To address these challenges, it is important to identify as many genes as possible, including key regulatory genes. Here, we developed a genome-wide scanning plus machine learning framework, integrated with advanced computational techniques, to propose a novel algorithm named Fast3VmrMLM. This algorithm aims to enhance the identification of abundant and key genes for polygenic traits in the era of big data and artificial intelligence. The algorithm was extended to identify haplotype (Fast3VmrMLM-Hap) and molecular (Fast3VmrMLM-mQTL) variants. In simulation studies, Fast3VmrMLM outperformed existing methods in detecting dominant, small, and rare variants, requiring only 3.30 and 5.43 h (20 threads) to analyze the 18K rice and UK Biobank-scale datasets, respectively. Fast3VmrMLM identified more known (211) and candidate (384) genes for 14 traits in the 18K rice dataset than FarmCPU (100 known genes). Additionally, it identified 26 known and 24 candidate genes for seven yield-related traits in a maize NC II design; Fast3VmrMLM-mQTL identified two known soybean genes near structural variants. We demonstrated that this novel two-step framework outperformed genome-wide scanning alone. In breeding by design, a genetic network constructed via machine learning using all known and candidate genes identified in this study revealed 21 key genes associated with rice yield-related traits. All associated markers yielded high prediction accuracies in rice (0.7443) and maize (0.8492), enabling the development of superior hybrid combinations. A new breeding-by-design strategy based on the identified key genes was also proposed. This study provides an effective method for gene mining and breeding by design.
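The abstract does not give the internals of Fast3VmrMLM, so the following is only a generic illustration of the "scan first, then model the candidates jointly" idea behind a two-step framework. It is not the authors' algorithm. The function name `two_step_association`, the thresholds, the simulated genotypes, and the use of ordinary least squares as a stand-in for the machine learning step are all assumptions made for this sketch.

```python
import numpy as np


def two_step_association(genotypes, phenotype, n_candidates=20, t_threshold=4.0):
    """Toy scan-then-select procedure (hypothetical; NOT Fast3VmrMLM).

    Assumes every marker column is polymorphic (non-constant).
    """
    n_individuals = genotypes.shape[0]
    x = genotypes - genotypes.mean(axis=0)  # centre each marker
    y = phenotype - phenotype.mean()        # centre the trait

    # Step 1: genome-wide single-marker scan.
    # Rank markers by absolute marginal correlation with the trait.
    corr = (x.T @ y) / (np.linalg.norm(x, axis=0) * np.linalg.norm(y))
    candidates = np.argsort(-np.abs(corr))[:n_candidates]

    # Step 2: fit all candidates jointly (plain OLS used here as a
    # simple stand-in for a machine learning selection step).
    xc = x[:, candidates]
    beta, *_ = np.linalg.lstsq(xc, y, rcond=None)
    residuals = y - xc @ beta
    dof = n_individuals - len(candidates) - 1
    sigma2 = residuals @ residuals / dof
    std_err = np.sqrt(sigma2 * np.diag(np.linalg.inv(xc.T @ xc)))
    t_stats = beta / std_err

    # Keep only markers whose joint effect is clearly non-zero.
    keep = np.abs(t_stats) > t_threshold
    return sorted(int(i) for i in candidates[keep])


# Simulated data (assumption: 500 individuals, 200 markers, 3 causal markers).
rng = np.random.default_rng(0)
n_individuals, n_markers = 500, 200
genotypes = rng.binomial(2, 0.5, size=(n_individuals, n_markers)).astype(float)
causal = [10, 75, 150]
phenotype = genotypes[:, causal].sum(axis=1) + rng.normal(0.0, 1.0, n_individuals)

selected = two_step_association(genotypes, phenotype)
print(selected)
```

In this toy setting the first step narrows 200 markers to 20 candidates, and the second step discards candidates whose apparent signal disappears once the other candidates are accounted for. That is one plausible reading of why a two-step design can improve on scanning alone, under the assumptions stated above.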