Funding: Supported by grants from the National Science and Technology Major Project (Grant Nos. 2021YFF1201200 and 2018ZX10201002), the National Natural Science Foundation of China (Grant No. 62372316), the China Postdoctoral Science Foundation (Grant No. 2020M673221), the Fundamental Research Funds for the Central Universities (Grant No. 2020SCU12056), the Sichuan Science and Technology Program (Grant No. 2022YFS0048), and the Chongqing Technology Innovation and Application Development Project (Grant No. CSTB2022TIAD-KPX0067), China.
Abstract: Precisely defining and mapping all cytosine (C) positions and their clusters, known as CpG islands (CGIs), together with their methylation status, is pivotal for genome-wide epigenetic studies, especially now that population-centric reference genomes are ready for timely application. Here, we first align two high-quality reference genomes from different ethnic backgrounds, T2T-YAO and T2T-CHM13, base by base and compute their genome-wide density-defined and position-defined CGIs. Second, by mapping representative genome-wide methylation data from selected organs onto the two genomes, we find about 4.7%–5.8% sequence divergence across variable categories, depending on quality cutoffs. Genes within the divergent sequences are mostly associated with neurological functions. Moreover, CGIs associated with the divergent sequences differ significantly between the two genomes in CpG density and in the observed CpG/expected CpG (O/E) ratio. Finally, when whole-genome bisulfite sequencing (WGBS) data from European and American populations are mapped to each reference, the T2T-YAO genome not only has greater CpG coverage than the T2T-CHM13 genome but also shows more hyper-methylated CpG sites. Our study suggests that future genome-wide epigenetic studies of Chinese populations depend both on acquiring high-quality methylation data and on subsequent precise CGI mapping based on the Chinese T2T reference.
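The abstract compares CGIs by CpG density and O/E ratio without stating the exact formulas used. As a minimal sketch, assuming the widely used definition O/E = (number of CpG × sequence length) / (number of C × number of G), the two quantities can be computed for a single sequence as follows. The function names `cpg_stats` and `looks_like_cgi` and the default thresholds (length ≥ 200 bp, GC content ≥ 0.5, O/E ≥ 0.6) are illustrative assumptions, not the authors' pipeline or cutoffs.

```python
def cpg_stats(seq: str) -> dict:
    """Return length, GC content, CpG count, and CpG O/E ratio for a DNA sequence.

    Assumes the common definition O/E = (#CpG * N) / (#C * #G);
    the source abstract does not say which definition was used.
    """
    s = seq.upper()
    n = len(s)
    c = s.count("C")
    g = s.count("G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so a plain count is exact
    gc_content = (c + g) / n if n else 0.0
    oe_ratio = (cpg * n) / (c * g) if c and g else 0.0
    return {
        "length": n,
        "gc_content": gc_content,
        "cpg_count": cpg,
        "oe_ratio": oe_ratio,
    }


def looks_like_cgi(seq: str, min_len: int = 200,
                   min_gc: float = 0.5, min_oe: float = 0.6) -> bool:
    """Apply hypothetical density-style thresholds to one candidate region."""
    stats = cpg_stats(seq)
    return (stats["length"] >= min_len
            and stats["gc_content"] >= min_gc
            and stats["oe_ratio"] >= min_oe)
```

Running `cpg_stats` on the same region extracted from each reference would give the per-genome CpG density and O/E values that a comparison of this kind relies on; a genome-wide scan would apply the same calculation over sliding windows.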
Funding: Supported by grants from the National Key R&D Program of China (Grant No. 2021YFC2301000), the National Science Foundation of China (Grant No. 32371537), the Linfen Soft Science Research Project (Grant No. 2126), the National and Provincial Key Clinical Specialty Capacity Building Project 2020 (Department of the Respiratory Medicine), and the Peking University People's Hospital Scientific Research Development Funds (Grant No. RDGS2022-11), China.
Abstract: Whole-exome sequencing (WES) data are frequently used for cancer diagnosis and genome-wide association studies (GWAS), and their analysis depends on high-coverage read mapping, informative variant calling, and high-quality reference genomes. The central position of the currently used genome assembly, GRCh38, is now challenged by two newly published telomere-to-telomere (T2T) genomes, T2T-CHM13 and T2T-YAO, so a comparative study that tests population specificity across the three reference genomes using real-case WES data has become urgent. Here, we report such an analysis for 19 tumor samples collected from Chinese patients. A primary comparison of the exon regions among the three references reveals that the sequences in up to ~1% of target regions in T2T-YAO diverge widely from GRCh38 and may lead to off-target sequence capture. Nevertheless, T2T-YAO still outperforms GRCh38, yielding 7.41% more mapped reads. Owing to more reliable read mapping and a closer phylogenetic relationship with the samples than GRCh38 has, T2T-YAO halves the number of variant calls of clinical significance, most of which are benign, while maintaining sensitivity in identifying pathogenic variants. T2T-YAO also outperforms T2T-CHM13 in reducing calls of Chinese-specific variants. Our findings highlight the critical need to employ population-specific reference genomes in genomic analysis to ensure accurate variant analysis, and the significant benefits of tailoring these approaches to the unique genetic background of each ethnic group.