Abstract: Introduction: Genome sequence plays an important role in both basic and applied studies. Gossypium raimondii, the putative contributor of the D subgenome of upland cotton (G. hirsutum), highlights the need to improve the genome quality rapidly and efficiently. Methods: We performed Hi-C sequencing of G. raimondii and reassembled its genome based on a set of new Hi-C data and previously published scaffolds. We also compared the reassembled genome sequence with the previously published G. raimondii genomes for gene and genome sequence collinearity. Results: A total of 98.42% of scaffold sequences were clustered successfully, among which 99.72% of the clustered sequences were ordered and 99.92% of the ordered sequences were oriented with high quality. Further evaluation of the results by heat map and collinearity analysis revealed that the reassembled genome is significantly improved over the previous one (Nat Genet 44:1098-1103, 2012). Conclusion: This improvement in the G. raimondii genome not only provides a better reference to increase study efficiency but also offers a new way to assemble cotton genomes. Furthermore, the Hi-C data of G. raimondii may be used for 3D structure research or regulatory analysis.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 32070601), the Natural Science Fund for Excellent Young Scholars of Shandong Province, China (Grant No. ZR2022YQ23), and the Wellcome Trust, UK (Grant No. WT206194).
Abstract: Long-range sequencing grants insight into additional genetic information beyond what can be accessed by both short reads and modern long-read technology. Several new sequencing technologies, such as "Hi-C" and "Linked Reads", produce long-range datasets for high-throughput and high-resolution genome analyses, which are rapidly advancing the field of genome assembly, genome scaffolding, and more comprehensive variant identification. In this review, we focused on five major long-range sequencing technologies: high-throughput chromosome conformation capture (Hi-C), 10X Genomics Linked Reads, haplotagging, transposase enzyme linked long-read sequencing (TELL-seq), and single-tube long fragment read (stLFR). We detailed the mechanisms and data products of the five platforms and their important applications, evaluated the quality of sequencing data from different platforms, and discussed the currently available bioinformatics tools. This work will benefit the selection of appropriate long-range technology for specific biological studies.
Funding: Supported by the National Science Foundation, USA (Grant No. DBI-1751317) and the National Institutes of Health, USA (Grant No. R35GM133678).
Abstract: Recent advances in high-throughput chromosome conformation capture (Hi-C) techniques have allowed us to map genome-wide chromatin interactions and uncover higher-order chromatin structures, thereby shedding light on the principles of genome architecture and functions. However, statistical methods for detecting changes in large-scale chromatin organization such as topologically associating domains (TADs) are still lacking. Here, we proposed a new statistical method, DiffGR, for detecting differentially interacting genomic regions at the TAD level between Hi-C contact maps. We utilized the stratum-adjusted correlation coefficient to measure the similarity of local TAD regions. We then developed a nonparametric approach to identify statistically significant changes in genomic interacting regions. Through simulation studies, we demonstrated that DiffGR can robustly and effectively discover differential genomic regions under various conditions. Furthermore, we successfully revealed cell type-specific changes in genomic interacting regions in both human and mouse Hi-C datasets, and illustrated that DiffGR yielded consistent and advantageous results compared with state-of-the-art differential TAD detection methods. The DiffGR R package is published under the GNU General Public License (GPL) ≥2 and is publicly available at https://github.com/wmalab/DiffGR.
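To make the similarity measure named in this abstract more concrete, the following is a minimal sketch of a stratum-adjusted correlation between two contact maps. It is not the DiffGR implementation: the function names (`stratum`, `scc`), the `max_distance` parameter, and the use of raw (unsmoothed, untransformed) counts for the stratum weights are simplifying assumptions made for illustration only. The idea shown is that entries are grouped by genomic distance (one diagonal per stratum), a Pearson correlation is computed within each stratum, and the per-stratum values are combined in a weighted average.

```python
from math import sqrt


def stratum(matrix, k):
    """Return the k-th diagonal: contacts between bins that are k bins apart."""
    return [matrix[i][i + k] for i in range(len(matrix) - k)]


def scc(a, b, max_distance):
    """Simplified stratum-adjusted correlation of two square contact maps.

    Assumption: strata are weighted by N_k * sqrt(var_x * var_y) computed on
    raw counts; published variants smooth and transform the data first.
    """
    numerator = 0.0
    denominator = 0.0
    for k in range(max_distance + 1):
        x, y = stratum(a, k), stratum(b, k)
        n = len(x)
        if n < 2:
            continue  # a correlation needs at least two points
        mean_x, mean_y = sum(x) / n, sum(y) / n
        cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n
        var_x = sum((xi - mean_x) ** 2 for xi in x) / n
        var_y = sum((yi - mean_y) ** 2 for yi in y) / n
        if var_x == 0 or var_y == 0:
            continue  # correlation is undefined for a constant stratum
        scale = sqrt(var_x * var_y)
        weight = n * scale
        numerator += weight * (cov / scale)  # weight times per-stratum Pearson r
        denominator += weight
    return numerator / denominator if denominator else float("nan")


# Hypothetical 4-bin contact map, used only to exercise the function.
example_map = [
    [10, 5, 2, 1],
    [5, 12, 6, 3],
    [2, 6, 9, 4],
    [1, 3, 4, 11],
]
print(scc(example_map, example_map, max_distance=2))
```

Stratifying by distance matters because Hi-C contact frequency decays strongly with genomic distance, so a plain correlation over all matrix entries would be dominated by that shared decay rather than by local structure.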
Funding: Beijing Natural Science Foundation, Grant/Award Number: 5232025; Beijing Nova Program, Grant/Award Number: 20230484290; National Natural Science Foundation of China, Grant/Award Numbers: 62173338, 61873276.
Abstract: Copy number variation (CNV) refers to the number of copies of a specific sequence in a genome and is a type of chromatin structural variation. The development of the Hi-C technique has empowered research on the spatial structure of chromatin by capturing interactions between DNA fragments. We utilized machine-learning methods, including the linear transformation model and graph convolutional network (GCN), to detect CNV events from Hi-C data and reveal how CNV is related to three-dimensional interactions between genomic fragments in terms of the one-dimensional read count signal and features of the chromatin structure. The experimental results demonstrated a specific linear relation between the Hi-C read count and CNV for each chromosome that is described well by the linear transformation model. In addition, the GCN-based model could accurately extract features of the spatial structure from Hi-C data and infer the corresponding CNV across different chromosomes in a cancer cell line. We performed a series of experiments including dimension reduction, transfer learning, and Hi-C data perturbation to comprehensively evaluate the utility and robustness of the GCN-based model. This work can provide a benchmark for using machine learning to infer CNV from Hi-C data and serves as a necessary foundation for a deeper understanding of the relationship between Hi-C data and CNV.
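The per-chromosome linear relation between Hi-C read count and CNV described in this abstract can be illustrated with a small sketch. It is not the authors' linear transformation model: ordinary least squares is used here as a stand-in, and the function names (`fit_line`, `fit_per_chromosome`, `predict`), the `(chromosome, read_count, copy_number)` record layout, and the toy values are all hypothetical. The one-dimensional read count is assumed to be a per-bin total, for example the row sum of the contact matrix.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_per_chromosome(bins):
    """Fit one line per chromosome from (chromosome, read_count, copy_number) records."""
    grouped = {}
    for chrom, read_count, copy_number in bins:
        counts, copies = grouped.setdefault(chrom, ([], []))
        counts.append(read_count)
        copies.append(copy_number)
    return {chrom: fit_line(xs, ys) for chrom, (xs, ys) in grouped.items()}


def predict(models, chrom, read_count):
    """Estimate copy number for a bin using that chromosome's fitted line."""
    slope, intercept = models[chrom]
    return slope * read_count + intercept


# Toy records, invented purely to exercise the functions.
toy_bins = [
    ("chr1", 100, 2), ("chr1", 150, 3), ("chr1", 200, 4),
    ("chr2", 80, 2), ("chr2", 120, 3), ("chr2", 160, 4),
]
models = fit_per_chromosome(toy_bins)
print(predict(models, "chr1", 250))
```

Fitting each chromosome separately mirrors the abstract's statement that the linear relation is specific to each chromosome, so a slope learned on one chromosome is not assumed to carry over to another.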