Funding: Supported by the National Natural Science Foundation of China (32122024, 31971178), the National Key Research and Development Programs of China (2018YFA0507000), and ShanghaiTech University.
Abstract: Olfactory receptors are poorly annotated for most genome-sequenced chordates. To address this deficiency, we developed an nhmmer-based olfactory receptor annotation tool, Genome2OR (https://github.com/ToHanwei/Genome2OR.git), and used it to process 1,695 sequenced chordate genomes in the NCBI Assembly database as of January 2021. In total, 765,248 olfactory receptor genes were annotated, comprising 404,426 functional genes and 360,822 pseudogenes, which represents a four-fold increase in the number of annotated olfactory receptors. Based on the annotation data, we built the Chordata Olfactory Receptor Database (CORD, https://cord.ihuman.shanghaitech.edu.cn) for archiving, analysing and disseminating the data. Beyond the primary data, CORD offers derivative information, including pictures of species, cross-references to public databases, structural models, sequence similarity networks and sequence profiles. Furthermore, we performed brief analyses of these receptors: we built a large protein sequence similarity network covering all receptors in the database, clustered them into 20 communities, and classified the 20 communities into three categories based on their presence or absence in ray-finned fish and/or lobe-finned fish. By analysing their sequences and structural models, we infer that olfactory receptors are likely to have unique activation and desensitization mechanisms. We believe that CORD can benefit researchers and members of the general public who are interested in olfaction.
Funding: Supported by the Yunnan Seed Laboratory, China (202205AR070001-15), and the National Natural Science Foundation of China (Grant No. 32160697).
Abstract: Juglans sigillata is an economically valuable nut crop renowned for its nutritional richness, including essential nutrients, antioxidants, and healthy fats, which benefit human cardiac, brain and gut health. Despite its importance, the lack of a complete genome assembly has been a stumbling block in its biological breeding process. Therefore, we generated deep-coverage ultralong Oxford Nanopore Technology (ONT) and PacBio HiFi reads to construct a telomere-to-telomere (T2T) genome assembly. The final assembly spans 537.27 Mb with no gaps and shows a completeness of 98.1%. We used a combination of transcriptome data and homologous proteins to annotate the genome, identifying 36,018 protein-coding genes. Furthermore, we profiled global cytosine DNA methylation using ONT sequencing data. Global methylome analysis revealed high methylation levels in transposable element (TE)-rich chromosomal regions and comparatively lower methylation in gene-rich areas. By integrating multi-omics data, we obtained insights into the mechanism underlying endopleura coloration. This investigation identified eight candidate genes (e.g., ANR) involved in anthocyanin biosynthesis pathways, which are crucial for the development of color in plants. The comprehensive genome assembly and the understanding of the genetic basis of important traits such as endopleura coloration will open avenues for more efficient breeding programs and improved crop quality.
Funding: Supported by the National Key Research and Development Program of China (2023YFD2401901) and the Fundamental Research Funds for Zhejiang Provincial Universities and Research Institutes (2024J004).
Abstract: The large yellow croaker (Larimichthys crocea) is a flagship marine fish in China given its high commercial value and golden-yellow coloration. However, the genetic mechanisms underlying golden-yellow coloration remain unclear. Here, we construct a telomere-to-telomere gap-free genome assembly (T2T-Larcro_1.0) spanning 716.87 Mb, with a contig N50 of 31.75 Mb. Compared to the current reference genome (L_crocea_2.0), T2T-Larcro_1.0 incorporates 112.70 Mb of previously unassembled regions and 2,368 newly anchored genes. This assembly facilitates comparative genomics analyses in sciaenids by identifying several candidate genes (e.g., OPNVA, nNOS, RDH13) potentially involved in the evolution of golden-yellow coloration. Transcriptomic analyses further confirm expression of the OPNVA-encoded vertebrate ancient opsin (VA opsin) in skin tissues of the large yellow croaker, suggesting its role as an extraretinal photoreceptor regulating localized golden-yellow coloration. Integrating genomics and transcriptomics results, we uncover the triggering effect of VA opsin, which links skin and neural photoreception to physiological regulation of body color change (golden-yellow to silvery-white) in L. crocea. Collectively, our findings provide molecular evidence that elucidates the underlying evolutionary mechanism of golden-yellow coloration in L. crocea. This high-quality genome assembly also serves as an improved resource for studies of biological evolution, genetic improvement, and selective breeding of L. crocea.
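The contig N50 quoted in the abstract above is a length-weighted median: the length of the contig at which the running total, taken from the longest contig downward, first reaches half of the assembly size. The following is a minimal sketch of that standard definition. The function name `n50` and the toy contig lengths are illustrative assumptions, not values or code from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly (arbitrary units), not data from the paper.
toy_contigs = [2, 3, 4, 5, 6, 10]
print(n50(toy_contigs))  # total is 30; 10 + 6 = 16 >= 15, so this prints 6
```

The same function applies to read lengths, which is how a read N50 is usually computed.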
Funding: Supported by the Key R&D Program of Shandong Province, China (grant no. ZR202211070163), the National Natural Science Foundation of China (grant nos. 32170574 and 32200249), the Natural Science Foundation of Shandong Province (grant nos. ZR2023QC026 and ZR2023QC106), and the Young Taishan Scholars Program and Yuandu Scholars Program.
Abstract: The pursuit of complete telomere-to-telomere (T2T) genome assembly in plants, challenged by genomic complexity, has been advanced by Oxford Nanopore Technologies (ONT), which offers ultra-long, real-time sequencing. Despite its promise, sequencing length and gap filling remain significant challenges. This study optimized DNA extraction and library preparation, achieving DNA lengths exceeding 485 kb; average N50 read lengths of 80.57 kb, reaching up to 440 kb; and maximum read lengths of 5.83 Mb. Importantly, we demonstrated that combining ultra-long sequencing and adaptive sampling can effectively fill gaps during assembly, as evidenced by successfully filling the remaining gaps of a near-complete Arabidopsis genome assembly and resolving the sequence of an unknown telomeric region in the watermelon genome. Collectively, our strategies improve the feasibility of complete T2T genome assemblies across various plant species, enhancing genome-based research in diverse fields.
Funding: This work was supported by the National Natural Sciences Foundation of China (No. 39896200), the National High Technology Research and Development Program (No. 102100202), and the National Program for Key Basic Research Project (No. G1998051016).
Abstract: Objective: To confirm a previous effort to identify type 2 diabetes susceptibility genes in a Northern Chinese population by conducting a new genome scan with both an increased number of type 2 diabetes families and a new set of microsatellite markers within the previously localized regions. Methods: A genome scan method was applied. After multiplexed PCR, electrophoresis, genescan and genotyping analysis, we obtained size information for all loci, and then used both parametric and non-parametric linkage analysis to investigate the P values and Z values of these loci. Results: We surveyed 34 microsatellite markers distributed within 5 regions along chromosome 1, and a total of 12,000 genotypes were screened. Evidence of linkage with diabetes was identified for 8 of the 34 loci. All P values of the 8 loci were lower than 0.05, and the highest Z value was 2.17. Notably, all 5 markers in the p-terminal 1p36.3-1p36.23 region, spanning 16.9 cM, had P values of less than 0.05, which suggests that this region may contain multiple susceptibility genes. Regions 4 and 5 also confirmed the previous findings, and we narrowed these two regions to 2.7 cM and 2.5 cM, respectively. Conclusions: We further confirmed the results of the previous genome-wide scan using an increased number of NIDDM families and a new set of microsatellite markers lying within the initially localized regions. The fact that all 5 loci in the p-terminal region displayed P values of less than 0.05 suggests that more than one susceptibility gene may reside in this region.
Funding: National Natural Science Foundation of China (31825015 to X.H.; 31901596 to J.S.) and the Young Elite Scientists Sponsorship Program by CAST (2021QNRC001 to J.S.).
Abstract: Plant genomes are so highly diverse that a substantial proportion of genomic sequences are not shared among individuals. The variable DNA sequences, along with the conserved core sequences, compose the more sophisticated pan-genome, which represents the collection of all non-redundant DNA in a species. With rapid progress in genome sequencing technologies, pan-genome research in plants is now accelerating. Here we review recent advances in plant pan-genomics, including the major driving forces of the structural variations that constitute the variable sequences, methodological innovations for representing the pan-genome, and major successes in constructing plant pan-genomes. We also summarize recent efforts toward decoding the remaining dark matter in telomere-to-telomere or gapless plant genomes. These new genome resources, which have remarkable advantages over the numerous previously assembled less-than-perfect genomes, are expected to become new references for genetic studies and plant breeding.