Potato is the world's most important nongrain crop. In this study, to assess genetic diversity within the Petota section, 29 genomes from the Petota and Etuberosum sections were newly assembled de novo, and 248 accessions of wild potatoes, landraces, and modern cultivars were re-sequenced at >25× depth. Subsequently, a graph-based pangenome was constructed using DM8.1 as the backbone, integrating 194,330 nonredundant structural variants. To characterize the tuber metabolome and illuminate the genomic basis of metabolic traits, LC-MS/MS was used to profile the metabolome of 157 accessions, and 9,321 structural variants (SVs) were found to be significantly associated with 1,258 distinct metabolites via a presence/absence variation (PAV)-based metabolomics-GWAS analysis, including flavonoids, phenolic acids, and phospholipids. To facilitate use of the pangenome resources, a comprehensive platform, the Potato Pangenome Database (PPDB), was developed. Our study provides a comprehensive genomic resource for dissecting the genomic basis of agronomic and metabolic traits in potato, which will accelerate functional genomics studies and genetic improvement in potato.
Background: India harbors the world's largest cattle population, encompassing over 50 distinct Bos indicus breeds. This rich genetic diversity underscores the inadequacy of a single reference genome to fully capture the genomic landscape of Indian cattle. To comprehensively characterize the genomic variation within Bos indicus and, specifically, dairy breeds, we aim to identify non-reference sequences and construct a comprehensive pangenome. Results: Five representative genomes of prominent dairy breeds, including Gir, Kankrej, Tharparkar, Sahiwal, and Red Sindhi, were sequenced using 10X Genomics 'linked-read' technology. Assemblies generated from these linked reads ranged from 2.70 Gb to 2.77 Gb, comparable to the Bos indicus Brahman reference genome. A pangenome of Bos indicus cattle was constructed by comparing the newly assembled genomes with the reference using alignment and graph-based methods, revealing 8 Mb and 17.7 Mb of novel sequence, respectively. A confident set of 6,844 Non-reference Unique Insertions (NUIs) spanning 7.57 Mb was identified through both methods, representing the pangenome of Indian Bos indicus breeds. Comparative analysis with previously published pangenomes unveiled 2.8 Mb (37%) commonality with the Chinese indicine pangenome and only 1% commonality with the Bos taurus pangenome. Among these, 2,312 NUIs encompassing ~2 Mb were commonly found in 98 samples of the 5 breeds and designated as Bos indicus Common Insertions (BICIs) in the population. Furthermore, 926 BICIs were identified within 682 protein-coding genes, 54 long non-coding RNAs (lncRNAs), and 18 pseudogenes. These protein-coding genes were enriched for functions such as chemical synaptic transmission, cell junction organization, cell-cell adhesion, and cell morphogenesis. The protein-coding genes were found in various prominent quantitative trait locus (QTL) regions, suggesting potential roles of BICIs in traits related to milk production, reproduction, exterior, health, meat, and carcass. Notably, 63.21% of the bases within the BICIs call set contained interspersed repeats, predominantly Long Interspersed Nuclear Elements (LINEs). Additionally, 70.28% of BICIs are shared with other domesticated and wild species, highlighting their evolutionary significance. Conclusions: This is the first report unveiling a robust set of NUIs defining the pangenome of Bos indicus breeds of India. The analyses contribute valuable insights into the genomic landscape of desi cattle breeds.
An increasing number of structural variations (SVs) have been identified as causative mutations for diverse agronomic traits. However, a systematic exploration of SV quantity, distribution, and contribution in wheat has been lacking. Here, we report high-quality gene-based and SV-based pangenomes comprising 22 hexaploid wheat assemblies showing a wide range of chromosome size, gene number, and TE component, which indicates their representativeness of wheat genetic diversity. Pan-gene analyses uncover 140,261 distinct gene families, of which only 23.2% are shared in all accessions. Moreover, we build a ~16.15 Gb graph pangenome containing 695,897 bubbles, intersecting 5132 genes and 230,307 cis-regulatory regions. Pairwise genome comparisons identify ~1,978,221 non-redundant SVs and 497 SV hotspots. Notably, the density of bubbles as well as SVs shows remarkable aggregation in centromeres, which probably play an important role in chromosome plasticity and stability. To explore functional SVs, we identify 2769 SVs with absolute relative frequency differences exceeding 0.7 between spring and winter growth habit groups. Additionally, several reported functional genes in wheat display complex structural graphs, for example, PPD-A1, VRT-A2, and TaNAAT2-A. These findings deepen our understanding of wheat genetic diversity, providing valuable graphical pangenome and variation resources to improve the efficiency of genome-wide association mapping in wheat.
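The spring-versus-winter comparison in the wheat abstract above can be illustrated with a minimal sketch. It assumes that "absolute relative frequency difference" means the absolute difference between the fractions of accessions carrying an SV in each group; the abstract does not give the exact definition, and the function names and toy genotype calls below are hypothetical.

```python
def carrier_frequency(calls):
    """Fraction of accessions carrying the SV (1 = present, 0 = absent)."""
    return sum(calls) / len(calls)


def differentiated_svs(spring, winter, threshold=0.7):
    """Return IDs of SVs whose carrier frequency differs between the two
    groups by more than `threshold` (assumed definition, see lead-in)."""
    hits = []
    for sv_id in spring:
        diff = abs(carrier_frequency(spring[sv_id]) - carrier_frequency(winter[sv_id]))
        if diff > threshold:
            hits.append(sv_id)
    return hits


# Hypothetical presence/absence calls, one entry per accession.
spring = {"sv1": [1, 1, 1, 1, 0], "sv2": [1, 0, 1, 0, 1]}
winter = {"sv1": [0, 0, 0, 0, 0], "sv2": [1, 1, 0, 0, 1]}

print(differentiated_svs(spring, winter))
```

In this toy input, sv1 is carried by 4 of 5 spring accessions and by no winter accession, so it passes the 0.7 cutoff, while sv2 has the same frequency in both groups and does not.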
Owing to the constraints of depth sensing technology, images acquired by depth cameras are inevitably contaminated by various types of noise. For depth maps presented in gray values, this research proposes a novel denoising model that combines a graph-based transform (GBT) with dual graph Laplacian regularization (DGLR), termed DGLR-GBT. This model specifically aims to remove Gaussian white noise by capitalizing on the nonlocal self-similarity (NSS) and the piecewise smoothness properties intrinsic to depth maps. Within the group sparse coding (GSC) framework, a combination of GBT and DGLR is implemented. Firstly, within each group, the graph is constructed by using estimates of the true values of the averaged blocks instead of the observations. Secondly, the graph Laplacian regularization terms are constructed based on rows and columns of similar block groups, respectively. Lastly, the solution is obtained effectively by combining the alternating direction method of multipliers (ADMM) with the weighted thresholding method within the domain of GBT.
In this study, we used the multi-resolution graph-based clustering (MRGC) method for determining the electrofacies (EF) and lithofacies (LF) from well log data obtained from the intraplatform bank gas fields located in the Amu Darya Basin. The MRGC could automatically determine the optimal number of clusters without prior knowledge about the structure or cluster numbers of the analyzed data set and allowed the users to control the level of detail actually needed to define the EF. Based on the LF identification and successful EF calibration using core data, an MRGC EF partition model including five clusters and a quantitative LF interpretation chart were constructed. The EF clusters 1 to 5 were interpreted as lagoon, anhydrite flat, interbank, low-energy bank, and high-energy bank, and the coincidence rate in the cored interval could reach 85%. We concluded that the MRGC could be accurately applied to predict the LF in non-cored but logged wells. Therefore, continuous EF clusters were partitioned and the corresponding LF were interpreted, and the distribution and petrophysical characteristics of different LF were analyzed in the framework of sequence stratigraphy.
As large-scale genomic studies have progressed, it has been revealed that a single reference genome cannot represent genetic diversity at the species level. Domestic animals tend to have complex routes of origin and migration, suggesting that some population-specific sequences may be missing from the current reference genome. In contrast, the pangenome is a collection of all DNA sequences of a species that contains sequences shared by all individuals (core genome) and is also able to display sequence information unique to each individual (variable genome). The progress of pangenome research in humans, plants and domestic animals has proved that missing genetic components and large structural variants (SVs) can be identified through pangenomic studies. Many individual-specific sequences have been shown to be related to biological adaptability, phenotype and important economic traits. The maturity of technologies and methods such as third-generation sequencing, telomere-to-telomere genomes, graph genomes, and reference-free assembly will further promote the development of pangenomes. In the future, pangenomes combined with long-read data and multi-omics will help to resolve large SVs and their relationship with the main economic traits of interest in domesticated animals, providing better insights into animal domestication, evolution and breeding. In this review, we mainly discuss how pangenome analysis reveals genetic variations in domestic animals (sheep, cattle, pigs, chickens) and their impacts on phenotypes, and how this can contribute to the understanding of species diversity. Additionally, we go through potential issues and the future perspectives of pangenome research in livestock and poultry.
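The core/variable distinction defined in the review abstract above can be made concrete with a small sketch. It assumes a gene-level presence/absence table as input; the gene and individual names are hypothetical and the function is illustrative rather than taken from any cited tool.

```python
def partition_pangenome(pav):
    """Split genes into core (present in every individual) and variable
    (present in at least one individual but not all).

    pav: dict mapping gene -> dict mapping individual -> bool (present?).
    Returns (core, variable) as sorted lists of gene names.
    """
    core, variable = [], []
    for gene, calls in pav.items():
        present = list(calls.values())
        if all(present):
            core.append(gene)
        elif any(present):
            variable.append(gene)
    return sorted(core), sorted(variable)


# Hypothetical presence/absence calls for three individuals.
pav = {
    "geneA": {"ind1": True, "ind2": True, "ind3": True},
    "geneB": {"ind1": True, "ind2": False, "ind3": False},
    "geneC": {"ind1": False, "ind2": True, "ind3": True},
}

core, variable = partition_pangenome(pav)
print("core:", core)
print("variable:", variable)
```

Here geneA is shared by all individuals and is core, while geneB and geneC are each missing from at least one individual and fall in the variable genome.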
A better understanding of the relationship between the structure and functions of urban and suburban spaces is one of the avenues of research still open for geographical information science. The research presented in this paper develops several graph-based metrics that characterize local and global structural properties reflecting how the overall building layout relates to the road layout. Such structural properties are modeled as an aggregation of parcels, buildings, and road networks. We introduce several computational measures (Ratio Minimum Distance, Minimum Ratio Minimum Distance, and Metric Compactness) that evaluate the capability of a given road to be connected with the whole road network. These measures reveal emerging sub-network structures and point out differences between less-connective and more-connective parts of the network. Based on these local and global properties derived from the topological and graph-based representation, and on building density metrics, this paper proposes an analysis of road and building layouts at different levels of granularity. The metrics developed are applied to a case study in which the derived properties reveal coherent as well as incoherent neighborhoods that illustrate the potential of the approach and the way buildings and roads can be relatively connected in a given urban environment. Overall, by integrating the parcel and building layouts, this approach complements previous related works that mainly retain the configurational structure of the urban network, as well as morphological studies whose focus is generally limited to the analysis of the building layout.
Background: The reliance on a solitary linear reference genome has imposed a significant constraint on our comprehensive understanding of genetic variation in animals. This constraint is particularly pronounced for non-reference sequences (NRSs), which have not been extensively studied. Results: In this study, we constructed a pig pangenome graph using 21 pig assemblies and identified 23,831 NRSs with a total length of 105 Mb. Our findings revealed that NRSs were more prevalent in breeds exhibiting greater genetic divergence from the reference genome. Furthermore, we observed that NRSs were rarely found within coding sequences, while NRS insertions were enriched in immune-related Gene Ontology terms. Notably, our investigation also unveiled a close association between novel genes and the immune capacity of pigs. We observed substantial differences in the frequencies of NRSs between Eastern and Western pigs, and the heat-resistant pigs exhibited a substantial number of NRS insertions in an 11.6 Mb interval on chromosome X. Additionally, we discovered a 665 bp insertion in the fourth intron of the TNFRSF19 gene that may be associated with heat tolerance in Southern Chinese pigs. Conclusions: Our findings demonstrate the potential of a graph genome approach to reveal important functional features of NRSs in pig populations.
Simultaneous localization and mapping (SLAM) is widely used in many robot applications to acquire the map of an unknown environment and the robot's location. Graph-based SLAM has been demonstrated to be effective in large-scale scenarios, and it intuitively formulates SLAM as a pose graph. However, because of the high data overlap rate, traditional graph-based SLAM is not efficient in some respects, such as real-time performance and memory usage. To reduce the data overlap rate, a graph-based SLAM with a distributed submap strategy (DSS) is presented. In its front-end, submap-based scan matching is processed and loop closing detection is conducted. In its back-end, the pose graph is updated for global optimization and submap merging. A series of experiments demonstrates that graph-based SLAM with DSS reduces the data overlap rate by 51.79%, runtime by 39.70%, and memory usage by 24.60%. Its advantages over other low-overlap-rate methods are also demonstrated in terms of runtime, memory usage, accuracy, and robustness.
Background: The genetic diversity of yak, a key domestic animal on the Qinghai-Tibetan Plateau (QTP), is a vital resource for domestication and breeding efforts. This study presents the first yak pangenome obtained through the de novo assembly of 16 yak genomes. Results: We discovered 290 Mb of nonreference sequences and 504 new genes. Our pangenome-wide presence and absence variation (PAV) analysis revealed 5,120 PAV-related genes, highlighting a wide range of variety-specific genes and genes with varying frequencies across yak populations. Principal component analysis (PCA) based on binary gene PAV data classified yaks into three new groups: wild, domestic, and Jinchuan. Moreover, we proposed a 'two-haplotype genomic hybridization model' for understanding the hybridization patterns among breeds by integrating gene frequency, heterozygosity, and gene PAV data. A gene PAV-GWAS identified a novel gene (BosGru3G009179) that may be associated with the multirib trait in Jinchuan yaks. Furthermore, an integrated transcriptome and pangenome analysis highlighted the significant differences in the expression of core genes and the mutational burden of differentially expressed genes between yaks from high and low altitudes. Transcriptome analysis across multiple species revealed that yaks have the most unique differentially expressed mRNAs and lncRNAs (between high- and low-altitude regions), especially in the heart and lungs, when comparing high- and low-altitude adaptations. Conclusions: The yak pangenome offers a comprehensive resource and new insights for functional genomic studies, supporting future biological research and breeding strategies.
The number of botnet malware attacks on Internet devices has grown in step with the number of devices connected to the Internet. Bot detection using machine learning (ML) with flow-based features has been extensively studied in the literature. Existing flow-based detection methods involve significant computational overhead and do not completely capture network communication patterns that might reveal other features of malicious hosts. Recently, graph-based bot detection methods using ML have gained attention as a way to overcome these limitations, as graphs provide a realistic representation of network communications. The purpose of this study is to build a botnet malware detection system utilizing centrality measures for graph-based botnet detection and ML. We propose BotSward, a graph-based bot detection system that is based on ML. We apply efficient centrality measures, namely Closeness Centrality (CC), Degree Centrality (DC), and PageRank (PR), and compare them with others used in the state of the art. The efficiency of the proposed method is verified on the available Czech Technical University 13 dataset (CTU-13). The CTU-13 dataset contains 13 real botnet traffic scenarios that are connected to a command-and-control (C&C) channel and that cause malicious actions such as phishing, distributed denial-of-service (DDoS) attacks, spam attacks, etc. BotSward is robust to zero-day attacks, suitable for large-scale datasets, and is intended to produce better accuracy than state-of-the-art techniques. The proposed BotSward solution achieved 99% accuracy in botnet attack detection with a false positive rate as low as 0.0001%.
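The three centrality measures named in the abstract above can be computed on a host-communication graph as in the following sketch. This is a plain-Python illustration on an undirected toy graph, not BotSward's implementation; the host names, the damping factor of 0.85, and the fixed iteration count are assumptions made for the example.

```python
from collections import deque


def degree_centrality(adj):
    """Number of neighbours divided by the maximum possible (n - 1)."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def closeness_centrality(adj):
    """Reachable-node count divided by the sum of shortest-path lengths,
    using breadth-first search (unweighted graph, reachable nodes only)."""
    result = {}
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[src] = (len(dist) - 1) / total if total > 0 else 0.0
    return result


def pagerank(adj, damping=0.85, iterations=100):
    """Power-iteration PageRank; dangling nodes spread rank uniformly."""
    n = len(adj)
    rank = {v: 1.0 / n for v in adj}
    for _ in range(iterations):
        new = {v: (1.0 - damping) / n for v in adj}
        for u, nbrs in adj.items():
            targets = nbrs if nbrs else list(adj)
            share = damping * rank[u] / len(targets)
            for w in targets:
                new[w] += share
        rank = new
    return rank


# Hypothetical traffic graph: three hosts that all talk to one central host.
adj = {
    "hub": ["h1", "h2", "h3"],
    "h1": ["hub"],
    "h2": ["hub"],
    "h3": ["hub"],
}

deg = degree_centrality(adj)
clo = closeness_centrality(adj)
pr = pagerank(adj)
print(deg["hub"], clo["h1"], max(pr, key=pr.get))
```

In a detection pipeline of the kind the abstract describes, per-host values like these would serve as features for an ML classifier; the star-shaped toy graph simply shows that a host contacted by many others stands out on all three measures.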
Maximizing network lifetime is considered a primary issue in Mobile Ad-hoc Networks (MANETs). In geographic routing-based models, packet transmission appears to be more appropriate in dense scenarios. Applying a heuristic model directly does not offer an effective solution because the problem is NP-hard; therefore, investigators concentrate on meta-heuristic approaches. Dragonfly Optimization (DFO) is an effective meta-heuristic approach for resolving these problems by providing optimal solutions. However, meta-heuristic approaches such as DFO converge slowly and require more computation time as the network size grows. Thus, DFO is adaptively improved as Adaptive Dragonfly Optimization (ADFO) to fit this model and is re-formulated using graph-based m-connection establishment (G-mCE) to overcome DFO's computation-time and convergence problems, considerably enhancing DFO performance. In G-mCE, a Connectivity Zone (CZ) is chosen between source and destination, optimality is sought within those connected regions, and ADFO is used for effective route establishment in the CZ instead of the complete network model. To exploit the complementary features of ADFO and G-mCE, a hybrid DFO-(G-mCE) is proposed for dense scenarios, with reduced energy consumption and delay to enhance network lifetime. The simulation was performed in the MATLAB environment.
Active learning in semi-supervised classification involves introducing additional labels for unlabelled data to improve the accuracy of the underlying classifier. A challenge is to identify which points to label to best improve performance while limiting the number of new labels. "Model Change" active learning quantifies the resulting change incurred in the classifier by introducing the additional label(s). We pair this idea with graph-based semi-supervised learning (SSL) methods that use the spectrum of the graph Laplacian matrix, which can be truncated to avoid prohibitively large computational and storage costs. We consider a family of convex loss functions for which the acquisition function can be efficiently approximated using the Laplace approximation of the posterior distribution. We show a variety of multiclass examples that illustrate improved performance over the prior state of the art.
Many cutting-edge methods are now possible in real-time commercial settings and are growing in popularity on cloud platforms. By incorporating new, cutting-edge technologies to a larger extent without using more infrastructure, the information technology platform is anticipating a completely new level of development. The following concepts are proposed in this research paper: 1) a reliable authentication method; 2) optimised data replication; and 3) graph-based data encryption and packing colouring in Redundant Array of Independent Disks (RAID) storage. At the data centre, data is encrypted using crypto keys called Key Streams. These keys are produced using the packing colouring method in the web graph's jump graph. To achieve space efficiency, replication is carried out across many servers, optimised using packing colours. More connections are expected to provide better authentication. This study provides an innovative architecture with robust security, enhanced authentication, and low cost.
Accurate variant genotyping is crucial for genomics-assisted breeding. Graph pangenome references can address single-reference bias, thereby enhancing the performance of variant genotyping and empowering downstream applications in population genetics and quantitative genetics. However, existing pangenome-based genotyping methods are ineffective in handling large or complex pangenome graphs, particularly in polyploid genomes. Here, we introduce Varigraph, an algorithm that leverages the comparison of unique and repetitive k-mers between variant sites and short reads for genotyping both small and large variants. We evaluated Varigraph on a diverse set of representative plant genomes as well as human genomes. Varigraph outperforms current state-of-the-art linear and graph-based genotypers across non-human genomes while maintaining comparable genotyping performance in human genomes. By employing efficient data structures including a counting Bloom filter and bitmap storage, as well as GPU models, Varigraph achieves improved precision and robustness in repetitive regions while managing computational costs for large datasets. Its wide applicability extends to highly repetitive or large genomes, such as those of maize and wheat. Significantly, Varigraph can handle extensive pangenome graphs, as demonstrated by its performance on a dataset containing 252 rice genomes, for which it achieved a precision exceeding 0.9 for both small and large variants. Notably, Varigraph is capable of effectively utilizing pangenome graphs for genotyping autopolyploids, enabling precise determination of allele dosage. In summary, this work provides a robust and accurate solution for genotyping plant genomes and will advance plant genomic studies and genomics-assisted breeding.
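The abstract above mentions a counting Bloom filter as one of the data structures used for k-mer bookkeeping. The sketch below shows the general technique only and makes no claim about Varigraph's actual code: the class, the SHA-256-based hashing, the table size, and the toy read are all assumptions chosen for illustration.

```python
import hashlib


class CountingBloomFilter:
    """Approximate multiset: counts may be overestimated after hash
    collisions but are never underestimated."""

    def __init__(self, size=1024, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.counters = [0] * size

    def _indexes(self, item):
        # Derive num_hashes table positions by salting the item with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for i in self._indexes(item):
            self.counters[i] += 1

    def count(self, item):
        # The smallest counter is the tightest upper bound on the true count.
        return min(self.counters[i] for i in self._indexes(item))


def kmers(seq, k):
    """All overlapping substrings of length k, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


# Hypothetical short read; with k = 4 it yields 7 k-mers, 4 of them distinct.
cbf = CountingBloomFilter()
for km in kmers("ACGTACGTAC", 4):
    cbf.add(km)

print(cbf.count("ACGT"), cbf.count("TACG"))
```

The appeal of this structure for large genomes is that memory is fixed by the table size rather than by the number of distinct k-mers, at the cost of occasional overcounts.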
Innovations in DNA sequencing technologies have greatly boosted population-level genomic studies in plants, facilitating the identification of key genetic variations for investigating population diversity and accelerating the molecular breeding of crops. Conventional methods for genomic analysis typically rely on small variants, such as SNPs and indels, and use single linear reference genomes, which introduces biases and reduces performance in highly divergent genomic regions. By integrating the population level of sequences, pangenomes, particularly graph pangenomes, offer a promising solution to these challenges. To date, numerous algorithms have been developed for constructing pangenome graphs, aligning reads to these graphs, and performing variant genotyping based on these graphs. As demonstrated in various plant pangenomic studies, these advancements allow for the detection of previously hidden variants, especially structural variants, thereby enhancing applications such as genetic mapping of agronomically important genes. However, noteworthy challenges remain to be overcome in applying pangenome graph approaches to plants. Addressing these issues will require the development of more sophisticated algorithms tailored specifically to plants. Such improvements will contribute to the scalability of this approach, facilitating the production of super-pangenomes, in which hundreds or even thousands of de novo assembled genomes from one species or genus can be integrated. This, in turn, will promote broader pan-omic studies, further advancing our understanding of genetic diversity and driving innovations in crop breeding.
Bacillus thuringiensis (B. thuringiensis) is a soil-dwelling Gram-positive bacterium, and its plasmid-encoded toxins (Cry) are commonly used as biological alternatives to pesticides. In a pangenomic study, we sequenced seven B. thuringiensis isolates at both high coverage and high base quality using a next-generation sequencing platform. The B. thuringiensis pangenome was extrapolated to have 4196 core genes and an asymptotic value of 558 unique genes when a new genome is added. Compared to the pangenomes of closely related species of the same genus, the B. thuringiensis pangenome shows an open characteristic, similar to B. cereus but not to B. anthracis; the latter has a closed pangenome. We also found extensive divergence among the seven B. thuringiensis genome assemblies, which harbor ample repeats and single nucleotide polymorphisms (SNPs). The identities among orthologous genes are greater than 84.5%, and the hotspots for genome variation were discovered in genomic regions of 2.3-2.8 Mb and 5.0-5.6 Mb. We concluded that high-coverage sequence assemblies from multiple strains, before all the gaps are closed, are very useful for pangenomic studies.
Funding (potato pangenome study): Supported by the National Natural Science Foundation of China (31972969); the Science and Technology Department of Yunnan Province (2019FY003015); Research Startup Funding of Yunnan University in China (C176220100033); the Science and Technology Major Project of the Department of Science and Technology of Yunnan (K204204210017); Yunnan Fundamental Research Projects (202301BF070001-026); the Project of Central Guiding Local Technology Development (202407AB110005); and the Yunnan Talent Support Plan (C619300A036).
Funding (Bos indicus pangenome study): The project "Genomics for Conservation of Indigenous Cattle Breeds and for Enhancing Milk Yield, Phase-I" [BT/PR26466/AAQ/1/704/2017], funded by the Department of Biotechnology (DBT), India; and the project "Identification of key molecular factors involved in resistance/susceptibility to paratuberculosis infection in indigenous breeds of cows" [BT/PR32758/AAQ/1/760/2019], also funded by the Department of Biotechnology (DBT), India.
Funding: Supported by the National Key Research and Development Program of China (2023YFF1000100 and 2023YFA0914601) and the Special Funds for Science Technology Innovation and Industrial Development of Shenzhen Dapeng New District (PT202101-01).
Abstract: An increasing number of structural variations (SVs) have been identified as causative mutations for diverse agronomic traits. However, a systematic exploration of the quantity, distribution, and contribution of SVs in wheat has been lacking. Here, we report high-quality gene-based and SV-based pangenomes comprising 22 hexaploid wheat assemblies showing a wide range of chromosome size, gene number, and TE component, which indicates their representativeness of wheat genetic diversity. Pan-gene analyses uncover 140,261 distinct gene families, of which only 23.2% are shared by all accessions. Moreover, we build a ~16.15 Gb graph pangenome containing 695,897 bubbles, intersecting 5132 genes and 230,307 cis-regulatory regions. Pairwise genome comparisons identify ~1,978,221 non-redundant SVs and 497 SV hotspots. Notably, the densities of both bubbles and SVs show remarkable aggregation in centromeres, which probably plays an important role in chromosome plasticity and stability. In exploring functional SVs, we identify 2769 SVs with absolute relative frequency differences exceeding 0.7 between spring and winter growth habit groups. Additionally, several reported functional genes in wheat display complex structural graphs, for example, PPD-A1, VRT-A2, and TaNAAT2-A. These findings deepen our understanding of wheat genetic diversity, providing valuable graphical pangenome and variation resources to improve the efficiency of genome-wide association mapping in wheat.
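The group contrast mentioned in this abstract (SVs whose frequency differs by more than 0.7 between spring and winter accessions) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name, the accession labels, the SV identifiers, and the assumption that each SV is a simple 0/1 presence call per accession are all hypothetical.

```python
def sv_frequency_contrast(calls, groups, group_a, group_b, threshold=0.7):
    """Return SVs whose presence frequency differs strongly between two groups.

    calls:  dict mapping SV id -> dict of accession -> 0/1 presence call
    groups: dict mapping accession -> group label
    The result maps each selected SV id to its absolute frequency difference.
    """
    selected = {}
    for sv_id, per_accession in calls.items():
        freqs = {}
        for label in (group_a, group_b):
            members = [acc for acc in per_accession if groups[acc] == label]
            if not members:
                break  # no calls for this group, so the SV cannot be compared
            freqs[label] = sum(per_accession[acc] for acc in members) / len(members)
        else:
            diff = abs(freqs[group_a] - freqs[group_b])
            if diff > threshold:
                selected[sv_id] = diff
    return selected


# Toy data (hypothetical): four winter and four spring accessions.
toy_groups = {f"w{i}": "winter" for i in range(1, 5)}
toy_groups.update({f"s{i}": "spring" for i in range(1, 5)})

toy_calls = {
    "SV_toy_1": {"w1": 1, "w2": 1, "w3": 1, "w4": 1, "s1": 0, "s2": 0, "s3": 0, "s4": 0},
    "SV_toy_2": {"w1": 1, "w2": 1, "w3": 0, "w4": 0, "s1": 1, "s2": 0, "s3": 0, "s4": 0},
    "SV_toy_3": {"w1": 1, "w2": 1, "w3": 1, "w4": 0, "s1": 0, "s2": 0, "s3": 0, "s4": 0},
}

toy_hits = sv_frequency_contrast(toy_calls, toy_groups, "winter", "spring")
print(toy_hits)
```

In the toy data, only the first and third SVs pass the 0.7 cutoff. A real analysis would also need to handle missing genotypes and polyploid dosage, which this sketch ignores.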
Funding: National Natural Science Foundation of China (No. 62372100).
Abstract: Owing to the constraints of depth sensing technology, images acquired by depth cameras are inevitably mixed with various noises. For depth maps presented in gray values, this research proposes a novel denoising model, termed graph-based transform (GBT) and dual graph Laplacian regularization (DGLR) (DGLR-GBT). This model specifically aims to remove Gaussian white noise by capitalizing on the nonlocal self-similarity (NSS) and the piecewise smoothness properties intrinsic to depth maps. Within the group sparse coding (GSC) framework, a combination of GBT and DGLR is implemented. Firstly, within each group, the graph is constructed by using estimates of the true values of the averaged blocks instead of the observations. Secondly, the graph Laplacian regularization terms are constructed based on rows and columns of similar block groups, respectively. Lastly, the solution is obtained effectively by combining the alternating direction method of multipliers (ADMM) with the weighted thresholding method within the domain of GBT.
Funding: Supported by the National Science and Technology Major Project of China (No. 2011ZX05029-003) and the CNPC Science Research and Technology Development Project, China (No. 2013D-0904).
Abstract: In this study, we used the multi-resolution graph-based clustering (MRGC) method to determine the electrofacies (EF) and lithofacies (LF) from well log data obtained from the intraplatform bank gas fields located in the Amu Darya Basin. The MRGC could automatically determine the optimal number of clusters without prior knowledge about the structure or cluster numbers of the analyzed data set and allowed the users to control the level of detail actually needed to define the EF. Based on the LF identification and successful EF calibration using core data, an MRGC EF partition model including five clusters and a quantitative LF interpretation chart were constructed. The EF clusters 1 to 5 were interpreted as lagoon, anhydrite flat, interbank, low-energy bank, and high-energy bank, respectively, and the coincidence rate in the cored interval could reach 85%. We concluded that the MRGC could be accurately applied to predict the LF in non-cored but logged wells. Therefore, continuous EF clusters were partitioned and the corresponding LF were interpreted, and the distribution and petrophysical characteristics of different LF were analyzed in the framework of sequence stratigraphy.
Funding: Supported by the National Natural Science Foundation of China (grant number 31961143021), the earmarked fund for Modern Agro-industry Technology Research System (grant number CARS-39-01), and the Science and Technology Innovation Project of the Chinese Academy of Agricultural Sciences (grant number ASTIP-IAS01) to YM and LJ; also supported by the Elite Youth Program of the Chinese Academy of Agricultural Sciences.
Abstract: As large-scale genomic studies have progressed, it has been revealed that a single reference genome cannot represent genetic diversity at the species level. Domestic animals tend to have complex routes of origin and migration, which suggests that some population-specific sequences may be missing from the current reference genome. In contrast, the pangenome is a collection of all DNA sequences of a species: it contains the sequences shared by all individuals (core genome) and can also display sequence information unique to each individual (variable genome). Progress in pangenome research in humans, plants, and domestic animals has shown that missing genetic components and large structural variants (SVs) can be identified through pangenomic studies. Many individual-specific sequences have been shown to be related to biological adaptability, phenotype, and important economic traits. The maturity of technologies and methods such as third-generation sequencing, telomere-to-telomere genomes, graph genomes, and reference-free assembly will further promote the development of pangenomes. In the future, pangenomes combined with long-read data and multi-omics will help to resolve large SVs and their relationship with the main economic traits of interest in domesticated animals, providing better insights into animal domestication, evolution, and breeding. In this review, we mainly discuss how pangenome analysis reveals genetic variations in domestic animals (sheep, cattle, pigs, chickens) and their impacts on phenotypes, and how this can contribute to the understanding of species diversity. Additionally, we discuss potential issues and future perspectives of pangenome research in livestock and poultry.
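The core/variable distinction defined in this abstract can be made concrete with a small set-based sketch. It assumes, purely for illustration, that each individual is represented by the set of gene (or sequence) identifiers found in its assembly; the function name and the identifiers are hypothetical and not taken from the review.

```python
from collections import Counter


def partition_pangenome(gene_sets):
    """Split a pangenome into core, variable, and private components.

    gene_sets: dict mapping individual -> set of gene identifiers it carries.
    """
    individuals = list(gene_sets.values())
    if not individuals:
        return {"pan": set(), "core": set(), "variable": set(), "private": set()}

    pan = set().union(*individuals)          # every gene seen in any individual
    core = set.intersection(*individuals)    # genes shared by all individuals
    counts = Counter(gene for genes in individuals for gene in genes)
    private = {gene for gene, n in counts.items() if n == 1}  # seen exactly once

    return {"pan": pan, "core": core, "variable": pan - core, "private": private}


# Toy example with three hypothetical animals.
toy_gene_sets = {
    "animal_A": {"g1", "g2", "g3"},
    "animal_B": {"g1", "g2", "g4"},
    "animal_C": {"g1", "g3", "g5"},
}
toy_parts = partition_pangenome(toy_gene_sets)
print(sorted(toy_parts["core"]), sorted(toy_parts["variable"]))
```

Published pangenome studies often relax the "shared by all" rule (for example with a "soft core" frequency cutoff); the strict intersection here simply follows the definition as worded in the abstract.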
Abstract: A better understanding of the relationship between the structure and functions of urban and suburban spaces is one of the avenues of research still open for geographical information science. The research presented in this paper develops several graph-based metrics whose objective is to characterize some local and global structural properties that reflect the way the overall building layout can be cross-related to that of the road layout. Such structural properties are modeled as an aggregation of parcels, buildings, and road networks. We introduce several computational measures (Ratio Minimum Distance, Minimum Ratio Minimum Distance, and Metric Compactness) that respectively evaluate the capability for a given road to be connected with the whole road network. These measures reveal emerging sub-network structures and point out differences between less-connective and more-connective parts of the network. Based on these local and global properties derived from the topological and graph-based representation, and on building density metrics, this paper proposes an analysis of road and building layouts at different levels of granularity. The metrics developed are applied to a case study in which the derived properties reveal coherent as well as incoherent neighborhoods that illustrate the potential of the approach and the way buildings and roads can be relatively connected in a given urban environment. Overall, by integrating the parcel and building layouts, this approach complements previous related works that mainly retain the configurational structure of the urban network, as well as morphological studies whose focus is generally limited to the analysis of the building layout.
Funding: This work was supported by the National Key Research and Development Program of China (grant no. 2022YFF1000500), the National Natural Science Foundation of China (grant no. 31941007), and the Zhejiang province agriculture (livestock) varieties breeding Key Technology R&D Program (grant no. 2016C02054-2).
Abstract: Background: The reliance on a solitary linear reference genome has imposed a significant constraint on our comprehensive understanding of genetic variation in animals. This constraint is particularly pronounced for non-reference sequences (NRSs), which have not been extensively studied. Results: In this study, we constructed a pig pangenome graph using 21 pig assemblies and identified 23,831 NRSs with a total length of 105 Mb. Our findings revealed that NRSs were more prevalent in breeds exhibiting greater genetic divergence from the reference genome. Furthermore, we observed that NRSs were rarely found within coding sequences, while NRS insertions were enriched in immune-related Gene Ontology terms. Notably, our investigation also unveiled a close association between novel genes and the immune capacity of pigs. We observed substantial differences in the frequencies of NRSs between Eastern and Western pigs, and the heat-resistant pigs exhibited a substantial number of NRS insertions in an 11.6 Mb interval on chromosome X. Additionally, we discovered a 665 bp insertion in the fourth intron of the TNFRSF19 gene that may be associated with heat tolerance in Southern Chinese pigs. Conclusions: Our findings demonstrate the potential of a graph genome approach to reveal important functional features of NRSs in pig populations.
Funding: The Project Fund for Key Discipline of the Shanghai Municipal Education Commission (No. J50104) and the Major State Basic Research Development Program of China (No. 2017YFB0403500).
Abstract: Simultaneous localization and mapping (SLAM) is widely used in many robot applications to acquire the map of an unknown environment and the robot's location. Graph-based SLAM has been demonstrated to be effective in large-scale scenarios, and it intuitively formulates SLAM as a pose graph. However, because of the high data overlap rate, traditional graph-based SLAM is not efficient in some respects, such as real-time performance and memory usage. To reduce the data overlap rate, a graph-based SLAM with a distributed submap strategy (DSS) is presented. In its front-end, submap-based scan matching is processed and loop closing detection is conducted. In its back-end, the pose graph is updated for global optimization and submap merging. A series of experiments demonstrates that graph-based SLAM with DSS reduces the data overlap rate by 51.79%, runtime by 39.70%, and memory usage by 24.60%. Its advantages over other low-overlap-rate methods are also shown in runtime, memory usage, accuracy, and robustness.
Funding: Supported by the National Key R&D Program of China (2021YFD1600200), the Program of National Beef Cattle and Yak Industrial Technology System (No. CARS-37), the Natural Science Foundation of Sichuan Province (General Program) (24NSFSC0581), and the Scientific and Technological Innovation Team for Qinghai-Tibetan Plateau Research in Southwest Minzu University (Grant No. 2024CXTD02).
Abstract: Background: The genetic diversity of yak, a key domestic animal on the Qinghai-Tibetan Plateau (QTP), is a vital resource for domestication and breeding efforts. This study presents the first yak pangenome obtained through the de novo assembly of 16 yak genomes. Results: We discovered 290 Mb of nonreference sequences and 504 new genes. Our pangenome-wide presence and absence variation (PAV) analysis revealed 5,120 PAV-related genes, highlighting a wide range of variety-specific genes and genes with varying frequencies across yak populations. Principal component analysis (PCA) based on binary gene PAV data classified yaks into three new groups: wild, domestic, and Jinchuan. Moreover, we proposed a ‘two-haplotype genomic hybridization model’ for understanding the hybridization patterns among breeds by integrating gene frequency, heterozygosity, and gene PAV data. A gene PAV-GWAS identified a novel gene (Bos-Gru3G009179) that may be associated with the multirib trait in Jinchuan yaks. Furthermore, an integrated transcriptome and pangenome analysis highlighted significant differences in the expression of core genes and the mutational burden of differentially expressed genes between yaks from high and low altitudes. Transcriptome analysis across multiple species revealed that, when comparing high- and low-altitude adaptations, yaks have the most unique differentially expressed mRNAs and lncRNAs between high- and low-altitude regions, especially in the heart and lungs. Conclusions: The yak pangenome offers a comprehensive resource and new insights for functional genomic studies, supporting future biological research and breeding strategies.
Abstract: The number of botnet malware attacks on Internet devices has grown in step with the number of devices connected to the Internet. Bot detection using machine learning (ML) with flow-based features has been extensively studied in the literature. Existing flow-based detection methods involve significant computational overhead and do not completely capture network communication patterns that might reveal other features of malicious hosts. Recently, graph-based bot detection methods using ML have gained attention as a way to overcome these limitations, as graphs provide a real representation of network communications. The purpose of this study is to build a botnet malware detection system utilizing centrality measures for graph-based botnet detection and ML. We propose BotSward, a graph-based bot detection system that is based on ML. We apply efficient centrality measures, namely Closeness Centrality (CC), Degree Centrality (DC), and PageRank (PR), and compare them with others used in the state of the art. The efficiency of the proposed method is verified on the available Czech Technical University 13 dataset (CTU-13). The CTU-13 dataset contains 13 real botnet traffic scenarios that are connected to a command-and-control (C&C) channel and that cause malicious actions such as phishing, distributed denial-of-service (DDoS) attacks, spam attacks, etc. BotSward is robust to zero-day attacks, suitable for large-scale datasets, and intended to produce better accuracy than state-of-the-art techniques. The proposed BotSward solution achieved 99% accuracy in botnet attack detection with a false positive rate as low as 0.0001%.
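The three centrality measures named in this abstract can be computed per host on a communication graph and then used as input features for a classifier. The sketch below is a plain-Python illustration under stated assumptions: the graph is undirected and unweighted, the host names are hypothetical, and nothing here reproduces BotSward's actual graph construction or feature pipeline, which the abstract does not describe.

```python
from collections import deque


def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(adj)
    if n < 2:
        return {v: 0.0 for v in adj}
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def closeness_centrality(adj):
    """Reachable-node count divided by the summed BFS distance to those nodes."""
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


def pagerank(adj, damping=0.85, iterations=100):
    """Power-iteration PageRank; each undirected edge counts in both directions."""
    n = len(adj)
    rank = {v: 1.0 / n for v in adj}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in adj}
        for u, nbrs in adj.items():
            if nbrs:
                share = damping * rank[u] / len(nbrs)
                for w in nbrs:
                    new_rank[w] += share
            else:
                # Isolated node: spread its rank uniformly over all nodes.
                for w in adj:
                    new_rank[w] += damping * rank[u] / n
        rank = new_rank
    return rank


# Toy star graph (hypothetical): one hub talking to three hosts.
toy_graph = {
    "hub": {"h1", "h2", "h3"},
    "h1": {"hub"},
    "h2": {"hub"},
    "h3": {"hub"},
}
toy_features = {
    host: (
        degree_centrality(toy_graph)[host],
        closeness_centrality(toy_graph)[host],
        pagerank(toy_graph)[host],
    )
    for host in toy_graph
}
print(toy_features["hub"])
```

On the star graph the hub scores highest on all three measures, which is the kind of structural signal a downstream classifier could learn from. For real traffic graphs, an established graph library would be a more practical choice than these hand-written loops.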
Abstract: Maximizing network lifetime is regarded as the primary issue in Mobile Ad-hoc Networks (MANETs). In geographic routing models, packet transmission is more appropriate in dense circumstances. Applying a heuristic model directly does not offer an effective solution because the problem is NP-hard; investigators therefore concentrate on meta-heuristic approaches. Dragonfly Optimization (DFO) is an effective meta-heuristic approach for resolving these problems by providing optimal solutions. However, meta-heuristic approaches such as DFO converge slowly and need considerable computational time as the network size expands. Thus, DFO is adaptively improved as Adaptive Dragonfly Optimization (ADFO) to fit this model and re-formulated using graph-based m-connection establishment (G-mCE) to overcome the computational-time and convergence problems of DFO, considerably enhancing DFO performance. In G-mCE, a Connectivity Zone (CZ) is chosen between source and destination, within which optimality is sought, and ADFO is used for effective route establishment in the CZ instead of in the complete networking model. To measure the complementary features of ADFO and G-mCE, hybridization of DFO-(G-mCE) is anticipated over dense circumstances with reduced energy consumption and delay to enhance network lifetime. The simulation was performed in the MATLAB environment.
Funding: Supported by the DOD National Defense Science and Engineering Graduate (NDSEG) Research Fellowship and by the NGA under Contract No. HM04762110003.
Abstract: Active learning in semi-supervised classification involves introducing additional labels for unlabelled data to improve the accuracy of the underlying classifier. A challenge is to identify which points to label to best improve performance while limiting the number of new labels. "Model Change" active learning quantifies the resulting change incurred in the classifier by introducing the additional label(s). We pair this idea with graph-based semi-supervised learning (SSL) methods that use the spectrum of the graph Laplacian matrix, which can be truncated to avoid prohibitively large computational and storage costs. We consider a family of convex loss functions for which the acquisition function can be efficiently approximated using the Laplace approximation of the posterior distribution. We show a variety of multiclass examples that illustrate improved performance over the prior state of the art.
Abstract: Many cutting-edge methods are now possible in real-time commercial settings and are growing in popularity on cloud platforms. By incorporating new, cutting-edge technologies to a larger extent without using more infrastructure, the information technology platform is anticipating a completely new level of development. The following concepts are proposed in this research paper: a reliable authentication method; optimised data replication; and graph-based data encryption and packing colouring in Redundant Array of Independent Disks (RAID) storage. At the data centre, data is encrypted using crypto keys called Key Streams. These keys are produced using the packing colouring method in the web graph’s jump graph. To achieve space efficiency, replication is carried out on many optimised servers employing packing colours. More connections are expected to provide better authentication. This study provides an innovative architecture with robust security, enhanced authentication, and low cost.
Funding: Funded by the National Natural Science Foundation of China (no. 32270685), the National Natural Science Fund for Excellent Young Scientists Fund Program (Overseas), and the "Young Scientist Fostering Funds for the National Key Laboratory for Germplasm Innovation & Utilization of Horticultural Crops."
Abstract: Accurate variant genotyping is crucial for genomics-assisted breeding. Graph pangenome references can address single-reference bias, thereby enhancing the performance of variant genotyping and empowering downstream applications in population genetics and quantitative genetics. However, existing pangenome-based genotyping methods are ineffective in handling large or complex pangenome graphs, particularly in polyploid genomes. Here, we introduce Varigraph, an algorithm that leverages the comparison of unique and repetitive k-mers between variant sites and short reads for genotyping both small and large variants. We evaluated Varigraph on a diverse set of representative plant genomes as well as human genomes. Varigraph outperforms current state-of-the-art linear and graph-based genotypers across non-human genomes while maintaining comparable genotyping performance in human genomes. By employing efficient data structures, including a counting Bloom filter and bitmap storage, as well as GPU models, Varigraph achieves improved precision and robustness in repetitive regions while managing computational costs for large datasets. Its wide applicability extends to highly repetitive or large genomes, such as those of maize and wheat. Significantly, Varigraph can handle extensive pangenome graphs, as demonstrated by its performance on a dataset containing 252 rice genomes, for which it achieved a precision exceeding 0.9 for both small and large variants. Notably, Varigraph is capable of effectively utilizing pangenome graphs for genotyping autopolyploids, enabling precise determination of allele dosage. In summary, this work provides a robust and accurate solution for genotyping plant genomes and will advance plant genomic studies and genomics-assisted breeding.
Funding: Funded by the National Natural Science Fund for Excellent Young Scientists Fund Program (Overseas) and the Young Scientist Fostering Funds for the National Key Laboratory for Germplasm Innovation & Utilization of Horticultural Crops.
Abstract: Innovations in DNA sequencing technologies have greatly boosted population-level genomic studies in plants, facilitating the identification of key genetic variations for investigating population diversity and accelerating the molecular breeding of crops. Conventional methods for genomic analysis typically rely on small variants, such as SNPs and indels, and use single linear reference genomes, which introduces biases and reduces performance in highly divergent genomic regions. By integrating population-level sequences, pangenomes, particularly graph pangenomes, offer a promising solution to these challenges. To date, numerous algorithms have been developed for constructing pangenome graphs, aligning reads to these graphs, and performing variant genotyping based on these graphs. As demonstrated in various plant pangenomic studies, these advancements allow for the detection of previously hidden variants, especially structural variants, thereby enhancing applications such as genetic mapping of agronomically important genes. However, noteworthy challenges remain to be overcome in applying pangenome graph approaches to plants. Addressing these issues will require the development of more sophisticated algorithms tailored specifically to plants. Such improvements will contribute to the scalability of this approach, facilitating the production of super-pangenomes, in which hundreds or even thousands of de novo–assembled genomes from one species or genus can be integrated. This, in turn, will promote broader pan-omic studies, further advancing our understanding of genetic diversity and driving innovations in crop breeding.
Funding: Supported by a grant from King Abdulaziz City for Science and Technology, Riyadh, Saudi Arabia (No. KACST 428-29); an institutional grant from the CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences; grants from the National Basic Research Program (973 Program) (No. 2010CB126604) and the Special Foundation Work Program (No. 2009FY 120100) of the Ministry of Science and Technology of the People's Republic of China; and the National Science Foundation of China (No. 31071163).
Abstract: Bacillus thuringiensis (B. thuringiensis) is a soil-dwelling Gram-positive bacterium, and its plasmid-encoded toxins (Cry) are commonly used as biological alternatives to pesticides. In a pangenomic study, we sequenced seven B. thuringiensis isolates at both high coverage and high base quality using a next-generation sequencing platform. The B. thuringiensis pangenome was extrapolated to have 4196 core genes and an asymptotic value of 558 unique genes when a new genome is added. Compared to the pangenomes of closely related species of the same genus, the B. thuringiensis pangenome shows an open characteristic, similar to B. cereus but not to B. anthracis; the latter has a closed pangenome. We also found extensive divergence among the seven B. thuringiensis genome assemblies, which harbor ample repeats and single nucleotide polymorphisms (SNPs). The identities among orthologous genes are greater than 84.5%, and the hotspots for genome variations were discovered in genomic regions of 2.3-2.8 Mb and 5.0-5.6 Mb. We concluded that high-coverage sequence assemblies from multiple strains, before all the gaps are closed, are very useful for pangenomic studies.