Deep learning has been widely applied in surrogate modeling for airfoil flow field prediction. The success of deep learning relies heavily on large-scale, high-quality labeled samples. However, acquiring labeled samples with complete annotations is prohibitively expensive, and the available annotations in practical engineering are often sparse due to limited observation. To leverage samples with sparse annotations, this paper proposes an uncertainty-based active transfer learning method. The most valuable positions in the flow field are selected based on uncertainty for annotation, effectively improving prediction accuracy and reducing annotation costs. Our method involves a novel active annotation based on synchronous quantile regression, which can mitigate the computational cost of query annotation. In addition, a novel quantile-level-based consistency regularization is proposed to constrain the remaining unlabeled regions and further improve model performance. Experiments show that our method can significantly reduce prediction errors with only 1% extra annotations and is a promising tool for achieving rapid and accurate flow field prediction.
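The abstract above does not give the details of its synchronous quantile regression, so the following is only a minimal sketch of the two generic ingredients it names: the standard pinball (quantile) loss, and selection of the most uncertain positions for annotation. Using the width between a lower and an upper predicted quantile as the uncertainty score is an assumption made here for illustration, and the names `pinball_loss`, `select_positions`, and `fraction` are hypothetical.

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean pinball (quantile) loss for a quantile level tau in (0, 1)."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction is weighted by tau, over-prediction by (1 - tau).
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)


def select_positions(lower, upper, fraction=0.01):
    """Return sorted indices of the positions with the widest predicted interval.

    Assumption: uncertainty is scored as (upper quantile - lower quantile).
    """
    widths = [u - l for l, u in zip(lower, upper)]
    k = max(1, int(len(widths) * fraction))  # e.g. 1% of all positions
    ranked = sorted(range(len(widths)), key=lambda i: widths[i], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    print(pinball_loss([1.0, 2.0], [0.0, 3.0], tau=0.9))
    upper = [1.0] * 200
    upper[17], upper[150] = 5.0, 3.0
    print(select_positions([0.0] * 200, upper))
```

With `fraction=0.01`, the selection step mirrors the "1% extra annotations" budget mentioned in the abstract; how the paper actually scores and queries positions may differ.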
Translation is an important medium of cultural communication. It is not a mere transfer between two languages but an interaction between two cultures. Cultural misreading, which results from cultural discrepancy and the translator's subjectivity, reveals where blockage and conflict in cultural communication lie. Cultural misreading is an objective phenomenon that exists throughout the entire process of translation. This paper presents a comprehensive analysis and discussion of The History of the Former Han Dynasty: A Critical Translation with Annotations, translated by Homer Hasenpflug Dubs. It divides the causes of cultural misreading into three types: language, habits of thought, and traditional culture. It is hoped that this paper will draw more attention from the translation community to the phenomenon and contribute to the development of literary translation.
This paper studies the basic types of annotation and analyzes their functional use, with the aim of drawing more attention to annotation in the translation of poetry and Fu. The annotation examined in this study belongs to the category of paratext: it performs a special function within the paratext and enriches and completes the paratext system.
Funding agencies play a pivotal role in bolstering research endeavors by allocating financial resources for data collection and analysis. However, the lack of detailed information regarding the methods employed for data gathering and analysis can obstruct the replication and utilization of the results, ultimately affecting the study's transparency and integrity. The task of manually annotating extensive datasets demands considerable labor and financial investment, especially when it entails engaging specialized individuals. In our crowd counting study, we employed the web-based annotation tool SuperAnnotate to streamline the human annotation process for a dataset comprising 3,000 images. By integrating automated annotation tools, we realized substantial time efficiencies, as demonstrated by the remarkable achievement of 858,958 annotations. This underscores the significant contribution of such technologies to the efficiency of the annotation process.
Machine learning models for crop image analysis and phenomics are highly important for precision agriculture and breeding and have been the subject of intensive research. However, the lack of publicly available high-quality image datasets with detailed annotations has severely hindered the development of these models. In this work, we present a comprehensive multicultivar and multiview rice plant image dataset (CVRP) created from 231 landraces and 50 modern cultivars grown under dense planting in paddy fields. The dataset includes images capturing rice plants in their natural environment, as well as indoor images focusing specifically on panicles, allowing for a detailed investigation of cultivar-specific differences. A semiautomatic annotation process using deep learning models was designed for annotations, followed by rigorous manual curation. We demonstrated the utility of the CVRP by evaluating the performance of four state-of-the-art (SOTA) semantic segmentation models. We also conducted 3D plant reconstruction with organ segmentation via images and annotations. The database not only facilitates general-purpose image-based panicle identification and segmentation but also provides valuable resources for challenging tasks such as automatic rice cultivar identification, panicle and grain counting, and 3D plant reconstruction. The database and the model for image annotation are available at.
Published genomes frequently contain erroneous gene models that represent issues associated with identification of open reading frames, start sites, splice sites, and related structural features. The source of these inconsistencies is often traced back to integration across text file formats designed to describe long read alignments and predicted gene structures. In addition, the majority of gene prediction frameworks do not provide robust downstream filtering to remove problematic gene annotations, nor do they represent these annotations in a format consistent with current file standards. These frameworks also lack consideration for functional attributes, such as the presence or absence of protein domains that can be used for gene model validation. To provide oversight to the increasing number of published genome annotations, we present a software package, the Gene Filtering, Analysis, and Conversion (gFACs), to filter, analyze, and convert predicted gene models and alignments. The software operates across a wide range of alignment, analysis, and gene prediction files with a flexible framework for defining gene models with reliable structural and functional attributes. gFACs supports common downstream applications, including genome browsers, and generates extensive details on the filtering process, including distributions that can be visualized to further assess the proposed gene space. gFACs is freely available and implemented in Perl with support from BioPerl libraries at https://gitlab.com/PlantGenomicsLab/gFACs.
Collaborative social annotation systems allow users to record and share their original keywords or tag attachments to Web resources such as Web pages, photos, or videos. These annotations are a method for organizing and labeling information. They have the potential to help users navigate the Web and locate the needed resources. However, since annotations are posted by users under no central control, there exist problems such as sparse and synonymous annotations. To efficiently use annotation information to facilitate knowledge discovery from the Web, it is advantageous to organize social annotations from a semantic perspective and embed them into algorithms for knowledge discovery. This inspires Web page recommendation with annotations, in which users and Web pages are clustered so that semantically similar items can be related. In this paper we propose four graphic models which cluster users, Web pages and annotations and recommend Web pages for given users by first assigning items to the right cluster. The algorithms are then compared to the classical collaborative filtering recommendation method on a real-world data set. Our results indicate that the graphic models provide better recommendation performance and are robust enough for real applications.
The discovery of novel cancer genes is one of the main goals in cancer research. Bioinformatics methods can be used to accelerate cancer gene discovery, which may help in the understanding of cancer and the development of drug targets. In this paper, we describe a classifier to predict potential cancer genes that we have developed by integrating multiple lines of biological evidence, including protein-protein interaction network properties and sequence and functional features. We detected 55 features that were significantly different between cancer genes and non-cancer genes. Fourteen cancer-associated features were chosen to train the classifier. Four machine learning methods, logistic regression, support vector machines (SVMs), BayesNet and the J48 decision tree, were explored in the classifier models to distinguish cancer genes from non-cancer genes. The prediction power of the different models was evaluated by 5-fold cross-validation. The area under the receiver operating characteristic curve for the logistic regression, SVM, BayesNet and J48 tree models was 0.834, 0.740, 0.800 and 0.782, respectively. Finally, the logistic regression classifier with multiple biological features was applied to the genes in the Entrez database, and 1976 cancer gene candidates were identified. We found that the integrated prediction model performed much better than the models based on individual lines of biological evidence, and that the network and functional features had stronger predictive power than the sequence features in predicting cancer genes.
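The evaluation described above (5-fold cross-validation scored by the area under the ROC curve) can be sketched in a few lines. This is a generic, standard-library illustration, not the authors' pipeline: it trains no classifier, and the helper names `roc_auc` and `k_fold_indices` are hypothetical. The AUC is computed through its rank interpretation, the probability that a randomly chosen positive scores higher than a randomly chosen negative.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples, k=5):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


if __name__ == "__main__":
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
    for train, test in k_fold_indices(12, k=5):
        print(len(train), test)
```

In practice the samples would be shuffled (and usually stratified by class) before splitting, and a model would be fitted on each `train` split and scored on the matching `test` split; those steps are omitted here.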
Anterior segment eye diseases account for a significant proportion of presentations to eye clinics worldwide, including diseases associated with corneal pathologies, anterior chamber abnormalities (e.g., blood or inflammation), and lens diseases. The construction of an automatic tool for segmentation of anterior segment eye lesions would greatly improve the efficiency of clinical care. With research on artificial intelligence progressing in recent years, deep learning models have shown their superiority in image classification and segmentation. The training and evaluation of deep learning models should be based on a large amount of data annotated with expertise; however, such data are relatively scarce in the domain of medicine. Herein, the authors developed a new medical image annotation system, called EyeHealer. It is a large-scale anterior eye segment dataset with both eye structures and lesions annotated at the pixel level. Comprehensive experiments were conducted to verify its performance in disease classification and eye lesion segmentation. The results showed that semantic segmentation models outperformed medical segmentation models. This paper describes the establishment of the system for automated classification and segmentation tasks. The dataset will be made publicly available to encourage future research in this area.
The most prosperous and important eras in the history of communication between China and foreign countries were the Han, Tang and Ming dynasties. During the Han and Tang dynasties, China's foreign relations were largely confined to Asia. Apart from Japan and a few South Asian countries, China had almost no maritime contact with the outside world. The...
Creating large-scale and well-annotated datasets to train AI algorithms is crucial for automated tumor detection and localization. However, with limited resources, it is challenging to determine the best type of annotations when annotating massive amounts of unlabeled data. To address this issue, we focus on polyps in colonoscopy videos and pancreatic tumors in abdominal CT scans; both applications require significant effort and time for pixel-wise annotation due to the high-dimensional nature of the data, involving either temporal or spatial dimensions. In this paper, we develop a new annotation strategy, termed Drag&Drop, which simplifies the annotation process to drag and drop. This annotation strategy is more efficient, particularly for temporal and volumetric imaging, than other types of weak annotations, such as per-pixel, bounding boxes, scribbles, ellipses and points. Furthermore, to exploit our Drag&Drop annotations, we develop a novel weakly supervised learning method based on the watershed algorithm. Experimental results show that our method achieves better detection and localization performance than alternative weak annotations and, more importantly, achieves similar performance to that trained on detailed per-pixel annotations. Interestingly, we find that, with limited resources, allocating weak annotations from a diverse patient population can foster models more robust to unseen images than allocating per-pixel annotations for a small set of images. In summary, this research proposes an efficient annotation strategy for tumor detection and localization that is less accurate than per-pixel annotations but useful for creating large-scale datasets for screening tumors in various medical modalities.
Single-cell RNA sequencing (scRNA-seq) is revolutionizing the study of complex and dynamic cellular mechanisms. However, cell type annotation remains a major challenge as it largely relies on a priori knowledge and manual curation, which is cumbersome and subjective. The increasing number of scRNA-seq datasets, as well as numerous published genetic studies, has motivated us to build a comprehensive human cell type reference atlas. Here, we present decoding Cell type Specificity (deCS), an automatic cell type annotation method augmented by a comprehensive collection of human cell type expression profiles and marker genes. We used deCS to annotate scRNA-seq data from various tissue types and systematically evaluated the annotation accuracy under different conditions, including reference panels, sequencing depth, and feature selection strategies. Our results demonstrate that expanding the references is critical for improving annotation accuracy. Compared to many existing state-of-the-art annotation tools, deCS significantly reduced computation time and increased accuracy. deCS can be integrated into the standard scRNA-seq analytical pipeline to enhance cell type annotation. Finally, we demonstrated the broad utility of deCS to identify trait-cell type associations in 51 human complex traits, providing deep insights into the cellular mechanisms underlying disease pathogenesis.
During the process of organizing our original data, we unfortunately identified two errors in the figures within our published article. In Fig. 1, the online version incorrectly labels the SNI+NAC group as the sham+NAC group. We have revised the grouping annotations in Fig. 1 and have labeled the DHE staining in the figure to present the experimental design more clearly.
Biomedical big data, characterized by its massive scale, multi-dimensionality, and heterogeneity, offers novel perspectives for disease research, elucidates biological principles, and simultaneously prompts changes in related research methodologies. Biomedical ontology, as a shared formal conceptual system, not only offers standardized terms for multi-source biomedical data but also provides a solid data foundation and framework for biomedical research. In this review, we summarize enrichment analysis and deep learning for biomedical ontology based on its structure and semantic annotation properties, highlighting how technological advancements are enabling the more comprehensive use of ontology information. Enrichment analysis represents an important application of ontology to elucidate the potential biological significance for a particular molecular list. Deep learning, on the other hand, represents an increasingly powerful analytical tool that can be more widely combined with ontology for analysis and prediction. With the continuous evolution of big data technologies, the integration of these technologies with biomedical ontologies is opening up exciting new possibilities for advancing biomedical research.
In this study, we searched for dispersed repeats (DRs) in the rice (Oryza sativa) genome using the iterative procedure (IP) method. The results revealed that the O. sativa genome contained 79 DR families, comprising 992739 DNA repeats, of which 496762 and 495977 were identified on the forward and reverse DNA strands, respectively. The detected DRs were, on average, 374 bp in length and occupied 66.4% of the O. sativa genome. In total, 61% of the DRs identified by the IP method overlapped with previously annotated dispersed repeats (ADRs) detected using the Extensive De Novo TE Annotator (EDTA) pipeline.
Ericaceae is a diverse family of flowering plants distributed nearly worldwide, and it includes 126 genera and more than 4,000 species. In the present study, we developed The Ericaceae Genome Resource (TEGR, http://www.tegr.com.cn) as a comprehensive, user-friendly, web-based functional genomic database that is based on 16 published genomes from 16 Ericaceae species. The TEGR database contains information on many important functional genes, including 763 auxin genes, 2,407 flowering genes, 20,432 resistance genes, 617 anthocyanin-related genes, and 470 N6-methyladenosine (m6A) modification genes. We identified a total of 599,174 specific guide sequences for CRISPR in the TEGR database. Analyses of gene duplication events, synteny, and orthology for the 16 Ericaceae species were performed using the TEGR database. The TEGR database contains 614,821 functional genes annotated through the GO, Nr, Pfam, TrEMBL, and Swiss-Prot databases. The TEGR database provides the Primer Design, Hmmsearch, Synteny, BLAST, and JBrowse tools to help users perform comprehensive comparative genome analyses. All the high-quality reference genome sequences, genomic features, gene annotations, and bioinformatics results can be downloaded from the TEGR database. In the future, we will continue to improve the TEGR database with the latest data sets when they become available and to provide a useful resource that facilitates comparative genomic studies.
Glaucoma is an eye disease characterized by pathologically elevated intraocular pressure, optic nerve atrophy, and visual field defects, which can lead to irreversible vision loss. In recent years, the rapid development of artificial intelligence (AI) technology has provided new approaches for the early diagnosis and management of glaucoma. By classifying and annotating glaucoma-related images, AI models can learn and recognize the specific pathological features of glaucoma, thereby achieving automated imaging analysis and classification. Research on glaucoma imaging classification and annotation mainly involves color fundus photography (CFP), optical coherence tomography (OCT), anterior segment optical coherence tomography (AS-OCT), and ultrasound biomicroscopy (UBM) images. CFP is primarily used for the annotation of the optic cup and disc, while OCT is used for measuring and annotating the thickness of the retinal nerve fiber layer, and AS-OCT and UBM focus on the annotation of the anterior chamber angle structure and the measurement of anterior segment structural parameters. To standardize the classification and annotation of glaucoma images, enhance the quality and consistency of annotated data, and promote the clinical application of intelligent ophthalmology, this guideline has been developed. This guideline systematically elaborates on the principles, methods, processes, and quality control requirements for the classification and annotation of glaucoma images, providing standardized guidance for the classification and annotation of glaucoma images.
The plastid genome (plastome) represents an indispensable molecular resource for studying plant phylogeny and evolution. Although plastome size is much smaller than that of nuclear genomes, accurately and efficiently annotating and utilizing plastome sequences remain challenging. Therefore, a streamlined phylogenomic pipeline spanning plastome annotation, phylogenetic reconstruction and comparative genomics would greatly facilitate research utilizing this important organellar genome. Here, we develop PlastidHub, a novel web application employing innovative tools to analyze plastome sequences. In comparison with existing tools, key novel functionalities in PlastidHub include: (1) standardization of quadripartite structure; (2) improvement of annotation flexibility and consistency; (3) quantitative assessment of annotation completeness; (4) diverse extraction modes for canonical and specialized sequences; (5) intelligent screening of molecular markers for biodiversity studies; (6) gene-level visual comparison of structural variations and annotation completeness. PlastidHub features cloud-based web applications that do not require users to install, update, or maintain tools; detailed help documents including user guides, test examples, a static pop-up prompt box, and dynamic pop-up warning prompts when entering unreasonable parameter values; batch processing capabilities for all tools; intermediate results for secondary use; and easy-to-operate task flows between file upload and download. A key feature of PlastidHub is its interrelated task-based user interface design. Given that PlastidHub is easy to use without specialized computational skills or resources, this new platform should be widely used among botanists and evolutionary biologists, improving and expediting research employing the plastome. PlastidHub is available at https://www.plastidhub.cn.
Natural products (NPs) have long held a significant position in various fields such as medicine, food, agriculture, and materials. The chemical space covered by NPs is extensive but often underexplored. Therefore, high-throughput and efficient methodologies for the annotation and discovery of NPs are desired to address the complexity and diversity of NP-based systems. Mass spectrometry (MS) has emerged as a powerful platform for the annotation and discovery of NPs. MS databases provide vital support for the structural characterization of NPs by integrating extensive mass spectral data and sample information. Additionally, the released annotation methodologies, based on a variety of informatics tools, continuously improve the ability to annotate the structure and properties of compounds. This review examines the current mainstream databases and annotation methodologies, focusing on their advantages and limitations. Prospects for future technological advancements are then discussed in terms of novel applications and research objectives. Through a systematic overview, this review aims to provide valuable insights and a reference for MS-based NP annotation, thereby promoting the discovery of novel natural entities.
Funding (airfoil flow field prediction study): Supported by the National Natural Science Foundation of China (No. 92371206).
Funding (annotation in the translation of poetry and Fu): This paper is sponsored by the Postgraduate Creative Foundation of Gannan Normal University project "A Study on Dynamic Equivalence of Ecological Translation Elements in Davis' English Translation of Tao Yuanming's Works from the Perspective of Cultural Context" (YCX21A004) and by the 2018 National Social Science Foundation of China Western Project "A Study of Chinese Ci Fu in the English-Speaking World" (18XZW017).
Funding (crowd counting annotation study): The authors acknowledge that this research work was supported through project number UCS&T/R&D-09/19–20/17533, titled "An Intelligent Computational Model for Crowd Demonstration and Risk Analysis during Spiritual Events in Haridwar", by the Uttarakhand Council for Science and Technology (UCOST), India.
Funding (CVRP rice image dataset): Supported in part by grants from the Biological Breeding-National Science and Technology Major Project (Grant No. 2023ZD04076), the National Natural Science Foundation of China (Grant No. 32170647), the National Science Foundation of Jiangsu Province in China (Grant Nos. BK20212010 and BE2022383), the Jiangsu Engineering Research Center for Plant Genome Editing, Southern Japonica Rice Research and Development Co., Ltd., and the Jiangsu Collaborative Innovation Center for Modern Crop Production.
Funding: Supported by the National Science Foundation Plant Genome Research Program of the United States (Grant No. 1444573).
Abstract: Published genomes frequently contain erroneous gene models that represent issues associated with identification of open reading frames, start sites, splice sites, and related structural features. The source of these inconsistencies is often traced back to integration across text file formats designed to describe long read alignments and predicted gene structures. In addition, the majority of gene prediction frameworks do not provide robust downstream filtering to remove problematic gene annotations, nor do they represent these annotations in a format consistent with current file standards. These frameworks also lack consideration for functional attributes, such as the presence or absence of protein domains that can be used for gene model validation. To provide oversight to the increasing number of published genome annotations, we present a software package, the Gene Filtering, Analysis, and Conversion (gFACs), to filter, analyze, and convert predicted gene models and alignments. The software operates across a wide range of alignment, analysis, and gene prediction files with a flexible framework for defining gene models with reliable structural and functional attributes. gFACs supports common downstream applications, including genome browsers, and generates extensive details on the filtering process, including distributions that can be visualized to further assess the proposed gene space. gFACs is freely available and implemented in Perl with support from BioPerl libraries at https://gitlab.com/PlantGenomicsLab/gFACs.
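The kind of structural filtering described in the abstract above can be illustrated with a minimal Python sketch. It is not gFACs code and does not reproduce its options or file handling; the `GeneModel` class, the `is_structurally_complete` helper, and the chosen criteria (start codon, terminal stop codon, length divisible by three, no internal stop codon) are assumptions made purely for illustration.

```python
from dataclasses import dataclass

STOP_CODONS = {"TAA", "TAG", "TGA"}


@dataclass
class GeneModel:
    """Hypothetical container for one predicted gene model."""
    gene_id: str
    cds: str  # spliced coding sequence, 5' to 3'


def is_structurally_complete(model: GeneModel) -> bool:
    """Return True if the CDS looks like a complete open reading frame."""
    seq = model.cds.upper()
    # Need at least a start and a stop codon, in frame.
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[0] != "ATG":
        return False
    if codons[-1] not in STOP_CODONS:
        return False
    # Reject models with a premature (internal) stop codon.
    return not any(codon in STOP_CODONS for codon in codons[1:-1])


def filter_models(models):
    """Keep only the models that pass the structural check."""
    return [m for m in models if is_structurally_complete(m)]


models = [
    GeneModel("g1", "ATGGCCTAA"),     # complete ORF
    GeneModel("g2", "GCCGCCTAA"),     # missing start codon
    GeneModel("g3", "ATGTAAGCCTAA"),  # internal stop codon
    GeneModel("g4", "ATGGCCGC"),      # length not a multiple of three
]
kept = filter_models(models)
print([m.gene_id for m in kept])  # prints ['g1']
```

A real pipeline would read these models from annotation files and would likely add functional checks such as protein domain evidence, which this sketch leaves out.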
Funding: Supported in part by the National Natural Science Foundation of China under Grant Nos. 60621001, 60875028, 60875049, and 70890084; the Chinese Ministry of Science and Technology under Grant No. 2006AA010106; and the Chinese Academy of Sciences under Grant Nos. 2F05N01, 2F08N03 and 2F07C01.
Abstract: Collaborative social annotation systems allow users to record and share their original keywords or tag attachments to Web resources such as Web pages, photos, or videos. These annotations are a method for organizing and labeling information. They have the potential to help users navigate the Web and locate the needed resources. However, since annotations are posted by users under no central control, there exist problems such as sparse and synonymous annotations. To efficiently use annotation information to facilitate knowledge discovery from the Web, it is advantageous to organize social annotations from a semantic perspective and embed them into algorithms for knowledge discovery. This inspires Web page recommendation with annotations, in which users and Web pages are clustered so that semantically similar items can be related. In this paper we propose four graphic models which cluster users, Web pages and annotations and recommend Web pages for given users by first assigning items to the right cluster. The algorithms are then compared to the classical collaborative filtering recommendation method on a real-world data set. Our results indicate that the graphic models provide better recommendation performance and are robust enough for real applications.
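For readers unfamiliar with the baseline mentioned in the abstract above, classical user-based collaborative filtering can be sketched as follows. This is a generic textbook formulation, not the paper's implementation or its graphic models; the toy bookmark data, the function names, and the choice of cosine similarity over binary bookmark sets are all assumptions.

```python
import math


def cosine(a: set, b: set) -> float:
    """Cosine similarity between two users' sets of bookmarked pages."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def recommend(target, bookmarks, k=2, top_n=3):
    """Recommend unseen pages, weighted by the k most similar users."""
    seen = bookmarks[target]
    neighbours = sorted(
        ((cosine(seen, pages), user)
         for user, pages in bookmarks.items() if user != target),
        reverse=True,
    )[:k]
    scores = {}
    for sim, user in neighbours:
        if sim <= 0:
            continue  # users with nothing in common carry no signal
        for page in bookmarks[user] - seen:
            scores[page] = scores.get(page, 0.0) + sim
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_n]


# Hypothetical toy data: user -> set of bookmarked Web pages.
bookmarks = {
    "alice": {"p1", "p2", "p3"},
    "bob": {"p1", "p2", "p4"},
    "carol": {"p2", "p3", "p5"},
    "dave": {"p6"},
}
print(recommend("alice", bookmarks))  # prints ['p4', 'p5']
```

The sketch also shows why sparsity hurts this baseline: a user such as `dave`, who shares no pages with anyone, receives no recommendations at all.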
Funding: Supported by the National Natural Science Foundation of China (31000591, 31000587, 31171266).
Abstract: The discovery of novel cancer genes is one of the main goals in cancer research. Bioinformatics methods can be used to accelerate cancer gene discovery, which may help in the understanding of cancer and the development of drug targets. In this paper, we describe a classifier to predict potential cancer genes that we have developed by integrating multiple types of biological evidence, including protein-protein interaction network properties, and sequence and functional features. We detected 55 features that were significantly different between cancer genes and non-cancer genes. Fourteen cancer-associated features were chosen to train the classifier. Four machine learning methods, logistic regression, support vector machines (SVMs), BayesNet and decision tree, were explored in the classifier models to distinguish cancer genes from non-cancer genes. The prediction power of the different models was evaluated by 5-fold cross-validation. The area under the receiver operating characteristic curve for logistic regression, SVM, BayesNet and J48 tree models was 0.834, 0.740, 0.800 and 0.782, respectively. Finally, the logistic regression classifier with multiple biological features was applied to the genes in the Entrez database, and 1976 cancer gene candidates were identified. We found that the integrated prediction model performed much better than the models based on the individual biological evidence, and the network and functional features had stronger predictive power than the sequence features in predicting cancer genes.
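The evaluation protocol named in the abstract above, 5-fold cross-validation scored by the area under the ROC curve, can be sketched in plain Python. This is not the paper's classifier: a one-feature nearest-centroid scorer stands in for logistic regression, the data are synthetic, and every function name here is an assumption.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute AUC")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n, k):
    """Deterministic interleaved folds: fold i holds indices i, i+k, i+2k, ..."""
    return [list(range(i, n, k)) for i in range(k)]


def fit_centroid_scorer(xs, ys):
    """Stand-in model: score is higher the closer x lies to the positive mean."""
    mean_pos = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    mean_neg = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    return lambda x: abs(x - mean_neg) - abs(x - mean_pos)


def cross_validated_auc(xs, ys, fit, k=5):
    """Train on k-1 folds, score the held-out fold, and average the AUCs."""
    aucs = []
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        score = fit([xs[i] for i in train], [ys[i] for i in train])
        aucs.append(roc_auc([ys[i] for i in held_out],
                            [score(xs[i]) for i in held_out]))
    return sum(aucs) / len(aucs)


# Synthetic, perfectly separable toy data (label 1 plays the "cancer gene" role).
ys = [i % 2 for i in range(20)]
xs = [0.1 * i + (2.0 if y else 0.0) for i, y in enumerate(ys)]
print(cross_validated_auc(xs, ys, fit_centroid_scorer))  # prints 1.0
```

On real, noisy features the mean AUC would fall below 1.0, as in the values reported in the abstract; a production setup would also shuffle and stratify the folds rather than rely on the interleaving used here.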
Funding: This study was funded by the National Key Research and Development Program of China (Grant No. 2017YFC1104600) and the Recruitment Program of Leading Talents of Guangdong Province (Grant No. 2016LJ06Y375).
Abstract: Anterior segment eye diseases account for a significant proportion of presentations to eye clinics worldwide, including diseases associated with corneal pathologies, anterior chamber abnormalities (e.g., blood or inflammation), and lens diseases. The construction of an automatic tool for segmentation of anterior segment eye lesions would greatly improve the efficiency of clinical care. With research on artificial intelligence progressing in recent years, deep learning models have shown their superiority in image classification and segmentation. The training and evaluation of deep learning models should be based on a large amount of data annotated with expertise; however, such data are relatively scarce in the domain of medicine. Herein, the authors developed a new medical image annotation system, called EyeHealer. It is a large-scale anterior eye segment dataset with both eye structures and lesions annotated at the pixel level. Comprehensive experiments were conducted to verify its performance in disease classification and eye lesion segmentation. The results showed that semantic segmentation models outperformed medical segmentation models. This paper describes the establishment of the system for automated classification and segmentation tasks. The dataset will be made publicly available to encourage future research in this area.
Abstract: The most prosperous and important eras in the history of communication between China and foreign countries were the Han, Tang and Ming dynasties. During the Han and Tang dynasties, China’s foreign relations were largely confined to Asia. Apart from Japan and a few South Asian countries, China had almost no maritime contact with the outside world. The
Funding: Supported by the Lustgarten Foundation for Pancreatic Cancer Research and the Patrick J. McGovern Foundation Award.
Abstract: Creating large-scale and well-annotated datasets to train AI algorithms is crucial for automated tumor detection and localization. However, with limited resources, it is challenging to determine the best type of annotations when annotating massive amounts of unlabeled data. To address this issue, we focus on polyps in colonoscopy videos and pancreatic tumors in abdominal CT scans; both applications require significant effort and time for pixel-wise annotation due to the high-dimensional nature of the data, involving either temporal or spatial dimensions. In this paper, we develop a new annotation strategy, termed Drag&Drop, which simplifies the annotation process to drag and drop. This annotation strategy is more efficient, particularly for temporal and volumetric imaging, than other annotation types, such as per-pixel annotations, bounding boxes, scribbles, ellipses and points. Furthermore, to exploit our Drag&Drop annotations, we develop a novel weakly supervised learning method based on the watershed algorithm. Experimental results show that our method achieves better detection and localization performance than alternative weak annotations and, more importantly, achieves similar performance to that trained on detailed per-pixel annotations. Interestingly, we find that, with limited resources, allocating weak annotations from a diverse patient population can foster models more robust to unseen images than allocating per-pixel annotations for a small set of images. In summary, this research proposes an efficient annotation strategy for tumor detection and localization that is less accurate than per-pixel annotations but useful for creating large-scale datasets for screening tumors in various medical modalities.
Funding: Supported by National Institutes of Health grants (Grant Nos. R01LM012806R, I01DE030122, and R01DE029818) and by the Cancer Prevention and Research Institute of Texas (Grant Nos. CPRIT RP180734 and RP210045), United States.
Abstract: Single-cell RNA sequencing (scRNA-seq) is revolutionizing the study of complex and dynamic cellular mechanisms. However, cell type annotation remains a main challenge as it largely relies on a priori knowledge and manual curation, which is cumbersome and subjective. The increasing number of scRNA-seq datasets, as well as numerous published genetic studies, has motivated us to build a comprehensive human cell type reference atlas. Here, we present decoding Cell type Specificity (deCS), an automatic cell type annotation method augmented by a comprehensive collection of human cell type expression profiles and marker genes. We used deCS to annotate scRNA-seq data from various tissue types and systematically evaluated the annotation accuracy under different conditions, including reference panels, sequencing depth, and feature selection strategies. Our results demonstrate that expanding the references is critical for improving annotation accuracy. Compared to many existing state-of-the-art annotation tools, deCS significantly reduced computation time and increased accuracy. deCS can be integrated into the standard scRNA-seq analytical pipeline to enhance cell type annotation. Finally, we demonstrated the broad utility of deCS to identify trait-cell type associations in 51 human complex traits, providing deep insights into the cellular mechanisms underlying disease pathogenesis.
Abstract: During the process of organizing our original data, we unfortunately identified two errors in the figures within our published article. In Fig. 1, the online version incorrectly labels the SNI+NAC group as the sham+NAC group. We have revised the grouping annotations in Fig. 1 and have labeled the DHE staining in the figure to present the experimental design more clearly.
Funding: Supported by the National Natural Science Foundation of China (61902095).
Abstract: Biomedical big data, characterized by its massive scale, multi-dimensionality, and heterogeneity, offers novel perspectives for disease research, elucidates biological principles, and simultaneously prompts changes in related research methodologies. Biomedical ontology, as a shared formal conceptual system, not only offers standardized terms for multi-source biomedical data but also provides a solid data foundation and framework for biomedical research. In this review, we summarize enrichment analysis and deep learning for biomedical ontology based on its structure and semantic annotation properties, highlighting how technological advancements are enabling the more comprehensive use of ontology information. Enrichment analysis represents an important application of ontology to elucidate the potential biological significance of a particular molecular list. Deep learning, on the other hand, represents an increasingly powerful analytical tool that can be more widely combined with ontology for analysis and prediction. With the continuous evolution of big data technologies, the integration of these technologies with biomedical ontologies is opening up exciting new possibilities for advancing biomedical research.
Funding: Supported by the Russian Science Foundation, Russia (Grant No. 24-24-00031).
Abstract: In this study, we searched for dispersed repeats (DRs) in the rice (Oryza sativa) genome using the iterative procedure (IP) method. The results revealed that the O. sativa genome contained 79 DR families, comprising 992739 DNA repeats, of which 496762 and 495977 were identified on the forward and reverse DNA strands, respectively. The detected DRs were, on average, 374 bp in length and occupied 66.4% of the O. sativa genome. In total, 61% of the DRs identified by the IP method overlapped with previously annotated dispersed repeats (ADRs) detected using the Extensive De Novo TE Annotator (EDTA) pipeline.
Funding: Supported by the National Natural Science Foundation of China (32260097) and the National Guidance of Local Science and Technology Development Fund of China ([2023]009).
Abstract: Ericaceae is a diverse family of flowering plants distributed nearly worldwide, and it includes 126 genera and more than 4,000 species. In the present study, we developed The Ericaceae Genome Resource (TEGR, http://www.tegr.com.cn) as a comprehensive, user-friendly, web-based functional genomic database that is based on 16 published genomes from 16 Ericaceae species. The TEGR database contains information on many important functional genes, including 763 auxin genes, 2,407 flowering genes, 20,432 resistance genes, 617 anthocyanin-related genes, and 470 N⁶-methyladenosine (m⁶A) modification genes. We identified a total of 599,174 specific guide sequences for CRISPR in the TEGR database. Gene duplication, synteny, and orthology analyses of the 16 Ericaceae species were performed using the TEGR database. The TEGR database contains 614,821 functional genes annotated through the GO, Nr, Pfam, TrEMBL, and Swiss-Prot databases. The TEGR database provides the Primer Design, Hmmsearch, Synteny, BLAST, and JBrowse tools to help users perform comprehensive comparative genome analyses. All the high-quality reference genome sequences, genomic features, gene annotations, and bioinformatics results can be downloaded from the TEGR database. In the future, we will continue to improve the TEGR database with the latest data sets when they become available and to provide a useful resource that facilitates comparative genomic studies.
Funding: Supported by the Guangdong Basic and Applied Basic Research Foundation (No. 2025A1515011627) and the San Ming Project of Medicine in Shenzhen (No. SZSM202311012).
Abstract: Glaucoma is an eye disease characterized by pathologically elevated intraocular pressure, optic nerve atrophy, and visual field defects, which can lead to irreversible vision loss. In recent years, the rapid development of artificial intelligence (AI) technology has provided new approaches for the early diagnosis and management of glaucoma. By classifying and annotating glaucoma-related images, AI models can learn and recognize the specific pathological features of glaucoma, thereby achieving automated imaging analysis and classification. Research on glaucoma imaging classification and annotation mainly involves color fundus photography (CFP), optical coherence tomography (OCT), anterior segment optical coherence tomography (AS-OCT), and ultrasound biomicroscopy (UBM) images. CFP is primarily used for the annotation of the optic cup and disc, while OCT is used for measuring and annotating the thickness of the retinal nerve fiber layer, and AS-OCT and UBM focus on the annotation of the anterior chamber angle structure and the measurement of anterior segment structural parameters. This guideline has been developed to standardize the classification and annotation of glaucoma images, enhance the quality and consistency of annotated data, and promote the clinical application of intelligent ophthalmology. It systematically elaborates on the principles, methods, processes, and quality control requirements for the classification and annotation of glaucoma images, providing standardized guidance for this work.
Funding: Supported by the Natural Science Foundation of Shandong Province (ZR2020QC022); the Science and Technology Basic Resources Investigation Program of China (No. 2019FY100900); the Major Program for Basic Research Project of Yunnan Province (202401BC070001); the Yunnan Revitalization Talent Support Program: Yunling Scholar Project to Tingshuang Yi; and the open research project of the “Cross Cooperative Team” of the Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences.
Abstract: The plastid genome (plastome) represents an indispensable molecular resource for studying plant phylogeny and evolution. Although plastome size is much smaller than that of nuclear genomes, accurately and efficiently annotating and utilizing plastome sequences remain challenging. Therefore, a streamlined phylogenomic pipeline spanning plastome annotation, phylogenetic reconstruction and comparative genomics would greatly facilitate research utilizing this important organellar genome. Here, we develop PlastidHub, a novel web application employing innovative tools to analyze plastome sequences. In comparison with existing tools, key novel functionalities in PlastidHub include: (1) standardization of quadripartite structure; (2) improvement of annotation flexibility and consistency; (3) quantitative assessment of annotation completeness; (4) diverse extraction modes for canonical and specialized sequences; (5) intelligent screening of molecular markers for biodiversity studies; (6) gene-level visual comparison of structural variations and annotation completeness. PlastidHub features cloud-based web applications that do not require users to install, update, or maintain tools; detailed help documents including user guides, test examples, a static pop-up prompt box, and dynamic pop-up warning prompts when entering unreasonable parameter values; batch processing capabilities for all tools; intermediate results for secondary use; and easy-to-operate task flows between file upload and download. A key feature of PlastidHub is its interrelated task-based user interface design. Given that PlastidHub is easy to use without specialized computational skills or resources, this new platform should be widely used among botanists and evolutionary biologists, improving and expediting research employing the plastome. PlastidHub is available at https://www.plastidhub.cn.
Funding: Supported by the National Natural Science Foundation of China (Nos. 82274064, 82374026, and 82204591).
Abstract: Natural products (NPs) have long held a significant position in various fields such as medicine, food, agriculture, and materials. The chemical space covered by NPs is extensive but often underexplored. Therefore, high-throughput and efficient methodologies for the annotation and discovery of NPs are desired to address the complexity and diversity of NP-based systems. Mass spectrometry (MS) has emerged as a powerful platform for the annotation and discovery of NPs. MS databases provide vital support for the structural characterization of NPs by integrating extensive mass spectral data and sample information. Additionally, the released annotation methodologies, based on a variety of informatics tools, continuously improve the ability to annotate the structure and properties of compounds. This review examines the current mainstream databases and annotation methodologies, focusing on their advantages and limitations. Prospects for future technological advancements are then discussed in terms of novel applications and research objectives. Through a systematic overview, this review aims to provide valuable insights and a reference for MS-based NP annotation, thereby promoting the discovery of novel natural entities.