Background: Biologically annotated neural networks (BANNs) are feedforward Bayesian neural network models that utilize partially connected architectures based on SNP-set annotations. As an interpretable neural network, BANNs model SNP and SNP-set effects in their input and hidden layers, respectively. Furthermore, the weights and connections of the network are regarded as random variables with prior distributions reflecting the manifestation of genetic effects at various genomic scales. However, its application in genomic prediction has yet to be explored. Results: This study extended the BANNs framework to the area of genomic selection and explored the optimal SNP-set partitioning strategies by using dairy cattle datasets. The SNP-sets were partitioned based on two strategies, gene annotations and 100 kb windows, denoted as BANN_gene and BANN_100kb, respectively. The BANNs model was compared with GBLUP, random forest (RF), BayesB and BayesCπ through five replicates of five-fold cross-validation using genotypic and phenotypic data on milk production traits, type traits, and one health trait of 6,558, 6,210 and 5,962 Chinese Holsteins, respectively. Results showed that the BANNs framework achieves higher genomic prediction accuracy compared to GBLUP, RF and the Bayesian methods. Specifically, BANN_100kb achieved the highest accuracy and BANN_gene was generally second best, ahead of GBLUP, RF, BayesB and BayesCπ across all traits. The average accuracy improvements of BANN_100kb over GBLUP, RF, BayesB and BayesCπ were 4.86%, 3.95%, 3.84% and 1.92%, and the accuracy of BANN_gene was improved by 3.75%, 2.86%, 2.73% and 0.85% compared to GBLUP, RF, BayesB and BayesCπ, respectively, across all seven traits. Meanwhile, both BANN_100kb and BANN_gene yielded lower overall mean square error values than GBLUP, RF and the Bayesian methods. Conclusion: Our findings demonstrated that the BANNs framework performed better than traditional genomic prediction methods in our tested scenarios, and might serve as a promising alternative approach for genomic prediction in dairy cattle.
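The 100 kb window strategy mentioned in the abstract above can be illustrated with a minimal sketch. It assumes non-overlapping, fixed-width windows counted from position 0 on each chromosome; the abstract does not state how the authors placed window boundaries, and the function name and tuple layout are hypothetical.

```python
from collections import defaultdict


def partition_snps_by_window(snps, window_bp=100_000):
    """Group SNP ids into fixed-width, non-overlapping windows per chromosome.

    snps: iterable of (snp_id, chromosome, position_bp) tuples (hypothetical layout).
    Returns a dict mapping (chromosome, window_index) to a list of SNP ids.
    """
    sets = defaultdict(list)
    for snp_id, chrom, pos in snps:
        # Integer division assigns positions 0..99,999 to window 0, and so on.
        sets[(chrom, pos // window_bp)].append(snp_id)
    return dict(sets)


# Toy input: three SNPs on chromosome 1 and one on chromosome 2.
example = [
    ("rs1", "1", 50_000),
    ("rs2", "1", 99_999),
    ("rs3", "1", 100_000),
    ("rs4", "2", 10),
]
snp_sets = partition_snps_by_window(example)
```

Each resulting SNP-set would correspond to one hidden-layer node in a partially connected network of the kind the abstract describes; a gene-annotation strategy would replace the window index with a gene identifier.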
The method presented in this work is based on the fundamental concepts of Paraconsistent Annotated Logic with annotation of 2 values (PAL2v). PAL2v is a non-classical logic that admits contradiction, and in this paper we study it using a mathematical interpretation of its representative lattice. This study results in algorithms and equations that give an effective treatment of information signals representing situations found in uncertain knowledge databases. From the obtained equations, algorithms are elaborated for use in computational models of uncertainty treatment systems. We present some results obtained from analyses done with one of the algorithms that compose the paraconsistent analyzing system of logical signals with PAL2v. The paraconsistent reasoning system built according to the PAL2v methodology proves more efficient than traditional ones because it offers an appropriate treatment of contradictory information.
Automatic web image annotation is a practical and effective way for both web image retrieval and image understanding. However, current annotation techniques make no further investigation of the statement-level syntactic correlation among the annotated words, therefore making it very difficult to render natural language interpretation for images such as "pandas eat bamboo". In this paper, we propose an approach to interpret image semantics through mining the visible and textual information hidden in images. This approach mainly consists of two parts: first the annotated words of target images are ranked according to two factors, namely the visual correlation and the pairwise co-occurrence; then the statement-level syntactic correlation among annotated words is explored and natural language interpretation for the target image is obtained. Experiments conducted on real-world web images show the effectiveness of the proposed approach.
Deep learning and multimodal remote and proximal sensing are widely used for analyzing plant and crop traits, but many of these deep learning models are supervised and necessitate reference datasets with image annotations. Acquiring these datasets often demands experiments that are both labor-intensive and time-consuming. Furthermore, extracting traits from remote sensing data beyond simple geometric features remains a challenge. To address these challenges, we proposed a radiative transfer modeling framework based on the Helios 3-dimensional (3D) plant modeling software designed for plant remote and proximal sensing image simulation. The framework has the capability to simulate RGB, multi-/hyperspectral, thermal, and depth cameras, and produce associated plant images with fully resolved reference labels such as plant physical traits, leaf chemical concentrations, and leaf physiological traits. Helios offers a simulated environment that enables generation of 3D geometric models of plants and soil with random variation, and specification or simulation of their properties and function. This approach differs from traditional computer graphics rendering by explicitly modeling radiation transfer physics, which provides a critical link to underlying plant biophysical processes. Results indicate that the framework is capable of generating high-quality, labeled synthetic plant images under given lighting scenarios, which can lessen or remove the need for manually collected and annotated data. Two example applications are presented that demonstrate the feasibility of using the model to enable unsupervised learning by training deep learning models exclusively with simulated images and performing prediction tasks using real images.
In this study, we searched for dispersed repeats (DRs) in the rice (Oryza sativa) genome using the iterative procedure (IP) method. The results revealed that the O. sativa genome contained 79 DR families, comprising 992,739 DNA repeats, of which 496,762 and 495,977 were identified on the forward and reverse DNA strands, respectively. The detected DRs were, on average, 374 bp in length and occupied 66.4% of the O. sativa genome. In total, 61% of the DRs identified by the IP method overlapped with previously annotated dispersed repeats (ADRs) detected using the Extensive De Novo TE Annotator (EDTA) pipeline.
We ascertain the modularity-like objective function whose optimization is equivalent to the maximum likelihood in annotated networks. We demonstrate that the modularity-like objective function is a linear combination of modularity and conditional entropy. In contrast with statistical inference methods, in our method, the influence of the metadata is adjustable; when its influence is strong enough, the metadata can be recovered. Conversely, when it is weak, the detection may correspond to another partition. Between the two, there is a transition. This paper provides a concept for expanding the scope of modularity methods.
Ericaceae is a diverse family of flowering plants distributed nearly worldwide, and it includes 126 genera and more than 4,000 species. In the present study, we developed The Ericaceae Genome Resource (TEGR, http://www.tegr.com.cn) as a comprehensive, user-friendly, web-based functional genomic database that is based on 16 published genomes from 16 Ericaceae species. The TEGR database contains information on many important functional genes, including 763 auxin genes, 2,407 flowering genes, 20,432 resistance genes, 617 anthocyanin-related genes, and 470 N6-methyladenosine (m6A) modification genes. We identified a total of 599,174 specific guide sequences for CRISPR in the TEGR database. The gene duplication events, synteny analysis, and orthologous analysis of the 16 Ericaceae species were performed using the TEGR database. The TEGR database contains 614,821 functional genes annotated through the GO, Nr, Pfam, TrEMBL, and Swiss-Prot databases. The TEGR database provides the Primer Design, Hmmsearch, Synteny, BLAST, and JBrowse tools for helping users perform comprehensive comparative genome analyses. All the high-quality reference genome sequences, genomic features, gene annotations, and bioinformatics results can be downloaded from the TEGR database. In the future, we will continue to improve the TEGR database with the latest data sets when they become available and to provide a useful resource that facilitates comparative genomic studies.
Glaucoma is an eye disease characterized by pathologically elevated intraocular pressure, optic nerve atrophy, and visual field defects, which can lead to irreversible vision loss. In recent years, the rapid development of artificial intelligence (AI) technology has provided new approaches for the early diagnosis and management of glaucoma. By classifying and annotating glaucoma-related images, AI models can learn and recognize the specific pathological features of glaucoma, thereby achieving automated imaging analysis and classification. Research on glaucoma imaging classification and annotation mainly involves color fundus photography (CFP), optical coherence tomography (OCT), anterior segment optical coherence tomography (AS-OCT), and ultrasound biomicroscopy (UBM) images. CFP is primarily used for the annotation of the optic cup and disc, while OCT is used for measuring and annotating the thickness of the retinal nerve fiber layer, and AS-OCT and UBM focus on the annotation of the anterior chamber angle structure and the measurement of anterior segment structural parameters. To standardize the classification and annotation of glaucoma images, enhance the quality and consistency of annotated data, and promote the clinical application of intelligent ophthalmology, this guideline has been developed. This guideline systematically elaborates on the principles, methods, processes, and quality control requirements for the classification and annotation of glaucoma images, providing standardized guidance for the classification and annotation of glaucoma images.
The plastid genome (plastome) represents an indispensable molecular resource for studying plant phylogeny and evolution. Although plastome size is much smaller than that of nuclear genomes, accurately and efficiently annotating and utilizing plastome sequences remain challenging. Therefore, a streamlined phylogenomic pipeline spanning plastome annotation, phylogenetic reconstruction and comparative genomics would greatly facilitate research utilizing this important organellar genome. Here, we develop PlastidHub, a novel web application employing innovative tools to analyze plastome sequences. In comparison with existing tools, key novel functionalities in PlastidHub include: (1) standardization of quadripartite structure; (2) improvement of annotation flexibility and consistency; (3) quantitative assessment of annotation completeness; (4) diverse extraction modes for canonical and specialized sequences; (5) intelligent screening of molecular markers for biodiversity studies; (6) gene-level visual comparison of structural variations and annotation completeness. PlastidHub features cloud-based web applications that do not require users to install, update, or maintain tools; detailed help documents including user guides, test examples, a static pop-up prompt box, and dynamic pop-up warning prompts when entering unreasonable parameter values; batch processing capabilities for all tools; intermediate results for secondary use; and easy-to-operate task flows between file upload and download. A key feature of PlastidHub is its interrelated task-based user interface design. Given that PlastidHub is easy to use without specialized computational skills or resources, this new platform should be widely used among botanists and evolutionary biologists, improving and expediting research employing the plastome. PlastidHub is available at https://www.plastidhub.cn.
Biomedical big data, characterized by its massive scale, multi-dimensionality, and heterogeneity, offers novel perspectives for disease research, elucidates biological principles, and simultaneously prompts changes in related research methodologies. Biomedical ontology, as a shared formal conceptual system, not only offers standardized terms for multi-source biomedical data but also provides a solid data foundation and framework for biomedical research. In this review, we summarize enrichment analysis and deep learning for biomedical ontology based on its structure and semantic annotation properties, highlighting how technological advancements are enabling the more comprehensive use of ontology information. Enrichment analysis represents an important application of ontology to elucidate the potential biological significance for a particular molecular list. Deep learning, on the other hand, represents an increasingly powerful analytical tool that can be more widely combined with ontology for analysis and prediction. With the continuous evolution of big data technologies, the integration of these technologies with biomedical ontologies is opening up exciting new possibilities for advancing biomedical research.
During the process of organizing our original data, we unfortunately identified two errors in the figures within our published article. In Fig. 1, the online version incorrectly labels the SNI+NAC group as the sham+NAC group. We have revised the grouping annotations in Fig. 1 and have labeled the DHE staining in the figure to present the experimental design more clearly.
Natural products (NPs) have long held a significant position in various fields such as medicine, food, agriculture, and materials. The chemical space covered by NPs is extensive but often underexplored. Therefore, high-throughput and efficient methodologies for the annotation and discovery of NPs are desired to address the complexity and diversity of NP-based systems. Mass spectrometry (MS) has emerged as a powerful platform for the annotation and discovery of NPs. MS databases provide vital support for the structural characterization of NPs by integrating extensive mass spectral data and sample information. Additionally, the released annotation methodologies, based on a variety of informatics tools, continuously improve the ability to annotate the structure and properties of compounds. This review examines the current mainstream databases and annotation methodologies, focusing on their advantages and limitations. Prospects for future technological advancements are then discussed in terms of novel applications and research objectives. Through a systematic overview, this review aims to provide valuable insights and a reference for MS-based NP annotation, thereby promoting the discovery of novel natural entities.
Next-generation sequencing (NGS) technologies generate thousands to millions of genetic variants per sample. Identification of potential disease-causal variants is labor intensive as it relies on filtering using various annotation metrics and consideration of multiple pathogenicity prediction scores. We have developed VPOT (variant prioritization ordering tool), a Python-based command line tool that allows researchers to create a single fully customizable pathogenicity ranking score from any number of annotation values, each with a user-defined weighting. The use of VPOT can be informative when analyzing entire cohorts, as variants in a cohort can be prioritized. VPOT also provides additional functions to allow variant filtering based on a candidate gene list or by affected status in a family pedigree. VPOT outperforms similar tools in terms of efficacy, flexibility, scalability, and computational performance. VPOT is freely available for public use at GitHub (https://github.com/VCCRI/VPOT/). Documentation for installation along with a user tutorial, a default parameter file, and test data are provided.
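The idea of collapsing several annotation values into one ranking score with user-defined weights can be sketched as follows. This is only an illustration of a weighted sum; it is not VPOT's code, parameter file format, or scoring rule, and the field names (`cadd`, `rarity`) and function names are hypothetical.

```python
def weighted_score(annotations, weights):
    """Sum weight * value over the weighted annotation fields.

    Fields missing from a variant's annotations contribute 0 (an assumption).
    """
    return sum(w * annotations.get(field, 0.0) for field, w in weights.items())


def rank_variants(variants, weights):
    """Return variants sorted from highest to lowest weighted score."""
    return sorted(
        variants,
        key=lambda v: weighted_score(v["annotations"], weights),
        reverse=True,
    )


# Hypothetical weights and variants for illustration only.
weights = {"cadd": 1.0, "rarity": 2.0}
variants = [
    {"id": "A", "annotations": {"cadd": 10.0, "rarity": 0.5}},
    {"id": "B", "annotations": {"cadd": 5.0, "rarity": 5.0}},
    {"id": "C", "annotations": {"cadd": 12.0}},
]
ranked_ids = [v["id"] for v in rank_variants(variants, weights)]
```

With these toy numbers the scores are 11 for A, 15 for B, and 12 for C, so the ranking is B, C, A. Changing a weight reorders the list without touching the annotation data, which is the kind of customization the abstract describes.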
This paper aims to investigate the effectiveness of rubric-referenced student self-assessment (SSA) on students’ English essay writing by employing a two-group pre-post quasi-experimental research design. The method was tested on 54 students at a Chinese university. During a 17-week experiment, the experimental group (EG) received the rubric and annotated samples, while the comparison group (CG) received only the rubric in self-assessment. Data sources included students’ scores in the pre-test and post-test and interviews. Quantitative findings indicated that the EG made significantly stronger progress than the CG in the post-test. Interview results suggested that annotation-based rubric-referenced SSA can help students understand the task requirements, initiate their self-regulatory behaviors, and improve their self-assessment confidence, although students still wanted to receive assistance from teachers partly due to the Confucian-heritage culture settings in China. The findings are discussed in terms of the design features of sample annotations within the framework of self-regulated learning (SRL), as well as the implications of using this method in the classroom.
In this paper we address the problem of geometric multi-model fitting using a few weakly annotated data points, which has been little studied so far. In weak annotating (WA), most manual annotations are supposed to be correct yet inevitably mixed with incorrect ones. Such WA data can naturally arise through interaction in various tasks. For example, in the case of homography estimation, one can easily annotate points on the same plane or object with a single label by observing the image. Motivated by this, we propose a novel method to make full use of WA data to boost multi-model fitting performance. Specifically, a graph for model proposal sampling is first constructed using the WA data, given the prior that WA data annotated with the same weak label has a high probability of belonging to the same model. By incorporating this prior knowledge into the calculation of edge probabilities, vertices (i.e., data points) lying on or near the latent model are likely to be associated and further form a subset or cluster for effective proposal generation. Having generated proposals, α-expansion is used for labeling, and our method in return updates the proposals. This procedure works in an iterative way. Extensive experiments validate our method and show that it produces noticeably better results than state-of-the-art techniques in most cases.
The success of deep transfer learning in fault diagnosis is attributed to the collection of high-quality labeled data from the source domain. However, in engineering scenarios, achieving such high-quality label annotation is difficult and expensive. Incorrect label annotation produces two negative effects: 1) the complex decision boundary of diagnosis models lowers the generalization performance on the target domain, and 2) the distribution of target domain samples becomes misaligned with the false-labeled samples. To overcome these negative effects, this article proposes a solution called the label recovery and trajectory designable network (LRTDN). LRTDN consists of three parts. First, a residual network with dual classifiers is used to learn features from cross-domain samples. Second, an annotation check module is constructed to generate a label anomaly indicator that can modify the abnormal labels of false-labeled samples in the source domain. With the training of relabeled samples, the complexity of the diagnosis model is reduced via semi-supervised learning. Third, the adaptation trajectories are designed for sample distributions across domains. This ensures that the target domain samples are only adapted with the pure-labeled samples. The LRTDN is verified by two case studies, in which the diagnosis knowledge of bearings is transferred across different working conditions as well as different yet related machines. The results show that LRTDN offers a high diagnosis accuracy even in the presence of incorrect annotation.
Objective: To investigate the effect of Guangdong Shenqu (GSQ) on intestinal flora structure in mice with food stagnation through 16S rDNA sequencing. Methods: Mice were randomly assigned to control, model, GSQ low-dose (GSQL), GSQ medium-dose (GSQM), GSQ high-dose (GSQH), and lacidophilin tablets (LAB) groups, with each group containing 10 mice. A food stagnation and internal heat mouse model was established through intragastric administration of a mixture of beeswax and olive oil (1:15). The control group was administered normal saline, and the model group was administered beeswax and olive oil to maintain the model state. The GSQL (2 g/kg), GSQM (4 g/kg), GSQH (8 g/kg), and LAB (0.625 g/kg) groups were administered the corresponding drugs for 5 d. After administration, 16S rDNA sequencing was performed to assess gut microbiota in mouse fecal samples. Results: The model group exhibited significant intestinal flora changes. Following GSQ administration, the abundance and diversity index of the intestinal flora increased significantly, the number of bacterial species was regulated, and α and β diversity were improved. GSQ administration increased the abundance of probiotics, including Clostridia, Lachnospirales, and Lactobacillus, whereas the abundance of conditional pathogenic bacteria, such as Allobaculum, Erysipelotrichaceae, and Bacteroides, decreased. Functional prediction analysis indicated that the pathogenesis of food stagnation and GSQ intervention were primarily associated with carbohydrate, lipid, and amino acid metabolism, among other metabolic pathways. Conclusion: The digestive mechanism of GSQ may be attributed to its role in restoring diversity and abundance within the intestinal flora, thereby improving the composition and structure of the intestinal flora in mice and subsequently influencing the regulation of metabolic pathways.
Butyric acid is a volatile saturated monocarboxylic acid, which is widely used in the chemical, food, pharmaceutical, energy, and animal feed industries. This study focuses on producing butyric acid from pre-treated rape straw using simultaneous enzymatic hydrolysis semi-solid fermentation (SEHSF). Clostridium beijerinckii BRM001 screened from pit mud of Chinese nongxiangxing baijiu was used. The genome of C. beijerinckii BRM001 was sequenced and annotated. Using rape straw as the sole carbon source, fermentation optimization was carried out based on the genomic analysis of BRM001. The optimized butyric acid yield was as high as 13.86±0.77 g/L, which was 2.1 times higher than that of the initial screening. Furthermore, under optimal conditions, non-sterile SEHSF was carried out, and the yield of butyric acid was 13.42±0.83 g/L in a 2.5-L fermentor. This study provides a new approach for butyric acid production that eliminates the need for detoxification of straw hydrolysate and makes full use of the fermentation waste residue without secondary pollution, making the whole process greener and more economical and giving it industrial potential.
The application of whole genome sequencing is expanding in clinical diagnostics across various genetic disorders, and the significance of non-coding variants in penetrant diseases is increasingly being demonstrated. Therefore, it is urgent to improve the diagnostic yield by exploring the pathogenic mechanisms of variants in non-coding regions. However, the interpretation of non-coding variants remains a significant challenge, due to the complex functional regulatory mechanisms of non-coding regions and the current limitations of available databases and tools. Hence, we develop the non-coding variant annotation database (NCAD, http://www.ncawdb.net/), encompassing comprehensive insights into 665,679,194 variants, regulatory elements, and element interaction details. Integrating data from 96 sources, spanning both GRCh37 and GRCh38 versions, NCAD v1.0 provides vital information to support the genetic diagnosis of non-coding variants, including allele frequencies of 12 diverse populations, with a particular focus on the population frequency information for 230,235,698 variants in 20,964 Chinese individuals. Moreover, it offers prediction scores for variant functionality, five categories of regulatory elements, and four types of non-coding RNAs. With its rich data and comprehensive coverage, NCAD serves as a valuable platform, empowering researchers and clinicians with profound insights into non-coding regulatory mechanisms while facilitating the interpretation of non-coding variants.
Deep learning has been widely applied in surrogate modeling for airfoil flow field prediction. The success of deep learning relies heavily on large-scale, high-quality labeled samples. However, acquiring labeled samples with complete annotations is prohibitively expensive, and the available annotations in practical engineering are often sparse due to limited observation. To leverage samples with sparse annotations, this paper proposes an uncertainty-based active transfer learning method. The most valuable positions in the flow field are selected based on uncertainty for annotation, effectively improving prediction accuracy and reducing annotation costs. Our method involves a novel active annotation based on synchronous quantile regression, which can mitigate the computational cost of query annotation. Besides, a novel quantile levels-based consistency regularization is proposed to constrain the remaining unlabeled regions and further improve the model performance. Experiments show that our method can significantly reduce prediction errors with only 1% extra annotations, and is a promising tool for achieving rapid and accurate flow field prediction.
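The abstract above does not spell out its synchronous quantile regression, so the following is only a generic sketch of the two ingredients it builds on: the standard quantile (pinball) loss, and selecting positions to annotate by the width of a predicted quantile interval. Using interval width as the uncertainty measure, and both function names, are assumptions made for illustration.

```python
def pinball_loss(y_true, y_pred, tau):
    """Average quantile (pinball) loss at quantile level tau in (0, 1)."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction is weighted by tau, over-prediction by (1 - tau).
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)


def most_uncertain(lower, upper, k):
    """Indices of the k positions with the widest predicted quantile interval."""
    widths = [u - l for l, u in zip(lower, upper)]
    order = sorted(range(len(widths)), key=lambda i: widths[i], reverse=True)
    return order[:k]


# Toy values: the median (tau = 0.5) loss and a three-position field.
loss = pinball_loss([1.0, 3.0], [2.0, 2.0], 0.5)
query = most_uncertain([0.0, 0.0, 0.0], [1.0, 3.0, 2.0], 2)
```

In an active annotation loop of this kind, the positions returned by `most_uncertain` would be the ones sent for labeling before the model is retrained.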
Funding (BANNs genomic prediction study): Supported by the National Key Research and Development Program of China (2022YFD1302204), the earmarked fund CARS36, and the Ningxia Key Research and Development Program of China (2023BCF010042019NYYZ09).
Abstract: Background: Biologically annotated neural networks (BANNs) are feedforward Bayesian neural network models that utilize partially connected architectures based on SNP-set annotations. As an interpretable neural network, BANNs model SNP and SNP-set effects in their input and hidden layers, respectively. Furthermore, the weights and connections of the network are regarded as random variables with prior distributions reflecting the manifestation of genetic effects at various genomic scales. However, their application in genomic prediction has yet to be explored. Results: This study extended the BANNs framework to the area of genomic selection and explored the optimal SNP-set partitioning strategies by using dairy cattle datasets. The SNP-sets were partitioned based on two strategies, gene annotations and 100 kb windows, denoted as BANN_gene and BANN_100kb, respectively. The BANNs model was compared with GBLUP, random forest (RF), BayesB and BayesCπ through five replicates of five-fold cross-validation using genotypic and phenotypic data on milk production traits, type traits, and one health trait of 6,558, 6,210 and 5,962 Chinese Holsteins, respectively. Results showed that the BANNs framework achieves higher genomic prediction accuracy compared to GBLUP, RF and Bayesian methods. Specifically, the BANN_100kb demonstrated superior accuracy and the BANN_gene exhibited generally suboptimal accuracy compared to GBLUP, RF, BayesB and BayesCπ across all traits. The average accuracy improvements of BANN_100kb over GBLUP, RF, BayesB and BayesCπ were 4.86%, 3.95%, 3.84% and 1.92%, and the accuracy of BANN_gene was improved by 3.75%, 2.86%, 2.73% and 0.85% compared to GBLUP, RF, BayesB and BayesCπ, respectively, across all seven traits. Meanwhile, both BANN_100kb and BANN_gene yielded lower overall mean square error values than GBLUP, RF and Bayesian methods. Conclusion: Our findings demonstrated that the BANNs framework performed better than traditional genomic prediction methods in our tested scenarios, and might serve as a promising alternative approach for genomic prediction in dairy cattle.
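The evaluation scheme named in this abstract, five replicates of five-fold cross-validation, can be sketched as follows. This is a minimal illustration only: the function name `repeated_kfold`, the seed, and the shuffling scheme are assumptions, not the authors' code, and the abstract does not say how animals were assigned to folds.

```python
import random

def repeated_kfold(n_samples, n_folds=5, n_repeats=5, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV.

    Hypothetical helper: each repeat reshuffles the sample indices and
    splits them into n_folds disjoint test sets.
    """
    rng = random.Random(seed)
    for rep in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        # Every n_folds-th shuffled index goes to the same fold.
        folds = [idx[f::n_folds] for f in range(n_folds)]
        for f, test in enumerate(folds):
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            yield rep, f, sorted(train), sorted(test)

# Toy usage with 10 samples: 5 repeats x 5 folds = 25 train/test splits.
splits = list(repeated_kfold(10))
```

Each model under comparison would be fitted on `train_idx` and scored on `test_idx` for all 25 splits, with the reported accuracy averaged over them.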
Abstract: The method presented in this work is based on the fundamental concepts of Paraconsistent Annotated Logic with annotation of 2 values (PAL2v). PAL2v is a non-classical logic that admits contradiction, and in this paper we study it through a mathematical interpretation of its representative lattice. This study results in algorithms and equations that give an effective treatment of information signals representing situations found in uncertain knowledge databases. From the obtained equations, algorithms are elaborated for use in computational models of uncertainty treatment systems. We present some results obtained from analyses done with one of the algorithms that compose the paraconsistent analyzing system of logical signals with the PAL2v logic. The paraconsistent reasoning system built according to the PAL2v methodology proves to be more efficient than traditional ones because it offers an appropriate treatment of contradictory information.
Funding: Project supported by the National Natural Science Foundation of China (Nos 60533090 and 60603096) and the National High-Tech Research and Development Program (863) of China (No 2006AA 010107).
Abstract: Automatic web image annotation is a practical and effective way for both web image retrieval and image understanding. However, current annotation techniques do not investigate the statement-level syntactic correlation among the annotated words, which makes it very difficult to render a natural language interpretation for images such as "pandas eat bamboo". In this paper, we propose an approach to interpret image semantics through mining the visible and textual information hidden in images. This approach mainly consists of two parts: first, the annotated words of target images are ranked according to two factors, namely the visual correlation and the pairwise co-occurrence; then the statement-level syntactic correlation among annotated words is explored and a natural language interpretation for the target image is obtained. Experiments conducted on real-world web images show the effectiveness of the proposed approach.
Funding: Supported, in whole or in part, by the Bill & Melinda Gates Foundation (INV-0028630) and USDA NIFA Hatch project 7003146.
Abstract: Deep learning and multimodal remote and proximal sensing are widely used for analyzing plant and crop traits, but many of these deep learning models are supervised and necessitate reference datasets with image annotations. Acquiring these datasets often demands experiments that are both labor-intensive and time-consuming. Furthermore, extracting traits from remote sensing data beyond simple geometric features remains a challenge. To address these challenges, we proposed a radiative transfer modeling framework based on the Helios 3-dimensional (3D) plant modeling software designed for plant remote and proximal sensing image simulation. The framework has the capability to simulate RGB, multi-/hyperspectral, thermal, and depth cameras, and produce associated plant images with fully resolved reference labels such as plant physical traits, leaf chemical concentrations, and leaf physiological traits. Helios offers a simulated environment that enables generation of 3D geometric models of plants and soil with random variation, and specification or simulation of their properties and function. This approach differs from traditional computer graphics rendering by explicitly modeling radiation transfer physics, which provides a critical link to underlying plant biophysical processes. Results indicate that the framework is capable of generating high-quality, labeled synthetic plant images under given lighting scenarios, which can lessen or remove the need for manually collected and annotated data. Two example applications are presented that demonstrate the feasibility of using the model to enable unsupervised learning by training deep learning models exclusively with simulated images and performing prediction tasks using real images.
Funding: Supported by the Russian Science Foundation, Russia (Grant No. 24-24-00031).
Abstract: In this study, we searched for dispersed repeats (DRs) in the rice (Oryza sativa) genome using the iterative procedure (IP) method. The results revealed that the O. sativa genome contained 79 DR families, comprising 992739 DNA repeats, of which 496762 and 495977 were identified on the forward and reverse DNA strands, respectively. The detected DRs were, on average, 374 bp in length and occupied 66.4% of the O. sativa genome. In total, 61% of the DRs identified by the IP method overlapped with previously annotated dispersed repeats (ADRs) detected using the Extensive De Novo TE Annotator (EDTA) pipeline.
Funding: This work was funded by the National Natural Science Foundation of China (Grant Nos. 11275186, 91024026, and FOM2014OF001).
Abstract: We ascertain the modularity-like objective function whose optimization is equivalent to the maximum likelihood in annotated networks. We demonstrate that the modularity-like objective function is a linear combination of modularity and conditional entropy. In contrast with statistical inference methods, in our method, the influence of the metadata is adjustable; when its influence is strong enough, the metadata can be recovered. Conversely, when it is weak, the detection may correspond to another partition. Between the two, there is a transition. This paper provides a concept for expanding the scope of modularity methods.
Funding: Supported by the National Natural Science Foundation of China (32260097) and the National Guidance of Local Science and Technology Development Fund of China ([2023]009).
Abstract: Ericaceae is a diverse family of flowering plants distributed nearly worldwide, and it includes 126 genera and more than 4,000 species. In the present study, we developed The Ericaceae Genome Resource (TEGR, http://www.tegr.com.cn) as a comprehensive, user-friendly, web-based functional genomic database that is based on 16 published genomes from 16 Ericaceae species. The TEGR database contains information on many important functional genes, including 763 auxin genes, 2,407 flowering genes, 20,432 resistance genes, 617 anthocyanin-related genes, and 470 N6-methyladenosine (m6A) modification genes. We identified a total of 599,174 specific guide sequences for CRISPR in the TEGR database. The gene duplication events, synteny analysis, and orthologous analysis of the 16 Ericaceae species were performed using the TEGR database. The TEGR database contains 614,821 functional genes annotated through the GO, Nr, Pfam, TrEMBL, and Swiss-Prot databases. The TEGR database provides the Primer Design, Hmmsearch, Synteny, BLAST, and JBrowse tools for helping users perform comprehensive comparative genome analyses. All the high-quality reference genome sequences, genomic features, gene annotations, and bioinformatics results can be downloaded from the TEGR database. In the future, we will continue to improve the TEGR database with the latest data sets when they become available and to provide a useful resource that facilitates comparative genomic studies.
Funding: Supported by the Guangdong Basic and Applied Basic Research Foundation (No. 2025A1515011627) and the San Ming Project of Medicine in Shenzhen (No. SZSM202311012).
Abstract: Glaucoma is an eye disease characterized by pathologically elevated intraocular pressure, optic nerve atrophy, and visual field defects, which can lead to irreversible vision loss. In recent years, the rapid development of artificial intelligence (AI) technology has provided new approaches for the early diagnosis and management of glaucoma. By classifying and annotating glaucoma-related images, AI models can learn and recognize the specific pathological features of glaucoma, thereby achieving automated imaging analysis and classification. Research on glaucoma imaging classification and annotation mainly involves color fundus photography (CFP), optical coherence tomography (OCT), anterior segment optical coherence tomography (AS-OCT), and ultrasound biomicroscopy (UBM) images. CFP is primarily used for the annotation of the optic cup and disc, while OCT is used for measuring and annotating the thickness of the retinal nerve fiber layer, and AS-OCT and UBM focus on the annotation of the anterior chamber angle structure and the measurement of anterior segment structural parameters. This guideline has been developed to standardize the classification and annotation of glaucoma images, enhance the quality and consistency of annotated data, and promote the clinical application of intelligent ophthalmology. It systematically elaborates on the principles, methods, processes, and quality control requirements for the classification and annotation of glaucoma images, providing standardized guidance for this work.
Funding: Supported by the Natural Science Foundation of Shandong Province (ZR2020QC022); the Science and Technology Basic Resources Investigation Program of China (No. 2019FY100900); the Major Program for Basic Research Project of Yunnan Province (202401BC070001); the Yunnan Revitalization Talent Support Program: Yunling Scholar Project to Tingshuang Yi; and the open research project of "Cross Cooperative Team" of the Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences.
Abstract: The plastid genome (plastome) represents an indispensable molecular resource for studying plant phylogeny and evolution. Although plastome size is much smaller than that of nuclear genomes, accurately and efficiently annotating and utilizing plastome sequences remain challenging. Therefore, a streamlined phylogenomic pipeline spanning plastome annotation, phylogenetic reconstruction and comparative genomics would greatly facilitate research utilizing this important organellar genome. Here, we develop PlastidHub, a novel web application employing innovative tools to analyze plastome sequences. In comparison with existing tools, key novel functionalities in PlastidHub include: (1) standardization of quadripartite structure; (2) improvement of annotation flexibility and consistency; (3) quantitative assessment of annotation completeness; (4) diverse extraction modes for canonical and specialized sequences; (5) intelligent screening of molecular markers for biodiversity studies; (6) gene-level visual comparison of structural variations and annotation completeness. PlastidHub features cloud-based web applications that do not require users to install, update, or maintain tools; detailed help documents including user guides, test examples, a static pop-up prompt box, and dynamic pop-up warning prompts when entering unreasonable parameter values; batch processing capabilities for all tools; intermediate results for secondary use; and easy-to-operate task flows between file upload and download. A key feature of PlastidHub is its interrelated task-based user interface design. Given that PlastidHub is easy to use without specialized computational skills or resources, this new platform should be widely used among botanists and evolutionary biologists, improving and expediting research employing the plastome. PlastidHub is available at https://www.plastidhub.cn.
Funding: Supported by the National Natural Science Foundation of China (61902095).
Abstract: Biomedical big data, characterized by its massive scale, multi-dimensionality, and heterogeneity, offers novel perspectives for disease research, elucidates biological principles, and simultaneously prompts changes in related research methodologies. Biomedical ontology, as a shared formal conceptual system, not only offers standardized terms for multi-source biomedical data but also provides a solid data foundation and framework for biomedical research. In this review, we summarize enrichment analysis and deep learning for biomedical ontology based on its structure and semantic annotation properties, highlighting how technological advancements are enabling the more comprehensive use of ontology information. Enrichment analysis represents an important application of ontology to elucidate the potential biological significance for a particular molecular list. Deep learning, on the other hand, represents an increasingly powerful analytical tool that can be more widely combined with ontology for analysis and prediction. With the continuous evolution of big data technologies, the integration of these technologies with biomedical ontologies is opening up exciting new possibilities for advancing biomedical research.
Abstract: During the process of organizing our original data, we unfortunately identified two errors in the figures within our published article. In Fig. 1, the online version incorrectly labels the SNI+NAC group as the sham+NAC group. We have revised the grouping annotations in Fig. 1 and have labeled the DHE staining in the figure to present the experimental design more clearly.
Funding: Supported by the National Natural Science Foundation of China (Nos. 82274064, 82374026, and 82204591).
Abstract: Natural products (NPs) have long held a significant position in various fields such as medicine, food, agriculture, and materials. The chemical space covered by NPs is extensive but often underexplored. Therefore, high-throughput and efficient methodologies for the annotation and discovery of NPs are desired to address the complexity and diversity of NP-based systems. Mass spectrometry (MS) has emerged as a powerful platform for the annotation and discovery of NPs. MS databases provide vital support for the structural characterization of NPs by integrating extensive mass spectral data and sample information. Additionally, the released annotation methodologies, based on a variety of informatics tools, continuously improve the ability to annotate the structure and properties of compounds. This review examines the current mainstream databases and annotation methodologies, focusing on their advantages and limitations. Prospects for future technological advancements are then discussed in terms of novel applications and research objectives. Through a systematic overview, this review aims to provide valuable insights and a reference for MS-based NP annotation, thereby promoting the discovery of novel natural entities.
Funding: Supported by an Australian Postgraduate Award (University of New South Wales) to EI; Chain Reaction (The Ultimate Corporate Bike Challenge); the Office of Health and Medical Research, NSW Government, Australia; the National Health and Medical Research Council Principal Research Fellowship (Grant No. 1135886) to SLD, NSW Government, Australia; and the National Heart Foundation of Australia Future Leader Fellowship (Grant No. 101204) to EG.
Abstract: Next-generation sequencing (NGS) technologies generate thousands to millions of genetic variants per sample. Identification of potential disease-causal variants is labor intensive as it relies on filtering using various annotation metrics and consideration of multiple pathogenicity prediction scores. We have developed VPOT (variant prioritization ordering tool), a Python-based command line tool that allows researchers to create a single fully customizable pathogenicity ranking score from any number of annotation values, each with a user-defined weighting. The use of VPOT can be informative when analyzing entire cohorts, as variants in a cohort can be prioritized. VPOT also provides additional functions to allow variant filtering based on a candidate gene list or by affected status in a family pedigree. VPOT outperforms similar tools in terms of efficacy, flexibility, scalability, and computational performance. VPOT is freely available for public use at GitHub (https://github.com/VCCRI/VPOT/). Documentation for installation along with a user tutorial, a default parameter file, and test data are provided.
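The idea of combining several annotation values into one ranking score through user-defined weights can be sketched as below. This is not VPOT's code or its parameter-file format: the function names, the dictionary layout, and the annotation keys (`cadd`, `rarity`) are hypothetical and chosen only to illustrate a weighted sum followed by a sort.

```python
def priority_score(annotations, weights):
    """Weighted sum of annotation values; missing annotations count as 0."""
    return sum(w * annotations.get(name, 0.0) for name, w in weights.items())

def rank_variants(variants, weights):
    """Return variants ordered from highest to lowest priority score."""
    return sorted(
        variants,
        key=lambda v: priority_score(v["annotations"], weights),
        reverse=True,
    )

# Hypothetical weights and variants, for illustration only.
weights = {"cadd": 1.0, "rarity": 2.0}
variants = [
    {"id": "var_a", "annotations": {"cadd": 10.0, "rarity": 1.0}},
    {"id": "var_b", "annotations": {"cadd": 25.0}},
    {"id": "var_c", "annotations": {"cadd": 5.0, "rarity": 12.0}},
]
ranked_ids = [v["id"] for v in rank_variants(variants, weights)]
```

Changing an entry in `weights` reorders the list without touching the annotation data, which is the kind of customization the abstract describes.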
Funding: Supported by The Research Project of Philosophy and Social Science of Ministry of Education of China [Grant No. 17YJC740102] and the Guangdong Provincial Teaching Award Nurturing Project (Name: Developing the Self-Assessment System of Writing for the National Quality Course of College English at South China University of Technology).
Abstract: This paper aims to investigate the effectiveness of rubric-referenced student self-assessment (SSA) on students' English essay writing by employing a two-group pre-post quasi-experimental research design. The method was tested on 54 students at a Chinese university. During a 17-week experiment, the experimental group (EG) received the rubric and annotated samples, while the comparison group (CG) received only the rubric in self-assessment. Data sources included students' scores in the pre-test and post-test and interviews. Quantitative findings indicated that the EG made significantly stronger progress than the CG in the post-test. Interview results suggested that annotation-based rubric-referenced SSA can help students understand the task requirements, initiate their self-regulatory behaviors, and improve their self-assessment confidence, although students still wanted to receive assistance from teachers, partly due to the Confucian-heritage culture settings in China. The findings are discussed in terms of the design features of sample annotations within the framework of self-regulated learning (SRL), as well as the implications of using this method in the classroom.
Funding: Supported in part by JSPS KAKENHI Grant JP18K17823 and in part by Deakin CY01-251301-F003-PJ03906-PG00447.
Abstract: In this paper we address the problem of geometric multi-model fitting using a few weakly annotated data points, which has been little studied so far. In weak annotation (WA), most manual annotations are assumed to be correct yet are inevitably mixed with incorrect ones. Such WA data can naturally arise through interaction in various tasks. For example, in the case of homography estimation, one can easily annotate points on the same plane or object with a single label by observing the image. Motivated by this, we propose a novel method to make full use of WA data to boost multi-model fitting performance. Specifically, a graph for model proposal sampling is first constructed using the WA data, given the prior that WA data annotated with the same weak label have a high probability of belonging to the same model. By incorporating this prior knowledge into the calculation of edge probabilities, vertices (i.e., data points) lying on or near the latent model are likely to be associated and further form a subset or cluster for effective proposal generation. Having generated proposals, α-expansion is used for labeling, and our method in return updates the proposals. This procedure works in an iterative way. Extensive experiments validate our method and show that it produces noticeably better results than state-of-the-art techniques in most cases.
Funding: Supported by the National Key R&D Program of China (2022YFB3402100); the National Science Fund for Distinguished Young Scholars of China (52025056); the National Natural Science Foundation of China (52305129); the China Postdoctoral Science Foundation (2023M732789); the China Postdoctoral Innovative Talents Support Program (BX20230290); the Open Foundation of Hunan Provincial Key Laboratory of Health Maintenance for Mechanical Equipment (2022JXKF JJ01); and the Fundamental Research Funds for Central Universities.
Abstract: The success of deep transfer learning in fault diagnosis is attributed to the collection of high-quality labeled data from the source domain. However, in engineering scenarios, achieving such high-quality label annotation is difficult and expensive. Incorrect label annotation produces two negative effects: 1) the complex decision boundary of diagnosis models lowers the generalization performance on the target domain, and 2) the distribution of target domain samples becomes misaligned with the false-labeled samples. To overcome these negative effects, this article proposes a solution called the label recovery and trajectory designable network (LRTDN). LRTDN consists of three parts. First, a residual network with dual classifiers learns features from cross-domain samples. Second, an annotation check module is constructed to generate a label anomaly indicator that can modify the abnormal labels of false-labeled samples in the source domain. With the training of relabeled samples, the complexity of the diagnosis model is reduced via semi-supervised learning. Third, the adaptation trajectories are designed for sample distributions across domains. This ensures that the target domain samples are only adapted with the pure-labeled samples. The LRTDN is verified by two case studies, in which the diagnosis knowledge of bearings is transferred across different working conditions as well as different yet related machines. The results show that LRTDN offers a high diagnosis accuracy even in the presence of incorrect annotation.
Funding: Supported by the National Natural Science Foundation of China (81872995).
Abstract: Objective: To investigate the effect of Guangdong Shenqu (GSQ) on intestinal flora structure in mice with food stagnation through 16S rDNA sequencing. Methods: Mice were randomly assigned to control, model, GSQ low-dose (GSQL), GSQ medium-dose (GSQM), GSQ high-dose (GSQH), and lacidophilin tablets (LAB) groups, with each group containing 10 mice. A food stagnation and internal heat mouse model was established through intragastric administration of a mixture of beeswax and olive oil (1:15). The control group was administered normal saline, and the model group was administered beeswax and olive oil to maintain the model state. The GSQL (2 g/kg), GSQM (4 g/kg), GSQH (8 g/kg), and LAB (0.625 g/kg) groups were administered the corresponding drugs for 5 d. After administration, 16S rDNA sequencing was performed to assess gut microbiota in mouse fecal samples. Results: The model group exhibited significant intestinal flora changes. Following GSQ administration, the abundance and diversity index of the intestinal flora increased significantly, the number of bacterial species was regulated, and α and β diversity were improved. GSQ administration increased the abundance of probiotics, including Clostridia, Lachnospirales, and Lactobacillus, whereas the abundance of conditional pathogenic bacteria, such as Allobaculum, Erysipelotrichaceae, and Bacteroides, decreased. Functional prediction analysis indicated that the pathogenesis of food stagnation and GSQ intervention were primarily associated with carbohydrate, lipid, and amino acid metabolism, among other metabolic pathways. Conclusion: The digestive mechanism of GSQ may be attributed to its role in restoring diversity and abundance within the intestinal flora, thereby improving the composition and structure of the intestinal flora in mice and subsequently influencing the regulation of metabolic pathways.
Funding: Supported by grants from the National Natural Science Foundation of China (Grant No. 31801522); the Cooperation Project of Wuliangye Group Co., Ltd. and Sichuan University of Science & Engineering, China (CXY2019ZR011); and Sichuan University of Science & Engineering (Item No. 2020RC36).
Abstract: Butyric acid is a volatile saturated monocarboxylic acid, which is widely used in the chemical, food, pharmaceutical, energy, and animal feed industries. This study focuses on producing butyric acid from pre-treated rape straw using simultaneous enzymatic hydrolysis semi-solid fermentation (SEHSF). Clostridium beijerinckii BRM001, screened from the pit mud of Chinese nongxiangxing baijiu, was used. The genome of C. beijerinckii BRM001 was sequenced and annotated. Using rape straw as the sole carbon source, fermentation optimization was carried out based on the genomic analysis of BRM001. The optimized butyric acid yield was as high as 13.86±0.77 g/L, which was 2.1 times higher than that of the initial screening. Furthermore, under optimal conditions, non-sterile SEHSF was carried out, and the yield of butyric acid was 13.42±0.83 g/L in a 2.5-L fermentor. This study provides a new approach for butyric acid production that eliminates the need for detoxification of straw hydrolysate and makes full use of the fermentation waste residue without secondary pollution, making the whole process greener and more economical and giving it industrial potential.
Funding: Supported by the National Natural Science Foundation of China (82171836) and the 1·3·5 project for disciplines of excellence, West China Hospital, Sichuan University (ZYJC20002).
Abstract: The application of whole genome sequencing is expanding in clinical diagnostics across various genetic disorders, and the significance of non-coding variants in penetrant diseases is increasingly being demonstrated. Therefore, it is urgent to improve the diagnostic yield by exploring the pathogenic mechanisms of variants in non-coding regions. However, the interpretation of non-coding variants remains a significant challenge, due to the complex functional regulatory mechanisms of non-coding regions and the current limitations of available databases and tools. Hence, we developed the non-coding variant annotation database (NCAD, http://www.ncawdb.net/), encompassing comprehensive insights into 665,679,194 variants, regulatory elements, and element interaction details. Integrating data from 96 sources, spanning both GRCh37 and GRCh38 versions, NCAD v1.0 provides vital information to support the genetic diagnosis of non-coding variants, including allele frequencies of 12 diverse populations, with a particular focus on the population frequency information for 230,235,698 variants in 20,964 Chinese individuals. Moreover, it offers prediction scores for variant functionality, five categories of regulatory elements, and four types of non-coding RNAs. With its rich data and comprehensive coverage, NCAD serves as a valuable platform, giving researchers and clinicians insight into non-coding regulatory mechanisms while facilitating the interpretation of non-coding variants.
Funding: Supported by the National Natural Science Foundation of China (No. 92371206).
Abstract: Deep learning has been widely applied in surrogate modeling for airfoil flow field prediction. The success of deep learning relies heavily on large-scale, high-quality labeled samples. However, acquiring labeled samples with complete annotations is prohibitively expensive, and the available annotations in practical engineering are often sparse due to limited observation. To leverage samples with sparse annotations, this paper proposes an uncertainty-based active transfer learning method. The most valuable positions in the flow field are selected based on uncertainty for annotation, effectively improving prediction accuracy and reducing annotation costs. Our method involves a novel active annotation based on synchronous quantile regression, which can mitigate the computational cost of query annotation. Besides, a novel quantile levels-based consistency regularization is proposed to constrain the remaining unlabeled regions and further improve the model performance. Experiments show that our method can significantly reduce prediction errors with only 1% extra annotations, and is a promising tool for achieving rapid and accurate flow field prediction.
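As background for the quantile-regression idea mentioned in this abstract, the sketch below shows the standard pinball (quantile) loss and one simple way to pick positions for annotation by the width between a lower and an upper predicted quantile. It is an assumption-laden illustration: the paper's "synchronous quantile regression" and its selection rule are not specified in the abstract, and the names `pinball_loss` and `select_most_uncertain` are hypothetical.

```python
def pinball_loss(y_true, y_pred, tau):
    """Standard quantile (pinball) loss for quantile level tau in (0, 1)."""
    diff = y_true - y_pred
    return max(tau * diff, (tau - 1.0) * diff)

def select_most_uncertain(lower, upper, k):
    """Return indices of the k positions with the widest quantile interval.

    Assumption: interval width (upper minus lower predicted quantile) is
    used as the uncertainty measure; the paper may use a different one.
    """
    widths = [u - l for l, u in zip(lower, upper)]
    order = sorted(range(len(widths)), key=lambda i: widths[i], reverse=True)
    return order[:k]

# Toy predictions at three flow-field positions (illustrative numbers).
lower_q = [0.0, 0.0, 0.0]
upper_q = [1.0, 3.0, 2.0]
query_positions = select_most_uncertain(lower_q, upper_q, k=2)
```

Under-prediction is penalized by `tau` and over-prediction by `1 - tau`, so a model trained at a high `tau` is pushed toward the upper quantile of the target.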