Dear Editor, Plant mitochondria are essential organelles of plant cells and play crucial roles in oxidative phosphorylation and diverse metabolic processes (Wang et al., 2024). Mitochondria have an endosymbiotic origin, likely from a proteobacterial lineage that branched off before the divergence of known alphaproteobacterial groups (Martijn et al., 2018), and contain their own genetic material, known in plants as plant mitochondrial genomes (PMGs). PMGs exhibit unique features that may result from mechanisms different from those of nuclei and plastids, such as DNA repair (Chevigny et al., 2020), mRNA splicing (Novikova and Belfort, 2017; Guo et al., 2020), transcription regulation, and foreign sequence integration. Recent advances in genome editing technologies have facilitated targeted modifications of PMGs, enabling the functional characterization of mitochondrial genes and the generation of desirable phenotypes (Forner et al., 2023; Xu et al., 2024). Accurate annotations of PMGs are a prerequisite for basic and applied research on mitochondria, such as studies of PMG evolution or targeted breeding of new crop varieties by PMG engineering.
In this study, we searched for dispersed repeats (DRs) in the rice (Oryza sativa) genome using the iterative procedure (IP) method. The results revealed that the O. sativa genome contained 79 DR families, comprising 992,739 DNA repeats, of which 496,762 and 495,977 were identified on the forward and reverse DNA strands, respectively. The detected DRs were, on average, 374 bp in length and occupied 66.4% of the O. sativa genome. In total, 61% of the DRs identified by the IP method overlapped with previously annotated dispersed repeats (ADRs) detected using the Extensive De Novo TE Annotator (EDTA) pipeline.
Ericaceae is a diverse family of flowering plants distributed nearly worldwide, and it includes 126 genera and more than 4,000 species. In the present study, we developed The Ericaceae Genome Resource (TEGR, http://www.tegr.com.cn) as a comprehensive, user-friendly, web-based functional genomic database that is based on 16 published genomes from 16 Ericaceae species. The TEGR database contains information on many important functional genes, including 763 auxin genes, 2,407 flowering genes, 20,432 resistance genes, 617 anthocyanin-related genes, and 470 N^(6)-methyladenosine (m^(6)A) modification genes. We identified a total of 599,174 specific guide sequences for CRISPR in the TEGR database. Gene duplication, synteny, and orthology analyses of the 16 Ericaceae species were performed using the TEGR database. The TEGR database contains 614,821 functional genes annotated through the GO, Nr, Pfam, TrEMBL, and Swiss-Prot databases. The TEGR database provides the Primer Design, Hmmsearch, Synteny, BLAST, and JBrowse tools to help users perform comprehensive comparative genome analyses. All the high-quality reference genome sequences, genomic features, gene annotations, and bioinformatics results can be downloaded from the TEGR database. In the future, we will continue to improve the TEGR database with the latest data sets when they become available and to provide a useful resource that facilitates comparative genomic studies.
Glaucoma is an eye disease characterized by pathologically elevated intraocular pressure, optic nerve atrophy, and visual field defects, which can lead to irreversible vision loss. In recent years, the rapid development of artificial intelligence (AI) technology has provided new approaches for the early diagnosis and management of glaucoma. By classifying and annotating glaucoma-related images, AI models can learn and recognize the specific pathological features of glaucoma, thereby achieving automated imaging analysis and classification. Research on glaucoma imaging classification and annotation mainly involves color fundus photography (CFP), optical coherence tomography (OCT), anterior segment optical coherence tomography (AS-OCT), and ultrasound biomicroscopy (UBM) images. CFP is primarily used for the annotation of the optic cup and disc, while OCT is used for measuring and annotating the thickness of the retinal nerve fiber layer, and AS-OCT and UBM focus on the annotation of the anterior chamber angle structure and the measurement of anterior segment structural parameters. This guideline has been developed to standardize the classification and annotation of glaucoma images, enhance the quality and consistency of annotated data, and promote the clinical application of intelligent ophthalmology. It systematically sets out the principles, methods, processes, and quality control requirements for the classification and annotation of glaucoma images.
The plastid genome (plastome) represents an indispensable molecular resource for studying plant phylogeny and evolution. Although plastome size is much smaller than that of nuclear genomes, accurately and efficiently annotating and utilizing plastome sequences remain challenging. Therefore, a streamlined phylogenomic pipeline spanning plastome annotation, phylogenetic reconstruction and comparative genomics would greatly facilitate research utilizing this important organellar genome. Here, we develop PlastidHub, a novel web application employing innovative tools to analyze plastome sequences. In comparison with existing tools, key novel functionalities in PlastidHub include: (1) standardization of quadripartite structure; (2) improvement of annotation flexibility and consistency; (3) quantitative assessment of annotation completeness; (4) diverse extraction modes for canonical and specialized sequences; (5) intelligent screening of molecular markers for biodiversity studies; (6) gene-level visual comparison of structural variations and annotation completeness. PlastidHub features cloud-based web applications that do not require users to install, update, or maintain tools; detailed help documents including user guides, test examples, a static pop-up prompt box, and dynamic pop-up warning prompts when entering unreasonable parameter values; batch processing capabilities for all tools; intermediate results for secondary use; and easy-to-operate task flows between file upload and download. A key feature of PlastidHub is its interrelated task-based user interface design. Given that PlastidHub is easy to use without specialized computational skills or resources, this new platform should be widely used among botanists and evolutionary biologists, improving and expediting research employing the plastome. PlastidHub is available at https://www.plastidhub.cn.
Biomedical big data, characterized by its massive scale, multi-dimensionality, and heterogeneity, offers novel perspectives for disease research, elucidates biological principles, and simultaneously prompts changes in related research methodologies. Biomedical ontology, as a shared formal conceptual system, not only offers standardized terms for multi-source biomedical data but also provides a solid data foundation and framework for biomedical research. In this review, we summarize enrichment analysis and deep learning for biomedical ontology based on its structure and semantic annotation properties, highlighting how technological advancements are enabling the more comprehensive use of ontology information. Enrichment analysis represents an important application of ontology to elucidate the potential biological significance of a particular molecular list. Deep learning, on the other hand, represents an increasingly powerful analytical tool that can be more widely combined with ontology for analysis and prediction. With the continuous evolution of big data technologies, the integration of these technologies with biomedical ontologies is opening up exciting new possibilities for advancing biomedical research.
During the process of organizing our original data, we unfortunately identified two errors in the figures within our published article. In Fig. 1, the online version incorrectly labels the SNI+NAC group as the sham+NAC group. We have revised the grouping annotations in Fig. 1 and have labeled the DHE staining in the figure to present the experimental design more clearly.
Natural products (NPs) have long held a significant position in various fields such as medicine, food, agriculture, and materials. The chemical space covered by NPs is extensive but often underexplored. Therefore, high-throughput and efficient methodologies for the annotation and discovery of NPs are desired to address the complexity and diversity of NP-based systems. Mass spectrometry (MS) has emerged as a powerful platform for the annotation and discovery of NPs. MS databases provide vital support for the structural characterization of NPs by integrating extensive mass spectral data and sample information. Additionally, the released annotation methodologies, based on a variety of informatics tools, continuously improve the ability to annotate the structure and properties of compounds. This review examines the current mainstream databases and annotation methodologies, focusing on their advantages and limitations. Prospects for future technological advancements are then discussed in terms of novel applications and research objectives. Through a systematic overview, this review aims to provide valuable insights and a reference for MS-based NP annotation, thereby promoting the discovery of novel natural entities.
The success of deep transfer learning in fault diagnosis is attributed to the collection of high-quality labeled data from the source domain. However, in engineering scenarios, achieving such high-quality label annotation is difficult and expensive. Incorrect label annotation produces two negative effects: 1) the complex decision boundary of diagnosis models lowers the generalization performance on the target domain, and 2) the distribution of target domain samples becomes misaligned with the false-labeled samples. To overcome these negative effects, this article proposes a solution called the label recovery and trajectory designable network (LRTDN). LRTDN consists of three parts. First, a residual network with dual classifiers is used to learn features from cross-domain samples. Second, an annotation check module is constructed to generate a label anomaly indicator that can modify the abnormal labels of false-labeled samples in the source domain. With the training of relabeled samples, the complexity of the diagnosis model is reduced via semi-supervised learning. Third, the adaptation trajectories are designed for sample distributions across domains. This ensures that the target domain samples are only adapted with the pure-labeled samples. The LRTDN is verified by two case studies, in which the diagnosis knowledge of bearings is transferred across different working conditions as well as different yet related machines. The results show that LRTDN offers a high diagnosis accuracy even in the presence of incorrect annotation.
Objective: To investigate the effect of Guangdong Shenqu (GSQ) on intestinal flora structure in mice with food stagnation through 16S rDNA sequencing. Methods: Mice were randomly assigned to control, model, GSQ low-dose (GSQL), GSQ medium-dose (GSQM), GSQ high-dose (GSQH), and lacidophilin tablets (LAB) groups, with each group containing 10 mice. A food stagnation and internal heat mouse model was established through intragastric administration of a mixture of beeswax and olive oil (1:15). The control group was administered normal saline, and the model group was administered beeswax and olive oil to maintain the modeled state. The GSQL (2 g/kg), GSQM (4 g/kg), GSQH (8 g/kg), and LAB (0.625 g/kg) groups were administered the corresponding drugs for 5 d. After administration, 16S rDNA sequencing was performed to assess gut microbiota in mouse fecal samples. Results: The model group exhibited significant intestinal flora changes. Following GSQ administration, the abundance and diversity index of the intestinal flora increased significantly, the number of bacterial species was regulated, and α and β diversity were improved. GSQ administration increased the abundance of probiotics, including Clostridia, Lachnospirales, and Lactobacillus, whereas the abundance of conditional pathogenic bacteria, such as Allobaculum, Erysipelotrichaceae, and Bacteroides, decreased. Functional prediction analysis indicated that the pathogenesis of food stagnation and GSQ intervention were primarily associated with carbohydrate, lipid, and amino acid metabolism, among other metabolic pathways. Conclusion: The digestive mechanism of GSQ may be attributed to its role in restoring diversity and abundance within the intestinal flora, thereby improving the composition and structure of the intestinal flora in mice and subsequently influencing the regulation of metabolic pathways.
Background: Biologically annotated neural networks (BANNs) are feedforward Bayesian neural network models that utilize partially connected architectures based on SNP-set annotations. As an interpretable neural network, BANNs model SNP and SNP-set effects in their input and hidden layers, respectively. Furthermore, the weights and connections of the network are regarded as random variables with prior distributions reflecting the manifestation of genetic effects at various genomic scales. However, its application in genomic prediction has yet to be explored. Results: This study extended the BANNs framework to the area of genomic selection and explored the optimal SNP-set partitioning strategies by using dairy cattle datasets. The SNP-sets were partitioned based on two strategies, gene annotations and 100 kb windows, denoted as BANN_gene and BANN_100kb, respectively. The BANNs model was compared with GBLUP, random forest (RF), BayesB and BayesCπ through five replicates of five-fold cross-validation using genotypic and phenotypic data on milk production traits, type traits, and one health trait of 6,558, 6,210 and 5,962 Chinese Holsteins, respectively. Results showed that the BANNs framework achieves higher genomic prediction accuracy compared to GBLUP, RF and Bayesian methods. Specifically, the BANN_100kb demonstrated superior accuracy and the BANN_gene exhibited generally suboptimal accuracy compared to GBLUP, RF, BayesB and BayesCπ across all traits. The average accuracy improvements of BANN_100kb over GBLUP, RF, BayesB and BayesCπ were 4.86%, 3.95%, 3.84% and 1.92%, and the accuracy of BANN_gene was improved by 3.75%, 2.86%, 2.73% and 0.85% compared to GBLUP, RF, BayesB and BayesCπ, respectively, across all seven traits. Meanwhile, both BANN_100kb and BANN_gene yielded lower overall mean squared error values than GBLUP, RF and Bayesian methods. Conclusion: Our findings demonstrated that the BANNs framework performed better than traditional genomic prediction methods in our tested scenarios, and might serve as a promising alternative approach for genomic prediction in dairy cattle.
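The evaluation scheme named in the abstract above, five replicates of five-fold cross-validation, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the folds were drawn, so the shuffling, the seed, and the function name `repeated_kfold` are hypothetical and not the authors' procedure.

```python
import random


def repeated_kfold(n_samples, n_folds=5, n_repeats=5, seed=0):
    """Yield (replicate, fold, train_indices, test_indices) tuples.

    Hypothetical helper: each replicate reshuffles the sample indices and
    splits them into n_folds disjoint test sets of near-equal size.
    """
    rng = random.Random(seed)  # fixed seed so the splits are reproducible
    for rep in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        # Striding by n_folds gives disjoint folds that cover every index.
        folds = [indices[i::n_folds] for i in range(n_folds)]
        for k, test in enumerate(folds):
            train = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield rep, k, train, test


# Usage: 5 replicates x 5 folds gives 25 train/test splits.
splits = list(repeated_kfold(n_samples=23))
print(len(splits))
```

Within each replicate every animal appears in exactly one test fold, so each replicate yields one prediction per individual, and the accuracies are then averaged over the replicates.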
Butyric acid is a volatile saturated monocarboxylic acid, which is widely used in the chemical, food, pharmaceutical, energy, and animal feed industries. This study focuses on producing butyric acid from pre-treated rape straw using simultaneous enzymatic hydrolysis semi-solid fermentation (SEHSF). Clostridium beijerinckii BRM001, screened from pit mud of Chinese nongxiangxing baijiu, was used. The genome of C. beijerinckii BRM001 was sequenced and annotated. Using rape straw as the sole carbon source, fermentation optimization was carried out based on the genomic analysis of BRM001. The optimized butyric acid yield was as high as 13.86±0.77 g/L, which was 2.1 times higher than that of the initial screening. Furthermore, under optimal conditions, non-sterile SEHSF was carried out, and the yield of butyric acid was 13.42±0.83 g/L in a 2.5-L fermentor. This study provides a new approach for butyric acid production that eliminates the need for detoxification of straw hydrolysate and makes full use of the fermentation waste residue without secondary pollution, making the whole process greener and more economical and giving it industrial potential.
The application of whole genome sequencing is expanding in clinical diagnostics across various genetic disorders, and the significance of non-coding variants in penetrant diseases is increasingly being demonstrated. Therefore, it is urgent to improve the diagnostic yield by exploring the pathogenic mechanisms of variants in non-coding regions. However, the interpretation of non-coding variants remains a significant challenge, due to the complex functional regulatory mechanisms of non-coding regions and the current limitations of available databases and tools. Hence, we develop the non-coding variant annotation database (NCAD, http://www.ncawdb.net/), encompassing comprehensive insights into 665,679,194 variants, regulatory elements, and element interaction details. Integrating data from 96 sources, spanning both GRCh37 and GRCh38 versions, NCAD v1.0 provides vital information to support the genetic diagnosis of non-coding variants, including allele frequencies of 12 diverse populations, with a particular focus on the population frequency information for 230,235,698 variants in 20,964 Chinese individuals. Moreover, it offers prediction scores for variant functionality, five categories of regulatory elements, and four types of non-coding RNAs. With its rich data and comprehensive coverage, NCAD serves as a valuable platform, empowering researchers and clinicians with profound insights into non-coding regulatory mechanisms while facilitating the interpretation of non-coding variants.
Deep learning has been widely applied in surrogate modeling for airfoil flow field prediction. The success of deep learning relies heavily on large-scale, high-quality labeled samples. However, acquiring labeled samples with complete annotations is prohibitively expensive, and the available annotations in practical engineering are often sparse due to limited observation. To leverage samples with sparse annotations, this paper proposes an uncertainty-based active transfer learning method. The most valuable positions in the flow field are selected for annotation based on uncertainty, effectively improving prediction accuracy and reducing annotation costs. Our method involves a novel active annotation based on synchronous quantile regression, which can mitigate the computational cost of query annotation. In addition, a novel consistency regularization based on quantile levels is proposed to constrain the remaining unlabeled regions and further improve the model performance. Experiments show that our method can significantly reduce prediction errors with only 1% extra annotations, and is a promising tool for achieving rapid and accurate flow field prediction.
Brassica oleracea has been developed into many important crops, including cabbage, kale, cauliflower, and broccoli. The genome and gene annotation of cabbage (cultivar JZS), a representative morphotype of B. oleracea, have been widely used as a common reference in biological research. Although its genome assembly has been updated twice, the current gene annotation still lacks information on untranslated regions (UTRs) and alternative splicing (AS). Here, we constructed a high-quality gene annotation (JZSv3) using a full-length transcriptome acquired by nanopore sequencing, yielding a total of 59,452 genes and 75,684 transcripts. Additionally, we re-analyzed the previously reported transcriptome data related to the development of different tissues and cold response using JZSv3 as a reference, and found that 3,843 out of 11,908 differentially expressed genes (DEGs) underwent AS during the development of different tissues and 309 out of 903 cold-related genes underwent AS in response to cold stress. Meanwhile, we also identified many AS genes, including BolLHCB5 and BolHSP70, that displayed distinct expression patterns among variant transcripts of the same gene, highlighting the importance of JZSv3 as a pivotal reference for AS analysis. Overall, JZSv3 provides a valuable resource for exploring gene function, especially for obtaining a deeper understanding of AS regulation mechanisms.
●AIM: To investigate a pioneering framework for the segmentation of meibomian glands (MGs), using limited annotations to reduce the workload on ophthalmologists and enhance the efficiency of clinical diagnosis. ●METHODS: A total of 203 infrared meibomian images from 138 patients with dry eye disease, accompanied by corresponding annotations, were gathered for the study. A rectified scribble-supervised gland segmentation (RSSGS) model, incorporating temporal ensemble prediction, uncertainty estimation, and a transformation equivariance constraint, was introduced to address constraints imposed by the limited supervision information inherent in scribble annotations. The viability and efficacy of the proposed model were assessed based on accuracy, intersection over union (IoU), and dice coefficient. ●RESULTS: Using manual labels as the gold standard, RSSGS achieved an accuracy of 93.54%, a dice coefficient of 78.02%, and an IoU of 64.18%. Notably, these performance metrics exceeded the current weakly supervised state-of-the-art methods by 0.76%, 2.06%, and 2.69%, respectively. Furthermore, despite achieving a substantial 80% reduction in annotation costs, it only lagged behind fully annotated methods by 0.72%, 1.51%, and 2.04%. ●CONCLUSION: An innovative automatic segmentation model is developed for MGs in infrared eyelid images, using scribble annotation for training. This model maintains an exceptionally high level of segmentation accuracy while substantially reducing training costs. It holds substantial utility for calculating clinical parameters, thereby greatly enhancing the diagnostic efficiency of ophthalmologists in evaluating meibomian gland dysfunction.
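The three evaluation metrics named in the abstract above (accuracy, IoU, and dice coefficient) have standard pixel-wise definitions for binary masks, sketched below. This is a minimal illustration, not the study's evaluation code: the function name `segmentation_metrics`, the flat 0/1 mask representation, and the toy masks are assumptions.

```python
def segmentation_metrics(pred, truth):
    """Return (accuracy, iou, dice) for two flat binary masks of 0/1 values.

    Hypothetical helper using the standard pixel-wise definitions.
    """
    if len(pred) != len(truth) or not pred:
        raise ValueError("masks must be non-empty and of equal length")
    pairs = list(zip(pred, truth))
    tp = sum(1 for p, t in pairs if p == 1 and t == 1)  # gland in both masks
    fp = sum(1 for p, t in pairs if p == 1 and t == 0)  # predicted only
    fn = sum(1 for p, t in pairs if p == 0 and t == 1)  # missed gland pixels
    tn = sum(1 for p, t in pairs if p == 0 and t == 0)  # background in both
    accuracy = (tp + tn) / len(pairs)
    union = tp + fp + fn
    # Convention assumed here: two empty masks count as a perfect match.
    iou = tp / union if union else 1.0
    dice = 2 * tp / (2 * tp + fp + fn) if union else 1.0
    return accuracy, iou, dice


# Toy masks (made up for illustration): 2 true positives, 1 false positive,
# 1 false negative, 4 true negatives.
pred = [1, 1, 0, 0, 1, 0, 0, 0]
truth = [1, 0, 0, 0, 1, 1, 0, 0]
accuracy, iou, dice = segmentation_metrics(pred, truth)
print(accuracy, iou, dice)
```

For a single mask pair the two overlap scores are linked by dice = 2·IoU / (1 + IoU). Scores averaged over many images, as reported metrics usually are, need not satisfy this identity exactly.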
As Natural Language Processing (NLP) continues to advance, driven by the emergence of sophisticated large language models such as ChatGPT, there has been a notable growth in research activity. This rapid uptake reflects increasing interest in the field and raises critical questions about ChatGPT’s applicability in the NLP domain. This review paper systematically investigates the role of ChatGPT in diverse NLP tasks, including information extraction, Named Entity Recognition (NER), event extraction, relation extraction, Part of Speech (PoS) tagging, text classification, sentiment analysis, emotion recognition and text annotation. The novelty of this work lies in its comprehensive analysis of the existing literature, addressing a critical gap in understanding ChatGPT’s adaptability, limitations, and optimal application. In this paper, we employed a systematic stepwise approach following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework to direct our search process and seek relevant studies. Our review reveals ChatGPT’s significant potential in enhancing various NLP tasks. Its adaptability in information extraction tasks, sentiment analysis, and text classification showcases its ability to comprehend diverse contexts and extract meaningful details. Additionally, ChatGPT’s flexibility in annotation tasks reduces manual effort and accelerates the annotation process, making it a valuable asset in NLP development and research. Furthermore, GPT-4 and prompt engineering emerge as a complementary mechanism, empowering users to guide the model and enhance overall accuracy. Despite its promising potential, challenges persist. The performance of ChatGPT needs to be tested using more extensive datasets and diverse data structures. In addition, its limitations in handling domain-specific language and the need for fine-tuning in specific applications highlight the importance of further investigations to address these issues.
Funding agencies play a pivotal role in bolstering research endeavors by allocating financial resources for data collection and analysis. However, the lack of detailed information regarding the methods employed for data gathering and analysis can obstruct the replication and utilization of the results, ultimately affecting the study’s transparency and integrity. The task of manually annotating extensive datasets demands considerable labor and financial investment, especially when it entails engaging specialized individuals. In our crowd counting study, we employed the web-based annotation tool SuperAnnotate to streamline the human annotation process for a dataset comprising 3,000 images. By integrating automated annotation tools, we realized substantial time efficiencies, as demonstrated by the remarkable achievement of 858,958 annotations. This underscores the significant contribution of such technologies to the efficiency of the annotation process.
Genetic factors play a critical role in autoimmune hepatitis (AIH), and numerous studies have been conducted to identify variants associated with the risk of AIH. However, our knowledge of these genetic risk factors is still limited. In this study, we aim to provide a comprehensive synopsis of the genetic architecture of this disease. A systematic search was conducted to identify published studies on the associations between genetic variants and the risk of AIH. Meta-analyses were conducted to calculate the pooled odds ratio (OR) and 95% confidence interval (CI). Then, the cumulative evidence was evaluated for significant associations according to the Venice criteria and false-positive report probability. Finally, functional annotations and pathway analyses were conducted to identify potential pathogenic loci and related pathways. In total, 62 studies involving 11,068 cases and 45,482 controls were included to assess the association between 75 genetic variants and the risk of AIH. Among them, 24 variants were associated with the risk of AIH, and there is strong cumulative evidence supporting these associations. Importantly, HLA DRB1*0301 (OR: 3.023, 95% CI: 2.443 - 1.678, P = 2.81 × 10^(-24)) and DRB3*0101 (OR: 3.667, 95% CI: 2.649 - 5.075, P = 4.69 × 10^(-15)) are newly identified genome-wide significant risk loci. In addition, the rs3184504 variant (OR: 1.305, 95% CI: 1.122 - 1.516, P = 0.001) in the SH2B3 gene is a potential functional mutation. GO pathway analysis suggests that these genes are enriched in antigen processing and presentation, response to interferon-gamma, and immune response-regulating signaling pathways. This study comprehensively summarizes the genetic architecture of AIH and provides cumulative evidence. We have identified two new loci that exceed genome-wide significance. The findings from this study will offer new insights into the pathogenesis of AIH.
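Pooled odds ratios such as those quoted in the abstract above are normally estimated on the log scale, so a 95% CI is symmetric around log(OR) and the OR should be close to the geometric mean of the two bounds. The sketch below applies that sanity check to the three intervals as printed. It assumes such log-scale intervals, which the abstract does not state; the function name `ci_is_log_symmetric` and the 1% tolerance are likewise assumptions.

```python
import math


def ci_is_log_symmetric(odds_ratio, lower, upper, rel_tol=0.01):
    """Check that odds_ratio is close to sqrt(lower * upper).

    Hypothetical helper: returns False when the bounds are non-positive
    or out of order, since no valid interval can then be formed.
    """
    if lower <= 0 or upper <= 0 or lower > upper:
        return False
    return math.isclose(math.sqrt(lower * upper), odds_ratio, rel_tol=rel_tol)


# Values copied from the abstract as printed.
reported = {
    "HLA DRB1*0301": (3.023, 2.443, 1.678),
    "DRB3*0101": (3.667, 2.649, 5.075),
    "rs3184504": (1.305, 1.122, 1.516),
}
for name, (odds_ratio, lower, upper) in reported.items():
    print(name, ci_is_log_symmetric(odds_ratio, lower, upper))
```

The DRB3*0101 and rs3184504 intervals pass the check. The DRB1*0301 interval fails it because its printed upper bound lies below its lower bound, so that entry should be verified against the original article before it is reused.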
Funding (plant mitochondrial genome annotation letter): supported by the CAMS Innovation Fund for Medical Sciences (CIFMS) (2021-I2M-1-022), the National Science & Technology Fundamental Resources Investigation Program of China (2018FY100705), and the Provincial Special Project for Construction of Innovation Demonstration Area at Chenzhou City under National Sustainable Development Plan (2023sfq04).
Funding: Supported by the Russian Science Foundation, Russia (Grant No. 24-24-00031).
Abstract: In this study, we searched for dispersed repeats (DRs) in the rice (Oryza sativa) genome using the iterative procedure (IP) method. The results revealed that the O. sativa genome contained 79 DR families, comprising 992739 DNA repeats, of which 496762 and 495977 were identified on the forward and reverse DNA strands, respectively. The detected DRs were, on average, 374 bp in length and occupied 66.4% of the O. sativa genome. In total, 61% of the DRs identified by the IP method overlapped with previously annotated dispersed repeats (ADRs) detected using the Extensive De Novo TE Annotator (EDTA) pipeline.
Funding: Supported by the National Natural Science Foundation of China (32260097) and the National Guidance of Local Science and Technology Development Fund of China ([2023]009).
Abstract: Ericaceae is a diverse family of flowering plants distributed nearly worldwide, and it includes 126 genera and more than 4,000 species. In the present study, we developed The Ericaceae Genome Resource (TEGR, http://www.tegr.com.cn) as a comprehensive, user-friendly, web-based functional genomic database that is based on 16 published genomes from 16 Ericaceae species. The TEGR database contains information on many important functional genes, including 763 auxin genes, 2,407 flowering genes, 20,432 resistance genes, 617 anthocyanin-related genes, and 470 N^(6)-methyladenosine (m^(6)A) modification genes. We identified a total of 599,174 specific guide sequences for CRISPR in the TEGR database. The gene duplication events, synteny analysis, and orthologous analysis of the 16 Ericaceae species were performed using the TEGR database. The TEGR database contains 614,821 functional genes annotated through the GO, Nr, Pfam, TrEMBL, and Swiss-Prot databases. The TEGR database provides the Primer Design, Hmmsearch, Synteny, BLAST, and JBrowse tools for helping users perform comprehensive comparative genome analyses. All the high-quality reference genome sequences, genomic features, gene annotations, and bioinformatics results can be downloaded from the TEGR database. In the future, we will continue to improve the TEGR database with the latest data sets when they become available and to provide a useful resource that facilitates comparative genomic studies.
Funding: Supported by Guangdong Basic and Applied Basic Research Foundation (No. 2025A1515011627) and San Ming Project of Medicine in Shenzhen (No. SZSM202311012).
Abstract: Glaucoma is an eye disease characterized by pathologically elevated intraocular pressure, optic nerve atrophy, and visual field defects, which can lead to irreversible vision loss. In recent years, the rapid development of artificial intelligence (AI) technology has provided new approaches for the early diagnosis and management of glaucoma. By classifying and annotating glaucoma-related images, AI models can learn and recognize the specific pathological features of glaucoma, thereby achieving automated imaging analysis and classification. Research on glaucoma imaging classification and annotation mainly involves color fundus photography (CFP), optical coherence tomography (OCT), anterior segment optical coherence tomography (AS-OCT), and ultrasound biomicroscopy (UBM) images. CFP is primarily used for the annotation of the optic cup and disc, OCT is used for measuring and annotating the thickness of the retinal nerve fiber layer, and AS-OCT and UBM focus on the annotation of the anterior chamber angle structure and the measurement of anterior segment structural parameters. This guideline has been developed to standardize the classification and annotation of glaucoma images, enhance the quality and consistency of annotated data, and promote the clinical application of intelligent ophthalmology. It systematically elaborates on the principles, methods, processes, and quality control requirements for the classification and annotation of glaucoma images, providing standardized guidance for this work.
Funding: Supported by the Natural Science Foundation of Shandong Province (ZR2020QC022), the Science and Technology Basic Resources Investigation Program of China (No. 2019FY100900), the Major Program for Basic Research Project of Yunnan Province (202401BC070001), the Yunnan Revitalization Talent Support Program: Yunling Scholar Project to Tingshuang Yi, and the open research project of "Cross Cooperative Team" of the Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences.
Abstract: The plastid genome (plastome) represents an indispensable molecular resource for studying plant phylogeny and evolution. Although the plastome is much smaller than nuclear genomes, accurately and efficiently annotating and utilizing plastome sequences remains challenging. Therefore, a streamlined phylogenomic pipeline spanning plastome annotation, phylogenetic reconstruction and comparative genomics would greatly facilitate research utilizing this important organellar genome. Here, we develop PlastidHub, a novel web application employing innovative tools to analyze plastome sequences. In comparison with existing tools, key novel functionalities in PlastidHub include: (1) standardization of quadripartite structure; (2) improvement of annotation flexibility and consistency; (3) quantitative assessment of annotation completeness; (4) diverse extraction modes for canonical and specialized sequences; (5) intelligent screening of molecular markers for biodiversity studies; (6) gene-level visual comparison of structural variations and annotation completeness. PlastidHub features cloud-based web applications that do not require users to install, update, or maintain tools; detailed help documents including user guides, test examples, a static pop-up prompt box, and dynamic pop-up warning prompts when entering unreasonable parameter values; batch processing capabilities for all tools; intermediate results for secondary use; and easy-to-operate task flows between file upload and download. A key feature of PlastidHub is its interrelated task-based user interface design. Given that PlastidHub is easy to use without specialized computational skills or resources, this new platform should be widely used among botanists and evolutionary biologists, improving and expediting research employing the plastome. PlastidHub is available at https://www.plastidhub.cn.
Funding: Supported by the National Natural Science Foundation of China (61902095).
Abstract: Biomedical big data, characterized by its massive scale, multi-dimensionality, and heterogeneity, offers novel perspectives for disease research, elucidates biological principles, and simultaneously prompts changes in related research methodologies. Biomedical ontology, as a shared formal conceptual system, not only offers standardized terms for multi-source biomedical data but also provides a solid data foundation and framework for biomedical research. In this review, we summarize enrichment analysis and deep learning for biomedical ontology based on its structure and semantic annotation properties, highlighting how technological advancements are enabling the more comprehensive use of ontology information. Enrichment analysis represents an important application of ontology to elucidate the potential biological significance of a particular molecular list. Deep learning, on the other hand, represents an increasingly powerful analytical tool that can be more widely combined with ontology for analysis and prediction. With the continuous evolution of big data technologies, the integration of these technologies with biomedical ontologies is opening up exciting new possibilities for advancing biomedical research.
Abstract: During the process of organizing our original data, we unfortunately identified two errors in the figures within our published article. In Fig. 1, the online version incorrectly labels the SNI+NAC group as the sham+NAC group. We have revised the grouping annotations in Fig. 1 and have labeled the DHE staining in the figure to present the experimental design more clearly.
Funding: Supported by the National Natural Science Foundation of China (Nos. 82274064, 82374026, and 82204591).
Abstract: Natural products (NPs) have long held a significant position in various fields such as medicine, food, agriculture, and materials. The chemical space covered by NPs is extensive but often underexplored. Therefore, high-throughput and efficient methodologies for the annotation and discovery of NPs are desired to address the complexity and diversity of NP-based systems. Mass spectrometry (MS) has emerged as a powerful platform for the annotation and discovery of NPs. MS databases provide vital support for the structural characterization of NPs by integrating extensive mass spectral data and sample information. Additionally, the released annotation methodologies, based on a variety of informatics tools, continuously improve the ability to annotate the structure and properties of compounds. This review examines the current mainstream databases and annotation methodologies, focusing on their advantages and limitations. Prospects for future technological advancements are then discussed in terms of novel applications and research objectives. Through a systematic overview, this review aims to provide valuable insights and a reference for MS-based NP annotation, thereby promoting the discovery of novel natural entities.
Funding: Supported by the National Key R&D Program of China (2022YFB3402100), the National Science Fund for Distinguished Young Scholars of China (52025056), the National Natural Science Foundation of China (52305129), the China Postdoctoral Science Foundation (2023M732789), the China Postdoctoral Innovative Talents Support Program (BX20230290), the Open Foundation of Hunan Provincial Key Laboratory of Health Maintenance for Mechanical Equipment (2022JXKFJJ01), and the Fundamental Research Funds for Central Universities.
Abstract: The success of deep transfer learning in fault diagnosis is attributed to the collection of high-quality labeled data from the source domain. However, in engineering scenarios, achieving such high-quality label annotation is difficult and expensive. Incorrect label annotation produces two negative effects: 1) the complex decision boundary of diagnosis models lowers the generalization performance on the target domain, and 2) the distribution of target domain samples becomes misaligned with the false-labeled samples. To overcome these negative effects, this article proposes a solution called the label recovery and trajectory designable network (LRTDN). LRTDN consists of three parts. First, a residual network with dual classifiers learns features from cross-domain samples. Second, an annotation check module is constructed to generate a label anomaly indicator that can modify the abnormal labels of false-labeled samples in the source domain. With the training of relabeled samples, the complexity of the diagnosis model is reduced via semi-supervised learning. Third, the adaptation trajectories are designed for sample distributions across domains. This ensures that the target domain samples are only adapted with the pure-labeled samples. The LRTDN is verified by two case studies, in which the diagnosis knowledge of bearings is transferred across different working conditions as well as different yet related machines. The results show that LRTDN offers a high diagnosis accuracy even in the presence of incorrect annotation.
Funding: Supported by the National Natural Science Foundation of China (81872995).
Abstract: Objective: To investigate the effect of Guangdong Shenqu (GSQ) on intestinal flora structure in mice with food stagnation through 16S rDNA sequencing. Methods: Mice were randomly assigned to control, model, GSQ low-dose (GSQL), GSQ medium-dose (GSQM), GSQ high-dose (GSQH), and lacidophilin tablets (LAB) groups, with each group containing 10 mice. A food stagnation and internal heat mouse model was established through intragastric administration of a mixture of beeswax and olive oil (1:15). The control group was administered normal saline, and the model group was administered beeswax and olive oil to maintain the model state. The GSQL (2 g/kg), GSQM (4 g/kg), GSQH (8 g/kg), and LAB (0.625 g/kg) groups were administered the corresponding drugs for 5 d. After administration, 16S rDNA sequencing was performed to assess gut microbiota in mouse fecal samples. Results: The model group exhibited significant intestinal flora changes. Following GSQ administration, the abundance and diversity index of the intestinal flora increased significantly, the number of bacterial species was regulated, and α and β diversity were improved. GSQ administration increased the abundance of probiotics, including Clostridia, Lachnospirales, and Lactobacillus, whereas the abundance of conditional pathogenic bacteria, such as Allobaculum, Erysipelotrichaceae, and Bacteroides, decreased. Functional prediction analysis indicated that the pathogenesis of food stagnation and GSQ intervention were primarily associated with carbohydrate, lipid, and amino acid metabolism, among other metabolic pathways. Conclusion: The digestive mechanism of GSQ may be attributed to its role in restoring diversity and abundance within the intestinal flora, thereby improving the composition and structure of the intestinal flora in mice and subsequently influencing the regulation of metabolic pathways.
Funding: Supported by the National Key Research and Development Program of China (2022YFD1302204), the earmarked fund CARS36, and the Ningxia Key Research and Development Program of China (2023BCF01004, 2019NYYZ09).
Abstract: Background: Biologically annotated neural networks (BANNs) are feedforward Bayesian neural network models that utilize partially connected architectures based on SNP-set annotations. As an interpretable neural network, BANNs model SNP and SNP-set effects in their input and hidden layers, respectively. Furthermore, the weights and connections of the network are regarded as random variables with prior distributions reflecting the manifestation of genetic effects at various genomic scales. However, its application in genomic prediction has yet to be explored. Results: This study extended the BANNs framework to the area of genomic selection and explored the optimal SNP-set partitioning strategies by using dairy cattle datasets. The SNP-sets were partitioned based on two strategies, gene annotations and 100 kb windows, denoted as BANN_gene and BANN_100kb, respectively. The BANNs model was compared with GBLUP, random forest (RF), BayesB and BayesCπ through five replicates of five-fold cross-validation using genotypic and phenotypic data on milk production traits, type traits, and one health trait of 6,558, 6,210 and 5,962 Chinese Holsteins, respectively. Results showed that the BANNs framework achieves higher genomic prediction accuracy compared to GBLUP, RF and Bayesian methods. Specifically, the BANN_100kb demonstrated superior accuracy and the BANN_gene exhibited generally suboptimal accuracy compared to GBLUP, RF, BayesB and BayesCπ across all traits. The average accuracy improvements of BANN_100kb over GBLUP, RF, BayesB and BayesCπ were 4.86%, 3.95%, 3.84% and 1.92%, and the accuracy of BANN_gene was improved by 3.75%, 2.86%, 2.73% and 0.85% compared to GBLUP, RF, BayesB and BayesCπ, respectively, across all seven traits. Meanwhile, both BANN_100kb and BANN_gene yielded lower overall mean square error values than GBLUP, RF and Bayesian methods. Conclusion: Our findings demonstrated that the BANNs framework performed better than traditional genomic prediction methods in our tested scenarios, and might serve as a promising alternative approach for genomic prediction in dairy cattle.
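The evaluation scheme named in this abstract, five replicates of five-fold cross-validation, can be illustrated with a small index-splitting sketch. It is not the authors' code: the function name `repeated_kfold`, the seed, and the sample count of 20 are hypothetical, and the model fitting and accuracy calculation for each split are left out.

```python
import random


def repeated_kfold(n_samples, n_folds=5, n_repeats=5, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV.

    Each replicate reshuffles the sample indices, so every replicate
    uses a different random partition into n_folds test sets.
    """
    rng = random.Random(seed)
    for rep in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        # Deal the shuffled indices into n_folds near-equal groups.
        folds = [idx[k::n_folds] for k in range(n_folds)]
        for k, test_idx in enumerate(folds):
            # Training set = every fold except the held-out one.
            train_idx = [i for j, f in enumerate(folds) if j != k for i in f]
            yield rep, k, train_idx, test_idx


# Hypothetical toy size; a real study would pass its own sample count.
splits = list(repeated_kfold(n_samples=20))
```

In such a scheme a model is fitted on each training set and scored on the matching test set, and the 25 scores are then averaged.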
Funding: Supported by grants from the National Natural Science Foundation of China (Grant No. 31801522), the Cooperation Project of Wuliangye Group Co., Ltd. and Sichuan University of Science & Engineering, China (CXY2019ZR011), and Sichuan University of Science & Engineering (Item No. 2020RC36).
Abstract: Butyric acid is a volatile saturated monocarboxylic acid, which is widely used in the chemical, food, pharmaceutical, energy, and animal feed industries. This study focuses on producing butyric acid from pre-treated rape straw using simultaneous enzymatic hydrolysis semi-solid fermentation (SEHSF). Clostridium beijerinckii BRM001, screened from the pit mud of Chinese nongxiangxing baijiu, was used. The genome of C. beijerinckii BRM001 was sequenced and annotated. Using rape straw as the sole carbon source, fermentation optimization was carried out based on the genomic analysis of BRM001. The optimized butyric acid yield was as high as 13.86±0.77 g/L, which was 2.1 times higher than that of the initial screening. Furthermore, under optimal conditions, non-sterile SEHSF was carried out, and the yield of butyric acid was 13.42±0.83 g/L in a 2.5-L fermentor. This study provides a new approach for butyric acid production that eliminates the need for detoxification of straw hydrolysate and makes full use of the fermentation waste residue without secondary pollution, making the whole process greener and more economical, with a certain industrial potential.
Funding: Supported by the National Natural Science Foundation of China (82171836) and the 1·3·5 project for disciplines of excellence, West China Hospital, Sichuan University (ZYJC20002).
Abstract: The application of whole genome sequencing is expanding in clinical diagnostics across various genetic disorders, and the significance of non-coding variants in penetrant diseases is increasingly being demonstrated. Therefore, it is urgent to improve the diagnostic yield by exploring the pathogenic mechanisms of variants in non-coding regions. However, the interpretation of non-coding variants remains a significant challenge, due to the complex functional regulatory mechanisms of non-coding regions and the current limitations of available databases and tools. Hence, we develop the non-coding variant annotation database (NCAD, http://www.ncawdb.net/), encompassing comprehensive insights into 665,679,194 variants, regulatory elements, and element interaction details. Integrating data from 96 sources, spanning both GRCh37 and GRCh38 versions, NCAD v1.0 provides vital information to support the genetic diagnosis of non-coding variants, including allele frequencies of 12 diverse populations, with a particular focus on the population frequency information for 230,235,698 variants in 20,964 Chinese individuals. Moreover, it offers prediction scores for variant functionality, five categories of regulatory elements, and four types of non-coding RNAs. With its rich data and comprehensive coverage, NCAD serves as a valuable platform, empowering researchers and clinicians with profound insights into non-coding regulatory mechanisms while facilitating the interpretation of non-coding variants.
Funding: Supported by the National Natural Science Foundation of China (No. 92371206).
Abstract: Deep learning has been widely applied in surrogate modeling for airfoil flow field prediction. The success of deep learning relies heavily on large-scale, high-quality labeled samples. However, acquiring labeled samples with complete annotations is prohibitively expensive, and the available annotations in practical engineering are often sparse due to limited observation. To leverage samples with sparse annotations, this paper proposes an uncertainty-based active transfer learning method. The most valuable positions in the flow field are selected based on uncertainty for annotation, effectively improving prediction accuracy and reducing annotation costs. Our method involves a novel active annotation based on synchronous quantile regression, which can mitigate the computational cost of query annotation. In addition, a novel quantile levels-based consistency regularization is proposed to constrain the remaining unlabeled regions and further improve the model performance. Experiments show that our method can significantly reduce prediction errors with only 1% extra annotations, and is a promising tool for achieving rapid and accurate flow field prediction.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 31972411, 31722048, and 31630068), the Central Public-interest Scientific Institution Basal Research Fund (Grant No. Y2022PT23), the Innovation Program of the Chinese Academy of Agricultural Sciences, and the Key Laboratory of Biology and Genetic Improvement of Horticultural Crops, Ministry of Agriculture and Rural Affairs, P.R. China; supported by NIFA, the Department of Agriculture, via UC-Berkeley, USA.
Abstract: Brassica oleracea has been developed into many important crops, including cabbage, kale, cauliflower, and broccoli. The genome and gene annotation of cabbage (cultivar JZS), a representative morphotype of B. oleracea, has been widely used as a common reference in biological research. Although its genome assembly has been updated twice, the current gene annotation still lacks information on untranslated regions (UTRs) and alternative splicing (AS). Here, we constructed a high-quality gene annotation (JZSv3) using a full-length transcriptome acquired by nanopore sequencing, yielding a total of 59452 genes and 75684 transcripts. Additionally, we re-analyzed the previously reported transcriptome data related to the development of different tissues and cold response using JZSv3 as a reference, and found that 3843 out of 11908 differentially expressed genes (DEGs) underwent AS during the development of different tissues and 309 out of 903 cold-related genes underwent AS in response to cold stress. We also identified many AS genes, including BolLHCB5 and BolHSP70, that displayed distinct expression patterns within variant transcripts of the same gene, highlighting the importance of JZSv3 as a pivotal reference for AS analysis. Overall, JZSv3 provides a valuable resource for exploring gene function, especially for obtaining a deeper understanding of AS regulation mechanisms.
Funding: Supported by Natural Science Foundation of Fujian Province (No. 2020J011084), Fujian Province Technology and Economy Integration Service Platform (No. 2023XRH001), and Fuzhou-Xiamen-Quanzhou National Independent Innovation Demonstration Zone Collaborative Innovation Platform (No. 2022FX5).
Abstract: ● AIM: To investigate a pioneering framework for the segmentation of meibomian glands (MGs), using limited annotations to reduce the workload on ophthalmologists and enhance the efficiency of clinical diagnosis. ● METHODS: In total, 203 infrared meibomian images from 138 patients with dry eye disease, accompanied by corresponding annotations, were gathered for the study. A rectified scribble-supervised gland segmentation (RSSGS) model, incorporating temporal ensemble prediction, uncertainty estimation, and a transformation equivariance constraint, was introduced to address constraints imposed by the limited supervision information inherent in scribble annotations. The viability and efficacy of the proposed model were assessed based on accuracy, intersection over union (IoU), and dice coefficient. ● RESULTS: Using manual labels as the gold standard, RSSGS achieved an accuracy of 93.54%, a dice coefficient of 78.02%, and an IoU of 64.18%. Notably, these performance metrics exceed the current weakly supervised state-of-the-art methods by 0.76%, 2.06%, and 2.69%, respectively. Furthermore, despite achieving a substantial 80% reduction in annotation costs, it only lags behind fully annotated methods by 0.72%, 1.51%, and 2.04%. ● CONCLUSION: An innovative automatic segmentation model is developed for MGs in infrared eyelid images, using scribble annotation for training. This model maintains an exceptionally high level of segmentation accuracy while substantially reducing training costs. It holds substantial utility for calculating clinical parameters, thereby greatly enhancing the diagnostic efficiency of ophthalmologists in evaluating meibomian gland dysfunction.
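The three evaluation metrics named in this abstract (accuracy, IoU, and dice coefficient) have standard pixel-wise definitions for binary masks, sketched below. This is a minimal illustration under stated assumptions, not the RSSGS evaluation code: the function name `segmentation_metrics` and the toy masks are hypothetical, and masks are assumed to be flat lists of 0/1 labels.

```python
def segmentation_metrics(pred, truth):
    """Return (accuracy, iou, dice) for two flat binary masks of 0/1 values."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("masks must be non-empty and of equal length")
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    accuracy = (tp + tn) / len(pred)
    union = tp + fp + fn
    # Convention chosen here: two empty masks count as a perfect match.
    iou = tp / union if union else 1.0
    dice = 2 * tp / (2 * tp + fp + fn) if union else 1.0
    return accuracy, iou, dice


# Hypothetical 4-pixel masks: one true positive, one true negative,
# one false positive, one false negative.
acc, iou, dice = segmentation_metrics([1, 1, 0, 0], [1, 0, 1, 0])
```

For a single pair of masks the two overlap scores are tied by dice = 2·IoU / (1 + IoU); values averaged over many images need not satisfy this exactly.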
Abstract: As Natural Language Processing (NLP) continues to advance, driven by the emergence of sophisticated large language models such as ChatGPT, there has been a notable growth in research activity. This rapid uptake reflects increasing interest in the field and prompts critical inquiries into ChatGPT's applicability in the NLP domain. This review paper systematically investigates the role of ChatGPT in diverse NLP tasks, including information extraction, Named Entity Recognition (NER), event extraction, relation extraction, Part of Speech (PoS) tagging, text classification, sentiment analysis, emotion recognition and text annotation. The novelty of this work lies in its comprehensive analysis of the existing literature, addressing a critical gap in understanding ChatGPT's adaptability, limitations, and optimal application. In this paper, we employed a systematic stepwise approach following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework to direct our search process and seek relevant studies. Our review reveals ChatGPT's significant potential in enhancing various NLP tasks. Its adaptability in information extraction tasks, sentiment analysis, and text classification showcases its ability to comprehend diverse contexts and extract meaningful details. Additionally, ChatGPT's flexibility in annotation tasks reduces manual efforts and accelerates the annotation process, making it a valuable asset in NLP development and research. Furthermore, GPT-4 and prompt engineering emerge as a complementary mechanism, empowering users to guide the model and enhance overall accuracy. Despite its promising potential, challenges persist. The performance of ChatGPT needs to be tested using more extensive datasets and diverse data structures. Moreover, its limitations in handling domain-specific language and the need for fine-tuning in specific applications highlight the importance of further investigations to address these issues.
Funding: The authors acknowledge that this research work was supported through project number UCS&T/R&D-09/19–20/17533, titled "An Intelligent Computational Model for Crowd Demonstration and Risk Analysis during Spiritual Events in Haridwar", by the Uttarakhand Council for Science and Technology (UCOST), India.
Abstract: Funding agencies play a pivotal role in bolstering research endeavors by allocating financial resources for data collection and analysis. However, the lack of detailed information regarding the methods employed for data gathering and analysis can obstruct the replication and utilization of the results, ultimately affecting the study's transparency and integrity. The task of manually annotating extensive datasets demands considerable labor and financial investment, especially when it entails engaging specialized individuals. In our crowd counting study, we employed the web-based annotation tool SuperAnnotate to streamline the human annotation process for a dataset comprising 3,000 images. By integrating automated annotation tools, we realized substantial time savings, producing 858,958 annotations. This underscores the significant contribution of such technologies to the efficiency of the annotation process.
Abstract: Genetic factors play a critical role in autoimmune hepatitis (AIH), and numerous studies have been conducted to identify variants associated with the risk of AIH. However, our knowledge of these genetic risk factors is still limited. In this study, we aim to provide a comprehensive synopsis of the genetic architecture of this disease. A systematic search was conducted to identify published studies on the associations between genetic variants and the risk of AIH. Meta-analyses were conducted to calculate the pooled odds ratio (OR) and 95% confidence interval (CI). Then, the cumulative evidence was evaluated for significant associations according to the Venice criteria and false-positive report probability. Finally, functional annotations and pathway analyses were conducted to identify potential pathogenic loci and related pathways. In total, 62 studies involving 11,068 cases and 45,482 controls were included to assess the association between 75 genetic variants and the risk of AIH. Among them, 24 variants were associated with the risk of AIH, and there is strong cumulative evidence supporting these associations. Importantly, HLA DRB1*0301 (OR: 3.023, 95% CI: 2.443 - 1.678, P = 2.81 × 10^(-24)) and DRB3*0101 (OR: 3.667, 95% CI: 2.649 - 5.075, P = 4.69 × 10^(-15)) are newly identified genome-wide significant risk loci. In addition, the rs3184504 variant (OR: 1.305, 95% CI: 1.122 - 1.516, P = 0.001) in the SH2B3 gene is a potential functional mutation. GO pathway analysis suggests that these genes are enriched in antigen processing and presentation, response to interferon-gamma, and immune response-regulating signaling pathways. This study comprehensively summarizes the genetic architecture of AIH and provides cumulative evidence. We have identified two new loci that exceed genome-wide significance. The findings from this study will offer new insights into the pathogenesis of AIH.
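One common way to obtain a pooled OR and 95% CI from per-study estimates is fixed-effect inverse-variance pooling on the log scale, sketched below. The abstract does not state which pooling model was used, so the fixed-effect choice is an assumption; the function name `pooled_odds_ratio` and the input values are hypothetical and are not data from the study.

```python
import math

Z95 = 1.959963984540054  # standard normal quantile for a two-sided 95% CI


def pooled_odds_ratio(studies):
    """Fixed-effect inverse-variance pooling of odds ratios.

    studies: list of (odds_ratio, ci_lower, ci_upper) tuples, one per study.
    Returns (pooled_or, pooled_ci_lower, pooled_ci_upper).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for odds_ratio, lower, upper in studies:
        # A valid CI has positive bounds and brackets the point estimate.
        if not (0 < lower <= odds_ratio <= upper and lower < upper):
            raise ValueError(f"invalid OR/CI: {(odds_ratio, lower, upper)}")
        # Recover the standard error of log(OR) from the CI width.
        se = (math.log(upper) - math.log(lower)) / (2 * Z95)
        weight = 1.0 / se ** 2
        weighted_sum += weight * math.log(odds_ratio)
        weight_total += weight
    pooled_log = weighted_sum / weight_total
    pooled_se = math.sqrt(1.0 / weight_total)
    return (
        math.exp(pooled_log),
        math.exp(pooled_log - Z95 * pooled_se),
        math.exp(pooled_log + Z95 * pooled_se),
    )


# Hypothetical inputs: two identical studies, each OR 2.0 (95% CI 1.0 to 4.0).
pooled, ci_low, ci_high = pooled_odds_ratio([(2.0, 1.0, 4.0), (2.0, 1.0, 4.0)])
```

Pooling two identical studies leaves the point estimate unchanged and narrows the interval, which is the expected behavior of inverse-variance weighting.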