Abstract: In machine learning, sentiment analysis is a technique for finding and analyzing the sentiments hidden in text. Sentiment analysis requires annotated data, which is generally annotated manually. Manual annotation is a time-consuming, costly, and laborious process. To overcome these resource constraints, this research proposes a fully automated annotation technique for aspect-level sentiment analysis. The dataset is created from the reviews of the ten most popular songs on YouTube. Reviews of five aspects (voice, video, music, lyrics, and song) are extracted. An N-gram based technique is proposed. The complete dataset consists of 369,436 reviews, which took 173.53 s to annotate using the proposed technique, whereas manual annotation might have taken approximately 2.07 million seconds (575 h). To validate the proposed technique, a sub-dataset, Voice, is annotated both manually and with the proposed technique. Cohen's Kappa statistic is used to evaluate the degree of agreement between the two annotations. The high Kappa value (0.9571) shows a high level of agreement between the two. This indicates that the annotation quality of the proposed technique is as good as that of manual annotation, at far lower computational cost. This research also contributes to consolidating the guidelines for the manual annotation process.
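The agreement check described in this abstract can be illustrated with a minimal sketch of Cohen's Kappa for two label sequences. The function name `cohens_kappa` and the six toy labels are assumptions made for illustration; they are not taken from the paper, whose data and code are not shown here.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two annotators' marginal label rates.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label everywhere.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy labels (not the paper's Voice sub-dataset).
manual = ["pos", "pos", "neg", "neg", "pos", "neg"]
automatic = ["pos", "pos", "neg", "pos", "pos", "neg"]
kappa = cohens_kappa(manual, automatic)
print(round(kappa, 4))
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is preferred over a plain match rate when label distributions are skewed.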
Funding: Supported by the Guangdong Basic and Applied Basic Research Foundation (No. 2025A1515011627) and the San Ming Project of Medicine in Shenzhen (No. SZSM202311012).
Abstract: Glaucoma is an eye disease characterized by pathologically elevated intraocular pressure, optic nerve atrophy, and visual field defects, which can lead to irreversible vision loss. In recent years, the rapid development of artificial intelligence (AI) technology has provided new approaches for the early diagnosis and management of glaucoma. By classifying and annotating glaucoma-related images, AI models can learn and recognize the specific pathological features of glaucoma, thereby achieving automated imaging analysis and classification. Research on glaucoma imaging classification and annotation mainly involves color fundus photography (CFP), optical coherence tomography (OCT), anterior segment optical coherence tomography (AS-OCT), and ultrasound biomicroscopy (UBM) images. CFP is primarily used for annotating the optic cup and disc; OCT is used for measuring and annotating the thickness of the retinal nerve fiber layer; and AS-OCT and UBM focus on annotating the anterior chamber angle structure and measuring anterior segment structural parameters. This guideline has been developed to standardize the classification and annotation of glaucoma images, enhance the quality and consistency of annotated data, and promote the clinical application of intelligent ophthalmology. It systematically sets out the principles, methods, processes, and quality control requirements for classifying and annotating glaucoma images.
Funding: Supported by the National Natural Science Foundation of China (Nos. 82274064, 82374026, and 82204591).
Abstract: Natural products (NPs) have long held a significant position in fields such as medicine, food, agriculture, and materials. The chemical space covered by NPs is extensive but often underexplored. High-throughput, efficient methodologies for the annotation and discovery of NPs are therefore needed to address the complexity and diversity of NP-based systems. Mass spectrometry (MS) has emerged as a powerful platform for the annotation and discovery of NPs. MS databases provide vital support for the structural characterization of NPs by integrating extensive mass spectral data and sample information. In addition, published annotation methodologies, based on a variety of informatics tools, continue to improve the ability to annotate the structure and properties of compounds. This review examines the current mainstream databases and annotation methodologies, focusing on their advantages and limitations. Prospects for future technological advances are then discussed in terms of novel applications and research objectives. Through a systematic overview, this review aims to provide valuable insights and a reference for MS-based NP annotation, thereby promoting the discovery of novel natural entities.
Funding: The National Natural Science Foundation of China (No. 61133012), the Humanity and Social Science Foundation of the Ministry of Education (No. 12YJCZH274), the Humanity and Social Science Foundation of Jiangxi Province (No. XW1502, TQ1503), and the Science and Technology Project of Jiangxi Science and Technology Department (No. 20121BBG70050, 20142BBG70011).
Abstract: To address issues such as overly simple image features and word noise interference in product image sentence annotation, a product image sentence annotation model focusing on image feature learning and key word summarization is described. Three kernel descriptors (gradient, shape, and color) are extracted. Late feature fusion is then performed by a multiple kernel learning model to obtain more discriminative image features. The absolute rank and relative rank of the tag-rank model are used to boost the weights of the key words. A new word integration algorithm named word sequence blocks building (WSBB) is designed to create N-gram word sequences. Sentences are generated according to the N-gram word sequences and predefined templates. Experimental results show that both the BLEU-1 and BLEU-2 scores of the sentences are superior to those of the state-of-the-art baselines.
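As a rough illustration of what an N-gram word sequence is, the sketch below extracts contiguous N-grams from an ordered list of key words. It does not reproduce WSBB, whose details are not given in the abstract; the function name `ngrams` and the sample key words are hypothetical.

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples, in order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # When the list is shorter than n, the range is empty and so is the result.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical ranked key words for one product image.
keywords = ["red", "leather", "handbag"]
bigrams = ngrams(keywords, 2)
print(bigrams)
```

In a template-based generator, sequences like these would be slotted into predefined sentence patterns; how the slots are chosen is outside the scope of this sketch.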
Funding: The authors acknowledge that this research work was supported through project number UCS&T/R&D-09/19–20/17533, titled "An Intelligent Computational Model for Crowd Demonstration and Risk Analysis during Spiritual Events in Haridwar", by the Uttarakhand Council for Science and Technology (UCOST), India.
Abstract: Funding agencies play a pivotal role in supporting research by allocating financial resources for data collection and analysis. However, a lack of detailed information about the methods used for data gathering and analysis can obstruct the replication and use of the results, ultimately affecting a study's transparency and integrity. Manually annotating extensive datasets demands considerable labor and financial investment, especially when it requires specialized annotators. In our crowd counting study, we used the web-based annotation tool SuperAnnotate to streamline the human annotation process for a dataset of 3,000 images. By integrating automated annotation tools, we achieved substantial time savings, producing 858,958 annotations. This underscores the significant contribution of such technologies to the efficiency of the annotation process.
Funding: This study was supported by the National Natural Science Foundation of China (U1902215 to Y.G.Y. and 31970542 to Y.F.), the Chinese Academy of Sciences (Light of West China Program xbzg-zdsys-201909 to Y.G.Y.), and Yunnan Province (202001AS070023 and 2018FB046 to D.D.Y. and 202002AA100007 to Y.G.Y.).
Abstract: The Chinese tree shrew (Tupaia belangeri chinensis) is emerging as an important experimental animal in multiple fields of biomedical research. Comprehensive reference genome annotation for both mRNA and long non-coding RNA (lncRNA) is crucial for developing animal models using this species. In the current study, we collected a total of 234 high-quality RNA sequencing (RNA-seq) datasets and two long-read isoform sequencing (ISO-seq) datasets and improved the annotation of our previously assembled high-quality chromosome-level tree shrew genome. We obtained a total of 3514 newly annotated coding genes and 50576 lncRNA genes. We also characterized the tissue-specific expression patterns and alternative splicing patterns of mRNAs and lncRNAs and mapped the orthologous relationships among 11 mammalian species using the current annotated genome. We identified 144 tree shrew-specific gene families, including interleukin 6 (IL6) and STT3 oligosaccharyltransferase complex catalytic subunit B (STT3B), which underwent significant changes in size. Comparison of the overall expression patterns in tissues and pathways across four species (human, rhesus monkey, tree shrew, and mouse) indicated that tree shrews are more similar to primates than to mice at the tissue-transcriptome level. Notably, the newly annotated purine rich element binding protein A (PURA) gene and the STT3B gene family showed dysregulation upon viral infection. The updated version of the tree shrew genome annotation (KIZ version 3: TS_3.0) is available at http://www.treeshrewdb.org and provides an essential reference for basic and biomedical studies using tree shrew animal models.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 60673023, 60433020, 10501017, and 3040016, and by the European Commission for TH/Asia Link/010 under Grant No. 111084.
Abstract: Computational function annotation of newly sequenced biological sequences is an important task in bioinformatics. Based on the GO database and the BLAST program, a novel method for the function annotation of new biological sequences is presented that uses variable-precision rough set theory. The proposed method is applied to real data in the GO database to examine its effectiveness. Numerical results show that the proposed method achieves better precision, recall, and harmonic mean values than existing methods.
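The three evaluation measures named in this abstract can be sketched for a single sequence as set overlap between predicted and reference annotation terms. This is only a generic illustration of precision, recall, and their harmonic mean, not the paper's evaluation protocol; the function name and the GO identifiers below are illustrative placeholders.

```python
def precision_recall_f1(predicted, reference):
    """Precision, recall, and their harmonic mean for two sets of terms."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    harmonic_mean = 2 * precision * recall / (precision + recall)
    return precision, recall, harmonic_mean


# Placeholder term sets for one hypothetical sequence.
predicted_terms = {"GO:0003677", "GO:0005634", "GO:0006355"}
reference_terms = {"GO:0003677", "GO:0006355", "GO:0046872", "GO:0005515"}
p, r, f = precision_recall_f1(predicted_terms, reference_terms)
print(round(p, 4), round(r, 4), round(f, 4))
```

The harmonic mean penalizes a method that scores well on only one of the two measures, which is why it is usually reported alongside them.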
Abstract: Since the publication of this article, the authors have noticed that the GeneIDs from the new and original genome annotations do not match in Table S6. The correct Table S6 is given here. The authors apologize for this error.
Funding: Supported by the National Natural Science Foundation of China (61202304, 61173095, 61173062, 61202193).
Abstract: Corpora are an important resource for knowledge acquisition in natural language processing (NLP). However, in the biomedical domain, comparatively few corpora so far focus on the semantic associations among all tokens in a sentence. We propose an annotation scheme based on feature structure theory for enriching biomedical corpora with token semantic association (TSA). A total of 227 documents from the BioNLP GE ST training data are annotated to form the TSA corpus, in which each annotated item represents a token semantic association as a triple. The annotation of token semantic association has the potential to significantly advance biomedical text mining by providing rich token-level semantic information for NLP systems, especially sophisticated information extraction (IE) systems such as bio-event extraction.
Abstract: Representing the relationships between ontologies is the key problem in semantic annotation based on multiple ontologies. Traditional approaches can only denote simple concept subsumption relations between ontologies. By analyzing and classifying the relationships between ontologies, the idea of a bridge ontology is proposed, which is capable of expressing the complex relationships between concepts, and between relations, across multiple ontologies. A new approach employing the bridge ontology is also proposed to deal with the multi-ontology semantic annotation problem. The bridge ontology is a special ontology that can be created and maintained conveniently and is effective for multi-ontology semantic annotation. The approach is low-cost, scalable, and robust in the web environment, and it avoids unnecessary ontology extension and integration. Key words: semantic web; bridge ontology; multi-ontologies; semantic annotation. CLC number: TP 391. Foundation item: Supported by the National Natural Science Foundation of China (60373066, 60303024), the National Grand Fundamental Research 973 Program of China (2002CB312000), and the National Research Foundation for the Doctoral Program of Higher Education of China (20020286004). Biography: WANG Peng (1977-), male, Ph.D. candidate; research direction: semantic web, ontology, and knowledge representation on the Web.
Funding: Supported by the National Natural Science Foundation of China (61202193, 61202304), the Major Projects of Chinese National Social Science Foundation (11&ZD189), and the Chinese Postdoctoral Science Foundation (2013M540593, 2014T70722).
Abstract: In this paper we propose a novel model, the "recursive directed graph", based on feature structure, and apply it to represent the semantic relations of postpositive attributive structures in biomedical texts. The usage of postpositive attributives is complex and variable, especially in three categories (present participle phrases, past participle phrases, and prepositional phrases used as postpositive attributives), which make automatic parsing difficult. We summarize these categories and annotate their semantic information. Compared with dependency structure, feature structure, being a recursive directed graph, enhances semantic information extraction in the biomedical field. The annotation results show that the recursive directed graph is better suited to extracting complex semantic relations for biomedical text mining.
Abstract: In industry, it is becoming common to detect and recognize industrial workpieces using deep learning methods. The lack of datasets is a major problem in this field, and collecting and annotating datasets is very labor intensive. Researchers who generate a dataset themselves must also annotate it, which is one of the factors that keeps current deep learning methods from scaling well. At present, there are very few workpiece datasets for industrial applications, and the existing ones are generated from ideal workpiece computer aided design (CAD) models, with few actual workpiece images collected and used. We propose an automatic industrial workpiece dataset generation method and an automatic ground truth annotation method. Our methods include three algorithms that we propose: a point cloud based spatial plane segmentation algorithm, which segments the workpieces in the real scene and obtains their annotation information in the images captured from that scene; a random multiple workpiece generation algorithm, which generates abundant composite datasets with random workpiece rotation angles and positions; and a tangent vector based contour tracking and completion algorithm, which produces improved contour images. With these procedures, annotation information is obtained automatically, and a JSON file is generated when the annotation process completes. Faster R-CNN (faster region-based convolutional neural network), SSD (single shot multibox detector), and YOLO (you only look once: unified, real-time object detection) are trained using the datasets proposed in this paper. The experimental results show the effectiveness and integrity of this dataset generation and annotation method.
Abstract: This paper discusses the placement of Chinese annotations from the point of view of graphics. Area features are classified as simple polygons, complex polygons, and special polygons. For simple polygons, annotations are placed along the longest edge. For complex polygons, the polygon is first simplified by merging close points, the longest diagonal is then found by comparing lengths, and annotations are finally placed along that diagonal. Special polygons are partitioned into several parts by a given rule to obtain their sub-diagonals, and their annotations are then placed using the second method.
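The simple-polygon rule in this abstract (place the annotation along the longest edge) can be sketched as follows. The abstract does not say where along the edge the text is anchored, so using the edge midpoint and the edge direction as the text angle is an assumption of this sketch, as are the function names.

```python
import math


def longest_edge(vertices):
    """Return ((start, end), length) of the longest edge of a closed polygon."""
    if len(vertices) < 3:
        raise ValueError("a polygon needs at least three vertices")
    best_edge, best_length = None, -1.0
    n = len(vertices)
    for i in range(n):
        start, end = vertices[i], vertices[(i + 1) % n]  # wrap to close the ring
        length = math.hypot(end[0] - start[0], end[1] - start[1])
        if length > best_length:  # strict '>' keeps the first of equal edges
            best_edge, best_length = (start, end), length
    return best_edge, best_length


def label_placement(vertices):
    """Assumed placement: anchor at the edge midpoint, rotated to the edge."""
    (start, end), length = longest_edge(vertices)
    midpoint = ((start[0] + end[0]) / 2, (start[1] + end[1]) / 2)
    angle_degrees = math.degrees(math.atan2(end[1] - start[1], end[0] - start[0]))
    return midpoint, angle_degrees, length


# Hypothetical 4 x 1 rectangle as a simple polygon.
rectangle = [(0, 0), (4, 0), (4, 1), (0, 1)]
midpoint, angle, length = label_placement(rectangle)
print(midpoint, angle, length)
```

The complex-polygon case would replace the edge scan with a scan over vertex pairs to find the longest diagonal, after simplifying close points; that step is not sketched here because the abstract does not specify the simplification rule.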