Abstract: In industry, it is becoming common to detect and recognize industrial workpieces using deep learning methods. In this field, the lack of datasets is a major problem, and collecting and annotating datasets is very labor intensive. Researchers who generate a dataset themselves must also annotate it, and this burden is one of the factors that keeps current deep learning methods from scaling well. At present, there are very few workpiece datasets for industrial fields, and the existing datasets are generated from ideal workpiece computer aided design (CAD) models, for which few actual workpiece images were collected and utilized. We propose an automatic industrial workpiece dataset generation method and an automatic ground truth annotation method. Our methods include three algorithms that we propose: a point cloud based spatial plane segmentation algorithm to segment the workpieces in the real scene and to obtain the annotation information of the workpieces in the images captured in the real scene; a random multiple workpiece generation algorithm to generate abundant composition datasets with random workpiece rotation angles and positions; and a tangent vector based contour tracking and completion algorithm to obtain improved contour images. With these procedures, annotation information is obtained automatically, and a JSON format file is generated when the annotation process completes. Faster R-CNN (faster region-based convolutional neural network), SSD (single shot multibox detector), and YOLO (you only look once: unified, real-time object detection) are trained using the datasets proposed in this paper. The experimental results show the effectiveness and integrity of this dataset generation and annotation method.
Funding: Supported by the National Key Research and Development Program of China (2022YFD1302204), the earmarked fund CARS36, and the Ningxia Key Research and Development Program of China (2023BCF010042019NYYZ09).
Abstract: Background: Biologically annotated neural networks (BANNs) are feedforward Bayesian neural network models that utilize partially connected architectures based on SNP-set annotations. As an interpretable neural network, BANNs model SNP and SNP-set effects in their input and hidden layers, respectively. Furthermore, the weights and connections of the network are regarded as random variables with prior distributions reflecting the manifestation of genetic effects at various genomic scales. However, its application in genomic prediction has yet to be explored. Results: This study extended the BANNs framework to the area of genomic selection and explored the optimal SNP-set partitioning strategies by using dairy cattle datasets. The SNP-sets were partitioned based on two strategies, gene annotations and 100 kb windows, denoted as BANN_gene and BANN_100kb, respectively. The BANNs model was compared with GBLUP, random forest (RF), BayesB, and BayesCπ through five replicates of five-fold cross-validation using genotypic and phenotypic data on milk production traits, type traits, and one health trait of 6,558, 6,210, and 5,962 Chinese Holsteins, respectively. Results showed that the BANNs framework achieves higher genomic prediction accuracy compared to GBLUP, RF, and Bayesian methods. Specifically, BANN_100kb demonstrated superior accuracy and BANN_gene exhibited generally suboptimal accuracy compared to GBLUP, RF, BayesB, and BayesCπ across all traits. The average accuracy improvements of BANN_100kb over GBLUP, RF, BayesB, and BayesCπ were 4.86%, 3.95%, 3.84%, and 1.92%, and the accuracy of BANN_gene was improved by 3.75%, 2.86%, 2.73%, and 0.85% compared to GBLUP, RF, BayesB, and BayesCπ, respectively, across all seven traits. Meanwhile, both BANN_100kb and BANN_gene yielded lower overall mean square error values than GBLUP, RF, and Bayesian methods. Conclusion: Our findings demonstrated that the BANNs framework performed better than traditional genomic prediction methods in our tested scenarios, and might serve as a promising alternative approach for genomic prediction in dairy cattle.
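The 100 kb window strategy named in this abstract can be illustrated with a minimal Python sketch. It assumes non-overlapping windows aligned to position 0 on each chromosome and an input layout of (SNP id, chromosome, base-pair position) tuples; both are assumptions made for illustration, since the abstract does not say how the windows were placed. The function name `partition_by_window` and the toy markers are hypothetical.

```python
from collections import defaultdict

WINDOW_BP = 100_000  # 100 kb window size, matching the BANN_100kb label


def partition_by_window(snps, window_bp=WINDOW_BP):
    """Group SNPs into SNP-sets keyed by (chromosome, window index).

    `snps` is an iterable of (snp_id, chromosome, position_bp) tuples.
    This record layout is an assumption, not the study's data format.
    """
    sets = defaultdict(list)
    for snp_id, chrom, pos in snps:
        # Integer division assigns each SNP to exactly one fixed window.
        sets[(chrom, pos // window_bp)].append(snp_id)
    return dict(sets)


# Hypothetical toy markers, not data from the study.
toy_snps = [
    ("snp1", "1", 15_000),
    ("snp2", "1", 99_999),
    ("snp3", "1", 100_000),
    ("snp4", "2", 42_000),
]
snp_sets = partition_by_window(toy_snps)
for key, members in sorted(snp_sets.items()):
    print(key, members)
```

Each resulting group would correspond to one hidden-layer node in a partially connected network of this kind; a gene-annotation strategy would replace the window index with a gene identifier.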
Abstract: Every day, websites and personal archives accumulate more and more photos, and the size of these archives is immeasurable. The ease of use of these huge digital image collections contributes to their popularity. However, not all of these collections provide relevant indexing information, so it is difficult to find the data a user is interested in from search results. Therefore, to determine the significance of the data, it is important to identify the contents in an informative manner. Image annotation is among the most challenging problems in multimedia research and computer vision. Hence, in this paper, an Adaptive Convolutional Deep Learning Model (ACDLM) is developed for automatic image annotation. Initially, the databases (Corel 5K, MSRC v2) are collected from an open-source system and consist of some labelled images (for the training phase) and some unlabelled images. After that, the images are sent to the pre-processing step, which includes colour space quantization and texture colour class mapping. The pre-processed images are then segmented using J-image segmentation (JSEG) to support efficient labelling. The final step is automatic annotation using ACDLM, which is a combination of a Convolutional Neural Network (CNN) and the Honey Badger Algorithm (HBA). Based on the proposed classifier, the unlabelled images are labelled. The proposed methodology is implemented in MATLAB, and performance is evaluated by metrics such as accuracy, precision, recall, and F1-measure.
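Abstract: To address the difficulty of extracting overlapping triples, nested entities, and complex entities from text data in the citrus pest and disease domain, a joint entity-relation extraction method for citrus pests and diseases based on DPNA-CASREL (Dual-pointer network annotation-cascade binary tagging framework for relational triple extraction) is proposed. The encoder combines the pre-trained model RoBERTa-wwm-ext (Robustly optimized BERT pre-training approach with whole word masking and extended training data) with a bi-directional long short-term memory network (BiLSTM) to obtain multi-dimensional vector encodings of the text. A decoding network with dual pointer network annotation is designed according to the characteristics of the citrus pest and disease corpus: a multi-level pointer network annotation method is introduced in head entity decoding, and a complex entity annotation strategy is adopted in the tail entity decoding network to strengthen the model's extraction of complex entities. This enables simultaneous extraction of entity-relation triples and resolves problems such as overlapping triples and nested entities. Experiments on a self-built citrus pest and disease dataset show that the precision, recall, and F1 score of DPNA-CASREL are 82.12%, 81.97%, and 82.05%, respectively, outperforming the other models. Its F1 scores for nested and complex entity extraction are 8.16 and 6.58 percentage points higher than those of CASREL, respectively, which effectively resolves entity nesting and unclear entity boundaries. The method can provide a basis for constructing a citrus pest and disease knowledge graph.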
Funding: Supported by the Fundamental Research Funds for the Central Universities (No. 61975056), the National Natural Science Foundation of China (No. 62101191), and the Science and Technology Commission of Shanghai Municipality (Nos. 20440713100, 21ZR1420800, and 22DZ2229004).
Abstract: Facial pore segmentation results can provide reliable evidence to simulate post-product pore conditions and provide product recommendations. However, accurately segmenting pores is challenging due to their small size, weak boundaries, and dense distribution. It is also difficult to acquire precise annotation. Therefore, we formulate pore segmentation as a two-stage, weakly supervised task using both traditional and deep learning methods without human annotation. We propose a novel method called the pore segmentation network (PS-Net). Specifically, it contains pore feature extraction with coarse labels generated by a traditional method, as well as fine segmentation with progressively updated pseudo labels. Since pores provide high-frequency information about faces, we propose a high-frequency attention module that emphasizes low-level features. Moreover, we design a Bayesian module to identify pore shapes in high-level features. We establish a large-scale facial pore dataset with coarse labels that were generated via the difference of Gaussian (DoG) Pore method. PS-Net achieves the best performance on this dataset, proving its superiority compared with existing state-of-the-art segmentation methods.
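The difference of Gaussian (DoG) idea behind the coarse labels can be illustrated on a one-dimensional intensity profile. The sketch below is plain Python and shows the generic DoG band-pass filter only; it is not the DoG Pore method itself, whose details the abstract does not give. The sigma values, the border clamping, and the toy profile are all assumptions.

```python
import math


def gaussian_kernel(sigma):
    """Return a normalized 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(x * x) / (2 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur(signal, sigma):
    """Convolve `signal` with a Gaussian, clamping indices at the borders."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out


def difference_of_gaussians(signal, sigma_small, sigma_large):
    """Subtract a wide blur from a narrow blur (a band-pass response)."""
    narrow = blur(signal, sigma_small)
    wide = blur(signal, sigma_large)
    return [a - b for a, b in zip(narrow, wide)]


# Hypothetical skin intensity profile: uniform brightness with one dark spot.
profile = [1.0] * 41
profile[20] = 0.2
response = difference_of_gaussians(profile, sigma_small=1.0, sigma_large=3.0)
darkest = response.index(min(response))
print("strongest negative response at index", darkest)
```

Small dark structures produce a strong negative response while flat regions produce a response near zero, so thresholding such a response is one plausible way to obtain coarse candidate labels without human annotation.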
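Abstract: Pesticide registration texts are information-dense and logically complex, with large spans between entities and heterogeneous entity lengths, and traditional joint extraction methods suffer from overlapping triples, exposure bias, and redundant computation. To address these issues, this study proposes a multi-feature fusion single-stage entity and relation joint extraction model (MF-SERel). First, in the encoding layer, semantic and syntactic features are fused to enrich the character vector representations and improve the model's ability to represent complex corpora. Second, in the multi-dimensional annotation framework layer, the HT-BES multi-dimensional annotation strategy is proposed to solve the overlapping triple problem. Using parallel scoring functions and a fine-grained classification component, joint entity-relation extraction is converted into a multi-label annotation task over relation dimensions. This process contains no interdependent steps, so annotation is performed in a single parallel stage, which avoids exposure bias and reduces computational redundancy. Finally, the decoding layer decodes entity-relation triples from the labels predicted by the fine-grained classifier. The proposed model was compared with baseline models including GraphRel, CasRel, and TPLinker on the pesticide registration dataset (PRD) and the public dataset DuIE (Dataset of unstructured information extraction). The results show that MF-SERel performs well on both datasets. On PRD, MF-SERel improved inference speed by 20% and the F1 score by 2.3%, indicating good knowledge mining capability on pesticide registration texts. On DuIE, it improved inference speed by 54% and the F1 score by 1.7%, which also demonstrates good generalization. In summary, MF-SERel offers a new method for the structured extraction of knowledge in the pesticide domain.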
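Abstract: This study builds a high-quality dataset, named VamNER, for named entity recognition (NER) in the domain of Pacific white shrimp farming. To ensure diversity, high-quality papers from the past 10 years were collected from the CNKI database and combined with authoritative books to build the corpus. Experts were invited to discuss the entity types, and professionally trained annotators labeled the data in the IOB2 format. Annotation was split into a pre-annotation stage and a formal annotation stage to improve efficiency. In the pre-annotation stage, the inter-annotator agreement (IAA) reached 0.87, indicating high consistency among annotators. The final VamNER dataset contains 6,115 sentences and 384,602 characters in total, covering 10 entity types and 12,814 entities. Comparison with several general-domain datasets and one domain-specific dataset reveals the distinctive characteristics of VamNER. Experiments used a pre-trained bidirectional encoder representations from Transformers (BERT) model, a bidirectional long short-term memory network (BiLSTM), and conditional random fields (CRF); the best model achieved an F1 score of 82.8% on the test set. VamNER is the first NER dataset focused on Pacific white shrimp farming. It provides a rich resource for Chinese domain-specific NER research and is expected to advance NER research in aquaculture.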
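The IOB2 annotation format used for VamNER can be illustrated with a short Python sketch that converts entity spans into per-token tags. In IOB2, the first token of every entity is tagged `B-<type>`, the remaining tokens `I-<type>`, and everything else `O`. The function name `spans_to_iob2`, the English example tokens, and the entity types `DISEASE` and `SPECIES` are hypothetical and are not taken from the dataset.

```python
def spans_to_iob2(tokens, spans):
    """Convert entity spans to IOB2 tags.

    `spans` holds (start, end, label) triples that index into `tokens`,
    with `end` exclusive. Spans are assumed not to overlap.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        # IOB2 always opens an entity with B-, even for one-token entities.
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


# Hypothetical example; entity types are illustrative only.
tokens = ["white", "spot", "syndrome", "affects", "shrimp"]
spans = [(0, 3, "DISEASE"), (4, 5, "SPECIES")]
tags = spans_to_iob2(tokens, spans)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

For Chinese text the same scheme is usually applied per character rather than per word, which is consistent with the abstract reporting the dataset size in characters.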
Funding: Supported by the National Natural Science Foundation of China (31000591, 31000587, 31171266).
Abstract: The discovery of novel cancer genes is one of the main goals in cancer research. Bioinformatics methods can be used to accelerate cancer gene discovery, which may help in the understanding of cancer and the development of drug targets. In this paper, we describe a classifier to predict potential cancer genes that we have developed by integrating multiple lines of biological evidence, including protein-protein interaction network properties, and sequence and functional features. We detected 55 features that were significantly different between cancer genes and non-cancer genes. Fourteen cancer-associated features were chosen to train the classifier. Four machine learning methods, logistic regression, support vector machines (SVMs), BayesNet, and decision tree, were explored in the classifier models to distinguish cancer genes from non-cancer genes. The prediction power of the different models was evaluated by 5-fold cross-validation. The area under the receiver operating characteristic curve for the logistic regression, SVM, BayesNet, and J48 tree models was 0.834, 0.740, 0.800, and 0.782, respectively. Finally, the logistic regression classifier with multiple biological features was applied to the genes in the Entrez database, and 1976 cancer gene candidates were identified. We found that the integrated prediction model performed much better than the models based on the individual biological evidence, and the network and functional features had stronger predictive power than the sequence features in predicting cancer genes.
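The area under the receiver operating characteristic curve reported above can be computed directly from classifier scores. The Python sketch below uses the pairwise-ranking (Mann-Whitney) formulation: the AUC equals the fraction of positive-negative pairs in which the positive example receives the higher score, with ties counted as one half. The labels and scores are hypothetical toy values, not results from the paper, and the function name `roc_auc` is an illustrative choice.

```python
def roc_auc(labels, scores):
    """Compute ROC AUC as the fraction of correctly ranked pos/neg pairs."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical held-out fold: 1 = cancer gene, 0 = non-cancer gene.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

In a 5-fold cross-validation such as the one described, this quantity would be computed on each held-out fold, or on the pooled held-out predictions, for every model being compared.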