Journal Articles
1,292 articles found
Text Extraction and Enhancement of Binary Images Using Cellular Automata
1
Authors: G. Sahoo, Tapas Kumar, B. L. Raina, C. M. Bhatia. International Journal of Automation and Computing (EI), 2009, No. 3, pp. 254-260 (7 pages)
Text characters embedded in images are a rich source of information for content-based indexing and retrieval applications. However, these characters are difficult to detect and recognize because of their varying sizes, grayscale values, and complex backgrounds. Existing methods do not handle text with different contrast, or text embedded in a complex image background, well. In this paper, a set of sequential algorithms for text extraction and image enhancement using cellular automata is proposed. The image enhancement includes gray-level and contrast manipulation, edge detection, and filtering. First, edge detection is applied and a threshold is used to filter out low-contrast text and to simplify the complex background of high-contrast text in the binary image. The proposed algorithm is simple, easy to use, and requires only a sample texture binary image as input. It generates textures whose perceived quality is better than that of earlier published techniques. The performance of the method is demonstrated by experimental results on a set of text-based binary images, and the quality of thresholding is assessed using precision and recall analysis of the resulting text in the binary image.
Keywords: text extraction; edge detection; cellular automata; algorithm; text detection; thresholding
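The abstract combines thresholding, cellular-automaton (CA) edge detection, and precision/recall evaluation. A minimal sketch of those three ideas follows; the specific CA rule (a text cell survives only if a 4-neighbour is background) and the global threshold are illustrative assumptions, not the paper's rule set.

```python
# Minimal sketch (not the paper's rule set): global thresholding, one
# cellular-automaton (CA) edge step, and pixel-level precision/recall.
from typing import List, Tuple

Grid = List[List[int]]


def binarize(gray: Grid, threshold: int) -> Grid:
    """Mark dark pixels (below threshold) as text = 1, others as background = 0."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]


def ca_edge_step(cells: Grid) -> Grid:
    """One synchronous CA update with a von Neumann neighbourhood.

    A text cell survives only if at least one of its four neighbours is
    background or lies outside the image, i.e. it sits on a stroke edge.
    """
    h, w = len(cells), len(cells[0])
    nxt = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if cells[y][x] != 1:
                continue
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if not (0 <= ny < h and 0 <= nx < w) or cells[ny][nx] == 0:
                    nxt[y][x] = 1
                    break
    return nxt


def precision_recall(pred: Grid, truth: Grid) -> Tuple[float, float]:
    """Pixel-level precision and recall of predicted text pixels."""
    pairs = [(p, t) for pr, tr in zip(pred, truth) for p, t in zip(pr, tr)]
    tp = sum(1 for p, t in pairs if p == 1 and t == 1)
    fp = sum(1 for p, t in pairs if p == 1 and t == 0)
    fn = sum(1 for p, t in pairs if p == 0 and t == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

On a dark 3x3 square, the CA step keeps the 8 border pixels and clears the interior one, so precision against the filled square is 1.0 and recall is 8/9.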
An Efficient HW/SW Design for Text Extraction from Complex Color Image
2
Authors: Mohamed Amin Ben Atitallah, Rostom Kachouri, Ahmed Ben Atitallah, Hassene Mnif. Computers, Materials & Continua (SCIE, EI), 2022, No. 6, pp. 5963-5977 (15 pages)
In the context of building an embedded system that helps visually impaired people interpret text, this paper proposes an efficient High-Level Synthesis (HLS) Hardware/Software (HW/SW) design for text extraction using the Gamma Correction Method (GCM). The GCM is a common method for extracting text from complex color images and video. The purpose of this work is to study the complexity of the GCM on the Xilinx ZCU102 FPGA board and to propose a HW implementation of its critical blocks as an Intellectual Property (IP) block using the HLS flow, taking the quality of text extraction into account. This IP is integrated and connected to the ARM Cortex-A53 as a coprocessor in a HW/SW codesign context. Experimental results show that the HLS HW/SW implementation of the GCM on the ZCU102 FPGA board reduces processing time by about 89% compared with the SW implementation, while preserving the text extraction quality of the SW implementation.
Keywords: text extraction; GCM; HW/SW codesign; FPGA; HLS flow
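The core of any gamma-correction step is the per-pixel power-law transform. The sketch below shows only that transform, precomputed as a lookup table (LUT); how the paper chooses gamma values and selects text from the corrected images is not described in the abstract and is not reproduced here.

```python
# Hypothetical sketch: the power-law (gamma) transform used in gamma
# correction, precomputed as a lookup table. Gamma selection and the
# text-selection logic of the paper's GCM are not shown.
from typing import List


def gamma_lut(gamma: float, max_val: int = 255) -> List[int]:
    """Table mapping each input level v to round(max_val * (v / max_val) ** gamma)."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return [round(max_val * (v / max_val) ** gamma) for v in range(max_val + 1)]


def gamma_correct(pixels: List[int], gamma: float) -> List[int]:
    """Apply the transform to 8-bit pixel values via a single table read each."""
    lut = gamma_lut(gamma)
    return [lut[p] for p in pixels]
```

A LUT turns the power function into one memory read per pixel, which is one reason table-based gamma is a common choice in hardware pipelines. Values of gamma above 1 darken mid-tones (128 maps to 64 at gamma 2.0); values below 1 brighten them.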
Text extraction method for historical Tibetan document images based on block projections (Cited by: 3)
3
Authors: 段立娟, 张西群, 马龙龙, 吴健. Optoelectronics Letters (EI), 2017, No. 6, pp. 457-461 (5 pages)
Text extraction is an important initial step in digitizing historical documents. In this paper, we present a text extraction method for historical Tibetan document images based on block projections. Text extraction is treated as a text-area detection and localization problem. The images are divided equally into blocks, and the blocks are filtered using the categories of their connected components and their corner point density. By analyzing the projections of the filtered blocks, the approximate text areas are located and the text regions are extracted. Experiments on a dataset of historical Tibetan documents demonstrate the effectiveness of the proposed method.
Keywords: historical Tibetan document; filtered blocks; bounding; corner; approximate; projection; coordinate
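The block-then-project pipeline can be sketched as follows. As a simplifying assumption, the block filter here uses only foreground density; the paper's filter uses connected-component categories and corner density, which are not reproduced.

```python
# Simplified sketch of block filtering followed by a projection profile.
# Assumption: block foreground density stands in for the paper's
# connected-component and corner-density filter.
from typing import List, Tuple

Grid = List[List[int]]


def filter_blocks(binary: Grid, bh: int, bw: int, min_density: float) -> Grid:
    """Zero out every bh x bw block whose foreground density is below min_density."""
    h, w = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    for top in range(0, h, bh):
        for left in range(0, w, bw):
            cells = [(y, x) for y in range(top, min(top + bh, h))
                     for x in range(left, min(left + bw, w))]
            density = sum(binary[y][x] for y, x in cells) / len(cells)
            if density < min_density:
                for y, x in cells:
                    out[y][x] = 0
    return out


def row_bands(binary: Grid, min_count: int = 1) -> List[Tuple[int, int]]:
    """Horizontal projection: runs of rows with >= min_count foreground pixels."""
    profile = [sum(row) for row in binary]
    bands, start = [], None
    for i, v in enumerate(profile):
        if v >= min_count and start is None:
            start = i
        elif v < min_count and start is not None:
            bands.append((start, i - 1))
            start = None
    if start is not None:
        bands.append((start, len(profile) - 1))
    return bands
```

Filtering first means an isolated noise pixel no longer produces its own band in the projection profile; the same idea applied to column sums gives the horizontal extent of each text area.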
Efficient Text Extraction Algorithm Using Color Clustering for Language Translation in Mobile Phone (Cited by: 2)
4
Authors: Adrián Canedo-Rodríguez, Jung Hyoun Kim, Soo-Hyung Kim, John Kelly, Jung Hee Kim, Sun Yi, Sai Kiran Veeramachaneni, Yolanda Blanco-Fernández. Journal of Signal and Information Processing, 2012, No. 2, pp. 228-237 (10 pages)
Many text extraction methodologies have been proposed, but none of them is suitable for a real system implemented on a device with low computational resources, either because its accuracy is insufficient or because it is too slow. We therefore propose a text extraction algorithm for language translation of scene text images on mobile phones that is both fast and accurate. The algorithm uses very efficient computations to calculate the Principal Color Components of a previously quantized image, decides which of them are the main foreground and background colors, and then extracts the text in the image. Compared with other algorithms using commercial OCR, ours achieves accuracy rates more than 12% higher and runs twice as fast. Our methodology is also more robust against common degradations, such as uneven illumination or blurring. We have thus developed an attractive system for accurately separating foreground and background in scene text images on devices with low computational resources.
Keywords: text extraction; color quantization; text binarization; language translation
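A rough sketch of quantize-then-pick-dominant-colors follows. It uses uniform per-channel quantization and frequency counting as a stand-in for the paper's Principal Color Components computation, and it assumes the most frequent color is background; both are assumptions for illustration.

```python
# Hypothetical sketch: uniform colour quantisation, the two most frequent
# quantised colours as background/foreground candidates, and a nearest-colour
# text mask. Assumption: the most frequent colour is background.
from collections import Counter
from typing import List, Tuple

RGB = Tuple[int, int, int]


def quantize(pixel: RGB, levels: int = 4) -> RGB:
    """Snap each 8-bit channel to the centre of one of `levels` equal bins."""
    step = 256 // levels
    return tuple(min(c // step, levels - 1) * step + step // 2 for c in pixel)


def text_mask(pixels: List[RGB], levels: int = 4) -> List[int]:
    """Label each pixel 1 (text) if its quantised colour is nearer the foreground."""
    counts = Counter(quantize(p, levels) for p in pixels)
    top = [c for c, _ in counts.most_common(2)]
    if len(top) < 2:
        return [0] * len(pixels)  # a single colour: no foreground to separate
    bg, fg = top

    def dist2(a: RGB, b: RGB) -> int:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [1 if dist2(quantize(p, levels), fg) < dist2(quantize(p, levels), bg) else 0
            for p in pixels]
```

Quantizing first collapses near-identical shades into one bin, which keeps the histogram small and makes the dominant-color choice cheap, in the spirit of the low-resource setting the abstract targets.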
A New Method to Extract Text from Natural Scenes
5
Authors: 郝峻晟, 戚飞虎, 朱凯华, 蒋人杰. Journal of Donghua University (English Edition) (EI, CAS), 2005, No. 4, pp. 52-57 (6 pages)
This paper presents a new method for text detection, location, and binarization in natural scenes. Several morphological steps are used to detect the general position of text, including English, Chinese, and Japanese characters. Next, bounding boxes are processed by a new “Expand, Break and Merge” (EBM) method to obtain the precise text areas. Finally, the text is binarized by a hybrid method based on Otsu and Niblack. This approach can extract different kinds of text from complicated natural scenes. It is insensitive to noise, distortion, and text orientation, and it performs well on text of various sizes.
Keywords: text extraction; mathematical morphology; bounding boxes; binarization
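The two classic thresholds named in the abstract are sketched below: Otsu's global threshold (maximizing between-class variance) and Niblack's local threshold T = m + k·s. How the paper combines them into a hybrid is not specified, so they are shown separately; k = -0.2 and a 3x3 window are common illustrative defaults, not values from the paper.

```python
# Sketch of the two classic thresholds named in the abstract. How the paper
# combines them into a hybrid is not specified, so each is shown separately.
import statistics
from typing import List

Grid = List[List[int]]


def otsu_threshold(values: List[int], levels: int = 256) -> int:
    """Global Otsu threshold: maximise between-class variance over the histogram.
    Pixels <= the returned value form the dark class."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w_b = sum_b = 0
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w_b += hist[t]
        if w_b == 0:
            continue
        w_f = total - w_b
        if w_f == 0:
            break
        sum_b += t * hist[t]
        m_b, m_f = sum_b / w_b, (sum_all - sum_b) / w_f
        var_between = w_b * w_f * (m_b - m_f) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def niblack(gray: Grid, radius: int = 1, k: float = -0.2) -> Grid:
    """Local Niblack threshold T = mean + k * std over a (2r+1)^2 window
    (clipped at the border); pixels darker than T are marked as text (1)."""
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            win = [gray[j][i]
                   for j in range(max(0, y - radius), min(h, y + radius + 1))
                   for i in range(max(0, x - radius), min(w, x + radius + 1))]
            t = statistics.fmean(win) + k * statistics.pstdev(win)
            out[y][x] = 1 if gray[y][x] < t else 0
    return out
```

Otsu picks one threshold for the whole region, which works when foreground and background are well separated; Niblack adapts to local statistics, which helps under uneven illumination but can amplify noise in flat areas.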
A Hybrid Method of Extractive Text Summarization Based on Deep Learning and Graph Ranking Algorithms (Cited by: 1)
6
Authors: SHI Hui, WANG Tiexin. Transactions of Nanjing University of Aeronautics and Astronautics (EI, CSCD), 2022, No. S01, pp. 158-165 (8 pages)
In the era of Big Data, we face the inevitable and challenging problem of “information overload”. To alleviate it, effective automatic text summarization techniques are needed to obtain key information quickly and efficiently from huge amounts of text. In this paper, we propose a hybrid method of extractive text summarization based on deep learning and graph ranking algorithms (ETSDG). In this method, a pre-trained deep learning model is designed to yield useful sentence embeddings. Given the associations between sentences in raw documents, a traditional LexRank algorithm with fine-tuning is adopted in ETSDG. To improve the performance of extractive text summarization, we further integrate the traditional LexRank algorithm with deep learning. Test results on the DUC2004 dataset show that ETSDG achieves better ROUGE scores than certain benchmark methods.
Keywords: extractive text summarization; deep learning; sentence embeddings; LexRank
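LexRank ranks sentences by centrality in a sentence-similarity graph. A minimal sketch over precomputed embeddings follows. In ETSDG the embeddings come from a pre-trained deep model; here they are plain float lists, and the similarity threshold and damping factor are illustrative assumptions.

```python
# Minimal LexRank sketch over precomputed sentence embeddings (ETSDG obtains
# them from a pre-trained deep model; here they are plain float lists).
# The threshold and damping values are illustrative assumptions.
import math
from typing import List


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def lexrank(embeddings: List[List[float]], threshold: float = 0.2,
            damping: float = 0.85, iters: int = 100) -> List[float]:
    """Threshold the cosine-similarity graph, then run damped power iteration."""
    n = len(embeddings)
    adj = [[1.0 if i != j and cosine(embeddings[i], embeddings[j]) >= threshold else 0.0
            for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in adj]
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [(1 - damping) / n
                  + damping * sum(scores[j] * adj[j][i] / degree[j]
                                  for j in range(n) if degree[j])
                  for i in range(n)]
    return scores


def summarize(sentences: List[str], embeddings: List[List[float]], k: int) -> List[str]:
    """Pick the k highest-scoring sentences, kept in document order."""
    scores = lexrank(embeddings)
    top = sorted(range(len(sentences)), key=lambda i: -scores[i])[:k]
    return [sentences[i] for i in sorted(top)]
```

A sentence similar to many others receives a high score, while an outlier (such as the fourth vector in a set where three vectors point the same way) keeps only the teleport share (1 - d)/n.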
A Method of Text Extremum Region Extraction Based on Joint-Channels (Cited by: 1)
7
Authors: Xueming Qiao, Weiyi Zhu, Dongjie Zhu, Liang Kong, Yingxue Xia, Chunxu Lin, Zhenhao Guo, Yiheng Sun. Journal on Artificial Intelligence, 2020, No. 1, pp. 29-37 (9 pages)
Natural scene recognition is of significant value in image retrieval, autonomous navigation, human-computer interaction, and industrial automation. First, non-text content occupies a relatively high proportion of natural scene images; second, natural scene images have cluttered backgrounds and complex variations in lighting, angle, font, and color. Therefore, efficiently extracting text extremal regions from complex and varied natural scene images plays an important role in natural scene text recognition. This paper proposes a Text extremum region Extraction algorithm based on Joint-Channels (TEJC). On the one hand, it addresses the limitation that the maximally stable extremal regions (MSER) algorithm is suited only to grayscale images and has difficulty processing color images. On the other hand, it addresses the high complexity and low accuracy of MSER when extracting the most stable extremal regions. The proposed algorithm is tested and evaluated on the ICDAR dataset, and the experimental results demonstrate its superiority.
Keywords: feature extraction; scene text detection; scene text feature extraction; extreme region
A Deep Look into Extractive Text Summarization
8
Authors: Jhonathan Quillo-Espino, Rosa María Romero-González, Ana-Marcela Herrera-Navarro. Journal of Computer and Communications, 2021, No. 6, pp. 24-37 (14 pages)
This investigation presents an approach to Extractive Automatic Text Summarization (EATS). A framework focused on single-document summarization has been developed using the TF-IDF method (Term Frequency-Inverse Document Frequency) as a reference: the document is divided into a subset of documents and a value is generated for each word in each of them. Documents whose TF-IDF is equal to or higher than the threshold are the most important; they can therefore be weighted to generate a text summary according to the user's request. This work is a derived model of text mining applications in today's world, and we demonstrate how the summarization is performed. Random values were used to check its performance. The experimental results show satisfactory and understandable summaries that are generated efficiently and quickly, highlighting the most important sentences according to the threshold selected by the user.
Keywords: text mining; preprocesses; text summarization; extractive text summarization
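The threshold-based TF-IDF approach can be sketched by treating each sentence as a "document" and keeping sentences whose score reaches a user-chosen threshold. Scoring a sentence as the sum of its terms' TF-IDF weights, and whitespace tokenization, are assumptions; the paper's exact aggregation may differ.

```python
# Sketch of threshold-based TF-IDF summarisation, treating each sentence as a
# "document". Scoring a sentence by the sum of its terms' TF-IDF weights is an
# assumption; the paper's exact aggregation may differ.
import math
from collections import Counter
from typing import List


def sentence_scores(sentences: List[str]) -> List[float]:
    docs = [s.lower().split() for s in sentences]
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))  # sentence frequency of each term
    scores = []
    for d in docs:
        tf = Counter(d)
        # term frequency normalised by sentence length, times log(n / df)
        scores.append(sum((c / len(d)) * math.log(n / df[w]) for w, c in tf.items()))
    return scores


def summarize(sentences: List[str], threshold: float) -> List[str]:
    """Keep, in order, the sentences whose score is >= the user-chosen threshold."""
    return [s for s, sc in zip(sentences, sentence_scores(sentences)) if sc >= threshold]
```

Words shared by many sentences get a small IDF, so sentences built from distinctive words score higher; raising the threshold shortens the summary.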
Drug and Vaccine Extractive Text Summarization Insights Using Fine-Tuned Transformers
9
Authors: Rajesh Bandaru, Y. Radhika. Journal of Artificial Intelligence and Technology, 2024, No. 4, pp. 351-362 (12 pages)
Text representation is a key factor in the success of text summarization techniques. Summarization using pretrained transformer models has produced encouraging results, yet the application of these models to medicine and drug discovery has not been examined to a proper extent. To address this issue, this article performs extractive summarization based on fine-tuned transformers for the drug and medical domain, and also aims to enhance sentence representation. Exploring extractive text summarization for medicine and drug discovery is challenging because datasets are limited. Hence, this research collects abstracts from PubMed for various medical and drug discovery topics, such as drugs and COVID, totaling 1,370 abstracts. Detailed experiments using BART (Bidirectional Autoregressive Transformer), T5 (Text-to-Text Transfer Transformer), LexRank, and TextRank are carried out on this dataset to perform extractive text summarization.
Keywords: BART; BERT; extractive text summarization; LexRank; TextRank
A Hybrid Query-Based Extractive Text Summarization Based on K-Means and Latent Dirichlet Allocation Techniques
10
Authors: Sohail Muhammad, Muzammil Khan, Sarwar Shah Khan. Journal on Artificial Intelligence, 2024, No. 1, pp. 193-209 (17 pages)
Retrieving information from evolving digital data collections in response to a user's query is essential and requires efficient retrieval mechanisms that reduce the time needed to search such massive collections. Scanning and analyzing every document to retrieve the most relevant textual item for a query is time-consuming at scale and calls for sophisticated techniques, and retrieving results both accurately and quickly from a large collection remains challenging. Text summarization is a dominant research field in information retrieval and text processing for locating the most appropriate data objects, as single or multiple documents, in a collection. Machine learning and knowledge-based techniques are the two families of query-based extractive text summarization techniques in Natural Language Processing (NLP) that can be used for precise retrieval and are considered the best options. NLP uses both supervised and unsupervised machine learning approaches to calculate probabilistic features. This study proposes a hybrid approach for query-based extractive text summarization, with the TextRank algorithm as the core of the implementation. Query-based summarization of multiple documents using this hybrid approach, which combines K-Means clustering with Latent Dirichlet Allocation (LDA) for topic modeling, achieves a precision of 0.288, a recall of 0.631, and an F-score of 0.328. The results show that the proposed hybrid approach outperforms both the independent graph-based approach and the sentence- and word-frequency-based approach.
Keywords: extractive text summarization; machine learning; natural language processing; K-means; latent Dirichlet allocation
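The K-Means component of the hybrid can be illustrated with a plain implementation that groups sentence vectors before per-cluster ranking. Initializing with the first k points is a simplification chosen for determinism; the paper's initialization, vectorization, and LDA/TextRank stages are not reproduced.

```python
# Plain K-Means sketch for grouping sentence vectors before per-cluster
# ranking. Initialising with the first k points is a simplification
# (k-means++ or random restarts are more common in practice).
from typing import List, Sequence, Tuple


def dist2(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Sequence[float]], k: int,
           iters: int = 20) -> Tuple[List[int], List[List[float]]]:
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min(range(k), key=lambda c: dist2(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels, centroids
```

In a query-based setting, one would typically cluster sentence vectors, then rank sentences within the clusters most related to the query; that downstream step is left out here.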
Performance Analysis of Optimized Content Extraction for Cyrillic Mongolian Learning Text Materials in the Database
11
Authors: Bat-Erdene Nyandag, Ru Li, G. Indruska. Journal of Computer and Communications, 2016, No. 10, pp. 79-89 (11 pages)
This paper develops and tests an optimized content extraction algorithm for learning documents in a distance learning system database, using NLP methods, TF-IDF for word weighting, the vector space model (VSM) for information search, and the cosine measure for similarity calculation. The tests cover the following: 1) parsing the word structure of the distance learning system database documents and of Cyrillic Mongolian documents by section, and forming new documents with an algorithm that identifies word stems; 2) testing optimized content extraction from text materials based on e-test results in the distance learning system (key words, correct answers, base forms with affixes, and new forms built from word stems without affixes), and searching for key words selected automatically by the word extraction algorithm; 3) testing Boolean and probabilistic retrieval methods through an extended vector space retrieval method. This chapter covers the document content extraction and retrieval algorithm and proposes recommending queries through word stems, independent of word position, based on the distinctive features of Cyrillic Mongolian documents.
Keywords: Cyrillic Mongolian language; content extraction; formatting; learning text materials; style
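The TF-IDF, VSM, and cosine combination described above can be sketched as a small ranked-retrieval function. The whitespace tokenizer and the "+1" IDF smoothing are illustrative assumptions; the paper's stemming for Cyrillic Mongolian is not reproduced.

```python
# Sketch of TF-IDF vector-space retrieval ranked by cosine similarity.
# The whitespace tokenizer and the "+1" IDF smoothing are illustrative
# assumptions; the paper's stemming for Cyrillic Mongolian is not reproduced.
import math
from collections import Counter
from typing import Dict, List

Vector = Dict[str, float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank(query: str, docs: List[str]) -> List[int]:
    """Return document indices sorted from most to least similar to the query."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log(n / c) + 1.0 for t, c in df.items()}
    vecs = [{t: c * idf[t] for t, c in Counter(toks).items()} for toks in tokenized]
    # Query terms unseen in the collection carry no weight and are dropped.
    q = {t: c * idf[t] for t, c in Counter(query.lower().split()).items() if t in idf}
    sims = [cosine(q, v) for v in vecs]
    return sorted(range(n), key=lambda i: -sims[i])
```

Replacing each token with its stem before building the vectors is where a stemmer would plug in, so that inflected forms of the same word match the query.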
Mathematical Expression Extraction in Text Fields of Documents Based on HMM
12
Authors: Xuedong Tian, Ruihan Bai, Fang Yang, Jinyuan Bai, Xinfu Li. Journal of Computer and Communications, 2017, No. 14, pp. 1-13 (13 pages)
Because mathematical expressions in unstructured text fields of documents are hard to extract automatically, rapidly, and effectively, a method based on the Hidden Markov Model (HMM) is proposed. First, the HMM is trained using the symbol combination features of mathematical expressions. Then, preprocessing steps such as removing labels and filtering words are carried out. Finally, the preprocessed text is converted into an observation sequence that serves as input to the HMM, which determines which parts are mathematical expressions and extracts them. Experimental results show that the proposed method extracts mathematical expressions from the text fields of documents effectively, with relatively high accuracy and recall.
Keywords: mathematical expression extraction; hidden Markov model; text fields; documents; symbol combination features
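Decoding an observation sequence with an HMM is done with the Viterbi algorithm. The toy model below has two hidden states (TEXT and MATH) and three observation classes; the states, token classes, and all probabilities are hypothetical, whereas the paper trains its HMM on symbol combination features.

```python
# Toy two-state HMM (TEXT vs MATH) decoded with the Viterbi algorithm in log
# space. Token classes, states and all probabilities are hypothetical; the
# paper trains its HMM on symbol-combination features instead.
import math
from typing import Dict, List

STATES = ("TEXT", "MATH")
START = {"TEXT": 0.8, "MATH": 0.2}
TRANS = {"TEXT": {"TEXT": 0.8, "MATH": 0.2}, "MATH": {"TEXT": 0.3, "MATH": 0.7}}
EMIT = {"TEXT": {"word": 0.8, "symbol": 0.1, "digit": 0.1},
        "MATH": {"word": 0.1, "symbol": 0.4, "digit": 0.5}}


def token_class(tok: str) -> str:
    if tok.isdigit():
        return "digit"
    return "word" if tok.isalpha() else "symbol"


def viterbi(obs: List[str]) -> List[str]:
    # best[s] = (log-probability of the best path ending in s, that path)
    best: Dict[str, tuple] = {s: (math.log(START[s] * EMIT[s][obs[0]]), [s]) for s in STATES}
    for o in obs[1:]:
        step = {}
        for s in STATES:
            lp, path = max(((best[p][0] + math.log(TRANS[p][s]), best[p][1]) for p in STATES),
                           key=lambda t: t[0])
            step[s] = (lp + math.log(EMIT[s][o]), path + [s])
        best = step
    return max(best.values(), key=lambda t: t[0])[1]


def extract_math(tokens: List[str]) -> List[str]:
    """Join maximal runs of tokens decoded as MATH."""
    labels = viterbi([token_class(t) for t in tokens])
    spans, cur = [], []
    for tok, lab in zip(tokens, labels):
        if lab == "MATH":
            cur.append(tok)
        elif cur:
            spans.append(" ".join(cur))
            cur = []
    if cur:
        spans.append(" ".join(cur))
    return spans
```

Working in log space avoids numeric underflow on long sequences. With these parameters, the tokens `we get 2 + 3 here` decode as TEXT TEXT MATH MATH MATH TEXT, so the extracted expression is `2 + 3`.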
Keyword Extraction from the Chinese Text of Research Papers Based on Semantic Features and the TextRank Algorithm
13
Authors: 张世超, 王建宾, 孟浩. South China Journal of Seismology (华南地震), 2025, No. 3, pp. 188-194 (7 pages)
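To extract keywords from the Chinese text of research papers accurately and rank them correctly, this work studies a keyword extraction method based on semantic features and the TextRank algorithm. In the semantic-feature-based candidate screening step, the Word2Vec tool converts the Chinese text into word vectors, which serve as the semantic features of the text; these features are fed into a convolutional neural network, which classifies them and keeps those belonging to the candidate-keyword class, and the corresponding words become candidate keywords. In the TextRank-based extraction step, three features of each candidate (average information entropy, part of speech, and position) are used as extraction indicators to build a graph model; the combined weight of each candidate is computed, candidates are sorted in descending order of weight, and the top-ranked candidates are taken as the final keywords. Tests show that the method improves the precision of keyword extraction and the accuracy of keyword ranking for the Chinese text of research papers.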
Keywords: semantic features; TextRank algorithm; research papers; Chinese text; keyword extraction; convolutional neural network
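The graph-ranking stage of TextRank keyword extraction can be sketched over a co-occurrence window as follows. It assumes `tokens` are already-filtered candidate words; the paper's CNN candidate filter and its entropy, part-of-speech, and position weights are not reproduced, and window and damping values are illustrative.

```python
# Sketch of the graph-ranking stage of TextRank keyword extraction over a
# co-occurrence window. Assumption: `tokens` are already-filtered candidate
# words; the paper's CNN candidate filter and its entropy, part-of-speech and
# position weights are not reproduced.
from collections import defaultdict
from typing import Dict, List, Set


def textrank_keywords(tokens: List[str], window: int = 2, damping: float = 0.85,
                      iters: int = 50, top_k: int = 5) -> List[str]:
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for i, w in enumerate(tokens):
        # Link w to every other word within the next (window - 1) positions.
        for j in range(i + 1, min(i + window, len(tokens))):
            if tokens[j] != w:
                neighbours[w].add(tokens[j])
                neighbours[tokens[j]].add(w)
    nodes = list(neighbours)
    score = {w: 1.0 for w in nodes}
    for _ in range(iters):
        score = {w: (1 - damping) + damping * sum(score[v] / len(neighbours[v])
                                                  for v in neighbours[w])
                 for w in nodes}
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(nodes, key=lambda w: (-score[w], w))[:top_k]
```

A word that co-occurs with many distinct neighbours becomes a hub and rises to the top; feature weights such as those in the abstract would typically enter as edge weights or initial scores.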
Automatic Extraction of Latent Topics Based on Unsupervised Text Features
14
Author: 包永红. Modern Electronics Technique (现代电子技术) (PKU Core), 2026, No. 4, pp. 42-46 (5 pages)
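Text data contain rich information, but this information is often latent and hard to observe or understand directly. Traditional supervised learning methods require large amounts of manually annotated data to train models and are susceptible to annotator subjectivity. To address this problem, an automatic latent-topic extraction method based on unsupervised text features is proposed. The text is segmented with the bidirectional maximum matching method and stop words are removed to complete preprocessing; the unsupervised TF-IDF algorithm then extracts features from the preprocessed text, converting it into numerical feature vectors to build a set of word feature vectors; finally, an LDA model is introduced to extract latent topics automatically, that is, to model the probability distribution of latent topics over the words in the feature vectors, with Gibbs fast sampling used to obtain the model hyperparameters. The resulting latent-topic probability distribution is used to extract the latent topics of the text. Experimental results show that the method achieves an F1 score above 0.93 and a perplexity below 0.6, extracting the latent topics of text accurately.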
Keywords: latent topics; automatic extraction; text features; unsupervised TF-IDF algorithm; LDA model; Gibbs fast sampling
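The preprocessing step, bidirectional maximum matching (BiMM) segmentation followed by stop-word removal, can be sketched as below. The tie-break rule (fewer words, then fewer single-character words, otherwise prefer the backward result) is a common heuristic assumed here rather than taken from the paper, and the toy vocabulary is hypothetical.

```python
# Sketch of bidirectional maximum matching (BiMM) word segmentation followed
# by stop-word removal. The tie-break rule (fewer words, then fewer
# single-character words, otherwise prefer the backward result) is a common
# heuristic, assumed here rather than taken from the paper.
from typing import List, Set


def forward_mm(text: str, vocab: Set[str], max_len: int) -> List[str]:
    i, out = 0, []
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:  # fall back to one character
                out.append(piece)
                i += size
                break
    return out


def backward_mm(text: str, vocab: Set[str], max_len: int) -> List[str]:
    j, out = len(text), []
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                out.insert(0, piece)
                j -= size
                break
    return out


def bimm(text: str, vocab: Set[str], stopwords: Set[str] = frozenset()) -> List[str]:
    max_len = max((len(w) for w in vocab), default=1)
    fwd, bwd = forward_mm(text, vocab, max_len), backward_mm(text, vocab, max_len)

    def cost(seg: List[str]) -> tuple:
        return (len(seg), sum(1 for w in seg if len(w) == 1))

    best = fwd if cost(fwd) < cost(bwd) else bwd
    return [w for w in best if w not in stopwords]
```

On the classic ambiguous string 研究生命起源 with a vocabulary containing 研究, 研究生, 生命 and 起源, forward matching greedily takes 研究生 and is left with the single character 命, while backward matching yields 研究 / 生命 / 起源, which the tie-break prefers.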
An Automatic Summarization Method Based on an Improved TextRank (Cited by: 43)
15
Authors: 余珊珊, 苏锦钿, 李鹏飞. Computer Science (计算机科学) (CSCD, PKU Core), 2016, No. 6, pp. 240-247 (8 pages)
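When extracting automatic summaries, the classic TextRank algorithm usually considers only the similarity between sentence nodes and ignores the document's discourse structure and the context of each sentence. To address these issues, and drawing on the structural characteristics of Chinese text, an improved algorithm, iTextRank, is proposed. It incorporates information such as the title, paragraphs, special sentences, and sentence position and length into the construction of the TextRank graph, gives an improved sentence-similarity measure and weight-adjustment factors, and applies them to automatic summarization of Chinese text; the time complexity of the algorithm is also analyzed. Finally, experiments show that iTextRank achieves higher precision and lower recall than the classic TextRank method.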
Keywords: Chinese text; automatic summarization; TextRank; discourse structure; unsupervised learning method
A Relation-Oriented Method for Joint Extraction of Entities and Relations from Power Equipment Fault and Defect Texts
16
Authors: 李艾青, 宋辉, 田嘉鹏, 盛戈皞, 江秀臣. High Voltage Apparatus (高压电器) (PKU Core), 2026, No. 2, pp. 42-49, 70 (9 pages)
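Knowledge graphs of power equipment faults and defects can effectively raise the level of intelligence and automation in equipment operation and maintenance, and the extraction of entities and relations is crucial to building such graphs. However, the entity-relation triples in fault and defect texts often overlap or are nested, which traditional methods handle poorly and which leads to problems such as error propagation and redundant entity inference. To address these problems, this paper proposes a joint entity and relation extraction method for the power equipment fault and defect domain. The method models triple extraction as mapping head entities to tail entities under each relation type: it first extracts the head entities and then, for each identified head entity, tags the corresponding tail entities separately for every relation, which effectively mitigates triple overlap, nesting, and redundant inference. Experiments show that, compared with baseline models, the proposed method is more robust when triples overlap or nest to varying degrees, improving the F1 score by 8.57% to 25.19% and verifying the effectiveness and feasibility of the proposed model.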
Keywords: power equipment; fault and defect text; knowledge graph; knowledge extraction; deep learning
Keyword Extraction from Scientific and Technological Texts Based on an Improved TextRank (Cited by: 6)
17
Authors: 杨冬菊, 胡成富. Journal of Computer Applications (计算机应用) (CSCD, PKU Core), 2024, No. 6, pp. 1720-1726 (7 pages)
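In keyword extraction from scientific and technological texts, existing methods perform poorly at extracting words that occur rarely but express the main idea of the text well. To address this, a keyword extraction method based on an improved TextRank is proposed. First, the term frequency-inverse document frequency (TF-IDF) statistics and position features of words are used to optimize the transition probability matrix between words in the co-occurrence graph, and initial word scores are obtained by iterative computation. Next, the K-Core decomposition algorithm mines K-Core subgraphs to obtain a hierarchy feature for each word, and an average information entropy feature measures each word's ability to represent the topic. Finally, the hierarchy and average-entropy features are fused with the initial scores to determine the keywords. On a public dataset, compared with TextRank and OTextRank (Optimized TextRank), the proposed method improves the mean F1 score across experiments with different numbers of extracted keywords by 6.5 and 3.3 percentage points, respectively; on a science and technology service project dataset, the corresponding improvements are 7.4 and 3.2 percentage points. The results verify that the method is effective at extracting keywords that occur infrequently but express the main idea of the text well.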
Keywords: scientific and technological text; keyword extraction; TextRank; K-Core graph; average information entropy
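The K-Core "hierarchy feature" can be computed by repeatedly peeling the minimum-degree node from the co-occurrence graph. The sketch below is a straightforward O(n^2) version chosen for clarity; how the paper fuses core numbers with its other scores is not reproduced.

```python
# Sketch of K-Core decomposition by repeated peeling of the minimum-degree
# node; the resulting core number can serve as a word's "hierarchy" feature
# in a co-occurrence graph. This O(n^2) version favours clarity over speed.
from typing import Dict, Hashable, Set


def core_numbers(adj: Dict[Hashable, Set[Hashable]]) -> Dict[Hashable, int]:
    """adj must be an undirected adjacency map (v in adj[u] iff u in adj[v])."""
    degree = {v: len(ns) for v, ns in adj.items()}
    remaining = set(adj)
    core: Dict[Hashable, int] = {}
    k = 0
    while remaining:
        v = min(remaining, key=lambda u: degree[u])
        k = max(k, degree[v])  # the running k never decreases during peeling
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core
```

In a graph made of a triangle a-b-c plus a pendant node d attached to a, peeling removes d first (core 1), after which a, b and c all have degree 2 and form the 2-core.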
A TextRank-Based Two-Stage Text Clustering Algorithm (Cited by: 3)
18
Authors: 潘晓英, 胡开开, 朱静. Computer Technology and Development (计算机技术与发展), 2016, No. 8, pp. 7-11 (5 pages)
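Traditional text clustering techniques suffer from mediocre clustering accuracy or excessive computational time complexity. This paper first introduces two commonly used text clustering techniques: partition-based K-means and topic-model-based LDA. After analyzing the shortcomings of each, it proposes a TextRank-based two-stage text clustering algorithm. Borrowing the idea of topic models, the algorithm introduces word clustering into the traditional clustering process and fuses the position and span features of words in the keyword extraction stage, reducing the error caused by treating local keywords as global keywords. Experimental results show that the improved algorithm clusters better than traditional VSM clustering and the topic-model-based LDA algorithm.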
Keywords: text clustering; TextRank; keyword extraction; vector space model; LDA
Research on Abstract Identification and Generation for Scientific Literature Based on the SciBERT Model
19
Authors: 韩淑梅, 郭航旭, 蒙杰, 赵昕晖. Gansu Science and Technology (甘肃科技), 2026, No. 1, pp. 64-72 (9 pages)
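To improve the efficiency of processing scientific literature and the accuracy of information extraction, this article uses the SciBERT model for representation learning on scientific literature and builds a text classification model that performs deep semantic understanding, text encoding, and feature extraction. It proposes a SciBERT-based method for identifying and generating abstracts of scientific literature, which identifies abstract paragraphs in scientific papers and automatically generates concise summaries, helping researchers understand a paper's content more quickly and accurately. The work offers new ideas and methods for further development of scientific literature information processing, and useful insights and practical experience for the development of abstract identification techniques.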
Keywords: scientific literature summarization; SciBERT model; natural language processing; text feature extraction
Research on a Multi-Feature TextRank Method for Keyword Extraction from Tibetan Text (Cited by: 4)
20
Author: 艾金勇. Information Research (情报探索), 2020, No. 7, pp. 1-6 (6 pages)
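[Purpose/Significance] To provide a reference for improving keyword extraction from Tibetan text. [Method/Process] The characteristics and shortcomings of keyword extraction methods for Chinese and English text are analyzed, and, targeting the characteristics of Tibetan text, a multi-feature TextRank keyword extraction method is proposed. Relatively optimal weight coefficients for the different features are obtained experimentally, and the resulting weight formula is applied to compute TextRank's initial weights and transition probabilities. [Result/Conclusion] By fusing factors that influence keyword extraction, such as the structural features of Tibetan text and the grammatical relations between words, the method assigns quantified weights to candidate keywords and clearly improves keyword extraction over traditional methods; the results also show that fusing structural and grammatical features effectively improves the performance of the TextRank algorithm.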
Keywords: multiple features; TextRank; Tibetan text; keyword extraction
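The method plugs a feature-weight formula into TextRank's initial weights and transition probabilities. One common way to do both is a weighted, personalized PageRank, sketched below. The node `prior` (combined feature score per word) and the edge weights (e.g. grammatical or structural link strength) are hypothetical inputs standing in for the paper's fitted formula.

```python
# Sketch of a feature-weighted TextRank: node priors set the initial scores
# and the teleport distribution, and edge weights set transition
# probabilities. `prior` and the edge weights are hypothetical inputs standing
# in for the paper's fitted feature-weight formula.
from typing import Dict


def weighted_textrank(edges: Dict[str, Dict[str, float]], prior: Dict[str, float],
                      damping: float = 0.85, iters: int = 100) -> Dict[str, float]:
    """`edges` must be symmetric: edges[u][v] == edges[v][u] > 0."""
    nodes = list(edges)
    total = sum(prior[v] for v in nodes)
    p = {v: prior[v] / total for v in nodes}  # normalised prior weights
    out_weight = {v: sum(edges[v].values()) for v in nodes}
    score = dict(p)
    for _ in range(iters):
        # Transition u -> v with probability edges[u][v] / out_weight[u].
        score = {v: (1 - damping) * p[v]
                 + damping * sum(score[u] * edges[u][v] / out_weight[u] for u in edges[v])
                 for v in nodes}
    return score
```

On a path a-b-c with uniform priors, the middle word b ranks highest; raising c's prior lifts c above its mirror-image node a, which is how feature weights can promote words the bare graph structure would not.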