286 articles found
1. Chinese multi-document personal name disambiguation (cited by: 8)
Authors: Wang Houfeng (王厚峰), Mei Zheng. High Technology Letters (EI, CAS), 2005, No. 3, pp. 280-283 (4 pages)
This paper presents a new approach to determining whether a personal name of interest that appears across documents refers to the same entity. First, three vectors are formed for each text: a personal-name Boolean vector denoting whether a personal name occurs in the text, a biographical-word Boolean vector representing title, occupation and so forth, and a real-valued feature vector. Then, by combining a heuristic strategy based on the Boolean vectors with an agglomerative clustering algorithm based on the feature vectors, the approach seeks to resolve multi-document personal name coreference. Experimental results on the "Wang Gang" corpus show that this approach achieves good performance.
Keywords: personal name disambiguation; Chinese multi-document; heuristic strategy; agglomerative clustering
2. Using AdaBoost Meta-Learning Algorithm for Medical News Multi-Document Summarization (cited by: 1)
Authors: Mahdi Gholami Mehr. Intelligent Information Management, 2013, No. 6, pp. 182-190 (9 pages)
Automatic text summarization involves reducing a text document or a larger corpus of multiple documents to a short set of sentences or paragraphs that convey the main meaning of the text. In this paper, we discuss multi-document summarization, which differs from single-document summarization in that the issues of compression, speed, redundancy and passage selection are critical to forming useful summaries. Because the number and variety of online medical news items make it difficult for experts in the medical field to read all of them, automatic multi-document summarization can make information on the web easier to study. Hence we propose a new summarization approach based on a machine learning meta-learner algorithm called AdaBoost. We treat a document as a set of sentences, and the learning algorithm must learn to classify sentences as positive or negative examples based on their scores. For this learning task, we apply the AdaBoost meta-learning algorithm with a C4.5 decision tree as the base learner. In our experiment, we use 450 news items downloaded from different medical websites. We then compare our results with some existing approaches.
Keywords: multi-document summarization; machine learning; decision trees; AdaBoost; C4.5; medical document summarization
3. Density peaks clustering based integrate framework for multi-document summarization (cited by: 2)
Authors: Baoyan Wang, Jian Zhang, Yi Liu, Yuexian Zou. CAAI Transactions on Intelligence Technology, 2017, No. 1, pp. 26-30 (5 pages)
We present a novel unsupervised integrated score framework that generates generic extractive multi-document summaries by ranking sentences with a dynamic programming (DP) strategy. Because cluster-based methods proposed by other researchers tend to ignore the informativeness of words when they generate summaries, our proposed framework jointly considers the relevance, diversity, informativeness and length constraint of sentences. We apply Density Peaks Clustering (DPC) to obtain relevance scores and diversity scores of sentences simultaneously. Our framework produces the best performance on DUC2004, with a ROUGE-1 score of 0.396, a ROUGE-2 score of 0.094 and a ROUGE-SU4 score of 0.143, which outperforms a series of popular baselines such as DUC Best, FGB [7], and BSTM [10].
Keywords: multi-document summarization; integrated score framework; density peaks clustering; sentence ranking
4. Constructing a taxonomy to support multi-document summarization of dissertation abstracts
Authors: KHOO Christopher S.G., GOH Dion H. Journal of Zhejiang University-Science A (Applied Physics & Engineering) (SCIE, EI, CAS, CSCD), 2005, No. 11, pp. 1258-1267 (10 pages)
This paper reports part of a study to develop a method for automatic multi-document summarization. The current focus is on dissertation abstracts in the field of sociology. The summarization method uses macro-level and micro-level discourse structure to identify important information that can be extracted from dissertation abstracts, and then uses a variable-based framework to integrate and organize the extracted information across dissertation abstracts. This framework has a hierarchical structure and focuses on the research concepts and research relationships found in sociology dissertation abstracts. A taxonomy is constructed to support the summarization process in two ways: (1) helping to identify important concepts and relations expressed in the text, and (2) providing a structure for linking similar concepts in different abstracts. This paper describes the variable-based framework and the summarization process, and then reports the construction of the taxonomy that supports the summarization process. An example shows how to use the constructed taxonomy to identify important concepts and integrate the concepts extracted from different abstracts.
Keywords: text summarization; automatic multi-document summarization; variable-based framework; digital library
5. Automatic Multi-Document Summarization Based on Keyword Density and Sentence-Word Graphs
Authors: YE Feiyue, XU Xinchen. Journal of Shanghai Jiaotong University (Science) (EI), 2018, No. 4, pp. 584-592 (9 pages)
As a fundamental and effective tool for document understanding and organization, multi-document summarization enables better information services by creating concise and informative reports for large collections of documents. In this paper, we propose a two-layer sentence-word graph algorithm combined with keyword density to generate multi-document summaries, known as Graph & Keywordρ. Traditional graph methods for multi-document summarization consider the influence of sentences and words only across all documents rather than within individual documents. Therefore, we construct multiple word graphs and extract the right keywords in each document to modify the sentence graph and to improve the significance and richness of the summary. Meanwhile, because words differ in importance across documents, we propose to use keyword density so that summaries provide rich content while using a small number of words. The experimental results show that the Graph & Keywordρ method outperforms state-of-the-art systems when tested on the DUC2004 data set.
Keywords: multi-document; graph algorithm; keyword density; Graph & Keywordρ; DUC2004
6. Research on multi-document summarization based on latent semantic indexing
Authors: Qin Bing (秦兵), Liu Ting (刘挺), Zhang Yu (张宇), Li Sheng (李生). Journal of Harbin Institute of Technology (New Series) (EI, CAS), 2005, No. 1, pp. 91-94 (4 pages)
A multi-document summarization method based on Latent Semantic Indexing (LSI) is proposed. The method combines several reports on the same issue into a matrix of terms and sentences, uses Singular Value Decomposition (SVD) to reduce the dimension of the matrix and extract features, and then computes sentence similarity. The sentences are clustered according to their similarity, and the centroid sentence is selected from each cluster. Finally, the selected sentences are ordered to generate the summary. The evaluation and results are presented, which show that the proposed method is efficient.
Keywords: multi-document summarization; LSI (latent semantic indexing); clustering
7. Two-Stage Sentence Selection Approach for Multi-Document Summarization
Authors: Zhang Shu, Zhao Tiejun, Zheng Dequan, Zhao Hua. Journal of Electronics (China), 2008, No. 4, pp. 562-567 (6 pages)
In contrast to the traditional method of adding sentences to build a summary in multi-document summarization, a two-stage sentence selection approach is proposed that generates the summary by deleting sentences from a candidate sentence set. The two stages are the acquisition of a candidate sentence set and the optimum selection of sentences. In the first stage, the candidate sentence set is obtained by a redundancy-based sentence selection approach. In the second stage, sentences are deleted from the candidate sentence set according to their contribution to the whole set until the specified summary length is reached. On a test corpus, the ROUGE values of the summaries produced by the proposed approach demonstrate its validity compared with the traditional method of sentence selection. The influence of the token chosen in the two-stage sentence selection approach on the quality of the generated summaries is also analyzed.
Keywords: two-stage; sentence selection approach; multi-document summarization
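The deletion-based selection described in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the Jaccard threshold `max_sim`, the whitespace tokenizer, and the definition of a sentence's "contribution" as the number of words that only it supplies are all assumptions made for illustration, and every function name here is hypothetical.

```python
def tokens(sentence):
    """Lowercased whitespace tokens as a set (an assumed, simplistic tokenizer)."""
    return set(sentence.lower().split())


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_set(sentences, max_sim=0.5):
    """Stage 1 (assumed form): keep a sentence only if it is not too
    similar to any sentence already kept."""
    chosen = []
    for s in sentences:
        if all(jaccard(tokens(s), tokens(c)) < max_sim for c in chosen):
            chosen.append(s)
    return chosen


def contribution(index, sentences):
    """Assumed contribution measure: words that only this sentence supplies."""
    others = set()
    for j, s in enumerate(sentences):
        if j != index:
            others |= tokens(s)
    return len(tokens(sentences[index]) - others)


def prune_to_length(sentences, max_words):
    """Stage 2: delete the lowest-contribution sentence until the
    summary fits within max_words."""
    kept = list(sentences)
    while kept and sum(len(s.split()) for s in kept) > max_words:
        weakest = min(range(len(kept)), key=lambda i: contribution(i, kept))
        del kept[weakest]
    return kept


docs = [
    "the cat sat on the mat",
    "the cat sat on a mat",
    "dogs bark loudly at night",
    "the cat",
]
candidates = candidate_set(docs)
summary = prune_to_length(candidates, max_words=11)
```

In this toy run the near-duplicate second sentence is dropped in stage 1, and stage 2 deletes "the cat" because it adds no words the other candidates lack.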
8. Multi-Document Summarization Model Based on Integer Linear Programming
Authors: Rasim Alguliev, Ramiz Aliguliyev, Makrufa Hajirahimova. Intelligent Control and Automation, 2010, No. 2, pp. 105-111 (7 pages)
This paper proposes an extractive generic text summarization model that generates summaries by selecting sentences according to their scores. Sentence scores are calculated from how extensively the sentences cover the main content of the text, and summaries are created by extracting the highest-scored sentences from the original document. The model is formalized as a multiobjective integer programming problem. An advantage of this model is that it can cover the main content of the source(s) with less redundancy in the generated summaries. To extract sentences that form a summary with extensive coverage of the main content and less redundancy, the model uses the similarity of sentences to the original document and the similarity between sentences. Performance is evaluated by comparing summarization outputs with the manual summaries of the DUC2004 dataset. Experiments showed that the proposed approach outperforms the related methods.
Keywords: multi-document summarization; content coverage; less redundancy; integer linear programming
9. Unsupervised Graph-Based Tibetan Multi-Document Summarization
Authors: Xiaodong Yan, Yiqin Wang, Wei Song, Xiaobing Zhao, A. Run, Yang Yanxing. Computers, Materials & Continua (SCIE, EI), 2022, No. 10, pp. 1769-1781 (13 pages)
Text summarization creates a subset that represents the most important or relevant information in the original content, which effectively reduces information redundancy. Recently, neural network methods have achieved good results in text summarization for both Chinese and English, but research on text summarization for low-resource languages, especially Tibetan, is still at an exploratory stage. Moreover, there is no large-scale annotated corpus for text summarization, and the lack of datasets severely limits the development of low-resource text summarization. In this case, unsupervised learning approaches are more appealing for low-resource languages because they do not require labeled data. In this paper, we propose an unsupervised graph-based Tibetan multi-document summarization method, which divides a large number of Tibetan news documents into topics and extracts a summary of each topic. Summaries obtained with traditional graph-based methods have high redundancy, and the division of documents into topics is not detailed enough. For topic division, we adopt a two-level clustering method that converts the original documents into document-level and sentence-level graphs; we then take both linguistic and deep representations into account and integrate an external corpus into the graph to obtain sentence semantic clusters. This addresses the shortcomings of the traditional K-Means clustering method and clusters documents in more detail. We then model the sentence clusters as graphs and finally re-score sentence nodes based on topic semantic information and the impact of topic features on sentences, so that summaries with higher topic relevance are extracted. To promote the development of Tibetan text summarization and to meet researchers' need for high-quality Tibetan text summarization datasets, this paper manually constructs a Tibetan summarization dataset and carries out experiments on it. The experimental results show that our method can effectively improve the quality of summarization and is competitive with previous unsupervised methods.
Keywords: multi-document summarization; text clustering; topic feature fusion; graphic model
10. BHLM: Bayesian theory-based hybrid learning model for multi-document summarization
Authors: S. Suneetha, A. Venugopal Reddy. International Journal of Modeling, Simulation, and Scientific Computing (EI), 2018, No. 2, pp. 229-250 (22 pages)
To understand and organize documents efficiently, multi-document summarization has become a prominent technique on the Internet. Because a large amount of information is available, documents must be summarized to obtain condensed information. To perform multi-document summarization, this paper proposes a new Bayesian theory-based Hybrid Learning Model (BHLM). Initially, the input documents are preprocessed by removing stop words. Then, sentence features are extracted to determine the sentence scores used for summarization. The extracted features are fed into the hybrid learning model for learning. Subsequently, the learned features, training error and correlation coefficient are integrated with the Bayesian model to develop BHLM. The proposed method also assigns class labels with the help of mean, variance and probability measures. Finally, the sentences are sorted according to their class labels to generate the final multi-document summary. The experiments are carried out in MATLAB, and performance is analyzed using precision, recall, F-measure and ROUGE-1. The proposed model attains 99.6% precision and a ROUGE-1 score of 75%, which shows that the model can produce the final summary efficiently.
Keywords: multi-document; text feature; sentence score; hybrid learning model; Bayesian theory
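11. Application of Multi-Directional Stacked Memory Networks to Tampering Detection in Identity Document Images (多向堆叠记忆网络在证件图像篡改检测中的应用) (cited by: 1)
Authors: Zhao Weidong (赵卫东), Huang Jian (黄见), Zhang Rui (张睿), Wu Qianyi (吴乾奕). 《小型微型计算机系统》 (Journal of Chinese Computer Systems; PKU Core), 2025, No. 2, pp. 346-352 (7 pages)
With the rapid growth of online financial services, tampered image information appears frequently in risk control. However, existing tampering detection models still need to improve their accuracy on identity document images and their robustness to environmental interference. To address this, the paper proposes a two-stage tampering detection model. In the first stage, a simple stacked long short-term memory network is extended into a multi-directional stacked memory network, which remedies the limitation of comparing tampering features along a single direction and also takes the positional information of the image into account, thereby improving tampering discrimination accuracy. In the second stage, after the tampered regions have been preliminarily located, an attention-based mechanism infers the texture feature values of the central region from the texture features of multiple layers of neighborhoods surrounding the tampered region; these are then compared with the original texture feature values of the central region to filter out false-positive regions. Experiments show that the proposed improvements are effective.
Keywords: tampering detection; identity document images; multi-directional stacked memory network; multi-neighborhood texture features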
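12. Developing Ethnic Minority Ancient Documents with an Interlinear Annotation Strategy: The Case of Interlinear Annotation of Ancient Tibetan Documents (基于隔行对照标注策略的少数民族古文献开发研究——以藏文古文献隔行标注为例)
Authors: Long Congjun (龙从军), An Bo (安波), Zhao Weina (赵维纳). 《中文信息学报》 (Journal of Chinese Information Processing; PKU Core), 2025, No. 3, pp. 49-58 (10 pages)
Ancient books of ethnic minorities are an important part of China's ancient literature and an indispensable achievement of Chinese civilization. However, because the scripts and languages are difficult to read, the research teams engaged in collating, mining and exploiting these books are small and lack technical capacity, and the books are insufficiently used and disseminated. This paper therefore proposes an interlinear annotation strategy for ethnic ancient books, which aims to ease the difficulty of reading the scripts to some extent, encourage more interdisciplinary researchers to study these documents, and make their development more efficient. Taking ancient Tibetan documents as an example, the paper explores the interlinear annotation strategy and, given a manually annotated corpus of a certain size, proposes an interlinear annotation method based on multi-task learning. The method speeds up interlinear annotation, reduces manual annotation effort, and facilitates building large-scale interlinear databases. Experiments show that, after training on 10,000 annotated examples, the model achieves F1 scores of 70.9% and 63.2% on the word-segmentation line and the annotation line respectively, and a BLEU score of 18.7% on the translation line. The interlinear-annotation-based approach significantly broadens and deepens research on ethnic ancient documents, avoids the limitations imposed by the minority languages themselves, and contributes to uncovering and promoting traditional Chinese culture.
Keywords: ancient Tibetan documents; interlinear annotation; multi-task learning; machine learning; ethnic ancient documents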
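13. Design and Verification of a Data Transmission Processor for High-Orbit Remote Sensing Satellites (高轨遥感卫星数传处理器设计与验证)
Authors: Li Yongfeng (李永峰), Li Wendong (李文东), Yan Kun (阎昆), Liu Xiaofei (刘晓飞), Zheng Xiaosong (郑小松). 《航天器工程》 (Spacecraft Engineering; PKU Core), 2025, No. 2, pp. 66-74 (9 pages)
High-orbit remote sensing satellites are characterized by a high cost of acquiring information, low imaging resolution, narrow satellite-to-ground link bandwidth, and wide signal coverage. To address these characteristics, a data transmission processor design is proposed. It adopts several key technologies, including highly reliable data interfaces, high-fidelity image compression, multi-file storage management, and adaptive rate control. With low hardware resource overhead, it achieves error-free exchange of high-speed remote sensing data among multiple devices, delivers better image compression performance, supports parallel execution of multiple tasks without additional hardware resources, and brings the effective frame efficiency of the transmission channel to 100%. The proposed design efficiently meets the data processing and transmission requirements of high-orbit remote sensing satellites and significantly improves their application effectiveness.
Keywords: high-orbit remote sensing satellite; data transmission processor; high-fidelity image compression; multi-file management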
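14. Synthesizing Pseudo-Feedback Training Data from Scientific Literature for Traceable Text Generation (面向可溯源文本生成的科技文献伪反馈训练数据合成研究)
Authors: Ma Yongqiang (马永强), Liu Jiawei (刘家伟), Gao Yingfan (高影繁). 《情报学报》 (Journal of the China Society for Scientific and Technical Information; PKU Core), 2025, No. 7, pp. 830-845 (16 pages)
Inserting appropriate citation markers into academic text is a basic norm of academic writing and helps readers verify the truthfulness of the content. Citation markers can be used to trace content to its sources and ensure verifiability. In academic scenarios, existing large language models generally lack a built-in source-tracing mechanism, so the academic text they generate is insufficiently verifiable. Optimizing large models with domain datasets is currently the mainstream approach. For improving traceability, however, training sets built from human-written academic text suffer from insufficient internal consistency and large variation in citation-annotation behavior, while data synthesis methods based on large models are limited in data diversity. This paper therefore proposes a citation-marker scheme and an evaluation method for traceable academic text, used to analyze the traceability of academic text generated by large models. Then, from the training-data perspective, it proposes a two-stage pseudo-feedback training data synthesis method for traceable academic text generation, which combines the characteristics of model-annotated and human-annotated text to build high-quality, diverse training data. The results show that small models trained on the synthetic data constructed in this paper generate more traceable academic text, and that further optimizing the data distribution and task diversity through the second-stage pseudo-feedback helps strengthen the model's generalization.
Keywords: large language models; data synthesis; academic multi-document summarization; text traceability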
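15. A Multi-Feature Fusion Method for Malicious PDF Document Detection Based on CNN-BiLSTM-CBAM (基于CNN-BiLSTM-CBAM的多特征融合恶意PDF文档检测方法)
Authors: Wang Youhe (王友贺), Sun Yi (孙奕). 《信息网络安全》 (Netinfo Security; PKU Core), 2025, No. 10, pp. 1579-1588 (10 pages)
Existing malicious PDF document detection methods ignore the semantic relationships among features and are limited to analyzing a single type of feature. To address these problems, this paper proposes a detection scheme that applies a CNN-BiLSTM-CBAM model and multi-feature fusion to malicious PDF document detection. The method fuses the general information and structural information extracted by static analysis with the API sequence information captured by dynamic analysis to build a comprehensive, multi-dimensional feature set. First, the model uses a convolutional neural network to extract local features from the feature set. Then it uses a bidirectional long short-term memory (BiLSTM) network to capture dependencies and contextual semantic relationships among features, and a convolutional block attention module (CBAM) assigns different weights to different features to select the more discriminative key features. Finally, a Softmax classifier computes the detection result. Experiments show that, compared with existing methods, the model has a significant advantage on key metrics such as accuracy, recall and F1 score, and effectively improves malicious PDF document detection performance.
Keywords: malicious PDF document detection; multi-feature fusion; convolutional block attention module; bidirectional long short-term memory network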
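16. A Bimodal Classification Method for Chinese Scientific Literature Based on NLP and Image Classification Models (基于NLP和图像分类模型的中文科技文献双模态分类方法)
Authors: Wang Zheng (王峥), Ding Yi (丁熠), Chen Haiming (陈海明), Chen Ying (陈盈). 《南京师大学报(自然科学版)》 (Journal of Nanjing Normal University, Natural Science Edition; PKU Core), 2025, No. 3, pp. 84-92 (9 pages)
As requirements for managing and organizing scientific literature grow sharply, so does the need for more scalable, accurate and automated literature classification. To cope with the difficulty of analyzing massive scientific literature data, a multimodal literature analysis engine is proposed that combines the YOLOv7 image classification model with a natural language processing (NLP) model. The architecture exploits three key kinds of information in a document: the natural-language text, the descriptive images, and the intrinsic associations between the two. It integrates the deep learning networks for the different modalities through a joint training pipeline and achieves better classification accuracy than single-modality methods. The method is applied to a Chinese scientific literature dataset, with classification training based on Chinese Library Classification numbers. The results show that the proposed bimodal method classifies more accurately and can help enterprises, public institutions and research organizations manage data and knowledge more efficiently.
Keywords: scientific literature classification; image classification; multimodal features; natural language processing; deep learning; YOLOv7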
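17. Multimodal Tobacco Document Classification Based on Dual-Stream Adaptive Feature Fusion (基于双流自适应特征融合的多模态烟草文档分类)
Authors: Sun Shouming (孙首名), Zhang Qi (张琦), Wang Zhe (王喆), Su Na (苏娜), Shen Qi (沈奇). 《绿洲农业科学与工程》 (Oasis Agriculture Science and Engineering), 2025, No. 1, pp. 160-163 (4 pages)
To meet the need for automated classification of tobacco documents, a multimodal tobacco document classification network based on dual-stream adaptive feature fusion, named DSAFFNet, is proposed. The network combines the text and image modalities of tobacco documents and uses a DSAFF (Dual-Stream Adaptive Feature Fusion) module to adaptively adjust the weights given to the features of each modality, achieving flexible and precise multimodal fusion. Experimental results show that the proposed network outperforms previous classification methods on a tobacco document dataset.
Keywords: tobacco document classification; multimodal learning; dual-stream network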
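18. A Security Detection Algorithm for Multi-Level Identity Authentication Data Streams in a Zero-Trust Environment (零信任环境下的多层次身份认证数据流安全检测算法) (cited by: 5)
Authors: Gu Jianhua (顾健华), Feng Jianhua (冯建华), Gao Zefang (高泽芳), Wen Chengjiang (文成江). 《现代电子技术》 (Modern Electronics Technique; PKU Core), 2025, No. 1, pp. 85-89 (5 pages)
Sensitive information in identity authentication data streams may be intercepted by attackers in transit and used maliciously, leading to risks such as privacy leakage and identity theft. To ensure network security and make subject identity authentication more secure, a security detection algorithm for multi-level identity authentication data streams in a zero-trust environment is proposed. An improved document fingerprint detection algorithm monitors the security of the data streams exchanged between subjects and objects during multi-level identity authentication. The Rabin-Karp algorithm partitions identity authentication data documents into chunks, and the Winnow algorithm delimits the boundaries of the chunked documents, yielding document fingerprints. These fingerprints are matched against those in a fingerprint library to identify anomalous data in the multi-level identity authentication data stream. Experimental results show that the algorithm detects security problems in identity authentication data streams well, effectively reduces the frequency of network threats, and improves network security.
Keywords: zero trust; multi-level identity authentication; data stream security detection; document fingerprint detection algorithm; Rabin-Karp algorithm; Winnow algorithm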
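As background for the fingerprinting step named in the abstract above, here is a minimal Python sketch of textbook document fingerprinting: Rabin-Karp rolling hashes over k-grams, followed by winnowing, which keeps the minimum hash of each sliding window. It is the standard technique, not the paper's improved algorithm. The parameters `k`, `w`, `base` and `mod`, the Jaccard comparison, and all function names are assumptions chosen for illustration.

```python
def kgram_hashes(text, k, base=257, mod=(1 << 61) - 1):
    """Rabin-Karp rolling hash of every k-character window of text."""
    if len(text) < k:
        return []
    high = pow(base, k - 1, mod)  # weight of the character leaving the window
    h = 0
    for ch in text[:k]:
        h = (h * base + ord(ch)) % mod
    hashes = [h]
    for i in range(k, len(text)):
        h = (h - ord(text[i - k]) * high) % mod  # drop the oldest character
        h = (h * base + ord(text[i])) % mod      # append the newest character
        hashes.append(h)
    return hashes


def winnow(hashes, w):
    """Keep the minimum hash of each window of w hashes (rightmost on ties),
    returned as a set of (hash, position) pairs."""
    selected = set()
    if not hashes:
        return selected
    for start in range(max(len(hashes) - w, 0) + 1):
        window = hashes[start:start + w]
        smallest = min(window)
        offset = len(window) - 1 - window[::-1].index(smallest)
        selected.add((smallest, start + offset))
    return selected


def fingerprint(text, k=5, w=4):
    """Set of winnowed hash values for a document."""
    return {h for h, _ in winnow(kgram_hashes(text, k), w)}


def similarity(a, b, k=5, w=4):
    """Jaccard overlap of two fingerprints (an assumed comparison rule)."""
    fa, fb = fingerprint(a, k, w), fingerprint(b, k, w)
    if not fa or not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)
```

A stream record would then be flagged by comparing `fingerprint(record)` against a stored fingerprint library; the matching threshold is a design choice that the abstract does not specify.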
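19. Query Expansion for Scientific Documents Based on Multi-Dimensional Meta-Paths in a Knowledge Graph (基于知识图谱中多维元路径的科技文档查询扩展)
Authors: Xu Jianmin (徐建民), Tong Simeng (仝思梦), Zhang Guofang (张国防). 《计算机工程与科学》 (Computer Engineering & Science; PKU Core), 2025, No. 8, pp. 1493-1502 (10 pages)
Existing query expansion methods for scientific documents have limitations such as insufficient use of document information and ineffective use of the relationships among documents. To address them, a query expansion method for scientific documents based on multi-dimensional meta-paths in a knowledge graph is proposed. First, the pseudo-relevance feedback document set is processed to obtain a set of candidate expansion terms. Second, based on an analysis of the scientific document knowledge graph, suitable meta-paths are identified to represent the relationships between the user query and the candidate expansion terms, and the multi-dimensional semantic relevance between the query and each candidate term is computed from the different meta-path connections between nodes. Finally, the multi-dimensional semantic relevance is combined with each candidate term's weight in the pseudo-relevance feedback document set to select the final expansion terms and expand the user query. Experimental results show that, compared with existing query expansion methods, the proposed method improves mAP, DCG and NDCG by at least 9.21%, 10% and 11.7% respectively.
Keywords: knowledge graph; query expansion; multi-dimensional meta-path; scientific documents; information retrieval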
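20. A Document Image Layout Analysis Algorithm Based on Improved YOLOv5s (基于改进YOLOv5s的文档图像版面分析算法)
Authors: Yin Ling (尹玲), Li Jiale (李家乐), Huang Bo (黄勃). 《软件导刊》 (Software Guide), 2025, No. 2, pp. 146-154 (9 pages)
To address the low efficiency and high training cost of current deep-learning-based layout analysis methods, RCW-YOLO, a single-stage object detection network improved from YOLOv5s, is proposed and applied to document image layout analysis. First, the C3 module in YOLOv5s is improved with the Res2Net module, which strengthens the network's ability to extract multi-scale features from document images. Second, the lightweight upsampling operator CARAFE is introduced to optimize the feature fusion network and reduce information loss during upsampling. Finally, WIoUv3 is introduced as the bounding-box regression loss function, with a suitable gradient gain allocation strategy, to improve the model's generalization and overall performance. Experiments show that on the CDLA, IIIT-AR-13K and PubLayNet datasets, RCW-YOLO reaches 87.2%, 76.4% and 94.5% on mAP@0.50:0.95 respectively, outperforming existing two-stage algorithms and other single-stage algorithms while requiring less computation, fewer parameters and offering faster inference.
Keywords: document image layout analysis; object detection; YOLOv5s; multi-scale feature extraction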