34,147 articles found
Automated Multi-Document Biomedical Text Summarization Using Deep Learning Model (cited by: 3)
1
Authors: Ahmed S. Almasoud, Siwar Ben Haj Hassine, Fahd N. Al-Wesabi, Mohamed K. Nour, Anwer Mustafa Hilal, Mesfer Al Duhayyim, Manar Ahmed Hamza, Abdelwahed Motwakel. Computers, Materials & Continua (SCIE, EI), 2022, No. 6, pp. 5799-5815, 17 pages.
Owing to advances in the Internet and information technologies, the volume of electronic data in the biomedical sector has grown exponentially. To handle this volume, automated multi-document biomedical text summarization offers an effective and robust way of accessing the growing technical and medical literature: it summarizes multiple source documents while retaining the most informative content. Multi-document biomedical text summarization therefore plays a vital role in easing access to precise and up-to-date information. This paper presents a Deep Learning based Attention Long Short Term Memory (DL-ALSTM) model for multi-document biomedical text summarization. The proposed DL-ALSTM model first preprocesses the available medical data into a format compatible with further processing. The DL-ALSTM model is then executed to summarize the contents of the multiple biomedical documents. To tune the summarization performance of the DL-ALSTM model, the chaotic glowworm swarm optimization (CGSO) algorithm is employed. Extensive experiments are performed on the PubMed dataset to validate the DL-ALSTM model, and a comprehensive comparative analysis showcases its efficiency against recently presented models.
Keywords: biomedical; text summarization; healthcare; deep learning; LSTM; parameter tuning
Hint-SQL: A Text-to-SQL Prompting Method Based on Automatic Hint Generation
2
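Authors: 谭钊, 刘喜平, 舒晴, 万齐智, 刘德喜, 万常选, 廖国琼. Chinese Journal of Computers (《计算机学报》), PKU Core, 2026, No. 3, pp. 700-720, 21 pages.
Text-to-SQL aims to translate natural language questions into SQL statements that a database system can execute, making data querying more convenient. With the development of large language models (LLMs), LLM-based Text-to-SQL prompting methods have become the mainstream solution in this field. In recent years, researchers have added hints to LLM prompts to convey concrete Text-to-SQL advice and guide LLMs in generating SQL. However, existing hints are mostly written by hand according to the general characteristics of the Text-to-SQL task; their content is overly broad, hard to adjust to specific task requirements, and cannot fit all Text-to-SQL tasks. This paper proposes Hint-SQL, a Text-to-SQL prompting method based on automatic hint generation, which automatically generates suitable semantic, operation, and structure hints for the current Text-to-SQL task, thereby guiding LLMs to generate semantically consistent and structurally correct SQL. To generate task-specific hints, we build a hint generation agent (HAgent). HAgent is fine-tuned from open-source LLMs under a two-stage fine-tuning framework that automatically synthesizes the data needed for fine-tuning, requires no manual annotation, and supports supervised fine-tuning and preference learning optimization. Hint-SQL can be used on its own or to enhance existing methods. Large-scale experiments show that, used independently, Hint-SQL is comparable to mainstream methods, and that it also significantly improves the performance of existing methods: on the BIRD dataset, Hint-SQL raises the accuracy of the current best method to 71.58%, an improvement of 4.37%. This study reveals the important role of hints in Text-to-SQL tasks and provides a reference for subsequent Text-to-SQL research.
Keywords: natural language processing; Text-to-SQL; large language models; prompt engineering; hints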
AI-Generated Text Detection: A Comprehensive Review of Active and Passive Approaches
3
Authors: Lingyun Xiang, Nian Li, Yuling Liu, Jiayong Hu. Computers, Materials & Continua, 2026, No. 3, pp. 201-229, 29 pages.
The rapid advancement of large language models (LLMs) has driven the pervasive adoption of AI-generated content (AIGC), while also raising concerns about misinformation, academic misconduct, biased or harmful content, and other risks. Detecting AI-generated text has thus become essential to safeguard the authenticity and reliability of digital information. This survey reviews recent progress in detection methods, dividing approaches into passive and active categories according to whether they rely on intrinsic textual features or on embedded signals. Passive detection is further divided into surface linguistic feature-based and language model-based methods, whereas active detection encompasses watermarking-based and semantic retrieval-based approaches. This taxonomy enables systematic comparison of methodological differences in model dependency, applicability, and robustness. A key challenge for AI-generated text detection is that existing detectors are highly vulnerable to adversarial attacks, particularly paraphrasing, which substantially compromises their effectiveness. This gap highlights the need for future research on enhancing robustness and cross-domain generalization. By synthesizing current advances and limitations, this survey provides a structured reference for the field and outlines pathways toward more reliable and scalable detection solutions.
Keywords: AI-generated text detection; large language models; text classification; watermarking
Out-of-distribution Detection for Power System Text Data by Enhanced Mahalanobis Distance with Calibration
4
Authors: Yixiang Zhang, Huifang Wang, Yuzhen Zheng, Zhengming Fei, Hui Zhou, Huafeng Luo. Protection and Control of Modern Power Systems, 2026, No. 1, pp. 40-52, 13 pages.
The increasing significance of text data in power system intelligence has highlighted the out-of-distribution (OOD) problem as a critical challenge that hinders the deployment of artificial intelligence (AI) models. In a closed-world setting, most AI models cannot detect and reject unexpected data, which exacerbates the harmful impact of the OOD problem. The high similarity between OOD and in-distribution (IND) samples in the power system makes it difficult for existing OOD detection methods to achieve effective results. This study aims to elucidate and address the OOD problem in power systems through a text classification task. First, the underlying causes of OOD sample generation are analyzed, highlighting the inherent nature of the OOD problem in the power system. Second, a novel method integrating the enhanced Mahalanobis distance with calibration strategies is introduced to improve OOD detection for text data in power system applications. Finally, a case study using actual text data from power system field operation (PSFO) is conducted, demonstrating the effectiveness of the proposed OOD detection method. Experimental results indicate that the proposed method outperforms existing methods in text OOD detection tasks within the power system, achieving a 21.03% improvement in the false positive rate at 95% true positive recall (FPR95) and a 12.97% improvement in classification accuracy for the mixed IND-OOD scenarios.
Keywords: out-of-distribution detection; text classification; text data applications in power grid; machine learning; natural language processing
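The abstract above does not describe how its Mahalanobis distance is enhanced or calibrated, so the following is only a minimal sketch of the plain class-conditional Mahalanobis score that such detectors typically start from. It assumes a single covariance matrix shared by all classes. The function names, the `ridge` regularizer, and the toy feature vectors are hypothetical illustrations, not the authors' method.

```python
from typing import Dict, List, Sequence


def mean_vector(rows: Sequence[Sequence[float]]) -> List[float]:
    """Column-wise mean of a list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance(rows: Sequence[Sequence[float]], mu: Sequence[float],
               ridge: float = 1e-6) -> List[List[float]]:
    """Population covariance around mu; ridge (an assumption) is added to the diagonal."""
    n, d = len(rows), len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for row in rows:
        diff = [row[i] - mu[i] for i in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j] / n
    for i in range(d):
        cov[i][i] += ridge
    return cov


def solve(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def mahalanobis_sq(x: Sequence[float], mu: Sequence[float],
                   cov: Sequence[Sequence[float]]) -> float:
    """Squared Mahalanobis distance (x - mu)^T cov^{-1} (x - mu)."""
    diff = [xi - mi for xi, mi in zip(x, mu)]
    return sum(d * y for d, y in zip(diff, solve(cov, diff)))


def ood_score(x: Sequence[float], class_means: Dict[str, Sequence[float]],
              cov: Sequence[Sequence[float]]) -> float:
    """Distance to the nearest class mean; larger values suggest an OOD input."""
    return min(mahalanobis_sq(x, mu, cov) for mu in class_means.values())
```

A sample would be flagged as OOD when `ood_score` exceeds a threshold chosen on held-out IND data, for example the threshold that keeps 95% of IND samples, which is the operating point behind FPR95.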
A Method for Classifying Basic Research and Applied Research Papers Based on the BERT-TextCNN Model
5
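Authors: 张萌萌, 钟永恒, 刘佳. Science and Technology Management Research (《科技管理研究》), 2026, No. 1, pp. 256-267, 12 pages.
This study aims to build an efficient and accurate classification model for determining whether an individual paper belongs to basic research or applied research. It constructs a BERT-TextCNN model that incorporates semi-automatic annotation: the semi-automatic annotation strategy reduces manual labeling effort and improves classification efficiency, BERT generates text vectors, and TextCNN extracts key features. The classification results for the quantum information field are then analyzed using bibliometrics and the BERTopic model. The results show that the model reaches an F1 score of 0.896, an improvement of 2.1% over BERT and 7.9% over TextCNN, and significantly outperforms large language models such as Baichuan4-Turbo, DeepSeek-V3, and GLM-4-Plus, with F1 gains of 12.2%, 13.1%, and 18.8%, respectively. This both verifies the advantage of fusing semantic representations with local features and effectively overcomes the "high recall, low precision" weakness of large language models in domain-specific classification. Applying the model to the quantum information field shows that basic research focuses on directions such as quantum states and entanglement and ion spin, while applied research concentrates on key distribution, quantum sensing, and network components. The study provides a new method for classifying scientific literature and has significant application value for research evaluation and resource optimization.
Keywords: literature classification; deep learning; semi-automatic annotation; text mining; quantum information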
Context-Aware Spam Detection Using BERT Embeddings with Multi-Window CNNs
6
Authors: Sajid Ali, Qazi Mazhar Ul Haq, Ala Saleh Alluhaidan, Muhammad Shahid Anwar, Sadique Ahmad, Leila Jamel. Computer Modeling in Engineering & Sciences, 2026, No. 1, pp. 1296-1310, 15 pages.
Spam emails remain one of the most persistent threats to digital communication, necessitating effective detection solutions that safeguard both individuals and organisations. We propose a spam email classification framework that uses Bidirectional Encoder Representations from Transformers (BERT) for contextual feature extraction and a multiple-window Convolutional Neural Network (CNN) for classification. BERT embeddings capture semantic nuances in email content, and CNN filters extract discriminative n-gram patterns at various levels of detail, enabling accurate spam identification. On a sample of 5728 labelled emails, the proposed model outperformed Word2Vec-based baselines, achieving an accuracy of 98.69%, an AUC of 0.9981, an F1 score of 0.9724, and an MCC of 0.9639. Performance improves with medium kernel sizes of (6, 9) and compact multi-window CNN architectures. Cross-validation illustrates stability and generalization across folds. By balancing high recall with minimal false positives, our method provides a reliable and scalable deep learning solution for present-day spam detection. By combining contextual embedding and a neural architecture, this study develops a security analysis method.
Keywords: e-mail spam detection; BERT embedding; text classification; cybersecurity; CNN
The Continuation Task and the Model-as-Feedback Writing Task in L2 Writing Development: Timing of Model Texts
7
Author: Xiaoyan Zhang. Chinese Journal of Applied Linguistics, 2026, No. 1, pp. 76-91, 160, 17 pages.
This study compares the relative efficacy of the continuation task and the model-as-feedback writing (MAFW) task in EFL writing development. Ninety intermediate-level Chinese EFL learners were randomly assigned to a continuation group, a MAFW group, and a control group, each with 30 learners. A pretest and a posttest were used to gauge L2 writing development. Results showed that the continuation task outperformed the MAFW task not only in enhancing the overall quality of L2 writing, but also in promoting the quality of three components of L2 writing, namely content, organization, and language. The finding has important implications for L2 writing teaching and learning.
Keywords: continuation task; model-as-feedback writing task; L2 writing development; timing of model texts
Evolution and insights of China’s environmental governance policies:An LDA-based policy text analysis
8
Authors: HUA Yu-chen, YANG Jia-meng, WEI Ren-jie, CHENG Xiu, LIU Zhi-yong. Ecological Economy, 2026, No. 1, pp. 2-30, 29 pages.
China’s environmental governance strategy provides a distinctive pathway for integrating sustainable development into national policy. Understanding its policy trajectory is essential for assessing China’s contribution to global sustainable development and the United Nations Sustainable Development Goals (SDGs). This study constructs a comprehensive database of 425 national environmental governance policy documents issued between 1978 and 2022 and applies Latent Dirichlet Allocation (LDA) modeling to examine the evolution of policy themes and discourse. The results show that China’s environmental governance has undergone four stages (initial exploration, detailed development, transformative leap, and diverse prosperity), reflecting a progressive shift toward more integrated and coordinated governance. Policy priorities have evolved from a primary focus on pollution control and energy transition to an emphasis on institutional construction and organizational reform, thereby strengthening alignment with the SDGs. This transformation is characterized by recurring developmental themes and increasingly preventive, forward-looking, and system-oriented governance approaches. Moreover, the co-evolution of policy concepts and implementation has driven a transition from localized, end-of-pipe responses to comprehensive governance frameworks, alongside a shift from normative guidance towards effectiveness-oriented policy design. By employing a data-driven text analysis approach, this study offers a systematic framework for tracing long-term policy evolution and assessing its implications for sustainable development.
Keywords: environmental governance policy; text analysis; LDA topic modeling; topic evolution; sustainable development policy; policy transformation
Research on the Classification of Digital Cultural Texts Based on ASSC-TextRCNN Algorithm
9
Authors: Zixuan Guo, Houbin Wang, Sameer Kumar, Yuanfang Chen. Computers, Materials & Continua, 2026, No. 3, pp. 2119-2145, 27 pages.
With the rapid development of digital culture, a large number of cultural texts are now presented in digital and networked form. These texts are characterized by sparsity, real-time generation, and non-standard expression, which pose serious challenges to traditional classification methods. To address these problems, this paper proposes a new ASSC (ALBERT, SVD, Self-Attention and Cross-Entropy)-TextRCNN model for digital cultural text classification. Built on the TextRCNN framework, the model introduces the ALBERT pre-trained language model to improve the depth and accuracy of semantic embedding. A dual attention mechanism strengthens the model’s ability to capture and model potentially key information in short texts. Singular Value Decomposition (SVD) replaces the traditional max pooling operation, which effectively reduces the feature loss rate and retains more key semantic information. The cross-entropy loss function is used to optimize the prediction results, making the model more robust in learning class distributions. The experimental results indicate that, in the digital cultural text classification task, the proposed ASSC-TextRCNN method achieves an 11.85% relative improvement in accuracy and an 11.97% relative increase in F1 score over the baseline model, while the relative error rate decreases by 53.18%. This result validates the effectiveness of the proposed approach and offers a novel technical route and methodological basis for the intelligent analysis and dissemination of digital cultural texts. It is significant for promoting the in-depth exploration and value realization of digital culture.
Keywords: text classification; natural language processing; TextRCNN model; ALBERT pre-training; singular value decomposition; cross-entropy loss function
A Study of the Cultural Character and Design Character of New Chinese-Style Clothing Based on Text-Mining Analysis
10
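Author: 宾森. Journal of Silk (《丝绸》), PKU Core, 2026, No. 4, pp. 21-33, 13 pages.
Current research on new Chinese-style clothing relies mostly on qualitative analysis, lacks large-scale data support, and pays insufficient attention to the intrinsic relationship between cultural character and design character. To address these problems, this article applies text mining, a big-data analysis method, to 4,682 product description texts for new Chinese-style clothing collected from the Taobao platform, performing word frequency analysis and LDA topic model analysis. On this basis, the article systematically identifies three main characteristics of how cultural character and design character are expressed in new Chinese-style clothing. It further argues that the vitality of new Chinese-style clothing stems from a dynamic, symbiotic interaction between "cultural character" and "design character": the cultural core gives design its soul and recognizability, while design innovation in turn empowers culture, allowing new Chinese-style clothing to be revitalized and carried forward in contemporary life.
Keywords: new Chinese style; cultural character; design character; text-mining analysis; fashion design; Chinese clothing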
RNSQL: Text2SQL Generation Incorporating Reverse Normalization
11
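Authors: 帖军, 范子琪, 孙翀, 郑禄, 朱柏尔. Computer Applications and Software (《计算机应用与软件》), PKU Core, 2025, No. 9, pp. 31-37, 86, 8 pages.
Text2SQL is an important task in natural language processing research and plays a key role in the study of intelligent question answering systems; its core task is to automatically convert questions described in natural language into SQL queries. Current research focuses on improving the matching accuracy of SQL clause tasks but neglects the syntactic correctness of the generated SQL, and SQL generation involving multi-table joins still contains many errors. This paper therefore proposes a neural-network-based Text2SQL method that restructures the database schema through reverse normalization (denormalization) and attends to the syntactic correctness of SQL generation; it is called the Reverse Normalization SQL network (RNSQL). Theoretical analysis and experiments on the public Spider dataset show that RNSQL effectively improves the quality of Text2SQL results.
Keywords: reverse normalization; semantic parsing; Text2SQL; slot filling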
Log Anomaly Detection Based on Transformer and Text-CNN (cited by: 1)
12
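Authors: 尹春勇, 张小虎. Computer Engineering & Science (《计算机工程与科学》), PKU Core, 2025, No. 3, pp. 448-458, 11 pages.
Log data is one of the most important data resources in a software system and records detailed information about the system's operation; automated log anomaly detection is essential for maintaining system security. With the wide application of large language models in natural language processing, many Transformer-based log anomaly detection methods have been proposed. Traditional Transformer-based methods struggle to capture the local features of log sequences. To address this problem, LogTC, a log anomaly detection method based on Transformer and Text-CNN, is proposed. First, rule matching converts logs into structured log data while retaining the useful information in log statements. Second, depending on the characteristics of the logs, fixed windows or session windows are used to divide log statements into log sequences. Third, the natural language processing technique Sentence-BERT generates semantic representations of log statements. Finally, the semantic vectors of the log sequences are fed into the LogTC anomaly detection model for detection. Experimental results show that LogTC effectively detects anomalies in log data and achieves good results on both of the 2 datasets evaluated.
Keywords: log anomaly detection; deep learning; word embedding; Transformer; Text-CNN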
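The windowing step mentioned in the abstract above (fixed windows or session windows) can be sketched as follows. This is an illustrative reading only: the abstract does not give window sizes or say how sessions are identified, so the `size` parameter, the `(session_id, event)` record shape, and both function names are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def fixed_windows(events: Sequence[str], size: int) -> List[List[str]]:
    """Split an ordered list of log events into consecutive chunks of `size`."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [list(events[i:i + size]) for i in range(0, len(events), size)]


def session_windows(records: Sequence[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (session_id, event) records by session, preserving event order."""
    sessions: Dict[str, List[str]] = defaultdict(list)
    for session_id, event in records:
        sessions[session_id].append(event)
    return dict(sessions)
```

Fixed windows suit logs with no natural grouping key, while session windows apply when each line carries an identifier (a request or block ID, for example) that ties related events together.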
Design of a Coherence Imaging Spectroscopy Diagnostic System for the J-TEXT Tokamak
13
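Authors: 聂林, 吴骏彬, 龙婷, 雷驰, 严伟, 李杨波, 张霄翼, J-TEXT experimental team. Nuclear Fusion and Plasma Physics (《核聚变与等离子体物理》), PKU Core, 2025, No. 3, pp. 273-279, 7 pages.
Coherence imaging spectroscopy is a passive spectroscopic diagnostic that uses a high-speed camera to produce two-dimensional images of impurity ion flow velocity at the plasma boundary. It plays an important role in studying toroidal plasma rotation and impurity ion distribution in the tokamak boundary and divertor. A coherence imaging spectroscopy diagnostic system based mainly on the C III line (464.88 nm) has been successfully developed and deployed on the J-TEXT device. The system's optical field of view is designed to be 12° and mainly observes the high-field-side edge plasma region of J-TEXT. In terms of performance, the system has a time resolution of 2 ms and a spatial resolution of 11 mm (in the vertical direction). The diagnostic system has completed experimental testing and has obtained key data on the plasma boundary, providing a new experimental tool for boundary physics research.
Keywords: coherence imaging spectroscopy diagnostic; toroidal velocity; J-TEXT tokamak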
Chinese multi-document personal name disambiguation (cited by: 8)
14
Authors: Wang Houfeng (王厚峰), Mei Zheng. High Technology Letters (EI, CAS), 2005, No. 3, pp. 280-283, 4 pages.
This paper presents a new approach to determining whether a personal name of interest that appears across documents refers to the same entity. First, three vectors are formed for each text: the personal name Boolean vector, denoting whether a personal name occurs in the text; the biographical word Boolean vector, representing title, occupation, and so forth; and the feature vector with real values. Then, by combining a heuristic strategy based on the Boolean vectors with an agglomerative clustering algorithm based on the feature vectors, the approach seeks to resolve multi-document personal name coreference. Experimental results on the "Wang Gang" corpus show that this approach achieves good performance.
Keywords: personal name disambiguation; Chinese multi-document; heuristic strategy; agglomerative clustering
Chinese Short-Text Sentiment Classification: A Transformer-TextCNN Model with Enhanced Position Awareness
15
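Authors: 李浩君, 王耀东, 汪旭辉. Computer Engineering and Applications (《计算机工程与应用》), PKU Core, 2025, No. 11, pp. 216-226, 11 pages.
To address the insufficient capture of text position information and key features in current Chinese short-text sentiment classification models, a Transformer-TextCNN sentiment classification model with enhanced position awareness is proposed. The model uses BERT's learnable absolute position encoding together with sinusoidal position encoding to strengthen position awareness, and combines the global context understanding of the Transformer with the local feature capture of TextCNN to extract global and local features of Chinese short texts, respectively. It builds a sentiment feature output service that couples enhanced position awareness with feature collaboration, achieving accurate sentiment classification of Chinese short texts. Experimental results show that the model reaches an accuracy of 90.23% on a video bullet-comment dataset and 87.38% on the SMP2020 dataset. Compared with the best baseline model, accuracy improves by 1.98 and 0.44 percentage points on the video bullet-comment and SMP2020 datasets, respectively, giving better classification performance on Chinese short-text sentiment classification.
Keywords: text sentiment classification; BERT; Transformer; TextCNN; position encoding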
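For reference, the sinusoidal position encoding named in the abstract above is usually defined as PE(pos, 2k) = sin(pos / 10000^(2k/d)) and PE(pos, 2k+1) = cos(pos / 10000^(2k/d)). The sketch below implements that standard definition only. How the paper combines it with BERT's learned position embeddings is not stated in the abstract, and the function name and the even `d_model` restriction are assumptions made for illustration.

```python
import math
from typing import List


def sinusoidal_position_encoding(seq_len: int, d_model: int) -> List[List[float]]:
    """Return a seq_len x d_model table of standard sinusoidal position encodings."""
    if d_model % 2 != 0:
        raise ValueError("d_model must be even in this sketch")
    table = []
    for pos in range(seq_len):
        row = [0.0] * d_model
        # i walks over the even dimensions, so i / d_model equals 2k / d.
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            row[i] = math.sin(angle)      # even dimension: sine
            row[i + 1] = math.cos(angle)  # odd dimension: cosine
        table.append(row)
    return table
```

Because the table depends only on the position and the dimension index, it can be precomputed once and added to token embeddings of any sequence up to `seq_len`.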
Using AdaBoost Meta-Learning Algorithm for Medical News Multi-Document Summarization (cited by: 1)
16
Author: Mahdi Gholami Mehr. Intelligent Information Management, 2013, No. 6, pp. 182-190, 9 pages.
Automatic text summarization involves reducing a text document or a larger corpus of multiple documents to a short set of sentences or paragraphs that convey the main meaning of the text. In this paper, we discuss multi-document summarization, which differs from single-document summarization in that the issues of compression, speed, redundancy, and passage selection are critical to forming useful summaries. Because the number and variety of online medical news items make it difficult for experts in the medical field to read all of them, automatic multi-document summarization can make information on the web easier to study. Hence we propose a new summarization approach based on the machine learning meta-learner algorithm AdaBoost. We treat a document as a set of sentences, and the learning algorithm must learn to classify sentences as positive or negative examples based on their scores. For this learning task, we apply the AdaBoost meta-learning algorithm with a C4.5 decision tree as the base learner. In our experiment, we use 450 news items downloaded from different medical websites. We then compare our results with some existing approaches.
Keywords: multi-document summarization; machine learning; decision trees; AdaBoost; C4.5; medical document summarization
Constructing a taxonomy to support multi-document summarization of dissertation abstracts
17
Authors: KHOO Christopher S.G., GOH Dion H. Journal of Zhejiang University-Science A (Applied Physics & Engineering) (SCIE, EI, CAS, CSCD), 2005, No. 11, pp. 1258-1267, 10 pages.
This paper reports part of a study to develop a method for automatic multi-document summarization. The current focus is on dissertation abstracts in the field of sociology. The summarization method uses macro-level and micro-level discourse structure to identify important information that can be extracted from dissertation abstracts, and then uses a variable-based framework to integrate and organize extracted information across dissertation abstracts. This framework has a hierarchical structure and focuses on the research concepts, and the relationships between them, found in sociology dissertation abstracts. A taxonomy is constructed to support the summarization process in two ways: (1) helping to identify important concepts and relations expressed in the text, and (2) providing a structure for linking similar concepts in different abstracts. This paper describes the variable-based framework and the summarization process, and then reports the construction of the taxonomy for supporting the summarization process. An example is provided to show how to use the constructed taxonomy to identify important concepts and integrate the concepts extracted from different abstracts.
Keywords: text summarization; automatic multi-document summarization; variable-based framework; digital library
Density peaks clustering based integrate framework for multi-document summarization (cited by: 3)
18
Authors: Baoyan Wang, Jian Zhang, Yi Liu, Yuexian Zou. CAAI Transactions on Intelligence Technology, 2017, No. 1, pp. 26-30, 5 pages.
We present a novel unsupervised integrated score framework that generates generic extractive multi-document summaries by ranking sentences with a dynamic programming (DP) strategy. Because cluster-based methods proposed by other researchers tend to ignore the informativeness of words when generating summaries, our proposed framework jointly considers the relevance, diversity, informativeness, and length constraint of sentences. We apply Density Peaks Clustering (DPC) to obtain relevance scores and diversity scores of sentences simultaneously. Our framework produces the best performance on DUC2004, with a ROUGE-1 score of 0.396, a ROUGE-2 score of 0.094, and a ROUGE-SU4 score of 0.143, outperforming a series of popular baselines such as DUC Best, FGB [7], and BSTM [10].
Keywords: multi-document summarization; integrated score framework; density peaks clustering; sentence ranking
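Density Peaks Clustering assigns each point two quantities: a local density (how many neighbours lie within a cutoff distance) and the distance to the nearest point of higher density. One plausible reading of the abstract above is that the first serves as a relevance signal and the second as a diversity signal, but the abstract does not give the exact scoring formulas. The sketch below computes only the two standard DPC quantities, on hypothetical sentence vectors and with a hypothetical `cutoff` parameter.

```python
import math
from typing import List, Sequence, Tuple


def density_peaks(points: Sequence[Sequence[float]],
                  cutoff: float) -> Tuple[List[int], List[float]]:
    """Return (rho, delta) for each point, following the usual DPC definitions.

    rho[i]   = number of other points closer than `cutoff` to point i
    delta[i] = distance to the nearest point with strictly higher rho,
               or the largest distance from i if no such point exists
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < cutoff)
           for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta
```

Points with both high `rho` and high `delta` are the density peaks: in a summarization setting they would correspond to sentences that are central to a group of similar sentences yet far from other central sentences.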
Unsupervised Graph-Based Tibetan Multi-Document Summarization
19
Authors: Xiaodong Yan, Yiqin Wang, Wei Song, Xiaobing Zhao, A. Run, Yang Yanxing. Computers, Materials & Continua (SCIE, EI), 2022, No. 10, pp. 1769-1781, 13 pages.
Text summarization creates a subset that represents the most important or relevant information in the original content, which effectively reduces information redundancy. Neural network methods have recently achieved good results in text summarization for both Chinese and English, but research on text summarization in low-resource languages, especially Tibetan, is still at an exploratory stage. Moreover, there is no large-scale annotated corpus for text summarization, and this lack of datasets severely limits the development of low-resource text summarization. In this setting, unsupervised learning approaches are more appealing for low-resource languages because they do not require labeled data. In this paper, we propose an unsupervised graph-based Tibetan multi-document summarization method, which divides a large number of Tibetan news documents into topics and extracts a summary of each topic. Summaries obtained with traditional graph-based methods have high redundancy, and their division of documents into topics is not detailed enough. For topic division, we adopt a two-level clustering method that converts the original documents into document-level and sentence-level graphs; we then take both linguistic and deep representations into account and integrate an external corpus into the graph to obtain semantic clusters of sentences. This addresses the shortcomings of the traditional K-Means clustering method and yields a more detailed clustering of documents. Sentence clusters are then modeled as graphs, and sentence nodes are re-scored based on the topic semantic information and the impact of topic features on sentences, so that a summary with higher topic relevance is extracted. To promote the development of Tibetan text summarization and to meet researchers' need for high-quality Tibetan text summarization datasets, this paper manually constructs a Tibetan summarization dataset and carries out relevant experiments. The experimental results show that our method effectively improves the quality of summarization and is competitive with previous unsupervised methods.
Keywords: multi-document summarization; text clustering; topic feature fusion; graphic model
Automatic Multi-Document Summarization Based on Keyword Density and Sentence-Word Graphs
20
Authors: YE Feiyue, XU Xinchen. Journal of Shanghai Jiaotong University (Science) (EI), 2018, No. 4, pp. 584-592, 9 pages.
As a fundamental and effective tool for document understanding and organization, multi-document summarization enables better information services by creating concise and informative reports for large collections of documents. In this paper, we propose a sentence-word two-layer graph algorithm combined with keyword density to generate multi-document summaries, known as Graph & Keywordρ. Traditional graph methods for multi-document summarization consider the influence of sentences and words only across all documents rather than within individual documents. We therefore construct multiple word graphs and extract the right keywords in each document to modify the sentence graph and to improve the significance and richness of the summary. Meanwhile, because the importance of words differs across documents, we propose using keyword density so that summaries provide rich content with a small number of words. The experimental results show that the Graph & Keywordρ method outperforms state-of-the-art systems when tested on the DUC2004 data set.
Keywords: multi-document; graph algorithm; keyword density; Graph & Keywordρ; DUC2004