8 articles found
Using LSA and text segmentation to improve automatic Chinese dialogue text summarization (cited by: 3)
1
Authors: LIU Chuan-han, WANG Yong-cheng, ZHENG Fei, LIU De-rong. Journal of Zhejiang University-Science A (Applied Physics & Engineering). SCIE, EI, CAS, CSCD. 2007, No. 1, pp. 79-87 (9 pages).
Automatic Chinese text summarization for dialogue style is a relatively new research area. In this paper, Latent Semantic Analysis (LSA) is first used to extract semantic knowledge from a given document and all question paragraphs are identified. An automatic text segmentation approach analogous to TextTiling is then exploited to improve the precision of correlating question paragraphs with answer paragraphs. Finally, some "important" sentences are extracted from the generic content and the question-answer pairs to generate a complete summary. Experimental results showed that our approach is highly efficient and significantly improves the coherence of the summary without compromising informativeness.
Keywords: Automatic text summarization; Latent semantic analysis (LSA); Text segmentation; Dialogue style; Coherence; Question-answer pairs
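The segmentation step in the abstract above can be illustrated with a much-simplified, TextTiling-style sketch. It is not the authors' method: it compares raw word counts of adjacent paragraphs instead of LSA vectors, it assumes whitespace-delimited text (Chinese input would first need a word segmenter), and the function names and the `threshold` value are hypothetical.

```python
import math
import re
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def segment(paragraphs, threshold=0.1):
    """Group paragraphs into segments, cutting where lexical cohesion drops.

    A boundary is placed between two adjacent paragraphs whose cosine
    similarity falls below `threshold` (an illustrative value).
    """
    bags = [Counter(re.findall(r"\w+", p.lower())) for p in paragraphs]
    segments = [[paragraphs[0]]] if paragraphs else []
    for i in range(1, len(paragraphs)):
        if cosine(bags[i - 1], bags[i]) < threshold:
            segments.append([])  # vocabulary changed: start a new segment
        segments[-1].append(paragraphs[i])
    return segments
```

In a pipeline like the one described, each resulting segment would be a candidate span for pairing a question paragraph with its answer paragraphs.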
Clustering based segmentation of text in complex color images
2
Authors: 毛文革, 王洪滨, 张田文. Journal of Harbin Institute of Technology (New Series). EI, CAS. 2004, No. 4, pp. 387-394 (8 pages).
We propose a novel scheme based on clustering analysis in color space to solve text segmentation in complex color images. Text segmentation includes automatic clustering of color space and foreground image generation. Two methods are also proposed for automatic clustering: the first determines the optimal number of clusters, and the second is the fuzzy competitively clustering method based on competitive learning techniques. Essential foreground images obtained from any of the color clusters are combined into foreground images. Further performance analysis reveals the advantages of the proposed methods.
Keywords: Text segmentation; Fuzzy competitively clustering; Optimal number of clusters; Foreground images
A learning-based method to detect and segment text from scene images (cited by: 3)
3
Authors: JIANG Ren-jie, QI Fei-hu, XU Li, WU Guo-rong, ZHU Kai-hua. Journal of Zhejiang University-Science A (Applied Physics & Engineering). SCIE, EI, CAS, CSCD. 2007, No. 4, pp. 568-574 (7 pages).
This paper proposes a learning-based method for text detection and text segmentation in natural scene images. First, the input image is decomposed into multiple connected components (CCs) by the Niblack clustering algorithm. Then all the CCs, including text CCs and non-text CCs, are verified on their text features by a 2-stage classification module, in which most non-text CCs are discarded by an attentional cascade classifier and the remaining CCs are further verified by an SVM. All accepted CCs are output to form a text-only binary image. Experiments with many images of different scenes showed satisfactory performance of the proposed method.
Keywords: Text detection; Text segmentation; Text feature; Attentional cascade
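For readers unfamiliar with the decomposition step named in the abstract above, the following is a minimal sketch of the standard Niblack local threshold, T = m + k·s, where m and s are the mean and standard deviation of a window around each pixel. It is not the paper's exact variant; the function name, the window size, and k = -0.2 (a commonly used value for dark text on a light background) are assumptions made for illustration.

```python
import math


def niblack_binarize(image, window=3, k=-0.2):
    """Binarize a grayscale image (list of rows) with Niblack's local threshold.

    Each pixel is compared with T = mean + k * std, computed over the
    window x window neighbourhood centred on it (clipped at the borders).
    """
    height, width = len(image), len(image[0])
    half = window // 2
    result = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            values = [
                image[j][i]
                for j in range(max(0, y - half), min(height, y + half + 1))
                for i in range(max(0, x - half), min(width, x + half + 1))
            ]
            mean = sum(values) / len(values)
            std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
            threshold = mean + k * std
            # 1 marks a pixel darker than its local threshold (candidate text).
            result[y][x] = 1 if image[y][x] < threshold else 0
    return result
```

Connected-component labeling on the resulting binary mask would then yield the candidate CCs that the classification stage verifies.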
CTSF: An End-to-End Efficient Neural Network for Chinese Text with Skeleton Feature
4
Authors: Hengyang Wang, Jin Liu, Haoliang Ren. Journal on Big Data. 2021, No. 3, pp. 119-126 (8 pages).
The past decade has seen the rapid development of text detection based on deep learning. However, current methods of Chinese character detection and recognition have proven to be poor, and the accuracy of segmenting text boxes in natural scenes is not impressive. The reasons for this difficulty can be summarized in two points: the complexity of natural scenes and the large number of Chinese character types. In response to these problems, we propose a lightweight neural network architecture named CTSF. It consists of two modules. One is a text detection network, named CDSE, that combines CTPN with the image feature extraction modules of PVANet. The other is a recognition network, named SPPCNN-SF, based on spatial pyramid pooling and the fusion of Chinese character skeleton features. The two modules perform text detection and recognition, respectively. Our model performs much better than the original model on ICDAR2011 and ICDAR2013 (achieving F-measures of 85% and 88%) and improves processing speed in the training phase. In addition, our method achieves strong performance on three Chinese datasets, with accuracies of 95.12%, 95.56% and 96.01%.
Keywords: Deep learning; Convolutional neural network; Chinese character detection; Text segmentation
Automatic character detection and segmentation in natural scene images (cited by: 12)
5
Authors: ZHU Kai-hua, QI Fei-hu, JIANG Ren-jie, XU Li. Journal of Zhejiang University-Science A (Applied Physics & Engineering). SCIE, EI, CAS, CSCD. 2007, No. 1, pp. 63-71 (9 pages).
We present a robust connected-component (CC) based method for automatic detection and segmentation of text in real-scene images. This technique can be applied in robot vision, sign recognition, meeting processing and video indexing. First, a Non-Linear Niblack method (NLNiblack) is proposed to decompose the image into candidate CCs. Then, all these CCs are fed into a cascade of classifiers trained by the AdaBoost algorithm. Each classifier in the cascade responds to one feature of the CC. Proposed here are 12 novel features which are insensitive to noise, scale, text orientation and text language. The classifier cascade allows non-text CCs of the image to be rapidly discarded while more computation is spent on promising text-like CCs. The CCs passing through the cascade are considered text components and are used to form the segmentation result. A prototype system was built, with experimental results demonstrating the effectiveness and efficiency of the proposed method.
Keywords: Text detection and segmentation; AdaBoost; NLNiblack decomposition method; Attentional cascade
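The early-rejection idea behind the classifier cascade in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the three stages, their feature names (`width`, `height`, `pixels`) and their thresholds are hypothetical and hand-set, whereas the paper trains its stages with AdaBoost on 12 features that are not reproduced here.

```python
from typing import Callable, Dict, List, Tuple

Component = Dict[str, float]
Stage = Tuple[str, Callable[[Component], bool]]

# Hypothetical stages, cheapest first; names and thresholds are illustrative only.
STAGES: List[Stage] = [
    ("area", lambda cc: cc["width"] * cc["height"] >= 20),
    ("aspect_ratio", lambda cc: 0.1 <= cc["width"] / cc["height"] <= 10.0),
    ("fill_ratio", lambda cc: 0.1 <= cc["pixels"] / (cc["width"] * cc["height"]) <= 0.95),
]


def passes_cascade(cc: Component, stages: List[Stage] = STAGES) -> bool:
    """Return True only if every stage accepts; stop at the first rejection."""
    for _name, accept in stages:
        if not accept(cc):
            return False  # rejected early, so later stages never run
    return True


def filter_text_components(components: List[Component]) -> List[Component]:
    """Keep the components that survive all cascade stages."""
    return [cc for cc in components if passes_cascade(cc)]
```

Ordering the stages from cheapest to most expensive is what lets most non-text components be discarded quickly, leaving more computation for the text-like ones.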
Extended Approach to Water Flow Algorithm for Text Line Segmentation
6
Authors: Darko Brodi. Journal of Computer Science & Technology. SCIE, EI, CSCD. 2012, No. 1, pp. 187-194 (8 pages).
This paper proposes a new approach to the water flow algorithm for text line segmentation. In the basic method, hypothetical water flows at a few specified angles, which are defined by the water flow angle parameter. It is applied to the document image frame from left to right and vice versa. As a result, unwetted and wetted areas are established. These areas separate text from non-text elements in each text line. Hence, they represent the control areas that are of major importance for text line segmentation. The extended approach first extracts the connected components by placing bounding boxes over the text. In this way, the connected components are mutually separated, and the water flow angle, which defines the unwetted areas, is determined adaptively. With an appropriately chosen water flow angle, the unwetted areas are lengthened, which leads to better text line segmentation. Results of this approach are encouraging because they improve text line segmentation, which is the most challenging step in document image processing.
Keywords: Document image analysis; Text segmentation; Region growing; Smearing method; Water flow algorithm
Segmentation of Stick Text Based on Sub Connected Area Analysis
7
Authors: 高静波, 李新友, 唐泽圣. Journal of Computer Science & Technology. SCIE, EI, CSCD. 1998, No. 1, pp. 55-62 (8 pages).
A new stick text segmentation method based on sub connected area analysis is introduced in this paper. The foundation of this method is the sub connected area representation of a text image, which can represent all connected areas in an image efficiently. The method consists mainly of four steps: sub connected area classification, finding the initial boundary-following point, finding the optimal segmentation point by boundary tracing, and text segmentation. The method is similar to the boundary analysis method but is more efficient.
Keywords: Stick text; Text segmentation; Sub connected area
Segmented Summarization and Refinement: A Pipeline for Long-Document Analysis on Social Media
8
Authors: Guanghua Wang, Priyanshi Garg, Weili Wu. Journal of Social Computing. EI. 2024, No. 2, pp. 132-144 (13 pages).
Social media’s explosive growth has resulted in a massive influx of electronic documents influencing various facets of daily life. However, the enormous and complex nature of this content makes extracting valuable insights challenging. Long document summarization emerges as a pivotal technique in this context, serving to distill extensive texts into concise and comprehensible summaries. This paper presents a novel three-stage pipeline for effective long document summarization. The proposed approach combines unsupervised and supervised learning techniques, efficiently handling large document sets while requiring minimal computational resources. Our methodology introduces a unique process for forming semantic chunks through spectral dynamic segmentation, effectively reducing redundancy and repetitiveness in the summarization process. In contrast to previous methods, our approach aligns each semantic chunk with the entire summary paragraph, allowing the abstractive summarization model to process documents without truncation and enabling it to deduce missing information from other chunks. To enhance summary generation, we utilize a sophisticated rewrite model based on Bidirectional and Auto-Regressive Transformers (BART), rearranging and reformulating summary constructs to improve their fluidity and coherence. Empirical studies conducted on the long documents from the Webis-TLDR-17 dataset demonstrate that our approach significantly enhances the efficiency of abstractive summarization transformers. The contributions of this paper thus offer significant advancements in the field of long document summarization, providing a novel and effective methodology for summarizing extensive texts in the context of social media.
Keywords: Long document summarization; Abstractive summarization; Text segmentation; Text alignment; Rewrite model; Spectral embedding