Journal articles
1,296 articles found
Semantic Document Layout Analysis of Handwritten Manuscripts
1
Author: Emad Sami Jaha. Computers, Materials & Continua (SCIE, EI), 2023, No. 5, pp. 2805-2831 (27 pages)
A document layout can be more informative than merely a document's visual and structural appearance. Thus, document layout analysis (DLA) is considered a necessary prerequisite for advanced processing and detailed document image analysis, to be further used in several applications and for different objectives. This research extends the traditional approaches of DLA and introduces the concept of semantic document layout analysis (SDLA) by proposing a novel framework for semantic layout analysis and characterization of handwritten manuscripts. The proposed SDLA approach enables the derivation of implicit information and semantic characteristics, which can be effectively utilized in dozens of practical applications for various purposes. In this way it helps bridge the semantic gap, providing more understandable high-level document image analysis and more invariant characterization via absolute and relative labeling. This approach is validated and evaluated on a large dataset of Arabic handwritten manuscripts comprising complex layouts. The experimental work shows promising results in terms of accurate and effective semantic characteristic-based clustering and retrieval of handwritten manuscripts. It also indicates the expected efficacy of using the proposed approach to automate and facilitate many functional, real-life tasks, such as effort estimation and pricing for the transcription or typing of such complex manuscripts.
Keywords: semantic characteristics; semantic labeling; document layout analysis; semantic document layout analysis; handwritten manuscripts; clustering; retrieval; image processing; computer vision; machine learning
Evolution and prospect of China's rural development policy: A policy text analysis of the No.1 Central Documents
2
Author: WANG Qiang. Ecological Economy, 2018, No. 4, pp. 268-281 (14 pages)
This study reviews 20 No. 1 Central Documents on rural development policy issued by the Central Committee since 1982. We find that the policy has evolved through stages of reform, adjustment, modernization development and new ideas. The path of reform has passed through economic recovery, industry nurturing agriculture, agricultural modernization and rural revitalization. The study found the following:
- farmers' income has always been a focus of attention;
- the emphasis of agricultural production has shifted from meeting total demand to green and ecological development;
- urban and rural resource factors are not well organized, which results in internal contradictions.

Implementing the rural revitalization strategy is an important measure for fundamentally solving rural development problems in the new era.
Keywords: the No. 1 Central Document; text analysis; rural development; evolution; prospect
Sentiment Analysis on Twitter Data Using Term Frequency-Inverse Document Frequency
3
Authors: Akash Addiga, Sikha Bagui. Journal of Computer and Communications, 2022, No. 8, pp. 117-128 (12 pages)
This study is an exploratory analysis of applying natural language processing techniques, such as Term Frequency-Inverse Document Frequency (TF-IDF) and sentiment analysis, to Twitter data. The novelty of this work lies in determining the overall sentiment of a politician's tweets from the TF-IDF values of the terms used in those tweets. By calculating the TF-IDF value of terms in the corpus, this work shows the correlation between TF-IDF score and polarity. The results show that TF-IDF scores give a more accurate representation of overall polarity than raw frequency. This is because each term is weighted by its uniqueness and relevance rather than only by how often it appears in the corpus.
Keywords: sentiment analysis; Twitter data; term frequency; inverse term frequency; Term Frequency-Inverse Document Frequency (TF-IDF); social media
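The abstract relies on TF-IDF weighting, so a minimal sketch may help. It assumes raw term counts for TF and the unsmoothed log(N/df) form of IDF; the paper's exact variant is not stated. The tokenized tweets are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: tf-idf} dict per tokenized document.

    Assumption: TF = raw count of the term in the document,
    IDF = log(N / df), where df = number of documents containing the term.
    """
    n_docs = len(docs)
    # Document frequency: each term counted at most once per document.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return scores


# Hypothetical toy corpus of three tokenized tweets.
tweets = [
    ["great", "jobs", "report"],
    ["great", "rally", "tonight"],
    ["weak", "economy", "report"],
]
weights = tf_idf(tweets)
print(weights[0])
```

A term that occurs in every document gets weight 0. Terms that are frequent but not distinctive are therefore down-weighted compared with raw frequency, which is the property the abstract appeals to.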
Fusion of Type-2 Neutrosophic Similarity Measure in Signatures Verification Systems: A New Forensic Document Analysis Paradigm
4
Authors: Shahlaa Mashhadani, Wisal Hashim Abdulsalam, Oday Ali Hassen, Saad M. Darwish. Intelligent Automation & Soft Computing, 2024, No. 5, pp. 805-828 (24 pages)
Signature verification involves vague situations in which a signature could resemble many reference samples or might differ because of handwriting variances. A neutrosophic engine represents the features and similarity scores of signatures from the matching algorithm as fuzzy sets, capturing their degrees of membership, non-membership, and indeterminacy. In this way it can contribute significantly to signature verification by addressing the inherent uncertainties and ambiguities present in signatures. However, type-1 neutrosophic logic assigns these membership functions fixed values. These may not adequately capture the varying degrees of uncertainty in signature characteristics, and a type-1 representation cannot adapt to different degrees of uncertainty. The proposed work explores type-2 neutrosophic logic to provide additional flexibility and granularity in handling ambiguity, indeterminacy, and uncertainty, thereby improving the accuracy of signature verification systems. Because type-2 neutrosophic logic allows many sources of ambiguity and conflicting information to be assessed, decision-making is more flexible. The experimental results show that a type-2 neutrosophic engine handles uncertainty and variability better than type-1. This ultimately yields more accurate False Rejection Rate (FRR) and False Acceptance Rate (FAR) results. In a comparative analysis on a benchmark dataset of handwritten signatures, the type-2 neutrosophic similarity measure achieves an accuracy of 98%, compared with 95% for type-1.
Keywords: type-2 neutrosophic reasoning; biometric signature verification; forensic document experts' analysis
From Diaries to Digital:The Role of AI in Web-Mediated Documentary Analysis
5
Author: Laura Arosio. Sociology Study, 2024, No. 5, pp. 213-227 (15 pages)
This paper explores how artificial intelligence (AI) can support social researchers in using web-mediated documents for research purposes. It extends traditional documentary analysis to include digital artifacts such as blogs, forums, emails and online archives. The discussion highlights the role of AI at different stages of the research process, emphasizing how AI can automate complex tasks and enhance research design. These stages include:
- question generation;
- sample and design definition;
- ethical considerations;
- data analysis;
- dissemination of results.

The paper also reports on practical experience of using AI tools, specifically ChatGPT-4, to conduct web-mediated documentary analysis, and shares some ideas for integrating AI into social research.
Keywords: artificial intelligence; generative AI; web-mediated documents; documentary analysis; data analysis with AI; social research methodology
Mechanism of imipenem-induced mental disorder: A meta-analysis (cited by 1)
6
Authors: Zhou-Hong Zhan, Jia-Liang Wang, Li-Hong Wang, Nan-Nan Shen, Xin-Wen Liu, Yan-Na Yu, Fu-Rong Liu. World Journal of Psychiatry (SCIE), 2024, No. 10, pp. 1583-1591 (9 pages)
BACKGROUND Imipenem is a highly effective carbapenem antibiotic that is widely used to treat many serious bacterial infections. However, it can also cause adverse reactions, of which mental abnormalities are the central nervous system adverse reactions of greatest concern. Patients respond differently to imipenem, and its effect on psychiatric disorders is unclear. A meta-analysis summarizing the results of multiple previous studies can therefore provide stronger evidence for clinical guidelines on the rational use of imipenem to minimize risk.

After a review of the literature published between 2003 and 2017, seven controlled trials with a total of 550 patients were included: 273 in the control group and 277 in the experimental group. Study sample sizes ranged from 30 to 61 cases. Patients in the experimental group were treated with imipenem, while the control group was treated with conventional drugs.

The meta-analysis showed that the incidence of mental disorders was higher in the experimental group than in the control group (odds ratio = 3.66, 95% confidence interval: 1.11-12.11, P = 0.030). However, there was no significant difference in the incidence of adverse reactions between the two groups (odds ratio = 0.05, 95% confidence interval: 0.00 to 0.10, P = 0.060). In the funnel plots, the points for the individual studies were symmetrical and formed an inverted funnel shape, indicating no evident publication bias.

CONCLUSION Imipenem can cause mental disorders in patients. However, the low quality of the included literature may have affected the final results. High-quality, multi-sample randomized controlled studies are therefore needed to further confirm the mechanism of imipenem-induced mental disorders and to provide effective guidance for clinical treatment.
Keywords: imipenem; psychosis; drug mechanism; meta-analysis; document resource quality
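A pooled odds ratio combines per-trial 2×2 tables. The sketch below shows only the single-table building block: an odds ratio with a Woolf (log-scale) 95% confidence interval. It is not the paper's pooling model, which the abstract does not specify, and the counts in the example are hypothetical.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Woolf 95% CI for one 2x2 table.

    a = events, b = non-events in the treated (e.g. imipenem) group
    c = events, d = non-events in the control group
    Assumes all four cells are non-zero (no continuity correction).
    """
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    lo = math.exp(math.log(or_) - z * se_log)
    hi = math.exp(math.log(or_) + z * se_log)
    return or_, lo, hi


# Hypothetical single trial: 6/40 events on imipenem vs 2/40 on control.
print(odds_ratio_ci(6, 34, 2, 38))
```

The interval is built on the log scale. It is therefore asymmetric around the odds ratio, and its lower bound is always above zero.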
A Machine Learning-Based Technique with Intelligent WordNet Lemmatize for Twitter Sentiment Analysis (cited by 1)
7
Authors: S. Saranya, G. Usha. Intelligent Automation & Soft Computing (SCIE), 2023, No. 4, pp. 339-352 (14 pages)
With the birth of the Internet and the rapid growth of mobile technologies, the widespread use of social media has democratized content production, resulting in an explosion of short, informal texts. Twitter is a microblogging and social networking service on which millions of short messages are posted. Twitter analysis interprets users' tweets in terms of ideas, interests, and views across a range of settings and fields. This type of study can be useful for a variety of researchers and applications that need to know people's perspectives on a given topic or event. Sentiment analysis of these texts is useful for many reasons, but it is typically considered difficult. The messages are frequently short, informal, noisy, and rich in linguistic ambiguities such as polysemy. Furthermore, most contemporary sentiment analysis algorithms assume clean data. In this paper, we offer a machine-learning-based sentiment analysis method that extracts Term Frequency-Inverse Document Frequency (TF-IDF) features. It also applies deep intelligent WordNet lemmatization to improve tweet quality by removing noise. We use a Random Forest classifier to detect the emotion of a tweet. To validate the proposed approach, we conduct extensive tests on publicly accessible datasets. The findings show that the proposed technique significantly improves sentiment classification on multi-class emotion text data.
Keywords: Random Forest; sentiment analysis; social media; term frequency and inverse document frequency; Twitter; WordNet lemmatize
A Corpus-Based Critical Discourse Analysis of Trump and Biden Administrations' China Policies (cited by 2)
8
Authors: ZHI Yongbi, YIN Wenjing, ZHI Ran. International Relations and Diplomacy, 2022, No. 4, pp. 175-189 (15 pages)
Proximization theory provides an effective way to study how a speaker establishes his own legitimacy or reinforces the other's illegitimacy. Its strengths can be maximized through quantitative and comparative analysis. In this study, we collected reports on the Trump and Biden administrations' China policies to build two small corpora: 11,030 words in the Trump corpus and 17,566 words in the Biden corpus. Critical discourse analysis was combined with proximization theory. With the help of BFSU Qualitative Coder 1.2, AntConc 3.5.7, and Log-Likelihood and Chi-Square Calculator 1.0, the relevant discourse was scored and critically analyzed from the perspective of proximization theory. The findings are as follows:
1. Both administrations resort to many spatial proximization strategies to construct ODCs (outside-the-deictic-center entities) converging on IDCs (inside-the-deictic-center entities). China is cast as the ODC posing a threat to the physical IDCs at home.
2. In their temporal proximization strategies, both administrations primarily use modal verbs and various entities to construct ODCs that extend indefinitely into the present and future. This emphasizes the urgency of the threat and reinforces the legitimacy of their actions.
3. In their axiological proximization strategies, the two administrations differ greatly, indicating that discursive biases remain.
Keywords: proximization theory; critical discourse analysis; American policies toward China; corpus; U.S. government documents
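The Log-Likelihood and Chi-Square Calculator mentioned above compares a word's frequency across two corpora. The sketch below uses the standard two-corpus log-likelihood (G²) keyness formula. It assumes that formula is what the calculator computes; the abstract does not describe its implementation. The corpus sizes come from the study, but the word counts are hypothetical.

```python
import math


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Two-corpus log-likelihood (G2) for one word.

    a, b = frequency of the word in corpus 1 and corpus 2
    c, d = total word counts of corpus 1 and corpus 2
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in corpus 1
    e2 = d * (a + b) / (c + d)  # expected frequency in corpus 2
    ll = 0.0
    # A zero observed frequency contributes nothing (0 * ln 0 taken as 0).
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


# Corpus sizes from the study; the word counts (40 vs 30) are hypothetical.
TRUMP_TOKENS, BIDEN_TOKENS = 11_030, 17_566
g2 = log_likelihood(40, 30, TRUMP_TOKENS, BIDEN_TOKENS)
print(round(g2, 2))
```

With one degree of freedom, a G² above 3.84 corresponds to p < 0.05. The statistic is 0 when a word's relative frequency is the same in both corpora.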
FUZZY METHOD FOR FAILURE CRITICALITY ANALYSIS
9
Authors: 黄洪钟, 须雷, 胡宗武. Journal of Shanghai Jiaotong University (Science) (EI), 2000, No. 2, pp. 38-41 (4 pages)
Failure mode, effect and criticality analysis (FMECA) delivers the greatest benefit when it is done early in the design phase and tracks product changes as they evolve. Design changes can then be made more economically than if the problems were discovered after the design had been completed. However, when the discovered design flaws must be prioritized for corrective action, precise information is often not available on their probability of occurrence, the effect of the failure, and their detectability. To solve this problem, this paper describes a new method, based on fuzzy sets, for prioritizing failures for corrective action in an FMECA. Its successful application to a container crane shows that the proposed method is both reasonable and practical.
Keywords: failure; safety; criticality analysis; container crane; fuzzy sets. Document code: A
Feature Extraction of Fabric Defects Based on Complex Contourlet Transform and Principal Component Analysis (cited by 1)
10
Authors: 吴一全, 万红, 叶志龙. Journal of Donghua University (English Edition) (EI, CAS), 2013, No. 4, pp. 282-286 (5 pages)
This paper proposes a feature extraction method for fabric defects based on the complex contourlet transform (CCT) and principal component analysis (PCA). Its aim is to extract fabric defect features effectively while reducing the dimension of the feature space. The method works in three steps:
1. Training samples of fabric defect images are decomposed by CCT.
2. PCA is applied to the resulting low-frequency component and to part of the high-frequency components, giving a lower-dimensional feature space.
3. The CCT components of the testing samples are projected onto this feature space, where the minimum Euclidean distance method distinguishes the different types of fabric defects.

The method was compared with four others:
- PCA;
- the method combining the wavelet low-frequency component with PCA (WLPCA);
- the method combining the contourlet transform with PCA (CPCA);
- the method combining wavelet low-frequency and high-frequency components with PCA (WPCA).

Extensive experimental results show that the proposed method extracts features of common fabric defect types effectively. The recognition rate is greatly improved while the dimension is reduced.
Keywords: fabric defects; feature extraction; complex contourlet transform (CCT); principal component analysis (PCA). CLC number: TP391.4, TS103.7; Document code: A; Article ID: 1672-5220(2013)04-0282-05
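The final step above assigns a test sample to the defect class whose training features are closest in Euclidean distance. The sketch below assumes the CCT and PCA stages have already produced feature vectors. It also assumes each class is represented by the mean of its training projections, which is one common reading of the minimum-distance rule; the abstract does not say whether class means or individual samples are compared. The feature vectors and class labels are hypothetical.

```python
import math


def centroids(train: dict[str, list[list[float]]]) -> dict[str, list[float]]:
    """Mean feature vector (centroid) for each class label."""
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in train.items()
    }


def classify(x: list[float], cents: dict[str, list[float]]) -> str:
    """Return the label whose centroid is nearest to x (Euclidean distance)."""
    return min(cents, key=lambda label: math.dist(x, cents[label]))


# Hypothetical 2-D PCA projections for two defect types.
train = {
    "broken_end": [[0.9, 0.1], [1.1, -0.1]],
    "oil_stain": [[-1.0, 0.8], [-1.2, 1.2]],
}
cents = centroids(train)
print(classify([1.0, 0.2], cents))  # nearest centroid is broken_end
```

Comparing against class centroids keeps classification cost proportional to the number of classes rather than the number of training samples.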
Assessment of Sentiment Analysis Using Information Gain Based Feature Selection Approach
11
Authors: R. Madhumathi, A. Meena Kowshalya, R. Shruthi. Computer Systems Science & Engineering (SCIE, EI), 2022, No. 11, pp. 849-860 (12 pages)
Sentiment analysis is the process of determining the intention or emotion behind an article. It analyzes the subjective information in people's opinions from their context. The analysis quantifies the reactions or sentiments and reveals the contextual polarity of the information. In social behavior, sentiment can be thought of as a latent variable, and measuring and comprehending this behavior could help us better understand social issues. Because sentiments are domain-specific, sentiment analysis in a specific context is critical in any real-world scenario. Textual sentiment analysis is performed at the sentence, document and feature levels. This work introduces a new Information Gain based Feature Selection (IGbFS) algorithm that selects highly correlated features and eliminates irrelevant and redundant ones. The proposed IGbFS algorithm is then used for extensive textual sentiment analysis at the sentence, document and feature levels. The analysis uses datasets from the Cornell and Kaggle repositories. The proposed Information Gain based classifier outperformed existing baseline classifiers, achieving accuracies of:
- 96% at the document level;
- 97.4% at the sentence level;
- 98.5% at the feature level.

The proposed method was also tested on the high-dimensional IMDB, Yelp 2013 and Yelp 2014 datasets. There it achieved accuracies of 95%, 96% and 98% at the document, sentence and feature levels respectively, again higher than existing baseline classifiers.
Keywords: sentiment analysis; sentence level; document level; feature level; information gain
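Information gain scores a feature by how much knowing it reduces uncertainty about the class label: IG(C; F) = H(C) − H(C | F). The IGbFS algorithm's own thresholds and correlation handling are not described in the abstract. This minimal sketch therefore shows only the scoring step, for a binary term-presence feature on hypothetical labeled documents.

```python
import math
from collections import Counter


def entropy(labels: list[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def information_gain(docs: list[set[str]], labels: list[str], term: str) -> float:
    """IG of the binary feature 'term appears in the document'."""
    with_t = [y for d, y in zip(docs, labels) if term in d]
    without_t = [y for d, y in zip(docs, labels) if term not in d]
    cond = 0.0  # conditional entropy H(C | F)
    for subset in (with_t, without_t):
        if subset:  # an empty split contributes nothing
            cond += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - cond


# Hypothetical tokenized reviews with sentiment labels.
docs = [{"great", "plot"}, {"great", "cast"}, {"boring", "plot"}, {"boring", "cast"}]
labels = ["pos", "pos", "neg", "neg"]
print(information_gain(docs, labels, "great"))  # 1.0: perfectly separates classes
print(information_gain(docs, labels, "plot"))   # 0.0: carries no class information
```

A feature selector of this family would rank all terms by this score and keep the top-scoring ones.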
Arc-length technique for nonlinear finite element analysis (cited by 10)
12
Authors: MEMON Bashir-Ahmed, SU Xiao-zu (苏小卒). Journal of Zhejiang University Science (EI, CSCD), 2004, No. 5, pp. 618-628 (11 pages)
Nonlinear solution of reinforced concrete structures, particularly of the complete load-deflection response, requires tracing the equilibrium path and properly treating the limit and bifurcation points. Ordinary solution techniques become unstable near limit points and also have problems with snap-through and snap-back, so they fail to predict the complete load-displacement response. The arc-length method serves this purpose well in principle. It has received wide acceptance in finite element analysis and has been used extensively. However, the basic idea must be modified to meet the particular needs of each analysis. This paper reviews some of the developments of the method over the last two decades, with particular emphasis on nonlinear finite element analysis of reinforced concrete structures.
Keywords: arc-length method; nonlinear analysis; finite element method; reinforced concrete; load-deflection path. Document code: A; CLC number: TU31. Affiliation: Department of Structural Engineering, Tongji University, Shanghai 200092, China. E-mail: bashirmemon@sohu.com, xiaozub@online.sh.cn. Received July 30, 2003; revision accepted Sept. 11, 2003.
An Informetric analysis of web citation in Chinese journals of Library and Information Science in recent years
13
Authors: ZHANG Yang, ZHANG Jie. Chinese Journal of Library and Information Science, 2010, No. 3, pp. 46-62 (17 pages)
This paper takes as its data source 998 articles published from 2003 to 2007 in four Chinese core journals in Library and Information Science. It analyzes pertinent aspects of their reference citations, particularly citations of web resources. These chiefly include:
- the number of web citations;
- web citations per article;
- the distribution of domain names of web citations;
- the institutional and geographical affiliations of the authors.

The central theme of the study is how the use of online networked academic information resources in China is evolving. The paper includes 3 figures, 6 tables and 18 references.
Keywords: web information resource; network document; web citation; informetrics; citation analysis; library and information science
Analysis of Adverse Reactions in the Treatment of COVID-19 with Three Chinese Patent Medicines and Three Herbal Formulas
14
Authors: Li Qiao, Wang Aili, Wu Di, Chen Yuwen. Asian Journal of Social Pharmacy, 2023, No. 1, pp. 8-16 (9 pages)
Objective To explore the patterns and characteristics of adverse drug reactions (ADRs) to three Chinese patent medicines and three herbal formulas used to treat COVID-19, and to provide a reference for safe clinical medication.

Methods Case reports and ADR reports on the three Chinese patent medicines and three herbal formulas, covering December 2019 to May 2021, were retrieved from the PubMed, Web of Science, Springer Link, CNKI, Wanfang and VIP databases. The relevant information in the literature was then extracted and analyzed.

Results and Conclusion Following the predefined search strategy, 136 documents were retrieved, of which 6 met the inclusion criteria. In total, 553 patients used the three Chinese patent medicines and three herbal formulas, and 133 cases of adverse reactions occurred. All of these adverse reactions can be explained by the theory of traditional Chinese medicine. They can be eliminated by modifying the formula (adding or removing herbs) or by stopping the medication.
Keywords: three Chinese patent medicines and three herbal formulas; adverse drug reaction; document analysis
Application of Failure Mode and Effects Analysis in Document Management for Clinical Trials of In Vitro Diagnostic Reagents
15
Authors: 伍婉媚, 周美辰, 周晓雯, 李祥, 李艳, 钟洪兰. 现代医院 (Modern Hospital), 2026, No. 1, pp. 76-79 (4 pages)
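Objective To explore the effect of applying failure mode and effects analysis (FMEA) to the standardized management of clinical trial documents for in vitro diagnostic reagents.

Methods The study covered the in vitro diagnostic reagent clinical trial projects conducted at our institution in 2024. We assessed problems such as incomplete document retention, non-standard document retention, untimely writing, and non-standard recording, and identified the resulting failure modes. The three failure modes with a risk priority number (RPN) ≥ 35.29 were classified as high-risk factors and analyzed, and corresponding risk control measures were formulated.

Results Applying FMEA effectively identified and controlled potential risks in the document management process. It significantly improved the standardization and reliability of document management and safeguarded the integrity and traceability of clinical trial data.

Conclusion FMEA plays an important role in the standardized management of clinical trial documents for in vitro diagnostic reagents and can strongly support the high-quality development of clinical trials.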
Keywords: in vitro diagnostic reagents; clinical trials; document management; failure mode and effects analysis
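In FMEA, each failure mode is conventionally scored for severity (S), occurrence (O) and detectability (D), and its risk priority number is RPN = S × O × D. The sketch below shows the screening step described above, flagging modes at or above the 35.29 cut-off the study reports. The 1-10 scales and all scores are hypothetical, since the abstract gives neither; only the failure-mode names come from the study.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int       # S, assumed 1-10 scale
    occurrence: int     # O, assumed 1-10 scale
    detectability: int  # D, assumed 1-10 scale (10 = hardest to detect)

    @property
    def rpn(self) -> int:
        """Risk priority number: RPN = S x O x D."""
        return self.severity * self.occurrence * self.detectability


def high_risk(modes: list[FailureMode], threshold: float = 35.29) -> list[FailureMode]:
    """Failure modes with RPN >= threshold, highest RPN first."""
    flagged = (m for m in modes if m.rpn >= threshold)
    return sorted(flagged, key=lambda m: m.rpn, reverse=True)


# Hypothetical scores for the problem types named in the study.
modes = [
    FailureMode("incomplete retention", 5, 3, 3),
    FailureMode("non-standard retention", 3, 3, 3),
    FailureMode("untimely writing", 4, 4, 3),
    FailureMode("non-standard recording", 4, 3, 3),
]
for m in high_risk(modes):
    print(m.name, m.rpn)
```

Sorting the flagged modes by RPN gives the order in which corrective measures would be considered.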
Document Layout Analysis Combining Keyword Extraction and Graph Contrastive Learning
16
Authors: 马晓松, 刘杰, 李晓辉, 郭颖. 小型微型计算机系统 (Journal of Chinese Computer Systems) (Peking University core journal), 2026, No. 1, pp. 150-156 (7 pages)
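Document layout analysis is an important task and a necessary prerequisite in information retrieval and document understanding. Traditional document layout analysis methods often overlook the deep association between text content and structure. This paper proposes a method based on graph neural networks combined with a large language model and graph contrastive learning to improve the accuracy of document layout analysis. First, a large language model automatically extracts keywords, which are fused into the graph nodes to enhance the graph neural network's understanding of document content and structure. Second, graph contrastive learning optimizes the node representations through an inter-view contrastive loss, so that the model distinguishes document layout patterns more effectively. Experimental results on the DocLayNet dataset show that the method significantly improves the accuracy of document layout analysis and outperforms existing baseline methods. The method offers a new technical path for document understanding and information extraction and is expected to find wide use in practical applications.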
Keywords: graph neural network; large model; multimodal; graph contrastive learning; document layout analysis
A Brief Analysis of Problems and Countermeasures in Equipment Document Archiving
17
Author: 陈秋霞. 办公自动化 (Office Automation), 2026, No. 1, pp. 71-73 (3 pages)
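Amid rapid globalization and digitalization, engineering projects keep growing in scale and complexity. Equipment document management is a core element of engineering archive management. It is vital for ensuring that engineering projects progress scientifically and operate stably over the long term. This article analyzes the existing problems in equipment document management and their root causes. Drawing on actual cases from large state-owned infrastructure projects, it then proposes a number of effective optimization and improvement measures. These measures aim to improve the management quality of equipment documents so that they provide strong support for on-site production and equipment maintenance.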
Keywords: equipment document management; archiving; utilization rate; problem analysis; improvement measures
Automatically Constructing an Effective Domain Ontology for Document Classification (cited by 2)
18
Author: Yi-Hsing Chang. Computer Technology and Application, 2011, No. 3, pp. 182-189 (8 pages)
This paper proposes an effective, automatically constructed domain ontology. The main idea is to use Formal Concept Analysis to establish the domain ontology automatically. The ontology then serves as the basis for a Naive Bayes classifier, which is used to verify the ontology's effectiveness for document classification. The assessment uses 1,752 documents divided into 10 categories, of which 1,252 are training documents and 500 are testing documents. With the F1-measure as the assessment criterion, three results are obtained:
- The average recall of the Naive Bayes classifier is 0.94, so its recall based on the automatically constructed ontology is excellent.
- The average precision is 0.81, so its precision is good.
- The average F1-measure over the 10 categories is 0.86, so in terms of F1-measure the classifier is effective.

Thus, the automatically constructed domain ontology can indeed serve as the document categories and achieve effective document classification.
Keywords: Naive Bayes classifier; ontology; formal concept analysis; document classification
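The F1-measure used above is the harmonic mean of precision P and recall R: F1 = 2PR / (P + R). The harmonic mean of the reported average precision (0.81) and average recall (0.94) is about 0.87. This differs slightly from the reported average F1 of 0.86, as expected if F1 was computed per category and then averaged; the abstract does not say which. A small sketch of both computations:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(per_class: list[tuple[float, float]]) -> float:
    """Average of per-category F1 scores from (precision, recall) pairs."""
    return sum(f1_score(p, r) for p, r in per_class) / len(per_class)


# F1 of the reported averages (not the same as averaging per-category F1).
print(round(f1_score(0.81, 0.94), 2))  # 0.87
```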
Text Extraction in Complex Color Document Images for Enhanced Readability
19
Authors: P. Nagabhushan, S. Nirmala. Intelligent Information Management, 2010, No. 2, pp. 120-133 (14 pages)
Documents with text printed on complex color backgrounds are common. The textual content of such documents is hard to read because the background is complex and the colors of the foreground text mix with those of the background. Automatic segmentation of the foreground text in such document images is therefore essential for smooth reading of the document content, whether by humans or machines. In this paper we propose a novel approach to extract the foreground text in color document images with complex backgrounds. It is a hybrid approach that combines connected component analysis with texture feature analysis of potential text regions. The approach proceeds as follows:
1. The Canny edge detector detects all possible text edge pixels.
2. Connected component analysis on these edge pixels identifies candidate text regions.
3. Because the background is complex, a non-text region may also be identified as a text region. This is overcome by analyzing the texture features of the potential text region corresponding to each connected component.
4. An unsupervised local thresholding performs foreground segmentation in the detected text regions.
5. Noisy text regions are identified and reprocessed to further enhance the quality of the retrieved foreground.

The approach can handle document images whose backgrounds vary in color and texture, with foreground text in any color, font, size and orientation. Experimental results show that the proposed algorithm detects, on average, 97.12% of the text regions in the source document. The readability of the extracted foreground text is illustrated through optical character recognition (OCR) when the text is in English. The proposed approach is compared with some existing methods of foreground separation in document images, and the experimental results show that it performs better.
Keywords: color document image; complex background; connected component analysis; segmentation of text; texture analysis; unsupervised thresholding; OCR
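The candidate-region step above groups edge pixels into connected components. The sketch below implements 8-connected component labeling with breadth-first search on a binary edge map. It assumes the Canny edges have already been computed; the edge detector, texture analysis and thresholding stages are not reproduced, and the tiny edge map is hypothetical.

```python
from collections import deque


def connected_components(edges: list[list[int]]) -> list[list[tuple[int, int]]]:
    """Group foreground (1) pixels into 8-connected components.

    Returns one list of (row, col) coordinates per component.
    """
    rows, cols = len(edges), len(edges[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if edges[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited edge pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            comp = []
            while queue:
                y, x = queue.popleft()
                comp.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and edges[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(comp)
    return components


def bounding_box(comp: list[tuple[int, int]]) -> tuple[int, int, int, int]:
    """(top, left, bottom, right) box of a component: a candidate text region."""
    ys = [y for y, _ in comp]
    xs = [x for _, x in comp]
    return min(ys), min(xs), max(ys), max(xs)


edge_map = [
    [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
]
comps = connected_components(edge_map)
print([bounding_box(c) for c in comps])
```

Each bounding box would then be passed to the texture check that rejects non-text regions.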
Pre-training transformer with dual-branch context content module for table detection in document images
20
Authors: Yongzhi LI, Pengle ZHANG, Meng SUN, Jin HUANG, Ruhan HE. 虚拟现实与智能硬件(中英文) (Virtual Reality & Intelligent Hardware) (EI), 2024, No. 5, pp. 408-420 (13 pages)
Background Document images such as statistical reports and scientific journals are widely used in information technology. Accurate detection of table areas in document images is an essential prerequisite for tasks such as information extraction. However, tables vary widely in shape and size, so existing table detection methods adapted from general object detection algorithms have not yet achieved satisfactory results. Incorrect detections can lead to the loss of critical information.

Methods To minimize incorrect detections, we propose a novel end-to-end trainable deep network that uses a self-supervised pre-training transformer for feature extraction. To better handle table areas of different shapes and sizes, we add a dual-branch context content attention module (DCCAM) to the high-dimensional features. It extracts context content information, enhancing the network's ability to learn shape features. For feature fusion at different scales, we replace the original 3×3 convolution with a multilayer residual module. This module carries enhanced gradient flow information and improves feature representation and extraction.

Results We evaluated our method on public document datasets and compared it with previous methods. It achieved state-of-the-art results on evaluation metrics such as recall and F1-score. https://github.com/YongZ-Lee/TD-DCCAM
Keywords: table detection; document image analysis; transformer; dilated convolution; deformable convolution; feature fusion