A document layout can be more informative than merely a document’s visual and structural appearance.Thus,document layout analysis(DLA)is considered a necessary prerequisite for advanced processing and detailed docume...A document layout can be more informative than merely a document’s visual and structural appearance.Thus,document layout analysis(DLA)is considered a necessary prerequisite for advanced processing and detailed document image analysis to be further used in several applications and different objectives.This research extends the traditional approaches of DLA and introduces the concept of semantic document layout analysis(SDLA)by proposing a novel framework for semantic layout analysis and characterization of handwritten manuscripts.The proposed SDLA approach enables the derivation of implicit information and semantic characteristics,which can be effectively utilized in dozens of practical applications for various purposes,in a way bridging the semantic gap and providingmore understandable high-level document image analysis and more invariant characterization via absolute and relative labeling.This approach is validated and evaluated on a large dataset ofArabic handwrittenmanuscripts comprising complex layouts.The experimental work shows promising results in terms of accurate and effective semantic characteristic-based clustering and retrieval of handwritten manuscripts.It also indicates the expected efficacy of using the capabilities of the proposed approach in automating and facilitating many functional,reallife tasks such as effort estimation and pricing of transcription or typing of such complex manuscripts.展开更多
In this paper, a visual similarity based document layout analysis (DLA) scheme is proposed, which by using clustering strategy can adaptively deal with documents in different languages, with different layout structu...In this paper, a visual similarity based document layout analysis (DLA) scheme is proposed, which by using clustering strategy can adaptively deal with documents in different languages, with different layout structures and skew angles. Aiming at a robust and adaptive DLA approach, the authors first manage to find a set of representative filters and statistics to characterize typical texture patterns in document images, which is through a visual similarity testing process. Texture features are then extracted from these filters and passed into a dynamic clustering procedure, which is called visual similarity clustering. Finally, text contents are located from the clustered results. Benefit from this scheme, the algorithm demonstrates strong robustness and adaptability in a wide variety of documents, which previous traditional DLA approaches do not possess.展开更多
随着深度学习技术的发展,尤其是基于Transformer的预训练模型的广泛应用,文档解析逐步从传统规则方法演进为融合文本、视觉与布局信息的多模态系统。为了对后续研究提供理论参考与技术借鉴,系统回顾了文档解析技术的演进脉络:从早期基...随着深度学习技术的发展,尤其是基于Transformer的预训练模型的广泛应用,文档解析逐步从传统规则方法演进为融合文本、视觉与布局信息的多模态系统。为了对后续研究提供理论参考与技术借鉴,系统回顾了文档解析技术的演进脉络:从早期基于规则的方法,到基于深度学习的图像预处理、版面分析及信息抽取任务,重点剖析了场景文本检测、表格理解等特定任务模型,以及布局语言模型(layout language model,LayoutLM)系列通用预训练模型和基于大语言模型的最新探索。展开更多
文档分析与识别(简称文档识别)技术将各种非结构化文档数据(图像、联机笔迹)转化为结构化数据,便于计算机处理和理解,应用场景十分广阔。20世纪60年代以来,文档识别方法研究与应用受到广泛关注并取得巨大进展。得益于深度学习技术的发...文档分析与识别(简称文档识别)技术将各种非结构化文档数据(图像、联机笔迹)转化为结构化数据,便于计算机处理和理解,应用场景十分广阔。20世纪60年代以来,文档识别方法研究与应用受到广泛关注并取得巨大进展。得益于深度学习技术的发展和应用,文档识别的性能快速提升,相关技术在文档数字化、票据处理、笔迹录入、智能交通、文档检索与信息抽取等领域得到广泛应用。首先介绍文档识别的背景和技术范畴,回顾该领域发展历史,然后重点对深度学习方法兴起以来的研究进行综述,分析当前技术存在的不足,并建议未来值得重视的研究方向。研究现状综述部分,按文档分析与识别的几个主要技术环节(文档图像预处理、版面分析、场景文本检测、文本识别、结构化符号和图形识别、文档检索与信息抽取)分别进行介绍,简述传统方法研究的代表性工作,重点介绍深度学习方法研究的新进展。总体上,当前研究对象向深度、广度扩展,处理方法全面转向深度神经网络模型和深度学习方法,识别性能大幅提升且应用场景不断扩展。在现状分析基础上,指出当前技术在识别精度和可靠性、可解释性、学习能力和自适应性等方面还有明显不足。最后从提升性能、应用扩展、提升学习能力几个角度提出一些研究方向。从提升性能角度,研究问题包括文本识别可靠性、可解释性、全要素识别、长尾问题、多语言、复杂版面分割与理解、变形文档分析与识别等。应用扩展包括新应用(如机器人流程自动化(robotic process automation,RPA)、文字信息抄录、考古)和新技术问题(语义信息抽取、跨模态融合、面向应用的推理决策等)两方面。从提升学习能力角度,相关问题包括小样本学习、迁移学习、多任务学习、领域自适应、结构化预测、弱监督学习、自监督学习、开放集学习和跨模态学习等。展开更多
基金This research was supported and funded by KAU Scientific Endowment,King Abdulaziz University,Jeddah,Saudi Arabia.
文摘A document layout can be more informative than merely a document’s visual and structural appearance.Thus,document layout analysis(DLA)is considered a necessary prerequisite for advanced processing and detailed document image analysis to be further used in several applications and different objectives.This research extends the traditional approaches of DLA and introduces the concept of semantic document layout analysis(SDLA)by proposing a novel framework for semantic layout analysis and characterization of handwritten manuscripts.The proposed SDLA approach enables the derivation of implicit information and semantic characteristics,which can be effectively utilized in dozens of practical applications for various purposes,in a way bridging the semantic gap and providingmore understandable high-level document image analysis and more invariant characterization via absolute and relative labeling.This approach is validated and evaluated on a large dataset ofArabic handwrittenmanuscripts comprising complex layouts.The experimental work shows promising results in terms of accurate and effective semantic characteristic-based clustering and retrieval of handwritten manuscripts.It also indicates the expected efficacy of using the capabilities of the proposed approach in automating and facilitating many functional,reallife tasks such as effort estimation and pricing of transcription or typing of such complex manuscripts.
基金This work is supported by the National Natural Science Foundation of China under Grant No. 60472002.
文摘In this paper, a visual similarity based document layout analysis (DLA) scheme is proposed, which by using clustering strategy can adaptively deal with documents in different languages, with different layout structures and skew angles. Aiming at a robust and adaptive DLA approach, the authors first manage to find a set of representative filters and statistics to characterize typical texture patterns in document images, which is through a visual similarity testing process. Texture features are then extracted from these filters and passed into a dynamic clustering procedure, which is called visual similarity clustering. Finally, text contents are located from the clustered results. Benefit from this scheme, the algorithm demonstrates strong robustness and adaptability in a wide variety of documents, which previous traditional DLA approaches do not possess.
文摘随着深度学习技术的发展,尤其是基于Transformer的预训练模型的广泛应用,文档解析逐步从传统规则方法演进为融合文本、视觉与布局信息的多模态系统。为了对后续研究提供理论参考与技术借鉴,系统回顾了文档解析技术的演进脉络:从早期基于规则的方法,到基于深度学习的图像预处理、版面分析及信息抽取任务,重点剖析了场景文本检测、表格理解等特定任务模型,以及布局语言模型(layout language model,LayoutLM)系列通用预训练模型和基于大语言模型的最新探索。
文摘文档分析与识别(简称文档识别)技术将各种非结构化文档数据(图像、联机笔迹)转化为结构化数据,便于计算机处理和理解,应用场景十分广阔。20世纪60年代以来,文档识别方法研究与应用受到广泛关注并取得巨大进展。得益于深度学习技术的发展和应用,文档识别的性能快速提升,相关技术在文档数字化、票据处理、笔迹录入、智能交通、文档检索与信息抽取等领域得到广泛应用。首先介绍文档识别的背景和技术范畴,回顾该领域发展历史,然后重点对深度学习方法兴起以来的研究进行综述,分析当前技术存在的不足,并建议未来值得重视的研究方向。研究现状综述部分,按文档分析与识别的几个主要技术环节(文档图像预处理、版面分析、场景文本检测、文本识别、结构化符号和图形识别、文档检索与信息抽取)分别进行介绍,简述传统方法研究的代表性工作,重点介绍深度学习方法研究的新进展。总体上,当前研究对象向深度、广度扩展,处理方法全面转向深度神经网络模型和深度学习方法,识别性能大幅提升且应用场景不断扩展。在现状分析基础上,指出当前技术在识别精度和可靠性、可解释性、学习能力和自适应性等方面还有明显不足。最后从提升性能、应用扩展、提升学习能力几个角度提出一些研究方向。从提升性能角度,研究问题包括文本识别可靠性、可解释性、全要素识别、长尾问题、多语言、复杂版面分割与理解、变形文档分析与识别等。应用扩展包括新应用(如机器人流程自动化(robotic process automation,RPA)、文字信息抄录、考古)和新技术问题(语义信息抽取、跨模态融合、面向应用的推理决策等)两方面。从提升学习能力角度,相关问题包括小样本学习、迁移学习、多任务学习、领域自适应、结构化预测、弱监督学习、自监督学习、开放集学习和跨模态学习等。