Funding: Supported by the Innovation Platform Construction of Qinghai Province (No. 2016-ZJ-Y04) and the Basic Research Program of Qinghai Province (No. 2016-ZJ-740).
Abstract: Text extraction is an important initial step in digitizing historical documents. In this paper, we present a text extraction method for historical Tibetan document images based on block projections. We treat text extraction as a text-area detection and localization problem. Each image is divided into equal-sized blocks, and the blocks are filtered using the categories of their connected components and their corner-point density. By analyzing the projections of the filtered blocks, the approximate text areas are located and the text regions are extracted. Experiments on a dataset of historical Tibetan documents demonstrate the effectiveness of the proposed method.
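The pipeline summarized above (equal blocks, block filtering, block projections, text area) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the page is assumed to be already binarized (a list of rows of 0/1 pixels, 1 = ink); the "categories of connected components" test is approximated by counting character-sized 4-connected components; corner points are approximated by a simple bit-quad proxy (2×2 windows containing exactly one or three ink pixels, which mark convex and concave corners) rather than whatever corner detector the paper uses; and the block size and every threshold (`bs`, `min_size`, `max_size`, `min_components`, `min_density`) are hypothetical values chosen for illustration.

```python
from collections import deque


def split_blocks(img, bs):
    """Yield (block_row, block_col, block) for equal bs x bs blocks; edge remainders are dropped."""
    h, w = len(img), len(img[0])
    for br in range(h // bs):
        for bc in range(w // bs):
            block = [row[bc * bs:(bc + 1) * bs] for row in img[br * bs:(br + 1) * bs]]
            yield br, bc, block


def component_sizes(block):
    """Pixel counts of the 4-connected ink (1) components inside one block (BFS labeling)."""
    n, m = len(block), len(block[0])
    seen = [[False] * m for _ in range(n)]
    sizes = []
    for i in range(n):
        for j in range(m):
            if block[i][j] and not seen[i][j]:
                seen[i][j] = True
                queue, size = deque([(i, j)]), 0
                while queue:
                    y, x = queue.popleft()
                    size += 1
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < n and 0 <= nx < m and block[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                sizes.append(size)
    return sizes


def corner_density(block):
    """Assumed corner proxy: fraction of 2x2 windows holding exactly 1 or 3 ink pixels."""
    n, m = len(block), len(block[0])
    windows = (n - 1) * (m - 1)
    corners = sum(
        1
        for i in range(n - 1)
        for j in range(m - 1)
        if block[i][j] + block[i][j + 1] + block[i + 1][j] + block[i + 1][j + 1] in (1, 3)
    )
    return corners / windows if windows else 0.0


def is_text_block(block, min_size=2, max_size=40, min_components=1, min_density=0.05):
    """Hypothetical block filter: enough character-sized components and enough corners.
    All thresholds are illustrative assumptions, not values from the paper."""
    chars = [s for s in component_sizes(block) if min_size <= s <= max_size]
    return len(chars) >= min_components and corner_density(block) >= min_density


def longest_run(profile):
    """(start, end) indices of the longest run of nonzero projection values, or None."""
    best, start = None, None
    for k, v in enumerate(profile + [0]):  # trailing 0 closes a run that reaches the end
        if v and start is None:
            start = k
        elif not v and start is not None:
            if best is None or k - start > best[1] - best[0] + 1:
                best = (start, k - 1)
            start = None
    return best


def locate_text_area(img, bs=8, **kw):
    """Return (top, left, bottom, right) pixel bounds of the main text area, or None."""
    rows, cols = len(img) // bs, len(img[0]) // bs
    mask = [[0] * cols for _ in range(rows)]
    for br, bc, block in split_blocks(img, bs):
        mask[br][bc] = int(is_text_block(block, **kw))
    row_proj = [sum(r) for r in mask]  # horizontal projection of the filtered blocks
    col_proj = [sum(mask[r][c] for r in range(rows)) for c in range(cols)]  # vertical projection
    rr, cr = longest_run(row_proj), longest_run(col_proj)
    if rr is None or cr is None:
        return None
    return rr[0] * bs, cr[0] * bs, (rr[1] + 1) * bs - 1, (cr[1] + 1) * bs - 1
```

For a binarized page `img`, `locate_text_area(img, bs=8)` returns the pixel bounds of the block region judged to be text. A solid ornament filling a whole block is rejected because its single component is too large and has no corners, while keeping only the longest non-empty run in each projection discards isolated character-like blocks such as stray marks. A page with several separate text columns would need a finer analysis of the projection profiles than this single-run rule.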
Abstract: Complex structured documents can be intentionally represented as a tree structure decorated with attributes. Ignoring attributes (which relate to semantic aspects that can be treated separately from the purely structural aspects of interest here), in the context of cooperative editing, legal structures are characterized by a document model (an abstract grammar), and each intentional representation can be manipulated independently, and possibly asynchronously, by several co-authors through various editing tools that operate on its “partial replicas”. To edit a partial replica asynchronously, the co-author concerned must have a local syntactic document model that constrains them to maintain a minimum of consistency between the local representation they handle and the global model. This consistency amounts to the existence of one or more (global) intentional representations, conforming to the global model, that have the current local representation as a partial replica. The purpose of this paper is to present grammatical structures: grammars that make it possible not only to specify a (global) model for documents edited cooperatively, but also to derive automatically, via a so-called projection operation, consistent (local) models for each co-author involved in the cooperative editing. We also present some properties satisfied by these grammatical structures.
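The abstract does not spell out how a partial replica is obtained from a document. A common definition, and the one assumed in this sketch, is the projection of the document tree onto the grammar symbols visible to a co-author: nodes with invisible symbols are erased and their (projected) children are promoted to the parent. The sketch also checks conformance of a tree to a simple abstract grammar, represented here as a map from each symbol to its allowed child-symbol sequences. `Node`, `conforms`, `project`, `GLOBAL`, `LOCAL`, and the sample symbols are all hypothetical illustrations. In particular, `LOCAL` is written by hand; deriving such a local model automatically from the global grammar is what the paper's projection on grammatical structures addresses, and that derivation is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Node:
    symbol: str
    children: List["Node"] = field(default_factory=list)


# Hypothetical abstract grammar: symbol -> allowed sequences of child symbols.
Grammar = Dict[str, Set[Tuple[str, ...]]]

# Hypothetical global model: a document has a title and one or two sections.
GLOBAL: Grammar = {
    "doc": {("title", "sec"), ("title", "sec", "sec")},
    "sec": {("head", "para"), ("head", "para", "para")},
    "title": {()},
    "head": {()},
    "para": {()},
}

# Hand-written local model for a co-author who sees only doc, sec and para.
LOCAL: Grammar = {
    "doc": {("sec",), ("sec", "sec")},
    "sec": {("para",), ("para", "para")},
    "para": {()},
}


def conforms(node: Node, grammar: Grammar) -> bool:
    """True if, at every node, the sequence of child symbols is a right-hand side for its symbol."""
    rhs = tuple(c.symbol for c in node.children)
    if rhs not in grammar.get(node.symbol, set()):
        return False
    return all(conforms(c, grammar) for c in node.children)


def project(node: Node, visible: Set[str]) -> List[Node]:
    """Project a tree onto the visible symbols. Invisible nodes are erased and their
    projected children are promoted, so the result is in general a forest."""
    kids = [p for c in node.children for p in project(c, visible)]
    if node.symbol in visible:
        return [Node(node.symbol, kids)]
    return kids
```

In this toy setting, a global document `doc(title, sec(head, para), sec(head, para, para))` conforms to `GLOBAL`, and its projection onto `{"doc", "sec", "para"}` yields the replica `doc(sec(para), sec(para, para))`, which conforms to `LOCAL`. A local edit that gives a section three paragraphs would violate `LOCAL`, which is the kind of inconsistency a local model is meant to rule out before the replica is merged back.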