Abstract: Complex structured documents can be intentionally represented as a tree structure decorated with attributes. Ignoring attributes (they relate to semantic aspects that can be treated separately from the purely structural aspects of interest here), in the context of cooperative editing, legal structures are characterized by a document model (an abstract grammar), and each intentional representation can be manipulated independently, and possibly asynchronously, by several co-authors through various editing tools that operate on its "partial replicas". For unsynchronized editing of a partial replica, the co-author concerned must have a local syntactic document model that constrains him, so as to ensure minimal consistency of the local representation he handles with respect to the global model. This consistency is synonymous with the existence of one or more (global) intentional representations that conform to the global model and admit the current local representation as a partial replica. The purpose of this paper is to present grammatical structures: grammars that make it possible not only to specify a (global) model for documents edited cooperatively, but also to derive automatically, via a so-called projection operation, consistent (local) models for each co-author involved in the cooperative editing. We also present some properties satisfied by these grammatical structures.
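To make the projection idea concrete, here is a deliberately naive sketch: a global abstract grammar given as a Python dict, and a "projection" that erases from every right-hand side the symbols a co-author cannot see. The grammar, the function name, and the erasure rule are illustrative assumptions and do not reproduce the paper's formal projection operation.

```python
# Toy illustration of deriving a local model from a global grammar.
# Deliberately naive: not the paper's formal projection operation.
GLOBAL_GRAMMAR = {
    "Report": [["Title", "Body", "Annex"]],
    "Body":   [["Intro", "Results"], ["Intro"]],
}

def project(grammar, visible):
    """Erase from every right-hand side the symbols the co-author cannot
    see (non-terminals of the grammar itself are always kept)."""
    return {lhs: [[s for s in rhs if s in visible or s in grammar]
                  for rhs in alternatives]
            for lhs, alternatives in grammar.items()}

# A co-author who only edits the body obtains a reduced local model:
print(project(GLOBAL_GRAMMAR, visible={"Intro", "Results"}))
```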
Abstract: The issue of document management has been raised for a long time, especially with the appearance of office automation in the 1980s, which led to dematerialization and Electronic Document Management (EDM). In the same period, workflow management experienced significant development, but became more focused on industry. It seems to us, however, that document workflows have not received the same interest from the scientific community. Nowadays, the emergence and supremacy of the Internet in electronic exchanges are leading to a massive dematerialization of documents, which requires a conceptual reconsideration of the organizational framework for processing these documents in both public and private administrations. This problem seems open to us and deserves the interest of the scientific community. Indeed, EDM has mainly focused on the storage (referencing) and circulation of documents (traceability); it has paid little attention to the overall behavior of the system in processing documents. The purpose of our research is to model document processing systems. In previous work, we proposed a general model and its specialization to the case of small documents (any document processed by a single person at a time during its processing life cycle), which, according to our study, represent 70% of the documents processed by administrations. In this contribution, we extend the model for processing small documents to the case where they are managed in a system comprising document classes organized in subclasses, which is the case for most administrations. We observe that this model is a Markovian M^(L×K)/M^(L×K)/1 queueing network. We analyze the constraints of this model and deduce certain characteristics and metrics. In fine, the ultimate objective of our work is to design a document workflow management system integrating a global behavior prediction component.
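As a companion to the queueing claim above, the following minimal sketch computes the standard steady-state M/M/1 metrics that such a network composes station by station; the per-(class, subclass) rate layout and all names are illustrative assumptions, not the authors' model.

```python
def mm1_metrics(arrival_rate, service_rate):
    """Standard steady-state M/M/1 formulas; stable only when rho < 1."""
    rho = arrival_rate / service_rate                 # utilization
    if rho >= 1.0:
        raise ValueError("unstable: arrival rate must be below service rate")
    return {
        "rho": rho,
        "L": rho / (1.0 - rho),                       # mean number in system
        "W": 1.0 / (service_rate - arrival_rate),     # mean time in system
        "Wq": rho / (service_rate - arrival_rate),    # mean waiting time
    }

# One station serving L document classes x K subclasses: with Poisson flows,
# the aggregate arrival rate is the sum of per-(class, subclass) rates.
rates = {("mail", "urgent"): 0.2, ("mail", "normal"): 0.5}
print(mm1_metrics(sum(rates.values()), service_rate=1.0))
```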
Abstract: This article proposes a document-level prompt learning approach that uses LLMs to extract timeline-based storylines. Verification tests on the ESCv1.2 and Timeline17 datasets show that the prompt + one-shot learning approach proposed in this article works well. At the same time, our findings indicate that although timeline-based storyline extraction shows promising prospects in practical applications of LLMs, it remains a complex natural language processing task that requires further research.
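For concreteness, this is one plausible shape for a document-level, one-shot prompt for the task; the instruction wording, the placeholder, and the worked example are assumptions for illustration, not the paper's actual template.

```python
# Illustrative one-shot prompt skeleton for timeline-based storyline
# extraction; not the paper's template.
ONE_SHOT_PROMPT = """You are given a set of news articles about one event.
Extract the storyline as a dated timeline, one line per date.

Example:
Articles: <example articles>
Timeline:
2011-03-11: Earthquake and tsunami strike eastern Japan.
2011-03-12: Explosion reported at Fukushima Daiichi unit 1.

Articles: {articles}
Timeline:"""

def build_prompt(articles: str) -> str:
    return ONE_SHOT_PROMPT.format(articles=articles)
```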
Funding: Supported by the Heilongjiang Outstanding Young Teacher Fund (1151G037).
Abstract: The major problem with most current approaches to information models is that individual words provide unreliable evidence about the content of texts. When the document is short, e.g. when only the abstract is available, the word-use variability problem has a substantial impact on Information Retrieval (IR) performance. To solve this problem, a new technique for short document retrieval named the Reference Document Model (RDM) is put forward in this letter. RDM obtains the statistical semantics of the query/document by pseudo feedback, for both the query and the document, from reference documents. The contributions of this model are three-fold: (1) pseudo feedback for both the query and the document; (2) building the query model and the document model from reference documents; (3) flexible indexing units, which can be any linguistic elements, such as documents, paragraphs, sentences, n-grams, terms or characters. For short document retrieval, RDM achieves significant improvements over classical probabilistic models on the ad hoc retrieval task on Text REtrieval Conference (TREC) test sets. Results also show that the shorter the document, the better the RDM performance.
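A minimal sketch of the pseudo-feedback step, under the assumption that feedback documents are selected by simple term overlap and mixed linearly into the original term distribution; function names and weights are illustrative, not the published RDM.

```python
from collections import Counter

def term_dist(text):
    return Counter(text.lower().split())

def pseudo_feedback_model(text, reference_docs, k=3, alpha=0.6):
    """Mix the text's own term distribution with that of its k most
    overlapping reference documents (pseudo feedback)."""
    base = term_dist(text)
    ranked = sorted(reference_docs,
                    key=lambda d: sum((term_dist(d) & base).values()),
                    reverse=True)[:k]
    feedback = Counter()
    for doc in ranked:
        feedback.update(term_dist(doc))
    nb = max(sum(base.values()), 1)
    nf = max(sum(feedback.values()), 1)
    vocab = set(base) | set(feedback)
    return {t: alpha * base[t] / nb + (1 - alpha) * feedback[t] / nf
            for t in vocab}
```

The same smoothing can be applied to both the query and the document sides, which is the symmetric idea the abstract emphasizes.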
Abstract: Short residence time of the sorbent in the gas stream and the formation of a dense layer of reaction product surrounding its surface limit the sulfur removal efficiency. A practical means of improving process performance is to employ a fluidized-bed reactor in place of the entrained-bed reactor normally used in cool-side desulfurization. This paper describes a cold-modeling study of a circulating fluidized bed reactor. Several aspects of the problem are discussed: the fluidization behavior of CaO, attrition of the sorbent, and solids entrainment from the fluidized bed. Mechanisms and key controlling parameters are identified, and an integral model based on attrition rate and mass balance is developed for predicting the steady-state mass flows and particle size distributions of the system. A process flow scheme is finally presented for conducting desulfurization tests in the second stage of the study.
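The integral model itself is not reproduced in the abstract; as a minimal sketch of the kind of balance involved, assuming a first-order attrition law (fines production proportional to bed inventory), the steady-state sorbent feed must cover the attrition and entrainment losses. All rate forms and constants below are illustrative assumptions.

```python
def steady_state_feed(bed_mass_kg, k_attrition_per_s, entrainment_kg_per_s):
    """Feed rate F (kg/s) that keeps the bed inventory M constant:
    F = k_a * M + E, i.e. feed balances attrition fines plus entrainment."""
    return k_attrition_per_s * bed_mass_kg + entrainment_kg_per_s

print(steady_state_feed(bed_mass_kg=500.0, k_attrition_per_s=1e-5,
                        entrainment_kg_per_s=0.02))
```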
Abstract: Digital medicine is a new concept in the medical field, and the need for digital human bodies has been increasing in recent years. This paper uses Free Form Deformation (FFD) to model the motion of the human leg. It presents the motion equations of the knee joint on the basis of its anatomic structure and motion characteristics, then transmits the deformation to the leg mesh through a simplified FFD that uses only second-order B-spline basis functions. Experiments show that this method can simulate the bending of the leg and the deformation of the muscles fairly well. Compared with the method of curved patches, this method is more convenient and effective. Furthermore, the equations can easily be applied to other joint models of the human body.
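A minimal sketch of the simplified FFD, assuming degree-2 (quadratic) uniform B-spline basis functions and shown in 1D for brevity; the paper applies the same blending to the 3D leg mesh, and all names here are illustrative.

```python
import numpy as np

def quadratic_bspline(t):
    """The three degree-2 uniform B-spline basis values at local t in [0, 1];
    they are non-negative and sum to 1 (partition of unity)."""
    return np.array([(1 - t) ** 2 / 2.0,
                     -t * t + t + 0.5,
                     t * t / 2.0])

def ffd_displace(points, ctrl_disp, lo, hi):
    """Displace 1D points by blending the displacements of the three control
    points spanning each point's lattice cell."""
    n = len(ctrl_disp)                       # needs n >= 3
    out = []
    for p in points:
        u = (p - lo) / (hi - lo) * (n - 2)   # position in the control lattice
        i = min(int(u), n - 3)               # index of the left control point
        w = quadratic_bspline(u - i)         # local parameter within the cell
        out.append(p + float(w @ np.asarray(ctrl_disp[i:i + 3])))
    return out

# Bend a line of sample points by pushing the middle control point sideways.
print(ffd_displace([0.0, 0.25, 0.5, 0.75, 1.0], [0.0, 0.3, 0.0], lo=0.0, hi=1.0))
```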
Abstract: Aiming at the problem that mathematical expressions in the unstructured text fields of documents are hard to extract automatically, rapidly and effectively, a method based on the Hidden Markov Model (HMM) is proposed. First, the HMM is trained on the symbol-combination features of mathematical expressions. Then, preprocessing steps such as removing labels and filtering words are carried out. Finally, the preprocessed text is converted into an observation sequence that is fed to the HMM, which determines which parts are mathematical expressions and extracts them. Experimental results show that the proposed method can effectively extract mathematical expressions from the text fields of documents, with relatively high accuracy and recall.
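A minimal sketch of the decoding step, assuming a two-state (TEXT/MATH) HMM with toy hand-set probabilities in place of the trained symbol-combination features:

```python
import math

STATES = ("TEXT", "MATH")
START = {"TEXT": 0.9, "MATH": 0.1}
TRANS = {"TEXT": {"TEXT": 0.9, "MATH": 0.1},
         "MATH": {"TEXT": 0.2, "MATH": 0.8}}

def emit(state, token):
    # Toy emission: a token "looks mathematical" if it holds operator-like
    # symbols or is a bare number.
    mathy = any(c in "=+-^_{}\\$" for c in token) or token.isdigit()
    return 0.8 if (state == "MATH") == mathy else 0.2

def viterbi(tokens):
    """Most likely TEXT/MATH label sequence for the token list."""
    v = [{s: math.log(START[s]) + math.log(emit(s, tokens[0])) for s in STATES}]
    back = []
    for tok in tokens[1:]:
        col, ptr = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: v[-1][p] + math.log(TRANS[p][s]))
            col[s] = v[-1][prev] + math.log(TRANS[prev][s]) + math.log(emit(s, tok))
            ptr[s] = prev
        v.append(col)
        back.append(ptr)
    state = max(STATES, key=lambda s: v[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return list(reversed(path))

print(viterbi("the area is A = \\pi r ^ 2 here".split()))
```

Contiguous runs labeled MATH are then cut out of the token stream as the extracted expressions.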
Abstract: Many algorithms have been implemented for the problem of document categorization. The majority of work in this area has targeted English text, while very few approaches have been introduced for Arabic text. The nature of Arabic text differs from that of English text, and preprocessing Arabic text is more challenging, because Arabic is a highly inflectional and derivational language, which makes document mining a hard and complex task. In this paper, we present an automatic Arabic document classification system based on the kNN algorithm. We also develop an approach to the keyword extraction and reduction problems using the Document Frequency (DF) threshold method. The results indicate that the ability of kNN to deal with Arabic text outperforms the other existing systems. The proposed system reached a 0.95 micro-recall score on 850 Arabic texts in 6 different categories.
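A minimal sketch of the two named ingredients, DF-threshold keyword reduction and a cosine kNN vote; the toy whitespace tokenizer stands in for real Arabic preprocessing (normalization, stemming), and all names are illustrative.

```python
import math
from collections import Counter

def df_filter(texts, min_df=2):
    """Keep only terms whose document frequency reaches min_df."""
    df = Counter(t for text in texts for t in set(text.split()))
    return {t for t, c in df.items() if c >= min_df}

def vectorize(text, vocab):
    return Counter(t for t in text.split() if t in vocab)

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_classify(query, train, k=3):
    """train: list of (text, label) pairs; majority vote over the k nearest."""
    vocab = df_filter([text for text, _ in train])
    q = vectorize(query, vocab)
    nearest = sorted(train, reverse=True,
                     key=lambda tl: cosine(q, vectorize(tl[0], vocab)))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```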
Abstract: A numerical model has been developed to simulate the transport and fate of oil spilled at sea. The model combines the transport and fate processes of spilled oil with the random walk technique. Oil movement under the influence of tidal currents, wind-driven currents, and turbulent eddies is simulated by the PLUME RW dispersion model developed by HR Wallingford. The weathering processes in the model represent the physical and chemical changes of oil slicks with time, and comprise mechanical spreading, dispersion, evaporation and emulsification. Shoreline stranding is determined approximately using a capacity method for different shoreline types. This paper presents details of the model and describes the results of various sensitivity tests. The model is suitable for oil spill contingency planning.
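A minimal sketch of the random walk technique, assuming simple advection by the current plus a fixed wind drift factor and an isotropic diffusive step; the coefficients are illustrative, not PLUME RW's calibrated values.

```python
import math
import random

def step(particles, current, wind, dt=60.0, diffusivity=1.0, wind_factor=0.03):
    """Advance (x, y) particle positions by one dt-second time step:
    deterministic advection by current and wind drift, plus a Gaussian
    random-walk step representing turbulent eddies."""
    sigma = math.sqrt(2.0 * diffusivity * dt)   # step scale from diffusivity
    out = []
    for x, y in particles:
        x += (current[0] + wind_factor * wind[0]) * dt + random.gauss(0.0, sigma)
        y += (current[1] + wind_factor * wind[1]) * dt + random.gauss(0.0, sigma)
        out.append((x, y))
    return out

slick = [(0.0, 0.0)] * 1000          # all particles released at the spill site
for _ in range(10):                  # ten minutes at dt = 60 s
    slick = step(slick, current=(0.2, 0.0), wind=(5.0, 1.0))
```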
Funding: The National High-Tech Development (863) Program of China (No. 2003AA1Z2610).
Abstract: To efficiently retrieve relevant documents from the rapid proliferation of large information collections, a novel immune algorithm for document query optimization is proposed. The essential idea of the immune algorithm is that the crossover and mutation operators are constructed according to the specific characteristics of information retrieval. An immune operator is adopted to avoid degeneracy. The relevant documents retrieved are merged into a single document list according to a ranking formula. Experimental results show that the novel immune algorithm leads to substantial improvements in relevant document retrieval effectiveness.
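A minimal sketch of such a loop, assuming real-valued query weight vectors, one-point crossover, Gaussian mutation, and an immune step that re-injects elite genes and rejects degenerate children; the fitness function and all parameters are illustrative assumptions, not the published operators.

```python
import random

def evolve_query(pop, fitness, generations=50, mut_sigma=0.1):
    """pop: list of equal-length query weight vectors (lists of floats)."""
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[0]                              # best antibody so far
        children = []
        for _ in range(len(pop)):
            a, b = random.sample(pop[:len(pop) // 2], 2)   # top-half parents
            cut = random.randrange(1, len(a))
            child = a[:cut] + b[cut:]                      # one-point crossover
            child = [w + random.gauss(0.0, mut_sigma) for w in child]
            if random.random() < 0.3:                      # vaccination:
                i = random.randrange(len(child))           # re-inject an
                child[i] = elite[i]                        # elite gene
            # Immune selection against degeneracy: never keep a child worse
            # than both of its parents.
            if fitness(child) < min(fitness(a), fitness(b)):
                child = max((a, b), key=fitness)[:]
            children.append(child)
        pop = children
    return max(pop, key=fitness)

# Toy usage: optimize 5 query-term weights toward 1.0.
pop0 = [[random.random() for _ in range(5)] for _ in range(8)]
best = evolve_query(pop0, fitness=lambda w: -sum((x - 1.0) ** 2 for x in w))
```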
Abstract: This paper examines the automatic recognition and extraction of tables from a large collection of heterogeneous documents. The heterogeneous documents are initially pre-processed and converted to HTML code, after which an algorithm recognises the table portion of the documents. A Hidden Markov Model (HMM) is then applied to the HTML code in order to extract the tables. The model was trained and tested with 526 self-generated tables (321 tables for training and 205 tables for testing). The Viterbi algorithm was implemented for the testing part. The system was evaluated in terms of accuracy, precision, recall and F-measure. The overall evaluation results show 88.8% accuracy, 96.8% precision, 91.7% recall and 88.8% F-measure, revealing that the method is good at solving the problem of table extraction.
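For reference, the four reported metrics follow the standard definitions computed from raw counts of true/false positives and negatives; a minimal sketch (standard formulas, not taken from the paper):

```python
def metrics(tp, fp, fn, tn):
    """Standard evaluation metrics from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f_measure = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f_measure": f_measure}
```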
Funding: National Natural Science Foundation of China (69772002).
Abstract: A new scheme for femur shape recovery from volumetric images using deformable models is proposed. First, prior 3D deformable femur models are created as templates using point distribution model technology. Second, active contour models are employed to segment the magnetic resonance imaging (MRI) volumetric images of the tibial and femoral joints, and the deformable models are initialized based on the segmentation results. Finally, an objective function is minimized to give the optimal results, constraining the surface of the shapes.
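A minimal sketch of the point distribution model step, assuming aligned landmark vectors and PCA via SVD; this is the standard PDM formulation (mean shape plus weighted variation modes) rather than the authors' implementation.

```python
import numpy as np

def build_pdm(shapes, n_modes=3):
    """shapes: (n_samples, 3 * n_landmarks) array of aligned landmark
    vectors. Returns the mean shape and the leading variation modes."""
    mean = shapes.mean(axis=0)
    _, _, vt = np.linalg.svd(shapes - mean, full_matrices=False)
    return mean, vt[:n_modes]

def synthesize(mean, modes, b):
    """x = x_mean + sum_i b_i * mode_i  (b holds the shape parameters)."""
    return mean + b @ modes

# Toy usage: 10 training shapes of 4 landmarks each.
shapes = np.random.rand(10, 12)
mean, modes = build_pdm(shapes)
new_shape = synthesize(mean, modes, b=np.array([0.5, -0.2, 0.1]))
```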