In the international shipping industry, digital intelligence transformation has become essential, with both governments and enterprises actively working to integrate diverse datasets. The maritime and shipping domain is characterized by a vast array of document types containing complex, large-scale, and often chaotic knowledge and relationships. Managing these documents effectively is crucial for developing a Large Language Model (LLM) for the maritime domain that enables practitioners to access and use valuable information. A Knowledge Graph (KG) offers a state-of-the-art solution for enhancing knowledge retrieval, providing more accurate responses and enabling context-aware reasoning. This paper presents a framework for constructing a KG from maritime and shipping documents using GraphRAG, a hybrid tool that combines graph-based retrieval and generation capabilities. The extraction of entities and relationships from these documents and the KG construction process are described in detail. The KG is then integrated with an LLM to build a Q&A system, and the results show that the system significantly improves answer accuracy compared with traditional LLMs. In addition, KG construction is up to 50% faster than with conventional LLM-based approaches, underscoring the efficiency of the method. This study offers a promising approach to digital intelligence in shipping, advancing knowledge accessibility and decision-making.
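How a KG grounds an LLM answer can be pictured with a small sketch. It assumes entities and relationships have already been extracted as (head, relation, tail) triples and that the question's entities have already been matched to graph nodes; the triples, the function names, and the fixed hop limit are hypothetical illustrations, not GraphRAG's API (which also builds community summaries that are not shown here).

```python
from collections import defaultdict, deque

# Hypothetical triples standing in for LLM-extracted maritime knowledge.
TRIPLES = [
    ("MARPOL Annex VI", "regulates", "sulphur emissions"),
    ("MARPOL Annex VI", "adopted_by", "IMO"),
    ("sulphur emissions", "limited_in", "Emission Control Areas"),
    ("IMO", "headquartered_in", "London"),
]


def build_graph(triples):
    """Index each triple under both of its entities (undirected adjacency)."""
    graph = defaultdict(list)
    for head, rel, tail in triples:
        graph[head].append((head, rel, tail))
        graph[tail].append((head, rel, tail))
    return graph


def retrieve_context(graph, seeds, hops=1):
    """Breadth-first collection of triples within `hops` edges of the seeds."""
    seen = set(seeds)
    facts = []
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth >= hops:
            continue
        for triple in graph.get(node, []):
            if triple not in facts:
                facts.append(triple)
            for neighbour in (triple[0], triple[2]):
                if neighbour not in seen:
                    seen.add(neighbour)
                    frontier.append((neighbour, depth + 1))
    return facts


def build_prompt(question, facts):
    """Serialize retrieved facts as grounding context for an LLM prompt."""
    lines = [f"{h} --{r}--> {t}" for h, r, t in facts]
    return "Facts:\n" + "\n".join(lines) + f"\n\nQuestion: {question}"


graph = build_graph(TRIPLES)
print(build_prompt("Who adopted MARPOL Annex VI?",
                   retrieve_context(graph, ["MARPOL Annex VI"], hops=1)))
```

The prompt carries only facts near the question's entities, so the model can answer from graph evidence rather than from its parameters alone; raising `hops` trades more context for a longer prompt.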
With the development of anti-virus technology, malicious documents have gradually become the main pathway for Advanced Persistent Threat (APT) attacks, so developing effective malicious document classifiers has become particularly urgent. Current detection methods based on document structure and behavioral features face feature-engineering challenges: they have limited accuracy, consume substantial resources, and can usually detect documents only in specific formats, so they lack versatility and adaptability. To address these problems, this paper proposes a novel malicious document detection method that visualizes documents as GGE (Grayscale, Grayscale matrix, Entropy) images. The GGE method renders the raw byte sequence of a document as a grayscale image and its information-entropy sequence as an entropy image, and converts the gray-level co-occurrence matrix, together with the texture and spatial information it captures, into a grayscale-matrix image; the three images are then fused into a GGE color image. A Convolutional Block Attention Module-EfficientNet-B0 (CBAM-EfficientNet-B0) model performs the classification, using transfer learning to apply a model pre-trained on the ImageNet dataset to feature extraction from GGE images. The experimental results show that the GGE method outperforms other methods, is suitable for detecting malicious documents in different formats, achieves accuracies of 99.44% and 97.39% on the Portable Document Format (PDF) and Office datasets, respectively, and requires less detection time, so it can be applied effectively to real-time malicious document detection.
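The three channels can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the authors' implementation: the 64×64 image size, the 256-byte entropy block, the horizontal-neighbour co-occurrence matrix quantized to 64 gray levels with log scaling, and the channel order (grayscale, co-occurrence matrix, entropy) are all hypothetical choices.

```python
import math
from collections import Counter

SIDE = 64  # hypothetical image side; the paper's input size is not stated here


def to_square(values, side=SIDE):
    """Truncate or zero-pad a flat sequence to side*side and reshape row-major."""
    flat = list(values[: side * side])
    flat += [0] * (side * side - len(flat))
    return [flat[r * side:(r + 1) * side] for r in range(side)]


def shannon_entropy(chunk):
    """Shannon entropy in bits per byte (0 for constant data, 8 at most)."""
    n = len(chunk)
    return -sum(c / n * math.log2(c / n) for c in Counter(chunk).values())


def entropy_channel(data, block=256):
    """Per-block entropy scaled to 0..255, repeated once per byte of the block."""
    seq = []
    for i in range(0, len(data), block):
        chunk = data[i:i + block]
        seq.extend([round(shannon_entropy(chunk) * 255 / 8)] * len(chunk))
    return to_square(seq)


def glcm_channel(gray, side=SIDE):
    """Co-occurrence counts of horizontally adjacent pixels, quantized to
    `side` gray levels and log-scaled to 0..255."""
    counts = [[0] * side for _ in range(side)]
    for row in gray:
        for a, b in zip(row, row[1:]):
            counts[a * side // 256][b * side // 256] += 1
    peak = max(max(r) for r in counts) or 1
    scale = math.log1p(peak)
    return [[round(255 * math.log1p(c) / scale) for c in r] for r in counts]


def gge_image(data):
    """Fuse grayscale, co-occurrence, and entropy channels into (R, G, B) pixels."""
    gray = to_square(data)
    glcm = glcm_channel(gray)
    ent = entropy_channel(data)
    return [[(gray[r][c], glcm[r][c], ent[r][c]) for c in range(SIDE)]
            for r in range(SIDE)]
```

The resulting grid of RGB-like triples would then be resized and passed to a pre-trained CNN; the CBAM-EfficientNet-B0 classifier itself is not shown.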
This video series is the first experimental psychology documentary made in China. It focuses on analyzing professional theories to raise the public's general understanding of basic psychology. By combining innovative audiovisual narrative with psychological experiments, it zooms in on real human nature, discussing social hotspots from the perspectives of social psychology, cognitive psychology, and personality psychology, in order to help people find answers to their current psychological difficulties.
This critical review assesses the application of artificial intelligence (AI) to handling legal documents, with specific reference to medical negligence cases, in order to identify its transformative potential, open issues, and ethical concerns. The review consolidates findings showing that AI improves efficiency, accuracy, and the delivery of justice in the legal profession. The studies report faster document review and more accurate reviewed documents, with time savings estimated at 60%. However, the review also outlines problems that continue to affect AI, such as poor data quality, biased algorithms, and opaque decision-making. The paper assesses ethical issues related to patient autonomy, justice, and non-maleficence, with particular focus on patient privacy, fair process, and potential unfairness to patients. Its review of AI innovations finds that regulation lags behind AI development, leaving unsettled questions about legal responsibility for AI and about user control over AI-generated results and findings in legal proceedings. Future avenues presented in the study include explainable AI (XAI) for legal purposes, federated learning to resolve privacy issues, and the need to foster adaptive regulation. Finally, the review advocates that legal subject-matter experts collaborate with legal informatics experts, ethicists, and policymakers to develop the best ways to implement AI in medical negligence claims. It argues that AI has great potential to transform the practice of law, but that it must do so in a way that respects justice and individual rights.
What Are You Up To Today? Chief Director: Wu Zijuan. Length: 12 episodes. Producer: bilibili. Broadcasting Platform: bilibili. Produced by bilibili, China's YouTube-like video-sharing platform, the film is a series of short documentaries presenting people's daily lives in different jobs. It follows 12 individuals in the jobs and trades that keep society functioning. By focusing on their daily lives, the documentaries capture the hustle and bustle of the days that make up a hopeful life.
In nursing practice, electronic nursing records (ENRs) are an important component of patient care documentation, but they also significantly increase administrative burdens. With the development of artificial intelligence technology, it has become possible to use large language models to assist in generating nursing documents. This article explores the application of generative AI in nursing documentation. Research shows that generative AI has significant potential in nursing documentation but also faces challenges in quality and implementation. In terms of efficiency, AI-assisted documentation tools can substantially reduce nurses' administrative burden, freeing time for direct patient care; studies report reductions in documentation time of 21% to 30%. However, the quality of AI-generated records varies, and the content is often described as 'textbook style', lacking patient-specific details and appropriate medical terminology. Successful implementation relies on a specialized framework that includes strong stakeholder engagement and adaptation to nursing-specific workflows and regulatory standards. The article concludes that current AI systems are best suited to assisting in drafting nursing documents, and that clinical validation remains crucial for patient safety and document integrity.
This paper highlights the critical role of medical device design and development documents within the quality system, including their compliance with regulatory standards, their function as a traceable record, their support for all stages of design and development, and their use in risk and change management. It also covers document template creation, association of review records, information management, adverse event traceability, and the reconciliation of differences between international regulatory submissions.
In building a new power system dominated by new energy sources, energy storage is a key supporting technology that ensures the safe and stable operation of the power grid, enables flexible regulation of the system, and raises the level of new energy accommodation. It is also key to achieving carbon peaking and carbon neutrality and to the energy transition.
In this paper, the research achievements and progress on Yunnan tea germplasm resources over the past sixty years are systematically reviewed in terms of exploration, collection, conservation, protection, identification, evaluation, and shared utilization. The current problems are also discussed, together with suggestions for the future development of tea germplasm resources in Yunnan, including the collection of superior and rare germplasm, research on tea genetic diversity, the use of biotechnology in tea germplasm innovation, super-gene exploration and function, the construction of a utilization platform, and the biological basis of species and population conservation.
A rough-set-based corner classification neural network, Rough-CC4, is presented to solve document classification problems such as representing documents of different sizes, selecting document features, and encoding document features. In Rough-CC4, documents are described by the equivalence classes of approximate words. This reduces the dimensionality of the document representation, which addresses the precision problems caused by differing document sizes and also blurs the differences caused by approximate words. Rough-CC4 also introduces a binary encoding method that encodes the importance of each document relative to each equivalence class. This encoding greatly improves the precision of Rough-CC4 and reduces its space complexity. Rough-CC4 can be used for automatic document classification.
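For readers unfamiliar with CC4, the sketch below shows the standard corner-classification training rule on binary vectors: one hidden neuron per training example, with weight +1 for each 1-bit, -1 for each 0-bit, and a bias of r - s + 1 (r is the radius of generalization, s the number of 1-bits), so a neuron fires only for inputs within Hamming distance r of its example. The word classes and the simple presence/absence encoding are hypothetical stand-ins; the paper's rough-set construction and its importance-based binary encoding are not reproduced.

```python
# Hypothetical equivalence classes of "approximate" words (illustration only).
CLASSES = [{"ship", "vessel"}, {"port", "harbour"}, {"virus", "malware"}, {"cell", "tissue"}]


def encode(words, classes=CLASSES):
    """1 if the document mentions any word of an equivalence class, else 0."""
    present = set(words)
    return [1 if cls & present else 0 for cls in classes]


def train_cc4(samples, radius=1):
    """One hidden 'corner' neuron per training vector, per the CC4 rule."""
    hidden = []
    for bits, label in samples:
        s = sum(bits)
        weights = [1 if b else -1 for b in bits]
        bias = radius - s + 1            # weight on the constant-1 input
        out_weight = 1 if label == 1 else -1
        hidden.append((weights, bias, out_weight))
    return hidden


def predict_cc4(hidden, bits):
    """A neuron fires iff the Hamming distance to its example is <= radius."""
    total = 0
    for weights, bias, out_weight in hidden:
        activation = sum(w * b for w, b in zip(weights, bits)) + bias
        if activation > 0:
            total += out_weight
    return 1 if total > 0 else 0


# Toy training set: class 1 = shipping documents, class 0 = biomedical ones.
train = [
    (encode(["ship", "port"]), 1),
    (encode(["virus", "cell"]), 0),
]
net = train_cc4(train, radius=1)
```

Because "vessel" falls in the same class as "ship", a document mentioning only "vessel" lands within distance 1 of the first training corner and is classified as shipping, which is the dimensionality-reducing effect the abstract attributes to approximate-word classes.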
The major problem with most current information retrieval models is that individual words provide unreliable evidence about the content of texts. When a document is short (e.g., only the abstract is available), word-use variability has a substantial impact on Information Retrieval (IR) performance. To address this problem, this letter puts forward a new approach to short document retrieval named the Reference Document Model (RDM). RDM estimates the statistical semantics of the query and of the document through pseudo feedback from reference documents. The contributions of this model are threefold: (1) pseudo feedback for both the query and the document; (2) building the query model and the document model from reference documents; (3) flexible indexing units, which can be any linguistic elements, such as documents, paragraphs, sentences, n-grams, terms, or characters. For short document retrieval, RDM achieves significant improvements over classical probabilistic models on the ad hoc retrieval task of the Text REtrieval Conference (TREC) test sets. The results also show that the shorter the document, the better RDM performs.
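The idea of expanding both the query and the document through reference documents can be sketched as follows. This is a generic pseudo-relevance-feedback illustration, assuming whitespace tokenization, overlap-based ranking of reference documents, and linear interpolation with weight `alpha`; these choices and all names are hypothetical and are not the RDM estimation formulas.

```python
from collections import Counter


def term_dist(text):
    """Maximum-likelihood unigram distribution over whitespace tokens."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def expand(text, reference_docs, k=2, alpha=0.5):
    """Mix a text's own unigram model with the mean model of its k most
    overlapping reference documents (pseudo feedback)."""
    own = term_dist(text)
    ranked = sorted(reference_docs,
                    key=lambda d: -sum(own.get(t, 0) for t in set(d.lower().split())))
    top = ranked[:k]
    feedback = Counter()
    for doc in top:
        for t, p in term_dist(doc).items():
            feedback[t] += p / len(top)
    vocab = set(own) | set(feedback)
    return {t: alpha * own.get(t, 0) + (1 - alpha) * feedback.get(t, 0) for t in vocab}


def similarity(model_a, model_b):
    """Inner product of two unigram models."""
    return sum(p * model_b.get(t, 0) for t, p in model_a.items())


# Hypothetical reference collection.
REFERENCES = [
    "car automobile vehicle",
    "automobile engine car",
    "banana fruit",
]
query_model = expand("car", REFERENCES)
doc_model = expand("automobile", REFERENCES)
```

Without expansion, the one-word query and the one-word document share no term, so their similarity is zero; once both are expanded through the same reference documents, their models overlap and the document can be retrieved.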
Achieving a good recognition rate on degraded document images is difficult because such images suffer from low contrast, bleed-through, and nonuniform illumination. Unlike existing baseline thresholding techniques, which use fixed thresholds and windows, the proposed method derives dynamic windows from the image content to achieve better binarization. To enhance a low-contrast image, we propose a new mean histogram stretching method that suppresses noisy background pixels while increasing the contrast of pixels at or near edges. For the enhanced image, we propose a new method for deriving adaptive local thresholds for the dynamic windows, which are themselves derived by exploiting the advantages of Otsu thresholding. To assess performance, we use the standard Document Image Binarization Contest (DIBCO) databases. A comparative study with well-known existing methods indicates that the proposed method outperforms them in terms of both quality and recognition rate.
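Why local thresholds help under nonuniform illumination can be seen in a small sketch. It implements the standard Otsu criterion (maximize between-class variance) and applies it on fixed square tiles, leaving tiles whose contrast falls below a hypothetical `min_contrast` as background. The fixed tiles stand in for the paper's dynamic windows, and its mean histogram stretching step is omitted.

```python
def otsu_threshold(pixels):
    """Gray level t (0..255) maximizing between-class variance; ink is <= t."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w_b = sum_b = 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_b += hist[t]
        if w_b == 0:
            continue
        w_f = total - w_b
        if w_f == 0:
            break
        sum_b += t * hist[t]
        m_b, m_f = sum_b / w_b, (sum_all - sum_b) / w_f
        var = w_b * w_f * (m_b - m_f) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t


def binarize(img, tile=8, min_contrast=30):
    """Tile-wise Otsu: 1 = ink, 0 = background; flat tiles stay background."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r0 in range(0, h, tile):
        for c0 in range(0, w, tile):
            cells = [(r, c) for r in range(r0, min(r0 + tile, h))
                     for c in range(c0, min(c0 + tile, w))]
            values = [img[r][c] for r, c in cells]
            if max(values) - min(values) < min_contrast:
                continue
            t = otsu_threshold(values)
            for r, c in cells:
                out[r][c] = 1 if img[r][c] <= t else 0
    return out


# Synthetic 16x16 page: background 180 on the left and 110 on the right
# (uneven lighting), plus a one-pixel ink stroke on row 4, 90 levels darker.
page = [[(180 if c < 8 else 110) - (90 if r == 4 else 0) for c in range(16)]
        for r in range(16)]
global_t = otsu_threshold([p for row in page for p in row])
local = binarize(page)
```

For this page the single global threshold works out to 110, so every right-hand background pixel would be labeled ink, whereas the tile-wise version marks only the 16 pixels of the stroke on row 4.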
The eXtensible Markup Language (XML) is a new meta-language intended to replace HTML, and it has many advantages. Traditional engineering documents take too many forms to be managed conveniently and lack dynamic correlation functions. This paper introduces a new method that uses XML to store and manage engineering documents, unifying their format and realizing dynamic correlations between them.
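A minimal sketch of how XML can hold engineering documents in one format and expose their correlations is shown below, using Python's standard `xml.etree.ElementTree`. The schema (`<document>`, `<title>`, and `<ref target="..."/>` elements) and the document IDs are hypothetical, not the paper's format.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema: each <document> carries an id, a type, a title, and
# <ref target="..."/> links to the documents it depends on.
SOURCE = """
<documents>
  <document id="D1" type="drawing">
    <title>Shaft assembly</title>
    <ref target="D2"/>
  </document>
  <document id="D2" type="specification">
    <title>Bearing specification</title>
    <ref target="D3"/>
  </document>
  <document id="D3" type="test-report">
    <title>Bearing fatigue test</title>
  </document>
</documents>
"""


def load(xml_text):
    """Parse the collection and index <document> elements by id."""
    root = ET.fromstring(xml_text.strip())
    return {doc.get("id"): doc for doc in root.iter("document")}


def related(docs, doc_id):
    """Follow <ref> links transitively; these documents may need review
    when `doc_id` changes."""
    found, stack = [], [doc_id]
    while stack:
        current = stack.pop()
        for ref in docs[current].findall("ref"):
            target = ref.get("target")
            if target != doc_id and target not in found:
                found.append(target)
                stack.append(target)
    return found


docs = load(SOURCE)
for doc_id in related(docs, "D1"):
    print(doc_id, docs[doc_id].findtext("title"))
```

Because the links live in the documents themselves, the correlation is computed on demand rather than maintained by hand, which is one way to read "dynamic correlation" here.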
Automatic text summarization reduces a text document, or a larger corpus of multiple documents, to a short set of sentences or paragraphs that conveys the main meaning of the text. This paper discusses multi-document summarization, which differs from single-document summarization in that compression, speed, redundancy, and passage selection are critical to forming useful summaries. Because the number and variety of online medical news items make it difficult for experts in the medical field to read them all, automatic multi-document summarization can make information on the web easier to study. We therefore propose a new summarization approach based on the AdaBoost machine-learning meta-algorithm. We treat a document as a set of sentences, and the learning algorithm must learn to classify sentences as positive or negative examples based on their scores. For this learning task, we apply AdaBoost with a C4.5 decision tree as the base learner. In our experiment, we use 450 news items downloaded from various medical websites and compare our results with several existing approaches.
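The boosting loop behind this approach can be sketched in plain Python. To keep it short, the sketch uses one-level decision stumps as the base learner instead of C4.5, and the two sentence features (relative position and overlap with title words) and the toy labels are hypothetical.

```python
import math


def stump_predict(stump, row):
    """A decision stump: +sign if feature >= threshold, else -sign."""
    j, thr, sign = stump
    return sign if row[j] >= thr else -sign


def train_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best_err, best = float("inf"), None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                stump = (j, thr, sign)
                err = sum(wi for wi, row, yi in zip(w, X, y)
                          if stump_predict(stump, row) != yi)
                if err < best_err:
                    best_err, best = err, stump
    return best_err, best


def adaboost(X, y, rounds=5):
    """Discrete AdaBoost with labels in {+1, -1}."""
    w = [1 / len(X)] * len(X)
    model = []
    for _ in range(rounds):
        err, stump = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)   # avoid log(0) on perfect stumps
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def score(model, row):
    """Weighted vote of all stumps; positive means 'include in summary'."""
    return sum(alpha * stump_predict(stump, row) for alpha, stump in model)


def summarize(sentences, features, model, k=2):
    """Keep the k highest-scoring sentences, in original order."""
    ranked = sorted(range(len(sentences)), key=lambda i: -score(model, features[i]))
    return [sentences[i] for i in sorted(ranked[:k])]


# Hypothetical features per sentence: [relative position, title-word overlap];
# labels: +1 = belongs in the summary, -1 = does not.
X = [[0.0, 0.9], [0.1, 0.8], [0.9, 0.2], [0.8, 0.1], [0.5, 0.7], [0.6, 0.3]]
y = [1, 1, -1, -1, 1, -1]
model = adaboost(X, y)
```

Each round reweights the sentences the previous learners got wrong, so later base learners focus on the hard cases; swapping the stump for a deeper tree such as C4.5 changes only `train_stump` and `stump_predict`.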
Engineers often need to find the right pieces of information by sifting through long engineering documents, which is tiring and time-consuming. To address this issue, researchers are increasingly devoting attention to new ways of helping information users, including engineers, to access and retrieve document content. The research reported in this paper explores how to use the key technologies of document decomposition (the study of document structure), document markup (with eXtensible Markup Language (XML), HyperText Markup Language (HTML), and Scalable Vector Graphics (SVG)), and a faceted classification mechanism. Document content extraction is implemented in Java. An Engineering Document Content Management System (EDCMS) developed in this research demonstrates that information providers can make document content more accessible to information users, including engineers. The main features of EDCMS are: 1) it enables users, especially engineers, to access and retrieve information at the content level rather than the document level; in other words, it provides the pieces of information that answer specific questions, so engineers need not waste time sifting through a whole document to find what they need; 2) users can access engineering document content via both the data and the metadata of a document; 3) users can access and retrieve content objects, i.e., text, images, and graphics (including engineering drawings), via multiple views and at different granularities based on decomposition schemes. Experiments with EDCMS have been conducted on semi-structured documents, a CADCAM textbook, and a set of project posters in the engineering design domain. The results show that the system gives information users a powerful way to access document content.
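A content-level, faceted lookup of the kind described for EDCMS can be sketched in a few lines. The `ContentObject` fields, the slash-separated path convention for decomposition levels, and the facet names are hypothetical, and the sketch uses Python for consistency with the other examples rather than the Java used by the authors.

```python
from dataclasses import dataclass, field


@dataclass
class ContentObject:
    """A retrievable piece of a decomposed document (hypothetical model)."""
    doc: str                   # source document
    path: str                  # position in the decomposition tree, e.g. "ch3/fig2"
    kind: str                  # "text", "image", "drawing", ...
    facets: dict = field(default_factory=dict)


def query(objects, kind=None, under=None, **facets):
    """Return content objects (not whole documents) matching every criterion;
    `under` restricts results to one branch of the decomposition tree."""
    return [
        o for o in objects
        if (kind is None or o.kind == kind)
        and (under is None or o.path == under or o.path.startswith(under + "/"))
        and all(o.facets.get(k) == v for k, v in facets.items())
    ]


OBJECTS = [
    ContentObject("cadcam-textbook", "ch3", "text", {"topic": "NC machining"}),
    ContentObject("cadcam-textbook", "ch3/fig2", "drawing", {"topic": "NC machining"}),
    ContentObject("cadcam-textbook", "ch5/fig1", "drawing", {"topic": "surface modelling"}),
    ContentObject("poster-07", "panel1", "image", {"topic": "NC machining"}),
]
```

A call such as `query(OBJECTS, kind="drawing", topic="NC machining")` returns just the matching drawing rather than the whole textbook, and `under="ch3"` switches the granularity to one chapter's branch.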
A document's layout can be more informative than merely its visual and structural appearance. Document layout analysis (DLA) is therefore considered a necessary prerequisite for advanced processing and detailed document image analysis in a range of applications. This research extends traditional DLA approaches and introduces the concept of semantic document layout analysis (SDLA), proposing a novel framework for the semantic layout analysis and characterization of handwritten manuscripts. The proposed SDLA approach enables the derivation of implicit information and semantic characteristics that can be used in many practical applications, thereby bridging the semantic gap and providing more understandable high-level document image analysis and more invariant characterization via absolute and relative labeling. The approach is validated and evaluated on a large dataset of Arabic handwritten manuscripts with complex layouts. The experiments show promising results for accurate and effective clustering and retrieval of handwritten manuscripts based on semantic characteristics. They also indicate the expected efficacy of the approach in automating and facilitating many functional, real-life tasks, such as estimating the effort and pricing of transcribing or typing such complex manuscripts.
How can choreography and physical theatre pieces continue to perpetuate the work after performance? How can their aura, their dynamics, and their ephemeral and genuine nature be preserved, as Walter Benjamin put it? In 1936, in The Work of Art in the Age of Its Technological Reproducibility, Benjamin already anticipated that something is missing even in the most perfect reproduction. Memories of dance and physical theatre are intricate, and the question is how to create a type of documentation that does not betray the vital flow of an event-based phenomenon. This short article examines choreographers and performance artists, such as Esther Ferrer, Ayara Hernández Holz, and Olga de Soto, who have claimed a new, organic form of documentation that turns it into performance or into the memory of viewers. Other creators, such as the company La Fura dels Baus, claim documentation as spectacle, while others at the opposite extreme, such as Tino Sehgal, radically refuse to document their work. These positions coincide with those of thinkers such as Peggy Phelan, Sarah Bay-Cheng, and Paula Caspao, who consider a range of documentation practices and how documentation can never replace live art.
Funding (GGE malicious document detection study): Supported by the Natural Science Foundation of Henan Province (Grant No. 242300420297), awarded to Yi Sun.
Funding (generative AI in nursing documentation review): Supported by the Tightly Integrated Health Consortium Research Project (Grant ynlglht202412).
Funding (Yunnan tea germplasm review): Supported by the Project of the National Natural Science Foundation of China (31160175), the Project of the Tea Research Institute of Yunnan Academy of Agricultural Sciences (2009A0937), and the National Modern Agriculture Technology System Projects in Tea Industry (nycytx-23).
Funding (Rough-CC4 study): The National Natural Science Foundation of China (Nos. 60503020, 60373066, 60403016, 60425206), the Natural Science Foundation of Jiangsu Higher Education Institutions (No. 04KJB520096), and the Doctoral Foundation of Nanjing University of Posts and Telecommunications (No. 0302).
Funding: Supported by the Funds of Heilongjiang Outstanding Young Teacher (1151G037).
Abstract: The major problem with most current information retrieval models is that individual words provide unreliable evidence about the content of texts. When a document is short, e.g. when only the abstract is available, word-use variability has a substantial impact on Information Retrieval (IR) performance. To solve this problem, this letter puts forward a new technique for short document retrieval named the Reference Document Model (RDM). RDM obtains the statistical semantics of the query and the document through pseudo feedback from reference documents, applied to both the query and the document. The contributions of this model are threefold: (1) pseudo feedback for both the query and the document; (2) building the query model and the document model from reference documents; (3) flexible indexing units, which can be any linguistic elements such as documents, paragraphs, sentences, n-grams, terms, or characters. For short document retrieval, RDM achieves significant improvements over classical probabilistic models on the ad hoc retrieval task on the Text REtrieval Conference (TREC) test sets. Results also show that the shorter the document, the better RDM performs.
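The core idea, expanding both the query and the document with pseudo feedback from reference documents, can be illustrated with a small generic sketch. This is not RDM's actual estimation procedure: ranking reference documents by shared-term count, the interpolation weight `alpha`, the feedback depth `k`, and cosine similarity as the matching function are all assumptions chosen for brevity. The point it shows is that a query and a short document with no words in common can still match once both are smoothed toward the same reference documents.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

def unigram(tokens: Sequence[str]) -> Dict[str, float]:
    """Maximum-likelihood unigram model of a token sequence."""
    counts = Counter(tokens)
    n = sum(counts.values())
    return {t: c / n for t, c in counts.items()}

def expand(tokens: Sequence[str], references: List[List[str]],
           k: int = 2, alpha: float = 0.5) -> Dict[str, float]:
    """Interpolate a text's own model with the average model of its k closest reference docs."""
    # Assumed ranking: number of distinct shared terms (ties keep input order).
    ranked = sorted(references, key=lambda ref: len(set(tokens) & set(ref)), reverse=True)[:k]
    feedback: Dict[str, float] = {}
    for ref in ranked:
        for t, p in unigram(ref).items():
            feedback[t] = feedback.get(t, 0.0) + p / len(ranked)
    own = unigram(tokens)
    vocab = set(own) | set(feedback)
    return {t: alpha * own.get(t, 0.0) + (1 - alpha) * feedback.get(t, 0.0) for t in vocab}

def cosine(p: Dict[str, float], q: Dict[str, float]) -> float:
    dot = sum(v * q.get(t, 0.0) for t, v in p.items())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0
```

With the reference set `[["heart", "attack", "myocardial", "infarction"], ["stock", "market", "crash"], ["myocardial", "infarction", "treatment"]]`, the query `["heart", "attack"]` and the document `["myocardial", "infarction"]` have cosine similarity 0 as raw unigram models, but about 0.6 after expansion with `k=1`.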
Funding: Funded by the Ministry of Higher Education, Malaysia, which provided facilities and financial support under the Long Research Grant Scheme LRGS-1-2019-UKM-UKM-2-7.
Abstract: Achieving a good recognition rate for degraded document images is difficult because such images suffer from low contrast, bleed-through, and nonuniform illumination. Unlike existing baseline thresholding techniques that use fixed thresholds and windows, the proposed method obtains dynamic windows according to the image content to achieve better binarization. To enhance a low-contrast image, we propose a new mean histogram stretching method that suppresses noisy background pixels while increasing pixel contrast at or near edges, which results in an enhanced image. For the enhanced image, we propose a new method for deriving adaptive local thresholds for dynamic windows, where each dynamic window is derived by exploiting the advantages of Otsu thresholding. To assess the performance of the proposed method, we use the standard Document Image Binarization Contest (DIBCO) databases for experimentation. A comparative study with well-known existing methods indicates that the proposed method outperforms them in terms of both quality and recognition rate.
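Since the method builds on Otsu thresholding, a compact reference for that step may help. The sketch below is the kind of fixed, global baseline the abstract contrasts against, not the proposed dynamic-window method: it applies a plain min-max contrast stretch (substituted for the paper's mean histogram stretching, whose details are not given here), picks the Otsu threshold that maximizes the between-class variance w0 * w1 * (mu0 - mu1)^2 over a 256-bin histogram, and maps dark pixels to ink. Pixels are assumed to be 8-bit grayscale values in a flat list.

```python
from typing import List, Sequence

def stretch(pixels: Sequence[int]) -> List[int]:
    """Plain min-max contrast stretch to 0..255 (not the paper's mean histogram stretching)."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return list(pixels)
    return [int((p - lo) * 255 / (hi - lo) + 0.5) for p in pixels]

def otsu_threshold(pixels: Sequence[int]) -> int:
    """Return the threshold t that maximizes between-class variance (class 0 is p <= t)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0, sum0 = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 += hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        sum0 += t * hist[t]
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def binarize(pixels: Sequence[int], t: int) -> List[int]:
    """Dark pixels (<= t) become 0 (ink); all others become 255 (background)."""
    return [0 if p <= t else 255 for p in pixels]
```

A faded image such as `[100] * 40 + [140] * 60` is first stretched to the full 0..255 range and then separated cleanly by the Otsu threshold. A local variant would apply `otsu_threshold` per window instead of once for the whole image.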
Abstract: The eXtensible Markup Language (XML) is a new meta-language intended to replace HTML, and it has many advantages. Traditional engineering documents take too many forms to be managed conveniently and lack dynamic correlation functions. This paper introduces a method that uses XML to store and manage engineering documents, unifying their format and realizing dynamic correlations among them.
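As an illustration of what "format unity" and "dynamic correlation" can mean in practice, here is a small sketch using Python's standard `xml.etree.ElementTree`. The element names (`engineeringDocs`, `document`, `ref`) and attributes (`id`, `type`, `target`) are hypothetical, not the paper's schema. Every document shares one element format, cross-references are explicit `ref` elements, and the links are resolved in both directions, so when one document changes, the documents that depend on it can be found automatically.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Hypothetical unified markup: one <document> format for every kind of engineering document.
SAMPLE = """<engineeringDocs>
  <document id="D-100" type="drawing" title="Gearbox housing"/>
  <document id="S-200" type="specification" title="Housing material">
    <ref target="D-100"/>
  </document>
  <document id="R-300" type="test-report" title="Housing stress test">
    <ref target="D-100"/>
    <ref target="S-200"/>
  </document>
</engineeringDocs>"""

def build_links(root: ET.Element) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Return (forward, backward) link maps keyed by document id."""
    forward: Dict[str, List[str]] = {}
    backward: Dict[str, List[str]] = {}
    for doc in root.iter("document"):
        src = doc.get("id")
        targets = [ref.get("target") for ref in doc.iter("ref")]
        forward[src] = targets
        for t in targets:
            # Backward links tell us which documents depend on t.
            backward.setdefault(t, []).append(src)
    return forward, backward
```

With `root = ET.fromstring(SAMPLE)`, `build_links(root)[1]["D-100"]` lists `S-200` and `R-300`, the documents to revisit if the housing drawing is revised.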
Abstract: Automatic text summarization involves reducing a text document, or a larger corpus of multiple documents, to a short set of sentences or paragraphs that convey the main meaning of the text. In this paper, we discuss multi-document summarization, which differs from single-document summarization in that the issues of compression, speed, redundancy, and passage selection are critical to forming useful summaries. Because the volume and variety of online medical news make it difficult for experts in the medical field to read all of it, automatic multi-document summarization can make this information on the web easier to study. We therefore propose a new summarization approach based on the machine learning meta-learner algorithm AdaBoost. We treat a document as a set of sentences, and the learning algorithm learns to classify sentences as positive or negative examples based on their scores. For this learning task, we apply the AdaBoost meta-learning algorithm with a C4.5 decision tree as the base learner. Our experiments use 450 news articles downloaded from different medical websites, and we compare our results with several existing approaches.
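The sentence-classification step can be sketched with discrete AdaBoost. To keep the example self-contained, the base learner here is a one-level decision stump rather than the C4.5 tree used in the paper, and the per-sentence feature vectors (for instance, a position score and a title-overlap score) are hypothetical. Labels are +1 for summary-worthy sentences and -1 otherwise; the weighted vote of the boosted stumps is then used to rank sentences for the summary.

```python
import math
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, int]  # (feature index, threshold, polarity)

def stump_predict(stump: Stump, x: Sequence[float]) -> int:
    j, thr, pol = stump
    return pol if x[j] >= thr else -pol

def fit_adaboost(X: List[List[float]], y: List[int], rounds: int = 10) -> List[Tuple[float, Stump]]:
    """Discrete AdaBoost with decision stumps (stand-in for the paper's C4.5 base learner)."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble: List[Tuple[float, Stump]] = []
    for _ in range(rounds):
        best, best_err = None, float("inf")
        for j in range(len(X[0])):
            for thr in sorted({x[j] for x in X}):
                for pol in (1, -1):
                    s = (j, thr, pol)
                    err = sum(wi for wi, xi, yi in zip(w, X, y) if stump_predict(s, xi) != yi)
                    if err < best_err:
                        best, best_err = s, err
        best_err = max(best_err, 1e-10)  # avoid log(0) for a perfect stump
        if best_err >= 0.5:
            break  # no stump beats chance on the weighted data
        alpha = 0.5 * math.log((1 - best_err) / best_err)
        ensemble.append((alpha, best))
        # Increase weights of misclassified sentences, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(best, xi)) for wi, xi, yi in zip(w, X, y)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble

def score(ensemble: List[Tuple[float, Stump]], x: Sequence[float]) -> float:
    return sum(a * stump_predict(s, x) for a, s in ensemble)

def predict(ensemble: List[Tuple[float, Stump]], x: Sequence[float]) -> int:
    return 1 if score(ensemble, x) >= 0 else -1

def summarize(sentences: List[str], features: List[List[float]], ensemble, k: int = 1) -> List[str]:
    """Pick the k sentences with the highest ensemble score."""
    ranked = sorted(zip(sentences, features), key=lambda sf: score(ensemble, sf[1]), reverse=True)
    return [s for s, _ in ranked[:k]]
```

Swapping in a deeper tree learner would only change how `best` is found in each round; the reweighting and weighted vote stay the same.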
Funding: This work was supported by the UK Engineering and Physical Sciences Research Council (EPSRC) (No. GR/R67507/01).
Abstract: Engineers often need to find the right pieces of information by sifting through long engineering documents, which is a tiring and time-consuming job. To address this issue, researchers are increasingly devoting attention to new ways of helping information users, including engineers, to access and retrieve document content. The research reported in this paper explores how to use the key technologies of document decomposition (the study of document structure), document mark-up (with eXtensible Markup Language (XML), HyperText Markup Language (HTML), and Scalable Vector Graphics (SVG)), and a faceted classification mechanism. Document content extraction is implemented via computer programming (in Java). An Engineering Document Content Management System (EDCMS) developed in this research demonstrates that, as information providers, we can make document content more accessible to information users, including engineers. The main features of the EDCMS are: 1) it enables users, especially engineers, to access and retrieve information at the content level rather than the document level; in other words, it provides the right pieces of information to answer specific questions, so that engineers do not need to waste time sifting through a whole document to obtain the required information; 2) users can access engineering document content via both the data and the metadata of a document; 3) users can access and retrieve content objects, i.e. text, images, and graphics (including engineering drawings), via multiple views and at different granularities based on decomposition schemes. Experiments with the EDCMS have been conducted on semi-structured documents, a CAD/CAM textbook, and a set of project posters in the Engineering Design domain. Experimental results show that the system provides information users with a powerful solution for accessing document content.
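Content-level retrieval over a decomposed, marked-up document can be pictured with the sketch below. The EDCMS itself was implemented in Java; this Python version only illustrates the idea, and the element names (`section`, `text`, `image`, `drawing`) and attributes (`topic`, `facet`) are hypothetical stand-ins for a decomposition scheme and a faceted classification. A query returns the ids of individual content objects that match the requested facets instead of returning the whole document.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

# Hypothetical decomposed document: each content object carries facet attributes.
DOC = """<document title="CAD/CAM textbook">
  <section id="s1" topic="NC machining">
    <text id="t1" facet="definition">G-code is a numerical control language.</text>
    <image id="i1" facet="example" src="toolpath.svg"/>
  </section>
  <section id="s2" topic="surface modelling">
    <text id="t2" facet="definition">A Bezier surface is defined by control points.</text>
    <drawing id="d1" facet="example" src="patch.svg"/>
  </section>
</document>"""

def find_content(root: ET.Element, topic: Optional[str] = None,
                 facet: Optional[str] = None, kind: Optional[str] = None) -> List[str]:
    """Return ids of content objects matching every facet that is given (None = any)."""
    hits: List[str] = []
    for section in root.iter("section"):
        if topic is not None and section.get("topic") != topic:
            continue
        for obj in section:
            if facet is not None and obj.get("facet") != facet:
                continue
            if kind is not None and obj.tag != kind:
                continue
            hits.append(obj.get("id"))
    return hits
```

For instance, asking for `facet="example"` returns the tool-path image and the surface drawing from two different sections, which mirrors the "multiple views" idea: the same objects can be reached by topic, by facet, or by media type.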
Funding: This research was supported and funded by the KAU Scientific Endowment, King Abdulaziz University, Jeddah, Saudi Arabia.
Abstract: A document layout can be more informative than merely a document's visual and structural appearance. Thus, document layout analysis (DLA) is considered a necessary prerequisite for advanced processing and detailed document image analysis in several applications with different objectives. This research extends the traditional approaches to DLA and introduces the concept of semantic document layout analysis (SDLA) by proposing a novel framework for the semantic layout analysis and characterization of handwritten manuscripts. The proposed SDLA approach enables the derivation of implicit information and semantic characteristics, which can be effectively utilized in dozens of practical applications for various purposes, thereby bridging the semantic gap and providing a more understandable, high-level document image analysis and a more invariant characterization via absolute and relative labeling. The approach is validated and evaluated on a large dataset of Arabic handwritten manuscripts with complex layouts. The experimental work shows promising results in terms of accurate and effective semantic characteristic-based clustering and retrieval of handwritten manuscripts. It also indicates the expected efficacy of the proposed approach in automating and facilitating many functional, real-life tasks, such as estimating the effort and pricing of transcribing or typing such complex manuscripts.
Abstract: How can choreography and physical theatre pieces continue to perpetuate the work after it has been performed? How can their aura, their dynamics, and their ephemeral and genuine nature be preserved, as Walter Benjamin put it? In 1936, in The Work of Art in the Age of Its Technological Reproducibility, Benjamin already anticipated that something is missing even in the most perfect reproduction. Memories of dance and physical theatre are intricate. The question is how to create a type of documentation that does not betray the vital flow of the event-based phenomenon. In this short article, we examine a series of choreographic and performance artists, such as Esther Ferrer, Ayara Hernández Holz, and Olga de Soto, who have claimed a new form of organic documentation, turning it into performance or into the memory of viewers. Other creators, such as the company La Fura dels Baus, claim documentation as spectacle, while others at the opposite extreme, such as Tino Sehgal, radically propose the non-documentation of their work. These different positions coincide with those of thinkers such as Peggy Phelan, Sarah Bay-Cheng, and Paula Caspao, who address a range of documentation practices and how documentation can never replace live art.