Enterprise applications utilize relational databases and structured business processes, requiring slow and expensive conversion of inputs and outputs, from business documents such as invoices, purchase orders, and rec...Enterprise applications utilize relational databases and structured business processes, requiring slow and expensive conversion of inputs and outputs, from business documents such as invoices, purchase orders, and receipts, into known templates and schemas before processing. We propose a new LLM Agent-based intelligent data extraction, transformation, and load (IntelligentETL) pipeline that not only ingests PDFs and detects inputs within it but also addresses the extraction of structured and unstructured data by developing tools that most efficiently and securely deal with respective data types. We study the efficiency of our proposed pipeline and compare it with enterprise solutions that also utilize LLMs. We establish the supremacy in timely and accurate data extraction and transformation capabilities of our approach for analyzing the data from varied sources based on nested and/or interlinked input constraints.展开更多
Tree logic, inherited from ambient logic, is introduced as the formal foundation of related programming language and type systems, In this paper, we introduce recursion into such logic system, which can describe the t...Tree logic, inherited from ambient logic, is introduced as the formal foundation of related programming language and type systems, In this paper, we introduce recursion into such logic system, which can describe the tree data more dearly and concisely. By making a distinction between proposition and predicate, a concise semantics interpretation for our modal logic is given. We also develop a model checking algorithm for the logic without △ operator. The correctness of the algorithm is shown. Such work can be seen as the basis of the semi-structured data processing language and more flexible type system.展开更多
More web pages are widely applying AJAX (Asynchronous JavaScript XML) due to the rich interactivity and incremental communication. By observing, it is found that the AJAX contents, which could not be seen by traditi...More web pages are widely applying AJAX (Asynchronous JavaScript XML) due to the rich interactivity and incremental communication. By observing, it is found that the AJAX contents, which could not be seen by traditional crawler, are well-structured and belong to one specific domain generally. Extracting the structured data from AJAX contents and annotating its semantic are very significant for further applications. In this paper, a structured AJAX data extraction method for agricultural domain based on agricultural ontology was proposed. Firstly, Crawljax, an open AJAX crawling tool, was overridden to explore and retrieve the AJAX contents; secondly, the retrieved contents were partitioned into items and then classified by combining with agricultural ontology. HTML tags and punctuations were used to segment the retrieved contents into entity items. Finally, the entity items were clustered and the semantic annotation was assigned to clustering results according to agricultural ontology. By experimental evaluation, the proposed approach was proved effectively in resource exploring, entity extraction, and semantic annotation.展开更多
The problem of nonparametric identification of a multivariate nonlinearity in a D-input Hammer- stein system is examined. It is demonstrated that if the input measurements are structured, in the sense that there exist...The problem of nonparametric identification of a multivariate nonlinearity in a D-input Hammer- stein system is examined. It is demonstrated that if the input measurements are structured, in the sense that there exists some hidden relation between them, i.e. if they are distributed on some (unknown) d-dimensional space M in IRD, d 〈 D, then the system nonlinearity can be recovered at points on M with the convergence rate O(n-1/(2+d)) dependent on d. This rate is thus faster than the generic rate O(n-1/(2+D)) achieved by typical nonparametric algorithms and controlled solely by the number of inputs D.展开更多
Organizing unstructured information from books into a well-defined structure is a significant challenge in digital libraries.Most digital libraries can provide only search services at the granularity of books and few ...Organizing unstructured information from books into a well-defined structure is a significant challenge in digital libraries.Most digital libraries can provide only search services at the granularity of books and few libraries allow books to be accessed at the granularity of chapters,as manually constructing directory information for books is time-consuming.Extracting structured data from scanned books thus remains an urgent and important work.In this paper,we propose a novel structured data organization framework called CMSOF to organize scanned data automatically,and apply it to a Chinese medicine digital library. In the framework,image blocks and text blocks on the scanned page of books are separated based on the gray histogram projection method or a hybrid method of region growth and the Ada-Boosting classifier at first,and then the text structure is obtained from text blocks by text size and font type recognition.Finally,image blocks and structured OCRed text are correlated at the semantic level.By integrating the structured data into a Chinese medicine information system(CMIS) ,we can organize the Chinese medicine books well and users can access the books with flexibility,which indicates that CMSOF is an efficient framework to organize books mixed with images and text.展开更多
Deep learning methods are applied into structured data and in typical methods,low-order features are discarded after combining with high-order featuresfor prediction tasks.However,in structured data,ignorance of low-o...Deep learning methods are applied into structured data and in typical methods,low-order features are discarded after combining with high-order featuresfor prediction tasks.However,in structured data,ignorance of low-order features may cause the low prediction rate.To address this issue,in this paper,deeper attention-based network(DAN)is proposed.With DAN method,to keep both low-and high-order features,attention average pooling layer was utilized to aggregate features of each order.Furthermore,by shortcut connections from each layer to attention average pooling layer,DAN can be built extremely deep to obtain enough capacity.Experimental results show DAN has good performance and works effectively.展开更多
With the rapid advancement of cloud computing technology,reversible data hiding algorithms in encrypted images(RDH-EI)have developed into an important field of study concentrated on safeguarding privacy in distributed...With the rapid advancement of cloud computing technology,reversible data hiding algorithms in encrypted images(RDH-EI)have developed into an important field of study concentrated on safeguarding privacy in distributed cloud environments.However,existing algorithms often suffer from low embedding capacities and are inadequate for complex data access scenarios.To address these challenges,this paper proposes a novel reversible data hiding algorithm in encrypted images based on adaptive median edge detection(AMED)and ciphertext-policy attributebased encryption(CP-ABE).This proposed algorithm enhances the conventional median edge detection(MED)by incorporating dynamic variables to improve pixel prediction accuracy.The carrier image is subsequently reconstructed using the Huffman coding technique.Encrypted image generation is then achieved by encrypting the image based on system user attributes and data access rights,with the hierarchical embedding of the group’s secret data seamlessly integrated during the encryption process using the CP-ABE scheme.Ultimately,the encrypted image is transmitted to the data hider,enabling independent embedding of the secret data and resulting in the creation of the marked encrypted image.This approach allows only the receiver to extract the authorized group’s secret data,thereby enabling fine-grained,controlled access.Test results indicate that,in contrast to current algorithms,the method introduced here considerably improves the embedding rate while preserving lossless image recovery.Specifically,the average maximum embedding rates for the(3,4)-threshold and(6,6)-threshold schemes reach 5.7853 bits per pixel(bpp)and 7.7781 bpp,respectively,across the BOSSbase,BOW-2,and USD databases.Furthermore,the algorithm facilitates permission-granting and joint-decryption capabilities.Additionally,this paper conducts a comprehensive examination of the algorithm’s robustness using metrics such as image correlation,information entropy,and number of pixel change rate(NPCR),confirming its high level of security.Overall,the algorithm can be applied in a multi-user and multi-level cloud service environment to realize the secure storage of carrier images and secret data.展开更多
With the rapid development of information technology,smart teaching platforms have become important tools for higher education teaching reform.As a core course of computer science and technology-related majors in high...With the rapid development of information technology,smart teaching platforms have become important tools for higher education teaching reform.As a core course of computer science and technology-related majors in higher education,the data structure course lays a solid foundation for students’professional learning and plays an important role in promoting their future success in technology,research,and industry.This study conducts an in-depth analysis of the pain points faced by the data structure course,and explores a teaching reform and practice of integration of theory and practice based on the system application of a smart teaching platform before class,during class,and after class.The reform practice shows that this teaching mode improves students’learning initiative,learning motivation,and practical skills.Students not only achieved better results in knowledge mastery but also significantly improved in problem analysis and solution.展开更多
Data warehouse provides storage and management for mass data, but data schema evolves with time on. When data schema is changed, added or deleted, the data in data warehouse must comply with the changed data schema, ...Data warehouse provides storage and management for mass data, but data schema evolves with time on. When data schema is changed, added or deleted, the data in data warehouse must comply with the changed data schema, so data warehouse must be re organized or re constructed, but this process is exhausting and wasteful. In order to cope with these problems, this paper develops an approach to model data cube with XML, which emerges as a universal format for data exchange on the Web and which can make data warehouse flexible and scalable. This paper also extends OLAP algebra for XML based data cube, which is called X OLAP.展开更多
A robust and efficient algorithm is presented to build multiresolution models (MRMs) of arbitrary meshes without requirement of subdivision connectivity. To overcome the sampling difficulty of arbitrary meshes, edge c...A robust and efficient algorithm is presented to build multiresolution models (MRMs) of arbitrary meshes without requirement of subdivision connectivity. To overcome the sampling difficulty of arbitrary meshes, edge contraction and vertex expansion are used as downsampling and upsampling methods. Our MRMs of a mesh are composed of a base mesh and a series of edge split operations, which are organized as a directed graph. Each split operation encodes two parts of information. One is the modification to the mesh, and the other is the dependency relation among splits. Such organization ensures the efficiency and robustness of our MRM algorithm. Examples demonstrate the functionality of our method.展开更多
To extract structured data from a web page with customized requirements,a user labels some DOM elements on the page with attribute names.The common features of the labeled elements are utilized to guide the user throu...To extract structured data from a web page with customized requirements,a user labels some DOM elements on the page with attribute names.The common features of the labeled elements are utilized to guide the user through the labeling process to minimize user efforts,and are also utilized to retrieve attribute values.To turn the attribute values into a structured result,the attribute pattern needs to be induced.For this purpose,a space-optimized suffix tree called attribute tree is built to transform the document object model(DOM) tree into a simpler form while preserving its useful properties such as attribute sequence order.The pattern is induced bottom-up on the attribute tree,and is further used to build the structured result.Experiments are conducted and show high performance of our approach in terms of precision,recall and structural correctness.展开更多
Freebase is a large collaborative knowledge base and database of general, structured information for public use. Its structured data had been harvested from many sources, including individual, user-submitted wiki cont...Freebase is a large collaborative knowledge base and database of general, structured information for public use. Its structured data had been harvested from many sources, including individual, user-submitted wiki contributions. Its aim is to create a global resource so that people (and machines) can access common information more effectively which is mostly available in English. In this research work, we have tried to build the technique of creating the Freebase for Bengali language. Today the number of Bengali articles on the internet is growing day by day. So it has become a necessary to have a structured data store in Bengali. It consists of different types of concepts (topics) and relationships between those topics. These include different types of areas like popular culture (e.g. films, music, books, sports, television), location information (restaurants, geolocations, businesses), scholarly information (linguistics, biology, astronomy), birth place of (poets, politicians, actor, actress) and general knowledge (Wikipedia). It will be much more helpful for relation extraction or any kind of Natural Language Processing (NLP) works on Bengali language. In this work, we identified the technique of creating the Bengali Freebase and made a collection of Bengali data. We applied SPARQL query language to extract information from natural language (Bengali) documents such as Wikidata which is typically in RDF (Resource Description Format) triple format.展开更多
In order to improve the quality of web search,a new query expansion method by choosing meaningful structure data from a domain database is proposed.It categories attributes into three different classes,named as concep...In order to improve the quality of web search,a new query expansion method by choosing meaningful structure data from a domain database is proposed.It categories attributes into three different classes,named as concept attribute,context attribute and meaningless attribute,according to their semantic features which are document frequency features and distinguishing capability features.It also defines the semantic relevance between two attributes when they have correlations in the database.Then it proposes trie-bitmap structure and pair pointer tables to implement efficient algorithms for discovering attribute semantic feature and detecting their semantic relevances.By using semantic attributes and their semantic relevances,expansion words can be generated and embedded into a vector space model with interpolation parameters.The experiments use an IMDB movie database and real texts collections to evaluate the proposed method by comparing its performance with a classical vector space model.The results show that the proposed method can improve text search efficiently and also improve both semantic features and semantic relevances with good separation capabilities.展开更多
In this paper, a new concept called numerical structure of seismic data is introduced and the difference between numerical structure and numerical value of seismic data is explained. Our study shows that the numerical...In this paper, a new concept called numerical structure of seismic data is introduced and the difference between numerical structure and numerical value of seismic data is explained. Our study shows that the numerical seismic structure is closely related to oil and gas-bearing reservoir, so it is very useful for a geologist or a geophysicist to precisely interpret the oil-bearing layers from the seismic data. This technology can be applied to any exploration or production stage. The new method has been tested on a series of exploratory or development wells and proved to be reliable in China. Hydrocarbon-detection with this new method for 39 exploration wells on 25 structures indi- cates a success ratio of over 80 percent. The new method of hydrocarbon prediction can be applied for: (1) depositional environment of reservoirs with marine fades, delta, or non-marine fades (including fluvial facies, lacustrine fades); (2) sedimentary rocks of reservoirs that are non-marine clastic rocks and carbonate rock; and (3) burial depths range from 300 m to 7000 m, and the minimum thickness of these reservoirs is over 8 m (main frequency is about 50 Hz).展开更多
Seismic data structure characteristics means the waveform character arranged in the time sequence at discrete data points in each 2-D or 3-D seismic trace. Hydrocarbon prediction using seismic data structure character...Seismic data structure characteristics means the waveform character arranged in the time sequence at discrete data points in each 2-D or 3-D seismic trace. Hydrocarbon prediction using seismic data structure characteristics is a new reservoir prediction technique. When the main pay interval is in carbonate fracture and fissure-cavern type reservoirs with very strong inhomogeneity, there are some difficulties with hydrocarbon prediction. Because of the special geological conditions of the eighth zone in the Tahe oil field, we apply seismic data structure characteristics to hydrocarbon prediction for the Ordovician reservoir in this zone. We divide the area oil zone into favorable and unfavorable blocks. Eighteen well locations were proposed in the favorable oil block, drilled, and recovered higher output of oil and gas.展开更多
Multi-fidelity Data Fusion(MDF)frameworks have emerged as a prominent approach to producing economical but accurate surrogate models for aerodynamic data modeling by integrating data with different fidelity levels.How...Multi-fidelity Data Fusion(MDF)frameworks have emerged as a prominent approach to producing economical but accurate surrogate models for aerodynamic data modeling by integrating data with different fidelity levels.However,most existing MDF frameworks assume a uniform data structure between sampling data sources;thus,producing an accurate solution at the required level,for cases of non-uniform data structures is challenging.To address this challenge,an Adaptive Multi-fidelity Data Fusion(AMDF)framework is proposed to produce a composite surrogate model which can efficiently model multi-fidelity data featuring non-uniform structures.Firstly,the design space of the input data with non-uniform data structures is decomposed into subdomains containing simplified structures.Secondly,different MDF frameworks and a rule-based selection process are adopted to construct multiple local models for the subdomain data.On the other hand,the Enhanced Local Fidelity Modeling(ELFM)method is proposed to combine the generated local models into a unique and continuous global model.Finally,the resulting model inherits the features of local models and approximates a complete database for the whole design space.The validation of the proposed framework is performed to demonstrate its approximation capabilities in(A)four multi-dimensional analytical problems and(B)a practical engineering case study of constructing an F16C fighter aircraft’s aerodynamic database.Accuracy comparisons of the generated models using the proposed AMDF framework and conventional MDF approaches using a single global modeling algorithm are performed to reveal the adaptability of the proposed approach for fusing multi-fidelity data featuring non-uniform structures.Indeed,the results indicated that the proposed framework outperforms the state-of-the-art MDF approach in the cases of non-uniform data.展开更多
The proliferation of textual data in society currently is overwhelming, in particular, unstructured textual data is being constantly generated via call centre logs, emails, documents on the web, blogs, tweets, custome...The proliferation of textual data in society currently is overwhelming, in particular, unstructured textual data is being constantly generated via call centre logs, emails, documents on the web, blogs, tweets, customer comments, customer reviews, etc.While the amount of textual data is increasing rapidly, users ability to summarise, understand, and make sense of such data for making better business/living decisions remains challenging. This paper studies how to analyse textual data, based on layered software patterns, for extracting insightful user intelligence from a large collection of documents and for using such information to improve user operations and performance.展开更多
In the application development of database,sharing information a- mong different DBMSs is an important and meaningful technical subject. This paper analyzes the schema definition and physical organization of popu- lar...In the application development of database,sharing information a- mong different DBMSs is an important and meaningful technical subject. This paper analyzes the schema definition and physical organization of popu- lar relational DBMSs and suggests the use of an intermediary schema.This technology provides many advantages such as powerful extensibility and ease in the integration of data conversions among different DBMSs etc.This pa- per introduces the data conversion system under DOS and XENIX operating systems.展开更多
In conjunction with association rules for data mining, the connections between testing indices and strong and weak association rules were determined, and new derivative rules were obtained by further reasoning. Associ...In conjunction with association rules for data mining, the connections between testing indices and strong and weak association rules were determined, and new derivative rules were obtained by further reasoning. Association rules were used to analyze correlation and check consistency between indices. This study shows that the judgment obtained by weak association rules or non-association rules is more accurate and more credible than that obtained by strong association rules. When the testing grades of two indices in the weak association rules are inconsistent, the testing grades of indices are more likely to be erroneous, and the mistakes are often caused by human factors. Clustering data mining technology was used to analyze the reliability of a diagnosis, or to perform health diagnosis directly. Analysis showed that the clustering results are related to the indices selected, and that if the indices selected are more significant, the characteristics of clustering results are also more significant, and the analysis or diagnosis is more credible. The indices and diagnosis analysis function produced by this study provide a necessary theoretical foundation and new ideas for the development of hydraulic metal structure health diagnosis technology.展开更多
Taking autonomous driving and driverless as the research object,we discuss and define intelligent high-precision map.Intelligent high-precision map is considered as a key link of future travel,a carrier of real-time p...Taking autonomous driving and driverless as the research object,we discuss and define intelligent high-precision map.Intelligent high-precision map is considered as a key link of future travel,a carrier of real-time perception of traffic resources in the entire space-time range,and the criterion for the operation and control of the whole process of the vehicle.As a new form of map,it has distinctive features in terms of cartography theory and application requirements compared with traditional navigation electronic maps.Thus,it is necessary to analyze and discuss its key features and problems to promote the development of research and application of intelligent high-precision map.Accordingly,we propose an information transmission model based on the cartography theory and combine the wheeled robot’s control flow in practical application.Next,we put forward the data logic structure of intelligent high-precision map,and analyze its application in autonomous driving.Then,we summarize the computing mode of“Crowdsourcing+Edge-Cloud Collaborative Computing”,and carry out key technical analysis on how to improve the quality of crowdsourced data.We also analyze the effective application scenarios of intelligent high-precision map in the future.Finally,we present some thoughts and suggestions for the future development of this field.展开更多
文摘Enterprise applications utilize relational databases and structured business processes, requiring slow and expensive conversion of inputs and outputs, from business documents such as invoices, purchase orders, and receipts, into known templates and schemas before processing. We propose a new LLM Agent-based intelligent data extraction, transformation, and load (IntelligentETL) pipeline that not only ingests PDFs and detects inputs within it but also addresses the extraction of structured and unstructured data by developing tools that most efficiently and securely deal with respective data types. We study the efficiency of our proposed pipeline and compare it with enterprise solutions that also utilize LLMs. We establish the supremacy in timely and accurate data extraction and transformation capabilities of our approach for analyzing the data from varied sources based on nested and/or interlinked input constraints.
基金Supported by the National Natural Sciences Foun-dation of China (60233010 ,60273034 ,60403014) ,863 ProgramofChina (2002AA116010) ,973 Programof China (2002CB312002)
文摘Tree logic, inherited from ambient logic, is introduced as the formal foundation of related programming language and type systems, In this paper, we introduce recursion into such logic system, which can describe the tree data more dearly and concisely. By making a distinction between proposition and predicate, a concise semantics interpretation for our modal logic is given. We also develop a model checking algorithm for the logic without △ operator. The correctness of the algorithm is shown. Such work can be seen as the basis of the semi-structured data processing language and more flexible type system.
基金supported by the Knowledge Innovation Program of the Chinese Academy of Sciencesthe National High-Tech R&D Program of China(2008BAK49B05)
文摘More web pages are widely applying AJAX (Asynchronous JavaScript XML) due to the rich interactivity and incremental communication. By observing, it is found that the AJAX contents, which could not be seen by traditional crawler, are well-structured and belong to one specific domain generally. Extracting the structured data from AJAX contents and annotating its semantic are very significant for further applications. In this paper, a structured AJAX data extraction method for agricultural domain based on agricultural ontology was proposed. Firstly, Crawljax, an open AJAX crawling tool, was overridden to explore and retrieve the AJAX contents; secondly, the retrieved contents were partitioned into items and then classified by combining with agricultural ontology. HTML tags and punctuations were used to segment the retrieved contents into entity items. Finally, the entity items were clustered and the semantic annotation was assigned to clustering results according to agricultural ontology. By experimental evaluation, the proposed approach was proved effectively in resource exploring, entity extraction, and semantic annotation.
文摘The problem of nonparametric identification of a multivariate nonlinearity in a D-input Hammer- stein system is examined. It is demonstrated that if the input measurements are structured, in the sense that there exists some hidden relation between them, i.e. if they are distributed on some (unknown) d-dimensional space M in IRD, d 〈 D, then the system nonlinearity can be recovered at points on M with the convergence rate O(n-1/(2+d)) dependent on d. This rate is thus faster than the generic rate O(n-1/(2+D)) achieved by typical nonparametric algorithms and controlled solely by the number of inputs D.
基金Project supported by the China Academic Digital Associative Library(CADAL)
文摘Organizing unstructured information from books into a well-defined structure is a significant challenge in digital libraries.Most digital libraries can provide only search services at the granularity of books and few libraries allow books to be accessed at the granularity of chapters,as manually constructing directory information for books is time-consuming.Extracting structured data from scanned books thus remains an urgent and important work.In this paper,we propose a novel structured data organization framework called CMSOF to organize scanned data automatically,and apply it to a Chinese medicine digital library. In the framework,image blocks and text blocks on the scanned page of books are separated based on the gray histogram projection method or a hybrid method of region growth and the Ada-Boosting classifier at first,and then the text structure is obtained from text blocks by text size and font type recognition.Finally,image blocks and structured OCRed text are correlated at the semantic level.By integrating the structured data into a Chinese medicine information system(CMIS) ,we can organize the Chinese medicine books well and users can access the books with flexibility,which indicates that CMSOF is an efficient framework to organize books mixed with images and text.
基金Sichuan Science and Technology Program 2018GZDZX0042,2018HH0061.
文摘Deep learning methods are applied into structured data and in typical methods,low-order features are discarded after combining with high-order featuresfor prediction tasks.However,in structured data,ignorance of low-order features may cause the low prediction rate.To address this issue,in this paper,deeper attention-based network(DAN)is proposed.With DAN method,to keep both low-and high-order features,attention average pooling layer was utilized to aggregate features of each order.Furthermore,by shortcut connections from each layer to attention average pooling layer,DAN can be built extremely deep to obtain enough capacity.Experimental results show DAN has good performance and works effectively.
基金the National Natural Science Foundation of China(Grant Numbers 622724786210245062102451).
文摘With the rapid advancement of cloud computing technology,reversible data hiding algorithms in encrypted images(RDH-EI)have developed into an important field of study concentrated on safeguarding privacy in distributed cloud environments.However,existing algorithms often suffer from low embedding capacities and are inadequate for complex data access scenarios.To address these challenges,this paper proposes a novel reversible data hiding algorithm in encrypted images based on adaptive median edge detection(AMED)and ciphertext-policy attributebased encryption(CP-ABE).This proposed algorithm enhances the conventional median edge detection(MED)by incorporating dynamic variables to improve pixel prediction accuracy.The carrier image is subsequently reconstructed using the Huffman coding technique.Encrypted image generation is then achieved by encrypting the image based on system user attributes and data access rights,with the hierarchical embedding of the group’s secret data seamlessly integrated during the encryption process using the CP-ABE scheme.Ultimately,the encrypted image is transmitted to the data hider,enabling independent embedding of the secret data and resulting in the creation of the marked encrypted image.This approach allows only the receiver to extract the authorized group’s secret data,thereby enabling fine-grained,controlled access.Test results indicate that,in contrast to current algorithms,the method introduced here considerably improves the embedding rate while preserving lossless image recovery.Specifically,the average maximum embedding rates for the(3,4)-threshold and(6,6)-threshold schemes reach 5.7853 bits per pixel(bpp)and 7.7781 bpp,respectively,across the BOSSbase,BOW-2,and USD databases.Furthermore,the algorithm facilitates permission-granting and joint-decryption capabilities.Additionally,this paper conducts a comprehensive examination of the algorithm’s robustness using metrics such as image correlation,information entropy,and number of pixel change rate(NPCR),confirming its high level of security.Overall,the algorithm can be applied in a multi-user and multi-level cloud service environment to realize the secure storage of carrier images and secret data.
文摘With the rapid development of information technology,smart teaching platforms have become important tools for higher education teaching reform.As a core course of computer science and technology-related majors in higher education,the data structure course lays a solid foundation for students’professional learning and plays an important role in promoting their future success in technology,research,and industry.This study conducts an in-depth analysis of the pain points faced by the data structure course,and explores a teaching reform and practice of integration of theory and practice based on the system application of a smart teaching platform before class,during class,and after class.The reform practice shows that this teaching mode improves students’learning initiative,learning motivation,and practical skills.Students not only achieved better results in knowledge mastery but also significantly improved in problem analysis and solution.
文摘Data warehouse provides storage and management for mass data, but data schema evolves with time on. When data schema is changed, added or deleted, the data in data warehouse must comply with the changed data schema, so data warehouse must be re organized or re constructed, but this process is exhausting and wasteful. In order to cope with these problems, this paper develops an approach to model data cube with XML, which emerges as a universal format for data exchange on the Web and which can make data warehouse flexible and scalable. This paper also extends OLAP algebra for XML based data cube, which is called X OLAP.
文摘A robust and efficient algorithm is presented to build multiresolution models (MRMs) of arbitrary meshes without requirement of subdivision connectivity. To overcome the sampling difficulty of arbitrary meshes, edge contraction and vertex expansion are used as downsampling and upsampling methods. Our MRMs of a mesh are composed of a base mesh and a series of edge split operations, which are organized as a directed graph. Each split operation encodes two parts of information. One is the modification to the mesh, and the other is the dependency relation among splits. Such organization ensures the efficiency and robustness of our MRM algorithm. Examples demonstrate the functionality of our method.
基金Supported by the National High Technology Research and Development Programme of China(No.2009AA01 Z141)the National Natural Science Foundation of China(No.60573117)Beijing Natural Science Foundation(No.4131001)
文摘To extract structured data from a web page with customized requirements,a user labels some DOM elements on the page with attribute names.The common features of the labeled elements are utilized to guide the user through the labeling process to minimize user efforts,and are also utilized to retrieve attribute values.To turn the attribute values into a structured result,the attribute pattern needs to be induced.For this purpose,a space-optimized suffix tree called attribute tree is built to transform the document object model(DOM) tree into a simpler form while preserving its useful properties such as attribute sequence order.The pattern is induced bottom-up on the attribute tree,and is further used to build the structured result.Experiments are conducted and show high performance of our approach in terms of precision,recall and structural correctness.
文摘Freebase is a large collaborative knowledge base and database of general, structured information for public use. Its structured data had been harvested from many sources, including individual, user-submitted wiki contributions. Its aim is to create a global resource so that people (and machines) can access common information more effectively which is mostly available in English. In this research work, we have tried to build the technique of creating the Freebase for Bengali language. Today the number of Bengali articles on the internet is growing day by day. So it has become a necessary to have a structured data store in Bengali. It consists of different types of concepts (topics) and relationships between those topics. These include different types of areas like popular culture (e.g. films, music, books, sports, television), location information (restaurants, geolocations, businesses), scholarly information (linguistics, biology, astronomy), birth place of (poets, politicians, actor, actress) and general knowledge (Wikipedia). It will be much more helpful for relation extraction or any kind of Natural Language Processing (NLP) works on Bengali language. In this work, we identified the technique of creating the Bengali Freebase and made a collection of Bengali data. We applied SPARQL query language to extract information from natural language (Bengali) documents such as Wikidata which is typically in RDF (Resource Description Format) triple format.
基金Program for New Century Excellent Talents in University(No.NCET-06-0290)the National Natural Science Foundation of China(No.60503036)the Fok Ying Tong Education Foundation Award(No.104027)
文摘In order to improve the quality of web search,a new query expansion method by choosing meaningful structure data from a domain database is proposed.It categories attributes into three different classes,named as concept attribute,context attribute and meaningless attribute,according to their semantic features which are document frequency features and distinguishing capability features.It also defines the semantic relevance between two attributes when they have correlations in the database.Then it proposes trie-bitmap structure and pair pointer tables to implement efficient algorithms for discovering attribute semantic feature and detecting their semantic relevances.By using semantic attributes and their semantic relevances,expansion words can be generated and embedded into a vector space model with interpolation parameters.The experiments use an IMDB movie database and real texts collections to evaluate the proposed method by comparing its performance with a classical vector space model.The results show that the proposed method can improve text search efficiently and also improve both semantic features and semantic relevances with good separation capabilities.
基金Mainly presented at the 6-th international meeting of acoustics in Aug. 2003, and The 1999 SPE Asia Pacific Oil and GasConference and Exhibition held in Jakarta, Indonesia, 20-22 April 1999, SPE 54274.
文摘In this paper, a new concept called numerical structure of seismic data is introduced and the difference between numerical structure and numerical value of seismic data is explained. Our study shows that the numerical seismic structure is closely related to oil and gas-bearing reservoir, so it is very useful for a geologist or a geophysicist to precisely interpret the oil-bearing layers from the seismic data. This technology can be applied to any exploration or production stage. The new method has been tested on a series of exploratory or development wells and proved to be reliable in China. Hydrocarbon-detection with this new method for 39 exploration wells on 25 structures indi- cates a success ratio of over 80 percent. The new method of hydrocarbon prediction can be applied for: (1) depositional environment of reservoirs with marine fades, delta, or non-marine fades (including fluvial facies, lacustrine fades); (2) sedimentary rocks of reservoirs that are non-marine clastic rocks and carbonate rock; and (3) burial depths range from 300 m to 7000 m, and the minimum thickness of these reservoirs is over 8 m (main frequency is about 50 Hz).
基金This reservoir research is sponsored by the National 973 Subject Project (No. 2001CB209).
文摘Seismic data structure characteristics means the waveform character arranged in the time sequence at discrete data points in each 2-D or 3-D seismic trace. Hydrocarbon prediction using seismic data structure characteristics is a new reservoir prediction technique. When the main pay interval is in carbonate fracture and fissure-cavern type reservoirs with very strong inhomogeneity, there are some difficulties with hydrocarbon prediction. Because of the special geological conditions of the eighth zone in the Tahe oil field, we apply seismic data structure characteristics to hydrocarbon prediction for the Ordovician reservoir in this zone. We divide the area oil zone into favorable and unfavorable blocks. Eighteen well locations were proposed in the favorable oil block, drilled, and recovered higher output of oil and gas.
基金supported by the Basic Science Research Program through the National Research Foundation of Korea(NRF)funded by the Ministry of Education(No.2020R1A6A1A03046811).This paper was also supported by Konkuk University Researcher Fund in 2021.
文摘Multi-fidelity Data Fusion(MDF)frameworks have emerged as a prominent approach to producing economical but accurate surrogate models for aerodynamic data modeling by integrating data with different fidelity levels.However,most existing MDF frameworks assume a uniform data structure between sampling data sources;thus,producing an accurate solution at the required level,for cases of non-uniform data structures is challenging.To address this challenge,an Adaptive Multi-fidelity Data Fusion(AMDF)framework is proposed to produce a composite surrogate model which can efficiently model multi-fidelity data featuring non-uniform structures.Firstly,the design space of the input data with non-uniform data structures is decomposed into subdomains containing simplified structures.Secondly,different MDF frameworks and a rule-based selection process are adopted to construct multiple local models for the subdomain data.On the other hand,the Enhanced Local Fidelity Modeling(ELFM)method is proposed to combine the generated local models into a unique and continuous global model.Finally,the resulting model inherits the features of local models and approximates a complete database for the whole design space.The validation of the proposed framework is performed to demonstrate its approximation capabilities in(A)four multi-dimensional analytical problems and(B)a practical engineering case study of constructing an F16C fighter aircraft’s aerodynamic database.Accuracy comparisons of the generated models using the proposed AMDF framework and conventional MDF approaches using a single global modeling algorithm are performed to reveal the adaptability of the proposed approach for fusing multi-fidelity data featuring non-uniform structures.Indeed,the results indicated that the proposed framework outperforms the state-of-the-art MDF approach in the cases of non-uniform data.
文摘The proliferation of textual data in society currently is overwhelming, in particular, unstructured textual data is being constantly generated via call centre logs, emails, documents on the web, blogs, tweets, customer comments, customer reviews, etc.While the amount of textual data is increasing rapidly, users ability to summarise, understand, and make sense of such data for making better business/living decisions remains challenging. This paper studies how to analyse textual data, based on layered software patterns, for extracting insightful user intelligence from a large collection of documents and for using such information to improve user operations and performance.
文摘In the application development of database,sharing information a- mong different DBMSs is an important and meaningful technical subject. This paper analyzes the schema definition and physical organization of popu- lar relational DBMSs and suggests the use of an intermediary schema.This technology provides many advantages such as powerful extensibility and ease in the integration of data conversions among different DBMSs etc.This pa- per introduces the data conversion system under DOS and XENIX operating systems.
基金supported by the Key Program of the National Natural Science Foundation of China(Grant No.50539010)the Special Fund for Public Welfare Industry of the Ministry of Water Resources of China(Grant No.200801019)
文摘In conjunction with association rules for data mining, the connections between testing indices and strong and weak association rules were determined, and new derivative rules were obtained by further reasoning. Association rules were used to analyze correlation and check consistency between indices. This study shows that the judgment obtained by weak association rules or non-association rules is more accurate and more credible than that obtained by strong association rules. When the testing grades of two indices in the weak association rules are inconsistent, the testing grades of indices are more likely to be erroneous, and the mistakes are often caused by human factors. Clustering data mining technology was used to analyze the reliability of a diagnosis, or to perform health diagnosis directly. Analysis showed that the clustering results are related to the indices selected, and that if the indices selected are more significant, the characteristics of clustering results are also more significant, and the analysis or diagnosis is more credible. The indices and diagnosis analysis function produced by this study provide a necessary theoretical foundation and new ideas for the development of hydraulic metal structure health diagnosis technology.
基金National Key Research and Development Program(No.2018YFB1305001)Major Consulting and Research Project of Chinese Academy of Engineering(No.2018-ZD-02-07)。
文摘Taking autonomous driving and driverless as the research object,we discuss and define intelligent high-precision map.Intelligent high-precision map is considered as a key link of future travel,a carrier of real-time perception of traffic resources in the entire space-time range,and the criterion for the operation and control of the whole process of the vehicle.As a new form of map,it has distinctive features in terms of cartography theory and application requirements compared with traditional navigation electronic maps.Thus,it is necessary to analyze and discuss its key features and problems to promote the development of research and application of intelligent high-precision map.Accordingly,we propose an information transmission model based on the cartography theory and combine the wheeled robot’s control flow in practical application.Next,we put forward the data logic structure of intelligent high-precision map,and analyze its application in autonomous driving.Then,we summarize the computing mode of“Crowdsourcing+Edge-Cloud Collaborative Computing”,and carry out key technical analysis on how to improve the quality of crowdsourced data.We also analyze the effective application scenarios of intelligent high-precision map in the future.Finally,we present some thoughts and suggestions for the future development of this field.