In this paper,we propose a new relational schema (R-schema) to XML schema translation algorithm, VQT, which analyzes the value cardinality and user query patterns and extracts the implicit referential integrities by u...In this paper,we propose a new relational schema (R-schema) to XML schema translation algorithm, VQT, which analyzes the value cardinality and user query patterns and extracts the implicit referential integrities by using the cardinality property of foreign key constraints between columns and the equi-join characteristic in user queries. The VQT algorithm can apply the extracted implied referential integrity relation information to the R-schema and create an XML schema as the final result. Therefore, the VQT algorithm prevents the R-schema from being incorrectly converted into the XML schema, and it richly and powerfully represents all the information in the R-schema by creating an XML schema as the translation result on behalf of the XML DTD.展开更多
It is nontrivial to maintain such discovered frequent query patterns in real XML-DBMS because the transaction database of queries may allow frequent updates and such updates may not only invalidate some existing frequ...It is nontrivial to maintain such discovered frequent query patterns in real XML-DBMS because the transaction database of queries may allow frequent updates and such updates may not only invalidate some existing frequent query patterns but also generate some new frequent query patterns. In this paper, two incremental updating algorithms, FUX-QMiner and FUXQMiner, are proposed for efficient maintenance of discovered frequent query patterns and generation the new frequent query patterns when new XMI, queries are added into the database. Experimental results from our implementation show that the proposed algorithms have good performance. Key words XML - frequent query pattern - incremental algorithm - data mining CLC number TP 311 Foudation item: Supported by the Youthful Foundation for Scientific Research of University of Shanghai for Science and TechnologyBiography: PENG Dun-lu (1974-), male, Associate professor, Ph.D, research direction: data mining, Web service and its application, peerto-peer computing.展开更多
It is increasingly common to find graphs in which edges are of different types, indicating a variety of relation- ships. For such graphs we propose a class of reachability queries and a class of graph patterns, in whi...It is increasingly common to find graphs in which edges are of different types, indicating a variety of relation- ships. For such graphs we propose a class of reachability queries and a class of graph patterns, in which an edge is specified with a regular expression of a certain form, ex- pressing the connectivity of a data graph via edges of var- ious types. In addition, we define graph pattern matching based on a revised notion of graph simulation. On graphs in emerging applications such as social networks, we show that these queries are capable of finding more sensible informa- tion than their traditional counterparts. Better still, their in- creased expressive power does not come with extra complex- ity. Indeed, (1) we investigate their containment and mini- mization problems, and show that these fundamental prob- lems are in quadratic time for reachability queries and are in cubic time for pattern queries. (2) We develop an algorithm for answering reachability queries, in quadratic time as for their traditional counterpart. (3) We provide two cubic-time algorithms for evaluating graph pattern queries, as opposed to the NP-completeness of graph pattern matching via subgraph isomorphism. (4) The effectiveness and efficiency of these al- gorithms are experimentally verified using real-life data and synthetic data.展开更多
As software systems grow more and more complex,extensive techniques have been proposed to analyze the log data to obtain the insight of the system status.However,during log data analysis,tedious manual efforts are pai...As software systems grow more and more complex,extensive techniques have been proposed to analyze the log data to obtain the insight of the system status.However,during log data analysis,tedious manual efforts are paid to search interesting or informative log patterns from a huge volume of log data,named pattern-based queries.Although existing log management tools and DMBS systems can also support pattern-based queries,they suffer from a low efficiency.To deal with this problem,we propose a novel approach,named PLQ(Pattern-based Log Query).First,PLQ organizes logs into disjoint chunks and builds chunk-wise bitmap indexes for log types and attribute values.Then,based on bitmap indexes,PLQ finds candidate logs with a set of efficient bit-wise operations.Finally,PLQ fetches candidate logs and validates them according to the queried pattern.Extensive experiments are conducted on real-life datasets.According to experimental results,compared with existing log management systems,PLQ is more efficient in querying log patterns and has a higher pruning rate for filtering irrelevant logs.Moreover,in PLQ,since the ratio of the index size to the data size does not exceed 2.5%for log datasets of different sizes,PLQ has a high scalability.展开更多
基金Project supported by the 2nd Brain Korea Project
文摘In this paper,we propose a new relational schema (R-schema) to XML schema translation algorithm, VQT, which analyzes the value cardinality and user query patterns and extracts the implicit referential integrities by using the cardinality property of foreign key constraints between columns and the equi-join characteristic in user queries. The VQT algorithm can apply the extracted implied referential integrity relation information to the R-schema and create an XML schema as the final result. Therefore, the VQT algorithm prevents the R-schema from being incorrectly converted into the XML schema, and it richly and powerfully represents all the information in the R-schema by creating an XML schema as the translation result on behalf of the XML DTD.
文摘It is nontrivial to maintain such discovered frequent query patterns in real XML-DBMS because the transaction database of queries may allow frequent updates and such updates may not only invalidate some existing frequent query patterns but also generate some new frequent query patterns. In this paper, two incremental updating algorithms, FUX-QMiner and FUXQMiner, are proposed for efficient maintenance of discovered frequent query patterns and generation the new frequent query patterns when new XMI, queries are added into the database. Experimental results from our implementation show that the proposed algorithms have good performance. Key words XML - frequent query pattern - incremental algorithm - data mining CLC number TP 311 Foudation item: Supported by the Youthful Foundation for Scientific Research of University of Shanghai for Science and TechnologyBiography: PENG Dun-lu (1974-), male, Associate professor, Ph.D, research direction: data mining, Web service and its application, peerto-peer computing.
文摘It is increasingly common to find graphs in which edges are of different types, indicating a variety of relation- ships. For such graphs we propose a class of reachability queries and a class of graph patterns, in which an edge is specified with a regular expression of a certain form, ex- pressing the connectivity of a data graph via edges of var- ious types. In addition, we define graph pattern matching based on a revised notion of graph simulation. On graphs in emerging applications such as social networks, we show that these queries are capable of finding more sensible informa- tion than their traditional counterparts. Better still, their in- creased expressive power does not come with extra complex- ity. Indeed, (1) we investigate their containment and mini- mization problems, and show that these fundamental prob- lems are in quadratic time for reachability queries and are in cubic time for pattern queries. (2) We develop an algorithm for answering reachability queries, in quadratic time as for their traditional counterpart. (3) We provide two cubic-time algorithms for evaluating graph pattern queries, as opposed to the NP-completeness of graph pattern matching via subgraph isomorphism. (4) The effectiveness and efficiency of these al- gorithms are experimentally verified using real-life data and synthetic data.
基金This work was supported by the National Natural Science Foundation of China under Grant No.61672163the MIIT Project:Data Management Standards and Verfication for Industrial Internet Identifer Resoluation.
文摘As software systems grow more and more complex,extensive techniques have been proposed to analyze the log data to obtain the insight of the system status.However,during log data analysis,tedious manual efforts are paid to search interesting or informative log patterns from a huge volume of log data,named pattern-based queries.Although existing log management tools and DMBS systems can also support pattern-based queries,they suffer from a low efficiency.To deal with this problem,we propose a novel approach,named PLQ(Pattern-based Log Query).First,PLQ organizes logs into disjoint chunks and builds chunk-wise bitmap indexes for log types and attribute values.Then,based on bitmap indexes,PLQ finds candidate logs with a set of efficient bit-wise operations.Finally,PLQ fetches candidate logs and validates them according to the queried pattern.Extensive experiments are conducted on real-life datasets.According to experimental results,compared with existing log management systems,PLQ is more efficient in querying log patterns and has a higher pruning rate for filtering irrelevant logs.Moreover,in PLQ,since the ratio of the index size to the data size does not exceed 2.5%for log datasets of different sizes,PLQ has a high scalability.