It is increasingly common to find graphs in which edges are of different types, indicating a variety of relation- ships. For such graphs we propose a class of reachability queries and a class of graph patterns, in whi...It is increasingly common to find graphs in which edges are of different types, indicating a variety of relation- ships. For such graphs we propose a class of reachability queries and a class of graph patterns, in which an edge is specified with a regular expression of a certain form, ex- pressing the connectivity of a data graph via edges of var- ious types. In addition, we define graph pattern matching based on a revised notion of graph simulation. On graphs in emerging applications such as social networks, we show that these queries are capable of finding more sensible informa- tion than their traditional counterparts. Better still, their in- creased expressive power does not come with extra complex- ity. Indeed, (1) we investigate their containment and mini- mization problems, and show that these fundamental prob- lems are in quadratic time for reachability queries and are in cubic time for pattern queries. (2) We develop an algorithm for answering reachability queries, in quadratic time as for their traditional counterpart. (3) We provide two cubic-time algorithms for evaluating graph pattern queries, as opposed to the NP-completeness of graph pattern matching via subgraph isomorphism. (4) The effectiveness and efficiency of these al- gorithms are experimentally verified using real-life data and synthetic data.展开更多
Data visualization transforms data into images to aid the understanding of data; therefore, it is an invaluable tool for explaining the significance of data to visually inclined people. Given a(big) dataset, the essen...Data visualization transforms data into images to aid the understanding of data; therefore, it is an invaluable tool for explaining the significance of data to visually inclined people. Given a(big) dataset, the essential task of visualization is to visualize the data to tell compelling stories by selecting, filtering, and transforming the data, and picking the right visualization type such as bar charts or line charts. Our ultimate goal is to automate this task that currently requires heavy user intervention in the existing visualization systems. An evolutionized system in the field faces the following three main challenges:(1) Visualization verification: to determine whether a visualization for a given dataset is interesting, from the viewpoint of human understanding;(2) Visualization search space: a "boring" dataset may become interesting after an arbitrary combination of operations such as selections,joins, and aggregations, among others;(3) On-time responses: do not deplete the user's patience. In this paper,we present the DEEPEYE system to address these challenges. This system solves the first challenge by training a binary classifier to decide whether a particular visualization is good for a given dataset, and by using a supervised learning to rank model to rank the above good visualizations. It also considers popular visualization operations, such as grouping and binning, which can manipulate the data, and this will determine the search space. Our proposed system tackles the third challenge by incorporating database optimization techniques for sharing computations and pruning.展开更多
文摘It is increasingly common to find graphs in which edges are of different types, indicating a variety of relation- ships. For such graphs we propose a class of reachability queries and a class of graph patterns, in which an edge is specified with a regular expression of a certain form, ex- pressing the connectivity of a data graph via edges of var- ious types. In addition, we define graph pattern matching based on a revised notion of graph simulation. On graphs in emerging applications such as social networks, we show that these queries are capable of finding more sensible informa- tion than their traditional counterparts. Better still, their in- creased expressive power does not come with extra complex- ity. Indeed, (1) we investigate their containment and mini- mization problems, and show that these fundamental prob- lems are in quadratic time for reachability queries and are in cubic time for pattern queries. (2) We develop an algorithm for answering reachability queries, in quadratic time as for their traditional counterpart. (3) We provide two cubic-time algorithms for evaluating graph pattern queries, as opposed to the NP-completeness of graph pattern matching via subgraph isomorphism. (4) The effectiveness and efficiency of these al- gorithms are experimentally verified using real-life data and synthetic data.
基金supported by the National Key Basic Research and Development(973)Program of China(No.2015CB358700)the National Natural Science Foundation of China(Nos.61373024,61632016,61422205,and 61472198)
文摘Data visualization transforms data into images to aid the understanding of data; therefore, it is an invaluable tool for explaining the significance of data to visually inclined people. Given a(big) dataset, the essential task of visualization is to visualize the data to tell compelling stories by selecting, filtering, and transforming the data, and picking the right visualization type such as bar charts or line charts. Our ultimate goal is to automate this task that currently requires heavy user intervention in the existing visualization systems. An evolutionized system in the field faces the following three main challenges:(1) Visualization verification: to determine whether a visualization for a given dataset is interesting, from the viewpoint of human understanding;(2) Visualization search space: a "boring" dataset may become interesting after an arbitrary combination of operations such as selections,joins, and aggregations, among others;(3) On-time responses: do not deplete the user's patience. In this paper,we present the DEEPEYE system to address these challenges. This system solves the first challenge by training a binary classifier to decide whether a particular visualization is good for a given dataset, and by using a supervised learning to rank model to rank the above good visualizations. It also considers popular visualization operations, such as grouping and binning, which can manipulate the data, and this will determine the search space. Our proposed system tackles the third challenge by incorporating database optimization techniques for sharing computations and pruning.