To solve the query processing correctness problem in semantic-based relational data integration, the semantics of SPARQL (Simple Protocol and RDF Query Language) queries is defined. During query rewriting, all relevant tables are found and decomposed into minimal connectable units, which are then joined according to the semantic query to produce semantically correct query plans. Algorithms for query rewriting and transformation are presented, and their computational complexity is discussed: in the worst case, the query decomposition algorithm finishes in O(n^2) time and the query rewriting algorithm requires O(nm) time. The performance of the algorithms is verified by experiments, which show that the query processing algorithms provide satisfactory performance when the query length is less than 8.
In this paper, the constrained K closest pairs query is introduced, which retrieves the K closest pairs satisfying a given spatial constraint from two datasets. For datasets indexed by R-trees in spatial databases, three algorithms are presented for answering this kind of query. Two of them, the two-phase Range+Join and Join+Range algorithms, change the execution order of the range and closest-pairs queries; the third, a constrained heap-based algorithm, uses extended distance functions to prune the search space and minimize the pruning distance. Experimental results show that the constrained heap-based algorithm has better applicability and performance than the two-phase algorithms.
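A minimal sketch of the two-phase Range+Join idea, assuming the spatial constraint is an axis-aligned rectangle that both points of a pair must fall inside. The R-tree indexes are not modeled: plain lists and a brute-force join stand in for them, and the names `in_rect` and `constrained_k_closest_pairs` are hypothetical.

```python
import heapq
import math
from typing import Iterable, List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def in_rect(p: Point, r: Rect) -> bool:
    """True if point p lies inside the closed rectangle r."""
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]


def constrained_k_closest_pairs(
    P: Iterable[Point], Q: Iterable[Point], region: Rect, k: int
) -> List[Tuple[float, Point, Point]]:
    """Range+Join: filter both datasets by the constraint, then join."""
    # Phase 1 (range): keep only points that satisfy the spatial constraint.
    p_in = [p for p in P if in_rect(p, region)]
    q_in = [q for q in Q if in_rect(q, region)]
    # Phase 2 (join): enumerate candidate pairs and keep the k closest.
    pairs = ((math.dist(p, q), p, q) for p in p_in for q in q_in)
    return heapq.nsmallest(k, pairs, key=lambda t: t[0])
```

Join+Range would instead produce closest pairs incrementally and discard those that violate the constraint, which is why the order of the two phases affects cost.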
This work aims to reduce queries on big data to computations on small data, and hence make querying big data possible under bounded resources. A query Q is boundedly evaluable if, when posed on any big dataset D, there exists a fraction D_Q of D such that Q(D) = Q(D_Q) and the cost of identifying D_Q is independent of the size of D. It has been shown that, with an auxiliary structure known as an access schema, many queries in relational algebra (RA) are boundedly evaluable under the set semantics of RA. This paper extends the theory of bounded evaluation to RA_aggr, i.e., RA extended with aggregation, under bag semantics. (1) We extend access schemas to bag access schemas, to help us identify D_Q for RA_aggr queries Q. (2) While it is undecidable to determine whether an RA_aggr query is boundedly evaluable under a bag access schema, we identify special cases that are decidable and practical. (3) In addition, we develop an effective syntax for bounded RA_aggr queries, i.e., a core subclass of boundedly evaluable RA_aggr queries that does not sacrifice their expressive power. (4) Based on the effective syntax, we provide efficient algorithms to check the bounded evaluability of RA_aggr queries and to generate query plans for bounded RA_aggr queries. (5) As proof of concept, we extend PostgreSQL to support bounded evaluation, and we experimentally verify that the extended system improves performance by orders of magnitude.
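A minimal sketch of how a bag access constraint lets an aggregate query touch only a bounded fraction D_Q of D. It assumes a single hypothetical constraint on an `orders` relation: each customer has at most N order tuples (duplicates counted, as bag semantics requires), and an index returns them. Class and function names are illustrative; building the index scans D once offline, but answering the query afterwards fetches at most N tuples regardless of |D|.

```python
from collections import defaultdict
from typing import Dict, List


class BagAccessConstraint:
    """Hypothetical constraint R(X -> Y, N) with an index on X.

    Under bag semantics every tuple counts, so duplicates are kept and
    count towards the bound N."""

    def __init__(self, rows: List[dict], key: str, bound: int):
        self.bound = bound
        self.fetched = 0  # number of tuples accessed at query time
        self.index: Dict[object, List[dict]] = defaultdict(list)
        for row in rows:
            self.index[row[key]].append(row)
        for value, bucket in self.index.items():
            if len(bucket) > bound:
                raise ValueError(f"constraint violated for {value!r}")

    def fetch(self, value) -> List[dict]:
        """Return all tuples with X = value (at most `bound` of them)."""
        bucket = self.index.get(value, [])
        self.fetched += len(bucket)
        return bucket


def total_spent(orders: BagAccessConstraint, customer) -> float:
    """SELECT SUM(amount) FROM orders WHERE cust = customer, on D_Q only."""
    d_q = orders.fetch(customer)
    return sum(row["amount"] for row in d_q)
```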
A defining characteristic of continuous queries over online data streams, possibly bounded by sliding windows, is the potentially infinite and time-evolving nature of their inputs and outputs. For the different update patterns of continuous queries, suitable data structures greatly improve query processing efficiency. In this paper, we propose a data structure for the weak non-monotonic update pattern, in which the lifetime of each tuple is known when the tuple is generated but lifetimes are not necessarily equal. The new data structure combines the ladder queue with the features of the weak non-monotonic update pattern. Experimental results show that it performs much better than the traditional calendar queue in many cases.
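Under the weak non-monotonic pattern each tuple's expiration time (arrival time plus lifetime) is known on insertion, so expired tuples can be removed in timestamp order. The sketch below illustrates that property with a binary heap from Python's `heapq` as a simple stand-in; it is not the ladder queue the paper builds on, and the class name is hypothetical.

```python
import heapq
import itertools
from typing import Any, List, Tuple


class ExpiringWindow:
    """Holds tuples whose expiration time is known when they arrive."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, Any]] = []
        self._seq = itertools.count()  # tie-breaker so tuples are never compared

    def insert(self, tup: Any, now: float, lifetime: float) -> None:
        # Lifetimes may differ per tuple; only the expiration time matters.
        heapq.heappush(self._heap, (now + lifetime, next(self._seq), tup))

    def expire(self, now: float) -> List[Any]:
        """Remove and return every tuple whose expiration time is <= now."""
        expired = []
        while self._heap and self._heap[0][0] <= now:
            expired.append(heapq.heappop(self._heap)[2])
        return expired

    def __len__(self) -> int:
        return len(self._heap)
```

Calendar and ladder queues are designed to replace the heap's O(log n) operations with bucket-based ones that are close to constant time when amortized, which is where the paper's comparison with the calendar queue comes in.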
Query processing in distributed database management systems (DBMSs) faces more challenges than in a single-node DBMS, such as more operators and more factors in cost models and metadata, even though query optimization in a single-node DBMS is already an NP-hard problem. Learned query optimizers (mainly for single-node DBMSs) have received attention because they can capture data distributions and avoid hand-crafted rules, which makes refinement and adaptation to new hardware more flexible. In this paper, we focus on extending learned query optimizers to distributed DBMSs. Specifically, we propose one possible, general architecture for a learned query optimizer in the distributed context and highlight its differences from learned optimizers for single-node DBMSs. In addition, we discuss the challenges and possible solutions.
The unified multimedia query language (UMQL) is a powerful general-purpose multimedia query language that is well suited to multimedia information retrieval. This paper proposes a grammar analysis model for effective grammatical processing of the language. The model separates the grammar analysis of a UMQL query specification into two phases, syntactic analysis and semantic analysis, and specifies their respective restrictive grammar rules with extended Backus-Naur form (EBNF) and logical algebra. As a result, the model can present error-guiding information for a query specification that contains grammatical errors. The model not only suits the processing of UMQL queries well, but also offers guidance for other projects concerned with query processing for descriptive query languages.
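To illustrate the two-phase idea (syntactic analysis against EBNF rules, then semantic analysis against a schema, with guiding error messages), here is a minimal sketch for a hypothetical toy grammar, `query ::= "SELECT" ident { "," ident } "FROM" ident`. It is not UMQL's actual grammar, the schema is invented, and the logical-algebra part of the semantic phase is reduced to simple name checks.

```python
import re
from typing import Dict, List, Set, Tuple

KEYWORDS = {"SELECT", "FROM"}
TOKEN_RE = re.compile(r"\s*(?:(\w+)|(,))")


class QuerySyntaxError(Exception):
    """Raised by the syntactic phase with a guiding message."""


def tokenize(text: str) -> List[str]:
    tokens, pos, text = [], 0, text.strip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise QuerySyntaxError(f"unexpected character near position {pos}")
        tokens.append(m.group(1) or m.group(2))
        pos = m.end()
    return tokens


def parse(tokens: List[str]) -> Tuple[List[str], str]:
    """Syntactic phase for: query ::= "SELECT" ident { "," ident } "FROM" ident"""
    pos = 0

    def take(keyword: str = "") -> str:
        nonlocal pos
        wanted = keyword or "identifier"
        if pos >= len(tokens):
            raise QuerySyntaxError(f"expected {wanted} at end of query")
        tok = tokens[pos]
        is_ident = tok != "," and tok.upper() not in KEYWORDS
        if (keyword and tok.upper() != keyword) or (not keyword and not is_ident):
            raise QuerySyntaxError(f"expected {wanted} but found {tok!r}")
        pos += 1
        return tok

    take("SELECT")
    fields = [take()]
    while pos < len(tokens) and tokens[pos] == ",":
        pos += 1
        fields.append(take())
    take("FROM")
    table = take()
    if pos != len(tokens):
        raise QuerySyntaxError(f"unexpected trailing token {tokens[pos]!r}")
    return fields, table


def check_semantics(fields: List[str], table: str, schema: Dict[str, Set[str]]) -> List[str]:
    """Semantic phase: every name must exist in the (hypothetical) schema."""
    if table not in schema:
        return [f"unknown source {table!r}"]
    return [f"unknown attribute {f!r} in {table!r}" for f in fields if f not in schema[table]]


def analyze(text: str, schema: Dict[str, Set[str]]) -> List[str]:
    """Return guiding error messages; an empty list means the query is valid."""
    try:
        fields, table = parse(tokenize(text))
    except QuerySyntaxError as err:
        return [f"syntax error: {err}"]
    return check_semantics(fields, table, schema)
```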
The idea of the positional inverted index is exploited for indexing graph databases. The main idea is to use hash tables to prune a considerable portion of the graph database that cannot contain the answer set. These tables are implemented using column-based techniques and store the graphs of the database, frequent subgraphs, and the neighborhoods of nodes. For exact checking of the remaining graphs, vertex invariants are used in the isomorphism test, which can be implemented in parallel. The evaluation results indicate that the proposed method outperforms existing methods.
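A minimal filter-then-verify sketch: an inverted index from node label to the graphs (and label counts) that contain it prunes graphs that cannot contain the query, and only the survivors would go on to the exact isomorphism test. It assumes node-label counts are the only indexed feature; the paper's column-based tables for frequent subgraphs and node neighborhoods, and its vertex-invariant verification, are not modeled. Function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, Set

Graph = Dict[Hashable, str]  # node id -> node label (edges omitted in this filter)


def build_label_index(graphs: Dict[str, Graph]) -> Dict[str, Dict[str, int]]:
    """Map each label to {graph id: number of nodes with that label}."""
    index: Dict[str, Dict[str, int]] = defaultdict(dict)
    for gid, labels in graphs.items():
        for label, count in Counter(labels.values()).items():
            index[label][gid] = count
    return index


def candidate_graphs(index: Dict[str, Dict[str, int]], query: Graph,
                     all_ids: Iterable[str]) -> Set[str]:
    """Keep only graphs with at least as many nodes of each label as the query."""
    candidates = set(all_ids)
    for label, needed in Counter(query.values()).items():
        holders = index.get(label, {})
        candidates &= {gid for gid, have in holders.items() if have >= needed}
    return candidates
```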
Since web-based GIS processes large volumes of spatial geographic information over the Internet, the efficiency of spatial data query processing and transmission should be improved. This paper presents two efficient methods for this purpose: division transmission and progressive transmission. In the division transmission method, a map is divided into several parts, called “tiles”, and only the tiles requested by a client are transmitted. In the progressive transmission method, a map is split into several phase views based on the significance of vertices; when a client requests a spatial object, the server produces the target object and transmits it progressively. To implement these methods, this paper proposes the “tile division” and “priority order estimation” algorithms together with strategies for data transmission. Compared with traditional methods such as “map total transmission” and “layer transmission”, the proposed web-based GIS data transmission increases data transmission efficiency by a large margin.
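For division transmission, the server needs to know which tiles a client's viewport touches. A minimal sketch, assuming a uniform square tile grid anchored at the map origin and a half-open viewport rectangle in map units; the paper's actual “tile division” algorithm is not reproduced, and the function name is hypothetical.

```python
import math
from typing import List, Tuple


def tiles_for_viewport(
    map_w: float, map_h: float, tile: float,
    viewport: Tuple[float, float, float, float],
) -> List[Tuple[int, int]]:
    """Return (col, row) of every tile overlapping the viewport.

    Tile (c, r) covers [c*tile, (c+1)*tile) x [r*tile, (r+1)*tile);
    the viewport (xmin, ymin, xmax, ymax) is treated as half-open too."""
    n_cols = math.ceil(map_w / tile)
    n_rows = math.ceil(map_h / tile)
    xmin, ymin, xmax, ymax = viewport
    # Clamp to the grid so viewports partly outside the map still work.
    c0 = max(0, math.floor(xmin / tile))
    c1 = min(n_cols - 1, math.ceil(xmax / tile) - 1)
    r0 = max(0, math.floor(ymin / tile))
    r1 = min(n_rows - 1, math.ceil(ymax / tile) - 1)
    return [(c, r) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]
```

For a 1000 x 1000 map with 256-unit tiles, a viewport of (100, 100, 300, 200) needs only tiles (0, 0) and (1, 0) out of 16.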
HashQuery, a Hash-area-based data dissemination protocol, is designed for wireless sensor networks. Using a Hash function that takes time as its key, both mobile sinks and sensors can determine the same Hash area. Sensors send information about the events they monitor to the Hash area, and mobile sinks need only query that area instead of flooding the whole network, so much energy can be saved. In addition, the location of the Hash area changes over time so as to balance energy consumption across the whole network. Theoretical analysis shows that the proposed protocol can be energy-efficient, and simulation studies further show that with 5 sources and 5 sinks in the network, it can save at least 50% energy compared with the existing two-tier data dissemination (TTDD) protocol, especially in large-scale wireless sensor networks.
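The key property of HashQuery is that sinks and sensors independently compute the same rendezvous area from the current time. A minimal sketch, assuming the field is divided into a grid of cells, time is bucketed into fixed-length epochs, and SHA-256 is the hash; the grid, epoch length and function name are illustrative assumptions, not the protocol's specified parameters.

```python
import hashlib
from typing import Tuple


def hash_area(t: float, epoch_len: float, cols: int, rows: int) -> Tuple[int, int]:
    """Map a timestamp to the (col, row) grid cell acting as the Hash area.

    All nodes that agree on the clock and parameters get the same cell,
    and the cell can move whenever a new epoch starts."""
    epoch = int(t // epoch_len)
    digest = hashlib.sha256(str(epoch).encode()).digest()
    cell = int.from_bytes(digest[:8], "big") % (cols * rows)
    row, col = divmod(cell, cols)
    return col, row
```

A sensor would route its event reports towards that cell, and a sink would send its query to the same cell, instead of flooding the network.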
Answering reachability queries is one of the fundamental graph operations. Existing approaches either accelerate index construction by building an index that covers only part of the reachability relationship, which may require costly traversal operations when answering a query, or accelerate query answering by building an index that covers the complete reachability relationship, which may be inefficient because complete node labels must be compared. We propose a novel labeling scheme, covering the complete reachability relationship, to accelerate reachability query processing. The idea is to decompose the given directed acyclic graph (DAG) G into two subgraphs, G1 and G2. For G1, we use topological labels consisting of two integers to answer all reachability queries. For G2, we construct 2-hop labels, as existing methods do, to answer the queries that cannot be answered by topological labels. The benefits of our method are twofold. On one hand, it does not need to perform costly traversal operations when answering queries. On the other hand, it can answer most queries in constant time without comparing whole node labels. We confirm the efficiency of our approach through extensive experimental studies on 20 real datasets.
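One common way to answer reachability with two integers per node is interval (pre/post-order) labeling on a spanning forest: u reaches v in the forest exactly when v's interval nests inside u's. The sketch below shows that idea only, assuming `children` describes a forest whose node ids are not `None`; it is not necessarily the paper's exact topological labeling, and the non-tree reachability that the paper covers with 2-hop labels on G2 is ignored. Names are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

Labels = Tuple[Dict[Hashable, int], Dict[Hashable, int]]


def interval_labels(children: Dict[Hashable, List[Hashable]],
                    roots: Iterable[Hashable]) -> Labels:
    """Assign pre/post numbers with an iterative DFS over a forest."""
    pre: Dict[Hashable, int] = {}
    post: Dict[Hashable, int] = {}
    counter = 0
    for root in roots:
        pre[root] = counter
        counter += 1
        stack = [(root, iter(children.get(root, [])))]
        while stack:
            node, pending = stack[-1]
            child = next(pending, None)  # None signals "no children left"
            if child is None:  # all children done: close the interval
                stack.pop()
                post[node] = counter
                counter += 1
            else:
                pre[child] = counter
                counter += 1
                stack.append((child, iter(children.get(child, []))))
    return pre, post


def tree_reaches(labels: Labels, u: Hashable, v: Hashable) -> bool:
    """Constant-time test: u reaches v iff v's interval nests in u's."""
    pre, post = labels
    return pre[u] <= pre[v] and post[v] <= post[u]
```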
In hybrid wireless sensor networks, sensor mobility causes query areas to change dynamically. To address the inefficiency of processing data aggregation queries over dynamic query areas, this paper proposes a processing approach for event-based location-aware queries (ELAQ), which includes a query dissemination algorithm, a maximum distance projection proxy selection algorithm, and algorithms for in-network query propagation and aggregation. The ELAQ model is characterized by queries that are triggered by events and whose results depend on the locations of mobile sensors. The results show that, compared with the TinyDB query processing approach, the ELAQ processing approach increases the accuracy of query results and decreases query response time.
The continuous top-t most influential place (CTtMIP) query is formally defined and efficiently solved in this paper. A CTtMIP query continuously monitors the t places with the maximum influence in a set of places, where the influence of a place is defined as the number of its bichromatic reverse k nearest neighbors (BRkNNs). Two new metrics and their corresponding rules are introduced to shrink the search region and reduce the number of BRkNN candidates checked. Extensive experiments confirm that our approach significantly outperforms the state-of-the-art competitor.
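To make the definition of influence concrete, here is a brute-force snapshot evaluation, assuming 2-D points and Euclidean distance: each client object contributes one unit of influence to each of its k nearest places (so a place's influence is the size of its BRkNN set), and the t highest-scoring places are returned. It recomputes everything from scratch and omits the paper's continuous monitoring and pruning metrics; names are hypothetical.

```python
import heapq
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def top_t_influential(
    places: Dict[str, Point], objects: Sequence[Point], k: int, t: int
) -> List[Tuple[str, int]]:
    """Return the t places with the most bichromatic reverse kNNs."""
    influence = Counter({pid: 0 for pid in places})  # places with no BRkNN still count
    for o in objects:
        # The k places nearest to this object each gain one reverse neighbor.
        nearest = heapq.nsmallest(k, places, key=lambda pid: math.dist(places[pid], o))
        influence.update(nearest)
    return heapq.nlargest(t, influence.items(), key=lambda kv: kv[1])
```

With places A = (0, 0), B = (10, 0), C = (20, 0) and objects at x = 1, 2, 9, 19 on the x-axis, A is most influential for k = 1 (2 reverse neighbors), while B wins for k = 2 (4 reverse neighbors).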
The k-median problem has attracted many researchers. However, few of them have considered both the dynamic environment and the issue of accuracy. In this paper, a new type of query, called the continuous median monitoring (CMM) query, is studied. It considers the k-median problem in a dynamic environment with an accuracy guarantee. A continuous group nearest neighbor based (CGB) algorithm and an average distance medoid (ADM) algorithm are proposed to solve the CMM problem. ADM is a hill-climbing algorithm that converges rapidly by checking only qualified candidates. Experiments show that ADM is more efficient than CGB and outperforms the classical PAM (partitioning around medoids) and CLARANS (clustering large applications based on randomized search) algorithms under various parameter settings.
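A minimal hill-climbing sketch of medoid selection for a single snapshot of the data: starting from arbitrary medoids, it keeps applying any medoid swap that lowers the total distance and stops at a local optimum. This is a plain PAM-style swap search used as a stand-in; ADM's candidate qualification, the continuous monitoring of CMM, and its accuracy guarantee are not modeled. Names are hypothetical.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def total_cost(points: Sequence[Point], medoids: List[int]) -> float:
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)


def hill_climb_medoids(points: Sequence[Point], k: int) -> Tuple[List[int], float]:
    """Swap medoids while the k-median cost improves; return a local optimum."""
    medoids = list(range(k))  # arbitrary start: the first k points
    best = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for slot, cand in itertools.product(range(k), range(len(points))):
            if cand in medoids:
                continue
            trial = medoids[:slot] + [cand] + medoids[slot + 1:]
            cost = total_cost(points, trial)
            if cost < best - 1e-12:  # accept strict improvements only
                medoids, best, improved = trial, cost, True
    return sorted(medoids), best
```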
Automated performance tuning of data management systems offers various benefits, such as improved performance, lower administration costs, and reduced workloads for database administrators (DBAs). Currently, DBAs tune the performance of database systems with only a little help from the database servers. In this paper, we propose a new technique for automated performance tuning of data management systems. First, we show how periods of low workload can be used to improve performance during periods of high workload. We demonstrate that extending a database system with materialised views and indices while the workload is low may improve performance during a subsequent period of high workload. The paper proposes several online algorithms for continuously processing estimated database workloads, for discovering the best plan for materialised view and index extensions, and for eliminating extensions that are no longer needed. We present experimental results showing how the proposed automated performance tuning technique improves the overall performance of a data management system.
Modern search systems have become a fundamental tool for accessing the massive amount of information stored in different repositories. These systems use sophisticated techniques to efficiently process a high volume of queries (thus optimizing energy consumption). One of these techniques is caching, which is implemented at different levels of a search architecture. In this work, we propose a novel caching strategy that speeds up dynamic pruning techniques (such as Maxscore) by exploiting the lowest (Min) and highest (Max) document identifiers that appear in the results of a previously submitted query. We name this technique Min/Max caching. The idea is to use the Min/Max information to prune the posting lists of the query terms before executing the ranking algorithm in a document-at-a-time (DAAT) approach. The proposed technique uses little memory, returns safe results, and complements other levels of caching (if present). We also combine the approach with different access policies. Extensive experimentation on real-world data shows that the proposed method achieves query processing speedups of up to 1.23x and can also reduce high-percentile tail latency (up to 2.0x speedup), an essential requirement in operational scenarios. We evaluate different cache access and eviction policies based on different decision criteria. Our findings confirm that considering the cost of the cached items (cost-aware policies) yields greater computational savings.
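A minimal sketch of the Min/Max idea on a simplified, Boolean-AND version of document-at-a-time processing: the cache remembers the smallest and largest document IDs in a query's previous result, and a resubmitted query first cuts every posting list down to that range with binary search. It assumes the cached range is reused only for the same set of terms (so the result cannot change, i.e. it stays safe) and that posting lists are sorted doc-ID arrays; Maxscore ranking, access policies and eviction are not modeled, and all names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, FrozenSet, Iterable, List, Optional, Sequence, Tuple


def prune_postings(postings: Sequence[int], lo: int, hi: int) -> Sequence[int]:
    """Keep only doc IDs in [lo, hi]; postings must be sorted ascending."""
    return postings[bisect_left(postings, lo):bisect_right(postings, hi)]


def daat_and(lists: List[Sequence[int]]) -> List[int]:
    """Document-at-a-time intersection of sorted posting lists."""
    if not lists or any(len(pl) == 0 for pl in lists):
        return []
    ptrs = [0] * len(lists)
    out = []
    while all(p < len(pl) for p, pl in zip(ptrs, lists)):
        current = [pl[p] for p, pl in zip(ptrs, lists)]
        top = max(current)
        if min(current) == top:  # every list is positioned on the same doc
            out.append(top)
            ptrs = [p + 1 for p in ptrs]
        else:  # advance the lists that lag behind the largest current doc
            ptrs = [p + 1 if pl[p] < top else p for p, pl in zip(ptrs, lists)]
    return out


class MinMaxCache:
    def __init__(self) -> None:
        self._ranges: Dict[FrozenSet[str], Tuple[int, int]] = {}

    def lookup(self, terms: Iterable[str]) -> Optional[Tuple[int, int]]:
        return self._ranges.get(frozenset(terms))

    def store(self, terms: Iterable[str], results: List[int]) -> None:
        if results:
            self._ranges[frozenset(terms)] = (min(results), max(results))


def process(terms: List[str], index: Dict[str, List[int]], cache: MinMaxCache) -> List[int]:
    lists = [index.get(t, []) for t in terms]
    rng = cache.lookup(terms)
    if rng is not None:  # cache hit: scan only the [Min, Max] slice of each list
        lists = [prune_postings(pl, *rng) for pl in lists]
    results = daat_and(lists)
    cache.store(terms, results)
    return results
```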
Recent developments in wireless communication technologies and the popularity of smartphones are making location-based services (LBS) popular. However, sending queries with users' exact locations to LBS servers may threaten users' privacy. Therefore, much research has focused on generating a cloaked query region to protect user privacy, and consequently an efficient algorithm for processing queries over a query region is required. In this paper, we propose k-nearest neighbor (k-NN) query processing algorithms for a query region in road networks. To efficiently retrieve the k-NN points of interest (POIs), we make use of the Island index. We also propose a method that generates an adaptive Island index to improve query processing performance and storage usage. Finally, our performance analysis shows that our k-NN query processing algorithms outperform the existing k-Range Nearest Neighbor (kRNN) algorithm in terms of network expansion cost and query processing time.
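For reference, the basic network-expansion way to find the k nearest POIs from a single query location is an incremental Dijkstra search that stops once k POIs have been settled. A minimal sketch, assuming an undirected road graph given as weighted edges, POIs located on nodes, and mutually comparable node ids; it does not model cloaked query regions or the Island index, which the paper uses to cut exactly this expansion cost. Names are hypothetical.

```python
import heapq
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def build_graph(edges: Iterable[Tuple[Hashable, Hashable, float]]) -> Dict:
    """Undirected adjacency lists from (u, v, length) road segments."""
    graph = defaultdict(list)
    for u, v, w in edges:
        graph[u].append((v, w))
        graph[v].append((u, w))
    return graph


def network_knn(graph: Dict, source: Hashable, pois: Set[Hashable],
                k: int) -> List[Tuple[Hashable, float]]:
    """Return up to k (poi, network distance) pairs in increasing distance."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    settled = set()
    found = []
    while heap and len(found) < k:
        d, u = heapq.heappop(heap)
        if u in settled:  # stale heap entry
            continue
        settled.add(u)
        if u in pois:
            found.append((u, d))
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return found
```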
This paper presents an extent join for computing path expressions that contain parent-child and ancestor-descendant operations, together with two path expression optimization rules: path-shortening and path-complementing. Path-shortening reduces the number of joins by shortening the path, while path-complementing optimizes path execution by computing the original path expression through an equivalent complementary one. Experimental results show that the proposed algorithms are more efficient than traditional algorithms.
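Parent-child and ancestor-descendant steps can both be evaluated by joining node extents. A minimal sketch, assuming the widely used (start, end, level) region encoding of XML elements as a stand-in for the paper's extents, and a simple nested-loop join rather than an optimized merge; names are hypothetical.

```python
from typing import List, Tuple

Extent = Tuple[int, int, int]  # (start, end, level) of one element


def extent_join(ancestors: List[Extent], descendants: List[Extent],
                parent_child: bool = False) -> List[Tuple[Extent, Extent]]:
    """Pairs (a, d) where d lies inside a's region (a//d);
    with parent_child=True, d must also be exactly one level deeper (a/d)."""
    result = []
    for a in ancestors:
        for d in descendants:
            nested = a[0] < d[0] and d[1] < a[1]
            if nested and (not parent_child or d[2] == a[2] + 1):
                result.append((a, d))
    return result
```

For example, numbering `<a><b><c/></b><c/></a>` with a start/end counter gives a = (1, 8, 1), b = (2, 5, 2) and the two c elements (3, 4, 3) and (6, 7, 2); a//c matches both c elements, while a/c matches only the second.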
Maintaining a semantic cache of materialized XPath views inside or outside the database is a novel, feasible, and efficient approach to facilitating XML query processing. However, most existing approaches suffer from the following disadvantages: 1) they cannot discover enough potential cached views to effectively answer subsequent queries; or 2) view selection is inefficient because of the complexity of XPath expressions. In this paper, we propose SCEND, an effective Semantic Cache based on dEcompositioN and Divisibility, to exploit XPath query/view answerability. The contributions of this paper include: 1) a novel technique for decomposing complex XPath queries into much simpler ones, which helps discover more potential views for answering a new query than existing methods and thus adequately exploits query/view answerability; 2) an efficient view-selection method that checks the divisibility between two positive numbers assigned to queries and views; 3) a cache-replacement approach that further enhances query/view answerability; 4) an extensive experimental study demonstrating that our approach achieves higher performance and significantly outperforms existing state-of-the-art alternatives.
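The divisibility trick can be illustrated by giving each distinct decomposed query step its own prime and encoding a query or view as the product of its steps' primes, so that "every step of the view also occurs in the query" becomes a single modulo test. A minimal sketch, assuming that interpretation of the numbers (the paper's actual encoding may differ) and representing decomposed steps as plain strings; class and function names are hypothetical, and a passing test only marks a candidate view that would still need verification.

```python
from math import prod
from typing import Dict, Iterable


def _next_prime(n: int) -> int:
    """Smallest prime greater than n (trial division; fine for a sketch)."""
    c = n + 1
    while any(c % p == 0 for p in range(2, int(c ** 0.5) + 1)):
        c += 1
    return c


class PrimeEncoder:
    def __init__(self) -> None:
        self._primes: Dict[str, int] = {}
        self._last = 1  # _next_prime(1) == 2

    def prime_for(self, step: str) -> int:
        if step not in self._primes:
            self._last = _next_prime(self._last)
            self._primes[step] = self._last
        return self._primes[step]

    def encode(self, steps: Iterable[str]) -> int:
        # Distinct steps only, so each prime appears at most once.
        return prod(self.prime_for(s) for s in sorted(set(steps)))


def is_candidate_view(view_code: int, query_code: int) -> bool:
    """Divisibility <=> the view's step set is a subset of the query's."""
    return query_code % view_code == 0
```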
Recently, in the area of big data, popular applications such as web search engines and recommendation systems face the problem of diversifying results during query processing. It is therefore both significant and essential to propose methods for big data that increase the diversity of the result set. In this paper, we first define the diversity of a set and the ability of an element to improve the overall diversity. Based on these definitions, we propose a diversification framework that performs well in terms of both effectiveness and efficiency and has a theoretical guarantee on the probability of success. Second, we design implementation algorithms based on this framework for both numerical and string data. Third, for numerical and string data respectively, we carry out extensive experiments on real data to verify the performance of the proposed framework, and we also perform scalability experiments on synthetic data.
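As a point of reference for what a diversification step does, here is the classic greedy max-min heuristic: repeatedly add the item farthest from everything already chosen. It is a well-known baseline swapped in for illustration, not the paper's probabilistic framework; `dist` is any metric supplied by the caller (absolute difference for numbers, or a string distance such as edit distance for strings), and the function name is hypothetical.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def greedy_maxmin(items: Sequence[T], k: int, dist: Callable[[T, T], float]) -> List[T]:
    """Pick up to k items, each time taking the one whose distance to its
    nearest already-chosen item is largest (assumes distinct items)."""
    if k <= 0 or not items:
        return []
    chosen = [items[0]]  # arbitrary seed
    gap = [dist(x, items[0]) for x in items]  # distance to nearest chosen item
    while len(chosen) < min(k, len(items)):
        best = max(range(len(items)), key=lambda i: gap[i])
        chosen.append(items[best])
        gap = [min(g, dist(x, items[best])) for g, x in zip(gap, items)]
    return chosen
```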
Cloud computing is a very promising paradigm of service-oriented computing. One major benefit of cloud computing is its elasticity, i.e., the system's capacity to provide and remove resources automatically at runtime. To exploit it, an efficient and effective technique that takes full advantage of the system's potential flexibility must be designed and implemented. This paper presents a non-intrusive approach that monitors the performance of relational database management systems in a cloud infrastructure and automatically makes decisions to maximize the efficiency of the provider's environment while still satisfying the agreed service level agreements (SLAs). Our experiments on Amazon's cloud infrastructure confirm that our technique can automatically and dynamically adjust the system's allocated resources while observing the SLAs.
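A minimal sketch of an SLA-driven elasticity rule, assuming the monitored metric is a response-time percentile compared against an SLA target and that capacity is adjusted one replica at a time within fixed bounds. The thresholds and function name are hypothetical; the paper's own decision procedure is not reproduced.

```python
def next_replica_count(p95_ms: float, sla_ms: float, replicas: int,
                       min_replicas: int = 1, max_replicas: int = 10,
                       scale_in_ratio: float = 0.5) -> int:
    """Add a replica when the SLA is violated; remove one when latency
    is comfortably below the SLA; otherwise keep the current allocation."""
    if p95_ms > sla_ms and replicas < max_replicas:
        return replicas + 1
    if p95_ms < scale_in_ratio * sla_ms and replicas > min_replicas:
        return replicas - 1
    return replicas
```

The gap between the scale-out threshold (the SLA itself) and the scale-in threshold acts as hysteresis, so the allocation does not oscillate around the SLA boundary.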
Funding (SPARQL query rewriting paper): Weaponry Equipment Pre-Research Foundation of PLA Equipment Ministry (No. 9140A06050409JB8102); Pre-Research Foundation of PLA University of Science and Technology (No. 2009JSJ11).
Funding (constrained K closest pairs query paper): Supported by the National Natural Science Foundation of China (60073045).
Funding (bounded evaluation paper): Supported in part by Royal Society Wolfson Research Merit Award WRM/R1/180014, ERC 652976, EPSRC EP/M025268/1, Shenzhen Institute of Computing Sciences, and Beijing Advanced Innovation Center for Big Data and Brain Computing.
Funding (continuous query data structure paper): Funded by the Natural Science Foundation of China (No. 60873030); National High Technology Research and Development Program of China (No. 2007AA01Z309); Defense Pre-Research Foundation of China (No. 9140A04010209JW0504 and No. 9140A15040208JW0501).
Funding (distributed learned query optimizer paper): Partially supported by NSFC under Grant Nos. 61832001 and 62272008, and the ZTE Industry-University-Institute Fund Project.
Funding (UMQL grammar analysis paper): The National High-Tech Research and Development Plan of China under Grant No. 2006AA01Z430.
Funding (HashQuery paper): Project (07JJ1010) supported by Hunan Provincial Natural Science Foundation of China; Projects (2006AA01Z202, 2006AA01Z199) supported by the National High-Tech Research and Development Program of China; Project (7002102) supported by the City University of Hong Kong, Strategic Research Grant (SRG); Project (IRT-0661) supported by the Program for Changjiang Scholars and Innovative Research Team in University; Project (NCET-06-0686) supported by the Program for New Century Excellent Talents in University.
Funding: This work was partly supported by the National Key R&D Program of China (Grant No. 2017YFB0309800); grants from the Natural Science Foundation of China (Nos. 61472339, 61303040, 61572421, 61272124); the Shanghai Alliance Program (LM201552); and Shanghai University of Engineering and Technology School-Enterprise Cooperation Projects (15) (DZ-025).
Abstract: Answering reachability queries is one of the fundamental graph operations. Existing approaches take one of two routes. Some accelerate index construction by building an index that covers only part of the reachability relationship, which may require a costly traversal when answering a query. Others accelerate query answering by building an index that covers the complete reachability relationship, which may be inefficient because complete node labels must be compared. We propose a novel labeling scheme that covers the complete reachability relationship to accelerate reachability query processing. The idea is to decompose the given directed acyclic graph (DAG) G into two subgraphs, G1 and G2. For G1, we use topological labels consisting of two integers to answer all reachability queries. For G2, we construct 2-hop labels, as existing methods do, to answer the queries that topological labels cannot answer. Our method has two benefits. On one hand, it does not need to perform costly traversals when answering queries. On the other hand, it can answer most queries in constant time without comparing whole node labels. Extensive experimental studies on 20 real datasets confirm the efficiency of our approach.
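Classic interval (pre/post-order) labeling shows how two integers per node can answer reachability in constant time. This sketch makes two assumptions. First, G1 is a forest given as a hypothetical `children` dict, where interval containment is exact. Second, 2-hop label sets for G2 (`hop_out`, `hop_in`) are supplied from elsewhere. It does not reproduce the paper's decomposition or its label construction.

```python
def interval_labels(children, roots):
    """Give every node of a forest a (start, end) pair via iterative DFS.

    children maps a node to its list of children; roots lists the tree roots.
    Each node's interval encloses the intervals of all its descendants.
    """
    label, counter = {}, 0
    for root in roots:
        stack = [(root, False)]
        while stack:
            node, leaving = stack.pop()
            if leaving:
                label[node] = (label[node][0], counter)  # close the interval
            else:
                label[node] = (counter, None)  # open the interval
                stack.append((node, True))
                for child in reversed(children.get(node, [])):
                    stack.append((child, False))
            counter += 1
    return label

def reachable(u, v, label, hop_out=None, hop_in=None):
    """Interval containment first (two comparisons); otherwise try 2-hop labels."""
    su, eu = label[u]
    sv, ev = label[v]
    if su <= sv and ev <= eu:
        return True
    if hop_out is None or hop_in is None:
        return False
    # 2-hop rule: u reaches v if some hub is in u's out-label and v's in-label.
    return bool(hop_out.get(u, set()) & hop_in.get(v, set()))
```

On the tree a → {b, c}, b → d, the labels answer `reachable("a", "d", ...)` without any traversal. A non-tree edge c → d would be answered by a 2-hop label such as `hop_out={"c": {"c"}}`, `hop_in={"d": {"c"}}`.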
Funding: Supported by the National Pre-research Foundation Project of China (513150402).
Abstract: In hybrid wireless sensor networks, sensor mobility causes query areas to change dynamically. To address the inefficiency of processing data aggregation queries over dynamic query areas, this paper proposes a processing approach for event-based location-aware queries (ELAQ). The approach comprises a query dissemination algorithm, a maximum-distance-projection proxy selection algorithm, and an in-network query propagation and aggregation algorithm. The ELAQ model is characterized by queries that are triggered by events and whose results depend on the locations of mobile sensors. The results show that, compared with the TinyDB query processing approach, the ELAQ processing approach increases the accuracy of query results and decreases query response time.
Funding: Supported by the National Natural Science Foundation of China (61003049), the Natural Science Foundation of Zhejiang Province (Y110278, 2010QNA5051), and the Zheda Zijin Plan.
Abstract: This paper formally defines the continuous top-t most influential place (CTtMIP) query and solves it efficiently. A CTtMIP query continuously monitors the t places with the maximum influence among a set of places. The influence of a place is defined as the number of its bichromatic reverse k nearest neighbors (BRkNNs). Two new metrics and their corresponding rules are introduced to shrink the search region and reduce the number of BRkNN candidates that must be checked. Extensive experiments confirm that our proposed approach significantly outperforms the state-of-the-art competitor.
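The influence definition can be made concrete with a brute-force sketch. Each client "votes" for its k nearest places, and the t places with the most votes are reported. The sketch recomputes everything from scratch, so it does not model the continuous monitoring or the pruning metrics described above. The names `places`, `clients` and `top_t_influential` are illustrative.

```python
import heapq
import math

def top_t_influential(places, clients, k, t):
    """Return the names of the t places with the largest BRkNN counts.

    places:  {name: (x, y)}; clients: list of (x, y).
    A place's influence = number of clients that have it among their k nearest places.
    """
    influence = {name: 0 for name in places}
    for cx, cy in clients:
        nearest = heapq.nsmallest(k, places, key=lambda n: math.dist((cx, cy), places[n]))
        for name in nearest:
            influence[name] += 1
    # Highest influence first; ties broken by name for a deterministic answer.
    return sorted(influence, key=lambda n: (-influence[n], n))[:t]
```

With places A=(0,0), B=(10,0), C=(20,0) and clients at x = 1, 2, 9, 19, the top place is A for k = 1 (two votes) but B for k = 2 (four votes).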
Abstract: The k-median problem has attracted many researchers. However, few of them have considered both dynamic environments and accuracy. In this paper, a new type of query, called the continuous median monitoring (CMM) query, is studied. It addresses the k-median problem in a dynamic environment with an accuracy guarantee. A continuous group nearest neighbor based (CGB) algorithm and an average distance medoid (ADM) algorithm are proposed to solve the CMM problem. ADM is a hill-climbing algorithm that converges rapidly by checking only qualified candidates. Experiments show that ADM is more efficient than CGB and outperforms the classical PAM (partitioning around medoids) and CLARANS (clustering large applications based on randomized search) algorithms under various parameter settings.
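For readers unfamiliar with hill climbing over medoids, here is a minimal PAM-style sketch. It repeatedly swaps a medoid for a non-medoid whenever the swap lowers the total distance, and stops at a local optimum. It checks every candidate and so lacks ADM's rule of checking only qualified candidates. The function names are illustrative.

```python
import math

def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)

def hill_climb_medoids(points, k):
    """Swap-based hill climbing; returns (sorted medoid indices, cost)."""
    medoids = list(range(k))  # naive start: the first k points
    best = total_cost(points, medoids)
    improved = True
    while improved:  # stop when no single swap lowers the cost
        improved = False
        for i in range(k):
            for cand in range(len(points)):
                if cand in medoids:
                    continue
                trial = medoids[:i] + [cand] + medoids[i + 1:]
                cost = total_cost(points, trial)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    return sorted(medoids), best
```

On two tight pairs of points, the search ends with one medoid in each pair.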
Abstract: Automated performance tuning of data management systems offers various benefits, such as improved performance, lower administration costs, and reduced workloads for database administrators (DBAs). Currently, DBAs tune the performance of database systems with little help from the database servers. In this paper, we propose a new technique for automated performance tuning of data management systems. First, we show how periods of low workload can be used to improve performance during periods of high workload. We demonstrate that extending a database system with materialised views and indices while the workload is low may improve performance during a subsequent period of high workload. The paper proposes several online algorithms that serve three purposes: continuously processing estimated database workloads, discovering the best plan for extending the database with materialised views and indices, and eliminating extensions that are no longer needed. We present experimental results showing how the proposed automated performance tuning technique improves the overall performance of a data management system.
Funding: Funded by the Universidad Nacional de Luján (Argentina), Disp. CDCB: 28421.
Abstract: Modern search systems have become a fundamental tool for accessing the massive amount of information stored in different repositories. These systems use sophisticated techniques to efficiently process a high volume of queries (thus optimizing energy consumption). One of these techniques is caching, which is implemented at different levels of a search architecture. In this work, we propose a novel caching strategy that speeds up dynamic pruning techniques (such as Maxscore). It exploits the lowest (Min) and highest (Max) document identifiers that appear in the results of a previously submitted query. We name this technique Min/Max caching. The idea is to use the Min/Max information to prune the posting lists of the query terms before executing the ranking algorithm in a document-at-a-time (DAAT) approach. The proposed technique uses little memory, returns safe results, and complements other levels of caching (if present). We also combine the approach with different access policies. Extensive experimentation on real-world data shows two results. The proposed method speeds up query processing by up to 1.23x. It can also reduce high-percentile tail latency (up to 2.0x speedup), an essential requirement in operational scenarios. We evaluate different cache access and eviction policies based on different decision criteria. Our findings confirm that policies that consider the cost of cached items (cost-aware policies) yield greater computation savings.
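A minimal sketch of the Min/Max idea follows. The cache stores only the smallest and largest document id returned for a query, and later trims each term's sorted posting list to that range with binary search before ranking starts. It assumes a static index, under which a repeated query's results must fall within the cached range. The class and method names are hypothetical, and ranking and eviction are not shown.

```python
from bisect import bisect_left, bisect_right

class MinMaxCache:
    """Remember the smallest and largest document id returned for each query."""

    def __init__(self):
        self._bounds = {}

    @staticmethod
    def _key(terms):
        return tuple(sorted(set(terms)))  # same key regardless of term order

    def put(self, terms, result_docids):
        if result_docids:
            self._bounds[self._key(terms)] = (min(result_docids), max(result_docids))

    def prune(self, terms, postings):
        """Restrict {term: sorted docids} to [min, max] on a hit; unchanged on a miss."""
        bounds = self._bounds.get(self._key(terms))
        if bounds is None:
            return postings
        lo, hi = bounds
        return {t: plist[bisect_left(plist, lo):bisect_right(plist, hi)]
                for t, plist in postings.items()}
```

Only two integers per query are stored, which reflects the low memory cost claimed in the abstract.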
Funding: Supported by the Korea Institute of Science and Technology Information (KISTI).
Abstract: The recent development of wireless communication technologies and the popularity of smartphones are making location-based services (LBS) popular. However, sending queries to LBS servers with users' exact locations may threaten users' privacy. Therefore, much research has addressed generating a cloaked query region to protect user privacy, which in turn requires an efficient algorithm for processing queries over a query region. In this paper, we propose k-nearest neighbor (k-NN) query processing algorithms for a query region in road networks. To efficiently retrieve the k-NN points of interest (POIs), we make use of the Island index. We also propose a method that generates an adaptive Island index to improve query processing performance and storage usage. Finally, our performance analysis shows that our k-NN query processing algorithms outperform the existing k-Range Nearest Neighbor (kRNN) algorithm in terms of network expansion cost and query processing time.
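The network expansion that the cost comparison refers to can be sketched as a Dijkstra search that stops after the k-th POI is settled. This sketch covers a single query node only. It uses neither the Island index nor a cloaked query region. The graph layout and the name `network_knn` are illustrative.

```python
import heapq

def network_knn(graph, source, pois, k):
    """Return up to k (network distance, node) pairs for the POIs nearest to source.

    graph: {node: [(neighbor, edge_length), ...]} with undirected edges listed both ways.
    pois:  set of nodes that hold a point of interest.
    """
    dist = {source: 0}
    heap = [(0, source)]
    found = []
    while heap and len(found) < k:  # expansion stops once k POIs are settled
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        if u in pois:
            found.append((d, u))
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return found
```

The number of nodes popped before the loop ends is one simple measure of network expansion cost.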
Abstract: This paper presents an extent join for computing path expressions that contain parent-child and ancestor-descendant operations. It also presents two path expression optimization rules, path-shortening and path-complementing. Path-shortening reduces the number of joins by shortening the path. Path-complementing optimizes path execution by computing the original path expression through an equivalent complementary path expression. Experimental results show that the proposed algorithms are more efficient than traditional algorithms.
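Parent-child and ancestor-descendant steps are often evaluated as joins over region-encoded elements, where each element carries (start, end, level). This nested-loop sketch shows only the containment tests. It is a generic structural join under that encoding assumption, not the paper's extent join or its optimization rules.

```python
from typing import NamedTuple

class Node(NamedTuple):
    tag: str
    start: int
    end: int
    level: int

def structural_join(ancestors, descendants, parent_child=False):
    """Return (a, d) pairs where a's region contains d's region.

    With parent_child=True, keep only pairs exactly one level apart (a/d);
    otherwise any nesting depth qualifies (a//d).
    """
    pairs = []
    for a in ancestors:
        for d in descendants:
            if a.start < d.start and d.end < a.end:
                if not parent_child or d.level == a.level + 1:
                    pairs.append((a, d))
    return pairs
```

For the document `<a><b><c/></b><c/></a>`, `a//c` matches both `c` elements while `a/c` matches only the second one.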
Funding: Supported by the National Natural Science Foundation of China under Grant No. 60873065, the National High Technology Research and Development 863 Program of China under Grant Nos. 2007AA01Z152 and 2009AA011906, and the National Basic Research 973 Program of China under Grant No. 2006CB303103.
Abstract: Maintaining a semantic cache of materialized XPath views inside or outside the database is a novel, feasible, and efficient approach to facilitating XML query processing. However, most existing approaches have the following disadvantages: 1) they cannot discover enough potential cached views to effectively answer subsequent queries; or 2) they are inefficient at view selection due to the complexity of XPath expressions. In this paper, we propose SCEND, an effective Semantic Cache based on dEcompositioN and Divisibility, to exploit XPath query/view answerability. The contributions of this paper are as follows. 1) A novel technique decomposes complex XPath queries into much simpler ones; it discovers more potential views for answering a new query than existing methods and thus adequately exploits query/view answerability. 2) An efficient view-selection method checks the divisibility between two positive numbers assigned to queries and views. 3) A cache-replacement approach further enhances query/view answerability. 4) An extensive experimental study demonstrates that our approach achieves higher performance and significantly outperforms existing state-of-the-art methods.
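One plausible way a divisibility test can filter views is prime encoding, which this sketch uses as an assumption. Each element name is given a distinct prime, and a query or view is encoded as the product of the primes of the names it mentions. "view number divides query number" then holds exactly when every name in the view also appears in the query. This is a hypothetical necessary-condition filter, not SCEND's actual numbering. `PrimeEncoder` and `may_answer` are illustrative names.

```python
from math import prod

def primes():
    """Yield 2, 3, 5, 7, ... by trial division (adequate for small vocabularies)."""
    found = []
    n = 2
    while True:
        if all(n % p for p in found):
            found.append(n)
            yield n
        n += 1

class PrimeEncoder:
    """Assign each element name a distinct prime and encode name sets as products."""

    def __init__(self):
        self._gen = primes()
        self._code = {}

    def code(self, name):
        if name not in self._code:
            self._code[name] = next(self._gen)
        return self._code[name]

    def encode(self, steps):
        # Only presence of a name matters here; repeated names are counted once.
        return prod(self.code(s) for s in sorted(set(steps)))

def may_answer(view_number, query_number):
    """Cheap filter: the view's names must all occur in the query."""
    return query_number % view_number == 0
```

Here the view `//a/b` passes the filter for the query `//a/b/c`, while a view mentioning `d` is rejected with a single modulo operation.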
Funding: This paper was partially supported by NSFC (Grant Nos. U1509216, U1866602, 61602129) and Microsoft Research Asia.
Abstract: Recently, in the area of big data, popular applications such as web search engines and recommendation systems face the problem of diversifying results during query processing. It is therefore both significant and essential to propose methods that handle big data in order to increase the diversity of the result set. In this paper, we first define the diversity of a set and the ability of an element to improve the overall diversity. Based on these definitions, we propose a diversification framework that performs well in terms of both effectiveness and efficiency. This framework also has a theoretical guarantee on the probability of success. Second, we design implementation algorithms based on this framework for both numerical and string data. Third, for numerical and string data respectively, we carry out extensive experiments on real data to verify the performance of the proposed framework, and we also perform scalability experiments on synthetic data.
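A common textbook instance of these two definitions makes them concrete; it is assumed here, not taken from the paper. Set diversity is taken as the minimum pairwise distance. An element's "ability to improve diversity" is taken as its distance to the nearest already-chosen element. The greedy farthest-point rule below is a standard heuristic, not the paper's framework. It handles numeric points only, and string data would need a distance such as edit distance.

```python
import math

def diversity(items):
    """Diversity of a set = smallest pairwise distance (larger means more diverse)."""
    pts = list(items)
    return min((math.dist(p, q) for i, p in enumerate(pts) for q in pts[i + 1:]),
               default=math.inf)

def greedy_diversify(candidates, k):
    """Pick k items, each time adding the candidate farthest from those already chosen."""
    chosen = [candidates[0]]
    remaining = list(candidates[1:])
    while len(chosen) < k and remaining:
        best = max(remaining, key=lambda c: min(math.dist(c, s) for s in chosen))
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

From the 1-D values 0, 1, 2, 10, 11 with k = 3, the greedy rule picks 0, 11 and 2, giving a set diversity of 2.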
Abstract: Cloud computing is a very promising paradigm of service-oriented computing. One major benefit of cloud computing is its elasticity, i.e., the system's capacity to provide and remove resources automatically at runtime. To exploit elasticity, it is essential to design and implement an efficient and effective technique that takes full advantage of the system's potential flexibility. This paper presents a non-intrusive approach that monitors the performance of relational database management systems in a cloud infrastructure. It automatically makes decisions to maximize the efficiency of the provider's environment while still satisfying the agreed-upon service level agreements (SLAs). Our experiments, conducted on Amazon's cloud infrastructure, confirm that our technique can automatically and dynamically adjust the system's allocated resources while observing the SLA.
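The following is a minimal sketch of the kind of decision rule a non-intrusive monitor could apply. The monitor looks only at externally observed response times over a sliding window. It adds a replica when too many samples exceed the SLA and releases one when every sample is far below it. The class name, window size and both thresholds are hypothetical; the paper's actual decision logic is not described in this abstract.

```python
from collections import deque

class SlaScaler:
    """Recommend a replica count from recent response times (hypothetical thresholds)."""

    def __init__(self, sla_ms, window=100, max_violation_ratio=0.05, scale_in_ratio=0.5):
        self.sla_ms = sla_ms
        self.samples = deque(maxlen=window)  # only the most recent samples are kept
        self.max_violation_ratio = max_violation_ratio
        self.scale_in_ratio = scale_in_ratio

    def observe(self, response_ms):
        self.samples.append(response_ms)

    def decide(self, replicas, min_replicas=1):
        if not self.samples:
            return replicas
        violations = sum(1 for s in self.samples if s > self.sla_ms)
        if violations / len(self.samples) > self.max_violation_ratio:
            return replicas + 1  # SLA at risk: provision another replica
        if replicas > min_replicas and max(self.samples) < self.scale_in_ratio * self.sla_ms:
            return replicas - 1  # every recent sample is well under the SLA: release one
        return replicas
```

Keeping separate scale-out and scale-in thresholds leaves a band in which the replica count stays unchanged, which avoids oscillation.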