To solve the query processing correctness problem for semantic-based relational data integration, the semantics of SPARQL (simple protocol and RDF query language) queries is defined. In the course of query rewriting, all relevant tables are found and decomposed into minimal connectable units. Minimal connectable units are joined according to the semantic queries to produce semantically correct query plans. Algorithms for query rewriting and transforming are presented, and their computational complexity is discussed. In the worst case, the query decomposing algorithm finishes in O(n^2) time and the query rewriting algorithm requires O(nm) time. The performance of the algorithms is verified by experiments, and the results show that when the length of the query is less than 8, the query processing algorithms provide satisfactory performance.
A defining characteristic of continuous queries over on-line data streams, possibly bounded by sliding windows, is the potentially infinite and time-evolving nature of their inputs and outputs. For the different update patterns of continuous queries, suitable data structures bring great query processing efficiency. In this paper, we propose a data structure suited to the weak non-monotonic update pattern, in which the lifetime of each tuple is known at generation time but lifetimes are not necessarily of the same length. The new data structure combines the ladder queue with the features of the weak non-monotonic update pattern. The experimental results show that the new data structure performs much better than the traditional calendar queue in many cases.
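The weak non-monotonic pattern described above, in which each tuple's expiry time is known on arrival but lifetimes differ, can be illustrated with an ordinary binary heap keyed by expiry time. This is only a stand-in for illustration: it is not the ladder-queue-based structure the abstract proposes, and the class and method names are hypothetical.

```python
import heapq


class ExpiryWindow:
    """Holds tuples whose expiry time is known at insertion (hypothetical helper)."""

    def __init__(self):
        self._heap = []  # entries are (expiry_time, seq, item)
        self._seq = 0    # tie-breaker so items themselves are never compared

    def insert(self, item, now, lifetime):
        # The lifetime is known when the tuple is generated, but may differ per tuple.
        heapq.heappush(self._heap, (now + lifetime, self._seq, item))
        self._seq += 1

    def expire(self, now):
        """Remove and return every item whose expiry time is <= now."""
        expired = []
        while self._heap and self._heap[0][0] <= now:
            expired.append(heapq.heappop(self._heap)[2])
        return expired

    def __len__(self):
        return len(self._heap)
```

A heap gives logarithmic-time insertion and removal; calendar and ladder queues are designed to do better on average for this kind of time-ordered workload, which is presumably why the abstract builds on them.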
Modern search systems have become a fundamental tool for accessing the massive amount of information stored in different repositories. These systems use sophisticated techniques to efficiently process a high volume of queries (thus optimizing energy consumption). One of these techniques is caching, which is implemented at different levels of a search architecture. In this work, we propose a novel caching strategy that speeds up dynamic pruning techniques (such as Maxscore) by exploiting the lowest (Min) and highest (Max) document identifiers that appear in the result of a previously submitted query. We call this technique Min/Max caching. The idea is to use the Min/Max information to prune the posting lists of the query terms before executing the ranking algorithm in a document-at-a-time (DAAT) approach. The proposed technique uses little memory, returns safe results, and complements other levels of caching (if present). We also combine the approach with different access policies. Extensive experimentation on real-world data shows that the proposed method speeds up query processing by up to 1.23x and can also reduce high-percentile tail latency (up to 2.0x speedup), an essential requirement for operational scenarios. We evaluate different access and eviction cache policies based on different decision criteria. Our findings confirm that considering the cost of the cached items (cost-aware policies) allows more computation savings.
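The pruning step in this abstract can be sketched as follows, assuming posting lists are sorted lists of document identifiers and that the cache maps a query (here, a sorted tuple of its terms) to the Min and Max identifiers of its earlier result. The function names and the cache layout are hypothetical; the abstract does not specify them, nor the conditions under which a cached range may be reused safely.

```python
from bisect import bisect_left, bisect_right


def prune_postings(postings, min_doc, max_doc):
    """Keep only the doc IDs in [min_doc, max_doc] from a sorted posting list."""
    lo = bisect_left(postings, min_doc)
    hi = bisect_right(postings, max_doc)
    return postings[lo:hi]


def prune_query(index, terms, minmax_cache):
    """Return the posting lists for `terms`, narrowed when a Min/Max entry is cached."""
    key = tuple(sorted(terms))
    lists = {t: index.get(t, []) for t in terms}
    if key not in minmax_cache:
        return lists  # cache miss: rank over the full lists
    min_doc, max_doc = minmax_cache[key]
    return {t: prune_postings(p, min_doc, max_doc) for t, p in lists.items()}
```

The narrowed lists would then be handed to the DAAT ranking algorithm, which has fewer postings to traverse.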
Recent developments in wireless communication technologies and the popularity of smartphones are making location-based services (LBS) popular. However, sending queries to LBS servers with users' exact locations may threaten the privacy of users. Therefore, there has been much research on generating a cloaked query region for user privacy protection. Consequently, an efficient query processing algorithm for a query region is required. In this paper, we propose k-nearest neighbor (k-NN) query processing algorithms for a query region in road networks. To efficiently retrieve k-NN points of interest (POIs), we make use of the Island index. We also propose a method that generates an adaptive Island index to improve query processing performance and storage usage. Finally, our performance analysis shows that our k-NN query processing algorithms outperform the existing k-Range Nearest Neighbor (kRNN) algorithm in terms of network expansion cost and query processing time.
An extent join to compute path expressions containing parent-child and ancestor-descendant operations and two path expression optimization rules, path-shortening and path-complementing, are presented in this paper. Path-shortening reduces the number of joins by shortening the path, while path-complementing optimizes the path execution by using an equivalent complementary path expression to compute the original one. Experimental results show that the proposed algorithms are more efficient than traditional algorithms.
In this paper, the constrained K closest pairs query is introduced, which retrieves the K closest pairs satisfying a given spatial constraint from two datasets. For datasets indexed by R-trees in spatial databases, three algorithms are presented for answering this kind of query. Among them, the two-phase Range+Join and Join+Range algorithms adopt the strategy of changing the execution order of the range and closest pairs queries, and the constrained heap-based algorithm uses extended distance functions to prune the search space and minimize the pruning distance. Experimental results show that the constrained heap-based algorithm has better applicability and performance than the two-phase algorithms.
Recently, in the area of big data, some popular applications such as web search engines and recommendation systems face the problem of diversifying results during query processing. It is therefore both significant and essential to propose methods that deal with big data in order to increase the diversity of the result set. In this paper, we first define the diversity of a set and the ability of an element to improve the overall diversity. Based on these definitions, we propose a diversification framework that performs well in terms of effectiveness and efficiency. The framework also has a theoretical guarantee on its probability of success. Second, we design implementation algorithms based on this framework for both numerical and string data. Third, for numerical and string data respectively, we carry out extensive experiments on real data to verify the performance of the proposed framework, and also perform scalability experiments on synthetic data.
Cloud computing is a very promising paradigm of service-oriented computing. One major benefit of cloud computing is its elasticity, i.e., the system's capacity to provide and remove resources automatically at runtime. For that, it is essential to design and implement an efficient and effective technique that takes full advantage of the system's potential flexibility. This paper presents a non-intrusive approach that monitors the performance of relational database management systems in a cloud infrastructure and automatically makes decisions to maximize the efficiency of the provider's environment while still satisfying the agreed-upon service level agreements (SLAs). Our experiments, conducted on Amazon's cloud infrastructure, confirm that our technique is capable of automatically and dynamically adjusting the system's allocated resources while observing the SLA.
In recent years there has been significant interest in peer-to-peer (P2P) environments in the data management community. However, almost all work so far has focused on exact query processing in current P2P data systems. The autonomy of peers is also not considered enough. In addition, the system cost is very high because the information publishing method for shared data is based on individual documents instead of document sets. In this paper, abstract indices (AbIx) are presented to implement content-based approximate queries in centralized, distributed and structured P2P data systems. They can be used to search as few peers as possible while obtaining as many results satisfying users' queries as possible, with a guarantee of high peer autonomy. Abstract indices also have low system cost, can improve query processing speed, and support very frequent updates and the set-based information publishing method. To verify the effectiveness of abstract indices, a simulator with 10,000 peers and over 3 million documents is built, and several metrics are proposed. The experimental results show that abstract indices work well in various P2P data systems.
Area query processing is significant for various applications of wireless sensor networks since it can request information about particular areas in the monitored environment. Existing query processing techniques cannot solve area queries. Intuitively, centralized processing at the base station can answer area queries by collecting information from all sensor nodes. However, this method is not suitable for wireless sensor networks with limited energy, since a large amount of energy is wasted on reporting useless data. This motivates us to propose an energy-efficient in-network area query processing scheme. In our scheme, the monitored area is partitioned into grids, and a unique Gray code number is used to represent a Grid ID (GID), which is also an effective way to describe an area. Furthermore, a reporting tree is constructed to process area merging and data aggregation. Based on the properties of GIDs, subareas can be merged easily and useless data can be discarded as early as possible to reduce energy consumption. To answer continuous queries energy-efficiently, we also design an incremental update method that continuously generates query results. In essence, all of these strategies are pivotal to conserving energy. A thorough simulation study shows that our scheme is effective and energy-efficient.
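To illustrate why Gray codes are convenient for grid identifiers, the sketch below uses the standard binary-reflected Gray code, in which neighbouring indices differ in exactly one bit. The `grid_id` layout (row code in the high bits, column code in the low bits) is an assumption made for illustration; the abstract does not say how its GIDs are composed.

```python
def to_gray(n):
    """Binary-reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def from_gray(g):
    """Inverse of to_gray: recover the original index from its Gray code."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def grid_id(row, col, bits):
    """Hypothetical GID layout: Gray code of the row, then `bits` bits for the column."""
    return (to_gray(row) << bits) | to_gray(col)
```

Because horizontally or vertically adjacent cells then differ in a single bit of the identifier, two neighbouring subareas can be described together by treating that bit as a wildcard, which is one plausible reading of the "easy merging" property the abstract mentions.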
In order to reduce disk access time, a database can be stored on several simultaneously accessible disks. In this paper, we are concerned with the dynamic d-attribute database allocation problem for range queries. An allocation method, called the coordinate modulo allocation method, is proposed to allocate data in a d-attribute database among disks so that the maximum disk access concurrency can be achieved for range queries. Our analysis and experiments show that the method achieves optimal or near-optimal parallelism for range queries. The paper gives the conditions under which the method is optimal, as well as worst-case bounds on its performance. In addition, a parallel algorithm for processing range queries is described at the end of the paper. The method has been used in the statistical and scientific database management system that we are designing.
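The abstract does not state the allocation formula. A common form of coordinate modulo declustering assigns a cell of the d-dimensional grid to the disk numbered by the sum of its coordinates modulo the number of disks; the sketch below assumes that form, and the function names are hypothetical.

```python
from collections import Counter
from itertools import product


def disk_for(cell, num_disks):
    """Assumed rule: disk = (sum of the cell's d coordinates) mod num_disks."""
    return sum(cell) % num_disks


def range_query_load(ranges, num_disks):
    """Count how many cells of a range query land on each disk.

    `ranges` holds one inclusive (lo, hi) pair per attribute.
    """
    cells = product(*(range(lo, hi + 1) for lo, hi in ranges))
    return Counter(disk_for(c, num_disks) for c in cells)
```

With 4 disks, a query covering the 4 x 4 block of cells from (0, 0) to (3, 3) touches 16 cells and, under the assumed rule, places exactly 4 on each disk, so all disks can be read in parallel with no disk doing more than its share.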
Multiple time series (MTS), which describe an object in multiple dimensions, are based on single time series and have proved to be useful. In this paper, a new analytical method called α/β-Dominant-Skyline on MTS and a formal definition of the α/β-dominant skyline MTS are given. Three algorithms, called NL, BC and MFB, are proposed to address α/β-dominant skyline queries over MTS. Finally, experimental results on both synthetic and real data verify the correctness and effectiveness of the proposed method and algorithms.
Purpose: The resilient distributed processing technique (RDPT) simplifies the mapper and reducer with Spark contexts and supports distributed parallel query processing. Design/methodology/approach: The proposed work is implemented in Pig Latin with Spark contexts to develop query processing in a distributed environment. Findings: Query processing in Hadoop influences distributed processing with the MapReduce model. MapReduce caters to work on different nodes through the implementation of complex mappers and reducers. Its results are valid only up to a certain data size. Originality/value: Pig supports the required parallel processing framework with the following constructs during the processing of queries: FOREACH, FLATTEN, and COGROUP.
Query processing in distributed database management systems (DBMSs) faces more challenges, such as more operators and more factors in cost models and metadata, than in a single-node DBMS, in which query optimization is already an NP-hard problem. Learned query optimizers (mainly for single-node DBMSs) have received attention due to their ability to capture data distributions and their flexibility in avoiding hand-crafted rules when refining and adapting to new hardware. In this paper, we focus on extending learned query optimizers to distributed DBMSs. Specifically, we propose one possible but general architecture for a learned query optimizer in the distributed context and highlight its differences from learned optimizers for single-node systems. In addition, we discuss the challenges and possible solutions.
The idea of the positional inverted index is exploited for indexing graph databases. The main idea is to use hash tables to prune a considerable portion of the graph database that cannot contain the answer set. These tables are implemented using column-based techniques and are used to store the graphs of the database, frequent subgraphs, and the neighborhoods of nodes. For exact checking of the remaining graphs, vertex invariants are used in an isomorphism test that can be implemented in parallel. The evaluation results indicate that the proposed method outperforms existing methods.
In hybrid wireless sensor networks, sensor mobility causes the query areas to change dynamically. To address the inefficiency of processing data aggregation queries in dynamic query areas, this paper proposes a processing approach for event-based location aware queries (ELAQ), which includes a query dissemination algorithm, a maximum distance projection proxy selection algorithm, and an in-network query propagation and aggregation algorithm. ELAQs are triggered by events, and the query results depend on the locations of mobile sensors; these are the characteristics of the ELAQ model. The results show that, compared with the TinyDB query processing approach, the ELAQ processing approach increases the accuracy of the query results and decreases the query response time.
This work aims to reduce queries on big data to computations on small data, and hence make querying big data possible under bounded resources. A query Q is boundedly evaluable if, when posed on any big dataset D, there exists a fraction D_Q of D such that Q(D) = Q(D_Q), and the cost of identifying D_Q is independent of the size of D. It has been shown that with an auxiliary structure known as an access schema, many queries in relational algebra (RA) are boundedly evaluable under the set semantics of RA. This paper extends the theory of bounded evaluation to RA_aggr, i.e., RA extended with aggregation, under the bag semantics. (1) We extend access schemas to bag access schemas, to help us identify D_Q for RA_aggr queries Q. (2) While it is undecidable to determine whether an RA_aggr query is boundedly evaluable under a bag access schema, we identify special cases that are decidable and practical. (3) In addition, we develop an effective syntax for bounded RA_aggr queries, i.e., a core subclass of boundedly evaluable RA_aggr queries without sacrificing their expressive power. (4) Based on the effective syntax, we provide efficient algorithms to check the bounded evaluability of RA_aggr queries and to generate query plans for bounded RA_aggr queries. (5) As a proof of concept, we extend PostgreSQL to support bounded evaluation. We experimentally verify that the extended system improves performance by orders of magnitude.
The unified multimedia query language (UMQL) is a powerful general-purpose multimedia query language that is well suited to multimedia information retrieval. The paper proposes a grammar analysis model to implement effective grammatical processing for the language. It separates the grammar analysis of a UMQL query specification into two phases, syntactic analysis and semantic analysis, and uses extended Backus-Naur form (EBNF) and logical algebra, respectively, to specify the restrictive grammar rules of each phase. As a result, the model can present error guidance for a query specification with incorrect grammar. The model not only suits the processing of UMQL queries, but also offers guidance for other projects concerning the query processing of descriptive query languages.
Since web-based GIS processes large volumes of spatial geographic information over the internet, the efficiency of spatial data query processing and transmission should be improved. This paper presents two efficient methods for this purpose: division transmission and progressive transmission. In the division transmission method, a map is divided into several parts, called "tiles", and only the requested tiles are transmitted to a client. In the progressive transmission method, a map is split into several phase views based on the significance of vertices, and when a spatial object is requested by a client, the server produces the target object and transmits it progressively. To realize these methods, the algorithms "tile division" and "priority order estimation" and the strategies for data transmission are proposed in this paper. Compared with traditional methods such as "map total transmission" and "layer transmission", the web-based GIS data transmission proposed in this paper increases data transmission efficiency by a large margin.
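The division transmission idea, sending only the tiles a client asks for, can be sketched with a uniform square grid. The abstract does not describe its "tile division" algorithm, so the fixed tile size, the (column, row) indexing, and the function name below are all assumptions for illustration.

```python
import math


def tiles_for_viewport(min_x, min_y, max_x, max_y, tile_size):
    """Return the (col, row) indices of the tiles that a viewport rectangle overlaps.

    Assumes a uniform grid of square tiles anchored at the map origin (0, 0).
    """
    first_col = math.floor(min_x / tile_size)
    last_col = math.floor(max_x / tile_size)
    first_row = math.floor(min_y / tile_size)
    last_row = math.floor(max_y / tile_size)
    return [
        (col, row)
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]
```

A server using such a scheme would transmit only the returned tiles rather than the whole map, which is the saving the abstract contrasts with "map total transmission".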
HashQuery, a Hash-area-based data dissemination protocol, is designed for wireless sensor networks. Using a hash function that takes time as the key, both mobile sinks and sensors can determine the same Hash area. The sensors send information about the events they monitor to the Hash area, and the mobile sinks need only query that area instead of flooding the whole network, so much energy can be saved. In addition, the location of the Hash area changes over time so as to balance energy consumption across the whole network. Theoretical analysis shows that the proposed protocol can be energy-efficient, and simulation studies further show that with 5 sources and 5 sinks in the network, it can save at least 50% of the energy compared with the existing two-tier data dissemination (TTDD) protocol, especially in large-scale wireless sensor networks.
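The core idea, that sinks and sensors independently compute the same area from the current time, can be sketched as below. The abstract does not give the hash function or the shape of a Hash area, so the use of SHA-256, the fixed rotation period, and the rectangular grid of candidate areas are assumptions, and `hash_area` is a hypothetical name. Loosely synchronized clocks are also assumed.

```python
import hashlib


def hash_area(now, period, grid_cols, grid_rows):
    """Map the current time to a (col, row) grid cell shared by all nodes.

    Every node that evaluates this within the same `period`-long epoch gets the
    same cell; the cell changes from epoch to epoch, spreading the load over time.
    """
    epoch = int(now // period)
    digest = hashlib.sha256(str(epoch).encode()).digest()
    cell = int.from_bytes(digest[:8], "big") % (grid_cols * grid_rows)
    return cell % grid_cols, cell // grid_cols
```

A sensor would forward event reports toward the returned cell, and a mobile sink would query that same cell, so neither side needs to flood the network to find the other.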
Funding (SPARQL query rewriting for relational data integration): Weaponry Equipment Pre-Research Foundation of PLA Equipment Ministry (No. 9140A06050409JB8102); Pre-Research Foundation of PLA University of Science and Technology (No. 2009JSJ11).
Funding (data structure for weak non-monotonic continuous queries): Natural Science Foundation of China (No. 60873030); National High Technology Research and Development Program of China (No. 2007AA01Z309); Defense Pre-Research Foundation of China (No. 9140A04010209JW0504 and No. 9140A15040208JW0501).
Funding (Min/Max caching): Universidad Nacional de Luján (Argentina), Disp. CDCB: 28421.
Funding (k-NN query processing in road networks): Korea Institute of Science and Technology Information (KISTI).
Funding (constrained K closest pairs query): National Natural Science Foundation of China (60073045).
Funding (result diversification framework): Partially supported by NSFC (Grant Nos. U1509216, U1866602, 61602129) and Microsoft Research Asia.
Funding (abstract indices for P2P data systems): National Natural Science Foundation of China under Grant No. 60473077; Program for New Century Excellent Talents in University.
文摘In recent years there has been a significant interest in peer-to-peer (P2P) environments in the community of data management. However, almost all work, so far, is focused on exact query processing in current P2P data systems. The autonomy of peers also is not considered enough. In addition, the system cost is very high because the information publishing method of shared data is based on each document instead of document set. In this paper, abstract indices (AbIx) are presented to implement content-based approximate queries in centralized, distributed and structured P2P data systems. It can be used to search as few peers as possible but get as many returns satisfying users' queries as possible on the guarantee of high autonomy of peers. Also, abstract indices have low system cost, can improve the query processing speed, and support very frequent updates and the set information publishing method. In order to verify the effectiveness of abstract indices, a simulator of 10,000 peers, over 3 million documents is made, and several metrics are proposed. The experimental results show that abstract indices work well in various P2P data systems.
文摘Area query processing is significant for various applications of wireless sensor networks since it can request information of particular areas in the monitored environment. Existing query processing techniques cannot solve area queries. Intuitively, centralized processing on Base Station can accomplish area queries via collecting information from all sensor nodes. However, this method is not suitable for wireless sensor networks with limited energy since a large amount of energy is wasted for reporting useless data. This motivates us to propose an energy-efficient in-network area query processing scheme. In our scheme, the monitored area is partitioned into grids, and a unique gray code number is used to represent a Grid ID (GID), which is also an effective way to describe an area. Furthermore, a reporting tree is constructed to process area merging and data aggregations. Based on the properties of GIDs, subareas can be merged easily and useless data can be discarded as early as possible to reduce energy consumption. For energy-efficiently answering continuous queries, we also design an incremental update method to continuously generate query results. In essence, all of these strategies are pivots to conserve energy consumption. With a thorough simulation study, it is shown that our scheme is effective and energy-efficient
Abstract: To reduce disk access time, a database can be stored on several simultaneously accessible disks. In this paper, we are concerned with the dynamic d-attribute database allocation problem for range queries. An allocation method, called the coordinate modulo allocation method, is proposed to allocate data in a d-attribute database among disks so that the maximum disk access concurrency can be achieved for range queries. Our analysis and experiments show that the method achieves optimal or near-optimal parallelism for range queries. The paper gives the conditions under which the method is optimal, as well as worst-case bounds on its performance. In addition, a parallel algorithm for processing range queries is described at the end of the paper. The method has been used in the statistical and scientific database management system that we are designing.
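As a rough illustration of how a coordinate-based modulo allocation can spread a range query over disks, the sketch below assigns each cell of a d-dimensional grid to a disk. The abstract does not state the allocation formula, so the rule used here (sum of the cell's coordinates modulo the number of disks) and the names `cmd_disk` and `disk_load` are assumptions, not the paper's definition.

```python
from collections import Counter
from itertools import product
from typing import List, Sequence


def cmd_disk(cell: Sequence[int], num_disks: int) -> int:
    """Assumed allocation rule: sum of the cell coordinates modulo the disk count."""
    return sum(cell) % num_disks


def disk_load(ranges: Sequence[range], num_disks: int) -> List[int]:
    """Count how many cells of a range query fall on each disk.

    `ranges` holds one integer range per attribute; the query covers their
    Cartesian product.
    """
    counts = Counter(cmd_disk(cell, num_disks) for cell in product(*ranges))
    return [counts.get(disk, 0) for disk in range(num_disks)]


if __name__ == "__main__":
    # A 4 x 4 range query on a 2-attribute grid, stored on 4 disks.
    print(disk_load([range(2, 6), range(1, 5)], num_disks=4))
```

With this rule, any query whose extent along one attribute is a multiple of the number of disks is spread perfectly evenly, so the response time is bounded by the busiest disk reading only its fair share of cells.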
Funding: Supported by the National Natural Science Foundation of China under Grant No. 61170064, the National High Technology Research and Development 863 Program of China under Grant No. 2013AA013204, and the Tsinghua National Laboratory for Information Science and Technology (TNLIST) Cross-Discipline Foundation.
Abstract: Multiple time series (MTS), which describe an object in multiple dimensions, are based on single time series and have proved useful. In this paper, a new analytical method called α/β-Dominant-Skyline on MTS and a formal definition of the α/β-dominant skyline MTS are given. Three algorithms, called NL, BC and MFB, are also proposed to answer α/β-dominant skyline queries over MTS. Finally, experimental results on both synthetic and real data verify the correctness and effectiveness of the proposed method and algorithms.
Abstract: Purpose: The paper presents a resilient distributed processing technique (RDPT), in which the mapper and reducer are simplified with Spark contexts to support distributed parallel query processing. Design/methodology/approach: The proposed work is implemented in Pig Latin with Spark contexts to develop query processing in a distributed environment. Findings: Query processing in Hadoop relies on distributed processing with the MapReduce model. MapReduce distributes the work across different nodes through the implementation of complex mappers and reducers. Its results are valid only up to a certain data size. Originality/value: Pig supports the required parallel processing framework with the following constructs during the processing of queries: FOREACH, FLATTEN and COGROUP.
Funding: Partially supported by NSFC under Grant Nos. 61832001 and 62272008, and the ZTE Industry-University-Institute Fund Project.
Abstract: Query processing in distributed database management systems (DBMSs) faces more challenges, such as more operators and more factors in cost models and metadata, than in a single-node DBMS, where query optimization is already an NP-hard problem. Learned query optimizers (mainly in single-node DBMSs) have received attention due to their capability to capture data distributions and their flexibility in avoiding hand-crafted rules during refinement and adaptation to new hardware. In this paper, we focus on extending learned query optimizers to distributed DBMSs. Specifically, we propose one possible but general architecture for a learned query optimizer in the distributed context and highlight its differences from learned optimizers in single-node systems. In addition, we discuss the challenges and possible solutions.
Abstract: The idea of the positional inverted index is exploited for indexing graph databases. The main idea is to use hash tables to prune a considerable portion of the graph database that cannot contain the answer set. These tables are implemented using column-based techniques and are used to store the graphs of the database, frequent subgraphs and the neighborhoods of nodes. For exact checking of the remaining graphs, vertex invariants are used in the isomorphism test, which can be implemented in parallel. The evaluation results indicate that the proposed method outperforms existing methods.
Funding: Supported by the National Pre-research Foundation Project of China (513150402).
Abstract: In hybrid wireless sensor networks, sensor mobility causes the query areas to change dynamically. To address the inefficiency of processing data aggregation queries in dynamic query areas, this paper proposes a processing approach for event-based location aware queries (ELAQ), which includes a query dissemination algorithm, a maximum distance projection proxy selection algorithm, in-network query propagation, and an aggregation algorithm. The ELAQ model is characterized by two features: queries are triggered by events, and query results depend on the locations of mobile sensors. The results show that, compared with the TinyDB query processing approach, the ELAQ processing approach increases the accuracy of query results and decreases the query response time.
Funding: Supported in part by Royal Society Wolfson Research Merit Award WRM/R1/180014, ERC 652976, EPSRC EP/M025268/1, Shenzhen Institute of Computing Sciences, and Beijing Advanced Innovation Center for Big Data and Brain Computing.
Abstract: This work aims to reduce queries on big data to computations on small data, and hence make querying big data possible under bounded resources. A query Q is boundedly evaluable if, when posed on any big dataset D, there exists a fraction D_Q of D such that Q(D) = Q(D_Q), and the cost of identifying D_Q is independent of the size of D. It has been shown that with an auxiliary structure known as an access schema, many queries in relational algebra (RA) are boundedly evaluable under the set semantics of RA. This paper extends the theory of bounded evaluation to RA_aggr, i.e., RA extended with aggregation, under the bag semantics. (1) We extend access schema to bag access schema, to help us identify D_Q for RA_aggr queries Q. (2) While it is undecidable to determine whether an RA_aggr query is boundedly evaluable under a bag access schema, we identify special cases that are decidable and practical. (3) In addition, we develop an effective syntax for bounded RA_aggr queries, i.e., a core subclass of boundedly evaluable RA_aggr queries without sacrificing their expressive power. (4) Based on the effective syntax, we provide efficient algorithms to check the bounded evaluability of RA_aggr queries and to generate query plans for bounded RA_aggr queries. (5) As a proof of concept, we extend PostgreSQL to support bounded evaluation. We experimentally verify that the extended system improves performance by orders of magnitude.
Funding: Supported by the National High-Tech Research and Development Plan of China under Grant No. 2006AA01Z430.
Abstract: The unified multimedia query language (UMQL) is a powerful general-purpose multimedia query language that is well suited to multimedia information retrieval. The paper proposes a grammar analysis model to implement effective grammatical processing for the language. It separates the grammar analysis of a UMQL query specification into two phases, syntactic analysis and semantic analysis, and uses extended Backus-Naur form (EBNF) and logical algebra, respectively, to specify the restrictive grammar rules of each phase. As a result, the model can present error guidance for a query specification whose grammar is incorrect. The model is not only well suited to processing UMQL queries, but can also guide other projects concerned with query processing for descriptive query languages.
Abstract: Since web-based GIS processes large volumes of spatial geographic information over the internet, the efficiency of spatial data query processing and transmission should be improved. This paper presents two efficient methods for this purpose: division transmission and progressive transmission. In the division transmission method, a map is divided into several parts, called “tiles”, and only the tiles requested by a client are transmitted. In the progressive transmission method, a map is split into several phase views based on the significance of vertices, and when a client requests a spatial object, the server produces the target object and transmits it progressively. To realize these methods, the “tile division” and “priority order estimation” algorithms and the strategies for data transmission are proposed in this paper. Compared with traditional methods such as “map total transmission” and “layer transmission”, the web-based GIS data transmission proposed in this paper increases data transmission efficiency by a great margin.
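The division transmission idea in the abstract above (send only the tiles a client needs) can be sketched as follows. The abstract does not describe the tile division algorithm itself, so this sketch assumes a uniform square grid with its origin at (0, 0); the function name `tiles_for_viewport` and its parameters are hypothetical.

```python
import math
from typing import List, Tuple


def tiles_for_viewport(
    xmin: float, ymin: float, xmax: float, ymax: float, tile_size: float
) -> List[Tuple[int, int]]:
    """Return (row, col) indices of the square tiles overlapped by a viewport.

    Assumes a uniform grid of tile_size x tile_size tiles anchored at (0, 0).
    """
    col_first = math.floor(xmin / tile_size)
    col_last = math.floor(xmax / tile_size)
    row_first = math.floor(ymin / tile_size)
    row_last = math.floor(ymax / tile_size)
    return [
        (row, col)
        for row in range(row_first, row_last + 1)
        for col in range(col_first, col_last + 1)
    ]


if __name__ == "__main__":
    # A client viewing the box (10, 10) to (250, 120) on a map cut into 100-unit tiles.
    print(tiles_for_viewport(10, 10, 250, 120, tile_size=100))
```

A server built this way would transmit only the listed tiles rather than the whole map, which is the saving the abstract contrasts with "map total transmission".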
Funding: Project (07JJ1010) supported by the Hunan Provincial Natural Science Foundation of China; Projects (2006AA01Z202, 2006AA01Z199) supported by the National High-Tech Research and Development Program of China; Project (7002102) supported by the City University of Hong Kong, Strategic Research Grant (SRG); Project (IRT-0661) supported by the Program for Changjiang Scholars and Innovative Research Team in University; Project (NCET-06-0686) supported by the Program for New Century Excellent Talents in University.
Abstract: HashQuery, a Hash-area-based data dissemination protocol, is designed for wireless sensor networks. Using a Hash function that takes time as the key, both mobile sinks and sensors can determine the same Hash area. The sensors send information about the events they monitor to the Hash area, and the mobile sinks need only query that area instead of flooding the whole network, so much energy can be saved. In addition, the location of the Hash area changes over time so as to balance energy consumption across the whole network. Theoretical analysis shows that the proposed protocol can be energy-efficient, and simulation studies further show that with 5 sources and 5 sinks in the network, it can save at least 50% of the energy compared with the existing two-tier data dissemination (TTDD) protocol, especially in large-scale wireless sensor networks.
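The time-keyed Hash area described above can be sketched in a few lines. The abstract does not specify the Hash function, the time granularity or the area layout, so everything concrete here is an assumption: time is quantized into fixed slots, SHA-256 stands in for the Hash function, and the field is a rectangular grid of cells. The names `current_slot` and `hash_area` are hypothetical.

```python
import hashlib
from typing import Tuple


def current_slot(now_s: float, period_s: float) -> int:
    """Quantize a timestamp into a slot index (assumed: loosely synchronized clocks)."""
    return int(now_s // period_s)


def hash_area(slot: int, grid_rows: int, grid_cols: int) -> Tuple[int, int]:
    """Map a time slot to one (row, col) cell of the field.

    SHA-256 is an illustrative choice; any function shared by sensors and
    sinks would serve the same purpose.
    """
    digest = hashlib.sha256(str(slot).encode("ascii")).digest()
    cell = int.from_bytes(digest[:8], "big") % (grid_rows * grid_cols)
    return divmod(cell, grid_cols)


if __name__ == "__main__":
    period_s = 600.0  # the area moves every 10 minutes (assumed value)
    sensor_view = hash_area(current_slot(1234.0, period_s), 8, 8)
    sink_view = hash_area(current_slot(1790.0, period_s), 8, 8)
    print(sensor_view, sink_view, sensor_view == sink_view)
```

Because both sides compute the cell from the slot alone, a sink can find the rendezvous area without any flooding, and the cell moves whenever the slot index changes, which is how the load is spread over the network in this sketch.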