In the XML community, exact queries allow users to specify exactly what they want to check and/or retrieve in an XML document. When they are applied to a semi-structured document or to a document with an overly complex model, the lack or ignorance of the explicit document model (DTD (Document Type Definition), Schema, etc.) increases the risk of obtaining an empty result set when the query is too specific, or an overly large result set when it is too vague (e.g. it contains wildcards such as "*"). The reason is that in both cases, users write queries according to the document model they have in mind; this can be very far from the one that can actually be extracted from the document. As opposed to exact queries, preference queries are more flexible and can be relaxed to expand the search space during their evaluation. Indeed, during evaluation, certain constraints (the preferences they contain) can be relaxed if necessary, precisely to avoid empty results; moreover, the returned answers can be filtered to retain only the best ones. This paper presents an algorithm for evaluating such queries, inspired by the TreeMatch algorithm proposed by Yao et al. for exact queries. In the proposed algorithm, the best answers are obtained by using an adaptation of the Skyline operator (defined in relational databases) to the context of documents (trees) to incrementally select, from the set of partial solutions, those that satisfy the largest number of preference constraints. The only restriction imposed on documents is No-Self-Containment.
Geospatial datasets are typically available as distributed collections contributed by various government or commercial providers. Supporting the diverse needs of various users that may be accessing the same dataset for different applications remains a challenging issue. In order to overcome this challenge, there is a clear need to develop the capabilities to take into account complicated patterns of preference describing user and/or application particularities, and to use these patterns to rank query results in terms of suitability. This paper offers a demonstration of how intelligent systems can assist geospatial queries to improve retrieval accuracy by customizing results based on preference patterns. We outline the particularities of the geospatial domain and present our method and its application.
For small devices like PDAs and mobile phones, formulating relational database queries is not as simple as on conventional devices such as personal computers and laptops. Due to the restricted size and resources of these smaller devices, current works mostly limit the queries that can be posed by users by having them predetermined by the developers. This limits the capability of these devices in supporting robust queries. Hence, this paper proposes a universal-relation-based database query language targeted at small devices. The language allows the formulation of relational database queries that use minimal query terms. The formulation of the language and its structure are described, and usability test results are presented to support the effectiveness of the language.
The idea of the positional inverted index is exploited for indexing graph databases. The main idea is to use hash tables in order to prune a considerable portion of the graph database that cannot contain the answer set. These tables are implemented using column-based techniques and are used to store the graphs of the database, frequent sub-graphs, and the neighborhoods of nodes. For exact checking of the remaining graphs, vertex invariants are used for the isomorphism test, which can be implemented in parallel. The evaluation results indicate that the proposed method outperforms existing methods.
The problem of continuously monitoring multiple K-nearest neighbor (K-NN) queries with dynamic object and query datasets is valuable for many location-based applications. A practical method is to partition the data space into grid cells, with both the object and query tables being indexed by this grid structure, and to solve the problem by periodically joining cells of objects with queries whose influence regions intersect the cells. In the worst case, all cells of objects will be accessed once. Object and query cache strategies are proposed to further reduce the I/O cost. With the object cache strategy, queries that remain static in the current processing cycle seldom incur I/O cost and can be returned quickly. The main I/O cost comes from moving queries; the query cache strategy restricts their search regions by using the current results of queries in the main memory buffer. The queries can share not only the accessing of object pages, but also their influence regions. A theoretical analysis of the expected I/O cost is presented, and in the experimental results the I/O cost is about 40% of that of the SEA-CNN method.
In hybrid wireless sensor networks, sensor mobility causes the query areas to change dynamically. Aiming at the problem of inefficiency in processing data aggregation queries in dynamic query areas, this paper proposes a processing approach for event-based location aware queries (ELAQ), which includes a query dissemination algorithm, a maximum distance projection proxy selection algorithm, and an in-network query propagation and aggregation algorithm. ELAQs are triggered by events, and the query results depend on the mobile sensors' locations; these are the characteristics of the ELAQ model. The results show that, compared with the TinyDB query processing approach, the ELAQ processing approach increases the accuracy of the query result and decreases the query response time.
It is nontrivial to maintain discovered frequent query patterns in a real XML-DBMS, because the transaction database of queries may allow frequent updates, and such updates may not only invalidate some existing frequent query patterns but also generate some new frequent query patterns. In this paper, two incremental updating algorithms, FUX-QMiner and FUXQMiner, are proposed for the efficient maintenance of discovered frequent query patterns and the generation of new frequent query patterns when new XML queries are added to the database. Experimental results from our implementation show that the proposed algorithms have good performance. Key words: XML; frequent query pattern; incremental algorithm; data mining. CLC number: TP 311. Foundation item: Supported by the Youthful Foundation for Scientific Research of University of Shanghai for Science and Technology. Biography: PENG Dun-lu (1974-), male, Associate professor, Ph.D., research direction: data mining, Web service and its application, peer-to-peer computing.
This paper proposes a checking method based on mutual instances and discusses three key problems in the method: how to deal with mistakes in the mutual instances, how to deal with too many mutual instances, and how to deal with too few mutual instances. It provides checking based on weighted mutual instances that takes fault tolerance into account, gives a way to partition large-scale mutual instances, and proposes a process that greatly reduces the manual annotation work needed to obtain more mutual instances. Intension annotation, which improves the checking method, is also discussed. The method is practical and effective for checking subsumption relations between concept queries in different ontologies based on mutual instances.
Finding all occurrences of a twig query in an XML database is a core operation for the efficient evaluation of XML queries. It is important to handle twig queries with wildcards effectively. In this paper, a novel path-partitioned encoding scheme is proposed for XML documents to capture the paths of all elements, and a twig query is modeled as an XPattern extended from a tree pattern. After definition, simplification, normalization, verification, and initialization of the XPattern, both work sets and a join plan are generated. Based on these measures, an effective algorithm for answering a twig query, called DMTwig, is designed without unnecessary elements and invalid structural joins. The algorithm can adaptively deal with twig queries that combine branch ([ ]), child edge (/), descendant edge (//), and wildcard (*). We show that the path-partitioned encoding scheme and XPattern guarantee I/O and CPU optimality for twig queries. Experiments on a representative data set indicate that the proposed solution performs well.
Skyline query processing has recently received a lot of attention in the database and data mining communities. However, most existing algorithms consider how to efficiently process skyline queries from base tables. Obviously, when the data size and the number of skyline queries increase, the time cost of skyline queries will increase exponentially, which will seriously affect query efficiency. Motivated by this, in this paper we consider improving query efficiency via skyline views and propose a cost-based algorithm (abbr. CA) to efficiently select the optimal set of skyline views for storage. The CA algorithm mainly includes two phases: (i) reduce the skyline view selection to the minimum Steiner tree problem and obtain the approximate optimal set AOS of skyline views, and (ii) adjust AOS and produce the final optimal set FOS of skyline views based on simulated annealing. Moreover, in order to improve the extensibility of the CA algorithm, we implement it based on the map/reduce distributed computation model in cloud computing environments. Detailed theoretical analyses and extensive experiments demonstrate that the CA algorithm is both efficient and effective.
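For readers unfamiliar with the skyline operator that the views above would materialize, the following is a minimal sketch of a plain in-memory skyline computation. It is not the CA view-selection algorithm from the abstract; the function names and the "smaller is better in every dimension" convention are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere.

    Assumption for this sketch: smaller values are preferred in every dimension.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: List[Point]) -> List[Point]:
    """Return the points that are not dominated by any other point."""
    result: List[Point] = []
    for p in points:
        # Skip p if a point already kept dominates it.
        if any(dominates(q, p) for q in result):
            continue
        # Otherwise drop kept points that p dominates, then keep p.
        result = [q for q in result if not dominates(p, q)]
        result.append(p)
    return result


# Hypothetical data: (price, distance) pairs, both to be minimized.
hotels = [(50, 8), (60, 3), (80, 1), (70, 5), (90, 2)]
best = skyline(hotels)
```

A skyline view in the sense of the abstract would store a result like `best` so that later queries can reuse it instead of rescanning the base table.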
The moving object database (MOD) engine is the foundation of Location-Based Service (LBS) information systems. Continuous queries are important in the spatial-temporal reasoning of a MOD. Communication costs were the bottleneck for improving query efficiency until the rectangular safe region algorithm partly solved this problem. However, this algorithm can be further improved, as we demonstrate with the dynamic interval based continuous queries algorithm on moving objects. Two components, the circular safe region and dynamic intervals, are adopted by our algorithm. Theoretical proof and experimental results show that our algorithm substantially outperforms traditional periodic monitoring and the rectangular safe region algorithm in terms of monitoring accuracy, communication costs, and server CPU time. Moreover, in our algorithm, the mobile terminals do not need to have any computational ability.
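The abstract does not spell out how the circular safe region and the dynamic interval interact, so the following is only an illustrative sketch under stated assumptions: the query is a circular range query, every object has a known maximum speed, and the server (not the terminal) decides when to probe next. All names (`safe_radius`, `next_probe_interval`, `min_interval`) are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def safe_radius(obj: Point, query_center: Point, query_radius: float) -> float:
    """Radius of the largest circle around obj that does not cross the query boundary.

    While the object stays inside this circle, its membership in the
    circular range query cannot change.
    """
    return abs(query_radius - math.dist(obj, query_center))


def next_probe_interval(
    obj: Point,
    query_center: Point,
    query_radius: float,
    max_speed: float,
    min_interval: float = 1.0,
) -> float:
    """Time the server may wait before asking the object for its position again.

    Assumption: an object moving at most max_speed needs at least
    safe_radius / max_speed time units to leave its safe region.
    """
    if max_speed <= 0:
        raise ValueError("max_speed must be positive")
    return max(min_interval, safe_radius(obj, query_center, query_radius) / max_speed)
```

Under these assumptions, an object far from the query boundary is probed rarely and an object close to it is probed often, which is one way communication could be reduced without any computation on the terminal.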
Users can obtain information through basic web searching and find answers to their questions directly, but the expected answer may not exist. Besides, users do not learn of newly updated information in time. Online social networking services spread quickly and store a large amount of user data, but these data are of low value and may give unreliable answers to users' questions. In a knowledge question-answering (QA) system, users can obtain a simple answer but cannot expect additional information. In this paper, we design a system that combines the advantages of knowledge QA systems and web searching with the characteristics of social networking services, providing a social network channel based on queries and answers that does not rely on the user's own contact network. Through the network channel of this system, a user can obtain real-time answers from the network of users interested in the user's queries, get additional information effectively, and share it with others in the social network channel of the system.
The k-median problem has attracted a number of researchers. However, few of them have considered both the dynamic environment and the issue of accuracy. In this paper, a new type of query is studied, called the continuous median monitoring (CMM) query. It considers the k-median problem in a dynamic environment with an accuracy guarantee. A continuous group nearest neighbor based (CGB) algorithm and an average distance medoid (ADM) algorithm are proposed to solve the CMM problem. ADM is a hill-climbing algorithm and achieves rapid convergence by checking only qualified candidates. Experiments show that ADM is more efficient than CGB and outperforms the classical PAM (partitioning around medoids) and CLARANS (clustering large applications based on randomized search) algorithms under various parameter settings.
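To make the hill-climbing idea concrete, here is a minimal sketch of a swap-based medoid search that minimizes the average distance from each point to its nearest medoid. It is a generic PAM-style illustration, not the ADM algorithm itself: in particular it tries every point as a swap candidate, whereas the abstract says ADM checks only qualified candidates. Function names and the sample data are assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def avg_distance(points: List[Point], medoids: List[Point]) -> float:
    """Average distance from each point to its nearest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points) / len(points)


def hill_climb_medoids(points: List[Point], k: int) -> Tuple[List[Point], float]:
    """Greedy single-swap hill climbing, starting from the first k points."""
    medoids = list(points[:k])
    best = avg_distance(points, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for cand in points:
                if cand in medoids:
                    continue
                # Try replacing medoid i with the candidate point.
                trial = medoids[:i] + [cand] + medoids[i + 1:]
                cost = avg_distance(points, trial)
                if cost < best - 1e-12:  # accept only strict improvements
                    medoids, best = trial, cost
                    improved = True
    return medoids, best


# Two well-separated hypothetical clusters.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
medoids, cost = hill_climb_medoids(data, k=2)
```

Because every accepted swap strictly lowers the cost, the loop terminates in a local optimum; restricting the candidate set, as ADM is described as doing, would reduce the number of cost evaluations per round.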
Funding: Supported by the basic research grant of the Special Account for Research of the Technical University of Crete for the project "Spatiotemporal queries by sketch in moving object geographic databases" (No. 80080).
Abstract: Visual queries assist non-expert users in extracting information from spatial databases in an intuitive and natural way, making geographic information systems comprehensive and efficient for a wide range of applications. A common visual means of querying takes the form of drawings or graphs, under which many spatial ambiguities and translation errors arise. In this study, common query attributes extracted from user graphs, such as spatial topology, size, cardinality, and proximity, are regarded under a conceptual moderation scheme. Thus, the system/user may concentrate on various conceptual combinations of information. Furthermore, time is incorporated to support spatiotemporal queries for changing scenes and moving objects. Arbitrary, relative, and absolute scaling is possible according to the data set and application at hand. The theoretical approach is implemented in a prototype user interface system called ShapeController. In this prototype, a user may extract scene-based relations in an automatically inferred fashion, or include single object-oriented relations when all possible relations seem redundant. Finally, a natural language description of the query is extracted, from which the user may select the desired query relations. Experimentation on a spatial database demonstrates the concepts of predefined draw objects, scaling relaxation, conceptual abstraction, and scene-, object-, and text-oriented transitions that promote query expressiveness and restrain ambiguities.
Funding: Financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES), Finance Code 001. Anderson C. Carniel was supported by Google as a recipient of the 2022 Google Research Scholar program.
Abstract: Spatial relationships are core components in the design and definition of spatial queries. A spatial relationship determines how two or more spatial objects are related or connected in space. Hence, given a spatial dataset, users can retrieve spatial objects in a given relationship with a search object. Different interpretations of spatial relationships are conceivable, leading to different types of relationships. The main types are (i) topological relationships (e.g. overlap, meet, inside), (ii) metric relationships (e.g. nearest neighbors), and (iii) direction relationships (e.g. cardinal directions). Although spatial information retrieval has been extensively studied in the literature, it is unclear which types of spatial queries can be defined using spatial relationships. In this article, we introduce a taxonomy for naming, describing, and classifying types of spatial queries frequently found in the literature. This taxonomy is based on the types of spatial relationships that are employed by spatial queries. Using this taxonomy, we discuss the intuitive descriptions, formal definitions, and possible implementation techniques of several types of spatial queries. The discussions lead to the identification of correspondences between types of spatial queries. Further, we identify challenges and open research topics in the spatial information retrieval area.
Funding: This work is supported by the Aeronautical Science Foundation of China under Grant 20165515001, the National Natural Science Foundation of China under Grant No. 61402225, the State Key Laboratory for Smart Grid Protection and Operation Control Foundation, and the Science and Technology Funds from National State Grid Ltd. (The Research on Key Technologies of Distributed Parallel Database Storage and Processing based on Big Data).
Abstract: The Tiered Mobile Wireless Sensor Network (TMWSN) is a new paradigm introduced by mobile edge computing. It has received wide attention because of its high scalability, robustness, and deployment flexibility, and it has a wide range of application scenarios. In TMWSNs, the storage nodes are the key nodes of the network and are more easily captured and exploited by attackers. Once the storage nodes are captured by attackers, the data stored on them will be exposed. Moreover, the query process and results can no longer be trusted. This paper mainly studies secure KNN query technology in TMWSNs. We first propose a secure KNN query algorithm named the Basic Algorithm For Secure KNN Query (BAFSKQ), which can protect privacy and verify the integrity of query results. However, this algorithm has a large communication overhead in most cases. To solve this problem, we propose an improved algorithm named the Secure KNN Query Algorithm Based on MR-Tree (SEKQAM). The MR-Trees are used to find the K-nearest locations and help to generate a verification set for verifying the query results. It can be proved that our algorithms effectively guarantee the privacy of the data stored on the storage nodes and the integrity of the query results. Our experimental results also show that, after introducing MR-Trees into KNN queries on TMWSNs, the communication overhead is effectively reduced compared with BAFSKQ.
Funding: The National High Technology Research and Development Program of China (863 Program) (No. 2006AA01Z430).
Abstract: Through the mapping from UMQL (unified multimedia query language) conditional expressions to UMQA (unified multimedia query algebra) query operations, a translation algorithm from a UMQL query to a UMQA query plan is put forward, which can generate an equivalent UMQA internal query plan for any UMQL query. Then, to effectively reduce the execution costs of UMQA query plans, equivalent UMQA translation formulae and general optimization strategies are studied, and an optimization algorithm for UMQA internal query plans is presented. This algorithm uses equivalent UMQA translation formulae to optimize query plans, and makes the optimized query plans accord with the optimization strategies as much as possible. Finally, the logic implementation methods of UMQA plans, i.e., the logic implementation methods of UMQA operators, are discussed to obtain useful target data from a multimedia database. All of these algorithms are implemented in a UMQL prototype system. Application results show that these query processing techniques are feasible and applicable.
Funding: This work was supported by the National Natural Science Foundation of China [grant numbers 41222009, 41271405].
Abstract: Spatial selectivity estimation is crucial for choosing the cheapest execution plan for a given query in a query optimizer. This article proposes an accurate spatial selectivity estimation method based on cumulative density (CD) histograms, which can deal with any arbitrary spatial query window. In this method, the selectivity can be estimated using the original logic of the CD histogram, after the four corner values of a query window have been accurately interpolated on the continuous surface of the elevation histogram. For the interpolation of any corner point, we first identify the cells that can affect the value of point (x, y) in the CD histogram. These cells can be categorized into two classes: those within the range from (0, 0) to (x, y) and those overlapping the range from (0, 0) to (x, y). The values of the former class can be used directly, whereas we revise the values of any cells falling in the latter class by the number of vertices in the corresponding cell and the area ratio covered by the range from (0, 0) to (x, y). This revision makes the estimation method more accurate. The CD histograms and the estimation method have been implemented in INGRES. Experimental results show that the method can accurately estimate the selectivity of arbitrary query windows and can help the optimizer choose a cheaper query plan.
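The corner-interpolation idea can be illustrated with a simplified sketch for point data only. It builds one cumulative count grid, interpolates the cumulative surface bilinearly at the four window corners (which amounts to scaling partially covered cells by their covered area ratio), and combines the corners by inclusion-exclusion. This is an assumption-laden simplification, not the CD histogram method implemented in INGRES; all function names and the sample data are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def build_cumulative(points: List[Point], nx: int, ny: int,
                     width: float, height: float) -> List[List[int]]:
    """C[i][j] = number of points falling in grid cells [0, i) x [0, j)."""
    counts = [[0] * ny for _ in range(nx)]
    for x, y in points:
        i = min(int(x / width * nx), nx - 1)
        j = min(int(y / height * ny), ny - 1)
        counts[i][j] += 1
    C = [[0] * (ny + 1) for _ in range(nx + 1)]
    for i in range(nx):
        for j in range(ny):
            C[i + 1][j + 1] = counts[i][j] + C[i][j + 1] + C[i + 1][j] - C[i][j]
    return C


def cumulative_at(C: List[List[int]], x: float, y: float,
                  width: float, height: float) -> float:
    """Bilinear interpolation of the cumulative surface at (x, y)."""
    nx, ny = len(C) - 1, len(C[0]) - 1
    gx = min(max(x / width * nx, 0.0), float(nx))
    gy = min(max(y / height * ny, 0.0), float(ny))
    i, j = min(int(gx), nx - 1), min(int(gy), ny - 1)
    fx, fy = gx - i, gy - j
    return (C[i][j] * (1 - fx) * (1 - fy) + C[i + 1][j] * fx * (1 - fy)
            + C[i][j + 1] * (1 - fx) * fy + C[i + 1][j + 1] * fx * fy)


def estimate_count(C: List[List[int]], window: Tuple[float, float, float, float],
                   width: float, height: float) -> float:
    """Inclusion-exclusion over the four interpolated corner values."""
    x1, y1, x2, y2 = window
    return (cumulative_at(C, x2, y2, width, height)
            - cumulative_at(C, x1, y2, width, height)
            - cumulative_at(C, x2, y1, width, height)
            + cumulative_at(C, x1, y1, width, height))


pts = [(0.5, 0.5), (1.5, 0.5), (0.5, 1.5), (1.5, 1.5)]
C = build_cumulative(pts, nx=2, ny=2, width=2.0, height=2.0)
est = estimate_count(C, (0.5, 0.5, 1.5, 1.5), 2.0, 2.0)
```

Dividing the estimated count by the total number of objects gives the selectivity; the estimate is exact only when objects are uniformly distributed inside each cell.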
基金This work was partly supported by the National Key R&D Program of China (Grant No. 2017YFB0309800); grants from the Natural Science Foundation of China (No. 61472339, No. 61303040, No. 61572421, No. 61272124); the Shanghai Alliance Program (LM201552); and Shanghai University of Engineering and Technology School-Enterprise cooperation projects (15)(DZ-025).
文摘Answering reachability queries is one of the fundamental graph operations. Existing approaches either accelerate index construction by constructing an index that covers only a partial reachability relationship, which may require costly traversal operations when answering a query; or accelerate query answering by constructing an index covering the complete reachability relationship, which may be inefficient because the complete node labels must be compared. We propose a novel labeling scheme, which covers the complete reachability relationship, to accelerate reachability query processing. The idea is to decompose the given directed acyclic graph (DAG) G into two subgraphs, G1 and G2. For G1, we propose to use topological labels consisting of two integers to answer all reachability queries. For G2, we construct 2-hop labels as existing methods do to answer queries that cannot be answered by topological labels. The benefits of our method lie in two aspects. On one hand, our method does not need to perform costly traversal operations when answering queries. On the other hand, our method can quickly answer most queries in constant time without comparing the whole node labels. We confirm the efficiency of our approach by extensive experimental studies using 20 real datasets.
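To make the 2-hop part of this abstract concrete, the following sketch builds 2-hop labels with a generic pruned breadth-first construction and answers queries by intersecting label sets. It does not reproduce the paper's decomposition into G1 and G2 or its topological labels; the construction order (plain node id order) and the function names `build_two_hop_labels` and `reaches` are assumptions made for illustration.

```python
from collections import deque


def build_two_hop_labels(n, edges):
    """Build 2-hop labels for a DAG with nodes 0..n-1.

    label_out[u] holds hop nodes reachable from u, label_in[v] holds hop
    nodes that reach v. u reaches v iff the two sets share a hop node.
    """
    succ = [[] for _ in range(n)]
    pred = [[] for _ in range(n)]
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
    label_out = [set() for _ in range(n)]
    label_in = [set() for _ in range(n)]

    def covered(u, v):
        return not label_out[u].isdisjoint(label_in[v])

    def sweep(hop, adjacency, labels, already_covered):
        # BFS from hop; skip nodes whose relation to hop is already
        # answered by labels of earlier hops (the pruning step).
        queue = deque([hop])
        seen = {hop}
        while queue:
            node = queue.popleft()
            if node != hop and already_covered(node):
                continue
            labels[node].add(hop)
            for nxt in adjacency[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)

    for hop in range(n):
        sweep(hop, succ, label_in, lambda x: covered(hop, x))   # hop reaches x
        sweep(hop, pred, label_out, lambda x: covered(x, hop))  # x reaches hop
    return label_out, label_in


def reaches(label_out, label_in, u, v):
    """Answer a reachability query without traversing the graph."""
    return not label_out[u].isdisjoint(label_in[v])


if __name__ == "__main__":
    dag_edges = [(0, 1), (1, 2), (0, 3), (3, 2), (4, 3)]
    out_labels, in_labels = build_two_hop_labels(5, dag_edges)
    print(reaches(out_labels, in_labels, 4, 2), reaches(out_labels, in_labels, 2, 0))
```

The query cost depends on label size, which is why the abstract pairs 2-hop labels with cheaper two-integer labels for the part of the graph where those suffice.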
基金Supported by the National High Technology Research and Development Program of China (863 Program 2012AA011004), the National Natural Science Foundation of China (61232002, 61202033), and the Natural Science Foundation of Hubei Province (2011CDB448)
文摘Many semantics have been proposed for answering top-k queries on uncertain data in various applications. However, most of these semantics spend much of their time computing position probability. Our approach to supporting various top-k queries is based on position probability distribution (PPD) sharing. In this paper, a PPD-tree structure and several basic operations on it are proposed to support various top-k queries. In addition, we propose an approximation method to improve the efficiency of PPD generation. We also verify the effectiveness and efficiency of our approach by both theoretical analysis and experiments.
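As background for the position probability mentioned in this abstract, the sketch below computes the probability that each tuple occupies each rank. It assumes the common tuple-level uncertainty model (independent tuples, each with a score and an existence probability, distinct scores); the abstract does not state the paper's data model, and the PPD-tree itself is not reproduced. The function name `position_probabilities` is hypothetical.

```python
def position_probabilities(tuples):
    """Return rows[i][j] = P(i-th highest-scored tuple is ranked j + 1).

    tuples is a list of (score, existence_probability) pairs. A tuple is
    ranked j + 1 when it exists and exactly j higher-scored tuples exist.
    """
    ordered = sorted(tuples, key=lambda t: t[0], reverse=True)
    # count_dist[c] = probability that exactly c of the tuples seen so far exist.
    count_dist = [1.0]
    rows = []
    for _score, prob in ordered:
        rows.append([prob * q for q in count_dist])
        # Extend the distribution with the current tuple (present or absent).
        updated = [0.0] * (len(count_dist) + 1)
        for c, q in enumerate(count_dist):
            updated[c] += q * (1.0 - prob)
            updated[c + 1] += q * prob
        count_dist = updated
    return rows


if __name__ == "__main__":
    for row in position_probabilities([(10, 0.5), (8, 0.5), (7, 1.0)]):
        print(row)
```

Each row sums to the tuple's existence probability, and different top-k semantics can then be evaluated from the same table, which is the kind of reuse the abstract refers to as sharing.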
文摘In the XML community, exact queries allow users to specify exactly what they want to check and/or retrieve in an XML document. When they are applied to a semi-structured document or to a document with an overly complex model, the lack or the ignorance of the explicit document model (DTD—Document Type Definition, Schema, etc.) increases the risk of obtaining an empty result set when the query is too specific, or, too large result set when it is too vague (e.g. it contains wildcards such as “*”). The reason is that in both cases, users write queries according to the document model they have in mind;this can be very far from the one that can actually be extracted from the document. Opposed to exact queries, preference queries are more flexible and can be relaxed to expand the search space during their evaluations. Indeed, during their evaluation, certain constraints (the preferences they contain) can be relaxed if necessary to avoid precisely empty results;moreover, the returned answers can be filtered to retain only the best ones. This paper presents an algorithm for evaluating such queries inspired by the TreeMatch algorithm proposed by Yao et al. for exact queries. In the proposed algorithm, the best answers are obtained by using an adaptation of the Skyline operator (defined in relational databases) in the context of documents (trees) to incrementally filter into the partial solutions set, those which satisfy the maximum of preferential constraints. The only restriction imposed on documents is No-Self-Containment.
文摘Geospatial datasets are typically available as distributed collections contributed by various government or commercial providers. Supporting the diverse needs of various users that may be accessing the same dataset for different applications remains a challenging issue. In order to overcome this challenge there is a clear need to develop the capabilities to take into account complicated patterns of preference describing user and/or application particularities, and use these patterns to rank query results in terms of suitability. This paper offers a demonstration on how intelligent systems can assist geospatial queries to improve retrieval accuracy by customizing results based on preference patterns. We outline the particularities of the geospatial domain and present our method and its application.
文摘For small devices like the PDAs and mobile phones, formulation of relational database queries is not as simple as using conventional devices such as the personal computers and laptops. Due to the restricted size and resources of these smaller devices, current works mostly limit the queries that can be posed by users by having them predetermined by the developers. This limits the capability of these devices in supporting robust queries. Hence, this paper proposes a universal relation based database querying language which is targeted for small devices. The language allows formulation of relational database queries that uses minimal query terms. The formulation of the language and its structure will be described and usability test results will be presented to support the effectiveness of the language.
文摘The idea of a positional inverted index is exploited for indexing graph databases. The main idea is to use hash tables to prune a considerable portion of the graph database that cannot contain the answer set. These tables are implemented using column-based techniques and are used to store the graphs of the database, frequent sub-graphs, and the neighborhoods of nodes. For exact checking of the remaining graphs, vertex invariants are used in the isomorphism test, which can be implemented in parallel. The evaluation results indicate that the proposed method outperforms existing methods.
基金Project (No. ABA048) supported by the Natural Science Foundation of Hubei Province, China
文摘The problem of continuously monitoring multiple K-nearest neighbor (K-NN) queries over dynamic object and query datasets is valuable for many location-based applications. A practical method is to partition the data space into grid cells, with both the object and query tables indexed by this grid structure, and to solve the problem by periodically joining cells of objects with queries whose influence regions intersect the cells. In the worst case, all cells of objects will be accessed once. Object and query cache strategies are proposed to further reduce the I/O cost. With the object cache strategy, queries remaining static in the current processing cycle seldom incur I/O cost and can be returned quickly. The main I/O cost comes from moving queries; the query cache strategy restricts their search regions using the current results of queries in the main memory buffer. The queries can share not only the accessing of object pages, but also their influence regions. Theoretical analysis of the expected I/O cost is presented, with the I/O cost being about 40% that of the SEA-CNN method in the experiment results.
基金Supported by the National Pre-research Foundation Project of China (513150402)
文摘In hybrid wireless sensor networks, sensor mobility causes the query areas to change dynamically. Aiming at the problem of inefficiency in processing the data aggregation queries in dynamic query areas, this paper proposes a processing approach for event-based location aware queries (ELAQ), which includes query dissemination algorithm, maximum distance projection proxy selection algorithm, in-network query propagation, and aggregation algorithm. ELAQs are triggered by the events and the query results are dependent on mobile sensors' location, which are the characteristics of ELAQ model. The results show that compared with the TinyDB query processing approach, ELAQ processing approach increases the accuracy of the query result and decreases the query response time.
文摘It is nontrivial to maintain discovered frequent query patterns in a real XML-DBMS because the transaction database of queries may allow frequent updates, and such updates may not only invalidate some existing frequent query patterns but also generate some new frequent query patterns. In this paper, two incremental updating algorithms, FUX-QMiner and FUXQMiner, are proposed for efficient maintenance of discovered frequent query patterns and generation of new frequent query patterns when new XML queries are added into the database. Experimental results from our implementation show that the proposed algorithms have good performance. Key words: XML; frequent query pattern; incremental algorithm; data mining. CLC number: TP 311. Foundation item: Supported by the Youthful Foundation for Scientific Research of University of Shanghai for Science and Technology. Biography: PENG Dun-lu (1974-), male, Associate professor, Ph.D, research direction: data mining, Web service and its application, peer-to-peer computing.
基金Supported by the National Natural Sciences Foundation of China (60373066, 60425206, 90412003), National Grand Fundamental Research 973 Program of China (2002CB312000), National Research Foundation for the Doctoral Program of Higher Education of China (20020286004)
文摘This paper proposes a checking method based on mutual instances and discusses three key problems in the method: how to deal with mistakes in the mutual instances and how to deal with too many or too few mutual instances. It provides the checking based on the weighted mutual instances considering fault tolerance, gives a way to partition the large-scale mutual instances, and proposes a process greatly reducing the manual annotation work to get more mutual instances. Intension annotation that improves the checking method is also discussed. The method is practical and effective to check subsumption relations between concept queries in different ontologies based on mutual instances.
基金supported by the National High-Tech Research and Development Plan of China (Grant No.2005AA4Z3030)
文摘Finding all occurrences of a twig query in an XML database is a core operation for efficient evaluation of XML queries. It is important to effectively handle twig queries with wildcards. In this paper, a novel path-partitioned encoding scheme is proposed for XML documents to capture paths of all elements, and a twig query is modeled as an XPattern extended from tree pattern. After definition, simplification, normalization, verification and initialization of the XPattern, both work sets and a join plan are generated. According to these measures, an effective algorithm for answering twig queries, called DMTwig, is designed without unnecessary elements and invalid structural joins. The algorithm can adaptively deal with twig queries with branch ([ ]), child edge (/), descendant edge (//), and wildcard (*) synthetically. We show that path-partitioned encoding scheme and XPattern guarantee the I/O and CPU optimality for twig queries. Experiments on representative data set indicate that the proposed solution performs significantly.
基金supported by the National Natural Science Foundation of China (No. 61772366), the Natural Science Foundation of Shanghai (No. 17ZR1445900), and the program of Further Accelerating the Development of Chinese Medicine Three Year Action of Shanghai (2014-2016) ZY3-CCCX-3-6002
文摘Skyline query processing has recently received a lot of attention in database and data mining communities. However, most existing algorithms consider how to efficiently process skyline queries from base tables. Obviously, when the data size and the number of skyline queries increase, the time cost of skyline queries will increase exponentially, which will seriously influence the query efficiency. Motivated by the above, in this paper, we consider improving the query efficiency via skyline views and propose a cost-based algorithm(abbr. CA) to efficiently select the optimal set of skyline views for storage. The CA algorithm mainly includes two phases:(i) reduce the skyline views selection to the minimum steiner tree problem and obtain the approximate optimal set AOS of skyline views, and(ii) adjust AOS and produce the final optimal set FOS of skyline views based on the simulated annealing. Moreover, in order to improve the extendibility of the CA algorithm, we implement it based on the map/reduce distributed computation model in cloud computing environments. The detailed theoretical analyses and extensive experiments demonstrate that the CA algorithm is both efficient and effective.
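For readers unfamiliar with the skyline operator that these views materialize, a minimal sketch follows. It shows only the basic dominance test and a simple nested-loop skyline, not the CA view-selection algorithm from the abstract. It assumes that smaller values are preferred in every dimension; the function names `dominates` and `skyline` are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b everywhere and strictly better somewhere.

    Assumption: smaller values are better in every dimension.
    """
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def skyline(points):
    """Return the points that no other point dominates."""
    result = []
    for p in points:
        if any(dominates(q, p) for q in result):
            continue  # p is dominated by a current skyline candidate
        # p survives, so drop any candidates that p dominates.
        result = [q for q in result if not dominates(p, q)]
        result.append(p)
    return result


if __name__ == "__main__":
    hotels = [(1, 9), (3, 3), (9, 1), (4, 4), (5, 2)]  # (price, distance), made up
    print(skyline(hotels))
```

Each skyline query scans its input in this way, so answering repeated queries from stored skyline views rather than base tables avoids recomputing the same dominance tests.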
文摘Moving object database (MOD) engine is the foundation of Location-Based Service (LBS) information systems. Continuous queries are important in spatial-temporal reasoning of a MOD. The communication costs were the bottleneck for improving query efficiency until the rectangular safe region algorithm partly solved this problem. However, this algorithm can be further improved, as we demonstrate with the dynamic interval based continuous queries algorithm on moving objects. Two components, circular safe region and dynamic intervals were adopted by our algorithm. Theoretical proof and experimental results show that our algorithm substantially outperforms the traditional periodic monitoring and the rectangular safe region algorithm in terms of monitoring accuracy, reducing communication costs and server CPU time. Moreover, in our algorithm, the mobile terminals do not need to have any computational ability.
基金Industrial Strategic Technology Development Program,Development of a Cognitive Planning and Learning Model for Mobile Platforms(No.10035348) funded by MKE(the Ministry of Knowledge Economy),Korea
文摘Users can obtain information through basic web searching and find direct answers to their questions, but the expected answer may not exist, and users cannot learn of newly updated information in time. Online social networking services spread quickly and store large amounts of user data, but these data are of little value and may give unreliable answers to users' questions. In a knowledge question-answering (QA) system, users can obtain a simple answer but cannot expect additional information. In this paper, we design a system that combines the advantages of knowledge QA systems and web searching with the characteristics of social networking services, providing a social network channel based on queries and answers without requiring users' contact networks. Through the network channel of this system, a user can obtain real-time answers from the network of users interested in the user's queries, get additional information effectively, and share it with others in the social network channel.
文摘The k-median problem has attracted a number of researchers. However,few of them have considered both the dynamic environment and the issue of accuracy. In this paper,a new type of query is studied,called continuous median monitoring (CMM) query. It considers the k-median problem under dynamic environment with an accuracy guarantee. A continuous group nearest neighbor based (CGB) algorithm and an average distance medoid (ADM) algorithm are proposed to solve the CMM problem. ADM is a hill climbing schemed algorithm and achieves a rapid converging speed by checking only qualified candidates. Experiments show that ADM is more efficient than CGB and outperforms the classical PAM (partitioning around medoids) and CLARANS (clustering large applications based on randomized search) algorithms with various parameter settings.