Multidimensional data query has been gaining much interest in database research communities in recent years, yet many of the existing studies focus mainly on ten tralized systems. A solution to querying in Peer-to-Pee...Multidimensional data query has been gaining much interest in database research communities in recent years, yet many of the existing studies focus mainly on ten tralized systems. A solution to querying in Peer-to-Peer(P2P) environment was proposed to achieve both low processing cost in terms of the number of peers accessed and search messages and balanced query loads among peers. The system is based on a balanced tree structured P2P network. By partitioning the query space intelligently, the amount of query forwarding is effectively controlled, and the number of peers involved and search messages are also limited. Dynamic load balancing can be achieved during space partitioning and query resolving. Extensive experiments confirm the effectiveness and scalability of our algorithms on P2P networks.展开更多
There have been many researches and semantics in answering top-k queries on uncertain data in various applications. However, most of these semantics must consume much of their time in computing position probability. O...There have been many researches and semantics in answering top-k queries on uncertain data in various applications. However, most of these semantics must consume much of their time in computing position probability. Our approach to support various top-k queries is based on position probability distribution (PPD) sharing. In this paper, a PPD-tree structure and several basic operations on it are proposed to support various top-k queries. In addition, we proposed an approximation method to improve the efficiency of PPD generation. We also verify the effectiveness and efficiency of our approach by both theoretical analysis and experiments.展开更多
The performance and reliability of converting natural language into structured query language can be problematic in handling nuances that are prevalent in natural language. Relational databases are not designed to und...The performance and reliability of converting natural language into structured query language can be problematic in handling nuances that are prevalent in natural language. Relational databases are not designed to understand language nuance, therefore the question why we must handle nuance has to be asked. This paper is looking at an alternative solution for the conversion of a Natural Language Query into a Structured Query Language (SQL) capable of being used to search a relational database. The process uses the natural language concept, Part of Speech to identify words that can be used to identify database tables and table columns. The use of Open NLP based grammar files, as well as additional configuration files, assist in the translation from natural language to query language. Having identified which tables and which columns contain the pertinent data the next step is to create the SQL statement.展开更多
Biomedical questions are usually complex and regard several different life science aspects. Numerous valuable and he- terogeneous data are increasingly available to answer such questions. Yet, they are dispersedly sto...Biomedical questions are usually complex and regard several different life science aspects. Numerous valuable and he- terogeneous data are increasingly available to answer such questions. Yet, they are dispersedly stored and difficult to be queried comprehensively. We created a Genomic and Proteomic Data Warehouse (GPDW) that integrates data provided by some of the main bioinformatics databases. It adopts a modular integrated data schema and several metadata to describe the integrated data, their sources and their location in the GPDW. Here, we present the Web application that we developed to enable any user to easily compose queries, although complex, on all data integrated in the GPDW. It is publicly available at http://www.bioinformatics.dei.polimi.it/GPKB/. Through a visual interface, the user is only required to select the types of data to be included in the query and the conditions on their values to be retrieved. Then, the Web application leverages the metadata and modular schema of the GPDW to automatically compose an efficient SQL query, run it on the GPDW and show the extracted requested data, enriched with links to external data sources. Performed tests demonstrated efficiency and usability of the developed Web application, and showed its and GPDW relevance in supporting answering biomedical questions, also difficult.展开更多
This paper presents the semantic analysis of queries written in natural language (French) and dedicated to the object oriented data bases. The studied queries include one or two nominal groups (NG) articulating around...This paper presents the semantic analysis of queries written in natural language (French) and dedicated to the object oriented data bases. The studied queries include one or two nominal groups (NG) articulating around a verb. A NG consists of one or several keywords (application dependent noun or value). Simple semantic filters are defined for identifying these keywords which can be of semantic value: class, simple attribute, composed attribute, key value or not key value. Coherence rules and coherence constraints are introduced, to check the validity of the co-occurrence of two consecutive nouns in complex NG. If a query is constituted of a single NG, no further analysis is required. Otherwise, if a query covers two valid NG, it is a subject of studying the semantic coherence of the verb and both NG which are attached to it.展开更多
To solve the query processing correctness problem for semantic-based relational data integration,the semantics of SAPRQL(simple protocol and RDF query language) queries is defined.In the course of query rewriting,al...To solve the query processing correctness problem for semantic-based relational data integration,the semantics of SAPRQL(simple protocol and RDF query language) queries is defined.In the course of query rewriting,all relative tables are found and decomposed into minimal connectable units.Minimal connectable units are joined according to semantic queries to produce the semantically correct query plans.Algorithms for query rewriting and transforming are presented.Computational complexity of the algorithms is discussed.Under the worst case,the query decomposing algorithm can be finished in O(n2) time and the query rewriting algorithm requires O(nm) time.And the performance of the algorithms is verified by experiments,and experimental results show that when the length of query is less than 8,the query processing algorithms can provide satisfactory performance.展开更多
A novel data streams partitioning method is proposed to resolve problems of range-aggregation continuous queries over parallel streams for power industry.The first step of this method is to parallel sample the data,wh...A novel data streams partitioning method is proposed to resolve problems of range-aggregation continuous queries over parallel streams for power industry.The first step of this method is to parallel sample the data,which is implemented as an extended reservoir-sampling algorithm.A skip factor based on the change ratio of data-values is introduced to describe the distribution characteristics of data-values adaptively.The second step of this method is to partition the fluxes of data streams averagely,which is implemented with two alternative equal-depth histogram generating algorithms that fit the different cases:one for incremental maintenance based on heuristics and the other for periodical updates to generate an approximate partition vector.The experimental results on actual data prove that the method is efficient,practical and suitable for time-varying data streams processing.展开更多
In order to improve the quality of web search,a new query expansion method by choosing meaningful structure data from a domain database is proposed.It categories attributes into three different classes,named as concep...In order to improve the quality of web search,a new query expansion method by choosing meaningful structure data from a domain database is proposed.It categories attributes into three different classes,named as concept attribute,context attribute and meaningless attribute,according to their semantic features which are document frequency features and distinguishing capability features.It also defines the semantic relevance between two attributes when they have correlations in the database.Then it proposes trie-bitmap structure and pair pointer tables to implement efficient algorithms for discovering attribute semantic feature and detecting their semantic relevances.By using semantic attributes and their semantic relevances,expansion words can be generated and embedded into a vector space model with interpolation parameters.The experiments use an IMDB movie database and real texts collections to evaluate the proposed method by comparing its performance with a classical vector space model.The results show that the proposed method can improve text search efficiently and also improve both semantic features and semantic relevances with good separation capabilities.展开更多
Understanding the underlying goal behind a user's Web query has been proved to be helpful to improve the quality of search. This paper focuses on the problem of automatic identification of query types according to th...Understanding the underlying goal behind a user's Web query has been proved to be helpful to improve the quality of search. This paper focuses on the problem of automatic identification of query types according to the goals. Four novel entropy-based features extracted from anchor data and click-through data are proposed, and a support vector machines (SVM) classifier is used to identify the user's goal based on these features. Experi- mental results show that the proposed entropy-based features are more effective than those reported in previous work. By combin- ing multiple features the goals for more than 97% of the queries studied can be correctly identified. Besides these, this paper reaches the following important conclusions: First, anchor-based features are more effective than click-through-based features; Second, the number of sites is more reliable than the number of links; Third, click-distribution- based features are more effective than session-based ones.展开更多
Since web based GIS processes large size spatial geographic information on internet, we should try to improve the efficiency of spatial data query processing and transmission. This paper presents two efficient metho...Since web based GIS processes large size spatial geographic information on internet, we should try to improve the efficiency of spatial data query processing and transmission. This paper presents two efficient methods for this purpose: division transmission and progressive transmission methods. In division transmission method, a map can be divided into several parts, called “tiles”, and only tiles can be transmitted at the request of a client. In progressive transmission method, a map can be split into several phase views based on the significance of vertices, and a server produces a target object and then transmits it progressively when this spatial object is requested from a client. In order to achieve these methods, the algorithms, “tile division”, “priority order estimation” and the strategies for data transmission are proposed in this paper, respectively. Compared with such traditional methods as “map total transmission” and “layer transmission”, the web based GIS data transmission, proposed in this paper, is advantageous in the increase of the data transmission efficiency by a great margin.展开更多
Cadastral Information System (CIS) is designed for the office automation of cadastral management. With the development of the market economics in China, cadastral management is facing many new problems. The most cruci...Cadastral Information System (CIS) is designed for the office automation of cadastral management. With the development of the market economics in China, cadastral management is facing many new problems. The most crucial one is the temporal problem in cadastral management. That is, CIS must consider both spatial data and temporal data. This paper reviews the situation of the current CIS and provides a method to manage the spatiotemporal data of CIS, and takes the CIS for Guangdong Province as an example to explain how to realize it in practice.展开更多
We focus on security and privacy problems within a cloud database framework,exploiting the DataBase as a Service(DBaaS).In this framework,an information proprietor drives out its information to a cloud database profes...We focus on security and privacy problems within a cloud database framework,exploiting the DataBase as a Service(DBaaS).In this framework,an information proprietor drives out its information to a cloud database professional company.The Data-Owner(DO)encrypts the delicate information before transmission at the cloud database professional company end to offer information security.Current encryption ideas,nonetheless,are just halfway homomorphic as all of them intend to enable an explicit kind of calculation,which is accomplished on scrambled information.These current plans can't be coordinated to solve genuine functional queries that include activities of various types.We propose and evaluate a Verifiable Reliable Secure-DataBase(VRS-DB)framework on shared tables along with many primary operations on scrambled information,which enables information interoperability,and permits an extensive possibility of Structured Query Language(SQL)queries to be prepared by the service provider on the encoded data.We show that our security and privacy idea is protected from two forms of threats and are fundamentally proficient.展开更多
The idea of positional inverted index is exploited for indexing of graph database. The main idea is the use of hashing tables in order to prune a considerable portion of graph database that cannot contain the answer s...The idea of positional inverted index is exploited for indexing of graph database. The main idea is the use of hashing tables in order to prune a considerable portion of graph database that cannot contain the answer set. These tables are implemented using column-based techniques and are used to store graphs of database, frequent sub-graphs and the neighborhood of nodes. In order to exact checking of remaining graphs, the vertex invariant is used for isomorphism test which can be parallel implemented. The results of evaluation indicate that proposed method outperforms existing methods.展开更多
This paper is aimed to develop an algorithm for extracting association rules,called Context-Based Association Rule Mining algorithm(CARM),which can be regarded as an extension of the Context-Based Positive and Negativ...This paper is aimed to develop an algorithm for extracting association rules,called Context-Based Association Rule Mining algorithm(CARM),which can be regarded as an extension of the Context-Based Positive and Negative Association Rule Mining algorithm(CBPNARM).CBPNARM was developed to extract positive and negative association rules from Spatiotemporal(space-time)data only,while the proposed algorithm can be applied to both spatial and non-spatial data.The proposed algorithm is applied to the energy dataset to classify a country’s energy development by uncovering the enthralling interdependencies between the set of variables to get positive and negative associations.Many association rules related to sustainable energy development are extracted by the proposed algorithm that needs to be pruned by some pruning technique.The context,in this paper serves as a pruning measure to extract pertinent association rules from non-spatial data.Conditional Probability Increment Ratio(CPIR)is also added in the proposed algorithm that was not used in CBPNARM.The inclusion of the context variable and CPIR resulted in fewer rules and improved robustness and ease of use.Also,the extraction of a common negative frequent itemset in CARM is different from that of CBPNARM.The rules created by the proposed algorithm are more meaningful,significant,relevant and insightful.The accuracy of the proposed algorithm is compared with the Apriori,PNARM and CBPNARM algorithms.The results demonstrated enhanced accuracy,relevance and timeliness.展开更多
This paper designs and develops a framework on a distributed computing platform for massive multi-source spatial data using a column-oriented database(HBase).This platform consists of four layers including ETL(extract...This paper designs and develops a framework on a distributed computing platform for massive multi-source spatial data using a column-oriented database(HBase).This platform consists of four layers including ETL(extraction transformation loading) tier,data processing tier,data storage tier and data display tier,achieving long-term store,real-time analysis and inquiry for massive data.Finally,a real dataset cluster is simulated,which are made up of 39 nodes including 2 master nodes and 37 data nodes,and performing function tests of data importing module and real-time query module,and performance tests of HDFS's I/O,the MapReduce cluster,batch-loading and real-time query of massive data.The test results indicate that this platform achieves high performance in terms of response time and linear scalability.展开更多
HashQuery,a Hash-area-based data dissemination protocol,was designed in wireless sensor networks. Using a Hash function which uses time as the key,both mobile sinks and sensors can determine the same Hash area. The se...HashQuery,a Hash-area-based data dissemination protocol,was designed in wireless sensor networks. Using a Hash function which uses time as the key,both mobile sinks and sensors can determine the same Hash area. The sensors can send the information about the events that they monitor to the Hash area and the mobile sinks need only to query that area instead of flooding among the whole network,and thus much energy can be saved. In addition,the location of the Hash area changes over time so as to balance the energy consumption in the whole network. Theoretical analysis shows that the proposed protocol can be energy-efficient and simulation studies further show that when there are 5 sources and 5 sinks in the network,it can save at least 50% energy compared with the existing two-tier data dissemination(TTDD) protocol,especially in large-scale wireless sensor networks.展开更多
Join operation is a critical problem when dealing with sliding window over data streams. There have been many optimization strategies for sliding window join in the literature, but a simple heuristic is always used fo...Join operation is a critical problem when dealing with sliding window over data streams. There have been many optimization strategies for sliding window join in the literature, but a simple heuristic is always used for selecting the join sequence of many sliding windows, which is ineffectively. The graph-based approach is proposed to process the problem. The sliding window join model is introduced primarily. In this model vertex represent join operator and edge indicated the join relationship among sliding windows. Vertex weight and edge weight represent the cost of join and the reciprocity of join operators respectively. Then good query plan with minimal cost can be found in the model. Thus a complete join algorithm combining setting up model, finding optimal query plan and executing query plan is shown. Experiments show that the graph-based approach is feasible and can work better in above environment.展开更多
A Model, called 'Entity-Roles' is proposed in this paper in which the world of Interest is viewed as some mathematical structure. With respect to this structure, a First order (three-valued) Logic Language is ...A Model, called 'Entity-Roles' is proposed in this paper in which the world of Interest is viewed as some mathematical structure. With respect to this structure, a First order (three-valued) Logic Language is constructured.Any world to be modelled can be logically specified in this Language. The integrity constraints on the database and the deducing rules within the Database world are derived from the proper axioms of the world being modelled.展开更多
This paper proposes a useful web-based system for the management and sharing of electron probe micro-analysis( EPMA)data in geology. A new web-based architecture that integrates the management and sharing functions is...This paper proposes a useful web-based system for the management and sharing of electron probe micro-analysis( EPMA)data in geology. A new web-based architecture that integrates the management and sharing functions is developed and implemented.Earth scientists can utilize this system to not only manage their data,but also easily communicate and share it with other researchers.Data query methods provide the core functionality of the proposed management and sharing modules. The modules in this system have been developed using cloud GIS technologies,which help achieve real-time spatial area retrieval on a map. The system has been tested by approximately 263 users at Jilin University and Beijing SHRIMP Center. A survey was conducted among these users to estimate the usability of the primary functions of the system,and the assessment result is summarized and presented.展开更多
How to process aggregate queries over data streams efficiently and effectively have been becoming hot re search topics in both academic community and industrial community. Aiming at the issues, a novel Linked-tree alg...How to process aggregate queries over data streams efficiently and effectively have been becoming hot re search topics in both academic community and industrial community. Aiming at the issues, a novel Linked-tree algorithm based on sliding window is proposed in this paper. Due to the proposal of concept area, the Linked-tree algorithm reuses many primary results in last window and then avoids lots of unnecessary repeated comparison operations between two successive windows. As a result, execution efficiency of MAX query is improved dramatically. In addition, since the size of memory is relevant to the number of areas but irrelevant to the size of sliding window, memory is economized greatly. The extensive experimental results show that the performance of Linked-tree algorithm has significant improvement gains over the traditional SC (Simple Compared) algorithm and Ranked-tree algorithm.展开更多
基金Supported by the Natural Science Foundation ofJiangsu Province(BG2004034)
文摘Multidimensional data query has been gaining much interest in database research communities in recent years, yet many of the existing studies focus mainly on ten tralized systems. A solution to querying in Peer-to-Peer(P2P) environment was proposed to achieve both low processing cost in terms of the number of peers accessed and search messages and balanced query loads among peers. The system is based on a balanced tree structured P2P network. By partitioning the query space intelligently, the amount of query forwarding is effectively controlled, and the number of peers involved and search messages are also limited. Dynamic load balancing can be achieved during space partitioning and query resolving. Extensive experiments confirm the effectiveness and scalability of our algorithms on P2P networks.
基金Supported by the National High Technology Research and Development Program of China(863 Program 2012AA011004)the National Natural Science Foundation of China(61232002,61202033)Natural Science Foundation of Hubei Province(2011CDB448)
文摘There have been many researches and semantics in answering top-k queries on uncertain data in various applications. However, most of these semantics must consume much of their time in computing position probability. Our approach to support various top-k queries is based on position probability distribution (PPD) sharing. In this paper, a PPD-tree structure and several basic operations on it are proposed to support various top-k queries. In addition, we proposed an approximation method to improve the efficiency of PPD generation. We also verify the effectiveness and efficiency of our approach by both theoretical analysis and experiments.
文摘The performance and reliability of converting natural language into structured query language can be problematic in handling nuances that are prevalent in natural language. Relational databases are not designed to understand language nuance, therefore the question why we must handle nuance has to be asked. This paper is looking at an alternative solution for the conversion of a Natural Language Query into a Structured Query Language (SQL) capable of being used to search a relational database. The process uses the natural language concept, Part of Speech to identify words that can be used to identify database tables and table columns. The use of Open NLP based grammar files, as well as additional configuration files, assist in the translation from natural language to query language. Having identified which tables and which columns contain the pertinent data the next step is to create the SQL statement.
文摘Biomedical questions are usually complex and regard several different life science aspects. Numerous valuable and he- terogeneous data are increasingly available to answer such questions. Yet, they are dispersedly stored and difficult to be queried comprehensively. We created a Genomic and Proteomic Data Warehouse (GPDW) that integrates data provided by some of the main bioinformatics databases. It adopts a modular integrated data schema and several metadata to describe the integrated data, their sources and their location in the GPDW. Here, we present the Web application that we developed to enable any user to easily compose queries, although complex, on all data integrated in the GPDW. It is publicly available at http://www.bioinformatics.dei.polimi.it/GPKB/. Through a visual interface, the user is only required to select the types of data to be included in the query and the conditions on their values to be retrieved. Then, the Web application leverages the metadata and modular schema of the GPDW to automatically compose an efficient SQL query, run it on the GPDW and show the extracted requested data, enriched with links to external data sources. Performed tests demonstrated efficiency and usability of the developed Web application, and showed its and GPDW relevance in supporting answering biomedical questions, also difficult.
文摘This paper presents the semantic analysis of queries written in natural language (French) and dedicated to the object oriented data bases. The studied queries include one or two nominal groups (NG) articulating around a verb. A NG consists of one or several keywords (application dependent noun or value). Simple semantic filters are defined for identifying these keywords which can be of semantic value: class, simple attribute, composed attribute, key value or not key value. Coherence rules and coherence constraints are introduced, to check the validity of the co-occurrence of two consecutive nouns in complex NG. If a query is constituted of a single NG, no further analysis is required. Otherwise, if a query covers two valid NG, it is a subject of studying the semantic coherence of the verb and both NG which are attached to it.
基金Weaponry Equipment Pre-Research Foundation of PLA Equipment Ministry (No. 9140A06050409JB8102)Pre-Research Foundation of PLA University of Science and Technology (No. 2009JSJ11)
文摘To solve the query processing correctness problem for semantic-based relational data integration,the semantics of SAPRQL(simple protocol and RDF query language) queries is defined.In the course of query rewriting,all relative tables are found and decomposed into minimal connectable units.Minimal connectable units are joined according to semantic queries to produce the semantically correct query plans.Algorithms for query rewriting and transforming are presented.Computational complexity of the algorithms is discussed.Under the worst case,the query decomposing algorithm can be finished in O(n2) time and the query rewriting algorithm requires O(nm) time.And the performance of the algorithms is verified by experiments,and experimental results show that when the length of query is less than 8,the query processing algorithms can provide satisfactory performance.
基金The High Technology Research Plan of Jiangsu Prov-ince (No.BG2004034)the Foundation of Graduate Creative Program ofJiangsu Province (No.xm04-36).
文摘A novel data streams partitioning method is proposed to resolve problems of range-aggregation continuous queries over parallel streams for power industry.The first step of this method is to parallel sample the data,which is implemented as an extended reservoir-sampling algorithm.A skip factor based on the change ratio of data-values is introduced to describe the distribution characteristics of data-values adaptively.The second step of this method is to partition the fluxes of data streams averagely,which is implemented with two alternative equal-depth histogram generating algorithms that fit the different cases:one for incremental maintenance based on heuristics and the other for periodical updates to generate an approximate partition vector.The experimental results on actual data prove that the method is efficient,practical and suitable for time-varying data streams processing.
基金Program for New Century Excellent Talents in University(No.NCET-06-0290)the National Natural Science Foundation of China(No.60503036)the Fok Ying Tong Education Foundation Award(No.104027)
文摘In order to improve the quality of web search,a new query expansion method by choosing meaningful structure data from a domain database is proposed.It categories attributes into three different classes,named as concept attribute,context attribute and meaningless attribute,according to their semantic features which are document frequency features and distinguishing capability features.It also defines the semantic relevance between two attributes when they have correlations in the database.Then it proposes trie-bitmap structure and pair pointer tables to implement efficient algorithms for discovering attribute semantic feature and detecting their semantic relevances.By using semantic attributes and their semantic relevances,expansion words can be generated and embedded into a vector space model with interpolation parameters.The experiments use an IMDB movie database and real texts collections to evaluate the proposed method by comparing its performance with a classical vector space model.The results show that the proposed method can improve text search efficiently and also improve both semantic features and semantic relevances with good separation capabilities.
基金the Tianjin Applied Fundamental Research Plan (07JCYBJC14500)
文摘Understanding the underlying goal behind a user's Web query has been proved to be helpful to improve the quality of search. This paper focuses on the problem of automatic identification of query types according to the goals. Four novel entropy-based features extracted from anchor data and click-through data are proposed, and a support vector machines (SVM) classifier is used to identify the user's goal based on these features. Experi- mental results show that the proposed entropy-based features are more effective than those reported in previous work. By combin- ing multiple features the goals for more than 97% of the queries studied can be correctly identified. Besides these, this paper reaches the following important conclusions: First, anchor-based features are more effective than click-through-based features; Second, the number of sites is more reliable than the number of links; Third, click-distribution- based features are more effective than session-based ones.
文摘Since web based GIS processes large size spatial geographic information on internet, we should try to improve the efficiency of spatial data query processing and transmission. This paper presents two efficient methods for this purpose: division transmission and progressive transmission methods. In division transmission method, a map can be divided into several parts, called “tiles”, and only tiles can be transmitted at the request of a client. In progressive transmission method, a map can be split into several phase views based on the significance of vertices, and a server produces a target object and then transmits it progressively when this spatial object is requested from a client. In order to achieve these methods, the algorithms, “tile division”, “priority order estimation” and the strategies for data transmission are proposed in this paper, respectively. Compared with such traditional methods as “map total transmission” and “layer transmission”, the web based GIS data transmission, proposed in this paper, is advantageous in the increase of the data transmission efficiency by a great margin.
文摘Cadastral Information System (CIS) is designed for the office automation of cadastral management. With the development of the market economics in China, cadastral management is facing many new problems. The most crucial one is the temporal problem in cadastral management. That is, CIS must consider both spatial data and temporal data. This paper reviews the situation of the current CIS and provides a method to manage the spatiotemporal data of CIS, and takes the CIS for Guangdong Province as an example to explain how to realize it in practice.
文摘We focus on security and privacy problems within a cloud database framework,exploiting the DataBase as a Service(DBaaS).In this framework,an information proprietor drives out its information to a cloud database professional company.The Data-Owner(DO)encrypts the delicate information before transmission at the cloud database professional company end to offer information security.Current encryption ideas,nonetheless,are just halfway homomorphic as all of them intend to enable an explicit kind of calculation,which is accomplished on scrambled information.These current plans can't be coordinated to solve genuine functional queries that include activities of various types.We propose and evaluate a Verifiable Reliable Secure-DataBase(VRS-DB)framework on shared tables along with many primary operations on scrambled information,which enables information interoperability,and permits an extensive possibility of Structured Query Language(SQL)queries to be prepared by the service provider on the encoded data.We show that our security and privacy idea is protected from two forms of threats and are fundamentally proficient.
文摘The idea of positional inverted index is exploited for indexing of graph database. The main idea is the use of hashing tables in order to prune a considerable portion of graph database that cannot contain the answer set. These tables are implemented using column-based techniques and are used to store graphs of database, frequent sub-graphs and the neighborhood of nodes. In order to exact checking of remaining graphs, the vertex invariant is used for isomorphism test which can be parallel implemented. The results of evaluation indicate that proposed method outperforms existing methods.
文摘This paper is aimed to develop an algorithm for extracting association rules,called Context-Based Association Rule Mining algorithm(CARM),which can be regarded as an extension of the Context-Based Positive and Negative Association Rule Mining algorithm(CBPNARM).CBPNARM was developed to extract positive and negative association rules from Spatiotemporal(space-time)data only,while the proposed algorithm can be applied to both spatial and non-spatial data.The proposed algorithm is applied to the energy dataset to classify a country’s energy development by uncovering the enthralling interdependencies between the set of variables to get positive and negative associations.Many association rules related to sustainable energy development are extracted by the proposed algorithm that needs to be pruned by some pruning technique.The context,in this paper serves as a pruning measure to extract pertinent association rules from non-spatial data.Conditional Probability Increment Ratio(CPIR)is also added in the proposed algorithm that was not used in CBPNARM.The inclusion of the context variable and CPIR resulted in fewer rules and improved robustness and ease of use.Also,the extraction of a common negative frequent itemset in CARM is different from that of CBPNARM.The rules created by the proposed algorithm are more meaningful,significant,relevant and insightful.The accuracy of the proposed algorithm is compared with the Apriori,PNARM and CBPNARM algorithms.The results demonstrated enhanced accuracy,relevance and timeliness.
基金Supported by the National Science and Technology Support Project(No.2012BAH01F02)from Ministry of Science and Technology of Chinathe Director Fund(No.IS201116002)from Institute of Seismology,CEA
文摘This paper designs and develops a framework on a distributed computing platform for massive multi-source spatial data using a column-oriented database(HBase).This platform consists of four layers including ETL(extraction transformation loading) tier,data processing tier,data storage tier and data display tier,achieving long-term store,real-time analysis and inquiry for massive data.Finally,a real dataset cluster is simulated,which are made up of 39 nodes including 2 master nodes and 37 data nodes,and performing function tests of data importing module and real-time query module,and performance tests of HDFS's I/O,the MapReduce cluster,batch-loading and real-time query of massive data.The test results indicate that this platform achieves high performance in terms of response time and linear scalability.
基金Project(07JJ1010) supported by Hunan Provincial Natural Science Foundation of ChinaProjects(2006AA01Z202, 2006AA01Z199) supported by the National High-Tech Research and Development Program of China+2 种基金Project(7002102) supported by the City University of Hong Kong, Strategic Research Grant (SRG)Project(IRT-0661) supported by the Program for Changjiang Scholars and Innovative Research Team in UniversityProject(NCET-06-0686) supported by the Program for New Century Excellent Talents in University
文摘HashQuery,a Hash-area-based data dissemination protocol,was designed in wireless sensor networks. Using a Hash function which uses time as the key,both mobile sinks and sensors can determine the same Hash area. The sensors can send the information about the events that they monitor to the Hash area and the mobile sinks need only to query that area instead of flooding among the whole network,and thus much energy can be saved. In addition,the location of the Hash area changes over time so as to balance the energy consumption in the whole network. Theoretical analysis shows that the proposed protocol can be energy-efficient and simulation studies further show that when there are 5 sources and 5 sinks in the network,it can save at least 50% energy compared with the existing two-tier data dissemination(TTDD) protocol,especially in large-scale wireless sensor networks.
文摘Join operation is a critical problem when dealing with sliding window over data streams. There have been many optimization strategies for sliding window join in the literature, but a simple heuristic is always used for selecting the join sequence of many sliding windows, which is ineffectively. The graph-based approach is proposed to process the problem. The sliding window join model is introduced primarily. In this model vertex represent join operator and edge indicated the join relationship among sliding windows. Vertex weight and edge weight represent the cost of join and the reciprocity of join operators respectively. Then good query plan with minimal cost can be found in the model. Thus a complete join algorithm combining setting up model, finding optimal query plan and executing query plan is shown. Experiments show that the graph-based approach is feasible and can work better in above environment.
文摘A Model, called 'Entity-Roles' is proposed in this paper in which the world of Interest is viewed as some mathematical structure. With respect to this structure, a First order (three-valued) Logic Language is constructured.Any world to be modelled can be logically specified in this Language. The integrity constraints on the database and the deducing rules within the Database world are derived from the proper axioms of the world being modelled.
基金National Major Scientific Instruments and Equipment Development Special Funds,China(No.2016YFF0103303)National Science and Technology Support Program,China(No.2014BAK02B03)
文摘This paper proposes a useful web-based system for the management and sharing of electron probe micro-analysis( EPMA)data in geology. A new web-based architecture that integrates the management and sharing functions is developed and implemented.Earth scientists can utilize this system to not only manage their data,but also easily communicate and share it with other researchers.Data query methods provide the core functionality of the proposed management and sharing modules. The modules in this system have been developed using cloud GIS technologies,which help achieve real-time spatial area retrieval on a map. The system has been tested by approximately 263 users at Jilin University and Beijing SHRIMP Center. A survey was conducted among these users to estimate the usability of the primary functions of the system,and the assessment result is summarized and presented.
基金Supported by the National Natural Science Foun-dation of China (60573089) the National 985 Project Fundation(985-2-DB-Y01)
文摘How to process aggregate queries over data streams efficiently and effectively have been becoming hot re search topics in both academic community and industrial community. Aiming at the issues, a novel Linked-tree algorithm based on sliding window is proposed in this paper. Due to the proposal of concept area, the Linked-tree algorithm reuses many primary results in last window and then avoids lots of unnecessary repeated comparison operations between two successive windows. As a result, execution efficiency of MAX query is improved dramatically. In addition, since the size of memory is relevant to the number of areas but irrelevant to the size of sliding window, memory is economized greatly. The extensive experimental results show that the performance of Linked-tree algorithm has significant improvement gains over the traditional SC (Simple Compared) algorithm and Ranked-tree algorithm.