The 6th generation mobile networks(6G)network is a kind of multi-network interconnection and multi-scenario coexistence network,where multiple network domains break the original fixed boundaries to form connections an...The 6th generation mobile networks(6G)network is a kind of multi-network interconnection and multi-scenario coexistence network,where multiple network domains break the original fixed boundaries to form connections and convergence.In this paper,with the optimization objective of maximizing network utility while ensuring flows performance-centric weighted fairness,this paper designs a reinforcement learning-based cloud-edge autonomous multi-domain data center network architecture that achieves single-domain autonomy and multi-domain collaboration.Due to the conflict between the utility of different flows,the bandwidth fairness allocation problem for various types of flows is formulated by considering different defined reward functions.Regarding the tradeoff between fairness and utility,this paper deals with the corresponding reward functions for the cases where the flows undergo abrupt changes and smooth changes in the flows.In addition,to accommodate the Quality of Service(QoS)requirements for multiple types of flows,this paper proposes a multi-domain autonomous routing algorithm called LSTM+MADDPG.Introducing a Long Short-Term Memory(LSTM)layer in the actor and critic networks,more information about temporal continuity is added,further enhancing the adaptive ability changes in the dynamic network environment.The LSTM+MADDPG algorithm is compared with the latest reinforcement learning algorithm by conducting experiments on real network topology and traffic traces,and the experimental results show that LSTM+MADDPG improves the delay convergence speed by 14.6%and delays the start moment of packet loss by 18.2%compared with other algorithms.展开更多
A data lake(DL),abbreviated as DL,denotes a vast reservoir or repository of data.It accumulates substantial volumes of data and employs advanced analytics to correlate data from diverse origins containing various form...A data lake(DL),abbreviated as DL,denotes a vast reservoir or repository of data.It accumulates substantial volumes of data and employs advanced analytics to correlate data from diverse origins containing various forms of semi-structured,structured,and unstructured information.These systems use a flat architecture and run different types of data analytics.NoSQL databases are nontabular and store data in a different manner than the relational table.NoSQL databases come in various forms,including key-value pairs,documents,wide columns,and graphs,each based on its data model.They offer simpler scalability and generally outperform traditional relational databases.While NoSQL databases can store diverse data types,they lack full support for atomicity,consistency,isolation,and durability features found in relational databases.Consequently,employing machine learning approaches becomes necessary to categorize complex structured query language(SQL)queries.Results indicate that the most frequently used automatic classification technique in processing SQL queries on NoSQL databases is machine learning-based classification.Overall,this study provides an overview of the automatic classification techniques used in processing SQL queries on NoSQL databases.Understanding these techniques can aid in the development of effective and efficient NoSQL database applications.展开更多
To solve the query processing correctness problem for semantic-based relational data integration,the semantics of SAPRQL(simple protocol and RDF query language) queries is defined.In the course of query rewriting,al...To solve the query processing correctness problem for semantic-based relational data integration,the semantics of SAPRQL(simple protocol and RDF query language) queries is defined.In the course of query rewriting,all relative tables are found and decomposed into minimal connectable units.Minimal connectable units are joined according to semantic queries to produce the semantically correct query plans.Algorithms for query rewriting and transforming are presented.Computational complexity of the algorithms is discussed.Under the worst case,the query decomposing algorithm can be finished in O(n2) time and the query rewriting algorithm requires O(nm) time.And the performance of the algorithms is verified by experiments,and experimental results show that when the length of query is less than 8,the query processing algorithms can provide satisfactory performance.展开更多
New and emerging use cases, such as the interconnection of geographically distributed data centers(DCs), are drawing attention to the requirement for dynamic end-to-end service provisioning, spanning multiple and hete...New and emerging use cases, such as the interconnection of geographically distributed data centers(DCs), are drawing attention to the requirement for dynamic end-to-end service provisioning, spanning multiple and heterogeneous optical network domains. This heterogeneity is, not only due to the diverse data transmission and switching technologies, but also due to the different options of control plane techniques. In light of this, the problem of heterogeneous control plane interworking needs to be solved, and in particular, the solution must address the specific issues of multi-domain networks, such as limited domain topology visibility, given the scalability and confidentiality constraints. In this article, some of the recent activities regarding the Software-Defined Networking(SDN) orchestration are reviewed to address such a multi-domain control plane interworking problem. Specifically, three different models, including the single SDN controller model, multiple SDN controllers in mesh, and multiple SDN controllers in a hierarchical setting, are presented for the DC interconnection network with multiple SDN/Open Flow domains or multiple Open Flow/Generalized Multi-Protocol Label Switching( GMPLS) heterogeneous domains. I n addition, two concrete implementations of the orchestration architectures are detailed, showing the overall feasibility and procedures of SDN orchestration for the end-to-endservice provisioning in multi-domain data center optical networks.展开更多
Virtual data center is a new form of cloud computing concept applied to data center. As one of the most important challenges, virtual data center embedding problem has attracted much attention from researchers. In dat...Virtual data center is a new form of cloud computing concept applied to data center. As one of the most important challenges, virtual data center embedding problem has attracted much attention from researchers. In data centers, energy issue is very important for the reality that data center energy consumption has increased by dozens of times in the last decade. In this paper, we are concerned about the cost-aware multi-domain virtual data center embedding problem. In order to solve this problem, this paper first addresses the energy consumption model. The model includes the energy consumption model of the virtual machine node and the virtual switch node, to quantify the energy consumption in the virtual data center embedding process. Based on the energy consumption model above, this paper presents a heuristic algorithm for cost-aware multi-domain virtual data center embedding. The algorithm consists of two steps: inter-domain embedding and intra-domain embedding. Inter-domain virtual data center embedding refers to dividing virtual data center requests into several slices to select the appropriate single data center. Intra-domain virtual data center refers to embedding virtual data center requests in each data center. We first propose an inter-domain virtual data center embedding algorithm based on label propagation to select the appropriate single data center. We then propose a cost-aware virtual data center embedding algorithm to perform the intra-domain data center embedding. Extensive simulation results show that our proposed algorithm in this paper can effectively reduce the energy consumption while ensuring the success ratio of embedding.展开更多
This paper designs and develops a framework on a distributed computing platform for massive multi-source spatial data using a column-oriented database(HBase).This platform consists of four layers including ETL(extract...This paper designs and develops a framework on a distributed computing platform for massive multi-source spatial data using a column-oriented database(HBase).This platform consists of four layers including ETL(extraction transformation loading) tier,data processing tier,data storage tier and data display tier,achieving long-term store,real-time analysis and inquiry for massive data.Finally,a real dataset cluster is simulated,which are made up of 39 nodes including 2 master nodes and 37 data nodes,and performing function tests of data importing module and real-time query module,and performance tests of HDFS's I/O,the MapReduce cluster,batch-loading and real-time query of massive data.The test results indicate that this platform achieves high performance in terms of response time and linear scalability.展开更多
Multidimensional data query has been gaining much interest in database research communities in recent years, yet many of the existing studies focus mainly on ten tralized systems. A solution to querying in Peer-to-Pee...Multidimensional data query has been gaining much interest in database research communities in recent years, yet many of the existing studies focus mainly on ten tralized systems. A solution to querying in Peer-to-Peer(P2P) environment was proposed to achieve both low processing cost in terms of the number of peers accessed and search messages and balanced query loads among peers. The system is based on a balanced tree structured P2P network. By partitioning the query space intelligently, the amount of query forwarding is effectively controlled, and the number of peers involved and search messages are also limited. Dynamic load balancing can be achieved during space partitioning and query resolving. Extensive experiments confirm the effectiveness and scalability of our algorithms on P2P networks.展开更多
How to process aggregate queries over data streams efficiently and effectively have been becoming hot re search topics in both academic community and industrial community. Aiming at the issues, a novel Linked-tree alg...How to process aggregate queries over data streams efficiently and effectively have been becoming hot re search topics in both academic community and industrial community. Aiming at the issues, a novel Linked-tree algorithm based on sliding window is proposed in this paper. Due to the proposal of concept area, the Linked-tree algorithm reuses many primary results in last window and then avoids lots of unnecessary repeated comparison operations between two successive windows. As a result, execution efficiency of MAX query is improved dramatically. In addition, since the size of memory is relevant to the number of areas but irrelevant to the size of sliding window, memory is economized greatly. The extensive experimental results show that the performance of Linked-tree algorithm has significant improvement gains over the traditional SC (Simple Compared) algorithm and Ranked-tree algorithm.展开更多
There have been many researches and semantics in answering top-k queries on uncertain data in various applications. However, most of these semantics must consume much of their time in computing position probability. O...There have been many researches and semantics in answering top-k queries on uncertain data in various applications. However, most of these semantics must consume much of their time in computing position probability. Our approach to support various top-k queries is based on position probability distribution (PPD) sharing. In this paper, a PPD-tree structure and several basic operations on it are proposed to support various top-k queries. In addition, we proposed an approximation method to improve the efficiency of PPD generation. We also verify the effectiveness and efficiency of our approach by both theoretical analysis and experiments.展开更多
The performance and reliability of converting natural language into structured query language can be problematic in handling nuances that are prevalent in natural language. Relational databases are not designed to und...The performance and reliability of converting natural language into structured query language can be problematic in handling nuances that are prevalent in natural language. Relational databases are not designed to understand language nuance, therefore the question why we must handle nuance has to be asked. This paper is looking at an alternative solution for the conversion of a Natural Language Query into a Structured Query Language (SQL) capable of being used to search a relational database. The process uses the natural language concept, Part of Speech to identify words that can be used to identify database tables and table columns. The use of Open NLP based grammar files, as well as additional configuration files, assist in the translation from natural language to query language. Having identified which tables and which columns contain the pertinent data the next step is to create the SQL statement.展开更多
The multidimensional analysis engine data management platform is constructed using big data distributed storage and parallel computing,data warehouse modeling technology,realizing the optimal management and instant qu...The multidimensional analysis engine data management platform is constructed using big data distributed storage and parallel computing,data warehouse modeling technology,realizing the optimal management and instant query of distributed oil and gas production dynamic big data.The centralized management and quick response of the production data of more than 36×10^4 oil,gas and water wells is realized.Multidimensional analysis subject model of oil,gas and water well production is built to pretreat the relevant data.At the level of China National Petroleum Corporation(CNPC),the rapid analysis and applications such as oil and gas production tracking,early production warning of key oilfields,analysis of low production wells and long shutdown wells,classification of reservoir development laws have been realized,and the processing time has been shortened from 1 d to 5 s.The basic unit of oil and gas production analysis is refined from oilfield to single well,making the production management more detailed.The process can be traced step by step according to CNPC,oil field company,field,block and single well,and the oil and gas production performance of each unit can be mastered in real time.展开更多
Biomedical questions are usually complex and regard several different life science aspects. Numerous valuable and he- terogeneous data are increasingly available to answer such questions. Yet, they are dispersedly sto...Biomedical questions are usually complex and regard several different life science aspects. Numerous valuable and he- terogeneous data are increasingly available to answer such questions. Yet, they are dispersedly stored and difficult to be queried comprehensively. We created a Genomic and Proteomic Data Warehouse (GPDW) that integrates data provided by some of the main bioinformatics databases. It adopts a modular integrated data schema and several metadata to describe the integrated data, their sources and their location in the GPDW. Here, we present the Web application that we developed to enable any user to easily compose queries, although complex, on all data integrated in the GPDW. It is publicly available at http://www.bioinformatics.dei.polimi.it/GPKB/. Through a visual interface, the user is only required to select the types of data to be included in the query and the conditions on their values to be retrieved. Then, the Web application leverages the metadata and modular schema of the GPDW to automatically compose an efficient SQL query, run it on the GPDW and show the extracted requested data, enriched with links to external data sources. Performed tests demonstrated efficiency and usability of the developed Web application, and showed its and GPDW relevance in supporting answering biomedical questions, also difficult.展开更多
This paper presents the semantic analysis of queries written in natural language (French) and dedicated to the object oriented data bases. The studied queries include one or two nominal groups (NG) articulating around...This paper presents the semantic analysis of queries written in natural language (French) and dedicated to the object oriented data bases. The studied queries include one or two nominal groups (NG) articulating around a verb. A NG consists of one or several keywords (application dependent noun or value). Simple semantic filters are defined for identifying these keywords which can be of semantic value: class, simple attribute, composed attribute, key value or not key value. Coherence rules and coherence constraints are introduced, to check the validity of the co-occurrence of two consecutive nouns in complex NG. If a query is constituted of a single NG, no further analysis is required. Otherwise, if a query covers two valid NG, it is a subject of studying the semantic coherence of the verb and both NG which are attached to it.展开更多
With its untameable and traceable properties,blockchain technology has been widely used in the field of data sharing.How to preserve individual privacy while enabling efficient data queries is one of the primary issue...With its untameable and traceable properties,blockchain technology has been widely used in the field of data sharing.How to preserve individual privacy while enabling efficient data queries is one of the primary issues with secure data sharing.In this paper,we study verifiable keyword frequency(KF)queries with local differential privacy in blockchain.Both the numerical and the keyword attributes are present in data objects;the latter are sensitive and require privacy protection.However,prior studies in blockchain have the problem of trilemma in privacy protection and are unable to handle KF queries.We propose an efficient framework that protects data owners’privacy on keyword attributes while enabling quick and verifiable query processing for KF queries.The framework computes an estimate of a keyword’s frequency and is efficient in query time and verification object(VO)size.A utility-optimized local differential privacy technique is used for privacy protection.The data owner adds noise locally into data based on local differential privacy so that the attacker cannot infer the owner of the keywords while keeping the difference in the probability distribution of the KF within the privacy budget.We propose the VB-cm tree as the authenticated data structure(ADS).The VB-cm tree combines the Verkle tree and the Count-Min sketch(CM-sketch)to lower the VO size and query time.The VB-cm tree uses the vector commitment to verify the query results.The fixed-size CM-sketch,which summarizes the frequency of multiple keywords,is used to estimate the KF via hashing operations.We conduct an extensive evaluation of the proposed framework.The experimental results show that compared to theMerkle B+tree,the query time is reduced by 52.38%,and the VO size is reduced by more than one order of magnitude.展开更多
Query recommendation is an effective method to help users describe their search intentions.In a personalized system,cold-start and the data sparsity were unavoidable,which directly lead to deficient performance of per...Query recommendation is an effective method to help users describe their search intentions.In a personalized system,cold-start and the data sparsity were unavoidable,which directly lead to deficient performance of personalizing.As a significant part of a user’s personal information space,a personal computer owns lots of documents relevant to his or her interest.Therefore,desktop data was introduced to construct a user’s preference model.Furthermore,considering the variety of desktop data,relationship between search task and work task was simultaneously exploited to predict a user’s specific information need.Ten volunteers joined experiments to evaluate the potential of desktop data.A series of experiments were conducted and the results proved that desktop data greatly contributed to providing effective personalized reference words.Besides,the results demonstrated that a user’s long-term interest model performed steadier than work task context,but the most valuable words were the top-3 words extracted from the work context.展开更多
Recently, attention has been focused on spatial query language which is used to query spatial databases. A design of spatial query language has been presented in this paper by extending the standard relational databas...Recently, attention has been focused on spatial query language which is used to query spatial databases. A design of spatial query language has been presented in this paper by extending the standard relational database query language SQL. It recognizes the significantly different requirements of spatial data handling and overcomes the inherent problems of the application of conventional database query languages. This design is based on an extended spatial data model, including the spatial data types and the spatial operators on them. The processing and optimization of spatial queries have also been discussed in this design. In the end, an implementation of this design is given in a spatial query subsystem.展开更多
文摘The 6th generation mobile networks(6G)network is a kind of multi-network interconnection and multi-scenario coexistence network,where multiple network domains break the original fixed boundaries to form connections and convergence.In this paper,with the optimization objective of maximizing network utility while ensuring flows performance-centric weighted fairness,this paper designs a reinforcement learning-based cloud-edge autonomous multi-domain data center network architecture that achieves single-domain autonomy and multi-domain collaboration.Due to the conflict between the utility of different flows,the bandwidth fairness allocation problem for various types of flows is formulated by considering different defined reward functions.Regarding the tradeoff between fairness and utility,this paper deals with the corresponding reward functions for the cases where the flows undergo abrupt changes and smooth changes in the flows.In addition,to accommodate the Quality of Service(QoS)requirements for multiple types of flows,this paper proposes a multi-domain autonomous routing algorithm called LSTM+MADDPG.Introducing a Long Short-Term Memory(LSTM)layer in the actor and critic networks,more information about temporal continuity is added,further enhancing the adaptive ability changes in the dynamic network environment.The LSTM+MADDPG algorithm is compared with the latest reinforcement learning algorithm by conducting experiments on real network topology and traffic traces,and the experimental results show that LSTM+MADDPG improves the delay convergence speed by 14.6%and delays the start moment of packet loss by 18.2%compared with other algorithms.
基金supported by the Student Scheme provided by Universiti Kebangsaan Malaysia with the Code TAP-20558.
文摘A data lake(DL),abbreviated as DL,denotes a vast reservoir or repository of data.It accumulates substantial volumes of data and employs advanced analytics to correlate data from diverse origins containing various forms of semi-structured,structured,and unstructured information.These systems use a flat architecture and run different types of data analytics.NoSQL databases are nontabular and store data in a different manner than the relational table.NoSQL databases come in various forms,including key-value pairs,documents,wide columns,and graphs,each based on its data model.They offer simpler scalability and generally outperform traditional relational databases.While NoSQL databases can store diverse data types,they lack full support for atomicity,consistency,isolation,and durability features found in relational databases.Consequently,employing machine learning approaches becomes necessary to categorize complex structured query language(SQL)queries.Results indicate that the most frequently used automatic classification technique in processing SQL queries on NoSQL databases is machine learning-based classification.Overall,this study provides an overview of the automatic classification techniques used in processing SQL queries on NoSQL databases.Understanding these techniques can aid in the development of effective and efficient NoSQL database applications.
基金Weaponry Equipment Pre-Research Foundation of PLA Equipment Ministry (No. 9140A06050409JB8102)Pre-Research Foundation of PLA University of Science and Technology (No. 2009JSJ11)
文摘To solve the query processing correctness problem for semantic-based relational data integration,the semantics of SAPRQL(simple protocol and RDF query language) queries is defined.In the course of query rewriting,all relative tables are found and decomposed into minimal connectable units.Minimal connectable units are joined according to semantic queries to produce the semantically correct query plans.Algorithms for query rewriting and transforming are presented.Computational complexity of the algorithms is discussed.Under the worst case,the query decomposing algorithm can be finished in O(n2) time and the query rewriting algorithm requires O(nm) time.And the performance of the algorithms is verified by experiments,and experimental results show that when the length of query is less than 8,the query processing algorithms can provide satisfactory performance.
文摘New and emerging use cases, such as the interconnection of geographically distributed data centers(DCs), are drawing attention to the requirement for dynamic end-to-end service provisioning, spanning multiple and heterogeneous optical network domains. This heterogeneity is, not only due to the diverse data transmission and switching technologies, but also due to the different options of control plane techniques. In light of this, the problem of heterogeneous control plane interworking needs to be solved, and in particular, the solution must address the specific issues of multi-domain networks, such as limited domain topology visibility, given the scalability and confidentiality constraints. In this article, some of the recent activities regarding the Software-Defined Networking(SDN) orchestration are reviewed to address such a multi-domain control plane interworking problem. Specifically, three different models, including the single SDN controller model, multiple SDN controllers in mesh, and multiple SDN controllers in a hierarchical setting, are presented for the DC interconnection network with multiple SDN/Open Flow domains or multiple Open Flow/Generalized Multi-Protocol Label Switching( GMPLS) heterogeneous domains. I n addition, two concrete implementations of the orchestration architectures are detailed, showing the overall feasibility and procedures of SDN orchestration for the end-to-endservice provisioning in multi-domain data center optical networks.
基金supported in part by the following funding agencies of China:National Natural Science Foundation under Grant 61602050 and U1534201National Key Research and Development Program of China under Grant 2016QY01W0200
文摘Virtual data center is a new form of cloud computing concept applied to data center. As one of the most important challenges, virtual data center embedding problem has attracted much attention from researchers. In data centers, energy issue is very important for the reality that data center energy consumption has increased by dozens of times in the last decade. In this paper, we are concerned about the cost-aware multi-domain virtual data center embedding problem. In order to solve this problem, this paper first addresses the energy consumption model. The model includes the energy consumption model of the virtual machine node and the virtual switch node, to quantify the energy consumption in the virtual data center embedding process. Based on the energy consumption model above, this paper presents a heuristic algorithm for cost-aware multi-domain virtual data center embedding. The algorithm consists of two steps: inter-domain embedding and intra-domain embedding. Inter-domain virtual data center embedding refers to dividing virtual data center requests into several slices to select the appropriate single data center. Intra-domain virtual data center refers to embedding virtual data center requests in each data center. We first propose an inter-domain virtual data center embedding algorithm based on label propagation to select the appropriate single data center. We then propose a cost-aware virtual data center embedding algorithm to perform the intra-domain data center embedding. Extensive simulation results show that our proposed algorithm in this paper can effectively reduce the energy consumption while ensuring the success ratio of embedding.
基金Supported by the National Science and Technology Support Project(No.2012BAH01F02)from Ministry of Science and Technology of Chinathe Director Fund(No.IS201116002)from Institute of Seismology,CEA
文摘This paper designs and develops a framework on a distributed computing platform for massive multi-source spatial data using a column-oriented database(HBase).This platform consists of four layers including ETL(extraction transformation loading) tier,data processing tier,data storage tier and data display tier,achieving long-term store,real-time analysis and inquiry for massive data.Finally,a real dataset cluster is simulated,which are made up of 39 nodes including 2 master nodes and 37 data nodes,and performing function tests of data importing module and real-time query module,and performance tests of HDFS's I/O,the MapReduce cluster,batch-loading and real-time query of massive data.The test results indicate that this platform achieves high performance in terms of response time and linear scalability.
基金Supported by the Natural Science Foundation ofJiangsu Province(BG2004034)
文摘Multidimensional data query has been gaining much interest in database research communities in recent years, yet many of the existing studies focus mainly on ten tralized systems. A solution to querying in Peer-to-Peer(P2P) environment was proposed to achieve both low processing cost in terms of the number of peers accessed and search messages and balanced query loads among peers. The system is based on a balanced tree structured P2P network. By partitioning the query space intelligently, the amount of query forwarding is effectively controlled, and the number of peers involved and search messages are also limited. Dynamic load balancing can be achieved during space partitioning and query resolving. Extensive experiments confirm the effectiveness and scalability of our algorithms on P2P networks.
基金Supported by the National Natural Science Foun-dation of China (60573089) the National 985 Project Fundation(985-2-DB-Y01)
文摘How to process aggregate queries over data streams efficiently and effectively have been becoming hot re search topics in both academic community and industrial community. Aiming at the issues, a novel Linked-tree algorithm based on sliding window is proposed in this paper. Due to the proposal of concept area, the Linked-tree algorithm reuses many primary results in last window and then avoids lots of unnecessary repeated comparison operations between two successive windows. As a result, execution efficiency of MAX query is improved dramatically. In addition, since the size of memory is relevant to the number of areas but irrelevant to the size of sliding window, memory is economized greatly. The extensive experimental results show that the performance of Linked-tree algorithm has significant improvement gains over the traditional SC (Simple Compared) algorithm and Ranked-tree algorithm.
基金Supported by the National High Technology Research and Development Program of China(863 Program 2012AA011004)the National Natural Science Foundation of China(61232002,61202033)Natural Science Foundation of Hubei Province(2011CDB448)
文摘There have been many researches and semantics in answering top-k queries on uncertain data in various applications. However, most of these semantics must consume much of their time in computing position probability. Our approach to support various top-k queries is based on position probability distribution (PPD) sharing. In this paper, a PPD-tree structure and several basic operations on it are proposed to support various top-k queries. In addition, we proposed an approximation method to improve the efficiency of PPD generation. We also verify the effectiveness and efficiency of our approach by both theoretical analysis and experiments.
文摘The performance and reliability of converting natural language into structured query language can be problematic in handling nuances that are prevalent in natural language. Relational databases are not designed to understand language nuance, therefore the question why we must handle nuance has to be asked. This paper is looking at an alternative solution for the conversion of a Natural Language Query into a Structured Query Language (SQL) capable of being used to search a relational database. The process uses the natural language concept, Part of Speech to identify words that can be used to identify database tables and table columns. The use of Open NLP based grammar files, as well as additional configuration files, assist in the translation from natural language to query language. Having identified which tables and which columns contain the pertinent data the next step is to create the SQL statement.
基金Supported by the China National Science and Technology Major Project(2016ZX05016-006).
文摘The multidimensional analysis engine data management platform is constructed using big data distributed storage and parallel computing,data warehouse modeling technology,realizing the optimal management and instant query of distributed oil and gas production dynamic big data.The centralized management and quick response of the production data of more than 36×10^4 oil,gas and water wells is realized.Multidimensional analysis subject model of oil,gas and water well production is built to pretreat the relevant data.At the level of China National Petroleum Corporation(CNPC),the rapid analysis and applications such as oil and gas production tracking,early production warning of key oilfields,analysis of low production wells and long shutdown wells,classification of reservoir development laws have been realized,and the processing time has been shortened from 1 d to 5 s.The basic unit of oil and gas production analysis is refined from oilfield to single well,making the production management more detailed.The process can be traced step by step according to CNPC,oil field company,field,block and single well,and the oil and gas production performance of each unit can be mastered in real time.
文摘Biomedical questions are usually complex and regard several different life science aspects. Numerous valuable and he- terogeneous data are increasingly available to answer such questions. Yet, they are dispersedly stored and difficult to be queried comprehensively. We created a Genomic and Proteomic Data Warehouse (GPDW) that integrates data provided by some of the main bioinformatics databases. It adopts a modular integrated data schema and several metadata to describe the integrated data, their sources and their location in the GPDW. Here, we present the Web application that we developed to enable any user to easily compose queries, although complex, on all data integrated in the GPDW. It is publicly available at http://www.bioinformatics.dei.polimi.it/GPKB/. Through a visual interface, the user is only required to select the types of data to be included in the query and the conditions on their values to be retrieved. Then, the Web application leverages the metadata and modular schema of the GPDW to automatically compose an efficient SQL query, run it on the GPDW and show the extracted requested data, enriched with links to external data sources. Performed tests demonstrated efficiency and usability of the developed Web application, and showed its and GPDW relevance in supporting answering biomedical questions, also difficult.
文摘This paper presents the semantic analysis of queries written in natural language (French) and dedicated to the object oriented data bases. The studied queries include one or two nominal groups (NG) articulating around a verb. A NG consists of one or several keywords (application dependent noun or value). Simple semantic filters are defined for identifying these keywords which can be of semantic value: class, simple attribute, composed attribute, key value or not key value. Coherence rules and coherence constraints are introduced, to check the validity of the co-occurrence of two consecutive nouns in complex NG. If a query is constituted of a single NG, no further analysis is required. Otherwise, if a query covers two valid NG, it is a subject of studying the semantic coherence of the verb and both NG which are attached to it.
文摘With its untameable and traceable properties,blockchain technology has been widely used in the field of data sharing.How to preserve individual privacy while enabling efficient data queries is one of the primary issues with secure data sharing.In this paper,we study verifiable keyword frequency(KF)queries with local differential privacy in blockchain.Both the numerical and the keyword attributes are present in data objects;the latter are sensitive and require privacy protection.However,prior studies in blockchain have the problem of trilemma in privacy protection and are unable to handle KF queries.We propose an efficient framework that protects data owners’privacy on keyword attributes while enabling quick and verifiable query processing for KF queries.The framework computes an estimate of a keyword’s frequency and is efficient in query time and verification object(VO)size.A utility-optimized local differential privacy technique is used for privacy protection.The data owner adds noise locally into data based on local differential privacy so that the attacker cannot infer the owner of the keywords while keeping the difference in the probability distribution of the KF within the privacy budget.We propose the VB-cm tree as the authenticated data structure(ADS).The VB-cm tree combines the Verkle tree and the Count-Min sketch(CM-sketch)to lower the VO size and query time.The VB-cm tree uses the vector commitment to verify the query results.The fixed-size CM-sketch,which summarizes the frequency of multiple keywords,is used to estimate the KF via hashing operations.We conduct an extensive evaluation of the proposed framework.The experimental results show that compared to theMerkle B+tree,the query time is reduced by 52.38%,and the VO size is reduced by more than one order of magnitude.
文摘Query recommendation is an effective method to help users describe their search intentions.In a personalized system,cold-start and the data sparsity were unavoidable,which directly lead to deficient performance of personalizing.As a significant part of a user’s personal information space,a personal computer owns lots of documents relevant to his or her interest.Therefore,desktop data was introduced to construct a user’s preference model.Furthermore,considering the variety of desktop data,relationship between search task and work task was simultaneously exploited to predict a user’s specific information need.Ten volunteers joined experiments to evaluate the potential of desktop data.A series of experiments were conducted and the results proved that desktop data greatly contributed to providing effective personalized reference words.Besides,the results demonstrated that a user’s long-term interest model performed steadier than work task context,but the most valuable words were the top-3 words extracted from the work context.
基金This work is supported by the National High Technology Research and Development Program ofChina(2 0 0 2 AA135 2 30 ) and the Major Project of National Natural Science Foundation of Beijing(4 0 110 0 2 ) .
文摘Recently, attention has been focused on spatial query language which is used to query spatial databases. A design of spatial query language has been presented in this paper by extending the standard relational database query language SQL. It recognizes the significantly different requirements of spatial data handling and overcomes the inherent problems of the application of conventional database query languages. This design is based on an extended spatial data model, including the spatial data types and the spatial operators on them. The processing and optimization of spatial queries have also been discussed in this design. In the end, an implementation of this design is given in a spatial query subsystem.