Funding: Supported by the National Natural Science Foundation of China (Nos. 60673139, 60473073, 60573090).
Abstract: This paper presents a novel benefit-based query processing strategy for efficient query routing. Using a DHT as the overlay network, it first applies Nash equilibrium to construct an optimal peer group based on keyword correlations and on the coverage and overlap of peers, which reduces time cost. It then presents a two-layer query processing architecture that uses Bloom filters as compact representations to reduce bandwidth consumption. Extensive experiments on a real-world dataset demonstrate that the approach markedly decreases processing time while also improving precision and recall.
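The abstract relies on Bloom filters as a compact summary of peer content to save bandwidth. The following minimal Python sketch shows a generic Bloom filter. The bit-array size, the number of hash functions, and the use of salted SHA-256 digests are illustrative assumptions, not details taken from the paper.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # packed bit array, all zeros

    def _positions(self, item):
        # Derive k bit positions from salted SHA-256 digests of the item.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means definitely absent; True means possibly present.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


# Hypothetical usage: a peer summarizes its keyword set and ships only the bit array.
peer_summary = BloomFilter()
for keyword in ["query", "routing", "dht"]:
    peer_summary.add(keyword)
```

Before forwarding a query, a querying peer could call `peer_summary.might_contain("routing")`. A `False` answer rules the peer out without contacting it. A `True` answer may occasionally be a false positive, which is the price paid for the compact representation.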
Abstract: On small devices such as PDAs and mobile phones, formulating relational database queries is not as simple as on conventional devices such as personal computers and laptops. Because these devices have limited size and resources, most current work restricts users to queries predetermined by developers, which limits the devices' ability to support robust queries. This paper therefore proposes a universal-relation-based database query language targeted at small devices. The language lets users formulate relational database queries with minimal query terms. The paper describes the formulation and structure of the language and presents usability test results that support its effectiveness.
Abstract: With the rapid growth of spatial data, POIs (Points of Interest) are becoming ever denser, and the textual description of each spatial point keeps growing. Traditional query methods handle only short text descriptions and single-keyword queries. To address this, the paper proposes an approximate matching algorithm that supports spatial multi-keyword queries. A fuzzy matching algorithm is integrated into it. As a result, the algorithm supports not only multiple POI queries but also tolerance of errors in the query keywords. Simulation results demonstrate that the proposed algorithm improves both query accuracy and query efficiency.
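The following minimal Python sketch makes fault-tolerant multi-keyword matching concrete. A query keyword matches a POI description word when their Levenshtein edit distance is within a threshold, and every query keyword must match. The threshold, the all-keywords rule, and the toy POI records are assumptions for illustration. The paper's own fuzzy matching algorithm and spatial indexing are not reproduced here.

```python
def edit_distance(a, b):
    """Levenshtein distance via a rolling single-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if the characters match)
            ))
        prev = curr
    return prev[-1]


def fuzzy_match(query_keywords, description, max_errors=1):
    """True if every query keyword is within max_errors edits of some description word."""
    words = description.lower().split()
    return all(
        any(edit_distance(kw.lower(), w) <= max_errors for w in words)
        for kw in query_keywords
    )


# Hypothetical POI records: (name, textual description).
pois = [
    ("A", "coffee shop with free wifi"),
    ("B", "noodle restaurant open late"),
]
# The misspelled keyword "cofee" still matches "coffee" (one insertion).
hits = [name for name, text in pois if fuzzy_match(["cofee", "wifi"], text)]
```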
Funding: Supported by the Social Science Planning Foundation of Chongqing (Grant No. 2011QNCB28).
Abstract: Purpose: Existing research on predicting queries with news intent has extracted classification features from external knowledge bases. This paper shows how features extracted from query logs can be used to identify news queries automatically, without any external resources.
Design/methodology/approach: First, we manually labeled 1,220 news queries from Sogou.com. Based on an analysis of these queries, we identified three features of news queries, relating to query content, time of query occurrence, and user click behavior. We then used 12 effective features proposed in the literature as a baseline and conducted experiments with a support vector machine (SVM) classifier. Finally, we compared the impact of the features used in this paper on the identification of news queries.
Findings: Adding the three newly identified features to the baseline features raised the F-score from 0.6414 to 0.8368. Among the new features, the burst point (bst) was the most effective for predicting news queries. In addition, query expression (qes) was more useful than query terms. Among the click-behavior-based features, news URL was the most effective.
Research limitations: Analyses based solely on features extracted from query logs may yield limited results. In addition, the segmentation tool used in this study has been applied more widely to long texts than to short queries.
Practical implications: The research can help general-purpose search engines address search intents related to news events.
Originality/value: Our approach offers a new perspective on recognizing queries with news intent without relying on large news corpora such as blogs or Twitter.
Funding: Funded by the Jilin Provincial Department of Education Scientific Research Project (Project No. JJKH20250872KJ).
Abstract: With the exponential growth of IoT data, the data is commonly outsourced to cloud servers. However, cloud servers are usually operated by third parties, which creates a risk of privacy leakage. Encrypting the data protects it, but at the cost of losing the ability to search it. Searchable Encryption (SE) allows retrieval directly over ciphertext. Traditional searchable encryption schemes suffer from limited functionality, low retrieval efficiency, and inaccurate results. They also rely on centralized cloud servers that are vulnerable and untrustworthy. This paper proposes an efficient searchable encryption scheme on the blockchain that supports fuzzy multi-keyword ranked search. The blockchain stores the index and IPFS stores the encrypted files, both in a distributed manner. The tamper resistance of the distributed ledger ensures the authenticity of the data, and a smart contract performs retrieval to ensure its reliability. Locality-Sensitive Hashing (LSH) is combined with a Bloom Filter (BF) to realize fuzzy multi-keyword retrieval. To measure the relevance between keywords and files, the paper proposes a new weighted statistical algorithm that combines a Regional Weight Score (RWS) with Term Frequency–Inverse Document Frequency (TF-IDF) to rank search results. A balanced binary tree is used to build the index. A traversal strategy tailored to this scheme optimizes the index structure and improves retrieval efficiency. Experimental results show that the scheme is secure and effective in practical applications.
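The ranking step builds on TF-IDF. The Python sketch below scores plaintext documents against a multi-keyword query using one common TF-IDF variant: length-normalized term frequency times log(N/df). The Regional Weight Score, the encryption layer, and the blockchain/IPFS storage described in the abstract are not modeled, and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_rank(docs, query):
    """Rank documents by the summed TF-IDF weight of the query keywords.

    docs maps a document id to its plaintext; query is a list of keywords.
    """
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term at least once.
    df = Counter(term for words in tokenized.values() for term in set(words))
    scores = {}
    for doc_id, words in tokenized.items():
        if not words:  # empty document: nothing to match
            scores[doc_id] = 0.0
            continue
        tf = Counter(words)
        score = 0.0
        for kw in (k.lower() for k in query):
            if df[kw]:  # skip keywords that appear in no document
                # Length-normalized term frequency times inverse document frequency.
                score += (tf[kw] / len(words)) * math.log(n_docs / df[kw])
        scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical plaintext files standing in for the encrypted documents.
docs = {
    "f1": "sensor data from iot sensor",
    "f2": "cloud storage cost",
    "f3": "iot cloud gateway",
}
ranking = tfidf_rank(docs, ["sensor", "iot"])
```

Here "f1" ranks first because it mentions both keywords and "sensor" is rare across the collection. "f3" follows on its single occurrence of the more common "iot".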
Funding: Supported by the Special Project of Henan Provincial Key Research, Development and Promotion (Key Science and Technology Program) under Grant 252102210154, and in part by the National Natural Science Foundation of China under Grant 62403437.
Abstract: Ride-hailing (e.g., DiDi and Uber) has become an important mode of modern urban mobility. To improve the utilization of ride-hailing vehicles, industry has recently proposed a new query type called Approachable k-nearest neighbor (A-kNN). Unlike traditional kNN queries, A-kNN considers not only road-network distance but also vehicle availability. Even a vehicle carrying passengers can be a dispatch candidate if its destination is near the requester's location. Owing to its structure, the V-Tree-based query method can efficiently find the k nearest moving objects in a road network, and it is currently a popular solution in ride-hailing services. However, when the vertices to be queried are close in the graph but far apart in the index, the V-Tree-based method must traverse many irrelevant subgraphs, which makes it inefficient for A-kNN queries. To address this, we optimize the V-Tree-based method and propose a new index structure, the Path-Accelerated V-Tree (PAV-Tree), which improves query performance by introducing shortcuts. Building on this index, we introduce a query optimization algorithm, PAV-A-kNN, designed specifically to process A-kNN queries efficiently. Experimental results show that PAV-A-kNN answers queries 2.2 to 15 times faster than baseline methods, with microsecond-level latency.
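The following Python sketch illustrates the A-kNN idea only, not the PAV-Tree index. It ranks vehicles on a small undirected road graph by an "approachable" cost computed with Dijkstra's algorithm. The scoring rule is an assumption made for illustration:

- An idle vehicle costs its network distance to the requester.
- An occupied vehicle is considered only if its drop-off point lies within a hypothetical `radius` of the requester. Its cost is the distance to the drop-off plus the distance from there to the requester.

```python
import heapq

INF = float("inf")


def dijkstra(graph, source):
    """Shortest network distances from source; graph maps node -> [(neighbor, weight), ...]."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, INF):
            continue  # stale heap entry
        for nbr, w in graph.get(node, []):
            nd = d + w
            if nd < dist.get(nbr, INF):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


def approachable_knn(graph, vehicles, requester, k, radius):
    """vehicles: (vehicle_id, location, destination), destination None for idle vehicles."""
    # The graph is undirected, so distances from the requester equal distances to it.
    to_req = dijkstra(graph, requester)
    candidates = []
    for vid, loc, dest in vehicles:
        if dest is None:
            # Idle vehicle: drive straight to the requester.
            cost = to_req.get(loc, INF)
        elif to_req.get(dest, INF) <= radius:
            # Occupied vehicle whose drop-off is near the requester: finish the trip, then approach.
            cost = dijkstra(graph, loc).get(dest, INF) + to_req[dest]
        else:
            continue  # occupied vehicle heading elsewhere: not approachable
        if cost < INF:
            candidates.append((cost, vid))
    return [vid for _, vid in sorted(candidates)[:k]]


def undirected(edges):
    """Build an adjacency list from (u, v, weight) triples."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, []).append((v, w))
        graph.setdefault(v, []).append((u, w))
    return graph


# Hypothetical road network (R is the requester) and fleet.
road = undirected([("A", "B", 2), ("B", "C", 2), ("C", "D", 2), ("D", "R", 1), ("B", "R", 5)])
fleet = [("idle_far", "A", None), ("busy_near", "C", "D"), ("busy_away", "R", "A")]
nearest = approachable_knn(road, fleet, "R", k=2, radius=1.5)
```

In this toy network the occupied vehicle "busy_near" (cost 3) outranks the idle but distant "idle_far" (cost 7). This is the situation A-kNN is meant to exploit. Running one Dijkstra search per occupied vehicle is exactly the kind of repeated graph work an index such as the PAV-Tree is designed to avoid.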
Funding: Supported by the Sichuan Science and Technology Program (2024YFHZ0161).
Abstract: Time-series databases (TSDBs) are essential for managing large-scale time-series data in fields such as finance, IoT, and agriculture. However, traditional query optimization methods such as dynamic programming suffer from high computational complexity and inaccurate cost estimates. This paper proposes a query optimization module for TSDBs based on reinforcement learning (RL), specifically Deep Q-Networks (DQN) and Double Deep Q-Networks (DDQN). These algorithms dynamically learn optimal join orders from query workloads and join costs. Experiments show that the RL-based methods achieve better optimization performance and stability than traditional heuristics, especially under complex cost models. This work highlights the potential of RL for improving query optimization in TSDBs.
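DQN and DDQN approximate a Q-function with neural networks. As a much smaller stand-in, the Python sketch below runs tabular Q-learning over left-deep join orders for three hypothetical tables. It uses a toy cost model: the sum of estimated intermediate result sizes, computed from assumed cardinalities and join selectivities. Nothing here reproduces the paper's cost model, features, or network architecture.

```python
import itertools
import random

# Hypothetical statistics: table cardinalities and pairwise join selectivities.
CARD = {"metrics": 1_000_000, "devices": 1_000, "sites": 50}
SEL = {frozenset({"metrics", "devices"}): 1e-3,
       frozenset({"devices", "sites"}): 2e-2,
       frozenset({"metrics", "sites"}): 1.0}  # no join predicate: cross product


def join_size(tables):
    """Estimated result size of joining a set of tables (independence assumption)."""
    size = 1.0
    for t in tables:
        size *= CARD[t]
    for a, b in itertools.combinations(sorted(tables), 2):
        size *= SEL[frozenset({a, b})]
    return size


def order_cost(order):
    """Toy cost of a left-deep plan: sum of intermediate result sizes."""
    return sum(join_size(order[:i]) for i in range(2, len(order) + 1))


def learn_join_order(episodes=3000, alpha=0.5, epsilon=0.3, seed=0):
    """Tabular Q-learning where Q estimates cost-to-go, so the policy minimizes Q."""
    rng = random.Random(seed)
    tables = list(CARD)
    q = {}  # (frozenset of joined tables, next table) -> estimated remaining cost
    for _ in range(episodes):
        joined = frozenset()
        while len(joined) < len(tables):
            actions = [t for t in tables if t not in joined]
            if rng.random() < epsilon:  # explore
                act = rng.choice(actions)
            else:  # exploit: cheapest estimated continuation
                act = min(actions, key=lambda a: q.get((joined, a), 0.0))
            nxt = joined | {act}
            step_cost = join_size(nxt) if len(nxt) >= 2 else 0.0
            rest = [t for t in tables if t not in nxt]
            future = min((q.get((nxt, a), 0.0) for a in rest), default=0.0)
            old = q.get((joined, act), 0.0)
            q[(joined, act)] = old + alpha * (step_cost + future - old)
            joined = nxt
    # Greedy rollout of the learned policy.
    joined, order = frozenset(), []
    while len(order) < len(tables):
        act = min((t for t in tables if t not in joined), key=lambda a: q.get((joined, a), 0.0))
        order.append(act)
        joined |= {act}
    return order
```

On this toy instance the learned policy should join the two small tables first ("devices" and "sites", in either order), which matches the cheapest order found by brute force. The appeal of a learned policy, as the abstract argues, is avoiding exhaustive enumeration when the number of tables grows.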
Funding: Supported by the Sichuan Science and Technology Program (2024YFHZ0161).
Abstract: Cardinality estimation is crucial for query optimization, but traditional methods struggle with complex queries. We propose LW-CQ, a lightweight machine-learning-based algorithm that improves cardinality estimation for complex queries by enhancing the LW-XGB method. LW-CQ introduces four feature-level improvements and extends support to disjunctive queries and LIKE predicates. Experimental results show that LW-CQ achieves competitive accuracy while significantly reducing training and inference time. This makes it a promising solution for real-world database applications.
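Learned cardinality estimators are commonly evaluated with the q-error: the factor by which an estimate deviates from the true cardinality, in either direction. The abstract does not name its accuracy metric, so the following Python sketch is background on this standard measure only. Clamping both values to at least 1 is a common convention assumed here, and the sample numbers are made up.

```python
import statistics


def q_error(estimate, actual):
    """max(est/act, act/est); 1.0 is a perfect estimate. Both values are clamped to >= 1."""
    est, act = max(estimate, 1.0), max(actual, 1.0)
    return max(est / act, act / est)


def summarize(pairs):
    """Median and maximum q-error over (estimate, actual) pairs."""
    errors = [q_error(e, a) for e, a in pairs]
    return statistics.median(errors), max(errors)


# Hypothetical (estimated, true) cardinalities for three queries.
median_q, max_q = summarize([(100, 100), (50, 200), (900, 300)])
```

Because the q-error is symmetric, a fourfold underestimate (50 vs. 200) counts exactly as badly as a fourfold overestimate would.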
Funding: Supported by the National Key R&D Program of China (Grant No. 2022YFC3801700), the National Natural Science Foundation of China (Grant No. 62472052), and the Xinjiang Production and Construction Corps Key Laboratory of Computing Intelligence and Network Information Security (Grant No. CZ002702-3).
Abstract: To protect the privacy of both the querying user and the database, several QKD-based quantum private query (QPQ) protocols have been proposed. One example is the protocol of Zhou et al. In it, the user prepares the initial quantum states and derives each key bit by comparing the initial state with the outcome state returned by the database in ctrl or shift mode. This replaces the announcement of two non-orthogonal qubits used in other protocols, which may leak part of the secret information. To some extent, the protocol thus strengthens both database security and user privacy. Unfortunately, we find that a dishonest user can use unambiguous state discrimination to obtain much more database information than Zhou et al.'s original analysis indicates. To strengthen database security, we improve the protocol by modifying the information returned by the database in several ways. The analysis indicates that the security of the improved protocols is greatly enhanced.
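For context on the attack: consider two pure states |ψ0⟩ and |ψ1⟩ with equal prior probabilities. The optimal unambiguous discrimination between them succeeds with probability 1 − |⟨ψ0|ψ1⟩| (the Ivanovic–Dieks–Peres bound). The following Python sketch evaluates that bound. The example pair |0⟩ and |+⟩ is an assumption for illustration, not the specific states of Zhou et al.'s protocol. It shows that non-orthogonal states can still be identified conclusively a fixed fraction of the time.

```python
import math


def overlap(psi, phi):
    """|<psi|phi>| for two normalized state vectors given as lists of complex amplitudes."""
    return abs(sum(a.conjugate() * b for a, b in zip(psi, phi)))


def usd_success_probability(psi, phi):
    """Optimal USD success probability for two equiprobable pure states (IDP bound)."""
    return 1.0 - overlap(psi, phi)


zero = [1 + 0j, 0j]
plus = [1 / math.sqrt(2) + 0j, 1 / math.sqrt(2) + 0j]
p_success = usd_success_probability(zero, plus)  # 1 - 1/sqrt(2), about 0.293
```

Orthogonal states give a success probability of 1 and identical states give 0. Any pair in between leaks some information whenever the discrimination succeeds.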
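Abstract: Focusing on small and medium-sized enterprises (SMEs), this paper examines how to batch-generate accounting vouchers with the Excel Power Query tool. It analyzes the current state of voucher processing in SMEs. It then compares computerized bookkeeping with manual data entry (hereafter "manual entry") against batch generation of vouchers with Excel Power Query. Finally, it explains the advantages of Excel Power Query at each stage of data processing, describes in detail the steps for batch-generating vouchers with the tool, and demonstrates the results with a practical case.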
Funding: Weaponry Equipment Pre-Research Foundation of the PLA Equipment Ministry (No. 9140A06050409JB8102); Pre-Research Foundation of PLA University of Science and Technology (No. 2009JSJ11).
Abstract: To address the correctness of query processing in semantics-based relational data integration, this paper defines the semantics of SPARQL (Simple Protocol and RDF Query Language) queries. During query rewriting, all relevant tables are found and decomposed into minimal connectable units. These units are then joined according to the semantic query to produce semantically correct query plans. Algorithms for query rewriting and transformation are presented, and their computational complexity is discussed. In the worst case, the query decomposition algorithm runs in O(n²) time and the query rewriting algorithm in O(nm) time. Experiments verify the performance of the algorithms: the query processing algorithms perform satisfactorily when the query length is less than 8.
Funding: The National Natural Science Foundation of China (No. 61070170); the Natural Science Foundation of Higher Education Institutions of Jiangsu Province (No. 11KJB520017); Suzhou Application Foundation Research Project (No. SYG201238).
Abstract: The current RDF link-traversal-based query execution (RDF-LTE) approach can answer only some types of SPARQL (simple protocol and resource description framework query language) queries. To address this, the paper uses concrete SPARQL queries to discuss how the execution order of triple patterns affects query results and cost. It also analyzes two properties of the Web of Linked Data: missing backward links and missing contingency solutions. It then proposes three heuristic principles for logical query plan optimization: the filtered basic graph pattern (FBGP) principle, the triple pattern chain principle, and the seed URIs principle. These principles reduce intermediate solutions and increase the types of queries that can be answered. An evaluation of the approach's effectiveness and feasibility shows that more query results are returned at lower cost, enabling users to exploit the full potential of the Web of Linked Data.
Abstract: Query expansion with a thesaurus is a useful technique in modern information retrieval (IR). This paper proposes and implements a query expansion method for Chinese IR based on a decaying co-occurrence model. The model extends the traditional co-occurrence model with a decaying factor that reduces the mutual information between two terms as the distance between them increases. Experimental results on the TREC-9 collections show that this method yields significant improvements over IR without query expansion.
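The following minimal Python sketch shows distance-decayed co-occurrence. Each co-occurrence of two terms within a window contributes a weight that shrinks with their distance, and the decayed counts feed a pointwise mutual information (PMI) score. The exponential decay exp(−λ(d − 1)), the window size, and the PMI normalization are assumptions chosen for illustration. The abstract does not give the paper's exact decaying factor.

```python
import math
from collections import Counter


def decayed_pmi(tokens, window=5, decay=0.5):
    """Return {(w1, w2): PMI} using co-occurrence counts weighted by exp(-decay * (d - 1))."""
    unigram = Counter(tokens)
    pair_weight = Counter()
    for i, left in enumerate(tokens):
        for d in range(1, window + 1):
            j = i + d
            if j >= len(tokens):
                break
            right = tokens[j]
            if left == right:
                continue  # ignore self-co-occurrence
            key = tuple(sorted((left, right)))
            pair_weight[key] += math.exp(-decay * (d - 1))  # adjacent terms get weight 1.0
    n = len(tokens)
    total = sum(pair_weight.values())
    scores = {}
    for (a, b), w in pair_weight.items():
        p_ab = w / total
        p_a, p_b = unigram[a] / n, unigram[b] / n
        scores[(a, b)] = math.log(p_ab / (p_a * p_b))
    return scores


scores = decayed_pmi(["a", "b", "x", "c"])
```

With `decay=0` the sketch reduces to an undecayed windowed co-occurrence model. A positive `decay` favors terms that appear close together as expansion candidates.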
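Abstract: Querying is the primary operation of a database system, and query performance directly determines application response time and user experience. When multiple queries run concurrently, they contend for or share database system resources, producing query interaction (QI), a major factor in query performance. Accurately measuring QI is therefore key to choosing appropriate execution plans and improving query performance. QI changes dynamically as the operations in a query execute. Existing measurement methods, however, consider only resource usage at the moment a new query joins and ignore how resource usage changes during execution, so their measurements are inaccurate. This paper makes three contributions:
- A query-mix time-series heterogeneous graph, which describes how QI within a query mix changes over time.
- A Time-Aware Multi-edge Type Weight Calculation (TA-MTWC) model, which computes the edge weights between operator nodes in the heterogeneous graph at any execution time, capturing the dynamic change of QI.
- A Query-mix Time-series Heterogeneous Graph Classification (QTHGC) model, which uses a Long Short-Term Memory (LSTM) network to learn the temporal relationships among graph representations at multiple time points and selects execution plans for concurrent queries.
Experiments on PostgreSQL show that QTHGC improves average accuracy by 51.2% over the query optimizer and by 2.87% over QHGC, the most recent existing model.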