Abstract: The interplay between query processing and data partitioning is pivotal for accelerating massive data processing during query execution, primarily by minimizing the number of scanned block files. Existing partitioning techniques predominantly build partitions from query accesses on numeric columns, often overlooking non-numeric columns and thus limiting the optimization potential. Additionally, although these techniques create fine-grained partitions from representative queries to enhance system performance, they suffer notable performance declines when future queries fluctuate unpredictably. To tackle these issues, we introduce LRP, a learned robust partitioning system for dynamic query processing. LRP first proposes a method for data and query encoding that captures comprehensive column access patterns from historical queries. It then employs Multi-Layer Perceptron and Long Short-Term Memory networks to predict shifts in the distribution of historical queries. To create high-quality, robust partitions based on these predictions, LRP adopts a greedy beam search algorithm for partition division and implements a data redundancy mechanism to share frequently accessed data across partitions. Experimental evaluations reveal that LRP yields partitions with more stable performance under incoming queries and significantly surpasses state-of-the-art partitioning methods.
Abstract: To protect the privacy of both the querying user and the database, several QKD-based quantum private query (QPQ) protocols have been proposed. One example is the protocol of Zhou et al., in which the user prepares the initial quantum states and derives each key bit by comparing the initial state with the outcome state returned from the database in ctrl or shift mode, instead of announcing two non-orthogonal qubits as other protocols do, which may leak part of the secret information. To some extent, this strengthens the security of the database and the privacy of the user. Unfortunately, we find that in this protocol a dishonest user could obtain, using unambiguous state discrimination, much more database information than was analyzed in Zhou et al.'s original research. To strengthen database security, we improve the protocol by modifying the information returned by the database in various ways. The analysis indicates that the security of the improved protocols is greatly enhanced.
Abstract: The smart grid has attracted great attention in recent years. It is poised to transform a centralized, producer-controlled network into a decentralized, consumer-interactive network supported by fine-grained monitoring. Large-scale wireless sensor networks (WSNs) are considered one of the most promising technologies for implementing the smart grid. WSNs are applied in almost every aspect of the smart grid, including power generation, transmission, distribution, utilization, and dispatch. Query processing over WSN data in the power grid has become a hot topic because the volume of grid data is very large and the response-time requirements are strict. Top-k query processing is a good fit for these demands: it aggregates each database object's degree of match across the query predicates and returns the k best-matching objects. In this paper, we propose a framework, based on a cluster-topology sensor network, that effectively applies top-k queries to wireless sensor networks in the smart grid. In the new method, local indices are used to optimize query routing and to process intermediate results inside each cluster, which cuts down data traffic, and a hierarchical join query is then executed over the local results. In addition, top-k query results are verified by a clean-up process, and two schemes are adopted to handle node dynamicity, which further reduces communication cost. Case studies and experimental results show that our algorithm outperforms the existing one, with higher-quality results and better efficiency.
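The cluster-level processing described in this abstract can be illustrated with a small sketch. It is not the paper's framework: it assumes each sensor reading belongs to exactly one cluster and is ranked by a single numeric score, and the names `local_top_k`, `merge_top_k`, and the sample readings are hypothetical. Under those assumptions, forwarding only each cluster's local top-k to the sink still yields the exact global top-k while sending fewer readings.

```python
import heapq

def local_top_k(readings, k):
    """Cluster head: keep only the k highest-scoring (score, node_id) pairs."""
    return heapq.nlargest(k, readings)

def merge_top_k(cluster_results, k):
    """Sink: merge the per-cluster candidates into the global top-k."""
    candidates = [item for result in cluster_results for item in result]
    return heapq.nlargest(k, candidates)

# Hypothetical readings, grouped by cluster: (score, node_id).
clusters = {
    "c1": [(71.2, "n1"), (64.0, "n2"), (80.5, "n3")],
    "c2": [(90.1, "n4"), (55.3, "n5")],
    "c3": [(78.8, "n6"), (79.9, "n7"), (60.0, "n8")],
}
k = 2

partial = [local_top_k(readings, k) for readings in clusters.values()]
global_top = merge_top_k(partial, k)

# Readings forwarded to the sink versus readings held in the network.
sent = sum(len(p) for p in partial)
total = sum(len(r) for r in clusters.values())
```

Any reading outside its own cluster's top-k is already beaten by k readings in that cluster, so it cannot appear in the global answer; this is why the merge is exact here. Queries that join several predicates across clusters, as in the abstract, need more than this simple merge.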
Funding: Supported by the 111 Project of China under Grant No. B08004.
Abstract: Top-k ranking of websites by traffic volume is important for Internet Service Providers (ISPs) to understand network status and optimize network resources. However, the ranking result often deviates substantially from the actual rank because of unknown web traffic, which cannot be identified accurately with current techniques. In this paper, we introduce a novel method to approximate the actual rank. The method associates unknown web traffic with websites according to statistical probabilities. We then construct a probabilistic top-k query model to rank websites. We conduct several experiments using real HTTP traffic traces collected from a commercial ISP covering an entire city in northern China. Experimental results show that the proposed techniques greatly reduce the deviation between the ground truth and the ranking results. In addition, we find that websites providing video services have a higher ratio of unknown IPs, as well as a higher ratio of unknown traffic, than websites providing text web pages. Specifically, more than 90% of the web traffic of the top-3 video websites is unknown. These findings help ISPs understand network status and deploy Content Distribution Networks (CDNs).
Funding: This work is partially supported by the National Natural Science Fund for Distinguished Young Scholars of China under Grant No. 61322208, the National Basic Research 973 Program of China under Grant No. 2012CB316201, the National Natural Science Foundation of China under Grant Nos. 61272178 and 61572122, and the Key Program of the National Natural Science Foundation of China under Grant No. 61532021.
Abstract: The continuous top-k query over a sliding window is a fundamental problem in databases: it retrieves the k objects with the highest scores each time the window slides. Existing studies mainly adopt exact algorithms for this type of query, whose key idea is to maintain a subset of the objects in the window and to retrieve answers from it. However, all existing algorithms are sensitive to query parameters and data distribution. In addition, they suffer from expensive incremental maintenance and thus cannot satisfy real-time requirements. In this paper, we define a novel query, the (ε, δ)-approximate continuous top-k query, which returns approximate answers to the top-k query. To support this query efficiently, we propose a framework named PABF (Probabilistic Approximate Based Framework) for approximate top-k queries over sliding windows. We first maintain a self-adaptive pruning value, which filters out newly arrived objects whose probability of being a query result is less than 1 − δ. Objects that are not filtered are combined if the score difference among them is less than a threshold. To maintain these combined results efficiently, PABF also includes a multi-phase merging algorithm. Theoretical analysis indicates that, even in the worst case, maintaining each candidate requires only logarithmic complexity.
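The exact approach that this abstract contrasts with PABF, keeping only a subset of the window as candidates, can be sketched as follows. This is a generic dominance-based baseline, not PABF and not any specific algorithm from the paper; it assumes a count-based window, and the function name `continuous_top_k` and its parameters are hypothetical. An object is dropped once k newer objects with strictly higher scores have arrived, because those objects outlive it in the window and it can never re-enter the top-k.

```python
import heapq
from collections import deque

def continuous_top_k(stream, window, k):
    """Yield the top-k scores of the last `window` items after each arrival."""
    # Each candidate is [arrival_index, score, dominator_count], in arrival order.
    candidates = deque()
    for t, score in enumerate(stream):
        # Expire candidates that have slid out of the window.
        while candidates and candidates[0][0] <= t - window:
            candidates.popleft()
        # Every older candidate with a lower score gains one newer dominator;
        # a candidate with k dominators can never be a result again.
        survivors = deque()
        for entry in candidates:
            if entry[1] < score:
                entry[2] += 1
            if entry[2] < k:
                survivors.append(entry)
        survivors.append([t, score, 0])
        candidates = survivors
        yield heapq.nlargest(k, (entry[1] for entry in candidates))

# Hypothetical score stream.
stream = [5, 1, 9, 3, 7, 2, 8, 6, 4, 0]
answers = list(continuous_top_k(stream, window=4, k=2))
```

Each arrival scans the whole candidate set, which hints at the maintenance overhead the abstract attributes to exact methods; the sketch makes no attempt to reproduce the approximate pruning or merging steps of PABF.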
Funding: Supported by the National High Technology Research and Development Program of China (863 Program, 2012AA011004), the National Natural Science Foundation of China (61232002, 61202033), and the Natural Science Foundation of Hubei Province (2011CDB448).
Abstract: Many studies and semantics have been proposed for answering top-k queries on uncertain data in various applications. However, most of these semantics spend much of their time computing position probabilities. Our approach supports various top-k queries by sharing the position probability distribution (PPD). In this paper, we propose a PPD-tree structure and several basic operations on it to support various top-k queries. In addition, we propose an approximation method to improve the efficiency of PPD generation. We verify the effectiveness and efficiency of our approach through both theoretical analysis and experiments.
Funding: Supported by the National Natural Science Foundation of China (60673139, 60473073, 60573090).
Abstract: Caching is an important technique for improving the efficiency of query processing. Unfortunately, traditional caching mechanisms are not efficient for the deep Web because of limitations in storage space and dynamic maintenance. In this paper, we present a cache mechanism for deep Web queries that is based on Top-K data sources (KDS-CM) instead of result records. By integrating techniques from IR and Top-K, a data reorganization strategy is presented to model KDS-CM. Several measures for cache management and optimization are also proposed to improve cache performance. Experimental results show the benefits of KDS-CM in execution cost and dynamic maintenance compared with various alternative strategies.
Abstract: On small devices such as PDAs and mobile phones, formulating relational database queries is not as simple as on conventional devices such as personal computers and laptops. Because of the restricted size and resources of these smaller devices, current work mostly limits the queries users can pose to those predetermined by the developers. This limits the ability of these devices to support robust queries. Hence, this paper proposes a universal-relation-based database query language targeted at small devices. The language allows relational database queries to be formulated with minimal query terms. The formulation of the language and its structure are described, and usability test results are presented to support the effectiveness of the language.
Abstract: We propose an influential-set-based moving k keyword query processing model, which avoids the shortcoming of safe-region-based approaches that the update cost and the update frequency cannot be optimized simultaneously. Based on the model, we design a parallel query processing method and a parallel validation method for multicore processing platforms. The time complexities of the two algorithms are O((log|D| + p·k)/p·k) and O(log p·k), respectively, both of which are O(1/k) times the time complexity of the state-of-the-art method. The experimental results confirm the superiority of our algorithms over the state-of-the-art method.
Funding: Supported by the Social Science Planning Foundation of Chongqing (Grant No. 2011QNCB28).
Abstract: Purpose: Existing research on predicting queries with news intent has extracted classification features from external knowledge bases. This paper presents how to apply features extracted from query logs to identify news queries automatically, without using any external resources. Design/methodology/approach: First, we manually labeled 1,220 news queries from Sogou.com. Based on an analysis of these queries, we then identified three features of news queries in terms of query content, time of query occurrence, and user click behavior. Afterwards, we used 12 effective features proposed in the literature as the baseline and conducted experiments with a support vector machine (SVM) classifier. Finally, we compared the impact of the features used in this paper on the identification of news queries. Findings: Compared with the baseline features, the F-score improved from 0.6414 to 0.8368 with the three newly identified features, among which the burst point (bst) was the most effective for predicting news queries. In addition, query expression (qes) was more useful than query terms, and among the click-behavior-based features, news URL was the most effective. Research limitations: Analyses based only on features extracted from query logs may yield limited results. In addition, the segmentation tool used in this study is more commonly applied to long texts than to short queries. Practical implications: The research will help general-purpose search engines address search intent for news events. Originality/value: Our approach provides a new perspective on recognizing queries with news intent without relying on large news corpora such as blogs or Twitter.
Abstract: This paper addresses the challenge of efficiently querying related multimodal data in data lakes, large-scale storage and management systems that support heterogeneous data formats, including structured, semi-structured, and unstructured data. Multimodal data queries are crucial because they enable seamless retrieval of related data across modalities, such as tables, images, and text, with applications in fields like e-commerce, healthcare, and education. However, existing methods primarily focus on single-modality queries, such as joinable or unionable table discovery, and struggle to handle the heterogeneity and lack of metadata in data lakes while balancing accuracy and efficiency. To tackle these challenges, we propose a Multimodal data Query mechanism for Data Lakes (MQDL), which employs a modality-adaptive indexing mechanism and contrastive-learning-based embeddings to unify representations across modalities. Additionally, we introduce product quantization to optimize candidate verification during queries, reducing computational overhead while maintaining precision. We evaluate MQDL on a table-image dataset across multiple business scenarios, measuring precision, recall, and F1-score. Results show that MQDL achieves an accuracy of approximately 90%, while demonstrating strong scalability and reduced query response time compared with traditional methods. These findings highlight MQDL's potential to enhance multimodal data retrieval in complex data lake environments.
Funding: Supported in part by the Special Project of Henan Provincial Key Research, Development and Promotion (Key Science and Technology Program) under Grant 252102210154, and in part by the National Natural Science Foundation of China under Grant 62403437.
Abstract: Ride-hailing (e.g., DiDi and Uber) has become an important tool for modern urban mobility. To improve the utilization of ride-hailing vehicles, a novel query method called Approachable k-nearest neighbor (A-kNN) has recently been proposed in industry. Unlike traditional kNN queries, A-kNN considers not only the road network distance but also the availability status of vehicles. In this context, even vehicles carrying passengers can be considered candidates for dispatch if their destinations are near the requester's location. The V-Tree-based query method, owing to its structural characteristics, can efficiently find the k nearest moving objects within a road network and is currently a popular query solution in ride-hailing services. However, when the vertices to be queried are close in the graph but distant in the index, the V-Tree-based method must traverse numerous irrelevant subgraphs, which makes its processing of A-kNN queries less efficient. To address this issue, we optimize the V-Tree-based method and propose a novel index structure, the Path-Accelerated V-Tree (PAV-Tree), which improves query performance by introducing shortcuts. Leveraging this index, we introduce a novel query optimization algorithm, PAV-A-kNN, specifically designed to process A-kNN queries efficiently. Experimental results show that PAV-A-kNN achieves query times 2.2–15 times faster than baseline methods, with microsecond-level latency.
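Abstract: To address the slow threshold raising and weak pruning of existing high-utility itemset mining algorithms, an algorithm is proposed that mines the k itemsets with the highest utility more efficiently. The TKUL (mining Top-K high Utility itemsets based List) algorithm combines the RIUQ, CUDQ, and EPB threshold-raising strategies to reach the minimum utility threshold faster, which greatly reduces the number of non-high-utility itemsets generated. It then prunes with the RUI and EUCPM strategies, which effectively shrinks the search space and thereby improves the efficiency of high-utility itemset mining.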
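As a point of reference for what top-k high-utility itemset mining computes, here is a minimal brute-force sketch in Python. It does not implement TKUL or any of the RIUQ, CUDQ, EPB, RUI, or EUCPM strategies named in the abstract. It only illustrates the general idea of a threshold that rises during the search: the smallest utility in a size-k min-heap acts as the current minimum threshold. The function names and the data layout (transactions as item-to-quantity maps, `profit` as item-to-unit-profit) are assumptions for illustration.

```python
import heapq
from itertools import combinations


def itemset_utility(itemset, transactions, profit):
    """Sum quantity * unit profit over every transaction containing the whole itemset."""
    total = 0
    for tx in transactions:  # tx maps item -> purchased quantity
        if all(item in tx for item in itemset):
            total += sum(tx[item] * profit[item] for item in itemset)
    return total


def top_k_high_utility(transactions, profit, k):
    """Return the k itemsets with the highest utility as (utility, itemset) pairs."""
    items = sorted({item for tx in transactions for item in tx})
    heap = []  # min-heap; once full, heap[0][0] is the current minimum threshold
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            utility = itemset_utility(itemset, transactions, profit)
            if utility == 0:
                continue  # itemset never occurs in the database
            if len(heap) < k:
                heapq.heappush(heap, (utility, itemset))
            elif utility > heap[0][0]:
                # Beats the k-th best so far, so the threshold rises.
                heapq.heapreplace(heap, (utility, itemset))
    return sorted(heap, key=lambda entry: (-entry[0], entry[1]))
```

This enumeration is exponential in the number of items. A faster rising threshold helps in practical algorithms because it lets them discard candidates without computing their exact utility, which the sketch does not attempt.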
Funding: Supported by the Director, Cybersecurity, Energy Security, and Emergency Response, Cybersecurity for Energy Delivery Systems program, of the U.S. Department of Energy, under contract DE-AC02-05CH11231.
Abstract: The differential privacy (DP) literature often centers on meeting privacy constraints by introducing noise to the query, typically using a pre-specified parametric distribution model with one or two degrees of freedom. However, this emphasis tends to neglect the crucial considerations of response accuracy and utility, especially in the context of categorical or discrete numerical database queries, where the parameters defining the noise distribution are finite and could be chosen optimally. This paper addresses this gap by introducing a novel framework for designing an optimal noise probability mass function (PMF) tailored to discrete and finite query sets. Our approach considers the modulo summation of random noise as the DP mechanism, aiming to present a tractable solution that not only satisfies privacy constraints but also minimizes query distortion. Unlike existing approaches focused solely on meeting privacy constraints, our framework seeks to optimize the noise distribution under an arbitrary (ε, δ) constraint, thereby enhancing the accuracy and utility of the response. We demonstrate that the optimal PMF can be obtained through solving a mixed-integer linear program. Additionally, closed-form solutions for the optimal PMF are provided, minimizing the probability of error for two specific cases. Numerical experiments highlight the superior performance of our proposed optimal mechanisms compared to state-of-the-art methods. This paper contributes to the DP literature by presenting a clear and systematic approach to designing noise mechanisms that not only satisfy privacy requirements but also optimize query distortion. The framework introduced here opens avenues for improved privacy-preserving database queries, offering significant enhancements in response accuracy and utility.
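The modulo-summation mechanism mentioned above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's optimal PMF design and not its mixed-integer linear program. It assumes the query answer lies in {0, ..., n-1}, that the noise PMF is a list of n probabilities, and that neighboring datasets can change the true answer by any offset in a caller-supplied `shifts` list. The function names are hypothetical. The check uses the standard characterization that a mechanism is (ε, δ)-DP exactly when the sum over outputs of max(0, P(y | q) - e^ε P(y | q')) is at most δ for every neighboring pair.

```python
import math
import random


def modulo_noise_mechanism(answer, pmf, rng=random):
    """Release (answer + noise) mod n, with noise drawn from pmf over {0, ..., n-1}."""
    n = len(pmf)
    noise = rng.choices(range(n), weights=pmf, k=1)[0]
    return (answer + noise) % n


def smallest_delta(pmf, epsilon, shifts):
    """Smallest delta for which the modulo-noise mechanism is (epsilon, delta)-DP.

    `shifts` lists the amounts (mod n) by which neighboring datasets may
    change the true answer; both directions of each shift are checked.
    """
    n = len(pmf)
    bound = math.exp(epsilon)
    worst = 0.0
    for shift in shifts:
        for d in (shift, -shift):
            # Output distributions for the two answers are the PMF and its
            # cyclic shift by d; sum the mass exceeding the e^epsilon bound.
            gap = sum(max(0.0, pmf[i] - bound * pmf[(i - d) % n]) for i in range(n))
            worst = max(worst, gap)
    return worst
```

For example, a uniform PMF gives a delta of zero for any epsilon but carries no information about the answer, while a PMF concentrated on zero noise returns the exact answer and gives a delta of one. The design problem described in the abstract lies between these extremes.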
Funding: Supported by the National Key Research and Development Program of China (Grant No. 2023YFB4503600) and the National Natural Science Foundation of China (Grant Nos. U23A20299, 62072460, 62172424, 62276270, and 62322214).
Abstract: The interconnection between query processing and data partitioning is pivotal for accelerating massive data processing during query execution, primarily by minimizing the number of scanned block files. Existing partitioning techniques predominantly focus on query accesses on numeric columns when constructing partitions, often overlooking non-numeric columns and thus limiting optimization potential. Additionally, although these techniques create fine-grained partitions from representative queries to enhance system performance, they suffer notable performance declines due to unpredictable fluctuations in future queries. To tackle these issues, we introduce LRP, a learned robust partitioning system for dynamic query processing. LRP first proposes a method for data and query encoding that captures comprehensive column access patterns from historical queries. It then employs Multi-Layer Perceptron and Long Short-Term Memory networks to predict shifts in the distribution of historical queries. To create high-quality, robust partitions based on these predictions, LRP adopts a greedy beam search algorithm for optimal partition division and implements a data redundancy mechanism to share frequently accessed data across partitions. Experimental evaluations reveal that LRP yields partitions with more stable performance under incoming queries and significantly surpasses state-of-the-art partitioning methods.
Funding: Supported by the National Key R&D Program of China (Grant No. 2022YFC3801700), the National Natural Science Foundation of China (Grant No. 62472052), and the Xinjiang Production and Construction Corps Key Laboratory of Computing Intelligence and Network Information Security (Grant No. CZ002702-3).
Abstract: To protect the privacy of both the querying user and the database, several QKD-based quantum private query (QPQ) protocols have been proposed. One example is the protocol proposed by Zhou et al., in which the user prepares initial quantum states and derives each key bit by comparing the initial quantum state with the outcome state returned from the database in ctrl or shift mode, instead of announcing two non-orthogonal qubits as other protocols do, which may leak part of the secret information. To some extent, this strengthens the security of the database and the privacy of the user. Unfortunately, we find that in this protocol a dishonest user could obtain, using unambiguous state discrimination, much more database information than was analyzed in Zhou et al.'s original research. To strengthen database security, we improve the protocol by modifying the information returned by the database in various ways. The analysis indicates that the security of the improved protocols is greatly enhanced.