Continuously publishing histograms in data streams is crucial to many real-time applications,as it provides not only critical statistical information,but also reduces privacy leaking risk.As the importance of elements...Continuously publishing histograms in data streams is crucial to many real-time applications,as it provides not only critical statistical information,but also reduces privacy leaking risk.As the importance of elements usually decreases over time in data streams,in this paper we model a data stream by a sequence of weighted sliding windows,and then study how to publish histograms over these windows continuously.The existing literature can hardly solve this problem in a real-time way,because they need to buffer all elements in each sliding window,resulting in high computational overhead and prohibitive storage burden.In this paper,we overcome this drawback by proposing an online algorithm denoted by Efficient Streaming Histogram Publishing(ESHP)to continuously publish histograms over weighted sliding windows.Specifically,our method first creates a novel sketching structure,called Approximate-Estimate Sketch(AESketch),to maintain the counting information of each histogram interval at every time instance;then,it creates histograms that satisfy the differential privacy requirement by smartly adding appropriate noise values into the sketching structure.Extensive experimental results and rigorous theoretical analysis demonstrate that the ESHP method can offer equivalent data utility with significantly lower computational overhead and storage costs when compared to other existing methods.展开更多
针对当前关于数据流加权最大频繁项集WMFI(weighted maximal frequent itemsets)的研究无法有效地处理频繁阈值和加权频繁阈值不一致情况下WMFI的挖掘问题,提出了完全加权最大频繁项集FWM FI(full w eighted maximal frequent itemsets...针对当前关于数据流加权最大频繁项集WMFI(weighted maximal frequent itemsets)的研究无法有效地处理频繁阈值和加权频繁阈值不一致情况下WMFI的挖掘问题,提出了完全加权最大频繁项集FWM FI(full w eighted maximal frequent itemsets)的概念.为了减少naive算法在处理滑动窗口下完全加权最大频繁项集挖掘时存在的冗余运算,提出了FWMFI-SW(FWMFI mining based on sliding window over data stream)算法.所提出的算法通过基于频繁约束条件的优化策略减少了naive算法中M ax W优化策略的无效调用次数;采用编辑距离比率作为WMFP-SW-tree的重构判别函数,可以有效减少该树的重构次数.实验结果表明FWMFI-SW算法是有效的,且比naive算法更有时间优势.展开更多
基金supported by the Program for Synergy Innovation in the Anhui Higher Education Institutions of China(No.GXXT-2020-012)the National Natural Science Foundation of China(No.62172003)+2 种基金the Anhui Provincial Natural Science Foundation(No.2108085MF218)the Anhui Province University Natural Science Research Project(No.2022AH040052)the Science and Technology Innovation Program of Ma’anshan,China(No.2021a120009).
文摘Continuously publishing histograms in data streams is crucial to many real-time applications,as it provides not only critical statistical information,but also reduces privacy leaking risk.As the importance of elements usually decreases over time in data streams,in this paper we model a data stream by a sequence of weighted sliding windows,and then study how to publish histograms over these windows continuously.The existing literature can hardly solve this problem in a real-time way,because they need to buffer all elements in each sliding window,resulting in high computational overhead and prohibitive storage burden.In this paper,we overcome this drawback by proposing an online algorithm denoted by Efficient Streaming Histogram Publishing(ESHP)to continuously publish histograms over weighted sliding windows.Specifically,our method first creates a novel sketching structure,called Approximate-Estimate Sketch(AESketch),to maintain the counting information of each histogram interval at every time instance;then,it creates histograms that satisfy the differential privacy requirement by smartly adding appropriate noise values into the sketching structure.Extensive experimental results and rigorous theoretical analysis demonstrate that the ESHP method can offer equivalent data utility with significantly lower computational overhead and storage costs when compared to other existing methods.
文摘针对当前关于数据流加权最大频繁项集WMFI(weighted maximal frequent itemsets)的研究无法有效地处理频繁阈值和加权频繁阈值不一致情况下WMFI的挖掘问题,提出了完全加权最大频繁项集FWM FI(full w eighted maximal frequent itemsets)的概念.为了减少naive算法在处理滑动窗口下完全加权最大频繁项集挖掘时存在的冗余运算,提出了FWMFI-SW(FWMFI mining based on sliding window over data stream)算法.所提出的算法通过基于频繁约束条件的优化策略减少了naive算法中M ax W优化策略的无效调用次数;采用编辑距离比率作为WMFP-SW-tree的重构判别函数,可以有效减少该树的重构次数.实验结果表明FWMFI-SW算法是有效的,且比naive算法更有时间优势.