GraphX is a graph computing library built on Spark, and fault tolerance is essential to its high availability. However, most existing fault tolerance methods are pessimistic and designed for general computing tasks. Exploiting the characteristics of iterative computation, this paper presents a method that combines optimistic fault tolerance with checkpointing to recover data under different failure conditions. First, for single-node failures, we propose an optimistic fault tolerance mechanism based on a compensation function. It takes no fault tolerance measures in advance and therefore incurs no additional cost when no failure occurs. Second, for multiple-node failures, we propose an automatic checkpoint management strategy based on RDD importance. It jointly considers an RDD's lineage length, its dependency relationships, and its computation time in order to choose which RDDs to checkpoint. Finally, we implement both proposals in GraphX of Spark 3.5.1 and evaluate their performance with representative iterative graph algorithms on a high performance computing cluster. The results verify that the mechanism produces correct iteration results and show that, when RDD partitions must be recovered, the mechanism and the strategy substantially reduce job execution time.
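The abstract does not show the compensation function itself. For an iterative algorithm whose result is a fixed point that does not depend on the starting values, such as PageRank, a compensation function only has to put the lost partition back into a valid state; the remaining iterations then converge to the same answer. The sketch below illustrates that idea under stated assumptions: a plain-Python PageRank (not GraphX), a hypothetical `compensate` function that resets lost ranks to 1/n and renormalises, and a simulated failure at a chosen iteration. It is not the paper's implementation.

```python
# Minimal sketch of optimistic recovery for PageRank (hypothetical, not GraphX code).
# out_links must contain every vertex as a key; an empty list marks a dangling vertex.


def pagerank_step(ranks, out_links, damping=0.85):
    """One synchronous PageRank iteration; preserves sum(ranks) == 1."""
    n = len(ranks)
    new_ranks = {v: (1.0 - damping) / n for v in ranks}
    for u, targets in out_links.items():
        if targets:
            share = damping * ranks[u] / len(targets)
            for v in targets:
                new_ranks[v] += share
        else:
            # Dangling vertex: spread its rank uniformly over all vertices.
            share = damping * ranks[u] / n
            for v in new_ranks:
                new_ranks[v] += share
    return new_ranks


def compensate(ranks, lost):
    """Hypothetical compensation function: reset lost ranks to 1/n, then renormalise."""
    n = len(ranks)
    fixed = {v: (1.0 / n if v in lost else r) for v, r in ranks.items()}
    total = sum(fixed.values())
    return {v: r / total for v, r in fixed.items()}


def run_pagerank(out_links, iterations=100, fail_at=None, lost=frozenset()):
    """Run PageRank; optionally simulate losing the ranks of `lost` at iteration `fail_at`."""
    n = len(out_links)
    ranks = {v: 1.0 / n for v in out_links}
    for i in range(iterations):
        if i == fail_at:
            # Simulated failure: the ranks held by the lost partition disappear.
            ranks = {v: (None if v in lost else r) for v, r in ranks.items()}
            # No checkpoint to restore from: repair the state and keep iterating.
            ranks = compensate(ranks, lost)
        ranks = pagerank_step(ranks, out_links)
    return ranks


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"], "e": []}
baseline = run_pagerank(graph)
recovered = run_pagerank(graph, fail_at=10, lost={"a", "c"})
```

Because PageRank with damping converges to a unique fixed point from any probability vector, the recovered run ends up agreeing with the failure-free run to within a small tolerance. An algorithm without this property would need a different compensation function, or a checkpoint.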
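Existing anonymization techniques mostly focus on the utility of the anonymized data and overlook the fact that attackers can mount attacks using several kinds of background knowledge. Moreover, as the number of users grows year after year, traditional anonymization techniques can no longer meet practical needs. To address this, we propose PLRD-(k,m) (distributed k-degree-m-label anonymity with protecting link relationships), a distributed anonymization method that protects link relationships. The method uses the message-passing mechanism of GraphX to place nodes that are mutual N-hop neighbors into the same group and applies k-degree anonymity and m-label anonymity within each group, so that an attacker cannot identify a target by its degree or label and link relationships are not disclosed. Finally, we extend PLRD-(k,m) into a personalized anonymization method that meets the differing requirements of individual users. Experimental results on real social network datasets show that the proposed method improves execution efficiency on large-scale social networks while maintaining good data utility.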
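The abstract names k-degree anonymity without defining it: a graph is k-degree anonymous when every degree value is shared by at least k vertices, so degree alone cannot single out a target. The sketch below shows a common greedy way to anonymize a degree sequence, assuming one group holds all vertices (the paper instead forms groups of mutual N-hop neighbors and runs on GraphX). The function names are hypothetical, m-label anonymity is not shown, and realizing the new degree sequence as actual edges is a separate step.

```python
from collections import Counter


def is_k_degree_anonymous(degrees, k):
    """True if every degree value in {vertex: degree} occurs at least k times."""
    return all(count >= k for count in Counter(degrees.values()).values())


def greedy_k_degree(degrees, k):
    """Hypothetical greedy anonymizer for a degree sequence.

    Sort vertices by degree (descending), cut the order into consecutive groups
    of at least k vertices, and raise every degree in a group to the group's
    maximum. Degrees only increase, so edges need only be added, never removed.
    """
    if len(degrees) < k:
        raise ValueError("need at least k vertices")
    order = sorted(degrees, key=lambda v: degrees[v], reverse=True)
    targets = {}
    start, n = 0, len(order)
    while start < n:
        end = start + k
        if n - end < k:
            # Fewer than k vertices would remain: merge them into this group.
            end = n
        group = order[start:end]
        top = degrees[group[0]]  # group is sorted, so the first degree is the max
        for v in group:
            targets[v] = top
        start = end
    return targets


deg = {"a": 5, "b": 4, "c": 4, "d": 2, "e": 2, "f": 1, "g": 1}
anonymized = greedy_k_degree(deg, 3)
# {'a': 5, 'b': 5, 'c': 5, 'd': 2, 'e': 2, 'f': 2, 'g': 2}
```

Every group has at least k members sharing one target degree, so each degree value in the result appears at least k times.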
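Current social network privacy protection methods severely damage community structure, and a single workstation has limited capacity for processing the data. To address these shortcomings, we propose SNDA-PCS (social network degree anonymity for protecting community structure). Communities in the social network are detected with a divisive-agglomerative algorithm; the degree sequence is grouped for anonymization using a compressed binary tree constructed from aggregation vectors; virtual vertices are added to construct the anonymized graph; and a virtual-vertex deletion-addition algorithm, designed according to the community each vertex belongs to, improves the utility of the published graph data. SNDA-PCS is implemented on GraphX, a large-scale parallel graph processing system. Experimental results show that SNDA-PCS satisfies the anonymity requirement while preserving the utility of the community structure.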
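The abstract does not spell out how virtual vertices are attached. One simple way to reach an anonymized degree sequence by adding virtual vertices, while keeping the new edges inside community boundaries, is sketched below under these assumptions: each real vertex already has a target degree (for example from a k-degree grouping), each virtual vertex is tied to exactly one community, and a virtual vertex links to each real vertex at most once. The function name and the `virtual_<n>` labels are hypothetical; the degrees of the virtual vertices themselves are not anonymized here, and the paper's deletion-addition refinement is not shown.

```python
def add_virtual_vertices(degrees, targets, community):
    """Hypothetical sketch: raise each real vertex to its target degree.

    degrees, targets: {vertex: int}; community: {vertex: community id}.
    Returns (edges, virtual_community), where edges are (real, virtual) pairs
    and virtual_community maps each new virtual vertex to its community.
    """
    # Group positive deficits (target - current degree) by community.
    deficits_by_comm = {}
    for v, d in degrees.items():
        deficit = targets[v] - d
        if deficit < 0:
            raise ValueError(f"target below current degree for {v!r}")
        if deficit:
            deficits_by_comm.setdefault(community[v], {})[v] = deficit

    edges, virtual_community, counter = [], {}, 0
    for comm, remaining in deficits_by_comm.items():
        while remaining:
            vid = f"virtual_{counter}"
            counter += 1
            virtual_community[vid] = comm
            # Link the new virtual vertex once to every vertex still short of
            # its target, so no duplicate edges are created.
            for v in list(remaining):
                edges.append((v, vid))
                remaining[v] -= 1
                if remaining[v] == 0:
                    del remaining[v]
    return edges, virtual_community


deg = {"a": 1, "b": 2, "c": 1, "d": 3}
tgt = {"a": 3, "b": 3, "c": 3, "d": 3}
com = {"a": 0, "b": 0, "c": 1, "d": 1}
edges, vcom = add_virtual_vertices(deg, tgt, com)
```

Each community receives as many virtual vertices as its largest deficit, and every added edge connects a real vertex to a virtual vertex of the same community.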
Funding: supported by the National Key Research and Development Program of China (Grant No. 2021YFB0301200), the Hunan Natural Science Foundation Project (Grant No. 2023JJ40555), the Hunan Provincial Graduate Student Research and Innovation Project (Grant No. LXBZZ2024035), and the Hunan Provincial Department of Education Scientific Research Project (Grant No. 22B0451).