聚类是大规模高维向量数据分析的关键技术之一.近年来,基于密度的聚类算法DBSCAN(density-based spatial clustering of applications with noise)因其无须预先指定聚类数量、能够发现复杂聚类结构并有效识别噪声点的特性,在数据分析领...聚类是大规模高维向量数据分析的关键技术之一.近年来,基于密度的聚类算法DBSCAN(density-based spatial clustering of applications with noise)因其无须预先指定聚类数量、能够发现复杂聚类结构并有效识别噪声点的特性,在数据分析领域得到了广泛应用.然而,现有的基于密度的聚类算法在处理高维向量数据时将产生极高的时间代价且面临维度灾难等问题,难以在实际场景中部署应用.此外,随着信息技术的发展,高维向量数据规模急剧增加,使用CPU进行高维向量聚类在时间代价和可扩展性等方面将面临更大的挑战.为此,提出一种GPU加速的高维向量聚类算法,通过引入K近邻(K-nearest neighbor,KNN)图索引加速DBSCAN的计算.首先,设计了GPU加速的并行K近邻图构建算法,显著降低了K近邻图索引的构建开销.其次,提出了基于层间并行的K-means树分区算法及基于广度优先搜索和核心近邻图的并行聚类算法,改进了DBSCAN算法的计算流程,实现了高并发向量聚类.最后,在真实向量数据集上进行了大量实验,并将所提出的方法与现有方法进行了性能对比.实验结果表明,所提方法在保证聚类精度的前提下,将大规模向量聚类的效率提高了5.7–2822.5倍.展开更多
Let H=(V,E)be an n-balanced k-partite k-graph with partition classes V1,...,Vk.Suppose for every legal(k-1)-tuple f contained in V\V1 and for every legal(k-1)-tuple g contained in V\Vk such that f∪g■E(H),we have d(f...Let H=(V,E)be an n-balanced k-partite k-graph with partition classes V1,...,Vk.Suppose for every legal(k-1)-tuple f contained in V\V1 and for every legal(k-1)-tuple g contained in V\Vk such that f∪g■E(H),we have d(f)+d(g)≥n+1.In this paper,we prove that under this condition H must have a perfect matching.Another result of this paper is about the perfect matching in 3-uniform hm-bipartite hypergraphs.Let G be a 3-uniform hm-bipartite hypergraph with one of whose sides V1 has the size n,the another side V2 has size 2 n.If for all the legal 2-tuple f with|f∩V1|=1 and for all the legal 2-tuple g with|g∩V1|=0,we have d(f)≥n-2 and d(g)>n/2,then G has a perfect matching.展开更多
A variation in the classical Turn extremal problem is studied. A simple graph G of order n is said to have property P k if it contains a clique of size k+1 as its subgraph. An n term nonincreasing nonnegative integer ...A variation in the classical Turn extremal problem is studied. A simple graph G of order n is said to have property P k if it contains a clique of size k+1 as its subgraph. An n term nonincreasing nonnegative integer sequence π=(d 1,d 2,...,d n) is said to be graphic if it is the degree sequence of a simple graph G of order n and such a graph G is referred to as a realization of π . A graphic sequence π is said to be potentially P k graphic if it has a realization G having property P k . The problem: determine the smallest positive even number σ(k,n) such that every n term graphic sequence π=(d 1,d 2,...,d n) without zero terms and with degree sum σ(π)=d 1+d 2+...+d n at least σ(k,n) is potentially P k graphic has been proved positive.展开更多
文摘聚类是大规模高维向量数据分析的关键技术之一.近年来,基于密度的聚类算法DBSCAN(density-based spatial clustering of applications with noise)因其无须预先指定聚类数量、能够发现复杂聚类结构并有效识别噪声点的特性,在数据分析领域得到了广泛应用.然而,现有的基于密度的聚类算法在处理高维向量数据时将产生极高的时间代价且面临维度灾难等问题,难以在实际场景中部署应用.此外,随着信息技术的发展,高维向量数据规模急剧增加,使用CPU进行高维向量聚类在时间代价和可扩展性等方面将面临更大的挑战.为此,提出一种GPU加速的高维向量聚类算法,通过引入K近邻(K-nearest neighbor,KNN)图索引加速DBSCAN的计算.首先,设计了GPU加速的并行K近邻图构建算法,显著降低了K近邻图索引的构建开销.其次,提出了基于层间并行的K-means树分区算法及基于广度优先搜索和核心近邻图的并行聚类算法,改进了DBSCAN算法的计算流程,实现了高并发向量聚类.最后,在真实向量数据集上进行了大量实验,并将所提出的方法与现有方法进行了性能对比.实验结果表明,所提方法在保证聚类精度的前提下,将大规模向量聚类的效率提高了5.7–2822.5倍.
基金supported in part by the National Natural Science Foundation of China (No. 61373019)
文摘Let H=(V,E)be an n-balanced k-partite k-graph with partition classes V1,...,Vk.Suppose for every legal(k-1)-tuple f contained in V\V1 and for every legal(k-1)-tuple g contained in V\Vk such that f∪g■E(H),we have d(f)+d(g)≥n+1.In this paper,we prove that under this condition H must have a perfect matching.Another result of this paper is about the perfect matching in 3-uniform hm-bipartite hypergraphs.Let G be a 3-uniform hm-bipartite hypergraph with one of whose sides V1 has the size n,the another side V2 has size 2 n.If for all the legal 2-tuple f with|f∩V1|=1 and for all the legal 2-tuple g with|g∩V1|=0,we have d(f)≥n-2 and d(g)>n/2,then G has a perfect matching.
文摘A variation in the classical Turn extremal problem is studied. A simple graph G of order n is said to have property P k if it contains a clique of size k+1 as its subgraph. An n term nonincreasing nonnegative integer sequence π=(d 1,d 2,...,d n) is said to be graphic if it is the degree sequence of a simple graph G of order n and such a graph G is referred to as a realization of π . A graphic sequence π is said to be potentially P k graphic if it has a realization G having property P k . The problem: determine the smallest positive even number σ(k,n) such that every n term graphic sequence π=(d 1,d 2,...,d n) without zero terms and with degree sum σ(π)=d 1+d 2+...+d n at least σ(k,n) is potentially P k graphic has been proved positive.