摘要
视觉特征空间中的大规模聚类问题是图像识别和检索中亟待解决的问题.当前最好的算法是近似k-means算法,它是Lloyd算法的近似算法,只能依靠采用高准确率的近似搜索近似地保证聚类结果的性能.为此针对近似k-means算法提出改进的基本不增加时间、空间代价新算法,具有更好的算法收敛性和聚类性能.该算法利用了迭代求解过程中更多的信息,更有效地更新子类划分,使得聚类损失单调不增并且快速减小.理论证明,采用任意准确率的近似搜索,该算法都可以在有限轮迭代后收敛到Lloyd算法的收敛解.实验结果表明,分别采用最优参数产生同等性能结果时,所提出的算法比近似k-means算法快10倍.此外,通过比较全局特征聚类实验中的子类的图像,也直观地验证了其聚类效果.
The large scale clustering problem of visual features is crucial for image recognition and retrieval. The state-of-the-art algorithm, the approximate k means, approximately guarantees the clustering performance by applying the high-precision approximate search. An improved algorithm was proposed, which requires no extra memory cost and nearly no extra time consumption. The robust approximate algorithm can better guarantee its convergence and clustering performance by utilizing more information in the iteration to update the partition, so that clustering loss is non-increasing and reduced rapidly. Theoretical proofs guarantee that the algorithm converges to the converged solution of the Lloyd algorithm, regard less of the precision values of approximate search. The experiment results show that the algorithm has about 10 times the speed of the approximate k-means algorithm. Besides, the clustering performance is also directly verified by comparing the images in the clustering results of global features.