摘要
聚类是机器学习中的经典任务,旨在根据相似度度量将数据划分为若干簇。k-均值聚类作为最基本的聚类模型,自提出以来已被深入研究并在众多领域得到广泛应用。聚焦k-均值模型的求解问题,从理论计算机科学的视角出发,介绍k-均值的(接近)线性时间的快速近似算法的研究进展。此外,简要讨论其他相关大数据计算模型中的聚类算法的相关进展,包括动态、数据流与并行计算等计算模型。
Clustering is a classic task in machine learning.The goal of clustering is to partition data points into groups,with respect to a similarity measure.As one of the most fundamental models for clustering,k-means has been extensively studied and widely applied.This paper focuses on the computational issue of solving k-means efficiently,and discusses the progress of(near-)linear time approximation algorithms for k-means,from the perspective of theoretical computer science.It also briefly discusses the status of clustering algorithms in various big data computational models,including dynamic,streaming and distributed computing.
作者
高贵晨
姜少峰
GAO Guichen;JIANG Shaofeng(School of Computer Science,Peking University,Beijing 100871,China)
出处
《计算机科学》
北大核心
2026年第4期24-32,共9页
Computer Science
基金
国家自然科学基金(62572006)。
关键词
K-均值
欧氏空间
近线性时间算法
亚线性算法
k-means
Euclidean spaces
Near-linear time algorithms
Sublinear algorithms