Constrained clustering, such as k-means with instance-level Must-Link (ML) and Cannot-Link (CL) auxiliary information as constraints, has been studied extensively in recent years because of its broad applications in data science and AI. Although heuristic approaches exist, no algorithm with a non-trivial approximation ratio has been available for the constrained k-means problem. To address this issue, we propose an algorithm with a provable approximation ratio of O(log k) when only ML constraints are considered. We also empirically evaluate the performance of our algorithm on real-world datasets with artificially generated ML and disjoint CL constraints. The experimental results show that our algorithm outperforms the existing greedy-based heuristic methods in clustering accuracy.
Funding: This work was supported by the National Natural Science Foundation of China (Nos. 12271098 and 61772005) and the Outstanding Youth Innovation Team Project for Universities of Shandong Province (No. 2020KJN008).
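To make the instance-level constraints mentioned in the abstract concrete, the following minimal Python sketch checks whether a given cluster assignment respects a set of ML and CL pairs. It illustrates the constraint definitions only and is not the paper's algorithm; the function name `violated_constraints` and the toy data are hypothetical, introduced here for illustration.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]  # a pair of point indices


def violated_constraints(
    labels: Sequence[int],
    must_link: Sequence[Pair],
    cannot_link: Sequence[Pair],
) -> Tuple[List[Pair], List[Pair]]:
    """Return the ML and CL pairs that a cluster assignment violates.

    An ML pair (i, j) is violated when i and j are in different clusters.
    A CL pair (i, j) is violated when i and j are in the same cluster.
    """
    ml_bad = [(i, j) for i, j in must_link if labels[i] != labels[j]]
    cl_bad = [(i, j) for i, j in cannot_link if labels[i] == labels[j]]
    return ml_bad, cl_bad


# Toy example (hypothetical data): four points assigned to two clusters.
labels = [0, 0, 1, 1]
must_link = [(0, 1), (1, 2)]    # points 0-1 and 1-2 should share a cluster
cannot_link = [(0, 3), (2, 3)]  # points 0-3 and 2-3 should be separated

ml_bad, cl_bad = violated_constraints(labels, must_link, cannot_link)
print("Violated ML pairs:", ml_bad)
print("Violated CL pairs:", cl_bad)
```

A feasible constrained clustering is one for which both returned lists are empty. Note that ML constraints are transitive (if 0-1 and 1-2 must be linked, then 0-2 must be as well), so must-linked points effectively form groups that have to be assigned to a cluster together.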