Article
Version 1
Preserved in Portico This version is not peer-reviewed
A Fast K-prototypes Algorithm Using Partial Distance Computation
Version 1
: Received: 17 April 2017 / Approved: 17 April 2017 / Online: 17 April 2017 (11:08:26 CEST)
A peer-reviewed article of this Preprint also exists.
Kim, B. A Fast K-prototypes Algorithm Using Partial Distance Computation. Symmetry 2017, 9, 58. Kim, B. A Fast K-prototypes Algorithm Using Partial Distance Computation. Symmetry 2017, 9, 58.
Abstract
The k-means is one of the most popular and widely used clustering algorithm, however, it is limited to only numeric data. The k-prototypes algorithm is one of the famous algorithms for dealing with both numeric and categorical data. However, there have been no studies to accelerate k-prototypes algorithm. In this paper, we propose a new fast k-prototypes algorithm that gives the same answer as original k-prototypes. The proposed algorithm avoids distance computations using partial distance computation. Our k-prototypes algorithm finds minimum distance without distance computations of all attributes between an object and a cluster center, which allows it to reduce time complexity. A partial distance computation uses a fact that a value of the maximum difference between two categorical attributes is 1 during distance computations. If data objects have m categorical attributes, maximum difference of categorical attributes between an object and a cluster center is m. Our algorithm first computes distance with only numeric attributes. If a difference of the minimum distance and the second smallest with numeric attributes is higher than m, we can find minimum distance between an object and a cluster center without distance computations of categorical attributes. The experimental shows proposed k-prototypes algorithm improves computational performance than original k-prototypes algorithm in our dataset.
Keywords
clustering algorithm; k-prototypes algorithm, partial distance computation
Subject
Computer Science and Mathematics, Computer Science
Copyright: This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Comments (0)
We encourage comments and feedback from a broad range of readers. See criteria for comments and our Diversity statement.
Leave a public commentSend a private comment to the author(s)
* All users must log in before leaving a comment