Proceedings of ALAR 2006 Conference on Applied Research in Information Technology
Workload characterization is an important part of systems performance modeling. Clustering is a method used to find classes of jobs within workloads, and k-means is one of the most popular clustering algorithms. K-means clustering requires initial starting point values as input parameters. This paper shows that the results of running the k-means algorithm on the same workload vary depending on the values chosen as initial starting points. Fourteen methods of composing initial starting point values are compared in a case study. The results indicate that a synthetic method, scrambled midpoints, is an effective starting point method for k-means clustering.
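The dependence on starting points described in the abstract can be illustrated with a minimal sketch. It assumes the standard (Lloyd's) k-means iteration with squared Euclidean distance; the four-point data set, the two sets of starting points, and the names `kmeans` and `sq_dist` are hypothetical and chosen only for illustration. They are not taken from the case study, and the sketch does not implement any of the fourteen methods compared in the paper, including scrambled midpoints.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, starts, max_iter=100):
    """Basic Lloyd-style k-means (an assumed, textbook variant).

    points: list of tuples; starts: list of k initial centroid tuples.
    Returns (labels, centroids, sse), where sse is the sum of squared
    distances from each point to its assigned centroid.
    """
    centroids = [tuple(c) for c in starts]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave a centroid in place if its cluster is empty
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, sse


# Hypothetical toy data: the four corners of a 4-by-1 rectangle.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Same data, same k = 2, different starting points.
labels_a, centroids_a, sse_a = kmeans(points, starts=[(0, 0), (4, 0)])
labels_b, centroids_b, sse_b = kmeans(points, starts=[(0, 0), (0, 1)])

print("start A:", labels_a, centroids_a, sse_a)
print("start B:", labels_b, centroids_b, sse_b)
```

In this toy setting the first pair of starting points groups the left corners against the right corners, while the second pair converges to a bottom-versus-top split with a much larger sum of squared errors. Both runs reach a stable assignment, which is why the choice of starting points matters.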
Apon, Amy; Robinson, Frank; Brewer, Denny; Dowdy, Larry; Hoffman, Doug; and Lu, Baochuan, "Initial Starting Point Analysis for K-Means Clustering: A Case Study" (2006). Publications. 22.