Publication Date


Publication Title

Proceedings of ALAR 2006 Conference on Applied Research in Information Technology


Workload characterization is an important part of systems performance modeling. Clustering is a method used to find classes of jobs within workloads. K-Means is one of the most popular clustering algorithms. Initial starting point values are needed as input parameters when performing k-means clustering. This paper shows that the results of the running the k-means algorithm on the same workload will vary depending on the values chosen as initial starting points. Fourteen methods of composing initial starting point values are compared in a case study. The results indicate that a synthetic method, scrambled midpoints, is an effective starting point method for k-means clustering.