
SynDECA: A Tool to Generate Synthetic Datasets for Evaluation of Clustering Algorithms

Jhansi Rani Vennam, Soujanya Vadapalli

Presented at the 11th International Conference on Management of Data (COMAD 2005), Goa, India, January 6-8, 2005


Many clustering algorithms proposed in recent years can identify clusters of arbitrary shape and of varying density and size. This creates a need for "benchmarking" datasets that can evaluate clustering algorithms on aspects such as scalability, accuracy and robustness to noise. Real-life datasets are few in number and do not come with the "original" clustering results. This emphasizes the need for a toolkit that can generate datasets that mimic real-life data, together with the actual clustering results. In this paper, we propose a few algorithms and methodologies that generate high-dimensional datasets along with the original clustering results. We developed a toolkit called SynDECA that generates synthetic datasets based on the proposed algorithms.
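
To make the idea of a dataset that ships with its "original" clustering results concrete, the following is a minimal Python sketch. It is not SynDECA's algorithm, which the abstract does not describe. It assumes the simplest possible setting: spherical Gaussian clusters in the unit hypercube plus uniformly scattered noise. The function name `generate_labeled_dataset`, its parameters, and the convention of labeling noise as `-1` are all hypothetical choices made for this illustration.

```python
import random


def generate_labeled_dataset(n_clusters, points_per_cluster, n_dims,
                             noise_points, spread=0.05, seed=0):
    """Return (points, labels) for a toy synthetic dataset.

    Assumption for illustration only: each cluster is a spherical Gaussian
    around a random center. Noise points carry the label -1.
    """
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    points, labels = [], []

    for cluster_id in range(n_clusters):
        # Pick a random center in the unit hypercube.
        center = [rng.random() for _ in range(n_dims)]
        for _ in range(points_per_cluster):
            # Scatter a point around the center in every dimension.
            points.append([rng.gauss(c, spread) for c in center])
            labels.append(cluster_id)

    for _ in range(noise_points):
        # Noise is drawn uniformly over the whole unit hypercube.
        points.append([rng.random() for _ in range(n_dims)])
        labels.append(-1)

    return points, labels


points, labels = generate_labeled_dataset(
    n_clusters=3, points_per_cluster=100, n_dims=10, noise_points=20)
```

Because the labels are produced alongside the points, the output of any clustering algorithm run on `points` can be compared directly against `labels`, and the noise count can be varied to probe robustness. Arbitrary shapes and varying densities, which the abstract names as goals, would need richer generators than this sketch.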
