Improving Efficiency of K-Means Algorithm for Large Datasets

Bibliographic Details
Title: Improving Efficiency of K-Means Algorithm for Large Datasets
Authors: Swapna, Ch., Kumar, V., Murthy, J.V.R
Source: International Journal of Rough Sets and Data Analysis; April 2016, Vol. 3 Issue: 2 p1-9, 9p
Abstract: Clustering is the process of grouping objects into classes based on their similarities. K-means is a widely studied partition-based algorithm. It is reported to work efficiently on small datasets; however, its computation time is less satisfactory on large datasets. Researchers have proposed several modifications to address this issue. This paper proposes a novel way of handling large datasets with K-means in a distributed manner to improve efficiency. It exploits parallel processing by dividing the dataset into a number of baskets and then applying K-means to each basket in parallel. The proposed BasketK-means provides very competitive performance with considerably less computation time. Simulation results on various real and synthetic datasets presented in the work demonstrate the effectiveness of the proposed approach.
Database: Supplemental Index
Description
ISSN: 23344598
23344601
DOI:10.4018/IJRSDA.2016040101
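
The abstract describes splitting a dataset into baskets and running K-means on each basket in parallel. The following is a minimal Python sketch of that general idea, not the authors' implementation. The record does not say how baskets are formed or how the per-basket results are combined, so two assumptions are made here: baskets are a random, roughly equal-sized partition, and the final centroids are obtained by running K-means once more on the pooled per-basket centroids. The names `kmeans` and `basket_kmeans` are hypothetical. Threads are used only to keep the sketch short; a process pool would be the more likely choice for real CPU-bound speedup in CPython.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def _dist2(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def _mean(points):
    """Coordinate-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd-style K-means; returns a list of k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # requires len(points) >= k
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: _dist2(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the previous centroid when a cluster ends up empty.
        updated = [_mean(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def basket_kmeans(points, k, n_baskets=4, seed=0):
    """Cluster each basket independently, then merge the results."""
    rng = random.Random(seed)
    shuffled = points[:]
    rng.shuffle(shuffled)
    # Assumption: baskets are a random, roughly equal-sized partition.
    baskets = [shuffled[i::n_baskets] for i in range(n_baskets)]
    with ThreadPoolExecutor(max_workers=n_baskets) as pool:
        local = list(pool.map(lambda b: kmeans(b, k, seed=seed), baskets))
    # Assumption: merge by clustering the pooled per-basket centroids.
    pooled = [c for centroids in local for c in centroids]
    return kmeans(pooled, k, seed=seed)
```

Each basket must contain at least `k` points for the initial sampling to succeed. Because every basket is clustered independently, the per-basket step needs no communication between workers; only the small set of pooled centroids is processed serially at the end.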