Complexity Results in Some Clustering Algorithms
No Thumbnail Available
Date
2025-06
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
Indian Statistical Institute, Kolkata
Abstract
Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is a prevalent
Clustering method without supervision renowned for its capability to recognize arbitrarily
shaped clusters and detect noise in spatial data. Unlike partitioning methods such as
k-means, DBSCAN operates without inputting a predefined number of clusters and is
particularly effective in handling datasets with varying densities.
In this dissertation, we have undertaken an in-depth exploration of the DBSCAN
algorithm. We reviewed and analyzed several research papers that build upon or revise the
original DBSCAN framework, with the goal of understanding their motivations, design
choices, and computational implications.
In addition to studying the foundational principles, we examined traditional spatial
data structures that are commonly employed to accelerate DBSCAN, such as R-trees and
KD-trees. This background enabled us to identify key computational bottlenecks in both
neighbor search and density estimation.
Building on these insights, we proposed two novel algorithms. The first is an approximate
algorithm that efficiently replicates standard DBSCAN behavior, and the second is
a modified version termed Box-based DBSCAN, which operates under a slightly altered
definition of neighborhood using axis-aligned bounding boxes. The box-based approach
improves clustering performance for geometrically structured data and introduces new
ways to identify core regions without relying on exhaustive point-wise comparisons.
Description
Dissertation under the supervision of Prof. Sandip Das
Keywords
DBSCAN (Density-Based Spatial Clustering of Applications with Noise ), BB (Bounding Box), MBB (Minimum Bounding Box), USEC (Unit-Spherical Emptiness Checking), BCP (Bi-Chromatic Closest Pair)
Citation
53p.
