
Created by @johnd123 at October 19th, 2023, 5:25:02 am.

Hierarchical clustering is a widely used family of clustering algorithms that builds a hierarchy of clusters rather than a single flat partition. Depending on the variant, it either starts with every data point as its own cluster and iteratively merges clusters, or starts with all points in one cluster and recursively splits it. These two approaches are known as agglomerative (bottom-up) and divisive (top-down) hierarchical clustering.

Agglomerative Hierarchical Clustering

Agglomerative clustering begins with every data point as its own cluster and repeatedly merges the two closest clusters until all points belong to a single cluster. Cluster proximity is derived from a distance measure between points, such as Euclidean or Manhattan distance, together with a linkage rule (single, complete, average, or Ward, for example) that turns point distances into a distance between clusters.
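Here is a minimal sketch of agglomerative clustering using SciPy's linkage function; the random toy dataset and the choice of Ward linkage with Euclidean distance are illustrative assumptions, not prescriptions.

```python
# A minimal sketch of agglomerative clustering with SciPy.
# The toy dataset and the Ward/Euclidean choices are assumptions.
import numpy as np
from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(42)
points = rng.normal(size=(20, 2))  # 20 random 2-D points

# Each row of Z records one merge: the two clusters joined,
# the distance at which they merged, and the new cluster's size.
Z = linkage(points, method="ward", metric="euclidean")
print(Z[:5])
```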

Divisive Hierarchical Clustering

Divisive clustering takes the opposite, top-down approach: it starts with all data points in a single cluster and recursively splits clusters into smaller ones until a stopping criterion is met, such as a minimum cluster size. Divisive clustering can be computationally expensive, since the number of possible ways to split a cluster grows exponentially with its size, but it may provide finer-grained results than agglomerative clustering.
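Common libraries do not ship a ready-made divisive algorithm, so the sketch below approximates one by recursively bisecting clusters with 2-means; the divisive_clustering helper, the minimum cluster size, and the toy data are all hypothetical choices for illustration.

```python
# A minimal sketch of divisive clustering via recursive 2-means splits.
# This bisecting approach is one common approximation, not a standard API.
import numpy as np
from sklearn.cluster import KMeans

def divisive_clustering(points, min_size=5):
    """Recursively split points in two until clusters fall below min_size."""
    if len(points) <= min_size:
        return [points]
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(points)
    left, right = points[labels == 0], points[labels == 1]
    if len(left) == 0 or len(right) == 0:  # degenerate split: stop here
        return [points]
    return divisive_clustering(left, min_size) + divisive_clustering(right, min_size)

rng = np.random.default_rng(0)
data = rng.normal(size=(40, 2))
clusters = divisive_clustering(data)
print([len(c) for c in clusters])  # sizes of the resulting leaf clusters
```

Using 2-means for each split is just one design choice; any method that partitions a cluster in two (for example, splitting along the principal component) would fit the same recursive scheme.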

Hierarchical clustering results can be visualized with a dendrogram, a tree diagram that shows the hierarchical relationship between clusters. The height at which two clusters are joined represents the dissimilarity between them, so cutting the tree at a chosen height yields a flat clustering.
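As a sketch, SciPy's dendrogram function draws this tree directly from a linkage matrix; the toy data and Ward linkage repeat the assumptions from the earlier example, and matplotlib is assumed to be available for plotting.

```python
# A minimal sketch of plotting a dendrogram (assumes matplotlib is installed).
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

rng = np.random.default_rng(42)
points = rng.normal(size=(20, 2))
Z = linkage(points, method="ward")  # same toy setup as above

dendrogram(Z)  # merge heights on the y-axis encode dissimilarity
plt.xlabel("data point index")
plt.ylabel("dissimilarity (linkage distance)")
plt.show()
```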

Advantages and Drawbacks

One advantage of hierarchical clustering is that the number of clusters does not have to be specified in advance; a flat clustering can be extracted later by cutting the dendrogram at a chosen height. It also reveals nested cluster structure, allowing a deeper understanding of the data. However, hierarchical clustering is sensitive to noise and outliers, and it scales poorly with the number of data points: the naive agglomerative algorithm needs O(n²) memory for the distance matrix and up to O(n³) time, which makes it impractical for very large datasets.
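To illustrate the first advantage, the sketch below cuts a dendrogram at a dissimilarity threshold instead of fixing the number of clusters up front; the threshold of 2.5 is an arbitrary assumption chosen for this toy data.

```python
# A minimal sketch of extracting flat clusters by cutting the tree.
# The threshold t=2.5 is an illustrative assumption, not a recommended value.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(42)
points = rng.normal(size=(20, 2))
Z = linkage(points, method="ward")

labels = fcluster(Z, t=2.5, criterion="distance")
print(labels)  # one cluster id per point; the count emerges from the cut
```

In practice, the threshold is often chosen after inspecting the dendrogram, which is exactly what makes predefining the number of clusters unnecessary.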

Remember, understanding hierarchical clustering can provide valuable insights into the structure of your data and help you make informed decisions!

Happy clustering!