Machine Learning

K-Means Clustering

Welcome back to the course! In our last lesson, we discovered that Unsupervised Learning is like sorting a giant box of mixed Lego blocks without an instruction manual: the algorithm finds patterns in unlabeled data entirely on its own.

Today, we are going to look under the hood. How exactly does the computer group these blocks? Let’s explore two of the most popular algorithms for this task: K-Means Clustering and Hierarchical Clustering.

[Figure: K-Means clustering visualization showing scattered data points grouped into colored clusters around highlighted centroids.]

1. K-Means Clustering: The “Center of Attention” Approach

A Simple Definition

K-Means is an algorithm that groups unlabeled data into a specific number of clusters, represented by the letter K. You tell the computer, “I want to organize this data into exactly 3 groups” (so K=3), and the algorithm figures out the best way to do it.

Step-by-Step: How It Works

Imagine you are organizing a networking event in a large hall, and you want to set up 3 tables (K=3) so people with similar interests naturally gather around them.

Pick the Centers (Initialization): You randomly place 3 tables in the room. These tables are called centroids (the center point of a cluster).
Assign the Data (Assignment): Everyone in the room walks to the table that is closest to them.
Move the Centers (Update): You look at where the crowds have formed. To make it fairer for everyone, you move each table to the exact center of the group standing around it, meaning the average position (the mean) of everyone in that group.
Repeat: Because the tables moved, some people might now be closer to a different table, so they switch. You then move each table to the center of its new group.
Finish (Convergence): You repeat this until nobody needs to switch tables anymore. The clusters are set!
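
The steps above can be sketched in plain Python. This is a minimal illustration, not a production implementation: the function name `k_means`, its parameters, and the choice to start the centroids on randomly picked data points (rather than anywhere in the room) are assumptions made for this example.

```python
import math
import random


def k_means(points, k, seed=0, max_iter=100):
    """Cluster 2-D points into k groups; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initialization: use k distinct data points as the starting centroids
    # (an assumption of this sketch; other starting strategies exist).
    centroids = rng.sample(points, k)
    labels = None

    for _ in range(max_iter):
        # Assignment: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Convergence: stop once no point switches cluster.
        if new_labels == labels:
            break
        labels = new_labels

        # Update: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))

    return labels, centroids


points = [(1, 2), (2, 3), (3, 4), (8, 8), (9, 9), (10, 10)]
labels, centroids = k_means(points, k=2)
print(labels)
print(centroids)
```

Each pass through the loop is one round of "everyone walks to the nearest table, then the tables move". The loop ends as soon as an assignment pass leaves every label unchanged.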

K-Means Clustering (Unsupervised ML)

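# Import libraries
import pandas as pd
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

# Sample dataset: two visibly separate groups of points
data = {
    'X': [1, 2, 3, 8, 9, 10],
    'Y': [2, 3, 4, 8, 9, 10]
}

df = pd.DataFrame(data)

# Model: K=2 here because the sample data has two obvious groups.
# n_init is set explicitly because its default differs across scikit-learn versions;
# random_state makes the result repeatable.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans.fit(df[['X', 'Y']])

# Cluster label (0 or 1) assigned to each row
df['Cluster'] = kmeans.labels_

print(df)

# Visualization: points colored by cluster, centroids marked with a red X
plt.scatter(df['X'], df['Y'], c=df['Cluster'])
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            marker='X', s=200, c='red')
plt.xlabel("X")
plt.ylabel("Y")
plt.title("K-Means Clustering")
plt.show()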
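
Once a model is fitted, it can also place new points into the existing clusters. The short sketch below rebuilds the same sample data and calls scikit-learn's `KMeans.predict`. The two new points are made up for illustration, and `n_init=10` is an explicit choice for this example rather than something the lesson prescribes.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Same sample data as in the lesson
df = pd.DataFrame({'X': [1, 2, 3, 8, 9, 10], 'Y': [2, 3, 4, 8, 9, 10]})
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(df)

# Hypothetical new points: one near each group
new_points = pd.DataFrame({'X': [2, 9], 'Y': [2, 9]})

# predict() assigns each new point to the nearest existing centroid
new_labels = kmeans.predict(new_points)
print(new_labels)
```

The label numbers themselves are arbitrary. What matters is that each new point receives the same label as the training points nearest to it.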

Interactive K-Means Explorer

To really understand this, try stepping through the algorithm yourself. Generate some random data, pick your “K”, and watch the centroids find their groups!
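
One way to step through the algorithm in a script is to log what happens on every pass. The sketch below is a rough text-mode version of that idea using NumPy. The seed, the three made-up blob centers, and the 60-point sample size are all assumptions chosen for illustration; change `k` to see how the grouping responds.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, for repeatability
k = 3                            # try other values of K here

# Random data: 20 points scattered around each of three made-up centers
blob_centers = np.array([[0.0, 0.0], [6.0, 6.0], [0.0, 6.0]])
points = np.vstack([c + rng.normal(scale=1.0, size=(20, 2)) for c in blob_centers])

# Initialization: start each centroid on a randomly chosen data point
centroids = points[rng.choice(len(points), size=k, replace=False)]
labels = np.full(len(points), -1)  # -1 means "not assigned yet"

for step in range(1, 101):
    # Assignment: distance from every point to every centroid, then pick the nearest
    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    new_labels = distances.argmin(axis=1)
    switched = int((new_labels != labels).sum())
    labels = new_labels
    print(f"step {step}: {switched} points switched clusters")

    # Convergence: nobody moved, so the clusters are set
    if switched == 0:
        break

    # Update: move each centroid to the mean of its members
    # (an empty cluster keeps its previous centroid)
    centroids = np.array([
        points[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
        for c in range(k)
    ])

print(centroids)
```

The count of switching points shrinks from one step to the next until it reaches zero, which is the "nobody needs to switch tables anymore" moment from the walkthrough.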

