How to write the K-means clustering algorithm in Python?
K-means clustering is a widely used data mining and machine learning algorithm that partitions a set of data points into k clusters according to their attributes. This article shows how to write the K-means clustering algorithm in Python and provides code for each step.
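Before we start writing code, we need to understand the basic principle of the K-means clustering algorithm.
The basic steps of the K-means clustering algorithm are as follows:
1. Choose k data points as the initial centroids.
2. Assign each data point to the cluster of its nearest centroid.
3. Move each centroid to the mean of the data points assigned to it.
4. Repeat steps 2 and 3 until the centroids no longer change or a maximum number of iterations is reached.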
Now we can start writing code.
First, we need to import the necessary libraries, such as numpy and matplotlib.
import numpy as np
import matplotlib.pyplot as plt
We need to prepare a set of data for clustering. Here we use numpy to randomly generate a set of two-dimensional data.
data = np.random.randn(100, 2)
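Samples from a single standard normal distribution form one blob, so the clusters found in them are fairly arbitrary. As a sketch only, the data could instead be drawn around a few separated centers so that the result is easier to judge by eye. The centers, spread, point count, and seed below are illustrative assumptions, not values from the example above.

```python
import numpy as np

# Hypothetical setup: three made-up centers and a fixed seed for reproducibility.
rng = np.random.default_rng(0)
true_centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
points_per_cluster = 50

# Draw points around each center with a small spread, then stack them.
blobs = [center + 0.5 * rng.standard_normal((points_per_cluster, 2))
         for center in true_centers]
data = np.vstack(blobs)

print(data.shape)  # (150, 2)
```

The array keeps the name data so it can stand in for the random data in the remaining steps.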
We need to initialize k centroids for the clustering algorithm. Here we use numpy to randomly select k distinct data points as the initial centroids. The kmeans function defined later repeats this step internally.
k = 3
centroids = data[np.random.choice(range(len(data)), k, replace=False)]
We need to define a function to calculate the distance between the data point and the centroid. Here we use Euclidean distance.
def compute_distances(data, centroids):
    # Broadcasting yields one row per data point and one column per centroid.
    return np.linalg.norm(data[:, np.newaxis] - centroids, axis=2)
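The distance function relies on NumPy broadcasting, which is easy to misread. The following sketch repeats the function on a tiny made-up input (the points and centers are assumptions chosen so the distances can be checked by hand) to show the resulting shape.

```python
import numpy as np

def compute_distances(data, centroids):
    # (n, 1, d) - (k, d) broadcasts to (n, k, d); the norm over axis 2 gives (n, k).
    return np.linalg.norm(data[:, np.newaxis] - centroids, axis=2)

# Toy values chosen so the distances are easy to verify by hand.
points = np.array([[0.0, 0.0], [3.0, 4.0]])
centers = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])

distances = compute_distances(points, centers)
print(distances.shape)     # (2, 3)
print(distances.tolist())  # [[0.0, 3.0, 4.0], [5.0, 4.0, 3.0]]
```

Row i of the result holds the distances from point i to every centroid.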
We need to define a function to assign each data point to the category represented by the nearest centroid.
def assign_clusters(data, centroids):
    distances = compute_distances(data, centroids)
    # For each data point, pick the index of the nearest centroid.
    return np.argmin(distances, axis=1)
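To make the assignment step concrete, here is a small sketch of what np.argmin does on a distance matrix. The distances are made-up numbers for illustration. The last row shows that a tie is resolved in favor of the lower centroid index.

```python
import numpy as np

# Rows are data points, columns are centroids (made-up distances).
distances = np.array([[0.2, 1.5, 3.0],
                      [2.0, 0.1, 0.4],
                      [1.0, 1.0, 0.5],
                      [1.0, 1.0, 2.0]])  # tie between centroid 0 and centroid 1

labels = np.argmin(distances, axis=1)
print(labels.tolist())  # [0, 1, 2, 0]
```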
We need to define a function to update the position of the centroid, that is, set it to the average of all data points in the category.
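def update_centroids(data, clusters, k):
    centroids = []
    for i in range(k):
        members = data[clusters == i]
        if len(members) == 0:
            # An empty cluster has no mean; fall back to a random data point.
            centroids.append(data[np.random.randint(len(data))])
        else:
            centroids.append(np.mean(members, axis=0))
    return np.array(centroids)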
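The update step can be checked on a handful of points. In this sketch the points and labels are made up for illustration. Each new centroid is the per-coordinate mean of the points that carry its label.

```python
import numpy as np

# Four made-up points, already assigned to two clusters.
points = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [10.0, 12.0]])
labels = np.array([0, 0, 1, 1])

# A boolean mask selects the members of each cluster before averaging.
new_centroids = np.array([points[labels == i].mean(axis=0) for i in range(2)])
print(new_centroids.tolist())  # [[1.0, 0.0], [10.0, 11.0]]
```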
Finally, we need to iterate the clustering process until the position of the centroid no longer changes.
def kmeans(data, k, max_iter=100):
    # Start from k distinct data points chosen at random.
    centroids = data[np.random.choice(range(len(data)), k, replace=False)]
    for _ in range(max_iter):
        clusters = assign_clusters(data, centroids)
        new_centroids = update_centroids(data, clusters, k)
        # Stop once the centroids have (numerically) stopped moving.
        if np.allclose(centroids, new_centroids):
            break
        centroids = new_centroids
    return clusters, centroids
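It can be useful to see how quickly the loop settles. The sketch below is a hypothetical variant, not the function above: the name kmeans_with_history, the seed parameter, and the choice to keep the old centroid for an empty cluster are all assumptions made for illustration. It compares centroids with a tolerance and returns the number of iterations it ran.

```python
import numpy as np

def kmeans_with_history(data, k, max_iter=100, seed=0):
    # Hypothetical variant: also returns how many iterations were run.
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), k, replace=False)]
    for iteration in range(1, max_iter + 1):
        distances = np.linalg.norm(data[:, np.newaxis] - centroids, axis=2)
        clusters = np.argmin(distances, axis=1)
        # Assumption: keep the old centroid if a cluster ends up empty.
        new_centroids = np.array([
            data[clusters == i].mean(axis=0) if np.any(clusters == i) else centroids[i]
            for i in range(k)
        ])
        # Tolerance-based comparison instead of exact float equality.
        if np.allclose(centroids, new_centroids):
            break
        centroids = new_centroids
    return clusters, centroids, iteration

# Two well-separated groups of made-up points.
data = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                 [9.0, 9.0], [9.0, 10.0], [10.0, 9.0]])
clusters, centroids, n_iter = kmeans_with_history(data, k=2)
print(clusters.tolist(), n_iter)
```

On data this cleanly separated, the loop should stop after only a few iterations, well before max_iter.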
Now we can run the clustering algorithm to get the category to which each data point belongs and the final centroid.
clusters, centroids = kmeans(data, k)
Finally, we can use matplotlib to visualize the results. Each data point is color-coded according to the category it belongs to, and the location of the centroid is indicated by a red circle.
plt.scatter(data[:, 0], data[:, 1], c=clusters)
plt.scatter(centroids[:, 0], centroids[:, 1], s=100, c='red', marker='o')
plt.show()
The code above implements the K-means clustering algorithm in Python. You can adjust the number of clusters k, the maximum number of iterations max_iter, and the input data according to your needs. I hope this article helps you understand and implement the K-means clustering algorithm!
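One common way to compare values of k is to look at the within-cluster sum of squared distances, often called inertia, and see where adding clusters stops helping much. The following is a minimal sketch under stated assumptions: the inertia helper, the seeded compact kmeans, and the three-blob data are all made up for illustration and are not part of the code above.

```python
import numpy as np

def kmeans(data, k, max_iter=100, seed=0):
    # Compact restatement of the K-means steps, with a seeded generator (an assumption).
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), k, replace=False)]
    for _ in range(max_iter):
        distances = np.linalg.norm(data[:, np.newaxis] - centroids, axis=2)
        clusters = np.argmin(distances, axis=1)
        # Assumption: keep the old centroid if a cluster ends up empty.
        new_centroids = np.array([
            data[clusters == i].mean(axis=0) if np.any(clusters == i) else centroids[i]
            for i in range(k)
        ])
        if np.allclose(centroids, new_centroids):
            break
        centroids = new_centroids
    return clusters, centroids

def inertia(data, clusters, centroids):
    # Sum of squared distances from each point to its assigned centroid.
    return float(np.sum((data - centroids[clusters]) ** 2))

# Made-up data: three blobs around illustrative centers.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [6.0, 6.0], [0.0, 6.0]])
data = np.vstack([c + 0.5 * rng.standard_normal((40, 2)) for c in centers])

inertias = {}
for k in range(1, 6):
    clusters, centroids = kmeans(data, k)
    inertias[k] = inertia(data, clusters, centroids)
    print(k, round(inertias[k], 1))
```

Because K-means depends on its random starting points, the printed values can differ between seeds, so treat the comparison as a rough guide rather than a rule.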