Question:
How can I efficiently create a scatter plot using a Pandas DataFrame, where the markers are dictated by a third column in the DataFrame?
Answer:
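Calling matplotlib.pyplot.scatter() to separate categories is often slower than it needs to be. scatter() is designed for points whose size or color varies individually, and it accepts only one marker style per call, so you end up calling it once per category anyway. For discrete categories, group the DataFrame and call matplotlib.pyplot.plot() (here Axes.plot()) once per group with linestyle='' so that only the markers are drawn. The matplotlib documentation notes that plot() is faster than scatter() when marker size and color do not vary from point to point: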
```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Generate data: 20 random (x, y) points, each with a random category label
num = 20
x, y = np.random.random((2, num))
labels = np.random.choice(['a', 'b', 'c'], num)
df = pd.DataFrame(dict(x=x, y=y, label=labels))

# Group rows by the category column
groups = df.groupby('label')

# Optional: a built-in matplotlib style (must be applied before creating the figure)
plt.style.use('ggplot')

fig, ax = plt.subplots()
ax.margins(0.05)  # Optional padding around the data

# Optional: custom colors, one per group (the cycle must be set before plotting)
ax.set_prop_cycle(color=plt.cm.viridis(np.linspace(0, 1, len(groups))))

# One plot() call per group; linestyle='' draws markers without connecting lines,
# and each call takes the next color from the axes' color cycle
for name, group in groups:
    ax.plot(group.x, group.y, marker='o', linestyle='', ms=12, label=name)

ax.legend(numpoints=1, loc='upper left')
plt.show()
```
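This produces a scatter plot with one color and one legend entry per category. Every group uses the same circular marker ('o'); to make the marker shape itself depend on the category column, pass a different marker= value in each plot() call.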
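Since the question asks for marker shapes dictated by the third column, here is a minimal sketch of that variant, assuming you want a fixed marker code per category value. The helper name plot_by_category and its markers mapping are hypothetical, introduced here for illustration; they are not part of pandas or matplotlib. It keeps the same one-plot()-call-per-group approach, so colors still come from the axes' color cycle.

```python
import itertools

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_by_category(df, x, y, category, markers=None, ax=None):
    """Hypothetical helper: one marker-only plot() call per category value.

    `markers` maps category values to matplotlib marker codes; categories
    missing from the mapping take the next shape from a default list.
    """
    if ax is None:
        _, ax = plt.subplots()
    markers = markers or {}
    default_shapes = itertools.cycle(['o', 's', '^', 'D', 'v', 'P', 'X'])
    for name, group in df.groupby(category):
        # Only consume a default shape when the category has no explicit marker
        marker = markers[name] if name in markers else next(default_shapes)
        ax.plot(group[x], group[y], marker=marker, linestyle='', ms=10, label=name)
    ax.legend(numpoints=1, title=category)
    return ax


# Usage: random sample data with a categorical 'label' column
rng = np.random.default_rng(0)
num = 30
df = pd.DataFrame({
    'x': rng.random(num),
    'y': rng.random(num),
    'label': rng.choice(['a', 'b', 'c'], num),
})
ax = plot_by_category(df, 'x', 'y', 'label', markers={'a': 'o', 'b': 's', 'c': '^'})
plt.show()
```

Passing an explicit mapping keeps a given category's marker stable across plots even when some categories are absent from a particular subset of the data; relying on the default list assigns shapes in sorted group order instead.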