Clustering Stacked Bars in Pandas and Matplotlib
Introduction
This article shows how to draw stacked bar plots for several dataframes that share the same index and columns. The goal is a cluster of stacked bars at each index value, with one bar per dataframe, so the dataframes can be compared side by side.
Using Pandas and Matplotlib
The solution uses Pandas to draw each dataframe as a stacked bar plot on a shared Matplotlib axes, then narrows and shifts the bars so that they sit side by side. Here's the code:
<code class="python">import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def plot_clustered_stacked(dfall, labels=None, title="multiple stacked bar plot", **kwargs):
    """Plot a list of dataframes (same index and columns) as clustered stacked bars.

    Extra keyword arguments (for example cmap) are forwarded to DataFrame.plot.
    """
    n_df = len(dfall)
    n_col = len(dfall[0].columns)
    n_ind = len(dfall[0].index)
    axe = plt.subplot(111)

    for df in dfall:  # draw each dataframe as stacked bars on the same axes
        axe = df.plot(kind="bar", linewidth=0, stacked=True, ax=axe,
                      legend=False, grid=False, **kwargs)

    h, l = axe.get_legend_handles_labels()  # get the handles we want to modify
    for i in range(0, n_df * n_col, n_col):  # len(h) = n_col * n_df
        for j, pa in enumerate(h[i:i + n_col]):
            for rect in pa.patches:  # for each index
                # shift the bar sideways by its dataframe's position in the cluster
                rect.set_x(rect.get_x() + 1 / float(n_df + 1) * i / float(n_col))
                rect.set_hatch("/" * int(i / n_col))  # one more "/" per dataframe
                rect.set_width(1 / float(n_df + 1))

    axe.set_xticks((np.arange(0, 2 * n_ind, 2) + 1 / float(n_df + 1)) / 2.)
    axe.set_xticklabels(dfall[0].index, rotation=0)
    axe.set_title(title)

    # Add invisible (zero-height) bars to build a second legend for the hatches
    n = []
    for i in range(n_df):
        n.append(axe.bar(0, 0, color="gray", hatch="/" * i))

    l1 = axe.legend(h[:n_col], l[:n_col])
    if labels is not None:
        l2 = plt.legend(n, labels)
        axe.add_artist(l1)  # re-add the column legend replaced by the second one
    return axe</code>
To use this function, pass in a list of dataframes and, optionally, labels and a title. It draws one stacked bar per dataframe at each index value and uses hatch patterns to tell the dataframes apart.
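The positioning arithmetic inside the function can be looked at in isolation. The sketch below is only an illustration, not part of the original function: `cluster_layout` is a hypothetical helper, and it assumes Pandas draws each bar centred on an integer position with its default width of 0.5. Under that assumption it reproduces the bar width, the left edge of each bar, and the tick positions that the function sets.

```python
def cluster_layout(n_df, n_ind, default_width=0.5):
    """Return (bar width, left edges per index, tick positions).

    default_width=0.5 is an assumption about the default bar width
    Pandas uses before the function resizes the bars.
    """
    width = 1.0 / (n_df + 1)  # same value the function passes to set_width
    edges = []
    ticks = []
    for pos in range(n_ind):
        start = pos - default_width / 2  # left edge Pandas gives the bar
        # dataframe k is shifted right by k bar widths
        edges.append([start + k * width for k in range(n_df)])
        # same value as (2 * pos + 1 / (n_df + 1)) / 2 in the function
        ticks.append(pos + width / 2)
    return width, edges, ticks


width, edges, ticks = cluster_layout(n_df=3, n_ind=2)
print(width)     # width of every bar
print(edges[0])  # left edges of the three bars at the first index
print(ticks)     # x positions of the tick labels
```

With three dataframes each bar is a quarter of a unit wide, which leaves a quarter-unit gap between neighbouring clusters.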
Example
Here's an example using this function:
<code class="python"># create fake dataframes
df1 = pd.DataFrame(np.random.rand(4, 5),
                   index=["A", "B", "C", "D"],
                   columns=["I", "J", "K", "L", "M"])
df2 = pd.DataFrame(np.random.rand(4, 5),
                   index=["A", "B", "C", "D"],
                   columns=["I", "J", "K", "L", "M"])
df3 = pd.DataFrame(np.random.rand(4, 5),
                   index=["A", "B", "C", "D"],
                   columns=["I", "J", "K", "L", "M"])

# plot clustered stacked bar
plot_clustered_stacked([df1, df2, df3], ["df1", "df2", "df3"])</code>
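The function reads the column count, index length and tick labels from the first dataframe, so it relies on every dataframe having the same index and columns. A small check run before plotting could catch mismatches early. The helper below is a hypothetical addition, not part of the original code, and `check_same_layout` is a name chosen for illustration.

```python
import pandas as pd


def check_same_layout(dfall):
    """Raise ValueError unless every dataframe shares the first one's index and columns."""
    first = dfall[0]
    for k, df in enumerate(dfall[1:], start=1):
        if not df.index.equals(first.index):
            raise ValueError(f"dataframe {k} has a different index")
        if not df.columns.equals(first.columns):
            raise ValueError(f"dataframe {k} has different columns")


# Two frames with matching labels pass silently.
a = pd.DataFrame([[1, 2], [3, 4]], index=["A", "B"], columns=["I", "J"])
b = pd.DataFrame([[5, 6], [7, 8]], index=["A", "B"], columns=["I", "J"])
check_same_layout([a, b])
```

Calling it on the list of dataframes just before `plot_clustered_stacked` would turn a confusing plot into an explicit error.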
Additional Features
You can customize the colors of the bars by passing a cmap argument:
<code class="python">plot_clustered_stacked([df1, df2, df3], ["df1", "df2", "df3"], cmap=plt.cm.viridis)</code>
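To get a feel for which colors a colormap would give the stacked segments, the colormap can be sampled directly. This is a rough sketch that assumes the colormap is sampled at evenly spaced points, one per column; the article does not say how Pandas picks the colors, and `preview_colors` is a hypothetical helper.

```python
import numpy as np
import matplotlib.pyplot as plt


def preview_colors(cmap_name, n_col):
    """Return n_col RGBA tuples sampled evenly across the named colormap."""
    cmap = plt.get_cmap(cmap_name)
    # Assumption: one evenly spaced sample per column
    return [cmap(x) for x in np.linspace(0, 1, n_col)]


colors = preview_colors("viridis", 5)
for column, rgba in zip(["I", "J", "K", "L", "M"], colors):
    print(column, rgba)
```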
Conclusion
This solution produces clustered stacked bar plots from any number of dataframes that share the same index and columns. The function is short enough to adapt to your own data, for example by changing the hatch pattern or the title.