Counting Term Occurrences by Group in a Pandas DataFrame
Problem:
Given a DataFrame with the columns id, group, and term, the goal is to count how many times each term occurs within each unique combination of id and group.
Solution:
To avoid explicit loops, use the groupby and size methods in Pandas.
The groupby method groups the DataFrame by the specified columns (id, group, and term), and size counts the rows in each group. The unstack method then pivots the innermost index level (term) into columns, so the counts are arranged as a matrix. Combinations that never occur appear as NaN unless a fill value is supplied, for example unstack(fill_value=0).
The result is a table whose rows carry a two-level MultiIndex (id, group) and whose columns correspond to the distinct values of term. Each cell shows the number of times a particular term appears for the corresponding id and group.
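The approach described above can be sketched as follows. The sample data is hypothetical and only the column names id, group, and term come from the text; fill_value=0 is an assumption added here so that missing combinations show as 0 rather than NaN.

```python
import pandas as pd

# Hypothetical sample data; only the column names are taken from the text.
df = pd.DataFrame({
    "id":    [1, 1, 1, 2, 2, 2],
    "group": [1, 1, 2, 1, 1, 1],
    "term":  ["term1", "term2", "term1", "term1", "term1", "term3"],
})

counts = (
    df.groupby(["id", "group", "term"])  # one group per (id, group, term)
      .size()                            # number of rows in each group
      .unstack(fill_value=0)             # pivot term into columns, 0 where absent
)

print(counts)

# Look up a single cell: occurrences of "term1" for id=2, group=1.
print(counts.loc[(2, 1), "term1"])
```

Here counts.index is a MultiIndex with levels id and group, and counts.columns holds the distinct terms, so a single count can be read with counts.loc[(id, group), term].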
Timing:
The approach scales well to large datasets: on a DataFrame with 1,000,000 rows, the reported elapsed time is approximately 1 second. The exact figure depends on the hardware, the Pandas version, and the number of distinct groups.
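A timing of this kind can be reproduced with a sketch like the one below. The synthetic data (1,000 ids, 10 groups, 3 terms, a fixed random seed) is an assumption made for illustration and is not the dataset behind the figure quoted above, so the measured time will differ.

```python
import time

import numpy as np
import pandas as pd

# Assumed synthetic data: the cardinalities and seed are illustrative choices.
rng = np.random.default_rng(0)
n_rows = 1_000_000
df = pd.DataFrame({
    "id": rng.integers(0, 1000, size=n_rows),
    "group": rng.integers(0, 10, size=n_rows),
    "term": rng.choice(["term1", "term2", "term3"], size=n_rows),
})

# Time only the counting step, not the data generation.
start = time.perf_counter()
counts = df.groupby(["id", "group", "term"]).size().unstack(fill_value=0)
elapsed = time.perf_counter() - start

print(f"{n_rows:,} rows counted in {elapsed:.3f} s")
```

time.perf_counter is used because it is a monotonic, high-resolution clock suited to measuring short intervals; for more stable numbers, the measurement can be repeated several times and the minimum taken.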