Pandas Equivalent of SQL 'count(distinct)' Using 'nunique()'
In SQL, the number of distinct values in a column is obtained with the 'count(distinct ...)' aggregate. For example, to count the unique client codes in each year-month:
<code class="sql">SELECT YEARMONTH, count(distinct CLIENTCODE) FROM table GROUP BY YEARMONTH;</code>
The same operation can be performed in Pandas with the 'nunique()' method on a grouped DataFrame. Grouping the data by the 'YEARMONTH' column and then calling 'nunique()' on the 'CLIENTCODE' column returns the number of unique clients in each year-month.
<code class="python">table.groupby('YEARMONTH').CLIENTCODE.nunique()</code>
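One detail worth checking is how missing values are treated. SQL's 'count(distinct ...)' ignores NULLs, and 'nunique()' drops missing values by default ('dropna=True'), so the two should agree. The following is a minimal sketch under that assumption; the sample data is hypothetical and is not the table used in the example below.

```python
import pandas as pd

# Hypothetical sample data; the None stands in for a SQL NULL.
table = pd.DataFrame({
    "YEARMONTH": [201301, 201301, 201302, 201302],
    "CLIENTCODE": [1, None, 2, 2],
})

# Default dropna=True skips missing values, like COUNT(DISTINCT ...) skips NULLs.
sql_like = table.groupby("YEARMONTH")["CLIENTCODE"].nunique()

# dropna=False counts the missing value as one additional distinct value.
with_missing = table.groupby("YEARMONTH")["CLIENTCODE"].nunique(dropna=False)

print(sql_like)
print(with_missing)
```

If missing client codes should be counted as their own category, 'dropna=False' is the switch to reach for; otherwise the default already mirrors the SQL behaviour.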
Example:
Consider a DataFrame 'table' containing the following data:
CLIENTCODE | YEARMONTH |
---|---|
1 | 201301 |
1 | 201301 |
2 | 201301 |
1 | 201302 |
2 | 201302 |
2 | 201302 |
3 | 201302 |
Applying the code above yields:
<code class="python">Out[3]:
YEARMONTH
201301    2
201302    3</code>
This output matches the expected result: two unique clients in 201301 and three in 201302.
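To reproduce the example end to end, the table can be built by hand and run through the same 'groupby' call. This is a sketch rather than the original author's session; the variable names 'unique_clients' and 'result' and the column name 'UNIQUE_CLIENTS' are illustrative choices, not part of the original.

```python
import pandas as pd

# Rebuild the example table row by row.
table = pd.DataFrame({
    "CLIENTCODE": [1, 1, 2, 1, 2, 2, 3],
    "YEARMONTH": [201301, 201301, 201301, 201302, 201302, 201302, 201302],
})

# Count distinct client codes within each year-month.
unique_clients = table.groupby("YEARMONTH")["CLIENTCODE"].nunique()
print(unique_clients)

# Optional: turn the Series into a two-column DataFrame,
# which is closer in shape to a SQL result set.
# "UNIQUE_CLIENTS" is an illustrative column name.
result = unique_clients.reset_index(name="UNIQUE_CLIENTS")
print(result)
```

The bracket form 'table.groupby("YEARMONTH")["CLIENTCODE"]' is equivalent to the attribute form used earlier and also works when a column name is not a valid Python identifier.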