To see how many unique values a column has, use Series.nunique:
df.domain.nunique()
# 4
To get all these distinct values, you can use unique or drop_duplicates. The slight difference between the two is that unique returns a numpy.array while drop_duplicates returns a pandas.Series.
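A minimal sketch of that difference, assuming a small made-up DataFrame with the ID and domain columns used in the snippets here (the data itself is hypothetical). The domain column is built with dtype=object on purpose: for pandas extension dtypes, unique returns an ExtensionArray rather than a NumPy array.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the thread.
df = pd.DataFrame({
    "ID": [123, 123, 123, 456, 456, 456, 456, 789, 789],
    # dtype=object so that unique() returns a plain NumPy array.
    "domain": pd.Series(
        ["vk.com", "vk.com", "twitter.com", "vk.com", "facebook.com",
         "vk.com", "google.com", "twitter.com", "vk.com"],
        dtype=object,
    ),
})

# unique() returns the distinct values in order of first appearance.
distinct_array = df["domain"].unique()

# drop_duplicates() returns a Series and keeps the original index labels
# of the first occurrence of each value.
distinct_series = df["domain"].drop_duplicates()

print(type(distinct_array).__name__)
print(type(distinct_series).__name__)
print(list(distinct_series.index))
```

The Series form is handy when you want to keep working with index labels or chain further pandas methods; the array form is enough if you only need the values.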
As for this specific problem, since you'd like to count distinct values with respect to another variable, besides the groupby method provided by other answers here, you can also simply drop duplicates first and then call value_counts().
You could also use value_counts, which is slightly less efficient. But the best is Jezrael's answer using nunique:
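As a rough illustration of the drop-duplicates-then-count idea, here is a sketch on a hypothetical frame that has only the ID and domain columns. It also computes the groupby/nunique result so the two can be compared.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the thread.
df = pd.DataFrame({
    "ID": [123, 123, 123, 456, 456, 456, 456, 789, 789],
    "domain": ["vk.com", "vk.com", "twitter.com", "vk.com", "facebook.com",
               "vk.com", "google.com", "twitter.com", "vk.com"],
})

# Remove repeated (ID, domain) rows, then count the remaining rows per domain.
per_domain = df.drop_duplicates()["domain"].value_counts()

# The groupby route: number of distinct IDs within each domain.
by_group = df.groupby("domain")["ID"].nunique()

print(per_domain)
print(by_group)
```

Note that drop_duplicates() with no arguments compares whole rows, so this only matches the groupby result when the frame holds just these two columns. With extra columns, passing subset=["ID", "domain"] would be the safer call.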
%timeit df.drop_duplicates().groupby('domain').size()
1000 loops, best of 3: 939 µs per loop
%timeit df.drop_duplicates().domain.value_counts()
1000 loops, best of 3: 1.1 ms per loop
%timeit df.groupby('domain')['ID'].nunique()
1000 loops, best of 3: 440 µs per loop
You need nunique.
If you need to strip ' characters, nunique still applies after stripping them; Jon Clements gave an alternative form in a comment.
You can retain the column name by using agg() instead.
The difference is that nunique() returns a Series and agg() returns a DataFrame.
Generally, to count distinct values in a single column, you can use Series.value_counts:
df.domain.value_counts()
IIUC, you want the number of different ID values for every domain.
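The code from these answers did not survive here, so the following is only a sketch of how the nunique and agg() variants might look, not the answerers' exact code. The sample data is hypothetical, and the domain values are wrapped in ' characters on purpose to show the stripping step.

```python
import pandas as pd

# Hypothetical sample data; the quotes around each domain are deliberate.
df = pd.DataFrame({
    "ID": [123, 123, 123, 456, 456, 456, 456, 789, 789],
    "domain": ["'vk.com'", "'vk.com'", "'twitter.com'", "'vk.com'",
               "'facebook.com'", "'vk.com'", "'google.com'",
               "'twitter.com'", "'vk.com'"],
})

# Strip the surrounding ' characters from the grouping key.
cleaned = df["domain"].str.strip("'")

# nunique() on the grouped column returns a Series indexed by domain.
counts = df.groupby(cleaned)["ID"].nunique()

# as_index=False with agg() returns a DataFrame that keeps "domain"
# as a regular column next to the aggregated "ID" column.
table = (
    df.assign(domain=cleaned)
      .groupby("domain", as_index=False)
      .agg({"ID": "nunique"})
)

print(counts)
print(table)
```

The Series is convenient for lookups such as counts["vk.com"], while the DataFrame is easier to merge or export because the domain stays an ordinary column.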