# python - Count unique values with pandas per groups


This question already has an answer here: Pandas count (distinct) equivalent, 5 answers ...


All answers

You need `nunique`:

```python
out = df.groupby('domain')['ID'].nunique()
print(out)
# domain
# 'vk.com'          3
# Name: ID, dtype: int64
```
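As a minimal, self-contained sketch of this approach, here is the same call on a small hypothetical DataFrame (the `ID` and `domain` values are made up, since the question's data is not shown here):

```python
import pandas as pd

# Hypothetical sample data: 'vk.com' has 5 rows but only 3 distinct IDs
df = pd.DataFrame({
    "ID": [123, 123, 456, 789, 789, 111, 222],
    "domain": ["vk.com"] * 5 + ["fb.com", "ggl.com"],
})

# Number of distinct IDs per domain: a Series indexed by domain, named 'ID'
unique_ids = df.groupby("domain")["ID"].nunique()
print(unique_ids)
```

Repeated `(domain, ID)` rows are counted once, which is why `vk.com` gets 3 rather than 5.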

If you need to strip the `'` characters from the `domain` values first:

```python
out = df.ID.groupby([df.domain.str.strip("'")]).nunique()
print(out)
# domain
# vk.com          3
# Name: ID, dtype: int64
```

Or as Jon Clements commented:

```python
df.groupby(df.domain.str.strip("'"))['ID'].nunique()
```
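A sketch of why stripping matters, assuming hypothetical data in which some domain values carry literal quote characters and some do not. Grouping by the stripped Series puts them in one group without modifying the original column:

```python
import pandas as pd

# Hypothetical data: "'vk.com'" and "vk.com" should count as the same domain
df = pd.DataFrame({
    "ID": [1, 1, 2, 3],
    "domain": ["'vk.com'", "vk.com", "'vk.com'", "'fb.com'"],
})

# Strip quotes from both ends, then group by the cleaned values
cleaned = df["domain"].str.strip("'")
unique_ids = df.groupby(cleaned)["ID"].nunique()
print(unique_ids)
```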

To keep `domain` as a regular column and get a DataFrame back, use `as_index=False` with `agg`:

```python
import pandas as pd

out = df.groupby(by='domain', as_index=False).agg({'ID': pd.Series.nunique})
print(out)
#   domain  ID
# 0     fb   1
# 1    ggl   1
# 3     vk   3
```

The difference is that `groupby(...)['ID'].nunique()` returns a Series, while `agg()` here returns a DataFrame.
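A small sketch on hypothetical data that checks the two return types. The `.reset_index()` call is an alternative not taken from the answers above; it turns the Series result into the same two-column layout:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "ID": [123, 123, 456, 789, 789, 111, 222],
    "domain": ["vk.com"] * 5 + ["fb.com", "ggl.com"],
})

# Series indexed by domain
as_series = df.groupby("domain")["ID"].nunique()
# DataFrame with 'domain' and 'ID' as ordinary columns
as_frame = df.groupby("domain", as_index=False).agg({"ID": "nunique"})
# Same layout rebuilt from the Series
from_series = as_series.reset_index()

print(type(as_series).__name__, type(as_frame).__name__)
```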


Generally, to count the occurrences of each distinct value in a single column, you can use `Series.value_counts`:

```python
df.domain.value_counts()
# 'vk.com'          5
# Name: domain, dtype: int64
```
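A runnable sketch with hypothetical data. `value_counts` counts rows per value, so repeated `ID`s are included in the count:

```python
import pandas as pd

# Hypothetical sample data: 'vk.com' occurs in 5 rows
df = pd.DataFrame({
    "ID": [123, 123, 456, 789, 789, 111, 222],
    "domain": ["vk.com"] * 5 + ["fb.com", "ggl.com"],
})

# Occurrences of each distinct domain, most frequent first
row_counts = df["domain"].value_counts()
print(row_counts)
```

In pandas 2.0 and later the result is named `count` (with `domain` as the index name) rather than `domain` as in the output above.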

To see how many unique values a column has, use `Series.nunique`:

```python
df.domain.nunique()
# 4
```
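A small sketch of `nunique` on a hypothetical column with a missing value: by default missing values are not counted, and `dropna=False` counts them as one extra value:

```python
import pandas as pd

# Hypothetical column with one missing entry
domains = pd.Series(["vk.com", "vk.com", "fb.com", None], name="domain")

n_default = domains.nunique()                   # missing value ignored
n_with_missing = domains.nunique(dropna=False)  # missing value counted
print(n_default, n_with_missing)
```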

To get the distinct values themselves, you can use `unique` or `drop_duplicates`. The slight difference between the two is that `unique` returns a NumPy array (`numpy.ndarray`), while `drop_duplicates` returns a `pandas.Series`:

```python
df.domain.unique()

df.domain.drop_duplicates()
# 0    'vk.com'
# Name: domain, dtype: object
```
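A sketch that checks the return types, using a hypothetical integer column. For extension dtypes (categorical or the newer string dtypes, for example) `unique` returns an array of that extension type instead of a plain NumPy array, which is why integers are used here:

```python
import numpy as np
import pandas as pd

# Hypothetical ID column with repeats
ids = pd.Series([123, 123, 456, 789, 789], name="ID")

uniq = ids.unique()            # NumPy array, in order of first appearance
dedup = ids.drop_duplicates()  # Series that keeps the original index labels

print(type(uniq).__name__, uniq)
print(dedup)
```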

For this specific problem, where you want to count the distinct values of one column with respect to another, you can, besides the `groupby` approach from the other answers, simply drop duplicates first and then call `value_counts()`. This assumes `df` contains only the `domain` and `ID` columns, because `drop_duplicates()` compares whole rows:

```python
import pandas as pd

df.drop_duplicates().domain.value_counts()
# 'vk.com'          3
# Name: domain, dtype: int64
```
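A sketch showing how whole-row deduplication can overcount, assuming a hypothetical extra `timestamp` column (not in the question). Passing `subset` restricts the comparison to the `(domain, ID)` pair:

```python
import pandas as pd

# Hypothetical data with an extra column that makes every row unique
df = pd.DataFrame({
    "ID": [123, 123, 456, 789, 789, 111, 222],
    "domain": ["vk.com"] * 5 + ["fb.com", "ggl.com"],
    "timestamp": [1, 2, 3, 4, 5, 6, 7],
})

# Whole-row deduplication keeps all 5 'vk.com' rows because timestamps differ
naive = df.drop_duplicates()["domain"].value_counts()
# Deduplicate on the (domain, ID) pair only, then count rows per domain
distinct = df.drop_duplicates(subset=["domain", "ID"])["domain"].value_counts()
print(naive["vk.com"], distinct["vk.com"])
```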


Use `df.domain.value_counts()`:

```python
df.domain.value_counts()
# vk.com          5
# Name: domain, dtype: int64
```
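Note that `value_counts` counts rows, so for `vk.com` it reports 5 rather than the number of distinct `ID`s. A sketch on hypothetical data that puts the two counts side by side:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "ID": [123, 123, 456, 789, 789, 111, 222],
    "domain": ["vk.com"] * 5 + ["fb.com", "ggl.com"],
})

# Both Series are indexed by domain, so the DataFrame aligns them by label
comparison = pd.DataFrame({
    "rows": df["domain"].value_counts(),                   # all rows per domain
    "distinct_ids": df.groupby("domain")["ID"].nunique(),  # distinct IDs per domain
})
print(comparison)
```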


IIUC, you want the number of distinct `ID`s for every `domain`; if so, you can try this:

```python
output = df.drop_duplicates()
output.groupby('domain').size()
```

Output:

```
domain
vk.com          3
dtype: int64
```
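A sketch on hypothetical data confirming that deduplicating the `(domain, ID)` pairs and then taking group sizes gives the same numbers as `nunique`. `subset` is passed as an assumption, so that any other columns in `df` would not affect the deduplication:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "ID": [123, 123, 456, 789, 789, 111, 222],
    "domain": ["vk.com"] * 5 + ["fb.com", "ggl.com"],
})

# Group sizes after removing repeated (domain, ID) pairs
per_domain = df.drop_duplicates(subset=["domain", "ID"]).groupby("domain").size()
# Direct distinct count per domain
via_nunique = df.groupby("domain")["ID"].nunique()

print(per_domain)
print(via_nunique)
```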

You could also use `value_counts`, which is slightly less efficient. But the best option is Jezrael's answer using `nunique`:

```
%timeit df.drop_duplicates().groupby('domain').size()
1000 loops, best of 3: 939 µs per loop
%timeit df.drop_duplicates().domain.value_counts()
1000 loops, best of 3: 1.1 ms per loop
%timeit df.groupby('domain')['ID'].nunique()
1000 loops, best of 3: 440 µs per loop
```
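To run a comparison like this outside IPython, here is a sketch using the standard-library `timeit` module on synthetic data. The data size, number of domains, and repeat counts are arbitrary assumptions, and the relative timings will vary with the data shape and pandas version:

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical synthetic data: 20,000 rows, 50 domains, IDs in [0, 1000)
rng = np.random.default_rng(0)
n_rows = 20_000
df = pd.DataFrame({
    "ID": rng.integers(0, 1_000, size=n_rows),
    "domain": rng.choice([f"site{i}.com" for i in range(50)], size=n_rows),
})

# The three approaches timed above, wrapped as zero-argument callables
candidates = {
    "drop_duplicates + size": lambda: df.drop_duplicates().groupby("domain").size(),
    "drop_duplicates + value_counts": lambda: df.drop_duplicates()["domain"].value_counts(),
    "groupby + nunique": lambda: df.groupby("domain")["ID"].nunique(),
}

timings = {}
for label, fn in candidates.items():
    # Best of 5 repeats of 10 calls each, converted to seconds per call
    best = min(timeit.repeat(fn, number=10, repeat=5)) / 10
    timings[label] = best
    print(f"{label}: {best * 1e3:.2f} ms per call")
```

Because `df` here has only the `ID` and `domain` columns, all three callables produce the same per-domain counts (the `value_counts` result is ordered by count instead of by domain).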