Let's say I have:
v = rep(c(1,2, 2, 2), 25)
Now, I want to count the number of times each unique value appears. unique(v)
returns what the unique values are, but not how many of each there are.
> unique(v)
[1] 1 2
I want something that gives me
> length(v[v==1])
[1] 25
> length(v[v==2])
[1] 75
but as a more general one-liner :) Something close (but not quite) like this:
#<doesn't work right> length(v[v==unique(v)])
Perhaps table is what you are after?
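A minimal sketch of what that looks like with the vector from the question (the name counts is just for illustration):

```r
v <- rep(c(1, 2, 2, 2), 25)

# table() tabulates how often each distinct value occurs
counts <- table(v)
print(counts)
# v
#  1  2 
# 25 75 

# Look up a single count by the value's name (names are character strings)
counts[["2"]]
# [1] 75
```

If you would rather have a data frame, as.data.frame(counts) gives one row per value with the count in a Freq column.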
If you have multiple factors (i.e., a data frame with several columns), you can use the dplyr package to count unique values in each combination of factors. It uses the pipe operator %>% to chain method calls on the data frame data.
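The original snippet is not shown here, so this is only a sketch of the idea. The contents of data and the column names group, type and count are made up for illustration; group_by(), summarise() and n() are standard dplyr functions.

```r
library(dplyr)

# Hypothetical example data with two factors
data <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  type  = c("x", "y", "x", "x", "y")
)

# One row per combination of group and type, with the number of rows in it
counts <- data %>%
  group_by(group, type) %>%
  summarise(count = n(), .groups = "drop")

print(counts)
```

Combinations that never occur in the data simply do not appear in the result.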
Another one-line approach uses aggregate.
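Something along these lines should do it (the column names value and count are my own choice, not necessarily what the answer used):

```r
v <- rep(c(1, 2, 2, 2), 25)

# Group v by its own values and take the length of each group
counts <- aggregate(data.frame(count = v), by = list(value = v), FUN = length)
print(counts)
#   value count
# 1     1    25
# 2     2    75
```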
The table() function is a good way to go, as Chase suggested. If you are analyzing a large dataset, an alternative is to use the special symbol .N from the data.table package.
Make sure the data.table package is installed first.
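The original code did not survive, so here is a sketch under the assumption that the values sit in a single column (called value here, a name I picked):

```r
# install.packages("data.table")  # run once if the package is missing
library(data.table)

v <- rep(c(1, 2, 2, 2), 25)
dt <- data.table(value = v)

# .N is the number of rows in each group defined by `by`
counts <- dt[, .N, by = value]
print(counts)
```

The result is itself a data.table with one row per distinct value and the count in column N.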
To get an un-dimensioned integer vector that contains the count of unique values, use c() on the table.
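A short sketch, reusing the name dummyData for the question's vector:

```r
dummyData <- rep(c(1, 2, 2, 2), 25)

# c() drops the table's dim attribute, leaving a plain integer vector
counts <- c(table(dummyData))
print(counts)

is.null(dim(counts))  # TRUE: no dimensions left
```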
This may be useful if you need to feed the counts of unique values into another function, and is shorter and more idiomatic than the
t(as.data.frame(table(dummyData))[,2])
posted in a comment to Chase's answer. Thanks to Ricardo Saporta, who pointed this out to me.
This works for me. Take your vector v:
length(summary(as.factor(v),maxsum=50000))
Comment: set maxsum large enough to capture the number of unique values.
Or, with the magrittr package:
v %>% as.factor %>% summary(maxsum=50000) %>% length
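Note that the length() call returns how many distinct values there are, while the summary() result itself holds the per-value counts. A small sketch to make the difference visible (per_value and n_unique are names I made up):

```r
v <- rep(c(1, 2, 2, 2), 25)

# summary() on a factor returns the count for each level
per_value <- summary(as.factor(v), maxsum = 50000)
print(per_value)
#  1  2 
# 25 75 

# length() of that result is the number of distinct values
n_unique <- length(per_value)
n_unique
# [1] 2
```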
If you need to have the number of unique values as an additional column in the data frame containing your values (a column which may represent sample size for example), plyr provides a neat way:
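The snippet itself is missing, so this is a guess at what was meant, using ddply() with transform. The data frame df and the column name n are hypothetical.

```r
library(plyr)

df <- data.frame(v = rep(c(1, 2, 2, 2), 25))

# For each distinct v, add a column n holding the size of that group
df <- ddply(df, .(v), transform, n = length(v))
head(df, 3)
```

Every row keeps its value and gains the group size, so rows with v equal to 1 get n = 25 and rows with v equal to 2 get n = 75.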
Also, making the values categorical and calling summary() would work.
You can also try a tidyverse approach.
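The code for this one is not shown, so the following is just one plausible tidyverse version, using count() from dplyr on a one-column tibble (the column name value is my own):

```r
library(dplyr)   # part of the tidyverse
library(tibble)

v <- rep(c(1, 2, 2, 2), 25)

# count() is shorthand for group_by() followed by summarise(n = n())
counts <- tibble(value = v) %>% count(value)
print(counts)
```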
If you want to run unique on a data.frame (e.g., train.data), and also get the counts (which can be used as the weight in classifiers), you can do the following:
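The answer's code is missing, so here is a base R sketch of one way to do it, not necessarily the approach the author had in mind. The contents of train.data and the helper ones are invented for the example.

```r
# Hypothetical training data with duplicated rows
train.data <- data.frame(
  x = c(1, 1, 2, 2, 2),
  y = c("a", "a", "b", "b", "c")
)

# Sum a column of ones within each distinct row; the result has one row per
# unique combination plus its count in the weight column
ones <- data.frame(weight = rep(1, nrow(train.data)))
train.unique <- aggregate(ones, by = train.data, FUN = sum)
print(train.unique)
```

The weight column can then be passed to a classifier that accepts case weights.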