R - Count unique values


Let's say I have:

v = rep(c(1,2, 2, 2), 25)

Now, I want to count the number of times each unique value appears. unique(v) returns the unique values, but not how many times each one occurs.

> unique(v)
[1] 1 2

I want something that gives me

length(v[v==1])
[1] 25
length(v[v==2])
[1] 75

but as a more general one-liner :) Something close (but not quite) like this:

#<doesn't work right> length(v[v==unique(v)])


All Answers
  • Perhaps table is what you are after?

    dummyData = rep(c(1,2, 2, 2), 25)
    
    table(dummyData)
    # dummyData
    #  1  2 
    # 25 75
    
    ## or another presentation of the same data
    as.data.frame(table(dummyData))
    #    dummyData Freq
    #  1         1   25
    #  2         2   75
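
To pull a single count out of the result, the table can be indexed by name. This is a minimal sketch using the same data as the question; the variable name counts is only illustrative.

```r
v <- rep(c(1, 2, 2, 2), 25)
counts <- table(v)

# The names of a table are character strings, so index by "2" (the name);
# a bare 2 would mean "the second position" instead
counts[["2"]]

# The distinct values, as character strings
names(counts)

# Order the values from most to least frequent
sort(counts, decreasing = TRUE)
```

Indexing by name is the safer habit, because position and value only coincide by accident in this example.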
    

  • If you have multiple factors (= a multi-dimensional data frame), you can use the dplyr package to count unique values in each combination of factors:

    library("dplyr")
    data %>% group_by(factor1, factor2) %>% summarize(count=n())
    

    It uses the pipe operator %>% to chain function calls on the data frame data.
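
For comparison, the same per-combination counts can be sketched in base R with table(). The data frame below is made up for illustration; factor1 and factor2 stand in for your own columns, and this swaps in table() for the dplyr verbs rather than reproducing them.

```r
# Hypothetical data frame; replace with your own columns
data <- data.frame(
  factor1 = c("a", "a", "b", "b", "b"),
  factor2 = c("x", "x", "x", "x", "y")
)

# Cross-tabulate every column, then flatten to one row per combination
combo_counts <- as.data.frame(table(data), responseName = "count")

# table() also lists combinations that never occur (count 0); drop them
# so that only observed combinations remain
combo_counts <- combo_counts[combo_counts$count > 0, ]
combo_counts
```

One difference to keep in mind: table() reports unobserved combinations with a count of zero, which is why the filter step is there.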


  • Here is a one-line approach using aggregate:

    > aggregate(data.frame(count = v), list(value = v), length)
    
      value count
    1     1    25
    2     2    75
    

  • The table() function is a good way to go, as Chase suggested. If you are analyzing a large dataset, an alternative is to use the special symbol .N in the data.table package.

    Make sure the data.table package is installed:

    install.packages("data.table")
    

    Code:

    # Import the data.table package
    library(data.table)
    
    # Generate a data table object, which draws a number 10^7 times  
    # from 1 to 10 with replacement
    DT<-data.table(x=sample(1:10,1E7,TRUE))
    
    # Count Frequency of each factor level
    DT[,.N,by=x]
    

  • To get an un-dimensioned integer vector that contains the count of each unique value, use c().

    dummyData = rep(c(1, 2, 2, 2), 25) # Chase's reproducible data
    c(table(dummyData)) # get un-dimensioned integer vector
     1  2 
    25 75
    
    str(c(table(dummyData)) ) # confirm structure
     Named int [1:2] 25 75
     - attr(*, "names")= chr [1:2] "1" "2"
    

    This may be useful if you need to feed the counts of unique values into another function, and is shorter and more idiomatic than the t(as.data.frame(table(dummyData))[,2]) posted in a comment to Chase's answer. Thanks to Ricardo Saporta, who pointed this out to me.
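
As a small illustration of feeding the counts into other functions, here is a sketch using the same data. The variable name counts is an assumption for the example.

```r
dummyData <- rep(c(1, 2, 2, 2), 25)
counts <- c(table(dummyData))   # named integer vector, no dim attribute

# Ordinary arithmetic works element-wise and keeps the names
counts / sum(counts)

# Most frequent value, returned as a character string
names(counts)[which.max(counts)]

# Drop the names too if a bare integer vector is needed
unname(counts)
```

Whether to keep the names depends on the consumer; unname() is only needed when a function is sensitive to them.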


  • This works for me. It returns the number of distinct values in your vector v:

    length(summary(as.factor(v),maxsum=50000))

    Set maxsum large enough to cover the number of unique values; otherwise summary() pools the excess levels into a single "(Other)" entry.

    or with the magrittr package

    v %>% as.factor %>% summary(maxsum=50000) %>% length
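
If only the number of distinct values is needed, length(unique(v)) gives the same answer without having to choose a maxsum. A quick check on the question's data:

```r
v <- rep(c(1, 2, 2, 2), 25)

# Number of distinct values, computed two ways
n_summary <- length(summary(as.factor(v), maxsum = 50000))
n_unique  <- length(unique(v))

n_summary == n_unique
```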


  • If you need the count of each value as an additional column in the data frame containing your values (a column which may represent sample size, for example), plyr provides a neat way:

    data_frame <- data.frame(v = rep(c(1,2, 2, 2), 25))
    
    library("plyr")
    data_frame <- ddply(data_frame, .(v), transform, n = length(v))
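
A base R sketch of the same idea uses ave(), which is a different function from ddply() but gives an equivalent n column here without loading plyr.

```r
data_frame <- data.frame(v = rep(c(1, 2, 2, 2), 25))

# ave() applies FUN within each group of v and returns a vector
# aligned with the original rows
data_frame$n <- ave(data_frame$v, data_frame$v, FUN = length)

head(data_frame)
```

Unlike a split-and-recombine approach, ave() leaves the rows in their original order.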
    

  • Making the values categorical and calling summary() would also work.

    > v = rep(as.factor(c(1,2, 2, 2)), 25)
    > summary(v)
     1  2 
    25 75 
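
One caveat with summary() on a factor is its maxsum argument, which defaults to 100. The sketch below uses a made-up factor with 150 levels to show the effect.

```r
# 150 distinct values, each appearing once (illustrative data)
f <- factor(1:150)

# summary() returns at most maxsum entries (default 100)
length(summary(f))

# The levels that do not fit are pooled into a final "(Other)" entry
tail(names(summary(f)), 1)

# table() reports every level
length(table(f))
```

For vectors with many distinct values, either raise maxsum or use table() directly.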
    

  • You can also try a tidyverse approach:

    library(tidyverse)
    dummyData = rep(c(1,2, 2, 2), 25) # same data as in the answers above
    dummyData %>%
        as_tibble() %>%
        count(value)
    # A tibble: 2 x 2
      value     n
      <dbl> <int>
    1     1    25
    2     2    75
    

  • If you want to run unique on a data.frame (e.g., train.data), and also get the counts (which can be used as the weight in classifiers), you can do the following:

    unique.count = function(train.data, all.numeric=FALSE) {
      # first convert each row in the data.frame to a string
      train.data.str = apply(train.data, 1, function(x) paste(x, collapse=','))
      # use table to index and count the strings
      train.data.str.t = table(train.data.str)
      # get the unique data string from the row.names
      train.data.str.uniq = row.names(train.data.str.t)
      weight = as.numeric(train.data.str.t)
      # convert the unique data string to data.frame
      if (all.numeric) {
        train.data.uniq = as.data.frame(t(apply(cbind(train.data.str.uniq), 1,
          function(x) as.numeric(unlist(strsplit(x, split=","))))))
      } else {
        train.data.uniq = as.data.frame(t(apply(cbind(train.data.str.uniq), 1,
          function(x) unlist(strsplit(x, split=",")))))
      }
      names(train.data.uniq) = names(train.data)
      list(data=train.data.uniq, weight=weight)
    }
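
A shorter sketch of the same idea, counting duplicate rows, uses aggregate() on a helper column of ones. This is an alternative technique rather than a rewrite of unique.count; the example data and the name row.counts are made up.

```r
# Hypothetical training data with repeated rows
train.data <- data.frame(
  x = c(1, 1, 2, 2, 2),
  y = c("a", "a", "b", "b", "c")
)

# Add a column of ones and sum it within each distinct combination
# of the remaining columns
row.counts <- aggregate(weight ~ ., data = cbind(train.data, weight = 1), FUN = sum)
row.counts
```

Because it never pastes rows into strings, this version is not affected by values that contain commas. Note that the formula interface of aggregate() drops rows with missing values by default, so it may not suit data with NAs.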
    

  • A loop-based approach for a character vector of words:

    count_unique_words <- function(wlist) {
      ucountlist <- list()
      unamelist <- c()
      for (i in wlist) {
        if (is.element(i, unamelist)) {
          # word seen before: increment its count
          ucountlist[[i]] <- ucountlist[[i]] + 1
        } else {
          # first occurrence: start its count at 1
          ucountlist[[i]] <- 1
          unamelist <- c(unamelist, i)
        }
      }
      ucountlist
    }

    # population is assumed to be an existing character vector of words
    expt_counts <- count_unique_words(population)
    for (i in names(expt_counts))
      cat(i, expt_counts[[i]], "\n")
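
The same per-word counts can be obtained without an explicit loop by calling table() on the character vector. The population vector below is a made-up stand-in for your own data.

```r
# Hypothetical input; assumed to be a character vector of words
population <- c("apple", "pear", "apple", "fig", "pear", "apple")

word_counts <- table(population)

# Print one "word count" pair per line
for (w in names(word_counts)) cat(w, word_counts[[w]], "\n")
```

table() orders the words alphabetically, whereas the loop keeps them in order of first appearance, so the two outputs can differ in row order.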