MongoDB: count the number of distinct values per field/key


Is there a query that calculates how many distinct values a field contains in the database?

For example, I have a country field, and there are 8 distinct country values (Spain, England, France, etc.).

If someone adds more documents with a new country, I would like the query to return 9.

Is there an easier way than group and count?



All Answers
  • MongoDB has a distinct command which returns an array of distinct values for a field; you can check the length of the array for a count.

    There is a shell db.collection.distinct() helper as well:

    > db.countries.distinct('country');
    [ "Spain", "England", "France", "Australia" ]
    
    > db.countries.distinct('country').length
    4
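
    If the list of distinct values could get large, one alternative is to let the server do the counting with an aggregation pipeline, since the result of distinct has to fit in a single BSON document (16 MB). The following is only a sketch: distinctCountPipeline is a hypothetical helper name, the output field name distinctCount is arbitrary, and the $count stage assumes MongoDB 3.4 or newer.

```javascript
// Hypothetical helper (not part of MongoDB): builds a pipeline that
// counts the distinct values of `field` on the server.
function distinctCountPipeline(field) {
  return [
    // One output document per distinct value of the field.
    { $group: { _id: "$" + field } },
    // Count those documents; yields a single { distinctCount: <n> } document.
    { $count: "distinctCount" }
  ];
}

// Assumed usage in the mongo shell:
//   db.countries.aggregate(distinctCountPipeline("country"))
```

    Note that $group puts documents without the field into a null group, so a $match on { country: { $exists: true } } may be needed first if such documents should not be counted.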
    

  • Here is an example using the aggregation API. To make the case more complex, we group by case-insensitive words taken from an array property of each document.

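    db.articles.aggregate([
        // Skip documents whose keywords array is empty.
        {
            $match: {
                keywords: { $not: {$size: 0} }
            }
        },
        // Emit one document per keyword.
        { $unwind: "$keywords" },
        // Group case-insensitively and count occurrences of each keyword.
        {
            $group: {
                _id: {$toLower: '$keywords'},
                count: { $sum: 1 }
            }
        },
        // Keep only keywords that occur at least twice.
        {
            $match: {
                count: { $gte: 2 }
            }
        },
        // Return the 100 most frequent keywords.
        { $sort : { count : -1} },
        { $limit : 100 }
    ]);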
    

    which gives results such as:

    { "_id" : "inflammation", "count" : 765 }
    { "_id" : "obesity", "count" : 641 }
    { "_id" : "epidemiology", "count" : 617 }
    { "_id" : "cancer", "count" : 604 }
    { "_id" : "breast cancer", "count" : 596 }
    { "_id" : "apoptosis", "count" : 570 }
    { "_id" : "children", "count" : 487 }
    { "_id" : "depression", "count" : 474 }
    { "_id" : "hiv", "count" : 468 }
    { "_id" : "prognosis", "count" : 428 }
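
    To see what the stages compute, the sketch below replays the same steps in plain JavaScript over a small made-up sample. The sample documents and the countKeywords name are assumptions for illustration; this is not how MongoDB executes the pipeline, only an approximation of its result for string keywords.

```javascript
// Made-up sample documents standing in for db.articles.
const articles = [
  { keywords: ["Cancer", "Obesity"] },
  { keywords: ["cancer", "HIV"] },
  { keywords: [] },
  { keywords: ["obesity", "CANCER"] }
];

// Hypothetical helper mirroring $unwind, $group with $toLower, $match, $sort and $limit.
function countKeywords(docs, minCount, limit) {
  const counts = new Map();
  for (const doc of docs) {
    // $unwind: one entry per array element (empty or missing arrays yield nothing).
    for (const keyword of doc.keywords || []) {
      // $group on {$toLower: '$keywords'} with count: {$sum: 1}.
      const key = keyword.toLowerCase();
      counts.set(key, (counts.get(key) || 0) + 1);
    }
  }
  return Array.from(counts, ([_id, count]) => ({ _id, count }))
    .filter((row) => row.count >= minCount) // $match: count >= minCount
    .sort((a, b) => b.count - a.count)      // $sort: count descending
    .slice(0, limit);                       // $limit
}

const result = countKeywords(articles, 2, 100);
console.log(result);
```

    With a minimum count of 1 instead of 2, the length of the returned array would be the number of distinct keywords in the sample.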
    

  • With MongoDB 3.4.4 and newer, you can use the $arrayToObject operator together with a $replaceRoot pipeline stage to get the counts.

    For example, suppose you have a collection of users with different roles and you would like to count how many users have each distinct role. You would run the following aggregation pipeline:

    db.users.aggregate([
        { "$group": {
            "_id": { "$toLower": "$role" },
            "count": { "$sum": 1 }
        } },
        { "$group": {
            "_id": null,
            "counts": {
                "$push": { "k": "$_id", "v": "$count" }
            }
        } },
        { "$replaceRoot": {
            "newRoot": { "$arrayToObject": "$counts" }
        } }    
    ])
    

    Example Output

    {
        "user" : 67,
        "superuser" : 5,
        "admin" : 4,
        "moderator" : 12
    }
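
    The second $group collects { k, v } pairs, and $arrayToObject turns that array into a single document. A rough plain-JavaScript equivalent of that last step might look like the sketch below; the counts are copied from the example output purely for illustration, and the variable names are assumptions.

```javascript
// Shape of the "counts" array produced by the second $group stage
// (values copied from the example output, for illustration only).
const counts = [
  { k: "user", v: 67 },
  { k: "superuser", v: 5 },
  { k: "admin", v: 4 },
  { k: "moderator", v: 12 }
];

// Rough equivalent of { $arrayToObject: "$counts" }.
const byRole = Object.fromEntries(counts.map(({ k, v }) => [k, v]));

// The number of keys in the resulting object is the number of distinct roles.
const distinctRoles = Object.keys(byRole).length;
console.log(byRole, distinctRoles);
```

    Counting the keys of the resulting document is what answers the original question of how many distinct values exist.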
    

  • You can use Mongo Shell Extensions. It is a single .js import that you can append to your $HOME/.mongorc.js, or load programmatically if you are also coding in Node.js/io.js.

    Sample

    For each distinct value of a field, it counts the occurrences in documents, optionally filtered by a query:

    > db.users.distinctAndCount('name', {name: /^a/i})

    {
      "Abagail": 1,
      "Abbey": 3,
      "Abbie": 1,
      ...
    }
    

    The field parameter can also be an array of fields:

    > db.users.distinctAndCount(['name','job'], {name: /^a/i})

    {
      "Austin,Educator" : 1,
      "Aurelia,Educator" : 1,
      "Augustine,Carpenter" : 1,
      ...
    }
    

  • To find the distinct values of field_1 in a collection while also applying a WHERE-style condition, pass a query document as the second argument:

    db.your_collection_name.distinct('field_1', { /* query document expressing the WHERE condition */ })

    So, finding the number of distinct names in a collection where age > 25 looks like this:

    db.your_collection_name.distinct('names', {'age': {"$gt": 25}}).length

    Hope it helps!