Find duplicate records in MongoDB

MongodbAggregation FrameworkDatabase

Mongodb Problem Overview


How would I find duplicate fields in a mongo collection.

I'd like to check if any of the "name" fields are duplicates.

{
    "name" : "ksqn291",
    "__v" : 0,
    "_id" : ObjectId("540f346c3e7fc1054ffa7086"),
    "channel" : "Sales"
}

Many thanks!

Mongodb Solutions


Solution 1 - Mongodb

Use aggregation on name and get name with count > 1:

db.collection.aggregate([
    {"$group" : { "_id": "$name", "count": { "$sum": 1 } } },
    {"$match": {"_id" :{ "$ne" : null } , "count" : {"$gt": 1} } }, 
    {"$project": {"name" : "$_id", "_id" : 0} }
]);

To sort the results by most to least duplicates:

db.collection.aggregate([
    {"$group" : { "_id": "$name", "count": { "$sum": 1 } } },
    {"$match": {"_id" :{ "$ne" : null } , "count" : {"$gt": 1} } }, 
    {"$sort": {"count" : -1} },
    {"$project": {"name" : "$_id", "_id" : 0} }     
]);

To use with another column name than "name", change "$name" to "$column_name"

Solution 2 - Mongodb

You can find the list of duplicate names using the following aggregate pipeline:

  • Group all the records having similar name.
  • Match those groups having records greater than 1.
  • Then group again to project all the duplicate names as an array.

The Code:

db.collection.aggregate([
{$group:{"_id":"$name","name":{$first:"$name"},"count":{$sum:1}}},
{$match:{"count":{$gt:1}}},
{$project:{"name":1,"_id":0}},
{$group:{"_id":null,"duplicateNames":{$push:"$name"}}},
{$project:{"_id":0,"duplicateNames":1}}
])

o/p:

{ "duplicateNames" : [ "ksqn291", "ksqn29123213Test" ] }

Solution 3 - Mongodb

The answer anhic gave can be very inefficient if you have a large database and the attribute name is present only in some of the documents.

To improve efficiency you can add a $match to the aggregation.

db.collection.aggregate(
    {"$match": {"name" :{ "$ne" : null } } }, 
    {"$group" : {"_id": "$name", "count": { "$sum": 1 } } },
    {"$match": {"count" : {"$gt": 1} } }, 
    {"$project": {"name" : "$_id", "_id" : 0} }
)

Solution 4 - Mongodb

db.getCollection('orders').aggregate([  
    {$group: { 
            _id: {name: "$name"},
            uniqueIds: {$addToSet: "$_id"},
            count: {$sum: 1}
        } 
    },
    {$match: { 
        count: {"$gt": 1}
        }
    }
])

First Group Query the group according to the fields.

Then we check the unique Id and count it, If count is greater then 1 then the field is duplicate in the entire collection so that thing is to be handle by $match query.

Solution 5 - Mongodb

Another option is to use $sortByCount stage.

db.collection.aggregate([
  { $sortByCount: '$name' }
]

This is the combination of $group & $sort.

> The $sortByCount stage is equivalent to the following $group + $sort sequence: > lang-javascript > { $group: { _id: <expression>, count: { $sum: 1 } } }, > { $sort: { count: -1 } } >

Solution 6 - Mongodb

UPDATE ====== Works everytime!

db.users.aggregate([
    // Group by the key and compute the number of documents that match the key
    {
        $group: {
            _id: "$username",  // or if you want to use multiple fields _id: { a: "$FirstName", b: "$LastName" }
            count: { $sum: 1 }
        }
    },
    // Filter group having more than 1 item, which means that at least 2 documents have the same key
    {
        $match: {
            count: { $gt: 1 }
        }
    }
])

==========

This also aggregation worked for me...

db.collection.aggregate([
    {"$group" : { "_id": "$username", "count": { "$sum": 1 } } },
    {"$match": {"_id" :{ "$ne" : null } , "count" : {"$gt": 1} } }, 
    {"$project": {"username" : "$_id", "_id" : 0} }
]);

You can also try $sortByCount

db.collection.aggregate([
  { $sortByCount: '$username' }
]

Solution 7 - Mongodb

If somebody is looking for a query for duplicates with an extra "$and" where clause, like "and where someOtherField is true"

The trick is to start with that other $match, because after the grouping you don't have all the data available anymore

// Do a first match before the grouping
{ $match: { "someOtherField": true }},
{ $group: {
    _id: { name: "$name" },
    count: { $sum: 1 }
}},
{ $match: { count: { $gte: 2 } }},

I searched for a very long time to find this notation, hope I can help somebody with the same problem

Solution 8 - Mongodb

In case you need to see all duplicated rows:

db.collection.aggregate([
     {"$group" : { "_id": "$name", "count": { "$sum": 1 },"data": { "$push": "$$ROOT" }}},
     {"$unwind": "$data"}
     {"$match": {"_id" :{ "$ne" : null } , "count" : {"$gt": 1} } }, 
]);

Solution 9 - Mongodb

enter image description here

this is how we can achieve this in mongoDB compass

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionChrisView Question on Stackoverflow
Solution 1 - MongodbanhlcView Answer on Stackoverflow
Solution 2 - MongodbBatScreamView Answer on Stackoverflow
Solution 3 - MongodbJuanínView Answer on Stackoverflow
Solution 4 - MongodbAman shrivastavaView Answer on Stackoverflow
Solution 5 - MongodbTan DatView Answer on Stackoverflow
Solution 6 - MongodbYusuf GaniyuView Answer on Stackoverflow
Solution 7 - MongodbJulesezaarView Answer on Stackoverflow
Solution 8 - MongodbAndoctoreyView Answer on Stackoverflow
Solution 9 - MongodbTanzeelView Answer on Stackoverflow