“Count” and “Count Distinct” Queries in MongoDB

For the following examples, assume that you have a database that includes four records that include the following fields and values:

{ "_id" : ObjectId("54936…dd0c"), "last_name" : "smith", "first_name" : "mike" }
{ "_id" : ObjectId("54936…dd0d"), "last_name" : "smith", "first_name" : "william" }
{ "_id" : ObjectId("54936…dd0e"), "last_name" : "smith", "first_name" : "william" }
{ "_id" : ObjectId("54936…dd0f"), "last_name" : "smith", "first_name" : "mark" }

Note that there are four records with a last_name value of “smith”.  The four records have three distinct values for the first_name field (“mike, “william”, and “mark”).

To count the number of rows returned by a query, use "count()", as shown here:

> db.collection.find({“last_name”:”smith”}).count();

4

To count the unique values, use "distinct()" rather than "find()", and "length" rather than "count()".  The first argument for "distinct" is the field for which to aggregate distinct values, the second is the conditional statement that specifies which rows to select.  Append "length" to the end of the query to count the number of rows returned.  (The "count()" function does not work on the results of a "distinct" query.) 

Here is an example which counts the distinct number of first_name values for records with a last_name value of “smith”:

> db.collection.distinct("first_name", {“last_name”:”smith”}).length;

3

Removing Duplicates from MongoDB

IMPORTANT:  The dropDups option was removed starting with MongoDB 3.x, so this solution is only valid for MongoDB versions 2.x and before.  There is no direct replacement for the dropDups option. The answers to the question posed at http://stackoverflow.com/questions/30187688/mongo-3-duplicates-on-unique-index-dropdups offer some possible alternative ways to remove duplicates in Mongo 3.x.

Duplicate records can be removed from a MongoDB collection by creating a unique index on the collection and specifying the dropDups option.

Assuming the collection includes a field named record_id that uniquely identifies a record in the collection, the command to use to create a unique index and drop duplicates is:

db.collection.ensureIndex( { record_id:1 }, { unique:true, dropDups:true } )

Here is the trace of a session that shows the contents of a collection before and after creating a unique index with dropDups.  Notice that duplicate records are no longer present after the index is created.

> db.pages.find()
{ “_id” : ObjectId(“52829c886602e2c8428d1d8c”), “leaf_num” : “1”, “scan_id” : “smithsoniancont251985smit”, “height” : 3464, “width” : 2548 }
{ “_id” : ObjectId(“52829c886602e2c8428d1d8d”), “leaf_num” : “1”, “scan_id” : “smithsoniancont251985smit”, “height” : 3464, “width” : 2548 }
{ “_id” : ObjectId(“52829c886602e2c8428d1d8e”), “leaf_num” : “2”, “scan_id” : “smithsoniancont251985smit”, “height” : 3587, “width” : 2503 }
{ “_id” : ObjectId(“52829c886602e2c8428d1d8f”), “leaf_num” : “2”, “scan_id” : “smithsoniancont251985smit”, “height” : 3587, “width” : 2503 }
>
> db.pages.ensureIndex( { scan_id:1, leaf_num:1 }, { unique:true, dropDups:true } )
>
> db.pages.find()
{ “_id” : ObjectId(“52829c886602e2c8428d1d8c”), “leaf_num” : “1”, “scan_id” : “smithsoniancont251985smit”, “height” : 3464, “width” : 2548 }
{ “_id” : ObjectId(“52829c886602e2c8428d1d8e”), “leaf_num” : “2”, “scan_id” : “smithsoniancont251985smit”, “height” : 3587, “width” : 2503 }
>

More information about creating MongoDB indexes, including the use of the dropDups option, can be found here.

Directing MongoDB Query Results to a File

The syntax for querying a MongoDB database and directing the results to a file is:

mongo server/database –eval "query" >> outputfile

where “server/database” is the address of the MongoDB database, “query” is the MongoDB command to execute, and “outputfile” is the file to which to direct the output.

Each of the following examples assumes that MongoDB has been installed in the C:\MongoDB folder.

Running this command…

C:\>mongodb\bin\mongo localhost/mydb –eval "db.docs.count()" >> counttest.txt

… will result in a file named counttest.txt that has contents similar to the following:

MongoDB shell version: 2.4.8
connecting to: localhost/mydb
1979179

For queries that will return JSON objects, be sure to wrap the command in the printjson() function, as in this example:

C:\>mongodb\bin\mongo localhost/mydb –eval "printjson(db.docs.findOne())" >> selectone.json

If the command is not wrapped by printjson, the resulting file will contain something like this:

MongoDB shell version: 2.4.8
connecting to: localhost/mydb
[object Object]

Notice that instead of the contents of the JSON object, the file simply contains the text [object Object].  However, if printjson is used, then the contents of the file will contain the actual JSON object, as shown here:

MongoDB shell version: 2.4.8
connecting to: localhost/mydb
{
     "_id" : ObjectId("528b7c7b594d11167ecdd1a7"),
     "leaf_num" : "0"
}

Here is an example that shows the use of a more complicated command.  It performs an aggregation and returns the results as JSON (notice the use of printjson):

C:\>mongodb\bin\mongo localhost/mydb –eval "printjson(db.docs.group( { key: {scan_id: 1}, reduce: function(cur, result) {result.count += 1 }, initial: {count:0}}))" >> MongoCount.txt

For this example, the contents of the MongoCount.txt file are:

MongoDB shell version: 2.4.8
connecting to: localhost/mydb
[
     {
          "scan_id" : "01A23374-4D72-4A06-9B88-EF74D0ACEE5D",
          "count" : 151
     },
     {
          "scan_id" : "0245B979-D3D6-4B01-83E1-E8D1D2ADA255",
          "count" : 250

     },
     {
          "scan_id" : "0330266.0001.002.umich.edu",
          "count" : 164
     },
     {
          "scan_id" : "03polybiblionrevue55sociuoft",
          "count" : 290
     }
]