Distinct List Of Values
Minimum MongoDB Version: 4.2
Scenario
You want to query a collection of persons where each document contains data on one or more languages spoken by the person. The query result should be an alphabetically sorted list of unique languages that a developer can subsequently use to populate a list of values in a user interface's "drop-down" widget.
This example is the equivalent of a SELECT DISTINCT statement in SQL.
Sample Data Population
Drop any old version of the database (if it exists) and then populate a new persons collection with 9 documents:
db = db.getSiblingDB("book-distinct-values");
db.dropDatabase();
// Insert records into the persons collection
db.persons.insertMany([
{
"firstname": "Elise",
"lastname": "Smith",
"vocation": "ENGINEER",
"language": "English",
},
{
"firstname": "Olive",
"lastname": "Ranieri",
"vocation": "ENGINEER",
"language": ["Italian", "English"],
},
{
"firstname": "Toni",
"lastname": "Jones",
"vocation": "POLITICIAN",
"language": ["English", "Welsh"],
},
{
"firstname": "Bert",
"lastname": "Gooding",
"vocation": "FLORIST",
"language": "English",
},
{
"firstname": "Sophie",
"lastname": "Celements",
"vocation": "ENGINEER",
"language": ["Gaelic", "English"],
},
{
"firstname": "Carl",
"lastname": "Simmons",
"vocation": "ENGINEER",
"language": "English",
},
{
"firstname": "Diego",
"lastname": "Lopez",
"vocation": "CHEF",
"language": "Spanish",
},
{
"firstname": "Helmut",
"lastname": "Schneider",
"vocation": "NURSE",
"language": "German",
},
{
"firstname": "Valerie",
"lastname": "Dubois",
"vocation": "SCIENTIST",
"language": "French",
},
]);
Aggregation Pipeline
Define a pipeline ready to perform the aggregation:
var pipeline = [
// Unpack each language field which may be an array or a single value
{"$unwind": {
"path": "$language",
}},
// Group by language
{"$group": {
"_id": "$language",
}},
// Sort languages alphabetically
{"$sort": {
"_id": 1,
}},
// Change _id field's name to 'language'
{"$set": {
"language": "$_id",
"_id": "$$REMOVE",
}},
];
Execution
Execute the aggregation using the defined pipeline and also view its explain plan:
db.persons.aggregate(pipeline);
db.persons.explain("executionStats").aggregate(pipeline);
Expected Results
Seven unique language names should be returned sorted in alphabetical order, as shown below:
[
{language: 'English'},
{language: 'French'},
{language: 'Gaelic'},
{language: 'German'},
{language: 'Italian'},
{language: 'Spanish'},
{language: 'Welsh'}
]
Observations
-
Unwinding Non-Arrays. In some of the example's documents, the
languagefield is an array, whilst in others, the field is a simple string value. The$unwindstage can seamlessly deal with both field types and does not throw an error if it encounters a non-array value. Instead, if the field is not an array, the stage outputs a single record using the field's string value in the same way it would if the field was an array containing just one element. If you are sure the field in every document will only ever be a simple field rather than an array, you can omit this first stage ($unwind) from the pipeline. -
Group ID Provides Unique Values. By grouping on a single field and not accumulating other fields such as total or count, the output of a
$groupstage is just every unique group's ID, which in this case is every unique language. -
Unset Alternative. For the pipeline to be consistent with earlier examples in this book, it could have included an additional
$unsetstage to exclude the_idfield. However, partly to show another way, the example pipeline used here marks the_idfield for exclusion in the$setstage by being assigned the$$REMOVEvariable.