A Map-Reduce workload essentially does two things. First, it scans the entire data set, looking for the subset of records that match the given scenario. This phase may also transform or exclude fields in each record. This is the "map" action. Second, it condenses the matched subset into grouped, totalled, and averaged result summaries. This is the "reduce" action. Functionally, MongoDB's Map-Reduce capability meets users' typical data processing requirements, but it comes with the following drawbacks:
- At runtime, the database engine cannot explicitly associate a specific intent with an arbitrary piece of logic, so it has no opportunity to identify and apply optimisations. It struggles to target indexes or reorder logic for more efficient processing. Instead, it must be conservative, executing the workload with minimal concurrency and taking locks at various points to prevent race conditions and inconsistent results.
- Poor scalability: the monolithic and opaque nature of Map-Reduce logic means the database engine can't break it into parts and execute those parts in parallel across multiple shards.
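To make the two phases described above concrete, here is a minimal sketch in plain Python, assuming a small in-memory list of hypothetical "order" records (the field names are illustrative only). It is a conceptual illustration of map-then-reduce, not MongoDB's own Map-Reduce API: the map step scans every record, keeps the matching subset, and drops unneeded fields; the reduce step groups the emitted values and computes totals and averages.

```python
from collections import defaultdict

# Hypothetical sample records; field names are illustrative assumptions.
orders = [
    {"customer": "alice", "status": "shipped", "value": 30.0, "notes": "gift"},
    {"customer": "bob", "status": "pending", "value": 12.5, "notes": ""},
    {"customer": "alice", "status": "shipped", "value": 10.0, "notes": ""},
    {"customer": "bob", "status": "shipped", "value": 20.0, "notes": "rush"},
]


def map_phase(records):
    """Scan every record, keep the matching subset, and exclude unneeded fields."""
    for rec in records:
        if rec["status"] == "shipped":           # select only matching records
            yield rec["customer"], rec["value"]  # emit (key, value); other fields dropped


def reduce_phase(pairs):
    """Condense the emitted (key, value) pairs into per-key totals and averages."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {
        key: {"total": sum(vals), "count": len(vals), "average": sum(vals) / len(vals)}
        for key, vals in grouped.items()
    }


summary = reduce_phase(map_phase(orders))
print(summary)
```

In MongoDB's Map-Reduce, the equivalents of `map_phase` and `reduce_phase` are user-supplied JavaScript functions. Because the engine only sees them as arbitrary code, it cannot tell that the first function is really a filter plus a projection, which is the opacity behind the drawbacks listed above.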
Within its first year, the Aggregation Framework rapidly became the go-to tool for processing large volumes of data in MongoDB. Now, nearly a decade on, it feels as if the Aggregation Framework has always been part of MongoDB, part of the database's core DNA. MongoDB still supports Map-Reduce (deprecated as of MongoDB 5.0), but developers rarely use it nowadays. For processing data in the database, MongoDB's aggregations are almost always the correct answer!
Below is a summary of the evolution of the Aggregation Framework in terms of significant capabilities added in each major release:
- MongoDB 2.2 (August 2012): Initial Release
- MongoDB 2.4 (March 2013): Efficiency improvements (especially for sorts), a concat operator
- MongoDB 2.6 (April 2014): Unlimited size result sets, explain plans, spill to disk for large sorts, an option to output to a new collection, a redact stage
- MongoDB 3.0 (March 2015): Date-to-string operators
- MongoDB 3.2 (December 2015): Sharded cluster optimisations, lookup (join) & sample stages, many new arithmetic & array operators
- MongoDB 3.4 (November 2016): Graph-lookup, bucketing & facets stages, many new array & string operators
- MongoDB 3.6 (November 2017): Array to/from object operators, more extensive date to/from string operators, a REMOVE variable
- MongoDB 4.0 (July 2018): Number to/from string operators, string trimming operators
- MongoDB 4.2 (August 2019): A merge stage to insert/update/replace records in existing non-sharded & sharded collections, set & unset stages to address the verbosity/rigidity of project stages, trigonometry operators, regular expression operators
- MongoDB 5.0 (July 2021): A setWindowFields stage, time-series/window operators, date manipulation operators
- MongoDB 5.1 (November 2021): Support for lookup & graph-lookup stages joining to sharded collections, documents & densify stages
- MongoDB 5.2 (January 2022): An array sorting operator, operators to get a subset of ordered arrays and a subset of ordered grouped documents
- MongoDB 5.3 (April 2022): A fill stage, a linearFill operator