A Map-Reduce workload essentially does two things. First, it scans the data set, looking for the subset of records that match the given scenario. This phase may also transform or exclude fields of each record. This is the "map" action. Second, it condenses the matched subset into grouped, totalled, and averaged result summaries. This is the "reduce" action. Functionally, MongoDB's Map-Reduce capability provides a solution to users' typical data processing requirements, but it comes with the following drawbacks:
- Because a developer cannot explicitly associate a specific intent with an arbitrary piece of logic, the database engine has no opportunity to identify and apply optimisations at runtime. It is hard for the engine to target indexes or reorder logic for more efficient processing. The database has to be conservative, executing the workload with minimal concurrency and employing locks at various times to prevent race conditions and inconsistent results.
- If the workload returns its response to the client application, rather than sending the output to a collection, the response payload must be less than 16 MB.
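
To make the two phases and the "intent" problem concrete, here is a minimal in-memory sketch in Python. It is not MongoDB's Map-Reduce engine; the `orders` records and their field names (`customer`, `status`, `value`) are hypothetical, chosen only for illustration. The map and reduce steps are opaque functions, whereas the pipeline definition at the end expresses the same workload as declarative stages (`$match`, `$group`) whose purpose an engine can inspect.

```python
from collections import defaultdict

# Hypothetical sample records; the field names are illustrative only.
orders = [
    {"customer": "alice", "status": "complete", "value": 20.0},
    {"customer": "bob", "status": "complete", "value": 5.0},
    {"customer": "alice", "status": "pending", "value": 99.0},
    {"customer": "alice", "status": "complete", "value": 10.0},
]


def map_phase(records):
    """Scan every record, keep the matching subset, emit (key, value) pairs."""
    for rec in records:
        if rec["status"] == "complete":
            yield rec["customer"], rec["value"]


def reduce_phase(pairs):
    """Condense the emitted pairs into per-key totals and averages."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {
        key: {"total": sum(vals), "average": sum(vals) / len(vals)}
        for key, vals in grouped.items()
    }


summary = reduce_phase(map_phase(orders))

# The same workload written as a declarative aggregation pipeline definition.
# Each stage names its intent, so nothing is hidden inside arbitrary code.
pipeline = [
    {"$match": {"status": "complete"}},
    {"$group": {
        "_id": "$customer",
        "total": {"$sum": "$value"},
        "average": {"$avg": "$value"},
    }},
]
```

Assuming a collection holding documents shaped like `orders`, a driver such as PyMongo could run the pipeline with `collection.aggregate(pipeline)`. The sketch above only builds the definition and does not connect to a database.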
Within its first year, the Aggregation Framework rapidly became the go-to tool for processing large volumes of data in MongoDB. Now, a decade on, it feels as if the Aggregation Framework has always been part of MongoDB, woven into the database's core DNA. MongoDB still supports Map-Reduce, but developers rarely use it nowadays. MongoDB aggregation pipelines are always the correct answer for processing data in the database!
It is not widely known, but MongoDB's engineering team re-implemented the Map-Reduce "back-end" in MongoDB 4.4 to execute within the aggregation's runtime. They had to develop additional aggregation stages and operators to fill some gaps. For the most part, these are internal-only stages or operators that the Aggregation Framework does not surface for developers to use in regular aggregations. The two exceptions are the new
Below is a summary of the evolution of the Aggregation Framework in terms of significant capabilities added in each major release:
- MongoDB 2.2 (August 2012): Initial Release
- MongoDB 2.4 (March 2013): Efficiency improvements (especially for sorts), a concat operator
- MongoDB 2.6 (April 2014): Unlimited size result sets, explain plans, spill to disk for large sorts, an option to output to a new collection, a redact stage
- MongoDB 3.0 (March 2015): Date-to-string operators
- MongoDB 3.2 (December 2015): Sharded cluster optimisations, lookup (join) & sample stages, many new arithmetic & array operators
- MongoDB 3.4 (November 2016): Graph-lookup, bucketing & facets stages, many new array & string operators
- MongoDB 3.6 (November 2017): Array to/from object operators, more extensive date to/from string operators, a REMOVE variable
- MongoDB 4.0 (July 2018): Number to/from string operators, string trimming operators
- MongoDB 4.2 (August 2019): A merge stage to insert/update/replace records in existing non-sharded & sharded collections, set & unset stages to address the verbosity/rigidity of project stages, trigonometry operators, regular expression operators, Atlas Search integration
- MongoDB 5.0 (July 2021): A setWindowFields stage, time-series/window operators, date manipulation operators
- MongoDB 5.1 (November 2021): Support for lookup & graph-lookup stages joining to sharded collections, documents and densify stages
- MongoDB 5.2 (January 2022): An array sorting operator, operators to get a subset of ordered arrays and a subset of ordered grouped documents
- MongoDB 5.3 (April 2022): A fill stage, a linearFill operator
- MongoDB 6.0 (July 2022): Consolidation of the new features from the "rapid release" versions 5.1 to 5.3