CONCEPT 07 · ABSTRACTIONS
Monoids & Associativity
"Split it, compute in parallel, combine — same answer."
Plain English
A monoid is any operation that is associative (grouping doesn't matter: (a+b)+c = a+(b+c)) and has an identity element (a neutral value: 0 for addition, "" for string concat). Because grouping doesn't matter, you can split work across any number of machines, process in parallel, and combine results — guaranteed same answer.
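Both laws can be checked directly in code; a minimal sketch (the values are arbitrary):

```javascript
// Monoid laws for addition: associativity and identity
const combine = (a, b) => a + b;
const identity = 0;

const [a, b, c] = [2, 5, 9];

// Associativity: grouping doesn't matter
const left = combine(combine(a, b), c);  // (2+5)+9
const right = combine(a, combine(b, c)); // 2+(5+9)
console.log(left === right); // true

// Identity: 0 is neutral on both sides
console.log(combine(identity, a) === a && combine(a, identity) === a); // true
```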
Analogy
Addition. 1+2+3+4+5 = (1+2)+(3+4+5) = (1+2+3)+(4+5) = same answer no matter how you group it. Split the sum across 100 machines, each sums its portion, you add the totals. MapReduce, Spark, and Hadoop all exploit this exact property. Monoids are why big data pipelines can scale horizontally.
You already know this
▸ SQL COUNT(*), SUM(amount) — associative, so the database can compute them in parallel
▸ String concatenation: "a" + "b" + "c" = ("a"+"b")+"c" = "a"+("b"+"c")
▸ Array concat: [...a, ...b, ...c] — grouping doesn't matter
▸ Set union: A ∪ B ∪ C — can be split and combined freely
▸ React state merges: setState({...prev, x}) — building up state incrementally
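Each of these familiar operations fits the same combine/identity shape; a quick sketch (the object names are illustrative):

```javascript
// Familiar operations packaged as (combine, identity) pairs
const arrayConcat = { combine: (a, b) => [...a, ...b], identity: [] };
const setUnion = { combine: (a, b) => new Set([...a, ...b]), identity: new Set() };
const objectMerge = { combine: (a, b) => ({ ...a, ...b }), identity: {} };

// Grouping doesn't matter for array concat:
const x = [1], y = [2], z = [3];
const g1 = arrayConcat.combine(arrayConcat.combine(x, y), z); // [1, 2, 3]
const g2 = arrayConcat.combine(x, arrayConcat.combine(y, z)); // [1, 2, 3]
console.log(JSON.stringify(g1) === JSON.stringify(g2)); // true
```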
Interactive Demo
Summing [3, 1, 4, 1, 5, 9, 2, 6] — same answer, different structure
Because addition is associative (grouping doesn't matter), you can split the work and do it in parallel. Same result, fewer rounds.
Input: 3 + 1 + 4 + 1 + 5 + 9 + 2 + 6 = 31
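The pairwise grouping the demo illustrates can be sketched in code — a simulation of the idea, not the page's actual widget:

```javascript
// Pairwise ("tournament") reduction: combine neighbors each round,
// halving the remaining work — this is the parallel-friendly grouping
const combine = (a, b) => a + b;

function reduceInRounds(values) {
  let layer = values;
  while (layer.length > 1) {
    const next = [];
    for (let i = 0; i < layer.length; i += 2) {
      // Pair up neighbors; carry an odd element forward unchanged
      next.push(i + 1 < layer.length ? combine(layer[i], layer[i + 1]) : layer[i]);
    }
    layer = next;
  }
  return layer[0];
}

const input = [3, 1, 4, 1, 5, 9, 2, 6];
const treeSum = reduceInRounds(input);   // 3 rounds instead of 7 sequential steps
const seqSum = input.reduce(combine, 0); // left-to-right, one at a time
console.log(treeSum === seqSum, treeSum); // true 31
```

Because addition is associative, the tree-shaped grouping and the left-to-right grouping must agree; only the number of sequential rounds changes.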
Code Example
JavaScript
// A monoid needs: combine(a, b) and an identity value
const addMonoid = {
  combine: (a, b) => a + b,
  identity: 0, // 0 + x = x = x + 0
};
const concatMonoid = {
  combine: (a, b) => a + b, // string concat
  identity: '', // '' + s = s
};
const maxMonoid = {
  combine: (a, b) => Math.max(a, b),
  identity: -Infinity, // max(-Inf, x) = x
};
// Because it's associative, you can split and parallelize:
const data = [1, 2, 3, 4, 5, 6, 7, 8];
// Sequential (single machine):
const seq = data.reduce(addMonoid.combine, addMonoid.identity); // 36
// Parallel (4 machines, then combine):
const m1 = [1,2].reduce(addMonoid.combine, 0); // 3
const m2 = [3,4].reduce(addMonoid.combine, 0); // 7
const m3 = [5,6].reduce(addMonoid.combine, 0); // 11
const m4 = [7,8].reduce(addMonoid.combine, 0); // 15
const total = [m1,m2,m3,m4].reduce(addMonoid.combine, 0); // 36
// Same answer! This is why MapReduce works.
Apply when
▸ Aggregating data across shards, machines, or time windows — if your operation is a monoid, it parallelizes for free
▸ Incremental computation: process only the new data, then merge with existing results
▸ Combining multiple validation results, multiple error lists, multiple log entries
▸ Any "combine two things of the same type to get the same type" operation
▸ MapReduce pipelines: the "reduce" step is always a monoid
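The validation bullet is itself a monoid: error lists combine by concatenation, with the empty list as identity. A hedged sketch (the validator names and rules are made up for illustration):

```javascript
// Validation results form a monoid: concatenate error lists, [] is identity
const errorsMonoid = { combine: (a, b) => [...a, ...b], identity: [] };

// Hypothetical validators, each returning a list of error messages
const checkName = (u) => (u.name ? [] : ['name is required']);
const checkEmail = (u) => (u.email && u.email.includes('@') ? [] : ['email is invalid']);

const user = { name: '', email: 'nope' };
const allErrors = [checkName, checkEmail]
  .map((check) => check(user))
  .reduce(errorsMonoid.combine, errorsMonoid.identity);
console.log(allErrors); // ['name is required', 'email is invalid']
```

Because the combine is associative, it doesn't matter whether validators run one by one or in independent groups whose results are merged at the end.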
Check Your Understanding
Your analytics pipeline counts events across 100 database shards. Why can each shard safely count its own events and then you sum the shard totals?
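The scenario can be simulated directly; a sketch with made-up shard data. COUNT is a sum of 1s, and addition is associative with identity 0, so per-shard subtotals must combine to the same total as one global count:

```javascript
// Simulate 100 shards, each holding its own (made-up) event list
const shards = Array.from({ length: 100 }, (_, i) => ({
  events: Array.from({ length: i % 5 }, () => ({ shard: i })),
}));

// Each shard counts its own events independently...
const perShard = shards.map((s) => s.events.length);
// ...and the subtotals combine with the same monoid: (+, 0)
const total = perShard.reduce((a, b) => a + b, 0);

// Same as counting everything in one place:
const global = shards.flatMap((s) => s.events).length;
console.log(total === global); // true
```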