
When a database maintains materialized views, aggregation caches, and
join-precomputation tables, a single DELETE can trigger hundreds of
cascading operations. We measured a single-row DELETE taking 1.6 seconds
on a modest cross join (100 × 200 rows), entirely due to trigger cascades.
This post describes how we batched trigger side effects and flushed them
after lock release, cutting a test suite that previously took 58 seconds
down to 12 seconds.
## The Problem: Death by a Thousand Scans
Consider a query like:
```sql
SELECT a.grp, SUM(a.val)
FROM customers a, regions b
GROUP BY a.grp
```
The database materializes the cross-join into a precomputed table (20,000 rows)
and maintains an aggregation cache (the GROUP BY result) on top of it. Both
are kept in sync via triggers:
- INSERT/DELETE on `customers` → incremental update of the precomputed join
- INSERT/DELETE on the join table → incremental update of the aggregation cache
- DELETE on the join table → cleanup check: if no rows remain for a group key,
  delete the aggregation cache entry
When we delete one row from customers, the cascade looks like:
- Trigger scans the join table, deletes 200 matching rows (one per region)
- Each of those 200 deletions fires:
  - An incremental decrement on the aggregation cache (`-1` per row)
  - A cleanup scan: "does this group key still have rows?"
- Total: 600+ scans for a single source DELETE
Each cleanup scan costs ~10ms (it must verify row existence via index lookup).
200 cleanup scans × 10ms = 2 seconds of pure overhead.
## The Fix: Batched Side Effects
The key insight: all 200 join-table deletions happen within a single scan.
Instead of firing triggers per row during the scan (while holding locks),
we collect the side effects and flush them after the scan completes and
locks are released.
Three types of side effects are batched:
### 1. Incremental Updates (Aggregation)
Before: 200 individual increment(-1) calls on the aggregation cache,
each requiring a lock acquisition and value update.
After: Deltas are aggregated per target row during the scan. At flush time,
there is one increment call per unique group key, carrying the summed delta
(a single increment(-200) when all 200 deleted rows share one key). For
3 group keys, that's 3 calls instead of 200.
### 2. Invalidation Coalescing
Before: 200 individual “mark as stale” calls on computed columns, each
doing a binary search to find the affected sort position.
After: A single “invalidate all” call per computed column. One O(1) bitmap
reset instead of 200 O(log n) binary searches.
### 3. Trigger Cascade Deferral
Before: Each of the 200 deletions immediately fires cleanup triggers while
the source scan holds a write lock. The cleanup triggers try to scan other
tables → deadlock or severe lock contention.
After: Deleted rows are collected in a batch. The batch is flushed after the
write lock is released. Cleanup triggers run lock-free, eliminating contention.
## The Deadlock Problem
The most subtle issue was defer ordering. In Go, deferred calls execute in
LIFO (last-in-first-out) order. If the trigger flush is deferred after the
deferred lock release has been registered, the flush runs first — before the
lock release, while the write lock is still held. Trigger handlers that scan
other tables then deadlock.
The fix: explicit lock release before the flush, with a guard flag to prevent
double-release in the deferred cleanup:
```
// During scan: side effects are collected, not executed.
// After the scan completes:
//   1. Release the write lock (explicitly, setting a guard flag)
//   2. Flush batched side effects (triggers run lock-free)
//   3. Deferred cleanup skips the lock release (guard flag is set)
```
## Results
Test suite: incremental invalidation with cross-join GROUP BY,
concurrent INSERT/DELETE during scans, 10k-row window functions.
| Metric | Before | After | Speedup |
|---|---|---|---|
| Test suite duration | 58s | 12s | 4.8x |
| Single DELETE | 1.6s | ~0.3s | 5x |
| Scans per DELETE | 600+ | ~10 | 60x |
The improvement comes entirely from reducing redundant work:
- 200 individual increments → 3 aggregated increments
- 200 individual invalidations → 1 bulk invalidation
- 200 lock-contended trigger calls → 1 lock-free batch flush
## Architecture Takeaway
The general principle: never execute triggers inside a scan loop.
Collect side effects during the scan, flush after the scan. This enables:
- Aggregation: 200 × increment(-1) → 1 × increment(-200)
- Deduplication: 200 cleanup checks on 3 unique keys → 3 checks
- Lock freedom: trigger handlers run without holding source locks
- Vectorization potential: batched triggers can be further optimized
with IN-filter scans instead of per-row equality scans
The infrastructure supports automatic trigger vectorization: trigger bodies
are analyzed at registration time, and if they match known patterns (e.g.,
scan-with-equality-filter-on-OLD-value + $update), a batch-aware version
is generated that replaces N individual scans with one IN-filtered scan.
This is currently implemented for the join-table DELETE pattern and will
be extended to cleanup and aggregation patterns.