execution/string_ops.rs
// before: 1.7M rows/s, allocator-bound
fn regex_replace(col, pat, rep) -> StringArray { ... }
// after: 11.4M rows/s, scan-bound
fn regex_replace_simd(col, pat, rep) -> StringArray { ... }
Engine internals

Inside REGEX_REPLACE: how a string-rewriting function compiles down to vectorised SIMD

A look at the patch that brought a real, vectorised string-rewriting function to the Opteryx execution engine — and why the obvious implementation was the wrong one.

Justin Joyce · Apr 24, 2026 · 8 min read

More posts

6 posts
REGEX_REPLACE
Engine internals

Inside REGEX_REPLACE: how a string-rewriting function compiles down to vectorised SIMD

The first version was correct and slow. The second version stopped allocating per row and finally let the matcher breathe.

Apr 24 · 8 min
PUSHDOWN
Engine internals

Predicate pushdown across joins, when the right side is a view

When the planner has to reason about a view it cannot see through, it has two choices: be conservative, or be clever.

Apr 17 · 11 min
.parquet
Storage & Parquet

Why the Parquet reader skips the page index — most of the time

The Parquet page index is fast, but reading it is not free. The break-even point is narrower than most engines admit.

Apr 10 · 14 min
POLICY
Governance

The smallest governance model that still works

Most data governance is policy theatre. This is the minimum model that still gives security teams something solid to trust.

Apr 03 · 7 min
$ EXPLAIN
Engine internals

Cost-aware execution: pricing a query before you run it

Showing scan size up-front changes user behaviour more than any quota does. The planner now has to earn its estimate.

Mar 27 · 9 min
bloom()
Engine internals

Adaptive bloom filters for join pruning, without the pain

Bloom filters are easy to add and hard to keep. The planner should drop them the moment they stop earning their keep.

Mar 13 · 15 min