Benchmarks

TPC-H at SF=1 / SF=10 / SF=100, all 22 queries on an Apple M4 Max — ematix-flow vs DuckDB, Polars, single-node PySpark, and Postgres. Median ms, same hardware and same Parquet files.

Same-machine TPC-H benchmark (Apple M4 Max, single-node) over all 22 queries at three scale factors: SF=1 (~1 GB, fits in cache), SF=10 (~10 GB, production shape), and SF=100 (~100 GB, out-of-core). Five engines, the same Parquet files, the same machine. The three tables below cover SF=1, SF=10, and SF=100, in that order.

SF=1: ematix-flow wins 22 / 22 queries. Geomean speedup: 3.03× vs DuckDB, 5.04× vs Polars, 21.85× vs PySpark, 20.09× vs Postgres.
Query  ematix-flow  DuckDB  Polars  PySpark  Postgres
Q01           17.1    48.5    38.6      167       411
Q02            7.0    17.6    47.9      190       123
Q03            9.6    32.8    46.5      257       163
Q04           10.3    22.3    23.8      184      94.6
Q05            6.6    31.7   8,949      335       226
Q06            0.9    13.1    10.5     41.5       218
Q07           27.0    32.9     118      260     1,262
Q08           11.4    39.4    96.7      182      99.7
Q09           17.4    55.6    47.5      583       820
Q10           27.2    60.6     111      362       355
Q11            6.0     9.7     8.9      119      36.3
Q12           14.4    25.1    19.2      269       361
Q13            9.2     141     118      684       871
Q14           10.2    22.3    12.3      114      69.8
Q15           10.4    14.1    11.2      127       140
Q16            7.7    21.4    21.2      205       113
Q17           14.8    24.7    39.0      233       398
Q18            1.6    45.7    56.6      560     1,154
Q19           15.5    34.4     105     86.0      31.9
Q20           16.7    28.9    22.4      106       147
Q21           35.0    74.5     721      628       609
Q22            8.5    20.9    13.6      354      24.1

Median ms · lowest per row in teal.

SF=10: ematix-flow wins 21 / 22 queries. Geomean speedup: 1.67× vs DuckDB, 4.90× vs Polars (n=21), 14.46× vs PySpark, 26.18× vs Postgres.
Query  ematix-flow  DuckDB  Polars  PySpark  Postgres
Q01            230     254     343      732     4,306
Q02           18.8    41.6     447      599     2,222
Q03           83.7     150     590    2,722     3,488
Q04           57.9    92.3     277    1,711       905
Q05            116     148       —    4,589     3,233
Q06           37.8    85.8    58.6      205     1,373
Q07            124     133   1,315    3,737     2,188
Q08            200     181   1,282      940     1,340
Q09            270     276     414    2,187     7,431
Q10            200     385   4,153    2,355     3,362
Q11           12.7    24.6    32.5      197       578
Q12           96.7     122     134      826     3,542
Q13            115     268     424    2,069    10,989
Q14           82.4     123    83.8      379       813
Q15           63.4    91.2    72.0      645     1,606
Q16           33.6    59.6     171      638     1,098
Q17            121     159     523    3,956     5,387
Q18           20.6     245     624    6,953    19,846
Q19            134     185   1,389      493       148
Q20            110     140     275      419     3,279
Q21            257     409  34,717    7,523     6,952
Q22           51.3     115     112      628       203

Median ms · lowest per row in teal. “—” = engine couldn’t run the query at this scale (see caveats below).

SF=100: ematix-flow wins 18 / 22 queries. Geomean speedup: 1.29× vs DuckDB, 5.91× vs Polars (n=17), 9.67× vs PySpark, 71.51× vs Postgres (n=6).
Query  ematix-flow  DuckDB  Polars  PySpark  Postgres
Q01          2,361   2,619  79,084    5,184         —
Q02            241     419  53,109    6,697    26,054
Q03          1,040   1,539  30,103   26,128         —
Q04            840     897   6,550   10,543         —
Q05          1,381   1,617       —   38,145         —
Q06            483     742     540    1,142         —
Q07          1,560   1,714  95,946   12,775         —
Q08          1,901   2,417       —   23,610         —
Q09          6,086   7,371  21,438   67,105         —
Q10          3,291   2,691       —   24,590         —
Q11            221     234     412    5,639    60,108
Q12          1,009   1,229   1,112    6,482         —
Q13          2,004   2,349   5,114   14,442    86,674
Q14            835   1,249     895    2,623         —
Q15            947   1,086     890    5,142         —
Q16            196     401   1,809    5,012    29,144
Q17          1,849   1,892   9,536   47,823         —
Q18            495   2,812  15,572   54,206         —
Q19          1,488   1,941       —    3,101    52,107
Q20          2,445   2,019   6,692    5,682         —
Q21          5,274   5,589       —   55,583         —
Q22            831     804   1,255    5,107    16,805

Median ms · lowest per row in teal. “—” = engine couldn’t run the query at this scale (see caveats below). For Postgres, which ran with a 90 s per-query cap, “—” marks the 16 / 22 heavy queries that timed out.

Scope: every number here is single-node. ematix-flow also has an auto-detected distributed mode (Arrow Flight peer mesh — see Why ematix-flow). A cross-host cluster-scale panel is deferred to a later release; the harness (tpch_distributed) already ships in the repo.

How to read it

  • ematix-flow + DuckDB are co-measured in one process (10 timed trials after 3 warmups, medians) — the head-to-head that matters, so thermal drift hits both equally.
  • Polars runs the same in-process harness (hand-translated q??.polars.sql where its planner rejects the canonical shape).
  • PySpark runs local[*] out-of-process on the JVM via bench-tpch-pyspark.py, against the same files.
  • Postgres 14 runs each query under EXPLAIN ANALYZE (B-tree indexed + ANALYZEd), reported as the planner’s Execution Time.
  • The fastest engine per row is highlighted; ematix-flow’s column is tinted for scanning.
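
The warmup-then-median protocol described above (3 warmups, 10 timed trials, median) can be sketched in a few lines of Python. This is only an illustration of the measurement shape, not the project's harness: the function name `median_ms` and the callable it times are hypothetical.

```python
import statistics
import time
from typing import Callable


def median_ms(run_query: Callable[[], object], warmups: int = 3, trials: int = 10) -> float:
    """Return the median wall-clock time (ms) of `trials` timed runs.

    The `warmups` runs are executed first and discarded, so lazy
    initialisation and cold caches do not leak into the timed samples.
    """
    for _ in range(warmups):
        run_query()  # untimed warmup

    samples = []
    for _ in range(trials):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000.0)

    return statistics.median(samples)


# Stand-in workload; a real harness would execute one TPC-H query here.
print(f"{median_ms(lambda: sum(range(100_000))):.3f} ms")
```

Reporting the median rather than the mean keeps a single slow trial (for example one hit by thermal throttling) from moving the published number.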

What changes across scale

At SF=1 the working set is L3-resident and per-query constant cost dominates, so ematix-flow’s fused aggregate / filter paths take all 22. At SF=10 and SF=100 the workload turns memory-bound, then IO-bound: DuckDB’s mature join-order heuristics and vectorised kernels reclaim a handful of multi-fact joins (Q08 at SF=10; Q10 / Q20 / Q22 at SF=100), and Polars edges ahead on Q15 at SF=100 (890 ms vs 947 ms). ematix-flow still leads the field (21 / 22, then 18 / 22) and widens specific wins, most visibly Q18 at SF=100: 495 ms vs DuckDB’s 2,812 ms (5.7×), which comes from the scale-relative broadcast-join rule.
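
A minimal sketch of how the per-row winner and the win counts can be derived from the tables, using five SF=100 rows copied from above. `None` stands for a cell shown as a dash (engine did not finish); the names `SF100_SAMPLE` and `fastest_engine` are illustrative, not part of the repo.

```python
from collections import Counter
from typing import Optional, Sequence

ENGINES = ("ematix-flow", "DuckDB", "Polars", "PySpark", "Postgres")

# Median ms per engine, in ENGINES order; None = engine did not finish.
SF100_SAMPLE = {
    "Q02": (241, 419, 53109, 6697, 26054),
    "Q10": (3291, 2691, None, 24590, None),
    "Q15": (947, 1086, 890, 5142, None),
    "Q18": (495, 2812, 15572, 54206, None),
    "Q22": (831, 804, 1255, 5107, 16805),
}


def fastest_engine(row: Sequence[Optional[float]]) -> str:
    """Name of the engine with the lowest median among those that finished."""
    finished = [(ms, name) for ms, name in zip(row, ENGINES) if ms is not None]
    return min(finished)[1]


wins = Counter(fastest_engine(row) for row in SF100_SAMPLE.values())
print(dict(wins))

# Per-query speedup is simply baseline / candidate.
q18 = SF100_SAMPLE["Q18"]
q18_speedup = q18[1] / q18[0]
print(f"Q18 DuckDB / ematix-flow: {q18_speedup:.1f}x")  # 5.7x
```

Engines that did not finish are excluded from the comparison rather than counted as a loss or a win.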

Caveats

  • Polars can’t run several canonical TPC-H shapes; we feed it semantically-identical hand-translated variants. Q05 still overflows Polars’s default 32-bit row index (the bigidx build would fix it) and shows “—” at SF=10 / SF=100; four more SF=100 queries (Q08, Q10, Q19, Q21) likewise exceed it.
  • Postgres is a row-store OLTP engine, included as a familiar single-node reference, not as a columnar-analytics peer. At SF=100 it ran with a 90 s per-query cap; 16 / 22 heavy queries timed out (shown “—”), so its SF=100 geomean covers only the 6 that finished.
  • DuckDB runs at defaults (in-memory read_parquet). ematix-flow runs the production preset: target_partitions = cores, plus the fused-aggregate, dict-group-count, push-LeftSemi, runtime-bloom, and scale-relative-broadcast rules. That’s the same config you get from pip install ematix-flow; no per-query tuning.
  • Thermal note (SF=10 / SF=100): back-to-back runs on an M4 Max drift ~5–20 % as the package heats. ematix-flow and DuckDB move together, so the head-to-head holds — which is exactly why the two are co-measured in a single process.
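
The headline multipliers appear to be geometric means of per-query ratios, taken only over the queries the other engine finished (hence the n=… annotations). The sketch below assumes that definition; the helper name `geomean_speedup` is hypothetical. Fed the SF=1 ematix-flow and DuckDB columns from the table, it lands close to the published 3.03× (the table values are rounded, so the last digit can differ).

```python
import math
from typing import Optional, Sequence, Tuple


def geomean_speedup(
    baseline_ms: Sequence[Optional[float]],
    candidate_ms: Sequence[Optional[float]],
) -> Tuple[float, int]:
    """Geometric mean of baseline/candidate, plus the number of queries used.

    Queries where either engine has no result (None) are skipped.
    """
    ratios = [
        b / c
        for b, c in zip(baseline_ms, candidate_ms)
        if b is not None and c is not None
    ]
    if not ratios:
        raise ValueError("no query was finished by both engines")
    mean_log = sum(math.log(r) for r in ratios) / len(ratios)
    return math.exp(mean_log), len(ratios)


# SF=1 medians (ms), Q01..Q22, copied from the table above.
EMATIX_SF1 = [17.1, 7.0, 9.6, 10.3, 6.6, 0.9, 27.0, 11.4, 17.4, 27.2, 6.0,
              14.4, 9.2, 10.2, 10.4, 7.7, 14.8, 1.6, 15.5, 16.7, 35.0, 8.5]
DUCKDB_SF1 = [48.5, 17.6, 32.8, 22.3, 31.7, 13.1, 32.9, 39.4, 55.6, 60.6, 9.7,
              25.1, 141, 22.3, 14.1, 21.4, 24.7, 45.7, 34.4, 28.9, 74.5, 20.9]

speedup, n = geomean_speedup(DUCKDB_SF1, EMATIX_SF1)
print(f"{speedup:.2f}x over n={n} queries")
```

A geometric mean is the usual choice for ratios because a 2× win and a 2× loss cancel out, which an arithmetic mean of ratios would not do. It also means a geomean over 6 finished queries is not directly comparable to one over all 22.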

Reproducing

# ematix-flow + DuckDB + Polars — same harness, swap the data dir per scale
TPCH_DATA_DIR=examples/tpch/data/sf1 \
  cargo run --release -p ematix-flow-core \
  --example tpch_triangulation_bench --features triangulation
#   → sf10, then sf100 (SF=100 wants ~64 GB free + a few minutes/run)

# PySpark — needs JDK 17+ (brew install openjdk)
JAVA_HOME=$(/usr/libexec/java_home) \
  SPARK_DRIVER_MEM=16g SPARK_SHUFFLE_PARTS=64 \
  python scripts/bench-tpch-pyspark.py \
  --data-dir examples/tpch/data/sf100 --trials 5 --warmups 1

# Postgres — load the Parquet via ematix-flow's own ADBC ingest,
# then run each query under EXPLAIN ANALYZE.
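
For the Postgres step, EXPLAIN ANALYZE output ends with a line of the form `Execution Time: <n> ms`, which is the figure the tables report. The sketch below only shows how that line could be parsed; it does not connect to a database, the helper name `execution_time_ms` is hypothetical, and the sample text is illustrative rather than captured from the benchmark run.

```python
import re

# Matches the trailer line PostgreSQL prints after EXPLAIN ANALYZE.
_EXEC_TIME = re.compile(r"Execution Time:\s*([0-9]+(?:\.[0-9]+)?)\s*ms")


def execution_time_ms(explain_output: str) -> float:
    """Extract the 'Execution Time' value (ms) from EXPLAIN ANALYZE text."""
    match = _EXEC_TIME.search(explain_output)
    if match is None:
        raise ValueError("no 'Execution Time' line found")
    return float(match.group(1))


# Illustrative trailer only; the plan nodes above it are omitted.
sample = """\
 Planning Time: 0.412 ms
 Execution Time: 31.874 ms
"""

print(execution_time_ms(sample))  # 31.874
```

Execution Time excludes planning and client round-trip time, so it is not the same quantity as the wall-clock timing used for the in-process engines.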

The 2026-06-07 ematix/DuckDB/Polars refresh re-runs the harness above; the PySpark/Postgres numbers are carried over, with raw logs in bench-results/refresh-2026-05-30/ (same machine). Release-over-release history lives in the repo CHANGELOG.