
Benchmarks

These results were collected on the Lanexio Parser self-hosted Linux runner (dev-parser). All numbers are from benchmarks/results/benchmark-results.json in the repository.

Each benchmark runs 100 warm iterations after 10 warm-up iterations. “Cold start” is the first parse after module load. Parse throughput is measured in MB/s of input bytes. Serialization throughput is measured in MB/s of output bytes.
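The measurement procedure described above could be sketched roughly as follows. This is an illustrative harness, not the repository's benchmark code: `runBench`, `worstOf`, and `BenchResult` are hypothetical names, the clock is injected by the caller, and the sketch assumes 1 MB = 1,000,000 bytes and that "worst" means lowest throughput, neither of which the page states.

```typescript
// Hypothetical harness; names and structure are illustrative only.
type BenchResult = { coldMs: number; hotMsPerIter: number; hotMBps: number };

function runBench(
  parse: (input: string) => unknown,
  input: string,
  inputBytes: number,
  now: () => number, // millisecond clock supplied by the caller
  warmup = 10,
  iterations = 100,
): BenchResult {
  // Cold start: the first call, before the JIT has optimized the hot paths.
  const coldStart = now();
  parse(input);
  const coldMs = now() - coldStart;

  // Warm-up iterations are executed but not timed.
  for (let i = 0; i < warmup; i++) parse(input);

  // Timed iterations give the steady-state ("hot") figure.
  const hotStart = now();
  for (let i = 0; i < iterations; i++) parse(input);
  const hotMsPerIter = (now() - hotStart) / iterations;

  // Assumption: 1 MB = 1,000,000 bytes.
  const hotMBps = inputBytes / 1e6 / (hotMsPerIter / 1000);
  return { coldMs, hotMsPerIter, hotMBps };
}

// Assumption: "worst" is the run with the lowest hot throughput.
function worstOf(results: BenchResult[]): BenchResult {
  return results.reduce((worst, r) => (r.hotMBps < worst.hotMBps ? r : worst));
}
```

In a Node.js or browser runtime, the clock argument would typically be `() => performance.now()`. Injecting it keeps the sketch testable with a fake clock.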

The cold/hot distinction matters: cold-start numbers include V8 TurboFan JIT compilation. For github.html in particular, the Lanexio Parser cold-start time reflects the JIT cost of the adoption-agency and phantom-synthesis branches, which GitHub’s HTML output exercises heavily. Hot throughput is the steady-state number.

HTML — parse throughput (worst-of-3, v1.0.0)

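| Input | Size | Lanexio Parser MB/s | parse5 MB/s | Ratio |
| --- | --- | --- | --- | --- |
| wikipedia.html | 195 KB | 6.27 | 10.04 | 0.62× |
| github.html | 306 KB | 6.19 | 7.85 | 0.79× |
| nodejs-docs.html | 107 KB | 30.30 | 7.64 | 4.0× |

| Input | Lanexio Parser ms | parse5 ms |
| --- | --- | --- |
| wikipedia.html | 33.67 | 28.15 |
| github.html | 100.02 | 60.21 |
| nodejs-docs.html | 6.43 | 10.98 |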

HTML — heap usage (worst-of-3, single-parse delta)

| Input | Lanexio Parser KB | parse5 KB |
| --- | --- | --- |
| wikipedia.html | 11,338 | 7,538 |
| github.html | 14,758 | 9,213 |
| nodejs-docs.html | 1,205 | 3,535 |
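A "single-parse delta" can be read as the difference in used heap sampled immediately before and after one parse. The following is a minimal sketch of that idea, not the benchmark's actual code: `heapDeltaKB` is a hypothetical helper, the heap reader is injected by the caller, and 1 KB = 1,024 bytes is an assumption.

```typescript
// Hypothetical helper; the heap reader is injected so the sketch is runtime-agnostic.
function heapDeltaKB(work: () => unknown, readHeapBytes: () => number): number {
  const before = readHeapBytes();
  const result = work(); // hold the result so it is still live when the heap is sampled
  const after = readHeapBytes();
  void result;
  // Assumption: 1 KB = 1,024 bytes.
  return (after - before) / 1024;
}
```

On Node.js, the reader could be `() => process.memoryUsage().heapUsed`. Such deltas are noisy because garbage collection can run at any point, which may be one reason to report the worst of several runs.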

HTML — serialization throughput (worst-of-3)

| Input | Lanexio Parser MB/s | parse5 MB/s | Gap |
| --- | --- | --- | --- |
| wikipedia.html | 10.29 | 71.51 | 7.0× |
| github.html | 18.48 | 122.50 | 6.6× |
| nodejs-docs.html | 215.85 | 3,072 | 14.2× |
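The Ratio and Gap columns run in opposite directions. Judging from the published figures, Ratio appears to be Lanexio Parser throughput divided by the baseline (above 1 means Lanexio Parser is faster), while Gap appears to be the baseline divided by Lanexio Parser (above 1 means the baseline is faster). A small sketch of that reading, with illustrative function names and values copied from the tables:

```typescript
// Illustrative helpers; the column definitions are inferred from the published numbers.
const ratio = (lanexio: number, baseline: number): number => lanexio / baseline;
const gap = (lanexio: number, baseline: number): number => baseline / lanexio;

// wikipedia.html parse throughput: 6.27 MB/s vs 10.04 MB/s
const parseRatio = ratio(6.27, 10.04).toFixed(2); // "0.62"

// github.html serialization throughput: 18.48 MB/s vs 122.50 MB/s
const serializeGap = gap(18.48, 122.5).toFixed(1); // "6.6"
```

Recomputing from the rounded table values can differ from a published figure in the last digit, since the published columns were presumably derived from unrounded measurements.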

Markdown — parse throughput (worst-of-3, v1.0.0)

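| Input | Size | Lanexio Parser MB/s | micromark MB/s | Ratio |
| --- | --- | --- | --- | --- |
| headings.md | 135 B | 2.28 | 0.13 | 17.7× |
| inline.md | 275 B | 2.13 | 0.37 | 5.8× |

| Input | Lanexio Parser ms | micromark ms |
| --- | --- | --- |
| headings.md | 0.20 | 1.03 |
| inline.md | 0.26 | 1.44 |

| Input | Lanexio Parser KB | micromark KB |
| --- | --- | --- |
| inline.md | 25 | 423 |
| headings.md | 13 | 505 |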

Lanexio Parser uses 94% less heap than micromark on inline.md (25 KB versus 423 KB). The flat-buffer design pays off most clearly on small Markdown inputs.

The Zig/WASM HTML tokenizer is included in v1.0 as an opt-in acceleration path. It provides a measurable end-to-end speedup (1.09× on nodejs-docs.html) on HTML documents that contain no character references.

| Input | Size | TS parse (ms) | WASM parse (ms) | Speedup |
| --- | --- | --- | --- | --- |
| nodejs-docs.html | 107 KB | 3.68 | 3.37 | 1.09× |
| wikipedia.html | 195 KB | 17.15 | 16.19 | 1.00× * |
| github.html | 306 KB | 22.41 | 22.39 | 1.00× * |

* Wikipedia and GitHub HTML contain character references (&amp;amp;, &amp;lt;, &amp;copy;). The Zig tokenizer currently stubs character references; a pre-scan detects & followed by a letter or # and gracefully falls back to the TypeScript tokenizer. Full character reference resolution in Zig is deferred to v1.1.
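The fallback rule in the footnote can be illustrated with a short sketch. This is not the project's implementation: `mayContainCharRef` and `tokenize` are hypothetical names, "letter" is assumed to mean an ASCII letter, and the real pre-scan may well operate on bytes rather than on a JavaScript string.

```typescript
// Hypothetical pre-scan: true if any '&' is immediately followed by an ASCII letter or '#'.
function mayContainCharRef(html: string): boolean {
  let i = html.indexOf("&");
  while (i !== -1) {
    // charCodeAt returns NaN past the end of the string, which fails every comparison below.
    const next = html.charCodeAt(i + 1);
    const isLetter =
      (next >= 0x41 && next <= 0x5a) || (next >= 0x61 && next <= 0x7a);
    if (isLetter || next === 0x23 /* '#' */) return true;
    i = html.indexOf("&", i + 1);
  }
  return false;
}

// Illustrative dispatch; the two callbacks stand in for the TypeScript and WASM tokenizers.
function tokenize<T>(
  html: string,
  tsTokenizer: (s: string) => T,
  wasmTokenizer: (s: string) => T,
): T {
  return mayContainCharRef(html) ? tsTokenizer(html) : wasmTokenizer(html);
}
```

The check is deliberately conservative: a bare `&` followed by a space or digit does not trigger the fallback, while anything that could begin a named or numeric reference does, even if it later turns out not to be a valid reference.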

| Item | Expected Gain |
| --- | --- |
| Character reference port to Zig (entity trie) | Enables WASM on all inputs |
| Lazy flat-buffer token view (skip Token[] materialization) | ~1.06× |
| Zig tree builder (full parse pipeline in WASM) | 2–4× overall |
| SIMD byte scanning in Zig tokenizer | Additional 20–30% |
  • Serialization (Markdown): benchmarked separately — serializeMarkdown is included in v1.0 (MD-S4).
  • Incremental reparse: not in v1.0.
  • Streaming parse: not in v1.0.
cd /data/projects/parser
pnpm bench

Results are written to benchmarks/results/benchmark-results.json.