Benchmarks

These results were collected on the Lanexio Parser self-hosted Linux runner (dev-parser). All numbers are from benchmarks/results/benchmark-results.json in the repository.

Methodology

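Each benchmark runs 10 warm-up iterations followed by 100 measured (warm) iterations. “Cold start” is the time of the first parse after module load. Parse throughput is measured in MB/s of input bytes; serialization throughput is measured in MB/s of output bytes. Tables marked “worst-of-3” report the least favorable result of three runs (the lowest throughput, or the highest time or heap usage).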

The cold/warm distinction matters: cold-start numbers include V8 JIT compilation (Turbofan), while warm throughput is the steady-state number. For github.html in particular, the Lanexio Parser cold-start time reflects the JIT cost of the adoption-agency and phantom-synthesis branches, which GitHub’s HTML output exercises heavily.
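The methodology can be illustrated with a minimal harness. This is a sketch, not the repository's benchmark code: the function names are hypothetical, MB is assumed to mean 1,000,000 bytes, and byte counts are assumed to be the UTF-8 length of the string. Passing the input length measures parse throughput; passing the serialized output length measures serialization throughput.

```typescript
// Hypothetical harness illustrating the methodology above; not the
// repository's actual benchmark code.
const WARMUP_ITERATIONS = 10;
const MEASURED_ITERATIONS = 100;

// UTF-8 byte length of a string (assumed to be what "bytes" means here).
function utf8ByteLength(text: string): number {
  return new TextEncoder().encode(text).byteLength;
}

// Steady-state throughput in MB/s, with MB assumed to be 1,000,000 bytes.
// `bytesPerRun` is the input size for parsing, or the output size for
// serialization.
function measureThroughputMBps(run: () => unknown, bytesPerRun: number): number {
  // Warm-up runs are not timed; they give the JIT a chance to optimize.
  for (let i = 0; i < WARMUP_ITERATIONS; i++) run();

  const start = performance.now();
  for (let i = 0; i < MEASURED_ITERATIONS; i++) run();
  const elapsedSeconds = (performance.now() - start) / 1000;

  return (bytesPerRun * MEASURED_ITERATIONS) / 1e6 / elapsedSeconds;
}

// Cold start: wall time of a single call. Only meaningful if it is the first
// call after the module was loaded, so that JIT compilation is included.
function measureColdStartMs(run: () => unknown): number {
  const start = performance.now();
  run();
  return performance.now() - start;
}
```

For example, `measureThroughputMBps(() => parse(html), utf8ByteLength(html))` would correspond to one parse-throughput cell, where `parse` is whichever parser is under test. A cold-start figure needs a fresh process, since any earlier call has already paid the JIT cost.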

HTML — parse throughput (worst-of-3, v1.0.0)

Input	Size	Lanexio Parser MB/s	parse5 MB/s	Ratio (Lanexio / parse5)
wikipedia.html	195 KB	6.27	10.04	0.62×
github.html	306 KB	6.19	7.85	0.79×
nodejs-docs.html	107 KB	30.30	7.64	4.0×
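The Ratio column is Lanexio Parser throughput divided by parse5 throughput, so values below 1× mean Lanexio Parser is slower. The sketch below shows one way to reduce three runs to a worst-of-3 figure and format the ratio; the function names are hypothetical.

```typescript
// Hypothetical helpers for the worst-of-3 tables.
type Metric = "throughput" | "time" | "heap";

// For throughput (MB/s) "worst" is the lowest value;
// for time (ms) or heap (KB) it is the highest.
function worstOf3(runs: readonly [number, number, number], metric: Metric): number {
  return metric === "throughput" ? Math.min(...runs) : Math.max(...runs);
}

// Ratio column: Lanexio Parser MB/s divided by the baseline parser's MB/s.
// Values above 1 mean Lanexio Parser is faster.
function throughputRatio(lanexioMBps: number, baselineMBps: number): number {
  return lanexioMBps / baselineMBps;
}

// Format the way the tables do, e.g. "0.62×".
function formatRatio(ratio: number, digits = 2): string {
  return `${ratio.toFixed(digits)}×`;
}
```

`formatRatio(throughputRatio(6.27, 10.04))` returns `"0.62×"`, matching the wikipedia.html row.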

HTML — cold start (worst-of-3)

Input	Lanexio Parser ms	parse5 ms
wikipedia.html	33.67	28.15
github.html	100.02	60.21
nodejs-docs.html	6.43	10.98

HTML — heap usage (worst-of-3, single-parse delta)

Input	Lanexio Parser KB	parse5 KB
wikipedia.html	11,338	7,538
github.html	14,758	9,213
nodejs-docs.html	1,205	3,535
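A single-parse heap delta can be approximated in Node.js by reading `process.memoryUsage().heapUsed` before and after one parse. This sketch assumes the process is started with `--expose-gc` so a full collection can run first; without that flag the `gc` call is skipped and the baseline is noisier. The function name is hypothetical, and KB is taken as 1,024 bytes.

```typescript
// Hypothetical single-parse heap delta measurement for Node.js.
type ParseFn = (input: string) => unknown;

function heapDeltaKB(parse: ParseFn, input: string): { kb: number; result: unknown } {
  // `gc` is only defined when Node.js runs with --expose-gc.
  const gc = (globalThis as unknown as { gc?: () => void }).gc;
  gc?.();

  const before = process.memoryUsage().heapUsed;
  const result = parse(input);
  const after = process.memoryUsage().heapUsed;

  // Returning `result` keeps the parse output reachable at the second reading,
  // so its allocations are counted. The delta can still be skewed (even
  // negative) if a collection happens during the parse.
  return { kb: (after - before) / 1024, result };
}
```

Repeating this three times and keeping the largest delta would give a worst-of-3 figure.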

HTML — serialization throughput (worst-of-3)

Input	Lanexio Parser MB/s	parse5 MB/s	Gap (parse5 / Lanexio)
wikipedia.html	10.29	71.51	7.0×
github.html	18.48	122.50	6.6×
nodejs-docs.html	215.85	3,072	14.2×

Markdown — parse throughput (worst-of-3, v1.0.0)

Input	Size	Lanexio Parser MB/s	micromark MB/s	Ratio (Lanexio / micromark)
headings.md	135 B	2.28	0.13	17.7×
inline.md	275 B	2.13	0.37	5.8×

Markdown — cold start (worst-of-3)

Input	Lanexio Parser ms	micromark ms
headings.md	0.20	1.03
inline.md	0.26	1.44

Markdown — heap usage (worst-of-3)

Input	Lanexio Parser KB	micromark KB
headings.md	13	505
inline.md	25	423

Lanexio Parser uses 94% less heap than micromark on inline.md (25 KB vs 423 KB) and about 97% less on headings.md (13 KB vs 505 KB). The flat-buffer design pays off especially clearly for small Markdown inputs.
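This page does not document Lanexio Parser's buffer layout. As a hypothetical illustration of why a flat buffer tends to use less heap than one object per token, the sketch below packs each token into three slots of a single `Int32Array` (kind, start offset, end offset) and reads fields by index, creating a string only when a caller asks for token text. The class name and record layout are assumptions.

```typescript
// Hypothetical flat token buffer: each token occupies 3 int32 slots
// [kind, start, end], where start/end are offsets into the source string.
// One typed array replaces many small per-token objects.
const FIELDS = 3;

class FlatTokenBuffer {
  private data: Int32Array;
  private count = 0;

  constructor(initialCapacity = 64) {
    this.data = new Int32Array(Math.max(1, initialCapacity) * FIELDS);
  }

  push(kind: number, start: number, end: number): void {
    if ((this.count + 1) * FIELDS > this.data.length) {
      // Grow geometrically, copying existing records into a larger buffer.
      const next = new Int32Array(this.data.length * 2);
      next.set(this.data);
      this.data = next;
    }
    const base = this.count * FIELDS;
    this.data[base] = kind;
    this.data[base + 1] = start;
    this.data[base + 2] = end;
    this.count++;
  }

  get length(): number {
    return this.count;
  }

  kind(i: number): number {
    return this.data[i * FIELDS];
  }

  // Token text is sliced on demand; no string exists until it is requested.
  text(i: number, source: string): string {
    const base = i * FIELDS;
    return source.slice(this.data[base + 1], this.data[base + 2]);
  }
}
```

The trade-off is that callers work with indices rather than token objects, which is what allows a view over the buffer to skip materializing a `Token[]` array.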

WASM HTML tokenizer (v1.0)

The Zig/WASM HTML tokenizer is included in v1.0 as an opt-in acceleration path. On HTML documents that contain no character references, it gives a modest end-to-end speedup (1.09× on nodejs-docs.html).

Input	Size	TS parse (ms)	WASM parse (ms)	Speedup (TS / WASM)
nodejs-docs.html	107 KB	3.68	3.37	1.09×
wikipedia.html	195 KB	17.15	16.19	1.06× *
github.html	306 KB	22.41	22.39	1.00× *

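* wikipedia.html and github.html contain character references (for example &amp;, &lt;, and &copy;). The Zig tokenizer does not yet resolve character references, so a pre-scan checks for any & followed by a letter or # and, if it finds one, falls back to the TypeScript tokenizer. Both columns for these rows therefore run the TypeScript tokenizer, and any difference between them is not a WASM speedup. Full character reference resolution in Zig is deferred to v1.1.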
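The fallback check described in the footnote can be sketched as a single scan for `&` followed by a letter or `#`. This is an illustration, not the project's code: letters are assumed to be ASCII, and `chooseTokenizer`, `wasmTokenize`, and `tsTokenize` are hypothetical names.

```typescript
// Sketch of the pre-scan: true if the input has "&" followed by an ASCII
// letter or "#", i.e. it may contain a character reference.
function mayContainCharRef(html: string): boolean {
  let i = html.indexOf("&");
  while (i !== -1 && i + 1 < html.length) {
    const c = html.charCodeAt(i + 1);
    const isAsciiLetter = (c >= 0x41 && c <= 0x5a) || (c >= 0x61 && c <= 0x7a);
    if (isAsciiLetter || c === 0x23 /* '#' */) return true;
    i = html.indexOf("&", i + 1);
  }
  return false;
}

type Tokenizer = (html: string) => unknown;

// Hypothetical dispatcher: use the TypeScript tokenizer whenever the
// pre-scan finds a possible character reference.
function chooseTokenizer(html: string, wasmTokenize: Tokenizer, tsTokenize: Tokenizer): Tokenizer {
  return mayContainCharRef(html) ? tsTokenize : wasmTokenize;
}
```

In this sketch, text such as `AT&T` also triggers the fallback; a false positive only means the slower path is used.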

v1.1 performance track

Item	Expected Gain
Character reference port to Zig (entity trie)	Enables WASM on all inputs
Lazy flat-buffer token view (skip Token[] materialization)	~1.06×
Zig tree builder (full parse pipeline in WASM)	2–4× overall
SIMD byte scanning in Zig tokenizer	Additional 20–30%

What is not benchmarked in v1.0

Markdown serialization: serializeMarkdown ships in v1.0 (MD-S4) but is benchmarked separately and not reported on this page.
Incremental reparse: not in v1.0.
Streaming parse: not in v1.0.

Running benchmarks locally

From the root of your repository checkout, run:

pnpm bench

Results are written to benchmarks/results/benchmark-results.json.