
parser-wasm API

@lanexio/parser-wasm defines the contract a WebAssembly grammar pack must satisfy to plug into createParser, plus the loaders, a bundled demo grammar, and an optional WASM accelerator for the HTML tokenizer.

Layer: 1.5 (WASM host glue). Depends on @lanexio/parser-core. Runtime: Universal — Node/Bun read bundled bytes from disk; browsers pass a fetched BufferSource.

type LanexioParserGrammar = {
  /** Stable npm package name (used in diagnostics). */
  readonly name: string;
  /** Shared buffer protocol version this grammar targets. */
  readonly protocolVersion: number;
  /** WASM bytes / module — or a Promise of one, or a thunk returning either. */
  readonly source: LanexioParserGrammarSourceLike;
  /** Builds a runtime parser from the resolved source. */
  readonly instantiate: (source: LanexioParserGrammarSource) => Promise<WasmBridgeParser>;
};

type LanexioParserGrammarSource = BufferSource | WebAssembly.Module;

type LanexioParserGrammarSourceLike =
  | LanexioParserGrammarSource
  | Promise<LanexioParserGrammarSource>
  | (() => LanexioParserGrammarSource | Promise<LanexioParserGrammarSource>);

Use defineLanexioParserGrammar(grammar) to get full type checking while keeping the value opaque:

import { defineLanexioParserGrammar } from '@lanexio/parser-wasm';

// `instantiate` and `buildBridge` stand in for your grammar's own loader and bridge code.
export const myGrammar = defineLanexioParserGrammar({
  name: '@me/grammar-mything',
  protocolVersion: 1,
  // Lazy thunk: nothing is fetched until the source is resolved.
  source: () => fetch('/wasm/mything.wasm').then((r) => r.arrayBuffer()),
  instantiate: async (src) => buildBridge(await instantiate(src)),
});

With the lazy-thunk source form, no bytes load until createParser runs, so grammar descriptors can ship in your bundle at essentially no cost.

resolveGrammarSource(source: LanexioParserGrammarSourceLike): Promise<LanexioParserGrammarSource>

Normalizes all three source shapes. createParser uses this internally; custom hosts can share the exact semantics.
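The normalization described above can be sketched as follows. This is a minimal illustration, not the package's implementation: resolveSourceSketch, SketchSource, and SketchSourceLike are hypothetical names, and Uint8Array stands in for the real BufferSource | WebAssembly.Module union so that the sketch needs no DOM typings.

```typescript
// Stand-in for LanexioParserGrammarSource (really BufferSource | WebAssembly.Module).
type SketchSource = Uint8Array;
type SketchSourceLike =
  | SketchSource
  | Promise<SketchSource>
  | (() => SketchSource | Promise<SketchSource>);

// Hypothetical helper, NOT the package's implementation.
async function resolveSourceSketch(source: SketchSourceLike): Promise<SketchSource> {
  // Thunk form: invoked only here, which is what keeps loading lazy.
  const value = typeof source === 'function' ? source() : source;
  // `await` unwraps a Promise and passes a plain value through unchanged.
  return await value;
}

// Usage: all three shapes resolve to the same bytes.
const wasmMagic = new Uint8Array([0x00, 0x61, 0x73, 0x6d]);

async function demoResolve(): Promise<void> {
  const a = await resolveSourceSketch(wasmMagic);
  const b = await resolveSourceSketch(Promise.resolve(wasmMagic));
  const c = await resolveSourceSketch(() => wasmMagic);
  console.log(a === wasmMagic && b === wasmMagic && c === wasmMagic);
}

void demoResolve();
```

Because the thunk is called only inside the resolver, a descriptor that holds one performs no I/O until something resolves it.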

Loaders throw at construction time (by design; see Error Handling) with one of the following stable codes:

| Code | Meaning |
| --- | --- |
| instantiation_failed | WebAssembly.instantiate rejected the bytes/module. |
| missing_exports | The module lacks the required parser export surface. |
| missing_toy_exports | The module lacks the toy buffer-ABI exports (toy loader only). |
| protocol_mismatch | The module reports an incompatible protocol version. |

A successful load yields a parser whose runtime parse is never-throwing, like every other grammar.
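A host might branch on these codes roughly as sketched below. The page does not say how the code is attached to the thrown value, so this sketch assumes a string `code` property on the error; that property, and the helper names loaderErrorCode and describeLoaderError, are hypothetical.

```typescript
// The four stable codes listed in the table above.
type LoaderErrorCode =
  | 'instantiation_failed'
  | 'missing_exports'
  | 'missing_toy_exports'
  | 'protocol_mismatch';

const LOADER_ERROR_CODES: readonly string[] = [
  'instantiation_failed',
  'missing_exports',
  'missing_toy_exports',
  'protocol_mismatch',
];

// ASSUMPTION: the thrown value exposes its code as a string `code` property.
function loaderErrorCode(err: unknown): LoaderErrorCode | undefined {
  if (typeof err !== 'object' || err === null) return undefined;
  const code = (err as { code?: unknown }).code;
  return typeof code === 'string' && LOADER_ERROR_CODES.indexOf(code) !== -1
    ? (code as LoaderErrorCode)
    : undefined;
}

// Maps a caught value to a short, human-readable description.
function describeLoaderError(err: unknown): string {
  switch (loaderErrorCode(err)) {
    case 'instantiation_failed':
      return 'WebAssembly.instantiate rejected the bytes/module';
    case 'missing_exports':
      return 'module lacks the required parser export surface';
    case 'missing_toy_exports':
      return 'module lacks the toy buffer-ABI exports';
    case 'protocol_mismatch':
      return 'module reports an incompatible protocol version';
    default:
      return 'not a recognised loader error';
  }
}
```

In a real host this would sit in the catch block around the loader call; once loading succeeds, parse itself needs no such handling because it never throws.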

import { toyGrammar, TOY_GRAMMAR_NAME } from '@lanexio/parser-wasm';

toyGrammar is the complete worked example of the contract. Its source is the bundled .wasm bytes, loaded via node:fs on the server (browsers should supply their own BufferSource), and its instantiate combines instantiateToyWasm with a bridge. It parses the (ident + ident) demo language and backs the README quickstart, the fuzz harnesses, and createParser’s tests. In the browser, define an equivalent grammar around fetched bytes:

import { createParser, defineLanexioParserGrammar } from '@lanexio/parser';
import { createToyWasmBridgeParser, instantiateToyWasm } from '@lanexio/parser-wasm';

const browserToy = defineLanexioParserGrammar({
  name: '@lanexio/grammar-toy-internal',
  protocolVersion: 1,
  source: () => fetch('/wasm/parser-toy.wasm').then((r) => r.arrayBuffer()),
  instantiate: async (src) => createToyWasmBridgeParser(await instantiateToyWasm(src)),
});

const parser = await createParser(browserToy);

Independently of the grammar contract, this package ships an optional WASM fast path for the HTML tokenizer and the HTML serializer’s escape scanning:

| Export | Description |
| --- | --- |
| instantiateWasmHtmlTokenizer(source, imports?) | Async instantiation. Returns the typed exports object, or null if the module doesn’t satisfy the HTML tokenizer ABI. |
| instantiateWasmHtmlTokenizerSync(module, imports?) | Sync variant for a precompiled WebAssembly.Module. |
| createWasmHtmlTokenizer(exports) | Wraps validated exports into a (source: Uint8Array) => Token[] tokenizer, or returns null for an invalid ABI. |
| WasmHtmlTokenizerExports | The ABI type (memory + lanexio_parser_html_* exports). |

The same exports object can be passed as wasmScanner to serializeHtml. All accelerated paths fall back to the pure-TypeScript implementations on capacity or instantiation failure, and outputs are byte-identical either way (enforced by an escape-consistency property suite).
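Since the factory functions above return null rather than throwing, host code that wires up a tokenizer itself can handle the null case in one place. The sketch below is self-contained and uses stand-ins: SketchToken, pureTokenizerStub, and chooseTokenizer are hypothetical names, and the token shape is invented for illustration only.

```typescript
// Hypothetical token shape; the real Token type is defined by the package.
type SketchToken = { kind: string; start: number; end: number };
type SketchTokenizer = (source: Uint8Array) => SketchToken[];

// Stand-in for the pure-TypeScript tokenizer: one token spanning the whole input.
const pureTokenizerStub: SketchTokenizer = (source) => [
  { kind: 'text', start: 0, end: source.length },
];

// Prefer the accelerated tokenizer when one was created; otherwise use the fallback.
// `accelerated` models the "tokenizer or null" result described in the table above.
function chooseTokenizer(
  accelerated: SketchTokenizer | null,
  fallback: SketchTokenizer,
): SketchTokenizer {
  return accelerated ?? fallback;
}

// Usage: with no accelerator available, the fallback handles the input.
const tokenize = chooseTokenizer(null, pureTokenizerStub);
console.log(tokenize(new Uint8Array([60, 112, 62])).length);
```

Because both paths are documented as producing identical output, callers downstream of this choice do not need to know which one was selected.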