Bimodal AST

The Lanexio Parser AST has two layers that work together. The universal base layer handles storage and traversal. The grammar-specific typed layer gives you named kind constants and type-safe field access.

Every node in every grammar is a LexNode. The universal layer is grammar-agnostic. It stores and traverses nodes, but it does not know whether a node is an HTML element, a Markdown heading, or any other domain concept.

The universal base is implemented in @lanexio/parser-core. It provides:

  • LexTree — the parsed result, wrapping a flat ArrayBuffer
  • LexNode — a view into a 16-byte record in that buffer
  • LexCursor — a stateful cursor for efficient traversal
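
To make "a view into a 16-byte record" concrete, here is a minimal sketch of reading fixed-size records out of a flat ArrayBuffer. The field layout (four little-endian 32-bit fields named kind, start, end, parent) and the names ToyNodeView and writeToyRecord are assumptions made up for illustration. Only the 16-byte record size comes from the description above, and the real LexNode layout may differ.

```typescript
const RECORD_SIZE = 16; // bytes per node record, as stated above

// HYPOTHETICAL layout, not the real LexNode format:
//   bytes 0-3 kind, 4-7 start, 8-11 end, 12-15 parent (all little-endian u32)
class ToyNodeView {
  private readonly view: DataView;

  constructor(buffer: ArrayBuffer, index: number) {
    // A DataView references the buffer; no bytes are copied.
    this.view = new DataView(buffer, index * RECORD_SIZE, RECORD_SIZE);
  }

  get kind(): number { return this.view.getUint32(0, true); }
  get start(): number { return this.view.getUint32(4, true); }
  get end(): number { return this.view.getUint32(8, true); }
  get parent(): number { return this.view.getUint32(12, true); }
}

// Helper for the demo only: writes one record using the same assumed layout.
function writeToyRecord(
  buffer: ArrayBuffer, index: number,
  kind: number, start: number, end: number, parent: number,
): void {
  const view = new DataView(buffer, index * RECORD_SIZE, RECORD_SIZE);
  view.setUint32(0, kind, true);
  view.setUint32(4, start, true);
  view.setUint32(8, end, true);
  view.setUint32(12, parent, true);
}

const toyBuffer = new ArrayBuffer(2 * RECORD_SIZE);
writeToyRecord(toyBuffer, 0, 1, 0, 12, 0);
writeToyRecord(toyBuffer, 1, 2, 0, 12, 0);

const second = new ToyNodeView(toyBuffer, 1);
console.log(second.kind, second.start, second.end);
```

Because the view reads from the buffer on every access, a node object stays small and no per-node object graph has to be built up front.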

You can use the universal base alone if you want to write grammar-neutral code, but most application code also imports the grammar-specific layer for named kind constants.
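
As a sketch of what grammar-neutral code might look like, the helper below counts nodes per numeric kind without knowing what any kind means. CursorLike is a hypothetical structural interface that mirrors only the cursor members used elsewhere on this page (current.kind, gotoFirstChild, gotoNextSibling, gotoParent). The toy in-memory tree exists only so the example runs without the library.

```typescript
// ASSUMPTION: the real LexCursor exposes at least these members.
interface CursorLike {
  readonly current: { readonly kind: number };
  gotoFirstChild(): boolean;
  gotoNextSibling(): boolean;
  gotoParent(): boolean;
}

// Grammar-neutral: works on raw numeric kinds only.
function countKinds(cursor: CursorLike): Map<number, number> {
  const counts = new Map<number, number>();
  visit: while (true) {
    const kind = cursor.current.kind;
    counts.set(kind, (counts.get(kind) ?? 0) + 1);
    if (cursor.gotoFirstChild()) continue;
    while (!cursor.gotoNextSibling()) {
      if (!cursor.gotoParent()) break visit;
    }
  }
  return counts;
}

// Toy tree and cursor, for demonstration only.
interface ToyNode { kind: number; children: ToyNode[] }

function toyCursor(root: ToyNode): CursorLike {
  // Path from the root to the current node, with each node's sibling index.
  const path: Array<{ node: ToyNode; index: number }> = [{ node: root, index: 0 }];
  const top = () => path[path.length - 1];
  return {
    get current() { return top().node; },
    gotoFirstChild() {
      const first = top().node.children[0];
      if (!first) return false;
      path.push({ node: first, index: 0 });
      return true;
    },
    gotoNextSibling() {
      if (path.length < 2) return false; // the root has no siblings
      const index = top().index + 1;
      const next = path[path.length - 2].node.children[index];
      if (!next) return false;
      path[path.length - 1] = { node: next, index };
      return true;
    },
    gotoParent() {
      if (path.length < 2) return false;
      path.pop();
      return true;
    },
  };
}

const toyTree: ToyNode = {
  kind: 1,
  children: [
    { kind: 2, children: [{ kind: 3, children: [] }] },
    { kind: 2, children: [] },
  ],
};
const kindCounts = countKinds(toyCursor(toyTree));
console.log(kindCounts); // kind 1 once, kind 2 twice, kind 3 once
```

The same function would work for any grammar, since it never compares a kind against a named constant.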

Each grammar pack adds a typed projection on top of the universal base. The projection is a set of const objects generated from the Zig grammar definition.

For @lanexio/parser-grammar-html, the projection is HtmlKind, HtmlField, and HTML_FIELD_NAMES_BY_ID. For @lanexio/parser-grammar-markdown, it is MdKind, MdField, and MD_FIELD_NAMES_BY_ID.

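import { parseHtml, HtmlKind } from '@lanexio/parser-grammar-html';

// The source is passed to parseHtml as UTF-8 bytes.
const encoder = new TextEncoder();
const tree = parseHtml(encoder.encode('<p>Hello</p>'));
const cursor = tree.cursor();

// Depth-first, pre-order walk over the whole tree.
visit: while (true) {
  if (cursor.current.kind === HtmlKind.Element) {
    console.log('element node at', cursor.current.range);
  }
  // Descend first; otherwise move to the next sibling, climbing until one exists.
  if (cursor.gotoFirstChild()) continue;
  while (!cursor.gotoNextSibling()) {
    if (!cursor.gotoParent()) break visit; // back at the root: traversal is done
  }
}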

HtmlKind.Element is a stable numeric constant. The numeric values are generated from zig/grammars/html/src/kinds.zig and never change within a major version.

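Lanexio Parser uses const objects with typeof-derived union types instead of TypeScript enum. Enums are not purely type-level syntax: a regular enum compiles to a runtime object (numeric enums also get reverse mappings from value to name), while a const enum is inlined and erased. These differences can cause subtle bugs in strict code. A const object is an ordinary JavaScript value, and its union type disappears entirely at compile time.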

// What the generated code looks like
export const HtmlKind = {
  Document: 1,
  Element: 2,
  Text: 3,
  // ...
} as const;
export type HtmlKindValue = typeof HtmlKind[keyof typeof HtmlKind];
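
One concrete difference in runtime behavior can be sketched as follows. The const object below is a three-kind stand-in for the generated HtmlKind. HtmlKindEnum is a hypothetical, hand-written equivalent of the object the TypeScript compiler emits for a numeric enum with the same members; it is written out by hand so the example does not depend on enum syntax.

```typescript
// Stand-in copied from the generated shape shown above (three kinds only).
const HtmlKind = { Document: 1, Element: 2, Text: 3 } as const;

// Hand-written equivalent of what tsc emits for
//   enum HtmlKindEnum { Document = 1, Element, Text }
// Each member is stored twice: name -> value and value -> name.
const HtmlKindEnum: Record<string | number, string | number> = {};
HtmlKindEnum[(HtmlKindEnum['Document'] = 1)] = 'Document';
HtmlKindEnum[(HtmlKindEnum['Element'] = 2)] = 'Element';
HtmlKindEnum[(HtmlKindEnum['Text'] = 3)] = 'Text';

const constKeys = Object.keys(HtmlKind);
const enumKeys = Object.keys(HtmlKindEnum);

console.log(constKeys.length); // 3: names only
console.log(enumKeys.length);  // 6: names plus the reverse-mapped '1', '2', '3'
```

Code that iterates over the enum object sees twice as many entries as there are kinds, which is one way the two approaches diverge at runtime.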

When you need to type-check a kind value, use the union type:

import { HtmlKind } from '@lanexio/parser-grammar-html';

// Union of every numeric kind value, derived from the const object.
type HtmlKindValue = typeof HtmlKind[keyof typeof HtmlKind];

function describeKind(kind: HtmlKindValue): string {
  switch (kind) {
    case HtmlKind.Element: return 'Element';
    case HtmlKind.Text: return 'Text';
    default: return 'Other';
  }
}
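
When a kind arrives as a plain number (for example from grammar-neutral code), it has to be narrowed before it fits the union type. The sketch below shows one way to do that with a type guard, plus an exhaustive switch. HtmlKind here is a three-kind stand-in for the generated object, and isHtmlKindValue, assertNever, and describeKindExhaustive are hypothetical helpers, not part of the package.

```typescript
// Stand-in for the generated object; the real one has more kinds.
const HtmlKind = { Document: 1, Element: 2, Text: 3 } as const;
type HtmlKindValue = typeof HtmlKind[keyof typeof HtmlKind]; // 1 | 2 | 3

const KNOWN_KINDS = new Set<number>(Object.values(HtmlKind));

// Runtime check that narrows a plain number to the union type.
function isHtmlKindValue(kind: number): kind is HtmlKindValue {
  return KNOWN_KINDS.has(kind);
}

// Reaching this call is a compile-time error if a case is missing.
function assertNever(value: never): never {
  throw new Error(`Unhandled kind: ${String(value)}`);
}

function describeKindExhaustive(kind: HtmlKindValue): string {
  switch (kind) {
    case HtmlKind.Document: return 'Document';
    case HtmlKind.Element: return 'Element';
    case HtmlKind.Text: return 'Text';
    default: return assertNever(kind);
  }
}

const raw: number = 2;
const label = isHtmlKindValue(raw) ? describeKindExhaustive(raw) : 'Unknown';
console.log(label); // Element
```

With this pattern, adding a kind to the const object makes the switch fail to compile until the new case is handled.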

Grammar kind constants are defined in Zig, in files like zig/grammars/html/src/kinds.zig. The pnpm codegen command generates the TypeScript .generated.ts file from the Zig source.

Never edit .generated.ts files by hand; pnpm codegen overwrites them. To add or change a kind, edit kinds.zig and rerun pnpm codegen. The steps for adding a kind are:

  1. Add the kind to the appropriate kinds.zig.
  2. Run pnpm codegen.
  3. Commit both kinds.zig and the generated .generated.ts.
  4. Add tests and a corpus entry for the new kind.
  5. Run pnpm verify:no-throw to confirm the parse path still holds.

See AGENTS.md §7 for the full protocol.

Each grammar pack is its own npm package. Grammar packs depend only on @lanexio/parser-core and never on each other. This means you can use @lanexio/parser-grammar-html without loading @lanexio/parser-grammar-markdown, and vice versa.

Lazy loading depends on this independence. If one grammar pack imported another, loading the first would pull in the second as well, and bundlers could not tree-shake unused grammars.
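
A minimal sketch of how an application might lazy-load independent grammar packs is shown below. createLazyRegistry is a hypothetical helper, not part of Lanexio Parser. The demo uses fake loaders so it runs standalone; in an application each loader would presumably be a dynamic import of one grammar package.

```typescript
type Loader<T> = () => Promise<T>;

// Hypothetical helper: loads each grammar at most once, on first request.
function createLazyRegistry<T>(loaders: Record<string, Loader<T>>) {
  const cache = new Map<string, Promise<T>>();
  return function load(name: string): Promise<T> {
    let pending = cache.get(name);
    if (!pending) {
      const loader = loaders[name];
      if (!loader) return Promise.reject(new Error(`Unknown grammar: ${name}`));
      pending = loader(); // the import starts here, not at startup
      cache.set(name, pending);
    }
    return pending;
  };
}

// Fake loaders for the demo. In an application these might look like
//   html: () => import('@lanexio/parser-grammar-html')
const loadCalls = { html: 0, markdown: 0 };
const loadGrammar = createLazyRegistry<{ name: string }>({
  html: () => { loadCalls.html += 1; return Promise.resolve({ name: 'html' }); },
  markdown: () => { loadCalls.markdown += 1; return Promise.resolve({ name: 'markdown' }); },
});

const firstRequest = loadGrammar('html');
const secondRequest = loadGrammar('html');
console.log(loadCalls); // html loaded once, markdown never requested
```

Because the packs do not import each other, requesting the HTML grammar never triggers the Markdown loader.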