Flat AST

The flat AST is the core data structure of Lanexio Parser. Every parse result is a contiguous ArrayBuffer of 16-byte nodes, laid out in preorder depth-first-search order. There are no per-node heap objects, no pointer chasing, and no garbage collection pressure.

Node layout

Each node occupies exactly 16 bytes:

Bytes	Field	Type	Description
0–1	`kind`	`uint16`	Node kind ID. Corresponds to `HtmlKind` or `MdKind` constants.
2	`flags`	`uint8`	Node flags. Bit 0 = `hasError`.
3	`fieldId`	`uint8`	Field slot ID within the parent. 0 = no field.
4–7	`start`	`uint32`	Byte offset of the first source byte of this node.
8–11	`end`	`uint32`	Byte offset one past the last source byte of this node.
12–15	`subtreeSize`	`uint32`	Number of nodes in this subtree, including self.

The layout constants are exported from @lanexio/parser-core (NODE_RECORD_SIZE, TREE_HEADER_SIZE, LEX_TREE_MAGIC, LEX_TREE_VERSION). The PROTOCOL_VERSION constant governs this contract. Do not decode the buffer manually — use the LexNode and LexCursor APIs, which always read against the correct layout version.

Preorder DFS

Nodes are stored in preorder depth-first-search order: a parent appears before all its descendants, and siblings appear in source order. This means:

The root node is always at index 0.
A node’s subtree is a contiguous slice of the buffer.
You can skip an entire subtree in O(1) by advancing the index by node.subtreeSize.
Sibling enumeration is O(number of siblings), not O(tree depth).

Tree header

The first 16 bytes of the buffer are the tree header:

Bytes	Field	Description
0–3	`magic`	`uint32`. Magic value `0x4c584943` identifying a Lanexio Parser tree buffer.
4–5	`treeVersion`	`uint16`. Tree buffer format version. Currently `1`.
6–7	`nodeRecordSize`	`uint16`. Size in bytes of each node record (always `16`).
8–11	`nodeCount`	`uint32`. Total number of nodes in the tree.
12–15	`rootIndex`	`uint32`. Index of the root node (always `0`).

Traversal: LexCursor

For hot traversal, use LexTree.cursor() which returns a LexCursor positioned at the root.

import { parseHtml } from '@lanexio/parser-grammar-html';

const encoder = new TextEncoder();
const tree = parseHtml(encoder.encode('<div><p>Hello</p><p>World</p></div>'));

const cursor = tree.cursor();
visit: while (true) {
  const node = cursor.current;
  const [start, end] = node.range;   // LexRange is a tuple: readonly [start, end]
  console.log(node.kind, start, end);
  if (cursor.gotoFirstChild()) continue;
  while (!cursor.gotoNextSibling()) {
    if (!cursor.gotoParent()) break visit;
  }
}

LexCursor methods

Method	Returns	Description
`cursor.current`	`LexNode`	The node at the current position.
`cursor.gotoFirstChild()`	`boolean`	Move to the first child. Returns `false` if none.
`cursor.gotoNextSibling()`	`boolean`	Move to the next sibling. Returns `false` if none.
`cursor.gotoPreviousSibling()`	`boolean`	Move to the previous sibling. Returns `false` if none.
`cursor.gotoParent()`	`boolean`	Move to the parent. Returns `false` at root.
`cursor.gotoFirstChildForRange([s, e])`	`boolean`	Move to the first child containing the byte range.
`cursor.gotoDescendantForRange([s, e])`	`boolean`	Move to the deepest descendant containing the byte range.

LexNode methods

Property/Method	Type	Description
`node.kind`	`number`	Numeric kind ID.
`node.flags`	`number`	Bit flags. Bit 0 = `hasError`.
`node.fieldId`	`number`	Field slot ID.
`node.range`	`LexRange` — `readonly [start, end]`	Half-open byte range in source. Destructure: `const [start, end] = node.range`.
`node.subtreeSize`	`number`	Total nodes in this subtree.
`node.hasError`	`boolean`	Whether this or any descendant has an error.
`node.text`	`string`	Source text slice as a string.
`node.children()`	`IterableIterator<LexNode>`	Iterate direct children. O(k) total via `subtreeSize` skips.
`node.childCount()`	`number`	Count direct children. O(k).
`node.child(index)`	`LexNode \| null`	Get child by position. O(k) per call — prefer `children()` iteration over indexed loops.
`node.childByField(name)`	`LexNode \| null`	First direct child with the named field role (e.g. `'attrs'`, `'key'`, `'value'`).
`node.fieldName()`	`string \| null`	This node’s field-role name via the tree’s field metadata.
`node.firstChildForRange([s, e])`	`LexNode \| null`	First direct child whose range contains the given range.
`node.descendantForRange([s, e])`	`LexNode \| null`	Deepest descendant whose range contains the given range.
`node.nextSibling()` / `previousSibling()`	`LexNode \| null`	Sibling navigation.
`node.parent()`	`LexNode \| null`	Parent lookup; builds a per-tree parent side table lazily on first use.

Zero-copy

The source bytes and the AST buffer are never copied during traversal. tree.source returns the original Uint8Array passed to parseHtml or parseMarkdown. tree.buffer returns the raw ArrayBuffer of nodes.

node.text does create a new string by decoding source bytes. Avoid it in hot loops — use node.range and decode only when needed.

O(1) subtree operations

Because subtreeSize is stored in every node:

Skip a subtree: index += node.subtreeSize
Find all siblings: iterate at the same depth level, each advancing by subtreeSize
Count descendants: node.subtreeSize - 1

This is why traversal with LexCursor is fast. The cursor tracks the current buffer index and uses subtreeSize to navigate without parent pointer lookup.

Why flat, not a tree of objects?

A conventional parse tree with one object per node creates heavy garbage collection pressure for large documents. A document with 10,000 nodes means 10,000 objects allocated. Lanexio Parser’s flat layout means one ArrayBuffer allocation per parse. The GC never sees individual nodes.