Skip to content

Flat AST

The flat AST is the core data structure of Lanexio Parser. Every parse result is a contiguous ArrayBuffer of 16-byte nodes, laid out in preorder depth-first-search order. There are no per-node heap objects, no pointer chasing, and no garbage collection pressure.

Each node occupies exactly 16 bytes:

BytesFieldTypeDescription
0–1kinduint16Node kind ID. Corresponds to HtmlKind or MdKind constants.
2flagsuint8Node flags. Bit 0 = hasError.
3fieldIduint8Field slot ID within the parent. 0 = no field.
4–7startuint32Byte offset of the first source byte of this node.
8–11enduint32Byte offset one past the last source byte of this node.
12–15subtreeSizeuint32Number of nodes in this subtree, including self.

The layout constants are exported from @lanexio/parser-core (NODE_RECORD_SIZE, TREE_HEADER_SIZE, LEX_TREE_MAGIC, LEX_TREE_VERSION). The PROTOCOL_VERSION constant governs this contract. Do not decode the buffer manually — use the LexNode and LexCursor APIs, which always read against the correct layout version.

Nodes are stored in preorder depth-first-search order: a parent appears before all its descendants, and siblings appear in source order. This means:

  • The root node is always at index 0.
  • A node’s subtree is a contiguous slice of the buffer.
  • You can skip an entire subtree in O(1) by advancing the index by node.subtreeSize.
  • Sibling enumeration is O(number of siblings), not O(tree depth).

The first 16 bytes of the buffer are the tree header:

BytesFieldDescription
0–3magicuint32. Magic value 0x4c584943 identifying a Lanexio Parser tree buffer.
4–5treeVersionuint16. Tree buffer format version. Currently 1.
6–7nodeRecordSizeuint16. Size in bytes of each node record (always 16).
8–11nodeCountuint32. Total number of nodes in the tree.
12–15rootIndexuint32. Index of the root node (always 0).

For hot traversal, use LexTree.cursor() which returns a LexCursor positioned at the root.

import { parseHtml } from '@lanexio/parser-grammar-html';
const encoder = new TextEncoder();
const tree = parseHtml(encoder.encode('<div><p>Hello</p><p>World</p></div>'));
const cursor = tree.cursor();
visit: while (true) {
const node = cursor.current;
const [start, end] = node.range; // LexRange is a tuple: readonly [start, end]
console.log(node.kind, start, end);
if (cursor.gotoFirstChild()) continue;
while (!cursor.gotoNextSibling()) {
if (!cursor.gotoParent()) break visit;
}
}
MethodReturnsDescription
cursor.currentLexNodeThe node at the current position.
cursor.gotoFirstChild()booleanMove to the first child. Returns false if none.
cursor.gotoNextSibling()booleanMove to the next sibling. Returns false if none.
cursor.gotoPreviousSibling()booleanMove to the previous sibling. Returns false if none.
cursor.gotoParent()booleanMove to the parent. Returns false at root.
cursor.gotoFirstChildForRange([s, e])booleanMove to the first child containing the byte range.
cursor.gotoDescendantForRange([s, e])booleanMove to the deepest descendant containing the byte range.
Property/MethodTypeDescription
node.kindnumberNumeric kind ID.
node.flagsnumberBit flags. Bit 0 = hasError.
node.fieldIdnumberField slot ID.
node.rangeLexRangereadonly [start, end]Half-open byte range in source. Destructure: const [start, end] = node.range.
node.subtreeSizenumberTotal nodes in this subtree.
node.hasErrorbooleanWhether this or any descendant has an error.
node.textstringSource text slice as a string.
node.children()IterableIterator<LexNode>Iterate direct children. O(k) total via subtreeSize skips.
node.childCount()numberCount direct children. O(k).
node.child(index)LexNode | nullGet child by position. O(k) per call — prefer children() iteration over indexed loops.
node.childByField(name)LexNode | nullFirst direct child with the named field role (e.g. 'attrs', 'key', 'value').
node.fieldName()string | nullThis node’s field-role name via the tree’s field metadata.
node.firstChildForRange([s, e])LexNode | nullFirst direct child whose range contains the given range.
node.descendantForRange([s, e])LexNode | nullDeepest descendant whose range contains the given range.
node.nextSibling() / previousSibling()LexNode | nullSibling navigation.
node.parent()LexNode | nullParent lookup; builds a per-tree parent side table lazily on first use.

The source bytes and the AST buffer are never copied during traversal. tree.source returns the original Uint8Array passed to parseHtml or parseMarkdown. tree.buffer returns the raw ArrayBuffer of nodes.

node.text does create a new string by decoding source bytes. Avoid it in hot loops — use node.range and decode only when needed.

Because subtreeSize is stored in every node:

  • Skip a subtree: index += node.subtreeSize
  • Find all siblings: iterate at the same depth level, each advancing by subtreeSize
  • Count descendants: node.subtreeSize - 1

This is why traversal with LexCursor is fast. The cursor tracks the current buffer index and uses subtreeSize to navigate without parent pointer lookup.

A conventional parse tree with one object per node creates heavy garbage collection pressure for large documents. A document with 10,000 nodes means 10,000 objects allocated. Lanexio Parser’s flat layout means one ArrayBuffer allocation per parse. The GC never sees individual nodes.