Flat AST
The flat AST is the core data structure of Lanexio Parser. Every parse result is a contiguous ArrayBuffer of 16-byte nodes, laid out in preorder depth-first-search order. There are no per-node heap objects, no pointer chasing, and no garbage collection pressure.
Node layout
Section titled “Node layout”Each node occupies exactly 16 bytes:
| Bytes | Field | Type | Description |
|---|---|---|---|
| 0–1 | kind | uint16 | Node kind ID. Corresponds to HtmlKind or MdKind constants. |
| 2 | flags | uint8 | Node flags. Bit 0 = hasError. |
| 3 | fieldId | uint8 | Field slot ID within the parent. 0 = no field. |
| 4–7 | start | uint32 | Byte offset of the first source byte of this node. |
| 8–11 | end | uint32 | Byte offset one past the last source byte of this node. |
| 12–15 | subtreeSize | uint32 | Number of nodes in this subtree, including self. |
The layout constants are exported from @lanexio/parser-core (NODE_RECORD_SIZE, TREE_HEADER_SIZE, LEX_TREE_MAGIC, LEX_TREE_VERSION). The PROTOCOL_VERSION constant governs this contract. Do not decode the buffer manually — use the LexNode and LexCursor APIs, which always read against the correct layout version.
Preorder DFS
Section titled “Preorder DFS”Nodes are stored in preorder depth-first-search order: a parent appears before all its descendants, and siblings appear in source order. This means:
- The root node is always at index 0.
- A node’s subtree is a contiguous slice of the buffer.
- You can skip an entire subtree in O(1) by advancing the index by
node.subtreeSize. - Sibling enumeration is O(number of siblings), not O(tree depth).
Tree header
Section titled “Tree header”The first 16 bytes of the buffer are the tree header:
| Bytes | Field | Description |
|---|---|---|
| 0–3 | magic | uint32. Magic value 0x4c584943 identifying a Lanexio Parser tree buffer. |
| 4–5 | treeVersion | uint16. Tree buffer format version. Currently 1. |
| 6–7 | nodeRecordSize | uint16. Size in bytes of each node record (always 16). |
| 8–11 | nodeCount | uint32. Total number of nodes in the tree. |
| 12–15 | rootIndex | uint32. Index of the root node (always 0). |
Traversal: LexCursor
Section titled “Traversal: LexCursor”For hot traversal, use LexTree.cursor() which returns a LexCursor positioned at the root.
import { parseHtml } from '@lanexio/parser-grammar-html';
const encoder = new TextEncoder();const tree = parseHtml(encoder.encode('<div><p>Hello</p><p>World</p></div>'));
const cursor = tree.cursor();visit: while (true) { const node = cursor.current; const [start, end] = node.range; // LexRange is a tuple: readonly [start, end] console.log(node.kind, start, end); if (cursor.gotoFirstChild()) continue; while (!cursor.gotoNextSibling()) { if (!cursor.gotoParent()) break visit; }}LexCursor methods
Section titled “LexCursor methods”| Method | Returns | Description |
|---|---|---|
cursor.current | LexNode | The node at the current position. |
cursor.gotoFirstChild() | boolean | Move to the first child. Returns false if none. |
cursor.gotoNextSibling() | boolean | Move to the next sibling. Returns false if none. |
cursor.gotoPreviousSibling() | boolean | Move to the previous sibling. Returns false if none. |
cursor.gotoParent() | boolean | Move to the parent. Returns false at root. |
cursor.gotoFirstChildForRange([s, e]) | boolean | Move to the first child containing the byte range. |
cursor.gotoDescendantForRange([s, e]) | boolean | Move to the deepest descendant containing the byte range. |
LexNode methods
Section titled “LexNode methods”| Property/Method | Type | Description |
|---|---|---|
node.kind | number | Numeric kind ID. |
node.flags | number | Bit flags. Bit 0 = hasError. |
node.fieldId | number | Field slot ID. |
node.range | LexRange — readonly [start, end] | Half-open byte range in source. Destructure: const [start, end] = node.range. |
node.subtreeSize | number | Total nodes in this subtree. |
node.hasError | boolean | Whether this or any descendant has an error. |
node.text | string | Source text slice as a string. |
node.children() | IterableIterator<LexNode> | Iterate direct children. O(k) total via subtreeSize skips. |
node.childCount() | number | Count direct children. O(k). |
node.child(index) | LexNode | null | Get child by position. O(k) per call — prefer children() iteration over indexed loops. |
node.childByField(name) | LexNode | null | First direct child with the named field role (e.g. 'attrs', 'key', 'value'). |
node.fieldName() | string | null | This node’s field-role name via the tree’s field metadata. |
node.firstChildForRange([s, e]) | LexNode | null | First direct child whose range contains the given range. |
node.descendantForRange([s, e]) | LexNode | null | Deepest descendant whose range contains the given range. |
node.nextSibling() / previousSibling() | LexNode | null | Sibling navigation. |
node.parent() | LexNode | null | Parent lookup; builds a per-tree parent side table lazily on first use. |
Zero-copy
Section titled “Zero-copy”The source bytes and the AST buffer are never copied during traversal. tree.source returns the original Uint8Array passed to parseHtml or parseMarkdown. tree.buffer returns the raw ArrayBuffer of nodes.
node.text does create a new string by decoding source bytes. Avoid it in hot loops — use node.range and decode only when needed.
O(1) subtree operations
Section titled “O(1) subtree operations”Because subtreeSize is stored in every node:
- Skip a subtree:
index += node.subtreeSize - Find all siblings: iterate at the same depth level, each advancing by
subtreeSize - Count descendants:
node.subtreeSize - 1
This is why traversal with LexCursor is fast. The cursor tracks the current buffer index and uses subtreeSize to navigate without parent pointer lookup.
Why flat, not a tree of objects?
Section titled “Why flat, not a tree of objects?”A conventional parse tree with one object per node creates heavy garbage collection pressure for large documents. A document with 10,000 nodes means 10,000 objects allocated. Lanexio Parser’s flat layout means one ArrayBuffer allocation per parse. The GC never sees individual nodes.