Migrating from parse5
This page covers the key differences between parse5 and @lanexio/parser-grammar-html and shows how to rewrite common parse5 patterns.
API mapping
| parse5 | Lanexio Parser | Notes |
|---|---|---|
| parse(html: string) | parseHtml(bytes: Uint8Array) | Lanexio Parser accepts bytes, not a string. Use new TextEncoder().encode(str). |
| parseFragment(html: string) | parseHtml(bytes, { mode: HtmlParseMode.Fragment, contextElement }) | Pass the context element name as a string. |
| serialize(node) | serializeHtml(tree) | Accepts a LexTree or LexNode. |
| serializeOuter(node) | serializeHtml(tree, { outer: true }) | outer: true is the default. |
| Node tree of JS objects | Flat ArrayBuffer + LexNode views | No per-node objects. Use tree.cursor() for traversal. |
| node.nodeName | HtmlKind lookup | Kind IDs are numeric. Use node.kind === HtmlKind.Element. |
| node.childNodes | node.children() | Returns an IterableIterator<LexNode>. |
| node.parentNode | node.parent() | Opt-in. Materializes a parent index table on first call. |
| node.attrs | Field iteration + tree.source | Attributes are child nodes with field IDs. |
| defaultTreeAdapter | Not applicable | No adapter concept. The flat buffer is the tree. |
Step-by-step migration
1. Install @lanexio/parser-grammar-html and remove parse5.

   ```sh
   pnpm add @lanexio/parser-grammar-html
   pnpm remove parse5
   ```

2. Replace parse(html) with parseHtml(encoder.encode(html)).

   ```typescript
   // Before (parse5)
   import { parse } from 'parse5';
   const doc = parse('<p>Hello</p>');

   // After (Lanexio Parser)
   import { parseHtml } from '@lanexio/parser-grammar-html';
   const encoder = new TextEncoder();
   const tree = parseHtml(encoder.encode('<p>Hello</p>'));
   ```

3. Replace serialize(node) with serializeHtml(tree).

   ```typescript
   // Before (parse5)
   import { serialize } from 'parse5';
   const html = serialize(doc);

   // After (Lanexio Parser)
   import { serializeHtml } from '@lanexio/parser-grammar-html';
   const html = serializeHtml(tree);
   ```

4. Replace node traversal with cursor traversal.

   ```typescript
   // Before (parse5): recursive object traversal
   function walkParse5(node) {
     console.log(node.nodeName);
     for (const child of node.childNodes ?? []) walkParse5(child);
   }

   // After (Lanexio Parser): cursor traversal (no recursion, zero allocation)
   import { HtmlKind } from '@lanexio/parser-grammar-html';
   const cursor = tree.cursor();
   visit: while (true) {
     console.log(cursor.current.kind); // numeric kind ID, see HtmlKind
     if (cursor.gotoFirstChild()) continue;
     // No children: move to the next sibling, climbing until one exists.
     while (!cursor.gotoNextSibling()) {
       if (!cursor.gotoParent()) break visit; // back at the root: done
     }
   }
   ```

5. Replace error handling.

   ```typescript
   // Before (parse5): parsing always succeeds; errors arrive in the onParseError callback
   // After (Lanexio Parser): parsing always succeeds; errors are LexError nodes in the tree
   for (const node of tree.root.children()) {
     if (node.hasError) console.log('parse error at', node.range);
   }
   ```
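While the migration is in progress, it can help to keep string-based call sites working as the parser underneath changes. The sketch below wraps any byte-based parse function so callers can keep passing strings. `withStringInput` is a hypothetical helper name, not part of either library, and the parse function is passed in rather than imported, so the only assumption about `parseHtml` is that it accepts a `Uint8Array`.

```typescript
// Hypothetical helper (not part of parse5 or Lanexio Parser): adapts a
// byte-based parse function so existing call sites can keep passing strings.
function withStringInput<T>(
  parseBytes: (bytes: Uint8Array) => T,
): (html: string) => T {
  // One TextEncoder is reused across calls; it always encodes to UTF-8.
  const utf8 = new TextEncoder();
  return (html: string): T => parseBytes(utf8.encode(html));
}

// Stand-in for the real parser so the sketch runs on its own: it only
// reports how many bytes it received.
const countBytes = withStringInput((bytes: Uint8Array) => bytes.length);
console.log(countBytes('<p>Hello</p>')); // 12
console.log(countBytes('<p>é</p>'));     // 9 (é takes two bytes in UTF-8)
```

With the real library this would presumably become `const parseHtmlString = withStringInput(parseHtml);`, assuming the options argument is optional as the examples above suggest.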
Key differences
Input type
parse5 accepts strings. Lanexio Parser accepts a Uint8Array. This reflects the zero-copy design: bytes are not decoded to a JS string before parsing.

```typescript
// Lanexio Parser: always encode first
const encoder = new TextEncoder();
const tree = parseHtml(encoder.encode(htmlString));
```

Tree shape
parse5 builds a tree of plain JavaScript objects (Element, TextNode, etc.) with DOM-like properties such as nodeName and childNodes. Lanexio Parser builds a flat ArrayBuffer of 16-byte records. There are no per-node objects. Traversal is through LexCursor, not through object properties.
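To make the difference concrete, here is a toy flat tree: fixed 16-byte records in one ArrayBuffer, read through a DataView. The record layout (kind, start, end, subtree size) and the names `packRecords` and `childIndices` are invented for this illustration. Lanexio Parser's actual binary layout is not described on this page and may differ entirely.

```typescript
// Toy illustration only: this 16-byte record layout is an assumption made for
// this sketch, not Lanexio Parser's actual binary format.
//   bytes 0-3   kind         (u32)
//   bytes 4-7   start offset (u32)
//   bytes 8-11  end offset   (u32)
//   bytes 12-15 subtree size (u32): records in this subtree, including itself
const RECORD_SIZE = 16;

type ToyRecord = { kind: number; start: number; end: number; size: number };

// Pack records (already in pre-order) into one contiguous buffer.
function packRecords(records: ToyRecord[]): ArrayBuffer {
  const buffer = new ArrayBuffer(records.length * RECORD_SIZE);
  const view = new DataView(buffer);
  records.forEach((r, i) => {
    const base = i * RECORD_SIZE;
    view.setUint32(base, r.kind, true);
    view.setUint32(base + 4, r.start, true);
    view.setUint32(base + 8, r.end, true);
    view.setUint32(base + 12, r.size, true);
  });
  return buffer;
}

// Return the record indices of the direct children of the record at `index`.
// Nodes are addressed by index into the buffer rather than by object reference.
function childIndices(buffer: ArrayBuffer, index: number): number[] {
  const view = new DataView(buffer);
  const sizeOf = (i: number) => view.getUint32(i * RECORD_SIZE + 12, true);
  const end = index + sizeOf(index);
  const result: number[] = [];
  // Each child's subtree size tells us how far to skip to reach its sibling.
  for (let child = index + 1; child < end; child += sizeOf(child)) {
    result.push(child);
  }
  return result;
}

// <ul><li>a</li><li>b</li></ul> as five records: ul, li, text, li, text.
const toyTree = packRecords([
  { kind: 1, start: 0, end: 29, size: 5 },  // ul
  { kind: 1, start: 4, end: 14, size: 2 },  // first li
  { kind: 2, start: 8, end: 9, size: 1 },   // "a"
  { kind: 1, start: 14, end: 24, size: 2 }, // second li
  { kind: 2, start: 18, end: 19, size: 1 }, // "b"
]);
console.log(toyTree.byteLength);       // 80
console.log(childIndices(toyTree, 0)); // [ 1, 3 ]
```

The point of the sketch is the access pattern: a node is an offset into a buffer, and its relatives are found by arithmetic on neighbouring records instead of by following object references.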
Error handling
parse5 reports errors through an onParseError callback option. Lanexio Parser embeds errors as LexError nodes in the AST. There is no callback.
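The loop in the error-handling step above only inspects the root's direct children. To get something closer to an onParseError log, the whole tree can be walked with the cursor pattern from the traversal step. The sketch below is written against minimal structural types that cover only the members used on this page (`current`, `hasError`, `range`, and the three `goto` methods). Those types and the name `collectErrorRanges` are assumptions, not the library's published typings.

```typescript
// Structural types covering only the members used on this page. They are
// assumptions for this sketch, not the library's published typings.
interface ErrNodeLike {
  kind: number;
  hasError: boolean;
  range: unknown; // shape not specified on this page, so it is passed through
}
interface ErrCursorLike {
  current: ErrNodeLike;
  gotoFirstChild(): boolean;
  gotoNextSibling(): boolean;
  gotoParent(): boolean;
}

// Walk every node in document order (not only the root's direct children)
// and gather the ranges of nodes flagged with hasError.
function collectErrorRanges(cursor: ErrCursorLike): unknown[] {
  const ranges: unknown[] = [];
  visit: while (true) {
    if (cursor.current.hasError) ranges.push(cursor.current.range);
    if (cursor.gotoFirstChild()) continue;
    // No children: advance to the next sibling, climbing until one exists.
    while (!cursor.gotoNextSibling()) {
      if (!cursor.gotoParent()) break visit; // back at the root: done
    }
  }
  return ranges;
}
```

If `tree.cursor()` returns an object of this shape, usage would be `collectErrorRanges(tree.cursor())`. This assumes hasError marks the erroneous node itself; if the flag also propagates to ancestors, the result will include the enclosing nodes too.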
Fragment parsing
parse5’s parseFragment takes a context element as an object. Lanexio Parser takes the context element tag name as a string:

```typescript
import { parseHtml, HtmlParseMode } from '@lanexio/parser-grammar-html';

const encoder = new TextEncoder();
const tree = parseHtml(encoder.encode('<li>Item</li>'), {
  mode: HtmlParseMode.Fragment,
  contextElement: 'ul',
});
```

Serialization performance
In v1.0, Lanexio Parser’s serializer runs 7-15× slower than parse5’s on the real-world corpus. If serialization throughput is critical for your workload, weigh this cost before migrating. See Benchmarks.
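Since the ratio depends on the corpus, it may be worth timing both serializers on your own documents. The helper below is a minimal sketch, not the project's benchmark harness: `medianMillis` is a hypothetical name, and the workload shown is a stand-in so the block runs without either library installed.

```typescript
// Hypothetical timing helper: returns the median wall-clock time in
// milliseconds of `run` over `rounds` timed calls, after one warm-up call.
function medianMillis(run: () => unknown, rounds = 15): number {
  if (rounds < 1) throw new RangeError('rounds must be at least 1');
  run(); // warm-up so one-time setup cost is not measured
  const samples: number[] = [];
  for (let i = 0; i < rounds; i++) {
    const t0 = performance.now();
    run();
    samples.push(performance.now() - t0);
  }
  samples.sort((a, b) => a - b);
  return samples[Math.floor(samples.length / 2)];
}

// Stand-in workload; replace with the serializer calls you want to compare.
const sampleMarkup = '<li>Item</li>'.repeat(1000);
const elapsed = medianMillis(() => sampleMarkup.split('<').length);
console.log(`median: ${elapsed.toFixed(3)} ms`);
```

In a real comparison the two calls would be something like `medianMillis(() => serialize(doc))` and `medianMillis(() => serializeHtml(tree))`, each run on the same input document.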
Attributes
parse5 puts attributes on node.attrs as an array of { name, value } objects. In Lanexio Parser, attribute values are child nodes of Element nodes, each with a fieldId that you can look up in HTML_FIELD_NAMES_BY_ID. Attribute access patterns differ significantly. Refer to the flat AST documentation for the traversal approach.
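As a rough sketch of what rebuilding a parse5-style attrs array might look like, the function below iterates an element's children, resolves each fieldId through a name table, and decodes the value bytes from the source buffer. Everything about the shapes here is an assumption made for illustration: the `range: { start, end }` byte offsets, the table being indexable by fieldId, and the helper name `attrsLikeParse5`. The flat AST documentation remains the authority on the real API.

```typescript
// Assumed shapes for this sketch only. The page states that attribute value
// nodes carry a fieldId and that text comes from tree.source; the range
// fields and the table type are guesses.
interface AttrNodeLike {
  fieldId: number;
  range: { start: number; end: number }; // assumed byte offsets into source
}
interface ElementLike {
  children(): Iterable<AttrNodeLike>;
}

// Build a parse5-style [{ name, value }] array for one element.
function attrsLikeParse5(
  element: ElementLike,
  source: Uint8Array,
  fieldNamesById: Record<number, string | undefined>,
): { name: string; value: string }[] {
  const decoder = new TextDecoder(); // UTF-8 by default
  const attrs: { name: string; value: string }[] = [];
  for (const child of element.children()) {
    const fieldName = fieldNamesById[child.fieldId];
    if (fieldName === undefined) continue; // child has no named field: skip it
    // subarray creates a view over the same bytes, not a copy.
    const bytes = source.subarray(child.range.start, child.range.end);
    attrs.push({ name: fieldName, value: decoder.decode(bytes) });
  }
  return attrs;
}
```

Under these assumptions a call might look like `attrsLikeParse5(node, tree.source, HTML_FIELD_NAMES_BY_ID)`. Note that this allocates strings and objects per attribute, which gives up part of the benefit of the flat representation, so it is best kept to compatibility code paths.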