Migrating from parse5

This page covers the key differences between parse5 and @lanexio/parser-grammar-html and shows how to rewrite common parse5 patterns.

| parse5 | Lanexio Parser | Notes |
| --- | --- | --- |
| parse(html: string) | parseHtml(bytes: Uint8Array) | Lanexio Parser accepts bytes, not a string. Use new TextEncoder().encode(str). |
| parseFragment(html: string) | parseHtml(bytes, { mode: HtmlParseMode.Fragment, contextElement }) | Pass the context element name as a string. |
| serialize(node) | serializeHtml(tree) | Accepts a LexTree or LexNode. |
| serializeOuter(node) | serializeHtml(tree, { outer: true }) | outer: true is the default. |
| Node tree of JS objects | Flat ArrayBuffer + LexNode views | No per-node objects. Use tree.cursor() for traversal. |
| node.nodeName | HtmlKind lookup | Kind IDs are numeric. Use node.kind === HtmlKind.Element. |
| node.childNodes | node.children() | Returns an IterableIterator<LexNode>. |
| node.parentNode | node.parent() | Opt-in. Materializes a parent index table on first call. |
| node.attrs | Field iteration + tree.source | Attributes are child nodes with field IDs. |
| defaultTreeAdapter | Not applicable | No adapter concept. The flat buffer is the tree. |

  1. Install @lanexio/parser-grammar-html and remove parse5.

    pnpm add @lanexio/parser-grammar-html
    pnpm remove parse5
  2. Replace parse(html) with parseHtml(encoder.encode(html)).

    // Before (parse5)
    import { parse } from 'parse5';
    const doc = parse('<p>Hello</p>');
    // After (Lanexio Parser)
    import { parseHtml } from '@lanexio/parser-grammar-html';
    const encoder = new TextEncoder();
    const tree = parseHtml(encoder.encode('<p>Hello</p>'));
  3. Replace serialize(node) with serializeHtml(tree).

    // Before (parse5)
    import { serialize } from 'parse5';
    const html = serialize(doc);
    // After (Lanexio Parser)
    import { serializeHtml } from '@lanexio/parser-grammar-html';
    const html = serializeHtml(tree);
  4. Replace node traversal with cursor traversal.

    // Before (parse5) — recursive object traversal
    function walkParse5(node) {
      console.log(node.nodeName);
      for (const child of node.childNodes ?? []) walkParse5(child);
    }
    // After (Lanexio Parser) — cursor traversal (no recursion, zero allocation)
    import { HtmlKind } from '@lanexio/parser-grammar-html';
    const cursor = tree.cursor();
    visit: while (true) {
      console.log(cursor.current.kind);
      if (cursor.gotoFirstChild()) continue;
      while (!cursor.gotoNextSibling()) {
        if (!cursor.gotoParent()) break visit;
      }
    }
  5. Replace error handling.

    // Before (parse5) — parsing always succeeds, errors in onParseError callback
    // After (Lanexio Parser) — parsing always succeeds, errors are LexError nodes in the tree
    for (const node of tree.root.children()) {
      if (node.hasError) console.log('parse error at', node.range);
    }
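
The cursor loop in step 4 visits nodes in document (pre-order) order. To check that control flow without the library installed, here is a minimal sketch that runs the same loop against a stand-in cursor over plain objects. `makeCursor` and the `{ kind, children }` shape are hypothetical and only mimic the three method names used above; they are not the Lanexio implementation, and the string kinds are for readability (real kind IDs are numeric).

```javascript
// Hypothetical stand-in for tree.cursor(): walks plain { kind, children } objects.
function makeCursor(root) {
  // Each stack entry records a node and its index among its parent's children.
  const stack = [{ node: root, index: 0 }];
  const top = () => stack[stack.length - 1];
  return {
    get current() { return top().node; },
    gotoFirstChild() {
      const kids = top().node.children;
      if (kids.length === 0) return false;
      stack.push({ node: kids[0], index: 0 });
      return true;
    },
    gotoNextSibling() {
      if (stack.length < 2) return false; // the root has no siblings
      const { index } = top();
      const siblings = stack[stack.length - 2].node.children;
      if (index + 1 >= siblings.length) return false;
      stack[stack.length - 1] = { node: siblings[index + 1], index: index + 1 };
      return true;
    },
    gotoParent() {
      if (stack.length < 2) return false; // already at the root
      stack.pop();
      return true;
    },
  };
}

// Same loop shape as step 4, collecting kinds instead of logging them.
function preorderKinds(root) {
  const cursor = makeCursor(root);
  const kinds = [];
  visit: while (true) {
    kinds.push(cursor.current.kind);
    if (cursor.gotoFirstChild()) continue;
    while (!cursor.gotoNextSibling()) {
      if (!cursor.gotoParent()) break visit;
    }
  }
  return kinds;
}

const sample = {
  kind: 'html',
  children: [
    { kind: 'head', children: [] },
    { kind: 'body', children: [{ kind: 'p', children: [] }] },
  ],
};
console.log(preorderKinds(sample)); // [ 'html', 'head', 'body', 'p' ]
```

The inner `while` climbs back up until some ancestor has a next sibling, and the labeled `break` ends the walk once the climb reaches the root.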

parse5 accepts strings. Lanexio Parser accepts Uint8Array. This reflects the zero-copy design: bytes are not decoded to a JS string before parsing.

// Lanexio Parser: always encode first
const encoder = new TextEncoder();
const tree = parseHtml(encoder.encode(htmlString));
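
One practical consequence of byte input is that UTF-8 byte counts differ from JavaScript string lengths as soon as the markup contains non-ASCII characters. The sketch below uses only the standard TextEncoder and TextDecoder. `textAt` is a hypothetical helper, and it assumes (this page does not state it) that positions reported by the parser are byte offsets into the encoded input rather than string indices.

```javascript
const encoder = new TextEncoder();
const decoder = new TextDecoder();

const htmlString = '<p>café</p>';
const bytes = encoder.encode(htmlString);

// "é" is one UTF-16 code unit but two UTF-8 bytes, so the lengths differ.
console.log(htmlString.length); // 11
console.log(bytes.length);      // 12

// Hypothetical helper: decode a byte range back into a string.
// subarray() is a view over the same memory, so only the decoded slice is copied.
function textAt(source, start, end) {
  return decoder.decode(source.subarray(start, end));
}

console.log(textAt(bytes, 3, 8)); // "café"
```

If that assumption holds, slicing the original string with the same offsets would return the wrong text for any input containing multi-byte characters.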

parse5 builds a tree of plain JavaScript objects (Element, TextNode, etc.) whose shape loosely follows the DOM (nodeName, childNodes, parentNode), although they do not implement the DOM interfaces. Lanexio Parser builds a flat ArrayBuffer of 16-byte records. There are no per-node objects. Traversal is through LexCursor, not through object properties.
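
To make the "flat buffer of fixed-size records" idea concrete, here is a self-contained sketch of the general technique. The field layout (kind, start byte, end byte, subtree size), the kind IDs, and the helper names are all assumptions made for illustration; the actual Lanexio record layout is not described on this page and may differ.

```javascript
// Hypothetical 16-byte record made of four unsigned 32-bit fields:
// [0] kind  [1] start byte  [2] end byte  [3] subtree size (this node + descendants)
const WORDS_PER_RECORD = 4; // 4 fields * 4 bytes = 16 bytes

// Pack pre-ordered [kind, start, end, size] tuples into one ArrayBuffer.
function createTree(records) {
  const buffer = new ArrayBuffer(records.length * WORDS_PER_RECORD * 4);
  const words = new Uint32Array(buffer);
  records.forEach((record, i) => words.set(record, i * WORDS_PER_RECORD));
  return words;
}

// A "node" is just a record index; fields are read from the buffer on demand.
const kindOf = (words, i) => words[i * WORDS_PER_RECORD];
const sizeOf = (words, i) => words[i * WORDS_PER_RECORD + 3];

// Children of node i follow it directly in pre-order; skip over each child's subtree.
function* childrenOf(words, i) {
  const end = i + sizeOf(words, i);
  for (let child = i + 1; child < end; child += sizeOf(words, child)) {
    yield child;
  }
}

// <ul><li>a</li><li>b</li></ul> with made-up kind IDs: 1 = element, 2 = text.
const words = createTree([
  [1, 0, 29, 5],  // ul
  [1, 4, 14, 2],  // first li
  [2, 8, 9, 1],   // "a"
  [1, 14, 24, 2], // second li
  [2, 18, 19, 1], // "b"
]);

console.log([...childrenOf(words, 0)]); // [ 1, 3 ]
console.log(words.byteLength);          // 80
```

Five nodes occupy 80 bytes in a single allocation, and walking the children of a node is integer arithmetic on indices rather than pointer chasing through objects.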

parse5 reports errors through an onParseError callback option. Lanexio Parser embeds errors as LexError nodes in the AST. There is no callback.
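
Because errors live in the tree, collecting them means walking it. The loop in the migration steps checks only the root's direct children; the sketch below searches every depth instead. It assumes nodes expose `children()`, `hasError`, and `range` as shown earlier, and that an error deep in the tree is not necessarily reflected on its ancestors (the page does not say either way). The `mock` helper and the array-shaped ranges are hypothetical stand-ins so the example runs without the library.

```javascript
// Depth-first search for nodes flagged with hasError.
// Works with any object exposing children() and hasError.
function* errorNodes(node) {
  if (node.hasError) yield node;
  for (const child of node.children()) {
    yield* errorNodes(child);
  }
}

// Hypothetical mock nodes standing in for LexNode views.
const mock = (range, hasError, kids = []) => ({
  range,
  hasError,
  children: () => kids[Symbol.iterator](),
});

const root = mock([0, 20], false, [
  mock([0, 8], false, [mock([3, 5], true)]), // error nested one level down
  mock([8, 20], true),                        // error directly under the root
]);

// Reports the nested error first, then the top-level one.
for (const node of errorNodes(root)) {
  console.log('parse error at', node.range);
}
```

For very deep documents, the same search could be written with the cursor loop to avoid recursion.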

parse5’s parseFragment takes a context element as an object. Lanexio Parser takes the context element tag name as a string:

import { parseHtml, HtmlParseMode } from '@lanexio/parser-grammar-html';
const encoder = new TextEncoder();
const tree = parseHtml(
  encoder.encode('<li>Item</li>'),
  { mode: HtmlParseMode.Fragment, contextElement: 'ul' }
);

In v1.0, Lanexio Parser’s serializer runs 7-15× slower than parse5’s on the real-world corpus. If serialization throughput is critical for your workload, keep this in mind. See Benchmarks.

parse5 puts attributes on node.attrs as an array of { name, value } objects. In Lanexio Parser, attribute values are child nodes of Element nodes, each with a fieldId that you can look up in HTML_FIELD_NAMES_BY_ID. Attribute access patterns differ significantly. Refer to the flat AST documentation for the traversal approach.
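
As a rough illustration of that difference, the sketch below rebuilds a parse5-style name/value view from field-tagged child nodes. Everything in it is a stand-in: `FIELD_NAMES_BY_ID` only mimics the role of HTML_FIELD_NAMES_BY_ID, the mock element, the `[start, end]` range format, and the `readAttributes` helper are hypothetical, and it assumes attribute text is recovered by slicing the source bytes. Check the flat AST documentation for the real shapes.

```javascript
// Stand-in lookup table; index = fieldId, empty string = no field name.
const FIELD_NAMES_BY_ID = ['', 'href', 'class'];
const decoder = new TextDecoder();

// Build a { name: value } object from child nodes that carry a known fieldId.
function readAttributes(element, source) {
  const attrs = {};
  for (const child of element.children()) {
    const name = FIELD_NAMES_BY_ID[child.fieldId];
    if (!name) continue; // not an attribute value in this mock
    const [start, end] = child.range;
    attrs[name] = decoder.decode(source.subarray(start, end));
  }
  return attrs;
}

const source = new TextEncoder().encode('<a href="/docs" class="nav">Docs</a>');

// Mock element whose children point at byte ranges inside `source`.
const element = {
  children: () => [
    { fieldId: 1, range: [9, 14] },  // /docs
    { fieldId: 2, range: [23, 26] }, // nav
    { fieldId: 0, range: [28, 32] }, // text child, no field name
  ][Symbol.iterator](),
};

console.log(readAttributes(element, source)); // { href: '/docs', class: 'nav' }
```

With parse5 the equivalent data would already be available as `node.attrs`, so code that reads attributes frequently may want a small helper like this at the migration boundary.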