Parsing Markdown
Package: @lanexio/parser-grammar-markdown (Stable)
Layer: 2 (Grammar). Depends only on @lanexio/parser-core.
Runtime: Universal (browser, server, edge worker).
Serializer: serializeMarkdown — full roundtrip (MD-S4).
Overview
parseMarkdown produces a flat AST from CommonMark 0.31.2 and GitHub Flavored Markdown (GFM) documents. The full spec-example corpora for both dialects are ingested as tests with empty skip and expected-fail lists (1,898 passing tests in the full package suite). GFM extensions are enabled by default. serializeMarkdown roundtrips the parsed tree back to source text: parse → serialize → re-parse produces a structurally equivalent tree.
Import
import {
  parseMarkdown,
  serializeMarkdown,
  MdKind,
  MdField,
  type ParseMarkdownOptions,
} from '@lanexio/parser-grammar-markdown';

Serialize back to Markdown
serializeMarkdown roundtrips a parsed Markdown tree back to source text.
Parse → serialize → re-parse produces an AST-identical tree for all
CommonMark and GFM features.
import { parseMarkdown, serializeMarkdown } from '@lanexio/parser-grammar-markdown';

const encoder = new TextEncoder();
const tree = parseMarkdown(encoder.encode('# Hello\n\nA paragraph with **bold** text.'));
const markdown = serializeMarkdown(tree);
// → "# Hello\n\nA paragraph with **bold** text.\n"

// Golden roundtrip: parse → serialize → re-parse produces identical AST
const tree2 = parseMarkdown(encoder.encode(markdown));
// tree2 is structurally identical to tree

How it works
| Node | Serialization |
|---|---|
| Document | Children separated by blank lines |
| AtxHeading | # × level + space + content |
| SetextHeading | Content + \n=== (h1) or \n--- (h2) |
| Paragraph | Inline content + newline |
| Text | Escaped literal text |
| Emphasis / Strong | *content* / **content** |
| CodeSpan | Backtick-delimited with escaping |
| Link / Image | [text](url) / ![alt](url) (inline-only) |
| Autolink | <url> |
| BlockQuote | > prefix per line |
| List (unordered) | - bullet |
| List (ordered) | 1. , 2. , … |
| Table (GFM) | Pipe table with alignment |
| ThematicBreak | --- |
| FencedCodeBlock | ``` + info + content + ``` |
| IndentedCodeBlock | 4-space indented lines |
| HtmlBlock / RawHtml | Verbatim |
| CharacterReference | &#NNN; verbatim |
Characters that would trigger inline formatting (*, _, `, [, ], ~, \)
are backslash-escaped to preserve roundtrip fidelity.
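The escaping rule above can be pictured with a small sketch. This is a simplified illustration, not the package's implementation: `escapeInlineText` and `INLINE_SPECIALS` are hypothetical names, and the sketch escapes every listed character unconditionally, whereas a real serializer may decide by context.

```typescript
// Hypothetical helper, not part of @lanexio/parser-grammar-markdown.
// Characters listed in the text above: * _ ` [ ] ~ \
const INLINE_SPECIALS = /[*_`\[\]~\\]/g;

function escapeInlineText(text: string): string {
  // Prefix every special character with a backslash.
  return text.replace(INLINE_SPECIALS, (ch) => `\\${ch}`);
}

console.log(escapeInlineText('2 * 3 = 6 and snake_case'));
// prints: 2 \* 3 = 6 and snake\_case
```

Escaping the backslash itself matters: without it, a literal backslash in the text could combine with the next character into an escape sequence on re-parse.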
serializeMarkdown always returns a string and never throws — it is fully
iterative, and 50,000-level-deep block-quote nesting is covered by a
regression test in the never-throw suite.
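To show why an iterative design copes with very deep nesting, here is a minimal sketch that serializes nested block quotes with an explicit work stack instead of recursion. The `QuoteNode` shape, `serializeQuotes`, and `buildNested` are assumptions made for illustration; the package's real node type and serializer are not shown in this document, and the sketch ignores blank-line separation between sibling blocks.

```typescript
// Hypothetical node shape for illustration only; the real LexNode API differs.
type QuoteNode =
  | { kind: 'blockquote'; children: QuoteNode[] }
  | { kind: 'paragraph'; text: string };

function serializeQuotes(root: QuoteNode): string {
  const lines: string[] = [];
  // Explicit work stack instead of recursive calls, so nesting depth
  // consumes heap memory rather than call-stack frames.
  const stack: Array<{ node: QuoteNode; depth: number }> = [{ node: root, depth: 0 }];
  while (stack.length > 0) {
    const { node, depth } = stack.pop()!;
    if (node.kind === 'paragraph') {
      lines.push('> '.repeat(depth) + node.text);
    } else {
      // Push children in reverse so they are emitted in document order.
      for (const child of node.children.slice().reverse()) {
        stack.push({ node: child, depth: depth + 1 });
      }
    }
  }
  return lines.join('\n') + '\n';
}

// Build a deeply nested block quote with a loop (no recursion here either).
function buildNested(levels: number): QuoteNode {
  let node: QuoteNode = { kind: 'paragraph', text: 'deep' };
  for (let i = 0; i < levels; i++) {
    node = { kind: 'blockquote', children: [node] };
  }
  return node;
}

const output = serializeQuotes(buildNested(50_000));
console.log(output.length); // 100005
```

A recursive version of the same function would need one call frame per nesting level, which is the usual cause of stack-overflow errors on inputs like this.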
Parse a document
import { parseMarkdown } from '@lanexio/parser-grammar-markdown';

const encoder = new TextEncoder();
const tree = parseMarkdown(encoder.encode('# Hello\n\nA paragraph with **bold** text.'));

console.log(tree.nodeCount);
console.log(tree.root.kind); // Document root kind id

parseMarkdown accepts a Uint8Array and never throws. Always use TextEncoder when converting a string.
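Because the parser takes bytes rather than a string, it can help to see what TextEncoder actually produces. The sketch below uses only the standard TextEncoder and TextDecoder APIs. Whether node ranges are expressed as byte offsets into this array is an assumption; the document does not state the unit.

```typescript
const utf8 = new TextEncoder();

// TextEncoder always produces UTF-8.
const ascii = utf8.encode('# Hello');
const accented = utf8.encode('# H\u00e9llo');

console.log(ascii.length);    // 7
console.log(accented.length); // 8, because 'é' takes two bytes in UTF-8

// Decoding a byte slice recovers the text for a given byte range.
const decoder = new TextDecoder();
console.log(decoder.decode(accented.subarray(2))); // Héllo
```

If ranges are byte-based, slicing the original Uint8Array and decoding it, as in the last line, is safer than indexing into the source string.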
ParseMarkdownOptions
| Field | Type | Default | Description |
|---|---|---|---|
| gfm | boolean | true | Enable GitHub Flavored Markdown extensions (tables, task lists, strikethrough, autolinks). |
| extendedAutolink | boolean | true (when gfm: true) | Enable GFM extended autolink detection. Has no effect when gfm: false. |
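The interaction between the two fields can be sketched as a default-resolution step. `MarkdownOptionsSketch` and `resolveOptions` are hypothetical names that mirror the table; how the package resolves options internally is not documented here.

```typescript
// Local mirror of the documented option fields; illustration only.
interface MarkdownOptionsSketch {
  gfm?: boolean;
  extendedAutolink?: boolean;
}

// Hypothetical helper: resolves the defaults described in the table above.
function resolveOptions(options: MarkdownOptionsSketch = {}): Required<MarkdownOptionsSketch> {
  const gfm = options.gfm ?? true;
  // extendedAutolink defaults to true only under GFM and is ignored without it.
  const extendedAutolink = gfm && (options.extendedAutolink ?? true);
  return { gfm, extendedAutolink };
}

console.log(resolveOptions());
// { gfm: true, extendedAutolink: true }
console.log(resolveOptions({ gfm: false, extendedAutolink: true }));
// { gfm: false, extendedAutolink: false }
console.log(resolveOptions({ extendedAutolink: false }));
// { gfm: true, extendedAutolink: false }
```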
CommonMark-only mode
import { parseMarkdown } from '@lanexio/parser-grammar-markdown';

const encoder = new TextEncoder();
const tree = parseMarkdown(
  encoder.encode('# CommonMark only\n\nNo GFM tables or task lists.'),
  { gfm: false },
);

GFM tables
import { parseMarkdown } from '@lanexio/parser-grammar-markdown';

const encoder = new TextEncoder();
const markdown = `| Name | Score |
| -------- | ----- |
| Alice | 100 |
| Bob | 95 |`;
const tree = parseMarkdown(encoder.encode(markdown)); // gfm: true by default

Detect LexError nodes
import { parseMarkdown } from '@lanexio/parser-grammar-markdown';

const encoder = new TextEncoder();
const tree = parseMarkdown(encoder.encode('This is valid. [incomplete link'));

for (const node of tree.root.children()) {
  if (node.hasError) {
    console.log('parse error at', node.range);
  }
}

parseMarkdown never throws. Malformed input produces LexError nodes. CommonMark is forgiving by design, so many inputs that look like “errors” are actually valid by spec.
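The loop above inspects only the root's direct children. A walk over the whole tree might look like the following sketch. `ErrorAwareNode` and `collectErrorRanges` are hypothetical; the interface lists only the members the snippet above relies on (hasError, range, children()), and whether the real node type is structurally compatible with it is an assumption. Mock nodes stand in for a parsed tree so the example runs without the package.

```typescript
// Minimal structural view of a node, based only on the members used in the
// snippet above. The real LexNode presumably has more.
interface ErrorAwareNode<R> {
  hasError: boolean;
  range: R;
  children(): Iterable<ErrorAwareNode<R>>;
}

// Hypothetical helper: collects the ranges of error nodes at any depth,
// in pre-order, without recursion.
function collectErrorRanges<R>(root: ErrorAwareNode<R>): R[] {
  const ranges: R[] = [];
  const pending: ErrorAwareNode<R>[] = [root];
  while (pending.length > 0) {
    const node = pending.pop()!;
    if (node.hasError) ranges.push(node.range);
    // Reverse before pushing so the first child is processed first.
    for (const child of Array.from(node.children()).reverse()) pending.push(child);
  }
  return ranges;
}

// Stand-in nodes for demonstration; the ranges are made-up [start, end] pairs.
type Span = [number, number];
const leaf = (hasError: boolean, range: Span): ErrorAwareNode<Span> => ({
  hasError,
  range,
  children: () => [],
});
const mockRoot: ErrorAwareNode<Span> = {
  hasError: false,
  range: [0, 31],
  children: () => [leaf(false, [0, 14]), leaf(true, [15, 31])],
};

console.log(collectErrorRanges(mockRoot)); // [ [ 15, 31 ] ]
```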
Markdown serializer
serializeMarkdown is available as of v1.0 (MD-S4). See Serialize back to Markdown above for usage.
The CLI serialize subcommand supports Markdown input:
parser serialize --grammar markdown document.md

MdKind constants
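import { parseMarkdown, MdKind } from '@lanexio/parser-grammar-markdown';

// Parse a small document so the walk below has a tree to visit.
const tree = parseMarkdown(new TextEncoder().encode('# Hello\n\nA paragraph.'));

// MdKind is a const object.
// Walk the tree depth-first with a cursor: descend first, then advance to the
// next sibling, climbing back up until one exists or the root is reached.
const cursor = tree.cursor();
visit: while (true) {
  if (cursor.current.kind === MdKind.AtxHeading) {
    console.log('heading at', cursor.current.range);
  }
  if (cursor.gotoFirstChild()) continue;
  while (!cursor.gotoNextSibling()) {
    if (!cursor.gotoParent()) break visit;
  }
}

MdKind is a const object. Compare against its named members; do not use raw numbers.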
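For readers unfamiliar with the const-object pattern, the sketch below shows the general TypeScript idiom. `KindSketch` is a stand-in: its member names echo MdKind, but the numeric values are invented and will not match the real package.

```typescript
// Illustrative stand-in: these names mirror MdKind, but the numeric values
// are made up and will not match the real package.
const KindSketch = {
  Document: 1,
  AtxHeading: 2,
  Paragraph: 3,
} as const;

// Union of the numeric literal values: 1 | 2 | 3.
type KindSketchId = (typeof KindSketch)[keyof typeof KindSketch];

function isHeading(kind: KindSketchId): boolean {
  // Comparing against the named constant stays correct if IDs are renumbered.
  return kind === KindSketch.AtxHeading;
}

console.log(isHeading(KindSketch.AtxHeading)); // true
console.log(isHeading(KindSketch.Paragraph));  // false
```

A hard-coded number in place of `KindSketch.AtxHeading` would compile just as well, but it would silently break if the IDs changed between releases, which is the reason to prefer the named members.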
Full exports
| Export | Type | Description |
|---|---|---|
| parseMarkdown | (source: Uint8Array, options?: ParseMarkdownOptions) => LexTree | Parse Markdown. Never throws. |
| serializeMarkdown | (input: LexTree \| LexNode) => string | Serialize to Markdown source. Never throws; fully iterative. |
| MdKind | const object | Numeric kind IDs for all Markdown node types. |
| MdField | const object | Numeric field IDs for Markdown element slots. |
| MD_FIELD_NAMES_BY_ID | Readonly<Record<number, string>> | Field-name lookup by numeric field ID. |
| MdParseErrorCode | const object | Parse error code constants. |
| markdownGrammar | LanexioParserPureGrammar | Grammar descriptor; pass to createParser from @lanexio/parser. |