Parsing Markdown

Package: @lanexio/parser-grammar-markdown (Stable)
Layer: 2 (Grammar). Depends only on @lanexio/parser-core.
Runtime: Universal (browser, server, edge worker).
Serializer: serializeMarkdown, full roundtrip (MD-S4).

parseMarkdown produces a flat AST from CommonMark 0.31.2 and GitHub Flavored Markdown (GFM) documents. The full spec-example corpora for both dialects are ingested as tests with empty skip and expected-fail lists (1,898 passing tests in the full package suite). GFM extensions are enabled by default. serializeMarkdown roundtrips the parsed tree back to source text — parse → serialize → re-parse produces a structurally equivalent tree.

import {
  parseMarkdown,
  serializeMarkdown,
  MdKind,
  MdField,
  type ParseMarkdownOptions,
} from '@lanexio/parser-grammar-markdown';

Serialize back to Markdown

serializeMarkdown roundtrips a parsed Markdown tree back to source text. Parse → serialize → re-parse produces an AST-identical tree for all CommonMark and GFM features.

import { parseMarkdown, serializeMarkdown } from '@lanexio/parser-grammar-markdown';
const encoder = new TextEncoder();
const tree = parseMarkdown(encoder.encode('# Hello\n\nA paragraph with **bold** text.'));
const markdown = serializeMarkdown(tree);
// → "# Hello\n\nA paragraph with **bold** text.\n"
// Golden roundtrip: parse → serialize → re-parse produces identical AST
const tree2 = parseMarkdown(encoder.encode(markdown));
// tree2 is structurally identical to tree

| Node | Serialization |
| --- | --- |
| Document | Children separated by blank lines |
| AtxHeading | `#` × level + space + content |
| SetextHeading | Content + `\n===` (h1) or `\n---` (h2) |
| Paragraph | Inline content + newline |
| Text | Escaped literal text |
| Emphasis / Strong | `*content*` / `**content**` |
| CodeSpan | Backtick-delimited with escaping |
| Link / Image | `[text](url)` / `![alt](url)` (inline-only) |
| Autolink | `<url>` |
| BlockQuote | `> ` prefix per line |
| List (unordered) | `- ` bullet |
| List (ordered) | `1. `, `2. `, … |
| Table (GFM) | Pipe table with alignment |
| ThematicBreak | `---` |
| FencedCodeBlock | `` ``` `` + info + content + `` ``` `` |
| IndentedCodeBlock | 4-space indented lines |
| HtmlBlock / RawHtml | Verbatim |
| CharacterReference | `&#NNN;` verbatim |

Characters that would trigger inline formatting (*, _, `, [, ], ~, \) are backslash-escaped to preserve roundtrip fidelity.
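
The escaping rule can be illustrated with a small standalone sketch. The helper name escapeInline is hypothetical and is not an export of this package; the real serializer may apply context-sensitive rules that this one-line regex does not capture.

```typescript
// Hypothetical helper, NOT part of @lanexio/parser-grammar-markdown.
// Backslash-escapes the characters listed above: * _ ` [ ] ~ \
const INLINE_SPECIALS = /[*_`\[\]~\\]/g;

function escapeInline(text: string): string {
  // In a replacement string, "$&" stands for the matched character,
  // so '\\$&' emits a backslash followed by that character.
  return text.replace(INLINE_SPECIALS, '\\$&');
}

console.log(escapeInline('2 * 3 = 6'));
console.log(escapeInline('snake_case [draft]'));
```

Escaping the backslash itself matters: without it, a literal backslash in the text would be read as the start of an escape sequence on re-parse.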

serializeMarkdown always returns a string and never throws. It is fully iterative, so deeply nested input does not exhaust the call stack; 50,000-level-deep block-quote nesting is covered by a regression test in the never-throw suite.
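
To show why an iterative design tolerates deep nesting, here is a minimal sketch that serializes nested block quotes with an explicit stack instead of recursion. SketchNode, buildNested, and serializeIterative are hypothetical names over a simplified node shape; this is not the package's implementation or its LexTree API.

```typescript
// Hypothetical node shape for illustration only.
interface SketchNode {
  kind: 'BlockQuote' | 'Paragraph';
  text: string;            // used by Paragraph nodes
  children: SketchNode[];  // used by BlockQuote nodes
}

// Build a paragraph wrapped in `depth` block quotes, without recursion.
function buildNested(depth: number): SketchNode {
  let node: SketchNode = { kind: 'Paragraph', text: 'deep', children: [] };
  for (let i = 0; i < depth; i++) {
    node = { kind: 'BlockQuote', text: '', children: [node] };
  }
  return node;
}

// Walk the tree with an explicit stack; each entry remembers its quote depth.
function serializeIterative(root: SketchNode): string {
  const lines: string[] = [];
  const stack: Array<{ node: SketchNode; depth: number }> = [{ node: root, depth: 0 }];
  while (stack.length > 0) {
    const { node, depth } = stack.pop()!;
    if (node.kind === 'Paragraph') {
      lines.push('> '.repeat(depth) + node.text);
      continue;
    }
    // Push children in reverse so they are popped in document order.
    for (let i = node.children.length - 1; i >= 0; i--) {
      stack.push({ node: node.children[i], depth: depth + 1 });
    }
  }
  return lines.join('\n') + '\n';
}

console.log(serializeIterative(buildNested(2))); // "> > deep" plus a trailing newline
const deepOutput = serializeIterative(buildNested(50_000));
console.log(deepOutput.length); // 100005: 50,000 "> " prefixes + "deep" + newline
```

A recursive walk over the same 50,000-level tree would typically hit the engine's call-stack limit, whereas the explicit stack grows on the heap.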

import { parseMarkdown } from '@lanexio/parser-grammar-markdown';
const encoder = new TextEncoder();
const tree = parseMarkdown(encoder.encode('# Hello\n\nA paragraph with **bold** text.'));
console.log(tree.nodeCount);
console.log(tree.root.kind); // Document root kind id

parseMarkdown accepts a Uint8Array and never throws. To parse a string, always convert it to bytes with TextEncoder first.
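
Because the input is bytes rather than a string, byte length and string length can differ for non-ASCII text. The sketch below uses only the standard TextEncoder and TextDecoder APIs. Whether node ranges are byte offsets is an assumption here; this page does not say.

```typescript
const utf8Encoder = new TextEncoder(); // always encodes to UTF-8
const utf8Decoder = new TextDecoder(); // defaults to UTF-8

const source = '# H\u00e9llo'; // "# Héllo"
const bytes: Uint8Array = utf8Encoder.encode(source);

console.log(source.length); // 7 UTF-16 code units
console.log(bytes.length);  // 8 bytes: "é" takes two bytes in UTF-8

// If a range is expressed in bytes (an assumption), slice the byte array
// and decode it, rather than slicing the original string.
const headingText = utf8Decoder.decode(bytes.subarray(2));
console.log(headingText); // "Héllo"
```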

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| gfm | boolean | true | Enable GitHub Flavored Markdown extensions (tables, task lists, strikethrough, autolinks). |
| extendedAutolink | boolean | true (when gfm: true) | Enable GFM extended autolink detection. Has no effect when gfm: false. |


To parse strict CommonMark without GFM extensions, pass gfm: false:

import { parseMarkdown } from '@lanexio/parser-grammar-markdown';

const encoder = new TextEncoder();
const tree = parseMarkdown(
  encoder.encode('# CommonMark only\n\nNo GFM tables or task lists.'),
  { gfm: false },
);

GFM tables parse with the default options:

import { parseMarkdown } from '@lanexio/parser-grammar-markdown';

const encoder = new TextEncoder();
const markdown = `
| Name | Score |
| -------- | ----- |
| Alice | 100 |
| Bob | 95 |
`;
const tree = parseMarkdown(encoder.encode(markdown)); // gfm: true by default
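
Errors are reported on the tree rather than thrown. Check hasError on each node:

import { parseMarkdown } from '@lanexio/parser-grammar-markdown';

const encoder = new TextEncoder();
// CommonMark treats an unmatched "[" as literal text, so this input may report no errors.
const tree = parseMarkdown(encoder.encode('This is valid. [incomplete link'));
for (const node of tree.root.children()) {
  if (node.hasError) {
    console.log('parse error at', node.range);
  }
}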

parseMarkdown never throws. Malformed input produces LexError nodes. CommonMark is forgiving by design, so many inputs that look like “errors” are actually valid by spec.
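
The loop above inspects only the root's direct children. As a sketch of checking every depth, the helper below walks a structurally typed node using only the members that appear in that example (hasError, range, children()). ErrorAwareNode and collectErrorRanges are hypothetical names, the range type is left generic because this page does not define it, and the mock tree stands in for a real parse result.

```typescript
// Minimal structural type covering only the members used in the example above.
interface ErrorAwareNode<R> {
  hasError: boolean;
  range: R;
  children(): Iterable<ErrorAwareNode<R>>;
}

// Hypothetical helper: collect ranges of error-flagged nodes at any depth,
// in document order, using an explicit stack instead of recursion.
function collectErrorRanges<R>(root: ErrorAwareNode<R>): R[] {
  const ranges: R[] = [];
  const stack: ErrorAwareNode<R>[] = [root];
  while (stack.length > 0) {
    const node = stack.pop()!;
    if (node.hasError) ranges.push(node.range);
    const kids = Array.from(node.children());
    // Push in reverse so the first child is visited first.
    for (let i = kids.length - 1; i >= 0; i--) stack.push(kids[i]);
  }
  return ranges;
}

// Mock tree for demonstration; ranges are plain strings here.
const leaf = (range: string, hasError = false): ErrorAwareNode<string> => ({
  hasError,
  range,
  children: () => [],
});
const mockRoot: ErrorAwareNode<string> = {
  hasError: false,
  range: '0..30',
  children: () => [
    leaf('0..15'),
    { hasError: false, range: '16..30', children: () => [leaf('20..30', true)] },
  ],
};

const errorRanges = collectErrorRanges(mockRoot);
console.log(errorRanges); // only the nested node's range
```

If the real hasError flag already propagates from descendants to ancestors, the top-level loop may be sufficient and this walk would report both the ancestor and the descendant.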

serializeMarkdown is available as of v1.0 (MD-S4). See Serialize back to Markdown above for usage.

The CLI serialize subcommand supports Markdown input:

parser serialize --grammar markdown document.md
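
To find nodes of a given kind, walk the tree with a cursor:

import { parseMarkdown, MdKind } from '@lanexio/parser-grammar-markdown';

const tree = parseMarkdown(new TextEncoder().encode('# Title\n\nBody text.'));

// MdKind is a const object; compare against its members, not raw numbers.
const cursor = tree.cursor();
visit: while (true) {
  if (cursor.current.kind === MdKind.AtxHeading) {
    console.log('heading at', cursor.current.range);
  }
  // Depth-first: descend if possible, otherwise move to the next sibling,
  // climbing toward the root until a sibling exists or the walk is done.
  if (cursor.gotoFirstChild()) continue;
  while (!cursor.gotoNextSibling()) {
    if (!cursor.gotoParent()) break visit;
  }
}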

MdKind is a const object of numeric kind IDs. Compare against its members (for example, MdKind.AtxHeading) rather than raw numbers.
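
For readers unfamiliar with the const-object pattern, here is a minimal sketch of how such an object is typically declared and used in TypeScript. SketchKind and its numeric values are hypothetical; the real MdKind IDs are not listed on this page.

```typescript
// Hypothetical stand-in for MdKind; names and values are illustrative only.
const SketchKind = {
  Document: 0,
  AtxHeading: 1,
  Paragraph: 2,
} as const;

// Union of the numeric values: 0 | 1 | 2
type SketchKindId = (typeof SketchKind)[keyof typeof SketchKind];

// Reverse lookup (ID to name), built once from the const object.
const sketchKindNames: Record<number, string> = {};
for (const [name, id] of Object.entries(SketchKind)) {
  sketchKindNames[id] = name;
}

function describeKind(kind: SketchKindId): string {
  return sketchKindNames[kind] ?? `unknown(${kind})`;
}

console.log(describeKind(SketchKind.AtxHeading)); // "AtxHeading"
```

Comparing against named members keeps call sites readable and lets the type checker reject values outside the declared set.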

| Export | Type | Description |
| --- | --- | --- |
| parseMarkdown | `(source: Uint8Array, options?: ParseMarkdownOptions) => LexTree` | Parse Markdown. Never throws. |
| serializeMarkdown | `(input: LexTree \| LexNode) => string` | Serialize to Markdown source. Never throws; fully iterative. |
| MdKind | const object | Numeric kind IDs for all Markdown node types. |
| MdField | const object | Numeric field IDs for Markdown element slots. |
| MD_FIELD_NAMES_BY_ID | `Readonly<Record<number, string>>` | Field-name lookup by numeric field ID. |
| MdParseErrorCode | const object | Parse error code constants. |
| markdownGrammar | `LanexioParserPureGrammar` | Grammar descriptor; pass to createParser from @lanexio/parser. |