Markdown is great for writing content but we sometimes need certain precise HTML. Processing the HTML generated by a Markdown library can give the required flexibility.

Markdown is easy to use, making it an attractive choice for writing web content. This is doubly true for getting programmers to write documentation, because we don't have to think too deeply about the HTML. But the marketing department often wants the website to look "just so", meaning we must sometimes use certain precise HTML. That desire leads us away from Markdown's main selling point, which is the simplicity of creating headers of several sizes, applying bold or italic styles, creating lists, etc. How do you wrap a <span> with particular classes or IDs around something, add IDs to the headers, or perform many other kinds of front-end engineering wizardry?
Markdown's primary selling point is as a lightweight markup language for creating formatted text using a regular text editor. That's attractive to programmers who have an aversion to WYSIWYG editors like LibreOffice.
But what Markdown lacks is precise control over HTML. You use # header text to write an <h1> tag, for example. That's simple and easy, but what if you need the HTML to be <h1 class="chapter-title" id="chapter-1"> instead?
Most Markdown processors do allow you to write any HTML tag, which will pass through to the rendered output. But that's cumbersome because we have to write the HTML ourselves. We've flocked to Markdown for its ease of use, which conflicts with the desire for precisely controlling the generated HTML.
Extensions to the Markdown language are supported by some Markdown processors, such as the more powerful table formatting of GitHub Flavored Markdown. These extensions give more flexibility, but they are not the same as having precise control over the generated HTML.
In this article, we contrast three approaches to extending Markdown. Markdoc adds a tags feature similar to template systems like Mustache. Markdown-IT is extended through plugins. Mahabhuta adds custom HTML (DOM) processing and custom HTML tags that can be used with any HTML document, not just Markdown.
More precisely:
- Extending the Markdown syntax to almost be a template processing engine -- Markdoc
- Extending a Markdown processor with plugins -- Markdown-IT
- Processing HTML with server-side DOM processing (using jQuery/Cheerio) to implement custom HTML tags and custom HTML processing -- AkashaCMS with Mahabhuta
Markdoc - Markdown plus tags
I recently came across Markdoc, which is a Markdown-based document format that includes a custom syntax for tags and annotations. While this syntax is roughly similar to template engines like Handlebars, it does not have "full-blown templating support". The Markdoc processor is written for Node.js, and it parses documents to an AST (abstract syntax tree), which can be rendered to several output formats including HTML.
{% tag %}
Content
{% /tag %}
The tag is a core Markdoc concept. Each tag has its own function, and like HTML it starts with an opening tag and closes with an end tag.
{% image width=40 /%}
Tag invocations can also be given attributes.
Here I am rendering a custom {% $variable %}
The Markdoc processor can be given values to be treated as variables, and then substituted into the output as shown here.
# Examples {% #examples %}
This construct adds an ID to some Markdown markup. In this example, the result is <h1 id="examples">.
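The parse, transform, and render steps mentioned above can be sketched in a few lines of Node.js. This is a minimal sketch, assuming the @markdoc/markdoc package is installed. The helper name renderMarkdoc is hypothetical, and the Markdoc module is passed in as a parameter by the caller instead of being required at the top of the file.

```javascript
// Hypothetical helper, not part of Markdoc itself.
// `Markdoc` is the module object the caller gets from require('@markdoc/markdoc').
function renderMarkdoc(Markdoc, source, variables = {}) {
  // Step 1: text -> AST
  const ast = Markdoc.parse(source);
  // Step 2: AST -> renderable tree; variables are substituted here
  const content = Markdoc.transform(ast, { variables });
  // Step 3: renderable tree -> HTML string
  return Markdoc.renderers.html(content);
}

// Usage (assumes @markdoc/markdoc is installed):
// const Markdoc = require('@markdoc/markdoc');
// const source = '# Examples {% #examples %}\n\nHello {% $name %}';
// console.log(renderMarkdoc(Markdoc, source, { name: 'Ada' }));
```

Keeping the three steps separate is deliberate: the AST and the transformed tree are both plain data, so a content pipeline can inspect or modify them before the final HTML is produced.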
The AkashaCMS project recently added Markdoc to its list of built-in rendering engines. This gave us some experience in its use, and with its API.
The tags and attributes supplied with Markdoc are interesting and in many cases useful. One area where we had trouble was integrating Markdoc's partial support in a way that is useful in AkashaCMS.
A Markdoc partial looks like so:
{% partial file="header.md" /%}
You'd think, cool, it's going to read the named file, since the file= attribute sure reads like it's going to read a file and insert it into the output. No, that's not what it does. Instead, it looks in the Markdoc configuration for an object, partials, for an entry whose key matches the value of the file= attribute. This configuration looks something like this:
const config = {
    partials: {
        'header.md': Markdoc.parse(`# My header`)
    }
};
In other words, file= is a key in the partials object in the configuration. Despite the attribute name being file, it is not reading a file.
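One way to bridge that gap is to read the files ahead of time and build the partials object from them. The following is a sketch under stated assumptions, not how AkashaCMS or Markdoc handle it: loadPartials is a hypothetical helper, the directory layout is invented for illustration, and the parse argument is expected to be Markdoc.parse supplied by the caller.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper: build a Markdoc-style `partials` map from the
// Markdown files found directly inside `dir`.
// `parse` is expected to be Markdoc.parse, passed in by the caller.
function loadPartials(dir, parse) {
  const partials = {};
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    // Only plain .md files become partials; everything else is skipped.
    if (!entry.isFile() || path.extname(entry.name) !== '.md') continue;
    const source = fs.readFileSync(path.join(dir, entry.name), 'utf8');
    // The key is the string that file="..." will be matched against.
    partials[entry.name] = parse(source);
  }
  return partials;
}

// Usage (assumes @markdoc/markdoc is installed and ./partials exists):
// const Markdoc = require('@markdoc/markdoc');
// const config = { partials: loadPartials('./partials', Markdoc.parse) };
```

With a map built this way, file="header.md" resolves to the parsed contents of header.md, which is closer to what the attribute name suggests. The trade-off is that every partial is read and parsed up front, whether or not a document uses it.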
In AkashaCMS the <partial> tag does read files from the filesystem. To integrate Markdoc with AkashaCMS, we tried to override the {% partial %} tag, as the documentation suggests. But the documentation was not written clearly enough to explain what to do.
Since Markdoc is currently at version 0.2 we can expect there to be rough edges.
- While Markdoc looks promising, it feels incomplete.
- The configuration object allows for several degrees of extending its functionality.
In other words, it is well worth exploring. Their implementation of the {% partial %} tag is extremely puzzling. Maybe one day we'll make enough time to work out how to properly integrate it with the AkashaCMS partial support.
Extended Markdown with plugins
As noted earlier, many Markdown processors support plugins for implementing extensions. AkashaCMS has long used Markdown-IT for processing Markdown. It supports plugins, and the community around Markdown-IT has developed quite a few of them.
Inline code highlighting - Since Markdown is best used by software engineers, we often need to write code snippets. This article has a couple, for example. It's desirable for code snippets to use color coding, as is common in IDEs.
- markdown-it-highlightjs uses the HighlightJS package.
- markdown-it-prism uses the Prism package.
Another programmer-friendly need is to produce UML diagrams from simple text descriptions. The markdown-it-textual-uml plugin supports four different engines for this purpose.
One useful GitHub Flavored Markdown feature is task lists, which are Markdown lists annotated with [ ] or [x]. The markdown-it-task-lists plugin takes care of this.
MultiMarkdown is another extended flavor of Markdown, with a powerful table format. The markdown-it-multimd-table plugin supports this table format.
As we noted earlier, we sometimes need to use specific HTML attributes. The markdown-it-attrs plugin lets us do so with a simple syntax.
These are a few of the more interesting plugins found while browsing the available plugins in the npm registry.
To use a Markdown-IT plugin, the Markdown-IT package is initialized like so:
const md = require('markdown-it')()
    .use(require('markdown-it-anchor'), {
        // optional options object
    });
That is, you pass the plugin package to the .use method, and optionally provide an options object.
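It helps to see what sits on the other side of .use. A Markdown-IT plugin is a function that receives the md instance plus whatever extra arguments were given to .use. The following is a minimal sketch of such a plugin. The name headingClassPlugin and the className option are hypothetical, invented for illustration, and the sketch overrides the heading_open renderer rule to add a class to every heading.

```javascript
// Hypothetical plugin: adds a CSS class to every heading.
// Markdown-IT calls it as headingClassPlugin(md, options) from md.use().
function headingClassPlugin(md, options = {}) {
  const className = options.className || 'md-heading';
  // Keep any heading_open rule that another plugin installed earlier.
  const previous = md.renderer.rules.heading_open;

  md.renderer.rules.heading_open = (tokens, idx, opts, env, self) => {
    // attrJoin appends to an existing class attribute, or creates one.
    tokens[idx].attrJoin('class', className);
    return previous
      ? previous(tokens, idx, opts, env, self)
      : self.renderToken(tokens, idx, opts);
  };
}

// Usage (assumes markdown-it is installed):
// const md = require('markdown-it')()
//     .use(headingClassPlugin, { className: 'chapter-title' });
// console.log(md.render('# Chapter 1'));
```

This covers the class half of the <h1 class="chapter-title" id="chapter-1"> example from earlier. Note that it applies the same class to every heading, which is the limitation of a renderer-level rule compared with per-element syntax such as markdown-it-attrs.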
Server-side HTML processing using a jQuery-like API
A completely different approach is to do custom server-side DOM processing using a jQuery-like API. The Cheerio package is an implementation of most of the jQuery API that runs on Node.js and does not require a web browser. In other words, a rendering pipeline could convert Markdown to HTML, which is then processed using Cheerio to massage the generated HTML.
The Mahabhuta package was developed for AkashaRender, and serves that exact purpose. The content management system supplies one or more libraries of HTML processing operations, using Mahabhuta to manage their execution. Instead of extending Markdown, Mahabhuta rewrites the HTML and can therefore be used with other content rendering packages.
For example, you might want images to be surrounded by <figure> and to use a <figcaption> tag. Do you want to write all those tags by hand? One of the AkashaRender functions rewrites <img figure src="./img/content-rendering-process.png" caption="Content rendering process diagram"> into:
<figure>
<img src="./img/content-rendering-process.png"/>
<figcaption>Content rendering process diagram</figcaption>
</figure>
The figure attribute triggers creation of the <figure> wrapper. That function also converts the caption attribute into a <figcaption> tag.
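A rewrite like this takes only a few jQuery-style calls. The following is a minimal sketch written directly against Cheerio, assuming the cheerio package is installed. It is not the actual Mahabhuta or AkashaRender implementation, and the function name rewriteFigures is hypothetical.

```javascript
// Hypothetical DOM-rewriting function in the style described above.
// `$` is a Cheerio root, for example the result of cheerio.load(html).
function rewriteFigures($) {
  $('img[figure]').each((index, element) => {
    const img = $(element);
    const caption = img.attr('caption');

    // Both attributes are processing instructions, not real <img> attributes,
    // so they are removed from the output.
    img.removeAttr('figure');
    img.removeAttr('caption');

    img.wrap('<figure></figure>');
    if (caption) {
      // .text() escapes the caption so it cannot inject markup.
      img.after($('<figcaption></figcaption>').text(caption));
    }
  });
  return $;
}

// Usage (assumes the cheerio package is installed):
// const cheerio = require('cheerio');
// const $ = cheerio.load(html);
// rewriteFigures($);
// const rewritten = $.html();
```

Images without the figure attribute are left untouched, so the function is safe to run over every page in a site.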
Another Mahabhuta function in AkashaCMS takes empty links, like [](../07/mahabhuta-cli.html), or <a href="../07/mahabhuta-cli.html"></a>, finds the title of the corresponding document, and uses the title as the anchor text of the link. This simplifies writing internal links, while ensuring the anchor text is automatically updated if the title of the referenced article changes.
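The same idea can be sketched with Cheerio. This is an illustration only, not the AkashaCMS code: fillEmptyLinks is a hypothetical name, and titleFor is a hypothetical callback standing in for whatever lookup the content management system uses to find a document's title from its href.

```javascript
// Hypothetical sketch: give empty links the title of the document they point to.
// `$` is a Cheerio root; `titleFor(href)` returns a title string, or a falsy
// value when the target document is unknown.
function fillEmptyLinks($, titleFor) {
  $('a[href]').each((index, element) => {
    const link = $(element);
    // Leave links alone if they already have text or child elements (e.g. an image).
    if (link.text().trim() !== '' || link.children().length > 0) return;
    const title = titleFor(link.attr('href'));
    if (title) link.text(title);
  });
  return $;
}

// Usage (assumes the cheerio package is installed; the titles map is made up):
// const cheerio = require('cheerio');
// const titles = { '../07/mahabhuta-cli.html': 'Using the Mahabhuta CLI' };
// const $ = cheerio.load('<a href="../07/mahabhuta-cli.html"></a>');
// fillEmptyLinks($, (href) => titles[href]);
```

Because the title is looked up at render time, renaming the target article only requires re-rendering the site, not editing every page that links to it.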
Another Mahabhuta function in AkashaCMS converts <youtube-video-embed href="YouTube link"> into a responsively resizing YouTube video player. This is not Web Components, where the browser is coerced into recognizing the custom HTML tag and the details are hidden in a shadow DOM. Instead, the Mahabhuta function rewrites the custom tag into standard HTML.
A <partial file-name="partial-file.md"> tag is available which does what's expected. It locates the named file, performs any required rendering (such as Markdown to HTML), then inserts the HTML into the DOM.
In other words, Mahabhuta supports both custom rewriting of standard HTML tags, and expansion of custom HTML-like tags into standard HTML.
While Mahabhuta is primarily used in AkashaCMS, it was designed to be usable from any application.
To learn more about this package see:
http://akashacms.com/mahabhuta/toc.html
Summary
Programmers have inventively come up with several ways to extend Markdown. The core requirement is for both Markdown's ease of use and the ability to generate custom HTML. Extra points go to a solution that lets us change the precise generated HTML as we tweak the presentation. The path of the lazy programmer is to reuse code wherever possible.
Let's consider the three approaches discussed above:
- Markdoc - Does it really make sense to integrate Markdown with what looks like a template processing language? The creators of that package obviously felt it to be a good idea.
- Markdown extensions - Will content written for one Markdown processor be compatible with another? Language extensions are nice, but create the opportunity for incompatibilities. Further, the result only applies to Markdown.
- Custom DOM processing - With this approach, Markdown documents are littered with custom HTML tags. The advantage is custom DOM processing handles any HTML file, not just rendered Markdown.