Robert Važan Aug 10, 2025

Code-behind with Markdown

Some of my projects are half code, half content. And the two are interleaved. I have data widgets scattered all over the content. There are many ways to organize such a mixed codebase, each with its own tradeoffs. I have settled on code-behind with Markdown as the optimal solution.

Strategies for mixing content and code

There are several standard ways to interleave content and code:

Code in content: This is the PHP approach, now used in a lot of web development frameworks. Content is the outer file type, usually HTML, but you can embed code via special syntax. This works well if you are using standard frameworks that have solid IDE support, but it becomes very inconvenient for smaller or custom frameworks. Even when well supported, it works best when you have a little code in a lot of content.
Content in code: You can embed content in most programming languages in multi-line string literals. You can use format strings or string interpolation (like Python's f-strings) to embed variables in the content. Widget blocks can be placed between content blocks. The obvious disadvantage here is the loss of syntax highlighting and other editor support for the content. The content is also forced to conform to the line length constraints of the code. Content in code works best when you have a little content scattered in a lot of code.
Content database: Some applications treat content the same way GUIs treat translatable string literals. Every piece of content is loaded from a content repository, which can be a content directory in the source repository or a separate user-editable content database. This way both content and code get their own file type and editor support. The downside is that there is now a whole separate content repository to manage, with fragile ties to the code. This approach works best when writers and coders are not the same people.
Code-behind: In the code-behind approach, every content file has an associated code file. The two files are usually placed next to each other and named the same except for the extension. The advantage is a clean codebase with good editor support. The downside is that one page/window is now defined in two files linked only by name. Code-behind works best when both content and code are authored by the same person.
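To make the content-in-code option concrete, here is a minimal Java sketch using text blocks (Java 15+) and String.formatted(). The class PricingPage and its content are made up for illustration. It shows both weaknesses mentioned above: to the editor the prose is just a string literal, and it inherits the indentation and line length of the surrounding code.

```java
import java.util.List;

// Content in code: prose lives in Java text blocks and values are spliced in
// with String.formatted(). PricingPage is a hypothetical example page.
class PricingPage {
    static String render(String product, int price, List<String> features) {
        StringBuilder html = new StringBuilder();
        // Content block: to the editor this is just a string literal.
        html.append("""
            <p>%s costs $%d per month. It includes the following features:</p>
            """.formatted(product, price));
        // Widget block generated by code, placed between content blocks.
        html.append("<ul>");
        for (String feature : features) {
            html.append("<li>").append(feature).append("</li>");
        }
        html.append("</ul>\n");
        // Another content block.
        html.append("""
            <p>Prices exclude VAT.</p>
            """);
        return html.toString();
    }

    public static void main(String[] args) {
        System.out.print(render("Widget Pro", 10, List.of("Charts", "Export")));
    }
}
```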

Since this article is about projects with a balanced mix of content and code, only the last two options are relevant for this discussion. Of those, I favor code-behind, because my projects are all relatively small and have a single author.

Markdown templates

When the final output format is HTML, the source format is usually either HTML, XML, or Markdown. I like Markdown, because its design is slanted more towards typical content. It can represent 99% of content more cleanly than HTML/XML. The catch with Markdown is that it has no obvious extension points that could be used to specify where data goes in a Markdown template.

Since Markdown lets you embed HTML, you could use HTML/XML tags to mark locations where data should be embedded. HTML tags are unevenly supported in Markdown processors, though. And they are unnecessarily verbose when all you need is to mention the ID of the data item to include.

I have instead opted to use standard Markdown syntax for image links. Image links in Markdown do not necessarily refer to actual images. Conceptually, they represent any self-contained content that can be rendered into HTML. Yes, I am stretching the definition of Markdown images a bit here, but it's a clean and useful extension. To differentiate template placeholders from normal images, we will use a custom URL scheme, say x:.

Here's an example of what a Markdown template looks like:

Paragraph of text with ![inline widget](x:my-inline-widget).

![Block widget](x:my-block-widget)

Another paragraph of text with [custom URL resolver](x:my-link).

URLs with our custom x: scheme refer to data or widgets from the code-behind class. For example, x:my-data refers to the data item or widget with ID my-data.

The alt text in image links is optional. I actually leave it empty in my code, e.g. ![](x:my-widget). If present, the Markdown processor could use it as a figure caption or fallback text.
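Because every placeholder is just a URL with the x: scheme, a template is easy to check against its code-behind class. The following sketch is a regex approximation rather than a real Markdown parser: it collects the IDs a template references and reports those missing from the set of IDs the code-behind class defines. The class TemplateCheck and its methods are hypothetical, and the sketch assumes IDs consist of letters, digits, hyphens, and underscores.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Regex approximation: does not skip code spans/blocks, escaped brackets, or titles.
class TemplateCheck {
    // Matches both ![alt](x:id) and [text](x:id); group 1 is the ID.
    private static final Pattern X_REF =
            Pattern.compile("!?\\[[^\\]]*\\]\\(x:([A-Za-z0-9_-]+)\\)");

    // All x: IDs referenced by the template, in order of first appearance.
    static Set<String> referencedIds(String markdown) {
        Set<String> ids = new LinkedHashSet<>();
        Matcher m = X_REF.matcher(markdown);
        while (m.find()) {
            ids.add(m.group(1));
        }
        return ids;
    }

    // IDs used in the template that the code-behind mapping does not define.
    static Set<String> missingIds(String markdown, Set<String> definedIds) {
        Set<String> missing = referencedIds(markdown);
        missing.removeAll(definedIds);
        return missing;
    }

    public static void main(String[] args) {
        String template = """
            Paragraph of text with ![inline widget](x:my-inline-widget).

            ![Block widget](x:my-block-widget)

            Another paragraph of text with [custom URL resolver](x:my-link).
            """;
        System.out.println(missingIds(template, Set.of("my-inline-widget", "my-link")));
    }
}
```

A check like this could run in a unit test over all pages, so that a typo in a template fails the build instead of producing a broken page.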

The Markdown processor must be a bit smart about differentiating between inline and block images/widgets, so that block widgets are not unnecessarily wrapped in <p> elements, but that's usually easy to implement.
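One simple rule, assumed here rather than taken from any particular Markdown library, is to treat an x: image as a block widget when it is the only content of its paragraph. In a Markdown syntax tree, that corresponds to a paragraph node whose sole child is the image node. The hypothetical sketch below applies the rule to raw paragraph text.

```java
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Assumed rule: an x: image is a block widget when it is alone in its paragraph.
// WidgetPlacement and renderParagraph are hypothetical names.
class WidgetPlacement {
    // Whole-paragraph match of a single x: image; group 1 is the widget ID.
    private static final Pattern LONE_WIDGET =
            Pattern.compile("!\\[[^\\]]*\\]\\(x:([A-Za-z0-9_-]+)\\)");

    static String renderParagraph(String paragraph, Function<String, String> widgetHtml) {
        String text = paragraph.strip();
        Matcher m = LONE_WIDGET.matcher(text);
        if (m.matches()) {
            // Block widget: emit its HTML directly, without a <p> wrapper.
            return widgetHtml.apply(m.group(1));
        }
        // Ordinary paragraph; inline x: images inside it are left for the
        // inline rendering pass, which this sketch does not show.
        return "<p>" + text + "</p>";
    }

    public static void main(String[] args) {
        Function<String, String> widgets = id -> "<div class=\"widget\" id=\"" + id + "\"></div>";
        System.out.println(renderParagraph("![Block widget](x:my-block-widget)", widgets));
        System.out.println(renderParagraph("Text with ![inline widget](x:my-inline-widget).", widgets));
    }
}
```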

Besides image links, the custom scheme can be used in regular links, which is useful when you want your code to generate just the URL. I find it particularly useful for internal links in projects that frequently rename pages without leaving redirects behind. The link is tied to the code-behind class of the target page, so renaming the class in any reasonable IDE renames the links too. This spares me a lot of trouble with broken links.
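Here is one way such rename-safe links could look, as a sketch with hypothetical names (Page, slug(), urlOf, and the two page classes). The code-behind mapping of the linking page refers to the target by class literal, so an IDE rename of the class updates the reference, and the URL itself is derived from the target page's slug at render time.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical minimal page API and pages, for illustration only.
class LinkResolution {
    interface Page {
        String slug();
    }

    static final class HomePage implements Page {
        public String slug() { return ""; }
    }

    static final class PricingPage implements Page {
        public String slug() { return "pricing"; }
    }

    // Registry of page singletons; in practice filled by page discovery.
    static final List<Page> PAGES = List.of(new HomePage(), new PricingPage());

    // URL of the page whose code-behind class is `type`.
    static String urlOf(Class<? extends Page> type) {
        for (Page page : PAGES) {
            if (page.getClass() == type) {
                return "/" + page.slug();
            }
        }
        throw new IllegalArgumentException("Not a registered page: " + type.getName());
    }

    // Mapping of some linking page: x:my-link resolves through a class literal,
    // so renaming PricingPage in the IDE updates this reference as well.
    static final Map<String, Supplier<String>> LINKS = Map.of(
            "my-link", () -> urlOf(PricingPage.class),
            "home", () -> urlOf(HomePage.class));

    public static void main(String[] args) {
        System.out.println(LINKS.get("my-link").get());
    }
}
```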

Code-behind classes

Every Markdown template has an associated code-behind class that includes:

Base class or interface: It marks the class as code-behind class and defines common API for code-behind classes.
Metadata: The title is usually pulled from the Markdown template, but pages published on the Internet likely need additional metadata, notably a stable slug, timestamp, breadcrumb name, and perhaps tags or a list of child pages. This metadata is returned by methods of the code-behind class.
Widgets: Code-behind class of course primarily defines widgets to be embedded in the page.
Data: Embeddable data is just a trivial case of a widget.
Mapping: The code-behind class must somehow associate keys in Markdown with data and widgets defined in the class. You could use method annotations to directly expose methods to Markdown, but I prefer to return an explicit mapping from a data() method, usually constructed using a utility builder class.
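Putting these pieces together, a code-behind base interface and mapping builder might look like the sketch below. All names (CodeBehind, DataBuilder, widget(), value(), PricingPage) are hypothetical, and representing each widget as a Supplier<String> that returns HTML is an assumption made for brevity.

```java
import java.time.LocalDate;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Sketch of code-behind plumbing; every name here is hypothetical.
class CodeBehindSketch {
    // Base interface: marks code-behind classes and defines their common API.
    interface CodeBehind {
        String slug();            // metadata: stable slug
        LocalDate published();    // metadata: timestamp
        // Mapping from x: IDs in the Markdown template to HTML producers.
        Map<String, Supplier<String>> data();
    }

    // Utility builder for the mapping; rejects duplicate IDs.
    static final class DataBuilder {
        private final Map<String, Supplier<String>> entries = new LinkedHashMap<>();

        DataBuilder widget(String id, Supplier<String> html) {
            if (entries.putIfAbsent(id, html) != null) {
                throw new IllegalStateException("Duplicate ID: " + id);
            }
            return this;
        }

        // Data is a trivial widget: a value rendered as escaped text.
        DataBuilder value(String id, Supplier<?> value) {
            return widget(id, () -> escape(String.valueOf(value.get())));
        }

        Map<String, Supplier<String>> build() {
            return Map.copyOf(entries);
        }

        static String escape(String text) {
            return text.replace("&", "&amp;").replace("<", "&lt;")
                    .replace(">", "&gt;").replace("\"", "&quot;");
        }
    }

    // Example page. No fields: the class stands for a page location, not a view.
    static final class PricingPage implements CodeBehind {
        @Override public String slug() { return "pricing"; }
        @Override public LocalDate published() { return LocalDate.of(2025, 8, 10); }

        private String priceTable() {
            return "<table><tr><td>Basic</td><td>$5</td></tr></table>";
        }

        @Override
        public Map<String, Supplier<String>> data() {
            return new DataBuilder()
                    .widget("price-table", this::priceTable)
                    .value("plan-count", () -> 1)
                    .build();
        }
    }

    public static void main(String[] args) {
        new PricingPage().data().forEach((id, html) -> System.out.println(id + " -> " + html.get()));
    }
}
```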

Note that there are no fields in the code-behind class. That's because the code-behind class represents a page location rather than a page view, so it is naturally a singleton. Page state, if you need it, must be represented separately.

You will also need some page discovery mechanism that enumerates all pages in your project, so that you can set up HTTP handlers for them. There are many ways to do this. I usually prefer a code scan that looks for anything inheriting from the base class. If your pages form a hierarchy and every page can enumerate its children, you can also walk the hierarchy, starting with the homepage.
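The hierarchy walk is the easier of the two options to show without a classpath scanning library. In this sketch, Page with slug() and children() is a hypothetical minimal API, and pages are tracked by identity because code-behind instances are singletons.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Page discovery by walking the page hierarchy from the homepage.
class PageDiscovery {
    // Hypothetical minimal page API.
    interface Page {
        String slug();
        List<Page> children();
    }

    // Trivial Page implementation for demonstration.
    record SimplePage(String slug, List<Page> children) implements Page {}

    // Breadth-first walk. Tracking pages by identity also prevents listing
    // a page twice when it appears under more than one parent.
    static List<Page> discover(Page home) {
        List<Page> found = new ArrayList<>();
        Set<Page> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Page> queue = new ArrayDeque<>();
        seen.add(home);
        queue.addLast(home);
        while (!queue.isEmpty()) {
            Page page = queue.removeFirst();
            found.add(page);
            for (Page child : page.children()) {
                if (seen.add(child)) {
                    queue.addLast(child);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Page about = new SimplePage("about", List.of());
        Page pricing = new SimplePage("pricing", List.of(about));
        Page home = new SimplePage("", List.of(pricing, about));
        // A real application would register an HTTP handler per page here.
        for (Page page : discover(home)) {
            System.out.println("/" + page.slug());
        }
    }
}
```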

Markdown processor

Now that we have the code nicely structured, we need some way to translate it to HTML. Markdown library will do most of the heavy lifting. We just need to intercept processing of links and replace x: links with data pulled from the code-behind class.

All my content-heavy projects are in Java, so I use commonmark-java. It has a very convenient LinkProcessor class that does exactly what we need. I use it to convert all links using the x: scheme to CustomNode instances during parsing. I then pass the node tree to my implementation of AbstractVisitor to generate HTML. The visitor is largely identical to the built-in CoreHtmlNodeRenderer, except that it supports my custom nodes.
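The sketch below is not commonmark-java code. It is a dependency-free stand-in, limited to a single line of inline text, that mirrors the two stages described above: a parse stage that turns x: links into custom nodes, and a render stage that resolves those nodes against the code-behind mapping. The node types (Text, XNode), the regex, and the Supplier<String> mapping are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-in for parse -> custom nodes -> render; no Markdown library involved.
class InlinePipeline {
    sealed interface Node permits Text, XNode {}
    record Text(String literal) implements Node {}
    // image=true: embed the widget's HTML; image=false: the mapping supplies a URL.
    record XNode(boolean image, String text, String id) implements Node {}

    private static final Pattern X_LINK =
            Pattern.compile("(!?)\\[([^\\]]*)\\]\\(x:([A-Za-z0-9_-]+)\\)");

    // Parse stage: split the line into literal text and x: nodes.
    static List<Node> parse(String line) {
        List<Node> nodes = new ArrayList<>();
        Matcher m = X_LINK.matcher(line);
        int last = 0;
        while (m.find()) {
            if (m.start() > last) {
                nodes.add(new Text(line.substring(last, m.start())));
            }
            nodes.add(new XNode(!m.group(1).isEmpty(), m.group(2), m.group(3)));
            last = m.end();
        }
        if (last < line.length()) {
            nodes.add(new Text(line.substring(last)));
        }
        return nodes;
    }

    // Render stage: resolve x: nodes against the code-behind mapping.
    static String render(List<Node> nodes, Map<String, Supplier<String>> data) {
        StringBuilder html = new StringBuilder();
        for (Node node : nodes) {
            if (node instanceof Text t) {
                html.append(escape(t.literal()));
            } else if (node instanceof XNode x) {
                Supplier<String> entry = data.get(x.id());
                if (entry == null) {
                    throw new IllegalArgumentException("No code-behind entry for x:" + x.id());
                }
                if (x.image()) {
                    html.append(entry.get());
                } else {
                    html.append("<a href=\"").append(escape(entry.get())).append("\">")
                            .append(escape(x.text())).append("</a>");
                }
            }
        }
        return html.toString();
    }

    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        Map<String, Supplier<String>> data = Map.of(
                "my-inline-widget", () -> "<span class=\"widget\">42</span>",
                "my-link", () -> "/pricing");
        String line = "Text with ![](x:my-inline-widget) and [prices](x:my-link).";
        System.out.println(render(parse(line), data));
    }
}
```

In the real setup, the Markdown parser replaces the regex and also handles emphasis, code spans, and escaping, all of which this sketch ignores.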

Interestingly, LLMs seem to have some knowledge of CommonMark, so most of the required code is fairly easy to generate.

What about front matter?

Some Markdown parsers support the front matter extension of Markdown, which is a block of YAML/JSON/TOML metadata you can put at the top of your Markdown file, for example:

---
slug: My_page
date: 2025-08-10
---

# Page title

Page content ...

Even though front matter is not flexible enough to mix code and data, it is a plausible alternative for some of the metadata in code-behind classes. However, it is poorly supported in current Markdown editors and parsers. Since we need code-behind classes anyway, we can avoid a lot of compatibility problems by using them to structure our metadata too.