kristianfreeman.com

The big bearblog syntax highlighting hack

While writing my post about Astro's content system, I realized that bearblog doesn't have syntax highlighting support for a lot of newer languages and frameworks.

For a few hours, I had actually added a notice:

[Screenshot of the notice]

An Astro code block (raw version, seen below) doesn't work as expected:

```astro
---
const { slug } = Astro.params
---
<p>The current slug is {slug}</p>
```

It renders, but it doesn't have any highlighting.

bearblog uses Pygments, which is serviceable, but is missing support for a lot of modern languages.

I wanted to see if I could get custom syntax highlighting working on my site - by side-stepping Pygments entirely and doing it by hand on the client side.

And it worked! In fact, the above Markdown snippet is rendered with my custom solution. Let's look at how it works.

Shiki

Shiki is a modern syntax highlighting JS library that I first noticed in Astro's docs. It looks great. And it has theme support, meaning I can use my beloved Catppuccin Mocha without having to come up with a bunch of custom CSS selectors.

It has the ability to be used via CDN, too. Here's the example they give on the site:

<body>
  <div id="foo"></div>

  <script type="module">
    // be sure to specify the exact version (replace <version> below)
    import { codeToHtml } from 'https://esm.sh/shiki@<version>'
    // or
    // import { codeToHtml } from 'https://esm.run/shiki@<version>'

    const foo = document.getElementById('foo')
    foo.innerHTML = await codeToHtml('console.log("Hi, Shiki on CDN :)")', {
      lang: 'js',
      theme: 'rose-pine'
    })
  </script>
</body>

Basically, you can import it as an ES module, grab a given element, and replace its contents with code that has been passed through Shiki.

This gives us a great starting point. We need to grab any code in our blog post on page load, pass it through Shiki, and then replace the code with the new syntax highlighted version.

Implementation

Two problems quickly emerged as I tried to implement this:

  1. The code blocks generated by bearblog have no language attached to them. If you specify a code block as "markdown", any indication that the code was Markdown is stripped away by the time the client loads the page.
  2. The highlighted content doesn't "look" like the original code. A Python snippet isn't Python anymore; it's HTML styled to look like Python, so it can't be fed back into another highlighter.

That means we have to sidestep the entire code highlighting part of bearblog. Which is annoying, but certainly doable.

Instead, we can just provide a pre element inside of HTML, and give it a data attribute of language. pre elements retain the spacing of the text content inside, so it looks like code, and gets parsed by Shiki like it too!

Here's a very meta example of the first snippet in this post - how it actually appears in the raw post content. It's a pre with class shiki-highlight, and data-language set to markdown:

<pre class="shiki-highlight" data-language="markdown">```astro
---
const { slug } = Astro.params
---
<p>The current slug is {slug}</p></pre>

Now, we[1] can write some custom JavaScript to find all instances of pre.shiki-highlight, and do the following:

  1. Transform the content through Shiki, and put it in a new pre tag with all the right formatting.
  2. Take the original content and wrap it with noscript - that way, if someone has JS disabled, they still see code.

The full snippet, including comments, is included below:

<script type="module">
// Replace <version> with the exact Shiki version you want to pin
import { codeToHtml } from 'https://esm.sh/shiki@<version>';

const codeBlocks = document.querySelectorAll("pre.shiki-highlight");

Array.from(codeBlocks).forEach(async el => {
  // Create a noscript element
  const noscript = document.createElement('noscript');
  
  // Clone the original <pre> into the noscript element as a fallback
  const clone = el.cloneNode(true);
  noscript.appendChild(clone);
  
  // Insert noscript after the current element
  el.parentNode.insertBefore(noscript, el.nextSibling);
  
  // Generate rendered HTML (which might already include a <pre> from Shiki)
  const transformedHtml = await codeToHtml(el.innerText, {
    lang: el.dataset.language,
    theme: 'catppuccin-mocha',
  });
  
  // Parse the resulting HTML string into a DOM node (to get the inner <pre>)
  const tempContainer = document.createElement('div');
  tempContainer.innerHTML = transformedHtml;
  
  // Grab the <pre> from the generated HTML (Shiki returns <pre> with code already)
  const generatedPre = tempContainer.querySelector('pre');
  
  // Insert the new <pre> with its styles and classes after the noscript element,
  // then hide the original. If Shiki returned no <pre>, the original stays visible.
  if (generatedPre) {
    generatedPre.style["background-color"] = "var(--code-background-color)";
    el.parentNode.insertBefore(generatedPre, noscript.nextSibling);
    el.style.display = 'none';
  }
});
</script>

Issues

The main issue with this approach is that HTML still has to be escaped. Astro files are a combination of JavaScript and HTML. Any HTML inside of a pre tag gets parsed as HTML, meaning the browser will try to render that markup inside of your blog post instead of showing it as code! Uh-oh. Instead, we have to escape any HTML in the raw Markdown of our post before it even gets to Shiki on the client.
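
Here's a minimal sketch of that escaping step, assuming you run each snippet through a small script before pasting it into the post. escapeHtml is a hypothetical helper of my own naming, not something bearblog or Shiki provides:

```javascript
// Hypothetical helper (not from bearblog or Shiki): escape a code sample
// so it is safe to paste inside <pre class="shiki-highlight"> in the post source.
function escapeHtml(source) {
  return source
    .replace(/&/g, '&amp;') // must run first so the entities below are not double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// The Astro snippet from the top of this post, as plain text.
const astroSnippet = [
  '---',
  'const { slug } = Astro.params',
  '---',
  '<p>The current slug is {slug}</p>',
].join('\n');

// Paste this output between the opening and closing pre tags.
console.log(escapeHtml(astroSnippet));
```

The browser decodes the entities again while parsing the page, so the text the script reads from the pre is the original, unescaped code, and that is what Shiki receives.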

The second issue: this is a lot of work, and pretty brittle. 99% of the code samples on this site are pushed through Pygments with little issue, and the GPT-generated replacements I worked up for Pygments to render Catppuccin Mocha colors are good enough.

For newer file formats, this hack is a solution. But I'd rather see bearblog's server support other highlighting solutions. I opened a feature suggestion to begin the convo about improving the syntax highlighting situation in bearblog. It would be even cooler if we could just get Shiki or another newer engine built into bearblog, with the ability to switch to it with a single click.

I have noticed that Shiki has the ability to run inside of Cloudflare Workers, meaning that there could be a cool solution where the client doesn't even know or care about Shiki - the code to transform the HTML and syntax highlight it could happen on the edge, on the way to the reader. I don't see any particular advantage to implementing it there yet - but if we could turn off syntax highlighting and pass the raw Markdown block through the server to the client, this could be an interesting solution.
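
To make that idea a bit more concrete, here's a rough sketch of the transform step only, assuming the page arrives at the edge as an HTML string with the same pre.shiki-highlight blocks as above. highlightPreBlocks and unescapeHtml are hypothetical names, and the highlighter is passed in as a function so the sketch doesn't assume anything about how Shiki gets loaded in a Worker:

```javascript
// Reverse the escaping that keeps raw HTML inert inside the post source.
function unescapeHtml(text) {
  return text
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&amp;/g, '&'); // last, so "&amp;lt;" becomes "&lt;" and not "<"
}

// Hypothetical: replace every <pre class="shiki-highlight" data-language="...">
// in an HTML string with whatever `highlight(code, lang)` resolves to.
async function highlightPreBlocks(html, highlight) {
  // Assumes the attributes appear in exactly this order, as in the post source.
  const pattern = /<pre class="shiki-highlight" data-language="([^"]+)">([\s\S]*?)<\/pre>/g;

  // Highlight all blocks first (the highlighter is async), in document order.
  const matches = [...html.matchAll(pattern)];
  const rendered = await Promise.all(
    matches.map(([, lang, body]) => highlight(unescapeHtml(body), lang))
  );

  // Swap each original block for its highlighted version, in the same order.
  let index = 0;
  return html.replace(pattern, () => rendered[index++]);
}
```

With Shiki, the highlight argument could be something like (code, lang) => codeToHtml(code, { lang, theme: 'catppuccin-mocha' }). Matching HTML with a regular expression is as brittle as the client-side hack, so treat this as an illustration of the shape of the solution rather than something to deploy.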

  1. OK, it was me and ChatGPT 🤷

#meta #webdev