Markdown: Speed and Denormalisation

Monday, May 26th, 2014

Two weeks ago, I posted about an easy way to render Markdown content on a Django site using a template filter. The post generated a couple of interesting responses, which centred on the speed advantages of denormalisation. I suggest reading the first post for context around the comments below.

Speed...

You're now requiring the template layer to render your HTML for you, though. The template layer is already fairly slow and now you're adding a (potentially) huge burden to it, too. I suggest adding an _html field to your model and, on save, use a markdown library to turn your blog post into HTML which will be saved in that field. In your template, render that field instead of filtering your actual blog content field.

- Kenneth Love

The argument behind this suggestion is that template rendering is not a fast process, and adding a potentially heavyweight extra step to the rendering could cause unwanted overhead.

From a theoretical perspective, this makes sense. Why add the Markdown processing to the rendering step of our content? This simply means that we're regenerating HTML from our Markdown content for every user that loads the relevant page - an unnecessary repetition of work, and a potential slowdown from the perspective of the end user.

Let's take a look at a basic alternative implementation that follows the suggestion above.
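What follows is a minimal, framework-free sketch of the render-on-save idea rather than the exact code from the original post. The `Post` class, the `content_html` attribute and the `render_markdown` helper are all hypothetical names chosen for illustration, and the tiny renderer only understands `**bold**` and paragraphs. On a real Django site, the same logic would live in the model's `save()` method and call a proper Markdown library.

```python
import html
import re


def render_markdown(text):
    """Stand-in renderer (hypothetical): handles **bold** and paragraphs only.

    A real site would call a proper Markdown library here instead.
    """
    escaped = html.escape(text)
    bolded = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", escaped)
    paragraphs = [p.strip() for p in bolded.split("\n\n") if p.strip()]
    return "\n".join(f"<p>{p}</p>" for p in paragraphs)


class Post:
    """Plain-Python stand-in for a model with a Markdown field and an _html field."""

    def __init__(self, content):
        self.content = content      # Markdown source, edited by the author
        self.content_html = ""      # pre-rendered HTML, filled in on save

    def save(self):
        # Render once, at write time, rather than on every page load.
        self.content_html = render_markdown(self.content)
        # A real model would now persist both fields to the database.


post = Post("Hello **world**")
post.save()
print(post.content_html)  # <p>Hello <strong>world</strong></p>
```

The template would then output `content_html` directly instead of passing `content` through a Markdown filter.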

The advantage of this method is that we only process our Markdown on model save, which is likely to be much less frequent than a page load (unless our blog is getting a sadly tiny amount of traffic). The user only needs to load the pre-generated HTML, removing any potential Markdown-induced waiting time in the template rendering step.

However, speed...

...requires denormalisation

Unless some profiling actually highlights this as a problem, I wouldn't bother myself. Adding an extra field with pre-rendered content is denormalising your DB, and so has all the problems associated with that (e.g. if you update the extra field on 'save', what if someone updates the markdown field using 'update', or from an external process etc.)

- spookylukey

Denormalisation, in this case, means that we are storing redundant data in the database in order to gain a performance increase. This means increased storage, but in this case database size is not the main issue. As pointed out above, denormalisation can lead to inconsistency. Unless we are sure to use the save method every time we update our Markdown field, it is possible that our _html field may become out of sync with our Markdown content.
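The failure mode is easy to reproduce in a small sketch. The code below is plain Python, not Django: `Post`, `render`, `bulk_update` and `is_stale` are hypothetical names, and `bulk_update` merely imitates any write path that sets fields directly without going through `save()`. In Django, `QuerySet.update()` is one such path, since it does not call the model's `save()` method.

```python
def render(text):
    # Stand-in for a real Markdown renderer (hypothetical, for illustration only).
    return f"<p>{text}</p>"


class Post:
    def __init__(self, content):
        self.content = content
        self.content_html = render(content)

    def save(self):
        # The only place the derived field is refreshed.
        self.content_html = render(self.content)


def bulk_update(posts, **fields):
    """Imitates a bulk update or external process: sets fields, never calls save()."""
    for post in posts:
        for name, value in fields.items():
            setattr(post, name, value)


def is_stale(post):
    # The stored HTML is stale if re-rendering the source gives a different result.
    return post.content_html != render(post.content)


post = Post("first draft")
bulk_update([post], content="second draft")
print(is_stale(post))  # True: content changed, content_html did not

post.save()
print(is_stale(post))  # False: save() brought the two fields back in sync
```

A check like `is_stale` could be run periodically to detect drift, but that is yet more code to maintain, which is part of the cost of denormalising.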

So which should I choose?

For the case of a simple blog post, I'm inclined to favour the second argument - against denormalisation - in the interests of practicality. Rendering Markdown in the template layer does not lead to a noticeable slowdown of my blog, and so saving a separate _html field seems unnecessary.

In a more general sense, the trade-off between speed and data redundancy is much more complex and there are no universally 'right' answers - only answers that are right for a particular situation or implementation. In practice, the best way to make such a decision is to take a data-driven approach and benchmark. Unless we know that a certain data manipulation operation is a bottleneck, there is little to gain from storing the manipulated data. Without evidence, we are simply shooting in the dark. If we can show that there is a consistent, significant speed-up or resource saving from denormalisation, then we can consider whether the additional complexity is warranted.
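As a rough illustration of what such a benchmark might look like, the sketch below uses the standard library's `timeit` to compare rendering on every request with returning a stored string. The renderer is a hypothetical stand-in and the sample text is made up, so the numbers say nothing about a real site; the real Markdown call and real post content would need to be swapped in.

```python
import html
import re
import timeit


def render_markdown(text):
    # Stand-in renderer (hypothetical); replace with the real Markdown call
    # to get meaningful numbers.
    escaped = html.escape(text)
    return "<p>" + re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", escaped) + "</p>"


source = "Some **bold** text. " * 500     # made-up sample post body
stored_html = render_markdown(source)      # what an _html field would hold


def render_on_every_request():
    return render_markdown(source)


def serve_stored_html():
    return stored_html


runs = 200
per_request = timeit.timeit(render_on_every_request, number=runs) / runs
pre_rendered = timeit.timeit(serve_stored_html, number=runs) / runs

print(f"render per request: {per_request * 1e6:.1f} microseconds")
print(f"serve stored HTML:  {pre_rendered * 1e6:.1f} microseconds")
```

The stored version will almost always win in isolation. The more useful question is whether the rendering cost is significant next to everything else in a request, such as database queries and the rest of the template, so it is worth measuring in that context too.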


Thoughts about denormalisation? Let me know in the comments or on Twitter.
