Keep Your Blog Consistent With Jekyll and Jest

Cameron McEfee

In a blog like ours, where each author’s voice is an important part of the content, maintaining a consistent style and tone is a challenge that grows with the size of our company. To reduce the load on the people who typically serve as editors, I’ve written a suite of copy-editor-style tests with Jest for our blog, which is generated by Jekyll.

If you’d like to try the same thing for your blog or statically generated site, this post covers the high-level problems that need to be solved and my take on how to address them. While we’re biased toward Jekyll and Jest, the same concepts can be used with other generators and testing frameworks.

Assumptions

We’ll assume your blog post workflow looks like this:

Create a new branch > write a new post > make a pull request > the tests pass > merge and deploy

We’ll also assume you have Ruby and Node.js installed.

Step 1: Getting set up

I’ve gone ahead and created a boilerplate project for you over at cameronmcefee/jekyll-jest-example. You can work within the 1-getting-started folder, or copy it somewhere if you’d like to build on top of it.

Download the project boilerplate

This is the output from jekyll new <sitename>, with one tweak: all the template files now live in the src folder, and _config.yml has been updated with source: src to ensure it builds correctly. By doing this, we prevent Jekyll from processing the node_modules and __tests__ folders, which would slow things down dramatically. This step isn’t strictly necessary, but if you don’t keep your Jekyll source in a subfolder, be sure to ignore those folders in your config file (and all other non-essential files while you’re at it).

This folder also includes a package.json with Jest and Babel set as dependencies, a shell script in bin/test, which will let us do a little logic before our tests, and a hello world test in the __tests__ folder that we can use to prove everything works.

cd into the source for step 1 and run:

$ bundle install
$ npm install
$ bin/test

If everything went well, you should see something like this:

$ bin/test
Configuration file: _config.yml
            Source: src
       Destination: /Users/cameronmcefee/github/jekyll-jest-example/1-getting-started/_site
 Incremental build: disabled. Enable with --incremental
      Generating...
                    done in 0.553 seconds.
 Auto-regeneration: disabled. Use --watch to enable.
 PASS  __tests__/frontmatter.js
  Post frontmatter
    ✓ successfully runs Jest (5ms)

Test Suites: 1 passed, 1 total
Tests:       1 passed, 1 total
Snapshots:   0 total
Time:        1.073s
Ran all test suites.

Problems to solve

Writing a test suite for a statically generated blog requires a few special considerations, which we’ll cover in this post.

  • We must be able to test the same data Jekyll builds from.

  • We must be able to know which posts fail tests.

  • We must be able to test post data to ensure builds work.

  • We must be able to lint the content of posts to enforce our style guide.

  • We must be able to make some posts ignore tests.

  • We must only test posts that are modified in the current branch.

Step 2: Expose Jekyll data

Our first task is to ensure we can test against the same data Jekyll uses. This is important because Jekyll assembles data from a number of sources. We want to be sure our data is the same data that is in the final output.

For example, post URLs can be defined in the config file, in a post’s permalink value, or by the post filename. While we could comprehensively implement a system that checks all those things, it is much easier to have Jekyll dump the data we care about into a JSON file that we can then import into the Jest environment.

If you are familiar with Jekyll and Liquid templates, you might be tempted to simply dump {{ site | jsonify }}. While this does work, in a blog of any decent size you will end up with a text file many megabytes in size, because every page and post includes its full rendered output. You’d essentially be dumping the entire site into one file.

Instead, we’ll create a new Jekyll template that we’ll use to output only the data we care about, in this case, post frontmatter. While it’s a little tedious — and frankly, ugly — this gives us fine-grained control over the content we can work with.

Create _jest.json in the src directory and add the following content:

---
---
{
  "posts": [
    {% for post in site.posts %}{
      "title": "{{ post.title | smartify }}",
      "url": "{{ post.url }}",
      "path": "{{ post.path }}",
      "content": {{ post.content | strip_html | jsonify }}
    }{% unless forloop.last %},{% endunless %}{% endfor %}
  ]
}

Important things to note:

  • The pair of triple-dash lines (an empty frontmatter block) is required to make Jekyll process the file as a template.
  • The content is run through the jsonify filter to ensure that posts with quotes don’t break the JSON file.
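After a build, _site/_jest.json comes out as plain JSON, roughly like this (the values below are illustrative, based on the default post that jekyll new creates):

```json
{
  "posts": [
    {
      "title": "Welcome to Jekyll!",
      "url": "/jekyll/update/2017/10/04/welcome-to-jekyll.html",
      "path": "_posts/2017-10-04-welcome-to-jekyll.markdown",
      "content": "You’ll find this post in your _posts directory..."
    }
  ]
}
```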

What you put in this file is up to you, depending on what you want to test. Dream big.

You’ll notice that the filename starts with an underscore. Jekyll ignores files that start with underscores by default, so this template won’t render in production. To make our JSON file render in our test environment, we need to tell Jekyll to include it.

Create a new file called _config.test.yml, and add the following:

include:
  - _jest.json

To make Jekyll apply this config when we run the tests, we need to add it to the build command in bin/test. Update line 6 of the script like this:

bundle exec jekyll build --config _config.yml,_config.test.yml

Next, we’ll load this data into our Jest environment. We’ll need to read a file, so add fs-extra to your package.json file:

npm install --save-dev fs-extra

Create a file at lib/helpers.js and add:

import { readJSON } from 'fs-extra';

export const getJekyllData = () => {
  return Promise.all([
    // Fetch site metadata produced by Jekyll so we can test against what Jekyll
    // is using to render.
    readJSON('_site/_jest.json')
  ]).then(([site_meta]) => {
    // Get initial data
    let site = site_meta;

    return site;
  });
};

The use of Promise.all here isn’t technically necessary, but I’ve done it so it’s easy to plug in other data fetchers at this stage.
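For example, here’s a sketch of how a second data source could be plugged in at this stage. The _site/_authors.json file is hypothetical, and readJSON is injected as a parameter (rather than imported from fs-extra) just to keep the sketch self-contained:

```javascript
// Sketch: plugging a second data fetcher into Promise.all.
// The `_site/_authors.json` path is hypothetical, and `readJSON` is
// passed in as a parameter purely for illustration.
const getJekyllDataWithAuthors = readJSON => {
  return Promise.all([
    readJSON('_site/_jest.json'),
    readJSON('_site/_authors.json')
  ]).then(([site_meta, authors]) => {
    // Merge the extra data into the site object the tests receive.
    return Object.assign({}, site_meta, { authors });
  });
};
```

Each fetcher resolves independently, and the destructured array in .then keeps the merge step readable as the list grows.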

Our final step is to load this data into our tests. Open up __tests__/frontmatter.js and use the new helper to load the Jekyll content into the context.

import { getJekyllData } from '../lib/helpers';

describe('Post frontmatter', function() {
  beforeAll(() => {
    return getJekyllData().then(site => {
      this.site = site;
    });
  });

  test('post urls must end with .html', () => {
    const { posts } = this.site;
    posts.forEach(post => {
      expect(post.url).toMatch(/.html$/i);
    });
  });
});

You’ll notice I’ve also updated our test to be a little more relevant. It now loops through all posts, checking the condition we’ve set.

Step 3: Report which posts fail

If you’ve run the code so far, you’ll notice that the tests fail as expected, but Jest doesn’t tell you which post failed. When working on a single post, you might know from the context which file is the culprit. However, when editing multiple posts, it’s nice to know specifically which ones fail the tests.

While Jest lets you make custom matchers, if you want to do something as simple as a custom message for an existing matcher, there isn’t an out-of-the-box solution. Fortunately, when a Jest assertion fails, it throws a formatted error. To make our own custom message, all we need to do is catch that error and add our content to it.

To lib/helpers.js add the following method:

export const printOnFail = (message, fn) => {
  try {
    fn();
  } catch (e) {
    e.message = `${message}\n\n${e.message}`;
    throw e;
  }
};

This will catch any error thrown within it (in this case the Jest assertion error) and prepend the given message to it.
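The behavior is easy to see outside of Jest with a plain Error (the post path and message below are just examples):

```javascript
// printOnFail prepends a message to any error thrown by the wrapped function.
const printOnFail = (message, fn) => {
  try {
    fn();
  } catch (e) {
    e.message = `${message}\n\n${e.message}`;
    throw e;
  }
};

let caught;
try {
  printOnFail('_posts/2017-10-04-example.md', () => {
    throw new Error('expect(received).toMatch(expected)');
  });
} catch (e) {
  caught = e.message;
}
// `caught` now starts with the post path, followed by the original message.
```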

In __tests__/frontmatter.js, be sure to import printOnFail and then update the expect line to use the new method.

test('post urls must end with .html', () => {
  const { posts } = this.site;
  posts.forEach(post => {
    printOnFail(post.path, () => {
      expect(post.url).toMatch(/.html$/i);
    });
  });
});

Note that we’re passing post.path as the message. If you add a post that fails the test (adding permalink: /this-post-will-fail/ to the frontmatter of a post will do it), you’ll see something like this:

$ bin/test
Configuration file: _config.yml
Configuration file: _config.test.yml
            Source: src
       Destination: /Users/cameronmcefee/github/jekyll-jest-example/3-report-which-posts-fail/_site
 Incremental build: disabled. Enable with --incremental
      Generating...
                    done in 1.072 seconds.
 Auto-regeneration: disabled. Use --watch to enable.
 FAIL  __tests__/frontmatter.js
  Post frontmatter
    ✕ post urls must end with .html (7ms)

  ● Post frontmatter › post urls must end with .html

    _posts/2017-10-04-this-post-will-fail.markdown

    expect(received).toMatch(expected)

    Expected value to match:
      /.html$/i
    Received:
      "/this-post-will-fail/"

      at printOnFail (lib/helpers.js:20:11)
      at __tests__/frontmatter.js:13:32
      at Array.forEach (native)
      at Object.<anonymous> (__tests__/frontmatter.js:12:11)
          at Promise (<anonymous>)
          at <anonymous>

Test Suites: 1 failed, 1 total
Tests:       1 failed, 1 total
Snapshots:   0 total
Time:        3.279s
Ran all test suites.

Step 4: Enforce a style guide

Jest can match against regular expressions out of the box, so you can check things like whether em dashes are used instead of double dashes (--) or ellipses instead of three dots (...).
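As a quick illustration, such checks can be expressed as plain predicates and dropped into an expect call. The function names and regexes here are illustrative, not from any official style guide:

```javascript
// Illustrative style predicates: true means the content violates the rule.
const usesDoubleDash = content => /(^|[^-])--([^-]|$)/.test(content);
const usesThreeDots = content => /\.\.\./.test(content);
```

In a test you might then write something like expect(usesDoubleDash(post.content)).toBe(false).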

However, the current tools will just tell you that these things exist somewhere within the entire post. That’s not super helpful for quickly finding errors, especially in cases where a specific syntax may be valid in some cases but not others. We’ll now build a custom Jest matcher that will allow us to print just the offending paragraph and highlight the failing string.

While my earlier examples are good representations of style guide rules, they’re hard to see in a demo. Instead, let’s write a test that prevents us from using one of the most overused phrases in tech blog posts: “We are <adjective> to announce”.

First, we make our matcher. Create a file at lib/matchers.js and add the following:

export const toMatchLint = function(received, argument) {
  // Make a regex that matches our lint regexp, plus the rest of the paragraph
  // in which it exists.
  const lineMatch = new RegExp(`(^|\\n).*?(${argument.source}).*?(\\n|$)`);
  const match = lineMatch.exec(received);
  const pass = !!match;

  const message = pass ? () => {
    // this happens when expect(input).not.toMatchLint fails
    const line = match[0];
    const matchedString = argument.exec(line)[0];
    const highlightedMatch = this.utils
      // Use Jest's built in highlight util to color the string.
      .printReceived(`[${matchedString}]`)
      // Replace the quotes that `printReceived` adds
      .replace(/"/g, '');

    return line.replace(argument, highlightedMatch);
  } : () => {
    // this happens when expect(input).toMatchLint fails.
    return `expected post to match ${argument}`
  };

  return { actual: received, message, pass };
};
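To see what the paragraph-extraction regex actually captures, here’s the same construction run against a small sample string. One caveat worth knowing: new RegExp(argument.source) drops the original pattern’s flags, so the constructed expression is case-sensitive even when the lint regex has the i flag:

```javascript
// Reconstruct the paragraph-matching regex from a lint pattern, as the
// matcher does. Note that `.source` excludes the /gi flags.
const lint = /([a-z]+) to announce/gi;
const lineMatch = new RegExp(`(^|\\n).*?(${lint.source}).*?(\\n|$)`);

const post =
  'First paragraph.\nWe are thrilled to announce a thing.\nLast paragraph.';
const match = lineMatch.exec(post);
// match[0] is the full offending paragraph (with its surrounding newlines);
// match[2] is just the offending phrase.
```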

Now when we use expect(input).not.toMatchLint(regexp), Jest will print the paragraph that failed, with the failing phrase highlighted.

To apply it to our test, let’s create a new style guide test file in __tests__/styleguide.js.

import { printOnFail, getJekyllData } from '../lib/helpers';
import * as matchers from '../lib/matchers';

expect.extend(matchers);

describe('Style guide', function() {
  beforeAll(() => {
    return getJekyllData().then(site => {
      this.site = site;
    });
  });

  test('avoid use of "<word> to announce" in posts', () => {
    const { posts } = this.site;
    posts.forEach(post => {
      printOnFail(post.path, () => {
        expect(post.content).not.toMatchLint(/([a-z]+) to announce/gi);
      });
    });
  });
});

To try our new test out, edit one of the posts to start with “We are excited to announce” and watch with pleasure as Jest rejects your ubiquitous choice of language.

$ bin/test
Configuration file: _config.yml
Configuration file: _config.test.yml
            Source: src
       Destination: /Users/cameronmcefee/github/jekyll-jest-example/4-enforce-a-style-guides/_site
 Incremental build: disabled. Enable with --incremental
      Generating...
                    done in 0.519 seconds.
 Auto-regeneration: disabled. Use --watch to enable.
 FAIL  __tests__/styleguide.js
  ● Style guide › avoid use of "<word> to announce" in posts

    _posts/2017-10-04-this-post-will-fail.markdown

    We are [excited to announce], today you’ll find this post in your _posts directory, Go ahead and edit it and re-build the site to see your changes. You can rebuild the site in many different ways, but the most common way is to run jekyll serve, which launches a web server and auto-regenerates your site when a file is updated.


      at printOnFail (lib/helpers.js:20:11)
      at __tests__/styleguide.js:16:32
      at Array.forEach (native)
      at Object.<anonymous> (__tests__/styleguide.js:15:11)
          at Promise (<anonymous>)
          at <anonymous>

 PASS  __tests__/frontmatter.js

Test Suites: 1 failed, 1 passed, 2 total
Tests:       1 failed, 1 passed, 2 total
Snapshots:   0 total
Time:        1.724s, estimated 2s
Ran all test suites.

Step 5: Selectively test posts

The last step, selectively testing posts, may be the most important. In a given pull request, we probably prefer to run the tests against only the posts that were added or modified. Likewise, we probably won’t bother to update our old blog posts, so we need a way to tell Jest to ignore them even if we make minor edits to them in the future.

We have three tasks to address:

  • Tell the tests to skip posts that contain ignore_tests: true in their frontmatter
  • Add a way to only run tests on modified posts
  • Configure CI to support modified post tests

First, let’s update _jest.json to pass along ignore_tests if it exists in a post.

---
---
{
  "posts": [
    {% for post in site.posts %}{
      {% if post.ignore_tests %}"ignore_tests": "{{ post.ignore_tests }}",{% endif %}
      "title": "{{ post.title | smartify }}",
      "url": "{{ post.url }}",
      "path": "{{ post.path }}",
      "content": {{ post.content | strip_html | jsonify }}
    }{% unless forloop.last %},{% endunless %}{% endfor %}
  ]
}

Next, we’ll update bin/test and use git to fetch the changed files.

#!/bin/bash
# usage: bin/test
#
# Run tests

if [ -z "${CHANGED_FILES+x}" ]; then
  # If we don't have changed files from Travis, get them ourselves

  # Get what has changed since the mergebase for this branch.
  MODIFIED=$(git diff --name-only origin/master...HEAD -- src)

  # Add unstaged changes too.
  UNSTAGED=$(git diff --name-only -- src)

  # Make a unique list of files
  export CHANGED_FILES=$(echo "$MODIFIED $UNSTAGED" | sort -u)
fi

echo "Tests that only consider modified files will run against:"
echo "$CHANGED_FILES"

bundle exec jekyll build --config _config.yml,_config.test.yml --drafts

./node_modules/.bin/jest "$@"

Now that we have a list of changed files saved to the CHANGED_FILES environment variable, we can use that to filter our post list when we test.

We need to compare and filter some arrays, so let’s take a shortcut and use lodash:

$ npm install --save-dev lodash

Update lib/helpers.js with a method that will let us filter out modified posts.

import { readJSON } from 'fs-extra';
import _ from 'lodash';

export const getJekyllData = () => {
  return Promise.all([
    // Fetch site metadata produced by Jekyll so we can test against what Jekyll
    // is using to render.
    readJSON('_site/_jest.json')
  ]).then(([site_meta]) => {
    // Get initial data
    let site = site_meta;

    // Filter the post list to only include files that are different from master
    //
    // Returns an array
    site.posts.modified = filterModifiedFiles;

    return site;
  });
};

// Filter an array of jekyll page objects to include only those that are
// different from master.
//
// Returns an array
const filterModifiedFiles = function() {
  const gitPaths = process.env.CHANGED_FILES
    .split(/\s/)
    .map(path => path.replace('src/', ''));
  const modifiedFilePaths = _.intersection(
    this.map(file => file.path),
    _.compact(gitPaths)
  );
  const modifiedFiles = _.filter(this, file => {
    return modifiedFilePaths.includes(file.path) && !file.ignore_tests;
  });
  return modifiedFiles;
};

export const printOnFail = (message, fn) => {
  try {
    fn();
  } catch (e) {
    e.message = `${message}\n\n${e.message}`;
    throw e;
  }
};

In this update, we’ve imported lodash and added a new method, filterModifiedFiles, which returns the intersection of the posts array and the changed-files list while omitting any posts that set an ignore_tests value. We’ve attached this method to the posts array so we can chain it like posts.modified().forEach.
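To see the behavior in isolation, here’s the same filtering logic as a standalone, lodash-free sketch with made-up paths:

```javascript
// Plain-JS sketch of filterModifiedFiles: keep posts whose path appears in
// the changed-file list and that don't opt out via ignore_tests.
const filterModified = (posts, changedFiles) => {
  const gitPaths = changedFiles
    .split(/\s+/)
    .filter(Boolean)
    .map(path => path.replace('src/', ''));
  return posts.filter(
    post => gitPaths.includes(post.path) && !post.ignore_tests
  );
};

// Hypothetical posts and CHANGED_FILES value for illustration.
const posts = [
  { path: '_posts/2017-10-04-new-post.md' },
  { path: '_posts/2015-01-01-old-post.md', ignore_tests: true },
  { path: '_posts/2016-06-01-untouched-post.md' }
];
const changed =
  'src/_posts/2017-10-04-new-post.md src/_posts/2015-01-01-old-post.md';
```

Only the new post survives: the old one opts out via ignore_tests, and the untouched one isn’t in the changed list.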

All that’s left is to use the new helper in our tests:

test('post urls must end with .html', () => {
  const { posts } = this.site;
  posts.modified().forEach(post => {
    printOnFail(post.path, () => {
      expect(post.url).toMatch(/.html$/i);
    });
  });
});

Go forth and nitpick

I omitted a few topics to keep this post a reasonable length, but if you’d like to enhance the system further, consider stripping out code blocks so they don’t trigger tests, allowing posts to declare an array of specific tests to ignore, or adding a heavy-duty style guide checker like Vale.
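For instance, here’s a minimal sketch of stripping fenced code blocks, assuming you lint the raw Markdown rather than Jekyll’s rendered output:

```javascript
// Remove fenced code blocks so their contents can't trip style rules.
const stripCodeBlocks = markdown => markdown.replace(/```[\s\S]*?```/g, '');
```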

Need more ideas for tests? Here are a few of my favorites that we use on our blog:

  • Ensure posts do not include <adjective> to announce
  • Ensure sentences do not begin with Today, we
  • Ensure posts do not contain more than one emoji or exclamation point
  • Ensure meta content does not exceed recommended character counts
  • Ensure authors and posts have consistent metadata
  • Ensure categories are used consistently
  • Ensure that images are reasonable sizes
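As a starting point, the exclamation-point idea above could be as simple as a counter (the helper name is illustrative):

```javascript
// Count exclamation points so a test can assert there's at most one per post.
const exclamationCount = content => (content.match(/!/g) || []).length;
```

In a test, something like expect(exclamationCount(post.content)).toBeLessThanOrEqual(1) would enforce the rule.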

That’s it, we’re done. Hook your blog up to your CI service and watch as the tests do the tedious work while your editors focus on helping your authors shape their stories.