Sunday, June 4, 2023

The 100% Markdown Expedition


In June 2021, we decided to start converting the source code for MDN Web Docs from HTML into a format that would be easier for us to work with. The goal was to get 100% of our manually written documentation converted to Markdown, and we really had a mountain of source code to climb for this particular expedition.

In this post, we’ll describe why we decided to migrate to Markdown, and the steps you can take that will help us on our mission.

We want to get all active content on MDN Web Docs to Markdown for several reasons. The top three reasons are:

Here is the tracking issue for this project on the translated content repository.

This section describes the tools you’ll need to participate in this project.

If you do not have Git installed, you can follow the steps described on this getting started page: https://git-scm.com/book/en/v2/Getting-Started-Installing-Git

If you are on Linux or macOS, you may already have Git. To check, open your terminal and run: git --version

On Windows, there are a couple of options:

We’re tracking source code and managing contributions on GitHub, so the following will be needed:

• A GitHub account.
• The GitHub CLI, to follow the commands below. This is encouraged but optional: if you are already comfortable using Git, you can accomplish all the same tasks without it.

First, install nvm (https://github.com/nvm-sh/nvm#installing-and-updating) or, on Windows, nvm-windows (https://github.com/coreybutler/nvm-windows).

Once all of the above is installed, install Node.js version 16 with nvm:

This should output a Node.js version number similar to v16.15.1.
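With nvm, this step is typically nvm install 16, after which node --version prints the active version. If you would rather script the check, here is a minimal sketch; require_node_major is a hypothetical helper, not part of nvm or Node.js, that compares the major version against the one you expect.

```shell
# Hypothetical helper: check that a Node.js version string (by default the
# output of `node --version`) has the expected major version.
require_node_major() {
  local expected="$1"
  local version="${2:-$(node --version)}"
  local major="${version#v}"   # strip the leading "v": 16.15.1
  major="${major%%.*}"         # keep everything before the first dot: 16
  if [ "$major" = "$expected" ]; then
    echo "OK: Node.js $version"
  else
    echo "Expected Node.js $expected.x but found $version" >&2
    return 1
  fi
}
```

For example, run nvm install 16 and then require_node_major 16; a non-zero exit status means the wrong Node.js version is active in the current shell.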

You’ll need code and content from several repositories for this project, as listed below.

You only need to fork the translated-content repository. We will make direct clones of the other two repositories.

Clone the above repositories and your fork of translated-content as follows using the GitHub CLI:
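A minimal sketch of the clone step, assuming the three repositories are mdn/content, mdn/markdown and mdn/translated-content on GitHub. clone_mdn_repos is a hypothetical helper name; you can just as well type the three gh commands by hand from the folder where you want the checkouts to sit side by side. When gh repo fork --clone clones your fork, it should also add the original repository as a remote named upstream, which is what the git pull upstream main command later in this post expects.

```shell
# Hypothetical helper: make plain clones of the two repositories you only
# read from, then fork translated-content and clone your fork.
clone_mdn_repos() {
  gh repo clone mdn/content &&
    gh repo clone mdn/markdown &&
    gh repo fork mdn/translated-content --clone
}
```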

You’ll also need to add some configuration via a .env file. In the root of the markdown repository folder, create a new file called .env with the following contents:
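As a hedged sketch only: assuming the three repositories sit side by side in the same parent folder, and that the conversion tool reads the same CONTENT_ROOT and CONTENT_TRANSLATED_ROOT variables that Yari uses, the file points at the files folder of each content repository. The markdown repository’s README is the authoritative reference for the exact variable names.

```shell
# Assumed variable names (Yari's convention); paths assume side-by-side clones.
CONTENT_ROOT=../content/files
CONTENT_TRANSLATED_ROOT=../translated-content/files
```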

I will touch on some specific commands here, but for detailed documentation, please check out the markdown repo’s README.

We maintain a list of documents that need to be converted to Markdown in this Google sheet. There is a worksheet for each language. The worksheets are sorted in the order of the number of documents to be converted in each language – from the lowest to the highest. You do not need to understand the language to do the conversion. As long as you are comfortable with Markdown and some HTML, you will be able to contribute.

NOTE: You can find a useful reference to the flavor of Markdown supported on MDN Web Docs. There are some customizations, but in general, it is based on GitHub-flavored Markdown.

On the translated-content repository, go to the Issues tab and click the “New issue” button. As mentioned in the introduction, there is a tracking issue for this work, so it is good practice to reference it in the issue you create.

You will be presented with three options when you click the “New issue” button. For our purposes here, we will choose the “Open a blank issue” option. For the title of the issue, use something like, “chore: convert mozilla/firefox/releases for Spanish to Markdown”. In your description, you can add something like the following:

As part of the larger 100% Markdown project, I am converting the set of documents under mozilla/firefox/releases to Markdown.

NOTE: You will most likely be unable to assign an issue to yourself. The best thing to do here is to mention the localization team for the appropriate locale and ask them to assign the issue to you. For example, on GitHub you would add a comment like this: “Hey @mdn/yari-content-es I would like to work on this issue, please assign it to me. Thank you!”

You can find a list of teams here.

The tracking spreadsheet contains a couple of fields that you should update if you intend to work on specific items. First, add your GitHub username and link it to your GitHub profile. Second, set the status to “In progress”. Finally, in the issue column, paste a link to the issue you created in the previous step.

It is common practice on projects that use Git and GitHub to follow a feature branch workflow, so you need to create a feature branch for your work on the translated-content repository. To do this, we will again use our issue as a reference.

Let’s say your issue was called “chore: convert mozilla/firefox/releases for Spanish to Markdown” with an ID of 8192. You will do the following at the root of the translated-content repository folder:
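On Git 2.23 or later, creating a branch and switching to it is a single git switch -c command, and it is worth refreshing main from upstream first. A minimal sketch that does both; start_feature_branch is a hypothetical helper, and upstream is assumed to be the remote that points at mdn/translated-content.

```shell
# Hypothetical helper: refresh your local main from the upstream remote,
# then create the feature branch and switch to it.
start_feature_branch() {
  local branch="$1"
  git switch main &&
    git pull upstream main &&
    git switch -c "$branch"
}

# Usage, from the root of your translated-content clone:
# start_feature_branch 8192-chore-es-convert-firefox-release-docs-to-markdown
```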

NOTE: The translated-content repository is very active. Before creating your feature branch, be sure to pull the latest changes from the upstream remote using the command git pull upstream main.

NOTE: In older versions of Git, you will need to use git checkout -B 8192-chore-es-convert-firefox-release-docs-to-markdown.

The above command will create the feature branch and switch to it.

Now you are ready to do the conversion. The Markdown conversion tool can be run in a few modes: a dry run, which reports problems without writing any files, as well as replace and keep.

You will almost always start with a dry run.

NOTE: Before running the command below, ensure that you are in the root of the markdown repository.

This is because the conversion tool will sometimes encounter situations where it does not know how to convert parts of the document. The markdown tool will produce a report with details of the errors encountered. For example:

The first line in the report states that the tool had a problem converting four instances of li.toggle. So, there are four list items with the class attribute set to toggle. In the larger report, there is this section:

The problem is therefore in the file /es/docs/Mozilla/Firefox/Releases/9. In this instance, we can ignore this as we will simply leave the HTML as is in the Markdown. This is sometimes needed as the HTML we need cannot be accurately represented in Markdown. The part you cannot see in the output above is this portion of the file:

If you do a search in the main content repo you will find lots of instances of this. In all those cases, you will see that the HTML is kept in place and this section is not converted to Markdown.

The next two problematic items are two dl (description list) elements. These elements will require manual conversion using the guidelines in our documentation. The last item, the ol, is actually related to the li.toggle issue. Those list items are wrapped by an ol, and because the tool is not sure what to do with the list items, it also complains about the ordered list.

Now that we understand what the problems are, we have two options. We can run the exact same command but this time use the replace mode, or we can use the keep mode. I am going to go ahead and run the command with replace. While the previous command did not actually write anything to the translated-content repository, when run with replace it will create a new file called index.md with the converted Markdown and delete the index.html that resides in the same directory.

Following the guidelines from the report, I will have to pay particular attention to the following files post conversion:

After running the command, run git status at the root of the translated-content repository folder. This shows a list of the changes the command made. Depending on the number of files touched, the output can be verbose. The vital thing to keep an eye out for is that there are no changes to folders or files you did not expect.
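When the git status output is long, it can help to filter it. A minimal sketch, assuming you run it from the root of the translated-content repository; unexpected_changes is a hypothetical helper that lists every modified, deleted or untracked path that does not start with the folder you meant to convert, so empty output means nothing unexpected was touched.

```shell
# Hypothetical helper: print changed paths that fall outside the folder you
# converted. --porcelain gives stable "XY path" lines with paths relative to
# the repository root; --untracked-files=all lists each new index.md on its
# own line instead of collapsing untracked folders into one entry.
unexpected_changes() {
  local expected_dir="$1"
  git status --porcelain --untracked-files=all |
    cut -c4- |
    awk -v dir="$expected_dir" 'index($0, dir) != 1'
}

# Usage:
# unexpected_changes files/es/mozilla/firefox/releases/
```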

Now that the conversion has been done, we need to review the syntax and see that the pages render correctly. This is where the content repo is going to come into play. As with the markdown repository, we also need to create a .env file at the root of the content folder.
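A hedged sketch, assuming the content and translated-content clones sit side by side: Yari, which the content repository uses for its development server, looks for translated pages in the folder named by CONTENT_TRANSLATED_ROOT, so the file can be as small as the one below. The content repository’s README is the authoritative reference.

```shell
# Assumes translated-content is cloned next to content.
CONTENT_TRANSLATED_ROOT=../translated-content/files
```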

With this in place we can start the development server and take a look at the pages in the browser. To start the server, run yarn start. You should see output like the following:

Go ahead and open http://localhost:5042, which will serve the homepage. To find the URL for one of the pages that was converted, open the Markdown file and look at the slug in the frontmatter. When you ran git status earlier, it printed the file paths to the terminal window. The file path shows you exactly where to find the file, for example, files/es/mozilla/firefox/releases/1.5/index.md. Go ahead and open the file in your editor of choice.

In the frontmatter, you will find an entry like this: slug: Mozilla/Firefox/Releases/1.5

To load the page in your browser, you will always prepend http://localhost:5042/es/docs/ to the slug. In other words, the final URL you will open in your browser will be http://localhost:5042/es/docs/Mozilla/Firefox/Releases/1.5. You can open the English version of the page in a separate tab to compare, but be aware that the content could be wildly different as you might have converted a page that has not been updated in some time.
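If you have converted many pages, looking up each slug by hand gets tedious. A minimal sketch; preview_url is a hypothetical helper that reads the first slug: line from a file’s frontmatter and prints the local URL, assuming the development server is on port 5042 as above and that the slug is written without quotes.

```shell
# Hypothetical helper: print the local preview URL for a converted page.
# The locale defaults to "es", matching the example in this post.
preview_url() {
  local file="$1" locale="${2:-es}" slug
  slug=$(sed -n 's/^slug: *//p' "$file" | head -n 1)
  if [ -z "$slug" ]; then
    echo "No slug found in $file" >&2
    return 1
  fi
  printf 'http://localhost:5042/%s/docs/%s\n' "$locale" "$slug"
}

# Usage:
# preview_url files/es/mozilla/firefox/releases/1.5/index.md
```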

What you want to look out for is anything on the page that does not render correctly. If you find something that looks wrong, look at the Markdown file and see if you can find any syntax that is incorrect or completely broken. It can be extremely useful to use an editor such as VS Code with a Markdown linter and Prettier installed.

Even if the rendered content looks good, do take a minute and skim over the generated Markdown and see if the linters bring up any possible errors.

NOTE: If you see code like {{FirefoxSidebar}}, that is a macro call. There is not a lot of documentation on macros yet, but they come from KumaScript in Yari.
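Because macro calls are easy to skim past, it can help to list them all before reviewing a batch of pages. A minimal sketch using grep; list_macros is a hypothetical helper that prints file:line:macro for every {{...}} sequence in the .md files under a folder. A macro whose arguments contain a closing brace will be cut short, which is acceptable for a quick review.

```shell
# Hypothetical helper: list every {{...}} macro call in the Markdown files
# under a folder (-r recurse, -n line numbers, -o only the match,
# -E extended regular expressions).
list_macros() {
  grep -rnoE --include='*.md' '\{\{[^}]*\}\}' "$1"
}

# Usage:
# list_macros files/es/mozilla/firefox/releases
```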

One more thing to keep in mind: when you run into an error, before you spend a lot of time trying to understand exactly what the problem is or how to fix it, check how the same page has been handled in other locales.
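One quick check is to see how other locales have handled the same page. A minimal sketch, run from the root of translated-content, where each locale lives under files/&lt;locale&gt;/; compare_locales is a hypothetical helper that reports, per locale, whether the page exists as index.md, as index.html, or not at all. The English original lives in the content repository, so it is not covered here.

```shell
# Hypothetical helper: for a page path relative to each locale folder,
# report whether every locale under files/ has it as index.md,
# as index.html, or not at all.
compare_locales() {
  local page="$1" dir locale
  for dir in files/*/; do
    locale=$(basename "$dir")
    if [ -f "${dir}${page}/index.md" ]; then
      echo "$locale: index.md"
    elif [ -f "${dir}${page}/index.html" ]; then
      echo "$locale: index.html"
    else
      echo "$locale: missing"
    fi
  done
}

# Usage:
# compare_locales mozilla/firefox/releases/2/adding_feed_readers_to_firefox
```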

For example, I ran into an error where a page I loaded simply printed the following in the browser: Error: 500 on /es/docs/Mozilla/Firefox/Releases/2/Adding_feed_readers_to_Firefox/index.json: SyntaxError: Expected "u" or ["bfnrt\\\\/] but "_" found.. I narrowed it down to the following piece of code inside the Markdown:

In French, it seems they removed the page, but when I looked in zh-tw, it looked like they had simply removed this macro call. I opted for the latter and removed the macro call. This solved the problem and the page rendered correctly. Once you have gone through all of the files you converted, it is time to open a pull request.

Start by getting all your changes ready for committing:

If you run git status now you will see something like the following:

Commit your changes:

Finally you need to push the changes to GitHub so we can open the pull request:
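A minimal sketch of the stage, commit and push steps, assuming you are on your feature branch at the root of translated-content and that origin is your fork, as set up when cloning with the GitHub CLI. commit_and_push is a hypothetical helper, and the commit message in the usage line is only an example.

```shell
# Hypothetical helper: stage everything under the converted folder (including
# the deleted index.html files), commit it, and push the current branch to
# your fork, recording origin/<branch> as its tracking branch.
commit_and_push() {
  local folder="$1" message="$2"
  git add -A -- "$folder" &&
    git commit -m "$message" &&
    git push -u origin HEAD
}

# Usage:
# commit_and_push files/es/mozilla/firefox/releases \
#   "chore(es): convert mozilla/firefox/releases to Markdown"
```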

You can now head over to the translated-content repository on GitHub, where you should see a banner asking whether you want to open a pull request. Click the “Compare & pull request” button and look over your changes on the next page to make sure there are no surprises.

At this point, you can also add more information and context about the pull request in the description box. It is also critical that you add a line such as “Fix #8192”, substituting the number of the issue you created earlier. This links the issue to the pull request, and once the pull request is merged, GitHub will automatically close the issue.

Once you are satisfied with the changes as well as your description, go ahead and click the button to open the pull request. At this stage GitHub will auto-assign someone from the appropriate localization team to review your pull request. You can now sit back and wait for feedback. Once you receive feedback, address any changes requested by the reviewer and update your pull request.

While your pull request is under review, open the spreadsheet, add a link to the pull request in the relevant rows, and update the status to “In review”. Once you and the reviewer are both satisfied with the end result, the pull request will be merged and you will have helped us get a little bit closer to 100% Markdown. Thank you!

Once the pull request has been merged, remember to come back and update the status to done.

If you run into any problems and have questions, please join our MDN Web Docs channel on Matrix.

https://matrix.to/#/#mdn:mozilla.org

 

Photo by Cristian Grecu on Unsplash

I am a Mozillian, an evangelist, writer and developer with a passion for open source, web standards and accessibility. I have been so involved with these worlds that I feel they have become a part of me and cannot foresee a future where these topics will not be a part of my daily life.

More articles by Schalk Neethling…


Except where otherwise noted, content on this site is licensed under the Creative Commons Attribution Share-Alike License v3.0 or any later version.
