I’m a huge fan of automation. If it’s worth doing two or more times, it’s worth placing in a script so that you can make it happen at the push of a button.
I’m also a huge fan of documentation. I’m obsessive about describing our processes and actions, and writing those descriptions down so that we can refer back to them.
But to be honest, documentation is a time-consuming task and almost always assumes a low priority in a work project. When deadlines get tight, documentation is the first task that slips. Yes, people should document as they work, but PhD students should write their dissertation as they perform their research, and very few people do either.
In recent years a number of tools have appeared that make it easier to create attractive documentation and to automate its publication. Three of them are Sphinx, Bootstrap, and GitHub Pages, and we have used them to create our documentation.
Sphinx is a documentation generator that converts marked-up plaintext files into properly formatted HTML, PDF, EPUB, or other documents. It has a very close relationship with the Python community; a number of Python projects created their documentation using Sphinx, including the Python language project. (It goes without saying that it’s almost entirely written in Python.) One awesome feature of Sphinx is that it can take docstrings within the source code and generate nice documentation for every routine in the source code.
Bootstrap is a front-end framework that makes web development easier. It was started at Twitter by a couple of front-end developers and designers there, but has since spun off as the creators left for other companies. I cannot tell you how helpful this framework has been to me as someone who is not a front-end developer.
Both of these tools can be customized, but a lot of people don’t bother. The defaults are attractive in their own right, but the downside is that a lot of websites start looking the same. We wanted to write our documentation in Sphinx and display it using a Bootstrap-derived layout. Ryan Roemer has created a Sphinx theme called Sphinx Bootstrap Theme that has worked very well for us.
The published HTML pages need a server to host them, and GitHub Pages hosts project and author pages. It works really well for us because we already use GitHub to back up and version-control our software. You can customize the workflow so that the pages are automatically updated and published if and when the code documentation changes. This is what we did.
Creating the documentation
We created the documentation in a number of ways. The code for our API server isn’t publicly available, so we wrote the documentation by hand. It’s not as daunting as you might think, and there’s a lot of duplicated markup after a few pages, but there were something like 20 pages and it was a lot of work.
For the API client, we had some pages that were hand-built but all of our code references were constructed using docstrings in the source code. It took some effort to write descriptive docstrings, but if you comment as you code it shouldn’t add more time.
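To illustrate the kind of docstring that Sphinx can pick up, here is a minimal sketch. The function name, its parameters, and its behavior are hypothetical and are not taken from our client; the `:param:` and `:returns:` fields are the reStructuredText field-list markup that Sphinx knows how to render.

```python
import inspect


def match_url(base_url, match_id):
    """Build the URL of a single match resource.

    :param base_url: Root URL of the API, without a trailing slash.
    :type base_url: str
    :param match_id: Identifier of the match to retrieve.
    :type match_id: int
    :returns: The full URL of the match resource.
    :rtype: str
    """
    # Hypothetical URL layout, used only to give the function a body.
    return "{0}/matches/{1}".format(base_url, match_id)


# The docstring is attached to the function object, which is where
# autodoc finds it when it imports the module.
summary = inspect.getdoc(match_url).splitlines()[0]
print(summary)
```

In a Sphinx source page, an `.. autofunction::` directive that names the function (for example `yourmodule.match_url`, where `yourmodule` is a placeholder) would pull this docstring into the rendered reference.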
Automating the documentation
When it comes to automating our documentation, we have a problem. Git works with branches, and GitHub Pages are hosted from a specialized branch of the code repository (gh-pages). Our source text — and our source code — reside on another branch (master). It’s possible to switch branches, pull in files from another branch, render the new HTML, delete the working files, commit and push the new HTML to GitHub Pages, and switch back to the original branch. Those tasks are error-prone if they’re repeated by hand, so they’re perfect for automation.
I borrowed rather heavily from a site called Nikhilism, so I’m definitely standing on the shoulders of others. I liked the pattern and wanted to remember it, so I’m publishing it here.
When we create our GitHub Pages for the project, we create a new branch and do some one-time setup. You must make sure that any changes to the working copy in your other branches have been committed or reverted!! Here are the commands:
$ cd path/to/repo
$ git checkout --orphan gh-pages
$ git rm -rf .
$ echo "Blah blah blah" > index.html
$ touch .nojekyll
$ git add .
$ git commit -m "Initial commit to gh-pages"
$ git push origin gh-pages
Assuming that we’re starting in the master branch, we create a new branch that has no ties to it (an orphan branch). All of master’s files are still present in the new branch’s working copy, so we get rid of them all. (Don’t worry, the files under version control in the master branch are still alive and well over there, but any files you have ignored in the master branch need to be ignored in the gh-pages branch as well.) We add an index.html file with some dummy content and add a .nojekyll file. The .nojekyll file is important for reasons we’ll discuss later. We tell Git that those changes are ready for commit (that is, we stage them), and then we commit and push the branch to the GitHub repository.
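Because this setup switches branches and deletes files, it can help to check for uncommitted changes before starting. The following is only a sketch of one way to do that: the helper names are my own, and it assumes `git` is on the path. It relies on `git status --porcelain` printing nothing when the working copy has no modified, staged, or untracked files.

```python
import subprocess


def is_clean(porcelain_output):
    """Return True when `git status --porcelain` printed nothing."""
    return porcelain_output.strip() == ""


def working_tree_is_clean(repo_path="."):
    """Run `git status --porcelain` in repo_path and interpret the result.

    Hypothetical helper: raises CalledProcessError if git itself fails,
    for example when repo_path is not inside a Git repository.
    """
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_path,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return is_clean(result.stdout)
```

Calling `working_tree_is_clean("path/to/repo")` before the checkout and stopping when it returns False is one way to enforce the warning above. Untracked files count as unclean here, which is deliberate: `git rm -rf .` leaves them in place and `git add .` would then commit them to gh-pages.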
If we haven’t already, we start the setup on Sphinx with the command:
$ sphinx-quickstart
We answer the questions from the quickstart and set up an initial configuration (make sure to answer yes on autodoc so that we can use docstrings to document our code!). Sphinx uses make to generate the documentation, so we edit the Makefile provided. We want to add a new variable called GH_PAGES_SOURCES which will contain files and directories that contain the documentation sources. Don’t worry about including pathnames; we’ll have to copy all of these files and directories to the top-level directory anyway.
GH_PAGES_SOURCES = docs/source soccermetrics docs/Makefile
Now we add a target gh-pages and associate the following commands with it:
gh-pages:
	git checkout gh-pages
	rm -rf build _sources _static _modules
	git checkout master $(GH_PAGES_SOURCES)
	git reset HEAD
	make -f docs/Makefile html
	mv -fv build/html/* ./
	rm -rf $(GH_PAGES_SOURCES) build
	git add -A
	git commit -m "Generated gh-pages for `git log master -1 --pretty=short \
	--abbrev-commit`" && git push origin gh-pages ; git checkout master
Here’s what the commands do:
- Check out the documentation branch (gh-pages).
- Remove all of the old pages. At minimum there will be build and _sources directories, but there might also be a _static directory if you’ve customized the layout or imported your own images, and a _modules directory if you’ve enabled source-code links (the viewcode extension) alongside autodoc.
- Pull the source files from the master branch into gh-pages.
- Unstage those files by resetting the index to the current gh-pages commit (HEAD). The branch itself doesn’t move, and the files imported from master are still there in the working directory.
- Sphinx compiles the new HTML from source.
- Move the HTML files to the top level.
- Remove the build and source files and directories; they aren’t needed anymore.
- Stage the new HTML files as well as the new _sources, _static, and _modules directories if they exist.
- Commit the changes on gh-pages with an autogenerated commit message built from the most recent commit on master (its abbreviated hash, author, and subject line). If the commit succeeds, push to GitHub. Either way, switch back to master and we’re done.
Earlier I wrote about that .nojekyll file. Why is that file so important? Well, the reason is that GitHub Pages are rendered using Jekyll, which is a really neat document generator in its own right (created by one of the founders of GitHub). Jekyll ignores directories whose names start with underscores when it renders HTML, which is a problem because Sphinx writes its output to directories whose names start with underscores (_static, _sources, _modules). We turn off Jekyll rendering by including the .nojekyll file at the top level of the branch so that our custom layouts, styles, and graphics are served.
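As a rough sketch of that rule, the snippet below checks whether a published directory contains underscore-prefixed directories but no .nojekyll marker. The function names are hypothetical, and the throwaway directory only mimics the layout of Sphinx HTML output; it is not our actual site.

```python
import os
import tempfile


def underscore_dirs(site_root):
    """List top-level directories whose names start with an underscore."""
    return sorted(
        name for name in os.listdir(site_root)
        if name.startswith("_") and os.path.isdir(os.path.join(site_root, name))
    )


def needs_nojekyll(site_root):
    """True if underscore directories exist but .nojekyll is missing."""
    has_marker = os.path.exists(os.path.join(site_root, ".nojekyll"))
    return bool(underscore_dirs(site_root)) and not has_marker


# Demonstration on a temporary directory laid out like Sphinx output.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "_static"))
    os.mkdir(os.path.join(root, "_sources"))
    before = needs_nojekyll(root)
    # Creating the empty marker file has the same effect as `touch .nojekyll`.
    open(os.path.join(root, ".nojekyll"), "w").close()
    after = needs_nojekyll(root)

print(before, after)
```

A check like this could run against the top level of the gh-pages branch before pushing, as a reminder in case the .nojekyll file is ever deleted by accident.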
The resulting documents look great and provide a starting point for further customization. We can make changes to the documentation and deploy them online with just a few keystrokes.
Hope this walkthrough has been useful. It’ll definitely be useful to us as we release future software products.