PyTorch Lightning Developer Blog

PyTorch Lightning is a lightweight machine learning framework that handles most of the engineering work, leaving you to focus on the science. Check it out: pytorchlightning.ai

Best Practices for Publishing PyTorch Lightning Tutorial Notebooks

A lightweight, fully reproducible CI/CD system for rich notebooks

5 min read · Jul 12, 2021

Illustration with notebooks

IPython notebooks are convenient for presenting code snippets alongside complete Markdown descriptions, including heading hierarchy, lists, equations, images, and more.

Unfortunately, developing notebooks introduces several difficulties:
1. Each cell can be called independently and therefore executed out of order.
2. Notebooks are stored as JSON under the hood, which makes them heavy for version control systems.
3. Reviewing notebooks on GitHub is hard.
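
To make the second point concrete, here is a minimal sketch that compares a single code cell written as a script with the same cell in the notebook JSON layout. The cell content is made up for illustration, and the dictionary only mirrors the general nbformat 4 structure rather than a file produced by Jupyter.

```python
import json

# One code cell, as it would appear in a percent-format Python script.
script = '# %%\nprint("hello")\n'

# The same cell in the notebook JSON layout (nbformat 4), with a stored output.
notebook = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "outputs": [
                {"name": "stdout", "output_type": "stream", "text": ["hello\n"]}
            ],
            "source": ['print("hello")'],
        }
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}
serialized = json.dumps(notebook, indent=1)

# Compare how many lines a version control system has to track.
print(len(script.splitlines()), "lines as a script")
print(len(serialized.splitlines()), "lines as notebook JSON")
```

Every re-execution can also change the stored outputs and execution counts, so the JSON diff grows even when the code itself is untouched.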

You can easily find some interesting notebooks on the internet, but you will often find that they are outdated, that some requirements are missing, or that cells have to be called in a specific order.

We have faced these challenges when designing our tutorials in the past, and this post outlines some of the PyTorch Lightning Team’s Best Practices for Lightning Example Development…

Disclaimer: We are not the first to address the above-mentioned limitations of Jupyter notebooks, and we do not claim to have discovered anything ground-breaking. We just want to share our fully automated solution based on existing tools, one for each step of the CI/CD workflow.

Lightning Best Practices for Tutorial Development

Illustration for thinking about main principles

Our main goal is to combine the purity and lightness of scripts suitable for versioning with a visually rich layout for documentation.

This seems intuitive, as you would expect some transformation from script code to IPython notebooks; for example, Keras built its own engine for this purpose.

10 Best Practice Design Principles

We created a tutorials publication repo as a staging place for further integration with the main PyTorch Lightning documentation.

  1. Separating our notebook examples from the main Lightning repository keeps the Lightning installation minimal and cleanly separates the notebooks' logic from PL's package CI/CD.
  2. Focusing on Python scripts as the lighter notebook format keeps the repository lightweight and prevents the git history from exploding over time.
  3. Notebooks are generated from these example scripts as needed in a separate publication branch; this branch can be recreated from the main one with the latest notebooks.
  4. Using Git LFS, we save space, because regenerating visually identical illustrations usually yields different bytes (large file changes with almost no visual difference). Notebooks are therefore stored aside and pulled on demand for a particular commit.
  5. All updates are executed in a differential (incremental) fashion, so only updated notebooks are regenerated, which avoids long executions of large notebook collections.
  6. All scripts/notebooks are tested to be fully executable with the specified requirements (and pulled data if needed) before merging them into the collection.
  7. Save execution details (package versions, etc.) from each generation/rendering so that it can be reproduced at any time.
  8. Preserve a unified format by adding a standard header with mandatory information (e.g. title, author, requirements) and a warm community welcome footer.
  9. Mount the tutorial repository as a git submodule of the main PL repository for clarity.
  10. Generate a documentation page from a notebook with all used formatting and illustrations (description and produced outputs).
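
Principle 7 can be sketched with the standard library alone. The snippet below is only an illustration of the idea, not the code used in the tutorials repository: the function name `snapshot_environment` and the package list are hypothetical, and in practice the list would come from each tutorial's requirements.

```python
import json
import platform
import sys
from importlib import metadata


def snapshot_environment(packages):
    """Collect interpreter, platform, and installed package versions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": versions,
    }


# Hypothetical package list, chosen only to show both outcomes.
details = snapshot_environment(["pip", "a-package-that-does-not-exist"])
print(json.dumps(details, indent=2))
```

Storing such a record next to each generated notebook is one way to know later which versions produced the published outputs.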

Main Building Blocks & Technologies

Continuous Integration (CI) ensures that the given script can be converted to a notebook that can be rendered within the documentation, and that all code is valid/executable. Continuous Deployment (CD) performs the script-to-notebook conversion, executes the notebook to get all outputs, and renders it as a documentation page.

For each of these particular steps, we used existing tooling to keep CI/CD simple and easily replaceable block-by-block if needed.

The steps in CI and CD are quite similar. They differ mainly in their order, and in that CI runs tests while CD runs a full execution.

So let's show these steps with sample command calls.

Comparison of CI and CD steps.
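
As a rough sketch of how such a comparison might look, the helpers below only build the command lines and do not run anything. The tool choices are assumptions for illustration: `jupytext --to notebook` for conversion, the `nbval` pytest plugin for testing, and `papermill` for execution. The function names are hypothetical, and the documentation rendering step is left out.

```python
from pathlib import Path


def convert_command(script):
    """Command that converts a percent-format script to a notebook."""
    return ["jupytext", "--to", "notebook", script]


def ci_commands(script):
    """CI: convert the script, then test that all cells run (assumed tooling)."""
    notebook = str(Path(script).with_suffix(".ipynb"))
    return [
        convert_command(script),
        ["pytest", notebook, "--nbval-lax"],  # fail only on cell errors
    ]


def cd_commands(script):
    """CD: convert the script, then execute it and keep the outputs."""
    notebook = str(Path(script).with_suffix(".ipynb"))
    return [
        convert_command(script),
        ["papermill", notebook, notebook],  # execute and overwrite in place
    ]


for step in ci_commands("main.py") + cd_commands("main.py"):
    print(" ".join(step))
```

Both pipelines share the conversion step, which is why a block can be swapped for another tool without touching the rest.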

Orchestration

In principle, you can use any CI/CD platform. We started with native GitHub Actions to write the initial orchestration and synchronization between CI and CD.

We run the CI steps on each PR, which verify that changes in edited scripts are valid. We then trigger the publication CD workflow for each merged PR (commit to the main branch), updating related notebooks and committing changes.
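
Updating only the related notebooks requires knowing which tutorials a commit touched. A minimal sketch, assuming each tutorial lives in its own folder and that the list of changed paths comes from something like `git diff --name-only`; the function name and the file names are hypothetical.

```python
from pathlib import PurePosixPath


def changed_tutorials(changed_files, suffixes=(".py", ".yaml", ".yml")):
    """Return the sorted tutorial folders touched by a list of changed paths."""
    folders = set()
    for name in changed_files:
        path = PurePosixPath(name)
        # Skip top-level files and anything that is not a script or metafile.
        if path.suffix in suffixes and len(path.parts) > 1:
            folders.add(str(path.parent))
    return sorted(folders)


# Paths as a diff listing would print them (hypothetical file names).
changed = [
    "templates/simple/main.py",
    "templates/simple/.meta.yml",
    "README.md",
    "course/intro/notebook.py",
]
print(changed_tutorials(changed))
```

Only the returned folders would then go through conversion and execution, so untouched notebooks are never regenerated.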

Snapshot from the repo and interaction between the main branch with scripts and publication branch including generated notebooks; moreover, the GH-pages branch presents the latest collection as documentation.

Later we created the main CI and CD with GPU instances…

How to Contribute a Lightning Example in 2 Steps

We welcome the community to contribute a wide range of PyTorch applications and examples, as well as deep learning courses.

Some benefits of publishing your notebooks with us come from our main principles: the code is easy to maintain (it complies with all Python formatting rules), it is fully reproducible, and all outputs are authentic.

As we want to boost open-source sharing, we aim to keep contributing straightforward.

Step 1: Convert your tutorial Script or Notebook

You can start with your existing notebooks and convert them to scripts with

jupytext --set-formats ipynb,py:percent my-notebook.ipynb

Or write your script from the ground up — see our template.
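
For orientation, a script in the percent format looks roughly like the sketch below. This is not the Lightning template itself, only a minimal made-up example of the cell markers that jupytext understands.

```python
# %% [markdown]
# # A minimal tutorial
# Markdown cells are written as comments below a `# %% [markdown]` marker.

# %%
# A plain `# %%` marker starts a code cell.
values = [x**2 for x in range(5)]

# %% [markdown]
# Each code cell should run top to bottom without relying on hidden state.

# %%
total = sum(values)
print(total)
```

Because the file is plain Python, it can be linted, formatted, and reviewed like any other script.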

Step 2: Add a Lightning Tutorial Metafile

The second step is adding a metafile with some extra details such as the title, author, and extra dependencies for this notebook if needed, and defining whether your notebook should run only on GPU or on a basic CPU. See the sample config file below:

title: How to write a PyTorch Lightning tutorial
author: PL team
created: 2021-06-15
updated: 2021-06-17
license: CC
description: |
  This is a template to show how to contribute a tutorial.
requirements:
  - matplotlib
accelerator:
  - CPU
  - GPU
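
Before opening a PR, it can help to sanity-check the metafile. The sketch below works on an already loaded dictionary (for example from a YAML parser, which is left out here). The set of required fields and the function name `validate_meta` are assumptions for illustration, not the checks run by the Lightning CI.

```python
REQUIRED_FIELDS = ("title", "author", "description")  # assumed mandatory set
ALLOWED_ACCELERATORS = {"CPU", "GPU"}  # the two values shown in the sample


def validate_meta(meta):
    """Return a list of human-readable problems found in a metafile dict."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not meta.get(field):
            problems.append(f"missing field: {field}")
    for item in meta.get("accelerator", []):
        if item not in ALLOWED_ACCELERATORS:
            problems.append(f"unknown accelerator: {item}")
    return problems


# The sample config from above, written as the dictionary a parser would return.
meta = {
    "title": "How to write a PyTorch Lightning tutorial",
    "author": "PL team",
    "created": "2021-06-15",
    "updated": "2021-06-17",
    "license": "CC",
    "description": "This is a template to show how to contribute a tutorial.\n",
    "requirements": ["matplotlib"],
    "accelerator": ["CPU", "GPU"],
}
print(validate_meta(meta))
```

An empty list means the metafile passed these basic checks.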

And there you go! 🚀
You create a script you want to share, create a PR, and set it on track to publish with us… ⚡️

Illustration for publishing a notebook with our CD.

Have you enjoyed this story? Stay tuned, and follow me to learn more!

About the Author

Jirka Borovec has been working in machine learning and data science for several years in a few different IT companies. In particular, he enjoys exploring interesting world problems and solving them with state-of-the-art techniques. In addition, he has developed several open-source Python packages and actively participates in other well-known projects.
