Using jupyter notebooks for reproducible publication figures

I usually keep a single python script for each publication figure. The source data is loaded, processed and finally plotted. This allows me to quickly change xy limits or tweak the font size of the third subplot legend. I just have to make the necessary changes and execute the script. The new figure is automatically created and I do not have to mess with any editing programs.

This method has worked great for me for the past few years. I keep the source code for all the figures as well as code snippets that I may later reuse and share with others.

Now, this is all great when loading and processing the data doesn't take long: I can quickly type the changes to the figure and rerun the script to see the result.

However, I recently had to load a 30 GB dataset each time I ran the script, followed by a few minutes of processing. This quickly became tedious, since I needed many attempts to get the figure just right.

Then I found out about jupyter notebooks for python. Jupyter provides interactive, Mathematica-style notebooks in which you execute the code one chunk (cell) at a time. I can keep the processed data in memory and re-execute only the code that plots the figure.
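As an illustration, a figure notebook can be split into two cells: a slow one that loads and processes the data once, and a fast one that only draws the figure. The sketch below assumes matplotlib is available (it ships with Anaconda); load_and_process and plot_figure are hypothetical names, and the toy data stands in for the real dataset. In a notebook, each "# %%" section would be its own cell.

```python
import matplotlib
matplotlib.use("Agg")  # write figures to files; skip this line in a notebook to see them inline
import matplotlib.pyplot as plt

load_count = 0  # counts how often the slow step actually runs

# %% Cell 1: slow part, run once per session
def load_and_process():
    """Hypothetical stand-in for loading and processing the large dataset."""
    global load_count
    load_count += 1
    xs = list(range(100))
    ys = [x ** 2 for x in xs]
    return xs, ys

xs, ys = load_and_process()  # the result stays in memory between cell runs

# %% Cell 2: fast part, edit and re-run as often as needed
def plot_figure(xs, ys, xlim=(0, 50), legend_fontsize=8, filename="figure.pdf"):
    """Draw the figure from already-processed data and save it."""
    fig, ax = plt.subplots(figsize=(3.5, 2.5))
    ax.plot(xs, ys, label="processed data")
    ax.set_xlim(*xlim)
    ax.legend(fontsize=legend_fontsize)
    fig.savefig(filename)
    plt.close(fig)
    return filename

# Tweaking limits or font sizes re-runs only this cell, not the data loading.
first = plot_figure(xs, ys)
second = plot_figure(xs, ys, xlim=(0, 80), legend_fontsize=6)
```

While iterating on the figure, only the second cell changes; the notebook kernel keeps xs and ys alive until it is restarted.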

In addition, jupyter runs as a server that you access through a web browser. This means the python code can run either on my own machine or on a remote server.

I usually access the data on the computing cluster hosted by the research institute. Without jupyter, any programming there has to be done through an SSH session or a VNC interface. A jupyter session running on the server lets me access all the data from the comfort of my own computer, without worrying about AFS configuration, missing libraries or environment paths.

So… how do you set up the jupyter environment?

First of all, make sure jupyter is installed and accessible. Try running the command jupyter notebook. This starts a local server and should open a browser session at the default URL, http://localhost:8888.

I used the anaconda module, which should include jupyter.

If you are using the same servers as I am, run:

$ module unload python27/basic
$ module load anaconda
$ jupyter notebook

This is great for running local jupyter notebooks. But we want remote access.

So we must set up the jupyter server to accept connections from any IP address, and protect it with a password and encryption: we don't want anyone who finds the URL to be able to run arbitrary code on our remote system.

For this I followed the guide at http://jupyter-notebook.readthedocs.io/en/latest/public_server.html, which I summarize here.

First generate the default configuration file for the jupyter session, in case it doesn’t exist yet:

$ jupyter notebook --generate-config

Then create a password for accessing the jupyter interface from the web:

$ jupyter notebook password
Enter password: **** 
Verify password: **** 
[NotebookPasswordApp] Wrote hashed password to /Users/you/.jupyter/jupyter_notebook_config.json 

Next, create a certificate for the encrypted sessions. Go to the directory where you want to store your certificates and create a self-signed certificate with the command below (a 2048-bit RSA key; 1024-bit keys are considered weak and may be rejected by current OpenSSL defaults):

$ openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout mykey.key -out mycert.pem

Remember the path to this folder.
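Before editing the configuration, it can be worth checking that the certificate and key load correctly as a pair. A minimal sketch using only Python's standard ssl module; cert_and_key_match is a hypothetical helper name, and the paths are the placeholder ones used in this post.

```python
import ssl

def cert_and_key_match(certfile, keyfile):
    """Return True if the certificate and private key can be loaded as a pair."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    try:
        # Raises if a file is missing or unreadable, or if the key is
        # rejected by OpenSSL or does not match the certificate.
        context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    except (OSError, ssl.SSLError):
        return False
    return True

if __name__ == "__main__":
    print(cert_and_key_match("/absolute/path/to/your/certificate/mycert.pem",
                             "/absolute/path/to/your/certificate/mykey.key"))
```

A False result is worth investigating before starting the server, since jupyter will need to load the same pair.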

Now make your jupyter server public. Open the configuration file generated earlier, /Users/you/.jupyter/jupyter_notebook_config.py (note the .py extension: this is a Python file, unlike the .json file that holds the password).

Inside this file, add the absolute paths to your certificate and key:

c.NotebookApp.certfile = u'/absolute/path/to/your/certificate/mycert.pem'
c.NotebookApp.keyfile = u'/absolute/path/to/your/certificate/mykey.key'

Set the IP to '*' so that the server listens on all network interfaces and accepts connections from any IP address:

c.NotebookApp.ip = '*'

If you ran the jupyter notebook password command earlier, the hashed password is already stored in jupyter_notebook_config.json, which makes the following line redundant; otherwise, you can set the hash here:

c.NotebookApp.password = u'sha1:bcd259ccf...<your hashed password here>'
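The value has the form algorithm:salt:hash. To make that format concrete, here is a minimal standard-library sketch modelled on the sha1 scheme used by older versions of the notebook's password helper (the digest of the passphrase followed by a random hex salt). It is only an illustration: newer jupyter releases may default to a different algorithm such as argon2, so keep using jupyter notebook password for the real setup. hash_password and check_password are hypothetical names.

```python
import hashlib
import secrets

SALT_LEN = 12  # number of hex characters in the salt

def hash_password(passphrase, algorithm="sha1"):
    """Return 'algorithm:salt:hexdigest' for the given passphrase."""
    salt = secrets.token_hex(SALT_LEN // 2)  # 6 random bytes -> 12 hex characters
    h = hashlib.new(algorithm)
    h.update(passphrase.encode("utf-8") + salt.encode("ascii"))
    return ":".join((algorithm, salt, h.hexdigest()))

def check_password(hashed, passphrase):
    """Recompute the digest with the stored salt and compare in constant time."""
    algorithm, salt, digest = hashed.split(":", 2)
    h = hashlib.new(algorithm)
    h.update(passphrase.encode("utf-8") + salt.encode("ascii"))
    return secrets.compare_digest(h.hexdigest(), digest)

hashed = hash_password("my secret")
print(hashed)  # format: sha1:<12 hex chars>:<40 hex chars>
```

Because the salt is random, hashing the same password twice gives different strings, and both still pass check_password.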

You may disable the automatic opening of the web browser when starting the jupyter server. Since it is running remotely, you don’t need it anyway.

c.NotebookApp.open_browser = False 

Finally, set a fixed port for the jupyter web interface:

c.NotebookApp.port = 9999 

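Now start the server on the remote machine with jupyter notebook, then open a web browser on your own computer and go to https://serverip:port/ (with the settings above, the port is 9999). Because the certificate is self-signed, the browser will most likely show a security warning the first time you connect.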

