Tech Blog

JupyterLab in a High Performance Computing (HPC) Environment

One way of describing High Performance Computing (HPC) is that it is the processing of very large datasets and the solving of large and complex problems. It utilizes specialized supercomputers or large clusters and employs parallel processing techniques. HPC is used in multiple industries, such as research laboratories, manufacturing, engineering, genomics, oil & gas, finance, media & entertainment, and more. It also includes AI and machine learning applications.

With the growing use of HPC, JupyterLab is becoming a must-have tool in HPC environments. Our previous blog post was a quick starter guide to JupyterLab. Here we discuss how to deploy it in an HPC environment.

First, install the prerequisites: NGINX, Node.js, and npm.

NGINX is an open source web server that can also be used for reverse proxying, caching, load balancing, media streaming, and more. It is commonly used to manage incoming traffic and distribute it to slower upstream servers, anything from legacy database servers to microservices.

# Ubuntu 18.04 and 20.04
apt install -y nginx
# CentOS
yum install -y nginx

Node.js is an open source, cross-platform, back-end JavaScript runtime environment that runs on the V8 JavaScript engine and executes JavaScript code outside a web browser.

# Ubuntu 18.04
apt install -y nodejs nodejs-dev node-gyp libssl1.0-dev

# Ubuntu 20.04
apt install -y nodejs node-gyp libssl1.1

# CentOS
yum install -y nodejs

npm is a package manager for JavaScript and it is maintained by npm, Inc. It is the default package manager for the JavaScript runtime environment Node.js.

# Ubuntu 18.04 and Ubuntu 20.04
apt install -y npm

# CentOS
yum install -y npm

Upgrade the Node.js version to v14.8.0 or later:

npm cache clean -f
npm install -g n 
n 14.8.0
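
To confirm that the upgrade took effect, the installed version can be compared against the minimum. The following is a minimal sketch, not part of the original setup: version_ge is a hypothetical helper name, and it assumes a sort command that supports -V (as GNU coreutils on Ubuntu and CentOS does).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when version $1 is greater than or equal to version $2.
version_ge() {
    # sort -V orders version strings; if the smaller of the two is $2, then $1 >= $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Minimum Node.js version named in the step above.
required="14.8.0"

if command -v node >/dev/null 2>&1; then
    # node --version prints something like "v14.8.0"; strip the leading "v".
    current="$(node --version | sed 's/^v//')"
    if version_ge "$current" "$required"; then
        echo "Node.js $current is new enough"
    else
        echo "Node.js $current is older than $required" >&2
    fi
else
    echo "node was not found on PATH" >&2
fi
```

If the old version is still reported, opening a new shell session usually picks up the newly installed binary.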

After installing the prerequisites, the next step is to create a systemd service file, so that JupyterLab can be started, stopped, and restarted like any other service.

Create a file jupyterlab.service:

cd /etc/systemd/system/
touch jupyterlab.service
# Paste the content below for jupyterlab.service
[Unit]
Description=JupyterLab
After=syslog.target network.target

[Service]
User=root
StandardOutput=file:/var/log/systemd/jupyterlab/sysout.log
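# Use the absolute path to jupyter (older systemd versions do not search PATH).
# Jupyter refuses to start as root unless --allow-root is passed or allow_root is set in its configuration.
ExecStart=/path/to/jupyter lab --allow-root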

Restart=on-failure
RestartSec=10

[Install]
WantedBy=multi-user.target
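
The unit writes its output to a file under /var/log/systemd/jupyterlab. As far as we know, systemd does not create missing parent directories for a StandardOutput=file: target, so it is safer to create the directory before starting the service. A minimal sketch, where ensure_log_dir is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: create the parent directory of the given log file path.
ensure_log_dir() {
    mkdir -p "$(dirname "$1")"
}

# Paths copied from the steps above.
unit="/etc/systemd/system/jupyterlab.service"
log_file="/var/log/systemd/jupyterlab/sysout.log"

# Only act on a host where the unit exists and we are running as root.
if [ -f "$unit" ] && [ "$(id -u)" -eq 0 ]; then
    ensure_log_dir "$log_file"
    echo "log directory ready: $(dirname "$log_file")"
fi
```

The helper creates only the directory; the log file itself is created by systemd when the service starts.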

Start and enable the jupyterlab.service:

systemctl daemon-reload
systemctl start jupyterlab.service
systemctl enable jupyterlab.service
systemctl status jupyterlab.service

# systemctl status should show something like this
● jupyterlab.service - JupyterLab
   Loaded: loaded (/path/to/jupyterlab.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2021-10-04 04:04:09 UTC; 24h ago
 Main PID: 54 (jupyter-lab)
   CGroup: /path/to/jupyterlab.service
           ├─  54 /path/to/python3 /path/to/jupyter lab
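
In a provisioning script, it can be handy to wait until the service is actually active instead of reading the status output by eye. The sketch below is one way to do that. retry_until is a hypothetical helper, and the attempt count and delay are arbitrary choices.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds between attempts.
retry_until() {
    attempts="$1"
    delay="$2"
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Do not sleep after the final failed attempt.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# systemctl is-active --quiet exits 0 only while the unit is active.
if command -v systemctl >/dev/null 2>&1; then
    if retry_until 5 1 systemctl is-active --quiet jupyterlab.service; then
        echo "jupyterlab.service is active"
    else
        echo "jupyterlab.service is not active; check /var/log/systemd/jupyterlab/sysout.log" >&2
    fi
fi
```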

With that configured, we can now proceed to set up a reverse proxy. The reverse proxy serves as a gateway between clients, users, and application servers. It is typically implemented to help increase security, performance, and reliability.

Create an nginx config file named jupyterlab.conf:
Note: You should provide your own SSL certificate and domain name.

vi /etc/nginx/conf.d/jupyterlab.conf
# Paste the content below for jupyterlab.conf
map $http_upgrade $connection_upgrade {
   default upgrade;
   ''      close;
}

upstream jupyterlab {
   server 127.0.0.1:8888;
}

server {
   listen 443 ssl;
   server_name <insert_domain_name>;
   charset utf-8;

   include /path/to/ssl.certificate.conf;

   location / {
       proxy_pass http://jupyterlab;
       proxy_set_header X-Real-IP $remote_addr;
       proxy_set_header Host $host;
       proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

       # websocket headers
       proxy_set_header Upgrade $http_upgrade;
       proxy_set_header Connection $connection_upgrade;
       proxy_set_header X-Scheme $scheme;
       proxy_read_timeout 86400s;
       proxy_send_timeout 86400s;

       proxy_buffering off;
   }

   location ~ /.well-known {
       allow all;
   }
}

Test the nginx configuration for errors, then reload nginx to apply it:

nginx -t 
systemctl reload nginx
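
A quick end-to-end check from the command line can show whether nginx is forwarding requests to JupyterLab. This is a rough sketch under a few assumptions: DOMAIN is an environment variable you set to your own domain name, status_ok is a hypothetical helper, and treating 200, 301, and 302 as success is our own choice (JupyterLab commonly redirects to its login or /lab page). A 502 would typically mean nginx cannot reach the upstream on 127.0.0.1:8888.

```shell
#!/usr/bin/env bash
# Hypothetical helper: treat 200 and redirects (for example, to the login page) as reachable.
status_ok() {
    case "$1" in
        200|301|302) return 0 ;;
        *) return 1 ;;
    esac
}

# DOMAIN is an assumption: export it with your own domain name before running this.
if [ -n "${DOMAIN:-}" ]; then
    # -s: silent, -o /dev/null: discard the body, -w: print only the HTTP status code.
    code="$(curl -s -o /dev/null -w '%{http_code}' "https://${DOMAIN}/" || true)"
    if status_ok "$code"; then
        echo "JupyterLab answered through nginx with HTTP $code"
    else
        echo "unexpected HTTP status: $code" >&2
    fi
fi
```

For example: DOMAIN=<insert_domain_name> bash check_proxy.sh, where check_proxy.sh is whatever name you save the script under.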

Finally, open JupyterLab at https://<insert_domain_name> in your browser.

After deploying JupyterLab, there are a few extensions and companion packages worth installing. Note that these are my personal favorites, and there are other plugins that might help your productivity.

JupyterHub is the best way to serve Jupyter notebooks for multiple users. It can be used in a class, for example, or within a corporate data science group or any scientific research group. It is a multi-user Hub that spawns, manages, and proxies multiple instances of the single-user Jupyter notebook server.

JupyterHub-ldap-authenticator is a package that adds LDAP authentication to JupyterHub, so users can log in with their LDAP accounts. This package also supports multiple LDAP servers, which is very useful for enterprises.

JupyterLab-Matplotlib is an extension that makes your Matplotlib plots interactive.

JupyterLab-DrawIO is an extension that allows you to draw diagrams in your JupyterLab.

JupyterLab-Spreadsheet is an extension that adds a simple spreadsheet viewer to JupyterLab. It is a big help for data scientists, data engineers, and casual users who want to view spreadsheets in JupyterLab.

Conclusion

JupyterLab is a great addition not only for personal tasks and workloads, but also for work in an HPC environment. Its comprehensive capabilities can support many of your daily activities, and with the help of plugins, you can increase your productivity.

<About the author>
Ray Marc Marcellones has been an HPC Engineer at XTREME-D Inc. for more than three years. He is engaged in the development of both a back end and a front end for AXXE-L Web and Services.