Language and Voice Laboratory Computing Resources

Introduction

The Language and Voice Laboratory (LVL) runs a tiny computing “cluster” called Terra. This cluster consists of two physical nodes, terra and torpaq.

Access is granted on request by an LVL sysadmin. Once you have a user account you can log into the main node:
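# A sketch: the exact host name is assumed here to be terra.hir.is, the same
# host the JupyterHub below runs on; your-username is a placeholder.
ssh your-username@terra.hir.is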

Any additional questions can be asked on the Compute channel on Teams.

A short slideshow with examples and explanations is available here.

Scheduler

The LVL cluster uses Slurm to handle compute job scheduling and resource allocation. All resource-intensive tasks must go through the scheduler, and please refrain from requesting more resources than you need.

The command sbatch is used to submit batch jobs to the scheduler. This is the most common way to run tasks on the cluster. A batch job is described by a batch script and the command-line arguments to sbatch.

A batch script is a bash script with some special preprocessor directives, as seen in the example below.

#!/bin/bash
#SBATCH --gres=gpu:titanx:2
#SBATCH --mem=12G
#SBATCH --output=test-sbatch.log
echo "I have these GPUs:" $CUDA_VISIBLE_DEVICES
echo "On this machine" $(hostname)
exit 0

We send this job to the scheduler with

sbatch example-job.sbatch

This defines a job that requests two NVIDIA Titan X GPUs and 12 GB of memory, and writes stdout/stderr to the file test-sbatch.log in the current directory. Once the scheduler is able to allocate the necessary resources it will execute the job, writing the IDs of the allocated GPUs and the hostname of the allocated node to test-sbatch.log.
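The same directives can also be given, or overridden, as command-line arguments to sbatch. For example (the values below are only illustrative):

sbatch --mem=24G --output=bigger-run.log example-job.sbatch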

We can use sacct to see the job history and squeue to see queued and running jobs.
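For example (the job ID is a placeholder):

squeue -u $USER                                        # your queued and running jobs
sacct -j <jobid> --format=JobID,JobName,State,Elapsed  # accounting details for one job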

Storage

There are a few file systems available on Terra. None of them are backed up. All of them, except /scratch, use RAID for fault tolerance.

Because of the nature of /home, heavy reading and writing there slows the Terra cluster down for everyone. Do your modelling work and other I/O-intensive reading/writing on /scratch or /work. /home should only hold your code repositories, configuration files, and the like; your model and data directories should always be on /scratch or /work.

| Mount path   | Purpose                                                     | Size    | Speed                        | Local node |
|--------------|-------------------------------------------------------------|---------|------------------------------|------------|
| /data        | Shared datasets, models and archives. Read-only for users.  | 2.7 TiB | Fast reads & slow writes     | terra      |
| /scratch     | “Unimportant” temporary files with many writes and reads.   | 2 TiB   | Fastest                      | terra      |
| /mnt/scratch | Links to /scratch for legacy reasons                        |         |                              |            |
| /work        | More important temporary files                              | 3.4 TiB | Fastest reads & fast writes  | torpaq     |
| /home        | Code, configuration files, etc.                             | 5.4 TiB | Slow                         | terra      |
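As a sketch of this layout (all paths below are just examples), you can keep your code repository in /home and point its outputs at a directory on /scratch:

mkdir -p /scratch/$USER/my-experiment                      # large temporary outputs live here
ln -s /scratch/$USER/my-experiment ~/my-experiment/output  # convenient link from the repo in /home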

Useful places

Users have access to a few read-only folders on Terra. These places are meant to store frequently used corpora, models and tools.

| Path        | Purpose                                       |
|-------------|-----------------------------------------------|
| /data       | Datasets and data used by and created by LVL  |
| /models     | Pretrained models from LVL or other sources   |
| /data/tools | Shared tools and libraries                    |

If you want to add your own or additional data, models or libraries, contact the admins.

Containers

Singularity (FAQ) is a container solution for scientific computing that allows unprivileged use of containers. Singularity can build its own images from scratch as well as from ready-made Docker images.

A user can build a containerized application/project on their own machine and then run it on Terra in a Slurm batch job.
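As a sketch (the image name, Docker source and script are hypothetical), you could build an image on your own machine from a ready-made Docker image:

singularity build my-tool.sif docker://python:3.10-slim

copy it to Terra, and run it from a batch script like the one in the Scheduler section, where --nv exposes the allocated NVIDIA GPUs inside the container:

#!/bin/bash
#SBATCH --gres=gpu:titanx:1
#SBATCH --mem=8G
#SBATCH --output=singularity-job.log
singularity exec --nv my-tool.sif python3 my_script.py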

Jupyter Notebooks (JupyterHub)

Jupyter notebooks have become a popular way of doing scientific computing and interactive machine learning.

LVL runs a JupyterHub accessible at https://terra.hir.is (RU intranet, you’ll have to accept the self-signed cert) which allows users to spin up notebook servers through Slurm.

The notebook server runs in a container using an image with a Python 3.7 Conda base environment. The Conda tab allows you to create new environments, and new packages can be added to environments through the UI or from a notebook that uses a specific environment.
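For example (a sketch; the package is only an example), a notebook cell can install into the environment backing that notebook with the IPython conda magic:

%conda install -c conda-forge librosa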

Installing software

An easy way to install the tools and libraries you need, short of compiling things yourself, is to use the Conda package manager.

To use it you first have to add it to your environment:

source /data/tools/anaconda/etc/profile.d/conda.sh

Then, to always have conda available you can add it to your bash profile with:

conda init

Let’s say that for some reason you need to use pdftotext from poppler-utils; you can then create an environment specifically for that:

conda create -n pdf-stuff poppler-utils

This will create an environment named pdf-stuff with the package poppler-utils and all of its dependencies installed. To activate it you run:

conda activate pdf-stuff

To verify that it has been loaded:

whereis pdftotext
pdftotext: /home/staff/rkjaran/.conda/envs/pdf-stuff/bin/pdftotext
