Talk for the Research Computing Leeds Conference: Save your tears for the data - A touch of Docker in a Data Scientist's workflow

R-icntay/res_comp_leeds_2022

res_comp_leeds_2022

Talk for the Research Computing Leeds Conference: Save your tears for the data - A touch of Docker in a Data Scientist's workflow.

Slides: PDF

Abstract

Many data science teams have become multilingual, leveraging R, Python, Julia and friends in their work. On top of that, different data scientists prefer different coding environments and operating systems. While this diversity lets data scientists work with the tools they are most comfortable with, it can become a pain to share the same project across machines with different configurations. This talk illustrates how data scientists can leverage Dev Containers to create portable, reproducible and tailored development environments that can be instantiated reliably across different environments, operating systems and hardware. Data scientists can therefore focus on what they love and do best (i.e. data science) without having to worry about the hassle of reproducing their work, deploying their analysis dashboards or even deploying their models.

Setting up the development container

A development container is a running Docker container with a well-defined tool/runtime stack and its prerequisites. You can try out development containers with GitHub Codespaces in the cloud, or with Visual Studio Code Remote - Containers on a machine that has Docker installed, by following the instructions below:

GitHub Codespaces

Follow these steps to open this workshop in a Codespace:

  1. Click the Code drop-down menu on the repo and select the Open with Codespaces option.
  2. Select + New codespace at the bottom of the pane.

For more info, check out the GitHub documentation.

VS Code Remote - Containers

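Follow these steps to open this workshop in a container using the VS Code Remote - Containers extension (since renamed Dev Containers, so the commands below may appear with a Dev Containers prefix in newer versions of VS Code):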

  1. If this is your first time using a development container, please ensure your system meets the prerequisites (i.e. Docker is installed) listed in the getting started steps.

  2. Press F1 and select the Add Development Container Configuration Files... command for Remote-Containers or Codespaces.

    Note: If needed, you can drag-and-drop the .devcontainer folder from this sub-folder in a locally cloned copy of this repository into the VS Code file explorer instead of using the command.

  3. Select this definition. You may also need to select Show All Definitions... for it to appear.

  4. Finally, press F1 and run Remote-Containers: Reopen Folder in Container to start using the definition.

At some point, you may want to make changes to your container, such as installing a new package. You'll need to rebuild your container for your changes to take effect.
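One way to make such a change survive a rebuild is to declare it in the container configuration rather than installing it by hand. The fragment below is a minimal sketch, not this repository's actual configuration: it assumes the image already provides `Rscript`, and the package name `data.table` is only a placeholder. `postCreateCommand` is a standard `devcontainer.json` property that runs once after the container has been created.

```jsonc
// Fragment to merge into .devcontainer/devcontainer.json (illustrative only).
{
  // Runs once after the container is created, so it takes effect on the next rebuild.
  // Assumption: Rscript is available in the image; "data.table" is a placeholder package.
  "postCreateCommand": "Rscript -e \"install.packages('data.table')\""
}
```

After editing the configuration, rebuild the container so the command runs against a fresh container.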

Demo

We create a dev container that supports data science tasks in R and Python, with both VS Code and RStudio available as editors.
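As a rough idea of what such a definition can look like, here is a minimal `devcontainer.json` sketch. It is not the file used in this repository: the container name is hypothetical, the `rocker/rstudio` base image is an assumption, and how RStudio Server gets started inside the container depends on the image and is not shown here.

```jsonc
// .devcontainer/devcontainer.json (illustrative sketch, not this repository's actual file)
{
  // Hypothetical display name for the container.
  "name": "r-python-datascience",

  // Assumption: a Rocker image that ships R and RStudio Server.
  "image": "rocker/rstudio",

  // Dev Container Feature that adds Python on top of the base image.
  "features": {
    "ghcr.io/devcontainers/features/python:1": {}
  },

  // VS Code extensions installed inside the container for R and Python work.
  "customizations": {
    "vscode": {
      "extensions": [
        "REditorSupport.r",
        "ms-python.python"
      ]
    }
  }
}
```

Keeping the languages and editor extensions in one file means everyone who opens the project gets the same toolchain, whatever their host operating system.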

Spinning up RStudio server

Toggle the terminal with Ctrl + `

Navigate to the PORTS tab, then click on the 🌐 icon next to the forwarded RStudio Server port to open it in the browser.

The default username and password are both rstudio (as it should be? 🤭)
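For the RStudio Server port to show up in the PORTS tab, it has to be forwarded from the container. A minimal sketch of the relevant `devcontainer.json` settings follows; it assumes RStudio Server listens on its default port 8787, which may not match this repository's setup, and the label is just an example.

```jsonc
// Fragment to merge into .devcontainer/devcontainer.json (illustrative only).
{
  // Assumption: RStudio Server listens on 8787, its default port.
  "forwardPorts": [8787],

  // Optional: give the port a readable name in the PORTS tab.
  "portsAttributes": {
    "8787": {
      "label": "RStudio Server",
      "onAutoForward": "notify"
    }
  }
}
```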

Resources

Thanks to

Thank you to the following folks for providing helpful info on how to set up RStudio server on a dev container:
