Thank you to @arnonerba for figuring out how to get Julia compiled with Intel MKL and major revisions below.
This guide explains how to set up a simple Amazon Linux 2 instance on EC2 for use with R and Julia. Renting EC2 space can be quite cheap, especially if you use a Spot Instance. Pricing is a continuous uniform-price auction, and if you are outbid, your instance is terminated w/out warning.
The guide assumes basic familiarity with a UNIX-like systems (e.g., navigating the file structure, copying, moving, ssh, etc). Note that Windows 10 now includes the "Windows Subsytem for Linux" (WSL), which provides a very nice terminal environment (MSDN setup guide). The Git Bash terminal is also a good choice for Windows. macOS users may make use of the built-in macOS terminal.
If you have suggestions, pull requests & edits are welcome!
- Sign up for an Amazon AWS account. Sign up for a GitHub education pack if eligible and you may get some free Amazon AWS credits.
- Spin up an Amazon Linux 2 instance. The "compute optimized" tier is recommended, and you should not need more than 16GB of storage.
- Create a new SSH key pair when prompted, or choose one already saved in your AWS account. If you create a new pair, you will be asked to download your private key.
- Your private key must be kept secure. By convention it should be placed in your local
~/.ssh
directory and be protected by either0400
or0600
permissions (note). Your~/.ssh
directory should have0700
permissions. - Private keys may take several different forms:
*.pem
- Standard file format for cryptographic keys/certificates. AWS uses this format.*.key
- Alternate file extension for a PEM file only containing a private key.*.ppk
- Proprietary PuTTY format for private keys. PuTTY does not support the PEM format.
- Public keys utilize the
*.pub
extension, but when copied to a server are appended to your remote~/.ssh/authorized_keys
file. The presence of your public key in this remote file grants you access to the server.- If you are manually copying a new public key to an instance you already have access to, use the
ssh-copy-id
command (note). Otherwise, the AWS setup guide handles this process for you.
- If you are manually copying a new public key to an instance you already have access to, use the
- Your private key must be kept secure. By convention it should be placed in your local
- Connect to your EC2 instance via SSH. You can find the IP address/hostname of your instance in your AWS dashboard.
- Append the following to your local
~/.ssh/config
file, substituting the appropriate values as necessary:Host your_server_name HostName your_ip_address_or_hostname User ec2-user IdentityFile ~/.ssh/your_private_key.pem
- Then, SSH into the server with
ssh your_server_name
. - Alternatively, you can skip the instructions above and connect directly with:
ssh ec2-user@your_ip_address_or_hostname -i ~/.ssh/your_private_key.pem
- Append the following to your local
- Install Git
sudo yum install git
- To push and pull from GitHub over SSH, you will need another public/private key pair that is tied to your GitHub account (note). If you do not have a key pair, generate one on your EC2 instance with
ssh-keygen
and add the public key to your GitHub account. If you already have an authorized key pair, copy the private key to your EC2 instance and place it in your remote~/.ssh
directory:- On the local machine, navigate to your directory with relevant keys (usually
~/.ssh
or%USERPROFILE%/.ssh
). - Use
sftp
to put yourgithub_rsa
private key on the remote server. - Exit
sftp
, and thenssh
back into the server. - Move the private key into .ssh:
mv github_rsa .ssh/
. - Check that the permissions are correct:
ls -al .ssh
.
- On the local machine, navigate to your directory with relevant keys (usually
- See install guide. I had to use PackageCloud to install from command line.
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.rpm.sh | sudo bash sudo yum install git-lfs git lfs install # only run once for initial install
- Install LFTP to connect to Box accounts.
sudo yum install lftp
- Intel MKL is available for free from Intel. The yum repository can be easily added on an Amazon Linux 2 system:
sudo yum-config-manager --add-repo https://yum.repos.intel.com/mkl/setup/intel-mkl.repo sudo rpm --import https://yum.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB
- Multiple versions of MKL are available, but the latest can be easily installed:
sudo yum install intel-mkl
These instructions are for building Julia from source. Binary files are also available on the Julia Downloads page, and installation instructions are available here
- First, install the necessary dependencies:
sudo yum groupinstall 'Development Tools' sudo yum install make gcc gcc-c++ libatomic python gcc-gfortran perl wget m4 patch pkgconfig sudo yum autoremove cmake # default version is too old
- Download the Julia source code:
wget https://github.com/JuliaLang/julia/archive/v1.0.1.tar.gz
- Extract Julia source code tarball and move to
/usr/local
:tar -xzvf v1.0.1.tar.gz mv julia-1.0.1/ /usr/local/julia-1.0.1/ cd /usr/local/julia-1.0.1/
- (Optional) Enable MKL in Julia:
source /opt/intel/bin/compilervars.sh intel64 echo "USE_INTEL_MKL = 1" > Make.user
- Use
make
to compile Julia:./contrib/download_cmake.sh # force Julia to build an updated version of cmake make -j4 # where '4' is the number of available CPU threads
- Symlink Julia to
/usr/local/bin
:ln -s /usr/local/julia-1.0.1/julia /usr/local/bin/julia
- Test MKL Integration in a Julia prompt:
using LinearAlgebra LinearAlgebra.BLAS.vendor()
- Open up a julia prompt and install packages into the default folder
]add AxisAlgorithms BenchmarkTools Calculus CategoricalArrays DataFrames Distributions FileIO Formatting GLM GR Gadfly IndirectArrays Interpolations JLD2 MixedModels NLSolversBase NLopt Optim PkgDev Plots Primes Profile ProgressMeter PyPlot RData Ratios StatsBase StatsFuns StatsModels
Initialize package repo with(deprecated in v0.7)Pkg.init()
in juliaBulk install by updating(deprecated in v0.7)REQUIRE
in~/.julia/v0.x/REQUIRE
and runningPkg.resolve()
. You may need to run julia assudo
with elevated priveleges, but hopefully not.
No guide yet for installing R or R Studio server. See Louis Aslett's page
A great way to make sure log files persist across sessions and in case your spot instance gets killed is to use Amazon's EFS storage. EFS storage is accessible only by instances in the same security group in the same region. So to access files there, you have to go through a running instance: you cannot SSH directly to the the EFS drive.
- Add an EFS instance (encrypted?)
- Install EFS utilities
- If running Amazon Linux:
sudo yum install -y amazon-efs-utils
- If running Ubuntu:
- install
amazon-efs-utils
(manually?) using https://docs.aws.amazon.com/efs/latest/ug/using-amazon-efs-utils.html#installing-other-distro apt-get install libssl-dev
- Upgrade
stunnel
and symlinmk itsudo ln -s /usr/local/bin/stunnel /bin/stunnel
and/orsudo ln -s /usr/bin/stunnel /bin/stunnel
- install
- If running Amazon Linux:
- Make sure that EFS and EC2 instances are in the same security group (SG), and that the SG has an inbound rule allowing NFS traffic from the same SG
- One-time mounting can be done with
sudo mount -t efs fs-[INSTANCE_ID]:/ [TARGET MOUNT POINT]
- Permanent mounting can be done by opening
/etc/fstab
withsudo
privileges and adding a new line to:fs-[INSTANCE_ID]:/ [TARGET MOUNT POINT SUCH AS /mnt/efs] efs defaults,_netdev,nofail 0 0
- It can be nice to symlink the efs volume to a directory:
ln -s /mnt/efs efsdir
.
- Launch your AMI from the EC2 console:
AMI > Select on your AMI > Under "Actions," select "Spot Request"
Request a big instance, and set the MAX price you are willing to pay per hour (usually it's much lower than this) - Once your AMI is running (can take a bit), SSH into it & get to work or point your browser to the relevant IP address.
Sometimes things error out. We have not yet figured out how to get the instance to email us when a job errors out. However, to be notified when an instance's CPU utilization falls below a threshold,
- Create a subscription at AWS SNS
- Under EC2 instances > Monitoring, click "create alarm". Set alarm for CPU utilization <= X pct for less than Y min.
nohup
is a way to run a script that will stay running after your terminal session is killed and have the script dump all STDOUT and STDERR to a log file. For example, we can run
nohup julia --optimize=3 ~/dev-pkgs/ShaleDrillingEstimation/example/run_estimator.jl > ~/efs-ubuntu/JOBNAME\ "$(TZ=America/Los_Angeles date +on\ %Y-%m-%d\ at\ %Hh%Mm)"\ by\ ${MYIP}.out 2>&1 &
where JOBNAME
is to be filled in by you.
- One can use
lftp
to transfer files between AWS instances and Box.net in lieu of Git and git-lfs. Note that special characters in password may have to be escaped or translated to HTML.lftp -p 990 -u "username@institution,PASSWORD" ftps://ftp.box.com mirror [project_dir_on_box] [remote_project_dir]
If required, X11 can be easily used to run remote GUI applications on macOS.
- Install XQuartz
- Log out and log back in, then connect using
ssh -YC user@server
in Terminal to enable X11 forwarding.
Sometimes it is nice to use a GUI. This is pretty straightforward to do on AWS or Azure Ubuntu instances. Note that the instructions below are not tested on AWS Amazon linux.
- Install Remote Desktop
- Install
xrdp
andxcfe4
software as per https://docs.microsoft.com/en-us/azure/virtual-machines/linux/use-remote-desktop. We'll connect over SSH, so no need to open a special RDP port. - On local machine, create ssh tunnel to remote with port forwarding
ssh -L [LOCALPORT]:localhost:[3389] username@remoteip
. Can do this with terminal, Bash for Windows, or Putty. - After connecting to remote instance, set a password on the remote machine so that the RDP can log in
sudo setpasswd [yourname]
. I have sometimes found that I need to do this step to set up a password each time I launch an instance... even if the AMI I'm launching had a password. - Open Remote Desktop Connection (search for mstsc.exe on Windows) & log in to
localhost:[LOCALPORT]
- To be able to reconnect to the same desktop, see http://c-nergy.be/blog/?p=5305 and https://askubuntu.com/questions/133343/how-do-i-set-up-xrdp-session-that-reuses-an-existing-session. Basically, the idea is to edit the xrdp ini file to allow this. Run
sudo [gedit/pico/vim] /etc/xrdp/xrdp.ini
and change section[xrdp1]
where it saysport=-1
toport=ask-1
. When logging in for the first time, leave the port as-1
and note the port number you get (will default to5910
). Then on subsquent logins, change the port to whater the previous one was (I it should default to5910
). Sessions seem to persist even when the SSH tunnel is closed.
- Install
- Install the gnome terminal
sudo apt-get install gnome-terminal
, or something better than thexcfe
terminal. This should swap out automatically if you open a new terminal window - Install unzip (at least if on Azure):
sudo apt-get install unzip
so that Julia can buildHttpParser
for Atom - Fix tab-completion by following https://www.starnet.com/xwin32kb/tab-key-not-working-when-using-xfce-desktop/
- Install Firefox using
sudo apt-get install firefox
- Installing Atom
- Download .deb file from https://atom.io/
- attempt to install with
sudo dpkg -i atom-amd64.deb
- After error, run
sudo apt-get install -f
- Then again
sudo dpkg -i atom-amd64.deb
- Follow atom/atom#4360 (comment) to get Atom to run. You can find the file with
dpkg -L libxcb1 # to find the file cd /usr/share/atom cp /usr/lib/x86_64-linux-gnu/libxcb.so.1 sudo sed -i 's/BIG-REQUESTS/_IG-REQUESTS/' libxcb.so.1