
Commit 2a99af8

Final squash commit for 2019.4 release

1 parent: c060e5b

File tree

149 files changed: +284593 additions, -36728 deletions


.gitignore

Lines changed: 4 additions & 3 deletions
@@ -11,11 +11,12 @@ test/target
 
 test/src/main/resources/awscredentials
 scripts/setup_aws_env.sh
-dist/odapweb.tar.gz
+dist/mliyweb.tar.gz
 logs/auth.log
 logs/boto.log
-logs/odapweb.log
+logs/mliyweb.log
 logs
-odapweb/migrations
+mliyweb/migrations
 tests
 *.exvCfB
+*.cloudpass

DCO

Lines changed: 0 additions & 31 deletions
This file was deleted.

docs/AdministratorGuide.md

Lines changed: 480 additions & 113 deletions
Large diffs are not rendered by default.

docs/DeveloperGuide.md

Lines changed: 1 addition & 0 deletions
@@ -207,4 +207,5 @@ MLiy uses data and format from www.ec2instances.info:
 
 Replace the instances.json file in [mliyweb/fixtures]. This should work unless the format has changed.
 
+
 [mliyweb/fixtures]:../mliyweb/fixtures

docs/Getting Started.md

Lines changed: 5 additions & 0 deletions
@@ -27,6 +27,11 @@ Adminsitrators define one or more sets of software that are used by end users. M
 - Tag processing power for cost allocation
 - On-demand cost and usage reports
 
+## MLiy High-level Diagrams
+![End User Use Cases](img/MLiy_End_User_Use_Cases.GIF)
+
+![Administrator Use Cases](img/MLiy_Administrator_Use_Cases.GIF)
+
 ## MLiy Documentation
 - [Administrator Guide](./AdministratorGuide.md)
 - [User Guide](./UserGuide.md)

docs/MLiyClusterGuide.md

Lines changed: 97 additions & 0 deletions
# MLiy Cluster Guide

[Prerequisites](#prerequisites)

[Accessing the MLiy Cluster](#accessing-the-mliy-cluster)
- [HTTPS](#https)
- [SSH](#ssh)

[Using the MLiy Cluster](#using-the-mliy-cluster)
- [JupyterHub](#jupyterhub)
- [Apache Livy](#apache-livy)
- [Apache Spark](#apache-spark)
- [Apache Hadoop Yarn](#apache-hadoop-yarn)
- [Ganglia](#ganglia)
- [SparkMagic](#sparkmagic)
- [Addlib Magic](#addlib-magic)

MLiy clusters are AWS EMR clusters (Spark and Hive) provisioned through the MLiy Web Application. This guide lists the features included with the sample MLiy cluster and explains how to use them.

## Prerequisites
- SSH key for the MLiy Cluster master node (for administrative access)
- Assignment to a preconfigured group in the MLiy Web Application
- Firefox or Chrome

## Accessing the MLiy Cluster

You can interact with your MLiy Cluster via the HTTPS and SSH/SCP protocols.

## HTTPS

The following services are available via links on your MLiy home page:
- JupyterHub (https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-jupyterhub.html and https://jupyterhub.readthedocs.io/en/stable/)
- Apache Livy (https://livy.incubator.apache.org/)
- Apache Spark (https://spark.apache.org/docs/2.4.0/)
- Apache Hadoop Yarn (https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html)
- Ganglia (http://ganglia.info/)

You will be asked to log on with your LDAP/Active Directory credentials when you click any of the services.

![alt text](./img/mliy_emr_master.png "MLiy EMR Master")

## SSH

If needed for troubleshooting, you can log on to your MLiy Cluster master node using the SSH key specified in the EMR cluster configuration in the MLiy web application.

```
ssh -i /path/to/pem/file hadoop@instance_ip_or_fqdn_of_master_node
```

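The SCP protocol mentioned above works with the same key if you need to move files on or off the master node. This is a minimal sketch; the local file name and remote directory are placeholders, not values from the MLiy configuration:

```
# copy a local file to the hadoop user's home directory (paths are placeholders)
scp -i /path/to/pem/file ./mynotebook.ipynb hadoop@instance_ip_or_fqdn_of_master_node:/home/hadoop/
```
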
## Using the MLiy Cluster
You will use your LDAP/Active Directory credentials to log on to the services provided by your MLiy Cluster master node.

### JupyterHub
In the JupyterHub web interface, you can create PySpark, PySpark3, Python 3, and Spark (Scala) notebooks. Go to the top right and click New; you will see options for PySpark, PySpark3, Python 3, and Spark. You can browse files in your home directory in S3, and upload and download notebooks.

![alt text](./img/jupyterhub.png "JupyterHub")

### Apache Livy
Apache Livy is a REST service for Apache Spark. The Apache Livy user interface allows you to monitor active Livy sessions and logs.

![alt text](./img/apache_livy.png "Apache Livy")

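Because Livy is a REST service, you can also query it from the command line. A minimal sketch, assuming you have network access to the master node and Livy is listening on its default port 8998 (the host name is a placeholder):

```
# list active Livy sessions as JSON (host is a placeholder)
curl -s http://instance_ip_or_fqdn_of_master_node:8998/sessions
```
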
### Apache Spark

The Spark History Server lists Spark applications and provides details about their associated Spark jobs.

![alt text](./img/spark_history_server.png "Spark History Server")
![alt text](./img/spark_jobs.png "Spark Jobs")

### Apache Hadoop Yarn

Yarn is the resource manager for Hadoop. Its web UI lets you monitor Spark applications scheduled on Yarn.

![alt text](./img/apache_hadoop_yarn.png "Apache Hadoop Yarn")

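If you are logged on to the master node over SSH, the standard Hadoop yarn CLI gives you the same view from the command line (this is generic Hadoop tooling, not something specific to MLiy):

```
# list applications currently running on Yarn
yarn application -list -appStates RUNNING
```
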
### Ganglia

Ganglia provides a web-based user interface to view metrics, including Hadoop and Spark metrics, for the nodes in the cluster.

![alt text](./img/ganglia.png "Ganglia")

### SparkMagic

Sparkmagic (https://github.com/jupyter-incubator/sparkmagic) is a library of kernels that allows Jupyter notebooks to interact with Apache Spark through Apache Livy. There are two ways to use Sparkmagic (see the sketch after this list):
1. Via the IPython kernel (https://github.com/jupyter-incubator/sparkmagic/blob/master/examples/Magics%20in%20IPython%20Kernel.ipynb)
2. Via the PySpark (https://github.com/jupyter-incubator/sparkmagic/blob/master/examples/Pyspark%20Kernel.ipynb) and Spark (https://github.com/jupyter-incubator/sparkmagic/blob/master/examples/Spark%20Kernel.ipynb) kernels

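For the first route, a minimal sketch in a Python 3 (IPython) notebook looks roughly like this; the session name and the code sent to the cluster are placeholders:

```
%load_ext sparkmagic.magics
%manage_spark
```

%manage_spark opens a widget where you add a Livy endpoint and create a session (for example, one named mysession). Code in a %%spark cell then runs on the cluster through Livy:

```
%%spark -s mysession
# placeholder computation executed remotely via Livy
spark.range(10).count()
```
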
### Addlib Magic
Once a cluster is up and running and you have logged on and created a notebook, you can use the custom IPython magic addlib_magic to deploy your library as a jar or zip file, then import and use that library in your notebook cells. Here is sample code to load the magic and use it:

```
%load_ext addlib_magic
```

```
%addlib {Absolute S3 Path to Jar or Zip file}
```
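
For example, a hypothetical zip of Python helpers could be added like this; the bucket, key, and module name are placeholders, not values shipped with MLiy:

```
%addlib s3://your-bucket/libs/my_helpers.zip

# the deployed library can then be imported in the notebook (placeholder module name)
import my_helpers
```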
