
Apache Livy on DC/OS

1. Install

2. Configuration

3. Gotchas

3.1 Use Spark executor Docker image

The default Livy installation assumes that Spark is installed on the Mesos agents, which is rarely the case on a generic DC/OS cluster. As a consequence, if you create a Spark session without specifying spark.mesos.executor.docker.image, Mesos will launch executor containers that try to load Spark libraries and executables from the Mesos agents they run on, and the session will fail with errors about missing files. Instead, as shown in the code snippet below, you should point the new Spark session to a Docker image in which all the Spark libraries and executables (e.g., pyspark, sparkR) are installed. The Docker image will be used to start the Spark executor containers that run the Spark tasks submitted to this session.

import json
import requests

host = 'http://<livy-host>:8998'
# Session request: pin the executor Docker image and the Spark home inside it
data = {
  'kind': 'spark',
  'conf': {
    'spark.mesos.executor.docker.image': 'heliumdatacommons/spark:1.0.9-2.1.0-1-hadoop-2.6',
    'spark.mesos.executor.home': '/opt/spark/dist',
  }
}
headers = {'Content-Type': 'application/json'}

# create a Spark session
r = requests.post(host + '/sessions', data=json.dumps(data), headers=headers)
print(r.json())
print(r.headers['location'])
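
If the request succeeds, Livy returns the new session in the response body. As a hedged sketch (the statement code and polling interval are illustrative, and error states are not handled), you can poll the session until it is idle and then submit a statement through Livy's /sessions/<id>/statements endpoint; since the session kind above is 'spark', the statement is Scala code:

import time

# Wait until the executors (backed by the Docker image above) are up and the session is idle.
session_id = r.json()['id']
session_url = '{}/sessions/{}'.format(host, session_id)
while requests.get(session_url, headers=headers).json()['state'] != 'idle':
  time.sleep(5)

# Submit a Scala statement to the session (kind 'spark' expects Scala code).
statement = {'code': 'sc.parallelize(1 to 100).count()'}
r = requests.post(session_url + '/statements', data=json.dumps(statement), headers=headers)
print(r.json())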

3.2 Keep Spark version consistent on Livy and Spark executors

The current Livy build is based on heliumdatacommons/spark:1.0.9-2.1.0-1-hadoop-2.6. Make sure you use the same Docker image when creating Spark sessions.

Running inconsistent Spark versions on Livy and the Spark executors will cause compatibility issues and make Spark tasks fail.
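
One way to keep the versions aligned is to keep the image tag in a single place and reuse it for every session. The sketch below is illustrative only; the constant name and helper function are not part of Livy:

import json
import requests

# Image the current Livy build is based on; reusing it for executors keeps Spark versions aligned.
SPARK_IMAGE = 'heliumdatacommons/spark:1.0.9-2.1.0-1-hadoop-2.6'

def create_session(livy_host, kind='spark', extra_conf=None):
  # Create a Livy session whose executors run the same Spark build as Livy itself.
  conf = {
    'spark.mesos.executor.docker.image': SPARK_IMAGE,
    'spark.mesos.executor.home': '/opt/spark/dist',
  }
  conf.update(extra_conf or {})
  resp = requests.post(livy_host + '/sessions',
                       data=json.dumps({'kind': kind, 'conf': conf}),
                       headers={'Content-Type': 'application/json'})
  return resp.json()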