Docker Image for deploying and running queries on the Apache Arrow based SkyhookDM Ceph on a Rook cluster

uccross/skyhookdm-arrow-docker

SkyhookDM-Arrow Docker

Docker image containing SkyhookDM built on top of Arrow along with C++ and Python API clients.

SkyhookDM-Arrow: skyhook image pulls

SkyhookDM-Arrow-Benchmark: skyhook benchmark image pulls

Triggering release builds

./release_skyhook.sh [tag]
./release_skyhook_benchmark.sh [tag]

Deploying SkyhookDM-Arrow on a Rook cluster

  • Change the Ceph image tag in the Rook CRD to the image built from this directory (or use uccross/skyhookdm-arrow:vX.Y.Z directly as the image tag) to switch your Rook Ceph cluster to version vX.Y.Z of SkyhookDM-Arrow.

  • After the cluster is updated, deploy a Pod that has the PyArrow library (with the SkyhookFileFormat API) installed so that you can start interacting with the cluster. To do so, follow these steps:

    1. Update the ConfigMap with the configuration options needed to load the Arrow CLS plugins.
    kubectl apply -f cls.yaml
    2. Create a Pod with PyArrow pre-installed for connecting to the cluster and running queries.
    kubectl apply -f client.yaml
    3. Create a CephFS on the Rook cluster.
    kubectl create -f filesystem.yaml
    4. Copy the Ceph configuration and keyring from any OSD or MON Pod to the client Pod (rook-ceph-playground).
    # copy the ceph config
    kubectl -n [namespace] cp [any-osd/mon-pod]:/var/lib/rook/[namespace]/[namespace].config ceph.conf
    kubectl -n [namespace] cp ceph.conf rook-ceph-playground:/etc/ceph/ceph.conf

    # copy the keyring
    kubectl -n [namespace] cp [any-osd/mon-pod]:/var/lib/rook/[namespace]/client.admin.keyring keyring
    kubectl -n [namespace] cp keyring rook-ceph-playground:/etc/ceph/keyring

    NOTE: You need to manually change the keyring path in the copied Ceph config to /etc/ceph/keyring.

    5. Check the connection to the cluster from the client Pod.
    # get a shell into the client pod
    kubectl -n [namespace] exec -it rook-ceph-playground -- bash

    # check the connection status
    ceph -s
    6. Install ceph-fuse and use it to mount CephFS at a path in the client Pod. (In a later release, ceph-fuse will come pre-installed in the SkyhookDM image.)
    yum install ceph-fuse
    mkdir -p /mnt/cephfs
    ceph-fuse --client_fs cephfs /mnt/cephfs

    NOTE: The client_fs name can be different. Check the filesystem.yaml file for the name of the filesystem you are using.

    7. Download an example dataset into the CephFS mount. For example,
    cd /mnt/cephfs
    wget https://raw.githubusercontent.com/JayjeetAtGithub/zips/main/nyc.zip
    unzip nyc.zip
    8. Modify the example Python script according to your needs and run it.
    python3 example.py
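
The note under the config-copy step asks you to change the keyring path in the Ceph config by hand. If you would rather script that edit, the following is a minimal sketch of one way to do it. The helper name set_keyring_path and the sample config contents are made up for illustration and are not part of this repository; the sketch also assumes the config already contains a `keyring = ...` line and that the new path contains no `|` or `&` characters.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repo): rewrite the value of the
# `keyring` option in a Ceph config file, keeping the line's indentation.
set_keyring_path() {
  local conf="$1" keyring="$2"

  # Refuse to continue if the config has no keyring option to rewrite.
  if ! grep -Eq '^[[:space:]]*keyring[[:space:]]*=' "$conf"; then
    echo "no keyring option found in $conf" >&2
    return 1
  fi

  # Write to a temporary file and move it into place, so the edit does not
  # rely on the non-portable `sed -i` flag.
  sed -E "s|^([[:space:]]*keyring[[:space:]]*=[[:space:]]*).*|\1${keyring}|" \
    "$conf" > "$conf.tmp" && mv "$conf.tmp" "$conf"
}

# Demo against a throwaway file. The contents below are an assumed sample,
# not a config copied from a real cluster.
demo_conf="$(mktemp)"
printf '%s\n' \
  '[global]' \
  'mon_host = 10.0.0.1:6789' \
  '' \
  '[client.admin]' \
  'keyring = /var/lib/rook/rook-ceph/client.admin.keyring' > "$demo_conf"

set_keyring_path "$demo_conf" /etc/ceph/keyring
grep keyring "$demo_conf"
rm -f "$demo_conf"
```

To use it for real, run the function on the local ceph.conf before copying that file into the client Pod, or run it inside the Pod on /etc/ceph/ceph.conf.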
