This repository has been archived by the owner on Nov 16, 2023. It is now read-only.

Merge branch 'master' into k8s_depl_notebook
panchul committed Sep 29, 2020
2 parents 2fbb895 + cbbc344 commit dc07fbb
Showing 21 changed files with 388 additions and 102 deletions.
92 changes: 81 additions & 11 deletions Research/kubeflow-on-azure-stack-lab/00-Intro/Readme.md
Original file line number Diff line number Diff line change
@@ -74,35 +74,66 @@ The simplest way to install Kubeflow is to use a CNAB package.
## Step 1: Install Porter

Make sure you have Porter installed. You can find the installation instructions for your OS at
Porter's [Installation Instructions](https://porter.sh/install/). To install the latest version on Linux:

$ curl https://cdn.porter.sh/latest/install-linux.sh | bash
Installing porter to /home/azureuser/.porter
Installed porter v0.29.0 (5e7240cf)
...

**NOTE:** be sure to add porter to your `PATH` so your shell can find its binaries

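If the installer's note did not make it into your shell profile, lines like the following take care of it (the `~/.porter` location is taken from the installer output above; adjust if you installed elsewhere):

```shell
# Porter installs into ~/.porter by default; put it on PATH for this
# session and persist it for future bash sessions.
export PATH="$HOME/.porter:$PATH"
echo 'export PATH="$HOME/.porter:$PATH"' >> "$HOME/.bashrc"
```
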
## Step 2: Build Porter CNAB

First, navigate to the porter directory in the repository. For example:

$ git clone https://github.com/Azure-Samples/azure-intelligent-edge-patterns.git
$ cd azure-intelligent-edge-patterns/Research/kubeflow-on-azure-stack/00-Intro
$ cd porter/kubeflow

Change the file permissions if needed:

$ chmod 755 kubeflow.sh

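As an aside, `755` (owner `rwx`, everyone else `r-x`) is all a script needs; `777` would also make it world-writable. A quick illustration you can run anywhere:

```shell
# Show the symbolic form of mode 755 on a scratch file (GNU stat on Linux).
f=$(mktemp)
chmod 755 "$f"
stat -c '%a %A' "$f"   # prints: 755 -rwxr-xr-x
rm -f "$f"
```
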
Build the porter CNAB like so:

$ porter build
Copying porter runtime ===>
Copying mixins ===>
Copying mixin exec ===>
Copying mixin kubernetes ===>
Generating Dockerfile =======>
Writing Dockerfile =======>
Starting Invocation Image Build =======>


## Step 3: Generate Credentials

This step is needed to connect to your Kubernetes cluster. An easy way to define the connection is to
point to the kubeconfig file. It is usually either in `/home/azureuser/.kube/config`, or you can find
and copy it from `/etc/kubernetes/admin.conf`. Here is the idiomatic way to do it:

$ mkdir -p $HOME/.kube
$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
$ sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, you could use the `KUBECONFIG` environment variable; for example, as the root user you can run:

$ export KUBECONFIG=/etc/kubernetes/admin.conf

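If you go the `KUBECONFIG` route and want the setting to survive new shells, append it to your profile (bash assumed; adjust for your shell):

```shell
# Persist KUBECONFIG for future bash sessions.
echo 'export KUBECONFIG=/etc/kubernetes/admin.conf' >> "$HOME/.bashrc"
```
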
To generate Porter credentials, pick `file path` from the menu and enter your kubeconfig path, e.g. `/home/azureuser/.kube/config`:

$ porter credentials generate
? How would you like to set credential "kubeconfig"
file path
? Enter the path that will be used to set credential "kubeconfig"
/home/azureuser/.kube/config

Validate that your credential is present by running the command below. You should see output similar to the following:

$ porter credentials list
NAME MODIFIED
KubeflowInstaller 40 seconds ago

![List Porter Credentials](porter/kubeflow/pics/porter-credentials-validate.png)

@@ -113,13 +144,26 @@ Run one of the below commands to interact with the CNAB
To install:

$ porter install --cred KubeflowInstaller
installing KubeflowInstaller...
executing install action from KubeflowInstaller (installation: KubeflowInstaller)
Install Kubeflow
Installing Kubeflow
[INFO] Installing kftctl binary for Kubeflow CLI...
... Creating directory to store download
... Downloading kfctl binary
./kfctl
... Creating Kubeflow directory
... Installing Kubeflow for deployment: sandboxASkf
[DEBUG] /root/kubeflow//kfctl apply -V -f https://raw.githubusercontent.com/kubeflow/manifests/v1.1-branch/kfdef/kfctl_k8s_istio.v1.1.0.yaml
...
...
execution completed successfully!

The pods will start being created, and it will take several minutes, depending on the performance of your system.

If you want to upgrade or uninstall Porter packages, you can use similar commands (do NOT run them right now):

$ porter upgrade --cred KubeflowInstaller


$ porter uninstall --cred KubeflowInstaller

## Step 5: Check for pods and services
@@ -129,6 +173,26 @@ After the installation each of the services gets installed into its own namespace
$ kubectl get pods -n kubeflow
$ kubectl get svc -n kubeflow

Or, use the script we provide in the `sbin` folder to poll until all pods are in the `Running` state (press `Ctrl-C` to stop the script once no pods are left in `ContainerCreating`/`Init`/`Error` states):

$ cd azure-intelligent-edge-patterns/Research/kubeflow-on-azure-stack-lab/sbin
$ ./check_status.sh
NAME READY STATUS RESTARTS AGE
cache-deployer-deployment-b75f5c5f6-97fsb 0/2 Error 0 6m24s
cache-server-85bccd99bd-bkvww 0/2 Init:0/1 0 6m24s
kfserving-controller-manager-0 0/2 ContainerCreating 0 6m8s
metadata-db-695fb6f55-l6dgs 0/1 ContainerCreating 0 6m23s
ml-pipeline-persistenceagent-6f99b56974-mnt8l 0/2 PodInitializing 0 6m21s
Press Ctrl-C to stop...
NAME READY STATUS RESTARTS AGE
cache-server-85bccd99bd-bkvww 0/2 Init:0/1 0 7m24s
metadata-grpc-deployment-9fdb476-kszzl 0/1 CrashLoopBackOff 5 7m22s
Press Ctrl-C to stop...
NAME READY STATUS RESTARTS AGE
^C

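For reference, the heart of such a status check is just a loop over `kubectl get pods` that stops once nothing is pending. Here is a minimal sketch of that logic (the namespace and the state filter are our assumptions, not necessarily the exact code of `check_status.sh`):

```shell
# Poll pods that are not yet Running/Completed; stop when none remain.
# Assumes kubectl is installed and pointed at the cluster; if kubectl is
# unavailable, the pending list is empty and the loop exits immediately.
pending_pods() {
  kubectl get pods -n kubeflow --no-headers 2>/dev/null \
    | grep -vE 'Running|Completed' || true
}
until [ -z "$(pending_pods)" ]; do
  pending_pods
  echo "Press Ctrl-C to stop..."
  sleep 10
done
echo "All pods are Running."
```

The `check_status.sh` script in `sbin` is the supported way to do this; the sketch only shows the shape of the loop.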

## Step 6: Opening Kubeflow dashboard

To access the dashboard over an external connection, replace `type: NodePort` with `type: LoadBalancer` using the patch command:
@@ -162,6 +226,12 @@ let the pods create containers and start.
---

In case the CNAB package installation does not work, you can do it manually; see [Installing Kubeflow manually](installing_kubeflow_manually.md).
You would need to run the `kubeflow_install.sh` script we provide and follow the instructions. At your Kubernetes master node:

$ git clone https://github.com/Azure-Samples/azure-intelligent-edge-patterns.git
$ cd azure-intelligent-edge-patterns/Research/kubeflow-on-azure-stack/sbin
$ chmod 755 *.sh
$ ./kubeflow_install.sh

We prepared instructions for [Uninstalling Kubeflow](uninstalling_kubeflow.md) too, in case you need to do so.

@@ -7,7 +7,7 @@ Download the `aks-engine` installation script if you do not have it already:

Run the installer, specifying its version:

$ ./get-akse.sh --version v0.55.4

If you have problems, please refer to the official page: [Install the AKS engine on Linux in Azure Stack](https://docs.microsoft.com/en-us/azure-stack/user/azure-stack-kubernetes-aks-engine-deploy-linux).

@@ -18,10 +18,15 @@ In a completely disconnected environment, you need to acquire the archive via
Verify `aks-engine` version:

$ aks-engine version
Version: v0.55.4
GitCommit: 8928a4094
GitTreeState: clean

Copy the certificate file with the following command:

$ sudo cp /var/lib/waagent/Certificates.pem /usr/local/share/ca-certificates/azurestackca.crt
$ sudo update-ca-certificates

# Links

- [Azure/aks-engine](https://github.com/Azure/aks-engine)
@@ -17,8 +17,16 @@ the master node of your Kubernetes cluster:
At your Kubernetes master node:

$ git clone https://github.com/Azure-Samples/azure-intelligent-edge-patterns.git

Make sure you cloned from the right repository and you are on the correct branch.

$ cd azure-intelligent-edge-patterns/Research/kubeflow-on-azure-stack/sbin

If for some reason the scripts are not executable (this can happen with cross-platform git commits),
update the file permissions:

$ chmod 755 *.sh

**IMPORTANT:**

**Do NOT stop the script until it finishes. Some Kubernetes errors and warnings are expected
@@ -80,8 +88,15 @@ become `Running` and the list will be empty:

When the pods have been created, you can proceed.

**IMPORTANT:**
To open the dashboard to a public IP address, you should first implement a solution to prevent unauthorized access. You can read more about Azure authentication options in [Access Control for Azure Deployment](https://www.kubeflow.org/docs/azure/authentication/).

For demo use, you can use port-forwarding to reach your cluster; run the following command and visit http://localhost:8080:

$ kubectl port-forward svc/istio-ingressgateway -n istio-system 8080:80

Or, again, **for non-production deployment**, you can make the Kubeflow Dashboard externally visible by
changing the type of the ingress behavior from `NodePort` to `LoadBalancer`, using this
command (the default editor is vi; to edit press `i`, and to save and exit, `<esc>:wq`):

$ ./edit_external_access.sh
@@ -39,12 +39,12 @@ you will need to ask your cloud administrator. You need the following:

Make sure you have all this information before proceeding further.

Even though you can create a Kubernetes cluster from the Portal, for Kubeflow we need to make a few
configuration changes, and it is easier to do so with AKS-e. ***Please continue to the next chapter and do not create a Kubernetes cluster using the Portal.***

![pics/creating_k8s_marketplace.png](pics/creating_k8s_marketplace.png)

# Installing Kubernetes using AKS-e

## Login to the desired cloud

@@ -160,27 +160,46 @@ In our case we updated these fields:
- "portalURL": "https://portal.demo2.stackpoc.com"
- "dnsPrefix": "kube-rgDEMO2"
- "keyData": "\<whatever is in id_rsa_for_demo.pub\>"
- updated `"orchestratorReleaseVersion"` to one of the listed supported versions
- changed the master count from 3 to 1, and set the agent pool count to 4
- added `"apiServerConfig"` values to resolve `istio-system` token storage

***Note that `apiServerConfig` may not be present in the template.*** Please make sure you have this definition under `"kubernetesConfig"`:
```
"properties": {
...
"orchestratorProfile": {
...
"kubernetesConfig": {
...
"apiServerConfig": {
"--service-account-api-audiences": "api,istio-ca",
"--service-account-issuer": "kubernetes.default.svc",
"--service-account-signing-key-file": "/etc/kubernetes/certs/apiserver.key"
}
...
...
```

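One way to sanity-check your edited API model is to grep for the three flags. The snippet below runs the check against an inline sample file so it is self-contained; substitute your real `kube-rgDEMO2_demoe2.json` for the sample path:

```shell
# Inline sample standing in for your API model file; point the check
# at your real kube-rgDEMO2_demoe2.json instead.
cat > /tmp/apimodel-check.json <<'EOF'
{
  "apiServerConfig": {
    "--service-account-api-audiences": "api,istio-ca",
    "--service-account-issuer": "kubernetes.default.svc",
    "--service-account-signing-key-file": "/etc/kubernetes/certs/apiserver.key"
  }
}
EOF
for key in api-audiences issuer signing-key-file; do
  grep -q -- "--service-account-$key" /tmp/apimodel-check.json \
    && echo "found: --service-account-$key"
done
```
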
Here is the resulting `kube-rgDEMO2_demoe2.json`:

{
"apiVersion": "vlabs",
"location": "",
"properties": {
"orchestratorProfile": {
"orchestratorType": "Kubernetes",
"orchestratorRelease": "1.15",
"orchestratorRelease": "1.17",
"orchestratorVersion": "1.17.11",
"kubernetesConfig": {
"cloudProviderBackoff": true,
"cloudProviderBackoffRetries": 1,
"cloudProviderBackoffDuration": 30,
"cloudProviderRateLimit": true,
"cloudProviderRateLimitQPS": 3,
"cloudProviderRateLimitBucket": 10,
"cloudProviderRateLimitQPSWrite": 3,
"cloudProviderRateLimitBucketWrite": 10,
"kubernetesImageBase": "mcr.microsoft.com/k8s/azurestack/core/",
"cloudProviderRateLimitQPS": 100,
"cloudProviderRateLimitBucket": 150,
"cloudProviderRateLimitQPSWrite": 25,
"cloudProviderRateLimitBucketWrite": 30,
"useInstanceMetadata": false,
"networkPlugin": "kubenet",
"kubeletConfig": {
"--node-monitor-grace-period": "5m",
"--pod-eviction-timeout": "5m",
"--route-reconciliation-period": "1m"
},
"apiServerConfig": {
"--service-account-api-audiences": "api,istio-ca",
"--service-account-issuer": "kubernetes.default.svc",
"--service-account-signing-key-file": "/etc/kubernetes/certs/apiserver.key"
}
}
},
"agentPoolProfiles": [
{
"name": "linuxpool",
"count": 3,
"count": 4,
"vmSize": "Standard_F16",
"distro": "aks-ubuntu-16.04",
"availabilityProfile": "AvailabilitySet",
@@ -241,11 +265,11 @@ see details in a separate page, [Installing aks-engine](installing_aks-engine.md).
Download `aks-engine` installation script:

$ curl -o get-akse.sh https://raw.githubusercontent.com/Azure/aks-engine/master/scripts/get-akse.sh
$ chmod 755 get-akse.sh

Run the installer, specifying its version:

$ ./get-akse.sh --version v0.55.4

If you have problems, please refer to the official page: [Install the AKS engine on Linux in Azure Stack](https://docs.microsoft.com/en-us/azure-stack/user/azure-stack-kubernetes-aks-engine-deploy-linux).

@@ -257,7 +281,7 @@ does have the connection, and uncompress it on the machine where you plan using
Verify `aks-engine` version:

$ aks-engine version
Version: v0.55.4
GitCommit: 8928a4094
GitTreeState: clean

@@ -344,7 +368,8 @@ environment.

For this demo we will substitute `azurefile` with our own locally-mounted network storage.

Follow the steps in [Installing Storage](../01-Jupyter/installing_storage.md)
to create a Persistent Volume Claim
that you could use in your Kubernetes deployments.

For simplicity, we create a Samba server, but you are welcome to use nfs
7 changes: 5 additions & 2 deletions Research/kubeflow-on-azure-stack-lab/01-Jupyter/Readme.md
@@ -27,11 +27,14 @@ ML/NN and AI more broadly, are mathematical concepts and could be implemented in
and frameworks. In this lab we will mostly use Python, but you are free to pick whatever you are comfortable
with - many of the deployment options are language-agnostic as long as APIs are satisfied.

## (Optional) Tensorboard access

Tensorboard is another useful tool for monitoring ML applications that support it. We provide a sample
file, `tensorboard.yaml`, to start it in your Kubernetes cluster.

**Pre-requisite**: You need a persistent volume. Follow the steps in [Installing Storage](installing_storage.md) to create a Persistent Volume Claim
that you can use in your Kubernetes deployments.

To start Tensorboard, deploy it using `kubectl`, and check that the pod is up:

$ kubectl create -f tensorboard.yaml
@@ -56,7 +59,7 @@ Now you can access the port you forward from your Kubernetes environment:

## Tensorboard deployment

Here is how you would connect your Tensorboard with the persistence:

$ cat tb.yaml
apiVersion: extensions/v1beta1
Expand Down
@@ -20,6 +20,8 @@ to other available options on the cluster you are using.
## Creating smb clients

Each node of our Kubernetes cluster has to have a Samba client to access our Samba server.
Make sure network ports 137-139,445 are accessible on all nodes of your cluster.

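A quick connectivity probe you can run from any node (`SAMBA_HOST` is a placeholder; set it to your Samba server's address). Note this only checks the TCP ports; 137-138 are UDP (NetBIOS name/datagram service) and cannot be probed this way:

```shell
# TCP probe of the SMB ports; "closed or filtered" means this machine
# cannot reach the server on that port.
SAMBA_HOST="${SAMBA_HOST:-127.0.0.1}"
for port in 139 445; do
  if timeout 2 bash -c "exec 3<>/dev/tcp/$SAMBA_HOST/$port" 2>/dev/null; then
    echo "port $port: open"
  else
    echo "port $port: closed or filtered"
  fi
done
```
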
You need to repeat the following on every VM in your Kubernetes cluster (you can get their
local IPs from the portal and ssh to them from the master node):

@@ -4,6 +4,9 @@ Tensorboard is an application that helps visualize data. It was built to visualize data from
TensorFlow, but could be used more broadly. For example, in our tutorial we demo
how to use it for TensorFlow and Pytorch.

**Pre-requisite**: You need a persistent volume. Follow the steps in [Installing Storage](installing_storage.md) to create a Persistent Volume Claim
that you can use in your Kubernetes deployments.

We could use a generic Tensorboard deployment; see `tb_generic.yaml`:

$ kubectl create -f tb_generic.yaml