[WIP] Training: Initial Documentation for Kubeflow Trainer V2 #3958
base: master
Changes from 13 commits

@@ -0,0 +1,5 @@
+++
title = "Kubeflow Trainer"
description = "Documentation for Kubeflow Trainer"
weight = 20
+++

@@ -0,0 +1,7 @@
+++
title = "Contributor Guides"
description = "Documentation for Kubeflow Trainer contributors"
weight = 60
+++

This doc is in progress...

@@ -0,0 +1,5 @@
+++
title = "Community Guide"
description = "How to get involved in the Kubeflow Trainer community"
weight = 20
+++

@@ -0,0 +1,7 @@
+++
title = "Contributing Guide"
description = "How to contribute to the Kubeflow Trainer project"
weight = 10
+++

This doc is in progress...

Review comment: This is under discussion, given that other components do not have this content on the website. We want to ensure this is consistent across the website.

@@ -0,0 +1,29 @@
+++
title = "Getting Started"
description = "Get started with Kubeflow Trainer"
weight = 30
+++

This guide describes how to get started with Kubeflow Trainer and run distributed training
with PyTorch.

## Prerequisites

Ensure that you have access to a Kubernetes cluster with the Kubeflow Trainer
control plane installed. If it is not set up yet, follow
[the installation guide](/docs/components/trainer/operator-guides/installation) to quickly deploy
Kubeflow Trainer on your local Kind cluster.

Review comment (suggested change): It may be better to say simply "quickly deploy Kubeflow Trainer"; the linked guide gives the specifics.

### Installing the Kubeflow Python SDK

Install the latest Kubeflow Python SDK version directly from the source repository:

```bash
pip install "git+https://github.com/kubeflow/training-operator.git@master#subdirectory=sdk_v2"
```

TODO (andreyvelich): Add the install command once we release the SDK to PyPI: https://pypi.org/project/kubeflow
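
To confirm that the install worked, the following Python sketch checks whether a top-level `kubeflow` module can be imported and, if so, reports the installed distribution version. Both default names (module `kubeflow`, distribution `kubeflow`) are assumptions based on the PyPI project referenced in the TODO above, and `sdk_status` is a hypothetical helper. Any other installed package that provides a `kubeflow` module would also pass the import check, so treat this as a rough sanity check only.

```python
import importlib.metadata
import importlib.util


def sdk_status(module_name: str = "kubeflow", dist_name: str = "kubeflow") -> str:
    """Report whether `module_name` is importable and which version of `dist_name` is installed.

    The default names are assumptions about how the sdk_v2 package is published.
    """
    # find_spec returns None when no top-level module with this name can be found.
    if importlib.util.find_spec(module_name) is None:
        return f"{module_name}: not installed"
    try:
        version = importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        # Importable, but no installed distribution metadata under `dist_name`.
        version = "unknown version"
    return f"{module_name}: {version}"


if __name__ == "__main__":
    print(sdk_status())
```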

## Getting Started with PyTorch

TODO (andreyvelich): Add an example from the notebook

Review comment: What about ..., or just removing the section until it is fully ready?

Reply: I will add the getting started example once we finish this PR with @astefanutti: kubeflow/training-operator#2387

@@ -0,0 +1,12 @@
+++
title = "Legacy Kubeflow Training Operator (v1)"
description = "Kubeflow Training Operator V1 Documentation"
weight = 999
+++

{{% alert title="Old Version" color="warning" %}}
This page is about **Kubeflow Training Operator V1**. For the latest information, check
[the Kubeflow Trainer V2 documentation](/docs/components/trainer).

Follow [this guide for migrating to Kubeflow Trainer V2](/docs/components/trainer/operator-guides/migration).
{{% /alert %}}

Review comment: My two cents: given that the component's name changed, should we say V2, or just Kubeflow Trainer?

Reply: Maybe we could just say:

Review comment: In installing the control plane: https://v1-9-branch.kubeflow.org/docs/started/installing-kubeflow/

Reply: Actually, the latest version of Kubeflow Platform 1.10 will also include Training Operator v1.

Review comment: In the Next Steps section, the link points to the latest Getting Started guide instead of the v1 one: "Run your first Training Operator Job by following the Getting Started guide."

@@ -5,13 +5,13 @@ weight = 10
+++

This page shows how Training Operator implements the
-[API to fine-tune LLMs](/docs/components/training/user-guides/fine-tuning).
+[API to fine-tune LLMs](/docs/components/trainer/legacy-v1/user-guides/fine-tuning).

## Architecture

In the following diagram you can see how `train` Python API works:

-<img src="/docs/components/training/images/fine-tune-llm-api.drawio.svg"
+<img src="/docs/components/trainer/legacy-v1/images/fine-tune-llm-api.drawio.svg"
alt="Fine-Tune API for LLMs"
class="mt-3 mb-3">

Review comment: It would be great if we could update the links to the Kubernetes documentation to point to the Kubernetes version supported by legacy V1, e.g. https://v1-28.docs.kubernetes.io/docs/concepts/storage/persistent-volumes/ for the ReadOnlyMany access mode.

Reply: Do we really need to do it here?

Reply: I created an issue to reflect the conversation we had and made a few updates; feel free to make any suggestions: #3971. The main idea is not to have individual pages for each project on the website, but to keep one centralized place on the website with links to the Git repos.