diff --git a/website/docs/docs/build/python-models.md b/website/docs/docs/build/python-models.md index 811379a0d2c..09ed7a1c881 100644 --- a/website/docs/docs/build/python-models.md +++ b/website/docs/docs/build/python-models.md @@ -763,13 +763,38 @@ storage.objects.create storage.objects.delete ``` -**Installing packages:** If you are using a Dataproc Cluster (as opposed to Dataproc Serverless), you can add third-party packages while creating the cluster. +**Installing packages:** + +Installation of third-party packages on Dataproc varies depending on whether it's a [cluster](https://cloud.google.com/dataproc/docs/guides/create-cluster) or [serverless](https://cloud.google.com/dataproc-serverless/docs). + +- **Dataproc Cluster** — Google recommends installing Python packages while creating the cluster via initialization actions: + - [How initialization actions are used](https://github.com/GoogleCloudDataproc/initialization-actions/blob/master/README.md#how-initialization-actions-are-used) + - [Actions for installing via `pip` or `conda`](https://github.com/GoogleCloudDataproc/initialization-actions/tree/master/python) + + You can also install packages at cluster creation time by [defining cluster properties](https://cloud.google.com/dataproc/docs/tutorials/python-configuration#image_version_20): `dataproc:pip.packages` or `dataproc:conda.packages`. + +- **Dataproc Serverless** — Google recommends using a [custom docker image](https://cloud.google.com/dataproc-serverless/docs/guides/custom-containers) to install thrid-party packages. The image needs to be hosted in [Google Artifact Registry](https://cloud.google.com/artifact-registry/docs). It can then be used by providing the image path in dbt profiles: + + ```yml + my-profile: + target: dev + outputs: + dev: + type: bigquery + method: oauth + project: abc-123 + dataset: my_dataset + + # for dbt Python models to be run on Dataproc Serverless + gcs_bucket: dbt-python + dataproc_region: us-central1 + submission_method: serverless + dataproc_batch: + runtime_config: + container_image: {HOSTNAME}/{PROJECT_ID}/{IMAGE}:{TAG} + ``` -Google recommends installing Python packages on Dataproc clusters via initialization actions: -- [How initialization actions are used](https://github.com/GoogleCloudDataproc/initialization-actions/blob/master/README.md#how-initialization-actions-are-used) -- [Actions for installing via `pip` or `conda`](https://github.com/GoogleCloudDataproc/initialization-actions/tree/master/python) - -You can also install packages at cluster creation time by [defining cluster properties](https://cloud.google.com/dataproc/docs/tutorials/python-configuration#image_version_20): `dataproc:pip.packages` or `dataproc:conda.packages`. +