
New Container: DatabricksContainer for Databricks SQL Connector #981

@adendek


What is the new container you'd like to have?

I propose the addition of a DatabricksContainer module to facilitate testing applications that use the databricks-sql-connector. This would allow developers to run integration and end-to-end tests against a mock Databricks SQL API in an isolated and reproducible manner.

Unlike many services, Databricks does not provide an official local emulator in a Docker container. Therefore, this module would be designed to work with a user-provided Docker image that runs a mock server emulating the Databricks SQL API.

The benefits of having this dedicated container module include:

  • Isolated Testing: Enables hermetic tests without relying on shared, live Databricks workspaces.
  • CI/CD Integration: Simplifies running automated tests in CI/CD pipelines without complex credential management or network configurations.
  • Developer Experience: Provides a simple, Pythonic interface consistent with other testcontainers modules like AzuriteContainer or PostgresContainer.
  • Reliability: Reduces test flakiness caused by network issues or by changes in shared development environments.

Why not just use a generic container for this?

While it is possible to use a generic DockerContainer("my-databricks-mock:latest"), a dedicated DatabricksContainer module would abstract away significant complexity related to configuration and readiness checks.

  1. Complicated Setup and Configuration:

The databricks-sql-connector requires specific connection parameters: server_hostname, http_path, and access_token. A user of DockerContainer would have to manually:

  • Look up the container's host address and its dynamically mapped port.
  • Correctly format the server_hostname and http_path.
  • Know which token the mock server expects.

This process is cumbersome and error-prone.

Generic DockerContainer approach:

from databricks import sql
from testcontainers.core.container import DockerContainer

with DockerContainer("my-databricks-mock:latest").with_exposed_ports(8080) as mock_container:
    host = mock_container.get_container_host_ip()
    port = mock_container.get_exposed_port(8080)
    
    # User must manually construct connection parameters
    connection = sql.connect(
        server_hostname=host,
        http_path=f"/sql/1.0/warehouses/{port}", # Path might be complex and mock-specific
        access_token="dummy-token"
    )
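Beyond assembling the parameters, the generic snippet also has to handle readiness: by default, DockerContainer's context manager returns once the container has started, which is not necessarily when the mock server inside it accepts requests. Below is a minimal, standard-library-only sketch of the kind of wait a dedicated module could hide. The function name wait_until_http_ready is hypothetical, and the sketch assumes the mock speaks plain HTTP on the mapped port. It treats any HTTP response, even a 401 or 404, as "up", because a health-check endpoint (if the mock has one) is image-specific. testcontainers-python also ships its own wait helpers, such as wait_for_logs in testcontainers.core.waiting_utils.

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_until_http_ready(url: str, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll `url` until the server sends any HTTP response or `timeout` expires.

    Returns True as soon as the server answers (any status code), False on timeout.
    """
    # Ignore proxy environment variables: the container is reached directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout
    while True:
        try:
            with opener.open(url, timeout=max(interval, 1.0)):
                return True  # 2xx/3xx: the server is accepting requests
        except urllib.error.HTTPError:
            return True  # 4xx/5xx still means an HTTP server answered
        except (OSError, http.client.HTTPException):
            pass  # refused, reset, or timed out: the mock is not up yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Inside the generic with block, a call such as wait_until_http_ready(f"http://{host}:{port}/") before sql.connect(...) would make the test wait for the mock instead of failing on a half-started container.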

A dedicated DatabricksContainer would provide helper methods to abstract this away, offering a much cleaner interface.

Proposed DatabricksContainer approach:

from databricks import sql
# Proposed module; it does not exist in testcontainers yet.
# from testcontainers.databricks import DatabricksContainer

with DatabricksContainer("my-databricks-mock:latest") as databricks_container:
    # Clean, abstracted methods
    connection = sql.connect(
        server_hostname=databricks_container.get_server_hostname(),
        http_path=databricks_container.get_http_path(),
        access_token=databricks_container.get_token()
    )
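The proposed helper methods could be thin wrappers over what DockerContainer already exposes (get_container_host_ip() and get_exposed_port()). The sketch below assumes a mock image listening on port 8080, a hypothetical warehouse ID, and a fixed dummy token; none of these come from an existing image. It wraps an already-started container rather than subclassing DockerContainer, so the logic can run without Docker; an actual module would more likely subclass it. The proposal does not say how the mapped port reaches the connector, so this sketch exposes it separately through a hypothetical get_port().

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Protocol, Union


class ContainerLike(Protocol):
    """The two DockerContainer methods this sketch relies on."""

    def get_container_host_ip(self) -> str: ...

    def get_exposed_port(self, port: int) -> Union[int, str]: ...


@dataclass
class DatabricksConnectionHelper:
    """Hypothetical helper deriving connector parameters from a started container."""

    container: ContainerLike
    internal_port: int = 8080  # assumption: port the mock listens on inside the container
    warehouse_id: str = "test-warehouse"  # hypothetical warehouse ID the mock accepts
    token: str = "dummy-token"  # assumption: the mock accepts this fixed token

    def get_server_hostname(self) -> str:
        # Host address the test process uses to reach the container.
        return self.container.get_container_host_ip()

    def get_port(self) -> int:
        # Host port Docker mapped to internal_port (returned as str by some versions).
        return int(self.container.get_exposed_port(self.internal_port))

    def get_http_path(self) -> str:
        # Same shape as a Databricks SQL warehouse HTTP path.
        return f"/sql/1.0/warehouses/{self.warehouse_id}"

    def get_token(self) -> str:
        return self.token


if __name__ == "__main__":
    # Offline demo: a stand-in object replaces a started DockerContainer.
    fake = SimpleNamespace(
        get_container_host_ip=lambda: "localhost",
        get_exposed_port=lambda port: "49153",
    )
    helper = DatabricksConnectionHelper(fake)
    print(helper.get_server_hostname(), helper.get_port(), helper.get_http_path())
```

In the generic example, DatabricksConnectionHelper(mock_container) inside its with block would supply the same values through a single object, which is the convenience a dedicated DatabricksContainer would build in.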
