Add Beeline JDBC parameters to HiveCliHook by Vamsi-klu · Pull Request #68144 · apache/airflow

Vamsi-klu · 2026-06-06T21:15:28Z

Add support for passing validated Beeline JDBC URL parameters through HiveCliHook and HiveOperator.

Why

Some Hive Beeline deployments require JDBC URL parameters such as transportMode, trust store settings, or other driver-specific options. Previously, Airflow's Hive CLI hook did not provide a safe, Dag-author-controlled way to append these parameters to the generated Beeline JDBC URL.

This change intentionally avoids accepting arbitrary JDBC parameters from connection extras. Connection extras are managed through the Airflow connection UI and can be shared or reused across many Dags, so using them as a free-form JDBC parameter bag would make the blast radius larger and harder to reason about.

What Changed

- Added a jdbc_params argument to HiveCliHook.
- Added a matching jdbc_params argument to HiveOperator, passed through to the hook.
- Appended validated jdbc_params to the generated Beeline JDBC URL.
- Added a bounded connection extra, transport_mode, which maps to JDBC transportMode and only accepts binary or http.
- Rejected unsafe JDBC parameter names and values:
  - names must start with a letter and contain only letters, digits, dots, underscores, or hyphens
  - values cannot be None
  - values cannot contain ;
- Documented the new hook/operator argument and the limited connection-extra behavior.
- Updated the Hive provider changelog.
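
The rejection rules listed above can be sketched roughly as follows. This is a minimal illustration rather than the code in this PR: the helper name validate_jdbc_params, the error messages, and the choice to stringify values are all assumptions made for the example.

```python
import re

# Hypothetical pattern mirroring the rule in the description: a leading letter,
# then letters, digits, dots, underscores, or hyphens.
_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9._-]*")


def validate_jdbc_params(params):
    """Return a copy of params with string values, or raise ValueError.

    Hypothetical helper for illustration; not the hook's actual method.
    """
    validated = {}
    for name, value in params.items():
        # fullmatch avoids accepting a name with a trailing newline.
        if not isinstance(name, str) or not _NAME_PATTERN.fullmatch(name):
            raise ValueError(f"Invalid JDBC parameter name: {name!r}")
        if value is None:
            raise ValueError(f"JDBC parameter {name!r} has no value")
        text = str(value)
        # ';' separates parameters in the URL, so it must not appear in a value.
        if ";" in text:
            raise ValueError(f"JDBC parameter {name!r} value must not contain ';'")
        validated[name] = text
    return validated
```

Rejecting the ; delimiter outright, instead of escaping it, keeps a value from introducing an extra parameter into the URL.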

Impact

Dag authors can now configure Beeline JDBC URL parameters directly in code when a deployment needs driver-specific settings.

This matters because it allows affected Hive deployments to connect through Beeline without forcing unsafe, arbitrary JDBC URL parameter injection through connection extras. The change keeps connection-level configuration bounded while still giving Dag authors the flexibility needed for per-Dag connection behavior.

Existing behavior is preserved for:

- Kerberos principal handling
- proxy user handling
- auth parameter handling
- high-availability URL construction
- login/password command arguments
- existing Beeline URLs without additional JDBC parameters

Why This Solves the Problem

The issue requires a way to add JDBC parameters to Beeline URLs. This implementation appends those parameters at the point where HiveCliHook builds the Beeline JDBC URL, so the final command contains the required JDBC suffix.
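
As a rough sketch of what appending the JDBC suffix could look like, assume parameters are added as ;name=value pairs after the database segment, the way Hive JDBC URLs carry session variables. The function name append_jdbc_params and the example values are hypothetical, and validation is assumed to have happened before this step.

```python
def append_jdbc_params(jdbc_url, jdbc_params=None):
    """Append already-validated parameters to a Beeline JDBC URL.

    Hypothetical helper for illustration; the ';name=value' layout is an
    assumption based on the Hive JDBC URL session-variable syntax.
    """
    if not jdbc_params:
        # No extra parameters: the URL is returned unchanged.
        return jdbc_url
    suffix = "".join(f";{name}={value}" for name, value in jdbc_params.items())
    return jdbc_url + suffix


url = append_jdbc_params(
    "jdbc:hive2://localhost:10000/default",
    {"transportMode": "http", "httpPath": "cliservice"},
)
print(url)
```

Returning the URL unchanged when no parameters are given matches the stated goal of preserving existing Beeline URLs.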

At the same time, the implementation avoids the previously rejected approach of allowing arbitrary free-form JDBC URL parameters through connection UI extras. Free-form parameters are only accepted from the hook/operator constructor, where they are controlled by the Dag author and validated before use.

Testing

uv run ruff check providers/apache/hive/src/airflow/providers/apache/hive/hooks/hive.py providers/apache/hive/src/airflow/providers/apache/hive/operators/hive.py providers/apache/hive/tests/unit/apache/hive/hooks/test_hive.py providers/apache/hive/tests/unit/apache/hive/operators/test_hive.py
uv run ruff format --check providers/apache/hive/src/airflow/providers/apache/hive/hooks/hive.py providers/apache/hive/src/airflow/providers/apache/hive/operators/hive.py providers/apache/hive/tests/unit/apache/hive/hooks/test_hive.py providers/apache/hive/tests/unit/apache/hive/operators/test_hive.py
uv run --project providers/apache/hive mypy providers/apache/hive/src/airflow/providers/apache/hive/hooks/hive.py providers/apache/hive/src/airflow/providers/apache/hive/operators/hive.py
AIRFLOW_CONN_HIVE_CLI_DEFAULT='hive-cli://localhost:10000/default?use_beeline=True' uv run --project providers/apache/hive pytest providers/apache/hive/tests/unit/apache/hive/hooks/test_hive.py::TestHiveCliHookJdbcParams providers/apache/hive/tests/unit/apache/hive/operators/test_hive.py::TestHiveOperatorJdbcParams::test_hive_operator_passes_jdbc_params_to_hook providers/apache/hive/tests/unit/apache/hive/hooks/test_hive.py::TestHiveCliHook::test_run_cli providers/apache/hive/tests/unit/apache/hive/hooks/test_hive.py::TestHiveCli::test_get_proxy_user_value providers/apache/hive/tests/unit/apache/hive/hooks/test_hive.py::TestHiveCli::test_get_wrong_principal providers/apache/hive/tests/unit/apache/hive/hooks/test_hive.py::TestHiveCli::test_high_availability -xvs --without-db-init --no-db-cleanup
prek run --from-ref upstream/main --stage pre-commit passed before the final rebase. After the final rebase, the Hive-relevant hooks passed, but the run later failed while creating a devel-common mypy environment because the local external-volume/macOS environment created an AppleDouble ._ruff file inside the wheel install.
prek run --from-ref upstream/main --stage manual was attempted after the final rebase; provider mypy could not run because local Breeze requires Docker and docker is not installed in this environment.

Was generative AI tooling used to co-author this PR?

Yes — Codex (GPT-5)

Generated-by: Codex (GPT-5) following the guidelines

Vamsi-klu · 2026-06-06T21:22:36Z

@potiuk @eladkal @amoghrajesh could you please take a look when you have a chance?

This PR adds a Dag-author-controlled way to pass validated Beeline JDBC parameters for the Hive CLI hook/operator, while keeping arbitrary JDBC params out of connection extras. The fixed connection extra support is limited to bounded values such as transport_mode, and free-form params are only accepted from hook/operator initialization with validation for unsafe names and delimiters.

Thanks.

Drafted-by: Codex (GPT-5); reviewed by @Vamsi-klu before posting

Nataneljpwd

Looks good overall

Remove the transport_mode connection extra and form widget so transportMode is supplied like any other parameter via jdbc_params, the single validated injection point. Consolidate the six JDBC-param helpers into one _append_jdbc_params method that validates name and value in a single pass, and tighten the parameter-name regex to reject names ending in a separator. Update the connection docs accordingly and drop the manual changelog entry (provider changelogs are regenerated from git log by the release manager).

Vamsi-klu · 2026-06-07T19:42:34Z

Thanks for the thorough review, @Nataneljpwd! Pushed a follow-up commit addressing everything:

- Removed the transport_mode connection-extra special-casing: the HIVE_CLI_TRANSPORT_MODES constant, the form widget, and _get_connection_jdbc_url_parameters. transportMode is now passed like any other param via jdbc_params, so there is a single sanctioned, validated injection point (resolves the lines 53/152/239/240 threads).
- Consolidated the six helpers into one _append_jdbc_params that validates name and value inline in a single pass (lines 229/256/265).
- Tightened the name regex to ^[A-Za-z]([A-Za-z0-9._-]*[A-Za-z0-9])?$ so names can't end in ., -, or _, with tests for those cases (line 54).
- Docs: removed the Transport Mode entry and the "fixed connection extras" paragraph, and added the regex (lines 84/106/108).
- The valid-case behaviour is asserted by test_jdbc_params_append_to_beeline_url and friends (line 713).
- Also dropped the manual changelog.rst entry, since provider changelogs are regenerated from git log by the release manager.
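
To illustrate what the tightened name regex accepts and rejects, here is a small check using the pattern quoted above. The helper name is_valid_name and the sample names are made up for the example and are not taken from the PR's tests.

```python
import re

# Pattern quoted in the comment above: a leading letter, and if the name is
# longer than one character, a trailing letter or digit.
NAME_PATTERN = re.compile(r"^[A-Za-z]([A-Za-z0-9._-]*[A-Za-z0-9])?$")


def is_valid_name(name):
    """Hypothetical helper: True if name satisfies the tightened pattern."""
    return NAME_PATTERN.fullmatch(name) is not None


# Sample names chosen for illustration only.
accepted = ["a", "transportMode", "ssl.trustStore", "http-path_v2"]
rejected = ["1abc", "trailing.", "trailing-", "trailing_", "", "has space"]

for name in accepted + rejected:
    print(f"{name!r}: {is_valid_name(name)}")
```

The optional group is what lets a single-letter name pass while still requiring any longer name to end in a letter or digit.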

On the use-case question (line 273): jdbc_params is the only place a Dag author can inject Cloudera-CDP-style JDBC params, since the Beeline URL is assembled programmatically and there is no author-editable raw-URL field. More detail is in that thread.

boring-cyborg bot added the area:providers, kind:documentation, and provider:apache-hive labels on Jun 6, 2026

Vamsi-klu force-pushed the codex/hive-jdbc-params-45049 branch 2 times, most recently from 7d67e14 to 578f709, on June 6, 2026 21:16

Vamsi-klu marked this pull request as ready for review on June 6, 2026 21:17

Vamsi-klu force-pushed the codex/hive-jdbc-params-45049 branch from baef5c9 to 578f709 on June 7, 2026 00:44

Add Beeline JDBC parameters to Hive CLI

ca4d369

Vamsi-klu force-pushed the codex/hive-jdbc-params-45049 branch from 578f709 to ca4d369 on June 7, 2026 00:52

Nataneljpwd reviewed Jun 7, 2026

Vamsi-klu wants to merge 2 commits into apache:main from Vamsi-klu:codex/hive-jdbc-params-45049
