Add Beeline JDBC parameters to HiveCliHook #68144

Open
Vamsi-klu wants to merge 2 commits into apache:main from Vamsi-klu:codex/hive-jdbc-params-45049

Conversation

@Vamsi-klu
Contributor

Add support for passing validated Beeline JDBC URL parameters through HiveCliHook and HiveOperator.

closes: #45049

Why

Some Hive Beeline deployments require JDBC URL parameters such as transportMode, trust store settings, or other driver-specific options. Previously, Airflow's Hive CLI hook did not provide a safe, Dag-author-controlled way to append these parameters to the generated Beeline JDBC URL.

This change intentionally avoids accepting arbitrary JDBC parameters from connection extras. Connection extras are managed through the Airflow connection UI and can be shared or reused across many Dags, so using them as a free-form JDBC parameter bag would make the blast radius larger and harder to reason about.

What Changed

  • Added a jdbc_params argument to HiveCliHook.
  • Added a matching jdbc_params argument to HiveOperator, passed through to the hook.
  • Appended validated jdbc_params to the generated Beeline JDBC URL.
  • Added a bounded connection extra, transport_mode, which maps to JDBC transportMode and only accepts binary or http.
  • Rejected unsafe JDBC parameter names and values:
    • names must start with a letter and contain only letters, digits, dots, underscores, or hyphens
    • values cannot be None
    • values cannot contain ;
  • Documented the new hook/operator argument and the limited connection-extra behavior.
  • Updated the Hive provider changelog.
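
The validation rules listed above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function name `validate_jdbc_params` and the pattern constant are hypothetical, and the name pattern reflects only the rule as described in this comment.

```python
import re

# Hypothetical pattern mirroring the stated rule: a leading letter followed by
# letters, digits, dots, underscores, or hyphens.
_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9._-]*")


def validate_jdbc_params(params):
    """Return a validated copy of ``params`` with string values, or raise ValueError."""
    validated = {}
    for name, value in params.items():
        # fullmatch ensures the whole name obeys the pattern, not just a prefix.
        if not isinstance(name, str) or not _NAME_PATTERN.fullmatch(name):
            raise ValueError(f"Invalid JDBC parameter name: {name!r}")
        if value is None:
            raise ValueError(f"JDBC parameter {name!r} has no value")
        text = str(value)
        # ';' separates parameters in the URL, so it must not appear in a value.
        if ";" in text:
            raise ValueError(f"JDBC parameter {name!r} must not contain ';'")
        validated[name] = text
    return validated
```

Rejecting `;` in values is what keeps one parameter from smuggling additional parameters into the URL.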

Impact

Dag authors can now configure Beeline JDBC URL parameters directly in code when a deployment needs driver-specific settings.

This matters because it allows affected Hive deployments to connect through Beeline without forcing unsafe, arbitrary JDBC URL parameter injection through connection extras. The change keeps connection-level configuration bounded while still giving Dag authors the flexibility needed for per-Dag connection behavior.

Existing behavior is preserved for:

  • Kerberos principal handling
  • proxy user handling
  • auth parameter handling
  • high-availability URL construction
  • login/password command arguments
  • existing Beeline URLs without additional JDBC parameters

Why This Solves the Problem

The issue requires a way to add JDBC parameters to Beeline URLs. This implementation appends those parameters at the point where HiveCliHook builds the Beeline JDBC URL, so the final command contains the required JDBC suffix.
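
As a rough sketch of what "appending a JDBC suffix" means, the snippet below joins already-validated parameters onto a base URL as `;name=value` pairs. The helper name `append_jdbc_params`, the example URL, and the `httpPath` value are assumptions for illustration; the hook's real URL assembly also handles Kerberos principals, proxy users, and high availability, which are not shown here.

```python
def append_jdbc_params(base_url, params):
    """Append validated parameters to a Beeline JDBC URL as ';name=value' pairs.

    Hypothetical helper: assumes names and values were validated beforehand.
    """
    suffix = "".join(f";{name}={value}" for name, value in params.items())
    return base_url + suffix


url = append_jdbc_params(
    "jdbc:hive2://localhost:10000/default",
    {"transportMode": "http", "httpPath": "cliservice"},
)
print(url)
# jdbc:hive2://localhost:10000/default;transportMode=http;httpPath=cliservice
```

An empty mapping leaves the URL unchanged, which matches the stated goal of preserving existing Beeline URLs that carry no additional JDBC parameters.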

At the same time, the implementation avoids the previously rejected approach of allowing arbitrary free-form JDBC URL parameters through connection UI extras. Free-form parameters are only accepted from the hook/operator constructor, where they are controlled by the Dag author and validated before use.

Testing

  • uv run ruff check providers/apache/hive/src/airflow/providers/apache/hive/hooks/hive.py providers/apache/hive/src/airflow/providers/apache/hive/operators/hive.py providers/apache/hive/tests/unit/apache/hive/hooks/test_hive.py providers/apache/hive/tests/unit/apache/hive/operators/test_hive.py
  • uv run ruff format --check providers/apache/hive/src/airflow/providers/apache/hive/hooks/hive.py providers/apache/hive/src/airflow/providers/apache/hive/operators/hive.py providers/apache/hive/tests/unit/apache/hive/hooks/test_hive.py providers/apache/hive/tests/unit/apache/hive/operators/test_hive.py
  • uv run --project providers/apache/hive mypy providers/apache/hive/src/airflow/providers/apache/hive/hooks/hive.py providers/apache/hive/src/airflow/providers/apache/hive/operators/hive.py
  • AIRFLOW_CONN_HIVE_CLI_DEFAULT='hive-cli://localhost:10000/default?use_beeline=True' uv run --project providers/apache/hive pytest providers/apache/hive/tests/unit/apache/hive/hooks/test_hive.py::TestHiveCliHookJdbcParams providers/apache/hive/tests/unit/apache/hive/operators/test_hive.py::TestHiveOperatorJdbcParams::test_hive_operator_passes_jdbc_params_to_hook providers/apache/hive/tests/unit/apache/hive/hooks/test_hive.py::TestHiveCliHook::test_run_cli providers/apache/hive/tests/unit/apache/hive/hooks/test_hive.py::TestHiveCli::test_get_proxy_user_value providers/apache/hive/tests/unit/apache/hive/hooks/test_hive.py::TestHiveCli::test_get_wrong_principal providers/apache/hive/tests/unit/apache/hive/hooks/test_hive.py::TestHiveCli::test_high_availability -xvs --without-db-init --no-db-cleanup
  • prek run --from-ref upstream/main --stage pre-commit passed before the final rebase. After the final rebase, the Hive-relevant hooks passed, but the run later failed while creating a devel-common mypy environment because the local external-volume/macOS environment created an AppleDouble ._ruff file inside the wheel install.
  • prek run --from-ref upstream/main --stage manual was attempted after the final rebase; provider mypy could not run because local Breeze requires Docker and docker is not installed in this environment.

Was generative AI tooling used to co-author this PR?
  • Yes — Codex (GPT-5)

Generated-by: Codex (GPT-5) following the guidelines

@Vamsi-klu Vamsi-klu force-pushed the codex/hive-jdbc-params-45049 branch 2 times, most recently from 7d67e14 to 578f709 Compare June 6, 2026 21:16
@Vamsi-klu Vamsi-klu marked this pull request as ready for review June 6, 2026 21:17
@Vamsi-klu
Contributor Author

@potiuk @eladkal @amoghrajesh could you please take a look when you have a chance?

This PR adds a Dag-author-controlled way to pass validated Beeline JDBC parameters for the Hive CLI hook/operator, while keeping arbitrary JDBC params out of connection extras. The fixed connection extra support is limited to bounded values such as transport_mode, and free-form params are only accepted from hook/operator initialization with validation for unsafe names and delimiters.

Thanks.


Drafted-by: Codex (GPT-5); reviewed by @Vamsi-klu before posting

@Vamsi-klu Vamsi-klu force-pushed the codex/hive-jdbc-params-45049 branch from baef5c9 to 578f709 Compare June 7, 2026 00:44
@Vamsi-klu Vamsi-klu force-pushed the codex/hive-jdbc-params-45049 branch from 578f709 to ca4d369 Compare June 7, 2026 00:52
Contributor

@Nataneljpwd Nataneljpwd left a comment

Looks good overall

Comment thread providers/apache/hive/docs/connections/hive_cli.rst Outdated
Comment thread providers/apache/hive/docs/connections/hive_cli.rst Outdated
Comment thread providers/apache/hive/docs/connections/hive_cli.rst Outdated
Comment thread providers/apache/hive/src/airflow/providers/apache/hive/hooks/hive.py Outdated
Comment thread providers/apache/hive/src/airflow/providers/apache/hive/hooks/hive.py Outdated
Comment thread providers/apache/hive/src/airflow/providers/apache/hive/hooks/hive.py Outdated
Comment thread providers/apache/hive/src/airflow/providers/apache/hive/hooks/hive.py Outdated
Comment thread providers/apache/hive/src/airflow/providers/apache/hive/hooks/hive.py Outdated
Comment thread providers/apache/hive/src/airflow/providers/apache/hive/hooks/hive.py Outdated
Comment thread providers/apache/hive/tests/unit/apache/hive/hooks/test_hive.py Outdated
Remove the transport_mode connection extra and form widget so transportMode
is supplied like any other parameter via jdbc_params, the single validated
injection point. Consolidate the six JDBC-param helpers into one
_append_jdbc_params method that validates name and value in a single pass,
and tighten the parameter-name regex to reject names ending in a separator.
Update the connection docs accordingly and drop the manual changelog entry
(provider changelogs are regenerated from git log by the release manager).
@Vamsi-klu
Contributor Author

Vamsi-klu commented Jun 7, 2026

Thanks for the thorough review, @Nataneljpwd! Pushed a follow-up commit addressing everything:

  • Removed the transport_mode connection-extra special-casing — the HIVE_CLI_TRANSPORT_MODES constant, the form widget, and _get_connection_jdbc_url_parameters. transportMode is now passed like any other param via jdbc_params, so there's a single sanctioned, validated injection point (resolves the lines 53/152/239/240 threads).
  • Consolidated the six helpers into one _append_jdbc_params that validates name and value inline in a single pass (lines 229/256/265).
  • Tightened the name regex to ^[A-Za-z]([A-Za-z0-9._-]*[A-Za-z0-9])?$ so names can't end in ., -, or _, with tests for those cases (line 54).
  • Docs: removed the Transport Mode entry and the "fixed connection extras" paragraph, and added the regex (lines 84/106/108).
  • The valid-case behaviour is asserted by test_jdbc_params_append_to_beeline_url and friends (line 713).
  • Also dropped the manual changelog.rst entry, since provider changelogs are regenerated from git log by the release manager.
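
For readers who want to see the effect of the tightened name regex quoted above, here is a small, self-contained check. The helper name `is_valid_param_name` is hypothetical, and using `fullmatch` is a choice made for this sketch (it avoids `$` accepting a trailing newline); the PR's own method may apply the pattern differently.

```python
import re

# Regex as quoted in the comment above: names start with a letter and, if
# longer than one character, must end in a letter or digit.
NAME_RE = re.compile(r"^[A-Za-z]([A-Za-z0-9._-]*[A-Za-z0-9])?$")


def is_valid_param_name(name):
    """Return True if ``name`` matches the tightened parameter-name rule."""
    return NAME_RE.fullmatch(name) is not None


for candidate in ("transportMode", "ssl.trustStore", "a", "name.", "name-", "name_", "1abc"):
    print(f"{candidate!r}: {is_valid_param_name(candidate)}")
```

The first three candidates are accepted; the names ending in a separator and the one starting with a digit are rejected.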

On the use-case question (line 273): jdbc_params is the only place a Dag author can inject Cloudera-CDP-style JDBC params, since the Beeline URL is assembled programmatically and there's no author-editable raw-URL field. There is more detail in that thread.

Development

Successfully merging this pull request may close these issues.

Add support for extra JDBC parameters for Hive Client Wrapper in apache-hive provider

2 participants