Add Beeline JDBC parameters to HiveCliHook #68144
@potiuk @eladkal @amoghrajesh could you please take a look when you have a chance? This PR adds a Dag-author-controlled way to pass validated Beeline JDBC parameters for the Hive CLI hook/operator, while keeping arbitrary JDBC params out of connection extras. The fixed connection extra support is limited to bounded values such as
Thanks.
Drafted-by: Codex (GPT-5); reviewed by @Vamsi-klu before posting
Remove the transport_mode connection extra and form widget so transportMode is supplied like any other parameter via jdbc_params, the single validated injection point. Consolidate the six JDBC-param helpers into one _append_jdbc_params method that validates name and value in a single pass, and tighten the parameter-name regex to reject names ending in a separator. Update the connection docs accordingly and drop the manual changelog entry (provider changelogs are regenerated from git log by the release manager).
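A minimal sketch of what a single-pass helper of this kind could look like. The function name append_jdbc_params, both regular expressions, and the error messages are assumptions made for illustration and are not taken from the PR; only the overall behavior (name and value checked together in one loop, names ending in a separator rejected) follows the commit message above.

```python
from __future__ import annotations

import re

# Hypothetical name pattern: starts with a letter, may contain '.', '_' or '-'
# as separators, and must end with a letter or digit (so "foo." is rejected).
_NAME_RE = re.compile(r"[A-Za-z](?:[A-Za-z0-9._-]*[A-Za-z0-9])?")

# Hypothetical value pattern: non-empty, and free of characters that would
# start a new URL segment or be interpreted by a shell.
_VALUE_RE = re.compile(r"[^;=&?#\s\"'`$\\]+")


def append_jdbc_params(jdbc_url: str, jdbc_params: dict[str, str] | None) -> str:
    """Validate each name/value pair once and append it as ';name=value'."""
    if not jdbc_params:
        return jdbc_url
    parts = []
    for name, value in jdbc_params.items():
        if not isinstance(name, str) or not _NAME_RE.fullmatch(name):
            raise ValueError(f"Invalid JDBC parameter name: {name!r}")
        if not isinstance(value, str) or not _VALUE_RE.fullmatch(value):
            raise ValueError(f"Invalid value for JDBC parameter {name!r}")
        parts.append(f"{name}={value}")
    return jdbc_url + ";" + ";".join(parts)
```

With this sketch, transportMode is handled like any other entry, for example append_jdbc_params(url, {"transportMode": "http"}), and no separate connection extra is needed.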
Thanks for the thorough review, @Nataneljpwd! Pushed a follow-up commit addressing everything:
On the use-case question (line 273):
Add support for passing validated Beeline JDBC URL parameters through HiveCliHook and HiveOperator.
closes: #45049
Why
Some Hive Beeline deployments require JDBC URL parameters such as transportMode, trust store settings, or other driver-specific options. Previously, Airflow's Hive CLI hook did not provide a safe, Dag-author-controlled way to append these parameters to the generated Beeline JDBC URL.
This change intentionally avoids accepting arbitrary JDBC parameters from connection extras. Connection extras are managed through the Airflow connection UI and can be shared or reused across many Dags, so using them as a free-form JDBC parameter bag would make the blast radius larger and harder to reason about.
What Changed
jdbc_params argument to HiveCliHook.
jdbc_params argument to HiveOperator, passed through to the hook.
jdbc_params to the generated Beeline JDBC URL.
transport_mode, which maps to JDBC transportMode and only accepts binary or http.
None;
Impact
Dag authors can now configure Beeline JDBC URL parameters directly in code when a deployment needs driver-specific settings.
This matters because it allows affected Hive deployments to connect through Beeline without forcing unsafe, arbitrary JDBC URL parameter injection through connection extras. The change keeps connection-level configuration bounded while still giving Dag authors the flexibility needed for per-Dag connection behavior.
Existing behavior is preserved for:
Why This Solves the Problem
The issue requires a way to add JDBC parameters to Beeline URLs. This implementation appends those parameters at the point where HiveCliHook builds the Beeline JDBC URL, so the final command contains the required JDBC suffix.
At the same time, the implementation avoids the previously rejected approach of allowing arbitrary free-form JDBC URL parameters through connection UI extras. Free-form parameters are only accepted from the hook/operator constructor, where they are controlled by the Dag author and validated before use.
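The flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the hook's actual code: build_beeline_cmd and its arguments are hypothetical names, validation is assumed to have already happened, and the real hook adds further URL parts and flags that are omitted here.

```python
from __future__ import annotations


def build_beeline_cmd(
    host: str,
    port: int,
    schema: str,
    conn_extras: dict[str, object] | None = None,
    jdbc_params: dict[str, str] | None = None,
) -> list[str]:
    """Build a simplified Beeline command line (hypothetical helper).

    conn_extras stands in for connection extras from the UI. It is accepted
    but deliberately never read for free-form JDBC parameters; only the
    constructor-level jdbc_params (assumed already validated) reach the URL.
    """
    jdbc_url = f"jdbc:hive2://{host}:{port}/{schema}"
    for name, value in (jdbc_params or {}).items():
        # Each parameter becomes a ';name=value' segment of the URL suffix.
        jdbc_url += f";{name}={value}"
    return ["beeline", "-u", jdbc_url]


cmd = build_beeline_cmd(
    "localhost",
    10000,
    "default",
    conn_extras={"jdbc_params": {"ignored": "yes"}},
    jdbc_params={"transportMode": "http", "httpPath": "cliservice"},
)
print(" ".join(cmd))
```

Keeping the extras out of the loop is the design point: a parameter bag stored on a shared connection never influences the URL, while a Dag author can still opt in per task.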
Testing
uv run ruff check providers/apache/hive/src/airflow/providers/apache/hive/hooks/hive.py providers/apache/hive/src/airflow/providers/apache/hive/operators/hive.py providers/apache/hive/tests/unit/apache/hive/hooks/test_hive.py providers/apache/hive/tests/unit/apache/hive/operators/test_hive.py
uv run ruff format --check providers/apache/hive/src/airflow/providers/apache/hive/hooks/hive.py providers/apache/hive/src/airflow/providers/apache/hive/operators/hive.py providers/apache/hive/tests/unit/apache/hive/hooks/test_hive.py providers/apache/hive/tests/unit/apache/hive/operators/test_hive.py
uv run --project providers/apache/hive mypy providers/apache/hive/src/airflow/providers/apache/hive/hooks/hive.py providers/apache/hive/src/airflow/providers/apache/hive/operators/hive.py
AIRFLOW_CONN_HIVE_CLI_DEFAULT='hive-cli://localhost:10000/default?use_beeline=True' uv run --project providers/apache/hive pytest providers/apache/hive/tests/unit/apache/hive/hooks/test_hive.py::TestHiveCliHookJdbcParams providers/apache/hive/tests/unit/apache/hive/operators/test_hive.py::TestHiveOperatorJdbcParams::test_hive_operator_passes_jdbc_params_to_hook providers/apache/hive/tests/unit/apache/hive/hooks/test_hive.py::TestHiveCliHook::test_run_cli providers/apache/hive/tests/unit/apache/hive/hooks/test_hive.py::TestHiveCli::test_get_proxy_user_value providers/apache/hive/tests/unit/apache/hive/hooks/test_hive.py::TestHiveCli::test_get_wrong_principal providers/apache/hive/tests/unit/apache/hive/hooks/test_hive.py::TestHiveCli::test_high_availability -xvs --without-db-init --no-db-cleanup
prek run --from-ref upstream/main --stage pre-commit passed before the final rebase. After the final rebase, the Hive-relevant hooks passed, but the run later failed while creating a devel-common mypy environment because the local external-volume/macOS environment created an AppleDouble ._ruff file inside the wheel install.
prek run --from-ref upstream/main --stage manual was attempted after the final rebase; provider mypy could not run because local Breeze requires Docker and docker is not installed in this environment.
Was generative AI tooling used to co-author this PR?
Generated-by: Codex (GPT-5) following the guidelines