Commit cfe3b3f

[SEA-NodeJS] Kernel backend: mTLS, custom HTTP headers & User-Agent
Wire the SEA/kernel path's remaining TLS-adjacent connection options
through to the napi binding, matching the Python connector's use_kernel
path (session.py + backend/kernel/client.py):

- mTLS client identity: `clientCertPem` / `clientKeyPem` (PEM string or
  Buffer), normalised to Buffers and routed to the kernel
  `TlsConfig::client_cert_pem` / `client_key_pem`. Both-or-neither
  enforced up front with an actionable error.
- Independent hostname-verify toggle: `checkServerCertificateHostname`
  (kernel `skip_hostname_verification`) for full parity with Python's
  `tls_verify_hostname` — skip only the hostname check while still
  validating the chain. The master `checkServerCertificate=false` still
  subsumes it.
- Custom HTTP headers + User-Agent: headers cross the FFI as an ordered
  list (`Array<{name,value}>`, the napi `HeaderEntry` shape matching the
  kernel core `Vec<(String,String)>` and Python's `List[Tuple]`): caller
  `customHeaders` first, then the connector's composed `User-Agent`
  appended last (always emitted; the kernel folds the last User-Agent
  into its base `DatabricksJDBCDriverOSS/...` UA). Kernel-managed
  reserved names `Authorization` / `x-databricks-org-id` are dropped
  before the FFI hop, matching Python's `_KERNEL_MANAGED_HEADERS`
  double-wall.

Adds `buildSeaHttpOptions`, extends `buildSeaTlsOptions`/`SeaTlsOptions`,
and factors PEM normalisation into a shared helper. Bumps KERNEL_REV and
regenerates `native/sea/index.d.ts`.

Unit tests cover mTLS pairing/validation, the hostname toggle, ordered
header pass-through, reserved-name dropping, and User-Agent
composition/ordering; verified the real native binding marshals every
new field across the FFI and rejects a wrong header shape.

Depends on the kernel napi change exposing clientCertPem / clientKeyPem /
customHeaders / checkServerCertificateHostname; KERNEL_REV must be
repointed to that commit once merged.

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
1 parent 03936ec commit cfe3b3f

10 files changed

Lines changed: 497 additions & 44 deletions

KERNEL_REV

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-8bedaabf69f5bce5a957a8775f29dbb8dbdd2e71
+7f8353f39665e7ac0fcc31a052fd2271caba1f67

lib/contracts/InternalConnectionOptions.ts

Lines changed: 34 additions & 1 deletion
@@ -29,16 +29,49 @@ export interface InternalConnectionOptions {
   /**
    * SEA-only: verify the server's TLS certificate. Secure-by-default — omit
    * to keep full chain + hostname verification; set `false` only to opt into
-   * the insecure accept-anything mode.
+   * the insecure accept-anything mode. This is the master verify toggle:
+   * `false` also subsumes the hostname check (see
+   * `checkServerCertificateHostname`). Mirrors the Python connector's
+   * `_tls_no_verify` (inverted).
    * @internal SEA path only.
    */
   checkServerCertificate?: boolean;
 
+  /**
+   * SEA-only: verify that the server certificate matches the host
+   * (hostname-vs-SNI check), independently of full chain validation. Omit
+   * to keep the secure default (on); set `false` to skip only the hostname
+   * check while still validating the chain — e.g. connecting via an IP
+   * literal or a host the cert wasn't issued for. No-op when
+   * `checkServerCertificate` is `false` (that disables everything). Mirrors
+   * the Python connector's `_tls_verify_hostname`.
+   * @internal SEA path only.
+   */
+  checkServerCertificateHostname?: boolean;
+
   /**
    * SEA-only: PEM-encoded CA certificate (string or `Buffer`) added to the
    * trust store on top of the system roots — for TLS-inspecting proxies or
    * on-prem internal CAs. Honoured regardless of `checkServerCertificate`.
    * @internal SEA path only.
    */
   customCaCert?: Buffer | string;
+
+  /**
+   * SEA-only: PEM-encoded client certificate (string or `Buffer`) for
+   * mutual TLS (mTLS). Must be supplied together with `clientKeyPem`; a
+   * leaf cert optionally followed by its intermediate chain is accepted.
+   * Mirrors the Python connector's `_tls_client_cert_file`.
+   * @internal SEA path only.
+   */
+  clientCertPem?: Buffer | string;
+
+  /**
+   * SEA-only: PEM-encoded private key (string or `Buffer`) for the mTLS
+   * client certificate. Must be supplied together with `clientCertPem`.
+   * For portability supply a PKCS#8 key (`BEGIN PRIVATE KEY`). Mirrors the
+   * Python connector's `_tls_client_cert_key_file`.
+   * @internal SEA path only.
+   */
+  clientKeyPem?: Buffer | string;
 }
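
The way the two verify toggles documented in this file combine can be summarised in a few lines. This is only an illustrative sketch: `VerifyToggles`, `VerificationMode` and `effectiveVerification` are hypothetical names that do not exist in the connector, and the real decision is made inside the kernel, not in TypeScript.

```typescript
// Hypothetical illustration of how the two toggles combine; not connector code.
interface VerifyToggles {
  checkServerCertificate?: boolean;
  checkServerCertificateHostname?: boolean;
}

type VerificationMode = 'full' | 'chain-only' | 'none';

function effectiveVerification(opts: VerifyToggles): VerificationMode {
  // Master toggle: `false` disables everything, so the hostname flag is a no-op.
  if (opts.checkServerCertificate === false) {
    return 'none';
  }
  // Hostname toggle: `false` skips only the hostname check; the chain is still validated.
  if (opts.checkServerCertificateHostname === false) {
    return 'chain-only';
  }
  // Omitted values keep the secure default (chain + hostname).
  return 'full';
}
```

Under these assumptions, an options object that sets both flags to `false` behaves the same as one that only sets `checkServerCertificate: false`, which is what "subsumes" means in the doc comments.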

lib/sea/SeaAuth.ts

Lines changed: 183 additions & 33 deletions
@@ -16,6 +16,7 @@ import { ConnectionOptions } from '../contracts/IDBSQLClient';
 import { InternalConnectionOptions } from '../contracts/InternalConnectionOptions';
 import AuthenticationError from '../errors/AuthenticationError';
 import HiveDriverError from '../errors/HiveDriverError';
+import { buildUserAgentString } from '../utils';
 
 /**
  * Default local listener port for the U2M authorization-code callback.
@@ -113,12 +114,54 @@ export interface SeaTlsOptions {
    * `customCaCert` over disabling verification entirely.
    */
   checkServerCertificate?: boolean;
+  /**
+   * Verify the server certificate's hostname (hostname-vs-SNI), independently
+   * of chain validation. Omit ⇒ kernel default (on). `false` skips only the
+   * hostname check. No-op when `checkServerCertificate` is `false`. Mirrors
+   * the kernel napi `checkServerCertificateHostname` / Python
+   * `tls_verify_hostname`.
+   */
+  checkServerCertificateHostname?: boolean;
   /** PEM-encoded CA bytes to add to the trust store. */
   customCaCert?: Buffer;
+  /**
+   * PEM-encoded client certificate for mutual TLS (kernel
+   * `TlsConfig::client_cert_pem`). Paired with {@link clientKeyPem} —
+   * `buildSeaTlsOptions` rejects supplying only one before the FFI hop.
+   * The napi shape takes a `Buffer`; the public surface also accepts a
+   * PEM string, normalised here.
+   */
+  clientCertPem?: Buffer;
+  /**
+   * PEM-encoded private key for the mTLS client certificate (kernel
+   * `TlsConfig::client_key_pem`). Paired with {@link clientCertPem}.
+   */
+  clientKeyPem?: Buffer;
+}
+
+/**
+ * HTTP options shared across all auth-mode variants. Mirrors the napi
+ * binding's `ConnectionOptions.customHeaders` (kernel
+ * `HttpConfig::custom_headers`).
+ *
+ * Carries the extra request headers the SEA path sends on every request:
+ * the caller's `customHeaders` plus the composed `User-Agent` (the kernel
+ * appends a `User-Agent` entry to its base UA rather than replacing it).
+ *
+ * An **ordered list** of `{ name, value }` pairs — the napi shape
+ * (`Array<HeaderEntry>`), which mirrors the kernel core's
+ * `Vec<(String, String)>` and the Python connector's `http_headers`
+ * `List[Tuple[str, str]]`. Order is preserved and duplicate names are
+ * allowed (e.g. a caller `User-Agent` followed by the connector's, which
+ * the kernel folds last-wins).
+ */
+export interface SeaHttpOptions {
+  customHeaders?: Array<{ name: string; value: string }>;
 }
 
 export type SeaNativeConnectionOptions = SeaSessionDefaults &
   SeaTlsOptions &
+  SeaHttpOptions &
   (
     | {
         hostName: string;
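
The `SeaHttpOptions` doc comment above relies on two properties of an ordered header list: duplicates survive, and a consumer can resolve them last-wins. The sketch below illustrates that idea only. `lastHeaderValue` is a hypothetical helper, the header values are made up, and the case-insensitive name match is an assumption; the kernel's actual fold is not part of this commit.

```typescript
// Same shape as the `customHeaders` entries in the diff above.
type HeaderEntry = { name: string; value: string };

// Hypothetical helper: resolve duplicates by taking the last matching entry.
// Assumes header names compare case-insensitively.
function lastHeaderValue(headers: HeaderEntry[], name: string): string | undefined {
  const wanted = name.toLowerCase();
  let found: string | undefined;
  for (const entry of headers) {
    if (entry.name.toLowerCase() === wanted) {
      found = entry.value; // later entries overwrite earlier ones
    }
  }
  return found;
}

// Illustrative values only: a caller-supplied UA followed by the connector's.
const exampleHeaders: HeaderEntry[] = [
  { name: 'User-Agent', value: 'caller-app/1.0' },
  { name: 'X-Trace', value: 'abc' },
  { name: 'User-Agent', value: 'connector/2.0' },
];

const resolvedUserAgent = lastHeaderValue(exampleHeaders, 'user-agent');
```

A plain object map could not represent `exampleHeaders`, because the second `User-Agent` key would silently replace the first before the list ever crossed the FFI.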
@@ -168,57 +211,160 @@ export function isBlankOrReserved(s: string): boolean {
 const MAX_U32 = 0xffffffff;
 
 /**
- * Normalise the public TLS options (`checkServerCertificate` /
- * `customCaCert`) into the napi shape.
+ * Normalise a PEM input (`string` or `Buffer`) accepted on the public
+ * surface into the `Buffer` the napi shape requires. Does a light,
+ * ordered BEGIN…END sanity check so a truncated/headerless blob (or a
+ * stray page that merely contains the literals out of order, e.g. a
+ * proxy-intercept page) is rejected here rather than surfacing as an
+ * opaque kernel TLS error. The bytes are NOT fully parsed in JS — that
+ * is deferred to the kernel, which returns a meaningful error on a
+ * malformed PEM/key.
+ *
+ * `kind` selects the expected block: `'certificate'` matches a
+ * `CERTIFICATE` block; `'private key'` matches any `… PRIVATE KEY` block
+ * (PKCS#8 `PRIVATE KEY`, PKCS#1 `RSA PRIVATE KEY`, SEC1 `EC PRIVATE KEY`).
+ *
+ * Throws `HiveDriverError` when the value is empty or (for strings)
+ * lacks the expected PEM header.
+ */
+function normalizePemBytes(value: Buffer | string, optionName: string, kind: 'certificate' | 'private key'): Buffer {
+  if (typeof value === 'string') {
+    const re =
+      kind === 'certificate'
+        ? /-----BEGIN CERTIFICATE-----[\s\S]+?-----END CERTIFICATE-----/
+        : /-----BEGIN [A-Z0-9 ]*PRIVATE KEY-----[\s\S]+?-----END [A-Z0-9 ]*PRIVATE KEY-----/;
+    if (!re.test(value)) {
+      const expected =
+        kind === 'certificate'
+          ? "a '-----BEGIN CERTIFICATE-----' … '-----END CERTIFICATE-----' block"
+          : "a 'BEGIN … PRIVATE KEY' / 'END … PRIVATE KEY' PEM block (PKCS#8, PKCS#1, or SEC1)";
+      throw new HiveDriverError(
+        `SEA backend: \`${optionName}\` string does not look like a PEM ${kind} (expected ${expected}). ` +
+          'Pass PEM text or a Buffer of PEM bytes.',
+      );
+    }
+    return Buffer.from(value, 'utf8');
+  }
+  if (Buffer.isBuffer(value)) {
+    if (value.length === 0) {
+      throw new HiveDriverError(`SEA backend: \`${optionName}\` Buffer is empty.`);
+    }
+    return value;
+  }
+  throw new HiveDriverError(`SEA backend: \`${optionName}\` must be a PEM string or a Buffer.`);
+}
+
+/**
+ * Normalise the public TLS options into the napi shape.
  *
  * - `checkServerCertificate` passes through verbatim (only when set; an
  *   absent value leaves the kernel default, which is secure — verify on).
- * - `customCaCert` accepts a PEM string or `Buffer` on the public
- *   surface; we convert a string to a `Buffer` here and do a light PEM
- *   sanity check. The bytes are NOT parsed in JS — the kernel returns a
- *   meaningful error if the PEM is malformed.
+ * - `checkServerCertificateHostname` passes through verbatim — the
+ *   independent hostname-vs-SNI toggle (kernel applies it only when the
+ *   master verify toggle is on). Mirrors Python's `tls_verify_hostname`.
+ * - `customCaCert` accepts a PEM string or `Buffer`; normalised to a
+ *   `Buffer` via {@link normalizePemBytes}.
+ * - `clientCertPem` / `clientKeyPem` carry the mutual-TLS client identity.
+ *   They must be supplied **together** — supplying only one is rejected
+ *   here with an actionable error (rather than waiting for the kernel's
+ *   `InvalidArgument` at `openSession`). Each accepts a PEM string or
+ *   `Buffer`, normalised the same way.
  *
- * Throws `HiveDriverError` when `customCaCert` is supplied but empty or
- * (for strings) lacks a PEM certificate header.
+ * Throws `HiveDriverError` when a cert/key is empty, mis-typed, lacks the
+ * expected PEM header, or when only one half of the mTLS pair is set.
  */
 export function buildSeaTlsOptions(options: ConnectionOptions): SeaTlsOptions {
   // Read the SEA-only fields through the purpose-built internal options type
   // rather than an ad-hoc inline cast, so the shape can't silently drift from
   // its declaration and a typo'd key fails to compile.
-  const { checkServerCertificate, customCaCert } = options as ConnectionOptions & InternalConnectionOptions;
+  const { checkServerCertificate, checkServerCertificateHostname, customCaCert, clientCertPem, clientKeyPem } =
+    options as ConnectionOptions & InternalConnectionOptions;
 
   const tls: SeaTlsOptions = {};
 
   if (checkServerCertificate !== undefined) {
     tls.checkServerCertificate = checkServerCertificate;
   }
 
+  if (checkServerCertificateHostname !== undefined) {
+    tls.checkServerCertificateHostname = checkServerCertificateHostname;
+  }
+
   if (customCaCert !== undefined) {
-    if (typeof customCaCert === 'string') {
-      // Light PEM sanity check — require a well-ordered BEGIN…END block so a
-      // truncated/headerless cert (or a stray page that merely contains both
-      // literals out of order, e.g. a proxy-intercept page) is rejected here
-      // rather than surfacing as an opaque kernel TLS error. Ordered match, not
-      // two independent substring checks. Full parsing is deferred to the kernel.
-      if (!/-----BEGIN CERTIFICATE-----[\s\S]+?-----END CERTIFICATE-----/.test(customCaCert)) {
-        throw new HiveDriverError(
-          'SEA backend: `customCaCert` string does not look like a PEM certificate ' +
-            "(expected a '-----BEGIN CERTIFICATE-----' … '-----END CERTIFICATE-----' block). " +
-            'Pass PEM text or a Buffer of PEM bytes.',
-        );
-      }
-      tls.customCaCert = Buffer.from(customCaCert, 'utf8');
-    } else if (Buffer.isBuffer(customCaCert)) {
-      if (customCaCert.length === 0) {
-        throw new HiveDriverError('SEA backend: `customCaCert` Buffer is empty.');
+    tls.customCaCert = normalizePemBytes(customCaCert, 'customCaCert', 'certificate');
+  }
+
+  // mTLS client identity. Enforce both-or-neither up front so a caller who
+  // sets only one gets a clear message naming the missing half, instead of
+  // the kernel's generic `InvalidArgument` after the FFI hop.
+  const hasCert = clientCertPem !== undefined;
+  const hasKey = clientKeyPem !== undefined;
+  if (hasCert !== hasKey) {
+    throw new HiveDriverError(
+      'SEA backend: mutual TLS requires both `clientCertPem` and `clientKeyPem`; only ' +
+        `\`${hasCert ? 'clientCertPem' : 'clientKeyPem'}\` was supplied. ` +
+        `Provide the matching ${hasCert ? 'private key (`clientKeyPem`)' : 'certificate (`clientCertPem`)'}, ` +
+        'or omit both.',
+    );
+  }
+  if (hasCert && hasKey) {
+    tls.clientCertPem = normalizePemBytes(clientCertPem as Buffer | string, 'clientCertPem', 'certificate');
+    tls.clientKeyPem = normalizePemBytes(clientKeyPem as Buffer | string, 'clientKeyPem', 'private key');
+  }
+
+  return tls;
+}
+
+/**
+ * Build the napi HTTP options (`customHeaders`) from the public
+ * `customHeaders` map and `userAgentEntry`.
+ *
+ * Mirrors the Python connector's `use_kernel` path (`session.py` +
+ * `backend/kernel/client.py`), which:
+ *   1. composes a single connector `User-Agent` and **unconditionally**
+ *      appends it last —
+ *      `all_headers = (http_headers or []) + [("User-Agent", useragent_header)]`;
+ *   2. before forwarding to the kernel, **drops** the kernel-managed
+ *      reserved names `Authorization` / `x-databricks-org-id`
+ *      (case-insensitive) — the kernel applies the auth token itself and
+ *      re-derives the org id from the `?o=` in the http path, and would
+ *      otherwise skip-and-warn on every request.
+ *
+ * The result is an ordered list (the napi `Array<HeaderEntry>` shape,
+ * matching the kernel core `Vec<(String, String)>`): the caller's
+ * `customHeaders` first (minus reserved names), then the connector's
+ * `User-Agent` last. The connector UA is always present and, being last,
+ * is authoritative (the kernel folds the last `User-Agent` into its base
+ * UA — `DatabricksJDBCDriverOSS/...` — preserving the result-disposition
+ * gating token). The value is composed via the same `buildUserAgentString`
+ * the Thrift path uses, so the SEA UA carries the identical
+ * `NodejsDatabricksSqlConnector/...` identity (with `userAgentEntry`
+ * folded in). A caller `User-Agent` in `customHeaders` is forwarded too
+ * (mirroring Python, which doesn't dedupe it); the kernel's last-wins fold
+ * means the connector UA still wins.
+ */
+const KERNEL_MANAGED_HEADERS = new Set(['authorization', 'x-databricks-org-id']);
+
+export function buildSeaHttpOptions(options: ConnectionOptions): SeaHttpOptions {
+  const { customHeaders, userAgentEntry } = options;
+
+  const headers: Array<{ name: string; value: string }> = [];
+  if (customHeaders) {
+    for (const [name, value] of Object.entries(customHeaders)) {
+      // Drop kernel-managed reserved names before the FFI hop — same
+      // double-wall as the Python connector's `_KERNEL_MANAGED_HEADERS`.
+      if (KERNEL_MANAGED_HEADERS.has(name.toLowerCase())) {
+        continue;
       }
-      tls.customCaCert = customCaCert;
-    } else {
-      throw new HiveDriverError('SEA backend: `customCaCert` must be a PEM string or a Buffer.');
+      headers.push({ name, value });
     }
   }
 
-  return tls;
+  // Always append the connector's composed User-Agent last — exactly the
+  // Python connector's unconditional `base_headers` append.
+  headers.push({ name: 'User-Agent', value: buildUserAgentString(userAgentEntry) });
+
+  return { customHeaders: headers };
 }
 
 /**
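
The two up-front checks in this hunk, the ordered BEGIN…END match and the both-or-neither mTLS pairing, can be tried in isolation. The sketch below reuses the two regular expressions from `normalizePemBytes` verbatim, but `looksLikePem` and `mtlsPairProblem` are hypothetical stand-ins written for illustration. They work on plain strings and return values instead of throwing `HiveDriverError`, so they run without the connector or Node's `Buffer`.

```typescript
type PemKind = 'certificate' | 'private key';

// Same patterns as in the diff: BEGIN must precede END with content in between.
const CERT_RE = /-----BEGIN CERTIFICATE-----[\s\S]+?-----END CERTIFICATE-----/;
const KEY_RE = /-----BEGIN [A-Z0-9 ]*PRIVATE KEY-----[\s\S]+?-----END [A-Z0-9 ]*PRIVATE KEY-----/;

// Hypothetical helper: light shape check only, no parsing of the PEM body.
function looksLikePem(text: string, kind: PemKind): boolean {
  return (kind === 'certificate' ? CERT_RE : KEY_RE).test(text);
}

// Hypothetical helper: names the missing half of the mTLS pair, or undefined if OK.
function mtlsPairProblem(cert?: string, key?: string): 'clientCertPem' | 'clientKeyPem' | undefined {
  const hasCert = cert !== undefined;
  const hasKey = key !== undefined;
  if (hasCert === hasKey) {
    return undefined; // both supplied or both omitted
  }
  return hasCert ? 'clientKeyPem' : 'clientCertPem';
}

// Placeholder bodies: structurally PEM-shaped, not real key material.
const orderedCert = '-----BEGIN CERTIFICATE-----\nMIIB\n-----END CERTIFICATE-----';
const reversedCert = '-----END CERTIFICATE-----\nMIIB\n-----BEGIN CERTIFICATE-----';
const rsaKey = '-----BEGIN RSA PRIVATE KEY-----\nMIIE\n-----END RSA PRIVATE KEY-----';
```

With these inputs, `looksLikePem(orderedCert, 'certificate')` passes while `reversedCert` does not, which is the difference between an ordered match and two independent substring checks that the removed inline comment called out.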
@@ -282,7 +428,8 @@ export function buildSeaConnectionOptions(options: ConnectionOptions): SeaNative
     httpPath: string;
     intervalsAsString: boolean;
     maxConnections?: number;
-  } & SeaTlsOptions = {
+  } & SeaTlsOptions &
+    SeaHttpOptions = {
     hostName: options.host,
     httpPath: prependSlash(options.path),
     // Match the NodeJS Thrift driver, which surfaces INTERVAL columns as
@@ -292,9 +439,12 @@ export function buildSeaConnectionOptions(options: ConnectionOptions): SeaNative
     // (native Arrow) — they already decode identically to Thrift via the
     // shared Arrow converter, so `complexTypesAsJson` is not forced on.
     intervalsAsString: true,
-    // TLS knobs (server-cert verification toggle + custom CA). Validated and
-    // normalised (string PEM → Buffer) here so the napi shape only sees a Buffer.
+    // TLS knobs (server-cert verification toggle + custom CA + mTLS client
+    // identity). Validated and normalised (string PEM → Buffer) here so the
+    // napi shape only sees a Buffer.
     ...buildSeaTlsOptions(options),
+    // HTTP headers (caller `customHeaders` + composed `User-Agent`).
+    ...buildSeaHttpOptions(options),
   };
 
   // SEA-only pool sizing; read via cast to match how this function reads the
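
To make the resulting `customHeaders` list concrete, here is a self-contained sketch of the header composition that `buildSeaHttpOptions` performs. It is an approximation for illustration. `buildHeaderList` is a hypothetical name, and the User-Agent is passed in as a plain string because the implementation of `buildUserAgentString` is not part of this diff. The header values are invented.

```typescript
type HeaderEntry = { name: string; value: string };

// Reserved names taken from the diff; compared lower-cased.
const RESERVED_HEADERS = new Set(['authorization', 'x-databricks-org-id']);

// Hypothetical stand-in for buildSeaHttpOptions: caller headers first
// (minus reserved names), then the connector User-Agent, always last.
function buildHeaderList(customHeaders: Record<string, string> | undefined, userAgent: string): HeaderEntry[] {
  const headers: HeaderEntry[] = [];
  if (customHeaders) {
    for (const [name, value] of Object.entries(customHeaders)) {
      if (RESERVED_HEADERS.has(name.toLowerCase())) {
        continue; // kernel-managed, dropped before the FFI hop
      }
      headers.push({ name, value });
    }
  }
  // Appended unconditionally, even when the caller supplied a User-Agent.
  headers.push({ name: 'User-Agent', value: userAgent });
  return headers;
}

// Illustrative input only.
const built = buildHeaderList(
  { Authorization: 'Bearer secret', 'X-Team': 'data', 'User-Agent': 'caller/1.0' },
  'connector/0.0',
);
```

Under these assumptions `built` holds three entries in order: `X-Team`, the caller's `User-Agent`, then the connector's `User-Agent`. The `Authorization` entry never reaches the list.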
