Skip to content

Fix materialized push against CDN-fronted HTTP targets (Cloudflare)#64

Open
Soph wants to merge 8 commits into
mainfrom
soph/http-stale-pool-fix
Open

Fix materialized push against CDN-fronted HTTP targets (Cloudflare)#64
Soph wants to merge 8 commits into
mainfrom
soph/http-stale-pool-fix

Conversation

@Soph
Copy link
Copy Markdown
Contributor

@Soph Soph commented May 22, 2026

Summary

  • Diagnoses and fixes two independent bugs that broke materialized push against Cloudflare Artifacts (`use of closed network connection` mid-upload):
    1. Stale keep-alive pool entry: HTTPS info-refs and receive-pack ran ~tens of seconds apart in materialized push; the pooled TLS connection expired on Cloudflare's edge during that gap. Disabled keep-alives on git-sync's default transport (one extra TLS handshake per request, negligible against multi-MB transfers).
    2. Mid-stream stall during delta selection: go-git's encoder ran delta selection synchronously inside `Encode()`, producing no bytes for tens of seconds while the receive-pack body was open. CDN idle-write timeout fired. Fixed by using `packfile.WithObjectSelector` (landed in go-git#2142) to run selection up front and stream the write phase through `io.Pipe`.
  • Adds two diagnostic env-var-gated tools that proved load-bearing during the investigation: `GITSYNC_HTTP_TRACE=1` (httptrace per-request lifecycle + outgoing-request dump with redacted Authorization).
  • Adds in-place verbose-mode progress output for the materialized encode/write phases (`selecting deltas, elapsed X` → `encoding pack: Y MB, elapsed Z`).

Commits

Each commit is scoped to a single change so the diagnostic story is rollback-friendly:

  1. `d8a071d9` httptrace env var — debugging tool, kept
  2. `a3a96da` disable HTTP keep-alives — fixes the stale-pool race
  3. `6f1f5944` outgoing-request dump under trace — debugging tool, kept
  4. `f76e2308` spool materialized push body — the original fix (temp-file workaround for the stall)
  5. `b22b2840` pack-encode progress — UX for the long materialized phase
  6. `05be962e` SpooledBody encapsulation + doc trims — post-review cleanup
  7. `8aa0dab7` switch to go-git `WithObjectSelector` — replaces the temp-file spool with proper streaming, now that go-git#2142 is merged

Commits 4 and 7 tell the iteration: spool first (real workaround), then streaming (proper fix) once the upstream API was available.

Verification

  • Reproduced original failure against Cloudflare Artifacts target (`use of closed network connection` at ~7.9 KB sent).
  • Confirmed stale-pool diagnosis via `GITSYNC_HTTP_TRACE=1` — saw `reused=true wasIdle=true idle=13.001639584s` immediately before the failure.
  • Confirmed mid-stream stall via `GITSYNC_FORCE_CHUNKED=1` experiment (temporary toggle since reverted): spooled body + chunked still succeeded, ruling out chunked encoding as the cause and isolating the stall.
  • Final commit (streaming via `WithObjectSelector`) confirmed working end-to-end against Cloudflare Artifacts.
  • All existing tests pass; `go vet` and `gofmt -l` clean.

Test plan

  • `go test ./...` passes
  • Manual: `sync --all-refs --progress` against Cloudflare Artifacts target succeeds
  • Manual: `GITSYNC_HTTP_TRACE=1` output confirms chunked encoding (no Content-Length on receive-pack POST) and no stale-pool reuse
  • Manual: verbose-mode progress shows `selecting deltas` → `encoded pack` phase transition

Open items

  • `go.mod` pins go-git to the merge commit of go-git#2142 (pseudo-version `v6.0.0-alpha.4.0.20260521151600-590487407c38`). Switch to a tagged release once go-git cuts `v6.0.0-alpha.5`.

Note

Medium Risk
Changes smart-HTTP transport behavior (disables keep-alives) and rewrites materialized push pack encoding/streaming, which can affect connectivity and performance against various Git servers.

Overview
Fixes materialized receive-pack uploads against CDN-fronted HTTP targets by upgrading go-git and changing PushObjects to precompute delta selection up front and then stream the encoder write phase through an io.Pipe using packfile.WithObjectSelector, avoiding mid-request idle stalls.

Hardens smart-HTTP behavior by cloning the default transport and disabling keep-alives to avoid reusing stale pooled TLS connections; adds optional env-gated (GITSYNC_HTTP_TRACE) httptrace logging plus outgoing request header dumps with Authorization redacted.

Adds verbose progress output for delta selection and pack encoding, and extends tests to assert chunked streaming for PushObjects and the updated NewHTTPTransport semantics.

Reviewed by Cursor Bugbot for commit 8aa0dab. Configure here.

Soph added 7 commits May 19, 2026 16:22
Wires net/http/httptrace into RequestInfoRefs and PostRPCStreamBody, gated
on GITSYNC_HTTP_TRACE so production runs pay zero overhead. When enabled,
emits per-request connection lifecycle events (GetConn, GotConn with
Reused/IdleTime, ConnectStart/Done, TLSHandshake*, WroteRequest,
PutIdleConn) to stderr.

Motivated by diagnosing stale keep-alive pool connections against
third-party Git HTTPS hosts (CDN edges, hosted git providers) that close
idle TLS sockets faster than Go's transport assumes. Those failures
surface as "use of closed network connection" on the next POST with no
other signal — httptrace makes the pool reuse and idle duration explicit.

Entire-Checkpoint: d0bad48c086b
Use a cloned http.Transport with DisableKeepAlives=true as the git-sync
default, instead of returning the package-level http.DefaultTransport.

Two changes in one:

1. Always clone — previously, NewHTTPTransport(false) returned the shared
   http.DefaultTransport, so any TLS or pool settings we added would leak
   into other code in the same process. The library angle makes that a
   real footgun.

2. Keep-alives off — git-sync's HTTP workflow against a given host is
   coarse-grained (one info/refs GET, then one upload-pack or
   receive-pack POST) with real work in between (planning, source fetch,
   local object materialization). On the push side the gap is long
   enough for CDN edges and some hosted git providers to close their
   end of an idle TLS socket; the next POST then fails with
   "use of closed network connection" because the pooled connection is
   half-dead. Observed against Cloudflare Artifacts after a ~13s gap.

Pool reuse would save at most one TLS handshake per sync, negligible
against multi-MB to multi-GB transfers, so a fresh connection per
request is the right trade. Library callers needing pool reuse can pass
their own RoundTripper to NewHTTPConn.

Entire-Checkpoint: 2b8c12630fc0
Adds httputil.DumpRequestOut on POST requests when GITSYNC_HTTP_TRACE
is enabled, so the actual wire-format request (Transfer-Encoding,
Content-Length, headers added by Go's transport) is visible alongside
the connection-level trace. Authorization values are redacted so
credentials don't leak into shared logs.

Body is not consumed (body=false). The dump runs once per POST and
goes to stderr like the rest of the trace.

Use case: when a server behaves unexpectedly on a POST, the
connection-level trace tells you which TCP/TLS connection was used
but not what the request looked like on the wire — Transfer-Encoding,
Content-Length, headers Go's transport added. This fills that gap.
Buffer the receive-pack body (update-request header + pack) to a temp
file in the materialized push path so the POST goes out in one
continuous burst, instead of streaming the body incrementally as the
pack encoder produces it.

The cause we're working around: go-git's encoder runs delta selection
synchronously before writing any pack bytes, which on big repos can
take tens of seconds while the HTTP request body sits idle waiting for
the next chunk. CDN edges like Cloudflare's enforce an idle-write
timeout on the request body and close the connection on a stall that
long — surfacing as "use of closed network connection" mid-upload,
with no server response. Spooling collapses "encode" and "write" into
a single sequential phase, so once bytes start flowing they don't stop
until the body is done. Bootstrap-relay didn't have this problem
because source pack bytes flow steadily from the upstream upload-pack
response — no internal stall, nothing to engineer around.

Side effect: the spooled body has a known length, so the POST goes out
with Content-Length instead of Transfer-Encoding: chunked (matching
upstream git's smart-HTTP transport behaviour), and req.GetBody lets
Go's transport replay the body on transient connection failures.
These are nice-to-haves; the stall-avoidance is the actual fix.

Scoped to materialized only. The materialized strategy already
requires the full source object closure to be local before encoding
begins, so spooling on upload doesn't change its fundamental shape.
Relay paths (PushPack) keep streaming source bytes through to target
with chunked encoding, and preserve the "streaming proxy" property
git-sync is built around.

Implementation:

- gitproto.SpooledBody is a temp-file-backed io.ReadCloser with a
  known size, constructed via NewSpooledBody(write func(io.Writer)
  error). PostRPCStreamBody type-asserts on it and wires up
  req.ContentLength / req.GetBody.

- PushObjects (materialized) writes the encoded update-request
  followed by the pack into one SpooledBody and POSTs that. The
  previous io.Pipe + goroutine encoder is gone; materialized was
  never streaming end-to-end so the encode-while-upload overlap
  it provided wasn't load-bearing.

- postReceivePack split out of sendReceivePack so the response
  decoding (sideband demux + report-status) is shared between
  streaming and spooled send paths.

The new TestPushObjectsBuffersBody asserts Content-Length is set and
Transfer-Encoding is absent for materialized push.
TestPushPackStartsHTTPBeforePackFullyRead is unchanged — relay keeps
its streaming property.
Spooling the receive-pack body to a temp file (see previous commit)
introduced a silent gap between "starting push" and "uploading" that
can run into minutes for large repos. Add a transient in-place
progress line that updates every 500ms while encoding, finalized with
a permanent "encoded pack" line on completion.

The encoder has two phases visible to the caller — delta selection
(no writes) and pack write (steady stream). Distinguish them in the
output using the 12-byte pack header as the phase boundary:

  target: selecting deltas, elapsed 50s
  target: encoding pack: 46.7 MB, elapsed 1m10s
  target: encoded pack: 47.3 MB in 1m12s

Without the phase distinction the byte counter would sit near zero
through the long selection phase ("encoding pack: 6 KB, elapsed 50s")
and look like a hang or measurement bug. Splitting it makes both
phases legible.

Writes go through conn.ProgressWriter() with a "target: " prefix, so
they route through the existing sessionStderr → setTransient path that
already handles sideband progress from upload-pack and receive-pack
("Compressing objects: X%\r" etc.). Visually consistent with what
users already see during fetch and push.

Off in non-verbose mode (progressSink returns nil and the progress
goroutine never starts), so quiet runs stay quiet.

Entire-Checkpoint: fc719cc95e77
Three small refinements:

- Encapsulate SpooledBody's request wiring as (*SpooledBody).applyTo,
  so PostRPCStreamBody doesn't reach into unexported size/path fields
  to build req.ContentLength + req.GetBody.

- Trim the multi-paragraph Cloudflare/idle-write narrative on
  PushObjects and PostRPCStreamBody. The story now lives once on
  SpooledBody (the type that actually captures the workaround);
  the other call sites point there.

- Fix the countingWriter comment to say *why* the counter is atomic
  (concurrent read from the progress ticker) rather than restating
  what the type does.

Entire-Checkpoint: 0b0206eed178
Replaces the temp-file spool in PushObjects with the streaming
approach unlocked by go-git PR #2142 (merged 2026-05-21):

  - Run packfile.DeltaSelector.ObjectsToPack synchronously up front,
    showing "selecting deltas, elapsed X" progress (no byte counter
    here — the selector is opaque, elapsed time is the only signal).
  - Construct a packfile.Encoder with WithObjectSelector pointing at
    a local precomputedSelector that returns the pre-selected
    []*ObjectToPack. Encoder.Encode then skips its own selection step
    and runs the write phase only.
  - Pipe encoder output through io.Pipe into sendReceivePack, which
    streams it to the receive-pack POST as chunked transfer encoding.

The mid-stream stall that originally caused "use of closed network
connection" against Cloudflare's git frontend is gone because
selection no longer happens between the request body opening and
pack bytes flowing — it has already completed.

Removed:
  - SpooledBody, NewSpooledBody, (*SpooledBody).applyTo
  - postReceivePack helper (used to bypass sendReceivePack's body
    construction; no longer needed now that we hand sendReceivePack
    a streaming pipe again)
  - The SpooledBody type-assertion in PostRPCStreamBody

What we lose vs. the spool:
  - Content-Length on the request (chunked again)
  - req.GetBody-based retry
  - No more $TMPDIR requirement proportional to pack size

What we gain:
  - Pack bytes flow without ever landing on local disk
  - Materialized push is streaming end-to-end (within the constraint
    that the source object closure already has to be local)
  - Smaller surface in gitproto

Bumped go-git to a pseudo-version pinning the merge commit
(v6.0.0-alpha.4.0.20260521151600-590487407c38). Will switch to a
tagged release once one is cut.

TestPushObjectsBuffersBody → TestPushObjectsStreamsBody now asserts
chunked encoding and Content-Length=-1 (unknown), the inverse of
what we previously asserted.

Entire-Checkpoint: 680da8552908
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit 8aa0dab. Configure here.

Comment thread internal/gitproto/push.go
Comment thread internal/gitproto/push.go
Three small fixes after Bugbot review:

- gitproto/push.go countingWriter.Write: wrap the underlying write
  error so wrapcheck stops failing the lint job (CI run 26288151165).

- gitproto/push.go PushObjects: pass the selection error into the
  stopSelect callback. Previously "selected N objects in X" was
  printed even when ObjectsToPack returned an error, which was
  misleading (count was 0 or partial, and the next line was the
  actual error). startSelectionProgress now suppresses the success
  summary when err != nil; the ticker still stops cleanly either way.

- syncer/progress.go sessionStderr: serialize Write under a mutex.
  The materialized-push refactor in this PR runs an encode-progress
  ticker concurrently with sendReceivePack, and both write to the
  same conn.ProgressWriter() (which is *sessionStderr). The
  strings.Builder buffer and the progress notify/setTransient calls
  weren't synchronized, so overlapping writes could scramble the
  live progress UI or, in theory, corrupt the buffer.

Entire-Checkpoint: 7422ae335ce1
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

1 participant