Skip to content

feat: replace hand-rolled LLM clients with pgedge-go-llm-lib (PR 1 of 4)#172

Open
dpage wants to merge 13 commits into
mainfrom
feat/llm-lib-pr1-chat-clients
Open

feat: replace hand-rolled LLM clients with pgedge-go-llm-lib (PR 1 of 4)#172
dpage wants to merge 13 commits into
mainfrom
feat/llm-lib-pr1-chat-clients

Conversation

@dpage

@dpage dpage commented May 27, 2026

Copy link
Copy Markdown
Member

Summary

  • Replaces hand-rolled Anthropic, OpenAI, and Ollama chat clients in internal/chat/llm.go with thin wrappers around pgedge-go-llm-lib.
  • chat.LLMClient interface preserved; call sites in internal/chat/client.go and internal/llmproxy/proxy.go are essentially unchanged (only NewOllamaClient gained an error return).
  • ~1500 lines of provider-specific HTTP code removed.

This is PR 1 of 4 in the LLM library migration. Spec lives in .claude/specs/2026-05-27-llm-library-migration-design.md (gitignored / local-only).

Follow-up PRs:

  • PR 2: Replace embedding providers (internal/embedding/).
  • PR 3: Adopt library's HTTP proxy package; update web wire format.
  • PR 4: Add streaming chat, add Gemini provider, delete shim layers.

Known issues (filed upstream)

  • Anthropic system prompt is no longer cached (pgedge-go-llm-lib#13).
  • OpenAI Responses-API-only models not supported (pgedge-go-llm-lib#12).
  • Pinned to a commit SHA pending v0.1.0 release (pgedge-go-llm-lib#14).

Test plan

  • make test passes locally with no skips.
  • Manual live smoke test against real Anthropic API (claude-sonnet-4-20250514) returns expected response with correct token usage.
  • Manual live smoke test against real OpenAI API (gpt-4o-mini) returns expected response.
  • Manual CLI smoke test against local Ollama — not run (no Ollama instance available in dev environment).
  • Web client end-to-end smoke — deferred to PR 3 where the proxy wire format changes.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Chores

    • Updated Go toolchain to 1.26 across CI, Docker images, and module directive; CI triggers expanded to include feature branches matching feat/llm-lib-*.
  • Refactor

    • Consolidated LLM providers to a shared library, simplifying provider behavior.
  • Bug Fixes

    • Improved error handling for one provider so failures surface cleanly in API responses.
  • Documentation

    • Updated changelog: Anthropic prompt caching expanded; OpenAI Responses-model families now documented as supported.
  • Tests

    • Added HTTP tracing, translation, and end-to-end LLM client tests.

Review Change Stack

dpage and others added 9 commits May 27, 2026 15:41
First step of LLM library migration (PR 1 of 4). Pinned to a commit
on main pending v0.1.0 release upstream.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds toLibMessages, toLibTools, and fromLibContent for translating
between our interface{}-content chat.Message form and the library's
typed llm.Message form. Pure functions with full unit coverage;
no consumers yet — wrapper wires them in next.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the inline embedding.LogLLMRequestTrace / LogLLMResponseTrace
calls that the old hand-rolled clients made; the new wrapper plugs
this into llm.Options.HTTPClient when debug is on.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Don't discard a successful response when reading the response body
  for tracing fails. Log the body-read error and return the response
  with an empty body, preserving the caller's success path.
- Add a test verifying nothing is logged when the log level is below
  Debug (the hot-path fast-bail behaviour).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Introduces libClient, a library-backed implementation of LLMClient,
and rewires NewAnthropicClient/NewOpenAIClient/NewOllamaClient to
return it. The old hand-rolled clients are renamed *OLD and remain
in place; they will be deleted along with their tests in subsequent
tasks of this PR.

NewOllamaClient signature changes from (LLMClient) to (LLMClient,
error) because the library validates baseURL at construction. The
four call sites (2 in client.go, 2 in llmproxy/proxy.go) are updated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The named import of the anthropic provider package (for WithToolCaching)
already triggers init(). The adjacent blank import is redundant.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
All three providers now go through pgedge-go-llm-lib via the
libClient wrapper added in the previous commit. ~1500 lines removed.

internal/chat/llm.go shrinks from ~1810 lines to ~278 lines; the
LLMClient interface, types, system prompt, and constructors remain.

Test cleanup follows in the next commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Deletes tests that referenced the deleted hand-rolled clients
(anthropicClient, openaiClient, ollamaClient) and the private
extract*ErrorMessage helpers. Adds wrapper-level OpenAI and Ollama
integration tests against httptest servers via Options.BaseURL.

The remaining wrapper tests (TestLibClient_*, TestToLib*,
TestFromLibContent_*, TestTracingRoundTripper_*) exercise the
translation, system-prompt injection, read-only mode, debug token
logging, and round-tripper tracing — the parts of the wrapper we
own. Wire-level behaviour is the library's responsibility.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented May 27, 2026

Copy link
Copy Markdown
Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 85716e9a-e02e-43ab-9e8c-9b9c8169e0e7

📥 Commits

Reviewing files that changed from the base of the PR and between 4124940 and 8bef9f8.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (10)
  • .github/workflows/ci-cli-client.yml
  • .github/workflows/ci-docker.yml
  • .github/workflows/ci-docs.yml
  • .github/workflows/ci-server.yml
  • .github/workflows/ci-web-client.yml
  • docs/changelog.md
  • go.mod
  • internal/chat/llm.go
  • internal/chat/llm_test.go
  • internal/chat/llm_tracing.go
✅ Files skipped from review due to trivial changes (2)
  • .github/workflows/ci-docker.yml
  • docs/changelog.md
🚧 Files skipped from review as they are similar to previous changes (3)
  • go.mod
  • internal/chat/llm.go
  • internal/chat/llm_test.go

Walkthrough

This PR migrates the LLM provider implementations from hand-rolled HTTP clients to a shared library (pgedge-go-llm-lib), removes ~1500 lines of provider-specific wire code, and updates the Go toolchain to 1.26. The LLMClient interface remains unchanged. Error handling is added for Ollama client construction via a new return type.

Changes

LLM Library Migration

Layer / File(s) Summary
Dependency and toolchain updates
go.mod, .github/workflows/ci-cli-client.yml, .github/workflows/ci-server.yml, .github/workflows/release.yml, Dockerfile.cli, Dockerfile.server, docs/changelog.md, .github/workflows/ci-docker.yml, .github/workflows/ci-docs.yml, .github/workflows/ci-web-client.yml
Go version bumped from 1.25 to 1.26 across CI workflows and Dockerfiles. go directive updated to 1.26.1. New indirect dependency on github.com/pgEdge/pgedge-go-llm-lib added. CI branch filters expanded to include feat/llm-lib-*. Changelog updated to document the migration.
Message and tool translation layer
internal/chat/llm_translate.go, internal/chat/llm_translate_test.go
New translation helpers convert internal Message/content/tool types into library llm types (toLibMessages, contentToBlocks, toLibTools) and reverse (fromLibContent). Tests cover text, typed text, tool-use, tool-result, schema unmarshalling, and error cases.
HTTP request tracing for debug logging
internal/chat/llm_tracing.go, internal/chat/llm_tracing_test.go
Adds tracingRoundTripper which logs request/response bodies at debug level while preserving readers; newTracingHTTPClient builds an HTTP client that uses it. Tests validate logging behavior, pass-through, and log-level gating.
Core LLM client refactor to library-backed implementation
internal/chat/llm.go, internal/chat/llm_test.go
Introduces libClient delegating Chat/ListModels to llm.Client, converts messages/tools via the translation helpers, appends shared system/read-only prompts, conditionally enables Anthropic caching, wires tracing when debug is set, and removes provider-specific HTTP parsing and ValidateBaseURL. Tests rewritten to perform wrapper-style round-trip validations for Anthropic/OpenAI/Ollama.
Error handling for Ollama client construction
internal/chat/client.go, internal/llmproxy/proxy.go
NewOllamaClient now returns (LLMClient, error). Callers in initializeLLM and proxy handlers capture and propagate construction errors, returning HTTP 400 responses when appropriate.
Client wrapper tests and round-trip validation
internal/chat/llm_test.go
Test suite updated to use httptest servers to validate chat round-trips, system prompt injection and caching, read-only prompt toggling, tool-use translation, token-usage population only when debug=true, ListModels, and OpenAI/Ollama path/response behavior.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • claude
  • mmols
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 39.47% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately and specifically describes the main change: replacing hand-rolled LLM clients with a shared library while noting it is part 1 of a 4-part migration.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/llm-lib-pr1-chat-clients

Comment @coderabbitai help to get the list of available commands and usage tips.

@codacy-production

codacy-production Bot commented May 27, 2026

Copy link
Copy Markdown

Up to standards ✅

🟢 Issues 4 medium

Results:
4 new issues

Category Results
Complexity 4 medium

View in Codacy

🟢 Metrics 109 complexity · -28 duplication

Metric Results
Complexity 109
Duplication -28

View in Codacy

NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.

The pgedge-go-llm-lib dependency requires Go 1.26; go mod tidy
bumped our go directive accordingly, so CI workflows and Dockerfiles
need to match.

Also replace strings.Title (deprecated since Go 1.18, caught by the
CLI Client lint check) with a small provider-display map. The new
output is also more accurate ("OpenAI" vs "Openai").

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@dpage

dpage commented May 27, 2026

Copy link
Copy Markdown
Member Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented May 27, 2026

Copy link
Copy Markdown
Contributor
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
.github/workflows/ci-cli-client.yml (1)

30-32: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update cache key to match Go 1.26.

The Go toolchain version was updated to 1.26 (Line 21), but the cache key still references go-1.25. This mismatch will prevent cache hits and slow down CI builds.

🔧 Proposed fix
-                  key: ${{ runner.os }}-go-1.25-${{ hashFiles('**/go.sum') }}
+                  key: ${{ runner.os }}-go-1.26-${{ hashFiles('**/go.sum') }}
                   restore-keys: |
-                      ${{ runner.os }}-go-1.25-
+                      ${{ runner.os }}-go-1.26-
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/ci-cli-client.yml around lines 30 - 32, The cache key
strings in the GitHub Actions workflow (the "key:" and "restore-keys:" values
currently containing "go-1.25") must be updated to match the Go toolchain
version bump to 1.26; change occurrences of "go-1.25" to "go-1.26" so the cache
key (key: ${{ runner.os }}-go-1.26-${{ hashFiles('**/go.sum') }}) and
restore-keys ( ${{ runner.os }}-go-1.26-) align with the updated Go version.
.github/workflows/ci-server.yml (1)

48-50: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update cache key to match Go 1.26.

The Go toolchain version was updated to 1.26 (Line 39), but the cache key still references go-1.25. This mismatch will prevent cache hits and slow down CI builds.

🔧 Proposed fix
-                  key: ${{ runner.os }}-go-1.25-${{ hashFiles('**/go.sum') }}
+                  key: ${{ runner.os }}-go-1.26-${{ hashFiles('**/go.sum') }}
                   restore-keys: |
-                      ${{ runner.os }}-go-1.25-
+                      ${{ runner.os }}-go-1.26-
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/ci-server.yml around lines 48 - 50, The cache key strings
still reference go-1.25 causing cache misses after the toolchain bump; update
the 'key:' and matching 'restore-keys:' entries that contain "${{ runner.os
}}-go-1.25-${{ hashFiles('**/go.sum') }}" and "${{ runner.os }}-go-1.25-" to use
go-1.26 instead so the cache key matches the updated Go version.
🧹 Nitpick comments (2)
internal/chat/llm_test.go (2)

51-53: 💤 Low value

Consider using len(gotReqBody) == 0 for byte slice empty check.

Converting to string works but is less idiomatic than checking length directly.

♻️ More idiomatic empty check
-	if string(gotReqBody) == "" {
-		t.Fatal("server received no request body")
-	}
+	if len(gotReqBody) == 0 {
+		t.Fatal("server received no request body")
+	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/chat/llm_test.go` around lines 51 - 53, Replace the string
conversion empty check with a length check: instead of comparing
string(gotReqBody) == "" in the test (in internal/chat/llm_test.go where
gotReqBody is a []byte), use len(gotReqBody) == 0 to more idiomatically detect
an empty byte slice; update the t.Fatal condition accordingly.

274-336: ⚖️ Poor tradeoff

Consider comprehensive test coverage for OpenAI and Ollama.

The Anthropic tests validate system prompt injection, read-only mode, tool use translation, token usage behavior, and error handling. OpenAI and Ollama currently have only basic round-trip tests. If the shared libClient implementation is sufficiently validated via Anthropic tests and translation is covered in llm_translate_test.go, this may be acceptable. Otherwise, consider adding parallel tests for OpenAI and Ollama to ensure provider-specific wire formats work correctly across all scenarios.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/chat/llm_test.go` around lines 274 - 336, Tests
TestLibClient_OpenAI_Chat_RoundTrip and TestLibClient_Ollama_Chat_RoundTrip only
exercise basic round-trip behavior; add parallel provider-specific tests
(mirroring the Anthropic cases) to validate system prompt injection, read-only
mode, tool-use translation, token usage reporting and error handling for OpenAI
and Ollama. Locate the test helpers and core logic in libClient (and translation
behavior in llm_translate_test.go) and extend or duplicate the Anthropic test
scenarios using the NewOpenAIClient and NewOllamaClient constructors and their
Chat(...) calls (e.g., simulate provider wire formats, streaming vs
non-streaming responses, errors and token/usage fields) so each provider’s
request/response shaping is covered.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/chat/llm_tracing.go`:
- Around line 85-94: The tracing HTTP client created by newTracingHTTPClient
lacks a Timeout, so debug LLM requests can hang if the request context has no
deadline; update newTracingHTTPClient to set a sensible Timeout (e.g.,
30*time.Second) on the returned *http.Client, and ensure the time package is
imported; keep the existing Transport (tracingRoundTripper with provider, model,
inner, out) and only add the Timeout field to the client.

---

Outside diff comments:
In @.github/workflows/ci-cli-client.yml:
- Around line 30-32: The cache key strings in the GitHub Actions workflow (the
"key:" and "restore-keys:" values currently containing "go-1.25") must be
updated to match the Go toolchain version bump to 1.26; change occurrences of
"go-1.25" to "go-1.26" so the cache key (key: ${{ runner.os }}-go-1.26-${{
hashFiles('**/go.sum') }}) and restore-keys ( ${{ runner.os }}-go-1.26-) align
with the updated Go version.

In @.github/workflows/ci-server.yml:
- Around line 48-50: The cache key strings still reference go-1.25 causing cache
misses after the toolchain bump; update the 'key:' and matching 'restore-keys:'
entries that contain "${{ runner.os }}-go-1.25-${{ hashFiles('**/go.sum') }}"
and "${{ runner.os }}-go-1.25-" to use go-1.26 instead so the cache key matches
the updated Go version.

---

Nitpick comments:
In `@internal/chat/llm_test.go`:
- Around line 51-53: Replace the string conversion empty check with a length
check: instead of comparing string(gotReqBody) == "" in the test (in
internal/chat/llm_test.go where gotReqBody is a []byte), use len(gotReqBody) ==
0 to more idiomatically detect an empty byte slice; update the t.Fatal condition
accordingly.
- Around line 274-336: Tests TestLibClient_OpenAI_Chat_RoundTrip and
TestLibClient_Ollama_Chat_RoundTrip only exercise basic round-trip behavior; add
parallel provider-specific tests (mirroring the Anthropic cases) to validate
system prompt injection, read-only mode, tool-use translation, token usage
reporting and error handling for OpenAI and Ollama. Locate the test helpers and
core logic in libClient (and translation behavior in llm_translate_test.go) and
extend or duplicate the Anthropic test scenarios using the NewOpenAIClient and
NewOllamaClient constructors and their Chat(...) calls (e.g., simulate provider
wire formats, streaming vs non-streaming responses, errors and token/usage
fields) so each provider’s request/response shaping is covered.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 5dc67c60-8d5d-4955-a2e8-3df4ec8223f4

📥 Commits

Reviewing files that changed from the base of the PR and between 26cbb2e and 4124940.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (15)
  • .github/workflows/ci-cli-client.yml
  • .github/workflows/ci-server.yml
  • .github/workflows/release.yml
  • Dockerfile.cli
  • Dockerfile.server
  • docs/changelog.md
  • go.mod
  • internal/chat/client.go
  • internal/chat/llm.go
  • internal/chat/llm_test.go
  • internal/chat/llm_tracing.go
  • internal/chat/llm_tracing_test.go
  • internal/chat/llm_translate.go
  • internal/chat/llm_translate_test.go
  • internal/llmproxy/proxy.go

Comment thread internal/chat/llm_tracing.go
- Update GitHub Actions cache keys from go-1.25 to go-1.26 to match
  the toolchain bump, restoring cache hits on CI.
- Add a 120-second Timeout to the tracing http.Client so debug-mode
  LLM requests can't hang indefinitely if the context has no
  deadline.
- Switch a test's empty-body check from string(b)=="" to len(b)==0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@dpage

dpage commented May 27, 2026

Copy link
Copy Markdown
Member Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented May 27, 2026

Copy link
Copy Markdown
Contributor
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

dpage and others added 2 commits May 27, 2026 17:42
Stacked PRs in the LLM library migration (PR 2/3/4 of 4) target the
previous PR's branch rather than main. Without this, GitHub Actions
workflows don't fire on those PRs, leaving them effectively un-CI'd
until they're rebased onto main. Widening the trigger list lets the
stacked PRs see real CI from the start.

This trigger pattern only matches branches that start with
'feat/llm-lib-' — narrow enough that it doesn't broaden CI for
unrelated work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Upstream resolved the two known issues filed during PR 1's setup:

- pgedge-go-llm-lib#13: anthropic.WithSystemCaching is now available.
  The libClient wrapper applies it on every Anthropic request, so
  long system prompts no longer pay full input-token cost each turn.
  (Tool caching is still applied separately when tools are present.)

- pgedge-go-llm-lib#12: OpenAI's Responses API is now auto-selected
  for gpt-5-*, o1-*, and o3-* models. No wrapper changes are needed —
  the library routes them to /v1/responses transparently.

Library SHA bumped from 23f6ef29ba95 to 17b5f11c6044. Changelog
"Known Issues" block removed (both issues are no longer issues) and
the two new behaviours called out under Changed.

Adds a wrapper-level test confirming the cache_control marker reaches
the last system block on the Anthropic wire.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@dpage

dpage commented May 28, 2026

Copy link
Copy Markdown
Member Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented May 28, 2026

Copy link
Copy Markdown
Contributor
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@dpage dpage marked this pull request as ready for review May 28, 2026 08:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant