docs: `evals-minimal` and `kitchen sink` examples #169

c-ehrlich · 2025-11-27T06:24:07Z

Replaces the `evals` example with `evals-minimal`, a much smaller and more approachable example. The new `kitchen-sink` example then shows everything, including how the pieces connect.
Adds a `kitchen-sink` example: a support-agent demo with instrumentation and evals.

This example shows:

- `wrapAISDKModel`, `wrapTools`
- `withSpan`
- `createAppScope`, `flag`, `pickFlags`
- `Eval`, `Scorer`

It includes four evals:

- `retrieve-from-knowledge-base.eval.ts`: Tests the RAG retrieval logic (`veryBadRAG`) to ensure it returns the expected document IDs for various inputs, including ambiguous queries and adversarial prompts.
- `extract-ticket-info.eval.ts`: Tests the structured data extraction capability (`extractTicketInfo`) to confirm it correctly identifies ticket fields such as "intent" and "product" and reports any missing information. This is based on Hamel's "fill in the blanks".
- `categorize-messages.eval.ts`: Tests the message classification function (`categorizeMessage`) to verify it accurately labels inputs as support, complaint, spam, or wrong company.
- `support-agent-e2e-tool-use.eval.ts`: Tests the main support agent loop (`runSupportAgent`) to verify it correctly chooses between using the `searchKnowledgeBase` tool and using no tools at all, depending on the user's query.
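All four evals compare a capability's output against an expected value, and a later commit in this PR makes the scorers return booleans. The sketch below shows what such boolean checks could look like as plain functions. It does not use the axiom `Scorer` API; the output shapes and the names `retrievedExpectedDocs`, `usedExpectedTools`, and `extractedTicketMatches` are hypothetical.

```typescript
// Hypothetical output shapes; the capabilities in the kitchen-sink example may differ.
interface RetrievalOutput {
  documentIds: string[];
}

interface ToolUseOutput {
  toolNames: string[]; // names of the tools the agent called, in any order
}

interface TicketInfo {
  intent: string | null;
  product: string | null;
  missing: string[]; // fields the model reported as missing
}

// Order-insensitive comparison that ignores duplicates.
function sameSet(actual: readonly string[], expected: readonly string[]): boolean {
  const actualSet = new Set(actual);
  const expectedSet = new Set(expected);
  return actualSet.size === expectedSet.size && expected.every((id) => actualSet.has(id));
}

// Retrieval eval: exactly the expected document IDs came back.
function retrievedExpectedDocs(output: RetrievalOutput, expected: string[]): boolean {
  return sameSet(output.documentIds, expected);
}

// Tool-use eval: exactly the expected tools were called; [] means "no tool".
function usedExpectedTools(output: ToolUseOutput, expected: string[]): boolean {
  return sameSet(output.toolNames, expected);
}

// Extraction eval: fields match and the same fields are reported missing.
function extractedTicketMatches(output: TicketInfo, expected: TicketInfo): boolean {
  return (
    output.intent === expected.intent &&
    output.product === expected.product &&
    sameSet(output.missing, expected.missing)
  );
}
```

Exact set equality is a strict choice; a retrieval eval that should give partial credit might score recall or precision instead.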

Example eval run: https://app.dev.axiomtestlabs.co/axiomers-ft83/ai-engineering/evaluations?runId=RMOX7TRDQM

Also made some package and lockfile changes so the example builds no longer fail.

Copilot

Pull request overview

This PR adds a comprehensive "kitchen sink" example demonstrating AI instrumentation and evaluation patterns. It introduces a support agent demo with OpenTelemetry tracing, feature flags, and multiple evaluation suites.

Key Changes:

New example project with Next.js 16, React 19, and AI SDK integration
OpenTelemetry instrumentation utilities for tracing and span management
Health check API endpoint using Hono framework
Updated workspace dependencies with specific Zod version pinning

Reviewed changes

Copilot reviewed 41 out of 48 changed files in this pull request and generated no comments.


| File | Description |
| --- | --- |
| `pnpm-workspace.yaml` | Reorganizes catalog entries and pins Zod to exact version 4.1.5 |
| `pnpm-lock.yaml` | Adds kitchen-sink example dependencies including Next.js 16.0.4, React 19, OpenTelemetry, and Tailwind v4 |
| `examples/kitchen-sink/tsconfig.json` | Standard Next.js TypeScript configuration with strict mode and path aliases |
| `examples/kitchen-sink/src/lib/utilities/tracer.ts` | Creates OpenTelemetry tracer singleton for the application |
| `examples/kitchen-sink/src/lib/utilities/start-active-span.ts` | Utility wrapper for creating traced spans with error handling and callbacks |
| `examples/kitchen-sink/src/lib/utilities/get-current-trace-id.ts` | Helper to retrieve active trace ID from OpenTelemetry context |
| `examples/kitchen-sink/src/lib/openai.ts` | Initializes OpenAI client with API key from environment |
| `examples/kitchen-sink/src/lib/api/health.ts` | Basic health check endpoint returning API status and timestamp |
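`start-active-span.ts` is described as a wrapper that creates traced spans with error handling and callbacks. The sketch below shows the general shape of such a wrapper without depending on OpenTelemetry, so it illustrates the pattern rather than the file's actual code. `SpanLike`, `withTracedSpan`, and the `onError` callback are hypothetical stand-ins; OpenTelemetry's own `Span` exposes comparable `recordException`, `setStatus`, and `end` methods with different parameter types.

```typescript
// Hypothetical minimal span interface, loosely modeled on OpenTelemetry's Span.
interface SpanLike {
  recordException(error: unknown): void;
  setStatus(status: { code: "ok" | "error"; message?: string }): void;
  end(): void;
}

interface TracedSpanOptions {
  onError?: (error: unknown) => void; // hypothetical callback hook
}

// Runs `fn` with `span`, records failures on the span, and always ends it.
async function withTracedSpan<T>(
  span: SpanLike,
  fn: (span: SpanLike) => Promise<T>,
  options: TracedSpanOptions = {},
): Promise<T> {
  try {
    const result = await fn(span);
    span.setStatus({ code: "ok" });
    return result;
  } catch (error) {
    span.recordException(error);
    span.setStatus({
      code: "error",
      message: error instanceof Error ? error.message : String(error),
    });
    options.onError?.(error);
    throw error; // rethrow so callers still see the failure
  } finally {
    span.end(); // runs on both the success and the failure path
  }
}
```

Ending the span in `finally` keeps the span from leaking when `fn` throws, and rethrowing keeps tracing transparent to the caller.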


pkg-pr-new · 2025-11-27T07:32:58Z


npm i https://pkg.pr.new/axiomhq/ai/axiom@169

commit: 3611974

pkg-pr-new · 2025-11-27T07:32:58Z


npm i https://pkg.pr.new/axiomhq/ai/axiom@169

commit: 1076695

examples/kitchen-sink/src/lib/api/ai/index.tsx

gabrielelpidio · 2025-11-28T03:55:06Z

examples/kitchen-sink/src/lib/capabilities/support-agent/support-agent-e2e-tool-use.eval.ts

+    {
+      input: 'Hello, are you a bot?',
+      expected: [],
+      purpose: 'chat_no_tool',


First time seeing this `purpose` field. I like it, but might it be misleading to have it everywhere in this example?

What is misleading about it? Collections can have arbitrary excess properties. It could be replaced by a comment, of course, but this seems more readable to me. What do you think?

Yeah, what I mean is that it gives the impression that `purpose` is a first-party feature of the SDK. It seems good to have, just maybe scoped to a single .eval.ts. Either way, it's a nitpick.
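One way to address the concern in this thread is to declare `purpose` in the example's own case type, so readers can see it is a local convention rather than an SDK field. In this minimal sketch, `ToolUseCase` and `countByPurpose` are hypothetical, and only the first case (`'Hello, are you a bot?'` / `chat_no_tool`) comes from the diff; the other inputs and the `kb_lookup` label are made up for illustration.

```typescript
// Hypothetical local type: `purpose` belongs to this example, not to the SDK.
interface ToolUseCase {
  input: string;
  expected: string[]; // tool names the agent should call; [] means no tool
  purpose: string; // free-form label describing what the case covers
}

const cases: ToolUseCase[] = [
  { input: "Hello, are you a bot?", expected: [], purpose: "chat_no_tool" },
  // Hypothetical cases added for illustration.
  { input: "Thanks, that's all!", expected: [], purpose: "chat_no_tool" },
  { input: "How do I reset my password?", expected: ["searchKnowledgeBase"], purpose: "kb_lookup" },
];

// Counts cases per purpose label, e.g. to check scenario coverage.
function countByPurpose(all: readonly ToolUseCase[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const c of all) {
    counts[c.purpose] = (counts[c.purpose] ?? 0) + 1;
  }
  return counts;
}
```

Because the label is typed locally, a reader who jumps into a single `.eval.ts` file can see where it comes from without assuming the SDK consumes it.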

c-ehrlich added 15 commits November 26, 2025 12:40

make AxiomEvalInstrumentationHook infer if its function is async or not

5bc3763

make provider optional on AxiomEvalInstrumentationHook

07d1efd

update runner to vitest v4

8ebd738

ignore vitest.config.ts in user project

503c41a

start kitchen sink example

b64a746

tracing works

cd5029e

we have evals

34a8952

improve evals

469f3b1

Merge branch 'main' into kitchen-sink

5a6701a

wrap model here

7f00c71

create extract-ticket-info and eval it

a397209

agent loop works

bd2d7d6

track and eval tool use

f1740a0

format

b0c7982

didnt mean to change this

34bcaf8

Copilot AI review requested due to automatic review settings November 27, 2025 06:24

c-ehrlich changed the title ~~Kitchen sink~~ docs: kitchen sink example Nov 27, 2025

Copilot started reviewing on behalf of c-ehrlich November 27, 2025 06:24

Copilot finished reviewing on behalf of c-ehrlich November 27, 2025 06:24

Copilot AI reviewed Nov 27, 2025


c-ehrlich added 2 commits November 27, 2025 14:30

add readme

9ba4624

fix build

1076695

c-ehrlich added 4 commits November 27, 2025 14:34

remove any here

c5582eb

remove health route and default redirect to customer-support

c17cfee

Delete evals example, replace with evals-minimal

254ea08

add configFlags

06ac899

c-ehrlich changed the title ~~docs: kitchen sink example~~ docs: evals-minimal and kitchen sink examples Nov 27, 2025

prettier

7c54eac

c-ehrlich added 3 commits November 27, 2025 15:06

add note

510de95

move evals

d64b1d3

Merge branch 'main' into kitchen-sink

c1a5504

gabrielelpidio reviewed Nov 28, 2025


c-ehrlich mentioned this pull request Nov 28, 2025

feat: improve data/scorer types, again #173

Open

c-ehrlich added 4 commits November 28, 2025 12:51

Merge branch 'main' into kitchen-sink

dfd5faa

make scorers return booleans

579e559

buggg

f207432

remove old stuff

3611974

gabrielelpidio approved these changes Nov 28, 2025

