feat(evals): add expo-sdk, expo-router, expo-ui#383

Open
grabbou wants to merge 32 commits into main from feat/expo-eval-coverage

Conversation

grabbou commented Jun 30, 2026

Contributor

Summary

Adds Expo SDK 56 eval coverage across three new categories, each with a README documenting library baselines, official sources, and a best-practice inventory. All evals use static file judging with requirements.yaml criteria.

  • expo-sdk (25 evals) — camera permissions & preview lifecycle, image picker, location accuracy/authorization, notifications, file system (object API + legacy migration), media library, calendar, contacts, audio/video playback, updates (OTA + request-header override), status/navigation bar, and build-properties config plugins
  • expo-router (11 evals) — SDK 56 import migration, Stack layouts & header composition, native tabs, protected routes, serializable search params, static/server loaders, suspense fallbacks, and not-found route priority
  • expo-ui (7 evals) — universal Host layout controls, drop-in community replacements (BottomSheet, PagerView, DateTimePicker), Material colors/symbols, native-state TextField, and per-platform modifier escape hatches

Also includes testbench alignment for Expo SDK 56 packages and whitepaper clarifications on the eval metadata contract.

Quality bar

Every eval was reviewed against the pinned package implementations in node_modules (runtime build/*.js plus native .swift/.kt), not type stubs or memory. Concretely, the pass enforced:

  • References are runnable and exemplary — each reference solution compiles against the pinned SDK 56 packages and is exactly "the prompt solved from the baseline." Fixes included cross-platform correctness (e.g. pausePreview/resumePreview instead of the iOS-only active prop), lifecycle safety (no calls on released shared objects), and native-validation correctness (declaring updates.requestHeaders so header overrides don't throw).
  • Requirements are judgeable and discriminating — the LLM judge sees file contents and paths; placement/naming requirements (e.g. +not-found.tsx, docs/ catch-all scoping, the route loader boundary) are now gradeable. Baseline-passable "free point" requirements are capped at one per eval and only for documented deprecations or preserve-constraints; the rest were rebound to the migrated surface or removed.
  • Prompts test knowledge, not transcription — prompts describe the scenario and may name the source package being migrated from, but never the target API identifier a requirement grades.
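The last rule lends itself to a mechanical pre-check. The following is only a minimal sketch, not code from this branch: the `EvalSpec` shape and the `findLeakedIdentifiers` helper are hypothetical, and it assumes each eval can list the target identifiers its requirements grade.

```typescript
// Hypothetical shape: a prompt plus the target identifiers its requirements grade.
interface EvalSpec {
  prompt: string;
  gradedIdentifiers: string[];
}

// Escape regex metacharacters so names like "+not-found.tsx" match literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Return every graded identifier that the prompt mentions as a whole token.
// Matching is case-sensitive, so a source package such as
// "react-native-pager-view" does not count as naming "PagerView".
function findLeakedIdentifiers(spec: EvalSpec): string[] {
  return spec.gradedIdentifiers.filter((identifier) => {
    const pattern = new RegExp(
      `(^|[^A-Za-z0-9_])${escapeRegExp(identifier)}([^A-Za-z0-9_]|$)`
    );
    return pattern.test(spec.prompt);
  });
}
```

An empty result means the prompt names none of the graded identifiers; anything else would still need a human read, since a prompt can leak a mechanism without using its exact name.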

Tooling

  • A scoped typecheck script (bun run typecheck:expo-evals via evals/tsconfig.expo.json) typechecks all new eval app/ and reference/ files against the pinned packages, so a broken reference can't regress silently.
  • The judge prompt now includes escaped <file path="..."> attributes, and runner/evaluators/llm/tests/prompt.test.ts locks in both the path rendering and the escaping.
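As a rough illustration of why the path attribute needs escaping, here is a minimal sketch. It is not the runner's actual implementation: `SolverFile`, `escapeAttribute`, and `renderFileBlock` are hypothetical names, and the exact escaping set used in the branch may differ.

```typescript
// Hypothetical file shape; the real runner types may differ.
interface SolverFile {
  path: string;
  contents: string;
}

// Escape characters that could close or break out of a double-quoted attribute.
// "&" is replaced first so already-produced entities are not escaped twice.
function escapeAttribute(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Render one file as a tagged block so the judge sees both path and contents.
function renderFileBlock(file: SolverFile): string {
  return `<file path="${escapeAttribute(file.path)}">\n${file.contents}\n</file>`;
}
```

Because the solver controls file paths, an unescaped path containing `">` could otherwise close the tag early and inject text that the judge would read as a separate file.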

⚠️ One follow-up needs a maintainer with workflow scope: add - run: bun run typecheck:expo-evals to .github/workflows/ci.yml (after the existing bun run typecheck step) to wire the scoped gate into CI. It was omitted from this branch because the pushing token lacks workflow scope; the script itself is already in package.json.

Test plan

  • bun run typecheck:expo-evals — all new eval files typecheck against pinned SDK 56 packages
  • bun test runner — runner unit tests pass (incl. judge-prompt tests)
  • bun run typecheck and bun run lint — clean
  • Run the eval harness on a sample of new evals to confirm judging behaves as expected

🤖 Generated with Claude Code

grabbou and others added 10 commits June 30, 2026 23:54
…xpo-ui

Address blocking, should-fix, and nice-to-have issues found during
fresh-context eval audit: replace unshipped Stack.Toolbar API with
headerRight options, fix platform key consistency, remove dead links
and orphan files, add missing input stubs, tighten requirement
language for judgeability, and clean up unused code in references.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
grabbou force-pushed the feat/expo-eval-coverage branch from b50ea25 to 99d3d85 June 30, 2026 20:02
grabbou and others added 11 commits July 1, 2026 00:19
Rewrite all Expo eval prompts so they describe WHAT to build or fix,
not HOW to do it. Requirements test whether the model knows the right
APIs and patterns — prompts should not give away the answer.

Also merges expo-modules eval 02 (watchedDirectories rules) into
eval 01 (inline-config) since the standalone eval had no natural
prompt that didn't prescribe the solution.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Drop six evals that test trivial knowledge or duplicate existing coverage:
- expo-modules/09 (create-module-noninteractive): tests CLI-flag
  memorization, not engineering
- expo-modules/04 (kotlin-name-match): duplicates the Swift name-match
  eval; same concept in a second language
- expo-ui/06, 07, 09 (picker, slider/segmented, menu/masked-view
  drop-ins): pure import swaps with 1:1 API parity, redundant with the
  bottom-sheet drop-in
- expo-sdk/11 (notifications project-id guard): substantively a subset
  of the android-channel-token eval

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the uniform "Replace this scaffold" placeholder used across all
Expo starters with task-specific baselines that follow the repo
methodology (app/ = realistic code the solver extends), matching the
convention in animation/async-state/navigation/lists evals.

- Migrate evals seed working legacy code (e.g. @gorhom/bottom-sheet,
  react-native-pager-view, @react-navigation, deprecated FileSystem and
  video-thumbnails APIs) so no-legacy-import requirements are meaningful
- "Broken after upgrade / not picked up" evals seed the actual defect
  (e.g. Swift module name mismatch, crashing root FileSystem calls)
- "Add X" evals seed the surrounding screen with stubbed handlers
- Greenfield evals seed a real default app screen, never instruction text
- expo-modules/07 gains package.json so the missing-types task is fixable

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ions

Make every Expo reference the real result of applying prompt.md to the
reseeded starter: each reference now starts from the new starter, applies
exactly the prompt's change, satisfies every requirement, and carries no
placeholder text or dead imports.

- Remove leftover "Replace this scaffold" placeholder reference App.tsx
  files across all four categories
- Rebuild references on top of the realistic starters (migrated-off
  legacy code, fixed defects, filled stub handlers)
- expo-modules/08: reference now writes camera permission to match the
  prompt (was microphone)
- expo-modules/03: reference shows the fixed file/class/Name alignment
- expo-modules/07: reference adds correct package.json types/files config
- expo-router: add missing index.tsx files; relocate not-found catch-all
  under docs/ so +not-found takes priority; resolve dead /details links
- Restore full reference file sets so each reference mirrors its app/
  starter (per repo convention)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…r in expo-sdk/25

- expo-ui/02: add a Reset button (RN in starter, @expo/ui Button in
  reference) so the universal-layout-primitives requirement that lists
  Button is genuinely satisfied
- expo-sdk/25: drive navigation-bar visibility state from
  addVisibilityListener and render it, instead of a no-op callback

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
From the fresh-context review:
- expo-ui/03, 08: use the documented default imports for BottomSheet and
  PagerView (were named imports)
- expo-modules/06: add style prop to the AuditLabel wrapper Props so the
  consumer typechecks
- expo-modules/05: drop the now-inaccurate "no types" comment in the
  solved reference
- expo-router/05, 06: keep the scaffold's route-screen bodies in the
  reference (only the tabs _layout is under test)
- expo-sdk/16, 19: drop requirements that referenced config files absent
  from inputs and reference (permission-plugin-note, config-plugin-fields)
- expo-sdk/12: mention background remote notifications in the prompt so
  the background-remote plugin-option requirement maps to the scenario

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…e screen

useNativeState is exported from @expo/ui/jetpack-compose (Android) and
cannot be combined with the universal TextInput. Reframe the eval to a
Jetpack Compose screen: import Host/Text/TextField/useNativeState from
@expo/ui/jetpack-compose, drive TextField via value/onValueChange on the
native state's .value. Drop the listener-cleanup requirement (no such JS
listener API exists on ObservableState) and add a no-React-state-mirror
requirement. Verified against the SDK 56 useNativeState docs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove all expo-modules evals and update the README table and
whitepaper category table/total accordingly. Also corrects the
expo-sdk (26 to 25) and expo-ui (12 to 9) counts that were stale
after earlier eval removals. Suite total is now 137.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Evals 01, 02, 11, 12 seed a real React Native screen that the reference
converts to Expo UI, so the prompt should describe migrating/converting
the existing UI rather than building from scratch. Reword accordingly to
match the seeded starting point.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… imports

- Remove expo-ui universal-host-root eval (01): overlapped the
  universal-layout-controls eval on the Host-root concept, which the
  latter already covers with richer layout/control coverage. Renumber
  layout-controls to 01 so the category starts cleanly.
- Drop the unnecessary `Text as UIText/MaterialText/ComposeText` aliases
  where no react-native Text is imported; use plain `Text`. Eval 12
  keeps its aliases (it imports Text from both platform entrypoints).
- Update README and whitepaper counts (expo-ui 9 to 8, suite total 136).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Close the numbering gaps left by removed evals so prefixes run
consecutively: expo-sdk is now 01-25 (was missing 11) and expo-ui is
now 01-08 (was missing 02/06/07/09). Pure directory renames; eval
content, counts, and other categories are unchanged. Pre-existing gaps
in animation/async-state/navigation are left as-is since they are
referenced by the frozen historical result tables under paper/export.
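A renumbering like this can be planned before touching the file system. The sketch below is illustrative only: `planRenumbering` is a hypothetical helper, the directory names in the usage note are made up, and it assumes two-digit, zero-padded prefixes.

```typescript
// Plan directory renames that close numbering gaps while keeping the existing order.
// Assumes two-digit, zero-padded prefixes such as "03-some-eval".
function planRenumbering(dirNames: string[]): Array<[string, string]> {
  const numbered = dirNames
    .filter((name) => /^\d{2}-/.test(name)) // ignore entries without a numeric prefix
    .sort(); // zero-padded prefixes sort correctly as plain strings

  const renames: Array<[string, string]> = [];
  numbered.forEach((name, index) => {
    const prefix = String(index + 1).padStart(2, "0");
    const target = `${prefix}${name.slice(2)}`;
    if (target !== name) renames.push([name, target]); // skip directories already in place
  });
  return renames;
}
```

For example, `planRenumbering(["01-a", "03-b", "04-c"])` yields the pairs `03-b` to `02-b` and `04-c` to `03-c`. Applying the pairs in the returned order moves each directory to a prefix no higher than its current one.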

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment thread evals/expo-router/04-rn-expo-router-large-title-scroll-child/prompt.md Outdated
The eval previously called useNavigation() and discarded the result
(dead code that only existed to satisfy a requirement) and dropped the
ThemeProvider entirely in the reference, which is a lossy migration that
never demonstrated the actual SDK 56 change.

Per the SDK 55->56 migration docs, app code must move @react-navigation
imports to expo-router entry points: theming to expo-router/react-navigation
and createNativeStackNavigator to the file-based Stack. Rework the eval to
show exactly that: scaffold is a themed native-stack layout; reference
migrates ThemeProvider/DefaultTheme to expo-router/react-navigation and the
navigator to expo-router Stack, keeping the theme wrapper. Replace the
contrived useNavigation requirement with a theming-import migration one.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment thread evals/expo-router/03-rn-expo-router-stack-composition-header/prompt.md Outdated
Comment thread evals/expo-router/03-rn-expo-router-stack-composition-header/requirements.yaml Outdated
Comment thread evals/expo-router/05-rn-expo-router-native-tabs-trigger-api/prompt.md Outdated
Comment thread evals/expo-router/09-rn-expo-router-data-loaders-config/prompt.md Outdated
Comment thread evals/expo-router/10-rn-expo-router-static-vs-server-loader/prompt.md Outdated
Comment thread evals/expo-router/11-rn-expo-router-suspense-fallback-layout/requirements.yaml Outdated
Comment thread evals/expo-router/11-rn-expo-router-suspense-fallback-layout/requirements.yaml Outdated
Comment thread evals/expo-router/12-rn-expo-router-not-found-route-priority/prompt.md Outdated
grabbou and others added 5 commits July 1, 2026 14:47
Apply reviewer guidance across the expo-router category:
- 02: simplify scaffold to a single Products screen (no prescribed
  routing); prompt now requires adding a navigating button that passes
  the product id; requirements verify the param is passed, read via the
  hook, and rendered on the details screen
- 03: prompt specifies the header (title, back-button label, right button
  that alerts); requirement checks the alert handler instead of a vague
  "deterministic handler"
- 04: drop the iOS-only framing; reword as adding a collapsing large title
- 05: reframe as converting the existing stack layout to native tabs
- 06: prompt specifies exact SF Symbols (house/house.fill) and badge (2);
  requirements check those props
- 07: require Stack.Protected (base is a Stack); prompt says add gating to
  existing routes; drop the dead handleSignIn stub
- 09 (data-loaders-config): dropped per review
- 10 (static/server loader): prompt is explicit about what to move/fetch
- 11 (suspense): drop the unexplained internal must-not; require a visible
  Loading indicator
- 12 (not-found): prompt/requirements explicit about the message, home
  link, and scoping the catch-all
- Renumber expo-router to 01-11 after dropping 09; update README/whitepaper
  counts (expo-router 12->11, suite total 135)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Carry the expo-router review principles across the remaining Expo evals:
- Prompt explicitness: replace vague "handle appropriately/gracefully/
  correctly" with the concrete behavior the requirements check (which
  permission flow, which states, what to fetch, which channel/album, etc.)
- Framing: reword build/implement/create to "wire up" since each scaffold
  already seeds the screen and a stubbed handler
- Requirement fixes:
  - expo-sdk/13: drop the unjudgeable "make the code path clear enough"
    requirement (already covered by the no-root-legacy-import check)
  - expo-sdk/23: drop the un-prompted no-hardcoded-secrets requirement
  - expo-ui/01: drop the un-prompted no-glass-effect requirement
  - expo-ui/04: tighten the two vague requirements (supported props only;
    a visible text fallback for an unavailable mode) to checkable form
- expo-sdk/01 (author edit), 02, 03 and expo-ui/02 already carried explicit
  prompts + tightened requirements; references verified against every
  updated requirement

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The no-android-open and unsupported-props evals tested different traps
on the same @expo/ui/community/datetime-picker component with the same
scaffold pattern. Merge them into a single cross-platform date+time eval
that covers every trap: community import, no DateTimePickerAndroid.open,
no @react-native-community import, controlled value via onValueChange,
undefined-cancel guard, and a platform fallback for an unavailable mode.
Adopt the date+time appointment scaffold/reference; renumber expo-ui to
01-07; update README/whitepaper counts (expo-ui 7, suite total 134).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…gories

Review pass over the new expo-sdk/expo-router/expo-ui evals:

- Fix references that misused the pinned SDK 56 packages (File.text()
  await, MediaLibrary.Asset.create, ExpoCalendar type, contacts
  address/image fields, Compose background modifier, ObservableState
  TextField wiring, swift-ui VStack, expo-router theming imports)
- Re-base expo-ui/03 on the real DateTimePicker cancellation API
  (onDismiss; dialog pickers mount on press, unmount on close)
- Make expo-router/11 winnable under the solver contract (catch-all
  placement is graded; no baseline deletion required) and move
  expo-router/09's server-only module outside the route directory
- Derive iOS location accuracy from ios.accuracy, keep CameraView
  mounted while toggling active, wire real download progress, persist
  limited-access notice, drop dead/duplicated/baseline-satisfied
  requirements, de-leak prompts that named the graded APIs
- Rewrite bug-report-framed prompts as forward-looking migrations
- Pin migration-source packages in testbench and add a scoped
  typecheck gate (evals/tsconfig.expo.json) to CI

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…polish

- Include file paths in the judge prompt (mirrors the solver prompt),
  so placement/naming requirements (+not-found.tsx, docs/ catch-all
  scoping, tab trigger names) are gradeable from evidence
- Keep expo-router/09 reference in the solver's flat path namespace
  and drop the out-of-app placement claim the harness cannot express
- De-leak remaining prompts that named graded APIs (barcode scanner
  props, EAS projectId)
- Drop expo-ui/04 requirements pre-satisfied by the baseline; scope
  the expo-ui/03 fallback requirement to platforms lacking the mode

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
grabbou and others added 2 commits July 3, 2026 12:02
…d and leak policies

Implementation-verified fixes across all three expo categories:

- SuspenseFallback moved to the layout route (leaf-route exports are
  ignored by expo-router in sync import mode)
- Remove player.pause() from unmount cleanups (useReleasingSharedObject
  releases the player before the component cleanup runs, so the call
  throws); pause stays AppState-driven
- Wrap multi-child Host content in Column/VStack (Host lays out direct
  children as an overlapping stack on both platforms)
- Idempotent re-download into Paths.cache; drop dead videoMaxDuration;
  drop deprecated addVisibilityListener and unprompted reloadScreenOptions
  grading; make headerBackTitle observable via a second screen
- Guard policy: at most one baseline-passable requirement per eval, kept
  only for documented deprecations or preserve-constraints; the rest
  rebound to the migrated surface or deleted
- Leak policy: prompts may name the source package being migrated from
  but never the graded target identifiers; de-leaked remaining prompts
- Escape solver-controlled paths in the judge prompt attribute and add
  buildJudgePrompt tests
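The guard policy in this commit could also be enforced by a small check. What follows is a hedged sketch under stated assumptions, not the repository's code: the `Requirement` shape, the `passesOnBaseline` flag, and `checkGuardPolicy` are all hypothetical, and it assumes baseline-passability has already been determined per requirement.

```typescript
// Hypothetical requirement shape; field names are illustrative only.
type GuardReason = "documented-deprecation" | "preserve-constraint";

interface Requirement {
  id: string;
  passesOnBaseline: boolean; // true if the unmodified starter already satisfies it
  guardReason?: GuardReason; // why a baseline-passable requirement is kept
}

// Return human-readable violations of the guard policy for one eval.
function checkGuardPolicy(requirements: Requirement[]): string[] {
  const violations: string[] = [];
  const baselinePassable = requirements.filter((r) => r.passesOnBaseline);

  if (baselinePassable.length > 1) {
    violations.push(`${baselinePassable.length} baseline-passable requirements (max 1)`);
  }
  for (const requirement of baselinePassable) {
    if (requirement.guardReason === undefined) {
      violations.push(`${requirement.id}: baseline-passable without a documented reason`);
    }
  }
  return violations;
}
```

An empty array means the eval satisfies both halves of the policy: at most one baseline-passable requirement, and a stated reason for keeping it.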

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…xpo evals

Final convergence pass, every change grounded in the pinned package
implementations (native sources included):

- expo-sdk/02: grade pausePreview/resumePreview (cross-platform) instead
  of the iOS-only active prop; renamed to preview-pause-focus
- expo-sdk/03: refocus on barcode scanning (Android binds barcode
  analysis only in picture mode; recording removed); renamed to
  barcode-scan-config
- expo-sdk/05: teach the real permission nuance - the library picker is
  never gated on media-library permission; access level notice only
- expo-sdk/21: consolidation-scenario prompt (package not deprecated);
  multiple-offset thumbnails graded
- expo-sdk/23: declare updates.requestHeaders in app config so the
  header override passes native validation
- expo-router/09: true loader boundary (env read inside the stripped
  loader export; helper module under app/ would ship in client bundles
  and register a phantom route)
- expo-router/03/04: no unjustified Stack.Toolbar ban; observable custom
  back label; headerLargeTitleEnabled over the deprecated spelling
- expo-ui/01: Button label prop (raw string child trips the RN text
  invariant); expo-ui/06: restore placeholder/capitalization/hint via
  the TextField Placeholder slot
- Requirement hardening per guard policy: fallbacks and progress must be
  driven by real API results, not static branches; deleted or rebound
  remaining baseline-passable requirements (sdk/12/15/17/22/25,
  router/06/07, ui/03/07)
- Whitepaper: document file paths in the judge prompt as evidence

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
grabbou changed the title from "feat(evals): Expo eval coverage — expo-sdk, expo-router, expo-modules, expo-ui" to "feat(evals): Expo SDK 56 eval coverage — expo-sdk, expo-router, expo-ui" Jul 3, 2026
Comment thread evals/expo-router/03-rn-expo-router-stack-composition-header/prompt.md Outdated
Comment thread evals/expo-router/04-rn-expo-router-large-title-scroll-child/requirements.yaml Outdated
Comment thread evals/expo-router/09-rn-expo-router-static-vs-server-loader/prompt.md Outdated
Comment thread evals/expo-router/09-rn-expo-router-static-vs-server-loader/requirements.yaml Outdated
Prompts state scenario, outcomes, and concrete values - not which API
mechanism to use:

- expo-router/03: drop "through that screen's options object"; grade
  header composition mechanism-agnostically (screen options or
  Stack.Toolbar are both first-class in expo-router 56)
- expo-router/10: plain wording for the layout-provided fallback
- expo-sdk/23: keep the app-config context, drop the instruction clause
- expo-sdk/25: drop the redundant plugin-name sentence (baseline
  already contains the plugin entry)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Comment thread evals/expo-router/09-rn-expo-router-static-vs-server-loader/requirements.yaml Outdated
grabbou and others added 2 commits July 3, 2026 12:25
- expo-router/04: require the non-deprecated headerLargeTitleEnabled
  option; drop acceptance of the deprecated headerLargeTitle spelling
- expo-router/09: stop prescribing a single solution. The client
  bundle strips the loader export and DCE-removes imports used only by
  it (verified in babel-preset-expo server-data-loaders-plugin), so
  reading the secret inline in the loader OR in a server-only module the
  loader imports are both valid. Prompt describes the outcome; the
  requirement accepts either mechanism.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Clarify that the secret should not reach the client bundle and emphasize rendering from server-loaded data.
grabbou requested a review from artus9033 July 3, 2026 08:37
grabbou commented Jul 3, 2026

Contributor Author

Hey @artus9033 this is now ready for your review!

grabbou changed the title from "feat(evals): Expo SDK 56 eval coverage — expo-sdk, expo-router, expo-ui" to "feat(evals): add expo-sdk, expo-router, expo-ui" Jul 3, 2026
@@ -0,0 +1 @@
+Migrate an Expo Router layout to SDK 56 import conventions.
\ No newline at end of file

Member

SDK 56+? There is 57 already.

2 participants