feat(evals): add expo-sdk, expo-router, expo-ui #383
Open
grabbou wants to merge 32 commits into
…xpo-ui Address blocking, should-fix, and nice-to-have issues found during fresh-context eval audit: replace unshipped Stack.Toolbar API with headerRight options, fix platform key consistency, remove dead links and orphan files, add missing input stubs, tighten requirement language for judgeability, and clean up unused code in references. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
b50ea25 to
99d3d85
Rewrite all Expo eval prompts so they describe WHAT to build or fix, not HOW to do it. Requirements test whether the model knows the right APIs and patterns — prompts should not give away the answer. Also merges expo-modules eval 02 (watchedDirectories rules) into eval 01 (inline-config) since the standalone eval had no natural prompt that didn't prescribe the solution. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Drop six evals that test trivial knowledge or duplicate existing coverage:
- expo-modules/09 (create-module-noninteractive): tests CLI-flag memorization, not engineering
- expo-modules/04 (kotlin-name-match): duplicates the Swift name-match eval; same concept in a second language
- expo-ui/06, 07, 09 (picker, slider/segmented, menu/masked-view drop-ins): pure import swaps with 1:1 API parity, redundant with the bottom-sheet drop-in
- expo-sdk/11 (notifications project-id guard): substantively a subset of the android-channel-token eval
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the uniform "Replace this scaffold" placeholder used across all Expo starters with task-specific baselines that follow the repo methodology (app/ = realistic code the solver extends), matching the convention in animation/async-state/navigation/lists evals.
- Migrate evals seed working legacy code (e.g. @gorhom/bottom-sheet, react-native-pager-view, @react-navigation, deprecated FileSystem and video-thumbnails APIs) so no-legacy-import requirements are meaningful
- "Broken after upgrade / not picked up" evals seed the actual defect (e.g. Swift module name mismatch, crashing root FileSystem calls)
- "Add X" evals seed the surrounding screen with stubbed handlers
- Greenfield evals seed a real default app screen, never instruction text
- expo-modules/07 gains package.json so the missing-types task is fixable
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ions
Make every Expo reference the real result of applying prompt.md to the reseeded starter: each reference now starts from the new starter, applies exactly the prompt's change, satisfies every requirement, and carries no placeholder text or dead imports.
- Remove leftover "Replace this scaffold" placeholder reference App.tsx files across all four categories
- Rebuild references on top of the realistic starters (migrated-off legacy code, fixed defects, filled stub handlers)
- expo-modules/08: reference now writes camera permission to match the prompt (was microphone)
- expo-modules/03: reference shows the fixed file/class/Name alignment
- expo-modules/07: reference adds correct package.json types/files config
- expo-router: add missing index.tsx files; relocate not-found catch-all under docs/ so +not-found takes priority; resolve dead /details links
- Restore full reference file sets so each reference mirrors its app/ starter (per repo convention)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…r in expo-sdk/25
- expo-ui/02: add a Reset button (RN in starter, @expo/ui Button in reference) so the universal-layout-primitives requirement that lists Button is genuinely satisfied
- expo-sdk/25: drive navigation-bar visibility state from addVisibilityListener and render it, instead of a no-op callback
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
From the fresh-context review:
- expo-ui/03, 08: use the documented default imports for BottomSheet and PagerView (were named imports)
- expo-modules/06: add style prop to the AuditLabel wrapper Props so the consumer typechecks
- expo-modules/05: drop the now-inaccurate "no types" comment in the solved reference
- expo-router/05, 06: keep the scaffold's route-screen bodies in the reference (only the tabs _layout is under test)
- expo-sdk/16, 19: drop requirements that referenced config files absent from inputs and reference (permission-plugin-note, config-plugin-fields)
- expo-sdk/12: mention background remote notifications in the prompt so the background-remote plugin-option requirement maps to the scenario
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…e screen useNativeState is exported from @expo/ui/jetpack-compose (Android) and cannot be combined with the universal TextInput. Reframe the eval to a Jetpack Compose screen: import Host/Text/TextField/useNativeState from @expo/ui/jetpack-compose, drive TextField via value/onValueChange on the native state's .value. Drop the listener-cleanup requirement (no such JS listener API exists on ObservableState) and add a no-React-state-mirror requirement. Verified against the SDK 56 useNativeState docs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove all expo-modules evals and update the README table and whitepaper category table/total accordingly. Also corrects the expo-sdk (26 to 25) and expo-ui (12 to 9) counts that were stale after earlier eval removals. Suite total is now 137. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Evals 01, 02, 11, 12 seed a real React Native screen that the reference converts to Expo UI, so the prompt should describe migrating/converting the existing UI rather than building from scratch. Reword accordingly to match the seeded starting point. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… imports
- Remove expo-ui universal-host-root eval (01): overlapped the universal-layout-controls eval on the Host-root concept, which the latter already covers with richer layout/control coverage. Renumber layout-controls to 01 so the category starts cleanly.
- Drop the unnecessary `Text as UIText/MaterialText/ComposeText` aliases where no react-native Text is imported; use plain `Text`. Eval 12 keeps its aliases (it imports Text from both platform entrypoints).
- Update README and whitepaper counts (expo-ui 9 to 8, suite total 136).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Close the numbering gaps left by removed evals so prefixes run consecutively: expo-sdk is now 01-25 (was missing 11) and expo-ui is now 01-08 (was missing 02/06/07/09). Pure directory renames; eval content, counts, and other categories are unchanged. Pre-existing gaps in animation/async-state/navigation are left as-is since they are referenced by the frozen historical result tables under paper/export. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
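A minimal sketch of the gap-closing rename described above, assuming eval directories are named `NN-slug` with a two-digit prefix. The helper name `planRenames` and the slugs in the usage note are hypothetical and not taken from the repo; the sketch only computes the rename plan and touches no files.

```typescript
// Hypothetical helper: computes [from, to] pairs that make prefixes consecutive.
// Assumes every directory name matches "NN-slug" (two-digit, zero-padded prefix).
export function planRenames(dirs: string[]): [string, string][] {
  const renames: [string, string][] = [];
  // Zero-padded prefixes sort correctly with a plain lexicographic sort.
  const sorted = [...dirs].sort();
  sorted.forEach((dir, index) => {
    const match = /^(\d{2})-(.+)$/.exec(dir);
    if (!match) {
      throw new Error(`Unexpected eval directory name: ${dir}`);
    }
    const next = String(index + 1).padStart(2, "0");
    // Only directories whose prefix changes end up in the plan.
    if (match[1] !== next) {
      renames.push([dir, `${next}-${match[2]}`]);
    }
  });
  return renames;
}
```

For example, `planRenames(["01-a", "03-b", "04-c"])` yields `[["03-b", "02-b"], ["04-c", "03-c"]]`. Applying the pairs in ascending order avoids collisions, because each target prefix has already been vacated by an earlier rename or was a gap to begin with.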
grabbou
commented
Jul 1, 2026
The eval previously called useNavigation() and discarded the result (dead code that only existed to satisfy a requirement) and dropped the ThemeProvider entirely in the reference, which is a lossy migration that never demonstrated the actual SDK 56 change. Per the SDK 55->56 migration docs, app code must move @react-navigation imports to expo-router entry points: theming to expo-router/react-navigation and createNativeStackNavigator to the file-based Stack. Rework the eval to show exactly that: scaffold is a themed native-stack layout; reference migrates ThemeProvider/DefaultTheme to expo-router/react-navigation and the navigator to expo-router Stack, keeping the theme wrapper. Replace the contrived useNavigation requirement with a theming-import migration one. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
grabbou
commented
Jul 1, 2026
Apply reviewer guidance across the expo-router category:
- 02: simplify scaffold to a single Products screen (no prescribed routing); prompt now requires adding a navigating button that passes the product id; requirements verify the param is passed, read via the hook, and rendered on the details screen
- 03: prompt specifies the header (title, back-button label, right button that alerts); requirement checks the alert handler instead of a vague "deterministic handler"
- 04: drop the iOS-only framing; reword as adding a collapsing large title
- 05: reframe as converting the existing stack layout to native tabs
- 06: prompt specifies exact SF Symbols (house/house.fill) and badge (2); requirements check those props
- 07: require Stack.Protected (base is a Stack); prompt says add gating to existing routes; drop the dead handleSignIn stub
- 09 (data-loaders-config): dropped per review
- 10 (static/server loader): prompt is explicit about what to move/fetch
- 11 (suspense): drop the unexplained internal must-not; require a visible Loading indicator
- 12 (not-found): prompt/requirements explicit about the message, home link, and scoping the catch-all
- Renumber expo-router to 01-11 after dropping 09; update README/whitepaper counts (expo-router 12->11, suite total 135)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Carry the expo-router review principles across the remaining Expo evals:
- Prompt explicitness: replace vague "handle appropriately/gracefully/
correctly" with the concrete behavior the requirements check (which
permission flow, which states, what to fetch, which channel/album, etc.)
- Framing: reword build/implement/create to "wire up" since each scaffold
already seeds the screen and a stubbed handler
- Requirement fixes:
- expo-sdk/13: drop the unjudgeable "make the code path clear enough"
requirement (already covered by the no-root-legacy-import check)
- expo-sdk/23: drop the un-prompted no-hardcoded-secrets requirement
- expo-ui/01: drop the un-prompted no-glass-effect requirement
- expo-ui/04: tighten the two vague requirements (supported props only;
a visible text fallback for an unavailable mode) to checkable form
- expo-sdk/01 (author edit), 02, 03 and expo-ui/02 already carried explicit
prompts + tightened requirements; references verified against every
updated requirement
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The no-android-open and unsupported-props evals tested different traps on the same @expo/ui/community/datetime-picker component with the same scaffold pattern. Merge them into a single cross-platform date+time eval that covers every trap: community import, no DateTimePickerAndroid.open, no @react-native-community import, controlled value via onValueChange, undefined-cancel guard, and a platform fallback for an unavailable mode. Adopt the date+time appointment scaffold/reference; renumber expo-ui to 01-07; update README/whitepaper counts (expo-ui 7, suite total 134). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…gories
Review pass over the new expo-sdk/expo-router/expo-ui evals:
- Fix references that misused the pinned SDK 56 packages (File.text() await, MediaLibrary.Asset.create, ExpoCalendar type, contacts address/image fields, Compose background modifier, ObservableState TextField wiring, swift-ui VStack, expo-router theming imports)
- Re-base expo-ui/03 on the real DateTimePicker cancellation API (onDismiss; dialog pickers mount on press, unmount on close)
- Make expo-router/11 winnable under the solver contract (catch-all placement is graded; no baseline deletion required) and move expo-router/09's server-only module outside the route directory
- Derive iOS location accuracy from ios.accuracy, keep CameraView mounted while toggling active, wire real download progress, persist limited-access notice, drop dead/duplicated/baseline-satisfied requirements, de-leak prompts that named the graded APIs
- Rewrite bug-report-framed prompts as forward-looking migrations
- Pin migration-source packages in testbench and add a scoped typecheck gate (evals/tsconfig.expo.json) to CI
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…polish
- Include file paths in the judge prompt (mirrors the solver prompt), so placement/naming requirements (+not-found.tsx, docs/ catch-all scoping, tab trigger names) are gradeable from evidence
- Keep expo-router/09 reference in the solver's flat path namespace and drop the out-of-app placement claim the harness cannot express
- De-leak remaining prompts that named graded APIs (barcode scanner props, EAS projectId)
- Drop expo-ui/04 requirements pre-satisfied by the baseline; scope the expo-ui/03 fallback requirement to platforms lacking the mode
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
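The de-leaking step above could be checked mechanically. The following is a rough sketch, not the repo's tooling: `findLeakedIdentifiers` is a hypothetical helper, and the list of graded identifiers is assumed to be supplied by whoever maintains the eval.

```typescript
// Hypothetical lint: returns the graded identifiers that appear verbatim in a prompt.
export function findLeakedIdentifiers(
  prompt: string,
  gradedIdentifiers: string[],
): string[] {
  return gradedIdentifiers.filter((id) => {
    // Escape regex metacharacters so identifiers like "Stack.Protected" match literally.
    const escaped = id.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    // Identifier-aware boundaries: "projectId" must not match inside "myprojectIdentity".
    const pattern = new RegExp(
      `(?<![A-Za-z0-9_$])${escaped}(?![A-Za-z0-9_$])`,
    );
    return pattern.test(prompt);
  });
}
```

A prompt such as "Pause the camera preview when the screen loses focus." passes for `["pausePreview", "resumePreview"]`, while a prompt that says "Call pausePreview when the screen blurs" is flagged. A check like this catches only verbatim leaks, so paraphrased hints still need a human review.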
…d and leak policies
Implementation-verified fixes across all three expo categories:
- SuspenseFallback moved to the layout route (leaf-route exports are ignored by expo-router in sync import mode)
- Remove player.pause() from unmount cleanups (useReleasingSharedObject releases the player before the component cleanup runs, so the call throws); pause stays AppState-driven
- Wrap multi-child Host content in Column/VStack (Host lays out direct children as an overlapping stack on both platforms)
- Idempotent re-download into Paths.cache; drop dead videoMaxDuration; drop deprecated addVisibilityListener and unprompted reloadScreenOptions grading; make headerBackTitle observable via a second screen
- Guard policy: at most one baseline-passable requirement per eval, kept only for documented deprecations or preserve-constraints; the rest rebound to the migrated surface or deleted
- Leak policy: prompts may name the source package being migrated from but never the graded target identifiers; de-leaked remaining prompts
- Escape solver-controlled paths in the judge prompt attribute and add buildJudgePrompt tests
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
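To illustrate why solver-controlled paths need escaping before they land in a `<file path="...">` attribute, here is a minimal sketch. `escapeAttribute` and `renderFileBlock` are hypothetical names, and the closing `</file>` tag is an assumption; the runner's actual `buildJudgePrompt` may be structured differently.

```typescript
// Hypothetical helper: escapes characters that could break out of a quoted attribute.
// "&" is replaced first so the entities produced below are not double-escaped.
export function escapeAttribute(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical renderer: wraps file contents in a tag carrying the escaped path.
export function renderFileBlock(path: string, contents: string): string {
  return `<file path="${escapeAttribute(path)}">\n${contents}\n</file>`;
}
```

With this in place, a path like `a"><file path="b` renders as `a&quot;&gt;&lt;file path=&quot;b` inside the attribute, so a solver cannot forge an extra file block in the evidence shown to the judge.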
…xpo evals
Final convergence pass, every change grounded in the pinned package implementations (native sources included):
- expo-sdk/02: grade pausePreview/resumePreview (cross-platform) instead of the iOS-only active prop; renamed to preview-pause-focus
- expo-sdk/03: refocus on barcode scanning (Android binds barcode analysis only in picture mode; recording removed); renamed to barcode-scan-config
- expo-sdk/05: teach the real permission nuance - the library picker is never gated on media-library permission; access level notice only
- expo-sdk/21: consolidation-scenario prompt (package not deprecated); multiple-offset thumbnails graded
- expo-sdk/23: declare updates.requestHeaders in app config so the header override passes native validation
- expo-router/09: true loader boundary (env read inside the stripped loader export; helper module under app/ would ship in client bundles and register a phantom route)
- expo-router/03/04: no unjustified Stack.Toolbar ban; observable custom back label; headerLargeTitleEnabled over the deprecated spelling
- expo-ui/01: Button label prop (raw string child trips the RN text invariant); expo-ui/06: restore placeholder/capitalization/hint via the TextField Placeholder slot
- Requirement hardening per guard policy: fallbacks and progress must be driven by real API results, not static branches; deleted or rebound remaining baseline-passable requirements (sdk/12/15/17/22/25, router/06/07, ui/03/07)
- Whitepaper: document file paths in the judge prompt as evidence
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
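The guard policy referenced above (at most one baseline-passable requirement per eval, kept only for documented deprecations or preserve-constraints) can be expressed as a small check. This is a sketch under assumptions: the `Requirement` shape, including the `baselinePassable` and `justification` fields, is hypothetical and not the schema of `requirements.yaml`.

```typescript
// Hypothetical requirement shape; the real requirements.yaml schema may differ.
interface Requirement {
  id: string;
  // True when the untouched starter already satisfies the requirement.
  baselinePassable: boolean;
  // Why a baseline-passable requirement is kept, if it is.
  justification?: "documented-deprecation" | "preserve-constraint";
}

// Returns human-readable violations of the guard policy for one eval.
export function guardPolicyViolations(requirements: Requirement[]): string[] {
  const passable = requirements.filter((r) => r.baselinePassable);
  // Every baseline-passable requirement needs one of the two allowed reasons.
  const violations = passable
    .filter((r) => r.justification === undefined)
    .map((r) => `${r.id}: baseline-passable without a justification`);
  // Even justified ones are capped at one per eval.
  if (passable.length > 1) {
    violations.push(
      `${passable.length} baseline-passable requirements (max 1)`,
    );
  }
  return violations;
}
```

An eval with a single justified baseline-passable requirement returns an empty list; a second one, justified or not, produces a violation.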
grabbou
commented
Jul 3, 2026
Prompts state scenario, outcomes, and concrete values - not which API mechanism to use:
- expo-router/03: drop "through that screen's options object"; grade header composition mechanism-agnostically (screen options or Stack.Toolbar are both first-class in expo-router 56)
- expo-router/10: plain wording for the layout-provided fallback
- expo-sdk/23: keep the app-config context, drop the instruction clause
- expo-sdk/25: drop the redundant plugin-name sentence (baseline already contains the plugin entry)
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
grabbou
commented
Jul 3, 2026
- expo-router/04: require the non-deprecated headerLargeTitleEnabled option; drop acceptance of the deprecated headerLargeTitle spelling
- expo-router/09: stop prescribing a single solution. The client bundle strips the loader export and DCE-removes imports used only by it (verified in babel-preset-expo server-data-loaders-plugin), so reading the secret inline in the loader OR in a server-only module the loader imports are both valid. Prompt describes the outcome; the requirement accepts either mechanism.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Clarify that the secret should not reach the client bundle and emphasize rendering from server-loaded data.
Contributor
Author
Hey @artus9033, this is now ready for your review!
thymikee
reviewed
Jul 3, 2026
@@ -0,0 +1 @@
+Migrate an Expo Router layout to SDK 56 import conventions.
\ No newline at end of file
Summary
Adds Expo SDK 56 eval coverage across three new categories, each with a README documenting library baselines, official sources, and a best-practice inventory. All evals use static file judging with `requirements.yaml` criteria.
Also includes testbench alignment for Expo SDK 56 packages and whitepaper clarifications on the eval metadata contract.
Quality bar
Every eval was reviewed against the pinned package implementations in `node_modules` (runtime `build/*.js` plus native `.swift`/`.kt`), not type stubs or memory. Concretely, the pass enforced:
- Cross-platform API choices (`pausePreview`/`resumePreview` instead of the iOS-only `active` prop), lifecycle safety (no calls on released shared objects), and native-validation correctness (declaring `updates.requestHeaders` so header overrides don't throw).
- Placement and naming requirements (`+not-found.tsx`, `docs/` catch-all scoping, the route `loader` boundary) are now gradeable. Baseline-passable "free point" requirements are capped at one per eval and only for documented deprecations or preserve-constraints; the rest were rebound to the migrated surface or removed.
Tooling
- A scoped typecheck gate (`bun run typecheck:expo-evals` via `evals/tsconfig.expo.json`) typechecks all new eval `app/` and `reference/` files against the pinned packages, so a broken reference can't regress silently.
- File paths in the judge prompt are escaped in `<file path="...">` attributes, and `runner/evaluators/llm/tests/prompt.test.ts` locks in both the path rendering and the escaping.
Test plan
- `bun run typecheck:expo-evals`: all new eval files typecheck against pinned SDK 56 packages
- `bun test runner`: runner unit tests pass (incl. judge-prompt tests)
- `bun run typecheck` and `bun run lint`: clean
🤖 Generated with Claude Code