
docs: add TOON format adoption spec #836

Open
MQ37 wants to merge 1 commit into master from feat/toon-format-benchmark

@MQ37 (Contributor) commented May 13, 2026

Context

Most Apify MCP tools return their results as JSON text content. Every Actor run, dataset listing, KV-store key, and dataset item lands in the caller LLM's context and stays there for every subsequent turn, so the repeated keys in JSON arrays are paid for again on each request.

Solution

This PR is a research spec for an adaptive picker at each in-scope tool-call site. For each response, the picker encodes both the current JSON and TOON (with a small dot-flatten transform) and ships whichever is smaller. json is always a candidate, so the picker's output is never larger than today's.
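
For reviewers who want the picker rule in concrete form, here is a minimal sketch. It is not the spec's implementation: `pickSmallest`, `Encoder`, and `Picked` are hypothetical names, size is assumed to be measured in UTF-8 bytes, the baseline is assumed to be compact `JSON.stringify` output, and the TOON encoder is passed in as a plain function rather than imported from any particular library.

```typescript
// Hypothetical sketch of the "encode both, ship the smaller" rule.
// None of these names come from the spec.
type Encoder = (value: unknown) => string;

interface Picked {
    format: string; // name of the winning candidate
    text: string;   // encoded payload that would go into TextContent.text
    bytes: number;  // UTF-8 size of `text`
}

const utf8Bytes = (s: string): number => new TextEncoder().encode(s).length;

function pickSmallest(value: unknown, extra: Record<string, Encoder>): Picked {
    // JSON is always a candidate, so the result is never larger than the JSON baseline.
    // Assumes `value` is JSON-serializable (an object or array).
    const jsonText = JSON.stringify(value);
    let best: Picked = { format: 'json', text: jsonText, bytes: utf8Bytes(jsonText) };

    for (const [format, encode] of Object.entries(extra)) {
        try {
            const text = encode(value);
            const bytes = utf8Bytes(text);
            // Strict comparison: ties keep JSON.
            if (bytes < best.bytes) best = { format, text, bytes };
        } catch {
            // A failing candidate is dropped; JSON remains the fallback.
        }
    }
    return best;
}
```

A caller would pass something like `pickSmallest(response, { toon: encodeToon })`, where `encodeToon` stands for whichever TOON encoder the implementation ends up using.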

Companion to Code Mode (#794): TOON shrinks results that flow through the LLM context, while Code Mode skips most of them entirely. The spec proposes exposing the picker's encoder as apify.stringifyCompact() for Code Mode programs.

No production code in this PR. Spec-only, pending team review.

Worth your attention

  • Measured savings: 6.2% fewer bytes across 18 real Apify-API fixtures combined; the four list endpoints save 19–44% individually. dataset-items averages near zero, so the savings concentrate on list endpoints.
  • Stacks with fields= (separate proposal): a 100-place Google Maps result projected to 3 fields shrinks from 1.01 MB to 9.5 KB, ~110× less data entering the LLM context.
  • structuredContent is unchanged. Only TextContent.text shifts. outputSchema validation, MCP widgets, and programmatic consumers continue to see the same JSON.
  • Bounded by construction: json always in candidate set; defensive try/catch around the TOON candidate; MAX_DEPTH = 20 guard with margin of 11 over deepest real fixture; dotted-key normalisation with collision guard.
  • LLM behaviour change unmeasured. The spec calls for an evals/workflows/ regression gate before any production merge.
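
The depth and collision guards from the list above could look roughly like the sketch below. The spec's actual transform is not shown in this PR, so the details are assumptions: `dotFlatten` and `isPlainObject` are hypothetical names, arrays and empty objects are treated as leaves, and both guards throw so that a surrounding try/catch can discard the TOON candidate and keep JSON. Only `MAX_DEPTH = 20` is taken from the description.

```typescript
// Hypothetical sketch of a dot-flatten transform with a depth guard and a collision guard.
const MAX_DEPTH = 20; // value quoted in the PR description

type Flat = Record<string, unknown>;

function isPlainObject(v: unknown): v is Record<string, unknown> {
    return typeof v === 'object' && v !== null && !Array.isArray(v);
}

function dotFlatten(input: Record<string, unknown>): Flat {
    const out: Flat = {};

    const walk = (node: Record<string, unknown>, prefix: string, depth: number): void => {
        // Depth guard: the top-level object counts as depth 1.
        if (depth > MAX_DEPTH) throw new Error(`nesting deeper than ${MAX_DEPTH}`);

        for (const [key, value] of Object.entries(node)) {
            const path = prefix === '' ? key : `${prefix}.${key}`;
            if (isPlainObject(value) && Object.keys(value).length > 0) {
                walk(value, path, depth + 1);
            } else {
                // Collision guard: two source paths must not map to the same dotted key,
                // e.g. { a: { b: 1 }, "a.b": 2 }.
                if (Object.prototype.hasOwnProperty.call(out, path)) {
                    throw new Error(`dotted-key collision on "${path}"`);
                }
                out[path] = value;
            }
        }
    };

    walk(input, '', 1);
    return out;
}
```

For example, `dotFlatten({ id: 'x', stats: { runs: 3 } })` yields an object with the keys `id` and `stats.runs`. Throwing rather than silently overwriting keeps the transform lossless from the caller's point of view: either the flattened form is unambiguous or the JSON candidate is used.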

Follow-up

  • Implementation lands in a separate PR after team approval of the spec.
  • Workflow eval regression gate required before merge.

@MQ37 requested a review from jirispilka on May 13, 2026, 14:36
github-actions (bot) added the t-ai label (Issues owned by the AI team) on May 13, 2026
@jirispilka (Collaborator) left a comment

It looks good. I would convert this into an issue.

It will be easier to update later, including new experiments and numbers.
After the implementation is done, we can publish it as an md too.

Labels

t-ai Issues owned by the AI team.

3 participants