HeyPuter · reynaldichernando · Jun 2, 2026
diff --git a/src/docs/src/AI/chat.md b/src/docs/src/AI/chat.md
@@ -33,7 +33,7 @@ An object containing the following properties:
 - `temperature` (Number) - A number between 0 and 2 indicating the randomness of the completion. Lower values make the output more focused and deterministic, while higher values make it more random. By default, the specific model's temperature is used.
 - `tools` (Array) (Optional) - Function definitions the AI can call. See [Function Calling](#function-calling) for details.
 - `reasoning_effort` / `reasoning.effort` (String) (Optional) - Controls how much effort reasoning models spend thinking. Supported values: `none`, `minimal`, `low`, `medium`, `high`, and `xhigh`. Lower values give faster responses with less reasoning. OpenAI models only.
-- `text` / `text_verbosity` (String) (Optional) - Controls how long or short responses are. Supported values: `low`, `medium`, and `high`. Lower values give shorter responses. OpenAI models only.
+- `verbosity` / `text.verbosity` (String) (Optional) - Controls how long or short responses are. Supported values: `low`, `medium`, and `high`. Lower values give shorter responses. OpenAI models only.
 
 #### `testMode` (Boolean) (Optional)
 

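The renamed option can be illustrated with a small sketch. `buildChatOptions` below is a hypothetical helper, not part of Puter.js; it only checks values against the lists quoted in the option descriptions above and emits the flat `reasoning_effort` and `verbosity` spellings.

```javascript
// Supported values, copied from the option descriptions in chat.md.
const REASONING_EFFORTS = ['none', 'minimal', 'low', 'medium', 'high', 'xhigh'];
const VERBOSITIES = ['low', 'medium', 'high'];

// Hypothetical helper (an assumption for illustration, not a Puter.js API):
// validates the two values and returns an options object for puter.ai.chat().
function buildChatOptions({ reasoningEffort, verbosity } = {}) {
    const options = {};
    if (reasoningEffort !== undefined) {
        if (!REASONING_EFFORTS.includes(reasoningEffort)) {
            throw new RangeError(`Unsupported reasoning_effort: ${reasoningEffort}`);
        }
        options.reasoning_effort = reasoningEffort;
    }
    if (verbosity !== undefined) {
        if (!VERBOSITIES.includes(verbosity)) {
            throw new RangeError(`Unsupported verbosity: ${verbosity}`);
        }
        options.verbosity = verbosity;
    }
    return options;
}

// In a page that has loaded Puter.js, the result would be passed as the
// options argument, e.g.:
// const reply = await puter.ai.chat('Summarize this', buildChatOptions({ verbosity: 'low' }));
```

Both options are documented as OpenAI-only, so a caller targeting other models would presumably leave them out.
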
diff --git a/src/docs/src/AI/speech2txt.md b/src/docs/src/AI/speech2txt.md
@@ -44,7 +44,7 @@ Fine-tune how transcription runs.
 - `chunking_strategy` (String): Required for `gpt-4o-transcribe-diarize` inputs longer than 30 seconds (recommend `"auto"`).
 - `known_speaker_names` / `known_speaker_references` (Array): Optional diarization references encoded as data URLs.
 - `extra_body` (Object): Forwarded verbatim to the OpenAI API for experimental flags.
-- `stream` (Boolean): Reserved for future streaming support. Currently rejected when `true`.
+- `stream` (Boolean): Reserved for future streaming support. Streaming is not currently supported.
 - `test_mode` (Boolean): When `true`, returns a sample response without using credits. Defaults to `false`.
 
 **xAI-specific options** (when `provider: 'xai'`):
@@ -65,8 +65,8 @@ When `true`, skips the live API call and returns a static sample transcript so y
 
 Returns a `Promise` that resolves to either:
 
-- A string (when `response_format: "text"` or you pass a shorthand `source` with no options), or
-- An object of [`Speech2TxtResult`](/Objects/speech2txtresult) containing the transcription payload (including diarization segments, timestamps, etc., depending on the selected model and format).
+- A string (when `response_format: "text"`), or
+- A [`Speech2TxtResult`](/Objects/speech2txtresult) object containing the transcription payload (including diarization segments, timestamps, etc., depending on the selected model and format). This is the default, including when you pass a bare `source` with no options.
 
 ## Examples
 
@@ -79,7 +79,7 @@ Returns a `Promise` that resolves to either:
     <script>
         (async () => {
             const transcript = await puter.ai.speech2txt('https://assets.puter.site/example.mp3');
-            puter.print('Transcript:', transcript.text ?? transcript);
+            puter.print('Transcript:', transcript.text);
         })();
     </script>
 </body>
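
Code that switches `response_format` at runtime may still see both return shapes described in the updated "Return value" text. A minimal sketch of normalizing them, assuming only the string and `Speech2TxtResult` shapes documented there; `toTranscriptText` is a hypothetical helper name:

```javascript
// Hypothetical helper (not part of Puter.js): returns the transcript text
// whether speech2txt() resolved to a plain string (response_format: "text")
// or to a Speech2TxtResult object (the default).
function toTranscriptText(result) {
    return typeof result === 'string' ? result : result.text;
}

// In a page that has loaded Puter.js:
// const result = await puter.ai.speech2txt(source, { response_format: 'text' });
// puter.print('Transcript:', toTranscriptText(result));
```

For the bare-`source` call in the example above, the helper is unnecessary because the default result is an object and `transcript.text` is enough.
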

diff --git a/src/docs/src/AI/txt2speech.listEngines.md b/src/docs/src/AI/txt2speech.listEngines.md
@@ -32,14 +32,7 @@ Common aliases are also accepted (e.g. `'eleven'`, `'google'`, `'grok'`).
 
 ## Return value
 
-A `Promise` that resolves to an array of engine objects. Each object contains:
-
-| Field | Type | Description |
-|-------|------|-------------|
-| `id` | `String` | Engine/model identifier |
-| `name` | `String` | Human-readable engine name |
-| `provider` | `String` | Provider this engine belongs to |
-| `pricing_per_million_chars` | `Number` | Cost per million characters (may be absent) |
+A `Promise` that resolves to an array of [`TTSEngine`](/Objects/ttsengine) objects.
 
 Example response:
 

diff --git a/src/docs/src/AI/txt2speech.listVoices.md b/src/docs/src/AI/txt2speech.listVoices.md
@@ -26,19 +26,7 @@ When `options` is a plain string it is treated as an `engine` filter for the def
 
 ## Return value
 
-A `Promise` that resolves to an array of voice objects. Each object contains:
-
-| Field | Type | Description |
-|-------|------|-------------|
-| `id` | `String` | Voice identifier to pass to `txt2speech()` |
-| `name` | `String` | Human-readable voice name |
-| `provider` | `String` | Provider this voice belongs to |
-| `language` | `Object` | `{ name, code }` language info (may be absent) |
-| `description` | `String` | Short description of the voice (may be absent) |
-| `category` | `String` | Voice category, e.g. `'premade'` (may be absent) |
-| `labels` | `Object` | Provider-specific labels (may be absent) |
-| `supported_models` | `Array` | Model IDs this voice works with (may be absent) |
-| `supported_engines` | `Array` | Engine types this voice supports (may be absent) |
+A `Promise` that resolves to an array of [`TTSVoice`](/Objects/ttsvoice) objects.
 
 Example response:
 

diff --git a/src/docs/src/AI/txt2speech.md b/src/docs/src/AI/txt2speech.md
@@ -96,7 +96,7 @@ Available when `provider: 'xai'`:
 | `language` | `String` | BCP-47 language code. Defaults to `'en'`. Supports `'auto'` for auto-detection and 20+ languages |
 | `output_format` | `String` | Output codec. Available: `'mp3'` (default), `'wav'`, `'pcm'`, `'mulaw'`, `'alaw'` |
 
-Text supports inline speech tags like `[pause]`, `[laugh]` and wrapping tags like `<whisper>text</whisper>` for expressive delivery. Maximum 15,000 characters per request.
+Text supports inline speech tags like `[pause]`, `[laugh]` and wrapping tags like `<whisper>text</whisper>` for expressive delivery.
 
 For more details, see the [xAI TTS documentation](https://x.ai/news/grok-stt-and-tts-apis).
 

diff --git a/src/docs/src/Objects.md b/src/docs/src/Objects.md
@@ -17,6 +17,8 @@ Various object types and classes that represent different entities in the Puter
 - **[MonthlyUsage](/Objects/monthlyusage/)** - Represents user's monthly resource usage information
 - **[Speech2TxtResult](/Objects/speech2txtresult/)** - Represents speech-to-text transcription results
 - **[Subdomain](/Objects/subdomain/)** - Represents a subdomain
+- **[TTSEngine](/Objects/ttsengine/)** - Represents an available text-to-speech engine/model
+- **[TTSVoice](/Objects/ttsvoice/)** - Represents an available text-to-speech voice
 - **[ToolCall](/Objects/toolcall/)** - Represents a tool invocation request
 - **[User](/Objects/user/)** - Represents a Puter user
 - **[WorkerDeployment](/Objects/workerdeployment/)** - Represents a worker deployment result

diff --git a/src/docs/src/Objects/chatresponse.md b/src/docs/src/Objects/chatresponse.md
@@ -16,3 +16,9 @@ An object containing the chat message data.
 - `content` (String) - The content of the message.
 
 - `tool_calls` (Array) - An optional array of [`ToolCall`](/Objects/toolcall) objects if the model wants to call tools.
+
+- `tool_call_id` (String) - An optional identifier linking this message to the tool call it responds to.
+
+- `cache_control` (Object) - An optional object controlling prompt caching for this message. Contains a `type` (String) property.
+
+- `images` (Array) - An array of image content objects associated with the message. Each object contains a `type` (String) and an `image_url` object with a `url` (String) property.
diff --git a/src/docs/src/Objects/chatresponsechunk.md b/src/docs/src/Objects/chatresponsechunk.md
@@ -5,8 +5,44 @@ description: The ChatResponseChunk object containing a chunk of streaming chat r
 
 The `ChatResponseChunk` object containing a chunk of streaming chat response data.
 
+Each chunk has a `type` indicating its kind. The other attributes that are present depend on that `type`.
+
 ## Attributes
 
+#### `type` (String)
+
+The kind of chunk. One of:
+
+- `"text"` - A portion of the response text.
+- `"reasoning"` - A portion of the model's reasoning/thinking output.
+- `"tool_use"` - A tool/function the model wants to call.
+- `"extra_content"` - Provider-specific metadata.
+- `"usage"` - Token usage totals, emitted as the final chunk.
+
 #### `text` (String)
 
-A string containing a portion of the chat response text in streaming mode.
+A portion of the chat response text. Present on `text` chunks.
+
+#### `reasoning` (String)
+
+A portion of the model's reasoning output. Present on `reasoning` chunks.
+
+#### `id` (String)
+
+The unique identifier for the tool call. Present on `tool_use` chunks.
+
+#### `name` (String)
+
+The name of the function/tool to call. Present on `tool_use` chunks.
+
+#### `input` (Object)
+
+The parsed arguments for the tool call. Present on `tool_use` chunks.
+
+#### `extra_content`
+
+Provider-specific metadata attached to the stream. Present on `extra_content` chunks.
+
+#### `usage` (Object)
+
+An object containing token usage totals. Present on the final `usage` chunk.
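
Because the attributes present on a chunk depend on its `type`, a consumer would typically dispatch on that field. The following is a minimal sketch using only the chunk types and attribute names listed in the new `chatresponsechunk.md`; `createStreamState` and `applyChunk` are hypothetical names, and the shape of `usage` and `extra_content` is not assumed.

```javascript
// Hypothetical accumulator for a streamed response (illustration only).
function createStreamState() {
    return { text: '', reasoning: '', toolCalls: [], extra: [], usage: null };
}

// Folds one ChatResponseChunk into the state, keyed on chunk.type.
function applyChunk(state, chunk) {
    switch (chunk.type) {
        case 'text':
            state.text += chunk.text;
            break;
        case 'reasoning':
            state.reasoning += chunk.reasoning;
            break;
        case 'tool_use':
            state.toolCalls.push({ id: chunk.id, name: chunk.name, input: chunk.input });
            break;
        case 'extra_content':
            state.extra.push(chunk.extra_content);
            break;
        case 'usage':
            state.usage = chunk.usage;
            break;
        default:
            // Unknown chunk types are ignored rather than treated as errors.
            break;
    }
    return state;
}

// In a page that has loaded Puter.js, assuming a streamed chat response:
// const state = createStreamState();
// for await (const chunk of await puter.ai.chat(prompt, { stream: true })) {
//     applyChunk(state, chunk);
// }
```

Ignoring unknown types keeps the consumer tolerant of chunk kinds added later.
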
diff --git a/src/docs/src/Objects/speech2txtresult.md b/src/docs/src/Objects/speech2txtresult.md
@@ -18,3 +18,16 @@ A string containing the detected or specified language of the audio.
 #### `segments` (Array)
 
 An optional array of segment objects containing detailed transcription information.
+
+#### `duration` (Number)
+
+The duration of the audio in seconds. Optional and provider-dependent (e.g. returned by xAI).
+
+#### `words` (Array)
+
+An optional array of per-word timestamp objects. Provider-dependent (e.g. returned by xAI). Each word has:
+
+- `text` (String): The transcribed word.
+- `start` (Number): Start time of the word in seconds.
+- `end` (Number): End time of the word in seconds.
+- `speaker` (String): Detected speaker, present when `diarize: true`.
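
The per-word fields lend themselves to simple post-processing. As a sketch, the hypothetical helper below merges consecutive `words` entries with the same `speaker` into turns; it relies only on the `text`, `start`, `end`, and `speaker` fields listed above.

```javascript
// Hypothetical helper (not part of Puter.js): groups consecutive words by
// speaker. Without diarization, `speaker` is undefined on every word, so all
// words collapse into a single turn.
function groupWordsBySpeaker(words = []) {
    const turns = [];
    for (const word of words) {
        const last = turns[turns.length - 1];
        if (last && last.speaker === word.speaker) {
            // Same speaker as the previous word: extend the current turn.
            last.text += ' ' + word.text;
            last.end = word.end;
        } else {
            // Speaker changed (or first word): start a new turn.
            turns.push({ speaker: word.speaker, text: word.text, start: word.start, end: word.end });
        }
    }
    return turns;
}
```

Since `words` is optional, the default parameter lets callers pass `result.words` directly even when the provider omits it.
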
diff --git a/src/docs/src/Objects/ttsengine.md b/src/docs/src/Objects/ttsengine.md
@@ -0,0 +1,24 @@
+---
+title: TTSEngine
+description: The TTSEngine object describing an available text-to-speech engine/model.
+---
+
+The `TTSEngine` object describes a text-to-speech engine/model available from a provider, including pricing metadata where available. Arrays of these objects are returned by [`puter.ai.txt2speech.listEngines()`](/AI/txt2speech.listEngines).
+
+## Attributes
+
+#### `id` (String)
+
+The engine/model identifier.
+
+#### `name` (String)
+
+A human-readable engine name.
+
+#### `provider` (String)
+
+The provider this engine belongs to, e.g. `'aws-polly'`, `'openai'`, `'elevenlabs'`, `'gemini'`, `'xai'`.
+
+#### `pricing_per_million_chars` (Number)
+
+An optional cost per million characters. May be absent when the provider does not expose pricing.
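
Since `pricing_per_million_chars` may be absent, code that estimates cost needs a fallback. A minimal sketch follows; `estimateCost` is a hypothetical helper, the currency or credit unit of the price is not stated in the docs, and counting characters with `text.length` (UTF-16 code units) is an assumption about how providers bill.

```javascript
// Hypothetical helper (not part of Puter.js): estimates the cost of
// synthesizing `text` with a TTSEngine, or returns null when the engine
// does not expose pricing.
function estimateCost(engine, text) {
    const price = engine.pricing_per_million_chars;
    if (typeof price !== 'number') return null;
    // Assumption: billing is proportional to text.length.
    return (text.length / 1_000_000) * price;
}
```

Returning `null` instead of `0` keeps "unknown price" distinguishable from "free".
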
diff --git a/src/docs/src/Objects/ttsvoice.md b/src/docs/src/Objects/ttsvoice.md
@@ -0,0 +1,44 @@
+---
+title: TTSVoice
+description: The TTSVoice object describing an available text-to-speech voice.
+---
+
+The `TTSVoice` object describes a text-to-speech voice available from a provider, including metadata such as language, category, and supported models/engines. Arrays of these objects are returned by [`puter.ai.txt2speech.listVoices()`](/AI/txt2speech.listVoices).
+
+## Attributes
+
+#### `id` (String)
+
+The voice identifier to pass to [`puter.ai.txt2speech()`](/AI/txt2speech).
+
+#### `name` (String)
+
+A human-readable voice name.
+
+#### `provider` (String)
+
+The provider this voice belongs to, e.g. `'aws-polly'`, `'openai'`, `'elevenlabs'`, `'gemini'`, `'xai'`.
+
+#### `language` (Object)
+
+An optional object describing the voice's language. Contains a `name` (String) and a `code` (String) property. May be absent.
+
+#### `description` (String)
+
+An optional short description of the voice. May be absent.
+
+#### `category` (String)
+
+An optional voice category, e.g. `'premade'`. May be absent.
+
+#### `labels` (Object)
+
+An optional object of provider-specific labels. May be absent.
+
+#### `supported_models` (Array)
+
+An optional array of model IDs (Strings) this voice works with. May be absent.
+
+#### `supported_engines` (Array)
+
+An optional array of engine types (Strings) this voice supports. May be absent.
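
Most `TTSVoice` attributes may be absent, so filtering a voice list has to tolerate missing fields. The sketch below assumes only the attribute names documented above; `filterVoices` is a hypothetical helper and the sample values in the usage comment are placeholders.

```javascript
// Hypothetical helper (not part of Puter.js): filters an array of TTSVoice
// objects by language code and/or supported engine. A voice missing the
// relevant optional field does not match a filter on that field.
function filterVoices(voices, { languageCode, engine } = {}) {
    return voices.filter((voice) => {
        if (languageCode !== undefined && voice.language?.code !== languageCode) {
            return false;
        }
        if (engine !== undefined && !(voice.supported_engines ?? []).includes(engine)) {
            return false;
        }
        return true;
    });
}

// In a page that has loaded Puter.js (filter values are placeholders):
// const voices = await puter.ai.txt2speech.listVoices();
// const matches = filterVoices(voices, { languageCode: 'en-US' });
```

Treating a missing field as "no match" is a design choice; a caller that prefers to keep voices with unknown language could invert that check.
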
diff --git a/src/docs/src/playground/examples/ai-speech2txt.html b/src/docs/src/playground/examples/ai-speech2txt.html
@@ -4,7 +4,7 @@
     <script>
     (async () => {
         const transcript = await puter.ai.speech2txt('https://assets.puter.site/example.mp3');
-        puter.print('Transcript:', transcript.text ?? transcript);
+        puter.print('Transcript:', transcript.text);
     })();
     </script>
 </body>

diff --git a/src/docs/src/sidebar.js b/src/docs/src/sidebar.js
@@ -1243,6 +1243,20 @@ let sidebar = [
                 source: '/Objects/subdomain.md',
                 path: '/Objects/subdomain',
             },
+            {
+                title: '<code>TTSEngine</code>',
+                title_tag: 'TTSEngine',
+                icon: '/assets/img/object.svg',
+                source: '/Objects/ttsengine.md',
+                path: '/Objects/ttsengine',
+            },
+            {
+                title: '<code>TTSVoice</code>',
+                title_tag: 'TTSVoice',
+                icon: '/assets/img/object.svg',
+                source: '/Objects/ttsvoice.md',
+                path: '/Objects/ttsvoice',
+            },
             {
                 title: '<code>ToolCall</code>',
                 title_tag: 'ToolCall',