Draft
2 changes: 1 addition & 1 deletion src/docs/src/AI/chat.md
@@ -33,7 +33,7 @@ An object containing the following properties:
- `temperature` (Number) - A number between 0 and 2 indicating the randomness of the completion. Lower values make the output more focused and deterministic, while higher values make it more random. By default, the specific model's temperature is used.
- `tools` (Array) (Optional) - Function definitions the AI can call. See [Function Calling](#function-calling) for details.
- `reasoning_effort` / `reasoning.effort` (String) (Optional) - Controls how much effort reasoning models spend thinking. Supported values: `none`, `minimal`, `low`, `medium`, `high`, and `xhigh`. Lower values give faster responses with less reasoning. OpenAI models only.
-- `text` / `text_verbosity` (String) (Optional) - Controls how long or short responses are. Supported values: `low`, `medium`, and `high`. Lower values give shorter responses. OpenAI models only.
+- `verbosity` / `text.verbosity` (String) (Optional) - Controls how long or short responses are. Supported values: `low`, `medium`, and `high`. Lower values give shorter responses. OpenAI models only.
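
As a rough illustration of the two OpenAI-only options in this list, the sketch below builds an options object and rejects values outside the documented sets. `buildChatOptions` is a hypothetical helper and not part of Puter.js; only the option names and their supported values are taken from the list above.

```javascript
// Supported values as listed in the option descriptions above.
const REASONING_EFFORTS = ['none', 'minimal', 'low', 'medium', 'high', 'xhigh'];
const VERBOSITY_LEVELS = ['low', 'medium', 'high'];

// Hypothetical helper (not part of Puter.js): copies the given options and
// throws early if `reasoning_effort` or `verbosity` is not a documented value.
function buildChatOptions({ reasoning_effort, verbosity, ...rest } = {}) {
    const options = { ...rest };
    if (reasoning_effort !== undefined) {
        if (!REASONING_EFFORTS.includes(reasoning_effort)) {
            throw new RangeError(`Unsupported reasoning_effort: ${reasoning_effort}`);
        }
        options.reasoning_effort = reasoning_effort;
    }
    if (verbosity !== undefined) {
        if (!VERBOSITY_LEVELS.includes(verbosity)) {
            throw new RangeError(`Unsupported verbosity: ${verbosity}`);
        }
        options.verbosity = verbosity;
    }
    return options;
}
```

The result is a plain object, so it could be handed to the chat call wherever this options object is expected, for example `buildChatOptions({ verbosity: 'low', reasoning_effort: 'minimal' })`.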

#### `testMode` (Boolean) (Optional)

8 changes: 4 additions & 4 deletions src/docs/src/AI/speech2txt.md
@@ -44,7 +44,7 @@ Fine-tune how transcription runs.
- `chunking_strategy` (String): Required for `gpt-4o-transcribe-diarize` inputs longer than 30 seconds (recommend `"auto"`).
- `known_speaker_names` / `known_speaker_references` (Array): Optional diarization references encoded as data URLs.
- `extra_body` (Object): Forwarded verbatim to the OpenAI API for experimental flags.
-- `stream` (Boolean): Reserved for future streaming support. Currently rejected when `true`.
+- `stream` (Boolean): Reserved for future streaming support. Streaming is not currently supported.
- `test_mode` (Boolean): When `true`, returns a sample response without using credits. Defaults to `false`.

**xAI-specific options** (when `provider: 'xai'`):
@@ -65,8 +65,8 @@ When `true`, skips the live API call and returns a static sample transcript so y

Returns a `Promise` that resolves to either:

-- A string (when `response_format: "text"` or you pass a shorthand `source` with no options), or
-- An object of [`Speech2TxtResult`](/Objects/speech2txtresult) containing the transcription payload (including diarization segments, timestamps, etc., depending on the selected model and format).
+- A string (when `response_format: "text"`), or
+- An object of [`Speech2TxtResult`](/Objects/speech2txtresult) containing the transcription payload (including diarization segments, timestamps, etc., depending on the selected model and format). This is the default, including when you pass a bare `source` with no options.
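
Because the promise can resolve to either shape, calling code that accepts arbitrary options may want to normalize the result. A minimal sketch, assuming the object form exposes the transcript on a `text` property as the example in this file does; `transcriptText` is a hypothetical helper, not part of Puter.js.

```javascript
// Hypothetical helper: accepts either documented return shape and
// always hands back the transcript as a plain string.
function transcriptText(result) {
    // String form: returned when `response_format: "text"` is requested.
    if (typeof result === 'string') return result;
    // Object form: a Speech2TxtResult carrying the transcript in `text`.
    if (result && typeof result.text === 'string') return result.text;
    throw new TypeError('Unexpected speech2txt result shape');
}
```

When no `response_format` is passed, the object form is the default, so reading `transcript.text` directly is enough in that case.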

## Examples

@@ -79,7 +79,7 @@ Returns a `Promise` that resolves to either:
<script>
(async () => {
const transcript = await puter.ai.speech2txt('https://assets.puter.site/example.mp3');
-puter.print('Transcript:', transcript.text ?? transcript);
+puter.print('Transcript:', transcript.text);
})();
</script>
</body>
9 changes: 1 addition & 8 deletions src/docs/src/AI/txt2speech.listEngines.md
@@ -32,14 +32,7 @@ Common aliases are also accepted (e.g. `'eleven'`, `'google'`, `'grok'`).

## Return value

-A `Promise` that resolves to an array of engine objects. Each object contains:
-
-| Field | Type | Description |
-|-------|------|-------------|
-| `id` | `String` | Engine/model identifier |
-| `name` | `String` | Human-readable engine name |
-| `provider` | `String` | Provider this engine belongs to |
-| `pricing_per_million_chars` | `Number` | Cost per million characters (may be absent) |
+A `Promise` that resolves to an array of [`TTSEngine`](/Objects/ttsengine) objects.

Example response:

14 changes: 1 addition & 13 deletions src/docs/src/AI/txt2speech.listVoices.md
@@ -26,19 +26,7 @@ When `options` is a plain string it is treated as an `engine` filter for the def

## Return value

-A `Promise` that resolves to an array of voice objects. Each object contains:
-
-| Field | Type | Description |
-|-------|------|-------------|
-| `id` | `String` | Voice identifier to pass to `txt2speech()` |
-| `name` | `String` | Human-readable voice name |
-| `provider` | `String` | Provider this voice belongs to |
-| `language` | `Object` | `{ name, code }` language info (may be absent) |
-| `description` | `String` | Short description of the voice (may be absent) |
-| `category` | `String` | Voice category, e.g. `'premade'` (may be absent) |
-| `labels` | `Object` | Provider-specific labels (may be absent) |
-| `supported_models` | `Array` | Model IDs this voice works with (may be absent) |
-| `supported_engines` | `Array` | Engine types this voice supports (may be absent) |
+A `Promise` that resolves to an array of [`TTSVoice`](/Objects/ttsvoice) objects.

Example response:

2 changes: 1 addition & 1 deletion src/docs/src/AI/txt2speech.md
@@ -96,7 +96,7 @@ Available when `provider: 'xai'`:
| `language` | `String` | BCP-47 language code. Defaults to `'en'`. Supports `'auto'` for auto-detection and 20+ languages |
| `output_format` | `String` | Output codec. Available: `'mp3'` (default), `'wav'`, `'pcm'`, `'mulaw'`, `'alaw'` |

-Text supports inline speech tags like `[pause]`, `[laugh]` and wrapping tags like `<whisper>text</whisper>` for expressive delivery. Maximum 15,000 characters per request.
+Text supports inline speech tags like `[pause]`, `[laugh]` and wrapping tags like `<whisper>text</whisper>` for expressive delivery.

For more details, see the [xAI TTS documentation](https://x.ai/news/grok-stt-and-tts-apis).

2 changes: 2 additions & 0 deletions src/docs/src/Objects.md
@@ -17,6 +17,8 @@ Various object types and classes that represent different entities in the Puter
- **[MonthlyUsage](/Objects/monthlyusage/)** - Represents user's monthly resource usage information
- **[Speech2TxtResult](/Objects/speech2txtresult/)** - Represents speech-to-text transcription results
- **[Subdomain](/Objects/subdomain/)** - Represents a subdomain
- **[TTSEngine](/Objects/ttsengine/)** - Represents an available text-to-speech engine/model
- **[TTSVoice](/Objects/ttsvoice/)** - Represents an available text-to-speech voice
- **[ToolCall](/Objects/toolcall/)** - Represents a tool invocation request
- **[User](/Objects/user/)** - Represents a Puter user
- **[WorkerDeployment](/Objects/workerdeployment/)** - Represents a worker deployment result
6 changes: 6 additions & 0 deletions src/docs/src/Objects/chatresponse.md
@@ -16,3 +16,9 @@ An object containing the chat message data.
- `content` (String) - The content of the message.

- `tool_calls` (Array) - An optional array of [`ToolCall`](/Objects/toolcall) objects if the model wants to call tools.

- `tool_call_id` (String) - An optional identifier linking this message to the tool call it responds to.

- `cache_control` (Object) - An optional object controlling prompt caching for this message. Contains a `type` (String) property.

- `images` (Array) - An array of image content objects associated with the message. Each object contains a `type` (String) and an `image_url` object with a `url` (String) property.
38 changes: 37 additions & 1 deletion src/docs/src/Objects/chatresponsechunk.md
@@ -5,8 +5,44 @@ description: The ChatResponseChunk object containing a chunk of streaming chat r

The `ChatResponseChunk` object containing a chunk of streaming chat response data.

Each chunk has a `type` indicating its kind. The other attributes that are present depend on that `type`.

## Attributes

#### `type` (String)

The kind of chunk. One of:

- `"text"` - A portion of the response text.
- `"reasoning"` - A portion of the model's reasoning/thinking output.
- `"tool_use"` - A tool/function the model wants to call.
- `"extra_content"` - Provider-specific metadata.
- `"usage"` - Token usage totals, emitted as the final chunk.

#### `text` (String)

-A string containing a portion of the chat response text in streaming mode.
+A portion of the chat response text. Present on `text` chunks.

#### `reasoning` (String)

A portion of the model's reasoning output. Present on `reasoning` chunks.

#### `id` (String)

The unique identifier for the tool call. Present on `tool_use` chunks.

#### `name` (String)

The name of the function/tool to call. Present on `tool_use` chunks.

#### `input` (Object)

The parsed arguments for the tool call. Present on `tool_use` chunks.

#### `extra_content`

Provider-specific metadata attached to the stream.

#### `usage` (Object)

An object containing token usage totals. Present on the final `usage` chunk.
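
Since the attributes present on a chunk depend on its `type`, a consumer will usually branch on `type` before reading anything else. The sketch below folds chunks into a single accumulator. `createStreamState` and `applyChunk` are hypothetical helpers, and reading provider metadata from `chunk.extra_content` is an assumption based on the attribute name above.

```javascript
// Creates an empty accumulator for one streamed response.
function createStreamState() {
    return { text: '', reasoning: '', toolCalls: [], extra: [], usage: null };
}

// Folds one chunk into the accumulator, branching on `chunk.type`.
function applyChunk(state, chunk) {
    switch (chunk.type) {
        case 'text':
            state.text += chunk.text ?? '';
            break;
        case 'reasoning':
            state.reasoning += chunk.reasoning ?? '';
            break;
        case 'tool_use':
            // `id`, `name`, and `input` are only present on tool_use chunks.
            state.toolCalls.push({ id: chunk.id, name: chunk.name, input: chunk.input });
            break;
        case 'extra_content':
            // Assumption: the metadata lives on the `extra_content` attribute.
            state.extra.push(chunk.extra_content);
            break;
        case 'usage':
            state.usage = chunk.usage;
            break;
        default:
            // Unknown types are ignored so that new chunk kinds do not break the loop.
            break;
    }
    return state;
}
```

With a streamed response this could be driven by something like `for await (const chunk of response) applyChunk(state, chunk);`, assuming the stream is exposed as an async iterable of chunks.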
13 changes: 13 additions & 0 deletions src/docs/src/Objects/speech2txtresult.md
@@ -18,3 +18,16 @@ A string containing the detected or specified language of the audio.
#### `segments` (Array)

An optional array of segment objects containing detailed transcription information.

#### `duration` (Number)

An optional duration of the audio in seconds. Provider-dependent (e.g. returned by xAI).

#### `words` (Array)

An optional array of per-word timestamp objects. Provider-dependent (e.g. returned by xAI). Each word has:

- `text` (String): The transcribed word.
- `start` (Number): Start time of the word in seconds.
- `end` (Number): End time of the word in seconds.
- `speaker` (String): Detected speaker, present when `diarize: true`.
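
To show how the per-word fields fit together, here is a small sketch that merges consecutive words from the same speaker into turns. `wordsToTurns` is a hypothetical helper, not part of Puter.js; it relies only on the `text`, `start`, `end`, and `speaker` fields listed above and treats a missing `speaker` as `null`.

```javascript
// Hypothetical helper: merges consecutive words spoken by the same speaker
// into turns of the form { speaker, text, start, end }.
function wordsToTurns(words = []) {
    const turns = [];
    for (const word of words) {
        const speaker = word.speaker ?? null; // absent unless diarization is enabled
        const last = turns[turns.length - 1];
        if (last && last.speaker === speaker) {
            // Same speaker as the previous word: extend the current turn.
            last.text += ' ' + word.text;
            last.end = word.end;
        } else {
            // Speaker changed (or first word): start a new turn.
            turns.push({ speaker, text: word.text, start: word.start, end: word.end });
        }
    }
    return turns;
}
```

Because `words` is optional and provider-dependent, the default parameter lets `wordsToTurns(result.words)` return an empty array when the field is missing.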
24 changes: 24 additions & 0 deletions src/docs/src/Objects/ttsengine.md
@@ -0,0 +1,24 @@
---
title: TTSEngine
description: The TTSEngine object describing an available text-to-speech engine/model.
---

The `TTSEngine` object describes a text-to-speech engine/model available from a provider, including pricing metadata where available. Arrays of these objects are returned by [`puter.ai.txt2speech.listEngines()`](/AI/txt2speech.listEngines).

## Attributes

#### `id` (String)

The engine/model identifier.

#### `name` (String)

A human-readable engine name.

#### `provider` (String)

The provider this engine belongs to, e.g. `'aws-polly'`, `'openai'`, `'elevenlabs'`, `'gemini'`, `'xai'`.

#### `pricing_per_million_chars` (Number)

An optional cost per million characters. May be absent when the provider does not expose pricing.
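
As an illustration of why `pricing_per_million_chars` has to be treated as optional, the sketch below estimates the cost of a request and returns `null` when the engine carries no pricing. `estimateTtsCost` is a hypothetical helper, the result is in whatever unit the pricing field uses, and counting characters with `text.length` (UTF-16 code units) is a simplifying assumption about how characters are billed.

```javascript
// Hypothetical helper: estimated cost of synthesizing `text` with `engine`,
// or null when the engine does not expose pricing.
function estimateTtsCost(engine, text) {
    const rate = engine.pricing_per_million_chars;
    if (typeof rate !== 'number') return null; // pricing may be absent
    // Assumption: one billed character per UTF-16 code unit.
    return (text.length / 1e6) * rate;
}
```

Returning `null` rather than `0` keeps "unknown price" distinguishable from "free" when engines are compared.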
44 changes: 44 additions & 0 deletions src/docs/src/Objects/ttsvoice.md
@@ -0,0 +1,44 @@
---
title: TTSVoice
description: The TTSVoice object describing an available text-to-speech voice.
---

The `TTSVoice` object describes a text-to-speech voice available from a provider, including metadata such as language, category, and supported models/engines. Arrays of these objects are returned by [`puter.ai.txt2speech.listVoices()`](/AI/txt2speech.listVoices).

## Attributes

#### `id` (String)

The voice identifier to pass to [`puter.ai.txt2speech()`](/AI/txt2speech).

#### `name` (String)

A human-readable voice name.

#### `provider` (String)

The provider this voice belongs to, e.g. `'aws-polly'`, `'openai'`, `'elevenlabs'`, `'gemini'`, `'xai'`.

#### `language` (Object)

An optional object describing the voice's language. Contains a `name` (String) and a `code` (String) property. May be absent.

#### `description` (String)

An optional short description of the voice. May be absent.

#### `category` (String)

An optional voice category, e.g. `'premade'`. May be absent.

#### `labels` (Object)

An optional object of provider-specific labels. May be absent.

#### `supported_models` (Array)

An optional array of model IDs (Strings) this voice works with. May be absent.

#### `supported_engines` (Array)

An optional array of engine types (Strings) this voice supports. May be absent.
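
Most `TTSVoice` attributes may be absent, so filtering code needs to guard each access. A minimal sketch, assuming language codes look like `'en-US'` and that a prefix match is acceptable; `findVoices` and its `code` / `model` parameters are hypothetical and not part of Puter.js.

```javascript
// Hypothetical helper: filters voices by language-code prefix and/or model ID,
// tolerating voices where `language` or `supported_models` is absent.
function findVoices(voices, { code, model } = {}) {
    return voices.filter((voice) => {
        if (code) {
            const voiceCode = voice.language?.code; // `language` may be absent
            if (!voiceCode || !voiceCode.toLowerCase().startsWith(code.toLowerCase())) {
                return false;
            }
        }
        if (model) {
            // Voices without `supported_models` are kept: absence says nothing about support.
            if (Array.isArray(voice.supported_models) && !voice.supported_models.includes(model)) {
                return false;
            }
        }
        return true;
    });
}
```

The `id` of any voice this returns is the value to pass on to `puter.ai.txt2speech()`, as described under `id` above.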
2 changes: 1 addition & 1 deletion src/docs/src/playground/examples/ai-speech2txt.html
@@ -4,7 +4,7 @@
<script>
(async () => {
const transcript = await puter.ai.speech2txt('https://assets.puter.site/example.mp3');
-puter.print('Transcript:', transcript.text ?? transcript);
+puter.print('Transcript:', transcript.text);
})();
</script>
</body>
14 changes: 14 additions & 0 deletions src/docs/src/sidebar.js
@@ -1243,6 +1243,20 @@ let sidebar = [
source: '/Objects/subdomain.md',
path: '/Objects/subdomain',
},
{
title: '<code>TTSEngine</code>',
title_tag: 'TTSEngine',
icon: '/assets/img/object.svg',
source: '/Objects/ttsengine.md',
path: '/Objects/ttsengine',
},
{
title: '<code>TTSVoice</code>',
title_tag: 'TTSVoice',
icon: '/assets/img/object.svg',
source: '/Objects/ttsvoice.md',
path: '/Objects/ttsvoice',
},
{
title: '<code>ToolCall</code>',
title_tag: 'ToolCall',