0.1.12 Release #272

Merged
merged 162 commits on Dec 17, 2024
7ccbaff
enabled toast instead of alert, and fixed previous streaming toast no…
CharlieJCJ Nov 21, 2024
9cf6963
Merge branch 'dev' into CURATOR-48-curator-viewer-ux-change-the-copy-…
CharlieJCJ Nov 23, 2024
7af82be
Update README.md
madiator Nov 26, 2024
c3cdbc2
small fix
RyanMarten Dec 6, 2024
e02a31e
Merge pull request #222 from bespokelabsai/ryanm/patch-3
vutrung96 Dec 6, 2024
47a4293
fix tests module colliding with litellm
RyanMarten Dec 6, 2024
2b0c8e6
rename
RyanMarten Dec 6, 2024
e1699f7
update doc strings
RyanMarten Dec 6, 2024
5fc189c
Merge pull request #224 from bespokelabsai/ryanm/fix-tests
CharlieJCJ Dec 6, 2024
6f527e2
Merge pull request #165 from bespokelabsai/CURATOR-48-curator-viewer-…
vutrung96 Dec 7, 2024
20df3ae
use hf datasets pickler to fix path-dependent caching
vutrung96 Dec 9, 2024
bacc2be
Merge pull request #230 from bespokelabsai/trung/path-caching
RyanMarten Dec 9, 2024
00bc52e
print out unsubmitted request files
vutrung96 Dec 9, 2024
deb3853
don't redownload
vutrung96 Dec 9, 2024
a367a72
black
vutrung96 Dec 9, 2024
12a0cba
use max retries
vutrung96 Dec 9, 2024
64bb286
don't delete by default
vutrung96 Dec 9, 2024
f52a15b
change rpm and tpm to have lower default and manually set
RyanMarten Dec 9, 2024
26e49ac
testing and fix logging
RyanMarten Dec 9, 2024
6a3bbb5
use a test call function for both rate limits and costs
RyanMarten Dec 9, 2024
ac5115e
create abstract method get_header_based_rate_limits
RyanMarten Dec 10, 2024
c1af47b
Merge pull request #234 from bespokelabsai/ryanm/manual-rate-limits
vutrung96 Dec 10, 2024
cb525d6
track cross-key batches
vutrung96 Dec 10, 2024
56e6862
revert to auto delete
vutrung96 Dec 10, 2024
4cf843c
black
vutrung96 Dec 10, 2024
2f93299
don't log error
vutrung96 Dec 10, 2024
a2728a6
more logging
vutrung96 Dec 10, 2024
4bb4077
more details
vutrung96 Dec 10, 2024
c1e1142
more debug
vutrung96 Dec 10, 2024
5acee14
relax assert
vutrung96 Dec 10, 2024
e0dd93b
check file exists
vutrung96 Dec 10, 2024
54e1a00
deletion should be best effort
vutrung96 Dec 10, 2024
2802717
black
vutrung96 Dec 10, 2024
cebb126
only look at output_file_id if it's not none
vutrung96 Dec 10, 2024
a22126a
if apikey missing, openaionline error out, instead of halt
CharlieJCJ Dec 10, 2024
c2fb640
black
vutrung96 Dec 10, 2024
67ecbfe
Merge pull request #231 from bespokelabsai/trung/unsubmitted-logging
vutrung96 Dec 10, 2024
492d5fc
Merge pull request #176 from bespokelabsai/madiator-patch-1
RyanMarten Dec 10, 2024
98edbbe
refactor: rename Prompter class to LLM in prompter.py
devin-ai-integration[bot] Dec 10, 2024
4859607
refactor: update import of LLM class in __init__.py
devin-ai-integration[bot] Dec 10, 2024
1ea0618
refactor: update Prompter to LLM in tests and examples
devin-ai-integration[bot] Dec 10, 2024
ea11768
test: update test mocking to use request processors
devin-ai-integration[bot] Dec 10, 2024
86a056a
style: apply black formatting
devin-ai-integration[bot] Dec 10, 2024
5742b21
clearer value error
vutrung96 Dec 11, 2024
f8d8017
more resilient
vutrung96 Dec 11, 2024
7a8c1b6
change logging message
vutrung96 Dec 11, 2024
7f153b2
Merge pull request #244 from bespokelabsai/trung/request-idx-debugging
RyanMarten Dec 11, 2024
0be52cf
black
CharlieJCJ Dec 11, 2024
9977c32
Merge pull request #237 from bespokelabsai/openai-apionline-missing
madiator Dec 11, 2024
020b473
Increase default values for tpm/rpm, otherwise there is no progress.
madiator Dec 12, 2024
2ab3f21
Merge pull request #245 from bespokelabsai/mahesh/fixes1211
vutrung96 Dec 12, 2024
e62082d
refactor: rename Prompter class to LLM in prompter.py
devin-ai-integration[bot] Dec 10, 2024
7b48978
refactor: update import of LLM class in __init__.py
devin-ai-integration[bot] Dec 10, 2024
8fce106
refactor: update Prompter to LLM in tests and examples
devin-ai-integration[bot] Dec 10, 2024
6b1dff1
test: update test mocking to use request processors
devin-ai-integration[bot] Dec 10, 2024
9599c4e
style: apply black formatting
devin-ai-integration[bot] Dec 10, 2024
c8f8143
Remove changes added by Devin that cause failures.
madiator Dec 12, 2024
a0040f6
Merge branch 'devin/1733864386-rename-prompter-to-llm' of https://git…
madiator Dec 12, 2024
8ef295d
Add back the tpm/rpm defaults, that were lost during rebase.
madiator Dec 12, 2024
23c74cb
Merge pull request #242 from bespokelabsai/devin/1733864386-rename-pr…
madiator Dec 12, 2024
5a4a4d7
Rename prompt.py to llm.py. Simplify prompt_formatter and add test.
madiator Dec 12, 2024
9ae7192
Rename the prompter folder to llm as well.
madiator Dec 12, 2024
8221f8a
Run black.
madiator Dec 12, 2024
1030e87
Add more docstrings and remove checks which are already in PromptFoma…
madiator Dec 12, 2024
f564d57
Merge pull request #246 from bespokelabsai/mahesh/rename_prompter
RyanMarten Dec 12, 2024
8f08d32
Add more docstrings and remove checks which are already in PromptFoma…
madiator Dec 12, 2024
3779605
Merge branch 'mahesh/refactor_llm' of https://github.com/bespokelabsa…
madiator Dec 12, 2024
014bdf6
Merge pull request #247 from bespokelabsai/mahesh/refactor_llm
madiator Dec 12, 2024
c714f22
Remove code accidentally added, and add a type shortcut.
madiator Dec 12, 2024
110aa14
Refactor LLM class to use context manager for batch parameters
devin-ai-integration[bot] Dec 13, 2024
843a2e6
Apply black formatting to llm.py
devin-ai-integration[bot] Dec 13, 2024
6d801d8
Move batch processing to top-level context manager
devin-ai-integration[bot] Dec 13, 2024
3d255b1
Merge pull request #250 from bespokelabsai/devin/1734071055-batch-con…
madiator Dec 13, 2024
1042e23
set default max attempts to 10
RyanMarten Dec 13, 2024
d642227
update logging
RyanMarten Dec 13, 2024
fe5c123
add a strict check on requiring all responses
RyanMarten Dec 13, 2024
1a60a89
update logging
RyanMarten Dec 13, 2024
44bb6c0
fast line counting in python
RyanMarten Dec 13, 2024
12da38e
move line count to another file
RyanMarten Dec 13, 2024
9a73e38
Merge branch 'dev' into mahesh/refactor_llm
madiator Dec 13, 2024
9813512
Merge pull request #249 from bespokelabsai/mahesh/refactor_llm
madiator Dec 13, 2024
e7ab351
add max retries arg
RyanMarten Dec 13, 2024
3a2bc17
Revert "Refactor LLM class to use context manager for batch parameters"
madiator Dec 13, 2024
f2b99c6
tests
RyanMarten Dec 13, 2024
414a16a
Merge remote-tracking branch 'origin/dev' into ryanm/raise-on-failed-…
RyanMarten Dec 13, 2024
ec7f965
Merge pull request #254 from bespokelabsai/revert-250-devin/173407105…
RyanMarten Dec 13, 2024
92dd6d5
Merge branch 'dev' into ryanm/raise-on-failed-requests
RyanMarten Dec 13, 2024
a2adafb
addressing comments
RyanMarten Dec 13, 2024
8c9ae10
Add a simple llm interface.
madiator Dec 13, 2024
7674a6d
Update README.
madiator Dec 13, 2024
286be1e
rate limit properties
RyanMarten Dec 13, 2024
585a495
small fix
RyanMarten Dec 13, 2024
f41aa91
update simple online
RyanMarten Dec 13, 2024
82817e5
update simple online
RyanMarten Dec 13, 2024
19089cb
fix typing
RyanMarten Dec 13, 2024
092d2d2
Merge pull request #251 from bespokelabsai/ryanm/raise-on-failed-requ…
RyanMarten Dec 13, 2024
3ef189a
Add some screenshots from the curator-viewer.
madiator Dec 13, 2024
dbaf676
Rename images and update readme.
madiator Dec 13, 2024
245ccdf
Add a simple llm interface.
madiator Dec 13, 2024
53c398d
Update README.
madiator Dec 13, 2024
34bcdbc
Add some screenshots from the curator-viewer.
madiator Dec 13, 2024
ffd8b11
Rename images and update readme.
madiator Dec 13, 2024
5ac33e3
Merge branch 'mahesh/refactor_llm' of https://github.com/bespokelabsa…
madiator Dec 13, 2024
80121c2
Update readme.
madiator Dec 14, 2024
574aee5
Update example.
madiator Dec 14, 2024
419a275
Run black formatter.
madiator Dec 14, 2024
0b3e014
Remove an accidental extra __init__ that was introduced in https://gi…
madiator Dec 14, 2024
8b9086f
Address Ryan's comments.
madiator Dec 14, 2024
6c62052
Replace the dataset image.
madiator Dec 14, 2024
c14233d
Run black.
madiator Dec 14, 2024
3096a29
Merge pull request #255 from bespokelabsai/mahesh/refactor_llm
madiator Dec 14, 2024
45f1fd6
update some logging so it doesn't look like curator is hanging when d…
RyanMarten Dec 14, 2024
081ddff
add rate limit cool down
RyanMarten Dec 14, 2024
d40f4ab
initalize
RyanMarten Dec 14, 2024
9a3da98
linting
RyanMarten Dec 14, 2024
e92f597
Add metadata dict
GeorgiosSmyrnis Dec 14, 2024
879016e
Merge pull request #256 from bespokelabsai/ryanm/rate-limit-cool-down
RyanMarten Dec 14, 2024
0b1254b
num_written -> num_requests
GeorgiosSmyrnis Dec 15, 2024
293ec0a
Change to only regenerate missing files.
GeorgiosSmyrnis Dec 15, 2024
4e6a9bf
Fix inverted logic bug
GeorgiosSmyrnis Dec 15, 2024
034a792
add gemini specific safety settings
CharlieJCJ Dec 15, 2024
cc4ee05
black
CharlieJCJ Dec 15, 2024
3e5a919
Merge pull request #259 from bespokelabsai/gemini-safety
RyanMarten Dec 15, 2024
239badc
raise on None response message
RyanMarten Dec 15, 2024
a76ea42
Merge pull request #262 from bespokelabsai/ryanm/null-response
RyanMarten Dec 15, 2024
ab59f3c
small fixes
RyanMarten Dec 15, 2024
a25cb3c
small fix
RyanMarten Dec 15, 2024
6a2cfd0
small changes'
RyanMarten Dec 15, 2024
a8c894e
increase timeout to 10 minutes
RyanMarten Dec 16, 2024
6e77a81
linting
RyanMarten Dec 16, 2024
59566c4
Rename function
GeorgiosSmyrnis Dec 16, 2024
0470c6e
Merge pull request #257 from GeorgiosSmyrnis/add_metadata
RyanMarten Dec 16, 2024
739c153
Merge pull request #263 from bespokelabsai/ryanm/null-response
madiator Dec 16, 2024
06cf8ce
small fixes
RyanMarten Dec 16, 2024
999e579
clean up retry logging a bit
RyanMarten Dec 16, 2024
9fd48cf
change logs
RyanMarten Dec 16, 2024
75b6439
Merge pull request #264 from bespokelabsai/ryanm/retry-and-content-fi…
RyanMarten Dec 16, 2024
df1e308
default timeout at 10 minutes
RyanMarten Dec 16, 2024
25bc880
convert response to response format and throw
RyanMarten Dec 16, 2024
0fd6f67
Merge pull request #265 from bespokelabsai/ryanm/timeout
madiator Dec 16, 2024
2e12ad0
structured output gets tool finish reason
RyanMarten Dec 17, 2024
3956d77
logging update
RyanMarten Dec 17, 2024
503f111
Merge pull request #267 from bespokelabsai/ryanm/finish-reason-patch
madiator Dec 17, 2024
9b2ca6d
change logging for cache verification
RyanMarten Dec 17, 2024
b4a5d4f
linting
RyanMarten Dec 17, 2024
84d5811
change to warning
RyanMarten Dec 17, 2024
349093e
fix exception logging
RyanMarten Dec 17, 2024
4b25c3f
bump version
RyanMarten Dec 17, 2024
1fe93f9
move convert function to prompt formatter
RyanMarten Dec 17, 2024
c99978f
Merge pull request #266 from bespokelabsai/ryanm/retry-on-response-fo…
RyanMarten Dec 17, 2024
3fb5dad
Merge pull request #269 from bespokelabsai/ryanm/0.1.12-release
RyanMarten Dec 17, 2024
7246ec7
Merge pull request #268 from bespokelabsai/ryanm/patch-2
RyanMarten Dec 17, 2024
bb5eea2
add prism types
RyanMarten Dec 17, 2024
973d710
fix test different files
RyanMarten Dec 17, 2024
a208c62
just skip the tests for now, we also need to get mocking working
RyanMarten Dec 17, 2024
125a83f
Merge pull request #270 from bespokelabsai/ryanm/patch-4
RyanMarten Dec 17, 2024
0a9ca56
all tests passing now
RyanMarten Dec 17, 2024
9730c56
Merge branch 'dev' into ryanm/update-tests
RyanMarten Dec 17, 2024
40d2c4a
add back cwd
RyanMarten Dec 17, 2024
b39b19b
register cache dir and break up litellm into n tests"
RyanMarten Dec 17, 2024
38649ca
all tests working
RyanMarten Dec 17, 2024
5013908
Merge pull request #271 from bespokelabsai/ryanm/update-tests
RyanMarten Dec 17, 2024
1 change: 1 addition & 0 deletions .gitignore
@@ -1,4 +1,5 @@
.venv
.DS_Store
__pycache__
.vscode

85 changes: 67 additions & 18 deletions README.md
@@ -24,9 +24,12 @@
<a href="https://discord.gg/KqpXvpzVBS">
<img alt="Discord" src="https://img.shields.io/discord/1230990265867698186">
</a>
<a href="https://github.com/psf/black">
<img alt="Code style: black" src="https://img.shields.io/badge/Code%20style-black-000000.svg">
</a>
</p>

## Overview

Bespoke Curator makes it easy to create high-quality synthetic data at scale, which you can use to fine-tune models or for structured data extraction.

@@ -35,56 +38,99 @@ Bespoke Curator is an open-source project:
* A Curator Viewer, which makes it easy to view datasets and aids in dataset creation.
* We will also be releasing high-quality datasets that should move the needle on post-training.

## Key Features

1. **Programmability and Structured Outputs**: Synthetic data generation is a lot more than just using a single prompt -- it involves calling LLMs multiple times and orchestrating control flow. Curator treats structured outputs as first-class citizens and helps you design complex pipelines.
2. **Built-in Performance Optimization**: We often see LLMs called in loops, or inefficient multi-threading implementations. We have baked in performance optimizations so that you don't need to worry about those!
3. **Intelligent Caching and Fault Recovery**: Since LLM calls add up in cost and time, failures are undesirable but sometimes unavoidable. We cache LLM requests and responses so that it is easy to recover from a failure. Moreover, when working on a multi-stage pipeline, caching of stages makes it easy to iterate.
4. **Native HuggingFace Dataset Integration**: Work directly on HuggingFace Dataset objects throughout your pipeline. Your synthetic data is immediately ready for fine-tuning!
5. **Interactive Curator Viewer**: Improve and iterate on your prompts using our built-in viewer. Inspect LLM requests and responses in real-time, allowing you to iterate and refine your data generation strategy with immediate feedback.

## Installation

```bash
pip install bespokelabs-curator
```

## Usage
To run the examples below, make sure to set your OpenAI API key in
the environment variable `OPENAI_API_KEY` by running `export OPENAI_API_KEY=sk-...` in your terminal.

### Hello World with `SimpleLLM`: A simple interface for calling LLMs

```python
from bespokelabs import curator
llm = curator.SimpleLLM(model_name="gpt-4o-mini")
poem = llm("Write a poem about the importance of data in AI.")
print(poem)
# Or you can pass a list of prompts to generate multiple responses.
poems = llm(["Write a poem about the importance of data in AI.",
             "Write a haiku about the importance of data in AI."])
print(poems)
```
Note that retries and caching are enabled by default, so if you run the same prompt again you will get the same response almost instantly.
You can delete the cache at `~/.cache/curator`.
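The caching behavior can be pictured as a lookup keyed on the request payload. Here is a minimal in-memory sketch of that idea (illustrative only; it is not curator's actual implementation, which persists the cache on disk under `~/.cache/curator`):

```python
import hashlib
import json


class CachedLLM:
    """Toy stand-in for an LLM client that memoizes responses per prompt."""

    def __init__(self, generate):
        self._generate = generate  # the expensive call, e.g. an API request
        self._cache = {}

    def _key(self, prompt: str) -> str:
        # Stable key derived from the request payload.
        return hashlib.sha256(json.dumps({"prompt": prompt}).encode()).hexdigest()

    def __call__(self, prompt: str) -> str:
        key = self._key(prompt)
        if key not in self._cache:  # only pay for the call once
            self._cache[key] = self._generate(prompt)
        return self._cache[key]


calls = []

def fake_generate(prompt):
    # Pretend this is a slow, costly API request.
    calls.append(prompt)
    return f"poem about: {prompt}"

llm = CachedLLM(fake_generate)
first = llm("Write a poem about data.")
second = llm("Write a poem about data.")  # served from the cache
```

Repeating the same prompt hits the cache, so the underlying "API" is only called once.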

#### Use LiteLLM backend for calling other models
You can use the [LiteLLM](https://docs.litellm.ai/docs/providers) backend for calling other models.

```python
from bespokelabs import curator
llm = curator.SimpleLLM(model_name="claude-3-5-sonnet-20240620", backend="litellm")
poem = llm("Write a poem about the importance of data in AI.")
print(poem)
```

### Visualize in Curator Viewer
Run `curator-viewer` on the command line to see the dataset in the viewer.

You can click on a run and then click on a specific row to see the LLM request and response.
![Curator Responses](docs/curator-responses.png)
More examples below.

### `LLM`: A more powerful interface for synthetic data generation

Let's use structured outputs to generate poems.
```python
from bespokelabs import curator
from datasets import Dataset
from pydantic import BaseModel, Field
from typing import List

# Create a dataset object for the topics you want to create poems about.
topics = Dataset.from_dict({"topic": [
    "Urban loneliness in a bustling city",
    "Beauty of Bespoke Labs's Curator library"
]})
```

Define a class to encapsulate a list of poems.
```python
class Poem(BaseModel):
    poem: str = Field(description="A poem.")

class Poems(BaseModel):
    poems_list: List[Poem] = Field(description="A list of poems.")
```
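Passing a class like `Poems` as `response_format` means the model is asked to return JSON matching that schema, which is then parsed into typed objects. A rough stdlib-only sketch of that parsing step (curator relies on Pydantic for this; the raw response string below is a made-up illustration of the wire format):

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Poem:
    poem: str


@dataclass
class Poems:
    poems_list: List[Poem]


def parse_poems(raw: str) -> Poems:
    # Parse the model's JSON output and lift it into typed objects.
    data = json.loads(raw)
    return Poems(poems_list=[Poem(poem=p["poem"]) for p in data["poems_list"]])


# What a structured-output response might look like on the wire.
raw_response = '{"poems_list": [{"poem": "Data flows like rain."}, {"poem": "Signals in the noise."}]}'
poems = parse_poems(raw_response)
```

The typed objects are then handed to `parse_func`, which is why its second argument is a `Poems` instance rather than a raw string.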


We define an `LLM` object that generates poems, which is applied to the topics dataset.
```python
poet = curator.LLM(
    prompt_func=lambda row: f"Write two poems about {row['topic']}.",
    model_name="gpt-4o-mini",
    response_format=Poems,
    # `row` is the input row, and `poems` is the `Poems` class which
    # is parsed from the structured output from the LLM.
    parse_func=lambda row, poems: [
        {"topic": row["topic"], "poem": p.poem} for p in poems.poems_list
    ],
)
```
Here:
* `prompt_func` takes a row of the dataset as input and returns the prompt for the LLM.
* `response_format` is the structured output class we defined above.
* `parse_func` takes the input (`row`) and the structured output (`poems`) and converts them into a list of dictionaries, so that the output can easily be converted to a HuggingFace Dataset object.
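The fan-out that `parse_func` performs can be sketched without any LLM call: one input row plus one structured response becomes several flat output rows. The stand-in classes below are illustrative, not curator internals:

```python
# One dataset row, as a plain dictionary.
row = {"topic": "Urban loneliness in a bustling city"}


class FakePoems:
    """Illustrative stand-in for the parsed `Poems` structured output."""

    class P:
        def __init__(self, poem):
            self.poem = poem

    def __init__(self, texts):
        self.poems_list = [self.P(t) for t in texts]


poems = FakePoems(["First poem text.", "Second poem text."])

# The same parse_func shape as in the README flattens the structured
# output into one output row per poem.
parse_func = lambda row, poems: [
    {"topic": row["topic"], "poem": p.poem} for p in poems.poems_list
]
rows = parse_func(row, poems)
```

Each output dictionary becomes one row of the resulting Dataset, which is why the function returns a list rather than a single value.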

Now we can apply the `LLM` object to the dataset, which reads very naturally in Python.
```python
poem = poet(topics)
print(poem.to_pandas())
# Example output:
# 2 Beauty of Bespoke Labs's Curator library In whispers of design and crafted grace,\nBesp...
# 3 Beauty of Bespoke Labs's Curator library In the hushed breath of parchment and ink,\nBe...
```
Note that `topics` can be created with `curator.LLM` as well,
and we can scale this up to create tens of thousands of diverse poems.
You can see a more detailed example in the [examples/poem.py](https://github.com/bespokelabsai/curator/blob/mahesh/update_doc/examples/poem.py) file,
and other examples in the [examples](https://github.com/bespokelabsai/curator/blob/mahesh/update_doc/examples) directory.


See the [docs](https://docs.bespokelabs.ai/) for more details as well as
for troubleshooting information.

@@ -115,6 +158,12 @@ curator-viewer

This will pop up a browser window with the viewer running on `127.0.0.1:3000` by default if you haven't specified a different host and port.

The dataset viewer shows all the different runs you have made.
![Curator Runs](docs/curator-runs.png)

You can also see the dataset and the responses from the LLM.
![Curator Dataset](docs/curator-dataset.png)


Optional parameters to run the viewer on a different host and port:
```bash
@@ -152,4 +201,4 @@ npm -v # should print `10.9.0`
```

## Contributing
Contributions are welcome!
9 changes: 2 additions & 7 deletions bespoke-dataset-viewer/app/dataset/[runHash]/page.tsx
@@ -10,11 +10,6 @@ export default async function DatasetPage({
const { runHash } = await params
const { batchMode } = await searchParams
const isBatchMode = batchMode === '1'
return (
<html lang="en" suppressHydrationWarning>
<body>
<DatasetViewer runHash={runHash} batchMode={isBatchMode} />
</body>
</html>
)

return <DatasetViewer runHash={runHash} batchMode={isBatchMode} />
}
9 changes: 5 additions & 4 deletions bespoke-dataset-viewer/app/layout.tsx
@@ -1,6 +1,6 @@
import type { Metadata } from "next";
import "./globals.css";

import { Toaster } from "@/components/ui/toaster"

export const metadata: Metadata = {
title: "Curator Viewer",
@@ -13,10 +13,11 @@ export default function RootLayout({
children: React.ReactNode
}) {
return (
<html lang="en" suppressHydrationWarning>
<body suppressHydrationWarning>
<html lang="en" suppressHydrationWarning>
<body suppressHydrationWarning>
{children}
<Toaster />
</body>
</html>
)
}
}
@@ -7,22 +7,34 @@ import { Copy } from "lucide-react"
import { DataItem } from "@/types/dataset"
import { useCallback } from "react"
import { Sheet, SheetContent } from "@/components/ui/sheet"
import { useToast } from "@/components/ui/use-toast"

interface DetailsSidebarProps {
item: DataItem | null
onClose: () => void
}

export function DetailsSidebar({ item, onClose }: DetailsSidebarProps) {
const { toast } = useToast()

const copyToClipboard = useCallback(async (text: string) => {
try {
await navigator.clipboard.writeText(text)
alert("Copied to clipboard!")
toast({
title: "Success",
description: "Copied to clipboard!",
duration: 2000,
})
} catch (err) {
console.error("Failed to copy:", err)
alert("Failed to copy to clipboard")
toast({
variant: "destructive",
title: "Error",
description: "Failed to copy to clipboard",
duration: 2000,
})
}
}, [])
}, [toast])

if (!item) return null

@@ -39,8 +39,8 @@ class Poems(BaseModel):
poems_list: List[Poem] = Field(description="A list of poems.")
# We define a Prompter that generates poems which gets applied to the topics dataset.
poet = curator.Prompter(
# We define an LLM object that generates poems which gets applied to the topics dataset.
poet = curator.LLM(
# prompt_func takes a row of the dataset as input.
# row is a dictionary with a single key 'topic' in this case.
prompt_func=lambda row: f"Write two poems about {row['topic']}.",
2 changes: 1 addition & 1 deletion bespoke-dataset-viewer/components/ui/use-toast.ts
@@ -6,7 +6,7 @@ import type {
} from "@/components/ui/toast"

const TOAST_LIMIT = 1
const TOAST_REMOVE_DELAY = 1000000
const TOAST_REMOVE_DELAY = 3000

type ToasterToast = ToastProps & {
id: string
8 changes: 8 additions & 0 deletions bespoke-dataset-viewer/package-lock.json


1 change: 1 addition & 0 deletions bespoke-dataset-viewer/package.json
@@ -37,6 +37,7 @@
},
"devDependencies": {
"@types/node": "^20",
"@types/prismjs": "^1.26.5",
"@types/react": "^18",
"@types/react-dom": "^18",
"eslint": "^8",
2 changes: 1 addition & 1 deletion build_pkg.py
@@ -81,7 +81,7 @@ def nextjs_build():
def run_pytest():
print("Running pytest")
try:
run_command("pytest", cwd="tests")
run_command("pytest")
except subprocess.CalledProcessError:
print("Pytest failed. Aborting build.")
sys.exit(1)
Binary file added docs/curator-dataset.png
Binary file added docs/curator-responses.png
Binary file added docs/curator-runs.png
6 changes: 3 additions & 3 deletions examples/camel.py
@@ -22,14 +22,14 @@ class QAs(BaseModel):
qas: List[QA] = Field(description="A list of QAs")


subject_prompter = curator.Prompter(
subject_prompter = curator.LLM(
prompt_func=lambda: f"Generate a diverse list of 3 subjects. Keep it high-level (e.g. Math, Science).",
parse_func=lambda _, subjects: [subject for subject in subjects.subjects],
model_name="gpt-4o-mini",
response_format=Subjects,
)
subject_dataset = subject_prompter()
subsubject_prompter = curator.Prompter(
subsubject_prompter = curator.LLM(
prompt_func=lambda subject: f"For the given subject {subject}. Generate 3 diverse subsubjects. No explanation.",
parse_func=lambda subject, subsubjects: [
{"subject": subject["subject"], "subsubject": subsubject.subject}
@@ -40,7 +40,7 @@ class QAs(BaseModel):
)
subsubject_dataset = subsubject_prompter(subject_dataset)

qa_prompter = curator.Prompter(
qa_prompter = curator.LLM(
prompt_func=lambda subsubject: f"For the given subsubject {subsubject}. Generate 3 diverse questions and answers. No explanation.",
model_name="gpt-4o-mini",
response_format=QAs,
2 changes: 1 addition & 1 deletion examples/distill.py
@@ -21,7 +21,7 @@ def parse_func(row, response):
return {"instruction": instruction, "new_response": response}


distill_prompter = curator.Prompter(
distill_prompter = curator.LLM(
prompt_func=prompt_func,
parse_func=parse_func,
model_name="gpt-4o-mini",
2 changes: 1 addition & 1 deletion examples/litellm_recipe_prompting.py
@@ -31,7 +31,7 @@ def main():
# 3. Set environment variable: GEMINI_API_KEY
#############################################

recipe_prompter = curator.Prompter(
recipe_prompter = curator.LLM(
model_name="gemini/gemini-1.5-flash",
prompt_func=lambda row: f"Generate a random {row['cuisine']} recipe. Be creative but keep it realistic.",
parse_func=lambda row, response: {
4 changes: 2 additions & 2 deletions examples/litellm_recipe_structured_output.py
@@ -28,7 +28,7 @@ def main():
# 2. Generate an API key or use an existing API key
# 3. Set environment variable: ANTHROPIC_API_KEY
#############################################
cuisines_generator = curator.Prompter(
cuisines_generator = curator.LLM(
prompt_func=lambda: f"Generate 10 diverse cuisines.",
model_name="claude-3-5-haiku-20241022",
response_format=Cuisines,
@@ -44,7 +44,7 @@
# 2. Generate an API key or use an existing API key
# 3. Set environment variable: GEMINI_API_KEY
#############################################
recipe_prompter = curator.Prompter(
recipe_prompter = curator.LLM(
model_name="gemini/gemini-1.5-flash",
prompt_func=lambda row: f"Generate a random {row['cuisine']} recipe. Be creative but keep it realistic.",
parse_func=lambda row, response: {
2 changes: 1 addition & 1 deletion examples/persona-hub/synthesize.py
@@ -31,7 +31,7 @@ def get_generator(template):
def prompt_func(row):
return template.format(persona=row["persona"])

generator = curator.Prompter(
generator = curator.LLM(
prompt_func=prompt_func,
model_name="gpt-4o",
temperature=0.7,
6 changes: 3 additions & 3 deletions examples/poem.py
@@ -17,7 +17,7 @@ class Topics(BaseModel):


# We define a prompter that generates topics.
topic_generator = curator.Prompter(
topic_generator = curator.LLM(
prompt_func=lambda: "Generate 10 diverse topics that are suitable for writing poems about.",
model_name="gpt-4o-mini",
response_format=Topics,
@@ -35,8 +35,8 @@ class Poems(BaseModel):
poems_list: List[str] = Field(description="A list of poems.")


# We define a prompter that generates poems which gets applied to the topics dataset.
poet = curator.Prompter(
# We define an `LLM` object that generates poems which gets applied to the topics dataset.
poet = curator.LLM(
# The prompt_func takes a row of the dataset as input.
# The row is a dictionary with a single key 'topic' in this case.
prompt_func=lambda row: f"Write two poems about {row['topic']}.",