Update the README #15

Merged
merged 1 commit on Aug 13, 2024
2 changes: 1 addition & 1 deletion .github/workflows/asv_benchmark_main.yml
@@ -33,7 +33,7 @@ jobs:
- name: Create ASV machine config file
run: asv machine --machine gh-runner --yes

- name: Run Benchmarks - `PR HEAD` vs `main`
- name: Run Benchmarks - `main`
env:
HF_TOKEN: ${{ secrets.HF_TOKEN }}
run: |
32 changes: 31 additions & 1 deletion README.md
@@ -1,6 +1,36 @@
# Benchmarks

Benchmark structured generation libraries:
Benchmark suite for the following structured generation libraries:

- [Outlines](https://github.com/outlines-dev/outlines)
- [lm-format-enforcer](https://github.com/noamgat/lm-format-enforcer)


## Motivation

Discussions around the performance of different structured generation methods tend to revolve around misconceptions. This repository aims to ground the debate by offering a benchmarking suite for the different implementations. The benchmarking suite is public, and we accept pull requests.

Different methods make different trade-offs, and it is important to know when one method is faster than another. We will highlight these differences, ideally using minimal pathological examples.


## Explanations

We do not use models to run the benchmarks, as doing so would increase runtime, complicate the code, and make generation lengths unpredictable. Instead, we take a string in the language of the regular expression / JSON Schema, tokenize it, and iterate over the tokens as if they had been generated by a model.
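
The snippet below is a minimal sketch of this replay idea, not the actual benchmark code; `guide` is a hypothetical stand-in for whichever library is being benchmarked, and `tokenizer` is assumed to expose a HuggingFace-style `encode` method.

``` python
# Minimal sketch of the replay idea (not the actual benchmark code).
# `guide` is a hypothetical object standing in for the library under test;
# `tokenizer` is assumed to expose a HuggingFace-style `encode` method.
def replay(guide, tokenizer, matching_string):
    """Iterate over a matching string's tokens as if a model had generated them."""
    token_ids = tokenizer.encode(matching_string)
    state = 0
    for token_id in token_ids:
        allowed = guide.allowed_tokens(state)      # the lookup a model's logits would be masked with
        state = guide.next_state(state, token_id)  # advance as if `token_id` had just been generated
    return state
```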

### Outlines

If you look at the [benchmarking suite for Outlines](https://github.com/outlines-dev/benchmarks/blob/main/src/outlines.py), you will notice that we execute:

``` python
RegexGuide("a", tokenizer)
```

in the initialization phase of the benchmark. This serves two purposes:

1. JIT-compile the functions decorated with `@numba.njit`;
2. Convert vocabulary strings to Numba types.

This only ever needs to be done once, possibly while loading the model, and could be eliminated entirely using Ahead-Of-Time compilation. In this benchmarking suite we thus measure (see the sketch after this list):

1. The time it takes to compile the index corresponding to a regular expression;
2. The time it takes to look for valid tokens when generating text.
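
As a rough sketch of these two measurements (the `make_guide` callable and the guide's method names are placeholders, not the exact API of either library):

``` python
# Hypothetical sketch of the two measurements listed above. `make_guide`
# stands in for index construction (e.g. building a RegexGuide), and the
# loop reuses the replay idea sketched earlier; method names are placeholders.
import time

def time_compile_and_lookup(make_guide, tokenizer, regex_string, matching_string):
    start = time.perf_counter()
    guide = make_guide(regex_string, tokenizer)        # 1. index compilation time
    compile_time = time.perf_counter() - start

    token_ids = tokenizer.encode(matching_string)
    state, start = 0, time.perf_counter()
    for token_id in token_ids:
        guide.allowed_tokens(state)                    # 2. per-token lookup time
        state = guide.next_state(state, token_id)
    lookup_time = time.perf_counter() - start

    return compile_time, lookup_time
```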
4 changes: 0 additions & 4 deletions src/lfe.py
@@ -14,10 +14,6 @@

case = [
(r"\d{3}-\d{2}-\d{4}", "203-22-1234"),
(
r"(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?",
"https://www.dottxt.co",
),
(
r"(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?",
"https://github.com/outlines-dev/outlines",
4 changes: 0 additions & 4 deletions src/outlines.py
@@ -13,10 +13,6 @@

case = [
(r"\d{3}-\d{2}-\d{4}", "203-22-1234"),
(
r"(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?",
"https://www.dottxt.co",
),
(
r"(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?",
"https://github.com/outlines-dev/outlines",