Skip to content

prrao87/structured-outputs

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

19 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Structured outputs

Structured output benchmark that compares the results from modern tools like BAML and DSPy. Both DSPy and BAML aim to solve the same problem: help developers build modular, reliable AI systems with composable building blocks. However, there are nuanced differences in what building blocks they use and how they are implemented. This repo aims to explore some of those and study the performance on a benchmark dataset of clinical notes for a structured output extraction task.

Dataset

Source of truth

The sample data for structured extraction is a dataset of 2,726 FHIR records of patients and their notes. It's obtained from this Hugging Face dataset, and the parquet files are transformed into JSON. This raw JSON file of FHIR healthcare records (data/raw_fhir.json) serves as our source of truth for evaluating the structured output performance from either approach.

Patient notes

The unstructured data for patient notes is present in this same dataset from Hugging Face. The goal of structured extraction is to extract relevant information (via a schema) from these unstructured notes and store them in a JSON file. This result can then be compared against the source of truth data (which was human-annotated).

See the ./data directory for the raw and processed data files that are used in the experiments.

Setup

It's recommended to install uv to manage the dependencies.

uv sync

Install any additional Python packages via uv add <package_name>.

See the evaluation results in the ./src/baml and ./src/dspy directories for more information.

Takeaways

The experiments clearly show that BAML's schema representation in the prompt sent to the LLM is more concise and token-efficient compared to DSPy's default JSON schema (far more verbose and messy for LLMs to reason about). However, DSPy allows users to define custom adapters, which is very helpful -- we can then compare the effect of the schema representation by writing a custom BAMLAdapter for DSPy that achieves a similar level of performance.

See below for a comparison of the two schema representations.

The results for experiments that use the BAML adapter are shown in the src/dspy directory.

About

Micro benchmark comparing DSPy and BAML for structured outputs

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages