Evaluate LLMs' ability to perform differential diagnosis for rare genetic diseases using medical-vignette-like prompts created with phenopacket2prompt.
To systematically assess an LLM's ability to perform differential diagnostic tasks, we use prompts programmatically created with phenopacket2prompt, thereby avoiding any patient-privacy issues. The underlying data are phenopackets located in phenopacket-store. We also developed a programmatic approach for scoring and grounding results, made possible by the ontological structure of the Mondo Disease Ontology.
Two main analyses are carried out:
- A benchmark of several large language models against Exomiser, the state-of-the-art tool for differential diagnostics. The bottom line: Exomiser clearly outperforms the LLMs, as reported in our preprint; the data are available on this zenodo.
- A comparison of GPT-4o's and Meditron3-70B's ability to carry out differential diagnosis when prompted in 10 different languages; results are described in this medRxiv preprint and on this zenodo (see also this related link for some more data).
```shell
poetry install
```

Note: if you are unfamiliar with Poetry, you can instead run

```shell
pip install .
```

and then activate the environment, omitting `poetry run` from the following instructions.
```shell
export OPENAI_API_KEY=<your key>
poetry run curategpt ontology index --index-fields label,definition,relationships -p stagedb -c ont_mondo -m openai: sqlite:obo:mondo
```

The embedding model is selected via the `-m` argument. For non-OpenAI models, we refer to the CurateGPT documentation.
```shell
cp data/config/default.yaml data/config/<your_model>.yaml
```
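After copying the default configuration, edit your copy to point at the model you want to evaluate. The keys shown below are purely illustrative; the authoritative set of fields is whatever appears in the `default.yaml` shipped in `data/config`, so adjust the values in your copy rather than inventing new keys.

```yaml
# Hypothetical example -- check data/config/default.yaml for the real schema.
model_name: meditron3-70b
```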
```shell
poetry run malco evaluate --config data/config/meditron3-70b.yaml
poetry run malco plot --config data/config/meditron3-70b.yaml
```
```shell
poetry run malco combine --dir data/results --lang ALL
poetry run malco combine --dir data/results
```