Commit

Implemented remaining ACS columns and prediction tasks (#1)
* readme update

* added remaining ACS columns

* added ACS target columns

* created Threshold class

* fixing type annotations

* added healthinsurance task

* minor bug fixes

* hash is now deterministic :)

* fixed random hash changes

* remaining ACS tasks seem to be working

* fixed ACSDataset assignment of new task

* minor fix to setting new task on ACSDataset

* minor change

* minor updates

* minor updates
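The bullets on deterministic hashing do not say what is hashed or how, so the following is general background rather than the author's implementation. Python's built-in `hash()` for `str` and `bytes` is salted per interpreter process by default (unless `PYTHONHASHSEED` is fixed), so any identifier derived from it changes between runs. A common alternative is a `hashlib` digest over a canonical serialization. The helper name `deterministic_hash` and the example config are hypothetical.

```python
import hashlib
import json
from typing import Any


def deterministic_hash(obj: Any) -> str:
    """Hypothetical helper: a hash that is stable across interpreter runs.

    Unlike built-in hash() on str, which is salted per process unless
    PYTHONHASHSEED is set, a hashlib digest of a canonical serialization
    yields the same value on every run and machine.
    """
    # sort_keys makes the JSON text independent of dict insertion order;
    # default=str is a crude fallback for values JSON cannot encode natively
    payload = json.dumps(obj, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    config = {"task": "example_task", "seed": 42}  # hypothetical config
    print(deterministic_hash(config))
```

Sorting keys before hashing is the important design choice here: two dicts with the same contents but different insertion order produce the same digest.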
AndreFCruz committed Jun 24, 2024
1 parent cc65e1b commit 78556f0
Showing 22 changed files with 1,124 additions and 391 deletions.
10 changes: 4 additions & 6 deletions README.md
Expand Up @@ -5,14 +5,12 @@
![Documentation status](https://github.com/socialfoundations/folktexts/actions/workflows/python-docs.yml/badge.svg)
![PyPI version](https://badgen.net/pypi/v/folktexts)
![PyPI - License](https://img.shields.io/pypi/l/folktexts)
<!-- ![OSI license](https://badgen.net/pypi/license/folktexts) -->
<!-- [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE) -->
![Python compatibility](https://badgen.net/pypi/python/folktexts)

Folktexts is a python package to evaluate and benchmark calibration of large
language models.
It enables using any transformers model as a classifier for tabular data tasks,
and extracting risk score estimates from the model's output log-odds.

Folktexts is a python package to compute and evaluate classification risk scores
using large language models.
It enables using any transformers model as a classifier for tabular data tasks.

Several benchmark tasks are provided based on data from the American Community Survey.
Namely, each prediction task from the popular
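The README text in this hunk mentions extracting risk score estimates from the model's output log-odds. The sketch below shows one common way to do this, assuming a binary question whose two candidate answers each map to a single token with a known logit. The function name and the two-answer setup are hypothetical and not necessarily how folktexts computes its scores. Renormalizing the two answer probabilities is a two-way softmax, which equals the logistic sigmoid of the logit difference.

```python
import math


def risk_score_from_logits(logit_yes: float, logit_no: float) -> float:
    """Hypothetical helper: P(positive answer), renormalized over two answers.

    softmax([logit_yes, logit_no])[0] == sigmoid(logit_yes - logit_no),
    so only the difference of the two logits (the log-odds) matters.
    """
    log_odds = logit_yes - logit_no
    # Numerically stable logistic sigmoid: never exponentiate a large
    # positive number, so extreme log-odds cannot overflow.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    z = math.exp(log_odds)
    return z / (1.0 + z)


if __name__ == "__main__":
    print(risk_score_from_logits(2.0, 2.0))  # equal logits -> 0.5
    print(risk_score_from_logits(3.0, 1.0))  # log-odds 2 -> about 0.881
```

Because only the difference matters, adding the same constant to both logits leaves the score unchanged.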
Binary file added docs/_static/PUMS_Data_Dictionary_2018.pdf
Binary file not shown.
3 changes: 2 additions & 1 deletion folktexts/__init__.py
@@ -1,4 +1,5 @@
from ._version import __version__, __version_info__
from .acs import ACSDataset, ACSTaskMetadata
from .task import TaskMetadata
from .benchmark import BenchmarkConfig, CalibrationBenchmark
from .classifier import LLMClassifier
from .acs import ACSDataset, ACSTaskMetadata
15 changes: 11 additions & 4 deletions folktexts/_utils.py
@@ -8,6 +8,7 @@
from datetime import datetime
from functools import partial, reduce
from pathlib import Path
from contextlib import contextmanager

import numpy as np

@@ -91,7 +92,13 @@ def standardize_path(path: str | Path) -> str:
    return Path(path).expanduser().resolve().as_posix()


def get_thresholded_column_name(column_name: str, threshold: float | int) -> str:
    """Standardizes naming of thresholded columns."""
    threshold_str = f"{threshold:.2f}".replace(".", "_") if isinstance(threshold, float) else str(threshold)
    return f"{column_name}_binary_{threshold_str}"
@contextmanager
def suppress_logging(new_level):
    """Suppresses all logs of a given level within a context block."""
    logger = logging.getLogger()
    previous_level = logger.level
    logger.setLevel(new_level)
    try:
        yield
    finally:
        logger.setLevel(previous_level)
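The sketch below reproduces `suppress_logging` from the diff and exercises it. Note that calling `setLevel(new_level)` on the root logger drops records *below* `new_level` (records at `new_level` still pass). This applies to loggers whose effective level is inherited from the root; a logger with an explicit level of its own is unaffected. The `ListHandler` class and the logger name are hypothetical and are used only to capture output.

```python
import logging
from contextlib import contextmanager


@contextmanager
def suppress_logging(new_level):
    """Suppresses all logs of a given level within a context block."""
    logger = logging.getLogger()  # root logger
    previous_level = logger.level
    logger.setLevel(new_level)
    try:
        yield
    finally:
        # Restored even if the block raises
        logger.setLevel(previous_level)


class ListHandler(logging.Handler):
    """Hypothetical test handler that stores each message in a list."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


root = logging.getLogger()
handler = ListHandler()
root.addHandler(handler)
root.setLevel(logging.DEBUG)

# Level NOTSET, so its effective level is inherited from the root logger
log = logging.getLogger("suppress_logging_demo")

log.info("before")
with suppress_logging(logging.WARNING):
    log.info("hidden")    # below WARNING: dropped
    log.warning("shown")  # at WARNING: still emitted
log.info("after")

print(handler.messages)  # ['before', 'shown', 'after']
```

The `try`/`finally` in the context manager is what guarantees the previous level comes back even when the wrapped block raises.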