Commit

update readme
AndreFCruz committed Jul 4, 2024
1 parent e44ed71 commit 7d0551d
Showing 4 changed files with 326 additions and 55 deletions.
6 changes: 2 additions & 4 deletions README.md
@@ -116,13 +116,11 @@ By evaluating LLMs on tabular classification tasks, we can use standard feature

You can do so yourself by calling `folktexts.cli.eval_feature_importance` (add `--help` for a full list of options).

Here's an example for the Llama3-70B-Instruct model on the ACSIncome task:
Here's an example for the Llama3-70B-Instruct model on the ACSIncome task (*warning: takes 24h on an Nvidia H100*):
```
python -m folktexts.cli.eval_feature_importance --model 'meta-llama/Meta-Llama-3-70B-Instruct' --task ACSIncome --subsampling 0.1
```

Here are the plotted results:
![feat-imp_llama3-70b.png](feat-imp_llama3-70b.png)
![feat-imp_llama3-70b.png](docs/_static/feat-imp_meta-llama--Meta-Llama-3-70B-Instruct.png)

This script uses sklearn's [`permutation_importance`](https://scikit-learn.org/stable/modules/generated/sklearn.inspection.permutation_importance.html#sklearn.inspection.permutation_importance) to assess which features contribute the most to the ROC AUC metric (other metrics can be assessed using the `--scorer [scorer]` parameter).
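
For reference, here is a minimal sketch of the underlying computation, using a stand-in scikit-learn classifier on synthetic data (folktexts wraps the LLM in its own sklearn-compatible classifier, which is not shown here): each feature is shuffled in turn and the resulting drop in the chosen score is recorded.

```python
# Minimal sketch of permutation importance with scikit-learn; the
# RandomForestClassifier is a stand-in for the LLM-backed classifier.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each feature in turn and measure the drop in ROC AUC on held-out data.
result = permutation_importance(
    clf, X_test, y_test, scoring="roc_auc", n_repeats=10, random_state=0)

for idx in result.importances_mean.argsort()[::-1]:
    print(f"feature {idx}: {result.importances_mean[idx]:.3f}"
          f" +/- {result.importances_std[idx]:.3f}")
```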

(File could not be displayed.)
4 changes: 1 addition & 3 deletions folktexts/benchmark.py
@@ -75,8 +75,6 @@ def __hash__(self) -> int:
class CalibrationBenchmark:
"""A benchmark class for measuring and evaluating LLM calibration."""

DEFAULT_BENCHMARK_METRIC = "ece"

"""
Standardized configurations for the ACS data to use for benchmarking.
"""
@@ -260,7 +258,7 @@ def run(self, results_root_dir: str | Path, fit_threshold: int | bool = 0) -> fl
# Save results to disk
self.save_results()

return self._results[self.DEFAULT_BENCHMARK_METRIC]
return self._results

def plot_results(self, *, show_plots: bool = True):
"""Render evaluation plots and save to disk.
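
Note that, with the change above, `CalibrationBenchmark.run` now returns the full results dictionary instead of the single default metric. A hedged sketch of how calling code might read it, assuming `benchmark` is an already-constructed `CalibrationBenchmark` instance and that the results still include an `"ece"` entry:

```python
# Sketch only: constructing the CalibrationBenchmark is not shown in this diff,
# so `benchmark` is assumed to be an existing instance.
results = benchmark.run(results_root_dir="results", fit_threshold=100)

# run() previously returned just the default metric (ECE); it now returns the
# whole results mapping, so individual metrics are looked up by key.
ece = results["ece"]  # assumes an "ece" key, matching the removed default
print(f"ECE: {ece:.4f}")
```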
371 changes: 323 additions & 48 deletions notebooks/parse-feature-importance-results.ipynb

Large diffs are not rendered by default.
