How to log model response? #84

TundeAtSN · 2025-02-21T21:52:33Z

Hello,

For tasks outside of LM Eval Harness, how do I get the model's generated responses?

neginraoof · 2025-02-22T06:29:12Z

Logging generated responses is currently a work in progress. cc @esfrankel
Could you let us know whether you are asking about any specific benchmarks?

TundeAtSN · 2025-02-22T06:33:23Z

I'm primarily interested in coding and math tasks.

I took a closer look, and it seems the convention is to write the model outputs to a temporary file that is cleaned up once metrics are computed. For now, I copy the output file to another location before the cleanup runs.
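A minimal sketch of that workaround, assuming you can hook in at the point where the temporary output file still exists. The helper name `preserve_output`, the file name `outputs.jsonl`, and the destination directory are hypothetical and not part of the evaluation code base; only standard-library calls are used.

```python
import shutil
import tempfile
from pathlib import Path


def preserve_output(tmp_path, dest_dir):
    """Copy a soon-to-be-deleted output file into dest_dir; return the new path.

    Hypothetical helper: call it after generation finishes and before the
    benchmark's cleanup step removes the temporary file.
    """
    src = Path(tmp_path)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the target directory if needed
    target = dest / src.name
    shutil.copy2(src, target)  # copy2 also preserves file timestamps
    return target


def demo():
    # Simulate a benchmark that writes model outputs to a temporary file.
    with tempfile.TemporaryDirectory() as workdir:
        tmp_file = Path(workdir) / "outputs.jsonl"  # assumed file name
        tmp_file.write_text('{"response": "42"}\n')
        kept = preserve_output(tmp_file, Path(workdir) / "saved_outputs")
        tmp_file.unlink()  # stands in for the benchmark's cleanup
        return kept.read_text()


if __name__ == "__main__":
    print(demo())
```

The copy is made before the stand-in cleanup, so the saved file survives even though the original is deleted.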

neginraoof · 2025-02-24T06:36:08Z

Sounds good. For AIME, AMC, MATH500, LCB, and GPQADiamond, the output file is not temporary. We will send you an update once our unified logging is ready.
