Logging of generated responses is currently a work in progress. cc @esfrankel
Would you please let us know if you are asking about any specific benchmarks?
I'm primarily interested in coding and math tasks.
I took a closer look, and it seems the convention is for the model outputs to be written to a temporary file, which is then cleaned up once the metrics are computed. For now, I copy the output file elsewhere before the cleanup is performed.
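In case it helps anyone else, here is a minimal sketch of that workaround. It assumes you can find the path of the temporary output file and can run a hook before the benchmark's cleanup step. The function name `preserve_outputs`, the file name `outputs.jsonl`, and the directory `kept_outputs` are hypothetical and are not part of the repository's API.

```python
import shutil
import tempfile
from pathlib import Path


def preserve_outputs(temp_output_path, keep_dir):
    """Copy a temporary model-output file into keep_dir and return the new path.

    Hypothetical helper: call it after generation finishes but before the
    benchmark deletes its temporary file.
    """
    src = Path(temp_output_path)
    dest_dir = Path(keep_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)  # create the target directory if needed
    dest = dest_dir / src.name
    shutil.copy2(src, dest)  # copy2 also preserves timestamps where possible
    return dest


if __name__ == "__main__":
    # Simulate a benchmark that writes outputs to a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        temp_file = Path(tmp) / "outputs.jsonl"  # assumed file name
        temp_file.write_text('{"response": "example"}\n')
        saved = preserve_outputs(temp_file, Path(tmp) / "kept_outputs")
        print(f"Saved a copy to {saved}")
```

Where exactly to call this depends on how each benchmark handles its temporary file, so the hook point has to be found per task.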
Sounds good. For AIME, AMC, MATH500, LCB, and GPQADiamond, the output file is not temporary. We will send you an update once our unified logging is ready.
Hello,
For tasks outside of LM Eval Harness, how do I get the model's generated responses?