Can we access the results of other tools in the competition?
It's best to have them in CSV files (e.g., like those we generated in our pre-competition experiments) so that we can cherry-pick benchmarks according to Legion's compatibility and compare only those scores.
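A minimal sketch of what that cherry-picking could look like, assuming we do get per-tool CSVs; the file layout and the `task`/`score` column names are my assumptions, not the competition's actual export format:

```python
# Hedged sketch: restrict each tool's result CSV to a set of
# Legion-compatible benchmark tasks and compare the totals.
# File names and column names ("task", "score") are assumptions.
import csv
from pathlib import Path

def load_scores(csv_path):
    """Map task name -> score for one tool's result CSV."""
    scores = {}
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            scores[row["task"]] = float(row["score"])
    return scores

def compare_on_subset(result_dir, compatible_tasks):
    """Sum each tool's score over only the Legion-compatible tasks."""
    totals = {}
    for csv_path in Path(result_dir).glob("*.csv"):
        scores = load_scores(csv_path)
        totals[csv_path.stem] = sum(
            scores.get(task, 0.0) for task in compatible_tasks
        )
    return totals

# Example usage with hypothetical paths and task names:
# compatible = {"array-examples/sanfoundry_24-1.yml", "..."}
# print(compare_on_subset("results/", compatible))
```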
I failed to reproduce the final score of each tool in the competition from its per-category scores using the formula from the Google Sheets of our pre-competition experiments. This matters because we want to compute the scores of our new experiments the same way. Specifically:

- How are the final scores computed from the per-category scores?
- Did they remove the results of some benchmarks? For example, SQLite-MemSafety has only 1 task, on which every tool scored 0, and some benchmarks in other sets have the same problem. How did they deal with these?
- By "normalisation", do they mean simply taking averages (i.e., as we did in our pre-competition experiments)? A sketch contrasting the two interpretations follows this list.
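To make the "averages vs. normalisation" question concrete, here is a minimal sketch of the two aggregations we could check against the published totals. The count-weighted variant is my reading of the normalization described in the competition reports (normalized category score = score / task count; overall score = sum of normalized scores × average task count); treat the formula and all numbers below as assumptions to verify, not the competition's definitive rule.

```python
# Hedged sketch of two candidate aggregations of per-category scores.
# Category names and numbers are made up for illustration only.

categories = {  # category -> (tool's score, number of tasks)
    "ReachSafety": (1200.0, 3000),
    "MemSafety":   (150.0, 300),
    "Termination": (400.0, 800),
}

# Interpretation 1: plain average of the raw category scores
# (what we did in the pre-competition experiments).
plain_average = sum(s for s, _ in categories.values()) / len(categories)

# Interpretation 2 (assumption): divide each category score by its
# task count, sum these normalized scores, then multiply by the
# average number of tasks per category.
normalized = [s / n for s, n in categories.values()]
avg_tasks = sum(n for _, n in categories.values()) / len(categories)
count_weighted = sum(normalized) * avg_tasks

print(f"plain average:          {plain_average:.1f}")
print(f"count-weighted (guess): {count_weighted:.1f}")
```

If one of the two reproduces the published final scores, that would also tell us how categories like SQLite-MemSafety (a single task where everyone scored 0) were weighted.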