This repository was archived by the owner on Mar 16, 2024. It is now read-only.

Commit b55f688

Merge pull request #28 from emrgnt-cmplxty/feature/checkin-new-results
check in work in progress
2 parents 55b1874 + fc794d1 commit b55f688

19 files changed: 5885 additions and 7166 deletions

.env.example

Lines changed: 1 addition & 0 deletions
@@ -1,3 +1,4 @@
 OPENAI_API_KEY=your_openai_key
 ANTHROPIC_API_KEY=your_anthropic_key
 LEETCODE_SESSIONS=your_leetcode_sessions,separated,by,comma
+HF_TOKEN=your_huggingface_token
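
For illustration only, here is a minimal sketch of how a script might read the four variables listed in `.env.example`. It is not taken from the repository: the helper name `load_credentials` and the returned dictionary keys are hypothetical, and the only detail drawn from the file itself is that `LEETCODE_SESSIONS` holds a comma-separated list.

```python
import os


def load_credentials(env=None):
    """Collect the variables named in .env.example from a mapping.

    Hypothetical helper for illustration; reads os.environ by default.
    """
    env = os.environ if env is None else env
    # LEETCODE_SESSIONS is described as comma-separated, so split it
    # and drop empty entries and surrounding whitespace.
    raw_sessions = env.get("LEETCODE_SESSIONS", "")
    sessions = [s.strip() for s in raw_sessions.split(",") if s.strip()]
    return {
        "openai_api_key": env.get("OPENAI_API_KEY"),
        "anthropic_api_key": env.get("ANTHROPIC_API_KEY"),
        "hf_token": env.get("HF_TOKEN"),  # the variable added in this commit
        "leetcode_sessions": sessions,
    }
```

Passing a plain dictionary instead of relying on `os.environ` keeps the helper easy to test; missing keys come back as `None` rather than raising.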

README.md

Lines changed: 6 additions & 6 deletions
@@ -73,12 +73,12 @@ To see explicit commands ran to generate the reported results, check out the [co

 | Category | gpt-3.5-turbo-0301 | gpt-3.5-turbo-0613 | claude-2 | gpt-4-0314 | gpt-4-0613 | gpt-4 Baseline | Sources |
 |----------------------|--------------------|--------------------|----------|------------|------------|----------------|----------|
-| HumanEval | 81.7 | 61.5 | 65.2 | 87.2 | 84.1 | 67 | [1] |
-| EvalPlus | 71.3 | 54.2 | 54.9 | 79.2 | 74.4 | N/A | |
-| LeetCode_100 Easy | 87.0 | 89.0 | 73.0 | 91.0 | 88.0 | 72.2-75.6 | [1,2] |
-| LeetCode_100 Medium | 19.0 | 19.0 | 16.0 | 26.0 | 17.0 | 26.2-38.7 | [1,2] |
-| LeetCode_100 Hard | 4.0 | 4.0 | 2.0 | 6.0 | 4.0 | 6.7-7 | [1,2] |
-| GSM8K | 69.5 | 66.0 | XX | X | X | 87.1 | |
+| HumanEval | 67.0 | 61.5 | 65.2 | 86.0 | 84.1 | 67 | [1] |
+| EvalPlus | 59.1 | 54.2 | 54.9 | 80.5 | 74.4 | N/A | |
+| LeetCode_100 Easy | 87.0 | 80.0 | 73.0 | 91.0 | 88.0 | 72.2-75.6 | [1,2] |
+| LeetCode_100 Medium | 19.0 | 16.0 | 16.0 | 26.0 | 21.0 | 26.2-38.7 | [1,2] |
+| LeetCode_100 Hard | 4.0 | 3.0 | 2.0 | 6.0 | 6.0 | 6.7-7 | [1,2] |
+| GSM8K | 71.1 | 67.6 | XX | X | X | 87.1 | |
 | MATH | XX | XX | XX | 49.0 | 46.4 | 42.2 | [3] |

 ## License
## License

results/anthropic/gsm8k/claude_2/anthropic_gsm8k__model_eq_claude_2__temperature_eq_0p7.jsonl

Lines changed: 1319 additions & 0 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

results/anthropic/gsm8k/claude_2/anthropic_gsm8k__model_eq_claude_2__temperature_eq_0p7_eval_results.jsonl

Lines changed: 1319 additions & 0 deletions

results/openai/GSM8K/gpt_3p5_turbo_0301/openai_gsm8k__model_eq_gpt_3p5_turbo_0301__temperature_eq_0p7_eval_results.jsonl

Lines changed: 59 additions & 6654 deletions

results/openai/GSM8K/gpt_3p5_turbo_0613/openai_gsm8k__model_eq_gpt_3p5_turbo_0613__temperature_eq_0p7_eval_results.jsonl

Lines changed: 1319 additions & 0 deletions

results/openai/GSM8K/gpt_4_0314/openai_GSM8K__model_eq_gpt_4_0314__temperature_eq_0p7.jsonl

Lines changed: 404 additions & 0 deletions

results/openai/GSM8K/gpt_4_0613/openai_gsm8k__model_eq_gpt_4_0613__temperature_eq_0p7.jsonl

Lines changed: 858 additions & 0 deletions

results/openai/human_eval/gpt_3p5_turbo_0301/openai_human_eval__model_eq_gpt_3p5_turbo_0301__temperature_eq_0p7.jsonl

Lines changed: 160 additions & 160 deletions

results/openai/human_eval/gpt_3p5_turbo_0301/openai_human_eval__model_eq_gpt_3p5_turbo_0301__temperature_eq_0p7_eval_results.json

Lines changed: 1 addition & 1 deletion
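
The result files listed above appear to follow a naming pattern of the form `<provider>_<dataset>__<key>_eq_<value>__...`, with an optional `_eval_results` suffix. The sketch below shows one way such names could be decoded. It is an inference from the filenames in this commit, not the repository's own code: the function name `parse_result_filename` is hypothetical, and reading a `p` between two digits as a decimal point (`0p7` as `0.7`) is an assumption.

```python
import re


def parse_result_filename(name):
    """Split a results filename into provider, dataset, and settings.

    Hypothetical helper; the naming pattern is inferred, not documented.
    """
    # Drop the .json / .jsonl extension.
    stem = re.sub(r"\.jsonl?$", "", name)
    # Evaluation outputs carry an extra suffix after the settings.
    is_eval = stem.endswith("_eval_results")
    if is_eval:
        stem = stem[: -len("_eval_results")]
    # Double underscores separate the head from each key/value pair.
    head, *pairs = stem.split("__")
    provider, _, dataset = head.partition("_")
    settings = {}
    for pair in pairs:
        key, _, value = pair.partition("_eq_")
        # Assumption: "p" between two digits stands for a decimal point.
        settings[key] = re.sub(r"(?<=\d)p(?=\d)", ".", value)
    return {
        "provider": provider,
        "dataset": dataset,
        "settings": settings,
        "eval_results": is_eval,
    }
```

Values are left as strings, so a caller that needs a numeric temperature would convert `settings["temperature"]` with `float` itself.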
