Code4Bench: A Multidimensional Benchmark of Codeforces Data for Different Program Analysis Techniques
Code4Bench is now available for download at http://doi.org/10.5281/zenodo.2582968.
- Download the archive from the URL above and unzip it.
- Install MySQL 5.7.
- Create a database named `code4bench`.
- In MySQL Workbench:
  1. Open Server -> Data Import.
  2. Select the extracted folder.
  3. Click Start Import (this may take a while), then Finish.
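
If MySQL Workbench is not available, the import can likely also be scripted with the `mysql` command-line client. The sketch below is a hedged alternative, not part of the official instructions. It assumes that the extracted folder contains plain `.sql` dump files, that the `mysql` client is on the `PATH`, and that the password is supplied through a MySQL option file such as `~/.my.cnf`. The helper names `build_import_commands` and `import_all` are hypothetical.

```python
import subprocess
from pathlib import Path


def build_import_commands(dump_dir, database="code4bench", user="root"):
    """Pair each .sql file in dump_dir with a mysql client command line.

    Assumption: the extracted archive holds plain .sql dump files; adjust
    the glob pattern if the folder is laid out differently.
    """
    sql_files = sorted(Path(dump_dir).glob("*.sql"))
    # No password on the command line: the client is expected to read it
    # from an option file (e.g. ~/.my.cnf), keeping it out of the process list.
    return [(["mysql", "-u", user, database], sql_file) for sql_file in sql_files]


def import_all(dump_dir, database="code4bench", user="root"):
    """Run `mysql -u USER DATABASE < file.sql` for every dump file, in name order."""
    for command, sql_file in build_import_commands(dump_dir, database, user):
        print(f"Importing {sql_file.name} ...")
        with open(sql_file, "rb") as dump:
            # check=True stops at the first file the client fails to import.
            subprocess.run(command, stdin=dump, check=True)
```

For example, call `import_all("/path/to/extracted/folder")` after creating the `code4bench` database. Files are imported in name order. If the dump does not disable foreign-key checks itself, the tables may need to be imported in dependency order instead.
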
The Code4Bench schema is described below. Each table name is followed by the fields it contains.

| Field Name | Description |
|---|---|
| **source** | |
| id | A unique number |
| submission | ID number given by Codeforces to this submission |
| sourceCode | The submitted source code |
| author | ID number of the submitter |
| memory | The memory used by this submission |
| time | The execution time of this submission |
| sent | The time at which the user made the submission |
| countLine | The number of lines of code |
| problems_id | ID number of the problem |
| verdicts_id | The Codeforces verdict (judgment) on this submission |
| languages_id | The language in which this submission is written |
| isduplicated | Whether the submission is unique or a duplicate |
| **verdicts** | |
| id | A unique number |
| name | The name of a judgment |
| **languages** | |
| id | A unique number |
| name | The name of a programming language |
| **problems** | |
| id | A unique number |
| fullname | The contest ID number together with the problem name |
| contest | ID number of the contest |
| name | ID of the problem section within the contest |
| context | The problem description |
| **testcases** | |
| id | A unique number |
| inputData | Input data for the problem |
| expectedResult | Expected output for the problem |
| problems_id | ID number of the corresponding problem |
| isValid | Whether the test case is complete or deficient |
| **user** | |
| id | A unique number |
| author_id | The ID of the user |
| gender | The user's gender |
| age | The user's age |
| country | The country in which the user lives |
| state | The state in which the user lives |
| city | The city in which the user lives |
| mainJob | Whether programming is the user's main job |
| t0_4 | Whether the user works between 00:00 and 04:00 |
| t4_8 | Whether the user works between 04:00 and 08:00 |
| t8_12 | Whether the user works between 08:00 and 12:00 |
| t12_16 | Whether the user works between 12:00 and 16:00 |
| t16_20 | Whether the user works between 16:00 and 20:00 |
| t20_24 | Whether the user works between 20:00 and 24:00 |
| single | Whether the user is single |
| married | Whether the user is married |
| divorced | Whether the user is divorced |
| oneChild | Whether the user has one child |
| twoChild | Whether the user has two children |
| moreChild | Whether the user has more than two children |
| educationLevel | Education level, from diploma to PhD |
| isFieldCS | Whether the user graduated in computer science |
| yearsWork | Number of years the user has been programming |
| hours_per_month | Number of hours the user works per month |
| teamOrAlone | Whether the user works alone or as a team member |
| **countries** | |
| id | A unique number |
| sortName | Abbreviated country name, used for sorting |
| name | The full name of the country |
| phoneCode | The international calling code of the country |
| **states** | |
| id | A unique number |
| name | The name of the state |
| country_id | The ID of the country the state belongs to |
| **cities** | |
| id | A unique number |
| name | The name of the city |
| state_id | The ID of the state the city belongs to |
| **realfaultslocations** and **realfaultslocations_c_cpp** | |
| id | A unique number |
| subAccepted | ID number assigned by Codeforces to the accepted (correct) submission |
| subWrong | ID number assigned by Codeforces to the faulty submission |
| change | The number of changed lines |
| changeRate | The percentage of lines that were changed |
| insert | The number of added lines |
| insertRate | The percentage of lines that were added |
| delete | The number of deleted lines |
| deleteRate | The percentage of lines that were deleted |
| faultLocations | The locations of the faults in the faulty version relative to the correct version |
| countFaults | The number of faults in the faulty version relative to the correct version |
| countInsertFaults | The number of addition-type faults in the faulty version relative to the correct version |
| countDeleteFaults | The number of deletion-type faults in the faulty version relative to the correct version |
| countChangeFaults | The number of change-type faults in the faulty version relative to the correct version |
| insertFaultsLocations | The locations of addition-type faults in the faulty version relative to the correct version |
| changeFaultsLocations | The locations of change-type faults in the faulty version relative to the correct version |
| deleteFaultsLocations | The locations of deletion-type faults in the faulty version relative to the correct version |
| wSimA | The percentage similarity of the faulty version to the correct version |
| aSimW | The percentage similarity of the correct version to the faulty version |
| matchLines | The number of identical lines in the faulty and correct versions |
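
The fault-location tables identify submissions by their Codeforces IDs, so a faulty/correct pair can presumably be recovered by joining `subWrong` and `subAccepted` against `source.submission`. The sketch below runs that join on a tiny, hypothetical in-memory SQLite copy of a few columns; the sample rows and the name `fault_pairs` are invented for illustration. Against the real MySQL database the same `SELECT` should apply, and `realfaultslocations_c_cpp` can be used in place of `realfaultslocations`.

```python
import sqlite3

# Hypothetical miniature of two Code4Bench tables; the rows are made up.
SCHEMA_AND_SAMPLE = """
CREATE TABLE source (
    id INTEGER PRIMARY KEY,
    submission INTEGER,
    sourceCode TEXT,
    problems_id INTEGER
);
CREATE TABLE realfaultslocations (
    id INTEGER PRIMARY KEY,
    subAccepted INTEGER,
    subWrong INTEGER,
    countFaults INTEGER
);
INSERT INTO source VALUES
    (1, 1001, 'print(a + b)', 7),
    (2, 1000, 'print(a - b)', 7);
INSERT INTO realfaultslocations VALUES (1, 1001, 1000, 1);
"""

# Join each fault record to both of its submissions via the Codeforces ID.
FAULT_PAIRS_QUERY = """
SELECT f.id, f.countFaults, wrong.sourceCode, accepted.sourceCode
FROM realfaultslocations AS f
JOIN source AS wrong ON wrong.submission = f.subWrong
JOIN source AS accepted ON accepted.submission = f.subAccepted
ORDER BY f.id
"""


def fault_pairs(conn):
    """Return (pair id, fault count, faulty code, correct code) tuples."""
    return conn.execute(FAULT_PAIRS_QUERY).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA_AND_SAMPLE)
for pair_id, faults, wrong_code, accepted_code in fault_pairs(conn):
    print(f"pair {pair_id}: {faults} fault(s)")
    print("  faulty :", wrong_code)
    print("  correct:", accepted_code)
```

In MySQL, the columns `change`, `insert`, and `delete` share their names with reserved words, so they must be quoted with backticks, for example ``SELECT `change`, `insert`, `delete` FROM realfaultslocations``.
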

The number of submissions for each programming language is listed below.

| ID | Language | Submission Count |
|---|---|---|
| 1 | GNU C++ 14 | 604,155 |
| 2 | GNU C | 93,492 |
| 3 | MS C++ | 164,912 |
| 4 | GNU C++ 11 | 906,811 |
| 5 | FPC | 47,522 |
| 6 | GNU C++ | 1,167,214 |
| 7 | Java 8 | 154,087 |
| 8 | Python 3 | 52,433 |
| 9 | Go | 3,011 |
| 10 | D | 742 |
| 11 | MS C# | 14,896 |
| 12 | GNU C 11 | 18,574 |
| 13 | Python 2 | 36,469 |
| 14 | PyPy 2 | 4,507 |
| 15 | Ruby | 3,806 |
| 16 | PHP | 2,570 |
| 17 | PyPy 3 | 3,222 |
| 18 | Delphi | 9,698 |
| 19 | Kotlin | 4,739 |
| 20 | JavaScript | 3,020 |
| 21 | Haskell | 3,585 |
| 22 | OCaml | 543 |
| 23 | Scala | 2,131 |
| 24 | Mono C# | 5,199 |
| 25 | Java 7 | 27,931 |
| 26 | Rust | 599 |
| 27 | Perl | 784 |
| 28 | GNU C++ 11 | 1,083 |
| 29 | Java 8 ZIP | 107 |
| 30 | J | 2,673 |
| 31 | GNU C++ 0X | 34,746 |
| 32 | Java 6 | 22,988 |
| 33 | Pike | 4,076 |
| 34 | Befunge | 4,343 |
| 35 | Cobol | 2,114 |
| 36 | Factor | 2,606 |
| 37 | Secret-171 | 158 |
| 38 | Roco | 3,136 |
| 39 | Tcl | 3,752 |
| 40 | F# | 15 |
| 41 | Io | 2,908 |
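
Assuming `source.languages_id` references `languages.id`, counts like the ones above could be regenerated with a grouped join. The sketch runs such a query on a hypothetical three-language SQLite stand-in; the function name `submissions_per_language` is illustrative. Whether the published counts include submissions flagged by `isduplicated` is not stated here, so filtering on that column may change the totals.

```python
import sqlite3


def submissions_per_language(conn):
    """Return (language id, language name, submission count), ordered by id.

    LEFT JOIN keeps languages that have no submissions (their count is 0).
    """
    query = """
        SELECT l.id, l.name, COUNT(s.id) AS submission_count
        FROM languages AS l
        LEFT JOIN source AS s ON s.languages_id = l.id
        GROUP BY l.id, l.name
        ORDER BY l.id
    """
    return conn.execute(query).fetchall()


# Hypothetical stand-in data; the real tables hold millions of rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE languages (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE source (id INTEGER PRIMARY KEY, languages_id INTEGER);
INSERT INTO languages VALUES (1, 'GNU C++ 14'), (2, 'GNU C'), (3, 'Go');
INSERT INTO source VALUES (1, 1), (2, 1), (3, 2);
""")

# Print the result in the same Markdown layout as the table above.
for lang_id, name, count in submissions_per_language(conn):
    print(f"| {lang_id} | {name} | {count:,} |")
```
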