Code4Bench: A Multidimensional Benchmark of Codeforces Data for Different Program Analysis Techniques
Code4Bench is now available for download at http://doi.org/10.5281/zenodo.2582968. To set it up:
- Download the archive from the URL above and unzip it.
- Install MySQL 5.7.
- Create a database named “code4bench”.
- In MySQL Workbench:
  a. Open Server -> Data Import.
  b. Select the extracted folder.
  c. Click Start Import (this may take a while) and wait for the import to finish.
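Once the import finishes, one way to sanity-check it is to count the rows in every table. The Python sketch below is a minimal example and not part of Code4Bench: `count_rows` and `CODE4BENCH_TABLES` are hypothetical names, the table names are taken from the schema below and assumed to be stored in lowercase, and the function accepts any DB-API 2.0 connection. With the real database that connection would come from a MySQL driver such as mysql-connector-python (an assumption; any driver should do). The demo uses an empty in-memory SQLite database so it runs without a MySQL server.

```python
import sqlite3

# Table names from the Code4Bench schema described in this README
# (assumed to be stored in lowercase in the imported database).
CODE4BENCH_TABLES = (
    "source", "verdicts", "languages", "problems", "testcases", "user",
    "countries", "states", "cities",
    "realfaultslocations", "realfaultslocations_c_cpp",
)


def count_rows(conn, tables=CODE4BENCH_TABLES):
    """Return {table_name: row_count} using any DB-API 2.0 connection.

    Table names cannot be passed as query parameters, so only names from the
    fixed list above are accepted and interpolated into the SQL text.
    """
    counts = {}
    cur = conn.cursor()
    try:
        for table in tables:
            if table not in CODE4BENCH_TABLES:
                raise ValueError(f"unknown table: {table!r}")
            # Back-quoted identifiers are accepted by both MySQL and SQLite.
            cur.execute(f"SELECT COUNT(*) FROM `{table}`")
            # fetchall() drains the result, so an unbuffered MySQL cursor
            # can run the next query.
            counts[table] = cur.fetchall()[0][0]
    finally:
        cur.close()
    return counts


if __name__ == "__main__":
    # Against the real database you would open a MySQL connection instead, e.g.
    # with mysql-connector-python (assumed installed; fill in your credentials):
    #   import mysql.connector
    #   conn = mysql.connector.connect(host="localhost", user="root",
    #                                  password="...", database="code4bench")
    # Here an empty in-memory SQLite database stands in so the sketch runs anywhere.
    conn = sqlite3.connect(":memory:")
    for name in CODE4BENCH_TABLES:
        conn.execute(f"CREATE TABLE `{name}` (id INTEGER)")
    print(count_rows(conn))  # every table reports 0 rows
    conn.close()
```

A table that reports 0 rows (or an error about a missing table) against the real database suggests the import did not complete.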
The schema of Code4Bench is described below:
Field Name | Description |
---|---|
**source** | |
id | A unique number |
submission | ID number given by Codeforces to this submission |
sourceCode | The submitted source code |
author | ID number of the submitter |
memory | The memory used by this submission |
time | The execution time of this submission |
sent | The time at which the user made this submission |
countLine | The number of lines of code |
problems_id | ID number of the problem |
verdicts_id | The Codeforces verdict on this submission |
languages_id | The programming language in which this submission is written |
isduplicated | Whether the submission is unique or a duplicate |
**verdicts** | |
id | A unique number |
name | The name of a verdict |
**languages** | |
id | A unique number |
name | The name of a programming language |
**problems** | |
id | A unique number |
fullname | The contest ID number and the problem name |
contest | ID number of the contest |
name | ID of the problem within its contest |
context | The problem statement |
**testcases** | |
id | A unique number |
inputData | Input data of the test case |
expectedResult | Expected output of the test case |
problems_id | ID number of the corresponding problem |
isValid | Whether the test case is complete or deficient |
**user** | |
id | A unique number |
author_id | The ID of the user |
gender | The user's gender |
age | The user's age |
country | The country in which the user lives |
state | The state in which the user lives |
city | The city in which the user lives |
mainJob | Whether programming is the user's main job |
t0_4 | Whether the user works between 00:00 and 04:00 |
t4_8 | Whether the user works between 04:00 and 08:00 |
t8_12 | Whether the user works between 08:00 and 12:00 |
t12_16 | Whether the user works between 12:00 and 16:00 |
t16_20 | Whether the user works between 16:00 and 20:00 |
t20_24 | Whether the user works between 20:00 and 24:00 |
single | Whether the user is single |
married | Whether the user is married |
divorced | Whether the user is divorced |
oneChild | Whether the user has one child |
twoChild | Whether the user has two children |
moreChild | Whether the user has more than two children |
educationLevel | The user's education level, from diploma to PhD |
isFieldCS | Whether the user graduated in computer science |
yearsWork | How many years the user has been programming |
hours_per_month | How many hours per month the user works |
teamOrAlone | Whether the user works alone or as a team member |
**countries** | |
id | A unique number |
sortName | The abbreviation of the country name, used for sorting |
name | The full name of the country |
phoneCode | The international calling code of the country |
**states** | |
id | A unique number |
name | The name of the state |
country_id | The ID of the country the state belongs to |
**cities** | |
id | A unique number |
name | The name of the city |
state_id | The ID of the state the city belongs to |
**realfaultslocations** and **realfaultslocations_c_cpp** | |
id | A unique number |
subAccepted | ID number assigned by Codeforces to the accepted submission |
subWrong | ID number assigned by Codeforces to the faulty submission |
change | The number of lines that have been changed |
changeRate | The percentage of lines that have been changed |
insert | The number of lines that have been added |
insertRate | The percentage of lines that have been added |
delete | The number of lines that have been deleted |
deleteRate | The percentage of lines that have been deleted |
faultLocations | The locations of faults in the faulty version relative to the correct version |
countFaults | The number of faults in the faulty version relative to the correct version |
countInsertFaults | The number of addition-type faults in the faulty version relative to the correct version |
countDeleteFaults | The number of deletion-type faults in the faulty version relative to the correct version |
countChangeFaults | The number of change-type faults in the faulty version relative to the correct version |
insertFaultsLocations | The locations of addition-type faults in the faulty version relative to the correct version |
changeFaultsLocations | The locations of change-type faults in the faulty version relative to the correct version |
deleteFaultsLocations | The locations of deletion-type faults in the faulty version relative to the correct version |
wSimA | The percentage to which the faulty version is similar to the correct version |
aSimW | The percentage to which the correct version is similar to the faulty version |
matchLines | The number of identical lines in the faulty and correct versions |
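The tables are linked through their ID columns. The Python sketch below shows two queries one might write against this schema: submissions per verdict for one language, and faulty/accepted source-code pairs with a given number of faults. It is illustrative, not part of Code4Bench, and rests on assumptions: `verdicts_id` and `languages_id` hold the `id` of the named table, `subWrong` and `subAccepted` hold the Codeforces ID stored in `source.submission`, and the fault table is named in lowercase. `run_query`, `make_toy_db`, `VERDICTS_BY_LANGUAGE`, and `FAULT_PAIRS` are hypothetical names, and the rows in `make_toy_db` are made up. The `ph` argument exists because drivers use different placeholders (`%s` for mysql-connector-python, `?` for sqlite3).

```python
import sqlite3

# Submissions per verdict for one language: source links to the verdicts and
# languages tables through verdicts_id and languages_id (assumed to hold their id).
VERDICTS_BY_LANGUAGE = """
    SELECT v.name, COUNT(*) AS n
    FROM source AS s
    JOIN verdicts AS v ON v.id = s.verdicts_id
    JOIN languages AS l ON l.id = s.languages_id
    WHERE l.name = {ph}
    GROUP BY v.name
    ORDER BY n DESC
"""

# Faulty/accepted source pairs with a given number of faults, assuming that
# subWrong and subAccepted hold the Codeforces ID stored in source.submission.
FAULT_PAIRS = """
    SELECT r.subWrong, w.sourceCode, r.subAccepted, a.sourceCode
    FROM realfaultslocations AS r
    JOIN source AS w ON w.submission = r.subWrong
    JOIN source AS a ON a.submission = r.subAccepted
    WHERE r.countFaults = {ph}
"""


def run_query(conn, template, value, ph="%s"):
    """Run a one-parameter query template and return all rows.

    ph is the driver's placeholder: "%s" for mysql-connector-python,
    "?" for sqlite3. The value itself is always passed as a bound parameter.
    """
    cur = conn.cursor()
    try:
        cur.execute(template.format(ph=ph), (value,))
        return cur.fetchall()
    finally:
        cur.close()


def make_toy_db():
    """Tiny in-memory SQLite stand-in with made-up rows (not Code4Bench data)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE languages (id INTEGER, name TEXT);
        CREATE TABLE verdicts (id INTEGER, name TEXT);
        CREATE TABLE source (id INTEGER, submission INTEGER, sourceCode TEXT,
                             verdicts_id INTEGER, languages_id INTEGER);
        CREATE TABLE realfaultslocations (id INTEGER, subAccepted INTEGER,
                                          subWrong INTEGER, countFaults INTEGER);
        INSERT INTO languages VALUES (1, 'LangA');
        INSERT INTO verdicts VALUES (1, 'OK'), (2, 'WRONG_ANSWER');
        INSERT INTO source VALUES (1, 100, 'print(1)', 2, 1),
                                  (2, 101, 'print(2)', 1, 1),
                                  (3, 102, 'print(3)', 1, 1);
        INSERT INTO realfaultslocations VALUES (1, 101, 100, 1);
    """)
    return db


if __name__ == "__main__":
    db = make_toy_db()
    print(run_query(db, VERDICTS_BY_LANGUAGE, "LangA", ph="?"))
    # [('OK', 2), ('WRONG_ANSWER', 1)]
    print(run_query(db, FAULT_PAIRS, 1, ph="?"))
    # [(100, 'print(1)', 101, 'print(2)')]
```

If you also select the `change`, `insert`, or `delete` columns in MySQL, quote them with backticks (for example `` r.`change` ``), because CHANGE, INSERT, and DELETE are reserved words there.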
The number of submissions for each programming language is listed below:
ID | Language | Submission Count |
---|---|---|
1 | GNU C++ 14 | 604,155 |
2 | GNU C | 93,492 |
3 | MS C++ | 164,912 |
4 | GNU C++ 11 | 906,811 |
5 | FPC | 47,522 |
6 | GNU C++ | 1,167,214 |
7 | Java 8 | 154,087 |
8 | Python 3 | 52,433 |
9 | Go | 3,011 |
10 | D | 742 |
11 | MS C# | 14,896 |
12 | GNU C 11 | 18,574 |
13 | Python 2 | 36,469 |
14 | PyPy 2 | 4,507 |
15 | Ruby | 3,806 |
16 | PHP | 2,570 |
17 | PyPy 3 | 3,222 |
18 | Delphi | 9,698 |
19 | Kotlin | 4,739 |
20 | JavaScript | 3,020 |
21 | Haskell | 3,585 |
22 | OCaml | 543 |
23 | Scala | 2,131 |
24 | Mono C# | 5,199 |
25 | Java 7 | 27,931 |
26 | Rust | 599 |
27 | Perl | 784 |
28 | GNU C++ 11 | 1,083 |
29 | Java 8 ZIP | 107 |
30 | J | 2,673 |
31 | GNU C++ 0X | 34,746 |
32 | Java 6 | 22,988 |
33 | Pike | 4,076 |
34 | Befunge | 4,343 |
35 | Cobol | 2,114 |
36 | Factor | 2,606 |
37 | Secret-171 | 158 |
38 | Roco | 3,136 |
39 | Tcl | 3,752 |
40 | F# | 15 |
41 | Io | 2,908 |
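The per-language counts can be aggregated with simple arithmetic. The Python sketch below copies the table above and sums it by language family; the family grouping (`FAMILY_RULES`) and the names `SUBMISSIONS`, `family_of`, and `totals_by_family` are an illustrative choice, not part of Code4Bench. The data is keyed by ID because the name "GNU C++ 11" appears under both ID 4 and ID 28.

```python
# Submission counts per language ID, copied from the table above.
# Keyed by ID because the name "GNU C++ 11" appears under two IDs (4 and 28).
SUBMISSIONS = {
    1: ("GNU C++ 14", 604_155), 2: ("GNU C", 93_492), 3: ("MS C++", 164_912),
    4: ("GNU C++ 11", 906_811), 5: ("FPC", 47_522), 6: ("GNU C++", 1_167_214),
    7: ("Java 8", 154_087), 8: ("Python 3", 52_433), 9: ("Go", 3_011),
    10: ("D", 742), 11: ("MS C#", 14_896), 12: ("GNU C 11", 18_574),
    13: ("Python 2", 36_469), 14: ("PyPy 2", 4_507), 15: ("Ruby", 3_806),
    16: ("PHP", 2_570), 17: ("PyPy 3", 3_222), 18: ("Delphi", 9_698),
    19: ("Kotlin", 4_739), 20: ("JavaScript", 3_020), 21: ("Haskell", 3_585),
    22: ("OCaml", 543), 23: ("Scala", 2_131), 24: ("Mono C#", 5_199),
    25: ("Java 7", 27_931), 26: ("Rust", 599), 27: ("Perl", 784),
    28: ("GNU C++ 11", 1_083), 29: ("Java 8 ZIP", 107), 30: ("J", 2_673),
    31: ("GNU C++ 0X", 34_746), 32: ("Java 6", 22_988), 33: ("Pike", 4_076),
    34: ("Befunge", 4_343), 35: ("Cobol", 2_114), 36: ("Factor", 2_606),
    37: ("Secret-171", 158), 38: ("Roco", 3_136), 39: ("Tcl", 3_752),
    40: ("F#", 15), 41: ("Io", 2_908),
}

# Ordered prefix rules (an illustrative grouping): C++ is tested before C so
# "GNU C++ 14" is not counted as C, and "Java " keeps its trailing space so
# "JavaScript" is not counted as Java. Anything unmatched is "Other".
FAMILY_RULES = (
    ("C++", ("GNU C++", "MS C++")),
    ("C", ("GNU C",)),
    ("Java", ("Java ",)),
    ("Python", ("Python", "PyPy")),
)


def family_of(name):
    """Map a language name to a family using the first matching rule."""
    for family, prefixes in FAMILY_RULES:
        if name.startswith(prefixes):
            return family
    return "Other"


def totals_by_family(counts=SUBMISSIONS):
    """Sum submission counts per language family."""
    totals = {}
    for name, count in counts.values():
        family = family_of(name)
        totals[family] = totals.get(family, 0) + count
    return totals


if __name__ == "__main__":
    totals = totals_by_family()
    grand_total = sum(totals.values())
    # One line per family: name, submission count, share of all submissions.
    for family, n in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{family:<7}{n:>10,}{n / grand_total:>8.1%}")
```

With these counts, the C++ family (IDs 1, 3, 4, 6, 28, and 31) totals 2,878,921 of the 3,421,357 listed submissions, about 84%.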