Tested on:
- OSX Catalina 10.15.7 with zsh 5.7.1 (x86_64-apple-darwin19.0)
- NixOS 21.05pre27280.f6b5bfdb470 with bash 4.4.23(1)-release
- make
- python3
- perl5
- curl
- jq (JSON query tool)
apt-get install curl jq
brew install curl
brew install jq
- Create a GitHub API token: https://docs.github.com/en/free-pro-team@latest/github/authenticating-to-github/creating-a-personal-access-token
  - Be sure it has public_repo access.
  - Create the file github.token and place the GitHub API token in it.
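As a minimal sketch of the token step, the helpers below write a token to a file that only the current user can read and then read it back. The function names save_token and read_token are hypothetical and not part of this repository; only the file name github.token comes from the instructions above.

```sh
# Write a token to a file. When the file is newly created, the umask
# makes it readable and writable by the current user only.
# $1 = token value (placeholder, use your own), $2 = target path.
save_token() {
    ( umask 077 && printf '%s\n' "$1" > "$2" )
}

# Print the token stored in file $1 with all whitespace
# (including the trailing newline) removed.
read_token() {
    tr -d '[:space:]' < "$1"
}
```

For example, `save_token "<your token>" github.token` creates the file. One way to check that the token works is `curl -s -H "Authorization: token $(read_token github.token)" https://api.github.com/rate_limit`, which should return JSON describing your rate limit rather than an authentication error.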
All Boa queries were run on the 2019 October/GitHub dataset.
Suspiciously 'old' commits
Source: old-source.boa
Output: old-output.txt
Suspiciously 'future' commits
Source: future-source.boa
Output: future-output.txt
Out-of-order commits
Source: order-source.boa
Output: order-output.txt
Count revisions of all projects
Source: all-projects-revisions.boa
Output: all-projects-revisions-output.txt
Note that all data generated by these scripts is already included, so you should not need to re-run any scripts. But they are here if you wish to inspect, modify, and/or re-run them yourself.
If you want to just re-run everything, you can run:
make all
Note that this could take a substantial amount of time!
Used in section IV onward.
make project-lists
Used in section IV onward.
Most of the remaining scripts rely on having JSON metadata for commits. These JSON files are already cached in the dataset, but this is how you download them:
make cache-json
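To illustrate what caching commit metadata can look like, here is a rough sketch that fetches one commit's JSON from the GitHub REST API and skips commits that are already on disk. It is not the repository's actual implementation: the function names and the cache/<owner>/<repo>/<sha>.json layout are assumptions made for illustration. It expects github.token in the current directory.

```sh
# Build the GitHub REST API URL for a single commit.
# $1 = project as owner/repo, $2 = commit SHA.
commit_url() {
    printf 'https://api.github.com/repos/%s/commits/%s\n' "$1" "$2"
}

# Fetch one commit's JSON unless a non-empty cached copy exists.
# The cache path layout is hypothetical.
fetch_commit() {
    out="cache/$1/$2.json"
    if [ -s "$out" ]; then
        return 0    # already cached, no request needed
    fi
    mkdir -p "$(dirname "$out")"
    curl -sf -H "Authorization: token $(cat github.token)" \
        -o "$out" "$(commit_url "$1" "$2")"
}
```

Calling `fetch_commit owner/repo <sha>` twice only contacts GitHub the first time, which helps keep repeated runs within the API rate limit.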
Used in section IV onward.
make gen-dates
This file is used to inspect the actual commit date of each commit, as reported by GitHub/SHA.
This also generates the file bad-commits-by-year.txt, which is used to generate the filter-by-year table for the bad commits.
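A by-year table of this kind comes down to bucketing commit timestamps by calendar year. The sketch below shows one way to do that. It assumes a hypothetical input of one ISO 8601 timestamp per line, which may not match the format the make target actually produces.

```sh
# Count timestamps per calendar year.
# Assumed input (hypothetical): one ISO 8601 timestamp per line on stdin,
# for example 1970-01-01T00:00:00Z.
# Output: one "<year> <count>" line per year, sorted by year.
count_by_year() {
    cut -c1-4 | sort | uniq -c | awk '{ print $2, $1 }'
}
```

If the cached commit JSON follows GitHub's commit API response format, the committer timestamp can be pulled out with `jq -r '.commit.committer.date' <file>.json` and piped into `count_by_year`.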
Used in section IV onward.
make gen-commit-loc
Used in Section IV.A.
make gen-gitsvn
Used in sections IV.A, IV.B, figures 2, 3.
make gen-logs
This grabs the other commit logs that are not git-svn related and puts them into logs-old.txt.
This also generates the commit logs for 'bad' (out-of-order) commits in logs-order.txt.
Used in section IV.B.
make gen-verified
Used in section IV.B for item 2, Common Users.
make gen-commit-users
Used in section IV.B for item 3, Common Projects, and Table V.
make gen-commit-proj
Used in section V.
make gen-goodbad
This generates good-[sha-]projects.txt and bad-[sha-]projects.txt. The 'bad' list is used to avoid asking the GitHub API for info on deleted projects. The '-sha-' versions look for the missing GitHub projects on Software Heritage's archive.
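One plausible way to split projects into 'good' and 'bad' is by the HTTP status the GitHub API returns for the repository. The sketch below is an assumption about the approach, not the repository's actual script: the function names are hypothetical, and it expects github.token in the current directory.

```sh
# Map an HTTP status code from the GitHub API to a list name.
# 200 -> good (project still exists); 404 -> bad (not found, e.g. deleted);
# anything else -> unknown (e.g. rate limiting), so it can be retried later.
classify_status() {
    case "$1" in
        200) echo good ;;
        404) echo bad ;;
        *)   echo unknown ;;
    esac
}

# Query GitHub for a project given as owner/repo ($1)
# and print "<project> <class>".
check_project() {
    status=$(curl -s -o /dev/null -w '%{http_code}' \
        -H "Authorization: token $(cat github.token)" \
        "https://api.github.com/repos/$1")
    printf '%s %s\n' "$1" "$(classify_status "$status")"
}
```

Keeping the classification separate from the network call means projects already known to be 'bad' can be skipped on later runs without spending API requests on them.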
Used in section V.
make calc-dupes
Used in section V.
make calc-percent-bad
Used in section VI, table VI.
make gen-filteryear-table