The goal is to improve our existing OSCI code that ranks companies by number of commits. Currently, a large number of commits appear to be made by automated processes associated with GitHub accounts that have a company (commercial organization) email domain. These skew any commit-based ranking of companies, which is precisely why our OSCI ranking is based on number of contributors rather than number of commits.
For example, when we look at the OSCI commit-based company counts through the end of June 2020, we see:
| OrgName | Commits |
| --- | --- |
| Microsoft | 640009 |
| GitHub | 519108 |
| Renovateapp | 472705 |
| Google | 379847 |
| Red Hat | 331087 |
| Travis CI | 195377 |
| Intel | 150613 |
| IBM | 131510 |
| Exoplatform | 125844 |
| Odoo | 113452 |
| Pyup | 82118 |
However, Renovateapp, Travis CI, Exoplatform and Pyup do not feature highly in our OSCI contributor-based company ranking. In fact, Renovateapp has only 4 active contributors, Travis CI has 67, Exoplatform has 41, and Pyup has 4.
When we dig deeper into this, we see the following top commit authors for Pyup:
| Company | AuthorName | Commits |
| --- | --- | --- |
| Pyup | pyup-bot | 349717 |
| Pyup | pyup.io bot | 10146 |
| Pyup | pyup.io vuln bot | 22 |
| Pyup | pyup.io bot (via Travis CI) | 1 |
As you can see, all of them are bots.
The picture is the same for Renovateapp:
| Company | AuthorName | Commits |
| --- | --- | --- |
| Renovateapp | Renovate Bot | 2348935 |
| Renovateapp | WhiteSource Renovate | 65148 |
| Renovateapp | Renovate Bot (via Travis CI) | 358 |
| Renovateapp | renovate-bot | 63 |
| Renovateapp | Rhys Arkins | 3 |
Travis CI (top 10 authors by commits):
| Company | AuthorName | Commits |
| --- | --- | --- |
| Travis CI | Deployment Bot (from Travis CI) | 426727 |
| Travis CI | Travis CI | 92799 |
| Travis CI | travis-ci | 11824 |
| Travis CI | TravisCI | 9511 |
| Travis CI | Travis | 8128 |
| Travis CI | Deployment Bot (Travis) | 7723 |
| Travis CI | Deployment Bot | 1917 |
| Travis CI | raveit65 | 1322 |
| Travis CI | Piotr Milcarz | 1317 |
| Travis CI | Travis Build Bot (from Travis CI) | 1015 |
Most of these commits come from bots.
We would like a way to filter out these automated processes/bot commits, so that we could more accurately generate a ranking of companies based on commits.
One obvious way is to simply have a 'blacklist' of GitHub accounts / email addresses, but perhaps something more sophisticated could be devised, based on 'unhuman' levels of activity.
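As a rough illustration of what an activity-based check might look like, here is a minimal sketch that flags authors whose busiest day exceeds a fixed number of commits. The function name, the input shape (pairs of author email and commit timestamp), the example addresses, and the threshold of 100 commits per day are all assumptions made for this example, not part of the existing OSCI code; a real threshold would need to be tuned against the data.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta

# Assumed cut-off: more commits than this in one calendar day looks automated.
MAX_HUMAN_COMMITS_PER_DAY = 100


def flag_unhuman_authors(commits, max_per_day=MAX_HUMAN_COMMITS_PER_DAY):
    """Return the set of authors whose busiest day exceeds max_per_day.

    commits: iterable of (author_email, commit_datetime) pairs (hypothetical shape).
    """
    per_day = defaultdict(Counter)
    for author, timestamp in commits:
        per_day[author][timestamp.date()] += 1
    return {
        author
        for author, days in per_day.items()
        if max(days.values()) > max_per_day
    }


# Made-up sample data: one account with 150 commits in a single day, one with 5.
start = datetime(2020, 6, 1, 9, 0)
sample = [("bot@example.com", start + timedelta(minutes=i)) for i in range(150)]
sample += [("dev@example.com", start + timedelta(hours=i)) for i in range(5)]

flagged = flag_unhuman_authors(sample)
```

A per-day maximum is only one possible signal. It would miss bots that commit steadily at a low rate, so it probably works best combined with a list-based filter.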
At the moment, we use a domain <-> company match list, which determines which companies appear in the ranking we produce. Perhaps the bot problem can be solved by creating a similar list that filters out bots.
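A list-based filter could be sketched roughly as follows. The patterns, function names, and row layout (company, author name, commits) are hypothetical and chosen only to match the author names shown in the tables above; this is not how the existing match list is implemented.

```python
import re
from collections import Counter

# Hypothetical bot name patterns, playing the same role as the
# domain <-> company match list but for author names.
BOT_NAME_PATTERNS = [
    re.compile(r"\bbot\b", re.IGNORECASE),
    re.compile(r"^travis([ -]?ci)?$", re.IGNORECASE),
    re.compile(r"^whitesource renovate$", re.IGNORECASE),
]


def is_bot(author_name):
    """True if the author name matches any pattern in the bot list."""
    return any(p.search(author_name) for p in BOT_NAME_PATTERNS)


def commits_per_company(rows):
    """Sum commits per company, skipping rows whose author looks like a bot.

    rows: iterable of (company, author_name, commits) tuples (assumed layout).
    """
    totals = Counter()
    for company, author, commits in rows:
        if not is_bot(author):
            totals[company] += commits
    return totals


# A few rows taken from the tables in this issue.
rows = [
    ("Pyup", "pyup-bot", 349717),
    ("Pyup", "pyup.io bot", 10146),
    ("Renovateapp", "Renovate Bot", 2348935),
    ("Renovateapp", "WhiteSource Renovate", 65148),
    ("Renovateapp", "Rhys Arkins", 3),
    ("Travis CI", "Deployment Bot (from Travis CI)", 426727),
    ("Travis CI", "Travis CI", 92799),
    ("Travis CI", "raveit65", 1322),
    ("Travis CI", "Piotr Milcarz", 1317),
]

totals = commits_per_company(rows)
```

With these sample rows, only the commits by Rhys Arkins, raveit65 and Piotr Milcarz are counted. Name matching is easy to maintain but depends on bots identifying themselves, so accounts with ordinary-looking names would still need another signal.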