Filtering bots from OSCI ranking #26

Open
vlad-isayko opened this issue Sep 14, 2020 · 1 comment
Labels
enhancement New feature or request

Comments

vlad-isayko (Collaborator) commented Sep 14, 2020

The goal is to improve the existing OSCI code that ranks companies by number of commits. At present, a large number of commits appear to be made by automated processes associated with GitHub accounts that have a company (commercial organization) email domain. These skew any commit-based ranking of companies, which is precisely why our OSCI ranking is based on number of contributors rather than number of commits.

For example, when we look at the OSCI commit-based company counts to the end of June 2020, we see:

| OrgName | Commits |
| --- | --- |
| Microsoft | 640009 |
| GitHub | 519108 |
| Renovateapp | 472705 |
| Google | 379847 |
| Red Hat | 331087 |
| Travis CI | 195377 |
| Intel | 150613 |
| IBM | 131510 |
| Exoplatform | 125844 |
| Odoo | 113452 |
| Pyup | 82118 |

However, Renovateapp, Travis CI, Exoplatform and Pyup do not feature highly in our OSCI contributor-based company ranking. In fact, Renovateapp has only 4 active contributors, Travis CI has 67, Exoplatform has 41, and Pyup has 4.
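To make the mismatch concrete, here is a minimal sketch that divides the commit counts by the active contributor counts quoted above. The variable names are illustrative only and are not taken from the OSCI codebase.

```python
# Figures copied from this issue: commit counts to end of June 2020
# and the active contributor counts quoted in the paragraph above.
commits = {
    "Renovateapp": 472705,
    "Travis CI": 195377,
    "Exoplatform": 125844,
    "Pyup": 82118,
}
active_contributors = {
    "Renovateapp": 4,
    "Travis CI": 67,
    "Exoplatform": 41,
    "Pyup": 4,
}

# Commits per active contributor; very large values hint at automation.
commits_per_contributor = {
    org: commits[org] / active_contributors[org] for org in commits
}

# Print organisations from the highest ratio to the lowest.
for org, ratio in sorted(
    commits_per_contributor.items(), key=lambda item: item[1], reverse=True
):
    print(f"{org}: {ratio:,.0f} commits per active contributor")
```

A ratio in the tens of thousands per contributor is hard to explain by human activity, which matches what the per-author breakdowns show.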

Digging deeper, these are the top commit authors for Pyup:

| Company | AuthorName | Commits |
| --- | --- | --- |
| Pyup | pyup-bot | 349717 |
| Pyup | pyup.io bot | 10146 |
| Pyup | pyup.io vuln bot | 22 |
| Pyup | pyup.io bot (via Travis CI) | 1 |

As you can see, all of them are bots.
The picture is the same for Renovateapp:

| Company | AuthorName | Commits |
| --- | --- | --- |
| Renovateapp | Renovate Bot | 2348935 |
| Renovateapp | WhiteSource Renovate | 65148 |
| Renovateapp | Renovate Bot (via Travis CI) | 358 |
| Renovateapp | renovate-bot | 63 |
| Renovateapp | Rhys Arkins | 3 |

Travis CI (top 10 by commits):

| Company | AuthorName | Commits |
| --- | --- | --- |
| Travis CI | Deployment Bot (from Travis CI) | 426727 |
| Travis CI | Travis CI | 92799 |
| Travis CI | travis-ci | 11824 |
| Travis CI | TravisCI | 9511 |
| Travis CI | Travis | 8128 |
| Travis CI | Deployment Bot (Travis) | 7723 |
| Travis CI | Deployment Bot | 1917 |
| Travis CI | raveit65 | 1322 |
| Travis CI | Piotr Milcarz | 1317 |
| Travis CI | Travis Build Bot (from Travis CI) | 1015 |

Most of these commits come from bots.

We would like a way to filter out these automated processes/bot commits, so that we could more accurately generate a ranking of companies based on commits.

One obvious way is to simply have a 'blacklist' of GitHub accounts / email addresses, but perhaps something more sophisticated could be devised, based on 'unhuman' levels of activity.
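As a rough illustration of the 'unhuman' activity idea, the sketch below flags authors whose total commit count exceeds a cutoff, using the Travis CI figures from this issue. The cutoff `MAX_HUMAN_COMMITS` and the function `flag_by_volume` are hypothetical; nothing here comes from the OSCI codebase, and a real heuristic would more likely use commits per day than a lifetime total.

```python
# Hypothetical cutoff: an assumption for illustration, not a tuned value.
MAX_HUMAN_COMMITS = 5000

# (author name, commits) pairs copied from the Travis CI table in this issue.
travis_authors = [
    ("Deployment Bot (from Travis CI)", 426727),
    ("Travis CI", 92799),
    ("travis-ci", 11824),
    ("TravisCI", 9511),
    ("Travis", 8128),
    ("Deployment Bot (Travis)", 7723),
    ("Deployment Bot", 1917),
    ("raveit65", 1322),
    ("Piotr Milcarz", 1317),
    ("Travis Build Bot (from Travis CI)", 1015),
]


def flag_by_volume(authors, max_commits=MAX_HUMAN_COMMITS):
    """Return the names of authors whose commit count exceeds max_commits."""
    return [name for name, count in authors if count > max_commits]


suspected_bots = flag_by_volume(travis_authors)
for name in suspected_bots:
    print(name)
```

With this cutoff, lower-volume accounts such as "Deployment Bot" and "Travis Build Bot (from Travis CI)" are not flagged, so a volume threshold alone would probably need to be combined with a name or account list.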

At the moment, we use a domain <-> company match list, which filters the companies in the ranking we produce. Perhaps the bot problem can be solved by creating a similar list that filters out bots.
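A bot list of that kind might look like the following sketch, which combines an explicit list of author names with a simple name pattern and applies it to the Renovateapp figures from this issue. `BOT_BLOCKLIST`, `BOT_NAME_PATTERN` and `is_bot` are hypothetical names, and the single blocklist entry is just the one author in the table that the pattern does not catch.

```python
import re

# Hypothetical explicit list, analogous to the domain <-> company match list.
# Entries are lowercased author names known to be automated.
BOT_BLOCKLIST = {"whitesource renovate"}

# Matches "bot" as a whole word, e.g. "Renovate Bot" or "pyup-bot".
BOT_NAME_PATTERN = re.compile(r"\bbot\b", re.IGNORECASE)


def is_bot(author_name):
    """Return True if the author is on the blocklist or matches the pattern."""
    name = author_name.strip().lower()
    return name in BOT_BLOCKLIST or bool(BOT_NAME_PATTERN.search(name))


# (author name, commits) pairs copied from the Renovateapp table in this issue.
renovate_authors = [
    ("Renovate Bot", 2348935),
    ("WhiteSource Renovate", 65148),
    ("Renovate Bot (via Travis CI)", 358),
    ("renovate-bot", 63),
    ("Rhys Arkins", 3),
]

# Commits that remain once suspected bot authors are filtered out.
human_commits = sum(
    count for name, count in renovate_authors if not is_bot(name)
)
print(human_commits)
```

Name matching is cheap but imperfect: accounts such as "Travis CI" or "travis-ci" contain no "bot" token, so they would have to be added to the explicit list by hand.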

vlad-isayko added the enhancement label Sep 14, 2020

patrickstephens2 commented

It would be interesting to analyse what those bots actually do: are they contributing anything useful, or is it just deployment logs or something similar?
