Collecting usage statistics and community metrics #219

Open
matt-graham opened this issue Sep 20, 2024 · 0 comments · May be fixed by #253
Labels
infrastructure Issues related to infrastructure for repository and project question Further information is requested

Comments

@matt-graham
Collaborator

@jasonmcewen suggested we may wish to look at options for collecting statistics about usage of s2fft (and related packages) to support potential future funding applications.

Some notes from a bit of initial research on options / tools and resources in this area:

  • usagestats Python package - a package for collecting opt-in usage statistics from users of a program
    • This is mainly targeted at Python CLI tools, where there is an entry point to attach the prompt asking users to opt in, so may well not be relevant to our case.
    • As it requires explicit user opt-in, it is likely to give more useful data about actual users as opposed to automated installations on CI runners etc.
    • Users however might find the idea of collecting usage statistics like this off-putting!
  • github-repo-stats GitHub Action - can be set up as a scheduled workflow to automate collecting and generating reports from GitHub's built-in traffic statistics, overcoming the built-in interface's limit of 14 days of data.
    • This looks easy to set up and doesn't require any intervention on the user side.
    • Only captures statistics of interactions with the GitHub repository, so only gives a partial picture as many (most?) users will install from PyPI, but still useful to get statistics around development activity.
  • pypistats Python package - 'Python interface to PyPI Stats API to get aggregate download statistics on Python packages on the Python Package Index'
    • Gives access to the last 180 days of PyPI download statistics.
    • We could potentially set up a scheduled GitHub Actions job to download and record this data, for example monthly.
  • pypinfo Python package - 'pypinfo is a simple CLI to access PyPI download statistics via Google's BigQuery'
    • Similar to pypistats, but as it directly accesses the underlying Google BigQuery data it is not limited to a 180-day window.
    • This may be useful for extracting historical PyPI download statistics on demand, as an alternative to running a regular job with pypistats as suggested above.
  • Augur - 'a data engineering tool that makes it possible for data scientists to gather open source software community data'
    • Part of the CHAOSS (Community Health Analytics in Open Source Software) project.
    • Pulls in data from a range of sources and, as it is focussed on the community rather than single-repository level, can collect data across multiple linked repositories / projects.
    • Has extensive data visualization, reporting and querying support.
    • Getting set up looks non-trivial and it feels like this may be overkill for our purposes.
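
As a rough illustration of the "scheduled job that records PyPI downloads" idea above, here is a minimal sketch of how each run could merge the currently available window of daily counts into a persistent CSV file, so that history accumulates beyond the 180 days. It uses only the standard library rather than the pypistats package itself. The endpoint URL, the payload layout (a `data` list of rows with `category`, `date` and `downloads` keys) and the names `fetch_overall`, `merge_snapshot`, `load_history`, `save_history` and `update_history` are all assumptions for illustration and would need checking against the PyPI Stats API documentation.

```python
import csv
import json
import urllib.request
from pathlib import Path

# Assumed endpoint of the PyPI Stats JSON API (not taken from the notes above).
API_URL = "https://pypistats.org/api/packages/{package}/overall"


def fetch_overall(package: str) -> dict:
    """Download the daily download counts currently exposed for a package."""
    url = API_URL.format(package=package)
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def merge_snapshot(
    history: dict[str, int], payload: dict, category: str = "without_mirrors"
) -> dict[str, int]:
    """Merge one API payload into a date -> downloads mapping.

    Newer snapshots overwrite older values for the same date, so a day that
    was only partially counted in one run is corrected by the next run.
    """
    merged = dict(history)
    for row in payload["data"]:
        if row["category"] == category:
            merged[row["date"]] = int(row["downloads"])
    return merged


def load_history(path: Path) -> dict[str, int]:
    """Read previously recorded counts, or an empty mapping on the first run."""
    if not path.exists():
        return {}
    with path.open(newline="") as f:
        return {row["date"]: int(row["downloads"]) for row in csv.DictReader(f)}


def save_history(path: Path, history: dict[str, int]) -> None:
    """Write the counts back out, sorted by date."""
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["date", "downloads"])
        for date in sorted(history):
            writer.writerow([date, history[date]])


def update_history(path: Path, package: str, fetch=fetch_overall) -> dict[str, int]:
    """Fetch the current window and fold it into the file at `path`."""
    history = merge_snapshot(load_history(path), fetch(package))
    save_history(path, history)
    return history
```

A scheduled workflow could then call something like `update_history(Path("pypi_downloads.csv"), "s2fft")` and commit the updated file. Because rows are keyed by date, running more often than the window length just rewrites overlapping days, and the same merge-by-date approach should carry over to GitHub's 14-day traffic data.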
matt-graham added the question and infrastructure labels Sep 20, 2024