Estimate how frequently Python packages are imported across public GitHub repositories.
We determine package popularity by:
- Randomly sampling GitHub repositories with Python as the main language
- Analyzing Python import statements in these repositories
- Extrapolating the sample results to the estimated total number of Python repositories (~18M repositories)
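
The extrapolation step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `extrapolate` and the sample numbers are hypothetical, and it assumes a simple proportional scale-up from the sample to the ~18M total.

```python
# Approximate total number of Python repositories, taken from the text above.
TOTAL_PYTHON_REPOS = 18_000_000


def extrapolate(repos_with_package: int, repos_sampled: int,
                total_repos: int = TOTAL_PYTHON_REPOS) -> int:
    """Scale a count observed in the sample up to the estimated population.

    Assumes the sample is representative, so the share of repositories
    importing a package is the same in the sample and in the population.
    """
    if repos_sampled <= 0:
        raise ValueError("repos_sampled must be positive")
    return round(repos_with_package * total_repos / repos_sampled)


# Hypothetical numbers: 50 of 1,000 sampled repositories import a package.
print(extrapolate(50, 1_000))  # 900000
```

Because the estimate is a plain proportion, it is only as good as the randomness of the sample; a larger sample narrows the uncertainty but does not correct for sampling bias.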
A GitHub Actions workflow samples additional repositories every 6 hours, so the estimates become more accurate over time.
Note: Standard-library modules are no longer counted, but some entries collected earlier have not yet been removed from the data.
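
One possible way to exclude standard-library modules is shown below. This is a sketch under assumptions, not necessarily how this project does it: `drop_stdlib` is a hypothetical helper, and it relies on `sys.stdlib_module_names`, which exists only on Python 3.10 and later (a tiny fallback set is used on older versions purely for illustration).

```python
import sys

# sys.stdlib_module_names was added in Python 3.10. The fallback set is a
# deliberately incomplete placeholder for older interpreters.
STDLIB = getattr(sys, "stdlib_module_names",
                 frozenset({"os", "sys", "json", "re"}))


def drop_stdlib(counts: dict) -> dict:
    """Return a copy of `counts` without standard-library module names."""
    return {name: n for name, n in counts.items() if name not in STDLIB}


# Hypothetical counts: "os" and "json" are filtered out, "numpy" is kept.
print(drop_stdlib({"os": 3, "numpy": 5, "json": 1}))  # {'numpy': 5}
```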
Script | Purpose |
---|---|
find_repos.py | Queries GitHub API for random Python repositories |
analyze_imports.py | Extracts import statements from repository files |
count_libs.py | Aggregates and calculates package usage statistics |
update_readme.py | Refreshes this README with latest data |
total_python_repos.ipynb | Estimates total Python repository count on GitHub |
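
To illustrate the import-extraction step, here is a minimal sketch using the standard `ast` module. It is an assumption about how such a step could work, not the contents of `analyze_imports.py`; the function name `top_level_imports` is hypothetical.

```python
import ast


def top_level_imports(source: str) -> set:
    """Return the top-level package names imported by a Python source string."""
    names = set()
    try:
        tree = ast.parse(source)
    except SyntaxError:
        # Files that do not parse (e.g. Python 2 code) contribute nothing.
        return names
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import a.b.c" counts as package "a".
                names.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # level > 0 marks a relative import ("from . import x"); skip it.
            if node.level == 0 and node.module:
                names.add(node.module.split(".")[0])
    return names


sample = (
    "import numpy as np\n"
    "from sklearn.model_selection import train_test_split\n"
    "from . import helpers\n"
)
print(sorted(top_level_imports(sample)))  # ['numpy', 'sklearn']
```

Parsing with `ast` rather than matching text avoids counting imports that appear only in comments or strings, at the cost of skipping files that are not valid for the running interpreter.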
File | Description | Format |
---|---|---|
repos.jsonl | Details of processed repositories | JSONL |
imports.jsonl | Raw import statements extracted from repos | JSONL |
library_counts.csv | Aggregated package usage statistics | CSV |
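
The step from raw imports to aggregated counts could look roughly like the sketch below. The record schema (a `repo` field and an `imports` list per JSONL line), the CSV column names, and the choice to count each package once per repository are all assumptions made for illustration; the real files may differ.

```python
import csv
import io
import json
from collections import Counter


def count_libraries(jsonl_lines):
    """Count how many records (repositories) import each package."""
    counts = Counter()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        # set() so a package is counted at most once per repository.
        counts.update(set(record["imports"]))
    return counts


def write_counts_csv(counts, out):
    """Write counts as CSV, most frequent first, ties broken by name."""
    writer = csv.writer(out)
    writer.writerow(["library", "count"])
    for name, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        writer.writerow([name, n])


# Hypothetical records standing in for lines of imports.jsonl.
lines = [
    '{"repo": "a/b", "imports": ["numpy", "pandas", "numpy"]}',
    '{"repo": "c/d", "imports": ["numpy"]}',
]
counts = count_libraries(lines)
buf = io.StringIO()
write_counts_csv(counts, buf)
print(buf.getvalue())
```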
Our GitHub Actions workflow orchestrates the entire process:
Find Random Repos → Analyze Imports → Count Package Usage → Update Statistics → Refresh README
Rank | Library | Count |
---|---|---|
1 | numpy | 1678 |
2 | matplotlib | 552 |
3 | pandas | 548 |
4 | torch | 534 |
5 | requests | 475 |
6 | django | 341 |
7 | cv2 | 310 |
8 | sklearn | 273 |
9 | utils | 266 |
10 | scipy | 257 |
Last updated: 2025-04-01 18:34:44 UTC