Merge pull request #4825 from GSA/ga-metrics
btylerburton authored Jul 26, 2024
2 parents bc07841 + 5c2a685 commit bb76031
Showing 14 changed files with 1,000 additions and 6 deletions.
21 changes: 16 additions & 5 deletions .github/workflows/build-metrics-report.yml
@@ -13,10 +13,7 @@ jobs:
fetch-report:
defaults:
run:
working-directory: metrics
env:
AWS_ACCESS_KEY_ID: ${{secrets.AWS_ACCESS_KEY_ID_METRICS}}
AWS_SECRET_ACCESS_KEY: ${{secrets.AWS_SECRET_ACCESS_KEY_METRICS}}
working-directory: metrics
runs-on: ubuntu-latest
name: Fetch Reports
steps:
@@ -34,6 +31,16 @@ jobs:
with:
poetry-version: ${{ env.POETRY_VERSION }}

- name: Setup a local virtual environment (if no poetry.toml file)
run: |
poetry config virtualenvs.create true --local
poetry config virtualenvs.in-project true --local
- uses: actions/cache@v4
name: Define a cache for the virtual environment based on the dependencies lock file
with:
path: ./.venv
key: venv-${{ runner.os }}-${{ hashFiles('**/poetry.lock') }}

- name: Install Dependencies
run: |
poetry env use ${{ env.PY_VERSION }}
@@ -43,7 +50,11 @@ jobs:
env:
GA_CREDENTIALS_JSON: ${{ secrets.GA_CREDENTIALS_JSON }}
run: |
echo $GA_CREDENTIALS_JSON | base64 --decode > datagov_metrics/credentials.json
echo $GA_CREDENTIALS_JSON | base64 --decode > datagov_metrics/credentials.json
- name: Run Python script
env:
AWS_ACCESS_KEY_ID_METRICS: ${{ secrets.AWS_ACCESS_KEY_ID_METRICS }}
AWS_SECRET_ACCESS_KEY_METRICS: ${{ secrets.AWS_SECRET_ACCESS_KEY_METRICS }}
AWS_DEFAULT_REGION: us-gov-west-1
run: |
poetry run python datagov_metrics
4 changes: 4 additions & 0 deletions .gitignore
@@ -62,3 +62,7 @@ ansible/roles/vendor
.vscode/settings.json

output

## dont ignore
# metrics
!metrics/.env
2 changes: 1 addition & 1 deletion README.md
@@ -11,7 +11,7 @@
| egress actions | [![disable egress proxy](https://github.com/GSA/data.gov/actions/workflows/disable-egress.yml/badge.svg)](https://github.com/GSA/data.gov/actions/workflows/disable-egress.yml) [![enable egress proxy](https://github.com/GSA/data.gov/actions/workflows/enable-egress.yml/badge.svg)](https://github.com/GSA/data.gov/actions/workflows/enable-egress.yml) [![restart egress proxy](https://github.com/GSA/data.gov/actions/workflows/restart-egress.yml/badge.svg)](https://github.com/GSA/data.gov/actions/workflows/restart-egress.yml) |
| [ssb egress actions](https://github.com/GSA/datagov-ssb) | [![disable egress proxy](https://github.com/GSA/datagov-ssb/actions/workflows/disable-egress.yml/badge.svg)](https://github.com/GSA/datagov-ssb/actions/workflows/disable-egress.yml) [![enable egress proxy](https://github.com/GSA/datagov-ssb/actions/workflows/enable-egress.yml/badge.svg)](https://github.com/GSA/datagov-ssb/actions/workflows/enable-egress.yml) [![restart egress proxy](https://github.com/GSA/datagov-ssb/actions/workflows/restart-egress.yml/badge.svg)](https://github.com/GSA/datagov-ssb/actions/workflows/restart-egress.yml) |
| | |
| [data.gov](https://github.com/GSA/datagov-11ty) | [![Build & Test](https://github.com/GSA/datagov-11ty/actions/workflows/build.yml/badge.svg)](https://github.com/GSA/datagov-11ty/actions/workflows/build.yml) |
| [data.gov](https://github.com/GSA/datagov-11ty) | [![Build & Test](https://github.com/GSA/datagov-11ty/actions/workflows/build.yml/badge.svg)](https://github.com/GSA/datagov-11ty/actions/workflows/build.yml)[![Catalog Metrics](https://github.com/GSA/data.gov/actions/workflows/build-metrics-report.yml/badge.svg)](https://github.com/GSA/data.gov/actions/workflows/build-metrics-report.yml) |
| [www-redirects](https://github.com/GSA/datagov-website) | [![deploy](https://github.com/GSA/datagov-website/actions/workflows/deploy.yml/badge.svg)](https://github.com/GSA/datagov-website/actions/workflows/deploy.yml) |
| [datagov-ssb](https://github.com/GSA/datagov-ssb) | [![commit](https://github.com/GSA/datagov-ssb/actions/workflows/commit.yml/badge.svg)](https://github.com/GSA/datagov-ssb/actions/workflows/commit.yml) [![plan](https://github.com/GSA/datagov-ssb/actions/workflows/plan.yml/badge.svg)](https://github.com/GSA/datagov-ssb/actions/workflows/plan.yml) [![apply](https://github.com/GSA/datagov-ssb/actions/workflows/apply.yml/badge.svg)](https://github.com/GSA/datagov-ssb/actions/workflows/apply.yml) |
| [resources.data.gov](https://github.com/GSA/resources.data.gov/) | [![Build & Test](https://github.com/GSA/datagov-11ty/actions/workflows/build.yml/badge.svg)](https://github.com/GSA/datagov-11ty/actions/workflows/build.yml) [![QA](https://github.com/GSA/resources.data.gov/actions/workflows/qa.yml/badge.svg)](https://github.com/GSA/resources.data.gov/actions/workflows/qa.yml) |
1 change: 1 addition & 0 deletions metrics/.env
@@ -0,0 +1 @@
AWS_S3_BUCKET_METRICS=cg-baa85e06-1bdd-4672-9e3a-36333c05c6ce
22 changes: 22 additions & 0 deletions metrics/.vscode/launch.json
@@ -0,0 +1,22 @@
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Poetry debug",
            "type": "debugpy",
            "request": "launch",
            "pythonPath": "${workspaceFolder}/.venv/bin/python",
            "cwd": "${workspaceFolder}",
            "module": "datagov_metrics.ga",
            "justMyCode": false,
            "args": [
                "src.main:app",
                "--host",
                "0.0.0.0",
                "--port",
                "8000",
                "--reload"
            ]
        }
    ]
}
12 changes: 12 additions & 0 deletions metrics/Makefile
@@ -0,0 +1,12 @@
.DEFAULT_GOAL := help
SHELL := /bin/bash

.PHONY: py-lint
py-lint: ## Run python linting scanners and black
poetry run ruff check . --fix

# Output documentation for top-level targets
# Thanks to https://marmelab.com/blog/2016/02/29/auto-documented-makefile.html
.PHONY: help
help:
@awk 'BEGIN {FS = ":.*?## "} /^[a-zA-Z_-]+:.*?## / {printf "\033[36m%-10s\033[0m %s\n", $$1, $$2}' $(MAKEFILE_LIST)
11 changes: 11 additions & 0 deletions metrics/README.md
@@ -0,0 +1,11 @@
# datagov-metrics

A Python module that fetches metrics from various sources (GA & CKAN) and pushes them to S3.

Reports are then available at the path:

https://s3-us-gov-west-1.amazonaws.com/cg-baa85e06-1bdd-4672-9e3a-36333c05c6ce/{file_name}

Ex.
https://s3-us-gov-west-1.amazonaws.com/cg-baa85e06-1bdd-4672-9e3a-36333c05c6ce/global__datasets_per_org.csv
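
A minimal sketch of how a consumer might build a report URL and parse a downloaded report, assuming the reports are plain CSV with a header row. The helper names `report_url` and `parse_report` are hypothetical and are not part of this commit; the base URL and bucket name are copied from the path above.

```python
import csv
import io

# Copied from the report path documented above.
S3_BASE_URL = "https://s3-us-gov-west-1.amazonaws.com"
BUCKET = "cg-baa85e06-1bdd-4672-9e3a-36333c05c6ce"


def report_url(file_name: str) -> str:
    """Return the URL for a report file (hypothetical helper)."""
    return f"{S3_BASE_URL}/{BUCKET}/{file_name}"


def parse_report(csv_text: str) -> list:
    """Parse report CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


# Fetching is left to the caller, for example:
#   import urllib.request
#   with urllib.request.urlopen(report_url("global__datasets_per_org.csv")) as resp:
#       rows = parse_report(resp.read().decode("utf-8"))
```

Note that `csv.DictReader` returns every value as a string, so counts would need an explicit `int(...)` conversion before any arithmetic.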

Empty file.
4 changes: 4 additions & 0 deletions metrics/datagov_metrics/__main__.py
@@ -0,0 +1,4 @@
from datagov_metrics import ckan, ga

ga.main()
ckan.main()
43 changes: 43 additions & 0 deletions metrics/datagov_metrics/ckan.py
@@ -0,0 +1,43 @@
import requests
import csv
import io
from datagov_metrics.s3_util import put_data_to_s3

CKAN_BASE_URL = "https://catalog.data.gov/api/action/package_search"
QUERIES = {
    "harvest_sources": '?fq=dataset_type:harvest&facet.field=["organization"]&facet.limit=200&rows=0',
    "datasets_per_org": '?q=*:*&facet.field=["organization"]&facet.limit=200&rows=0',
}


def get_data():
    query_dict = {}
    for k, v in QUERIES.items():
        url = f"{CKAN_BASE_URL}{v}"
        repo = requests.get(url)
        data = repo.json()

        raw_data = data["result"]["facets"]["organization"]

        query_dict[k] = [[k, v] for (k, v) in raw_data.items()]
    return query_dict


def write_data_to_csv(response):
    """Reshape the response CSV."""
    with io.StringIO() as csv_buffer:
        writer = csv.writer(csv_buffer, delimiter=",")
        writer.writerow(["organization", "count"])  # write header
        writer.writerows(response)
        return csv_buffer.getvalue()


def main():
    data = get_data()
    for k, v in data.items():
        csv_data = write_data_to_csv(v)
        put_data_to_s3(f"global__{k}.csv", csv_data)


if __name__ == "__main__":
    main()
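
For reviewers, the reshaping done in `ckan.py` can be tried offline. The following is a sketch only: `sample_response` is a made-up payload shaped like the `result.facets.organization` part of a `package_search` response, and `facets_to_rows` and `rows_to_csv` are hypothetical names that mirror `get_data` and `write_data_to_csv` without the network call or the S3 upload.

```python
import csv
import io

# Hypothetical sample data; organization names and counts are invented.
sample_response = {
    "result": {
        "facets": {
            "organization": {"example-org-a": 120, "example-org-b": 45}
        }
    }
}


def facets_to_rows(response):
    """Turn the organization facet mapping into [name, count] rows."""
    facets = response["result"]["facets"]["organization"]
    return [[org, count] for org, count in facets.items()]


def rows_to_csv(rows):
    """Serialize rows to CSV text with an organization,count header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=",")
    writer.writerow(["organization", "count"])
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv(facets_to_rows(sample_response))
```

Because `csv.writer` uses `\r\n` as its default line terminator, the resulting text carries Windows-style line endings, which most CSV readers accept.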