This repository is a copier template which can be used to quickly seed a modern data stack project. Instructions vary depending on whether the repo is hosted on GitHub or Azure DevOps, so we make some distinctions below.
This repo consists of:

- A dbt project.
- pre-commit checks for enforcing code quality.
- A documentation skeleton using mkdocs-material.
- GitHub Actions workflows for running quality checks in continuous integration (CI).
Start with an environment that has access to `uv`. It can be installed via a number of package managers (ODI staff usually use Homebrew).
Install copier as a `uv` tool. This creates a separate virtual environment for the `copier` command-line interface and avoids any possible conflicts with the actual project dependencies:

```sh
uv tool install copier
```
Create a repo online and choose "None" for the license. Be sure to add all the necessary team members with the right level of permissions.
Next, create the repo locally into which the project will be rendered:

```sh
mkdir <your-project-name>
cd <your-project-name>
git init
```
Create a new project using the `copier` command-line tool, following the prompts below depending on whether this is in GitHub or Azure DevOps. Copier will ask you a series of questions, the answers to which will be used to populate the project (`uv` tools can be explicitly invoked by prefixing the command with `uvx`):
With HTTPS:

```sh
uvx copier copy https://github.com/cagov/caldata-infrastructure-template .
```

Or with SSH:

```sh
uvx copier copy git@github.com:cagov/caldata-infrastructure-template.git .
```
Install Git Credential Manager (with Homebrew if on a Mac; on Windows it is included by default with the Git installation). Then run the following three commands:

```sh
brew install git-credential-manager
git clone <Azure DevOps repo URL, e.g. https://dev.azure.com/caldata-sandbox/mdsa-test/_git/mdsa-test>
uvx copier copy https://github.com/cagov/caldata-infrastructure-template .
```
Once the project is rendered, you should add and commit the changes:
```sh
git add .
git commit -m "Initial commit"
```
Finally, install the Python dependencies, commit the `uv.lock` file, and push to the remote:

```sh
uv sync
git add uv.lock
git commit -m "Add uv.lock"
git remote add <new-repo-name> https://github.com/cagov/<new-repo-name>
git push --set-upstream <new-repo-name> main
```
For Azure DevOps repos you'll follow the instructions here. To integrate dbt Cloud with Azure DevOps, the "service user (legacy)" option must be used. Complete the steps found in the documentation.
For GitHub repos you'll follow the instructions here.
The projects generated from our infrastructure template need read access to the Snowflake account in order to do two things from GitHub Actions:

- Verify that dbt models in branches compile and pass linter checks.
- Generate dbt docs upon merge to `main`.
The Terraform configurations deployed above create two service accounts in Snowflake for use with GitHub Actions: a production one for docs and a dev one for CI checks.
Set up key pairs for the two GitHub Actions service accounts (`GITHUB_ACTIONS_SVC_USER_DEV` and `GITHUB_ACTIONS_SVC_USER_PRD`) following the instructions given here.
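The linked instructions cover the details; as a sketch, Snowflake key-pair authentication uses an RSA key in PKCS#8 format, which can be generated with `openssl` (shown unencrypted here; Snowflake also supports encrypted private keys):

```sh
# Generate a 2048-bit RSA private key in PKCS#8 format for the service account.
openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -out rsa_key.p8 -nocrypt

# Derive the matching public key, which gets registered on the Snowflake user
# (via ALTER USER ... SET RSA_PUBLIC_KEY='...').
openssl rsa -in rsa_key.p8 -pubout -out rsa_key.pub
```

The private key file (`rsa_key.p8`) is what later gets stored as a GitHub Actions secret; it should never be committed to the repository.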
In order for the service accounts to be able to connect to your Snowflake account, you need to configure secrets in GitHub Actions. From the repository page, go to "Settings", then to "Secrets and variables", then to "Actions".
Add the following repository secrets:
| Variable | Value |
|---|---|
| `SNOWFLAKE_ACCOUNT` | `<org_name>-<account_name>` (format is organization-account) |
| `SNOWFLAKE_USER_DEV` | `GITHUB_ACTIONS_SVC_USER_DEV` |
| `SNOWFLAKE_USER_PRD` | `GITHUB_ACTIONS_SVC_USER_PRD` |
| `SNOWFLAKE_PRIVATE_KEY_DEV` | dev service account private key |
| `SNOWFLAKE_PRIVATE_KEY_PRD` | prd service account private key |
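As a hypothetical illustration of how these secrets get consumed in a workflow (the step name, dbt command, and environment-variable names here are assumptions, not necessarily what this template generates), a CI step might pass them to dbt like so:

```yaml
- name: Compile dbt models
  env:
    SNOWFLAKE_ACCOUNT: ${{ secrets.SNOWFLAKE_ACCOUNT }}
    SNOWFLAKE_USER: ${{ secrets.SNOWFLAKE_USER_DEV }}
    SNOWFLAKE_PRIVATE_KEY: ${{ secrets.SNOWFLAKE_PRIVATE_KEY_DEV }}
  run: uv run dbt compile
```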
The repository must have GitHub Pages enabled in order for the docs site to deploy and be viewable.
- From the repository page, go to "Settings", then to "Pages".
- Under "GitHub Pages visibility" select "Private" (unless the project is public!).
- Under "Build and deployment" select "Deploy from a branch" and choose "gh-pages" as your branch.
Continuous integration for this template creates a new project from the template, then verifies that the `pre-commit` checks pass.
Future versions might do additional checks
(e.g., running sample dbt models or orchestration DAGs).
To run the tests locally, change directories to the parent of the template, then run:

```sh
./caldata-infrastructure-template/ci/test.sh
```
If the template has new features, maintenance, or bugfixes, it can be useful to apply those changes from the template automatically, rather than manually copying them around.
The main docs for updating a project are here; we briefly summarize the steps:
- Check out a new branch for applying updates.
- Make sure that there are no uncommitted changes or files (even ones you don't intend to keep in version control!) in the repository. Running `git status --porcelain` should show nothing.
- Run `copier update --defaults`, which tries to apply new changes from the template, reusing your answers to the template questions.
- Review the applied changes and make any corrections before staging and committing them:
    - There may be merge conflicts in files that have changed, marked with git-style merge conflict markers.
    - There may be new files that need to be added to the repository.
- Create a pull request with the template changes, review, and merge it.
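The clean-working-tree requirement above can be checked in a script before invoking `copier update`; a minimal sketch:

```sh
# Refuse to proceed if the working tree has uncommitted or untracked changes,
# since copier update requires a clean tree.
if [ -n "$(git status --porcelain)" ]; then
  echo "Working tree is not clean; commit or stash changes first." >&2
  exit 1
fi
```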