Analytics Engineer Interview Project

This repository contains a technical evaluation project for Analytics Engineering candidates at Jarvus. It provides a structured environment for building data pipelines with dbt and DuckDB.

Project Structure

.
├── data/              # Data directory (gitignored except for .gitkeep)
│   └── raw/           # Raw data storage
├── models/            # dbt models
│   ├── staging/       # Staging models
│   ├── intermediate/  # Intermediate models
│   └── marts/         # Mart models
├── scripts/           # Data ingestion scripts
├── tests/             # dbt tests
└── macros/            # dbt macros

Prerequisites

  • Docker and Docker Compose
  • Git

Local Development Setup

  1. Clone the repository:

    git clone https://github.com/JarvusInnovations/analytics-engineer-interview.git
    cd analytics-engineer-interview
  2. Start the development environment:

    docker compose build
    docker compose run dev
  3. Run the full pipeline:

    docker compose run analytics

Development Workflow

Running Individual Components

  1. Data Ingestion:

    docker compose run ingest
  2. dbt Commands:

    docker compose run dbt deps    # Install dbt dependencies
    docker compose run dbt debug   # Test connection
    docker compose run dbt build   # Run models, tests, and snapshots
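
The build step also picks up any singular tests in tests/. As a minimal sketch (the file name and the shelter_id column are assumptions, not the project's actual schema), a singular test is a SELECT that returns the rows that should fail:

    -- tests/assert_shelter_ids_unique.sql (hypothetical path and schema)
    -- dbt fails this test if the query returns any rows.
    select
        shelter_id,
        count(*) as occurrences
    from {{ ref('stg_bus_shelters') }}
    group by shelter_id
    having count(*) > 1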

Making Changes

  1. Create a new branch:

    git checkout -b feature/your-feature-name
  2. Make your changes to the models, scripts, or configurations

  3. Test your changes:

    docker compose run analytics
  4. Open a pull request on GitHub

Deployment Process

The project includes a GitHub Actions workflow that:

  1. Runs on pushes to the main branch and on pull requests
  2. Executes the full pipeline
  3. Uploads the DuckDB database and dbt artifacts as workflow artifacts

Project Components

Data Pipeline

  • Downloads example data from public sources
  • Loads data into a local DuckDB database
  • Processes data through dbt models:
    • Staging models for initial data cleaning
    • Intermediate models for data transformation
    • Mart models for final analysis
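
As a rough illustration of the load step, the ingestion script can materialize a downloaded file as a raw DuckDB table. The schema, table, and file names below are hypothetical:

    -- Hypothetical raw load; read_csv_auto infers column names and types.
    create schema if not exists raw;
    create or replace table raw.bus_shelters as
    select * from read_csv_auto('data/raw/bus_shelters.csv');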

dbt Models

  • stg_bus_shelters: Initial cleaning and column renaming
  • int_shelter_locations: Geographic clustering analysis
  • mart_shelter_distribution: High-level shelter distribution metrics
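
To make the staging layer concrete, a model along these lines would live in models/staging/. This is a sketch only: it assumes a raw.bus_shelters source declared in a sources file, and the column names are guesses at the raw data, not the project's actual schema:

    -- models/staging/stg_bus_shelters.sql (illustrative sketch)
    -- Staging: rename raw columns and apply light type casting only.
    select
        cast(id as integer)       as shelter_id,
        trim(street_address)      as street_address,
        cast(latitude as double)  as latitude,
        cast(longitude as double) as longitude
    from {{ source('raw', 'bus_shelters') }}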

Configuration Files

  • pyproject.toml: Python dependencies managed by Poetry
  • dbt_project.yml: dbt project configuration
  • profiles.yml: dbt connection profiles
  • docker-compose.yml: Container configuration
  • Dockerfile: Development environment definition

Contributing

See our Contributing Guidelines for development practices and standards.

Support

If you encounter any issues or have questions, please:

  1. Check the existing GitHub issues
  2. Create a new issue if needed
  3. Comment on your assigned evaluation task

License

This project is proprietary and confidential. All rights reserved.
