templater

Quick Install

go install github.com/mr-joshcrane/templater/cmd/templater@latest
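
go install places the templater binary in $(go env GOPATH)/bin (or in $GOBIN, if that is set); make sure that directory is on your PATH, for example:

export PATH="$PATH:$(go env GOPATH)/bin"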

Usage

$ templater [FIELDS_TO_UNPACK ...]

FIELDS_TO_UNPACK is an optional list indicating which fields are JSON objects capable of further unpacking.
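
For example, if your CSV contained two JSON columns called statistics and metadata (metadata is just a hypothetical second column for illustration), you would pass both:

$ templater statistics metadata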


Why would you use templater?

Data engineering often requires taking raw, untyped, unsanitised data and running it through a series of preliminary transformations before it can be presented in its final format.

Templater was created with DBT in mind as the transformation tool, and is designed to generate a complete DBT project from a set of CSV files. These CSVs are typically exports of Snowflake tables, but any CSVs can be used.

Theoretically templater can be used with any data warehouse solution, but it has only been tested with Snowflake.

Data doesn't always play nice

Sometimes you'll have untyped data that needs to be typed. This usually isn't difficult, but it is very tedious and labour-intensive.

Sometimes you will have column names that contain special characters that are not allowed in Snowflake (without special handling).

Sometimes you will have nested JSON data that needs to be flattened before it can be used for any real business application.

Templater can be used to alleviate some of the burden here!


Imagine we had a table in Snowflake called ENERGY and we dumped it out to a CSV file as ENERGY.csv:

sourceAsMetric          | target(idealised) | value(per million tonnes) | statistics
Bio-conversion(natural) | Liquid            | 124.729                   | "{ ""attributes"": { ""on_hand"": false, ""available_in"": ""days"", ""componentDemands"": [99,2.5,"{5,2,5}"]}}"
BiofuelImports          | Solid             | 35                        | "{ ""attributes"": { ""on_hand"": true, ""available_in"": ""minutes"", ""componentDemands"": [1,0,"{0.1,9,0}"]}}"
coal imports(55%)       | Coal              | null                      | "{ ""attributes"": { ""on_hand"": true, ""available_in"": ""hours"", ""componentDemands"": [1,0,"{0.1,00.1,NA}"]}}"

Gross! As we can see, the format of this data leaves much to be desired. We could munge it with DBT, but that would be a non-trivial amount of effort!

We can use templater to generate a DBT project that will transform this data into something more useful.

Let's call templater and indicate that we want to unpack the CSV field called statistics, which appears to be a JSON field.

$ templater statistics

Templater will generate the core of a new DBT project, including our transformation below.

output/transform/TRANS01_ENERGY.sql

{{ config(materialized='table') }}
SELECT
  "statistics":"attributes"."available_in"::STRING AS ATTRIBUTES__AVAILABLE_IN
  ,"statistics":"attributes"."componentDemands"::ARRAY AS ATTRIBUTES__COMPONENT_DEMANDS
  ,"statistics":"attributes"."on_hand"::BOOLEAN AS ATTRIBUTES__ON_HAND
  ,"sourceAsMetric"::STRING AS SOURCE_AS_METRIC
  ,"target(idealised)"::STRING AS TARGET_IDEALISED
  ,"value(per million tonnes)"::STRING AS VALUE_PER_MILLION_TONNES
FROM
  {{ source('TEMPLATER', 'ENERGY') }}

We can see that templater has:

  1. Generated our transformation SQL
  2. Tried to infer Snowflake Types for each column
  3. Normalised the column names to be Snowflake friendly
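
If the syntax in the generated SQL is unfamiliar: in Snowflake, the colon traverses JSON held in a VARIANT column, the dot descends into nested objects, and ::TYPE casts the extracted value. A minimal standalone sketch (illustrative only, not templater output, and assuming statistics was loaded into Snowflake as a VARIANT column):

-- Pull one value out of the nested JSON, cast it, and filter on another.
SELECT
  "statistics":"attributes"."on_hand"::BOOLEAN AS ON_HAND
FROM ENERGY
WHERE "statistics":"attributes"."available_in"::STRING = 'days'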

It has also generated a suggested DBT models schema in output/transform/_models_schema.yml that you can tweak to your liking, which goes a long way when you come to generate dbt docs.

version: 2
models:
  - name: TRANS01_ENERGY
    columns:
      - name: ATTRIBUTES__AVAILABLE_IN
      - name: ATTRIBUTES__COMPONENT_DEMANDS
      - name: ATTRIBUTES__ON_HAND
      - name: SOURCE_AS_METRIC
      - name: TARGET_IDEALISED
      - name: VALUE_PER_MILLION_TONNES
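
For example, you might flesh the generated schema out with descriptions and tests of your own. The snippet below is standard DBT schema syntax and an illustration only, not something templater emits:

version: 2
models:
  - name: TRANS01_ENERGY
    description: "Typed and flattened view of the raw ENERGY export."
    columns:
      - name: ATTRIBUTES__ON_HAND
        description: "Whether the component is currently on hand."
        tests:
          - not_null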
