ashish10alex/formatdataform

Dataform formatter

Format .sqlx files in your Dataform project using sqlfluff

To format a file or directory, run:

formatdataform format <path_to_file_or_directory>

Note

When run for the first time in a Dataform workspace, the formatdataform CLI sets up the necessary files and a default .sqlfluff config file in the .formatdataform directory to support formatting. Alternatively, you can set up the defaults manually by running formatdataform setup.

You can override the default sqlfluff config file with the --config flag (shorthand -c) as follows:

formatdataform -c <path_to_sqlfluff_config_file> format <path_to_file_or_directory>

Alternatively, you can directly edit the generated sqlfluff config file at .formatdataform/.sqlfluff

Important

Ensure that your config file keeps the last block of the default config, [sqlfluff:templater:placeholder]. The regex in that block is what allows sqlfluff to parse ${ref("TABLE_NAME")} blocks in .sqlx files
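
As a quick sanity check before passing a custom config with -c, the sketch below tests whether a file contains the [sqlfluff:templater:placeholder] section header. The function name has_placeholder_block is hypothetical and not part of formatdataform. It only confirms that the header is present, not that the regex inside the block is correct.

```shell
# Hypothetical helper (not part of formatdataform): succeeds if the given
# sqlfluff config file contains the placeholder templater section header.
# With no argument it checks the generated default at .formatdataform/.sqlfluff.
has_placeholder_block() {
  config_file="${1:-.formatdataform/.sqlfluff}"
  # -F matches the header literally, so the square brackets are not
  # interpreted as a regex character class; -q suppresses output.
  grep -qF '[sqlfluff:templater:placeholder]' "$config_file"
}
```

For example, has_placeholder_block my_config.sqlfluff || echo "placeholder block missing" prints a warning when the section is absent.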

Installation

  1. Prerequisite: Install sqlfluff
pip install sqlfluff
  2. Install the latest release of the formatdataform binary (WSL / Linux / macOS users only!)
curl -sSfL https://raw.githubusercontent.com/ashish10alex/formatdataform/main/install_latest.sh | bash

Note

If you are a Windows user, please download the binary directly from the releases page

OR

go install github.com/ashish10alex/formatdataform@latest

This installs the formatdataform binary to $GOBIN, which defaults to $GOPATH/bin
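
If the shell cannot find formatdataform after go install, the install directory is probably not on PATH. The sketch below resolves that directory and appends it for the current session. It assumes GOBIN and GOPATH are set as plain environment variables containing a single directory, and it falls back to $HOME/go, Go's default GOPATH. The helper name go_bin_dir is hypothetical. If the values were set with go env -w, check go env GOBIN and go env GOPATH instead.

```shell
# Hypothetical helper: print the directory where `go install` places binaries.
# Uses $GOBIN if set, otherwise $GOPATH/bin, otherwise $HOME/go/bin
# (Go's default GOPATH). Assumes GOPATH holds a single directory.
go_bin_dir() {
  if [ -n "${GOBIN:-}" ]; then
    printf '%s\n' "$GOBIN"
  else
    printf '%s\n' "${GOPATH:-$HOME/go}/bin"
  fi
}

# Append the directory to PATH for this session only if it is not already there.
case ":$PATH:" in
  *":$(go_bin_dir):"*) ;;
  *) PATH="$PATH:$(go_bin_dir)"; export PATH ;;
esac
```

To make the change permanent, add the same lines to your shell startup file.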

OR

Manually clone the repository, build the CLI, and add it to your system path

git clone https://github.com/ashish10alex/formatdataform.git
cd formatdataform
go build -o formatdataform
mv formatdataform /usr/local/bin/formatdataform

To run tests

  1. Install gotestsum for prettier test visualization
  2. Run gotestsum --format testname

Known issues

  1. sqlfluff expects the table name to be wrapped in backticks, i.e. `gcp_project_id.dataset.table` instead of gcp_project_id.dataset.table
  2. Does not format SQL in the pre_operations block. To handle this, we would need to detect whether there is a JavaScript block inside the pre_operations block, then extract the query from that block and format it
  3. Does not format the config block

TODO

  • Refactor handling of the config and pre/post operations blocks to support multiple pre/post operations