topological concatenation of files
topcat is a simple tool to concatenate files in a topological order. It is useful when you have a set of files that
depend on each other and you want to concatenate them in the right order.
For my use case this is SQL files.
I like to treat my SQL files as a set of functions and views that depend on each other. I like to keep them in separate files and concatenate them in the right order to create a single file that I can run in my database.
pip:
pip install topcatpoetry:
poetry add topcattopcat -i /path/to/input -o /path/to/output.sqlWhere /path/to/input is the directory containing the files to concatenate and /path/to/output.sql will be where the
concatenated file will be written.
USAGE:
topcat [FLAGS] [OPTIONS] --output <FILE>
FLAGS:
--dry Only print the output, do not write to file.
-h, --help Prints help information
-V, --version Prints version information
-v, --verbose Print debug information
OPTIONS:
--comment-str <comment-str>
The string used to denote a comment. eg '--' [default: --]
--ensure-each-file-ends-with <ensure-each-file-ends-with-str>
Add this string to the end of files if it does not exist. eg ';' [default: ;]
-x, --exclude <PATTERN>... Exclude files matching given glob pattern
--file-separator-str <file-separator-str>
Add this between each concatenated file in the output. eg '---' [default:
------------------------------------------------------------------------------------------------------------------------]
-n, --include <PATTERN>... Only include files matching glob pattern
-i, --input_dir <DIR>...
Path to directory containing files to be concatenated
-o, --output <FILE> Path to generate combined output fileSome quirks here:
-iis the input directory. You can have multiple input directories. This is useful if you have a set of files in different directories that depend on each other.-ois the output file. This is where the concatenated file will be written.-xand-nare used to exclude and include files respectively. These are glob patterns. For example-x **/tests/*will exclude all files in anytestsdirectory.-n **/functions/*will only include files in thefunctionsdirectory. You can use these together to include and exclude files as you need. You can use these multiple times.--comment-stris the string used to denote a comment. This is used to find thename,requires,dropped_byandexistscomments in the files. The default is--. In SQL this is--but in other languages it might be//or#.--ensure-each-file-ends-withis the string to add to the end of each file if it doesn't exist. This is useful for SQL files where you might want to ensure each file ends with a;. The default is;.--file-separator-stris the string to add between each concatenated file in the output. The default is a long line of dashes. This is just visually useful to see where one file ends and the next begins.--drywill only print the output, it will not write to the output file.-vwill print debug information and a.dotformat of the dependency graph.
The only requirement for a file to be included in the concatenation is that it needs to have a name comment at the top
of the file.
This can be anything you want, but it needs to be unique. This is used to define a node in the dependency graph.
For example:
-- name: my_schema
If a file requires another file to be concatenated before it, you can add a requires comment to the file.
An alias for requires is dropped_by. I use dropped_by in SQL files for clarity to show that the DDL in the file
gets dropped so I don't need to use CREATE OR REPLACE FUNCTION or the like.
For example:
-- name: my_schema.b
-- dropped_by: my_schema
-- requires: my_schema.a
exists is for soft dependencies. For example in plpgsql functions, the body isn't parsed until the function is called.
So any dependent objects you can't use requires for, you can use exists to ensure the file is included in the
concatenated file but order of creation doesn't matter.
For example:
-- name: my_schema.b
-- dropped_by: my_schema
-- requires: my_schema.a
-- exists: my_schema.c
You can organize files into layers that enforce ordering constraints. Files in lower-index layers cannot depend on files in higher-index layers.
For example:
-- name: my_schema.setup
-- layer: first
-- name: my_schema.functions
-- layer: second
-- requires: my_schema.setup
-- name: my_schema.views
-- layer: third
-- requires: my_schema.functions
- Use
--layers first,second,thirdto define custom layers in order - Use
--fallback-layer secondto specify the default layer for files without explicit layer declarations - Default layers are
prepend,normal,appendwithnormalas the fallback
The legacy -- is_initial and -- is_final headers are still supported:
-- is_initialmaps to the "prepend" layer-- is_finalmaps to the "append" layer- Files without layer declarations use the fallback layer
Lets say you have a directory with the following files:
sql
├── my_other_schema
│ ├── functions
│ │ ├── a.sql
│ │ ├── b.sql
│ │ └── c.sql
│ └── schema.sql
└── my_schema
├── functions
│ └── a.sql
└── schema.sql
And the content of the files is:
sql/my_schema/schema.sql:
-- name: my_schema
DROP SCHEMA IF EXISTS my_schema CASCADE;
CREATE SCHEMA IF NOT EXISTS my_schema;
sql/my_schema/functions/a.sql:
-- name: my_schema.a
-- dropped_by: my_schema
CREATE FUNCTION my_schema.a() RETURNS INT AS
$$
SELECT 1;
$$ LANGUAGE SQL IMMUTABLE
PARALLEL SAFE;
sql/my_schema/functions/b.sql:
-- name: my_schema.b
-- dropped_by: my_schema
-- requires: my_schema.a
CREATE FUNCTION my_schema.b() RETURNS INT AS
$$
SELECT my_schema.a() + 1
$$ LANGUAGE SQL;
sql/my_schema/functions/c.sql:
-- name: my_schema.c
-- dropped_by: my_schema
-- requires: my_schema.b
CREATE FUNCTION my_schema.c() RETURNS INT AS
$$
SELECT my_schema.b() + 1
$$ LANGUAGE SQL IMMUTABLE
PARALLEL SAFE;
sql/my_other_schema/schema.sql:
-- name: my_other_schema
DROP SCHEMA IF EXISTS my_schema CASCADE;
CREATE SCHEMA IF NOT EXISTS my_schema;
sql/my_other_schema/functions/a.sql:
-- name: my_other_schema.a
-- dropped_by: my_other_schema
-- requires: my_schema.b
CREATE FUNCTION my_other_schema.a() RETURNS INT AS
$$
SELECT my_schema.b() + 1
$$ LANGUAGE SQL IMMUTABLE
PARALLEL SAFE;
So the dependency graph looks like:

Now you can run topcat to concatenate the files in the right order:
topcat -i tests/input/sql -o tests/output/sql/output.sqlThe content of output.sql will be:
-- This file was generated by topcat. To regenerate run:
--
-- topcat -i tests/input/sql -o tests/output/sql/output.sql -v
------------------------------------------------------------------------------------------------------------------------
-- tests/input/sql/my_other_schema/schema.sql
-- name: my_schema
DROP SCHEMA IF EXISTS my_schema CASCADE;
CREATE SCHEMA IF NOT EXISTS my_schema;
------------------------------------------------------------------------------------------------------------------------
-- tests/input/sql/my_other_schema/functions/a.sql
-- name: my_schema.a
-- dropped_by: my_schema
CREATE FUNCTION my_schema.a() RETURNS INT AS
$$
SELECT 1;
$$ LANGUAGE SQL;
------------------------------------------------------------------------------------------------------------------------
-- tests/input/sql/my_other_schema/functions/b.sql
-- name: my_schema.b
-- dropped_by: my_schema
-- requires: my_schema.a
CREATE FUNCTION my_schema.b() RETURNS INT AS
$$
SELECT my_schema.a() + 1
$$ LANGUAGE SQL;
------------------------------------------------------------------------------------------------------------------------
-- tests/input/sql/my_schema/schema.sql
-- name: my_other_schema
DROP SCHEMA IF EXISTS my_other_schema CASCADE;
CREATE SCHEMA IF NOT EXISTS my_other_schema;
------------------------------------------------------------------------------------------------------------------------
-- tests/input/sql/my_schema/functions/a.sql
-- name: my_other_schema.a
-- dropped_by: my_other_schema
-- requires: my_schema.b
CREATE FUNCTION my_other_schema.a() RETURNS INT AS
$$
SELECT my_schema.b() + 1
$$ LANGUAGE SQL IMMUTABLE
PARALLEL SAFE;
------------------------------------------------------------------------------------------------------------------------
-- tests/input/sql/my_other_schema/functions/c.sql
-- name: my_schema.c
-- dropped_by: my_schema
-- requires: my_schema.b
-- requires: my_other_schema.a
CREATE FUNCTION my_schema.c() RETURNS INT AS
$$
SELECT my_schema.b() + my_other_schema.a() + 1
$$ LANGUAGE SQL;