An ETL'ish tool that ransfers data between any JDBC compliant databases.
Lauda has its own simple repository in given database, that can be configured and loaded either from command line
or directly in database. Keeping extract-load definitions in database, makes it easier for database people to
manage transfers using a database client. And for the same reason lauda has a simple database model and no orm
in source.
Lauda has three concepts for defining an ETL process;
-
connection
: Connection definition to source and target databases. These definitions are kept inconnections
table in repository. A Connection has the following attributes.name
: Unique name of the connection. Case sensitive.url
: JDBC url for the connection. Example:jdbc:oracle:thin:@localhost:1521:orcl
username
: user name for connection.password
: passowrd for connection.class_name
: Class name of the JDBC driver. Example:oracle.jdbc.driver.OracleDriver
-
mapping
: Mapping definition of the extract and load targets. These data is kept inmappings
table in repository. A mapping has the following attributes.name
: Unique name of the mapping. Case sensitive.source_connection
: Name of the source connection. Lookup toconnections
target_connection
: Name of the target connection. Lookup toconnections
source_table
: Name of the source table. Give database/schema prefix if table is another database/schema than the connection. For example the value can bemy_schema.table_name
or justtable_name
target_table
: Name of the target table. Give database/schema prefix if table is another database/schema than the connection. For example the value can bemy_schema.table_name
or justtable_name
source_hint
: Optional source query hint. Example/*+ parallel(8) */
target_hint
: Optional target statement hint. Example/*+ nologging */
filter
: Filter expression for the source. Examplecolumn_a in ('X', 'Y', 'Z') and column_b=2
batch_size
: Batch select and insert size. Defaults to 1000drop_create
: Drop the target table before insert. Defaults to0
,false
Since some databases do not support boolean, in repository tables, boolean options like
drop_create
are managed by integers. Forfalse
give the value0
fortrue
give any positve integer. In repository data files(see command line options and data file loading) you can usetrue
orfalse
. -
column mapping
: Column mapping pairs between source and target tables. This mappings are stored incolumns
table in repository. Has the following attributes;mapping
: Name of the mapping, this pair belongs. Lookup tomappings
.source
: Name of the source column or an expression. Example:COL_A
orto_char(COL_DATE, 'yyymmdd')
target
: Name of the target column that source will be inserted.
- Make sure you have java 1.8 or greater
- Download archive file from dist
- Extract
lauda-[version].tar.gz
to desired location. - Put the required
jdbc
drivers tolib
folder
- Make sure you have sbt
- Clone the repository
- run
dist.sh
- Use
config.yml
for configuratin
Usage: lauda [-hV] [--columns=<columns>]
[--source-connection=<sourceConnection>]
[--source-table=<sourceTable>]
[--target-connection=<targetConnection>]
[--target-table=<targetTable>] <command>
Loads data between databases https://github.com/bluecolor/lauda
<command> Command to exeucte
See https://github.
com/bluecolor/lauda#command-line-arguments
--columns=<columns> Comma seperated list of columns. Optional
-h, --help Show this help message and exit.
--source-connection=<sourceConnection>
Source connection name
--source-table=<sourceTable>
Source table name. can be schema.table_name or just
table_name
--target-connection=<targetConnection>
Target connection name
--target-table=<targetTable>
Target table name. can be schema.table_name or just
table_name
-V, --version Print version information and exit.
command
command can be one of the following and can take parameters;
repository.up
: Initialize repository. Usesconfig.yml
repository.down
: Drops the repsitory. Usesconfig.yml
repository.import
: Import repository data with parameterrepository.print.connections
: Print connectionsrepository.print.mappings
: Print mappingsrepository.print.columns
: Print source and target columns of given mapping withmapping.delete
: Delete a mapping by name with parametermapping.exists
: Check if mapping exists with parametermapping.run
: Run the mapping given withmapping.create
: Create target table in given mapping withconnection.delete
: Delete connection by nameconnection.test
: Test jdbc connection by name
Examples
Create repository, using the parameters in config.yml
./lauda.sh repository.up
Reset/drop repository, using the parameters in config.yml
./lauda.sh repository.down
Import mapping and connection definitions. See example data file.
./lauda.sh repository.import examples/repository-seed.yml
Print available connections
./lauda.sh repository.print.connections
Print defined mappings
./lauda.sh repository.print.mappings
Print the source and target columns of the given mapping
./lauda.sh repository.print.columns mapping_name
Delete a mapping definition(column mappings also)
./lauda.sh mapping.delete mapping_name
./lauda.sh mapping.delete "Mapping name"
Check if mapping with given name already exists
./lauda.sh mapping.exists mapping_name
Run mapping
./lauda.sh mapping.run mapping_name
Create target table in mapping. Does not run mapping, only created target.
./lauda.sh mapping.create mapping_name
Delete given connection definition
./lauda.sh connection.delete connection_name
./lauda.sh connection.delete "Connection name"
Test the connection
./lauda.sh connection.test connection_name
Generate mapping
./lauda.sh mapping.generate table_4 \
--source-connection oracle_1 \
--target-connection postgre_2 \
--source-table source_table_3 \
--target-table target_table_3
An ETL definiton can be given directly using a database client in repository or using
command line option. With a database client you can execute a script like this one
For command line, use the repsitory.import
command and give data file like
this one. See examples for different options.
Example command line usage;
./lauda.sh repository.import /path/to/data-file.yml
./lauda.sh repository.import ./data-file.yml
For data transfers between two oracle databases you can use elo which uses oracle dblink's and is faster.