Euphrates

A periodic table copier from MySQL to Redshift.

How's it work?

  1. Take a list of tables.
  2. Use mysqldump --xml to turn each table into XML.
  3. Transform the structure into a table named "_$name_new", where $name is the target table.
  4. Transform the table data into JSON and store it in S3 as segments.
  5. Load the segments into Redshift using the COPY command.
  6. Once all tables are loaded, swap the tables into place (steps 5 and 6 are sketched below).
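
For illustration, here's a rough sketch of what steps 5 and 6 could look like with plain JDBC against Redshift. The table, bucket, host, and credentials mirror the config example below; the actual Euphrates classes, COPY options, and SQL may differ, and this assumes the live table already exists.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Hedged sketch of steps 5-6, not the actual Euphrates implementation.
public class CopyAndSwapSketch {
    public static void main(String[] args) throws Exception {
        // Redshift speaks the Postgres wire protocol, so the stock Postgres JDBC driver works.
        String url = "jdbc:postgresql://something.us-east-1.redshift.amazonaws.com:5439/hotcar";
        try (Connection conn = DriverManager.getConnection(url, "root", "xxx");
             Statement stmt = conn.createStatement()) {

            // Step 5: load every JSON segment under one S3 prefix into the staging table.
            stmt.execute(
                "COPY public._users_new "
                + "FROM 's3://some-transfer-bucket/users/' "
                + "CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx' "
                + "FORMAT AS JSON 'auto'");

            // Step 6: swap the staging table into place inside a single transaction.
            conn.setAutoCommit(false);
            stmt.execute("ALTER TABLE public.users RENAME TO _users_old");
            stmt.execute("ALTER TABLE public._users_new RENAME TO users");
            stmt.execute("DROP TABLE public._users_old");
            conn.commit();
        }
    }
}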

Fragile by design

If any error is encountered, it immediately quits! If a record can't be copied, it immediately quits! Either your tables copied 100%, or the copy didn't happen.
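
In code terms, the policy is roughly this (class and method names below are hypothetical, not the actual Euphrates code):

import java.util.List;

// Hedged sketch of the fail-fast policy: there is no per-table error handling,
// so the first failure aborts the whole run before any table is swapped in.
public class FailFastSketch {
    public static void main(String[] args) throws Exception {
        List<String> tables = List.of("users");
        for (String table : tables) {
            copyTable(table); // any exception propagates straight out of main
        }
        swapTables(tables);   // only reached once every table copied 100%
    }

    static void copyTable(String table) throws Exception {
        // dump -> transform -> COPY; throws on the first bad record or SQL error
    }

    static void swapTables(List<String> tables) {
        // rename _<name>_new over <name> for every table
    }
}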

Running

Create a valid config.json and run using mvn exec:exec.

Config

{
  "mysql": {
    "user": "root",
    "password": "xxx",
    "host": "some.rds.host",
    "port": 3306,
    "database": "db_name",
    "maxConnections": 6
  },
  "redshift": {
    "user": "root",
    "password": "xxx",
    "host": "something.us-east-1.redshift.amazonaws.com",
    "port": 5439,
    "database": "hotcar",
    "maxConnections": 4,
    "schema": "public"
  },
  "tables": [
    {
      "name": "users",
      "extra": "DISTKEY (user_id) INTERLEAVED SORTKEY (created_at, updated_at)",
      "columns": {
        "user_id": "integer not null ENCODE DELTA",
        "user_id": "integer not null ENCODE DELTA"
      }
    }
  ],
  "s3": {
    "bucket": "some-transfer-bucket",
    "region": "us-east-1",
    "accessKey": "xxx",
    "secretKey": "xxx",
    "minimumSegmentSize": 20000000
  }
}
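
To show the shape of the file in code, here's a rough sketch of reading config.json with Jackson's tree model; Euphrates may bind the config differently or use another JSON library.

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.File;

// Hedged sketch: walk the example config above with Jackson.
public class ConfigSketch {
    public static void main(String[] args) throws Exception {
        JsonNode config = new ObjectMapper().readTree(new File("config.json"));

        String mysqlHost = config.path("mysql").path("host").asText();
        int redshiftPort = config.path("redshift").path("port").asInt();
        long minSegmentBytes = config.path("s3").path("minimumSegmentSize").asLong();

        for (JsonNode table : config.path("tables")) {
            System.out.println(table.path("name").asText()
                + " -> " + table.path("extra").asText());
        }
        System.out.println(mysqlHost + ":" + redshiftPort
            + ", segments >= " + minSegmentBytes + " bytes");
    }
}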

Known Issues

Inconsistent snapshotting

Because each table is snapshotted independently, the start time for each table copy is slightly different. This could be fixed by acquiring the current binlog coordinates first and using them as the basis for each dump.
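
As a rough sketch of that fix (not something Euphrates does today), the coordinates could be read once over JDBC before any dump starts; the host and credentials echo the config above, and the MySQL Connector/J driver is assumed.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Hedged sketch: capture one set of binlog coordinates as the common snapshot basis.
public class BinlogCoordinatesSketch {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:mysql://some.rds.host:3306/db_name";
        try (Connection conn = DriverManager.getConnection(url, "root", "xxx");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW MASTER STATUS")) {
            if (rs.next()) {
                String file = rs.getString("File");
                long position = rs.getLong("Position");
                System.out.println("snapshot basis: " + file + ":" + position);
            }
        }
    }
}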

Not using binlog

It's hard to do bulk updates/inserts/deletes into Redshift reliably and in a timely manner, and there are performance concerns. (If you know how, tell me!)

Operations is tough!

I agree! Make a web interface, do all the scheduling in one continuous process, allow for manual snapshots.
