pgsync

Sync Postgres data to your local machine. Designed for:

speed - up to 4x faster than traditional tools on a 4-core machine
security - built-in methods to prevent sensitive data from ever leaving the server
convenience - sync partial tables, groups of tables, and related records

🍊 Battle-tested at Instacart

Installation

pgsync is a command line tool. To install, run:

gem install pgsync

This will give you the pgsync command.

In your project directory, run:

pgsync --setup

This creates .pgsync.yml for you to customize. We recommend checking this into your version control (assuming it doesn’t contain sensitive information). pgsync commands can be run from this directory or any subdirectory.

How to Use

Sync all tables

pgsync

Note: pgsync assumes your schema is already set up on your local machine. See the schema section if that’s not the case.

Sync specific tables

pgsync table1,table2

Sync specific rows (existing rows are overwritten)

pgsync products "where store_id = 1"

You can also preserve existing rows

pgsync products "where store_id = 1" --preserve

Or truncate them

pgsync products "where store_id = 1" --truncate
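
The three modes differ only in what happens to rows that already exist locally. The sketch below models that outcome with plain Ruby hashes keyed by primary key. It is an illustration of the semantics described above, not pgsync's implementation, and `sync_rows` is a hypothetical helper.

```ruby
# Illustrative model only: each "table" is a hash keyed by primary key.
# sync_rows is a hypothetical helper, not part of pgsync.
def sync_rows(source, destination, mode: :overwrite)
  case mode
  when :overwrite then destination.merge(source) # source row wins on conflict
  when :preserve  then source.merge(destination) # existing local row wins
  when :truncate  then source.dup                # local rows are removed first
  else raise ArgumentError, "unknown mode: #{mode}"
  end
end

remote = { 1 => "apple (remote)", 2 => "pear (remote)" }
local  = { 2 => "pear (local)", 3 => "plum (local)" }

p sync_rows(remote, local)                  # default: overwrite
p sync_rows(remote, local, mode: :preserve) # like --preserve
p sync_rows(remote, local, mode: :truncate) # like --truncate
```

In this model, only the truncate mode removes local rows that have no counterpart in the synced set.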

Exclude Tables

pgsync --exclude users

To always exclude, add to .pgsync.yml.

exclude:
  - table1
  - table2

For Rails, you probably want to exclude schema migrations and ActiveRecord metadata.

exclude:
  - schema_migrations
  - ar_internal_metadata

Groups

Define groups in .pgsync.yml:

groups:
  group1:
    - table1
    - table2

And run:

pgsync group1

You can also use groups to sync a specific record and associated records in other tables.

To get product 123 with its reviews, last 10 coupons, and store, use:

groups:
  product:
    products: "where id = {1}"
    reviews: "where product_id = {1}"
    coupons: "where product_id = {1} order by created_at desc limit 10"
    stores: "where id in (select store_id from products where id = {1})"

And run:

pgsync product:123
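
To make the `{1}` placeholder concrete, here is a minimal Ruby sketch of how an argument like `product:123` could be expanded into one WHERE clause per table. The `expand_group` helper is hypothetical and only handles the hash form of a group; it shows the substitution idea, not pgsync's actual code.

```ruby
# Hypothetical sketch: expand "{1}" in each table's WHERE clause.
# GROUPS mirrors the "product" group from .pgsync.yml above.
GROUPS = {
  "product" => {
    "products" => "where id = {1}",
    "reviews"  => "where product_id = {1}",
    "coupons"  => "where product_id = {1} order by created_at desc limit 10",
    "stores"   => "where id in (select store_id from products where id = {1})"
  }
}.freeze

def expand_group(groups, arg)
  # "product:123" -> name "product", param "123"
  name, param = arg.split(":", 2)
  tables = groups.fetch(name) { raise ArgumentError, "unknown group: #{name}" }
  tables.transform_values { |sql| sql.gsub("{1}", param.to_s) }
end

expanded = expand_group(GROUPS, "product:123")
expanded.each { |table, sql| puts "#{table}: #{sql}" }
```

Each table in the group ends up with its own filter, which is what lets a single argument pull in a record together with its related rows.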

Schema

Sync schema

pgsync --schema-only

Specify tables

pgsync table1,table2 --schema-only

Note: --schema-only does not sync non-table objects such as functions and extensions unless it is used in conjunction with --no-constraints and --add-constraints (see the next section).

Managing integrity checks

If your schema has referential integrity checks, do a complete database sync by dropping the existing database first. Then import the schema without any constraints or triggers using --no-constraints, and pass --add-constraints when syncing the data:

pgsync --schema-only --no-constraints
pgsync --add-constraints

If you're syncing partial data or tables, make sure the data does not violate any constraints; otherwise, pgsync --add-constraints will fail.

Sensitive Information

Prevent sensitive information - like passwords and email addresses - from leaving the remote server.

Define rules in .pgsync.yml:

data_rules:
  email: unique_email
  last_name: random_letter
  birthday: random_date
  users.auth_token:
    value: secret
  visits_count:
    statement: "(RANDOM() * 10)::int"
  encrypted_*: null

A bare name such as last_name matches every column named last_name, while a qualified name such as users.last_name matches only that column in the users table. Wildcards are supported, and the first matching rule is applied.
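
As a rough illustration of "first matching rule wins", the Ruby sketch below resolves a rule for a given table and column. It assumes rules are checked in the order they are written and uses `File.fnmatch` for wildcards; `rule_for` is a hypothetical helper and the matching details in pgsync itself may differ.

```ruby
# Hypothetical sketch of rule lookup; not pgsync's implementation.
# Ruby hashes keep insertion order, so rules are tried top to bottom.
RULES = {
  "email"            => "unique_email",
  "last_name"        => "random_letter",
  "users.auth_token" => { "value" => "secret" },
  "encrypted_*"      => nil # nil stands for the "null" replacement
}.freeze

# Returns [pattern, rule] for the first match, or nil if no rule applies.
def rule_for(rules, table, column)
  rules.find do |pattern, _rule|
    File.fnmatch(pattern, "#{table}.#{column}") || File.fnmatch(pattern, column)
  end
end

p rule_for(RULES, "customers", "last_name")
p rule_for(RULES, "users", "auth_token")
p rule_for(RULES, "orders", "auth_token")
p rule_for(RULES, "users", "encrypted_password")
```

Returning the pattern together with the rule keeps a matched `null` rule distinguishable from "no rule matched".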

Options for replacement are:

null
value
statement
unique_email
unique_phone
random_letter
random_int
random_date
random_time
random_ip
untouched

Multiple Databases

To use with multiple databases, run:

pgsync --setup db2

This creates .pgsync-db2.yml for you to edit. Specify a database in commands with:

pgsync --db db2

Safety

To keep you from accidentally overwriting production, the destination is limited to localhost or 127.0.0.1 by default.

To use another host, add to_safe: true to your .pgsync.yml.
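
The check described above can be pictured with a short Ruby sketch. It assumes the destination is given as a URL and treats `to_safe` as a simple override; `destination_allowed?` and the example URLs are hypothetical, and pgsync's real check may handle more cases.

```ruby
require "uri"

# Hosts considered safe by default, per the text above.
SAFE_HOSTS = ["localhost", "127.0.0.1"].freeze

# Hypothetical helper: true if syncing into this destination would be allowed.
def destination_allowed?(url, to_safe: false)
  return true if to_safe # explicit opt-in, like to_safe: true in .pgsync.yml

  SAFE_HOSTS.include?(URI.parse(url).host)
end

p destination_allowed?("postgres://localhost:5432/myapp_development")
p destination_allowed?("postgres://deploy@db.example.com/myapp_production")
p destination_allowed?("postgres://deploy@db.example.com/myapp_staging", to_safe: true)
```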

Large Tables

For extremely large tables, sync in batches.

pgsync large_table --in-batches

The script will resume where it left off when run again, making it great for backfills.
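
One way a resumable batch sync can work is to continue from the highest primary key already present locally. The Ruby sketch below assumes an integer primary key named `id` and a fixed batch size. Both are assumptions made for illustration, and `batch_clauses` is a hypothetical helper rather than a description of pgsync's internals.

```ruby
# Hypothetical sketch: split the remaining id range into batches.
# Assumes an integer primary key called "id" (an assumption of this example).
def batch_clauses(source_max_id:, destination_max_id:, batch_size: 10_000)
  start = destination_max_id + 1 # resume after the last row already synced
  (start..source_max_id).step(batch_size).map do |from|
    to = [from + batch_size - 1, source_max_id].min
    "where id >= #{from} and id <= #{to}"
  end
end

# First run: nothing synced yet.
puts batch_clauses(source_max_id: 25, destination_max_id: 0, batch_size: 10)

# Later run: ids up to 10 are already local, so only the rest is fetched.
puts batch_clauses(source_max_id: 25, destination_max_id: 10, batch_size: 10)
```

Because each run starts from the local maximum, an interrupted sync only repeats the batch that was in progress.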

Reference

Help

pgsync --help

Version

pgsync --version

Setup Scripts

Use groups when possible to take advantage of parallelism.

For Ruby scripts, you may need to do:

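# Run pgsync outside the current Bundler environment.
# Note: Bundler 2.1+ deprecates with_clean_env in favor of with_unbundled_env.
Bundler.with_clean_env do
  system "pgsync ..."
end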

Upgrading

Run:

gem install pgsync

To use master, run:

gem install specific_install
gem specific_install ankane/pgsync

Thanks

Inspired by heroku-pg-transfer.

Contributing

Everyone is encouraged to help improve this project. Here are a few ways you can help:

Report bugs
Fix bugs and submit pull requests
Write, clarify, or fix documentation
Suggest or add new features
