Add support for ETL scenarios: dump a full ez installation, transform it, reimport it somewhere else #57

Open
gggeek opened this issue Aug 23, 2016 · 11 comments
gggeek commented Aug 23, 2016

In short: we could do a full eZ Publish database dump, descending the content tree breadth-first and making sure that user accounts are dumped first and contents second.

What would be missing is the handling of object relations: since in eZ Publish relations can be circular, the import should allow creating content even with broken object relations, and then do extra passes to fix those once all contents have been created.
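To make that idea concrete, here is a minimal, self-contained sketch of such a two-pass import. The contents are plain arrays and the create/attach steps are stand-ins for real repository calls; nothing here is bundle API:

```php
<?php
// Sketch of a two-pass import that tolerates circular object relations.

$dump = [
    ['remote_id' => 'a', 'relations' => ['b']],   // a -> b
    ['remote_id' => 'b', 'relations' => ['a']],   // b -> a (circular)
];

$created  = [];   // remote_id => content payload
$deferred = [];   // remote_id => relations to fix in pass 2

// Pass 1: create every content, skipping relations whose target does
// not exist yet (unavoidable when relations are circular).
foreach ($dump as $payload) {
    $missing = array_filter(
        $payload['relations'],
        fn ($target) => !isset($created[$target])
    );
    $created[$payload['remote_id']] = $payload; // stand-in for a create call
    if ($missing) {
        $deferred[$payload['remote_id']] = $missing;
    }
}

// Pass 2: all contents exist now, so the deferred relations can be
// attached safely.
foreach ($deferred as $remoteId => $relations) {
    foreach ($relations as $target) {
        printf("attach relation %s -> %s\n", $remoteId, $target);
    }
}
```

Since pass 1 guarantees every content exists, a single fix-up pass is enough here; if individual creations could fail and be retried, the fix-up would instead loop until no deferred relation remains.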

Note: this most likely depends on issues #34 and #46.

gggeek added this to the 4.0 milestone Aug 23, 2016

gggeek commented Oct 22, 2016

Prerequisite: #55


gggeek commented Oct 22, 2016

Prerequisite: #56


gggeek commented Oct 22, 2016

Prerequisite: #54


gggeek commented Oct 22, 2016

Prerequisite: #34


gggeek commented Oct 22, 2016

More prerequisites:

  • a migration loader that scans directories recursively, since 1M files cannot sit in a single directory (see the sketch after this list)
  • a 'migrate' command that imports contents in parallel processes, for speed
  • a command that drops the existing migration table, to ease testing / repeated executions (or that allows removing migrations by regexp matching on name or path)
  • a command that drops all contents except the top-level folders, the admin and anonymous users, their 3 sections, and the anonymous and admin roles, to clean up the target database (this could probably be achieved with existing migration steps, but for speed it is probably better to use custom SQL queries...)
  • a 'generate' migration which actually saves files to disk, to be used by the high-level 'export' command
  • a way to add settings that tune how migration definitions are created on content export
  • an 'export' command that splits its work across parallel threads
  • an 'upsert' migration for the cases where the target installation already has contents (Support upsert migrations #245)
  • a flexible way of matching contents between the source and target databases via id / remote-id mappers
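For the recursive loader (first bullet above), PHP's SPL iterators already do the heavy lifting. A minimal sketch, where the .yml extension filter is only an assumption about how definitions are named:

```php
<?php
// Sketch: collect migration definition files from a directory tree
// using SPL iterators, instead of a flat scandir().
function findMigrationFiles(string $root): array
{
    $files = [];
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        // Assumption: migration definitions are .yml files.
        if ($file->isFile() && $file->getExtension() === 'yml') {
            $files[] = $file->getPathname();
        }
    }
    sort($files); // keep execution order deterministic
    return $files;
}
```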


gggeek commented Mar 14, 2017

Prerequisite: #102


gggeek commented Nov 12, 2017

Steps forward in release 4.4


gggeek commented Nov 25, 2018

Steps forward in release 5.4.1 and 5.5


gggeek commented Nov 30, 2018

Steps forward in release 5.6


gggeek commented Dec 15, 2018

Bugfixes in 5.7.3


gggeek commented Nov 4, 2020

Step forward in release 5.13: upserts are now possible, albeit in an impractical way. Create a migration with 3 steps:

  1. load the target item by identifier, with allow_null_results enabled, and set a reference to the result count
  2. create the item, with an if condition: reference equals 0
  3. update the item, with an if condition: reference does not equal 0
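Spelled out as a migration definition, that recipe might look roughly as follows. This is a sketch only: the exact keys (expect, the count reference attribute, the if-condition syntax) are from memory and should be checked against the bundle docs for the version in use, and identifiers such as my_item are placeholders:

```yaml
-   # step 1: look for the item; do not fail when it is missing
    type: content
    mode: load
    match:
        content_remote_id: my_item
    expect: any            # i.e. allow null results
    references:
        -
            identifier: found
            attribute: count
-   # step 2: only runs when nothing was found
    type: content
    mode: create
    content_type: article
    parent_location: 2
    attributes:
        title: My item
    if:
        "reference:found":
            eq: 0
-   # step 3: only runs when the item already exists
    type: content
    mode: update
    match:
        content_remote_id: my_item
    attributes:
        title: My item
    if:
        "reference:found":
            ne: 0
```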

@blankse I know this is way late, but it probably is a stepping stone for simplified upsert steps
