A script for migrating data from one media asset management (MAM) system to another.
The script takes user input and calls a set of modules to perform the migration. In this script, data is moved from Gorilla EFCS and DIVAArchive to the Dalet Galaxy MAM. An SQLite DB stores the metadata and the migration status for each asset.
The script follows a series of steps:
- Query the two separate DBs (Gorilla and DIVA) and export the data to two separate CSV files.
- Use Pandas to merge the query results based on a common field, and export to a new CSV.
- Parse the merged data for rows that contain certain string patterns.
  The string patterns are specific to the data that needs to migrate.
  Export a new CSV of the parsed data.
- Clean the parsed data. A metaxml field containing mediainfo is split out into 7 new columns,
  and the data from the XML elements is used to populate the newly created columns.
  Bad data from the XML is dropped, some incorrect data is fixed, and empty values are marked as NULL.
  If metadata is missing, a best guess is attempted based on the filename information.
  If that is not possible or is unsuccessful, the field is marked NULL.
  After the info is split out into separate columns, the original metaxml column is dropped,
  and the data is moved into the SQLite DB.
- Optionally, crosscheck all exported data against the information in the DB
  to ensure assets are not migrated more than once.
- Based on the user input, a number of XMLs are exported from the DB.
- Optionally, export the corresponding proxy file along with the XML.
  Proxies are exported only for assets with titletype = 'video'.
  Assets with titletype = 'archive' do not have a proxy to export.
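
The cleaning, database, and crosscheck steps above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code in csv_clean.py, update_db.py, or crosscheck_assets.py. The XML element names (`duration`, `codec`), the `guid` and `name` columns, the `assets` table, and the `migrated` flag are all assumptions made for the example; the real metaxml field is split into seven columns.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical subset of the mediainfo elements; the real script extracts seven.
FIELDS = ("duration", "codec")


def split_metaxml(metaxml):
    """Return one value per field in FIELDS, using None (SQL NULL) when missing."""
    values = dict.fromkeys(FIELDS)
    if not metaxml:
        return values  # no XML at all: every field stays NULL
    try:
        root = ET.fromstring(metaxml)
    except ET.ParseError:
        return values  # bad XML is dropped: every field stays NULL
    for field in FIELDS:
        text = root.findtext(field)
        if text and text.strip():
            values[field] = text.strip()
    return values


def store_assets(conn, rows):
    """Insert cleaned rows and return how many were new.

    Rows whose guid is already in the DB are skipped, which mirrors the
    crosscheck that keeps an asset from being migrated more than once.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS assets ("
        "guid TEXT PRIMARY KEY, name TEXT, titletype TEXT, "
        "duration TEXT, codec TEXT, migrated INTEGER DEFAULT 0)"
    )
    inserted = 0
    for row in rows:
        seen = conn.execute(
            "SELECT 1 FROM assets WHERE guid = ?", (row["guid"],)
        ).fetchone()
        if seen:
            continue
        meta = split_metaxml(row.get("metaxml"))
        conn.execute(
            "INSERT INTO assets (guid, name, titletype, duration, codec) "
            "VALUES (?, ?, ?, ?, ?)",
            (row["guid"], row["name"], row["titletype"],
             meta["duration"], meta["codec"]),
        )
        inserted += 1
    conn.commit()
    return inserted


def needs_proxy(titletype):
    """Only 'video' assets have a proxy; 'archive' assets do not."""
    return titletype == "video"
```

Storing missing values as `None` lets SQLite record them as NULL, so later queries can tell "no data" apart from an empty string.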
main.py
user_input.py
config.py
gorilla_oracle_query.py
diva_oracle_query.py
merge_dbs.py
csv_parse.py
csv_clean.py
crosscheck_assets.py
update_db.py
create_xml.py
get_proxy.py
logging.yaml
- Install prerequisites
- Create a config.yaml document with the format:
    paths:
      root_path:
    oracle-db-gor:
      user:
      pass:
      url:
    oracle-db-diva:
      user:
      pass:
      url:
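
Because the modules depend on these keys being filled in, a small check after loading the file may help catch an incomplete config.yaml early. The sketch below assumes the file has already been parsed into a dictionary (for example with PyYAML's `yaml.safe_load`). `REQUIRED_KEYS` and `missing_keys` are hypothetical names for this example and are not part of config.py.

```python
# Sections and keys taken from the config.yaml format shown above.
REQUIRED_KEYS = {
    "paths": ("root_path",),
    "oracle-db-gor": ("user", "pass", "url"),
    "oracle-db-diva": ("user", "pass", "url"),
}


def missing_keys(config):
    """Return dotted names of required keys that are absent or empty."""
    missing = []
    for section, keys in REQUIRED_KEYS.items():
        values = config.get(section)
        if not isinstance(values, dict):
            values = {}  # a missing or malformed section counts as all keys missing
        for key in keys:
            if not values.get(key):
                missing.append(f"{section}.{key}")
    return missing
```

An empty list means every required value is present; anything else can be reported to the user before any database connection is attempted.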