To reduce the client's Snowflake costs, this project aimed to migrate marketing data from Snowflake to Databricks. The work covered data modeling, transformation, and transfer, with a focus on keeping the data accurate, accessible, and reliable.
The primary objectives of this project were:
- Data Transformation: Restructuring and transforming marketing data to fit the Databricks environment. This was done with dbt (data build tool), which automated the creation of data models and transformations to keep them accurate, consistent, and performant.
- Migration Strategy: Developing and executing a strategy to migrate data seamlessly from Snowflake to Databricks, minimizing downtime and preventing data loss.
- Optimization: Enhancing data processing capabilities and optimizing query performance in the Databricks environment.
As part of this project, the responsibilities included:
- Ensuring that the properties of the data (for example, its structure and content) were preserved during migration.
- Gathering structured and unstructured marketing data from various sources so that comprehensive, reliable data was available across the company.
- Conducting stakeholder meetings to align the data modeling and migration process with organizational requirements and expectations.
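One way to check that data survived a migration intact is to compare a fingerprint of each table on both sides. The sketch below is illustrative only and is not taken from this repository: it assumes each table has already been extracted into a list of row dictionaries, and the names `table_fingerprint` and `compare_tables` are hypothetical. It compares row count, column names, and an order-independent content checksum.

```python
import hashlib
import json


def table_fingerprint(rows):
    """Return (row_count, column_names, checksum) for a list of row dicts."""
    columns = sorted({col for row in rows for col in row})
    checksum = 0
    for row in rows:
        # Canonical JSON so key order inside a row does not matter.
        encoded = json.dumps(row, sort_keys=True, default=str).encode("utf-8")
        digest = hashlib.sha256(encoded).digest()
        # Summing the digests makes the checksum independent of row order.
        checksum = (checksum + int.from_bytes(digest, "big")) % (1 << 256)
    return len(rows), columns, checksum


def compare_tables(source_rows, target_rows):
    """List the differences between two extracts; an empty list means a match."""
    src_count, src_cols, src_sum = table_fingerprint(source_rows)
    tgt_count, tgt_cols, tgt_sum = table_fingerprint(target_rows)
    problems = []
    if src_count != tgt_count:
        problems.append(f"row count differs: {src_count} vs {tgt_count}")
    if src_cols != tgt_cols:
        problems.append(f"columns differ: {src_cols} vs {tgt_cols}")
    if src_sum != tgt_sum:
        problems.append("content checksum differs")
    return problems


# Hypothetical sample rows standing in for a source and a target extract.
source = [{"id": 1, "channel": "email"}, {"id": 2, "channel": "search"}]
target = list(reversed(source))
print(compare_tables(source, target))  # → []
```

A check like this is sensitive to type representation: if the two platforms return the same value as different Python types (for example a decimal on one side and a float on the other), the checksums will differ, so values may need normalizing before comparison.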
Data privacy was a top priority throughout the project. Fictitious names and data obfuscation techniques were used to protect sensitive information without compromising the project's integrity or functionality. In addition, several properties of the original code have been changed for confidentiality reasons; these changes do not affect the overall functionality or performance of the code.
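The specific obfuscation techniques used here are not described. As one common approach, a keyed hash can replace sensitive values with stable tokens, so joins and counts still work while the original values stay hidden. The following is a minimal sketch under that assumption; the function names, the column name, and the key are all hypothetical.

```python
import hashlib
import hmac


def pseudonymize(value, secret_key):
    """Replace a sensitive string with a stable, non-reversible token."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    # A shortened token is easier to read; keep the full digest if collisions matter.
    return digest[:16]


def obfuscate_rows(rows, sensitive_columns, secret_key):
    """Return copies of the rows with the sensitive columns tokenized."""
    masked = []
    for row in rows:
        clean = dict(row)  # copy so the input rows are left untouched
        for col in sensitive_columns:
            if clean.get(col) is not None:
                clean[col] = pseudonymize(str(clean[col]), secret_key)
        masked.append(clean)
    return masked


# Hypothetical sample data and key, for illustration only.
rows = [{"customer_name": "Ada Example", "spend": 10.5}]
print(obfuscate_rows(rows, ["customer_name"], b"demo-key"))
```

Because the same input and key always yield the same token, relationships between tables are preserved. The key itself must be kept secret, since anyone holding it could confirm guesses about the original values.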
The project is organized as follows:
- Data Modeling: Scripts and processes for restructuring marketing data to fit Databricks schemas.
- SQL Files: SQL scripts for creating tables and views and for performing data transformations.
- Documentation: Detailed project documentation, data dictionaries, and README files.
- Tests: Procedures to ensure data quality and integrity post-transformation.
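The tests themselves are not shown in this section. To illustrate the kind of checks typically run after a transformation, the sketch below mirrors three of dbt's built-in generic tests (`not_null`, `unique`, and `accepted_values`) in plain Python over a list of row dictionaries. The function names and sample columns are hypothetical and do not come from this repository.

```python
from collections import Counter


def check_not_null(rows, column):
    """Return the rows where the column is missing or None."""
    return [row for row in rows if row.get(column) is None]


def check_unique(rows, column):
    """Return the non-null values that appear more than once in the column."""
    counts = Counter(row[column] for row in rows if row.get(column) is not None)
    return [value for value, n in counts.items() if n > 1]


def check_accepted_values(rows, column, accepted):
    """Return the non-null values in the column that are not in the accepted set."""
    allowed = set(accepted)
    seen = {row[column] for row in rows if row.get(column) is not None}
    return sorted(seen - allowed)


# Hypothetical transformed rows with two deliberate problems.
rows = [
    {"campaign_id": 1, "channel": "email"},
    {"campaign_id": 1, "channel": "fax"},
    {"campaign_id": None, "channel": "search"},
]
print(check_unique(rows, "campaign_id"))                            # → [1]
print(check_accepted_values(rows, "channel", ["email", "search"]))  # → ['fax']
```

Each check returns the offending rows or values rather than a boolean, so a failing run shows what to fix.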
Contributions are welcome! If you have suggestions, improvements, or feedback, please submit an issue or a pull request.