Tools Used:
- Data Profiling: Alteryx
- Data Cleaning: Talend
- ETL Tool: Talend
- SQL Servers: MySQL, MSSQL
- Data Visualization: Power BI and Tableau
IMDB Movies Analysis Using Talend
Tools & Technologies Used: Talend, ER Studio, Alteryx, Microsoft SQL Server, MySQL, Tableau, Azure Data Studio, Power BI
• Executed data integration from diverse sources including MySQL (IMDb tables), TSV (revenue data), and JSON files (movie titles and actor name changes), ensuring comprehensive data consolidation
• Conducted in-depth data profiling and analysis using Alteryx, producing detailed reports and insights, complemented by a meticulous mapping document in Excel
• Developed a robust data model focusing on an SCD Type 2 Movie Titles Dimension table, enhancing data accuracy and historical tracking
• Designed and implemented ETL mappings in Talend, utilizing metadata-based connections, contexts, and environments, to streamline data processing workflows
• Created dynamic and interactive dashboards in Power BI and Tableau, ensuring SQL script outputs were consistent with visualized data, effectively communicating key metrics and trends
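To illustrate the SCD Type 2 behaviour mentioned for the Movie Titles dimension, here is a minimal Python sketch. It is not the project's Talend job: the row layout (`surrogate_key`, `title_id`, `primary_title`, `valid_from`, `valid_to`, `is_current`) and the sample title IDs are assumptions made for illustration only. The idea it shows is that a changed title expires the current row and appends a new version instead of overwriting it.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


# Hypothetical column names; the project's actual dimension schema is not shown.
@dataclass
class TitleRow:
    surrogate_key: int
    title_id: str                     # natural (business) key
    primary_title: str
    valid_from: date
    valid_to: Optional[date] = None   # None means "still current"
    is_current: bool = True


def apply_scd2(dim: List[TitleRow], title_id: str, new_title: str,
               change_date: date) -> List[TitleRow]:
    """Apply one incoming title record to the dimension using SCD Type 2 rules."""
    current = next(
        (r for r in dim if r.title_id == title_id and r.is_current), None
    )

    # Unchanged attribute: keep the existing version as-is.
    if current is not None and current.primary_title == new_title:
        return dim

    # Changed attribute: expire the current version instead of overwriting it.
    if current is not None:
        current.valid_to = change_date
        current.is_current = False

    # Insert the new version with a fresh surrogate key.
    next_key = max((r.surrogate_key for r in dim), default=0) + 1
    dim.append(TitleRow(next_key, title_id, new_title, valid_from=change_date))
    return dim


# Made-up example data.
dim: List[TitleRow] = []
apply_scd2(dim, "tt0000001", "Working Title", date(2020, 1, 1))
apply_scd2(dim, "tt0000001", "Final Title", date(2021, 6, 1))
for row in dim:
    print(row)
```

After the second call the dimension holds two rows for the same `title_id`: the expired original and the current version, which is what makes historical tracking possible.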
1. Alteryx:
Alteryx Workflow: Understanding data
![image](https://private-user-images.githubusercontent.com/114325852/291817479-4ed20b48-d853-4f0d-beaf-2e43c8807d98.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjI2NjgzMzcsIm5iZiI6MTcyMjY2ODAzNywicGF0aCI6Ii8xMTQzMjU4NTIvMjkxODE3NDc5LTRlZDIwYjQ4LWQ4NTMtNGYwZC1iZWFmLTJlNDNjODgwN2Q5OC5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwODAzJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDgwM1QwNjUzNTdaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT0wYWEwNGZlYzE4YzExZGNiMzhlMTBjZGFiZjIxYzExMDA2ZTE2ZTA3YjUzMTBiN2Q0N2NhMGZkODZmYzAyYWY1JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.iabnh7LUGdTwQe67YXwoSm9meqf46YmsKq1taN1N0jU)
Findings:
- Rank: The movie's rank varied from 1 to 55 during its box office run; the column also contains “-” placeholder values where no rank was recorded.
- Gross: Daily gross earnings ranged from a minimum of $357 to a maximum of about $28.27 million.
- Per Theater: Earnings per theater varied between $60 and $8,181.
- Total Gross: The cumulative gross earnings increased, reaching approximately $760.51 million.
- Days: The dataset covers 336 days from the movie's release.
- %LW and %YD: Both columns contain null values.
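The same kind of checks can be sketched outside Alteryx. The snippet below is a rough, standard-library-only illustration of the profiling step, not the actual workflow: the sample rows are made up, and the exact column headers (`Day`, `Rank`, `Gross`, `%YD`, `%LW`, `Per Theater`) are assumptions based on the field names listed in the findings.

```python
import csv
import io

# Made-up sample rows, not the project's data; column headers are assumed.
SAMPLE_TSV = (
    "Day\tRank\tGross\t%YD\t%LW\tPer Theater\n"
    "1\t1\t$1,000,000\t\t\t$2,500\n"
    "2\t2\t$750,000\t-25%\t\t$1,875\n"
    "3\t-\t$357\t-99.9%\t-99.9%\t$60\n"
)


def parse_number(text):
    """Turn '$1,000,000' into 1000000.0; return None for blanks or '-'."""
    cleaned = text.strip().replace("$", "").replace(",", "")
    if cleaned in ("", "-"):
        return None
    return float(cleaned)


def profile(tsv_text):
    """Return min, max, and missing-value counts for the profiled columns."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    report = {}

    # Numeric columns: range plus a count of values that could not be parsed.
    for column in ("Rank", "Gross", "Per Theater"):
        values = [parse_number(row[column]) for row in rows]
        present = [v for v in values if v is not None]
        report[column] = {
            "min": min(present, default=None),
            "max": max(present, default=None),
            "missing": len(values) - len(present),
        }

    # Percentage columns: only count blanks and '-' placeholders.
    for column in ("%YD", "%LW"):
        missing = sum(1 for row in rows if row[column].strip() in ("", "-"))
        report[column] = {"missing": missing}

    return report


report = profile(SAMPLE_TSV)
for column, stats in report.items():
    print(column, stats)
```

Treating “-” as missing rather than as a value is the design choice that matters here: it keeps the placeholder from breaking numeric conversion while still surfacing how many rows are affected.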
Insights and Observations
- Strong Initial Performance: "Avatar" had a powerful opening, indicated by the high initial daily and per-theater gross.
- Longevity in Theaters: The movie remained in theaters for a significant duration (336 days), highlighting its lasting appeal.
- Consistent Top Rankings: The movie consistently ranked well during its theatrical run despite fluctuations.
- Revenue Stability: After the initial spike, the total gross showed stability, indicating a steady influx of viewers over an extended period.
2. Navicat: Designing the Dimensional Data Model:
3. Talend Workflow Screenshots
Bridge Tables:
Movie-Genre Bridge table:
Movie-Region Bridge Table:
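A bridge table resolves a many-to-many relationship, such as one movie belonging to several genres. The sketch below shows the general idea in Python rather than the Talend job itself. It assumes each title arrives with a comma-separated genre string and that `\N` marks a missing value; both are assumptions about the source data, and the function and variable names are hypothetical.

```python
def build_genre_bridge(titles):
    """Split each title's genre string into (title_id, genre_key) bridge rows.

    titles: iterable of (title_id, genres_csv) pairs.
    Returns (genre_dim, bridge), where genre_dim maps genre name to a
    surrogate key and bridge is a list of (title_id, genre_key) tuples.
    """
    genre_dim = {}
    bridge = []
    for title_id, genres_csv in titles:
        for name in genres_csv.split(","):
            name = name.strip()
            # Skip blanks and the assumed missing-value marker.
            if not name or name == r"\N":
                continue
            # Reuse the existing key, or assign the next one for a new genre.
            key = genre_dim.setdefault(name, len(genre_dim) + 1)
            bridge.append((title_id, key))
    return genre_dim, bridge


# Made-up example data.
sample_titles = [
    ("tt1", "Action,Adventure"),
    ("tt2", "Adventure,Drama"),
    ("tt3", r"\N"),
]
genre_dim, bridge = build_genre_bridge(sample_titles)
print(genre_dim)
print(bridge)
```

A Movie-Region bridge would follow the same pattern, with region codes in place of genre names.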
Fact Tables:
BoxOfcFact:
FactTitle Principal:
Genre Fact:
Visualization Using Power BI (https://app.powerbi.com/groups/4245cd51-53a4-4aac-984f-18f6bde6a73e/reports/07948f86-f53d-4286-b8c0-efee8aaf52e1/ReportSection185e58af7ba5a1c2e3ef?experience=power-bi):