Setup README and added first chart
Master Shifu committed Apr 29, 2024
1 parent 62a7c85 commit 0c54c5e
Showing 4 changed files with 62 additions and 2 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -0,0 +1,3 @@
.idea
.idea/*
.venv
57 changes: 55 additions & 2 deletions README.md
@@ -1,2 +1,55 @@
# goodgrief
Analysis of IMDb datasets
[🔙 Back to my profile](https://shefaliisharma.github.io/)

<!-- TOC -->

<!-- TOC -->

This project contains an analysis of the [IMDb non-commercial datasets](https://developer.imdb.com/non-commercial-datasets/).

## Objectives:
1. How have genres and viewer preferences for English-language content evolved over the years, and how do these trends differ across regions and languages?


## Dataset & Methodology:
The analysis was performed with PostgreSQL queries that extract the information needed to answer the research questions. The queries are provided in the Analysis section below.

The output of the SQL queries was loaded into Tableau. My local setup consisted of:

- PostgreSQL server running on localhost on macOS Sonoma
- DataGrip for querying the database and exploratory data analysis
- Tableau for visualizations

## Analysis:

### Trends within Genres of English Language Content:

I wrote a query that tracks how genres trend over time in the IMDb data. It splits each title's comma-separated genre string into individual genres, then reports the number of titles and the average rating for each genre per year.

**The Process:**
1. Expand Genres: With a CTE, I convert the comma-separated list of genres for each title into separate rows using PostgreSQL's `UNNEST` and `STRING_TO_ARRAY` functions.
2. Filter Titles: I join the expanded genres with the akas and country-code tables, keeping English-language titles and dropping rows whose region is unknown.
3. Calculate Metrics: I compute the total number of titles (`title_count`) and their average IMDb rating (`average_rating`) for each genre and year.
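
The genre expansion step can be tried in isolation before running it against the full tables. The sketch below is illustrative only: the two sample rows and the `sample` alias are made up for the example and are not part of the dataset; only `UNNEST` and `STRING_TO_ARRAY` come from the query itself.

```sql
-- Hypothetical sample rows standing in for imdb_basic (tconst, genres).
SELECT
    tconst,
    UNNEST(STRING_TO_ARRAY(genres, ',')) AS genre_split  -- one output row per genre
FROM (
    VALUES
        ('tt0000001', 'Comedy,Drama'),
        ('tt0000002', 'Horror')
) AS sample(tconst, genres);
```

The first sample title yields two rows (one for `Comedy`, one for `Drama`) and the second yields one, so a title with several genres contributes to each of them in the aggregation.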

```sql
WITH GenreExpansions AS (
    SELECT
        imdb_basic.tconst,
        UNNEST(STRING_TO_ARRAY(genres, ',')) AS genre_split,
        startyear,
        averagerating
    FROM imdb_basic JOIN imdb_ratings ON imdb_basic.tconst = imdb_ratings.tconst
)
SELECT
    genre_split,
    startyear,
    COUNT(tconst) AS title_count,
    AVG(averagerating) AS average_rating
FROM GenreExpansions
JOIN imdb_akas ON imdb_akas.titleid = GenreExpansions.tconst
JOIN imdb_country_codes ON imdb_country_codes.region_code = imdb_akas.region
WHERE language = 'en' AND region_name != 'Unknown'
GROUP BY genre_split, startyear
ORDER BY genre_split, startyear;
```
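
One caveat: IMDb's akas data can hold several rows per title (one per region and language), so joining on `imdb_akas` may count a title more than once and weight its rating accordingly. If one row per title is wanted, a possible variant is to replace the joins with an `EXISTS` filter. This is a sketch under the assumption that the tables and columns are exactly those used in the query above; it is not the query behind the published visualization.

```sql
-- Sketch: same metrics, but each title contributes at most once per genre and year.
WITH GenreExpansions AS (
    SELECT
        imdb_basic.tconst,
        UNNEST(STRING_TO_ARRAY(genres, ',')) AS genre_split,
        startyear,
        averagerating
    FROM imdb_basic JOIN imdb_ratings ON imdb_basic.tconst = imdb_ratings.tconst
)
SELECT
    genre_split,
    startyear,
    COUNT(tconst) AS title_count,
    AVG(averagerating) AS average_rating
FROM GenreExpansions
-- Keep a title if it has at least one English aka row in a known region.
WHERE EXISTS (
    SELECT 1
    FROM imdb_akas
    JOIN imdb_country_codes ON imdb_country_codes.region_code = imdb_akas.region
    WHERE imdb_akas.titleid = GenreExpansions.tconst
      AND language = 'en'
      AND region_name != 'Unknown'
)
GROUP BY genre_split, startyear
ORDER BY genre_split, startyear;
```

Comparing `title_count` between the two versions for a single genre and year shows how much the aka rows inflate the totals, if at all.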

[![Visual01](assets/viz1.png)](https://public.tableau.com/views/IMDbdatasetGenreTimeSeries/Ratingsanalysisovertheyears?:language=en-US&:sid=&:display_count=n&:origin=viz_share_link)
4 changes: 4 additions & 0 deletions _config.yml
@@ -0,0 +1,4 @@
remote_theme: pages-themes/[email protected]
title: IMDb Dataset Analysis
description: Project to showcase my Data Analysis skills
show_downloads: false
Binary file added assets/viz1.png