Commit 0c54c5e by Master Shifu, committed Apr 29, 2024 (1 parent: 62a7c85)

This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.

Showing 4 changed files with 62 additions and 2 deletions.

@@ -0,0 +1,3 @@
.idea
.idea/*
.venv

@@ -1,2 +1,55 @@
# goodgrief
Analysis of IMDb datasets
[🔙 Back to my profile](https://shefaliisharma.github.io/)

<!-- TOC -->

<!-- TOC -->

This project analyzes the [IMDb non-commercial datasets](https://developer.imdb.com/non-commercial-datasets/).

## Objectives:
1. How have genres and viewer preferences in English-language content evolved over the years, and how do these trends differ across regions and languages?

## Dataset & Methodology:
The analysis was performed with PostgreSQL. I queried the dataset to extract the relevant information and answer the research question; the queries used are provided in the Analysis section below.

The output from the SQL queries was then loaded into Tableau. My local setup consisted of:

- PostgreSQL server running on localhost on macOS Sonoma
- DataGrip for querying the database and exploratory data analysis
- Tableau for visualizations

## Analysis:

### Trends in Genres of English-Language Content:

I wrote a query that tracks how genre trends change over time in the IMDb dataset. IMDb stores each title's genres as a single comma-separated string, so I first split that string into individual genres and then computed the number of titles and their average rating for each genre per year.
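
The genre split can be tried in isolation. Below is a minimal sketch, assuming PostgreSQL and two invented rows in a hypothetical temporary table `sample_basic` that stands in for `imdb_basic` (only the `tconst` and `genres` columns are assumed):

```sql
-- Hypothetical sample rows standing in for imdb_basic (tconst, genres)
CREATE TEMP TABLE sample_basic (tconst text, genres text);
INSERT INTO sample_basic VALUES
    ('tt_a', 'Documentary,Short'),
    ('tt_b', 'Animation,Comedy,Romance');

-- STRING_TO_ARRAY splits the string on commas into a text array;
-- UNNEST then emits one output row per array element
CREATE TEMP VIEW sample_genre_rows AS
SELECT
    tconst,
    UNNEST(STRING_TO_ARRAY(genres, ',')) AS genre_split
FROM sample_basic;

SELECT * FROM sample_genre_rows ORDER BY tconst, genre_split;
```

The query returns five rows: two for `tt_a` and three for `tt_b`, so a title listed under several genres contributes one row to each of them.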

**The Process:**

1. **Expand genres:** In a CTE, I convert each title's list of genres into separate rows using PostgreSQL's `UNNEST` and `STRING_TO_ARRAY` functions.
2. **Join and filter:** I join the expanded genres with the alternate-title (`imdb_akas`) and country-code tables, keeping only English-language titles from known regions and filtering out any unknowns.
3. **Calculate metrics:** For each genre and year, I compute the number of titles (`title_count`) and their average IMDb rating (`average_rating`).
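
The expand-and-aggregate steps can be traced end to end on a toy dataset. The sketch below assumes hypothetical temporary tables `demo_basic` and `demo_ratings` holding only the columns the query needs, filled with invented values; it leaves out the language and region filter so the metric calculation stays easy to follow.

```sql
-- Hypothetical sample data standing in for imdb_basic and imdb_ratings
CREATE TEMP TABLE demo_basic (tconst text, genres text, startyear int);
CREATE TEMP TABLE demo_ratings (tconst text, averagerating numeric);
INSERT INTO demo_basic VALUES
    ('tt_a', 'Comedy,Drama', 2000),
    ('tt_b', 'Drama',        2000),
    ('tt_c', 'Comedy',       2001);
INSERT INTO demo_ratings VALUES
    ('tt_a', 7.0),
    ('tt_b', 8.0),
    ('tt_c', 6.0);

-- Expand genres, then count titles and average their ratings per genre and year
CREATE TEMP VIEW demo_genre_metrics AS
WITH GenreExpansions AS (
    SELECT
        b.tconst,
        UNNEST(STRING_TO_ARRAY(b.genres, ',')) AS genre_split,
        b.startyear,
        r.averagerating
    FROM demo_basic b
    JOIN demo_ratings r ON b.tconst = r.tconst
)
SELECT
    genre_split,
    startyear,
    COUNT(*) AS title_count,
    AVG(averagerating) AS average_rating
FROM GenreExpansions
GROUP BY genre_split, startyear;

SELECT * FROM demo_genre_metrics ORDER BY genre_split, startyear;
```

Here Drama in 2000 gets a `title_count` of 2 and an `average_rating` of 7.5. Because `tt_a` is counted under both Comedy and Drama, the per-genre counts for a year can add up to more than the number of distinct titles released that year.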

```sql
WITH GenreExpansions AS (
    -- One row per (title, genre): split the comma-separated genres string
    SELECT
        imdb_basic.tconst,
        UNNEST(STRING_TO_ARRAY(genres, ',')) AS genre_split,
        startyear,
        averagerating
    FROM imdb_basic
    JOIN imdb_ratings ON imdb_basic.tconst = imdb_ratings.tconst
),
EnglishTitles AS (
    -- Titles with at least one English-language AKA in a known region.
    -- DISTINCT keeps a title with several matching AKA rows from being
    -- counted (and weighted in the average) more than once.
    SELECT DISTINCT imdb_akas.titleid
    FROM imdb_akas
    JOIN imdb_country_codes ON imdb_country_codes.region_code = imdb_akas.region
    WHERE language = 'en' AND region_name != 'Unknown'
)
SELECT
    genre_split,
    startyear,
    COUNT(tconst) AS title_count,
    AVG(averagerating) AS average_rating
FROM GenreExpansions
JOIN EnglishTitles ON EnglishTitles.titleid = GenreExpansions.tconst
GROUP BY genre_split, startyear
ORDER BY genre_split, startyear;
```
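
Because `imdb_akas` stores one row per alternate title, a title released under several English-language AKAs can match the English filter more than once, and a plain join then repeats that title before `COUNT` and `AVG` run. The sketch below illustrates the effect with invented rows in hypothetical temporary tables `fanout_titles` and `fanout_akas` (stand-ins for the expanded genres and `imdb_akas`), comparing a plain join with one that first reduces the AKA side to distinct title IDs.

```sql
-- Hypothetical sample: tt_x has two English-language AKA rows, tt_y has one
CREATE TEMP TABLE fanout_titles (tconst text, genre text, averagerating numeric);
CREATE TEMP TABLE fanout_akas (titleid text, region text, language text);
INSERT INTO fanout_titles VALUES
    ('tt_x', 'Drama', 9.0),
    ('tt_y', 'Drama', 5.0);
INSERT INTO fanout_akas VALUES
    ('tt_x', 'IN', 'en'),
    ('tt_x', 'CA', 'en'),
    ('tt_y', 'IN', 'en');

-- Plain join: tt_x appears once per matching AKA row
CREATE TEMP VIEW fanout_naive AS
SELECT t.genre, COUNT(*) AS title_count, AVG(t.averagerating) AS average_rating
FROM fanout_titles t
JOIN fanout_akas a ON a.titleid = t.tconst
WHERE a.language = 'en'
GROUP BY t.genre;

-- Deduplicated: collapse matching AKAs to distinct title IDs before joining
CREATE TEMP VIEW fanout_dedup AS
SELECT t.genre, COUNT(*) AS title_count, AVG(t.averagerating) AS average_rating
FROM fanout_titles t
JOIN (SELECT DISTINCT titleid FROM fanout_akas WHERE language = 'en') a
    ON a.titleid = t.tconst
GROUP BY t.genre;

SELECT 'naive' AS variant, * FROM fanout_naive
UNION ALL
SELECT 'dedup', * FROM fanout_dedup;
```

In this toy data the plain join reports 3 Drama titles with an average of about 7.67, while the deduplicated join reports the 2 actual titles with an average of 7.0.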

[![Visual01](assets/viz1.png)](https://public.tableau.com/views/IMDbdatasetGenreTimeSeries/Ratingsanalysisovertheyears?:language=en-US&:sid=&:display_count=n&:origin=viz_share_link)

@@ -0,0 +1,4 @@
remote_theme: pages-themes/[email protected] | ||
title: IMDb Dataset Analysis
description: Project to showcase my data analysis skills
show_downloads: false
Sorry, this file is invalid so it cannot be displayed.