Data visualization - Netflix Movies and TV Shows

The goal of this notebook is to share data visualization tools and techniques, along with practices and methods for efficient and engaging data storytelling. Join me on this fun journey ☕

Tools

Python, Matplotlib, Plotly, NumPy, pandas, PyCharm

Dataset

The Netflix Movies and TV Shows dataset from Bansal, S. (2021) is a tabular dataset consisting of listings of all the movies and TV shows available on Netflix, along with details such as cast, directors, ratings, release year, and duration.

Netflix is one of the most popular media and video streaming platforms. It has over 8000 movies and TV shows available on its platform and, as of mid-2021, over 200M subscribers globally. [1]

Attributes

  • Type (movie or TV show)
  • Title
  • Director
  • Cast
  • Country
  • Date added
  • Release year
  • Rating (TV-MA, TV-14, TV-PG, etc.)
  • Duration (in minutes for a movie, in seasons for a TV show)
  • Listed in (category)
  • Description
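
As a minimal sketch, the table could be loaded with pandas as shown here. The file name `netflix_titles.csv` and the snake_case column names are assumptions about the Kaggle download and are not stated in this README; the inline rows are made up so the snippet runs without the real file.

```python
import io
import pandas as pd

# Tiny made-up sample standing in for the real CSV (hypothetical rows).
SAMPLE_CSV = """type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
Movie,Example Film,Jane Doe,"Actor A, Actor B",India,"September 25, 2021",2020,TV-14,100 min,"Dramas, International Movies",A made-up synopsis.
TV Show,Example Show,,"Actor C",Japan,"January 1, 2020",2019,TV-MA,1 Season,Anime Series,Another made-up synopsis.
"""


def load_catalog(source):
    """Read the catalog from a path or file-like object into a DataFrame."""
    return pd.read_csv(source)


# For the real data (assumed file name): load_catalog("netflix_titles.csv")
df = load_catalog(io.StringIO(SAMPLE_CSV))
print(df.dtypes)
```

Empty cells, such as the missing director in the second sample row, are read as NaN.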

Type of content distribution

As we can observe, the Netflix catalog is approximately 70% movies and 30% TV shows. In absolute terms, it contains 6131 movies and 2676 TV shows, for a total of 8807 titles!
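
The percentages follow directly from the two counts quoted above. A short check, using only those counts:

```python
# Counts quoted in the text above.
counts = {"Movie": 6131, "TV Show": 2676}

total = sum(counts.values())
# Share of each content type, as a percentage rounded to one decimal.
shares = {kind: round(100 * n / total, 1) for kind, n in counts.items()}

print(total)
print(shares)
```

On a loaded DataFrame, the same shares could be obtained with `df["type"].value_counts(normalize=True)`, assuming the column is named `type`.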

Next, we look at how the content is distributed across countries, using 8 arbitrarily chosen countries. Notice how most countries keep a similar ratio of movies to TV shows, with the exceptions of India and Japan: India's catalog is almost 92% movies, whereas Japan's is almost 63% TV shows (perhaps because of its affinity for anime).
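
One way to compute such a per-country split is sketched here. It assumes columns named `type` and `country`, and that a title produced in several countries lists them separated by commas; the helper name `type_share_by_country` and the sample rows are hypothetical.

```python
import pandas as pd


def type_share_by_country(df, countries):
    """Percentage of movies vs. TV shows for each selected country."""
    exploded = (
        df.dropna(subset=["country"])
          .assign(country=lambda d: d["country"].str.split(","))
          .explode("country")          # one row per (title, country) pair
          .reset_index(drop=True)
    )
    exploded["country"] = exploded["country"].str.strip()
    exploded = exploded[exploded["country"].isin(countries)]
    # normalize="index" makes each country's row sum to 1 before scaling to percent.
    return pd.crosstab(exploded["country"], exploded["type"], normalize="index") * 100


sample = pd.DataFrame({
    "type": ["Movie", "Movie", "TV Show", "TV Show", "Movie"],
    "country": ["India", "India, Japan", "Japan", "Japan", None],
})
result = type_share_by_country(sample, ["India", "Japan"])
print(result)
```

Counting a co-production once per country is a design choice; a title made in two countries contributes to both rows.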

Director

The Director attribute has the most missing data, but as an exercise we wanted to show the top 15 directors who appear most often.
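
A ranking of attributes by missing data could be produced along these lines. This is a sketch on made-up rows; `missing_report` is a hypothetical helper, not a function from the notebook.

```python
import pandas as pd


def missing_report(df):
    """Count and percentage of missing values per column, worst first."""
    counts = df.isna().sum()
    report = pd.DataFrame({
        "missing": counts,
        "percent": (100 * counts / len(df)).round(1),
    })
    return report.sort_values("missing", ascending=False)


sample = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "director": [None, None, "Jane Doe", None],
    "cast": ["Actor A", None, "Actor B", "Actor C"],
})
report = missing_report(sample)
print(report)
```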

Cast

The Cast attribute has the third most missing data, but it is also the attribute with the largest number of distinct elements: it contains almost 40000 actors. As an exercise, we wanted to show the top 40 actors who appear most often.
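
Counting individual names requires splitting each cell first. The sketch below assumes names within a cell are separated by commas; `top_names` is a hypothetical helper that would work for both the director and cast columns.

```python
import pandas as pd


def top_names(df, column, n):
    """Most frequent individual names in a comma-separated column."""
    names = df[column].dropna().str.split(",").explode().str.strip()
    return names[names != ""].value_counts().head(n)


sample = pd.DataFrame({
    "cast": ["Actor A, Actor B", "Actor A", None, "Actor B, Actor A"],
})
top = top_names(sample, "cast", 40)
print(top)
```

Rows with a missing cast are dropped before counting, so the ranking only reflects titles that list at least one name.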

Country

In the following chart, we can see the number of titles for each country. As expected, the US has the largest number of titles, followed by India and the UK; the remaining countries have similar shares to one another.

Date added

In the following plot, we can observe the number of titles added to the Netflix catalog over time. Movies are shown in gray and TV shows in red, and the black vertical bars separate the years. From this plot we can notice several things:

  • the number of movies and TV shows added has increased over the years
  • additions tend to cluster at the beginning and end of each year
  • the proportion of movies to TV shows remains stable over the years
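
The monthly counts behind such a plot could be prepared as follows. This sketch assumes `date_added` holds strings like "September 25, 2021" (possibly with stray whitespace) and that the type column is named `type`; `monthly_additions` is a hypothetical helper.

```python
import pandas as pd


def monthly_additions(df):
    """Titles added per calendar month, one column per content type."""
    added = pd.to_datetime(
        df["date_added"].str.strip(), format="%B %d, %Y", errors="coerce"
    )
    # Rows whose date is missing or unparseable are dropped.
    valid = df.assign(month=added.dt.to_period("M")).dropna(subset=["month"])
    return valid.groupby(["month", "type"]).size().unstack(fill_value=0)


sample = pd.DataFrame({
    "type": ["Movie", "TV Show", "Movie", "Movie"],
    "date_added": ["January 1, 2020", " January 15, 2020", "December 3, 2020", None],
})
monthly = monthly_additions(sample)
print(monthly)
```

The resulting table has one row per month and one column per type, which maps directly onto a grouped or stacked bar chart.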

Release year

From this plot we can infer several things:

  • in the 2000s the number of movies and TV shows increased considerably
  • after 2018 the number decreases, perhaps because the data is not up to date or because of the pandemic
  • the overall shape of the plot resembles exponential growth
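
To look at the trend by decade rather than by single year, the release years can be bucketed. A sketch on made-up years, assuming an integer column named `release_year`; `titles_per_decade` is a hypothetical helper.

```python
import pandas as pd


def titles_per_decade(df):
    """Number of titles released in each decade (1990 means 1990-1999)."""
    decade = (df["release_year"] // 10) * 10
    return decade.value_counts().sort_index()


sample = pd.DataFrame({"release_year": [1999, 2005, 2009, 2018, 2018, 2021]})
per_decade = titles_per_decade(sample)
print(per_decade)
```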

Rating

In this pie chart, we can observe the rating distribution of the Netflix catalog. Almost 40% of the titles are rated TV-MA, followed by TV-14 and TV-PG. This is what we expected, since the goal of Netflix is to capture the attention of a global audience. Of course, it also has content restricted to adults and, conversely, content made for kids, both in smaller proportions.

Duration

In this chart of TV show seasons, we can see that most TV shows have 1 season, followed by 2, 3, and so on.

In the case of movies, the average duration is about 100 minutes, that is, 1 hour 40 minutes.
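
Because the duration attribute mixes minutes and seasons, it has to be split by content type before averaging. The sketch below assumes values such as "90 min" and "2 Seasons" in a column named `duration`; `split_duration` is a hypothetical helper.

```python
import pandas as pd


def split_duration(df):
    """Add numeric `minutes` (movies) and `seasons` (TV shows) columns."""
    number = pd.to_numeric(
        df["duration"].str.extract(r"(\d+)", expand=False), errors="coerce"
    )
    is_movie = df["type"] == "Movie"
    # Each row gets a value in exactly one of the two new columns.
    return df.assign(minutes=number.where(is_movie), seasons=number.where(~is_movie))


sample = pd.DataFrame({
    "type": ["Movie", "Movie", "TV Show", "TV Show"],
    "duration": ["90 min", "110 min", "1 Season", "3 Seasons"],
})
durations = split_duration(sample)
print(durations["minutes"].mean())
print(durations["seasons"].value_counts().sort_index())
```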

Listed in

For the category, we can observe the top categories in a word cloud: the bigger the word, the more often it appears in the catalog.

In addition, we wanted to know the top 4 categories in 8 arbitrary countries. As we might expect, every country has different tastes in categories.
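
A top-k ranking per country could be computed as sketched here. It assumes columns named `country` and `listed_in`, both holding comma-separated values; the helper names and sample rows are hypothetical.

```python
import pandas as pd


def explode_list_column(df, column):
    """One row per item of a comma-separated column; missing cells are dropped."""
    out = df.dropna(subset=[column]).copy()
    out[column] = out[column].str.split(",")
    out = out.explode(column).reset_index(drop=True)
    out[column] = out[column].str.strip()
    return out


def top_categories_by_country(df, countries, k):
    """The k most frequent categories for each selected country."""
    long = explode_list_column(explode_list_column(df, "country"), "listed_in")
    long = long[long["country"].isin(countries)]
    counts = long.groupby(["country", "listed_in"]).size().reset_index(name="titles")
    counts = counts.sort_values(["country", "titles"], ascending=[True, False])
    return counts.groupby("country").head(k).reset_index(drop=True)


sample = pd.DataFrame({
    "country": ["India", "India", "India, Japan", "Japan"],
    "listed_in": ["Dramas, Comedies", "Dramas", "Dramas", "Anime Series, Dramas"],
})
top1 = top_categories_by_country(sample, ["India", "Japan"], k=1)
print(top1)
```

Calling it with `k=4` and a list of 8 countries would give the kind of table described above.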

Description

Finally, we made an interesting word cloud: we took the synopses of all 8807 titles and extracted the 150 most repeated words. Those words were placed over the Netflix logo, and the result was the following word cloud!
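
The word-frequency step can be sketched with the standard library alone. The stop-word list, the minimum word length and the tokenization rule below are illustrative assumptions; the README does not say how the notebook filtered words.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption, not the notebook's list).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "her", "his", "with", "for", "on"}


def top_words(texts, n=150):
    """Return the n most frequent words across all texts as (word, count) pairs."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return counts.most_common(n)


sample = [
    "A young woman finds her family.",
    "The family moves to a new town.",
    "A woman and her family.",
]
print(top_words(sample, n=2))
```

The resulting frequencies could then be handed to a word cloud renderer; for instance, the `wordcloud` package accepts a mask image and a frequency dictionary, though whether the notebook used it is not stated here.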

References

[1] Bansal, S. (2021). Netflix Movies and TV Shows. Kaggle. https://www.kaggle.com/datasets/shivamb/netflix-shows