Lego Data Analysis Project

Individual Big Data Programming Project -- University Junior Year

Mariam Abdelati

Project Description

Before beginning my main data science project, I originally planned to analyse Lego data to determine how the features of Lego sets, such as price, number of pieces, and theme, correlate with each other. The analysis was meant to answer the following questions:

  1. How does the theme of a Lego set affect its price?
  2. Which Lego sets have the most pieces? Is there a common theme between the top Lego sets?
  3. Which are the most common themes for Lego sets?
  4. How many Lego sets have been made for each theme?
  5. What are the top age groups for Lego sets? Is there a common theme between the top ages?
  6. Which Lego sets have the most minifigs? Is there a common theme between the top Lego sets?
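
Several of these questions reduce to grouped summaries and top-N selections. The following is a minimal sketch with dplyr, not the project's actual analysis: the data frame `sets` and its columns (`name`, `theme`, `price`, `pieces`) are made-up placeholders, since the real attribute names are not listed here.

```r
library(dplyr)

# Toy data with hypothetical column names and values (assumption:
# the real scraped data has different names and 16 attributes).
sets <- data.frame(
  name   = c("Set A", "Set B", "Set C", "Set D"),
  theme  = c("City", "City", "Technic", "Technic"),
  price  = c(20, 40, 100, 200),
  pieces = c(200, 400, 1000, 2500)
)

# Questions 1, 3 and 4: number of sets and mean price per theme.
theme_summary <- sets %>%
  group_by(theme) %>%
  summarise(n_sets = n(), mean_price = mean(price), .groups = "drop") %>%
  arrange(desc(n_sets))

# Question 2: the sets with the most pieces, largest first.
top_pieces <- sets %>%
  slice_max(pieces, n = 2)

print(theme_summary)
print(top_pieces)
```

The same `slice_max()` pattern would apply to minifig counts, and the grouped summary to age groups, assuming those columns exist in the scraped data.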

I created a scraping tool in R that collected a total of 19,239 unique records with 16 attributes each. Because my aims changed and I wanted to include textual data, I changed the scope of my individual project to analyse lyrics instead. The scraped Lego data will be used later for a personal project to gain more experience in big data programming.
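
For readers unfamiliar with rvest, the general shape of such a scraper can be sketched as follows. This is an illustration only, not the tool from this project: the HTML snippet, the CSS classes (`set`, `name`, `pieces`) and the resulting columns are all invented, and an inline string is parsed with `minimal_html()` so the example runs without network access.

```r
library(rvest)

# Made-up HTML standing in for a listing page (assumption: the real
# source site and its markup are not described in this README).
page <- minimal_html('
  <div class="set"><span class="name">Example Set A</span><span class="pieces">120</span></div>
  <div class="set"><span class="name">Example Set B</span><span class="pieces">560</span></div>
')

# Select one node per set, then pull one field per node.
cards <- html_elements(page, "div.set")

scraped <- data.frame(
  name   = html_text2(html_element(cards, ".name")),
  pieces = as.integer(html_text2(html_element(cards, ".pieces")))
)

print(scraped)
```

Against a live site, `read_html()` with the page URL would replace `minimal_html()`, and the selectors would need to match that site's actual markup.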

Libraries Used

  • tidyverse
  • dplyr
  • rvest