mariamabdelati/lego_big_data

Lego Data Analysis Project

Individual Big Data Programming Project -- University Junior Year

Mariam Abdelati

Project Description

Before beginning my main data science project, I originally planned to analyse Lego data to determine how the features of Lego sets, such as price, number of pieces, and theme, correlate with each other. The analysis was intended to answer the following questions:

  1. How does the theme of a Lego set affect its price?
  2. Which Lego sets have the most pieces? Is there a common theme between the top Lego sets?
  3. Which are the most common themes for Lego sets?
  4. How many Lego sets have been made for each theme?
  5. What are the top age groups for Lego sets? Is there a common theme between the top ages?
  6. Which Lego sets have the most minifigs? Is there a common theme between the top Lego sets?
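
As a rough illustration of how a few of these questions might be approached with dplyr, here is a minimal sketch. The data frame `lego_sets` and its column names (`set_name`, `theme`, `price`, `pieces`) are assumptions made up for this example; they are not the scraped dataset or its actual attribute names.

```r
library(dplyr)

# Hypothetical sample data; the real scraped dataset and its column
# names are not shown in this README.
lego_sets <- data.frame(
  set_name = c("A", "B", "C", "D", "E"),
  theme    = c("City", "City", "Star Wars", "Technic", "Star Wars"),
  price    = c(19.99, 49.99, 159.99, 89.99, 29.99),
  pieces   = c(120, 450, 1350, 900, 250)
)

# Questions 3 and 4: number of sets per theme, most common first.
sets_per_theme <- lego_sets %>%
  count(theme, name = "n_sets", sort = TRUE)

# Question 1: a typical (median) price for each theme.
price_by_theme <- lego_sets %>%
  group_by(theme) %>%
  summarise(median_price = median(price), .groups = "drop")

# Question 2: the three sets with the most pieces, largest first.
top_by_pieces <- lego_sets %>%
  slice_max(pieces, n = 3)
```

The same pattern (group by theme, then summarise or rank) would likely carry over to the age-group and minifig questions, given suitable columns.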

I built a scraping tool in R that collected a total of 19,239 unique records with 16 unique attributes. Because my aims and scope changed and I wanted to include textual data, I decided to change the focus of my individual project and analyse lyrics instead. The scraped data will be used later in a personal project to gain more experience in big data programming.

Libraries Used

  • tidyverse
  • dplyr
  • rvest
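
To give a sense of how rvest can turn HTML into a table of records, here is a minimal sketch that parses an inline HTML snippet rather than a live website. The markup, CSS selectors, and column names are all hypothetical and chosen only for illustration; this is not the project's actual scraper or its source site.

```r
library(rvest)

# Hypothetical markup standing in for a page that lists Lego sets.
page <- minimal_html('
  <div class="set"><span class="name">Example Set A</span><span class="pieces">120</span></div>
  <div class="set"><span class="name">Example Set B</span><span class="pieces">450</span></div>
')

# Select one node per set, then pull each field out of every node.
cards <- html_elements(page, "div.set")

scraped <- data.frame(
  set_name = html_text2(html_element(cards, "span.name")),
  pieces   = as.integer(html_text2(html_element(cards, "span.pieces")))
)
```

Using `html_element()` (singular) on the set of cards returns exactly one result per card, which keeps the columns aligned even when a field is missing from some entries.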
