Lil-Data

The official repository for Team Lil Data, CMPT 732 final project at Simon Fraser University.

The project site is at https://www.devxia.com/Lil-Data

Repository structure:

2015 sample data: sample Twitch stream dataset, shared by Mr. Cong Zhang ([email protected])
analysis: Python code that analyzes the 2015 and 2018 Twitch datasets and the GiantBomb.com dataset, mostly using Apache Spark
- most_popular_category.py: Evaluate the popularity of a category using four different perspectives
- Best_time_of_stream.py: Evaluate the best time in a day for streaming
data collecting: Crawler code that collects data from the official Twitch API every 30 minutes and fetches information on all games from GiantBomb.com
- crawl_twitch.py: Uses the Twitch API to download live-streaming data from Twitch. The crawler ran with multiple threads every half hour from 11/13/2018 to 11/26/2018 and collected about 50 gigabytes of data in total
- fetch_games.py: Uses the API provided by GiantBomb.com to collect game data from its database
- get_giantbomb_genre.py: Using the guid of each game collected from the GiantBomb API, calls another GiantBomb API to get detailed game information, including genres
ETL: Extract, transform, load code written in Python, using Apache Spark
- twitch_raw_data_clean.py: grab useful features for stream objects and channel objects from dirty json files
- twitch_dataframe_ETL.py: reconstruct the dataframe using customized schema
- giantbomb_game_info_ETL.py: grab useful features for game objects from dirty json files
- join_with_giantbomb.py: join stream with game, creating a new table including both stream and game information
- read_guid.py: get game genres from dirty json files
frontend: web frontend written in JavaScript, mostly in React.js
docs: production build of react web frontend
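
The join described for join_with_giantbomb.py can be illustrated with a minimal sketch. It uses plain Python dictionaries instead of Spark so that it runs without a cluster. The field names (`stream_id`, `game`, `viewers`, `name`, `genres`) and the sample rows are assumptions made for illustration, not the project's actual schema.

```python
def join_streams_with_games(streams, games):
    """Left-join stream records to game records on the game name."""
    # Index games by name so each stream lookup is a dictionary access.
    games_by_name = {g["name"]: g for g in games}
    joined = []
    for stream in streams:
        game = games_by_name.get(stream["game"])
        row = dict(stream)
        # Streams whose game is missing from the game table keep an empty genre list.
        row["genres"] = list(game["genres"]) if game else []
        joined.append(row)
    return joined


# Hypothetical sample rows, not taken from the collected data.
streams = [
    {"stream_id": 1, "game": "Game A", "viewers": 120},
    {"stream_id": 2, "game": "Game B", "viewers": 45},
    {"stream_id": 3, "game": "Unlisted Game", "viewers": 7},
]
games = [
    {"name": "Game A", "genres": ["Shooter"]},
    {"name": "Game B", "genres": ["Strategy", "Simulation"]},
]
joined = join_streams_with_games(streams, games)
```

In Spark, the same idea would roughly correspond to a left join of the stream DataFrame with the game DataFrame on the game name.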

Local Setup

Yarn is our package manager of choice for the frontend project. It is not to be confused with Apache Hadoop YARN, even though this is also a "Big Data" project.

To set up the environment for the frontend app (/frontend), run yarn install or just yarn.

To start the app, run yarn start

To perform static type checking, run yarn flow

Analysis Results

The results are under the 'results' folder.

Top 20 most popular categories, evaluated by: https://github.com/harrisonxia/Lil-Data/blob/master/Analysis/popular_categories_time_frames.py
1. Number of reviews on GiantBomb (these are all 0, however)
2. Number of live streams that were broadcasting a game in this category
3. Number of viewers that are watching a game in this category
4. Number of followers that are following a channel that broadcasts this category of game
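
As a rough illustration of how categories could be ranked from these perspectives, here is a minimal sketch in plain Python (not the Spark code in the repository). The row fields `category`, `channel`, `viewers`, and `followers` and the sample values are assumptions. Followers are counted once per channel within a category, which is one possible reading of the fourth perspective.

```python
from collections import defaultdict


def rank_categories(rows, top_n=20):
    """Rank categories by stream count, total viewers, and channel followers."""
    streams = defaultdict(int)
    viewers = defaultdict(int)
    # category -> {channel: followers}; each channel is counted once per category.
    followers_by_channel = defaultdict(dict)
    for row in rows:
        category = row["category"]
        streams[category] += 1
        viewers[category] += row["viewers"]
        followers_by_channel[category][row["channel"]] = row["followers"]
    followers = {c: sum(chs.values()) for c, chs in followers_by_channel.items()}

    rankings = {}
    metrics = {"streams": streams, "viewers": viewers, "followers": followers}
    for metric, totals in metrics.items():
        # Sort by descending total, breaking ties alphabetically by category.
        ordered = sorted(totals, key=lambda c: (-totals[c], c))
        rankings[metric] = ordered[:top_n]
    return rankings


# Hypothetical sample rows, not taken from the collected data.
rows = [
    {"category": "Shooter", "channel": "a", "viewers": 100, "followers": 1000},
    {"category": "Shooter", "channel": "a", "viewers": 150, "followers": 1000},
    {"category": "Strategy", "channel": "b", "viewers": 300, "followers": 500},
    {"category": "Puzzle", "channel": "c", "viewers": 10, "followers": 5000},
]
rankings = rank_categories(rows)
```

The sample shows why several perspectives are useful: each of the three metrics puts a different category first.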
Best time frame to broadcast a stream, evaluated by: https://github.com/harrisonxia/Lil-Data/blob/master/Analysis/Best_time_of_stream.py
1. Number of streams:
  1. Number of streams in each time frame on each day
  2. Total number of streams in each time frame over the entire data collection period
  3. Trend in the number of streams for each time frame across all days
2. Number of viewers:
  1. Number of viewers in each time frame on each day
  2. Total number of viewers in each time frame over the entire data collection period
  3. Trend in the number of viewers for each time frame across all days
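
The three views listed above (per day, overall total, and trend across days) can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the 4-hour time frames, the `(captured_at, viewers)` record shape, and the sample values are hypothetical, and the repository's Spark code may define time frames differently. Because each record stands for one stream seen in one crawl snapshot, a stream that stays live across several snapshots is counted more than once here.

```python
from collections import defaultdict
from datetime import datetime


def time_frame(ts, frame_hours=4):
    """Label the fixed-width time frame that contains the timestamp."""
    start = (ts.hour // frame_hours) * frame_hours
    return f"{start:02d}:00-{start + frame_hours:02d}:00"


def summarize(records, frame_hours=4):
    """Count streams and viewers per (day, frame) and per frame overall."""
    per_day = defaultdict(lambda: {"streams": 0, "viewers": 0})
    overall = defaultdict(lambda: {"streams": 0, "viewers": 0})
    for captured_at, viewers in records:
        frame = time_frame(captured_at, frame_hours)
        day = captured_at.date().isoformat()
        for bucket in (per_day[(day, frame)], overall[frame]):
            bucket["streams"] += 1
            bucket["viewers"] += viewers
    return dict(per_day), dict(overall)


def trend(per_day, frame, metric="streams"):
    """Return (day, value) pairs for one time frame, ordered by day."""
    return sorted(
        (day, totals[metric])
        for (day, f), totals in per_day.items()
        if f == frame
    )


# Hypothetical snapshot records: (capture time, viewer count of one stream).
records = [
    (datetime(2018, 11, 13, 9, 30), 100),
    (datetime(2018, 11, 13, 10, 0), 50),
    (datetime(2018, 11, 13, 21, 0), 400),
    (datetime(2018, 11, 14, 9, 0), 70),
]
per_day, overall = summarize(records)
best_by_viewers = max(overall, key=lambda f: overall[f]["viewers"])
```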
Viewer distribution across days of the week: https://github.com/harrisonxia/Lil-Data/blob/master/Analysis/views_in_dow.py
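
A day-of-week distribution can be computed by grouping viewer counts on the weekday of each capture time. The sketch below is a plain-Python illustration, not the code in views_in_dow.py; the `(captured_at, viewers)` record shape and the sample values are assumptions.

```python
from collections import defaultdict
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def viewers_by_day_of_week(records):
    """Sum viewer counts per weekday, returning every weekday in order."""
    totals = defaultdict(int)
    for captured_at, viewers in records:
        # datetime.weekday() returns 0 for Monday through 6 for Sunday.
        totals[DAYS[captured_at.weekday()]] += viewers
    return {day: totals[day] for day in DAYS}


# Hypothetical records; 2018-11-13 and 2018-11-20 are Tuesdays.
records = [
    (datetime(2018, 11, 13, 9, 0), 100),
    (datetime(2018, 11, 20, 9, 0), 50),
    (datetime(2018, 11, 14, 9, 0), 30),
]
distribution = viewers_by_day_of_week(records)
```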
