smartlin5228/BitTigerCapstone

BitTiger Capstone Project

Project Track

Track 1 - Data Mining and Analysis System for Cloudacl

Project Objective:

  • Building a data pipeline whose functionality includes, but is not limited to, data processing, data storage, and data visualization.

  • Extracting data features and making recommendations based on the results.

Project Introduction:

Cloudacl is the leading provider of security and infrastructure services that make the Internet safer through integrated web content filtering. It provides plugins for Chrome and Firefox, and mobile apps for Android and iPhone. Every browser request goes through our cloud service and is classified according to a customized policy. Our cloud service captures every browsing request, including the IP address, timestamp, URL, and category of the URL.

Team Members:

Chen Zhiting

Haotian Zhang

Song Yu

Sun Yang

Tianyang Lin

Data Format:

Web Log

Sample Data:

179.39.12.146 - - [05/Mar/2017:00:00:00 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//openload.co/stream/6fF2Dk85Wqw%7E1488758111%7E179.39.0.0%7E99LJRJZj%3Fmime%3Dtrue&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 6.3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36"
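The sample line appears to follow the common "combined" web server log layout, with the filtered URL and its category carried in the query string of the request. A minimal Python sketch of pulling out the fields named in the introduction (IP, timestamp, URL, category) is shown below. The regular expression, the `parse_line` function, and the output field names are illustrative assumptions, not part of the project code.

```python
import re
from datetime import datetime
from urllib.parse import urlsplit, parse_qs

# Assumed layout: ip, two unused fields, [timestamp], "request", status, size,
# "referer", "user agent". Inferred from the single sample line only.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

SAMPLE = (
    '179.39.12.146 - - [05/Mar/2017:00:00:00 +0000] '
    '"GET /axis2/services/WebFilteringService/getCategoryByUrl'
    '?app=chrome_antiporn&ver=0.19.7.1'
    '&url=https%3A//openload.co/stream/6fF2Dk85Wqw%7E1488758111%7E179.39.0.0'
    '%7E99LJRJZj%3Fmime%3Dtrue&cat=unknown HTTP/1.1" 200 134 "-" '
    '"Mozilla/5.0 (Windows NT 6.3) AppleWebKit/537.36 (KHTML, like Gecko) '
    'Chrome/56.0.2924.87 Safari/537.36"'
)


def parse_line(line):
    """Return a dict of extracted fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    # parse_qs percent-decodes the values, so `url` comes back in readable form.
    query = parse_qs(urlsplit(match.group("target")).query)
    return {
        "ip": match.group("ip"),
        "timestamp": datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z"),
        "app": query.get("app", [None])[0],
        "version": query.get("ver", [None])[0],
        "url": query.get("url", [None])[0],
        "category": query.get("cat", [None])[0],
        "status": int(match.group("status")),
    }


if __name__ == "__main__":
    record = parse_line(SAMPLE)
    print(record["ip"], record["timestamp"].isoformat(), record["category"])
    print(record["url"])
```

Returning `None` for lines that do not match lets a downstream stage count or log malformed records instead of failing on them; whether that is the right policy depends on the pipeline design.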

Timeline:

  1. Week 1 (Mar 19) - Brainstorming and planning the project

    • Interpret data format
    • Research industry performance expectations for the planned data system
    • Find suitable techniques to extract valuable data
    • Explore possible services and technical stacks for the project
    • Distribute tasks to team members
  2. Week 2 (Mar 26) - Building Data Management System

    • Build data pipelines, including message queue, data storage and data processing models
    • Ensure the system meets industry performance requirements and follows industry practices
  3. Week 3 (Apr 2) - Improvement and Customization

    • Develop data set processes for data modeling, mining and production
    • Create custom software components (e.g. specialized UDFs) and analytics applications
    • Apply data visualization to data reports
  4. Week 4 (Apr 7) - Finishing the project

    • Collect performance reports
    • Create project documentation
    • Make slides for presentation
