Group project for course 05-839: Interactive Data Science. We created a beginner's guide to classification algorithms.

jujuwong21/fp-classification_clarification


CMU Interactive Data Science Final Project

Abstract

The goal of our project is to explain classification algorithms to readers with no prior knowledge of machine learning. Our solution is an interactive “scrollytelling” narrative centered on a case study in which readers explore whether education and age affect income level. The narrative first introduces the dataset, letting readers understand the basis of our models through exploratory data analysis. Next, we cover challenges common to all machine learning problems, such as the role of training data and the risk of overfitting. Finally, we introduce three classification algorithms: K-Nearest Neighbors, Decision Trees, and Logistic Regression. These sections include interactive visualizations that let readers see how changing the hyperparameters affects each model's results. We hope that by reading and interacting with the article, readers will understand at a high level how these classification algorithms work, and that there are multiple ways to achieve similar outcomes with machine learning.

Work distribution

Juliette

I primarily worked on the visualizations for the report. I created the visualizations for the EDA and mystery man sections using Vega-Lite, and I helped use Idyll to make the decision tree visualization interactive. Additionally, I created the content (text and graphs) for the logistic regression section and ran the algorithms to compute accuracies for the model selection section.

Nathan

I was primarily responsible for the styling of our web application, which meant writing custom CSS and bolding text. Additionally, since I was the only team member with JavaScript experience, I built the custom React components for our application and implemented much of the app's functionality. I also helped build the decision tree in D3, and I brainstormed and helped with the EDA and mystery person sections. Finally, I was responsible for figuring out how to deploy our Idyll application to GitHub Pages.

Christian

I mainly handled the data side of things, including creating the dataset and the data structures for the decision trees. I created the interactive visualizations for the decision tree boundaries and KNN in Vega-Lite and helped visualize the trees in D3. I also wrote some of the text that explains the inner workings of the classification algorithms.

Laura

For this project, I was primarily in a product management role. I managed the team's shared folders, organized meetings, and helped to define the narrative. I also helped with the layout design, sourced images and gifs, wrote narrative text, and inserted them into the Idyll application.

Project Process

Our project progressed relatively slowly in the initial stage because we had to gain familiarity with the Idyll markup language. The major challenge was integrating Idyll with D3 and Vega-Lite components, for which documentation was limited. For data, we initially used the Iris flower dataset, but we later settled on the 1994 Census dataset to perform income prediction. We created a plan for the three algorithms we wanted to cover and the technologies we would use to visualize them, which allowed us to work on the text and the visualizations simultaneously. To keep the transitions from text to visualizations seamless, we revised the text iteratively as we updated the visualizations. In the final stage of the project, we focused on making stylistic adjustments in Idyll based on group feedback and adding decoration to our visualizations.

Deliverables

Proposal

  • The URL at the top of this readme needs to point to your application online. It should also list the names of the team members.
  • A completed proposal. The contact should submit it as a PDF on Canvas.

Design review

  • Develop a prototype of your project.
  • Create a 5-minute video that demonstrates your project and lists any questions you have for the course staff. The contact should submit the video on Canvas.

Final deliverables

  • All code for the project should be in the repo.
  • A 5-minute video demonstration.
  • Update Readme according to Canvas instructions.
  • A detailed project report. The contact should submit the video, and the report as a PDF, on Canvas.

Running the Project

  • Install npm (follow this link)
  • Install idyll
    • npm install -g idyll
  • Install dependencies for idyll
    • npm i
  • Use the ES5 builds of vega-lite and vega-embed (this must be done manually)
    • For vega-lite: go to node_modules/vega-lite and copy vega-lite.js from the build-es5 directory to the build directory
    • For vega-embed: go to node_modules/vega-embed and copy vega-embed.js from the build-es5 directory to the build directory
  • Run the project
    • idyll
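
The manual ES5 copy step above could be scripted. The following is a minimal sketch, not part of the repository: the function name `copy_es5_build` is hypothetical, and it assumes the installed vega-lite and vega-embed versions ship a `build-es5` directory laid out as described in the steps above.

```shell
#!/bin/sh
# Sketch only: automates the manual "copy the ES5 build" step.
# Assumption: each package ships build-es5/<package>.js, as the steps above describe.

# copy_es5_build PACKAGE [NODE_MODULES_DIR]
# Copies <dir>/PACKAGE/build-es5/PACKAGE.js over <dir>/PACKAGE/build/PACKAGE.js.
# NODE_MODULES_DIR defaults to ./node_modules (i.e. run from the project root).
copy_es5_build() {
  pkg="$1"
  root="${2:-node_modules}"
  src="$root/$pkg/build-es5/$pkg.js"
  dest_dir="$root/$pkg/build"

  # Fail loudly instead of silently leaving the non-ES5 build in place.
  if [ ! -f "$src" ]; then
    echo "missing $src (is $pkg installed, and does it ship build-es5?)" >&2
    return 1
  fi

  mkdir -p "$dest_dir"
  cp "$src" "$dest_dir/$pkg.js"
}
```

After `npm i`, one way to use it is to source the script from the project root and run `copy_es5_build vega-lite` and `copy_es5_build vega-embed` before starting `idyll`. Because `npm i` may restore the original files, the copy would need to be repeated after each reinstall.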
