
# Principal Component Analysis for Semantic Classification

Final Project for AMATH 582: Computational Methods for Data Analysis

Principal component analysis (PCA) and classification via supervised learning are two popular topics in data science today. In our project, we combine techniques from both areas to classify news articles based on their word frequency content. We find that we can accurately classify the data by projecting onto a small subset of principal components, reducing the feature space from nearly 10,000 dimensions to only 4. We also compare results from the traditional and robust PCA formulations, and discuss what additional semantic information can be inferred from our results.
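The PCA-then-classify pipeline summarized above can be sketched as follows. This is a minimal illustration, assuming a document-by-word count matrix with one row per article. The synthetic corpus, the 1,000-word vocabulary, the helper names, and the nearest-centroid classifier are all hypothetical choices made for the example; they are not the project's data and not necessarily its classifier. Only the idea of keeping 4 principal components comes from the summary.

```python
import numpy as np

def pca_fit(X, k):
    """Return column means and the top-k principal directions of X (rows = documents)."""
    mu = X.mean(axis=0)
    # Thin SVD of the centered data; rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k]

def pca_project(X, mu, V):
    """Project documents onto the k retained principal components."""
    return (X - mu) @ V.T

def fit_centroids(Z, y):
    """Hypothetical classifier: mean projected coordinates of each class."""
    labels = np.unique(y)
    return labels, np.stack([Z[y == c].mean(axis=0) for c in labels])

def predict(Z, labels, centroids):
    # Squared Euclidean distance from every document to every class centroid.
    d = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return labels[d.argmin(axis=1)]

# Hypothetical synthetic corpus: each of 3 classes over-uses its own block of 50 words.
rng = np.random.default_rng(0)
n_per_class, vocab, n_classes = 100, 1000, 3
rates = np.full((n_classes, vocab), 0.5)
for c in range(n_classes):
    rates[c, c * 50:(c + 1) * 50] = 5.0
y = np.repeat(np.arange(n_classes), n_per_class)
X = rng.poisson(rates[y]).astype(float)  # word counts, one row per document

# Fit PCA on a training split only, then classify held-out documents in 4-D.
idx = rng.permutation(len(y))
train, test = idx[:200], idx[200:]
mu, V = pca_fit(X[train], k=4)
labels, centroids = fit_centroids(pca_project(X[train], mu, V), y[train])
y_hat = predict(pca_project(X[test], mu, V), labels, centroids)
acc = (y_hat == y[test]).mean()
print("test accuracy:", acc)
```

Fitting the mean and principal directions on the training split alone keeps held-out articles from leaking into the projection. Robust PCA, which the project also compares, would replace the plain SVD step with a low-rank plus sparse decomposition; that variant is not shown here.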