This repository showcases assignments from the CIS-5450 Big Data Analytics course. Each assignment highlights key skills and concepts in data science, machine learning, and big data technologies.
- Homework 1: Data Wrangling with Pandas
- Homework 2: SQL with Spotify Data
- Homework 3: Spark SQL and Amazon Reviews
- Homework 4: Machine Learning with Apache Spark ML
- Homework 5: Deep Learning with PyTorch
- Homework 1: Data Wrangling with Pandas
  - Skills Demonstrated: Data cleaning, aggregation, and visualization using Pandas.
  - Project Summary: Analyzed the performance of several airlines by wrangling and cleaning raw data.
  - File: `homework1.ipynb`
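
The clean-then-aggregate workflow described above can be sketched in pandas as follows. This is a minimal illustration, not code from `homework1.ipynb`. The columns `airline`, `dep_delay`, and `cancelled` and the toy rows are hypothetical. They were chosen only to show three steps: normalizing messy values, coercing types, and summarizing per airline.

```python
import pandas as pd

# Hypothetical raw flight records; the schema is an assumption, not the
# dataset actually used in homework1.ipynb.
raw = pd.DataFrame({
    "airline": ["AA", "aa ", "DL", None, "DL", "UA"],
    "dep_delay": ["5", "15", "n/a", "30", "0", "45"],
    "cancelled": [0, 0, 1, 0, 0, 1],
})


def clean_flights(df: pd.DataFrame) -> pd.DataFrame:
    """Normalize carrier codes, coerce delays to numbers, drop rows with no airline."""
    out = df.copy()
    out["airline"] = out["airline"].str.strip().str.upper()
    # Non-numeric entries such as "n/a" become NaN instead of raising an error.
    out["dep_delay"] = pd.to_numeric(out["dep_delay"], errors="coerce")
    return out.dropna(subset=["airline"])


def airline_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Per-airline flight count, mean departure delay, and cancellation rate."""
    return (
        df.groupby("airline")
        .agg(
            flights=("cancelled", "size"),
            mean_delay=("dep_delay", "mean"),  # NaN delays are skipped
            cancel_rate=("cancelled", "mean"),
        )
        .sort_values("mean_delay")  # most punctual airline first
    )


summary = airline_summary(clean_flights(raw))
print(summary)
```

Named aggregation keeps each output column's meaning explicit, which makes the summary easy to extend with extra metrics or to feed into a plot.
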
- Homework 2: SQL with Spotify Data
  - Skills Demonstrated: SQL querying with `pandasql` and text analysis.
  - Project Summary: Explored a Spotify dataset of song reviews and statistics to uncover trends and insights.
  - File: `homework2.ipynb`
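
`pandasql` runs SQL queries against an in-memory SQLite database. The grouped, filtered queries this kind of analysis relies on can therefore be sketched with Python's built-in `sqlite3` module alone. The `songs` table, its columns, and its rows are hypothetical. With `pandasql`, the same query string would instead be passed to `sqldf` together with the DataFrames in scope.

```python
import sqlite3

# Hypothetical song data; table and column names are assumptions.
songs = [
    ("Song A", "pop", 71),
    ("Song B", "rock", 64),
    ("Song C", "pop", 88),
    ("Song D", "rock", 52),
    ("Song E", "jazz", 45),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE songs (title TEXT, genre TEXT, popularity INTEGER)")
conn.executemany("INSERT INTO songs VALUES (?, ?, ?)", songs)

# Average popularity per genre, keeping only genres with at least two songs.
query = """
SELECT genre,
       COUNT(*) AS n_songs,
       ROUND(AVG(popularity), 1) AS avg_popularity
FROM songs
GROUP BY genre
HAVING COUNT(*) >= 2
ORDER BY avg_popularity DESC
"""
rows = conn.execute(query).fetchall()
conn.close()
print(rows)
```

`HAVING` filters groups after aggregation, which `WHERE` cannot do. That is why the single-song genre drops out here.
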
- Homework 3: Spark SQL and Amazon Reviews
  - Skills Demonstrated: Big data processing with Spark SQL and cluster computing on AWS EMR.
  - Project Summary: Processed datasets of Amazon products and their reviews using Spark SQL on an EMR cluster.
  - File: `homework3.ipynb`
- Homework 4: Machine Learning with Apache Spark ML
  - Skills Demonstrated: Predictive modeling with Apache Spark ML and scikit-learn.
  - Project Summary: Built predictive models to estimate the ratings of new Airbnb properties.
  - File: `homework4.ipynb`
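
This is a library-free illustration of estimating ratings for unseen listings. The sketch fits a one-feature ordinary least squares model on training rows, then scores its predictions on held-out rows with root mean squared error (RMSE). It swaps in a hand-written simple linear regression in place of the Spark ML and scikit-learn estimators mentioned above. The feature (host response rate) and all values are hypothetical.

```python
import math


def fit_simple_linear(xs, ys):
    """Ordinary least squares for y = slope * x + intercept with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical training data: host response rate (%) vs. listing rating.
train_x = [60, 70, 80, 90, 100]
train_y = [4.1, 4.2, 4.3, 4.7, 4.8]
slope, intercept = fit_simple_linear(train_x, train_y)

# Predict ratings for "new" listings and compare against held-out ratings.
test_x = [75, 95]
test_y = [4.3, 4.7]
preds = [slope * x + intercept for x in test_x]
error = rmse(test_y, preds)
print(preds, error)
```

Evaluating only on rows the model never saw during fitting is what makes the RMSE a fair estimate for new properties. The library estimators apply the same idea through their train/test splitting utilities.
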
- Homework 5: Deep Learning with PyTorch
  - Skills Demonstrated: Neural network modeling and image classification with PyTorch.
  - Project Summary: Designed a deep learning model to classify images from the CIFAR-10 dataset.
  - File: `homework5.ipynb`
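
A small convolutional network for CIFAR-10-shaped input (3-channel 32x32 images, 10 classes) might look like the following in PyTorch. The architecture, layer sizes, optimizer settings, and random stand-in batch are assumptions for illustration, not the model from `homework5.ipynb`. A real run would load CIFAR-10 (for example via `torchvision.datasets.CIFAR10`) and loop over many batches and epochs.

```python
import torch
from torch import nn


class SmallCifarCNN(nn.Module):
    """Two conv/pool blocks followed by a linear classifier over 10 classes."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        # Shapes in comments are (channels, height, width) per image.
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),   # (3,32,32) -> (32,32,32)
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> (32,16,16)
            nn.Conv2d(32, 64, kernel_size=3, padding=1),  # -> (64,16,16)
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> (64,8,8)
        )
        self.classifier = nn.Linear(64 * 8 * 8, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        return self.classifier(torch.flatten(x, start_dim=1))  # raw logits


torch.manual_seed(0)
model = SmallCifarCNN()
loss_fn = nn.CrossEntropyLoss()  # applies log-softmax to the logits internally
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# One training step on a random batch shaped like CIFAR-10 images: (N, 3, 32, 32).
images = torch.randn(8, 3, 32, 32)
labels = torch.randint(0, 10, (8,))
logits = model(images)
loss = loss_fn(logits, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(logits.shape, loss.item())
```

The model returns unnormalized logits because `nn.CrossEntropyLoss` expects them. Adding a softmax layer inside the network would be redundant.
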

Key skills across the assignments:
- Data wrangling and visualization with Pandas.
- SQL querying with `pandasql`.
- Big data processing with Apache Spark SQL and AWS EMR.
- Predictive modeling using Spark ML and scikit-learn.
- Deep learning with PyTorch for image classification.