Before beginning, make sure your environment is properly set up, as documented in Hands-On 0: https://github.com/ucsc-cse-40/HO0
In this assignment, you will start with an unprocessed, real-world dataset and clean it so that it is ready for use with machine learning algorithms.
The objective of this assignment is for you to learn about:
- Finding errors and inconsistencies in data.
- Extracting key pieces of information from a data column.
- Assigning types to data columns.
- Understanding and finding outliers in your data.
- Replacing missing pieces of data.
- Encoding data.
- Joining data.
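To make these objectives concrete, here is a minimal sketch of what each step can look like on a tiny, made-up table. It is not the assignment's dataset or its required approach: every field name and value is invented, the helper names (`parse_height`, `clean`) are hypothetical, and the outlier rule (more than three times the median) is just one simple choice among many. It uses only the Python standard library so that it runs as-is.

```python
import re
import statistics

# Hypothetical raw records; all field names and values are invented for illustration.
raw_rows = [
    {"id": "1", "height": "170 cm", "color": "red"},
    {"id": "2", "height": "165cm", "color": "Blue "},
    {"id": "3", "height": "", "color": "blue"},
    {"id": "4", "height": "1720 cm", "color": "RED"},
    {"id": "5", "height": "168 cm", "color": "red"},
]

# Hypothetical second table, keyed by id, to join against.
cities = {1: "Santa Cruz", 2: "Watsonville", 5: "Capitola"}


def parse_height(text):
    """Extract the numeric part of a string such as '170 cm' as a float (None if absent)."""
    match = re.search(r"\d+(\.\d+)?", text)
    return float(match.group()) if match else None


def clean(rows, lookup):
    cleaned = []
    for row in rows:
        cleaned.append({
            "id": int(row["id"]),                    # assign a type
            "height": parse_height(row["height"]),   # extract the key information
            "color": row["color"].strip().lower(),   # fix inconsistent spellings
        })

    # Treat values far above the median as outliers (assumed rule: > 3x the median).
    known = [r["height"] for r in cleaned if r["height"] is not None]
    limit = 3 * statistics.median(known)
    for r in cleaned:
        if r["height"] is not None and r["height"] > limit:
            r["height"] = None

    # Replace missing values with the median of the remaining values.
    fill = statistics.median(r["height"] for r in cleaned if r["height"] is not None)
    for r in cleaned:
        if r["height"] is None:
            r["height"] = fill

    # One-hot encode the categorical column and join the second table on id.
    categories = sorted({r["color"] for r in cleaned})
    for r in cleaned:
        for category in categories:
            r[f"color_{category}"] = int(r["color"] == category)
        r["city"] = lookup.get(r["id"])  # left join: unmatched ids get None
    return cleaned


cleaned = clean(raw_rows, cities)
```

In this sketch the outliers are removed before the fill value is computed, so a single bad entry does not skew the replacement for missing data.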
Submit your assignment by running the following command from your local repository directory:

    python3 -m cse40.autograder submit
This script reads your cruzid, password, and assignment id from config.json and submits your work to a server controlled by the TAs. The server runs the tests and reports the results back to you.