- Clone this repo
git clone [email protected]:MichiganDataScienceTeam/googleanalytics.git
If you don't have an SSH key set up on GitHub, the command above will fail. As a temporary workaround, clone over HTTPS instead:
git clone https://github.com/MichiganDataScienceTeam/googleanalytics.git
- Download the data from Google Drive and place it in ./data
- Unzip the data and make sure the extracted files have read permissions
cd data
unzip train.csv.zip
unzip test.csv.zip
unzip sample_submission.csv.zip
chmod +r train.csv test.csv sample_submission.csv
cd ..
- Create a virtualenv named env to prevent version conflicts (this will likely solve most package installation issues you run into).
sudo pip install virtualenv
python -m virtualenv env
- Activate the virtualenv
source env/bin/activate
- Install the required packages.
pip install -r requirements.txt
- Create the train/validation split.
python split_train_valid.py
- Make sure the dataset is in the correct place and run the exploration code. Note: removing the --debug flag will cause the full dataset to be loaded, which may take a long time on your machine.
python dataset.py --debug
python explore.py --debug
- Create an account on GitHub and add an SSH key to your account
- Ask @stroud on Slack to add you to the MDST Organization
- Assign yourself to an issue
- Create a branch and write your code
- Submit a pull request when you are done!
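
If the exploration code fails during setup, a quick sanity check of the earlier steps can help narrow things down. The sketch below is not part of the repository: the script name check_setup.py and the helper functions are hypothetical, and it only assumes the three file names from the unzip step and the ./data location from the download step. It reports whether a virtualenv appears to be active and which data files are missing or unreadable.

```python
# check_setup.py (hypothetical helper, not part of the repository)
import os
import sys

# File names taken from the unzip step; ./data comes from the download step.
EXPECTED_FILES = ["train.csv", "test.csv", "sample_submission.csv"]


def in_virtualenv():
    """Return True if the interpreter appears to run inside a virtualenv."""
    # Older virtualenv releases set sys.real_prefix; newer releases (and venv)
    # make sys.prefix differ from sys.base_prefix instead.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base_prefix


def unreadable_files(data_dir="data", names=EXPECTED_FILES):
    """Return the expected file names that are missing or not readable."""
    problems = []
    for name in names:
        path = os.path.join(data_dir, name)
        if not (os.path.isfile(path) and os.access(path, os.R_OK)):
            problems.append(name)
    return problems


if __name__ == "__main__":
    if not in_virtualenv():
        print("Warning: no active virtualenv detected (try: source env/bin/activate)")
    bad = unreadable_files()
    if bad:
        print("Missing or unreadable in ./data: " + ", ".join(bad))
    else:
        print("All data files are present and readable.")
```

Run it from the repository root (for example, python check_setup.py) so that the relative data directory resolves correctly.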