Heryhelder/book_genres_analysis

Predict book genres

Intro

This project uses machine learning (ML) models and natural language processing (NLP) to predict a book's genre from its description.
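As a rough illustration of the idea, here is a minimal sketch of a description-to-genre classifier. It is not the pipeline used in the notebook: the TF-IDF features, the logistic regression model, and the four toy descriptions and genre labels are all assumptions made for the example, using only scikit-learn from the dependency list.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy data invented for this sketch; the real project uses Goodreads descriptions.
descriptions = [
    "A detective investigates a murder in a quiet village.",
    "The inspector hunts a killer through foggy London streets.",
    "A young wizard learns magic and battles a dark sorcerer.",
    "Dragons and elves fight for the throne of an enchanted kingdom.",
]
genres = ["mystery", "mystery", "fantasy", "fantasy"]

# Turn each description into TF-IDF features, then fit a linear classifier.
# Both choices are assumptions, not necessarily what the notebook does.
model = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english"),
    LogisticRegression(max_iter=1000),
)
model.fit(descriptions, genres)

# Predict the genre of an unseen description.
prediction = model.predict(["A wizard rides a dragon to the enchanted kingdom."])
print(prediction[0])
```

With real data, the same pipeline shape would be fitted on the description column and evaluated on a held-out split rather than on a single hand-written sentence.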

Data

The original data comes from the Goodreads Book Datasets on Kaggle. It was adapted using the script "get_book_genre.py".

The generated data is here. Download it and put it in a "new_data" folder.

To start from scratch, download the Goodreads Book Datasets and process them with the "get_book_genre.py" script. Be careful: it took almost 24 hours to run on my PC. The newly generated data will be written to the "new_data" folder.

All the analysis is in the "genre_analysis.ipynb" file.

Dependencies

  • NumPy
  • Pandas
  • NLTK
  • Scikit-learn
  • Regex