miohana/data-preprocessing

Example of Data Preprocessing using Python 🐍

We all produce a lot of data. All the time.

We need to process that data to make it useful and to extract high-quality information from text, which can then be used for predictions and natural language processing.

The main objective here is to give a short overview of some tools that data scientists use for data mining.

It's important to always focus on the business and choose the tools that fit it best.

The language

This project uses Python 3.6.8.

The content

The text we use is an extract from this book about Machine Learning, written by Alex Smola (great stuff, btw).

About the techniques used

The techniques that we are going to use are:

1-Case alignment (case normalization)

2-Tokenization

3-Stopwords removal

4-Stemming

5-Lemmatization
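
To give a feel for what each of the five techniques does, here is a minimal sketch that uses only the Python standard library. It is not the code from the notebook: the stopword list, the suffix rules and the lemma dictionary are tiny, hypothetical stand-ins chosen for illustration, and the stemmer is a naive suffix stripper rather than a real algorithm such as Porter's. In practice a library such as NLTK would usually provide these steps.

```python
import re

# Toy stopword list (an assumption for illustration; real lists are much longer).
STOPWORDS = {"a", "an", "the", "is", "are", "was", "were", "of", "and", "to", "in"}

# Toy lemma dictionary (hypothetical); real lemmatizers rely on a full vocabulary.
LEMMAS = {"mice": "mouse", "running": "run", "gardens": "garden"}


def normalize_case(text):
    """1 - Case alignment: put all text in the same case."""
    return text.lower()


def tokenize(text):
    """2 - Tokenization: split text into runs of letters, digits and apostrophes."""
    return re.findall(r"[A-Za-z0-9']+", text)


def remove_stopwords(tokens):
    """3 - Stopwords removal: drop very common words that carry little meaning."""
    return [t for t in tokens if t not in STOPWORDS]


def naive_stem(token):
    """4 - Stemming: strip the first matching suffix, keeping at least 3 characters."""
    for suffix in ("ing", "ies", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def lemmatize(token):
    """5 - Lemmatization: map a word to its dictionary form, if we know it."""
    return LEMMAS.get(token, token)


text = "The mice were running in the gardens"
tokens = remove_stopwords(tokenize(normalize_case(text)))

stems = [naive_stem(t) for t in tokens]
lemmas = [lemmatize(t) for t in tokens]

print(tokens)  # ['mice', 'running', 'gardens']
print(stems)   # ['mice', 'runn', 'garden']
print(lemmas)  # ['mouse', 'run', 'garden']
```

Note how stemming and lemmatization differ on the same tokens: the stemmer only chops suffixes, so it can produce non-words like "runn" and leaves "mice" untouched, while the lemmatizer returns real dictionary forms but only for words it knows.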

You can find more information in the notebook (data-preprocessing.ipynb) and in the presentation that guides the content (DataPreProcessing.pdf).

Enjoy! 💜
