Example of Data Preprocessing using Python 🐍

We all produce a lot of data. All the time.

That data has to be preprocessed before it is useful. Cleaning the text lets us extract high-quality information that can be used for predictions and natural language processing.

The main objective here is to give a short overview of some tools that data scientists use for data mining.

It's important to always keep the business in focus and choose the tools that fit it best.

The language

This project uses Python 3.6.8.

The content

The content is an extract from this book about Machine Learning, written by Alex Smola (great stuff, btw).

About the techniques used

The techniques that we are going to use are:

1. Case alignment
2. Tokenization
3. Stopwords removal
4. Stemming
5. Lemmatization
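
The five steps above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the code from the notebook: the stopword list, the suffix-stripping `stem` function, and the `LEMMAS` lookup table are toy stand-ins invented for this example. A real project would more likely rely on an NLP library with a full stopword list, a proper stemming algorithm, and a dictionary-backed lemmatizer.

```python
import re

# Toy stopword list (an assumption for this sketch; real lists are much longer).
STOPWORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in", "over", "were"}

# Toy lemma dictionary (an assumption): inflected form -> dictionary form.
LEMMAS = {"mice": "mouse", "running": "run", "studies": "study"}


def normalize_case(text):
    """Step 1, case alignment: bring all text to lowercase."""
    return text.lower()


def tokenize(text):
    """Step 2, tokenization: split text into word tokens, dropping punctuation."""
    return re.findall(r"\w+", text)


def remove_stopwords(tokens):
    """Step 3, stopwords removal: drop very common words that carry little meaning."""
    return [t for t in tokens if t not in STOPWORDS]


def stem(token):
    """Step 4, stemming: crude suffix stripping (a toy rule set, not a real stemmer)."""
    for suffix in ("ing", "ies", "es", "ed", "s"):
        # Only strip when at least three characters remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def lemmatize(token):
    """Step 5, lemmatization: map a word to its dictionary form via lookup."""
    return LEMMAS.get(token, token)


text = "The mice were running over the studies."
tokens = remove_stopwords(tokenize(normalize_case(text)))
stems = [stem(t) for t in tokens]
lemmas = [lemmatize(t) for t in tokens]

print(tokens)  # ['mice', 'running', 'studies']
print(stems)   # ['mice', 'runn', 'stud']
print(lemmas)  # ['mouse', 'run', 'study']
```

Note how stemming and lemmatization differ on the same tokens: the stemmer chops suffixes and can produce non-words such as `runn`, while the lemmatizer returns real dictionary forms but only for words it knows about.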

You can find more information in the notebook, data-preprocessing.ipynb, and in the presentation that guides the content, DataPreProcessing.pdf.

Enjoy! 💜
