This repository contains common functions that I use in data science projects with Python.
- Utils
  - Download files
- Machine Learning - Preprocessing
  - Normalize the inputs
  - Impute missing data
- Text patterns
  - Name extractor
  - Phone
  - Year
- Webscraping
  - Extract the content (text) from websites
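
The two preprocessing features listed above (imputing missing data and normalizing inputs) can be illustrated with a small, dependency-free sketch. This is not the package's own API, which is not documented here: the function names `impute_mean` and `standardize` are hypothetical, and the sketch assumes mean imputation and z-score normalization, which may differ from what the package actually implements.

```python
from statistics import mean, pstdev


def impute_mean(values):
    """Replace None entries with the mean of the observed values.

    Hypothetical helper for illustration; assumes at least one observed value.
    """
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Toy column with one missing entry.
ages = [20.0, None, 40.0]

ages_imputed = impute_mean(ages)          # missing entry filled with the mean
ages_scaled = standardize(ages_imputed)   # centered on 0 with unit spread

print(ages_imputed)
print(ages_scaled)
```

In a real project the same two steps are usually done with scikit-learn's `SimpleImputer` and `StandardScaler`, which operate on whole feature matrices rather than single columns.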
To clone and install this package, you'll need Git and pip installed on your computer. From your command line:
# Update pip
pip install --upgrade pip
# Install vichshir from the latest master branch
pip install git+https://github.com/vichShir/datascience-utils-python.git
# Example: extract person names from a Portuguese text
from vichshir.cleaning.text_matching.nlp import NameExtractor
# Sample text mentioning three people
txt_person = '''
Existem muitos sistemas de ERP. Thiago Fulano da Silva é CTO e desenvolvedor de um poderoso sistema de ERP, também coordena uma equipe, João Sicrano da Costa e Pedro Beltrano.
'''
extractor = NameExtractor()
persons = extractor.extract_names(txt_person)
# Show the extracted names
print(persons)
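
The phone and year text patterns listed among the features have no usage example in this README. As a rough illustration of the idea only, the sketch below uses the standard `re` module. The patterns `YEAR_RE` and `PHONE_RE` are assumptions made for this example (four-digit years from 1900 to 2099, and Brazilian-style phone numbers with a two-digit area code); they are not taken from the package and may not match its behavior.

```python
import re

# Hypothetical pattern: a four-digit year from 1900 to 2099 that is not
# part of a longer run of digits.
YEAR_RE = re.compile(r"(?<!\d)(?:19|20)\d{2}(?!\d)")

# Hypothetical pattern: a two-digit area code (optionally in parentheses),
# then 4 or 5 digits, a hyphen, and 4 digits, e.g. "(11) 91234-5678".
PHONE_RE = re.compile(r"\(?\d{2}\)?\s?\d{4,5}-\d{4}")

text = "Call (11) 91234-5678. The company was founded in 1998 and moved in 2015."

years = YEAR_RE.findall(text)
phones = PHONE_RE.findall(text)

print(years)   # ['1998', '2015']
print(phones)  # ['(11) 91234-5678']
```

Regular expressions like these are cheap to run but brittle: formats that differ from the assumed ones (international prefixes, dots instead of hyphens, two-digit years) would need additional patterns.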
This software uses the following open source packages:
- pandas
- NumPy
- scikit-learn
- Transformers
- Beautiful Soup
License: Apache 2.0