VerticaPy

⭐ 2023-12-01: VerticaPy secures 200 stars.

📢 2020-06-27: Vertica-ML-Python has been renamed to VerticaPy.

⚠️ The old website has been relocated to https://www.vertica.com/python/old/ and will be discontinued soon. We strongly recommend upgrading to the new major release before August 2024.

⚠️ The following README is for VerticaPy 1.0.x and onwards, and so some of the elements may not be present in the previous versions.

📜 Some basic syntax can be found in the cheat sheet.

📰 Check out the latest newsletter here.

VerticaPy

VerticaPy is a Python library with scikit-like functionality used to conduct data science projects on data stored in Vertica, taking advantage of Vertica’s speed and built-in analytics and machine learning features. VerticaPy offers robust support for the entire data science life cycle, uses a 'pipeline' mechanism to sequentialize data transformation operations, and offers beautiful graphical options.

Introduction

Vertica was the first real analytic columnar database and is still the fastest in the market. However, SQL alone isn't flexible enough to meet the needs of data scientists.

Python has quickly become the most popular tool in this domain, owing much of its flexibility to its high-level of abstraction and impressively large and ever-growing set of libraries. Its accessibility has led to the development of popular and perfomant APIs, like pandas and scikit-learn, and a dedicated community of data scientists. Unfortunately, Python only works in-memory as a single-node process. This problem has led to the rise of distributed programming languages, but they too, are limited as in-memory processes and, as such, will never be able to process all of your data in this era, and moving data for processing is prohobitively expensive. On top of all of this, data scientists must also find convenient ways to deploy their data and models. The whole process is time consuming.

VerticaPy aims to solve all of these problems. The idea is simple: instead of moving data around for processing, VerticaPy brings the logic to the data.

3+ years in the making, we're proud to bring you VerticaPy.

Main Advantages:

Easy Data Exploration.
Fast Data Preparation.
In-Database Machine Learning.
Easy Model Evaluation.
Easy Model Deployment.
Flexibility of using either Python or SQL.

Name		Name	Last commit message	Last commit date
Latest commit History 1,265 Commits
.github		.github
assets		assets
docs		docs
examples		examples
verticapy		verticapy
.gitignore		.gitignore
LICENSE.txt		LICENSE.txt
README.md		README.md
pytest.ini		pytest.ini
requirements-dev.txt		requirements-dev.txt
requirements-testing.txt		requirements-testing.txt
setup.py		setup.py
tox.ini		tox.ini

License

afard/VerticaPy

Folders and files

Latest commit

History

Repository files navigation

VerticaPy

Table of Contents

Introduction

Installation

Connecting to the Database

Documentation

Use-cases

Highlighted Features

Themes

SQL Magic

Example

SQL Plots

Example

Multiple Database Connection using DBLINK

Example

Python and SQL Combo

Example

Charts

Complete Machine Learning Pipeline

Loading Predefined Datasets

Quickstart

Help and Support

Contributing

Communication

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages