QuantPolars

A Python package for quantitative finance analysis using Polars, providing blazingly fast tools for data summarization and option pricing.

Installation

pip3 install git+https://github.com/matthewgson/quantpolars.git

Requirements: Python 3.8+, Polars

Data Summary Function (`sm`)

Generate comprehensive summary statistics for all columns in your DataFrame with a single function call. Returns a Polars DataFrame with summary statistics that can be optionally converted to styled GT tables.

Features

Blazingly Fast: Single-pass computation using Polars expressions
Type-Aware: Different statistics based on data type (numeric, date, categorical)
Missing Data: Includes percentage of missing values for each column
Simple API: Returns DataFrame directly, convert to GT styling when needed
Styled Output: Optional Great Tables formatting for beautiful HTML tables
LazyFrame Support: Works with both eager and lazy evaluation

Basic Usage

import polars as pl
from datetime import date
from quantpolars import sm

# Create sample data
df = pl.DataFrame({
    'revenue': [1000, 2500, 1800, 3200, 2900, None, 2100, 1750],
    'profit_margin': [0.15, 0.22, 0.18, 0.25, 0.20, 0.17, 0.19, 0.16],
    'transaction_date': [
        date(2024, 1, 15), date(2024, 2, 20), date(2024, 3, 10),
        date(2024, 4, 5), date(2024, 5, 12), date(2024, 6, 8),
        date(2024, 7, 22), None
    ],
    'customer_segment': ['Enterprise', 'SMB', 'Enterprise', 'SMB', 'Enterprise', 'SMB', 'Enterprise', 'SMB'],
    'active': [True, True, False, True, False, True, True, False]
})

print("Sample Data:")
df

# Generate summary statistics
summary = sm(df)
print("Summary Statistics with % Missing:")
summary  # This is now a Polars DataFrame directly

Output:

shape: (5, 16)
┌──────────────────┬─────────────┬──────┬─────────────┬───┬────────┬────────┬────────┬──────────┐
│ variable         ┆ type        ┆ nobs ┆ pct_missing ┆ … ┆ p75    ┆ p95    ┆ p99    ┆ n_unique │
│ ---              ┆ ---         ┆ ---  ┆ ---         ┆   ┆ ---    ┆ ---    ┆ ---    ┆ ---      │
│ str              ┆ str         ┆ i64  ┆ f64         ┆   ┆ f64    ┆ f64    ┆ f64    ┆ i64      │
╞══════════════════╪═════════════╪══════╪═════════════╪═══╪════════╪════════╪════════╪══════════╡
│ transaction_date ┆ date        ┆ 7    ┆ 12.5        ┆ … ┆ null   ┆ null   ┆ null   ┆ 7        │
│ customer_segment ┆ categorical ┆ 8    ┆ 0.0         ┆ … ┆ null   ┆ null   ┆ null   ┆ 2        │
│ active           ┆ categorical ┆ 8    ┆ 0.0         ┆ … ┆ null   ┆ null   ┆ null   ┆ 2        │
│ revenue          ┆ numeric     ┆ 7    ┆ 12.5        ┆ … ┆ 2900.0 ┆ 3200.0 ┆ 3200.0 ┆ 7        │
│ profit_margin    ┆ numeric     ┆ 8    ┆ 0.0         ┆ … ┆ 0.2    ┆ 0.25   ┆ 0.25   ┆ 8        │
└──────────────────┴─────────────┴──────┴─────────────┴───┴────────┴────────┴────────┴──────────┘

Column Reference

Column	Description
`variable`	Column name
`type`	Data type category (`numeric`, `date`, `categorical`)
`nobs`	Number of non-null observations
`pct_missing`	Percentage of missing values
`mean`	Mean value (numeric columns only)
`sd`	Standard deviation (numeric columns only)
`min`	Minimum value (numeric and date columns only)
`max`	Maximum value (numeric and date columns only)
`p1` to `p99`	Percentiles at 1%, 5%, 25%, 50%, 75%, 95%, and 99% (numeric columns only)
`n_unique`	Number of unique values

Styled Output

For beautifully formatted tables with proper date formatting:

from quantpolars import to_gt

# Requires: pip3 install great-tables
styled_summary = to_gt(summary)  # Convert DataFrame to styled GT table
styled_summary  # In Jupyter, displays as formatted HTML table

Rendered Output Example: The `to_gt()` function returns a Great Tables (GT) object that renders as a formatted HTML table in Jupyter notebooks with:

Table Header: "Data Summary Statistics" with subtitle showing variable count
Formatted Numbers: Statistics rounded to 2 decimal places
Percentage Formatting: Missing values shown as percentages (e.g., "12.5%")
Date Formatting: Min/max dates formatted as MM/DD/YYYY (e.g., "1/1/2023")
Professional Styling: Clean borders, alternating row colors, proper alignment
Column Labels: User-friendly names ("Std Dev" instead of "sd", "N Obs" instead of "nobs")

Example of what the styled table displays:

Variable	Type	N Obs	% Missing	Mean	Std Dev	Min	Max	1%	5%	25%	50%	75%	95%	99%	N Unique
transaction_date	date	7	12.5%	—	—	Jan 15, 2024	Jul 22, 2024	—	—	—	—	—	—	—	7
customer_segment	categorical	8	0.0%	—	—	—	—	—	—	—	—	—	—	—	2
active	categorical	8	0.0%	—	—	—	—	—	—	—	—	—	—	—	2
revenue	numeric	7	12.5%	2,178.57	751.59	1,000.00	3,200.00	1,000.00	1,000.00	1,800.00	2,100.00	2,900.00	3,200.00	3,200.00	7
profit_margin	numeric	8	0.0%	0.19	0.03	0.15	0.25	0.15	0.15	0.17	0.19	0.22	0.25	0.25	8

Data Type Handling

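Numeric: Full statistics including percentiles
Date: Min/max dates only; the percentile columns are null (percentiles not supported by Polars)
Categorical: Unique counts only; string and boolean columns fall in this group, as `customer_segment` and `active` do in the output above

`nobs`, `pct_missing`, and `n_unique` are reported for every column regardless of type.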

Out-of-Core Example

import polars as pl
import quantpolars as qp

# Batch price 1M options
df = pl.scan_csv("options_data.csv")  # Out-of-core
df = df.with_columns(
    price = qp.black_scholes(df, 'S', 'K', 'T', 'r', 'sigma', 'call')['price']
)

Features

Data Summary Tools: Out-of-core data summarization for big data
Option Pricing: Black-Scholes, Cox-Ross-Rubinstein (CRR), Barone-Adesi-Whaley (BAW) models
Implied Volatility: Calculation of implied volatility
Greeks: Delta, Gamma, Theta, Vega, Rho calculators
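
For orientation, here is a scalar sketch of the Black-Scholes price, the call delta, and an implied-volatility search. It uses only the standard library and is not the package's vectorized implementation; the function names, the bisection method, and the search bounds are assumptions chosen for clarity. The argument names follow the `S`, `K`, `T`, `r`, `sigma` columns used in the example above.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_price(S, K, T, r, sigma, kind="call"):
    """Black-Scholes price of a European option without dividends."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    if kind == "call":
        return S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)
    return K * math.exp(-r * T) * norm_cdf(-d2) - S * norm_cdf(-d1)


def bs_call_delta(S, K, T, r, sigma):
    """Delta of a European call: N(d1)."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    return norm_cdf(d1)


def implied_vol(price, S, K, T, r, kind="call", lo=1e-6, hi=5.0, tol=1e-8):
    """Bisection on sigma; works because the price increases with sigma."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_price(S, K, T, r, mid, kind) > price:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


call = bs_price(100.0, 100.0, 1.0, 0.05, 0.2, "call")
put = bs_price(100.0, 100.0, 1.0, 0.05, 0.2, "put")
delta = bs_call_delta(100.0, 100.0, 1.0, 0.05, 0.2)
iv = implied_vol(call, 100.0, 100.0, 1.0, 0.05, "call")
print(call, put, delta, iv)
```

Recovering the input volatility from the computed price is a quick sanity check that the pricing and the root search agree.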

Key Optimizations

Vectorized DataFrame API: Functions operate on Polars DataFrames for batch processing of multiple options
Fast Norm CDF Approximation: Implemented Abramowitz & Stegun approximation using Polars expressions
Lazy Evaluation: All operations are lazy, enabling out-of-core processing for big data

Updated API

The functions now work on Polars DataFrames, allowing for:

Batch Processing: Price thousands of options in a single operation
Big Data Ready: Handles datasets larger than memory with Polars' streaming
Extreme Speed: Vectorized operations on columnar data

Performance Benefits

No Loops: All vectorized in Polars/Rust
Memory Efficient: Columnar storage and lazy evaluation
Scalable: Handles billions of rows with minimal memory
Parallel: Automatic parallelization where possible
