A Python package for quantitative finance analysis using Polars, providing blazingly fast tools for data summarization and option pricing.
pip3 install git+https://github.com/matthewgson/quantpolars.gitRequirements: Python 3.8+, Polars
Generate comprehensive summary statistics for all columns in your DataFrame with a single function call. Returns a Polars DataFrame with summary statistics that can be optionally converted to styled GT tables.
- Blazingly Fast: Single-pass computation using Polars expressions
- Type-Aware: Different statistics based on data type (numeric, date, categorical)
- Missing Data: Includes percentage of missing values for each column
- Simple API: Returns DataFrame directly, convert to GT styling when needed
- Styled Output: Optional Great Tables formatting for beautiful HTML tables
- LazyFrame Support: Works with both eager and lazy evaluation
import polars as pl
from datetime import date
from quantpolars import sm
# Create sample data
df = pl.DataFrame({
'revenue': [1000, 2500, 1800, 3200, 2900, None, 2100, 1750],
'profit_margin': [0.15, 0.22, 0.18, 0.25, 0.20, 0.17, 0.19, 0.16],
'transaction_date': [
date(2024, 1, 15), date(2024, 2, 20), date(2024, 3, 10),
date(2024, 4, 5), date(2024, 5, 12), date(2024, 6, 8),
date(2024, 7, 22), None
],
'customer_segment': ['Enterprise', 'SMB', 'Enterprise', 'SMB', 'Enterprise', 'SMB', 'Enterprise', 'SMB'],
'active': [True, True, False, True, False, True, True, False]
})
print("Sample Data:")
df# Generate summary statistics
summary = sm(df)
print("Summary Statistics with % Missing:")
summary # This is now a Polars DataFrame directlyOutput:
shape: (5, 16)
┌──────────────────┬─────────────┬──────┬─────────────┬───┬────────┬────────┬────────┬──────────┐
│ variable ┆ type ┆ nobs ┆ pct_missing ┆ … ┆ p75 ┆ p95 ┆ p99 ┆ n_unique │
│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ i64 ┆ f64 ┆ ┆ f64 ┆ f64 ┆ f64 ┆ i64 │
╞══════════════════╪═════════════╪══════╪═════════════╪═══╪════════╪════════╪════════╪══════════╡
│ transaction_date ┆ date ┆ 7 ┆ 12.5 ┆ … ┆ null ┆ null ┆ null ┆ 7 │
│ customer_segment ┆ categorical ┆ 8 ┆ 0.0 ┆ … ┆ null ┆ null ┆ null ┆ 2 │
│ active ┆ categorical ┆ 8 ┆ 0.0 ┆ … ┆ null ┆ null ┆ null ┆ 2 │
│ revenue ┆ numeric ┆ 7 ┆ 12.5 ┆ … ┆ 2900.0 ┆ 3200.0 ┆ 3200.0 ┆ 7 │
│ profit_margin ┆ numeric ┆ 8 ┆ 0.0 ┆ … ┆ 0.2 ┆ 0.25 ┆ 0.25 ┆ 8 │
└──────────────────┴─────────────┴──────┴─────────────┴───┴────────┴────────┴────────┴──────────┘
| Column | Description |
|---|---|
variable |
Column name |
type |
Data type category (numeric, date, categorical) |
nobs |
Number of non-null observations |
pct_missing |
Percentage of missing values |
mean |
Mean value (numeric columns only) |
sd |
Standard deviation (numeric columns only) |
min |
Minimum value (numeric and date columns only) |
max |
Maximum value (numeric and date columns only) |
p1-p99 |
Percentiles (numeric columns only) |
n_unique |
Number of unique values |
For beautiful formatted tables with proper date formatting:
from quantpolars import to_gt
# Requires: pip3 install great-tables
styled_summary = to_gt(summary) # Convert DataFrame to styled GT table
styled_summary # In Jupyter, displays as formatted HTML tableRendered Output Example:
The .to_gt() method returns a Great Tables (GT) object that renders as a beautifully formatted HTML table in Jupyter notebooks with:
- Table Header: "Data Summary Statistics" with subtitle showing variable count
- Formatted Numbers: Statistics rounded to 2 decimal places
- Percentage Formatting: Missing values shown as percentages (e.g., "12.5%")
- Date Formatting: Min/max dates formatted as MM/DD/YYYY (e.g., "1/1/2023")
- Professional Styling: Clean borders, alternating row colors, proper alignment
- Column Labels: User-friendly names ("Std Dev" instead of "sd", "N Obs" instead of "nobs")
Example of what the styled table displays:
| Variable | Type | N Obs | % Missing | Mean | Std Dev | Min | Max | 1% | 5% | 25% | 50% | 75% | 95% | 99% | N Unique |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| transaction_date | date | 7 | 12.5% | — | — | Jan 15, 2024 | Jul 22, 2024 | — | — | — | — | — | — | — | 7 |
| customer_segment | categorical | 8 | 0.0% | — | — | — | — | — | — | — | — | — | — | — | 2 |
| active | categorical | 8 | 0.0% | — | — | — | — | — | — | — | — | — | — | — | 2 |
| revenue | numeric | 7 | 12.5% | 2,225.00 | 716.02 | 1,000.00 | 3,200.00 | 1,000.00 | 1,000.00 | 1,800.00 | 2,100.00 | 2,900.00 | 3,200.00 | 3,200.00 | 7 |
| profit_margin | numeric | 8 | 0.0% | 0.19 | 0.03 | 0.15 | 0.25 | 0.15 | 0.15 | 0.17 | 0.19 | 0.22 | 0.25 | 0.25 | 8 |
- Numeric: Full statistics including percentiles
- Date: Min/max dates only (percentiles not supported by Polars)
- Categorical: Unique counts only
import polars as pl
import quantpolars as qp
# Batch price 1M options
df = pl.scan_csv("options_data.csv") # Out-of-core
df = df.with_columns(
price = qp.black_scholes(df, 'S', 'K', 'T', 'r', 'sigma', 'call')['price']
)- Data Summary Tools: Out-of-core data summarization for big data
- Option Pricing: Black-Scholes, Cox-Ross-Rubinstein (CRR), Barone-Adesi-Whaley (BAW) models
- Implied Volatility: Calculation of implied volatility
- Greeks: Delta, Gamma, Theta, Vega, Rho calculators
- Vectorized DataFrame API: Functions operate on Polars DataFrames for batch processing of multiple options
- Fast Norm CDF Approximation: Implemented Abramowitz & Stegun approximation using Polars expressions
- Lazy Evaluation: All operations are lazy, enabling out-of-core processing for big data
The functions now work on Polars DataFrames, allowing for:
- Batch Processing: Price thousands of options in a single operation
- Big Data Ready: Handles datasets larger than memory with Polars' streaming
- Extreme Speed: Vectorized operations on columnar data
- No Loops: All vectorized in Polars/Rust
- Memory Efficient: Columnar storage and lazy evaluation
- Scalable: Handles billions of rows with minimal memory
- Parallel: Automatic parallelization where possible