60 smart meter analysis pricing simulation #61
Open

griffinsharps wants to merge 137 commits into main from 60-smart-meter-analysis-pricing-simulation
Conversation
Co-authored-by: Cursor <cursoragent@cursor.com>
…onth validator

Enhance validate_month_output.py with three preflight checks needed before scaling to full-month execution:
- Duplicate (zip_code, account_identifier, datetime) detection per batch file
- Row count reporting (total + per-file) in the validation report JSON
- Run artifact integrity via the --run-dir flag (plan.json, run_summary.json, manifests, batch summaries)

Add PREFLIGHT_200.md checklist for the 200-file EC2 validation run.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…s_sorted()

Polars 1.38 removed is_sorted() from Expr. Collect the composite key first, then check sortedness on the resulting Series, which retains the method.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Restores Justfile from main and adds a migrate-month recipe.

Usage: just migrate-month 202307
- batch-size 100, workers 6, lazy_sink, --resume
- Reads ~/s3_paths_<YYYYMM>_full.txt, writes to /ebs/.../out_<YYYYMM>_production
- Uses bare python (no uv) for EC2 compatibility

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
.gitignore: block *.txt, .tmp/, archive_quarantine/, tmp_polars_run_*/, and subagent_packages/ from being tracked.

pre-commit: add the detect-private-key hook and a local forbid-secrets hook that blocks .env, .secrets, credentials.json, .pem, .key, .p12, .pfx, and .jks files from being committed (even via git add -f).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace hash-based duplicate detection (n_unique, ~400 MB/file) with adjacent-key streaming that leverages the required global sort order. Sortedness and uniqueness now share a single PyArrow iter_batches pass in full mode.

Key changes:
- _streaming_sort_and_dup_check: combined sort+dup check via PyArrow batch iteration, O(batch_size) memory, cross-file boundary state
- Per-file datetime stats with merge (_DtStats dataclass)
- Per-file DST stats with merge (_DstFileStats dataclass)
- Enhanced sample mode: strict-increasing check (catches dups in windows)
- Row counts from parquet metadata (O(1), no data scan)
- Phase-based main() architecture (discovery -> metadata -> streaming -> datetime -> DST -> artifacts -> report)
- _fail() typed as NoReturn for mypy narrowing
- Add pyarrow mypy override in pyproject.toml

Removed dead functions: _check_sorted_full, _validate_no_duplicates_file, _validate_datetime_invariants_partition, _validate_dst_option_b_partition, _keys_is_sorted_df

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
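The adjacent-key idea can be sketched without PyArrow: because the data is required to be globally sorted, a duplicate key must appear adjacent to its twin, so one streaming pass that remembers only the previous key can verify sortedness and count duplicates at once, including across batch (and file) boundaries. Field names below are illustrative.

```python
from typing import Iterable, List, Optional, Tuple

Key = Tuple[str, str, int]  # (zip_code, account_identifier, datetime)

def check_sorted_and_unique(batches: Iterable[List[Key]]) -> Tuple[bool, int]:
    """Single pass over key batches: verify global sort order and count
    adjacent duplicate keys. Only the previous key is carried across
    batch boundaries, so memory stays O(batch_size)."""
    prev: Optional[Key] = None  # cross-boundary state
    sorted_ok, dups = True, 0
    for batch in batches:
        for key in batch:
            if prev is not None:
                if key < prev:
                    sorted_ok = False
                elif key == prev:
                    dups += 1
            prev = key
    return sorted_ok, dups

batches = [
    [("60601", "a", 1), ("60601", "a", 2)],
    [("60601", "a", 2), ("60602", "b", 1)],  # duplicate spans a batch boundary
]
print(check_sorted_and_unique(batches))  # -> (True, 1)
```

This is why the hash-based n_unique pass could be dropped: under the global-sort precondition, adjacency makes uniqueness checkable in constant memory.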
Two fixes in validate_month_output.py:

1. _slice_keys: use lf.collect() instead of the streaming engine for slice reads — streaming may reorder rows, defeating sortedness validation. Slices are small (5K rows x 3 cols), so the default engine is correct and fast.
2. _check_sorted_sample: track prev_end and only perform the cross-slice boundary comparison when off >= prev_end (non-overlapping). Random windows can overlap head/tail/each other, making boundary checks invalid under overlap. Within-slice strict-monotonic checks still run unconditionally.

Also updates remaining collect(streaming=True) calls to collect(engine="streaming") to fix Polars deprecation warnings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
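The prev_end guard in fix 2 reduces to logic like this sketch (function and parameter names are hypothetical, not the repo's): within-window strict-monotonic checks always run, but the last-key-of-previous-window vs first-key-of-current-window comparison is only meaningful when the windows are disjoint.

```python
def validate_sample_windows(keys, offsets, width):
    """Strict-monotonic check inside each sampled window; cross-window
    boundary check only when windows do not overlap."""
    prev_end = None   # exclusive end offset of the previous window
    prev_last = None  # last key of the previous window
    for off in sorted(offsets):
        window = keys[off : off + width]
        # Within-window strict increase always runs (also catches duplicates).
        for a, b in zip(window, window[1:]):
            if not a < b:
                return False
        # Overlapping windows legally revisit earlier keys, so the
        # boundary comparison is skipped unless off >= prev_end.
        if prev_last is not None and off >= prev_end and not prev_last < window[0]:
            return False
        prev_end, prev_last = off + width, window[-1]
    return True

keys = list(range(100))
print(validate_sample_windows(keys, offsets=[0, 3, 50], width=5))  # -> True
```

With offsets [0, 3], the windows overlap, so comparing window 1's tail against window 2's head would spuriously fail on perfectly sorted data; the guard skips exactly that case.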
Restores Justfile from main and adds a migrate-month YEAR_MONTH recipe:
- Guards against non-EC2 environments (checks the /ebs mount)
- Auto-generates the S3 input list via aws s3 ls + awk + sort
- Validates a non-empty input list before running
- Runs migrate_month_runner.py with standard production params (batch-size 100, workers 6, --resume, lazy_sink)

Usage: just migrate-month 202307

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Annotate migrate_month_runner.py, validate_month_output.py, and Justfile with industry-standard "why" comments for senior code review. Additions include module-level architecture docstrings, function-level design rationale, and parameter tuning explanations. No logic changes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…r-data-from-csv-to-parquet
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Refactor migrate-month to use configurable variables (S3_PREFIX, MIGRATE_OUT_BASE, etc.) instead of hardcoded bucket names and usernames, preparing the repo for open-source release. Add new recipes: months-from-s3, migrate-months, validate-month, validate-months, and migration-status. Multi-month recipes support fail-fast (default) or continue-on-error mode with per-invocation UTC-timestamped logs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace hard-coded RTP/flat spread logic with dual tariff inputs (--tariff-prices-a, --tariff-prices-b) that each use the standard price_cents_per_kwh schema from build_tariff_hourly_prices.py. Adds fail-loud join guards (null check + row-count check per tariff). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Generates one row per local hour in America/Chicago for a given year, mapping each hour to its TOU season and period with the associated price. Handles DST by keeping the first UTC occurrence of fall-back duplicates. Validates full hour coverage per season. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
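The DST rule above (keep the first UTC occurrence of fall-back duplicates) can be sketched with the standard library alone. This is an assumption-laden illustration, not the script's code: it walks UTC hours across the local year and drops the second occurrence of any repeated local label.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

CHI = ZoneInfo("America/Chicago")

def local_hours(year: int):
    """Yield (naive_local_hour, utc_instant) pairs, one per UTC hour of
    the local year. During fall-back the local label 01:00 occurs twice;
    only its first UTC occurrence is kept."""
    start = datetime(year, 1, 1, tzinfo=CHI).astimezone(timezone.utc)
    end = datetime(year + 1, 1, 1, tzinfo=CHI).astimezone(timezone.utc)
    seen = set()
    t = start
    while t < end:
        local = t.astimezone(CHI).replace(tzinfo=None)
        if local not in seen:  # drop the second fall-back duplicate
            seen.add(local)
            yield local, t
        t += timedelta(hours=1)

rows = list(local_hours(2023))
print(len(rows))  # 8759: 8760 UTC hours, minus one fall-back duplicate
```

Note the asymmetry: spring-forward means the local label 02:00 simply never appears (the 8760 nominal labels lose one), while fall-back produces the duplicate that has to be resolved explicitly, which is why the calendar validates hour coverage per season rather than assuming 24 rows per day.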
Validates uniqueness and join coverage across the full chain: interval data -> hourly loads -> tariff calendars -> household bills. Synthetic tests exercise spring-forward, fall-back, and normal months; sample-data tests validate the real 202308 artefacts. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
build_regression_dataset.py: maps household bills to Census block groups
via ZIP+4 crosswalk (deterministic 1:1), aggregates to BG-level outcomes,
and fits two OLS regressions (savings + bill diff ~ demographics). Supports
auto/core/explicit predictor modes and graceful outcome column fallback.
run_billing_pipeline.py: multi-month orchestrator that chains hourly loads,
tariff billing, annual aggregation, and regression via subprocess. Supports
--months/--months-file with {yyyymm} path patterns and writes per-month
outputs plus annual_household_aggregate.parquet with a full run manifest.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
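The {yyyymm} path-pattern mechanism the orchestrator relies on reduces to ordinary str.format expansion plus subprocess chaining with check=True for fail-fast behavior. A minimal sketch, with a trivial stand-in command where the real per-month pipeline step would run:

```python
import subprocess
import sys

def expand_and_run(months, pattern):
    """Expand a {yyyymm} path pattern per month, then invoke one pipeline
    step per month via subprocess; check=True raises on the first
    non-zero exit, so a failed month stops the chain."""
    outputs = []
    for month in months:
        out_path = pattern.format(yyyymm=month)
        # Stand-in for e.g. `python tariff_billing.py --out <out_path>`:
        subprocess.run([sys.executable, "-c", f"print({out_path!r})"], check=True)
        outputs.append(out_path)
    return outputs

paths = expand_and_run(["202307", "202308"], "out/{yyyymm}/bills.parquet")
print(paths)  # the expanded per-month output paths, in month order
```

Running each step as a subprocess (rather than importing it) keeps per-month memory isolated and lets the orchestrator record exit codes in its run manifest.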
…ng tests/test_*.py
Annotate 12 files with inline comments explaining design decisions, trade-offs, and non-obvious rationale for a senior reviewer. No logic changes—comments only (+121/-12 lines). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace all references to annual_household_aggregate.parquet with all_months_household_bills.parquet. Expand the test suite from 25 tests to 93 tests covering:
- All-months bills: schema, YYYYMM month format, no nulls, set equality of months, additive totals matching per-month outputs
- Regression artifacts: existence of all 7 files (bg_month_outcomes, bg_annual_outcomes, bg_season_outcomes, regression_dataset_bg, regression_results.json, regression_summary.txt, regression_metadata.json)
- Schema assertions for all BG outcome parquets
- Mathematical invariants: pct_savings_weighted definition, annual rollup equals sum-of-months, season values and mapping, null handling
- Crosswalk coverage: n_zip4, n_bg, n_zip4_multi_bg, pct_dropped
- Regression results JSON: model keys, r_squared, coefficients with const
- Both regression modes (annual, bg_month) with schema consistency checks
- Skip-regression mode: no artifacts, manifest flags, bills still produced
- Manifest: all_months_bills_rows, steps_completed, regression_level

Also adds --regression-level pass-through to the orchestrator CLI so both annual and bg_month modes can be tested end-to-end. Test data is augmented with synthetic households at diverse ZIP+4 values to ensure >= 6 census block groups for OLS regression coverage.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…alidation

Co-authored-by: Cursor <cursoragent@cursor.com>
Replace the commented-out setup block with a working loader that
unpacks cache/report_variables.pkl into a SimpleNamespace (v), and
add dollar/pct helper functions. Prefix all 48 inline {python}
variable references with v. so they resolve from the namespace.
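The loader and helper pattern described above amounts to a few lines of stdlib Python. A sketch under the stated setup (the pickle holds a flat dict of report variables; the helper signatures here are illustrative, not necessarily the notebook's):

```python
import pickle
from types import SimpleNamespace

def load_report_variables(path):
    """Unpack a pickled dict of report variables into a namespace so
    inline references read as v.some_variable instead of d["some_variable"]."""
    with open(path, "rb") as f:
        return SimpleNamespace(**pickle.load(f))

def dollar(x):
    """Whole-dollar formatting with thousands separators."""
    return f"${x:,.0f}"

def pct(x, digits=1):
    """Fraction -> percent string, e.g. 0.25 -> '25.0%'."""
    return f"{x * 100:.{digits}f}%"

# Hypothetical usage mirroring the notebook setup cell:
# v = load_report_variables("cache/report_variables.pkl")
print(dollar(1200), pct(0.25))  # -> $1,200 25.0%
```

Using a SimpleNamespace keeps the 48 inline `{python}` expressions short while still failing loudly (AttributeError) if a variable is missing from the cache.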
Replaces the pricing simulation pipeline notebook with a focused
equity analysis notebook that starts from pre-computed billing
outputs. Computes all report variables (quintile breakdowns,
regression results, rate constants) and exports them to
cache/report_variables.pkl for index.qmd.
- Fixes DTOU description ('Delivery' not 'Dynamic')
- Computes Q1-Q5 gaps dynamically instead of hardcoding
- Updates fig-rate-structures with accurate C23 all-in rates
- Adds spot-check assertions against known pipeline outputs
Add plotnine, compute_delivery_deltas, gspread, dotenv, bs4, and IPython to the appropriate deptry ignore lists. plotnine is a notebook-only dependency; the others are local scripts or transitive imports in lib/.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Update pct_save computations to use the actual parquet column names: STOU → total_delta_dollars, DTOU → dtou_total_delta_dollars. Fix regression extraction to filter on "rate" and "dep_var" (not "rate_type"/"outcome"); hardcode stou_jan_mean_pct = 25.66 since there is no mean_pct column in regression_summary.csv. Update callout notes and prose to reflect the real schema.
force-pushed from 49ae9d9 to 473b47a
…ort targets

- Add _quarto.yml: manuscript project type, Switchbox theme, SVG figures
- Add references.bib with ICC Final Order and Order on Rehearing entries
- Add render/draft/clean targets to Justfile
- Fix bibliography path in index.qmd (../references.bib → references.bib)
- Fix typo in index.qmd: "chargesa" → "charges a"
…github.com/switchbox-data/smart-meter-analysis into 60-smart-meter-analysis-pricing-simulation
force-pushed from 5924afc to d2ad44b
…github.com/switchbox-data/smart-meter-analysis into 60-smart-meter-analysis-pricing-simulation
…github.com/switchbox-data/smart-meter-analysis into 60-smart-meter-analysis-pricing-simulation
No description provided.