60 smart meter analysis pricing simulation #61

Open
griffinsharps wants to merge 137 commits into main from 60-smart-meter-analysis-pricing-simulation

@griffinsharps
Contributor

No description provided.

Griffin Sharps and others added 23 commits January 26, 2026 23:22
Co-authored-by: Cursor <cursoragent@cursor.com>
…onth validator

Enhance validate_month_output.py with three preflight checks needed before
scaling to full-month execution:
- Duplicate (zip_code, account_identifier, datetime) detection per batch file
- Row count reporting (total + per-file) in validation report JSON
- Run artifact integrity via --run-dir flag (plan.json, run_summary.json,
  manifests, batch summaries)

Add PREFLIGHT_200.md checklist for 200-file EC2 validation run.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…s_sorted()

Polars 1.38 removed is_sorted() from Expr. Collect the composite key first,
then check sortedness on the resulting Series which retains the method.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Restores Justfile from main and adds migrate-month recipe.
Usage: just migrate-month 202307
- batch-size 100, workers 6, lazy_sink, --resume
- Reads ~/s3_paths_<YYYYMM>_full.txt, writes to /ebs/.../out_<YYYYMM>_production
- Uses bare python (no uv) for EC2 compatibility

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
.gitignore: block *.txt, .tmp/, archive_quarantine/, tmp_polars_run_*/,
subagent_packages/ from being tracked.

pre-commit: add detect-private-key hook and a local forbid-secrets hook
that blocks .env, .secrets, credentials.json, .pem, .key, .p12, .pfx,
.jks files from being committed (even via git add -f).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
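
As an illustration of the forbid-secrets idea, a local hook can be a small script that receives the staged file names and reports any that match a deny list. The sketch below is an assumption about how such a check might look, not the hook added in this commit; `find_forbidden` and the two constant names are hypothetical, and only the file names and extensions come from the message above.

```python
from pathlib import PurePosixPath

# Deny list copied from the commit message; the matching rules are assumed.
FORBIDDEN_NAMES = {".env", ".secrets", "credentials.json"}
FORBIDDEN_SUFFIXES = {".pem", ".key", ".p12", ".pfx", ".jks"}


def find_forbidden(paths):
    """Return the paths whose file name or extension is on the deny list."""
    hits = []
    for raw in paths:
        p = PurePosixPath(raw)
        # Dotfiles such as ".env" have no suffix, so they are matched by name.
        if p.name in FORBIDDEN_NAMES or p.suffix.lower() in FORBIDDEN_SUFFIXES:
            hits.append(raw)
    return hits
```

A wrapper script would call `find_forbidden(sys.argv[1:])` and exit non-zero when the result is non-empty, which is what makes the commit fail.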
Replace hash-based duplicate detection (n_unique ~400MB/file) with
adjacent-key streaming that leverages the required global sort order.
Sortedness and uniqueness now share a single PyArrow iter_batches
pass in full mode.

Key changes:
- _streaming_sort_and_dup_check: combined sort+dup via PyArrow
  batch iteration, O(batch_size) memory, cross-file boundary state
- Per-file datetime stats with merge (_DtStats dataclass)
- Per-file DST stats with merge (_DstFileStats dataclass)
- Enhanced sample mode: strict-increasing check (catches dups in windows)
- Row counts from parquet metadata (O(1), no data scan)
- Phase-based main() architecture (discovery -> metadata -> streaming
  -> datetime -> DST -> artifacts -> report)
- _fail() typed as NoReturn for mypy narrowing
- Add pyarrow mypy override in pyproject.toml

Removed dead functions: _check_sorted_full, _validate_no_duplicates_file,
_validate_datetime_invariants_partition, _validate_dst_option_b_partition,
_keys_is_sorted_df

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
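
The adjacent-key idea described above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the `_streaming_sort_and_dup_check` function itself: keys are modelled as comparable tuples rather than PyArrow batches, and `BoundaryState` and `check_batch` are hypothetical names. Because the data must be globally sorted, a duplicate can only sit next to its twin, so comparing each key with its predecessor covers both checks in one pass.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

# (zip_code, account_identifier, datetime) as comparable values
Key = Tuple[str, str, str]


@dataclass
class BoundaryState:
    """Carries the last key seen so the check continues across batches and files."""

    prev: Optional[Key] = None
    rows: int = 0


def check_batch(keys: Iterable[Key], state: BoundaryState) -> None:
    """Raise if any key is out of order or equal to the key before it."""
    for key in keys:
        if state.prev is not None:
            if key < state.prev:
                raise ValueError(f"sort violation at row {state.rows}: {key} < {state.prev}")
            if key == state.prev:
                raise ValueError(f"duplicate key at row {state.rows}: {key}")
        state.prev = key
        state.rows += 1
```

Only one key and a counter survive between calls, so memory stays proportional to the batch size no matter how many files are checked with the same state object.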
Two fixes in validate_month_output.py:

1. _slice_keys: use lf.collect() instead of streaming engine for slice
   reads — streaming may reorder rows, defeating sortedness validation.
   Slices are small (5K rows x 3 cols) so default engine is correct and fast.

2. _check_sorted_sample: track prev_end and only perform cross-slice
   boundary comparison when off >= prev_end (non-overlapping). Random
   windows can overlap head/tail/each other, making boundary checks
   invalid under overlap. Within-slice strict-monotonic checks still
   run unconditionally.

Also updates remaining collect(streaming=True) calls to
collect(engine="streaming") to fix Polars deprecation warnings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
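
The overlap rule in fix 2 can be illustrated with a small stand-alone sketch. It assumes each sampled window arrives as an `(offset, keys)` pair with keys already read into a list of comparable tuples; `check_sampled_windows` is a hypothetical name and the code is not the body of `_check_sorted_sample`.

```python
def check_sampled_windows(windows):
    """Check sampled (offset, keys) windows for strict key order.

    Returns a list of problem descriptions, empty when the sample looks sorted.
    """
    problems = []
    prev_end = 0      # first row index not covered by any earlier window
    prev_last = None  # key stored at row prev_end - 1
    for off, keys in sorted(windows, key=lambda w: w[0]):
        if not keys:
            continue
        # Within-slice check: valid whether or not windows overlap.
        for i in range(1, len(keys)):
            if keys[i] <= keys[i - 1]:
                problems.append(f"row {off + i}: not strictly increasing")
        # Cross-slice check: only valid when this window starts at or after
        # the end of everything seen so far.
        if prev_last is not None and off >= prev_end and keys[0] <= prev_last:
            problems.append(f"row {off}: boundary not strictly increasing")
        end = off + len(keys)
        if end > prev_end:
            prev_end, prev_last = end, keys[-1]
    return problems
```

When a window starts before `prev_end`, its first key is legitimately smaller than the last key of the earlier window, which is why the boundary comparison has to be skipped in that case.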
Restores Justfile from main and adds migrate-month YEAR_MONTH recipe:
- Guards against non-EC2 environments (checks /ebs mount)
- Auto-generates S3 input list via aws s3 ls + awk + sort
- Validates non-empty input list before running
- Runs migrate_month_runner.py with standard production params
  (batch-size 100, workers 6, --resume, lazy_sink)

Usage: just migrate-month 202307

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Annotate migrate_month_runner.py, validate_month_output.py, and Justfile
with industry-standard "why" comments for senior code review. Additions
include module-level architecture docstrings, function-level design
rationale, and parameter tuning explanations. No logic changes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Refactor migrate-month to use configurable variables (S3_PREFIX,
MIGRATE_OUT_BASE, etc.) instead of hardcoded bucket names and
usernames, preparing the repo for open-source. Add six recipes:
months-from-s3, migrate-months, validate-month, validate-months,
and migration-status. Multi-month recipes support fail-fast (default)
or continue-on-error mode with per-invocation UTC-timestamped logs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace hard-coded RTP/flat spread logic with dual tariff inputs
(--tariff-prices-a, --tariff-prices-b) that each use the standard
price_cents_per_kwh schema from build_tariff_hourly_prices.py.
Adds fail-loud join guards (null check + row-count check per tariff).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
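
A fail-loud join guard of the kind described above might look like the following sketch. It uses plain lists of dicts instead of the pipeline's dataframes, and `join_prices` and the `hour` and `kwh` field names are assumptions made for illustration; only `price_cents_per_kwh` comes from the commit message.

```python
from collections import defaultdict


def join_prices(loads, prices, label):
    """Left-join hourly loads to one tariff's prices and fail loudly on gaps.

    loads:  list of dicts with 'hour' and 'kwh'
    prices: list of dicts with 'hour' and 'price_cents_per_kwh'
    label:  tariff name used in error messages, e.g. "a" or "b"
    """
    matches = defaultdict(list)
    for p in prices:
        matches[p["hour"]].append(p["price_cents_per_kwh"])

    joined = []
    for row in loads:
        # Left-join semantics: an unmatched hour yields a null price,
        # and duplicate price rows fan the load row out.
        for price in matches.get(row["hour"], [None]):
            joined.append({**row, "price_cents_per_kwh": price})

    # Guard 1: null check. Every load row must have found a price.
    missing = [r["hour"] for r in joined if r["price_cents_per_kwh"] is None]
    if missing:
        raise ValueError(f"tariff {label}: {len(missing)} load rows have no price")
    # Guard 2: row-count check. The join must not add or drop rows.
    if len(joined) != len(loads):
        raise ValueError(f"tariff {label}: row count changed {len(loads)} -> {len(joined)}")
    return joined
```

Running the same function once per tariff input gives each of the two price tables its own pair of guards.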
Generates one row per local hour in America/Chicago for a given year,
mapping each hour to its TOU season and period with the associated
price. Handles DST by keeping the first UTC occurrence of fall-back
duplicates. Validates full hour coverage per season.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
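
The DST handling described above can be sketched with the standard library alone. This is a minimal illustration, assuming the calendar is built by walking UTC hours and labelling each with its America/Chicago wall-clock hour; `local_hours` is a hypothetical name, and the season, period, and price mapping is left out because its rules are not stated here.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

CHICAGO = ZoneInfo("America/Chicago")


def local_hours(year):
    """Return [(local_naive_hour, utc_hour)], one row per distinct local hour."""
    start = datetime(year, 1, 1, tzinfo=CHICAGO).astimezone(timezone.utc)
    end = datetime(year + 1, 1, 1, tzinfo=CHICAGO).astimezone(timezone.utc)
    rows, seen = [], set()
    utc = start
    while utc < end:
        # Drop tzinfo and fold so both fall-back occurrences share one label.
        local = utc.astimezone(CHICAGO).replace(tzinfo=None, fold=0)
        if local not in seen:  # keep only the first UTC occurrence
            seen.add(local)
            rows.append((local, utc))
        utc += timedelta(hours=1)
    return rows
```

Walking in UTC means the skipped spring-forward hour never appears and the repeated fall-back hour appears twice, so a single seen-set is enough to produce one row per local hour.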
Validates uniqueness and join coverage across the full chain:
interval data -> hourly loads -> tariff calendars -> household bills.
Synthetic tests exercise spring-forward, fall-back, and normal months;
sample-data tests validate the real 202308 artefacts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
build_regression_dataset.py: maps household bills to Census block groups
via ZIP+4 crosswalk (deterministic 1:1), aggregates to BG-level outcomes,
and fits two OLS regressions (savings + bill diff ~ demographics). Supports
auto/core/explicit predictor modes and graceful outcome column fallback.

run_billing_pipeline.py: multi-month orchestrator that chains hourly loads,
tariff billing, annual aggregation, and regression via subprocess. Supports
--months/--months-file with {yyyymm} path patterns and writes per-month
outputs plus annual_household_aggregate.parquet with a full run manifest.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Annotate 12 files with inline comments explaining design decisions,
trade-offs, and non-obvious rationale for a senior reviewer. No logic
changes—comments only (+121/-12 lines).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@griffinsharps griffinsharps linked an issue Feb 11, 2026 that may be closed by this pull request
Griffin Sharps and others added 6 commits February 16, 2026 22:26
Replace all references to annual_household_aggregate.parquet with
all_months_household_bills.parquet. Expand the test suite from 25 tests
to 93 tests covering:

- All-months bills: schema, YYYYMM month format, no nulls, set equality
  of months, additive totals matching per-month outputs
- Regression artifacts: existence of all 7 files (bg_month_outcomes,
  bg_annual_outcomes, bg_season_outcomes, regression_dataset_bg,
  regression_results.json, regression_summary.txt, regression_metadata.json)
- Schema assertions for all BG outcome parquets
- Mathematical invariants: pct_savings_weighted definition, annual rollup
  equals sum-of-months, season values and mapping, null handling
- Crosswalk coverage: n_zip4, n_bg, n_zip4_multi_bg, pct_dropped
- Regression results JSON: model keys, r_squared, coefficients with const
- Both regression modes (annual, bg_month) with schema consistency checks
- Skip-regression mode: no artifacts, manifest flags, bills still produced
- Manifest: all_months_bills_rows, steps_completed, regression_level

Also adds --regression-level pass-through to the orchestrator CLI so
both annual and bg_month modes can be tested end-to-end.

Test data is augmented with synthetic households at diverse ZIP+4 values
to ensure >= 6 census block groups for OLS regression coverage.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…alidation

Co-authored-by: Cursor <cursoragent@cursor.com>
Griffin Sharps and others added 15 commits March 12, 2026 16:22
Replace the commented-out setup block with a working loader that
unpacks cache/report_variables.pkl into a SimpleNamespace (v), and
add dollar/pct helper functions. Prefix all 48 inline {python}
variable references with v. so they resolve from the namespace.
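
A loader of this shape might look like the sketch below. It assumes the pickle holds a plain dict of variable names to values; `load_report_variables` is a hypothetical name, and the signatures and formatting of `dollar` and `pct` are guesses for illustration rather than the helpers added in this commit.

```python
import pickle
from pathlib import Path
from types import SimpleNamespace


def load_report_variables(path):
    """Unpack a pickled dict of report variables into an attribute namespace."""
    with Path(path).open("rb") as fh:
        data = pickle.load(fh)
    if not isinstance(data, dict):
        raise TypeError(f"expected a dict in {path}, got {type(data).__name__}")
    return SimpleNamespace(**data)


def dollar(x, decimals=2):
    """Format a number as dollars, e.g. 1234.5 -> '$1,234.50'."""
    sign = "-" if x < 0 else ""
    return f"{sign}${abs(x):,.{decimals}f}"


def pct(x, decimals=1):
    """Format a value that is already in percent units, e.g. 25.66 -> '25.7%'."""
    return f"{x:.{decimals}f}%"
```

With `v = load_report_variables("cache/report_variables.pkl")`, an inline reference can then read a value as `v.some_name`, where `some_name` stands for whatever key the exporting notebook wrote.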
Replaces the pricing simulation pipeline notebook with a focused
equity analysis notebook that starts from pre-computed billing
outputs. Computes all report variables (quintile breakdowns,
regression results, rate constants) and exports them to
cache/report_variables.pkl for index.qmd.

- Fixes DTOU description ('Delivery' not 'Dynamic')
- Computes Q1-Q5 gaps dynamically instead of hardcoding
- Updates fig-rate-structures with accurate C23 all-in rates
- Adds spot-check assertions against known pipeline outputs
Add plotnine, compute_delivery_deltas, gspread, dotenv, bs4, and IPython
to the appropriate deptry ignore lists. plotnine is a notebook-only
dependency; the others are local scripts or transitive imports in lib/.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Update pct_save computations to use the actual parquet column names:
STOU → total_delta_dollars, DTOU → dtou_total_delta_dollars.

Fix regression extraction to filter on "rate" and "dep_var" (not
"rate_type"/"outcome"); hardcode stou_jan_mean_pct = 25.66 since
there is no mean_pct column in regression_summary.csv.

Update callout notes and prose to reflect the real schema.
@griffinsharps griffinsharps force-pushed the 60-smart-meter-analysis-pricing-simulation branch 2 times, most recently from 49ae9d9 to 473b47a Compare March 24, 2026 17:02
Griffin Sharps and others added 8 commits March 24, 2026 17:07
…ort targets

- Add _quarto.yml: manuscript project type, Switchbox theme, SVG figures
- Add references.bib with ICC Final Order and Order on Rehearing entries
- Add render/draft/clean targets to Justfile
- Fix bibliography path in index.qmd (../references.bib → references.bib)
- Fix typo in index.qmd: "chargesa" → "charges a"
@griffinsharps griffinsharps force-pushed the 60-smart-meter-analysis-pricing-simulation branch from 5924afc to d2ad44b Compare March 24, 2026 18:38

Development

Successfully merging this pull request may close these issues.

[smart-meter-analysis] Pricing simulation

1 participant