60 smart meter analysis pricing simulation #61
Open

griffinsharps wants to merge 137 commits into main from 60-smart-meter-analysis-pricing-simulation
Conversation
Co-authored-by: Cursor <cursoragent@cursor.com>
…onth validator

Enhance validate_month_output.py with three preflight checks needed before scaling to full-month execution:
- Duplicate (zip_code, account_identifier, datetime) detection per batch file
- Row count reporting (total + per-file) in the validation report JSON
- Run artifact integrity via the --run-dir flag (plan.json, run_summary.json, manifests, batch summaries)

Add PREFLIGHT_200.md checklist for the 200-file EC2 validation run.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…s_sorted()

Polars 1.38 removed is_sorted() from Expr. Collect the composite key first, then check sortedness on the resulting Series, which retains the method.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Restores Justfile from main and adds a migrate-month recipe.

Usage: just migrate-month 202307
- batch-size 100, workers 6, lazy_sink, --resume
- Reads ~/s3_paths_<YYYYMM>_full.txt, writes to /ebs/.../out_<YYYYMM>_production
- Uses bare python (no uv) for EC2 compatibility

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
.gitignore: block *.txt, .tmp/, archive_quarantine/, tmp_polars_run_*/, and subagent_packages/ from being tracked.

pre-commit: add the detect-private-key hook and a local forbid-secrets hook that blocks .env, .secrets, credentials.json, .pem, .key, .p12, .pfx, and .jks files from being committed (even via git add -f).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace hash-based duplicate detection (n_unique, ~400 MB/file) with adjacent-key streaming that leverages the required global sort order. Sortedness and uniqueness now share a single PyArrow iter_batches pass in full mode.

Key changes:
- _streaming_sort_and_dup_check: combined sort+dup check via PyArrow batch iteration, O(batch_size) memory, cross-file boundary state
- Per-file datetime stats with merge (_DtStats dataclass)
- Per-file DST stats with merge (_DstFileStats dataclass)
- Enhanced sample mode: strict-increasing check (catches dups in windows)
- Row counts from parquet metadata (O(1), no data scan)
- Phase-based main() architecture (discovery -> metadata -> streaming -> datetime -> DST -> artifacts -> report)
- _fail() typed as NoReturn for mypy narrowing
- Add pyarrow mypy override in pyproject.toml

Removed dead functions: _check_sorted_full, _validate_no_duplicates_file, _validate_datetime_invariants_partition, _validate_dst_option_b_partition, _keys_is_sorted_df

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
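The adjacent-key idea can be sketched without PyArrow: because the data is required to be globally sorted, a duplicate key must appear adjacent to its twin, so one streaming pass that remembers only the previous key can verify sortedness and count duplicates at once, including across batch (and file) boundaries. Field names below are illustrative.

```python
from typing import Iterable, List, Optional, Tuple

Key = Tuple[str, str, int]  # (zip_code, account_identifier, datetime)

def check_sorted_and_unique(batches: Iterable[List[Key]]) -> Tuple[bool, int]:
    """Single pass over key batches: verify global sort order and count
    adjacent duplicate keys. Only the previous key is carried across
    batch boundaries, so memory stays O(batch_size)."""
    prev: Optional[Key] = None  # cross-boundary state
    sorted_ok, dups = True, 0
    for batch in batches:
        for key in batch:
            if prev is not None:
                if key < prev:
                    sorted_ok = False
                elif key == prev:
                    dups += 1
            prev = key
    return sorted_ok, dups

batches = [
    [("60601", "a", 1), ("60601", "a", 2)],
    [("60601", "a", 2), ("60602", "b", 1)],  # duplicate spans a batch boundary
]
print(check_sorted_and_unique(batches))  # -> (True, 1)
```

This is why the hash-based n_unique pass could be dropped: under the global-sort precondition, adjacency makes uniqueness checkable in constant memory.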
Two fixes in validate_month_output.py:

1. _slice_keys: use lf.collect() instead of the streaming engine for slice reads — streaming may reorder rows, defeating sortedness validation. Slices are small (5K rows x 3 cols), so the default engine is correct and fast.
2. _check_sorted_sample: track prev_end and only perform the cross-slice boundary comparison when off >= prev_end (non-overlapping). Random windows can overlap head/tail/each other, making boundary checks invalid under overlap. Within-slice strict-monotonic checks still run unconditionally.

Also updates remaining collect(streaming=True) calls to collect(engine="streaming") to fix Polars deprecation warnings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
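The prev_end guard in fix 2 reduces to logic like this sketch (function and parameter names are hypothetical, not the repo's): within-window strict-monotonic checks always run, but the last-key-of-previous-window vs first-key-of-current-window comparison is only meaningful when the windows are disjoint.

```python
def validate_sample_windows(keys, offsets, width):
    """Strict-monotonic check inside each sampled window; cross-window
    boundary check only when windows do not overlap."""
    prev_end = None   # exclusive end offset of the previous window
    prev_last = None  # last key of the previous window
    for off in sorted(offsets):
        window = keys[off : off + width]
        # Within-window strict increase always runs (also catches duplicates).
        for a, b in zip(window, window[1:]):
            if not a < b:
                return False
        # Overlapping windows legally revisit earlier keys, so the
        # boundary comparison is skipped unless off >= prev_end.
        if prev_last is not None and off >= prev_end and not prev_last < window[0]:
            return False
        prev_end, prev_last = off + width, window[-1]
    return True

keys = list(range(100))
print(validate_sample_windows(keys, offsets=[0, 3, 50], width=5))  # -> True
```

With offsets [0, 3], the windows overlap, so comparing window 1's tail against window 2's head would spuriously fail on perfectly sorted data; the guard skips exactly that case.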
Restores Justfile from main and adds a migrate-month YEAR_MONTH recipe:
- Guards against non-EC2 environments (checks the /ebs mount)
- Auto-generates the S3 input list via aws s3 ls + awk + sort
- Validates a non-empty input list before running
- Runs migrate_month_runner.py with standard production params (batch-size 100, workers 6, --resume, lazy_sink)

Usage: just migrate-month 202307

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Annotate migrate_month_runner.py, validate_month_output.py, and Justfile with industry-standard "why" comments for senior code review. Additions include module-level architecture docstrings, function-level design rationale, and parameter tuning explanations. No logic changes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…r-data-from-csv-to-parquet
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Refactor migrate-month to use configurable variables (S3_PREFIX, MIGRATE_OUT_BASE, etc.) instead of hardcoded bucket names and usernames, preparing the repo for open-source release. Add new recipes: months-from-s3, migrate-months, validate-month, validate-months, and migration-status. Multi-month recipes support fail-fast (default) or continue-on-error mode with per-invocation UTC-timestamped logs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace hard-coded RTP/flat spread logic with dual tariff inputs (--tariff-prices-a, --tariff-prices-b) that each use the standard price_cents_per_kwh schema from build_tariff_hourly_prices.py. Adds fail-loud join guards (null check + row-count check per tariff). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Generates one row per local hour in America/Chicago for a given year, mapping each hour to its TOU season and period with the associated price. Handles DST by keeping the first UTC occurrence of fall-back duplicates. Validates full hour coverage per season. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
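The DST rule above (keep the first UTC occurrence of fall-back duplicates) can be sketched with the standard library alone. This is an assumption-laden illustration, not the script's code: it walks UTC hours across the local year and drops the second occurrence of any repeated local label.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

CHI = ZoneInfo("America/Chicago")

def local_hours(year: int):
    """Yield (naive_local_hour, utc_instant) pairs, one per UTC hour of
    the local year. During fall-back the local label 01:00 occurs twice;
    only its first UTC occurrence is kept."""
    start = datetime(year, 1, 1, tzinfo=CHI).astimezone(timezone.utc)
    end = datetime(year + 1, 1, 1, tzinfo=CHI).astimezone(timezone.utc)
    seen = set()
    t = start
    while t < end:
        local = t.astimezone(CHI).replace(tzinfo=None)
        if local not in seen:  # drop the second fall-back duplicate
            seen.add(local)
            yield local, t
        t += timedelta(hours=1)

rows = list(local_hours(2023))
print(len(rows))  # 8759: 8760 UTC hours, minus one fall-back duplicate
```

Note the asymmetry: spring-forward means the local label 02:00 simply never appears (the 8760 nominal labels lose one), while fall-back produces the duplicate that has to be resolved explicitly, which is why the calendar validates hour coverage per season rather than assuming 24 rows per day.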
Validates uniqueness and join coverage across the full chain: interval data -> hourly loads -> tariff calendars -> household bills. Synthetic tests exercise spring-forward, fall-back, and normal months; sample-data tests validate the real 202308 artefacts. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
build_regression_dataset.py: maps household bills to Census block groups
via ZIP+4 crosswalk (deterministic 1:1), aggregates to BG-level outcomes,
and fits two OLS regressions (savings + bill diff ~ demographics). Supports
auto/core/explicit predictor modes and graceful outcome column fallback.
run_billing_pipeline.py: multi-month orchestrator that chains hourly loads,
tariff billing, annual aggregation, and regression via subprocess. Supports
--months/--months-file with {yyyymm} path patterns and writes per-month
outputs plus annual_household_aggregate.parquet with a full run manifest.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
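The {yyyymm} path-pattern mechanism the orchestrator relies on reduces to ordinary str.format expansion plus subprocess chaining with check=True for fail-fast behavior. A minimal sketch, with a trivial stand-in command where the real per-month pipeline step would run:

```python
import subprocess
import sys

def expand_and_run(months, pattern):
    """Expand a {yyyymm} path pattern per month, then invoke one pipeline
    step per month via subprocess; check=True raises on the first
    non-zero exit, so a failed month stops the chain."""
    outputs = []
    for month in months:
        out_path = pattern.format(yyyymm=month)
        # Stand-in for e.g. `python tariff_billing.py --out <out_path>`:
        subprocess.run([sys.executable, "-c", f"print({out_path!r})"], check=True)
        outputs.append(out_path)
    return outputs

paths = expand_and_run(["202307", "202308"], "out/{yyyymm}/bills.parquet")
print(paths)  # the expanded per-month output paths, in month order
```

Running each step as a subprocess (rather than importing it) keeps per-month memory isolated and lets the orchestrator record exit codes in its run manifest.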
…ng tests/test_*.py
Annotate 12 files with inline comments explaining design decisions, trade-offs, and non-obvious rationale for a senior reviewer. No logic changes—comments only (+121/-12 lines). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace all references to annual_household_aggregate.parquet with all_months_household_bills.parquet. Expand the test suite from 25 tests to 93 tests covering:
- All-months bills: schema, YYYYMM month format, no nulls, set equality of months, additive totals matching per-month outputs
- Regression artifacts: existence of all 7 files (bg_month_outcomes, bg_annual_outcomes, bg_season_outcomes, regression_dataset_bg, regression_results.json, regression_summary.txt, regression_metadata.json)
- Schema assertions for all BG outcome parquets
- Mathematical invariants: pct_savings_weighted definition, annual rollup equals sum-of-months, season values and mapping, null handling
- Crosswalk coverage: n_zip4, n_bg, n_zip4_multi_bg, pct_dropped
- Regression results JSON: model keys, r_squared, coefficients with const
- Both regression modes (annual, bg_month) with schema consistency checks
- Skip-regression mode: no artifacts, manifest flags, bills still produced
- Manifest: all_months_bills_rows, steps_completed, regression_level

Also adds --regression-level pass-through to the orchestrator CLI so both annual and bg_month modes can be tested end-to-end. Test data is augmented with synthetic households at diverse ZIP+4 values to ensure >= 6 census block groups for OLS regression coverage.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…alidation

Co-authored-by: Cursor <cursoragent@cursor.com>
Replace the commented-out setup block with a working loader that
unpacks cache/report_variables.pkl into a SimpleNamespace (v), and
add dollar/pct helper functions. Prefix all 48 inline {python}
variable references with v. so they resolve from the namespace.
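The loader and helper pattern described above amounts to a few lines of stdlib Python. A sketch under the stated setup (the pickle holds a flat dict of report variables; the helper signatures here are illustrative, not necessarily the notebook's):

```python
import pickle
from types import SimpleNamespace

def load_report_variables(path):
    """Unpack a pickled dict of report variables into a namespace so
    inline references read as v.some_variable instead of d["some_variable"]."""
    with open(path, "rb") as f:
        return SimpleNamespace(**pickle.load(f))

def dollar(x):
    """Whole-dollar formatting with thousands separators."""
    return f"${x:,.0f}"

def pct(x, digits=1):
    """Fraction -> percent string, e.g. 0.25 -> '25.0%'."""
    return f"{x * 100:.{digits}f}%"

# Hypothetical usage mirroring the notebook setup cell:
# v = load_report_variables("cache/report_variables.pkl")
print(dollar(1200), pct(0.25))  # -> $1,200 25.0%
```

Using a SimpleNamespace keeps the 48 inline `{python}` expressions short while still failing loudly (AttributeError) if a variable is missing from the cache.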
Replaces the pricing simulation pipeline notebook with a focused
equity analysis notebook that starts from pre-computed billing
outputs. Computes all report variables (quintile breakdowns,
regression results, rate constants) and exports them to
cache/report_variables.pkl for index.qmd.
- Fixes DTOU description ('Delivery' not 'Dynamic')
- Computes Q1-Q5 gaps dynamically instead of hardcoding
- Updates fig-rate-structures with accurate C23 all-in rates
- Adds spot-check assertions against known pipeline outputs
Add plotnine, compute_delivery_deltas, gspread, dotenv, bs4, and IPython to the appropriate deptry ignore lists. plotnine is a notebook-only dependency; the others are local scripts or transitive imports in lib/.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Update pct_save computations to use the actual parquet column names: STOU → total_delta_dollars, DTOU → dtou_total_delta_dollars. Fix regression extraction to filter on "rate" and "dep_var" (not "rate_type"/"outcome"); hardcode stou_jan_mean_pct = 25.66 since there is no mean_pct column in regression_summary.csv. Update callout notes and prose to reflect the real schema.
force-pushed from 49ae9d9 to 473b47a
…ort targets

- Add _quarto.yml: manuscript project type, Switchbox theme, SVG figures
- Add references.bib with ICC Final Order and Order on Rehearing entries
- Add render/draft/clean targets to Justfile
- Fix bibliography path in index.qmd (../references.bib → references.bib)
- Fix typo in index.qmd: "chargesa" → "charges a"
…github.com/switchbox-data/smart-meter-analysis into 60-smart-meter-analysis-pricing-simulation
force-pushed from 5924afc to d2ad44b
…github.com/switchbox-data/smart-meter-analysis into 60-smart-meter-analysis-pricing-simulation
…github.com/switchbox-data/smart-meter-analysis into 60-smart-meter-analysis-pricing-simulation
No description provided.