Utkarsh/continual learning benchmark upgrade #3
Merged
Pull request overview
This PR upgrades the continual-learning benchmark by adding ER-ACE and GDumb strategies, expanding smoke/integration coverage, and enhancing reporting artifacts to include memory-budget context in leaderboard outputs.
Changes:
- Added ER-ACE (asymmetric CE masking + replay) and GDumb (class-balanced memory + retrain-from-scratch) strategies and wired them into strategy creation + CLI method selection.
- Introduced a class-balanced replay buffer utility to support exemplar-only baselines like GDumb.
- Updated reporting/README/docs assets to include memory-budget information and refreshed generated benchmark artifacts.
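Here is a minimal sketch of the ER-ACE loss described above. It assumes the asymmetric masking follows the original ER-ACE formulation: the incoming batch's softmax is restricted to the classes present in that batch, and the replay batch uses ordinary cross-entropy over all classes. It is written in plain Python on lists of logits so it stays self-contained. `masked_cross_entropy` and `er_ace_loss` are hypothetical names and are not the functions in `src/cl_bench/strategies/er_ace.py`. The PR may also mask by current-task classes instead of batch classes.

```python
import math
from typing import List, Sequence, Set

# Hypothetical helpers for illustration; not the API of the PR's er_ace.py.


def masked_cross_entropy(logits: Sequence[float], label: int, allowed: Set[int]) -> float:
    """Cross-entropy where logits outside `allowed` are excluded from the softmax."""
    kept = [logits[c] for c in sorted(allowed)]
    # Numerically stable log-sum-exp over the allowed classes only.
    m = max(kept)
    log_z = m + math.log(sum(math.exp(v - m) for v in kept))
    return log_z - logits[label]


def er_ace_loss(
    incoming_logits: List[List[float]],
    incoming_labels: List[int],
    replay_logits: List[List[float]],
    replay_labels: List[int],
) -> float:
    # Incoming batch: only classes present in this batch compete in the softmax,
    # so new samples do not push down logits of absent (older) classes.
    present = set(incoming_labels)
    loss_in = sum(
        masked_cross_entropy(z, y, present) for z, y in zip(incoming_logits, incoming_labels)
    ) / len(incoming_labels)
    if not replay_labels:
        return loss_in
    # Replay batch: ordinary cross-entropy over every output class.
    all_classes = set(range(len(replay_logits[0])))
    loss_replay = sum(
        masked_cross_entropy(z, y, all_classes) for z, y in zip(replay_logits, replay_labels)
    ) / len(replay_labels)
    return loss_in + loss_replay
```

When the incoming batch contains a single class, its term is exactly zero. New-class samples therefore cannot lower the logits of classes missing from the batch. A tensor implementation would typically get the same masking by applying `Tensor.masked_fill` before `torch.nn.functional.cross_entropy`.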
Reviewed changes
Copilot reviewed 18 out of 21 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tests/test_strategies.py | Adds unit tests for ER-ACE, GDumb, and the balanced replay buffer. |
| tests/test_integration_smoke.py | Extends smoke suite to cover er_ace and gdumb. |
| tests/test_config.py | Asserts gdumb_epochs is loaded from config. |
| src/cl_bench/strategies/replay.py | Adds BalancedReplayBuffer alongside reservoir replay buffer. |
| src/cl_bench/strategies/gdumb.py | Implements GDumb strategy and factory. |
| src/cl_bench/strategies/er_ace.py | Implements ER-ACE strategy and factory. |
| `src/cl_bench/strategies/__init__.py` | Exports new strategies and adds er_ace / gdumb to create_strategy. |
| src/cl_bench/reporting.py | Adds memory budget aggregation + reporting column; improves retention-curve alignment. |
| src/cl_bench/experiments.py | Logs gdumb_epochs into run summary metadata. |
| src/cl_bench/config.py | Adds gdumb_epochs to config schema + parsing defaults. |
| src/cl_bench/cli.py | Adds new methods and expands allowed model choices. |
| README.md | Documents ER-ACE/GDumb and adds a GDumb command + updated leaderboard format. |
| docs/BENCHMARK_CARD.md | Updates benchmark card to include ER-ACE and GDumb descriptions. |
| docs/assets/split_cifar10_headline/summary.json | Updates generated report summary with GDumb runs + memory field. |
| docs/assets/split_cifar10_headline/README.md | Updates generated report README with GDumb and memory column. |
| docs/assets/split_cifar10_headline/leaderboard.csv | Updates generated leaderboard CSV schema + GDumb row. |
| configs/split_cifar10_headline.yaml | Adds strategy.gdumb_epochs. |
| configs/smoke.yaml | Adds strategy.gdumb_epochs for smoke runs. |
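GDumb depends on a class-balanced memory, which can be sketched with the greedy sampler from the GDumb paper. The buffer accepts samples freely until it is full. After that, a sample is admitted only if its class holds fewer than `capacity // classes_seen` items, and a random exemplar is evicted from the largest class to make room. `ClassBalancedBuffer` is a hypothetical stand-in written for illustration. The PR's `BalancedReplayBuffer` in `replay.py` may use a different API or eviction rule.

```python
import random
from typing import Dict, Hashable, List


class ClassBalancedBuffer:
    """Hypothetical GDumb-style greedy class-balanced memory (illustration only)."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.capacity = capacity
        self.rng = random.Random(seed)
        self.by_class: Dict[int, List[Hashable]] = {}

    def __len__(self) -> int:
        return sum(len(items) for items in self.by_class.values())

    def class_counts(self) -> Dict[int, int]:
        return {label: len(items) for label, items in self.by_class.items()}

    def add(self, item: Hashable, label: int) -> bool:
        """Offer one (item, label) pair; return True if it was stored."""
        if len(self) < self.capacity:
            self.by_class.setdefault(label, []).append(item)
            return True
        # Buffer is full: admit only if the incoming class is below its fair share.
        classes_seen = set(self.by_class) | {label}
        fair_share = self.capacity // len(classes_seen)
        if len(self.by_class.get(label, [])) >= fair_share:
            return False
        # Make room by evicting a random exemplar from the currently largest class.
        largest = max(self.by_class, key=lambda c: len(self.by_class[c]))
        victims = self.by_class[largest]
        victims.pop(self.rng.randrange(len(victims)))
        self.by_class.setdefault(label, []).append(item)
        return True
```

Admission depends only on per-class counts, so the memory stays balanced whatever the task order. That property is what allows a learner to retrain from the memory alone.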
Comment on lines +58 to +62

```python
task_classes=[
    [int(label) for label in task.classes]
    for task in config.tasks
    if task.classes != "all"
],
```
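The comprehension in these lines can be run in isolation. `TaskSpec` and `build_task_classes` are hypothetical stand-ins for the entries of `config.tasks` and the surrounding call. The sketch only reproduces the reviewed logic: labels are converted to `int`, and tasks declared with `classes: "all"` are skipped.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class TaskSpec:  # hypothetical stand-in for one entry of config.tasks
    classes: Union[str, List[int], List[str]]


def build_task_classes(tasks: List[TaskSpec]) -> List[List[int]]:
    # Same filtering as the reviewed lines: "all" tasks are dropped entirely,
    # so the output is not index-aligned with `tasks` when such entries exist.
    return [
        [int(label) for label in task.classes]
        for task in tasks
        if task.classes != "all"
    ]
```

Because `"all"` tasks are skipped, the result can be shorter than `config.tasks`. Position `i` in `task_classes` matches task `i` only when no task uses `"all"`.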
Comment on lines 116 to 120

```python
forgetting = [_metric(record, "average_forgetting") for record in method_records]
backward_transfer = [_metric(record, "backward_transfer") for record in method_records]
runtimes = [record.runtime_seconds for record in method_records]
memory_budgets = [_metric(record, "replay_buffer_size") for record in method_records]
seeds = ",".join(
```
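`replay_buffer_size` reads like a configuration value, so assume it is the same across the seeds of one method. Under that assumption, one plausible way to turn the `memory_budgets` list into a single leaderboard cell is to report the unique value and fall back to a range when seeds disagree. `format_memory_budget` is a hypothetical helper, and treating missing values (`None`) as absent is also an assumption. The formatting `reporting.py` actually uses is not shown in this diff.

```python
from typing import Optional, Sequence


def _fmt(value: float) -> str:
    # Print whole numbers without a trailing ".0" (e.g. 200.0 -> "200").
    return str(int(value)) if float(value).is_integer() else str(value)


def format_memory_budget(memory_budgets: Sequence[Optional[float]]) -> str:
    """Hypothetical helper: collapse per-seed buffer sizes into one table cell."""
    known = [v for v in memory_budgets if v is not None]
    if not known:
        return "-"  # assumption: methods without a replay buffer report no value
    low, high = min(known), max(known)
    return _fmt(low) if low == high else f"{_fmt(low)}-{_fmt(high)}"
```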
Comment on lines +48 to +59

```python
self._fit_memory(task_id)
val_metrics = self.evaluate(val_loader)
return [
    {
        "task_id": task_id,
        "epoch": 1,
        "train_loss": 0.0,
        "train_accuracy": 0.0,
        "train_examples": float(example_count),
        **{f"val_{key}": value for key, value in val_metrics.items()},
    }
]
```
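In these lines, GDumb does all of its training inside `self._fit_memory(task_id)` and then logs a single placeholder epoch with zero train loss and accuracy. The retrain-from-scratch idea behind `_fit_memory` can be shown without a deep-learning dependency by replacing the network with a nearest-class-mean classifier. That classifier is rebuilt from nothing on the memory contents each time it is fitted. This is a deliberate simplification: `fit_from_scratch` and `predict` are hypothetical names, and the PR's `_fit_memory` presumably trains a network for `gdumb_epochs` epochs instead.

```python
from typing import Dict, List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (features, label)


def fit_from_scratch(memory: List[Example]) -> Dict[int, List[float]]:
    """Toy stand-in for GDumb's memory fit: compute per-class means using only
    the exemplars currently in memory; nothing carries over from earlier fits."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, y in memory:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict(means: Dict[int, List[float]], x: Sequence[float]) -> int:
    # Assign x to the class whose mean is closest in squared Euclidean distance.
    return min(means, key=lambda y: sum((a - b) ** 2 for a, b in zip(means[y], x)))
```

Calling `fit_from_scratch` again after the memory changes discards the previous means entirely. This mirrors how GDumb ignores the model state left by earlier tasks.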
```diff
-parser.add_argument("--model", choices=["linear", "mlp", "small_cnn", "cnn"])
+parser.add_argument(
+    "--model",
+    choices=["linear", "mlp", "small_cnn", "cnn", "cifar_convnet", "resnet18_cifar"],
```
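The change appears to replace the old `--model` definition rather than add a second one. This matters because `argparse` refuses to register the same option string twice under its default conflict handler. Below is a minimal sketch of the resulting option, assuming a standalone parser. `build_parser` and the `prog` name are hypothetical.

```python
import argparse

# Model names taken from the reviewed change.
MODEL_CHOICES = ["linear", "mlp", "small_cnn", "cnn", "cifar_convnet", "resnet18_cifar"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="cl-bench")  # hypothetical prog name
    # Register --model exactly once; a second add_argument("--model", ...) would
    # raise argparse.ArgumentError (conflicting option string).
    parser.add_argument("--model", choices=MODEL_CHOICES)
    return parser
```

Values outside `choices` are rejected at parse time, and argparse exits with a usage error.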
No description provided.