Summary
We will integrate Speculators’ productized model definitions and configs directly into the Eagle3 and HASS research prototypes so those codebases construct, train, and serialize speculator models natively (no post-hoc conversion). Prototype user-facing interfaces will remain largely the same; under the hood, they will instantiate models and attach verifier backbones via Speculators’ standardized pathways. We will start with Llama 3-style speculator configs and the smallest model sizes/toy variants to validate forward/backward correctness, speed/accuracy parity, and checkpoint I/O. Scope is intentionally narrow to reduce churn and fragility, while setting a path to later support Qwen 3, gpt-oss, and Llama 4.
Goals
Primary
Replace prototype-local model/config code with Speculators’ model definitions and Pydantic configs.
Keep prototype CLI & training loops stable (minimal surface change).
Ensure forward/backward passes function correctly under prototype losses/optimizers/schedules.
Integrate verifier attachment/loading through Speculators’ built-ins (replacing prototype glue).
Achieve parity with current research implementations:
Accuracy within noise on fixed seeds/datasets.
Throughput/latency within ±5–10% for like-for-like settings.
Native save/load: write checkpoints/configs in Speculators’ format (no converter step).
Secondary
Introduce a thin adapter layer to preserve prototype APIs while delegating to Speculators.
Land a single reference architecture first: Llama 3 speculator (draft head + verifier attach).
Document a repeatable recipe to extend to Qwen 3, gpt-oss, and Llama 4 later.
Non-Goals (for this phase)
Rewriting training objectives or schedulers in prototypes.
Cross-framework checkpoint translation beyond what Speculators already supports.
Requirements
Functional
Config integration
Prototypes accept current standard args and build Speculators’ Algorithm/Model Config or resolve from HF id/local file.
Build the speculator model fully from the config definition, including all parameters, modules, and supporting functionality (verifier attachment/loading, tokenization/processor support, and parameter/layer construction and loading).
Use Speculators' built-ins for from_pretrained, save_pretrained, algorithm construction, model construction, etc.
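As a rough illustration of the intended build path from the prototype side, the sketch below assumes placeholder class and field names (EagleSpeculatorConfig, SpeculatorModel, from_config, verifier); only from_pretrained/save_pretrained are named in the requirements above, so treat everything else as an assumption standing in for Speculators' actual API.

```python
# Illustrative sketch only: the imports, class names, and fields below are
# assumptions standing in for Speculators' real API, not confirmed names.
from speculators import EagleSpeculatorConfig, SpeculatorModel  # assumed imports

# Path A: build the speculator entirely from a (Pydantic) config definition,
# including parameters, modules, and verifier attachment/loading.
config = EagleSpeculatorConfig(                     # assumed config class
    verifier="meta-llama/Llama-3.1-8B-Instruct",    # verifier backbone to attach
)
model = SpeculatorModel.from_config(config)         # assumed builder entry point

# Path B: resolve directly from an HF id or a local checkpoint directory.
model = SpeculatorModel.from_pretrained("path/or/hf-id")  # placeholder id

# Native save: checkpoints/configs land in Speculators' format, no converter step.
model.save_pretrained("out/llama3-speculator")
```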
Training loop compatibility
Forward returns logits (and aux as needed) in the same tensor shapes the prototypes expect.
Backward works under prototype loss functions; grad flow validated (no detached paths).
Mixed-precision and distributed training work as expected and are fully supported by Speculators.
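For the "no detached paths" check, a generic PyTorch smoke test is enough and does not depend on Speculators' API; something along these lines can run against the toy Llama 3 variants:

```python
import torch

def assert_grad_flow(model: torch.nn.Module, loss: torch.Tensor) -> None:
    """Backprop a prototype-style loss and fail if any trainable parameter
    received no gradient, which would indicate a detached path."""
    model.zero_grad(set_to_none=True)
    loss.backward()
    missing = [
        name for name, p in model.named_parameters()
        if p.requires_grad and p.grad is None
    ]
    assert not missing, f"No gradient reached: {missing}"
```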
Checkpointing
Save checkpoints in Speculators' native format.
Resume paths support loading back from saved Speculators checkpoints.
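A simple round-trip check covers the resume path, assuming only the save_pretrained/from_pretrained pair named above (the loader is passed in so the sketch stays agnostic to the actual class):

```python
import torch

def assert_checkpoint_roundtrip(model, load_fn, path: str) -> None:
    """Save in Speculators' format, reload, and verify parameters survive intact."""
    model.save_pretrained(path)            # Speculators-native save (named in the requirements)
    reloaded = load_fn(path)               # e.g. the class-level from_pretrained
    reloaded_state = reloaded.state_dict()
    for name, tensor in model.state_dict().items():
        assert torch.equal(tensor.cpu(), reloaded_state[name].cpu()), (
            f"Parameter mismatch after resume: {name}"
        )
```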
Metrics/Parity
Log metrics/losses in the same form as the originals so results remain directly comparable, and add straightforward pathways for performance testing.
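For parity tracking, a small helper that compares per-step metrics from the original and integrated runs against the tolerance bands stated in the goals (accuracy within noise, throughput within ±5–10%) keeps the comparison mechanical:

```python
def within_relative_band(baseline, candidate, rel_tol=0.10):
    """True if every candidate value stays within rel_tol of the matching baseline value."""
    return all(
        abs(c - b) <= rel_tol * abs(b)
        for b, c in zip(baseline, candidate)
    )

# e.g. within_relative_band(baseline_step_losses, integrated_step_losses, rel_tol=0.05)
```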
Non-Functional
Performance: No material regressions on single-GPU baselines; batching parameters honored.
Reliability: Deterministic seeding, reproducible resumes, schema validation at boundaries.
Maintainability: Minimal shim in prototypes; most complexity lives in Speculators.
Compatibility: Python 3.10+; recent PyTorch and Transformers versions aligned with both the prototypes and the Speculators toolkit.
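The deterministic-seeding piece of the Reliability requirement above is standard PyTorch housekeeping; a helper like the one below (framework-level, independent of Speculators) covers it:

```python
import random

import numpy as np
import torch

def seed_everything(seed: int = 0) -> None:
    """Seed every RNG the prototypes touch so runs and resumes are reproducible."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Prefer deterministic kernels; warn instead of erroring where none exist.
    torch.use_deterministic_algorithms(True, warn_only=True)
```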
Design
High-Level Architecture
N/A
Integration Points
1. Model Architecture Replacement
Eagle3: Replace custom model definitions in training code with Speculators' standardized model configs
HASS: Similar model replacement, likely with different head configurations
2. Config System Integration
Current: Both use shell script parameters and hardcoded configs
Target: Replace with Speculators' Pydantic config system
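To make the shift concrete: flags the shell scripts currently pass as loose parameters become typed, validated fields. The class and field names below are examples for illustration, not Speculators' actual config schema.

```python
from pydantic import BaseModel, Field

class TrainRunConfig(BaseModel):
    """Example only: shell-script flags expressed as a validated Pydantic config."""
    verifier: str = "meta-llama/Llama-3.1-8B-Instruct"  # example field names
    lr: float = Field(3e-5, gt=0)
    batch_size: int = Field(8, ge=1)
    max_steps: int = Field(1000, ge=1)

# Validation errors surface at the boundary instead of deep inside the training loop.
cfg = TrainRunConfig(lr=1e-4, batch_size=4)
```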
3. Data Generation Pipeline
ge_data.allocation module for parallel data generation, with dataset-specific scripts (ultrachat.py, sharegpt.py, etc.)
4. Training Loop Integration
5. Vocabulary Mapping
Vocabulary mapping files (d2t.npy, t2d.npy); see the sketch after this list.
6. Checkpoint Management
Checkpoint conversion script (convert.sh)
7. Verifier Attachment
8. Multi-GPU/Distributed Training
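One option for the vocabulary-mapping files referenced in item 5: treat d2t.npy/t2d.npy as plain NumPy arrays and register them as buffers on the speculator module so they are serialized with the Speculators checkpoint rather than shipped as loose sidecar files. Their exact index/mask semantics should be taken from the Eagle3/HASS code; the sketch below only shows loading and registration.

```python
import numpy as np
import torch

class VocabMaps(torch.nn.Module):
    """Holds the draft/target vocabulary mapping arrays as persistent buffers."""

    def __init__(self, d2t_path: str = "d2t.npy", t2d_path: str = "t2d.npy") -> None:
        super().__init__()
        # Buffers travel with state_dict()/save_pretrained instead of loose files.
        self.register_buffer("d2t", torch.from_numpy(np.load(d2t_path)))
        self.register_buffer("t2d", torch.from_numpy(np.load(t2d_path)))
```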
Milestones
M0 — Spike & Contracts (Llama 3 tiny, toy data)
SpeculatorModule.forward/backward works with verifier attached.
M1 — Eagle3 Integration (Llama 3.1 8B)
M2 — HASS Integration (Llama 3.1 8B)
M3 — Hardening