Summary
We will integrate Speculators’ productized model definitions and configs directly into the Eagle3 and HASS research prototypes so those codebases construct, train, and serialize speculator models natively (no post-hoc conversion). Prototype user-facing interfaces will remain largely the same; under the hood, they will instantiate models and attach verifier backbones via Speculators’ standardized pathways. We will start with Llama 3-style speculator configs and the smallest model sizes/toy variants to validate forward/backward correctness, speed/accuracy parity, and checkpoint I/O. Scope is intentionally narrow to reduce churn and fragility, while setting a path to later support Qwen 3, gpt-oss, and Llama 4.
Goals
Primary
Replace prototype-local model/config code with Speculators’ model definitions and Pydantic configs.
Keep prototype CLI & training loops stable (minimal surface change).
Ensure forward/backward passes function correctly under prototype losses/optimizers/schedules.
Integrate verifier attachment/loading through Speculators’ built-ins (replacing prototype glue).
Achieve parity with current research implementations:
Accuracy within noise on fixed seeds/datasets.
Throughput/latency within ±5–10% for like-for-like settings.
Native save/load: write checkpoints/configs in Speculators’ format (no converter step).
Secondary
Introduce a thin adapter layer to preserve prototype APIs while delegating to Speculators.
Land a single reference architecture first: Llama 3 speculator (draft head + verifier attach).
Document a repeatable recipe to extend to Qwen 3, gpt-oss, and Llama 4 later.
Non-Goals (for this phase)
Rewriting training objectives or schedulers in prototypes.
Cross-framework checkpoint translation beyond what Speculators already supports.
Requirements
Functional
Config integration
Prototypes accept current standard args and build Speculators’ Algorithm/Model Config or resolve from HF id/local file.
Build the speculator model fully from the config definition, including all parameters, modules, and supporting functionality (verifier attachment/loading, tokenization/processor support, and parameter/layer construction and loading).
Use Speculators' built-ins for from_pretrained, save_pretrained, algorithm construction, model construction, etc.
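As a rough illustration of the intended build path from the prototype side, the sketch below assumes placeholder class and field names (EagleSpeculatorConfig, SpeculatorModel, from_config, verifier); only from_pretrained/save_pretrained are named in the requirements above, so treat everything else as an assumption standing in for Speculators' actual API.

```python
# Illustrative sketch only: the imports, class names, and fields below are
# assumptions standing in for Speculators' real API, not confirmed names.
from speculators import EagleSpeculatorConfig, SpeculatorModel  # assumed imports

# Path A: build the speculator entirely from a (Pydantic) config definition,
# including parameters, modules, and verifier attachment/loading.
config = EagleSpeculatorConfig(                     # assumed config class
    verifier="meta-llama/Llama-3.1-8B-Instruct",    # verifier backbone to attach
)
model = SpeculatorModel.from_config(config)         # assumed builder entry point

# Path B: resolve directly from an HF id or a local checkpoint directory.
model = SpeculatorModel.from_pretrained("path/or/hf-id")  # placeholder id

# Native save: checkpoints/configs land in Speculators' format, no converter step.
model.save_pretrained("out/llama3-speculator")
```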
Training loop compatibility
Forward returns logits (and aux as needed) in the same tensor shapes the prototypes expect.
Backward works under prototype loss functions; grad flow validated (no detached paths).
Mixed-precision and distributed training work as expected and are fully supported by Speculators.
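For the "no detached paths" check, a generic PyTorch smoke test is enough and does not depend on Speculators' API; something along these lines can run against the toy Llama 3 variants:

```python
import torch

def assert_grad_flow(model: torch.nn.Module, loss: torch.Tensor) -> None:
    """Backprop a prototype-style loss and fail if any trainable parameter
    received no gradient, which would indicate a detached path."""
    model.zero_grad(set_to_none=True)
    loss.backward()
    missing = [
        name for name, p in model.named_parameters()
        if p.requires_grad and p.grad is None
    ]
    assert not missing, f"No gradient reached: {missing}"
```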
Checkpointing
Save checkpoints in Speculators' native format.
Resume paths support loading back from saved Speculators checkpoints.
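A simple round-trip check covers the resume path, assuming only the save_pretrained/from_pretrained pair named above (the loader is passed in so the sketch stays agnostic to the actual class):

```python
import torch

def assert_checkpoint_roundtrip(model, load_fn, path: str) -> None:
    """Save in Speculators' format, reload, and verify parameters survive intact."""
    model.save_pretrained(path)            # Speculators-native save (named in the requirements)
    reloaded = load_fn(path)               # e.g. the class-level from_pretrained
    reloaded_state = reloaded.state_dict()
    for name, tensor in model.state_dict().items():
        assert torch.equal(tensor.cpu(), reloaded_state[name].cpu()), (
            f"Parameter mismatch after resume: {name}"
        )
```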
Metrics/Parity
Log metrics/losses in the same form as the originals so results remain directly comparable, and add straightforward pathways for performance testing.
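For parity tracking, a small helper that compares per-step metrics from the original and integrated runs against the tolerance bands stated in the goals (accuracy within noise, throughput within ±5–10%) keeps the comparison mechanical:

```python
def within_relative_band(baseline, candidate, rel_tol=0.10):
    """True if every candidate value stays within rel_tol of the matching baseline value."""
    return all(
        abs(c - b) <= rel_tol * abs(b)
        for b, c in zip(baseline, candidate)
    )

# e.g. within_relative_band(baseline_step_losses, integrated_step_losses, rel_tol=0.05)
```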
Non-Functional
Performance: No material regressions on single-GPU baselines; batching parameters honored.
Reliability: Deterministic seeding, reproducible resumes, schema validation at boundaries.
Maintainability: Minimal shim in prototypes; most complexity lives in Speculators.
Compatibility: Python 3.10+; recent PyTorch and Transformers versions aligned with both the prototypes and the Speculators toolkit.
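The deterministic-seeding piece of the Reliability requirement above is standard PyTorch housekeeping; a helper like the one below (framework-level, independent of Speculators) covers it:

```python
import random

import numpy as np
import torch

def seed_everything(seed: int = 0) -> None:
    """Seed every RNG the prototypes touch so runs and resumes are reproducible."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Prefer deterministic kernels; warn instead of erroring where none exist.
    torch.use_deterministic_algorithms(True, warn_only=True)
```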
Design
High-Level Architecture
N/A
Integration Points
1. Model Architecture Replacement
Eagle3: Replace custom model definitions in training code with Speculators' standardized model configs
HASS: Similar model replacement, likely with different head configurations
2. Config System Integration
Current: Both use shell script parameters and hardcoded configs
Target: Replace with Speculators' Pydantic config system
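To make the shift concrete: flags the shell scripts currently pass as loose parameters become typed, validated fields. The class and field names below are examples for illustration, not Speculators' actual config schema.

```python
from pydantic import BaseModel, Field

class TrainRunConfig(BaseModel):
    """Example only: shell-script flags expressed as a validated Pydantic config."""
    verifier: str = "meta-llama/Llama-3.1-8B-Instruct"  # example field names
    lr: float = Field(3e-5, gt=0)
    batch_size: int = Field(8, ge=1)
    max_steps: int = Field(1000, ge=1)

# Validation errors surface at the boundary instead of deep inside the training loop.
cfg = TrainRunConfig(lr=1e-4, batch_size=4)
```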
3. Data Generation Pipeline
ge_data.allocation module for parallel data generation, with dataset-specific scripts (ultrachat.py, sharegpt.py, etc.)
4. Training Loop Integration
5. Vocabulary Mapping
Vocabulary mapping files (d2t.npy, t2d.npy); see the sketch after this list.
6. Checkpoint Management
Checkpoint conversion script (convert.sh)
7. Verifier Attachment
8. Multi-GPU/Distributed Training
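One option for the vocabulary-mapping files referenced in item 5: treat d2t.npy/t2d.npy as plain NumPy arrays and register them as buffers on the speculator module so they are serialized with the Speculators checkpoint rather than shipped as loose sidecar files. Their exact index/mask semantics should be taken from the Eagle3/HASS code; the sketch below only shows loading and registration.

```python
import numpy as np
import torch

class VocabMaps(torch.nn.Module):
    """Holds the draft/target vocabulary mapping arrays as persistent buffers."""

    def __init__(self, d2t_path: str = "d2t.npy", t2d_path: str = "t2d.npy") -> None:
        super().__init__()
        # Buffers travel with state_dict()/save_pretrained instead of loose files.
        self.register_buffer("d2t", torch.from_numpy(np.load(d2t_path)))
        self.register_buffer("t2d", torch.from_numpy(np.load(t2d_path)))
```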
Milestones
M0 — Spike & Contracts (Llama 3 tiny, toy data)
SpeculatorModule.forward/backward works with verifier attached.
M1 — Eagle3 Integration (Llama 3.1 8B)
M2 — HASS Integration (Llama 3.1 8B)
M3 — Hardening