Specscheduler evaluation support code #1541

Merged: 56 commits, Nov 15, 2024

@goliaro (Collaborator) commented Nov 15, 2024

Description of changes:

This PR does the following:

  • Llama 3 speculation support:
    • Add support for Llama 3.1 and 3.2
    • Benchmark the performance of Llama-3.1-70B with small models: Zhuominc/Llama-3-330M, meta-llama/Llama-3.2-1B-Instruct, meta-llama/Llama-3.2-3B-Instruct, meta-llama/Llama-3.1-8B-Instruct (tl;dr: meta-llama/Llama-3.2-1B-Instruct is the best)
    • Add support for serving SSMs with a tensor-parallelism (TP) degree > 1
  • Make evaluation easier/faster to run:
    • Add code to load all the weights in parallel, fixing the context issue discussed with the Legion team here
    • Record a memory usage breakdown when --log-instance-creation is passed. Add a script to debug insufficient-memory issues by device and task. See here.
  • Bug fixes
    • Remove the all-reduce deadlock by adding Legion barriers
    • Detect EOS tokens produced in the middle of speculation (not only at the end), and stop early when the EOS token is in the middle of the verified sequence. This prevents generation from running on until the maximum sequence length.
  • Benchmarking
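
The EOS bug fix in the list above can be illustrated with a minimal sketch. It assumes the verification step hands back a flat list of accepted token ids; the function name `truncate_at_eos`, the set-of-ids argument, and the token ids in the usage lines are hypothetical and are not taken from this PR's code.

```python
from typing import List, Set, Tuple


def truncate_at_eos(verified: List[int], eos_ids: Set[int]) -> Tuple[List[int], bool]:
    """Cut a verified token sequence at the first EOS token (inclusive).

    Returns the tokens to commit and a flag telling the caller to stop
    generating for this request. Hypothetical helper for illustration only.
    """
    for i, tok in enumerate(verified):
        if tok in eos_ids:
            # EOS found before the end: drop everything speculated after it.
            return verified[: i + 1], True
    # No EOS anywhere in the verified tokens: commit all and keep generating.
    return list(verified), False


# Usage with made-up token ids, where 2 plays the role of EOS.
tokens, done = truncate_at_eos([11, 42, 2, 97, 13], {2})
# tokens == [11, 42, 2], done is True
```

Checking only the last verified token would miss the EOS at index 2 here, so the request would keep generating until the maximum sequence length. Scanning the whole verified span avoids that. Passing a set of ids is a design choice in this sketch so that a model with more than one end-of-sequence id can be handled by the same check.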

Related Issues:

Linked Issues:

  • Issue #

Issues closed by this PR:

  • Closes #


@goliaro goliaro marked this pull request as ready for review November 15, 2024 17:09
@goliaro goliaro changed the base branch from specscheduler to specscheduler_eval November 15, 2024 17:16
@goliaro goliaro merged commit b798385 into specscheduler_eval Nov 15, 2024
29 of 39 checks passed
6 participants