
WIP: UL2 merge #23 (Open)

wants to merge 144 commits into base: multi-query-attention

Conversation

@RaymondLi0 (Collaborator) commented Feb 7, 2023

This PR is based on NVIDIA#268. In addition, it:

  • changes the variance of the masked-span length to scale with the mean (see the sketch below)
  • truncates the sequences after masking in the decoder-only case
  • does some slight refactoring

TODO: currently seeing around a 30% reduction in throughput with UL2.
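
For reference on the first bullet, here is a minimal sketch of sampling a masked-span length from a normal distribution whose standard deviation scales with the mean. The helper name, the `scale` factor, and the clipping to a minimum length of 1 are illustrative assumptions, not the actual PR code:

```python
import numpy as np

def sample_span_length(mean_length, rng, scale=1.0):
    # Standard deviation scales with the requested mean span length
    # (`scale` is an assumed proportionality factor).
    std = scale * mean_length
    length = rng.normal(loc=mean_length, scale=std)
    # Span lengths must be positive integers, so round and clip at 1.
    return max(1, int(round(length)))

rng = np.random.default_rng(1234)
print([sample_span_length(8, rng) for _ in range(5)])
```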

jaredcasper and others added 30 commits July 21, 2022 15:35
Remove old merge tool.

See merge request ADLR/megatron-lm!433
added a flag to be able to switch between pytorch and ring exchange p2p

See merge request ADLR/megatron-lm!434
support for all mask in fused kernel + avoiding inplace operation in bwd pass

See merge request ADLR/megatron-lm!435
fix a bug for size mismatch

See merge request ADLR/megatron-lm!438
Timing levels

See merge request ADLR/megatron-lm!436
fixed grad scaler warning so it only prints for fp16

See merge request ADLR/megatron-lm!441
fixed grad scaler warning for bf16

See merge request ADLR/megatron-lm!442
Memory safety checks were incorrect for the tokens_to_generate=0 case

See merge request ADLR/megatron-lm!447
Update state_dict arguments for recent PyTorch versions.

See merge request ADLR/megatron-lm!432
The LICENSE file says everything is 3-clause BSD, which is what we want,
but at some point the Apache license was added to the top of some files
and that proliferated. This commit removes the Apache license from any
files that we own the copyright to.

Also updates the copyright year and removes the unnecessary coding=utf-8
line.
Clean up licensing.

See merge request ADLR/megatron-lm!451
Also merged in some changes from apex
janEbert and others added 22 commits January 23, 2023 17:58
Since the normal distribution is unbounded, we cannot have `max_ngrams`
set to a bounded value.
Filtered means not `cls_id` or `sep_id` tokens. This slightly improves the
calculated statistics for long sequences and greatly improves them for very
short sequences.
Via an extra "private" argument.
The GPT tokenizer does not handle the difference between UL2 tokens and
other special tokens well. This should be fine since, at the moment, nothing
assumes that UL2 tokens are distinct from other special tokens (although
other tokenizers do implement them that way). In general,
`additional_special_token_ids` is new for the GPT tokenizer, so there is
no backward-compatibility trouble.
Not always strictly necessary; this is only important for the
decoder-only case. However, we don't bother checking for this since it's
also queried in the `UL2Dataset`.
Usually we do not iterate through all indices, so we can save quite a bit of
time if `max_ngrams` is large.
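
To make the tokenizer-related commits above more concrete, here is a hypothetical sketch of exposing `additional_special_token_ids` on a GPT-style tokenizer; the wrapper class and the UL2 sentinel strings are assumptions for illustration, not Megatron-LM's actual tokenizer code:

```python
class GPT2TokenizerWithUL2:
    """Hypothetical wrapper: registers UL2 sentinel tokens as ordinary added
    tokens and exposes their ids via `additional_special_token_ids`. The GPT
    tokenizer does not otherwise distinguish UL2 tokens from other special
    tokens, which is fine as long as nothing relies on that distinction."""

    def __init__(self, vocab, ul2_tokens=("[R]", "[S]", "[X]")):
        self._vocab = dict(vocab)  # token -> id map of the base vocabulary
        self._ul2_ids = []
        for token in ul2_tokens:
            # Append each sentinel to the end of the vocabulary if missing.
            token_id = self._vocab.setdefault(token, len(self._vocab))
            self._ul2_ids.append(token_id)

    @property
    def additional_special_token_ids(self):
        return list(self._ul2_ids)


tok = GPT2TokenizerWithUL2(vocab={"hello": 0, "world": 1})
print(tok.additional_special_token_ids)  # [2, 3, 4]
```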
@RaymondLi0 changed the base branch from main to multi-query-attention on February 7, 2023 at 21:23