Updated Megatron version #85

Draft
wants to merge 427 commits into
base: nvidia_main
from
This pull request is big! We’re only showing the most recent 250 commits.

Commits on Jan 15, 2024

  1. 44a8f18
  2. c7d0fb1

Commits on Jan 16, 2024

  1. 7bcb2e1
  2. 97d9a50
  3. Merge branch 'sliding_window_attention/akoumparouli' into 'main'

    Sliding window attention
    
    See merge request ADLR/megatron-lm!1025
    jaredcasper committed Jan 16, 2024
    6e7ded3

Commits on Jan 17, 2024

  1. 46ca3db
  2. Merge branch 'distopt_with_moe' into 'main'

    Refactor DistributedOptimizer for MoE model support
    
    See merge request ADLR/megatron-lm!986
    deepakn94 committed Jan 17, 2024
    d657a3e
  3. 6083743

Commits on Jan 18, 2024

  1. 17545b3
  2. 6c0e7a9
  3. bf9c0a1
  4. 85c4034
  5. Better code formatting

    PytLab committed Jan 18, 2024
    54de98d
  6. Fixed merge conflicts

    Signed-off-by: Selvaraj Anandaraj <[email protected]>
    Selvaraj Anandaraj committed Jan 18, 2024
    909bda3
  7. add is_first_microbatch for TE

    Signed-off-by: jiemingz <[email protected]>
    jiemingz committed Jan 18, 2024
    3c44fb9
  8. add arg name

    Signed-off-by: jiemingz <[email protected]>
    jiemingz committed Jan 18, 2024
    27879a7
  9. add docstring and move set_is_first_microbatch

    Signed-off-by: jiemingz <[email protected]>
    jiemingz committed Jan 18, 2024
    7dc2ee8
  10. Fixed formatting

    Signed-off-by: Selvaraj Anandaraj <[email protected]>
    Selvaraj Anandaraj committed Jan 18, 2024
    3e19c76

Commits on Jan 19, 2024

  1. Merge branch 'jiemingz/is_first_microbatch' into 'main'

    add is_first_microbatch for TE
    
    See merge request ADLR/megatron-lm!1033
    jaredcasper committed Jan 19, 2024
    bed60a8
  2. fix a bug in branch and format

    Hongbin Liu committed Jan 19, 2024
    cf1a1c6
  3. Merge branch 'main' into fuse_rope_swiglu_main

    Hongbin Liu committed Jan 19, 2024
    036605d
  4. fix tests

    Hongbin Liu committed Jan 19, 2024
    568da5a
  5. 140642c
  6. de9428a
  7. Merge branch 'atomic_gemm_switch' into 'main'

    Need a switch at NeMo level to enable Atomic GEMM
    
    See merge request ADLR/megatron-lm!1017
    jaredcasper committed Jan 19, 2024
    599f558
  8. Merge branch 'mblaz/dist-ckpt-layernorms' into 'main'

    Add distributed checkpoint support to non-TE based models
    
    See merge request ADLR/megatron-lm!1005
    jaredcasper committed Jan 19, 2024
    ca8a00a
  9. Docstring removed for context config

    Signed-off-by: Selvaraj Anandaraj <[email protected]>
    Selvaraj Anandaraj committed Jan 19, 2024
    79269fa
  10. Decoupled cpu offloading and SplitAlongDim imports

    Signed-off-by: Selvaraj Anandaraj <[email protected]>
    Selvaraj Anandaraj committed Jan 19, 2024
    4b05862
  11. Merge branch 'cpu_offload' into 'main'

    Support for activation offloading to CPU in M-LM
    
    See merge request ADLR/megatron-lm!1016
    jaredcasper committed Jan 19, 2024
    a5165ac
  12. Merge branch 'fuse_rope_swiglu_main' into 'main'

    add rope and swiglu fusion
    
    See merge request ADLR/megatron-lm!946
    jaredcasper committed Jan 19, 2024
    640af6b
  13. 473225f
  14. Merge branch 'jaeminc/mcore-jit' into 'main'

    Add jit_fuser to switch between torch.jit.script and torch.compile
    
    See merge request ADLR/megatron-lm!1036
    jaredcasper committed Jan 19, 2024
    de4028a
  15. misc

    jlamypoirier committed Jan 19, 2024
    716204e

Commits on Jan 20, 2024

  1. Merge branch 'black_on_optimizer' into 'main'

    Run black on megatron/optimizer
    
    See merge request ADLR/megatron-lm!1050
    jaredcasper committed Jan 20, 2024
    8c2cd99
  2. c795038
  3. 2016969
  4. Code clean.

    yanring committed Jan 20, 2024
    9b5cd88
  5. dc436f2
  6. a98c5ba
  7. Normalize the token scores.

    yanring committed Jan 20, 2024
    0f80408
  8. Code clean.

    yanring committed Jan 20, 2024
    de37485
  9. Fix moe aux loss.

    yanring committed Jan 20, 2024
    8efc8de
  10. Fix UTs; Fix MoE Loss.

    yanring committed Jan 20, 2024
    15e75b0
  11. Add Z loss UT.

    yanring committed Jan 20, 2024
    dd0411b
  12. Add documentation.

    yanring committed Jan 20, 2024
    bfb7bbd
  13. Add typing check.

    yanring committed Jan 20, 2024
    b506152
  14. Update CI.

    yanring committed Jan 20, 2024
    411bc27
  15. Fix grouped gemm UT.

    yanring committed Jan 20, 2024
    1ab146c
  16. 6d702cb
  17. Fix Z Loss.

    yanring committed Jan 20, 2024
    c656553
  18. 8b41c9f
  19. Update CI golden values.

    yanring committed Jan 20, 2024
    196b911
  20. Swap topk and softmax.

    yanring committed Jan 20, 2024
    3ff8c7f
  21. Update CI after rebasing.

    yanring committed Jan 20, 2024
    1ce5712
  22. 09accc8
  23. 5d0dbd3
  24. Fix review comments.

    yanring committed Jan 20, 2024
    a003610
  25. Renaming.

    yanring committed Jan 20, 2024
    e2d3e4f
  26. Renaming.

    yanring committed Jan 20, 2024
    b616497
  27. Move dispatcher and experts.

    yanring committed Jan 20, 2024
    2038324
  28. Update CI golden value.

    yanring committed Jan 20, 2024
    eb47d69
  29. 3da7d1d

Commits on Jan 21, 2024

  1. Code clean.

    yanring committed Jan 21, 2024
    2afee76

Commits on Jan 22, 2024

  1. aed469f
  2. Add input jitter.

    yanring committed Jan 22, 2024
    f1b6c96
  3. Moved offloading configs to Model parallel config from TF config

    Signed-off-by: Selvaraj Anandaraj <[email protected]>
    Selvaraj Anandaraj committed Jan 22, 2024
    f24abd1
  4. Fixed formatting and imports

    Signed-off-by: Selvaraj Anandaraj <[email protected]>
    Selvaraj Anandaraj committed Jan 22, 2024
    Configuration menu
    Copy the full SHA
    288134e View commit details
    Browse the repository at this point in the history
  5. Update retro doc

    boxin-wbx committed Jan 22, 2024
    1872385
  6. Log progress (iterations, floating-point operations, tokens) to progress.txt file
    
    - Also log job ID and number of GPUs in progress file.
    - Log job throughput and cumulative throughput separately.
    deepakn94 committed Jan 22, 2024
    8fb44df
  7. 781d86a
  8. Merge branch 'progress' into 'main'

    Log progress of a sequence of jobs using the same checkpoint directory to a progress.txt file in checkpoint directory
    
    See merge request ADLR/megatron-lm!1060
    deepakn94 committed Jan 22, 2024
    be8011a

Commits on Jan 23, 2024

  1. b03eae3
  2. d2e5f78
  3. Remove one_logger config file

    PytLab committed Jan 23, 2024
    62a5a3e
  4. 49727de
  5. 0cb693a
  6. ae1cd89
  7. ebb1484
  8. Add distributed optimizer tests with --overlap-param-gather (and corresponding gold values)

    deepakn94 committed Jan 23, 2024
    7298d15
  9. Fix bug causing issues with fp16 and --overlap-param-gather by disabling overlapped param gather for validation

    deepakn94 committed Jan 23, 2024
    33111c9

Commits on Jan 24, 2024

  1. f634cca
  2. Merge branch 'fp16_overlap_param_gather' into 'main'

    Add distributed optimizer tests with --overlap-param-gather
    
    See merge request ADLR/megatron-lm!1058
    deepakn94 committed Jan 24, 2024
    75120db
  3. 9e773fa
  4. Packed Sequence

    cuichenx authored and jaredcasper committed Jan 24, 2024
    95b2146
  5. Merge branch 'chcui/packed_seq_from_fuse_rope_swiglu_main' into 'main'

    Packed Sequence
    
    See merge request ADLR/megatron-lm!984
    jaredcasper committed Jan 24, 2024
    773ad0f
  6. Merge branch 'documentation' into 'main'

    Add basic documentation for packages
    
    See merge request ADLR/megatron-lm!999
    deepakn94 committed Jan 24, 2024
    2c3468a
  7. 51e936c
  8. 83c0423
  9. 00358e5
  10. Handle MoE with GeLU

    mikolajblaz committed Jan 24, 2024
    431ce99
  11. e2fd6ca
  12. Merge branch 'boxin/retro-doc-fix' into 'main'

    Update retro doc
    
    See merge request ADLR/megatron-lm!1061
    jaredcasper committed Jan 24, 2024
    bd6f4ea
  13. 1e0e58e
  14. 472d54e
  15. Fix

    jlamypoirier committed Jan 24, 2024
    98fbb42
  16. Merge branch 'fused-warning-fix' into 'main'

    Only print warning about fused rotary position embedding once.
    
    See merge request ADLR/megatron-lm!1067
    jaredcasper committed Jan 24, 2024
    37e7dac

Commits on Jan 25, 2024

  1. c4678ff
  2. Merge branch 'offload_patch' into 'main'

    Moved offloading configs to Model parallel config from TF config
    
    See merge request ADLR/megatron-lm!1059
    jaredcasper committed Jan 25, 2024
    817b431
  3. de859b3
  4. Add app_tag_count tracking

    PytLab committed Jan 25, 2024
    7027a1d
  5. Merge branch 'feature/add-e2e-metrics-logging' of ssh://gitlab-master.nvidia.com:12051/zshao/megatron-lm into feature/add-e2e-metrics-logging

    Zhengjiang Shao committed Jan 25, 2024
    a72388d
  6. Resolve merging conflict

    Zhengjiang Shao committed Jan 25, 2024
    8344203
  7. 7af41ab
  8. Remove app_tag global var

    PytLab committed Jan 25, 2024
    e713cd7
  9. 9603e1f
  10. Add doc

    mikolajblaz committed Jan 25, 2024
    fdafcc5
  11. Add no support info

    mikolajblaz committed Jan 25, 2024
    c40c047
  12. Adding bert local spec test

    Shanmugam Ramasamy committed Jan 25, 2024
    e25970f
  13. Adding bert local spec test

    Shanmugam Ramasamy committed Jan 25, 2024
    2b0decc
  14. Merge branch 'zijiey/moe_api_clean' into 'main'

    Refactoring of the MoE layer and communications; Integration of the Top-K Router with MoE Losses.
    
    See merge request ADLR/megatron-lm!1018
    jaredcasper committed Jan 25, 2024
    559e82c
  15. Adding bert local spec test

    Shanmugam Ramasamy committed Jan 25, 2024
    e6ef9ea

Commits on Jan 26, 2024

  1. Adding bert local spec test

    Shanmugam Ramasamy committed Jan 26, 2024
    c2d44ff
  2. Adding bert local spec test

    Shanmugam Ramasamy committed Jan 26, 2024
    fc316ff
  3. 8578800
  4. Adding bert local spec test

    Shanmugam Ramasamy committed Jan 26, 2024
    6e599dc
  5. add unit tests

    Signed-off-by: Chen Cui <[email protected]>
    cuichenx committed Jan 26, 2024
    1e95136
  6. 5c10cb4
  7. Add num_floating_point_operations_so_far arg to save_checkpoint call in checkpoint/util.py

    mathemakitten authored and jaredcasper committed Jan 26, 2024
    4a08560
  8. Merge branch 'hn-save-checkpoint' into 'main'

    Add `num_floating_point_operations_so_far` arg to save_checkpoint call in checkpoint/util.py
    
    See merge request ADLR/megatron-lm!1073
    jaredcasper committed Jan 26, 2024
    3709708
  9. Fixing the nightly ci for #1018.

    yanring authored and jaredcasper committed Jan 26, 2024
    88ddc36
  10. Merge branch 'zijie/fix_1018_nightly_tests' into 'main'

    Fixing the nightly ci for #1018.
    
    See merge request ADLR/megatron-lm!1075
    jaredcasper committed Jan 26, 2024
    f5c5388
  11. 5cce2b5
  12. 04d7b19
  13. formatting

    Signed-off-by: Chen Cui <[email protected]>
    cuichenx committed Jan 26, 2024
    1fc103f
  14. typo

    Signed-off-by: Chen Cui <[email protected]>
    cuichenx committed Jan 26, 2024
    16e6e9b
  15. 3df96f1
  16. Merge branch 'akoumparouli/expert_model_parallel_world_size_setter' into 'main'
    
    Add _CPU_EXPERT_MODEL_PARALLEL_WORLD_SIZE flag in parallel-state to allow...
    
    See merge request ADLR/megatron-lm!1057
    ericharper committed Jan 26, 2024
    5cfe7b8
  17. Fix formatting

    shanmugamr committed Jan 26, 2024
    567fab7
  18. Merge branch 'layernorm-apex-update' into 'main'

    Use new memory_efficient argument to fused layernorm functions when available in apex.
    
    See merge request ADLR/megatron-lm!1068
    jaredcasper committed Jan 26, 2024
    f2a49ba
  19. Merge branch 'chcui/fix_rope_fusion_config' into 'main'

    Update `apply_rope_fusion` in config after checking availability
    
    See merge request ADLR/megatron-lm!1074
    jaredcasper committed Jan 26, 2024
    195171f
  20. Support for raw and mock datasets

    John Kamalu authored and jaredcasper committed Jan 26, 2024
    8d8241a
  21. Merge branch 'raw-dataset' into 'main'

    Support for raw and mock datasets
    
    See merge request ADLR/megatron-lm!1031
    jaredcasper committed Jan 26, 2024
    803a018

Commits on Jan 29, 2024

  1. 4223649
  2. Adding bert local spec test

    Shanmugam Ramasamy committed Jan 29, 2024
    eaaf92f

Commits on Jan 30, 2024

  1. a4b5a9e
  2. Merge branch 'chcui/fix_rope_fusion_config' into 'main'

    Fix `qkv_format` in TEDotProductAttention
    
    See merge request ADLR/megatron-lm!1078
    ericharper committed Jan 30, 2024
    83bb191
  3. Add support for masked WordPiece datasets BERT and T5

    John Kamalu authored and ericharper committed Jan 30, 2024
    25a9946
  4. Merge branch 'masked-datasets' into 'main'

    Add support for masked WordPiece datasets BERT and T5
    
    See merge request ADLR/megatron-lm!1041
    ericharper committed Jan 30, 2024
    8312a3e
  5. e2ff3e6
  6. Merge branch 'mblaz/moe-0.5-dist-ckpt' into 'main'

    Distributed checkpointing implementation for MoE
    
    See merge request ADLR/megatron-lm!1055
    jaredcasper committed Jan 30, 2024
    05342e7
  7. Merge branch 'main' into 'local_spec_bert'

    # Conflicts:
    #   pretrain_bert.py
    Shanmugam Ramasamy committed Jan 30, 2024
    329baac
  8. eef48ef
  9. Merge branch 'moe_gmm_corner_case_fixw' into 'main'

    Fix the case when no token is allocated to local expert(s) with EP>1.
    
    See merge request ADLR/megatron-lm!1063
    ericharper committed Jan 30, 2024
    9f92da0
  10. rename output layer

    maxmatical committed Jan 30, 2024
    0bfeeae
  11. a45805a
  12. Merge branch 'jlasek/generate_causal_mask_in_mcore' into 'main'

    Generate causal mask for local layer spec
    
    See merge request ADLR/megatron-lm!1047
    ericharper committed Jan 30, 2024
    d972605
  13. Update minor version

    ericharper authored and jaredcasper committed Jan 30, 2024
    918d415
  14. Merge branch 'update_minor_version' into 'main'

    Update minor version
    
    See merge request ADLR/megatron-lm!1086
    jaredcasper committed Jan 30, 2024
    34c874e
  15. Merge pull request #3 from ServiceNow/max/rename-output-layer

    rename output layer
    maxmatical committed Jan 30, 2024
    bb53cf9
  16. use TE checkpointing when FP8

    Signed-off-by: Jimmy Zhang
    jiemingz committed Jan 30, 2024
    eeb1b21

Commits on Jan 31, 2024

  1. 530239b
  2. Merge branch 'local_spec_bert' into 'main'

    Adding bert local spec test
    
    See merge request ADLR/megatron-lm!1072
    jaredcasper committed Jan 31, 2024
    4bd4e74
  3. Remove unused hashlib

    PytLab committed Jan 31, 2024
    f8b277a
  4. Move grad-scale to loss.device

    Signed-off-by: Alexandros Koumparoulis
    akoumpa committed Jan 31, 2024
    0fcbff0
  5. Merge branch 'feature/add-e2e-metrics-logging' into 'main'

    Feature/Add E2E metrics logging
    
    See merge request ADLR/megatron-lm!1049
    jaredcasper committed Jan 31, 2024
    ea52266

Commits on Feb 1, 2024

  1. code clean for moe.

    fanshiqing committed Feb 1, 2024
    c3d057f
  2. update readme.

    fanshiqing committed Feb 1, 2024
    a1ba50f
  3. 2ee86c5
  4. add license.

    fanshiqing committed Feb 1, 2024
    2e1f869
  5. update readme.

    fanshiqing committed Feb 1, 2024
    e5102e7
  6. JET Migration Updates

    maanug-nv authored and jaredcasper committed Feb 1, 2024
    6aad211
  7. Merge branch 'maanug/jet-recipes' into 'main'

    JET Migration Updates
    
    See merge request ADLR/megatron-lm!1066
    jaredcasper committed Feb 1, 2024
    3d201d7
  8. Fixing bugs in inference and adding mcore support

    shanmugamr committed Feb 1, 2024
    50f8384
  9. Fixing bugs in inference and adding mcore support

    shanmugamr committed Feb 1, 2024
    7329f73
  10. Fixing bugs in inference and adding mcore support

    shanmugamr committed Feb 1, 2024
    376337d
  11. Merge branch 'fp8_recompute' into 'main'

    use TE checkpointing when FP8
    
    See merge request ADLR/megatron-lm!1080
    jaredcasper committed Feb 1, 2024
    cb995d5
  12. Fixing bugs in inference and adding mcore support

    shanmugamr committed Feb 1, 2024
    d91c5a6
  13. Merge branch 'akoumparouli/loss_scale_fix' into 'main'

    Move grad-scale to loss.device
    
    See merge request ADLR/megatron-lm!1083
    jaredcasper committed Feb 1, 2024
    7628c3a

Commits on Feb 2, 2024

  1. 075d5b0
  2. Move Megatron timer to core

    Aishwarya Bhandare authored and ericharper committed Feb 2, 2024
    680b67c
  3. Merge branch 'abhandare_timer' into 'main'

    Move Megatron timer to core
    
    See merge request ADLR/megatron-lm!995
    ericharper committed Feb 2, 2024
    8b691b9
  4. Merge branch 'inference_fix' into 'main'

    Fixing bugs in inference and adding mcore support
    
    See merge request ADLR/megatron-lm!1095
    jaredcasper committed Feb 2, 2024
    b87f069
  5. Merge branch 'code_clean' into 'main'

    code clean for moe.
    
    See merge request ADLR/megatron-lm!1094
    jaredcasper committed Feb 2, 2024
    259f06e

Commits on Feb 3, 2024

  1. aa96ab7
  2. Merge branch 'maanug/jet-hotfix' into 'main'

    JET fix: Migrate tests and run functional results always not on success
    
    See merge request ADLR/megatron-lm!1098
    maanug-nv committed Feb 3, 2024
    3e1a635

Commits on Feb 6, 2024

  1. MoE argument sanity checks

    akoumpa authored and ericharper committed Feb 6, 2024
    f89f388
  2. Merge branch 'akoumparouli/arg_sanity_check' into 'main'

    MoE argument sanity checks
    
    See merge request ADLR/megatron-lm!1084
    ericharper committed Feb 6, 2024
    487ba73
  3. add add_qkv_bias config

    Xue Huang authored and jaredcasper committed Feb 6, 2024
    f6995e5
  4. Merge branch 'xueh/add_qkv_bias' into 'main'

    add add_qkv_bias config
    
    See merge request ADLR/megatron-lm!926
    jaredcasper committed Feb 6, 2024
    02d284d
  5. Minor fixes for JET CI

    maanug-nv authored and jaredcasper committed Feb 6, 2024
    c8f50b4
  6. Merge branch 'maanug/jet-minor-fixes' into 'main'

    Minor fixes for JET CI
    
    See merge request ADLR/megatron-lm!1106
    jaredcasper committed Feb 6, 2024
    7c1dd65
  7. Tokenizer fix

    jlamypoirier committed Feb 6, 2024
    9760e11
  8. 94ce57b
  9. Check if config has num_moe_experts

    akoumpa authored and jaredcasper committed Feb 6, 2024
    bb235cc
  10. Merge branch 'akoumparouli/moe_config_check' into 'main'

    Check if config has num_moe_experts
    
    See merge request ADLR/megatron-lm!1107
    jaredcasper committed Feb 6, 2024
    b02e62e
  11. 548e57a
  12. Merge branch 'mblaz/dist-ckpt-docs' into 'main'

    Add dist ckpt package docs for Sphinx documentation
    
    See merge request ADLR/megatron-lm!1010
    jaredcasper committed Feb 6, 2024
    240a8ef
  13. Fix oob perf

    wdykas authored and jaredcasper committed Feb 6, 2024
    960c06b
  14. Merge branch 'fix-oob-perf' into 'main'

    Fix oob perf
    
    See merge request ADLR/megatron-lm!1039
    jaredcasper committed Feb 6, 2024
    1390944
  15. Add interleaved rotary embedding in MCore

    Xue Huang authored and jaredcasper committed Feb 6, 2024
    260c4f2
  16. Merge branch 'xueh/rotary_interleaved' into 'main'

    Add interleaved rotary embedding in MCore
    
    See merge request ADLR/megatron-lm!895
    jaredcasper committed Feb 6, 2024
    6d6f9af
  17. 6fdbfa7
  18. Merge branch 'geshen/fix_activation_mutation' into 'main'

    fix activation checkpointing mutation
    
    See merge request ADLR/megatron-lm!977
    ericharper committed Feb 6, 2024
    169bfa4
  19. fix

    jlamypoirier committed Feb 6, 2024
    b22634d

Commits on Feb 7, 2024

  1. Better wandb

    jlamypoirier committed Feb 7, 2024
    2165919
  2. misc

    jlamypoirier committed Feb 7, 2024
    c478f48
  3. b6ce193
  4. Merge branch 'zijiey/fix_top2_dispatcher' into 'main'

    [MoE] fix the convergence issue when EP>1 and K>1
    
    See merge request ADLR/megatron-lm!1103
    ericharper committed Feb 7, 2024
    98da379
  5. Use view() to set param_buffer from grad_buffer

    Move away from storage(); this helps reduce peak storage
    wangxicoding authored and deepakn94 committed Feb 7, 2024
    84c7af2
  6. 2fb398c
  7. Merge branch 'save_checkpoint_fix' into 'main'

    Add missing num_floating_point_operations_so_far argument to save_checkpoint_and_time call
    
    See merge request ADLR/megatron-lm!1115
    jaredcasper committed Feb 7, 2024
    0f0279a
  8. Merge branch 'fix_param_buffer_peak_memory' into 'main'

    Fix param buffer peak memory
    
    See merge request ADLR/megatron-lm!1030
    jaredcasper committed Feb 7, 2024
    0052bf0

Commits on Feb 9, 2024

  1. 6e25554

Commits on Feb 10, 2024

  1. Fixed atomic gemm defaults/fixed the offloading check

    Signed-off-by: Selvaraj Anandaraj
    Selvaraj Anandaraj committed Feb 10, 2024
    a8182ee

Commits on Feb 11, 2024

  1. daf0006

Commits on Feb 12, 2024

  1. Ran black(19.10b0) on megatron/core

    Ankur Joshi committed Feb 12, 2024
    a73b113
  2. 2482a4a

Commits on Feb 13, 2024

  1. 5566742
  2. 9e17a15
  3. Merge branch 'lmcafee/te-noinit-fix' into 'main'

    Condition TE init_method on config.perform_initialization.
    
    See merge request ADLR/megatron-lm!1104
    jaredcasper committed Feb 13, 2024
    55f3502
  4. Move optimizers to MCore

    deepakn94 authored and ericharper committed Feb 13, 2024
    32f9155
  5. Merge branch 'dist_optimizer_to_mcore' into 'main'

    Move optimizers to MCore
    
    See merge request ADLR/megatron-lm!1071
    ericharper committed Feb 13, 2024
    eedfe53
  6. Merge branch 'tied_embeddings' into 'main'

    Put embedding layers in separate buckets to make sure embedding tying works
    
    See merge request ADLR/megatron-lm!1079
    deepakn94 committed Feb 13, 2024
    db2040f

Commits on Feb 14, 2024

  1. Merge branch 'Add_back_Timer_Code_changes_for_E2E' into 'main'

    Adding back the changes needed in timers.py for E2E work
    
    See merge request ADLR/megatron-lm!1121
    jaredcasper committed Feb 14, 2024
    6f3d5a4
  2. 5b4bbd5
  3. Merge branch 'te_transformer_layer_wrapper_in_mcore' into 'main'

    add support wrapper for TE TransformerLayer in mcore
    
    See merge request ADLR/megatron-lm!1113
    jaredcasper committed Feb 14, 2024
    5f9c870

Commits on Feb 15, 2024

  1. Fixing examples

    Shanmugam Ramasamy committed Feb 15, 2024
    1b6ae27
  2. Merge branch 'bugfixexample' into 'main'

    Fixing examples
    
    See merge request ADLR/megatron-lm!1135
    jaredcasper committed Feb 15, 2024
    4ec7835

Commits on Feb 21, 2024

  1. 72a255a
  2. Merge branch 'edp_with_zero1' into 'main'

    [MoE] Expert data parallel w/ ZeRO-1 support
    
    See merge request ADLR/megatron-lm!1040
    ericharper committed Feb 21, 2024
    90568ae

Commits on Feb 22, 2024

  1. Merge branch 'config_default' into 'main'

    Fixed atomic gemm defaults/fixed the offloading check
    
    See merge request ADLR/megatron-lm!1125
    jaredcasper committed Feb 22, 2024
    528d7cf

Commits on Feb 23, 2024

  1. a67ffda

Commits on Feb 24, 2024

  1. Mcore CLIP ViT model

    trintamaki authored and ericharper committed Feb 24, 2024
    5afa5da
  2. Merge branch 'trintamaki/clip-vit-model' into 'main'

    Mcore CLIP ViT model
    
    See merge request ADLR/megatron-lm!1127
    ericharper committed Feb 24, 2024
    6d14c7e
  3. Merge branch 'dist_optimizer_bugfix' into 'main'

    Bugfix: Make sure data_end_index is padded when creating new buckets
    
    See merge request ADLR/megatron-lm!1140
    deepakn94 committed Feb 24, 2024
    ad53b1e

Commits on Feb 26, 2024

  1. 9530e19

Commits on Feb 27, 2024

  1. 5f1f813
  2. Merge branch 'mblaz/unify-resume-and-correctness-func-tests' into 'main'

    Unify resume and correctness functional tests
    
    See merge request ADLR/megatron-lm!1070
    maanug-nv committed Feb 27, 2024
    70e469d
  3. Mcore mock multimodal dataset

    trintamaki authored and jaredcasper committed Feb 27, 2024
    1fcdc95
  4. Merge branch 'trintamaki/dummy-multimodal-dataset' into 'main'

    Mcore mock multimodal dataset
    
    See merge request ADLR/megatron-lm!1147
    jaredcasper committed Feb 27, 2024
    1dada7e

Commits on Feb 28, 2024

  1. Fix NaN checking in grads: should be performed before data-parallel communication
    
    Compute norm once per batch (instead of once per microbatch) and once per bucket (instead of once per param)
    deepakn94 committed Feb 28, 2024
    d668077
  2. Merge branch 'check_nan_in_grad' into 'main'

    Fix NaN checking in grads: should be performed before data-parallel all-reduce
    
    See merge request ADLR/megatron-lm!989
    deepakn94 committed Feb 28, 2024
    53a350e

Commits on Feb 29, 2024

  1. 9677b3b
  2. Move to Draco OCI

    maanug-nv committed Feb 29, 2024
    3dafc0e
  3. Merge branch 'maanug/jet-oci' into 'main'

    Move to Draco OCI
    
    See merge request ADLR/megatron-lm!1137
    maanug-nv committed Feb 29, 2024
    17c487a

Commits on Mar 1, 2024

  1. Merge branch 'theoretical_memory_fix' into 'main'

    Print number of transformer and embedding parameters separately
    
    See merge request ADLR/megatron-lm!1159
    jaredcasper committed Mar 1, 2024
    3b0fcd1
  2. Mcore LLaVA model

    trintamaki authored and jaredcasper committed Mar 1, 2024
    7bc3c74
  3. Merge branch 'trintamaki/llava-model-mr' into 'main'

    Mcore LLaVA model
    
    See merge request ADLR/megatron-lm!1151
    jaredcasper committed Mar 1, 2024
    d1acce3
  4. 80e180d
  5. Merge branch 'chenhany/ammo_ptq_example' into 'main'

    [OMNIML-614] AMMO ptq + TensorRT-LLM export examples for megatron-lm
    
    See merge request ADLR/megatron-lm!1013
    jaredcasper committed Mar 1, 2024
    36e9b6b

Commits on Mar 3, 2024

  1. Merge branch 'variable_ffn_size' into 'main'

    Make throughput and memory footprint formulae compatible with arbitrary ffn_hidden_size
    
    See merge request ADLR/megatron-lm!1169
    deepakn94 committed Mar 3, 2024
    0c1e53d

Commits on Mar 5, 2024

  1. Experimental Yaml configs

    wdykas authored and jaredcasper committed Mar 5, 2024
    47cb630
  2. Merge branch 'yaml' into 'main'

    Experimental Yaml configs
    
    See merge request ADLR/megatron-lm!1134
    jaredcasper committed Mar 5, 2024
    8957468

Commits on Mar 8, 2024

  1. MOE support

    jlamypoirier committed Mar 8, 2024
    63d9d3e
  2. stuff

    jlamypoirier committed Mar 8, 2024
    40a134a
  3. 1a96a99

Commits on Mar 11, 2024

  1. fdd668c
  2. Fix arg

    jlamypoirier committed Mar 11, 2024
    4238a80

Commits on Mar 12, 2024

  1. fixes

    jlamypoirier committed Mar 12, 2024
    fe38434

Commits on May 29, 2024

  1. fix

    jlamypoirier committed May 29, 2024
    3c6652e