model : add PLaMo-2 model #14560
Merged (+1,048 −44)
Changes from all commits (80 commits)
271104c (compilade) wip: llama : separate recurrent states from the KV cache
8db1e4d (compilade) llama : use std::find for seq_nodes in llama_rs_cache
0028010 (compilade) llama : state checkpoints for recurrent models
0c8b3b2 (compilade) llama : correctly handle more edge cases for the rs cache
d66849f (compilade) Merge branch 'master' into compilade/refactor-kv-cache
a09db95 (compilade) llama : rename many llama_kv_cache_* functions
c460ff1 (compilade) Merge branch 'master' into compilade/refactor-kv-cache
b6fafd1 (compilade) llama : remove useless return value for some llama_cache_* functions
b7ec12e (compilade) Merge branch 'master' into compilade/refactor-kv-cache
3b57b55 (compilade) Merge branch 'master' into compilade/refactor-kv-cache
7e13f19 (compilade) llama : rethink recurrent state cell counts
cbc743e (compilade) llama : support Jamba
0fd13e9 (compilade) Merge branch 'master' into compilade/refactor-kv-cache
61a88a1 (compilade) llama : fix BERT inference without KV cache
ea2e63e (compilade) convert-hf : check for unprocessed Jamba experts
fc59407 (compilade) convert-hf : support Mini-Jamba conversion
181dadf (compilade) llama : fix Jamba quantization sanity checks
3a414b0 (compilade) llama : sequence-length-aware batch splitting
4e4c41e (compilade) Merge branch 'master' into compilade/refactor-kv-cache
3587a94 (compilade) llama : use equal-sequence-length sub-batches for recurrent models
5d3c7b9 (compilade) Merge branch 'master' into compilade/refactor-kv-cache
72eea49 (compilade) llama : fix batch split output count for embeddings
18d1c14 (compilade) llama : minimize swaps when reordering logits
61200ef (compilade) llama : fix edge case finding batch seq_id of split recurrent cell
eb589d5 (compilade) llama : avoid copies for simple batch splits
8fb57ac (compilade) llama : use im2col and mul_mat to perform convolution for Mamba
17f6c1e (compilade) llama : fix .base() compilation error on Windows
fee3c1d (compilade) llama : allow doing the equivalent of SSM_CONV with SUM_ROWS and MUL
6840ac0 (compilade) Merge branch 'master' into compilade/refactor-kv-cache
372482d (compilade) llama : rename llama_cache to llama_past
43d8d4b (compilade) examples : replace llama_kv_cache_seq_* with llama_past_seq_*
ff794f5 (compilade) Merge branch 'master' into compilade/refactor-kv-cache
33425a7 (compilade) mamba : fix non-contiguous usage of ggml_silu
10c3c41 (compilade) Merge branch 'master' into compilade/refactor-kv-cache
9b38f8b (compilade) Merge branch 'master' into compilade/refactor-kv-cache
bc320ef (compilade) Merge branch 'master' into compilade/refactor-kv-cache
fcb889c (compilade) llama : session saving and reloading for hybrid models
a03e32a (compilade) Merge branch 'master' into compilade/refactor-kv-cache
9d3f44d (compilade) convert_hf : fix Jamba conversion
5f62db7 (compilade) llama : fix mixed signedness comparison
375de5b (compilade) llama : use unused n_embd_k_gqa in k_shift
4bb4b22 (compilade) llama : begin renaming llama_past back to llama_kv_cache
63ac36b (compilade) Merge branch 'master' into compilade/refactor-kv-cache
124c222 (compilade) Merge branch 'master' into compilade/refactor-kv-cache
8006f3b (compilade) llama : remove implicit recurrent state rollbacks
691698e (compilade) Merge branch 'master' into compilade/refactor-kv-cache
e3fe612 (compilade) llama : partially apply clang-format style
2bcaf64 (compilade) Merge branch 'master' into compilade/refactor-kv-cache
908e655 (compilade) convert : fix jamba conv1d shape squeezing
4682e21 (compilade) Merge branch 'master' into compilade/refactor-kv-cache
20f8e43 (compilade) graph : add back hybrid memory graph input
07c252f (compilade) model : add Jamba to Mamba-specific hparams printing
f716358 (compilade) Merge branch 'master' into compilade/refactor-kv-cache
f656712 (mitmul) Add PLaMo-2 model using hybrid memory module
4728e42 (mitmul) Fix z shape
6acaf3c (mitmul) Add cmath to include from llama-vocab.h
7e4c5ec (mitmul) Explicitly dequantize normalization weights before RoPE apply
149b98c (mitmul) Revert unnecessary cast because the problem can be solved by excludin…
7786520 (mitmul) Use ATTN_K/Q_NORM for k,q weights to prevent quantization
0424a76 (mitmul) Remove SSM_BCDT that is not used from anywhere
ea95a1d (mitmul) Do not duplicate embedding weights for output.weight
2d76b21 (mitmul) Fix tokenizer encoding problem for multibyte strings
fccec6d (mitmul) Merge remote-tracking branch 'upstream/master' into mitmul/add-plamo2
5231e4f (mitmul) Merge branch 'master' into mitmul/add-plamo2
521c1e0 (mitmul) Apply suggestion from @CISC
df95fce (mitmul) Update src/llama-model.cpp
498b8b3 (mitmul) Use LLM_FFN_SWIGLU instead of splitting ffn_gate and ffn_up
6afd3be (mitmul) Remove unnecessary part for Grouped Query Attention
34360eb (mitmul) Fix how to load special token id to gguf
71abd3a (mitmul) Remove unused tensor mapping
fb2ae69 (mitmul) Update src/llama-model.cpp
eea696e (mitmul) Remove llama_vocab_plamo2 class and replace it with llm_tokenizer_pla…
841ffc8 (mitmul) Update src/llama-vocab.cpp
35d8188 (mitmul) Update convert_hf_to_gguf.py
d134e7f (mitmul) Update src/llama-model.cpp
921e864 (mitmul) Update src/llama-model.cpp
f87ac1c (mitmul) Merge remote-tracking branch 'upstream/master' into mitmul/add-plamo2
7b0b2ea (mitmul) Update convert_hf_to_gguf.py
b42f95d (mitmul) Update convert_hf_to_gguf.py
6921534 (mitmul) Fix plamo2 tokenizer session to prevent multiple calls of build()