I have been playing around with your awesome implementation and found the following bug:
When I call `LLaMaGenerate()` with prompts longer than 511 tokens (it might also be a character limit; I just used token counting for simplicity), the subsequent call to `Int4llamaDecoder::forward()` causes a segmentation fault upon creation/allocation of `inputs_embeds_buf` on line 71 of llama/TinyChatEngine/llm/src/nn_modules/non_cuda/Int4llamaDecoder.cc.
I believe the issue lies in the stack allocation, which can get too large for some prompts. Stack growth is limited on most systems, and a stack allocation of a few megabytes can exceed that limit; that is the case here and causes the segmentation fault.
I have the following fix: instead, define a vector of the required size (which allocates its storage on the heap), like so: `std::vector<float> inputs_embeds_buf_vec(sqlen * this->embed_dim);`, and pass its data pointer to the `Matrix3D<float>` object on the next line: `Matrix3D<float> inputs_embeds(inputs_embeds_buf_vec.data(), 1, sqlen, this->embed_dim);`.
It has worked in my test cases so far. Should I open a pull request?
Edit: I don't know how reproducible this is, as the stack size limit is architecture-dependent according to this Stack Overflow comment: https://stackoverflow.com/a/1826072