In the latest AWS Neuron SDK 2.18.1 release, the transformers-neuronx package has been updated to a new version 0.10.0.360 whose code is not available in this repository at the moment.
One of the changes is a 'fix' for continuous batching, but it actually breaks the Mixtral model.
The symptom is that the first call to `forward` after encoding fails with:
```
def forward(self, input_ids, cache_ids=None, start_ids=None):
    # Compute the window starting index for specific mask patterns
    # For other patterns we pass in a default value of 0, it won't be used
>   curr_window_start = max(0, self.num_processed_tokens - self.config.window_size) if self.config.window_size else 0
E   RuntimeError: Boolean value of Tensor with more than one value is ambiguous
```
The root cause is a modification in the `base.py` file, in the `_prepare_for_par_ctx_rhs_padding` method, line 265.
The returned `last_token_id` value used to be a scalar, but can now be a vector. This leads to `self.num_processed_tokens` also becoming a vector, which causes the error.
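The failure mode can be reproduced outside the SDK. This is a minimal sketch (variable names are illustrative, mirroring the traceback above, not the actual `transformers-neuronx` code): when `num_processed_tokens` becomes a multi-element tensor, the built-in `max()` must coerce the tensor comparison to a single boolean, which raises the reported error.

```python
import torch

# Illustrative stand-ins for the attributes in the traceback (assumed values):
# num_processed_tokens is now a vector instead of a scalar after the change.
num_processed_tokens = torch.tensor([3, 5])
window_size = 4

try:
    # Same expression shape as in the failing forward(): max() calls
    # bool() on a multi-element tensor comparison, which is ambiguous.
    curr_window_start = (
        max(0, num_processed_tokens - window_size) if window_size else 0
    )
except RuntimeError as e:
    print(e)
```

With a scalar (0-dimensional) `num_processed_tokens`, the same expression evaluates without error, which is why the issue only surfaced after the change.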
Then Mistral and Mixtral are actually not supported, because static batching with padding (the alternative to continuous batching) has been broken for all models since the introduction of continuous batching: #79. Or has it been fixed?
We have the 2.19 release going out this week. With this new release, we have added support for Mistral. Support for Mixtral will be added in one of the upcoming releases.