Qwen2-VL-7B-Instruct-4bit allocation crashes on larger dimension images #79
🙌🙌 Thanks for the quick response @Blaizzy
Yes, something to this effect. I think the following example client flow could be desirable (if possible of course):
Then either:
OR
What do you think? I'm not exactly sure how easy it is to get step 2 working. I'd also imagine 3a to be an easier implementation than 3b.
I'm not sure about step 2 either. Let me check.
Could you try to run the model in a Python file like this:

```python
import sys


def main():
    try:
        import mlx.core as mx
        from mlx_vlm import load, generate
        from mlx_vlm.prompt_utils import apply_chat_template
        from mlx_vlm.utils import load_config

        # Load the model
        model_path = "mlx-community/Qwen2-VL-7B-Instruct-4bit"
        model, processor = load(model_path)
        config = load_config(model_path)

        # Prepare input
        image = ["http://images.cocodataset.org/val2017/000000039769.jpg"]
        prompt = "Describe this image."

        # Apply chat template
        formatted_prompt = apply_chat_template(
            processor, config, prompt, num_images=len(image)
        )

        # Generate output
        output = generate(model, processor, image, formatted_prompt, verbose=False)
        print(output)

    # Catch the allocation failure if it surfaces as a Python exception
    except MemoryError as e:
        print(f"Memory allocation error: {e}", file=sys.stderr)
        print("The program attempted to allocate more memory than available or allowed.", file=sys.stderr)
        print("Consider reducing the size of your input or using a machine with more memory.", file=sys.stderr)
        sys.exit(1)
    except Exception as e:
        print(f"An unexpected error occurred: {e}", file=sys.stderr)
        sys.exit(1)


if __name__ == "__main__":
    main()
```
Unfortunately the above still results in the same crash. From limited research, it seems like the crash comes from a C++ error that isn't caught and propagated to Python properly, so it can't be handled with a Python try/except.

Interestingly enough though, I was able to avoid the attempted large allocation at mlx-vlm/mlx_vlm/models/qwen2_vl/qwen2_vl.py lines 70 to 78 (in d4b562f), inside `_merge_input_ids_with_image_features`, by roughly batching that step to effectively break down whatever large allocation numpy attempts in those lines. But then a similar allocation attempt occurs at the first `np.tile` call in `language.py` (see below).
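To illustrate the idea (this is not the repository's actual `_merge_input_ids_with_image_features`, just a minimal sketch of "break the one big assignment into chunks", with hypothetical argument names and an arbitrary `chunk_size`):

```python
import numpy as np


def merge_image_features_in_chunks(inputs_embeds, image_features, image_positions, chunk_size=1024):
    # inputs_embeds: (batch, seq_len, hidden); image_features: (num_image_tokens, hidden)
    # image_positions: indices of the image placeholder tokens in the sequence.
    # Assign the image features a chunk at a time so numpy never has to build
    # one giant index/assignment buffer in a single operation.
    for start in range(0, len(image_positions), chunk_size):
        pos = image_positions[start : start + chunk_size]
        inputs_embeds[:, pos, :] = image_features[start : start + chunk_size]
    return inputs_embeds
```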
I think my personal takeaways are:
Seems totally reasonable to me to leave this out of the scope of #83, though.
Interesting! Thanks for the update. I have one question: are you making requests in batch? If so, what is the use case?
Not currently making requests in batch! Sorry, I could have expressed my thoughts around the "batching" thing more clearly. It's more about the potential for breaking down large allocations into smaller batches, so that a single large request can succeed. My chain of thought:
Does that clarify? Maybe it's not possible for some reason I don't yet know about; it's just an idea.
Could you share the method / reproducible example you used to identify it?
Certainly! If I add:

at the top of

I see:

With the addition of my rough batched implementation of `_merge_input_ids_with_image_features` AND changing https://github.com/Blaizzy/mlx-vlm/blob/main/mlx_vlm/models/qwen2_vl/language.py#L185 to use `mx.tile` instead of `np.tile` (or else it crashes there too).

(Please feel free to let me know if you know of better ways to trace.)
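For reference, a generic version of the kind of tracing I mean (not my exact snippet) can be done with the standard library's `tracemalloc`; note it only sees Python/numpy-side allocations, not Metal buffers:

```python
import tracemalloc

tracemalloc.start(25)  # record up to 25 stack frames per allocation

# ... run the failing load/generate call here ...

snapshot = tracemalloc.take_snapshot()
# Print the five call sites responsible for the most allocated bytes
for stat in snapshot.statistics("traceback")[:5]:
    print(f"{stat.size / 1e9:.2f} GB allocated from this call site")
    for line in stat.traceback.format():
        print(line)
```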
Seems like there is some relationship between large allocations and the numpy calls (as opposed to their `mx` equivalents).
I have the same issue. When I use a picture with a resolution of 4032 × 3024, it gives me the following error:
21743271936 bytes is roughly 20 GB; I don't think processing this single image should consume so much memory. There must be something wrong with the calculation. I am using mlx-vlm version 0.1.0.
Ohh yeah, I found the issue: see #84. Upgrade your MLX version to the latest and let me know if it solves it.
And yeah, that large of an image is not a good idea. Try passing `--resize-shape`. It will be faster and less resource intensive.
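For example (the exact argument format for `--resize-shape` is an assumption here; check `python -m mlx_vlm.generate --help` for the current flags):

```bash
python -m mlx_vlm.generate \
  --model mlx-community/Qwen2-VL-7B-Instruct-4bit \
  --image /path/to/large_photo.jpg \
  --prompt "Describe this image." \
  --resize-shape 1024 1024
```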
The mlx version is 0.19.0, and it is the latest one as of this writing. The `--resize-shape` flag works perfectly, thank you @Blaizzy!
My pleasure!
Qwen2-VL-7B-Instruct-4bit crashes with memory allocation errors on images with larger dimensions.

My machine: Apple M3 Pro, 36 GB RAM

The error reproduction below is with an image of dimensions 1978 × 2806. With an image of dimensions 1278 × 816, I do not see the crash.

Ideally I'm wondering if there is a way to do some combination of things here, e.g. checking the requested allocation against the maximum allowed buffer size and, if it is larger, raising an exception instead of attempting the allocation and crashing the process.
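Something like this rough sketch is what I have in mind (not existing mlx-vlm behavior; it assumes recent MLX exposes the device's maximum buffer length via `mx.metal.device_info()`, and the byte-count estimate is left as a placeholder):

```python
import mlx.core as mx


def check_allocation_fits(estimated_bytes: int) -> None:
    # Hypothetical pre-flight check, not part of mlx-vlm: compare an estimated
    # allocation against the Metal device's maximum buffer length and raise a
    # catchable Python exception instead of letting the C++ layer abort.
    info = mx.metal.device_info()  # assumed to include "max_buffer_length"
    max_buffer = info["max_buffer_length"]
    if estimated_bytes > max_buffer:
        raise MemoryError(
            f"Requested {estimated_bytes} bytes exceeds the maximum allowed "
            f"buffer size of {max_buffer} bytes"
        )
```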
Steps to reproduce

Image used for this test

Dimensions: 1978 × 2806