
Issues with Llama-3.2-11B-Vision-Instruct #1294

Open
AshviniKSharma opened this issue Nov 28, 2024 · 4 comments


AshviniKSharma commented Nov 28, 2024

Describe the issue as clearly as possible:

Unable to run Llama-3.2-11B-Vision-Instruct with Outlines: generation fails with a CUDA device-side "index out of bounds" assertion.

Steps/code to reproduce the bug:

# LLM stuff
import outlines
import torch
from transformers import AutoProcessor, MllamaForConditionalGeneration
from pydantic import BaseModel, Field
from typing import Literal, Optional, List

# Image stuff
from PIL import Image
import matplotlib.pyplot as plt
import json

from rich import print

# NOTE: `Recipt` and `image` were not defined in the original report;
# minimal hypothetical stand-ins are sketched here so the snippet runs.
class Recipt(BaseModel):  # (sic) schema for the fields to extract
    name: str = Field(..., description="Name printed on the PAN card")
    pan_number: str = Field(..., description="The PAN number")

image = Image.open("pan_card.jpg").convert("RGB")  # any sample image

model_name = "meta-llama/Llama-3.2-11B-Vision-Instruct"
model_class = MllamaForConditionalGeneration

model = outlines.models.transformers_vision(
    model_name,
    model_class=model_class,
    model_kwargs={
        "device_map": "cuda:0",
        "torch_dtype": torch.bfloat16,
    },
    processor_kwargs={
        "device": "cuda:0", # set to "cpu" if you don't have a GPU
    },
)


# Set up the content you want to send to the model
messages = [
    {
        "role": "user",
        "content": [
            {
                # The image is provided as a PIL Image object
                "type": "image",
                "image": image,
            },
            {
                "type": "text",
                "text": f"""You are an expert at extracting information from PAN Card.
                Please extract the information from the PAN Card. Be as detailed as possible --
                missing or misreporting information is a crime.

                Return the information in the following JSON schema:
                {Recipt.model_json_schema()}
            """},
        ],
    }
]

# Convert the messages to the final prompt
processor = AutoProcessor.from_pretrained(model_name)
prompt = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Prepare a function to process receipts
receipt_summary_generator = outlines.generate.json(
    model,
    Recipt,

    # Greedy sampling is a good idea for numeric
    # data extraction -- no randomness.
    sampler=outlines.samplers.greedy()
)

# Generate the receipt summary
result = receipt_summary_generator(prompt, [image])
print(result)

Expected result:

A JSON object matching the `Recipt` schema should be printed.

Error message:

../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [116,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[11], line 12
      2 receipt_summary_generator = outlines.generate.json(
      3     model,
      4     Recipt,
   (...)
      8     # sampler=outlines.samplers.greedy()
      9 )
     11 # Generate the receipt summary
---> 12 result = receipt_summary_generator(prompt, [image])
     13 print(result)

File ~/miniconda3/envs/outlines/lib/python3.9/site-packages/outlines/generate/api.py:556, in VisionSequenceGeneratorAdapter.__call__(self, prompts, media, max_tokens, stop_at, seed, **model_specific_params)
    550 prompts, media = self._validate_prompt_media_types(prompts, media)
    552 generation_params = self.prepare_generation_parameters(
    553     max_tokens, stop_at, seed
    554 )
--> 556 completions = self.model.generate(
    557     prompts,
    558     media,
    559     generation_params,
    560     copy(self.logits_processor),
    561     self.sampling_params,
    562     **model_specific_params,
    563 )
    565 return self._format(completions)

File ~/miniconda3/envs/outlines/lib/python3.9/site-packages/outlines/models/transformers_vision.py:56, in TransformersVision.generate(self, prompts, media, generation_parameters, logits_processor, sampling_parameters)
     46 inputs = self.processor(
     47     text=prompts, images=media, padding=True, return_tensors="pt"
     48 ).to(self.model.device)
     50 generation_kwargs = self._get_generation_kwargs(
     51     prompts,
     52     generation_parameters,
     53     logits_processor,
     54     sampling_parameters,
     55 )
---> 56 generated_ids = self._generate_output_seq(prompts, inputs, **generation_kwargs)
     58 # if single str input and single sample per input, convert to a 1D output
     59 if isinstance(prompts, str):
     60     # Should always be true until NotImplementedError above is fixed

File ~/miniconda3/envs/outlines/lib/python3.9/site-packages/outlines/models/transformers.py:350, in Transformers._generate_output_seq(self, prompts, inputs, generation_config, **generation_kwargs)
    346 def _generate_output_seq(
    347     self, prompts, inputs, generation_config, **generation_kwargs
    348 ):
    349     input_ids = inputs["input_ids"]
--> 350     output_ids = self.model.generate(
    351         **inputs, generation_config=generation_config, **generation_kwargs
    352     )
    354     # encoder-decoder returns output_ids only, decoder-only returns full seq ids
    355     if self.model.config.is_encoder_decoder:

File ~/miniconda3/envs/outlines/lib/python3.9/site-packages/torch/utils/_contextlib.py:116, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    113 @functools.wraps(func)
    114 def decorate_context(*args, **kwargs):
    115     with ctx_factory():
--> 116         return func(*args, **kwargs)

File ~/miniconda3/envs/outlines/lib/python3.9/site-packages/transformers/generation/utils.py:2215, in GenerationMixin.generate(self, inputs, generation_config, logits_processor, stopping_criteria, prefix_allowed_tokens_fn, synced_gpus, assistant_model, streamer, negative_prompt_ids, negative_prompt_attention_mask, **kwargs)
   2207     input_ids, model_kwargs = self._expand_inputs_for_generation(
   2208         input_ids=input_ids,
   2209         expand_size=generation_config.num_return_sequences,
   2210         is_encoder_decoder=self.config.is_encoder_decoder,
   2211         **model_kwargs,
   2212     )
   2214     # 12. run sample (it degenerates to greedy search when `generation_config.do_sample=False`)
-> 2215     result = self._sample(
   2216         input_ids,
   2217         logits_processor=prepared_logits_processor,
   2218         stopping_criteria=prepared_stopping_criteria,
   2219         generation_config=generation_config,
   2220         synced_gpus=synced_gpus,
   2221         streamer=streamer,
   2222         **model_kwargs,
   2223     )
   2225 elif generation_mode in (GenerationMode.BEAM_SAMPLE, GenerationMode.BEAM_SEARCH):
   2226     # 11. prepare beam search scorer
   2227     beam_scorer = BeamSearchScorer(
   2228         batch_size=batch_size,
   2229         num_beams=generation_config.num_beams,
   (...)
   2234         max_length=generation_config.max_length,
   2235     )

File ~/miniconda3/envs/outlines/lib/python3.9/site-packages/transformers/generation/utils.py:3223, in GenerationMixin._sample(self, input_ids, logits_processor, stopping_criteria, generation_config, synced_gpus, streamer, **model_kwargs)
   3220 next_token_logits = next_token_logits.to(input_ids.device)
   3222 # pre-process distribution
-> 3223 next_token_scores = logits_processor(input_ids, next_token_logits)
   3225 # Store scores, attentions and hidden_states when required
   3226 if return_dict_in_generate:

File ~/miniconda3/envs/outlines/lib/python3.9/site-packages/transformers/generation/logits_process.py:104, in LogitsProcessorList.__call__(self, input_ids, scores, **kwargs)
    102         scores = processor(input_ids, scores, **kwargs)
    103     else:
--> 104         scores = processor(input_ids, scores)
    106 return scores

File ~/miniconda3/envs/outlines/lib/python3.9/site-packages/torch/utils/_contextlib.py:116, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    113 @functools.wraps(func)
    114 def decorate_context(*args, **kwargs):
    115     with ctx_factory():
--> 116         return func(*args, **kwargs)

File ~/miniconda3/envs/outlines/lib/python3.9/site-packages/outlines/processors/base_logits_processor.py:88, in OutlinesLogitsProcessor.__call__(self, input_ids, logits)
     86 # Guarantee passed as 2D Tensors, then covert back to original (1D or 2D) shape
     87 if len(torch_logits.shape) == 2:
---> 88     processed_logits = self.process_logits(input_ids, torch_logits)
     89 elif len(torch_logits.shape) == 1:
     90     processed_logits = self.process_logits(
     91         input_ids.unsqueeze(0), torch_logits.unsqueeze(0)
     92     ).squeeze(0)

File ~/miniconda3/envs/outlines/lib/python3.9/site-packages/outlines/processors/structured.py:121, in GuideLogitsProcessor.process_logits(self, input_ids, logits)
    118 allowed_tokens_concat = torch.cat(allowed_tokens_batch)
    119 batch_indices_concat = torch.cat(batch_indices)
--> 121 mask[batch_indices_concat, allowed_tokens_concat] = False
    122 logits.masked_fill_(mask, float("-inf"))
    124 return logits

RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
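
The assert fires at `structured.py:121`, where `mask[batch_indices_concat, allowed_tokens_concat] = False` scatters into a mask shaped like the logits. A minimal CPU analogue (using NumPy, and assuming the cause is an allowed-token id that reaches or exceeds the logits width) reproduces the same class of failure as a readable IndexError instead of an opaque device-side assert:

```python
import numpy as np

vocab = 10                              # pretend logits width
mask = np.ones((1, vocab), dtype=bool)  # True = token disallowed

# In-range allowed-token ids scatter fine, as in structured.py:121.
mask[[0, 0], [3, 9]] = False

# An id equal to (or beyond) the logits width is out of bounds.
# On CUDA this surfaces as the device-side assert above; on CPU
# it raises a readable IndexError.
try:
    mask[[0], [vocab]] = False
except IndexError as e:
    print("caught:", e)
```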


### Outlines/Python version information:

Version information
<details>

0.1.6
Python 3.9.20 (main, Oct 3 2024, 07:27:41)
[GCC 11.2.0]
accelerate==1.1.1
aiohappyeyeballs==2.4.3
aiohttp==3.11.8
aiosignal==1.3.1
airportsdata==20241001
annotated-types==0.7.0
asttokens @ file:///opt/conda/conda-bld/asttokens_1646925590279/work
async-timeout==5.0.1
attrs==24.2.0
backcall @ file:///home/ktietz/src/ci/backcall_1611930011877/work
certifi==2024.8.30
charset-normalizer==3.4.0
cloudpickle==3.1.0
comm @ file:///croot/comm_1709322850197/work
contourpy==1.3.0
cycler==0.12.1
datasets==3.1.0
debugpy @ file:///croot/debugpy_1690905042057/work
decorator @ file:///opt/conda/conda-bld/decorator_1643638310831/work
dill==0.3.8
diskcache==5.6.3
exceptiongroup @ file:///croot/exceptiongroup_1706031385326/work
executing @ file:///opt/conda/conda-bld/executing_1646925071911/work
filelock==3.16.1
fonttools==4.55.0
frozenlist==1.5.0
fsspec==2024.9.0
huggingface-hub==0.26.2
idna==3.10
importlib_metadata @ file:///croot/importlib_metadata-suite_1732633488278/work
importlib_resources==6.4.5
interegular==0.3.3
ipykernel @ file:///croot/ipykernel_1728665589812/work
ipython @ file:///croot/ipython_1694181358621/work
jedi @ file:///croot/jedi_1721058342488/work
Jinja2==3.1.4
jsonschema==4.23.0
jsonschema-specifications==2024.10.1
jupyter_client @ file:///croot/jupyter_client_1699455897726/work
jupyter_core @ file:///croot/jupyter_core_1718818295206/work
kiwisolver==1.4.7
lark==1.2.2
markdown-it-py==3.0.0
MarkupSafe==3.0.2
matplotlib==3.9.2
matplotlib-inline @ file:///opt/conda/conda-bld/matplotlib-inline_1662014470464/work
mdurl==0.1.2
mpmath==1.3.0
multidict==6.1.0
multiprocess==0.70.16
nest-asyncio @ file:///croot/nest-asyncio_1708532673751/work
networkx==3.2.1
numpy==2.0.2
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==9.1.0.70
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.20.5
nvidia-nvjitlink-cu12==12.6.85
nvidia-nvtx-cu12==12.1.105
outlines @ git+https://github.com/dottxt-ai/outlines.git@e9485cf2126d9c14bd749d55f8aa6729d96808d0
outlines_core==0.1.17
packaging @ file:///croot/packaging_1720101850331/work
pandas==2.2.3
parso @ file:///opt/conda/conda-bld/parso_1641458642106/work
pexpect @ file:///tmp/build/80754af9/pexpect_1605563209008/work
pickleshare @ file:///tmp/build/80754af9/pickleshare_1606932040724/work
pillow==11.0.0
platformdirs @ file:///croot/platformdirs_1692205439124/work
prompt-toolkit @ file:///croot/prompt-toolkit_1704404351921/work
propcache==0.2.0
psutil @ file:///opt/conda/conda-bld/psutil_1656431268089/work
ptyprocess @ file:///tmp/build/80754af9/ptyprocess_1609355006118/work/dist/ptyprocess-0.7.0-py2.py3-none-any.whl
pure-eval @ file:///opt/conda/conda-bld/pure_eval_1646925070566/work
pyarrow==18.1.0
pycountry==24.6.1
pydantic==2.10.2
pydantic_core==2.27.1
Pygments @ file:///croot/pygments_1684279966437/work
pyparsing==3.2.0
python-dateutil @ file:///croot/python-dateutil_1716495738603/work
pytz==2024.2
PyYAML==6.0.2
pyzmq @ file:///croot/pyzmq_1705605076900/work
referencing==0.35.1
regex==2024.11.6
requests==2.32.3
rich==13.9.4
rpds-py==0.21.0
safetensors==0.4.5
six @ file:///tmp/build/80754af9/six_1644875935023/work
stack-data @ file:///opt/conda/conda-bld/stack_data_1646927590127/work
sympy==1.13.3
tokenizers==0.20.3
torch==2.4.0
tornado @ file:///croot/tornado_1718740109488/work
tqdm==4.67.1
traitlets @ file:///croot/traitlets_1718227057033/work
transformers==4.46.3
triton==3.0.0
typing_extensions @ file:///croot/typing_extensions_1715268824938/work
tzdata==2024.2
urllib3==2.2.3
wcwidth @ file:///Users/ktietz/demo/mc3/conda-bld/wcwidth_1629357192024/work
xxhash==3.5.0
yarl==1.18.0
zipp @ file:///croot/zipp_1732630741423/work

</details>


### Context for the issue:

_No response_
@rlouf
Member

rlouf commented Nov 28, 2024

This does not seem to be an issue with outlines. If it is, I will need more information.

@AshviniKSharma
Author

Please let me know what information you need, as I am able to run Llama 3.2 Vision without any issue when not using outlines. It would be great if you could provide a working example with Llama 3.2 Vision; I will try it on my end to check that everything is in order.

@cpfiffer
Contributor

cpfiffer commented Dec 2, 2024

This suggests that the issue is likely due to your CUDA installation, which is infamously awful to deal with.


File ~/miniconda3/envs/outlines/lib/python3.9/site-packages/outlines/processors/structured.py:121, in GuideLogitsProcessor.process_logits(self, input_ids, logits)
    118 allowed_tokens_concat = torch.cat(allowed_tokens_batch)
    119 batch_indices_concat = torch.cat(batch_indices)
--> 121 mask[batch_indices_concat, allowed_tokens_concat] = False
    122 logits.masked_fill_(mask, float("-inf"))
    124 return logits

RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

I've found Perplexity to be pretty good at working through these issues, but I agree with Remi that this does not seem to be an Outlines issue.
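
As an aside, the opaque "device-side assert triggered" can usually be turned into a readable Python exception. A minimal sketch, assuming the standard PyTorch debugging knob (nothing here is Outlines-specific):

```python
import os

# CUDA kernels launch asynchronously, so by default the Python traceback
# points at whichever op happened to synchronize, not at the op that failed.
# Setting this BEFORE torch initializes CUDA makes launches synchronous,
# so the traceback lands on the actual failing operation:
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# Alternatively, run the model on CPU (e.g. device_map="cpu"): the same bug
# then raises a plain IndexError that names the offending index.
```

Running on CPU is exactly what produced the clearer error reported in the next comment.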

@priyank9320

Hi @cpfiffer, while trying to run Llama-3.2-11B-Vision-Instruct with outlines on CPU, I was able to get a clearer error message:

IndexError: index 128256 is out of bounds for dimension 1 with size 128256

It turns out that this input id refers to the image token in Llama. It is not used in generating the output (as indicated by the logits shape), so it can be filtered out. This is not an issue with outlines itself, but it causes an error with the mentioned Llama model.

I have created a pull request: #1310
(but this way we will be filtering out the input id every time so maybe there is a better way to do it)

Let me know your thoughts.
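
To make the failure mode concrete: the logits have vocabulary size 128256, while the image special token has id 128256, so any mask indexed with that id goes out of bounds. A minimal sketch of the filtering idea in plain Python — the ids below mirror the error message, and this is an illustration, not the actual Outlines code:

```python
# Dimension 1 of the logits, per the IndexError above.
VOCAB_SIZE = 128256
# The Llama 3.2 image token id sits just past the logits dimension.
IMAGE_TOKEN_ID = 128256

# Suppose the guide allows these token ids at the current step:
allowed_tokens = [42, 1000, IMAGE_TOKEN_ID]

# Indexing a mask of width VOCAB_SIZE with 128256 is out of bounds: on CUDA
# it surfaces as a device-side assert, on CPU as a clear IndexError.
# Dropping ids that cannot appear in the logits avoids the crash:
safe_tokens = [t for t in allowed_tokens if t < VOCAB_SIZE]
print(safe_tokens)  # → [42, 1000]
```

The trade-off the comment above mentions is visible here: the filter runs on every decoding step, so filtering once when the guide is built would avoid the per-step cost.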
