
Mathvista PNG images are written with a .jpg suffix, causing API failures #483

Open
evanmiller-anthropic opened this issue Sep 23, 2024 · 10 comments

Comments

@evanmiller-anthropic
Contributor

Running Mathvista against the Anthropic API, I encounter the following failure:

BadRequestError: Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'messages.0.content.1.image.source.base64.data: The image was specified using the image/jpeg media type, but does not appear to be a valid jpeg image'}}

It appears that Mathvista images are a mix of JPEG and PNG files, but they are all saved locally with the .jpg extension. Inspect's MIME inference logic then reports them to the Anthropic API as JPEG files, causing 400 Bad Request failures whenever a PNG is encountered.
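To illustrate the failure mode: when the media type is inferred from the file name alone, the bytes never get a say. The sketch below uses Python's standard `mimetypes` module as a stand-in; that Inspect's own inference behaves the same way is an assumption based on the description above, and `media_type_from_name` is a hypothetical helper.

```python
import mimetypes
from typing import Optional

# PNG files always begin with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def media_type_from_name(path: str) -> Optional[str]:
    """Hypothetical helper: guess a media type from the file name only,
    standing in for extension-based MIME inference (contents are ignored)."""
    media_type, _encoding = mimetypes.guess_type(path)
    return media_type


# A PNG payload saved under a .jpg name is still labelled as JPEG.
png_bytes = PNG_SIGNATURE + b"...rest of the PNG data..."
print(media_type_from_name("images/1.jpg"))  # image/jpeg
print(png_bytes.startswith(PNG_SIGNATURE))   # True: the bytes are really PNG
```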

@sudhir-b
Contributor

I'd love to have a crack at this as a first-time contributor, if that's okay! I had a brief look, and it seems like it might be straightforward to save the images locally with the correct file extensions by making a small change to evals/mathvista/mathvista.py:

@@ -1,3 +1,4 @@
+import imghdr
 import re
 from pathlib import Path
 
@@ -114,12 +115,19 @@ def mathvista_solver() -> Solver:
 
 def record_to_sample(record: dict) -> Sample:
     # extract image
-    image = Path(record["image"])
+    image_bytes = record["decoded_image"]["bytes"]
+    image_type = imghdr.what(None, h=image_bytes)
+    original_path = Path(record["image"])
+    file_extension = (
+        f".{image_type}" if image_type is not None else original_path.suffix
+    )
+    image = original_path.with_suffix(file_extension)
+
     if not image.exists():
         print(f"Extracting {image}")
         image.parent.mkdir(exist_ok=True)
         with open(image, "wb") as file:
-            file.write(record["decoded_image"]["bytes"])
+            file.write(image_bytes)
 
     message: list[ChatMessage] = [
         ChatMessageUser(

However, imghdr has been deprecated since Python 3.11 and is slated for removal in Python 3.13. Two reasonable alternatives seem to be python-magic and filetype. I'd be more than happy to try to submit a PR for this, but would appreciate any and all guidance from core contributors.

@jjallaire-aisi
Collaborator

Hi @sudhir-b, yes, it would be great if you took a crack at this! I would suggest the filetype library as it has no external dependencies. One note: when doing this you should add a requirements.txt file to the folder listing filetype. We will soon be turning evals into a package and may pick this up as a package dependency (in either case the requirements.txt will serve as documentation).

@evanmiller-anthropic
Contributor Author

For a dependency-free solution, you could check the first 8 bytes for the static PNG header:

https://en.wikipedia.org/wiki/PNG#File_header
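A minimal sketch of that check, assuming the decoded bytes are available as `record["decoded_image"]["bytes"]` as in the diff above. The helpers `is_png` and `path_with_correct_suffix` are hypothetical names for illustration, not the merged implementation.

```python
from pathlib import Path

# Fixed 8-byte signature at the start of every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def is_png(data: bytes) -> bool:
    """Hypothetical helper: True if the bytes start with the full 8-byte PNG signature."""
    return data[:8] == PNG_SIGNATURE


def path_with_correct_suffix(original: Path, data: bytes) -> Path:
    """Hypothetical helper: swap the suffix to .png when the payload is really a PNG.

    Anything else keeps its original suffix (e.g. .jpg), mirroring the
    fallback in the diff proposed earlier in this thread.
    """
    return original.with_suffix(".png") if is_png(data) else original
```

For example, `path_with_correct_suffix(Path("images/1.jpg"), data)` yields `images/1.png` when `data` starts with the PNG signature and leaves the path untouched otherwise.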

@sudhir-b
Copy link
Contributor

sudhir-b commented Sep 24, 2024

I had assumed that's what the filetype package did, but in fact it only checks the first 4 bytes:
https://github.com/h2non/filetype.py/blob/0c7f219ea20a50b636c4a279af8694b0edf8419c/filetype/types/image.py#L135

I'm happy to do either implementation: using filetype or looking at the raw bytes.

@jjallaire-aisi
Collaborator

Let's just look at the raw bytes.

@evanmiller-anthropic
Contributor Author

I'm still seeing this error with the merged changes. It appears that line 163 needs to be modified to point to the newly written PNG files, rather than the original .jpg paths:

files={f"image:{record['image']}": record["image"]},

@jjallaire-aisi
Copy link
Collaborator

Interestingly, that line is entirely unneeded (it's for copying files into a Docker container): 3f511de

I am still seeing errors with Sonnet 3.5, now about image size:

{'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'messages.0.content.1.image.source.base64: image exceeds 5 MB maximum: 6415740 bytes > 5242880 bytes'}}

So I think we need another image reduction pass here.
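The reported limit of 5242880 bytes is exactly 5 × 1024 × 1024. The message doesn't make clear whether the count refers to the raw image or its base64 encoding (which is about 4/3 larger), so a conservative pre-flight check, sketched here as a hypothetical `fits_size_limit` helper, measures the encoded size:

```python
import base64

# Limit reported in the API error above: 5 MB = 5 * 1024 * 1024 bytes.
MAX_IMAGE_BYTES = 5 * 1024 * 1024


def fits_size_limit(data: bytes, limit: int = MAX_IMAGE_BYTES) -> bool:
    """Hypothetical helper: check the base64-encoded size against the limit.

    Measuring the encoded size (roughly 4/3 of the raw size) is the
    conservative choice: if the API actually counts decoded bytes,
    this check still errs on the safe side.
    """
    return len(base64.b64encode(data)) <= limit
```

Under this check a 6,415,740-byte payload is rejected, and any raw image larger than 3,932,160 bytes (three quarters of the limit) would need reducing.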

@jjallaire-aisi
Copy link
Collaborator

@evanmiller-anthropic I would defer to you on the right heuristics for reducing images in this dataset (i.e., we can probably target well below 5 MB, but I'm not sure what the optimal target is).

@evanmiller-anthropic
Copy link
Contributor Author

Hmm, I wonder why it still thinks bad JPEGs are being provided; I will need to investigate more.

@jjallaire-aisi I think the same heuristic I added to MMMU: 1024 pixels per side? Anthropic endpoints have a pixel limit of 1.15 megapixels.

https://github.com/UKGovernmentBEIS/inspect_ai/pull/482/files
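A minimal sketch of the dimension arithmetic, assuming the heuristic caps the longer side at 1024 pixels while preserving the aspect ratio (how the linked PR actually implements it is not reproduced here). `target_size` is a hypothetical helper; the resampling itself would be done with an imaging library.

```python
from typing import Tuple

MAX_SIDE = 1024  # per-side cap suggested above


def target_size(width: int, height: int, max_side: int = MAX_SIDE) -> Tuple[int, int]:
    """Hypothetical helper: shrink (width, height), keeping the aspect ratio,
    so that the longer side is at most max_side. Smaller images are unchanged."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    # Integer arithmetic avoids float rounding: the longer side becomes exactly
    # max_side and the shorter side is rounded down (but never below 1 pixel).
    return (
        max(1, width * max_side // longest),
        max(1, height * max_side // longest),
    )
```

Because 1024 × 1024 = 1,048,576 pixels, which is under 1.15 megapixels, capping each side at 1024 also keeps every image within the pixel limit. For example, a 3000 × 1000 image becomes 1024 × 341.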

@jjallaire-aisi
Copy link
Collaborator

Okay, I added the reduction here: b253aba

I am still seeing this happen periodically though:

Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'messages.0.content.1.image.source.base64.data: The image was specified using the image/jpeg media type, but does not appear to be a valid jpeg image'}}
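One way to track down the remaining failures is to scan the extracted images for files with a JPEG suffix whose leading bytes are not a JPEG start-of-image marker. This is a diagnostic sketch under stated assumptions: the images sit under a local directory passed as `image_dir`, only PNG and JPEG signatures are recognised, and both helper names are hypothetical.

```python
from pathlib import Path
from typing import List

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# JPEG files start with the SOI marker FF D8, followed by the FF of the next marker.
JPEG_PREFIX = b"\xff\xd8\xff"


def sniff_media_type(data: bytes) -> str:
    """Hypothetical helper: classify the leading bytes as PNG, JPEG, or unknown."""
    if data.startswith(PNG_SIGNATURE):
        return "image/png"
    if data.startswith(JPEG_PREFIX):
        return "image/jpeg"
    return "unknown"


def find_mislabelled_jpegs(image_dir: Path) -> List[Path]:
    """Hypothetical helper: list .jpg/.jpeg files whose bytes do not look like JPEG."""
    suspects = []
    for path in sorted(image_dir.rglob("*")):
        if path.is_file() and path.suffix.lower() in {".jpg", ".jpeg"}:
            with path.open("rb") as f:
                head = f.read(8)  # 8 bytes covers the longest signature checked
            if sniff_media_type(head) != "image/jpeg":
                suspects.append(path)
    return suspects
```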
