Commit

Merge pull request #108 from alan-turing-institute/openai-multimodal
Add multimodal prompting with OpenAI and Azure OpenAI
rchan26 authored Oct 29, 2024
2 parents 771be5f + b3a970a commit 83d54a6
Showing 22 changed files with 1,147 additions and 21 deletions.
473 changes: 473 additions & 0 deletions examples/azure-openai/azure-openai-multimodal.ipynb

Large diffs are not rendered by default.

Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
{"id": 0, "api": "azure-openai", "model_name": "reginald-gpt-4o", "prompt": [{"role": "user", "content": ["describe what is happening in this image", {"type": "image_url", "image_url": "pantani_giro.jpg"}]}], "parameters": {"n": 1, "temperature": 1, "max_tokens": 100}}
{"id": 1, "api": "azure-openai", "model_name": "reginald-gpt-4o", "prompt": [{"role": "user", "content": [{"type": "image_url", "image_url": "mortadella.jpg"}, "what is this?"]}], "parameters": {"n": 1, "temperature": 1, "max_tokens": 100}}
{"id": 2, "api": "azure-openai", "model_name": "reginald-gpt-4o", "prompt": [{"role": "user", "content": ["what is in this image?", {"type": "image_url", "image_url": "pantani_giro.jpg"}]}, {"role": "assistant", "content": "This is image shows a group of cyclists."}, {"role": "user", "content": "are there any notable cyclists in this image? what are their names?"}], "parameters": {"n": 1, "temperature": 1, "max_tokens": 100}}
{"id": 3, "api": "azure-openai", "model_name": "reginald-gpt-4o", "prompt": [{"role": "user", "content": [{"type": "text", "text": "What’s in this image?"}, {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"}}]}], "parameters": {"n": 1, "temperature": 1, "max_tokens": 100}}
Binary file added examples/azure-openai/data/media/mortadella.jpg
@@ -0,0 +1,4 @@
{"id": 0, "api": "azure-openai", "model_name": "reginald-gpt-4o", "prompt": [{"role": "user", "content": ["describe what is happening in this image", {"type": "image_url", "image_url": "pantani_giro.jpg"}]}], "parameters": {"n": 1, "temperature": 1, "max_tokens": 100}, "timestamp_sent": "29-10-2024-13-10-48", "response": "The image shows a group of cyclists participating in a road cycling race. They are riding closely together in a single file along a paved path, next to a stone wall. The cyclists are wearing colorful team uniforms and helmets, and the bicycles are designed for racing, with drop handlebars and thin tires. The position of their bodies and the intense looks suggest they are pushing themselves, possibly during a climb or a sprint segment of the race. The background indicates they are in an urban or village setting, as evidenced"}
{"id": 1, "api": "azure-openai", "model_name": "reginald-gpt-4o", "prompt": [{"role": "user", "content": [{"type": "image_url", "image_url": "mortadella.jpg"}, "what is this?"]}], "parameters": {"n": 1, "temperature": 1, "max_tokens": 100}, "timestamp_sent": "29-10-2024-13-10-50", "response": "This is mortadella, an Italian sausage or cold cut made of finely hashed or ground, heat-cured pork. Mortadella is typically cylindrical in shape and wrapped in a casing tied with string. It often contains small cubes of pork fat distributed throughout the meat, and may include other ingredients such as pistachios or black pepper. It is commonly sliced thin and served as a deli meat or in sandwiches."}
{"id": 2, "api": "azure-openai", "model_name": "reginald-gpt-4o", "prompt": [{"role": "user", "content": ["what is in this image?", {"type": "image_url", "image_url": "pantani_giro.jpg"}]}, {"role": "assistant", "content": "This is image shows a group of cyclists."}, {"role": "user", "content": "are there any notable cyclists in this image? what are their names?"}], "parameters": {"n": 1, "temperature": 1, "max_tokens": 100}, "timestamp_sent": "29-10-2024-13-10-52", "response": "I do not recognize any notable cyclists in the image."}
{"id": 3, "api": "azure-openai", "model_name": "reginald-gpt-4o", "prompt": [{"role": "user", "content": [{"type": "text", "text": "What\u2019s in this image?"}, {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"}}]}], "parameters": {"n": 1, "temperature": 1, "max_tokens": 100}, "timestamp_sent": "29-10-2024-13-10-54", "response": "The image depicts a serene natural landscape with a wooden boardwalk extending into the distance through a lush green field. The sky above is bright and blue with scattered, wispy clouds. The scene suggests a peaceful, inviting environment, likely part of a nature reserve or park. Trees and bushes are visible in the background, enhancing the image's natural beauty."}
@@ -0,0 +1,4 @@
{"id": 0, "api": "azure-openai", "model_name": "reginald-gpt-4o", "prompt": [{"role": "user", "content": ["describe what is happening in this image", {"type": "image_url", "image_url": "pantani_giro.jpg"}]}], "parameters": {"n": 1, "temperature": 1, "max_tokens": 100}}
{"id": 1, "api": "azure-openai", "model_name": "reginald-gpt-4o", "prompt": [{"role": "user", "content": [{"type": "image_url", "image_url": "mortadella.jpg"}, "what is this?"]}], "parameters": {"n": 1, "temperature": 1, "max_tokens": 100}}
{"id": 2, "api": "azure-openai", "model_name": "reginald-gpt-4o", "prompt": [{"role": "user", "content": ["what is in this image?", {"type": "image_url", "image_url": "pantani_giro.jpg"}]}, {"role": "assistant", "content": "This is image shows a group of cyclists."}, {"role": "user", "content": "are there any notable cyclists in this image? what are their names?"}], "parameters": {"n": 1, "temperature": 1, "max_tokens": 100}}
{"id": 3, "api": "azure-openai", "model_name": "reginald-gpt-4o", "prompt": [{"role": "user", "content": [{"type": "text", "text": "What’s in this image?"}, {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"}}]}], "parameters": {"n": 1, "temperature": 1, "max_tokens": 100}}
@@ -0,0 +1 @@
29-10-2024, 13:11: Completed experiment: azure-openai-multimodal-example.jsonl! Experiment processing time: 13.742 seconds, Average time per query: 3.436 seconds
10 changes: 9 additions & 1 deletion examples/gemini/gemini-multimodal.ipynb
@@ -98,7 +98,15 @@
 "\n",
 "An example of a multimodal prompt is the following:\n",
 "```python\n",
-"[{\"role\": \"user\", \"part\": [\"what is in this image?\", {\"type\": \"image\", \"media\": \"image.jpg\"}]}]\n",
+"[\n",
+" {\n",
+" \"role\": \"user\",\n",
+" \"part\": [\n",
+" \"what is in this image?\",\n",
+" {\"type\": \"image\", \"media\": \"image.jpg\"},\n",
+" ]\n",
+" },\n",
+"]\n",
 "```\n",
 "\n",
 "Here, we have a list of one dictionary where we specify the \"role\" as \"user\" and \"part\" as a list of two elements: the first is a string and the second is a dictionary specifying the type and media content of the part. In this case, the media content is a image file path.\n",
4 changes: 4 additions & 0 deletions examples/openai/data/input/openai-multimodal-example.jsonl
@@ -0,0 +1,4 @@
{"id": 0, "api": "openai", "model_name": "gpt-4o", "prompt": [{"role": "user", "content": ["describe what is happening in this image", {"type": "image_url", "image_url": "pantani_giro.jpg"}]}], "parameters": {"n": 1, "temperature": 1, "max_tokens": 100}}
{"id": 1, "api": "openai", "model_name": "gpt-4o", "prompt": [{"role": "user", "content": [{"type": "image_url", "image_url": "mortadella.jpg"}, "what is this?"]}], "parameters": {"n": 1, "temperature": 1, "max_tokens": 100}}
{"id": 2, "api": "openai", "model_name": "gpt-4o", "prompt": [{"role": "user", "content": ["what is in this image?", {"type": "image_url", "image_url": "pantani_giro.jpg"}]}, {"role": "assistant", "content": "This is image shows a group of cyclists."}, {"role": "user", "content": "are there any notable cyclists in this image? what are their names?"}], "parameters": {"n": 1, "temperature": 1, "max_tokens": 100}}
{"id": 3, "api": "openai", "model_name": "gpt-4o", "prompt": [{"role": "user", "content": [{"type": "text", "text": "What’s in this image?"}, {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"}}]}], "parameters": {"n": 1, "temperature": 1, "max_tokens": 100}}
Binary file added examples/openai/data/media/mortadella.jpg
Binary file added examples/openai/data/media/pantani_giro.jpg
@@ -0,0 +1,4 @@
{"id": 0, "api": "openai", "model_name": "gpt-4o", "prompt": [{"role": "user", "content": ["describe what is happening in this image", {"type": "image_url", "image_url": "pantani_giro.jpg"}]}], "parameters": {"n": 1, "temperature": 1, "max_tokens": 100}, "timestamp_sent": "29-10-2024-11-57-36", "response": "The image shows a group of cyclists participating in a road race. They are in motion, riding closely together along a roadside with a stone wall. Each cyclist wears a distinct team jersey and helmet, suggesting they are part of a professional cycling event. One cyclist is wearing a pink jersey, typically indicating the leader in certain stage races like the Giro d'Italia. The scene captures a moment of intense competition and teamwork."}
{"id": 1, "api": "openai", "model_name": "gpt-4o", "prompt": [{"role": "user", "content": [{"type": "image_url", "image_url": "mortadella.jpg"}, "what is this?"]}], "parameters": {"n": 1, "temperature": 1, "max_tokens": 100}, "timestamp_sent": "29-10-2024-11-57-38", "response": "This is a slice of mortadella, an Italian sausage made from finely ground pork, studded with small cubes of pork fat and sometimes flavored with spices and pistachios. The larger sausages are usually encased and tied in a rope netting."}
{"id": 2, "api": "openai", "model_name": "gpt-4o", "prompt": [{"role": "user", "content": ["what is in this image?", {"type": "image_url", "image_url": "pantani_giro.jpg"}]}, {"role": "assistant", "content": "This is image shows a group of cyclists."}, {"role": "user", "content": "are there any notable cyclists in this image? what are their names?"}], "parameters": {"n": 1, "temperature": 1, "max_tokens": 100}, "timestamp_sent": "29-10-2024-11-57-40", "response": "Sorry, I can't identify or provide names for the cyclists in this image."}
{"id": 3, "api": "openai", "model_name": "gpt-4o", "prompt": [{"role": "user", "content": [{"type": "text", "text": "What\u2019s in this image?"}, {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"}}]}], "parameters": {"n": 1, "temperature": 1, "max_tokens": 100}, "timestamp_sent": "29-10-2024-11-57-42", "response": "The image shows a wooden boardwalk path through a grassy field or wetland area. The sky is blue with some clouds, and there is lush green vegetation on either side of the path. The scene suggests a natural, serene environment, possibly in a park or nature reserve."}
@@ -0,0 +1,4 @@
{"id": 0, "api": "openai", "model_name": "gpt-4o", "prompt": [{"role": "user", "content": ["describe what is happening in this image", {"type": "image_url", "image_url": "pantani_giro.jpg"}]}], "parameters": {"n": 1, "temperature": 1, "max_tokens": 100}}
{"id": 1, "api": "openai", "model_name": "gpt-4o", "prompt": [{"role": "user", "content": [{"type": "image_url", "image_url": "mortadella.jpg"}, "what is this?"]}], "parameters": {"n": 1, "temperature": 1, "max_tokens": 100}}
{"id": 2, "api": "openai", "model_name": "gpt-4o", "prompt": [{"role": "user", "content": ["what is in this image?", {"type": "image_url", "image_url": "pantani_giro.jpg"}]}, {"role": "assistant", "content": "This is image shows a group of cyclists."}, {"role": "user", "content": "are there any notable cyclists in this image? what are their names?"}], "parameters": {"n": 1, "temperature": 1, "max_tokens": 100}}
{"id": 3, "api": "openai", "model_name": "gpt-4o", "prompt": [{"role": "user", "content": [{"type": "text", "text": "What’s in this image?"}, {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"}}]}], "parameters": {"n": 1, "temperature": 1, "max_tokens": 100}}
@@ -0,0 +1 @@
29-10-2024, 11:57: Completed experiment: openai-multimodal-example.jsonl! Experiment processing time: 10.815 seconds, Average time per query: 2.704 seconds
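The two log lines in this commit report both a total processing time and an average time per query. As a quick consistency check, and assuming the average is simply the total divided by the number of queries (the log itself does not state how it is computed), the reported figures can be compared against that division for the four-prompt example files:

```python
# Reported figures copied from the two log lines: (total seconds, average seconds).
runs = {
    "azure-openai-multimodal-example.jsonl": (13.742, 3.436),
    "openai-multimodal-example.jsonl": (10.815, 2.704),
}
n_queries = 4  # each example input file contains four prompts

# Absolute difference between total / n_queries and the reported average.
diffs = {}
for name, (total, reported_avg) in runs.items():
    computed = total / n_queries
    diffs[name] = abs(computed - reported_avg)
    print(f"{name}: computed {computed:.4f}s, reported {reported_avg}s")
```

Both differences fall below one millisecond, which is consistent with that assumption up to rounding.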