🌐 [i18n-KO] Translated model_doc/paligemma.md to Korean #33612

Merged 7 commits on Oct 9, 2024
Changes from 1 commit
28 changes: 15 additions & 13 deletions docs/source/ko/model_doc/paligemma.md
@@ -16,20 +16,22 @@ rendered properly in your Markdown viewer.

# PaliGemma[[paligemma]]

## Overview[[overview]]
## Overview[[overview]]

The PaliGemma model was proposed in [PaliGemma – Google's Cutting-Edge Open Vision Language Model](https://huggingface.co/blog/paligemma) by Google. It is a 3B vision-language model composed of a [SigLIP](siglip) vision encoder and a [Gemma](gemma) language decoder linked by a multimodal linear projection. It cuts an image into a fixed number of ViT tokens and prepends them to an optional prompt. One particularity is that the model uses full block attention on all the image tokens plus the input text tokens. It comes in 3 resolutions, 224x224, 448x448 and 896x896, with 3 base models, 55 fine-tuned versions for different tasks, and 2 mix models.
PaliGemma was introduced by Google in [PaliGemma – Google's Cutting-Edge Open Vision Language Model](https://huggingface.co/blog/paligemma). It is a 3B-scale vision-language model consisting of a [SigLIP](siglip) vision encoder and a [Gemma](gemma) language decoder, connected by a multimodal linear projection. The model splits an image into a fixed number of ViT tokens and prepends them to an optional prompt; notably, it uses full block attention over all image tokens and the input text tokens.

PaliGemma is available in three resolutions (224x224, 448x448, and 896x896), with 3 base models, 55 versions fine-tuned for different tasks, and 2 mix models.
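
The token layout and attention pattern described above can be sketched in plain Python. This is only an illustration, not the model's implementation: the patch size of 14 pixels is an assumption that the text above does not state, and `num_image_tokens` and `prefix_lm_mask` are hypothetical helper names. The mask treats the image tokens and prompt tokens as one fully visible block, while generated (suffix) tokens attend causally.

```python
def num_image_tokens(resolution: int, patch_size: int = 14) -> int:
    """Number of ViT patch tokens for a square image.

    patch_size=14 is an assumption for illustration, not taken from the text.
    """
    side = resolution // patch_size
    return side * side


def prefix_lm_mask(num_prefix: int, num_suffix: int) -> list[list[bool]]:
    """Build a mask where mask[i][j] is True if position i may attend to j.

    The first num_prefix positions (image tokens + prompt tokens) form one
    fully visible block; the remaining positions are causal.
    """
    total = num_prefix + num_suffix
    mask = []
    for i in range(total):
        row = []
        for j in range(total):
            if j < num_prefix:
                # Every position can see the whole prefix block.
                row.append(True)
            else:
                # Suffix positions see earlier suffix positions and themselves.
                row.append(i >= num_prefix and j <= i)
        mask.append(row)
    return mask


for res in (224, 448, 896):
    print(res, num_image_tokens(res))

# Tiny example: 3 prefix positions, 2 suffix positions.
for row in prefix_lm_mask(num_prefix=3, num_suffix=2):
    print("".join("x" if allowed else "." for allowed in row))
```

Under the patch-size assumption, the three resolutions give 256, 1024, and 4096 image tokens, which is why the higher-resolution checkpoints cost considerably more per image.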

<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/paligemma/paligemma_arch.png"
alt="drawing" width="600"/>

<small> PaliGemma architecture. Taken from the <a href="https://huggingface.co/blog/paligemma">blog post.</a> </small>
<small> PaliGemma architecture. From the <a href="https://huggingface.co/blog/paligemma">blog post.</a> </small>

This model was contributed by [Molbap](https://huggingface.co/Molbap).
This model was contributed by [Molbap](https://huggingface.co/Molbap).

## Usage tips[[usage-tips]]
## Usage tips[[usage-tips]]

Inference with PaliGemma can be performed as follows:
Inference with PaliGemma is performed as follows:
Suggested change
Inference with PaliGemma is performed as follows:
Performing inference with PaliGemma


```python
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration
@@ -47,22 +49,22 @@ output = model.generate(**inputs, max_new_tokens=20)
print(processor.decode(output[0], skip_special_tokens=True)[len(prompt):])
```
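
The final line of the snippet slices the decoded text by `len(prompt)` so that only the generated continuation is printed. A minimal sketch of that step follows, assuming the decoded string begins with the prompt. The `decoded` value is a made-up stand-in rather than real model output, and `strip_prompt` is a hypothetical helper, not part of the library.

```python
def strip_prompt(decoded: str, prompt: str) -> str:
    """Return the generated continuation, dropping the echoed prompt if present."""
    if decoded.startswith(prompt):
        # Same idea as decoded[len(prompt):], plus whitespace cleanup.
        return decoded[len(prompt):].lstrip()
    # Fall back to the full text when the prompt is not echoed verbatim.
    return decoded


prompt = "What is on the flower?"
# Stand-in for processor.decode(output[0], skip_special_tokens=True).
decoded = "What is on the flower?\na bee"
print(strip_prompt(decoded, prompt))
```

Checking `startswith` before slicing avoids silently cutting into the answer if the decoded text does not begin with the exact prompt string.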

- PaliGemma is not meant for conversational use, and it works best when fine-tuned for a specific use case. Some downstream tasks on which PaliGemma can be fine-tuned include image captioning, visual question answering (VQA), object detection, referring expression segmentation and document understanding.
- One can use `PaliGemmaProcessor` to prepare images, text and optional labels for the model. When fine-tuning a PaliGemma model, the `suffix` argument can be passed to the processor which creates the `labels` for the model:
- PaliGemma is not designed for conversational use, and it works best when fine-tuned for a specific use case. Downstream tasks that PaliGemma can be fine-tuned for include image captioning, visual question answering (VQA), object detection, referring expression segmentation, and document understanding.
- `PaliGemmaProcessor` can be used to prepare the images, text, and optional labels the model needs. When fine-tuning a PaliGemma model, pass the `suffix` argument to the processor to create the model's `labels`, as follows:

```python
prompt = "What is on the flower?"
answer = "a bee"
inputs = processor(images=raw_image, text=prompt, suffix=answer, return_tensors="pt")
```
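
To illustrate what creating `labels` from a suffix means, the sketch below builds labels that ignore the prefix (image and prompt positions) and supervise only the suffix. It is a simplified illustration with made-up token ids, not the processor's actual code. `build_labels` is a hypothetical helper, and `-100` is used because it is the default `ignore_index` of PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def build_labels(prefix_ids: list[int], suffix_ids: list[int]) -> tuple[list[int], list[int]]:
    """Concatenate prefix and suffix ids and mask the prefix in the labels."""
    input_ids = prefix_ids + suffix_ids
    # Only the suffix (the answer) contributes to the loss.
    labels = [IGNORE_INDEX] * len(prefix_ids) + suffix_ids
    return input_ids, labels


# Hypothetical token ids: three image placeholders, three prompt tokens,
# then a two-token answer followed by an end-of-sequence token.
prefix_ids = [5, 5, 5, 11, 12, 13]
suffix_ids = [21, 22, 1]

input_ids, labels = build_labels(prefix_ids, suffix_ids)
print(input_ids)
print(labels)
```

Masking the prefix this way means the model is trained to produce the answer given the image and prompt, rather than to reproduce the prompt itself.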

## Resources[[resources]]
## Resources[[resources]]

A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with PaliGemma. If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
A list of Hugging Face and community resources (indicated by 🌎) to help you get started with PaliGemma. If you would like to submit a resource to be included here, please open a PR (Pull Request) and we will review it! A resource should offer something new rather than duplicate an existing one.

- A blog post introducing all the features of PaliGemma can be found [here](https://huggingface.co/blog/paligemma).
- Demo notebooks on how to fine-tune PaliGemma for VQA with the Trainer API along with inference can be found [here](https://github.com/huggingface/notebooks/tree/main/examples/paligemma).
- Demo notebooks on how to fine-tune PaliGemma on a custom dataset (receipt image -> JSON) along with inference can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/PaliGemma). 🌎
- A blog post introducing all the features of PaliGemma can be found [here](https://huggingface.co/blog/paligemma). 🌎
- Demo notebooks on fine-tuning PaliGemma for VQA (Visual Question Answering) with the Trainer API, along with inference, can be found [here](https://github.com/huggingface/notebooks/tree/main/examples/paligemma). 🌎
- Demo notebooks on fine-tuning PaliGemma on a custom dataset (receipt image -> JSON), along with inference, can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/PaliGemma). 🌎

## PaliGemmaConfig[[transformers.PaliGemmaConfig]]
