Merge branch 'main' into Socvest-main
# Conflicts:
#	README.md
lfoppiano committed Jan 4, 2025
2 parents 424e90d + 33d60dc commit 62ddf0f
Showing 4 changed files with 95 additions and 28 deletions.
2 changes: 1 addition & 1 deletion .bumpversion.toml
@@ -1,5 +1,5 @@
[tool.bumpversion]
-current_version = "0.0.19"
+current_version = "0.0.20"
commit = "true"
tag = "true"
tag_name = "v{new_version}"
82 changes: 61 additions & 21 deletions README.md
@@ -14,12 +14,13 @@ You can see an [application](https://github.com/lfoppiano/structure-vision) in a
## Features
- Show PDF files in a Streamlit application with a simple command
- Based on the pdf.js library
-- Support showing the PDF with the native pdf.js browser's viewer: "legacy" (with limitations, no annotations, no scrolling, etc..)
- Visualize annotations on top of the PDF documents
- Render text on top of the PDF document, allowing copy-paste
- Render only specific pages of the PDF document
- Scroll to a specific page
- Scroll to a specific annotation
+- Allow custom callbacks when an annotation is clicked
+- Additional support for showing PDF documents with the browser's native pdf.js viewer ("legacy" rendering), with limitations: no annotations, no scrolling, etc.

## Limitations
- Tested and developed to support Firefox and Chrome.
@@ -28,6 +29,16 @@ You can see an [application](https://github.com/lfoppiano/structure-vision) in a
- The component is still in development, so expect some bugs and limitations
- Streamlit reruns the script on every action, so complex PDF documents may render slowly

+## Caveats
+Here are some caveats to be aware of:
+- It is mandatory to specify a `width` to show PDF documents in tabs and expanders; otherwise, the viewer will not be displayed in tabs that are not immediately visible.
+- From version 0.0.16, the behavior for managing width and height has changed:
+  - If only the height is specified, the PDF document is shown with a width proportional to the PDF page dimensions.
+  - The possibility to show a large view of half the PDF is no longer available (let's face it, it was not very useful).
+  - If you need to use all the available space and limit the height, you can wrap `pdf_viewer()` in a `st.component(width:...)` that sets the width.
+- The `legacy` rendering is not supported on Chrome, for security reasons.
+
+

## Getting started

@@ -44,31 +55,24 @@ from streamlit_pdf_viewer import pdf_viewer
pdf_viewer("str, path or bytes")
```

-### Caveats
-
-Here some caveats to be aware of:
-- Is mandatory to specify a `width` to show PDF document on tabs and expanders, otherwise, the viewer will not be displayed on tabs not immediately visible.
-- The `legacy` rendering is not supported on Chrome, due to security reasons.
-
-
### Params

The following table lists the parameters that can be passed to the `pdf_viewer` function:

-| name | description |
-|-------------------------|------------------------------------------------------------|
-| input | The source of the PDF file. Accepts a file path, URL, or binary data. |
+| name | description |
+|---------------------------|------------------------------------------------------------|
+| input | The source of the PDF file. Accepts a file path, URL, or binary data. |
| width | Width of the PDF viewer in pixels. Defaults to 700 pixels. It can also be set to a percentage string, for example `"90%"`, which renders the PDF at 90% of the container/window/screen width. If the PDF is wider than the screen, the viewer scrolls horizontally. |
-| height | Height of the PDF viewer in pixels. If not provided, the viewer shows the whole content. |
-| annotations | A list of annotations to be overlaid on the PDF. Format is described [here](#annotation-format). |
-| pages_vertical_spacing | The vertical space (in pixels) between each page of the PDF. Defaults to 2 pixels. |
-| annotation_outline_size | Size of the outline around each annotation in pixels. Defaults to 1 pixel. |
-| rendering | Type of rendering: `unwrap` (default), `legacy_iframe`, or `legacy_embed`. The default value, `unwrap` shows the PDF document using pdf.js, and supports the visualisation of annotations. Other values are `legacy_iframe` and `legacy_embed` which use the legacy approach of injecting the document into an `<embed>` or `<iframe>`. They allow viewing the PDF using the viewer of the browser that contains additional features we are still working to implement in this component. **IMPORTANT**: :warning: The "legacy" methods **work only with Firefox**, and **do not support annotations**. :warning: |
-| pages_to_render | Filter the rendering to a specific set of pages. By default, all pages are rendered. |
-| render_text | Enable a layer of text on top of the PDF document. The text may be selected and copied. **NOTE** to avoid breaking existing deployments, we made this optional at first, also considering that having many annotations might interfere with the copy-paste.
-| scroll_to_page | Scroll to a specific page when the component is rendered. Default is None. Require ints and ignores the parameters below zero. |
-| scroll_to_annotation | Scroll to a specific annotation when the component is rendered. Default is None. Mutually exclusive with `scroll_to_page`. Raise an exception if used with `scroll_to_page` |
-
+| height | Height of the PDF viewer in pixels. If not provided, the viewer shows the whole content. |
+| annotations | A list of annotations to be overlaid on the PDF. Format is described [here](#annotation-format). |
+| pages_vertical_spacing | The vertical space (in pixels) between each page of the PDF. Defaults to 2 pixels. |
+| annotation_outline_size | Size of the outline around each annotation in pixels. Defaults to 1 pixel. |
+| rendering | Type of rendering: `unwrap` (default), `legacy_iframe`, or `legacy_embed`. The default value, `unwrap`, shows the PDF document using pdf.js and supports the visualisation of annotations. The other values, `legacy_iframe` and `legacy_embed`, use the legacy approach of injecting the document into an `<iframe>` or an `<embed>`, respectively. They display the PDF with the browser's own viewer, which offers additional features we are still working to implement in this component. **IMPORTANT**: :warning: The "legacy" methods **work only with Firefox**, and **do not support annotations**. :warning: |
+| pages_to_render | Filter the rendering to a specific set of pages. By default, all pages are rendered. |
+| render_text | Enable a layer of text on top of the PDF document. The text may be selected and copied. **NOTE**: to avoid breaking existing deployments, we made this optional at first, also considering that many annotations might interfere with copy-paste. |
+| scroll_to_page | Scroll to a specific page when the component is rendered. Defaults to `None`. Requires an int; values below zero are ignored. |
+| scroll_to_annotation | Scroll to a specific annotation when the component is rendered. Defaults to `None`. Mutually exclusive with `scroll_to_page`: an exception is raised if both are set. |
+| on_annotation_click | Callback function that is called when an annotation is clicked. The function receives the clicked annotation as its parameter. |

### Annotation format
The annotations format has been derived from the [Grobid's coordinate formats](https://grobid.readthedocs.io/en/latest/Coordinates-in-PDF/), which are described as a list of "bounding boxes".
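A minimal sketch, assuming Grobid's `coords` attribute lists bounding boxes as `page,x,y,width,height` with boxes separated by `;`: the hypothetical helper `grobid_coords_to_annotations` below turns such a string into dictionaries with the `page`, `x`, `y`, `width`, `height`, and `color` keys used in this README's custom-callback example. The helper name and the default `"red"` color are illustrative choices, not part of `streamlit_pdf_viewer`.

```python
from typing import Dict, List, Union

Annotation = Dict[str, Union[int, float, str]]


def grobid_coords_to_annotations(coords: str, color: str = "red") -> List[Annotation]:
    """Convert a Grobid-style coords string into annotation dicts (hypothetical helper).

    Assumes each box is "page,x,y,width,height" and boxes are separated by ";".
    """
    annotations: List[Annotation] = []
    for box in coords.split(";"):
        box = box.strip()
        if not box:
            # Tolerate a trailing ";" or empty segments.
            continue
        parts = box.split(",")
        if len(parts) != 5:
            raise ValueError(f"Expected 5 comma-separated values, got {box!r}")
        page, x, y, width, height = parts
        annotations.append(
            {
                "page": int(page),
                "x": float(x),
                "y": float(y),
                "width": float(width),
                "height": float(height),
                "color": color,
            }
        )
    return annotations


if __name__ == "__main__":
    boxes = grobid_coords_to_annotations("1,220,155,65,22;2,100.5,80,40,12")
    print(boxes[0])
```

The resulting list could then be passed as `annotations=` to `pdf_viewer`.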
@@ -91,6 +95,42 @@ Here an example:

The example shown in our screenshot can be found [here](resources/annotations.json).

+### Custom callback for clicking on annotations
+
+```python
+from streamlit_pdf_viewer import pdf_viewer
+
+# Annotations to overlay on the PDF
+annotations = [
+    {
+        "page": 1,
+        "x": 220,
+        "y": 155,
+        "height": 22,
+        "width": 65,
+        "color": "red"
+    },
+    {
+        "page": 1,
+        "x": 220,
+        "y": 155,
+        "height": 22,
+        "width": 65,
+        "color": "red"
+    }
+]
+
+
+def my_custom_annotation_handler(annotation):
+    # Called with the clicked annotation; print() writes to the server console
+    print(f"Annotation {annotation} clicked.")
+
+
+pdf_viewer(
+    "path/to/pdf",
+    on_annotation_click=my_custom_annotation_handler,
+    annotations=annotations
+)
+```

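The custom-callback example above only prints the clicked annotation. Because Streamlit re-executes the script on every interaction, a list created in the script would start empty on each rerun, so the hypothetical sketch below records clicks in a caller-supplied mutable mapping such as `st.session_state`. The helper name `make_click_recorder` and the `clicked_annotations` key are assumptions for illustration; exactly when `pdf_viewer` invokes the callback is not specified here.

```python
from collections.abc import MutableMapping
from typing import Any, Callable, Dict

Annotation = Dict[str, Any]


def make_click_recorder(
    store: MutableMapping, key: str = "clicked_annotations"
) -> Callable[[Annotation], None]:
    """Return a callback that appends each clicked annotation to store[key].

    `store` can be a plain dict (e.g. in tests) or, in a Streamlit app,
    st.session_state, whose contents survive reruns.
    """

    def on_click(annotation: Annotation) -> None:
        # Create the history list lazily on the first click.
        if key not in store:
            store[key] = []
        # Append a shallow copy so later changes to the dict do not rewrite history.
        store[key].append(dict(annotation))

    return on_click


# Usage inside a Streamlit script (assumes streamlit and streamlit_pdf_viewer are installed):
#
#   import streamlit as st
#   from streamlit_pdf_viewer import pdf_viewer
#
#   pdf_viewer(
#       "path/to/pdf",
#       annotations=annotations,
#       on_annotation_click=make_click_recorder(st.session_state),
#   )
#   st.write(st.session_state.get("clicked_annotations", []))
```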

## Developers notes

### Environment