feat: Initial contribution for weasyprint as a service
Refs: #DEV-11712
nirikash committed May 23, 2024
1 parent 6fcb60e commit 077c2ed
Showing 6 changed files with 274 additions and 19 deletions.
21 changes: 21 additions & 0 deletions Dockerfile
@@ -0,0 +1,21 @@
FROM python:3.12.3-slim
LABEL maintainer="Team Polarion (CLEW/WZU/POLARION) <[email protected]>"

RUN apt-get update && \
    apt-get --yes --no-install-recommends install python3-cffi python3-brotli libpango-1.0-0 libpangoft2-1.0-0 fonts-liberation chromium && \
    apt-get clean autoclean && \
    apt-get --yes autoremove && \
    rm -rf /var/lib/apt/lists/*

ENV WORKING_DIR=/opt/weasyprint
ENV CHROME_EXECUTABLE_PATH=/usr/bin/chromium

WORKDIR ${WORKING_DIR}

COPY requirements.txt ${WORKING_DIR}/requirements.txt

RUN pip install --no-cache-dir -r ${WORKING_DIR}/requirements.txt

COPY ./app/*.py ${WORKING_DIR}/app/

ENTRYPOINT [ "python", "app/WeasyprintServiceApplication.py" ]
95 changes: 76 additions & 19 deletions README.md
@@ -1,36 +1,93 @@
# Polarion ALM extension to <...>
# WeasyPrint Service
Service providing REST API to use WeasyPrint functionality

This Polarion extension provides possibility to <...>
## Build
## Build Docker image

This extension can be produced using maven:
```
mvn clean package
```bash
docker build \
--file Dockerfile \
--tag weasyprint-service:61.2.0 .
```

## Installation to Polarion
## Start Docker container

To install the extension to Polarion `ch.sbb.polarion.extension.<extension_name>-<version>.jar`
should be copied to `<polarion_home>/polarion/extensions/ch.sbb.polarion.extension.<extension_name>/eclipse/plugins`
It can be done manually or automated using maven build:
```bash
docker run --detach \
--publish 9080:9080 \
--name weasyprint-service \
weasyprint-service:61.2.0
```
mvn clean install -P polarion2304,install-to-local-polarion

## Stop Docker container

```bash
docker container stop weasyprint-service
```
For automated installation with maven env variable `POLARION_HOME` should be defined and point to folder where Polarion is installed.

Changes only take effect after restart of Polarion.
## Access service
WeasyPrint Service provides the following endpoints:

------------------------------------------------------------------------------------------
#### Getting version info
<details>
<summary>
<code>GET</code> <code>/version</code>
</summary>

##### Responses

> | HTTP code | Content-Type | Response |
> |-----------|--------------------|-------------------------------------------|
> | `200` | `application/json` | `{"python":"3.12.3","weasyprint":"61.2"}` |
##### Example cURL

> ```bash
> curl -X GET -H "Content-Type: application/json" http://localhost:9080/version
> ```
</details>
------------------------------------------------------------------------------------------
#### Convert HTML to PDF
<details>
<summary>
<code>POST</code> <code>/convert/html</code>
</summary>
## Polarion configuration
##### Parameters
<...>
> | Parameter name | Type | Data type | Description |
> |----------------------|----------|-----------|----------------------------------------------------------------------|
> | encoding | optional | string | Encoding of provided HTML (default: utf-8) |
> | media_type | optional | string | WeasyPrint media type (default: print) |
> | file_name | optional | string | Output filename (default: converted-document.pdf) |
> | presentational_hints | optional | string | WeasyPrint option: Follow HTML presentational hints (default: False) |
> | base_url | optional | string | Base URL to resolve relative resources (default: None) |
##### Responses
## Extension Configuration
> | HTTP code | Content-Type | Response |
> |-----------|-------------------|-------------------------------|
> | `200` | `application/pdf` | PDF document (binary data) |
> | `400` | `plain/text` | Error message with exception |
> | `500` | `plain/text` | Error message with exception |
<...>
##### Example cURL
> ```bash
> curl -X POST -H "Content-Type: text/html" --data-binary @input_html http://localhost:9080/convert/html --output output.pdf
> ```
## Usage
</details>
<...>
------------------------------------------------------------------------------------------
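A minimal Python sketch of a client for the `/convert/html` endpoint described above, using only the standard library. The helper names `build_convert_url` and `convert_html` and the `http://localhost:9080` address are assumptions for illustration; the query parameters are the ones listed in the parameters table.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_convert_url(service_url, **params):
    """Build the /convert/html URL, dropping parameters that are None."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = service_url.rstrip("/") + "/convert/html"
    return url + "?" + query if query else url


def convert_html(html, service_url="http://localhost:9080", **params):
    """POST an HTML string to the service and return the PDF bytes."""
    # The body must be encoded with the same encoding announced in the query
    encoding = params.get("encoding") or "utf-8"
    request = Request(
        build_convert_url(service_url, **params),
        data=html.encode(encoding),
        headers={"Content-Type": "text/html"},
        method="POST",
    )
    with urlopen(request) as response:
        return response.read()
```

With a container running as shown earlier, `convert_html("<h1>Hello</h1>", file_name="hello.pdf")` should return the PDF as bytes, which can then be written to a file opened in binary mode.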
## Changelog
| Version | Changes |
|---------|----------------------------------------------|
| v1.2.0 | Replacing svg images with png using chromium |
| v1.1.0 | Repository refactored + readme updated |
| v1.0.0 | Initial contribution |
99 changes: 99 additions & 0 deletions app/SvgUtils.py
@@ -0,0 +1,99 @@
import base64
import logging
import os
import re
import subprocess
import tempfile
from uuid import uuid4

NON_SVG_CONTENT_TYPES = ('image/jpeg', 'image/png', 'image/gif')


# Process img tags, replacing base64 SVG images with PNGs
def process_svg(html):
    pattern = re.compile(r'<img(?P<intermediate>[^>]+?src="data:)(?P<type>[^;>]*?);base64, (?P<base64>[^">]*?)"')
    return re.sub(pattern, replace_img_base64, html)


def replace_img_base64(match):
    entry = match.group(0)
    content_type = match.group('type')
    if content_type in NON_SVG_CONTENT_TYPES:
        return entry  # Skip known raster types; they cannot be SVG
    # 'image/svg+xml' is not required, because not all systems set the content type properly
    content_base64 = match.group('base64')
    replaced_content_base64 = replace_svg_with_png(content_base64)
    if replaced_content_base64 == content_base64:
        # Content was not replaced (e.g. it was not an SVG or the conversion failed)
        return entry
    # The payload is now a PNG, so label it as such
    return f'<img{match.group("intermediate")}image/png;base64, {replaced_content_base64}"'


# Checks that base64 encoded content is a svg image and replaces it with the png screenshot made by chrome
def replace_svg_with_png(possible_svg_base64_content):
    try:
        svg_content = base64.b64decode(possible_svg_base64_content).decode('utf-8')
    except ValueError:
        # Invalid base64 or non UTF-8 data, so this cannot be an SVG document
        return possible_svg_base64_content

    # Fast check that this is a svg
    if '</svg>' not in svg_content:
        return possible_svg_base64_content

    chrome_executable = os.environ.get('CHROME_EXECUTABLE_PATH')
    if not chrome_executable:
        logging.error('CHROME_EXECUTABLE_PATH not set')
        return possible_svg_base64_content

    # Fetch width & height from root svg tag
    match = re.search(r'<svg[^>]+?width="(?P<width>[\d.]+)', svg_content)
    if match:
        width = match.group('width')
    else:
        logging.error('Cannot find svg width in ' + svg_content)
        return possible_svg_base64_content

    match = re.search(r'<svg[^>]+?height="(?P<height>[\d.]+)', svg_content)
    if match:
        height = match.group('height')
    else:
        logging.error('Cannot find svg height in ' + svg_content)
        return possible_svg_base64_content

    # Will be used as a name for tmp files
    uuid = str(uuid4())

    temp_folder = tempfile.gettempdir()
    svg_filepath = os.path.join(temp_folder, uuid + '.svg')
    png_filepath = os.path.join(temp_folder, uuid + '.png')

    try:
        # Put svg into tmp file
        with open(svg_filepath, 'w', encoding='utf-8') as f:
            f.write(svg_content)

        # Feed svg file to chrome
        result = subprocess.run([
            chrome_executable,
            '--headless',
            '--no-sandbox',
            '--default-background-color=00000000',
            '--hide-scrollbars',
            f'--screenshot={png_filepath}',
            f'--window-size={width},{height}',
            svg_filepath,
        ])

        # Check the result before reading the screenshot, which may not exist on failure
        if result.returncode != 0 or not os.path.exists(png_filepath):
            logging.error('Error converting to png')
            return possible_svg_base64_content

        # Get resulting screenshot content
        with open(png_filepath, 'rb') as img_file:
            return base64.b64encode(img_file.read()).decode('utf-8')
    finally:
        # Remove tmp files, also when the conversion failed
        for path in (svg_filepath, png_filepath):
            if os.path.exists(path):
                os.remove(path)
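
To illustrate what the `process_svg` pattern above matches, the following self-contained sketch applies the same regular expression to two sample `<img>` tags. The sample HTML and the `find_data_uri_images` helper are made up for illustration. Note that the pattern requires a space after `base64,`, so data URIs without that space are left untouched.

```python
import base64
import re

# Same pattern as in process_svg
IMG_PATTERN = re.compile(r'<img(?P<intermediate>[^>]+?src="data:)(?P<type>[^;>]*?);base64, (?P<base64>[^">]*?)"')


def find_data_uri_images(html):
    """Return (content type, decoded bytes) for every <img> tag the pattern matches."""
    return [(m.group('type'), base64.b64decode(m.group('base64'))) for m in IMG_PATTERN.finditer(html)]


svg_b64 = base64.b64encode(b'<svg width="10" height="10"></svg>').decode('ascii')
with_space = f'<img alt="logo" src="data:image/svg+xml;base64, {svg_b64}"/>'
without_space = f'<img alt="logo" src="data:image/svg+xml;base64,{svg_b64}"/>'

print(find_data_uri_images(with_space))     # one match
print(find_data_uri_images(without_space))  # no match, the space is missing
```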
58 changes: 58 additions & 0 deletions app/WeasyprintController.py
@@ -0,0 +1,58 @@
import logging
import platform
from urllib.parse import unquote

import weasyprint
from flask import Flask, Response, request
from gevent.pywsgi import WSGIServer

import SvgUtils

app = Flask(__name__)


@app.route("/version", methods=["GET"])
def version():
    return {
        "python": platform.python_version(),
        "weasyprint": weasyprint.__version__
    }


@app.route("/convert/html", methods=["POST"])
def convert_html():
    try:
        encoding = request.args.get("encoding", default="utf-8")
        media_type = request.args.get("media_type", default="print")
        file_name = request.args.get("file_name", default="converted-document.pdf")
        # Query values are strings, so "False" would be truthy; parse the flag explicitly
        presentational_hints = request.args.get("presentational_hints", default="false").lower() == "true"

        base_url = request.args.get("base_url", default=None)
        if base_url:
            base_url = unquote(base_url, encoding=encoding)

        html = request.get_data().decode(encoding)
        html = SvgUtils.process_svg(html)
        weasyprint_html = weasyprint.HTML(string=html, base_url=base_url, media_type=media_type, encoding=encoding)
        output_pdf = weasyprint_html.write_pdf(presentational_hints=presentational_hints)

        response = Response(output_pdf, mimetype="application/pdf", status=200)
        response.headers.add("Content-Disposition", "attachment; filename=" + file_name)
        return response

    except AssertionError as e:
        return process_error(e, "Assertion error, check the request body html: " + str(e), 400)
    except (UnicodeDecodeError, LookupError) as e:
        return process_error(e, "Cannot decode request html body: " + str(e), 400)
    except Exception as e:
        return process_error(e, "Unexpected error while converting to PDF: " + str(e), 500)


def process_error(e, err_msg, status):
    logging.exception(e)
    return Response(err_msg, mimetype="plain/text", status=status)


def start_server(port):
    http_server = WSGIServer(("", port), app)
    http_server.serve_forever()
15 changes: 15 additions & 0 deletions app/WeasyprintServiceApplication.py
@@ -0,0 +1,15 @@
import argparse
import logging

import WeasyprintController

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Weasyprint service")
    parser.add_argument("--port", default=9080, type=int, required=False, help="Service port")
    args = parser.parse_args()

    # Temporarily raise verbosity so the startup message is shown
    logging.getLogger().setLevel(logging.INFO)
    logging.info("Weasyprint service listening port: " + str(args.port))
    logging.getLogger().setLevel(logging.WARN)

    # Blocks until the server is stopped; there is no return value to keep
    WeasyprintController.start_server(args.port)
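
The `--port` option above can be checked without starting the server by handing `parse_args` an explicit argument list. This is a small sketch; `build_parser` is a hypothetical helper that only repeats the parser definition from the script.

```python
import argparse


def build_parser():
    """Recreate the command-line parser used by the service entry point."""
    parser = argparse.ArgumentParser(description="Weasyprint service")
    parser.add_argument("--port", default=9080, type=int, required=False, help="Service port")
    return parser


print(build_parser().parse_args([]).port)                  # 9080
print(build_parser().parse_args(["--port", "8080"]).port)  # 8080
```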
5 changes: 5 additions & 0 deletions requirements.txt
@@ -0,0 +1,5 @@
###### Requirements without Version Specifiers ######
flask
gevent
###### Requirements with Version Specifiers ######
weasyprint==61.2
