ApacheFOP.Serverless -- Quality PDF Rendering-as-a-service for any Environment!

ApacheFOP.Serverless is a ready to use server-less implementation of Apache FOP via Azure Functions. This provides an easy to use REST API micro-service for dynamically rendering quality PDF binary outputs from XSL-FO source using Apache FOP.

When combined with the ease and simplicity of Azure Functions this project is a powerful, efficient, and scalable PDF Reporting Service that generates high quality, true paged media, reports for any environment and any client technology (.Net, NodeJS/JavaScript, Ruby, Mobile iOS/Android, Powershell, even Windows/Mac apps, etc.)!

You should be able to pull this code down and be up and running quickly & easily with IntelliJ or VS Code (after installing pre-requisites), or even just clone the repo and deploy directly to your Azure Subscription via GitHub Actions with no local Java needed.

Give Star 🌟

If you like this project and/or use it then please give it a Star 🌟 (c'mon it's free, and it'll help others find the project)!

I'm happy to share with the community, but if you find this useful (e.g. for professional use), and are so inclined, then I do love-me-some-coffee!

Buy Me A Coffee

Updates / Change Log

Updated the project to v1.5 with the following:
  • Add support to read the Accessibility flag correctly from ApacheFOP configuration as noted in the documentation; a bug exists where the value is not loaded so we manually support this now in a way that is fully compliant with the documentation.
    • The original support from Azure Function configuration (environment variable) is still supported also.
  • Several small code improvements for consistency
  • Additional debugging log added to better know if rendering process was completed (e.g. logs SUCCESS along with Pdf Byte Size).
Updated the project to v1.4 with the following:
  • Added support for running, debugging, and deploying from within VS Code as well as IntelliJ IDEA.
    • Both project types use folder context configuration, so all configuration files have now been included and checked into the Repository.
    • This should make it easier to get up and running quickly with either IDE.
  • Resolved a bug in the Font loading/path handling when running in Windows Host (due to existing font paths).
  • Updated Microsoft's azure-functions-maven-plugin to address various issues (esp. the need for a GUID in the deployment name which broke VS Code's ability to debug).
  • Pom.xml cleanup to eliminate various "Problems" flagged by VS Code's pom parsing (using M2Eclipse processor).
  • Various small code cleanup items as noted in VS Code Java "Problems" tab.
Updated the project to v1.3 with the following:
  • Added support for Azure Function configuration capability to enable Accessibility since Apache FOP <accessibility> xml config element is not working as of v2.6.
    • Added an XslFO markup sample to test/demonstrate Accessibility in resources/samples/WorkinWithAccessibilitySample.fo.
    • Updated KeepinItWarm.fo to run correctly when Accessibility is enabled.
  • Added in-memory caching of Java embedded resources that are resolved (e.g. Fonts) for performance.
  • Code cleanup.
Updated the project to v1.2 with the following:
  • Added support for Custom Font integration as Resource Files in the project and deployed with the JAR!
    • This enables adding fonts easily by simply dropping them in the resources/fonts folder, and then registering them via configuration in the apache-fop-config.xml according to Apache FOP Font Documentation.
    • Added a couple of sample (free) custom fonts and sample markup resources/samples/WorkinWithFontsSample.fo in the project.
  • Fixed bug in the render Event Log debug details returned in the Http Header whereby Apache FOP may send Unicode Characters but only ASCII are allowed; Unicode are now safely escaped into their respective Hex Codes so that the message is readable.
  • Fixed issue in Maven build to enforce the clean stage so the artifact always contains the latest changes (e.g. especially physical resource file changes) when debugging.
  • Some miscellaneous code cleanup.
Updated the project to v1.1 as it now incorporates:
  • Upgraded the project to use Java v11 now as the latest long term supported (LTS) version for Azure Functions (aka Zulu Java v11)
    • Previously was Java 8 (v1.8) (aka Zulu Java v8).
  • Bumping the versions of all dependencies to the latest stable versions
  • Bumping the Apache FOP version to v2.6 (just released in Jan 2021)
  • Adding support for configuration Xml to fully configure ApacheFOP Factory by editing /src/main/resources/apache-fop-config.xml as needed.
    • The configuration will be bundled and deployed with the application.
  • Now includes an existing apache-fop-config.xml file which enables Font 'auto-detect' feature for much better Font support.
  • Removed the dependency on the com.sun.deploy.net.HttpRequest import: importing it no longer compiles on the latest versions of IntelliJ IDEA and is a bad practice, and it added little value since only one constant (ACCEPT_ENCODING) was needed.
  • All Header and Content-Type constants are now self-contained so no additional dependencies are needed.
  • Notable cleanup & optimization of the Pom.xml
  • Implemented a fix for a possible deployment risk when AppName and ResourceGroupName values are not unique with the azure-functions-maven-plugin

Technical Summary:

This project provides a REST API that receives a POST body containing a well formed Xsl-FO Xml document (like these Apache FOP samples). The service will respond with the rendered Pdf binary (file bytes).

If an error occurs -- likely due to incorrect Xsl-FO syntax or structure -- then an Http 500 Response will be returned with a JSON payload containing details about the error. For issues with the Xsl-FO parsing/processing, ApacheFOP generally provides very helpful info. about the line, location, and markup that caused the error so that it can be resolved.

Azure Function API:

  • Path: /api/apache-fop/xslfo
  • Request Type: POST
  • Request Body: Xsl-FO Content as valid Xml to Render

Responses:

  • Http 200-OK: Binary File payload containing the rendered Pdf File
  • Http 500-InternalServerError -- Json response payload with ApacheFOP processing error details.
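For non-.NET callers, the request and response contract above can be sketched with the JDK 11+ built-in HTTP client. This is only a minimal illustration, not code from this project: the class and method names are hypothetical, the application/xml content type mirrors what the .NET snippet further down sends, and the base address is whatever host your function runs on (http://localhost:7071 is the usual Azure Functions Core Tools default and is assumed in the usage comment).

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.time.Duration;

/**
 * Hypothetical helper (not part of ApacheFOP.Serverless) that builds the POST
 * request for the render endpoint and interprets the two documented responses.
 *
 * Usage sketch (performs a real network call, so the service must be running):
 *   HttpRequest request = XslFoRenderRequests.buildRenderRequest("http://localhost:7071", xslFoXml);
 *   HttpResponse<byte[]> response = HttpClient.newHttpClient()
 *           .send(request, HttpResponse.BodyHandlers.ofByteArray());
 *   byte[] pdf = XslFoRenderRequests.pdfOrThrow(response.statusCode(), response.body());
 */
final class XslFoRenderRequests {

    // Endpoint path as documented in this README.
    static final String RENDER_PATH = "/api/apache-fop/xslfo";

    /** Builds a POST request whose body is the XSL-FO markup encoded as UTF-8 XML. */
    static HttpRequest buildRenderRequest(String baseAddress, String xslFoXml) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseAddress + RENDER_PATH))
                .timeout(Duration.ofSeconds(60)) // assumed timeout; tune for large reports
                .header("Content-Type", "application/xml; charset=utf-8")
                .POST(HttpRequest.BodyPublishers.ofString(xslFoXml, StandardCharsets.UTF_8))
                .build();
    }

    /** Returns the PDF bytes on HTTP 200; otherwise surfaces the response body as the error text. */
    static byte[] pdfOrThrow(int statusCode, byte[] body) {
        if (statusCode == 200) {
            return body;
        }
        // Per the README, an HTTP 500 body is a JSON payload with ApacheFOP error details.
        throw new IllegalStateException(
                "Render failed (HTTP " + statusCode + "): " + new String(body, StandardCharsets.UTF_8));
    }
}
```

If the function is deployed with key-based authorization, the key would also need to be supplied; the .NET snippet later in this README passes it as a `code` query parameter.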

Postman Example:

Project Overview:

Generating high quality printable PDF outputs from a highly flexible pdf templating approach (separating content/data from presentation) hasn't been easy in the world of .NET -- vs the world of Java where ApacheFOP has been around for a very long time.

For a more exhaustive dive into why PDF templating and markup based solutions are more powerful than report designer based solutions -- in today's modern web apps -- I ramble on about that over here in:

Suffice it to say that markup based solutions have a lot of value, and Xsl-FO is still one of the best ways to maintain strong software development practices by rendering PDF outputs (as a presentation output) from separated content/data + template. And Xsl-FO offers features that some approaches just can't do (looking at you Crystal Reports).

There has been a fully managed .NET C# port of Apache FOP (FO.Net) based on a pre-v1.0 version (is my guesstimate); it's old & unsupported, but still fairly functional, and I've used it very successfully on several projects. But Apache FOP is now on v2.5 as of May 2020, with annual/bi-annual support updates still being released!

So my goal has been, for a while, to take advantage of the many great innovations in the past several years to provide an interoperable integration between Java Apache FOP and .NET, without resorting to something that makes my eyes cross (ugh).

Taking advantage of some awesome new innovations (in Azure) we can do this in a much cleaner way using:

  • C# .NET for Templating using Razor (anything other than native Java will benefit from this)
  • Microservice for integration architecture
  • REST API for interoperability
  • Azure Function for native Java Support
  • Apache FOP for the free & robust XSL-FO processing implementation

So that's the goal of this project, to provide a ready-to-deploy microservice implementation of Apache FOP via Azure Functions. And enable others to pull the code down, deploy to their Azure subscription/resource group and be up and running in minutes -- especially those less experienced in Java.

I'm definitely not the only person to think of this, and definitely gleaned some insight from this article written by Marc Mathijssen....

But, I did find that article left a lot of details unclear, and would expect that a Java novice may really struggle to get it up and running, especially given the dependency on the command line tutorial and the lack of guidance on a Java IDE, how to Debug/Test, etc.

And ultimately, it didn't provide any insight on how to configure Maven correctly (I can hear some devs asking "what's Maven" right now) or any code at all really. So, Kudos for the intro, but I hope this helps to bring it across the goal line!

Conceptual Diagram - PDF-as-a-service:

Conceptual Diagram - PDF Templating:

Getting Started:

Here are the high-level steps to get started...

Want to run it locally?

  1. First get your Java IDE (IntelliJ IDEA or VS Code) set up with Azure Toolkit!
  2. This project has already been configured using an Azure Functions Maven archetype, and all necessary dependencies for ApacheFOP, Apache Commons libraries, etc. have already been configured correctly.
  3. Once IntelliJ or VS Code is up and running, you can pull down the Repo, and then just open the apachefop-serverless-az-func as the root project -- I just right click and select:
    • IntelliJ: "Open Folder as IntelliJ Community Edition Project".
    • VS Code: "Open with Code"
  4. It's usually a good idea to reload the Maven pom.xml and kick off a build to load all dependencies by running/executing the package phase in the Maven console of either IntelliJ or VS Code.
  5. Run the function app locally and debug via IntelliJ or VS Code . . . click the good-ole Debug/Run Icon and fire up the micro-service locally.
    • Yes, this will fully support local execution, testing, and debugging!
  6. Install Postman/Insomnia/etc. and play with it by posting your Xsl-FO Markup to the Service and seeing your PDF be returned (see above screenshots)...
  7. Finally, Deploy to Azure using the Azure Toolkit via IntelliJ or VS Code whenever you're ready...

Just want it Running in Azure and don't want to bother with any local installations?

If you'd rather just deploy directly to Azure, then there's some info on using GitHub Actions to do just that, with no local installation required, shared over here: #3 (comment)

Additional Features:

GZIP Compression:

In addition to rendering the PDF, this service already implements GZIP compression for returning the Binary PDF files. This can greatly improve performance and download times, especially for PDFs rendered without much imagery, as the text content bytes tend to be more compressible.

All you need to do is add the Accept-Encoding header with a value of gzip to the request:

Accept-Encoding: gzip
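Keep in mind that some HTTP clients do not transparently decompress gzip responses (the JDK's java.net.http.HttpClient, for one, leaves that to the caller). A minimal sketch of handling this on the client side follows; the class name is hypothetical, and it assumes the service marks compressed responses with a Content-Encoding header of gzip, which you should confirm against your deployment.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;

/** Hypothetical client-side helper for unpacking a gzip-compressed PDF response. */
final class GzipBodies {

    /**
     * Returns the raw PDF bytes. The body is inflated only when the response's
     * Content-Encoding header value is "gzip"; otherwise it is returned unchanged.
     */
    static byte[] decodeBody(byte[] body, String contentEncoding) {
        if (contentEncoding == null || !contentEncoding.trim().equalsIgnoreCase("gzip")) {
            return body; // not compressed (or an encoding this sketch does not handle)
        }
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(body))) {
            return in.readAllBytes();
        } catch (IOException e) {
            // Surface corrupt or truncated gzip data as an unchecked failure.
            throw new UncheckedIOException(e);
        }
    }
}
```

With the JDK client, the header value could be read via `response.headers().firstValue("Content-Encoding").orElse(null)` and passed in together with the response bytes.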

Custom Fonts via Resource Files:

Add Font Files (*.ttf, *.otf, etc.)

To easily utilize custom fonts with the Azure Functions deployment, this project provides the ability to simply add them into the project as resource files by simply placing them in the src/main/resources/fonts folder. So you can literally just copy them into the project and deploy. In IntelliJ IDEA the structure will look like:

Once there, they can be resolved at runtime by the application even after being deployed to Azure Functions, because they will be embedded resources within the JAR file. ApacheFOP.Serverless has a custom ResourceResolver implementation that locates them via the relative path used when registering the fonts via configuration in the apache-fop-config.xml according to Apache FOP Font Documentation.
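To illustrate the general idea of resolving embedded resources by relative path (and caching them in memory, as mentioned in the v1.3 change log), here is a simplified, JDK-only sketch. It is not the project's actual ResourceResolver and deliberately avoids the Apache FOP interfaces; the class name and the font path used in the comments are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical sketch: loads embedded resources by relative path and caches the bytes. */
final class CachedResourceLoader {

    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();
    private final Function<String, InputStream> opener;

    /** The opener maps a relative path to a stream, or returns null when nothing is found. */
    CachedResourceLoader(Function<String, InputStream> opener) {
        this.opener = opener;
    }

    /** Resolves paths against the classpath, i.e. files packaged inside the deployed JAR. */
    static CachedResourceLoader forClasspath() {
        ClassLoader classLoader = CachedResourceLoader.class.getClassLoader();
        return new CachedResourceLoader(classLoader::getResourceAsStream);
    }

    /** Returns the resource bytes, reading from the underlying source only on first use. */
    byte[] load(String relativePath) {
        return cache.computeIfAbsent(relativePath, this::readFully);
    }

    private byte[] readFully(String relativePath) {
        try (InputStream in = opener.apply(relativePath)) {
            if (in == null) {
                throw new IllegalArgumentException("Resource not found: " + relativePath);
            }
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, `CachedResourceLoader.forClasspath().load("fonts/SomeFont.ttf")` would read a (hypothetical) font file copied into src/main/resources/fonts once and serve it from memory afterwards.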

Enable Accessibility:

Apache FOP supports accessibility compliance in PDFs; however, the <accessibility> xml configuration element noted in the documentation (here) does not work as of v2.6.

Therefore ApacheFOP.Serverless provides an Azure Functions configuration value to control this directly; enable it by setting the Azure Functions environment config value: 'AccessibilityEnabled' = 'true'.
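Azure Functions exposes application settings to the Java worker as environment variables, so a flag like this can be read with plain JDK calls. The sketch below shows one way that might look; the class and method names are hypothetical and this is not the project's actual code.

```java
import java.util.Map;

/** Hypothetical helper for reading boolean flags from Azure Functions app settings. */
final class FunctionSettings {

    /** True only when the key is present and its trimmed value is "true" (case-insensitive). */
    static boolean isEnabled(Map<String, String> environment, String key) {
        String raw = environment.get(key);
        return raw != null && Boolean.parseBoolean(raw.trim());
    }

    /** Reads the 'AccessibilityEnabled' setting described above from the process environment. */
    static boolean isAccessibilityEnabled() {
        return isEnabled(System.getenv(), "AccessibilityEnabled");
    }
}
```

The resulting flag would then typically be handed to the FOP user agent before rendering; the Apache FOP accessibility documentation shows this as `userAgent.setAccessibility(true)`.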

Basic Azure Functions Configuration Values:

Ensure that the configuration values are set correctly for your Azure Function...

  • When deployed, you will set these as variables in the Portal Settings -> Configuration of the Azure Function.
  • When running locally in IntelliJ these will be set in the AppSettings of the Run/Debug Configuration for IntelliJ. And, in VS Code these are stored in the local.settings.json configuration file (included).

Configuration Values:

  • FUNCTIONS_WORKER_RUNTIME = java
  • FUNCTIONS_EXTENSION_VERSION = ~3
  • FUNCTIONS_CORE_TOOLS_DISPLAY_LOGO = true (pretty sure this is optional, but I like having it)
  • JAVA_HOME = C:\Program Files\Zulu\zulu-11\ (may be optional, but at one point this was the only way I got it to find the right version before uninstalling everything else)
  • AzureWebJobsStorage = UseDevelopmentStorage=true OR Initialize a new Azure Function in the Portal to get a valid Web Jobs storage key
  • KeepWarmCronSchedule = 0 */5 * * * * (also required configuration for the KeepWarmFunction)
  • DebuggingEnabled = true (Optional but very helpful once you start using it to return debug details in the responses).

Calling the Service from .NET

Snippet:

Because I talked about follow-through up above, I'd be remiss if I didn't provide a sample implementation of calling this code from .NET.

Assuming the use of the great Flurl library for REST api calls, and the Xsl-FO content is validated and parsed as an XDocument (Linq2Xml)... this sample should get you started on the .NET side as a client calling the new PDF microservice.

NOTE: Just use Flurl (https://flurl.dev/) or RestSharp (https://restsharp.dev/) and avoid incorrectly implementing HttpClient (hint, it should be a singleton)

Here's a very simple client class that will get the job done! But this does not include functionality to handle debugging, viewing the event log which is returned in the response headers (and may be gzipped if large), etc. Therefore you might be interested in the ready-to-use .NET Client that's available in Nuget -- more details below in the .NET Client section.

using System;
using System.Net.Http;
using System.Net.Mime;
using System.Text;
using System.Threading.Tasks;
using Flurl;
using Flurl.Http;

namespace PdfTemplating.XslFO.Render.ApacheFOP.Serverless
{
    public class ApacheFOPServerlessClient
    {
        public Uri ApacheFOPServerlessUri { get; protected set; }
        public string? AzFuncAuthCode { get; protected set; }

        public ApacheFOPServerlessClient(Uri pdfServiceUri, string? azFuncAuthCode = null)
        {
            ApacheFOPServerlessUri = pdfServiceUri;
            AzFuncAuthCode = azFuncAuthCode;
        }

        public async Task<byte[]> RenderPdfAsync(string xslfoMarkup)
        {
            var pdfServiceUrl = ApacheFOPServerlessUri
                .SetQueryParam("code", AzFuncAuthCode, NullValueHandling.Remove);
            
            using var response = await pdfServiceUrl.PostAsync(
                new StringContent(xslfoMarkup, Encoding.UTF8, MediaTypeNames.Application.Xml)
            );
            
            var pdfBytes = await response.GetBytesAsync();
            return pdfBytes;
        }
    }
}

.Net PdfTemplating (Full blown) Sample Implementation & .NET Client:

A full blown implementation of Razor Templating + ApacheFOP.Serverless is available in my PdfTemplating.XslFO project here.

.NET Client

The PdfTemplating.XslFO project also provides a ready-to-use .NET Client for ApacheFOP.Serverless that is readily available in Nuget: PdfTemplating.XslFO.Render.ApacheFOP.Serverless

It illustrates the use of both Xslt and/or Razor templates from ASP.Net MVC to render PDF Binary reports dynamically from queries to the Open Movie Database API. And it has now been enhanced to also illustrate the use of the ApacheFOP.Serverless microservice for rendering instead of the embedded legacy FO.Net implementation.

With the running application provided in the project above, the following page URLs will render the dynamic Pdf using ApacheFOP.Serverless.

NOTE: You will need to have ApacheFOP.Serverless project running either locally or in your own instance of Azure :-) Just update the Web.config to point to your Host (Local or Azure).

Additional Background:

For many-many years, I've implemented Pdf Reporting solutions with templating approaches for various clients (enterprises & small businesses) to help them automate their paper processes with dynamic generation of printable media outputs such as: PDF files, invoices, shipping/packaging labels, newsletters, etc.

And, for a long while now I've known that the current C# implementation FO.Net was limited by the fact that it was created circa 2008 and is now an archived CodePlex project.

At one client the technology stack was fully Java based, so the use of Apache FOP was a no-brainer; ApacheFOP is a supported, open-source, full implementation of an XSL-FO processor in Java, that has had regular updates/enhancements over the years.

The FO.Net C# variant was ported from Apache FOP, likely from a pre-v1.0 version of ApacheFOP, but to be honest it has worked incredibly well, and reliably. As a fully managed C# solution, it ran in web projects as well as WinForms projects, where viewing the rendered PDF live in the app in real time provided a wonderful user experience for a couple of projects.

But, as things have evolved, the advent of cloud services has opened doors for accomplishing this in a much more powerful/scalable/manageable way -- particularly Azure Functions and their excellent support for various technology languages including: .Net, Java, NodeJS, etc.!

So I finally had the time to flesh out the details, and share this project. I truly hope that it helps many others out!

Now Geaux Code!