Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ITI: Javascript Implementation of AMP Cache Transformer #449

Open
jonluca opened this issue Jul 1, 2020 · 5 comments
Open

ITI: Javascript Implementation of AMP Cache Transformer #449

jonluca opened this issue Jul 1, 2020 · 5 comments

Comments

@jonluca
Copy link

jonluca commented Jul 1, 2020

Summary

The current Google AMP Cache requirements effectively mandate a specific implementation of the AMP transforms (the AMP Packager, currently written in Go). We’d like to create a standalone, JavaScript version of the transforms that could be used with any available SXG packaging logic.

Motivation

By introducing a standalone package, this would decouple the logic of the Google AMP cache transforms from that of the signed exchanges. This would introduce a new level of flexibility, and allow for wider adoption of the technology.

Additionally, a javascript version of the AMP cache transforms would integrate much more seamlessly with existing web pipelines, without requiring a standalone external service. Javascript would allow us to implement these transforms as a final step in our web server, which renders AMP for users.

There currently exists a subset of these transforms already written in JavaScript, in the AMP Toolbox → Optimizer repo. These are not fully complete, however,

Design

The main requirement is to ensure a way of keeping the JavaScript and Go versions in sync. The actual implementation will dictate a lot of the difficulty - we would need to set up some form of shared testing infrastructure that ensures any changes made to one version gets reflected in the other.

We see three main paths forward in terms of where this implementation will live:

  1. Within AMP Packager
  2. Within AMP Toolbox → Optimizer
  3. As a standalone repo
  Pros Cons
AMP Packager Single source of truth for code and logicEasier to create a shared testing infrastructure Polluting the same repo with two languages and two implementations
AMP Toolbox Add on to the existing AMP transformsLeverage existing work and framework for adding new transformsAlready implemented and used by industry Expands scope of existing toolsetHarder to keep implementation in sync with AMP Packager
Standalone Repo Might be the right level of abstraction for a new codebaseDoes not affect technical direction of the AMP toolbox library Harder to keep implementation in sync with AMP Packager
Standalone Repo + Go Cache transforms Single source of truth of transformsBest of both worlds between a standalone repo and having the code live in AMP packager Requires splitting up the AMP packager repo

We believe that having the code live in the AMP Packager repo is the best way forward, although we are open to a discussion. Sharing the code in the same repo will have the highest likelihood of making sure the versions are in sync, and is the easiest way to set the tests up in such a way that they can be shared.

We could also split the AMP Packager, such that the Go and Javascript versions of the transforms live in their own “AMP Transforms” repo, and the AMP Signed Exchange live in separate repos.

We are open to options depending on which direction the AMP packager maintainers prefer.

@cpapazian
Copy link

@nainar @twifkak @jeffjose

@cpapazian
Copy link

related: envoyproxy/envoy#9763

if we built out the proposal in this ITI, an envoy filter for signed exchange becomes pretty simple: it's basically just wrapping libsxg

@sebastianbenz
Copy link

AMP Optimizer maintainer here 👋. Thanks a lot for the write-up!

AMP Optimizer is the "official" JS based AMP transformer implementation. Hence I'd prefer to include any changes in there instead of creating a new JS based transformer implementation. Otherwise we'd need to maintain to JS based transformer implementations which would mean an additional maintenance burden I'd really like to avoid.

The important thing to note is that in the near future, transformers won't need to produce byte compatible AMPHTML to the AMPPackager. This will greatly simplify things and means that transformers only need to produce valid AMP. So you will be able to use AMP Optimizer out of the box. However, for performance reasons, asset URL rewriting to cdn.ampproject.org is critical in the SXG case and should be added to AMP Optimizer. This could be enabled via a flag as an "AMPPacker compatibility mode".

There's also the bigger problem of keeping all AMP transformers in sync. We currently support three different versions:

  • Go-based which is part of AMPPackager.
  • Node-based AMP Optimizer
  • PHP-based AMP Optimizer used by the AMP Wordpress Plugin

The Node and PHP Optimizer already share the same test suite, but this is something we should extract and formalize with a standalone (versioned) spec and test suite.

@jonluca
Copy link
Author

jonluca commented Jul 10, 2020

Thanks for the context! This is great news.

I filed ampproject/amp-toolbox#862 in amp-toolbox. If asset rewriting is all that's required we can probably get this out the door pretty quickly! 😄

@twifkak
Copy link
Member

twifkak commented Jul 16, 2020

@ampproject/wg-caching for general feedback. Overall, I'm pretty excited about this. I agree that maintenance is the problem to address here.

If AMP Optimizer is the place where this code lives, then @sebastianbenz is probably the best authority on how to keep up with AMP transformer changes. But if there's movement on a common test suite, I'm interested to follow.

I'll add to the list of AMP transformer variants: the AMP Cache transforms. Currently, the Google AMP Cache runs a closed-source set of C++ transforms. I'm not sure what the Bing AMP Cache does, but I suspect similar, since it was launched before the Go transforms were released. These internal transforms are also slightly different:

  • some transforms are not exported to the packager (e.g. stripping some extra tags to protect the cache origin)
  • some transforms have to take special care to work on packaged docs (dunno any examples offhand, but likewise packager has to run atop optimized AMP; e.g. don't re-add boilerplate and don't double-rewrite URLs)
  • some transforms don't exist internally (e.g. stripjs which only exists as a defense-in-depth since the packager might run on invalid AMP docs; this won't happen internally)

None of that is to block anything; just something to keep in mind if working towards a full solution. I hope ampproject/amp-toolbox#862 is sufficient in the meantime!

Also, I'll add a couple of things that have to keep up with AMP validator changes. These are only tangentially related, but may serve as inspiration:

  • The AMP WordPress plugin includes a PHP-based sanitizer, and seems to update roughly monthly from the validator protoascii.
  • The Java AMP validator will also need to follow validator protoascii, though it looks like there's no regular process for that yet. Note that it was moved to its own repo because issues keeping it up-to-date were causing Travis failures that blocked others.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants
@twifkak @sebastianbenz @cpapazian @jonluca and others