
[DO NOT MERGE] Review of already merged code #1

Open · wants to merge 24 commits into base: empty-branch

Conversation

@arcusfelis (Collaborator) commented Sep 5, 2024

We have to do the coding in the main branch, so that gh-pages get built.

Still, we could do a basic code review.

Here is a diff PR.

You can review it, just do not merge it, please.

So, here is how it works.
On gh-pages, 404.html handles any not-found pages by default.
In 404.html we have the code for the control interface. The control interface lets you install/uninstall the worker and view logs from the worker.

404.html is hit the first time a user follows a link to a file in the archive. It installs the service worker and reloads the page.
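As a rough illustration, the bootstrap in 404.html could look something like the sketch below (the worker filename sw.js and the two function names are placeholders, not the actual code in this repo):

```js
// Runs from 404.html: if no service worker controls this page yet,
// register one and reload so the worker can serve the requested file.
async function installWorkerAndReload() {
  if (!('serviceWorker' in navigator)) {
    document.body.textContent = 'Service workers are not supported in this browser.';
    return;
  }
  // "sw.js" is a placeholder name for the worker script.
  await navigator.serviceWorker.register('sw.js');
  // Wait until the worker is active, then retry the original URL.
  await navigator.serviceWorker.ready;
  location.reload();
}

// Control interface: "uninstall" is just unregistering the worker again.
async function uninstallWorker() {
  for (const registration of await navigator.serviceWorker.getRegistrations()) {
    await registration.unregister();
  }
}
```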

Once the service worker is installed, it handles the fetch event.

The fetch event lets the worker handle requests to the domain, in a way similar to how a proxy works.
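The basic proxy-like shape is roughly this (resolveArchive and serveFromArchive are hypothetical helper names used only for the sketch):

```js
// sw.js: the worker sees every request made by pages in its scope.
self.addEventListener('fetch', (event) => {
  // resolveArchive() is a hypothetical helper that maps a request URL to
  // an archive URL plus a file path inside that archive, or returns null.
  const target = resolveArchive(new URL(event.request.url));
  if (target) {
    // Answer from the cached archive instead of going to the network.
    event.respondWith(serveFromArchive(target.archiveUrl, target.filePath));
  }
  // If we do not call respondWith(), the browser asks GitHub Pages as usual.
});
```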
So, if we see in the fetch handler that the user wants to read something from the archive, we do the following (sketches of the main steps follow the list):

  • download the tar.gz archive.
  • use DecompressionStream to decompress the gzip layer. It is a standard from 2023, just in time for us :)
  • since the archive is a tar.gz, use tar-stream (or the tar-web repack of it) to read the tar.
  • We could just put the files from the tar into the Service Worker Cache, which has put and match methods...
  • ...except the version that uses tar-web to read file contents and put them into the cache takes 25 seconds... oops.
  • So we use tee to make two readable streams instead of one.
  • One stream is used to collect filenames, file sizes and file offsets in the tar; the resulting JSON is stored.
  • The other stream is used to get the tar blob as binary data, but we split that blob into 1-megabyte parts, so we do not suddenly allocate 170 MB of memory.
  • We use the Cache API to write the tar parts and files.json, so around 20 writes to the cache instead of 9000 :)
  • In the fetch handler we decide whether the file is in the archive, get its size and offset, figure out which parts we need, and read the data. We then create a Response with that data. If the file is supposed to be in the archive but is missing, we answer "oops, not found"; otherwise we just let the browser ask GitHub Pages and ignore the request in the worker.
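For illustration, the install-time pipeline could be sketched roughly like this (the cache name "archives", the ?part=/?meta= key scheme and the indexTarEntries helper are assumptions made for the sketch; the real code uses tar-stream/tar-web for the indexing part):

```js
const PART_SIZE = 1024 * 1024; // split the decompressed tar into 1 MB parts

async function cacheArchive(archiveUrl) {
  const response = await fetch(archiveUrl);
  // Undo the gzip layer with the built-in DecompressionStream.
  const tarStream = response.body.pipeThrough(new DecompressionStream('gzip'));
  // tee(): one branch builds the file index, the other stores the raw bytes.
  const [indexBranch, dataBranch] = tarStream.tee();

  const cache = await caches.open('archives');

  // Branch 1: build a map from file name to { offset, size } in the tar.
  // indexTarEntries() stands in for the tar-stream / tar-web based reader.
  const indexPromise = indexTarEntries(indexBranch);

  // Branch 2: cut the tar into fixed-size parts and write each part to the cache.
  const partsPromise = (async () => {
    let pending = new Blob([]);
    let partNumber = 0;
    const writePart = (blob) =>
      cache.put(`${archiveUrl}?part=${partNumber++}`, new Response(blob));
    const reader = dataBranch.getReader();
    for (;;) {
      const { value, done } = await reader.read();
      if (done) break;
      pending = new Blob([pending, value]);
      while (pending.size >= PART_SIZE) {
        await writePart(pending.slice(0, PART_SIZE));
        pending = pending.slice(PART_SIZE);
      }
    }
    if (pending.size > 0) await writePart(pending);
  })();

  const [index] = await Promise.all([indexPromise, partsPromise]);
  // One more write: the JSON index of filenames, sizes and offsets.
  await cache.put(
    `${archiveUrl}?meta=files.json`,
    new Response(JSON.stringify(index), { headers: { 'Content-Type': 'application/json' } })
  );
}
```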
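And the read side inside the fetch handler might look roughly like this (same assumed PART_SIZE and cache layout as the sketch above; the 404 text mirrors the description):

```js
async function serveFromArchive(archiveUrl, filePath) {
  const cache = await caches.open('archives');

  // Look the file up in the stored index of { offset, size } entries.
  const indexResponse = await cache.match(`${archiveUrl}?meta=files.json`);
  const index = await indexResponse.json();
  const entry = index[filePath];
  if (!entry) {
    return new Response('oops, not found', { status: 404 });
  }

  // Work out which fixed-size parts cover [offset, offset + size).
  const firstPart = Math.floor(entry.offset / PART_SIZE);
  const lastPart = Math.floor((entry.offset + entry.size - 1) / PART_SIZE);

  const blobs = [];
  for (let part = firstPart; part <= lastPart; part++) {
    const partResponse = await cache.match(`${archiveUrl}?part=${part}`);
    blobs.push(await partResponse.blob());
  }

  // Cut the file's exact byte range out of the concatenated parts.
  const startInFirstPart = entry.offset - firstPart * PART_SIZE;
  const fileBlob = new Blob(blobs).slice(startInFirstPart, startInFirstPart + entry.size);
  // The real code would also pick a Content-Type based on the file extension.
  return new Response(fileBlob);
}
```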

@NelsonVides left a comment

Amazing, I didn't know you could do so many things on the browser. Very well done, readable code and everything! 😄

@NelsonVides

Oh and that PR description, gorgeous!

NelsonVides added a commit to esl/MongooseIM that referenced this pull request Sep 6, 2024
Optimize (compress) ct_report logs on CI and read them using a service worker in Browser

This PR addresses MIM-2283 "Optimize (compress) ct_report logs somehow on CI"

Proposed changes include:

  • Compress files into tar.gz.
  • Use https://github.com/esl/html-zip-reader to read the archive and serve it using a service worker.
  • The logic for the service worker can be reviewed in this PR: esl/html-zip-reader#1
  • We can maybe stop using s3-parallel-put (unless it is smaller than the awscli tools), but in a separate PR :)