Skip to content

Conversation

syeopite
Copy link
Member

@syeopite syeopite commented Jun 4, 2025

Fixes the CI for Crystal nightly


Kemal's subclass of the stdlib HTTP::StaticFileHandler is not as
maintained as its parent, and so misses out on many enhancements and bug fixes from upstream, which unfortunately also includes the patches for security vulnerabilities...

Though this isn't necessarily Kemal's fault since the bulk of the stdlib handler's logic was done in a single big method, making any changes hard to maintain. This was fixed in Crystal 1.17.0 where the handler was refactored into many private methods, making it easier for an inheriting type to implement custom behaviors while still leveraging
much of the pre-existing code.

Since we don't actually use any of the Kemal specific features added by Kemal::StaticFileHandler, there really isn't a reason to not just create a new handler based upon the stdlib implementation instead which
will address the problems mentioned above.

This PR implements a new handler which inherits from the stdlib version and overrides the helper methods added in Crystal 1.17.0 to add the caching behavior with minimal code changes. Since this new handler depends on the code in Crystal 1.17.0, it will only be applied on versions greater than or equal to 1.17.0. On older versions we'll fallback to the current monkey patched Kemal::StaticFileHandler

end
end

CACHE_LIMIT = 5_000_000 # 5MB
Copy link
Member Author

@syeopite syeopite Jun 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CACHE_LIMIT is actually way too lenient imo. It actually allows you to fit everything from the assets folder since the entire non-compressed folder with the videojs dependencies is 4.9MB. And if you minified the videojs scripts, its only 2.7MB. Adding gzip compression to that can then get it down to around 840kB.

Adding some simple LRU cache and/or compressing the cached files might be something to consider.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I would probably only save the gzipped + deflated versions to cache. Even storing the three versions would be less than 15M, which is perfectly reasonnable given the rest of the RAM usage of invidious.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The compressed responses are created by the FilteredCompressHandler so I'm not sure how the compressed data can be stored without duplicating a bunch of code and skipping the FilteredCompressHandler entirely.


Design wise though I was thinking that maybe we could just store only the gzipped data and just decompress and recompress as needed to serve the uncompressed, deflate, and range requests. For most requests it'll just be sending the compressed gzip data and for the rest I don't think the extra (re)compression steps would cause too much of an issue.

# being set to `IO` rather than `File`
#
# Can be removed once https://github.com/crystal-lang/crystal/issues/15817 is fixed.
private def serve_file_range(context : HTTP::Server::Context, file : IO, range_header : String, file_info)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the size of files we are serving, we probably could let the stdlib code load the file from disk if we ever receive a ranged requests.

end
end

CACHE_LIMIT = 5_000_000 # 5MB
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I would probably only save the gzipped + deflated versions to cache. Even storing the three versions would be less than 15M, which is perfectly reasonnable given the rest of the RAM usage of invidious.

Comment on lines +34 to +37
private def file_info(expanded_path : Path)
file_path = @public_dir.join(expanded_path.to_kind(Path::Kind.native))
{@@cached_files[file_path]? || File.info?(file_path), file_path}
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have two questions here:

  1. Do you think it could be interesting to keep track of the time a file was last read from disk, so that we could reload it if its modification time changed? I'm thinking about users that complained about not being able to make CSS changes without restarting.
  2. Less important: is there a reason to use the full (system) path instead of the shorter expanded_path as the key of the cached_files Hash?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well keeping track of when the file was last read to update the cache after N time has passed and the last modified time has changed will still mean that things like CSS changes won't immediately show up till after that initial time to live.

Since we do already track the last modified time of the cached files already, we could just compare it with the actual last modified time on disk for each request. But that's going to cause a lot of disk reads on instances with high request loads. I don't know if that's going to be a nonissue on modern systems but Invidious already has a reputation of killing drives and I'm concerned that this could potentially strain disks further.

Alternatively we could also just forgo the file cache system entirely and let instance maintainers add caching during deployment instead.


As for why the full path is used, the later methods down the chain doesn't have access to the shorter expanded_path so they won't be able to use it as a key to write to the cache.

@NovaAndrom3da
Copy link
Contributor

Hi! I'm wondering if there's anything blocking these commits? I'm hoping to be able to use the new crystallang version hopefully soon.

@syeopite syeopite changed the title Replace Kemal::StaticFileHandler with direct subclass of stdlib HTTP::StaticFileHandler on Crystal < 1.17.0 Replace Kemal::StaticFileHandler with direct subclass of stdlib HTTP::StaticFileHandler on Crystal >= 1.17.0 Aug 20, 2025
@syeopite
Copy link
Member Author

Hi! I'm wondering if there's anything blocking these commits? I'm hoping to be able to use the new crystallang version hopefully soon.

The PR is still going through the code review process but after that it can be merged

@Fijxu
Copy link
Member

Fijxu commented Sep 8, 2025

This is now needed to build the latest invidious commits with Crystal 1.17.1 🫤 :

In src/ext/kemal_static_file_handler.cr:167:49

 167 | directory_listing(context.response, request_path, file_path)
                                           ^-----------
Error: expected argument #2 to 'Kemal::StaticFileHandler#directory_listing' to be Path, not String

Overloads are:
 - HTTP::StaticFileHandler#directory_listing(io : IO, request_path : Path, path : Path)

@Fijxu
Copy link
Member

Fijxu commented Sep 9, 2025

I just tested this PR and I can't get Invidious interface to load completely, css files are empty and js files do not load:

image

Kemal's subclass of the stdlib `HTTP::StaticFileHandler` is not as
maintained as its parent, and so misses out on many enhancements and bug
fixes from upstream, which unfortunately also includes the patches for
security vulnerabilities...

Though this isn't necessarily Kemal's fault since the bulk of the stdlib
handler's logic was done in a single big method, making any changes hard
to maintain. This was fixed in Crystal 1.17.0 where the handler
was refactored into many private methods, making it easier for an
inheriting type to implement custom behaviors while still leveraging
much of the pre-existing code.

Since we don't actually use any of the Kemal specific features added by
`Kemal::StaticFileHandler`, there really isn't a reason to not just
create a new handler based upon the stdlib implementation instead which
will address the problems mentioned above.

This PR implements a new handler which inherits from the stdlib variant
and overrides the helper methods added in Crystal 1.17.0 to add the
caching behavior with minimal code changes. Since this new handler
depends on the code in Crystal 1.17.0, it will only be applied on
versions greater than or equal to 1.17.0. On older versions we'll
fallback to the current monkey patched `Kemal::StaticFileHandler`
Overriding `#call` or patching out `serve_file_compressed` provides
only minimal benefits over the ease of maintenance granted by only
overriding what we need to for the caching behavior.
Running `crystal spec` without a file argument essentially produces one
big program that combines every single spec file, their imports, and
the files that those imports themselves depend on. Most of the types
within this combined program will get ignored by the compiler due to a
lack of any calls to them from the spec files.

But for some types, partially the HTTP module ones, using them within
the spec files will suddenly make the compiler enable a bunch of
previously ignored code. And those code will suddenly require the
presence of additional types, constants, etc. This not only make it
annoying for getting the specs working but also makes it difficult to
isolate behaviors for testing.

The `static_assets_handler_spec.cr` causes this issue and so will be
marked as an isolated spec for now. In the future all of the tests
should be organized into independent groupings similar to how the
Crystal compiler splits their tests into std, compiler, primitives and
interpreter.
Summing the sizes of each cached file every time is very inefficient.
Instead we can simply store the cache size in an constant and increase
it everytime a file is added into the cache.
@syeopite syeopite force-pushed the fix-static-file-handler-on-1.17.0 branch from a49bfa7 to 92888bd Compare September 11, 2025 03:32
@syeopite
Copy link
Member Author

Oops looks like #5337 is required for compression to work correctly... I just rebased against master which includes that commit and its fixed now.

A test case can't be added due to needing to import handlers.cr which depends on many other Invidious objects and constants... I guess we should separate the handlers there to make designing tests easier

@Fijxu
Copy link
Member

Fijxu commented Sep 11, 2025

I tested it and works fine now ;). Thanks

@dekeonus
Copy link

dekeonus commented Sep 28, 2025

in contrast to @Fijxu I'm not seeing any of the static content load in the browser (neither Firefox-143.0 nor Google-Chrome-140.0.7339.207 on linux). Even the favicon fails to load.
This seems to be related to content-type mismatch coupled with "x-content-type-options: nosniff"

here are some of the headers for the request for the favicon
GET

scheme: https
host: invidious.localdomain
filename:  /favicon-16x16.png

snippet of Request Headers
Accept: image/avif,image/webp,image/png,image/svg+xml,image/*;q=0.8,*/*;q=0.5

snippet of Response Headers
content-type: text/html; charset=UTF-8

Console log error

The resource from “https://invidious.localdomain/feed/popular” was blocked due to MIME type (“text/html”) mismatch (X-Content-Type-Options: nosniff).

This same mismatch occurs with css files and js (i.e. they are sent with "content-type: text/html; charset=UTF-8")

EDIT:
This is Gentoo amd64 with
crystal-1.17.1
kemal-1.7.2¹
exception_page-0.5.0
invidious-v2.20250913.0
¹ kemal-1.6.0 is not in gentoo's guru repo (community maintained) due to the path transversal vulnerability that was fixed in kemal-1.7.0

@dekeonus
Copy link

After further examination, the issue I'm having is that the redirect to the static assets is not correct.
I restarted (after reconfiguring) invidious to serve directly (rather than by reverse proxy), to rule out some interaction there.
Note the location header is "/" in the response headers returned below.

GET

scheme: http
host: 10.144.1.5:3000
filename: /css/empty.css

v: b32b077
Address: 10.144.1.5:3000

Status: 302 Found
Version: HTTP/1.1
Transferred: 0 GB (0 GB size)
Referrer Policy: same-origin
DNS Resolution: System

Response Headers

HTTP/1.1 302 Found
Connection: keep-alive
Content-Type: text/html
Date: Sun, 28 Sep 2025 19:50:24 GMT
X-Frame-Options: sameorigin
X-XSS-Protection: 1; mode=block
X-Content-Type-Options: nosniff
Content-Security-Policy: default-src 'none'; script-src 'self'; style-src 'self' 'unsafe-inline'; img-src 'self' data:; font-src 'self' data:; connect-src 'self'; manifest-src 'self'; media-src 'self' blob:; child-src 'self' blob:; frame-src 'self'; frame-ancestors 'none'
Referrer-Policy: same-origin
Permissions-Policy: interest-cohort=()
Location: /
Content-Length: 0

Request Headers

GET /css/empty.css?v=b32b077 HTTP/1.1
Host: 10.144.1.5:3000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:143.0) Gecko/20100101 Firefox/143.0
Accept: text/css,*/*;q=0.1
Accept-Language: en-AU,en-GB;q=0.7,en;q=0.3
Accept-Encoding: gzip, deflate
Referer: http://10.144.1.5:3000/
DNT: 1
Sec-GPC: 1
Connection: keep-alive

@dekeonus
Copy link

Sorry for the noise, my issue was insufficient sed in the gentoo build script.
However, that said, it does raise the issue of less than ideal operation.

In gentoo's case the build script modifies several files to put assets/, locales/, and config/{sql,migrate-scripts}/ under /usr/share/invidious.
When invidious fails to find one of the asset files, it continues running without it (that's fine) BUT without logging the fact that it failed to load the asset file, nor sending to the client a 404 response due to a non-existent resource.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants