Skip to content
This repository has been archived by the owner on Sep 3, 2021. It is now read-only.

improve yt proxying #116

Open
wants to merge 9 commits into
base: dev-indep
Choose a base branch
from
Open

improve yt proxying #116

wants to merge 9 commits into from

Conversation

lowne
Copy link

@lowne lowne commented Oct 19, 2020

  • consolidate proxying logic in routes.py
  • can proxy images and videos internally
  • flexible user configuration of external proxy
  • (don't use json for human-facing configuration)

Note: I haven't tested nginx, but unless it chokes on a ?&arg= query it should work.

@pluja
Copy link
Member

pluja commented Oct 19, 2020

Awesome thanks! I will have to review it as this would only work with a non-nginx set-up. I will be adding the Nginx part to this!

@lowne lowne closed this Oct 19, 2020
@lowne lowne deleted the lowne branch October 19, 2020 18:50
@lowne lowne restored the lowne branch October 19, 2020 18:51
@lowne
Copy link
Author

lowne commented Oct 19, 2020

...of course I used the wrong branch 🤦

@lowne lowne reopened this Oct 19, 2020
@lowne
Copy link
Author

lowne commented Oct 20, 2020

@pluja after some testing with nginx, it does work fine with ?&host=, so the suggested default in yotter-config.sample.yaml of

external_proxy: "https://my.nginx.instance{path}?{query}&host={netloc}"

should indeed work just fine with the nginx config from https://paste.ubuntu.com/p/Bzd9SRCJSG/ (as linked from SELF-HOSTING.md).

However, as per #53, this more elegant nginx config works as well:

  location /proxy/ {
    proxy_buffering off;
    resolver 1.1.1.1;
    rewrite /proxy/(https?:/)([^/]+)/?(.*) $1$2$3 break;
    #nginx "normalised" the scheme // to /, so need to restore one slash
    #also note that proxy_pass with a naked host (no path) breaks horribly, 
    #so the root / is excluded from captures and manually placed here
    proxy_pass $1/$2/$3;
    proxy_set_header Host $2;
    proxy_ssl_server_name on;
    sendfile on;
    tcp_nopush on;
    add_header Access-Control-Allow-Origin *;
  }

with this simpler yotter-config:

external_proxy: "https://my.nginx.instance/proxy/{url_encoded}"

@FireMasterK
Copy link
Member

Any reason why you decided to URLEncode the URL instead of handling them how nginx currently does them? I thought a better approach would be to proxy them similar to how Invidious and YouTube work.

@lowne
Copy link
Author

lowne commented Oct 20, 2020

@FireMasterK I'm not sure I understand your question, could you be more specific? I'll attempt a reply nonetheless:

URLEncode the URL

well, how else are you going to pass around URLs to webservers?

handling them how nginx currently does them?

The admin can:

  • set proxy_videos and proxy_images to false, omit external_proxy: the client connects directly to YT
  • set proxy_videos and proxy_images to true, omit external_proxy: the app (e.g. gunicorn) proxies requests to YT, no need for any reverse proxy (theoretically)
  • set proxy_videos and proxy_images to true, set external_proxy: requests to YT are proxied by whatever external proxy is used with whatever url scheme is set, including the current nginx scheme with "https://my.nginx.instance{path}?{query}&host={netloc}"

proxy them similar to how Invidious and YouTube work

again, could you clarify?

@FireMasterK
Copy link
Member

@FireMasterK I'm not sure I understand your question, could you be more specific? I'll attempt a reply nonetheless:

URLEncode the URL

well, how else are you going to pass around URLs to webservers?

Take a look at this URL for example https://yotter.kavin.rocks/a/AATXAJzCMHrRPZ6ibZCTrBQCm6rdW27mEj8ZZ3KMNBuIyw=s88-c-k-c0x00ffffff-no-rj-mo?host=yt3.ggpht.com

Yotter then proxies the URL from https://yt3.ggpht.com/a/AATXAJzCMHrRPZ6ibZCTrBQCm6rdW27mEj8ZZ3KMNBuIyw=s88-c-k-c0x00ffffff-no-rj-mo?host=yt3.ggpht.com with the help of Nginx

handling them how nginx currently does them?

The admin can:

  • set proxy_videos and proxy_images to false, omit external_proxy: the client connects directly to YT
  • set proxy_videos and proxy_images to true, omit external_proxy: the app (e.g. gunicorn) proxies requests to YT, no need for any reverse proxy (theoretically)
  • set proxy_videos and proxy_images to true, set external_proxy: requests to YT are proxied by whatever external proxy is used with whatever url scheme is set, including the current nginx scheme with "https://my.nginx.instance{path}?{query}&host={netloc}"

I think that proxying should always be enabled, without a configuration option, but I'm open to other opinions too. My solution was to change the routes so that they perform just like how they do with Nginx, thereby removing the need for a configuration option altogether.

proxy them similar to how Invidious and YouTube work

again, could you clarify?

YouTube: https://yt3.ggpht.com/a/AATXAJzCMHrRPZ6ibZCTrBQCm6rdW27mEj8ZZ3KMNBuIyw=s900-c-k-c0x00ffffff-no-rj
Invidious: https://invidious.kavin.rocks/ggpht/a/AATXAJzCMHrRPZ6ibZCTrBQCm6rdW27mEj8ZZ3KMNBuIyw=s900-c-k-c0x00ffffff-no-rj
Yotter: https://yotter.kavin.rocks/a/AATXAJzCMHrRPZ6ibZCTrBQCm6rdW27mEj8ZZ3KMNBuIyw=s900-c-k-c0x00ffffff-no-rj?host=yt3.ggpht.com

Your PR seems to change the format of how it's currently done (by URL-encoding it).

@lowne
Copy link
Author

lowne commented Oct 21, 2020

Your PR seems to change the format of how it's currently done (by URL-encoding it).

No.
Paste this into your yotter-config.yaml:

proxy_videos: true
proxy_images: true
external_proxy: "https://my.nginx.instance{path}?{query}&host={netloc}"

and it'll behave exactly as before (with nginx and "nginxVideoStream":"true" in the json config).

The internal proxy routing can be whatever we want, as it never leaks. Instead of rolling-my-own-scheme-which-breaks-on-unforeseen-edge-cases, I simply used the <path:var> facility already provided by flask, which yes, happens to urlencode the parameter, because it's the simplest and sanest way to pass a full url around.

To see why, let's get back to the current nginx proxy url scheme: if tomorrow Google decides to change the endpoints to, say yt9000.google/abc123/AATX.., then you need to go and change your nginx.conf to serve /abc123/ as well. If the new endpoint is yt.over9000.org/AATX... (no top-level folder) then the gig is up. My suggestion above, using the full urlencoded url, prevents all of these problems.

And if your argument is really that none of this should be configurable by the admin (which I don't get, but to each their own), then this entire PR can simply be thrown away. 😄

I, for one, like to use caddyserver and/or haproxy, and wouldn't want to touch nginx with a ten foot pole.

@pluja
Copy link
Member

pluja commented Oct 23, 2020

How nice! I'll be testing that out and merging if it's all ok!

@pluja
Copy link
Member

pluja commented Oct 27, 2020

@FireMasterK would this be compatible with #122 ?

@FireMasterK
Copy link
Member

I can't say for sure, considering this

To see why, let's get back to the current nginx proxy url scheme: if tomorrow Google decides to change the endpoints to, say yt9000.google/abc123/AATX.., then you need to go and change your nginx.conf to serve /abc123/ as well. If the new endpoint is yt.over9000.org/AATX... (no top-level folder) then the gig is up. My suggestion above, using the full urlencoded url, prevents all of these problems.

While I do get his point, the likelihood of YouTube doing something like changing the endpoints is pretty low, especially for the
endpoints behind their CDN.
This would also remove the ability to cache only a certain type of content, which I do on my instance.

So, in short, no it is not compatible with the current design.

@pluja
Copy link
Member

pluja commented Oct 27, 2020

What changes should we make so we can make this two approaches compatible?

@FireMasterK
Copy link
Member

Either,

Maintain the same old routes:
/vi/
/a/
/videoplayback
etc

Or modify the go app to use the new route:
/proxy/

I'm against the new route as it makes it harder to understand what's happening under the hood and makes the URLs illegible to a human eye

@pluja
Copy link
Member

pluja commented Oct 27, 2020

maybe we could use the /proxy/ when it's running locally, and use all others when is a server-hosted instance with nginx.

@FireMasterK
Copy link
Member

Yes, that should be fine technically

@pluja
Copy link
Member

pluja commented Oct 30, 2020

We will need to review all of this as some changes now produce merge conflicts.

I would really like this to work as I want for Yotter to be able to be executed locally without the need of a server. I think that the best way to approach this is to have a setting where we can choose if we are on an instance or running locally.

If we are running locally we will need to change the endpoints to the Flask routes for the proxying.

Another good option is to use all the youtube links without proxy at all, and recommend users who are running this locally to just run with a VPN. In the end, they will be connecting to google even if we proxy. It doesn't really make sense to proxy when running locally.

@toh995
Copy link
Contributor

toh995 commented Dec 28, 2020

looking forward to this :)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants