PoToken implementation to solve 403 errors #11955

Open: wants to merge 7 commits into base: dev

@Stypox
Member

Stypox commented Jan 25, 2025

What is it?

  • Bugfix (user facing)
  • Feature (user facing)
  • Codebase improvement (dev facing)
  • Meta improvement to the project (dev facing)

Description of the changes in your PR

General information about poTokens and about this PR structure:

  • YouTube now requires integrity checks to access their clients. The most "vulnerable" client is the WEB client, since they can't enforce integrity checks on all web browsers, so that's the only client (for now) that we have found a way to obtain an integrity token for.
  • In order to obtain a poToken, we need to run BotGuard, an obfuscated virtual machine implemented in JavaScript that performs the integrity checks and gives us an integrity token. In order to make the integrity checks succeed, we need to run this VM in an environment that resembles a browser as much as possible. The integrity token can be used to generate multiple poTokens. Two network requests are needed: Create to obtain the VM code, GenerateIT to obtain the integrity token after running the VM code. See the README here for the detailed steps.
  • PoTokenGenerator is the base class for all poToken generators. It has a factory method that allows asynchronously obtaining a new instance of a PoTokenGenerator, and then two methods to generate a poToken given a specific identifier, and a method to check if the integrity token has expired.
  • PoTokenWebView is currently the only implementation of PoTokenGenerator, but we might want to add other implementations in the future, e.g. ones that do not rely on WebView.
  • PoTokenProviderImpl implements the extractor interface and is meant to manage possibly multiple PoTokenGenerators (although right now there is only one, based on WebView). It retries in case of problems, creates a new PoTokenGenerator if the current one has expired, and finally returns a PoTokenResult. A PoTokenResult contains two poTokens: one for the specific requested video id (used to fetch the player), and another that is generated only once, as the first thing, and is specific to the visitor data (used in streaming URLs).
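
The provider behaviour described in the last bullet can be sketched roughly as follows. This is an illustrative sketch in JavaScript, not the PR's Kotlin code: `PoTokenProviderSketch`, `createGenerator`, `generatePoToken`, `isExpired` and the result field names are hypothetical, and both the single retry and the visitor data being fixed per provider are assumptions made for the example.

```javascript
// Illustrative sketch only; the PR's real classes are written in Kotlin.
// `createGenerator` is a hypothetical async factory returning an object with
// `generatePoToken(identifier)` and `isExpired()` (assumed names).
class PoTokenProviderSketch {
  constructor(createGenerator, visitorData) {
    this.createGenerator = createGenerator;
    this.visitorData = visitorData; // assumed fixed for this provider
    this.generator = null;
    this.streamingPoToken = null;
  }

  async getPoTokens(videoId, allowRetry = true) {
    try {
      // Create a generator on first use, or replace one that has expired.
      if (this.generator === null || this.generator.isExpired()) {
        this.generator = await this.createGenerator();
        // Generated once, right after creation, and tied to the visitor data.
        this.streamingPoToken = this.generator.generatePoToken(this.visitorData);
      }
      return {
        visitorData: this.visitorData,
        playerPoToken: this.generator.generatePoToken(videoId),
        streamingPoToken: this.streamingPoToken,
      };
    } catch (error) {
      // Assumption: retry exactly once with a fresh generator, then give up.
      if (!allowRetry) throw error;
      this.generator = null;
      return this.getPoTokens(videoId, false);
    }
  }
}
```

In this sketch the per-video token is regenerated on every call, while the streaming token is cached for as long as the same generator stays valid.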

TODO:

  • The JavaScript poToken implementation comes from https://github.com/LuanRT/BgUtils
  • Obtaining a poToken via WebView
  • Obtaining a poToken with something like HtmlUnit not doable unfortunately
  • Handling devices that don't have a WebView (needs to be tested)
  • Passing the poToken to the extractor when requested
  • Passing the poToken to player network requests (not sure if needed?)
  • Understand whether we need to change user agent everywhere

You can also test whether the generated poTokens work by using the latest yt-dlp commit from their git repo (older commits won't work!), as follows (take PLAYER_POT, STREAMING_POT and VISITOR_DATA from logcat):

yt-dlp "https://www.youtube.com/watch?v=i_SsnRdgitA" --extractor-args 'youtube:player_client=web;player-skip=webpage,configs;po_token=web.player+PLAYER_POT,web.gvs+STREAMING_POT;visitor_data=VISITOR_DATA'

Fixes the following issue(s)

Relies on the following changes

APK testing

The APK can be found by going to the "Checks" tab below the title. On the left pane, click on "CI", scroll down to "artifacts" and click "app" to download the zip file which contains the debug APK of this PR. You can find more info and a video demonstration on this wiki page.

Due diligence

@Stypox
Member Author

Stypox commented Jan 26, 2025

The PR now builds fine on top of TeamNewPipe/NewPipeExtractor#1247, so you can download the APK, which uses poTokens! Let us know if you notice any issues.

private val TAG = PoTokenWebView::class.simpleName
private const val GOOGLE_API_KEY = "AIzaSyDyT5W0Jh49F30Pqqtyfdf7pDLFKLJoAnw"
private const val REQUEST_KEY = "O43z0dpjhgX20SCx4KAo"
private const val USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.3"

Could this be Firefox ESR like in DownloaderImpl?

Member Author


No, for some reason it does not work with the Firefox user agent. It would work with the curl user agent though, I don't know why...

@gechoto

gechoto commented Jan 27, 2025

Would it be possible to move the po token implementation to a library?

Currently this is in NewPipe (the app repo) which makes it inaccessible by other apps which also have the need for po tokens.

This will lead to a lot of duplicate code because it needs to be implemented over and over again for each YT client app.

Would be cool if this can be maintained in just one place (and multiple apps could benefit like it is already the case with NewPipeExtractor).

@Figim

Figim commented Jan 27, 2025

Would it be possible to move the po token implementation to a library?

Currently this is in NewPipe (the app repo) which makes it inaccessible by other apps which also have the need for po tokens.

This will lead to a lot of duplicate code because it needs to be implemented over and over again for each YT client app.

Would be cool if this can be maintained in just one place (and multiple apps could benefit like it is already the case with NewPipeExtractor).

You can recreate this PR in your own application.

This simply connects to the extractor to support the poToken stream. You will need to do this separately in your application. It should have been like this.

@gechoto

gechoto commented Jan 27, 2025

You can recreate this PR in your own application.

my point was this would be inefficient

If you want to implement this over and over again for each app - sure, go ahead.

Keep in mind that this will likely not be "done" after the initial implementation.
YT will probably try to break this solution every few months.

You will have to update the implementation in many places again. And again. And again...
What a great way to waste time.

If this was implemented in just one place as a library it would be easier for more developers to share efforts.
To me this sounds like a reasonable thing to discuss - if possible.

Comment on lines +48 to +55
// an asynchronous function runs in the background and it will eventually call
// `vmFunctionsCallback`, however we need to manually tell JavaScript to pass
// control to the things running in the background by interrupting this async
// function in any way, e.g. with a delay of 1ms. The loop is most probably not
// needed but is there just because.
for (let i = 0; i < 10000 && !this.vmFunctions.asyncSnapshotFunction; ++i) {
    await new Promise(f => setTimeout(f, 1))
}
Contributor


I … don’t think this is how async works. The timeout is just gonna be scheduled on a new task, but the code before the loop still runs on a microtask on the previous task.

Member Author


Yes, but this.vm.a seems to start a standalone task in the background or something like that, and we need to explicitly pass control back to the event loop by pausing this async execution, for the background task to finish executing.

Member Author


As far as I know, the loop actually executes only once; I still put a loop because you never know.
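
The exchange above turns on the difference between microtasks and tasks in the JavaScript event loop. A minimal sketch in plain JavaScript, not taken from the PR: `startBackgroundWork` is a hypothetical stand-in for whatever `this.vm.a` schedules, and it is assumed here to finish on a later timer task. Awaiting already-resolved promises only drains the microtask queue, so that timer never gets to run, whereas awaiting a `setTimeout`-based promise hands control back to the event loop.

```javascript
// Hypothetical stand-in for the VM call: it schedules work on a later task
// and reports completion through a callback (assumption, not BotGuard's API).
function startBackgroundWork(onDone) {
  setTimeout(() => onDone("ready"), 0);
}

// Awaiting already-resolved promises only yields to the microtask queue,
// so the timer task scheduled above cannot run between iterations.
async function pollWithMicrotasks(maxIterations) {
  let result = null;
  startBackgroundWork(value => { result = value; });
  for (let i = 0; i < maxIterations && result === null; ++i) {
    await Promise.resolve();
  }
  return result;
}

// Awaiting a timer-based promise yields to the event loop, so pending
// tasks (including the background work) get a chance to run.
async function pollWithTimers(maxIterations) {
  let result = null;
  let iterations = 0;
  startBackgroundWork(value => { result = value; });
  for (let i = 0; i < maxIterations && result === null; ++i) {
    await new Promise(resolve => setTimeout(resolve, 1));
    ++iterations;
  }
  return { result, iterations };
}
```

Under these assumptions `pollWithMicrotasks` exhausts its iterations and returns `null`, while `pollWithTimers` sees the result after a small number of iterations, which is consistent with the observation that the loop in the PR usually runs only once.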

@Profpatsch
Contributor

Can there be an architecture overview of this somewhere? From a skim of the code I don’t get any idea of what problem this solves or how the solution is structured.

This will be tried only once, and afterwards an error will be thrown
@Stypox
Member Author

Stypox commented Jan 27, 2025

  • YouTube now requires integrity checks to access their clients. The most "vulnerable" client is the WEB client, since they can't enforce integrity checks on all web browsers, so that's the only client (for now) that we have found a way to obtain an integrity token for.
  • In order to obtain a poToken, we need to run BotGuard, an obfuscated virtual machine implemented in JavaScript that performs the integrity checks and gives us an integrity token. In order to make the integrity checks succeed, we need to run this VM in an environment that resembles a browser as much as possible. The integrity token can be used to generate multiple poTokens. Two network requests are needed: Create to obtain the VM code, GenerateIT to obtain the integrity token after running the VM code. See the README here for the detailed steps.
  • PoTokenGenerator is the base class for all poToken generators. It has a factory method that allows asynchronously obtaining a new instance of a PoTokenGenerator, and then two methods to generate a poToken given a specific identifier, and a method to check if the integrity token has expired.
  • PoTokenWebView is currently the only implementation of PoTokenGenerator, but we might want to add other implementations in the future, e.g. ones that do not rely on WebView.
  • PoTokenProviderImpl implements the extractor interface and is meant to manage possibly multiple PoTokenGenerators (although right now there is only one, based on WebView). It retries in case of problems, creates a new PoTokenGenerator if the current one has expired, and finally returns a PoTokenResult. A PoTokenResult contains two poTokens: one for the specific requested video id (used to fetch the player), and another that is generated only once, as the first thing, and is specific to the visitor data (used in streaming URLs).

Let me know which places are not documented enough.

@Profpatsch
Contributor

@Stypox I think it would be good to include this documentation into the source code somewhere, maybe in the interface module.

@Profpatsch
Contributor

So that people who want to understand the code later don’t have to find this PR and look through lots of issues first.

Labels
size/large PRs with less than 750 changed lines
Development

Successfully merging this pull request may close these issues.

[YouTube] HTTP error 403 for playback or download
4 participants