[hal][egl] Fix robust context creation failing on some devices #7952
+93
−39
What
On certain devices, context creation would fail with the following log message:
Instance::new: failed to create Gl backend: InstanceError { message: "unable to create GLES 3.x context", source: Some(BadAttribute) }
This surfaces to the client as a RequestAdapterError::NotFound.
This happens predominantly on Motorola devices (Moto G54 5G and Moto G73 5G), but also on some Samsung devices (Galaxy S24+), under both Android 14 and 15.
Why
The root of the issue seems to be a misunderstanding of how robust context creation interplays with the EGL 1.5 specification.
The current code assumes that if the EGL version is >= 1.5, then robustness must be supported (except for ANGLE, which had apparently already been identified as problematic and was hardcoded to go to Ext directly).
However, from the spec (section 3.7.1.5, emphasis mine)
How the fix works
Unfortunately this is a bit of a cat-and-mouse game: we need to check the context extensions to know if the robustness parameter will be supported, and we need the robustness parameter to create the context...
The proposed fix simply tries Core -> Ext -> No robustness, in this order, starting from whichever makes the most sense (i.e. Core if we have EGL >= 1.5, then Ext if the EGL extension is defined), and only retries if the returned error is the specified BAD_ATTRIBUTE.
Note: this eliminates the need for the hardcoded ANGLE IOP, as ANGLE will fail with "Core" and go to "Ext", at which point it will supposedly succeed. In fact, the comment had already identified the correct underlying issue but simply built a specific IOP instead of a generic solution.
Note 2: in practice, most devices I am encountering the issue with seem to fail with "Core" and succeed with "Ext" (just like ANGLE, which is admittedly strange behavior). One could possibly swap the order to Ext then Core, as was done in a previous version of the code. It is not clear to me why the check was inverted in the linked commit; I'll let @kvark chime in. In any case, I believe this proposed solution is more robust (🥁)
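The fallback order described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this PR: the names `Robustness`, `EglError`, `candidates` and `create_with_fallback` are hypothetical, and the `create` closure stands in for the real EGL context creation call.

```rust
/// How robustness is requested at context creation (hypothetical names).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Robustness {
    /// Robustness attribute from core EGL 1.5.
    Core,
    /// Robustness attribute from the EGL robustness extension.
    Ext,
    /// No robustness attribute at all.
    None,
}

/// Simplified stand-in for the errors context creation can return.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum EglError {
    BadAttribute,
    BadMatch,
}

/// Builds the ordered list of attempts: Core (if EGL >= 1.5), then Ext
/// (if the extension is advertised), then no robustness as a last resort.
fn candidates(egl_at_least_1_5: bool, has_robustness_ext: bool) -> Vec<Robustness> {
    let mut order = Vec::new();
    if egl_at_least_1_5 {
        order.push(Robustness::Core);
    }
    if has_robustness_ext {
        order.push(Robustness::Ext);
    }
    order.push(Robustness::None);
    order
}

/// Tries each candidate in order. Only `BadAttribute` triggers a retry;
/// any other error is returned immediately.
fn create_with_fallback<C>(
    order: &[Robustness],
    mut create: impl FnMut(Robustness) -> Result<C, EglError>,
) -> Result<(C, Robustness), EglError> {
    for &mode in order {
        match create(mode) {
            Ok(ctx) => return Ok((ctx, mode)),
            // The driver rejected this attribute: fall through to the next mode.
            Err(EglError::BadAttribute) => continue,
            // Anything else is unrelated to robustness, so stop here.
            Err(other) => return Err(other),
        }
    }
    // Every candidate was rejected with BadAttribute.
    Err(EglError::BadAttribute)
}
```

With a driver that behaves like ANGLE in this sketch (rejecting `Core` with `BadAttribute` but accepting `Ext`), `create_with_fallback` makes two attempts and reports `Robustness::Ext` as the mode that succeeded, without any device-specific special case.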
Testing
Manually tested on some devices and rolled out to a subset of users for which the previous method was failing.
Checklist
Run cargo fmt.
Run taplo format.
Run cargo clippy --tests. If applicable, add: --target wasm32-unknown-unknown
Run cargo xtask test to run tests.
If this contains user-facing changes, add a CHANGELOG.md entry.