Several aspects of the OTel-Arrow admission control mechanism are broken or not working as intended.
By design, admission.BoundedQueue carries a complication that forces its APIs to be fallible: the Arrow admission path calls Acquire twice per request, once with the compressed size and once with the uncompressed size. The resulting error handling would be unnecessary without this complication.
In the OTLP code path (internal/{traces,logs,metrics}), the fallible APIs return control before the call to obsrecv completes, so nothing is recorded when admission control fails. This is a major bug.
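The ordering problem above can be illustrated with a minimal sketch. The `reporter` type and the `acquire` helper are hypothetical stand-ins, not the real obsrecv or admission APIs; the only point is that deferring the end-of-operation call lets the rejection path be observed too.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// reporter is a hypothetical stand-in for an obsrecv-style start/end pair.
type reporter struct{ ended []string }

func (r *reporter) StartOp(ctx context.Context) context.Context { return ctx }

func (r *reporter) EndOp(_ context.Context, items int, err error) {
	r.ended = append(r.ended, fmt.Sprintf("items=%d err=%v", items, err))
}

var errAdmission = errors.New("admission rejected")

// acquire is a hypothetical admission check that rejects sizes above limit.
func acquire(size, limit uint64) (func(), error) {
	if size > limit {
		return nil, errAdmission
	}
	return func() {}, nil
}

// export defers EndOp before attempting admission, so every return path,
// including a rejection, is reported.
func export(ctx context.Context, r *reporter, items int, size, limit uint64) (err error) {
	ctx = r.StartOp(ctx)
	defer func() { r.EndOp(ctx, items, err) }()

	release, err := acquire(size, limit)
	if err != nil {
		return err
	}
	defer release()
	return nil // the request would be forwarded to the next consumer here
}

// demoObserved runs one admitted and one rejected request.
func demoObserved() []string {
	r := &reporter{}
	_ = export(context.Background(), r, 5, 10, 100)
	_ = export(context.Background(), r, 5, 1000, 100)
	return r.ended
}

func main() {
	for _, line := range demoObserved() {
		fmt.Println(line)
	}
}
```

Both requests produce an entry in `ended`, whereas an early return placed before the end-of-operation call would record only the first.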
There is a race condition in the context-cancelled exit path from Acquire(): if a concurrent call to Release() has already admitted the waiter, the semaphore can leak. This is a minor bug.
The semaphore follows FIFO discipline, though not by intent. The Lightstep-internal code on which it is modeled uses LIFO, for reasons documented here. This is not working as intended.
Proposed solution
First, eliminate the complication that necessitates fallible APIs. The problem is the two calls to Acquire(), once with the compressed size and once with the uncompressed size. Because the compressed size is typically much smaller than the uncompressed size, the advantage of making two Acquire calls does not outweigh the complexity cost.
Therefore, we can eliminate fallible APIs from the admission package. This may be done by having Acquire() return a closure that performs the correct release, which greatly reduces the potential for misuse.
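A minimal sketch of the closure-returning shape, under stated assumptions: `boundedQueue` here is a simplified stand-in that rejects instead of waiting, and `errTooLarge`, `inUseBytes`, and `demoClosure` are hypothetical names introduced only for illustration.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// errTooLarge is a hypothetical error name used only in this sketch.
var errTooLarge = errors.New("request exceeds the admission limit")

// boundedQueue is a simplified stand-in for admission.BoundedQueue. It
// rejects instead of waiting so that the sketch stays short.
type boundedQueue struct {
	mu    sync.Mutex
	limit uint64
	inUse uint64
}

// Acquire admits size bytes and returns the only way to release them.
func (q *boundedQueue) Acquire(ctx context.Context, size uint64) (func(), error) {
	if err := ctx.Err(); err != nil {
		return nil, err
	}
	q.mu.Lock()
	defer q.mu.Unlock()
	if size > q.limit-q.inUse {
		return nil, errTooLarge
	}
	q.inUse += size
	var once sync.Once
	return func() {
		// The closure captures size, so the caller cannot release the wrong
		// amount, and sync.Once makes a repeated call harmless.
		once.Do(func() {
			q.mu.Lock()
			defer q.mu.Unlock()
			q.inUse -= size
		})
	}, nil
}

// inUseBytes reports the bytes currently admitted.
func (q *boundedQueue) inUseBytes() uint64 {
	q.mu.Lock()
	defer q.mu.Unlock()
	return q.inUse
}

// demoClosure returns the admitted byte count after the acquire, after the
// release, and after a repeated release.
func demoClosure() []uint64 {
	q := &boundedQueue{limit: 100}
	release, err := q.Acquire(context.Background(), 60)
	if err != nil {
		return nil
	}
	steps := []uint64{q.inUseBytes()}
	release()
	steps = append(steps, q.inUseBytes())
	release()
	steps = append(steps, q.inUseBytes())
	return steps
}

func main() {
	fmt.Println(demoClosure()) // prints [60 0 0]
}
```

Because the release takes no arguments and returns nothing, there is no error for the caller to handle on the release side, which is the property the proposal is after.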
The bounded queue implementation should transition to LIFO. Avoiding fallible APIs, transitioning to LIFO, and fixing the race condition together amount to a substantial change. The BoundedQueue tests will be completely rewritten.
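One way the three changes could fit together is sketched below. This is not the proposed implementation: `lifoQueue`, `waiter`, and the helper names are hypothetical, and a request larger than the limit simply waits until its context is done. The cancellation branch re-checks admission under the lock, so bytes granted by a concurrent release are returned rather than leaked.

```go
package main

import (
	"container/list"
	"context"
	"fmt"
	"sync"
	"time"
)

type waiter struct {
	size  uint64
	ready chan struct{} // closed, under the queue lock, when admitted
}

// lifoQueue is a hypothetical, simplified bounded admission queue.
type lifoQueue struct {
	mu      sync.Mutex
	limit   uint64
	inUse   uint64
	waiters *list.List // back of the list holds the newest waiter
}

// Acquire blocks until size bytes are admitted or ctx is done.
func (q *lifoQueue) Acquire(ctx context.Context, size uint64) (func(), error) {
	w := &waiter{size: size, ready: make(chan struct{})}
	q.mu.Lock()
	elem := q.waiters.PushBack(w)
	q.admitLocked() // admits w immediately when it fits
	q.mu.Unlock()
	select {
	case <-w.ready:
		return func() { q.release(size) }, nil
	case <-ctx.Done():
		q.mu.Lock()
		defer q.mu.Unlock()
		select {
		case <-w.ready:
			// Admitted by a concurrent release: return the bytes, do not leak.
			q.inUse -= size
			q.admitLocked()
		default:
			q.waiters.Remove(elem)
		}
		return nil, ctx.Err()
	}
}

// release gives back size bytes and wakes any waiters that now fit.
func (q *lifoQueue) release(size uint64) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.inUse -= size
	q.admitLocked()
}

// admitLocked admits waiters newest-first while they fit; q.mu must be held.
func (q *lifoQueue) admitLocked() {
	for e := q.waiters.Back(); e != nil; e = q.waiters.Back() {
		w := e.Value.(*waiter)
		if w.size > q.limit-q.inUse {
			return
		}
		q.inUse += w.size
		q.waiters.Remove(e)
		close(w.ready)
	}
}

func (q *lifoQueue) waiting() int {
	q.mu.Lock()
	defer q.mu.Unlock()
	return q.waiters.Len()
}

// demoLIFO blocks two waiters in a known order and returns admission order.
func demoLIFO() []string {
	q := &lifoQueue{limit: 10, waiters: list.New()}
	releaseHolder, _ := q.Acquire(context.Background(), 10)
	order := make(chan string, 2)
	for i, name := range []string{"first", "second"} {
		go func(name string) {
			if release, err := q.Acquire(context.Background(), 10); err == nil {
				order <- name
				release()
			}
		}(name)
		for q.waiting() != i+1 {
			time.Sleep(time.Millisecond)
		}
	}
	releaseHolder()
	return []string{<-order, <-order}
}

func main() {
	fmt.Println(demoLIFO()) // prints [second first]
}
```

The demo fills the queue, parks two waiters in a known order, and then frees the capacity; the waiter that arrived last is admitted first.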
Finally, the OTel-Arrow receiver should perform admission control Acquire() once after it computes the uncompressed size of the request, meaning it will stop using the otlp-pdata-size header. The OTel-Arrow exporter should continue to emit this header while older receivers are still in use, but it can be removed eventually.
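The single-Acquire flow can be sketched as follows. Everything here is an assumption made for illustration: gzip stands in for whatever compression the transport applies, the decompressed byte count stands in for the receiver's real size computation, and `admitter`, `recordingAdmitter`, and `handle` are hypothetical names. The recording admitter is not safe for concurrent use.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"context"
	"errors"
	"fmt"
	"io"
)

// admitter is a hypothetical, minimal admission interface.
type admitter interface {
	Acquire(ctx context.Context, size uint64) (release func(), err error)
}

// recordingAdmitter records every requested size so the demo can show that
// admission is consulted exactly once per request.
type recordingAdmitter struct {
	limit uint64
	calls []uint64
}

func (a *recordingAdmitter) Acquire(_ context.Context, size uint64) (func(), error) {
	a.calls = append(a.calls, size)
	if size > a.limit {
		return nil, errors.New("request exceeds the admission limit")
	}
	return func() {}, nil
}

// handle decompresses the request, then performs one Acquire using the
// uncompressed size. No size header and no compressed-size Acquire is used.
func handle(ctx context.Context, a admitter, compressed []byte) (int, error) {
	zr, err := gzip.NewReader(bytes.NewReader(compressed))
	if err != nil {
		return 0, err
	}
	defer zr.Close()
	payload, err := io.ReadAll(zr)
	if err != nil {
		return 0, err
	}
	release, err := a.Acquire(ctx, uint64(len(payload)))
	if err != nil {
		return 0, err
	}
	defer release()
	return len(payload), nil // the payload would be processed here
}

// demoSingleAcquire compresses 1000 bytes and returns the compressed length
// together with the sizes passed to Acquire.
func demoSingleAcquire() (compressedLen int, calls []uint64) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	zw.Write(bytes.Repeat([]byte("a"), 1000))
	zw.Close()
	a := &recordingAdmitter{limit: 4096}
	if _, err := handle(context.Background(), a, buf.Bytes()); err != nil {
		return buf.Len(), nil
	}
	return buf.Len(), a.calls
}

func main() {
	n, calls := demoSingleAcquire()
	fmt.Println(n < 1000, calls) // prints true [1000]
}
```

The compressed payload is far smaller than the 1000 bytes that are actually admitted, which is why a separate Acquire for the compressed size adds little.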
Collector version
v0.111.0
Environment information
Environment
Any.
OpenTelemetry Collector configuration
No response
Log output
No response
Additional context
No response
#### Description
Adds a no-op implementation of the BoundedQueue used by the OTel-Arrow
receiver for admission control.
#### Link to tracking issue
Part of #36074.
#### Testing
Adds a new end-to-end test that verifies admission control with waiters,
without waiters, and when unbounded. The test was taken from #36033.
#### Documentation
Added: a request_limit_mib value of "0" indicates no admission control.
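
As a rough illustration of how a limit of 0 could select a no-op queue, here is a sketch. The `Queue` interface shape, `noopQueue`, `limitedQueue`, `newQueue`, and `admits` are all hypothetical names that follow the closure-returning proposal in #36074; they are not taken from the actual change.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// Queue is an assumed interface shape for admission control.
type Queue interface {
	Acquire(ctx context.Context, size uint64) (release func(), err error)
}

// noopQueue admits everything and tracks nothing.
type noopQueue struct{}

func (noopQueue) Acquire(context.Context, uint64) (func(), error) {
	return func() {}, nil
}

// limitedQueue is a simplified bounded queue that rejects instead of waiting.
type limitedQueue struct {
	mu    sync.Mutex
	limit uint64
	inUse uint64
}

func (q *limitedQueue) Acquire(_ context.Context, size uint64) (func(), error) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if size > q.limit-q.inUse {
		return nil, errors.New("request exceeds the admission limit")
	}
	q.inUse += size
	return func() {
		q.mu.Lock()
		defer q.mu.Unlock()
		q.inUse -= size
	}, nil
}

// newQueue is a hypothetical constructor: 0 MiB selects the no-op queue.
func newQueue(requestLimitMiB uint64) Queue {
	if requestLimitMiB == 0 {
		return noopQueue{}
	}
	return &limitedQueue{limit: requestLimitMiB << 20}
}

// admits reports whether q accepts a request of the given size.
func admits(q Queue, size uint64) bool {
	release, err := q.Acquire(context.Background(), size)
	if err != nil {
		return false
	}
	release()
	return true
}

func main() {
	// A 2 MiB request against an unbounded queue and a 1 MiB queue.
	fmt.Println(admits(newQueue(0), 2<<20), admits(newQueue(1), 2<<20)) // prints true false
}
```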
Component(s)
receiver/otelarrow