Skip to content

dynamic_modules: add remote HTTP support for asyncDataSource module binary#43818

Open
kanurag94 wants to merge 6 commits intoenvoyproxy:mainfrom
kanurag94:dym-module-http
Open

dynamic_modules: add remote HTTP support for asyncDataSource module binary#43818
kanurag94 wants to merge 6 commits intoenvoyproxy:mainfrom
kanurag94:dym-module-http

Conversation

@kanurag94
Copy link
Contributor

@kanurag94 kanurag94 commented Mar 6, 2026

Commit Message: dynamic_modules: add remote HTTP support for asyncDataSource module binary
Additional Description: Extends the asyncDataSource support (added in #43333 for local.filename) to support fetching dynamic module binaries from remote HTTP sources via module.remote in DynamicModuleConfig. The module is downloaded asynchronously during listener initialization with SHA256 verification, written to a temporary file, and loaded via dlopen. If the remote fetch fails, the filter is not installed and requests pass through (fail-open).
Risk Level: Low
Testing: Added unit tests + Manually tested
Docs Changes: API docs already a part of this PR
Release Notes: Added

Signed-off-by: Anurag Aggarwal <kanurag94@gmail.com>
@repokitteh-read-only
Copy link

As a reminder, PRs marked as draft will not be automatically assigned reviewers,
or be handled by maintainer-oncall triage.

Please mark your PR as ready when you want it to be reviewed!

🐱

Caused by: #43818 was opened by kanurag94.

see: more, trace.

Signed-off-by: Anurag Aggarwal <kanurag94@gmail.com>
Signed-off-by: Anurag Aggarwal <kanurag94@gmail.com>
@kanurag94
Copy link
Contributor Author

/gemini review

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the ability to fetch dynamic modules from remote HTTP sources asynchronously. However, critical security vulnerabilities were identified in the caching and temporary file handling mechanisms. These include the use of the world-writable /tmp directory with predictable filenames and a lack of integrity verification for cached files, which could lead to local privilege escalation and arbitrary code execution. The staging file creation is also vulnerable to symlink attacks and race conditions. Furthermore, a critical use-after-free bug was found in the asynchronous callback implementation.

Comment on lines +146 to +147
[weak_state, proto_config_copy, module_config_copy, &context,
&scope](const std::string& data) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The lambda for the RemoteAsyncDataProvider captures context and scope by reference. These references point to objects that are typically allocated on the stack during the configuration loading process. Since the lambda is executed asynchronously after the remote data fetch completes, the original stack frame will be gone, leading to dangling references and a use-after-free. This is a critical issue that could lead to crashes.

To resolve this, you should avoid capturing stack variables by reference in an asynchronous callback. Instead, you should extract the required long-lived objects (e.g., from context.api(), context.mainThreadDispatcher(), etc.) and capture those. This will likely require refactoring buildFilterFactoryCallback to accept these individual components instead of the entire ServerFactoryContext.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • ServerFactoryContext is long-lived and not stack allocated.
  • stats::scope is owned by listener and is again long lived.
  • RemoteAsyncDataProvider is fired during init so both references are guaranteed valid.

This comment seems wrong.

Signed-off-by: Anurag Aggarwal <kanurag94@gmail.com>
Signed-off-by: Anurag Aggarwal <kanurag94@gmail.com>
@kanurag94
Copy link
Contributor Author

/retest

1 similar comment
@kanurag94
Copy link
Contributor Author

/retest

@kanurag94 kanurag94 changed the title dynamic_modules: asyncDataSource - http support dynamic_modules: add remote HTTP support for asyncDataSource module binary Mar 9, 2026
@kanurag94 kanurag94 marked this pull request as ready for review March 9, 2026 03:38
@repokitteh-read-only
Copy link

CC @envoyproxy/coverage-shephards: FYI only for changes made to (test/coverage.yaml).
envoyproxy/coverage-shephards assignee is @RyanTheOptimist
CC @envoyproxy/api-shepherds: Your approval is needed for changes made to (api/envoy/|docs/root/api-docs/).
envoyproxy/api-shepherds assignee is @wbpcode
CC @envoyproxy/api-watchers: FYI only for changes made to (api/envoy/|docs/root/api-docs/).

🐱

Caused by: #43818 was ready_for_review by kanurag94.

see: more, trace.

source/extensions/filters/http/cache: 95.9
source/extensions/filters/http/dynamic_forward_proxy: 94.8
source/extensions/filters/http/dynamic_modules: 95.2
source/extensions/filters/http/dynamic_modules: 95.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really have to decrease the coverage here? is there no way to have an integration test to cover everything?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The coverage drop comes mainly from defensive paths:

  • mkstemp() failure -- don't know how clearly but probably a exhaustion of FDs should only trigger this.
  • write() failure -- needs interrupt signals or disk permission issues
  • std::filesystem::permissions/rename failure -- needs mocking read only permissions.
  • weak_ptr expiry -- needs a race condition to trigger.

They'd need exceptional situations to happen. All of them are very tought for UTs so I lowered the coverage instead.

Writing an integration test is quite hard for this one, if the approach looks good -- I can give it a try.

I have tried to increase coverage back with some possible paths that can be tested.

Signed-off-by: Anurag Aggarwal <kanurag94@gmail.com>
cb_or_error.status().message());
return;
}
state->filter_factory_cb = cb_or_error.value();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the following lambda will be called at any threads, I think there may result in unexpected contention? here?

[async_state](Http::FilterChainFactoryCallbacks& callbacks) -> void {
    if (async_state->filter_factory_cb) {
      async_state->filter_factory_cb(callbacks);
    }
  };

As the first version, to simplify the review, we I think we and use a lock and left a TODO here. Then, we can resolve the lock with another PR. cc @agrawroh WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait, the remote fetching will pause the warming...so, it may be safe

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants