Skip to content

API shutdown doesn't await provider shutdowns #1745

@nicklasl

Description

@nicklasl

I've discovered an issue with provider cleanup when using OpenFeatureAPI.shutdown() that's causing telemetry data loss in our provider's shutdown phase.

The Problem:

When OpenFeatureAPI.getInstance().shutdown() is called, our provider's shutdown() method either doesn't execute or doesn't complete, resulting in telemetry data loss. However, calling provider.shutdown() directly or OpenFeatureAPI.getInstance().getProvider().shutdown() works perfectly.

Looking at ProviderRepository.java:

private void shutdownProvider(FeatureProvider provider) {
    taskExecutor.submit(() -> {
        try {
            provider.shutdown();
        } catch (Exception e) {
            log.error("Exception when shutting down feature provider {}",
                provider.getClass().getName(), e);
        }
    });
}

public void shutdown() {
    Stream.concat(Stream.of(this.defaultStateManger.get()), this.stateManagers.values().stream())
            .distinct()
            .forEach(this::shutdownProvider);
    this.stateManagers.clear();
    taskExecutor.shutdown();
}

The provider shutdown is submitted to a thread executor but never awaited. By comparison, EventSupport.shutdown() was updated to include awaitTermination(), but ProviderRepository wasn't.

Proposed Fix:

Add awaitTermination() to ProviderRepository.shutdown() similar to what was done for EventSupport:

public void shutdown() {
    // ... shutdown providers currently in stateManagers.values()
    taskExecutor.shutdown();
    try {
        if (!taskExecutor.awaitTermination(3, TimeUnit.SECONDS)) {
            log.warn("Provider shutdown did not complete within timeout");
            taskExecutor.shutdownNow();
        }
    } catch (InterruptedException e) {
        taskExecutor.shutdownNow();
        Thread.currentThread().interrupt();
    }
}

Open questions/Problems that needs to be handled:

  • We likely still want to shutdown the providers in parallell.
  • We might get undefined behaviour when adding providers to the sdk while the shutdown is in progress (the newly added provider might get shutdown too, or it might not get shutdown. We use concurrent data structures without locks here) -- This should already be a problem but it awaiting provider shutdowns might increase the risk for a race.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions