-
Notifications
You must be signed in to change notification settings - Fork 51
Description
I've discovered an issue with provider cleanup when using OpenFeatureAPI.shutdown() that's causing telemetry data loss in our provider's shutdown phase.
The Problem:
When OpenFeatureAPI.getInstance().shutdown() is called, our provider's shutdown() method either doesn't execute or doesn't complete, resulting in telemetry data loss. However, calling provider.shutdown() directly or OpenFeatureAPI.getInstance().getProvider().shutdown() works perfectly.
Looking at ProviderRepository.java:
private void shutdownProvider(FeatureProvider provider) {
taskExecutor.submit(() -> {
try {
provider.shutdown();
} catch (Exception e) {
log.error("Exception when shutting down feature provider {}",
provider.getClass().getName(), e);
}
});
}
public void shutdown() {
Stream.concat(Stream.of(this.defaultStateManger.get()), this.stateManagers.values().stream())
.distinct()
.forEach(this::shutdownProvider);
this.stateManagers.clear();
taskExecutor.shutdown();
}The provider shutdown is submitted to a thread executor but never awaited. By comparison, EventSupport.shutdown() was updated to include awaitTermination(), but ProviderRepository wasn't.
Proposed Fix:
Add awaitTermination() to ProviderRepository.shutdown() similar to what was done for EventSupport:
public void shutdown() {
// ... shutdown providers currently in stateManagers.values()
taskExecutor.shutdown();
try {
if (!taskExecutor.awaitTermination(3, TimeUnit.SECONDS)) {
log.warn("Provider shutdown did not complete within timeout");
taskExecutor.shutdownNow();
}
} catch (InterruptedException e) {
taskExecutor.shutdownNow();
Thread.currentThread().interrupt();
}
}Open questions/Problems that needs to be handled:
- We likely still want to shutdown the providers in parallell.
- We might get undefined behaviour when adding providers to the sdk while the shutdown is in progress (the newly added provider might get shutdown too, or it might not get shutdown. We use concurrent data structures without locks here) -- This should already be a problem but it awaiting provider shutdowns might increase the risk for a race.