Skip to content

Conversation

@OliverGilan
Copy link

@OliverGilan OliverGilan commented Dec 13, 2025

This PR refactors the token-based authentication for the Postgres integration to support automatic, thread-safe token refreshing for AWS IAM and Azure Managed Identity. This change fixes a bug where connections would fail upon token expiration by proactively managing the token lifecycle within the connection pool.

The core of the change is a new TokenProvider abstract class, which centralizes token fetching, caching, and refreshing logic. Concrete implementations for AWSTokenProvider and AzureTokenProvider encapsulate the platform-specific details.

A new TokenAwareConnection class is introduced, which utilizes a TokenProvider to retrieve a valid token for the password just before establishing a connection. The ConnectionPoolManager has been updated to manage these providers, ensuring that connections remain resilient to token expiration.

Finally, comprehensive unit and integration tests have been added for the new token provider mechanism, validating its caching, refresh logic, and thread safety.

Description generated by Mesa. Update settings

…gres integration (DataDog#21503)

* Update connection pool to handle cloud provider token refreshing

* run format

* Fix kwarg usernname

* fixup

* Add token provider tests

* Lint

* Add changelog

---------

Co-authored-by: Seth Samuel <[email protected]>
@OliverGilan OliverGilan marked this pull request as ready for review December 13, 2025 01:34
@OliverGilan
Copy link
Author

/describe

Copy link

@mesa-dot-dev mesa-dot-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performed full review of 32e0fdf...63d2d29

Analysis

  1. The TokenAwareConnection implementation uses a class-level token_provider attribute, creating global state that will cause conflicts in multi-instance deployments where the last-configured provider overwrites others.

  2. The Azure token provider incorrectly expects an expires_at field instead of expires_on on Azure's AccessToken objects, which will cause runtime errors in production as the tests aren't aligned with real SDK behavior.

  3. While the token provider abstraction is architecturally sound with good separation of concerns, these implementation issues compromise the reliability of the authentication mechanism across different database configurations.

Tip

Help

Slash Commands:

  • /review - Request a full code review
  • /review latest - Review only changes since the last review
  • /describe - Generate PR description. This will update the PR body or issue comment depending on your configuration
  • /help - Get help with Mesa commands and configuration options

5 files reviewed | 3 comments | Edit Agent SettingsRead Docs

"""Test Azure token fetching with custom scope."""
mock_token = Mock()
mock_token.token = "azure_token_123"
mock_token.expires_at = 1900.0
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Medium

This test mock uses expires_at attribute which doesn't exist on Azure's AccessToken. The actual AccessToken object has an expires_on attribute. This incorrect mock masks the bug in the implementation and should be changed to expires_on to match the real Azure SDK behavior.

Agent: 🤖 Integrations reviewer • Fix in Cursor • Fix in Claude

Prompt for Agent
Task: Address review feedback left on GitHub.
Repository: mesa-dot-dev/integrations-core#5
File: postgres/tests/test_token_provider.py#L198
Action: Open this file location in your editor, inspect the highlighted code, and resolve the issue described below.

Feedback:
This test mock uses `expires_at` attribute which doesn't exist on Azure's AccessToken. The actual AccessToken object has an `expires_on` attribute. This incorrect mock masks the bug in the implementation and should be changed to `expires_on` to match the real Azure SDK behavior.

from .azure import generate_managed_identity_token

token = generate_managed_identity_token(client_id=self.client_id, identity_scope=self.identity_scope)
return token.token, float(token.expires_at)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High

Azure's AccessToken object uses expires_on attribute, not expires_at. This will cause an AttributeError at runtime when the AzureTokenProvider attempts to access token.expires_at. The correct attribute is expires_on which contains the expiration time as seconds since epoch.

Agent: 🤖 Integrations reviewer • Fix in Cursor • Fix in Claude

Prompt for Agent
Task: Address review feedback left on GitHub.
Repository: mesa-dot-dev/integrations-core#5
File: postgres/datadog_checks/postgres/connection_pool.py#L92
Action: Open this file location in your editor, inspect the highlighted code, and resolve the issue described below.

Feedback:
Azure's AccessToken object uses `expires_on` attribute, not `expires_at`. This will cause an AttributeError at runtime when the AzureTokenProvider attempts to access `token.expires_at`. The correct attribute is `expires_on` which contains the expiration time as seconds since epoch.

"open": True,
}

TokenAwareConnection.token_provider = self.token_provider
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High

This stores the token provider as a class-level attribute, which creates a global state shared across all PostgreSQL check instances in the Agent process. If multiple PostgreSQL checks are configured (e.g., different hosts, or some with managed auth and others without), the last initialized check will overwrite this attribute, causing incorrect authentication behavior for other checks. The token provider should be scoped to the specific connection pool instance rather than the class.

Agent: 🤖 Integrations reviewer • Fix in Cursor • Fix in Claude

Prompt for Agent
Task: Address review feedback left on GitHub.
Repository: mesa-dot-dev/integrations-core#5
File: postgres/datadog_checks/postgres/connection_pool.py#L210
Action: Open this file location in your editor, inspect the highlighted code, and resolve the issue described below.

Feedback:
This stores the token provider as a class-level attribute, which creates a global state shared across all PostgreSQL check instances in the Agent process. If multiple PostgreSQL checks are configured (e.g., different hosts, or some with managed auth and others without), the last initialized check will overwrite this attribute, causing incorrect authentication behavior for other checks. The token provider should be scoped to the specific connection pool instance rather than the class.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants