feat(GAQ): Add Redis Sentinel Support for Global Async Queries #29912

Merged
merged 8 commits into from
Aug 30, 2024

nsivarajan
Contributor

SUMMARY

This PR introduces a feature that allows the Async Query Manager to use a configured cache backend through the new GLOBAL_ASYNC_QUERIES_CACHE_BACKEND setting. To maintain backward compatibility, the existing GLOBAL_ASYNC_QUERIES_REDIS_CONFIG setting is still supported but will be deprecated in the future. Additionally, this update introduces support for Redis Sentinel caching, which eliminates the single point of failure associated with the previous standalone Redis configuration.

This PR is based on the idea discussed in #28839.

TESTING INSTRUCTIONS

Configure GLOBAL_ASYNC_QUERIES_CACHE_BACKEND in config.py or superset_config.py with the appropriate properties:
Set CACHE_TYPE to "RedisCache" for Redis cache backend.
Set CACHE_TYPE to "RedisSentinelCache" for Redis Sentinel cache backend.
Set CACHE_TYPE to None to fall back on the previous GLOBAL_ASYNC_QUERIES_REDIS_CONFIG.
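
For illustration, a Sentinel-backed setup in superset_config.py might look like the sketch below. Only GLOBAL_ASYNC_QUERIES_CACHE_BACKEND and the CACHE_TYPE values come from this description; the other keys follow Flask-Caching's CACHE_REDIS_* naming and are assumptions here, as are the host names, ports, and the "mymaster" service name.

```python
# superset_config.py (sketch; every value below is a placeholder)
GLOBAL_ASYNC_QUERIES_CACHE_BACKEND = {
    # "RedisCache" for standalone Redis, "RedisSentinelCache" for Sentinel
    "CACHE_TYPE": "RedisSentinelCache",
    # Assumed Flask-Caching-style keys; check the merged config.py for exact names
    "CACHE_REDIS_SENTINELS": [
        ("sentinel-1.example.com", 26379),
        ("sentinel-2.example.com", 26379),
    ],
    "CACHE_REDIS_SENTINEL_MASTER": "mymaster",
    "CACHE_REDIS_DB": 0,
    "CACHE_DEFAULT_TIMEOUT": 300,
}
```

Listing more than one Sentinel address is what lets a client still discover the current master when a single Sentinel node is unreachable.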

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@nsivarajan nsivarajan changed the title Feat (GAQ) : Add Redis Sentinel Support for Global Async Queries feat (GAQ) : Add Redis Sentinel Support for Global Async Queries Aug 10, 2024
@nsivarajan nsivarajan changed the title feat (GAQ) : Add Redis Sentinel Support for Global Async Queries feat : (GAQ) Add Redis Sentinel Support for Global Async Queries Aug 10, 2024
@nsivarajan nsivarajan changed the title feat : (GAQ) Add Redis Sentinel Support for Global Async Queries feat(GAQ): Add Redis Sentinel Support for Global Async Queries Aug 10, 2024

codecov bot commented Aug 10, 2024

Codecov Report

Attention: Patch coverage is 67.94872% with 25 lines in your changes missing coverage. Please review.

Project coverage is 83.64%. Comparing base (76d897e) to head (4cc558a).
Report is 646 commits behind head on master.

Files with missing lines                       Patch %   Lines
superset/async_events/cache_backend.py         59.45%    15 Missing ⚠️
superset/async_events/async_query_manager.py   69.69%    10 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master   #29912       +/-   ##
===========================================
+ Coverage   60.48%   83.64%   +23.15%     
===========================================
  Files        1931      529     -1402     
  Lines       76236    38268    -37968     
  Branches     8568        0     -8568     
===========================================
- Hits        46114    32009    -14105     
+ Misses      28017     6259    -21758     
+ Partials     2105        0     -2105     
Flag         Coverage Δ
hive         48.91% <43.58%> (-0.26%) ⬇️
javascript   ?
mysql        76.73% <67.94%> (?)
postgres     76.80% <67.94%> (?)
presto       53.49% <61.53%> (-0.31%) ⬇️
python       83.64% <67.94%> (+20.15%) ⬆️
sqlite       76.27% <67.94%> (?)
unit         60.25% <44.87%> (+2.63%) ⬆️

@nsivarajan nsivarajan marked this pull request as ready for review August 10, 2024 20:39
@dosubot dosubot bot added the global:async-query Related to Async Queries feature label Aug 10, 2024
@nsivarajan
Contributor Author

Hi @villebro, could you please review this PR? It is based on the idea discussed in #28839.

However, I recently noticed that there is a new SIP in the design phase to modernize the framework around the same topic: #29839. I'm not sure if this PR will still add value before the new proposal is rolled out, but I would appreciate your feedback.

@villebro
Member

@nsivarajan thanks for the PR - I think this PR is welcomed even if there's ongoing work to modernize the async task framework. Looks generally good, but let me do a proper review shortly (either today or tomorrow)

@nsivarajan
Contributor Author

@villebro Thanks for the feedback! Looking forward to your review.

@nsivarajan
Contributor Author

Hi @villebro,

I wanted to follow up on this PR. I understand you're busy, but I was hoping to get your review when you have a moment. Please let me know if there's any additional information I can provide or if there are any specific areas you'd like me to address.

Thanks again for your time and feedback!

@rusackas
Member

@nsivarajan thank you for your patience in the meantime... good to see that CI is passing and there's no conflicts still, at least! :D

Member

@villebro villebro left a comment

Thanks so much @nsivarajan for this beautiful PR (the test coverage here is great)! Sorry for the long review time - there was a lot to take in, and I may need to do a new round after I get some feedback on these comments. Please let me know what you think, and I'll do my best to do a second (and hopefully final) pass shortly.

@@ -16,18 +16,27 @@
 # under the License.
 import logging
 import uuid
-from typing import Any, Literal, Optional
+from typing import Any, Dict, Literal, Optional, Union

Member

@villebro villebro Aug 12, 2024

nit: we can now use dict and | in type annotations as long as we do the following import at the beginning:

from __future__ import annotations
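
A minimal sketch of what that import enables: with it in place, annotations are not evaluated at definition time, so built-in generics such as dict[str, Any] and unions written with | can be used instead of Dict, Optional, and Union. The lookup function is hypothetical and exists only to carry the annotations.

```python
from __future__ import annotations  # must be the first statement in the module

from typing import Any


def lookup(config: dict[str, Any], key: str) -> str | None:
    """Return the value as a string, or None when the key is absent."""
    value = config.get(key)
    return None if value is None else str(value)
```

Here `str | None` stands in for `Optional[str]`, and `dict[str, Any]` for `Dict[str, Any]`.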

Contributor Author

Thank you for pointing that out. I'll make sure to update the code accordingly.

Comment on lines +266 to +280
# Decode bytes to strings, decode_responses is not supported at RedisCache and RedisSentinelCache
if isinstance(self._cache, (RedisSentinelCacheBackend, RedisCacheBackend)):
    decoded_results = [
        (
            event_id.decode("utf-8"),
            {
                key.decode("utf-8"): value.decode("utf-8")
                for key, value in event_data.items()
            },
        )
        for event_id, event_data in results
    ]
    return (
        [] if not decoded_results else list(map(parse_event, decoded_results))
    )

Member

I'm not sure this part was mentioned in the description. Can you elaborate on the need for this change?

Contributor Author

RedisCache and RedisSentinelCache typically don't support built-in decoding like decode_responses. They return data in bytes, which needs to be manually decoded into the desired formats (e.g., strings).

https://github.com/pallets-eco/flask-caching/blob/master/src/flask_caching/backends/rediscache.py#L155
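
As a rough, self-contained illustration of that manual decoding step, the sketch below converts stream entries shaped like an undecoded XRANGE reply, a list of (id, {field: value}) pairs of bytes, into strings. The helper name decode_stream_entries and the sample entry are made up for this example.

```python
def decode_stream_entries(results):
    """Convert (id, {field: value}) pairs from bytes to str."""
    return [
        (
            event_id.decode("utf-8"),
            {
                key.decode("utf-8"): value.decode("utf-8")
                for key, value in fields.items()
            },
        )
        for event_id, fields in results
    ]


# Shape of a stream reply when the client does not decode responses
raw = [(b"1724990000000-0", {b"data": b'{"status": "done"}'})]
decoded = decode_stream_entries(raw)
```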

Member

Got it, I hadn't noticed we were using redis.Redis, I thought we were using RedisCache from flask_caching here. Thanks for clarifying.

Comment on lines 38 to 44
# Define the cache backends once as mocks
cache_backends = {
    "RedisCacheBackend": mock.Mock(spec=RedisCacheBackend),
    "RedisSentinelCacheBackend": mock.Mock(spec=RedisSentinelCacheBackend),
    "redis.Redis": mock.Mock(spec=redis.Redis),
}
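
For context on why these mocks pass spec=, here is a small standard-library sketch. A mock created with a spec passes isinstance() checks against that class and rejects attributes the class does not define. StandInBackend is a hypothetical class used only for this example, not one of the backends in this PR.

```python
from unittest import mock


class StandInBackend:
    """Hypothetical stand-in for a cache backend class."""

    def xadd(self, stream_name, event_data):
        raise NotImplementedError


backend = mock.Mock(spec=StandInBackend)

# Allowed: xadd exists on the spec, and the call is recorded
backend.xadd("stream", {"k": "v"})
backend.xadd.assert_called_once_with("stream", {"k": "v"})

# Rejected: xdel is not defined on the spec class
try:
    backend.xdel("stream")
except AttributeError:
    rejected = True
else:
    rejected = False
```

The isinstance() behaviour matters for code that branches on the backend type, as the decoding path reviewed above does.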

Member

Could this loop be replaced by pytest.mark.parametrize? It feels more idiomatic (unless I'm missing some reason why it needs to be implemented like this..)

Contributor Author

Yes, that's a great suggestion. I'll update the code with pytest.mark.parametrize and commit the changes.

Contributor Author

Using pytest.mark.parametrize with mock from unittest caused compatibility issues. Switching to parameterized.expand resolved these issues and successfully covered the test cases, leading to a successful CI run. I’ll revisit this case later for further investigation.

Member

@nsivarajan I believe it's generally suggested to use pytest_mock with pytest test cases. I don't want to drag out this review process more than necessary, but I think the cleanest solution would be one with pytest.mark.parametrize and pytest_mock. LMKWYT, but in the meantime let me re-review the functional parts.
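
One possible shape for such a test, as a sketch only: it assumes a plain test function rather than a unittest.TestCase subclass, uses unittest.mock instead of the pytest_mock fixture to stay self-contained, and every class and function name below is a hypothetical stand-in rather than code from this PR.

```python
from unittest import mock

import pytest


class FakeRedisCacheBackend:
    """Hypothetical stand-in for a standalone Redis backend."""

    def xadd(self, stream_name, event_data, event_id="*", maxlen=None):
        raise NotImplementedError


class FakeRedisSentinelCacheBackend(FakeRedisCacheBackend):
    """Hypothetical stand-in for a Sentinel-backed backend."""


def publish_event(cache, stream_name, payload):
    """Toy function under test: append one event to a stream."""
    cache.xadd(stream_name, payload, "*", 1000)


@pytest.mark.parametrize(
    "backend_cls", [FakeRedisCacheBackend, FakeRedisSentinelCacheBackend]
)
def test_publish_event_calls_xadd(backend_cls):
    # One spec'd mock per parameter replaces a shared dict of mocks
    cache = mock.Mock(spec=backend_cls)
    publish_event(cache, "async-events-1", {"data": "{}"})
    cache.xadd.assert_called_once_with("async-events-1", {"data": "{}"}, "*", 1000)
```

pytest collects the function once per backend class, so each case gets a fresh mock.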

Contributor Author

Thanks, @villebro, for the review. I agree that using the standard approach is ideal. However, I've encountered continuous failures when combining pytest.mark.parametrize with mock, which suggests there might be compatibility issues. After some research, I found references that indicate potential conflicts when using pytest features with unittest.TestCase subclasses and mock (see pytest documentation, SeleniumBase issue, and pytest issue).

I noticed that our repository already uses parameterized.expand in similar cases, so I opted for it here to ensure the tests run smoothly.

Would it be possible to proceed with this PR using parameterized.expand? I'll continue investigating the issues with pytest.mark.parametrize and mock and plan to submit a follow-up PR if I find a viable solution.

Member

Thanks for putting in the extra effort on these tests @nsivarajan ! Let's not let this derail your PR - I can also take a stab at migrating the remaining components to pytest (seems like a fun coffee break challenge).

Member

@villebro villebro left a comment

A few minor observations during second review round

Comment on lines 46 to 52
# Define the cache backends once
cache_backends = {
    "RedisCacheBackend": mock.Mock(spec=RedisCacheBackend),
    "RedisSentinelCacheBackend": mock.Mock(spec=RedisSentinelCacheBackend),
    "redis.Redis": mock.Mock(spec=redis.Redis),
}

Member

I believe these are now redundant as we have the parametrized tests?

Suggested change
# Define the cache backends once
cache_backends = {
    "RedisCacheBackend": mock.Mock(spec=RedisCacheBackend),
    "RedisSentinelCacheBackend": mock.Mock(spec=RedisSentinelCacheBackend),
    "redis.Redis": mock.Mock(spec=redis.Redis),
}
# Define the cache backends once
cache_backends = {
    "RedisCacheBackend": mock.Mock(spec=RedisCacheBackend),
    "RedisSentinelCacheBackend": mock.Mock(spec=RedisSentinelCacheBackend),
    "redis.Redis": mock.Mock(spec=redis.Redis),
}

Contributor Author

Thanks, I'll get this removed.

Contributor Author

Handled in last commit

Comment on lines 38 to 44
# Define the cache backends once as mocks
# cache_backends = {
# "RedisCacheBackend": mock.Mock(spec=RedisCacheBackend),
# "RedisSentinelCacheBackend": mock.Mock(spec=RedisSentinelCacheBackend),
# "redis.Redis": mock.Mock(spec=redis.Redis),
# }

Member

Same here:

Suggested change
# Define the cache backends once as mocks
# cache_backends = {
# "RedisCacheBackend": mock.Mock(spec=RedisCacheBackend),
# "RedisSentinelCacheBackend": mock.Mock(spec=RedisSentinelCacheBackend),
# "redis.Redis": mock.Mock(spec=redis.Redis),
# }

Contributor Author

Thanks, I'll get this removed.

Contributor Author

Updated in last commit

Member

@villebro villebro left a comment

LGTM, thanks for all the hard work here! I've left a few extra type annotation tips, but they're not blocking (we've already done enough iterating here). I'll add a new task to our 5.0 project board to remove the old config. If you feel up for it, please open a PR with the change that removes the old config option so we can merge it when the 5.0 breaking window is opened up.


def get_cache_backend(
    config: dict[str, Any],
) -> Union[RedisCacheBackend, RedisSentinelCacheBackend, redis.Redis]:  # type: ignore

Member

nit: the new type annotations support this simpler format:

Suggested change
) -> Union[RedisCacheBackend, RedisSentinelCacheBackend, redis.Redis]: # type: ignore
) -> RedisCacheBackend | RedisSentinelCacheBackend | redis.Redis: # type: ignore
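
To show the | return annotation on a complete function, here is a hedged sketch of a factory that dispatches on CACHE_TYPE as the PR description outlines. It is not the merged implementation: the three classes are hypothetical stand-ins, pick_backend is an invented name, and raising ValueError for an unknown type is an assumption of this sketch.

```python
from __future__ import annotations

from typing import Any


class StandaloneBackend:
    """Hypothetical stand-in for RedisCacheBackend."""


class SentinelBackend:
    """Hypothetical stand-in for RedisSentinelCacheBackend."""


class LegacyClient:
    """Hypothetical stand-in for a client built from the older setting."""


def pick_backend(
    config: dict[str, Any],
) -> StandaloneBackend | SentinelBackend | LegacyClient:
    cache_config = config.get("GLOBAL_ASYNC_QUERIES_CACHE_BACKEND") or {}
    cache_type = cache_config.get("CACHE_TYPE")
    if cache_type == "RedisCache":
        return StandaloneBackend()
    if cache_type == "RedisSentinelCache":
        return SentinelBackend()
    if cache_type is None:
        # Fall back to the older GLOBAL_ASYNC_QUERIES_REDIS_CONFIG path
        return LegacyClient()
    # Assumption of this sketch: reject anything else explicitly
    raise ValueError(f"Unsupported CACHE_TYPE: {cache_type!r}")
```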

@@ -73,7 +103,7 @@ class AsyncQueryManager:

    def __init__(self) -> None:
        super().__init__()
        self._redis: redis.Redis  # type: ignore
        self._cache: Optional[BaseCache] = None

Member

Same here:

Suggested change
self._cache: Optional[BaseCache] = None
self._cache: BaseCache | None = None

@villebro villebro merged commit 103cd3d into apache:master Aug 30, 2024
38 of 39 checks passed
@nsivarajan
Contributor Author

Thanks, @villebro, for the feedback and type annotation tips!

I appreciate your guidance throughout this process. I'll go ahead and open a new PR to remove the old config option and will reference the task once it's created on the project board, so it’ll be ready when the 5.0 breaking window opens up. Looking forward to wrapping this up and moving on to the next steps!

@villebro
Member

Thanks @nsivarajan, I really appreciate the help! Here's the project card: https://github.com/orgs/apache/projects/345?pane=issue&itemId=78081192

Labels
global:async-query Related to Async Queries feature size/L
3 participants