[FEATURE] Add abstraction to Async API for using Spark #3252

Open
normanj-bitquill opened this issue Jan 16, 2025 · 1 comment
Labels
enhancement New feature or request

Comments

@normanj-bitquill
Contributor

Currently the async API is designed to use the AWS EMR service when it needs Spark to process a query. For example, the EMRServerlessClientFactory class is used for getting an instance of AWSEMRServerless.

https://github.com/opensearch-project/sql/blob/main/async-query/src/main/java/org/opensearch/sql/spark/transport/config/AsyncExecutorServiceModule.java#L250

https://github.com/opensearch-project/sql/blob/main/async-query-core/src/main/java/org/opensearch/sql/spark/client/EMRServerlessClientFactoryImpl.java#L62

There should be an abstraction here so that the async API uses an abstract service to start and manage Spark jobs. EMR would be one implementation of this abstract service. Another possible implementation could use Docker.

Is your feature request related to a problem?
No

What solution would you like?
An abstract service for managing Spark jobs, along with an EMR implementation of this service.
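To make the request concrete, here is a minimal sketch of what such an abstraction might look like. Every name in it (`SparkJobService`, `SparkJobState`, `InMemorySparkJobService`, and the method signatures) is hypothetical and does not exist in the current code base. The in-memory class only stands in for a real backend to show the shape of the contract; an EMR or Docker implementation would implement the same interface and call its own client instead.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical lifecycle states for a Spark job (assumed, not from the code base). */
enum SparkJobState { SUBMITTED, RUNNING, SUCCEEDED, FAILED, CANCELLED }

/** Hypothetical backend-neutral contract the async API would depend on. */
interface SparkJobService {
  /** Submits a query for execution and returns an opaque job id. */
  String startJob(String query);

  /** Returns the current state of a previously started job. */
  SparkJobState getJobState(String jobId);

  /** Requests cancellation of a previously started job. */
  void cancelJob(String jobId);
}

/**
 * Illustrative stand-in backend that only tracks job state in memory.
 * A real implementation (EMR, Docker, ...) would delegate to its own client here.
 */
final class InMemorySparkJobService implements SparkJobService {
  private final Map<String, SparkJobState> jobs = new ConcurrentHashMap<>();

  @Override
  public String startJob(String query) {
    String jobId = UUID.randomUUID().toString();
    jobs.put(jobId, SparkJobState.SUBMITTED);
    return jobId;
  }

  @Override
  public SparkJobState getJobState(String jobId) {
    SparkJobState state = jobs.get(jobId);
    if (state == null) {
      throw new IllegalArgumentException("Unknown job id: " + jobId);
    }
    return state;
  }

  @Override
  public void cancelJob(String jobId) {
    // replace() returns null when the key is absent, i.e. the job was never started.
    if (jobs.replace(jobId, SparkJobState.CANCELLED) == null) {
      throw new IllegalArgumentException("Unknown job id: " + jobId);
    }
  }
}
```

With a contract along these lines, the async API would hold a `SparkJobService` reference and the concrete backend would be chosen at wiring time, rather than by swapping a Jar file.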

What alternatives have you considered?
None, open to discussion.

Do you have any additional context?
In this OpenSearch Spark PR, the aws-java-sdk-emrserverless JAR file is updated to replace the AWSEMRServerless implementation. There should be a cleaner way to replace usage of the EMR service.

@penghuo
Collaborator

penghuo commented Jan 28, 2025

@normanj-bitquill if opensearch-project/opensearch-spark#1003 is merged, do we still need this interface? My understanding is that it is only for testing purposes.

3 participants