[<Ray component: Data] - inconsistent URL handling in Ray's Databricks integration #49925

Open
leibovitzgil opened this issue Jan 17, 2025 · 0 comments · May be fixed by #49926
Labels
bug: Something that is supposed to be working; but isn't
data: Ray Data-related issues
triage: Needs triage (eg: priority, bug/not-bug, and owning component)

Comments

@leibovitzgil

What happened + What you expected to happen

The current Databricks integration in Ray Data requires the Databricks host URL to be provided without the "https://" prefix. This breaks when Ray Data is used alongside MLflow: MLflow's Databricks integration reads the same DATABRICKS_HOST environment variable but expects the URL to include the "https://" prefix, so no single value of the variable satisfies both libraries.
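To make the mismatch concrete, here is a minimal sketch of how a host value could be normalized so that either form is accepted. The helper name `split_databricks_host` and the example host are hypothetical and used only for illustration; this is not Ray's or MLflow's actual code, and not necessarily what the linked pull request does.

```python
def split_databricks_host(host: str) -> tuple[str, str]:
    """Return (bare_host, https_url) for a host given with or without a scheme.

    Hypothetical helper: bare_host is the form Ray Data currently expects,
    https_url is the form MLflow expects.
    """
    bare = host.strip()
    # Drop a leading scheme if one is present (case-insensitive).
    for prefix in ("https://", "http://"):
        if bare.lower().startswith(prefix):
            bare = bare[len(prefix):]
            break
    # Drop any trailing slash so both forms are canonical.
    bare = bare.rstrip("/")
    return bare, f"https://{bare}"


# Both spellings of the same (made-up) workspace host normalize identically.
bare_host, https_url = split_databricks_host("https://example.cloud.databricks.com/")
```

With normalization like this inside the integration, DATABRICKS_HOST could keep the "https://" form that MLflow expects while Ray Data derives the bare host on its own.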

Versions / Dependencies

ray[data]==2.40.0
mlflow==2.10.2

Reproduction script

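import mlflow
import torch
from ray.data import read_databricks_tables


class SomeModel:
    def __init__(self):
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        fr_model_path = "models:/SOME_ML_FLOW_PATH"
        # MLflow resolves the model through DATABRICKS_HOST (expects "https://").
        self.model = mlflow.pytorch.load_model(model_uri=fr_model_path)
        self.model.to(self.device)
        self.model.eval()

    def __call__(self, row):
        # Pass-through placeholder; the inference logic is omitted from this report.
        return row


# Ray Data reads the same DATABRICKS_HOST (expects no "https://" prefix).
ds = read_databricks_tables(
    warehouse_id="xxxx",
    catalog="xxx",
    schema="xxx",
    query=SOME_SQL_QUERY,  # placeholder for the actual SQL string
)

df = ds.map(SomeModel, concurrency=5)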

Issue Severity

High: It blocks me from completing my task.

leibovitzgil added the bug and triage labels on Jan 17, 2025
leibovitzgil linked a pull request on Jan 17, 2025 that will close this issue (8 tasks)
jcotant1 added the data label on Jan 21, 2025