
[BUG] databricks-aws/databricks-azure COMMAND does not work with latest Databricks v0.2.x #609

Closed
NvTimLiu opened this issue Oct 6, 2023 · 2 comments · Fixed by #614
Labels: bug (Something isn't working)

Comments

NvTimLiu (Collaborator) commented on Oct 6, 2023

Describe the bug
Following the README (https://github.com/NVIDIA/spark-rapids-tools/blob/main/user_tools/docs/user-tools-databricks-aws.md), run the rapids tools command:

spark_rapids_user_tools databricks-aws profiling --eventlogs /tmp/eventlogs --gpu_cluster 'test-aws-12.2' --tools_jar ./rapids-4-spark-tools_2.12-23.08.2-SNAPSHOT.jar

The command fails with the following error:

ERROR root: Profiling. Raised an error in phase [Process-Arguments]
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/spark_rapids_pytools/rapids/rapids_tool.py", line 108, in wrapper
    func_cb(self, *args, **kwargs)  # pylint: disable=not-callable
  File "/usr/local/lib/python3.8/dist-packages/spark_rapids_pytools/rapids/rapids_tool.py", line 152, in _process_arguments
    self._process_custom_args()
  File "/usr/local/lib/python3.8/dist-packages/spark_rapids_pytools/rapids/profiling.py", line 62, in _process_custom_args
    self._process_offline_cluster_args()
  File "/usr/local/lib/python3.8/dist-packages/spark_rapids_pytools/rapids/profiling.py", line 69, in _process_offline_cluster_args
    if self._process_gpu_cluster_args(offline_cluster_opts):
  File "/usr/local/lib/python3.8/dist-packages/spark_rapids_pytools/rapids/profiling.py", line 81, in _process_gpu_cluster_args
    gpu_cluster_obj = self._create_migration_cluster('GPU', gpu_cluster_arg)
  File "/usr/local/lib/python3.8/dist-packages/spark_rapids_pytools/rapids/rapids_tool.py", line 629, in _create_migration_cluster
    cluster_obj = self.ctxt.platform.connect_cluster_by_name(cluster_arg)
  File "/usr/local/lib/python3.8/dist-packages/spark_rapids_pytools/cloud_api/sp_types.py", line 794, in connect_cluster_by_name
    cluster_props = self.cli.pull_cluster_props_by_args(args={'cluster': cluster})
  File "/usr/local/lib/python3.8/dist-packages/spark_rapids_pytools/cloud_api/databricks_aws.py", line 119, in pull_cluster_props_by_args
    cluster_described = self.run_sys_cmd(get_cluster_cmd)
  File "/usr/local/lib/python3.8/dist-packages/spark_rapids_pytools/cloud_api/sp_types.py", line 464, in run_sys_cmd
    return sys_cmd.exec()
  File "/usr/local/lib/python3.8/dist-packages/spark_rapids_pytools/common/utilities.py", line 363, in exec
    raise RuntimeError(f'{cmd_err_msg}')
RuntimeError: Error invoking CMD <databricks clusters get --profile DEFAULT --cluster-name tim-user-tools-profiling-aws-
	| Error: unknown flag: --cluster-name
	| 
	| Usage:
	|   databricks clusters get CLUSTER_ID [flags]
	| 
	| Flags:
	|   -h, --help               help for get
	|       --no-wait            do not wait to reach RUNNING state
	|       --timeout duration   maximum amount of time to reach RUNNING state (default 20m0s)
	| 
	| Global Flags:
	|       --log-file file            file to write logs to (default stderr)
	|       --log-format type          log output format (text or json) (default text)
	|       --log-level format         log level (default disabled)
	|   -o, --output type              output type: text or json (default text)
	|   -p, --profile string           ~/.databrickscfg profile
	|       --progress-format format   format for progress logs (append, inplace, json) (default default)
	|   -t, --target string            bundle target to use (if applicable)
	| 

Processing Completed!
script returned exit code 1

The latest Databricks CLI (v0.2.x) is not backward compatible with v0.1.x: v0.2.x accepts only a positional argument (databricks clusters get CLUSTER_ID), whereas v0.1.x accepted --cluster-id/--cluster-name flags (databricks clusters get --cluster-id/--cluster-name CLUSTER_ID/CLUSTER_NAME).

---------------------------------------- Databricks CLI v0.1.x --------------------------------------------------
root@b21879072023:/# databricks --version
Version 0.17.6

root@b21879072023:/# databricks clusters get -h
Usage: databricks clusters get [OPTIONS]

  Retrieves metadata about a cluster.

Options:
  --cluster-id CLUSTER_ID    Can be found in the URL at https://*.cloud.databr
                             icks.com/#/setting/clusters/$CLUSTER_ID/configura
                             tion.
  --cluster-name CLUSTER_ID  Can be found in the URL at https://*.cloud.databr
                             icks.com/#/setting/clusters/$CLUSTER_ID/configura
                             tion.
  --debug                    Debug Mode. Shows full stack trace on error.
  --profile TEXT             CLI connection profile to use. The default
                             profile is "DEFAULT".
  -h, --help                 Show this message and exit.
root@b21879072023:/# 


------------------------------------------------- Databricks CLI v0.2.x -----------------------------------------
~$ databricks --version
Databricks CLI v0.207.0
~$ databricks clusters get -h
Get cluster info.
  
  Retrieves the information for a cluster given its identifier. Clusters can be
  described while they are running, or up to 60 days after they are terminated.

Usage:
  databricks clusters get CLUSTER_ID [flags]

Flags:
  -h, --help               help for get
      --no-wait            do not wait to reach RUNNING state
      --timeout duration   maximum amount of time to reach RUNNING state (default 20m0s)

Global Flags:
      --log-file file            file to write logs to (default stderr)
      --log-format type          log output format (text or json) (default text)
      --log-level format         log level (default disabled)
  -o, --output type              output type: text or json (default text)
  -p, --profile string           ~/.databrickscfg profile
      --progress-format format   format for progress logs (append, inplace, json) (default default)
  -t, --target string            bundle target to use (if applicable)

https://github.com/NVIDIA/spark-rapids-tools/blob/dev/user_tools/src/spark_rapids_pytools/cloud_api/databricks_aws.py#L116-L120
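One possible direction for a fix (a hypothetical sketch only; `build_get_cluster_cmd` is not an existing function in the repo, and the actual fix landed in #614) is to inspect the `databricks --version` output and choose the command syntax accordingly:

```python
import re

def build_get_cluster_cmd(version_output: str, cluster_name: str,
                          profile: str = "DEFAULT") -> list:
    """Build a cluster-lookup command compatible with the detected CLI.

    Parses version strings such as "Version 0.17.6" (legacy v0.1.x) or
    "Databricks CLI v0.207.0" (new v0.2.x).
    """
    match = re.search(r"v?(\d+)\.(\d+)", version_output)
    major, minor = (int(match.group(1)), int(match.group(2))) if match else (0, 0)
    if (major, minor) >= (0, 200):
        # v0.2.x: 'clusters get' takes only a positional CLUSTER_ID, so a
        # cluster *name* must first be resolved to an ID, e.g. by listing
        # clusters in JSON and matching on cluster_name.
        return ["databricks", "clusters", "list",
                "-o", "json", "--profile", profile]
    # v0.1.x: the legacy flag-based syntax accepts the name directly.
    return ["databricks", "clusters", "get",
            "--cluster-name", cluster_name, "--profile", profile]
```

With the legacy CLI this yields the old flag-based invocation; with v0.2.x it falls back to a list-and-resolve step before calling `clusters get CLUSTER_ID`.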

Steps/Code to reproduce bug
Following the README (https://github.com/NVIDIA/spark-rapids-tools/blob/main/user_tools/docs/user-tools-databricks-aws.md), run the rapids tools command:

spark_rapids_user_tools databricks-aws profiling --eventlogs /tmp/eventlogs --gpu_cluster 'test-aws-12.2' --tools_jar ./rapids-4-spark-tools_2.12-23.08.2-SNAPSHOT.jar

Expected behavior
spark_rapids_user_tools databricks-aws should work with the latest Databricks CLI (v0.2.x).

NvTimLiu added labels "bug (Something isn't working)" and "? - Needs Triage" on Oct 6, 2023
NvTimLiu (Collaborator, Author) commented

@mattahrens Can you help to take a look? Thanks!

mattahrens (Collaborator) commented

@cindyyuanjiang have you taken a look at this issue related to the Databricks CLI?
