Automate httpPath by pulling from Databricks REST API #689

Open
edgararuiz opened this issue Dec 18, 2023 · 6 comments

Comments

@edgararuiz
Contributor

The Databricks REST API has a “SQL warehouses” endpoint that you can pull the httpPath value from. Here is an example using httr2, which I guess odbc would need to take a dependency on, unless y’all prefer to implement something like this at a lower level:

library(httr2)
library(magrittr) # provides the %>% pipe used below

# List the workspace's SQL warehouses via the REST API
warehouses <- paste0(
  Sys.getenv("DATABRICKS_HOST"),
  "/api/2.0/sql/warehouses"
) %>%
  request() %>%
  req_auth_bearer_token(Sys.getenv("DATABRICKS_TOKEN")) %>%
  req_perform() %>%
  resp_body_json()

# The path is in a specific location inside the object returned by the API:
# the first element holds the list of warehouses, and each warehouse
# carries its ODBC parameters
warehouses[[1]][[1]]$odbc_params$path

I was thinking it makes sense to automate this, since httpPath may be the only argument to databricks() that doesn't yet have a way, even on the happy path, to be filled in automatically.
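For illustration, here is a minimal sketch (assuming the httpPath argument of odbc::databricks(), with the host and token picked up from the usual DATABRICKS_* environment variables) of how the retrieved path could feed into a connection:

# Hypothetical follow-up: pass the first warehouse's path to the connection
con <- DBI::dbConnect(
  odbc::databricks(),
  httpPath = warehouses[[1]][[1]]$odbc_params$path
)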

@hadley
Member

hadley commented Dec 19, 2023

I wonder why the driver doesn't automate this?

@edgararuiz
Contributor Author

I think it is because the user can use either the SQL Warehouse service or a running Spark cluster as the endpoint for the ODBC connection: https://docs.databricks.com/en/integrations/jdbc-odbc-bi.html#retrieve-the-connection-details

Wow, come to think of it, I bet there is a way to also pull the path from a cluster via the REST API, so you could potentially have it switch between the two (SQL warehouse or Spark cluster) if the user provides a cluster_id.

@atheriel
Collaborator

atheriel commented Dec 19, 2023

> I wonder why the driver doesn't automate this?

Originally I did want to automate finding the httpPath, but there are a number of problems. First, the ODBC driver works for both of what Databricks calls "SQL Warehouses" and "clusters". These have different identifiers and different HTTP paths. So we might be able to swap out httpPath for more intuitive identifiers like clusterId and warehouse -- but the user would still have to provide those.

Secondly, Databricks's API only lists the HTTP path for SQL Warehouses. There's no equivalent field for clusters. If you poke around at their frontend code, you'll see that the HTTP path for clusters is computed programmatically from the "account ID" -- but there's no way to find the account ID for a given workspace/host from their official APIs, it's just injected into the frontend as a query parameter. So we'd need to wait for such a field to be exposed in their API. Other tools in the Databricks ecosystem require the user to provide the account ID to work out these paths.
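To make the cluster case concrete, here is a hypothetical sketch of that computation. The path shape is the familiar sql/protocolv1/o/<org-id>/<cluster-id> form, and the org/account ID is exactly the piece the official API does not expose, so it would have to come from the user:

# Hypothetical: cluster HTTP paths follow this shape, but org_id (the
# "account ID" mentioned above) cannot be looked up from the official API,
# so the user would still have to supply it.
org_id <- "1234567890123456"          # example value, user-supplied
cluster_id <- "0123-456789-abcdefgh"  # example value
sprintf("sql/protocolv1/o/%s/%s", org_id, cluster_id)
#> [1] "sql/protocolv1/o/1234567890123456/0123-456789-abcdefgh"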

@atheriel
Collaborator

Right now the Databricks integration in RStudio Pro can do things like generate a connection snippet for a given cluster, and it has a full API client. I expect that as that integration becomes more sophisticated and grows support for SQL warehouses, it might be able to generate the httpPath parameter for an odbc::databricks() call, and even make more judicious choices about whether to point users to odbc::databricks() or sparklyr for clusters.

@hadley
Member

hadley commented Dec 19, 2023

So probably best to leave this alone in odbc, and assume a higher-level tool will set it appropriately?

@edgararuiz
Contributor Author

Maybe do that in a Shiny-based snippet, like inside odbc? With a checkbox for "Warehouse", and if it's checked, the snippet can do the REST API call to get httpPath?
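As a rough illustration only (the warehouses endpoint and the odbc::databricks() call are as above; everything else, including picking the first warehouse, is an assumption), a minimal Shiny sketch of that idea:

library(shiny)
library(httr2)

ui <- fluidPage(
  checkboxInput("warehouse", "SQL Warehouse", value = TRUE),
  verbatimTextOutput("snippet")
)

server <- function(input, output, session) {
  output$snippet <- renderText({
    if (isTRUE(input$warehouse)) {
      # Look up the httpPath of the first SQL warehouse via the REST API
      resp <- request(Sys.getenv("DATABRICKS_HOST")) |>
        req_url_path_append("api/2.0/sql/warehouses") |>
        req_auth_bearer_token(Sys.getenv("DATABRICKS_TOKEN")) |>
        req_perform() |>
        resp_body_json()
      path <- resp$warehouses[[1]]$odbc_params$path
      sprintf('DBI::dbConnect(odbc::databricks(), httpPath = "%s")', path)
    } else {
      # No equivalent lookup for clusters; the user pastes the path manually
      'DBI::dbConnect(odbc::databricks(), httpPath = "<cluster HTTP path>")'
    }
  })
}

shinyApp(ui, server)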
