Call Databricks API from Logic Apps

Recently I needed to help a customer call the Databricks API, and since there are many ways to do this, I must start by scoping the scenario:

  • This is Azure Databricks not Databricks on another cloud provider.
  • Authentication can be done in three ways.
  • In this scenario we chose a service principal, both because the automation will be run by a service and because I’d like to keep all the identities centralized in Azure AD for easy management.
  • The APIs we needed are the ones to list running clusters and terminate them. Auto-termination wasn’t an option because of some restrictions related to this implementation.

The chosen service to run the automation

We chose Logic Apps for simplicity. However, all we are doing is calling REST APIs, so whether it’s Logic Apps, a Function App, an Automation runbook, or any other service hosted inside a VM, the concept is the same.

The workflow

The workflow I’m using, as illustrated by the diagram below, is:

  • Get Access token for the Databricks login app
  • Get Access token for the Azure management endpoint
  • Use the two tokens when calling any Databricks API

But why two access tokens?

Because Databricks is integrated into Azure through the Databricks resource provider, some APIs require Azure management access (think of anything you can change from the Azure portal) and some require a login to the Databricks workspace (e.g. listing and updating clusters). However, the APIs are designed in a way that requires both tokens for all of them (at least to my knowledge). That is why we have to make two API calls to the Azure AD login endpoint.

What’s the difference between the two API calls?

Both are identical except for the resource the access token is requested for. In Azure AD, you must specify why you need the access token, meaning which resource you want to access with it, so Azure will issue a token *only* for that service. For the Databricks login app, you must use the GUID “2ff814a6-3304-4ab8-85cb-cd0e6f879c1d”. If you navigate to the Azure portal and search for this ID, you will find it associated with an enterprise app named AzureDatabricks. This is the app that represents Databricks and facilitates login to the workspace.
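To make the "identical except for the resource" point concrete, here is a minimal Python sketch that builds the two form-encoded request bodies without sending them. The helper names `token_url` and `token_request_body` and the placeholder credentials are my own for illustration; the endpoint and the two resource values are the ones used in the logic app.

```python
from urllib.parse import urlencode

# Application ID of the AzureDatabricks enterprise app (from the article).
DATABRICKS_RESOURCE = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"
# Azure management endpoint (from the article).
MANAGEMENT_RESOURCE = "https://management.core.windows.net/"


def token_url(tenant_id: str) -> str:
    """Azure AD token endpoint for the given tenant, as used by the logic app."""
    return f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"


def token_request_body(client_id: str, client_secret: str, resource: str) -> str:
    """Form-encoded body for a client-credentials token request."""
    return urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "resource": resource,
    })


# Placeholder credentials, for illustration only.
databricks_body = token_request_body("my-client-id", "my-secret", DATABRICKS_RESOURCE)
management_body = token_request_body("my-client-id", "my-secret", MANAGEMENT_RESOURCE)
```

Both bodies would be POSTed to the same `token_url(...)` with the `Content-Type: application/x-www-form-urlencoded` header; only the `resource` field differs.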

Collecting the requirements

App info

To start getting the access tokens, we need the service principal info, provided that you already have an app registration created in Azure AD. The process to provision the service principal is well documented in the docs, so there’s no need to repeat it here.

In this context we can use the terms Azure AD app and service principal interchangeably, although they are not the same thing. An app is one instance that can be shared across multiple directories (the Databricks login app is an example), and the service principal is the representation of this app inside a directory. When we authorize, we authorize the service principal, not the app.

From the App info page, collect

  • Client ID
  • Tenant ID
  • Then navigate to the Certificates & Secrets page from the left navigation bar and generate a secret.

Consider all of this information secret and keep it safely in a key vault or a similar secret management solution. The logic app I’m including with this article expects all of these values as input, so it doesn’t save or retrieve secrets. I made it this way so it is reusable. More on this later.

Azure Resource info

The Databricks workspace is an Azure resource, so you need to collect:

  • subscription id
  • resource group name
  • workspace name

We will use them later inside the logic app to generate the resource ID.
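As a rough sketch, the resource ID is just these three values slotted into a fixed path, the same path the logic app builds for the X-Databricks-Azure-Workspace-Resource-Id header. The function name and the sample values below are hypothetical.

```python
def workspace_resource_id(subscription_id: str, resource_group: str, workspace_name: str) -> str:
    """Build the Azure resource ID of a Databricks workspace."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Databricks/workspaces/{workspace_name}"
    )


# Placeholder values, for illustration only.
resource_id = workspace_resource_id(
    "00000000-0000-0000-0000-000000000000", "my-rg", "my-workspace"
)
```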

Databricks instance

All the Databricks URLs use the instance name, which is what comes before azuredatabricks.net in the URL when you log in to the Databricks UI. It’s auto-generated and usually starts with adb- followed by numbers.
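If you prefer to derive the instance name from a copied browser URL rather than typing it, something like the following sketch should work. The function name and the sample URL are made up for illustration; the only assumption taken from the text is that the instance name is whatever precedes .azuredatabricks.net in the host.

```python
from urllib.parse import urlparse

SUFFIX = ".azuredatabricks.net"


def instance_name(workspace_url: str) -> str:
    """Return the part of the host name that precedes .azuredatabricks.net."""
    host = urlparse(workspace_url).hostname or ""
    if not host.endswith(SUFFIX) or host == SUFFIX:
        raise ValueError(f"not an Azure Databricks workspace URL: {workspace_url!r}")
    return host[: -len(SUFFIX)]


# Made-up workspace URL, for illustration only.
name = instance_name("https://adb-1234567890123456.7.azuredatabricks.net/?o=1234567890123456")
```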

The logic app

The complete code of the app is at the end of this article. I’ll go through the main steps with some description.

The logic app is triggered by an HTTP trigger. This way I can call it from another logic app that fetches the secrets from Key Vault, so I can reuse and share this one without worrying about secret management.
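For reference, a caller has to send a JSON body with the seven fields declared in the trigger schema of the definition at the end of the article. The sketch below only builds and validates that body; the helper name `build_trigger_payload` and all sample values are placeholders I made up, and actually posting it to the trigger URL is left to the caller.

```python
import json

# Field names taken from the trigger schema in the logic app definition.
REQUIRED_FIELDS = (
    "databricksInstance", "tenantID", "client_id", "client_secret",
    "subscription", "resourceGroup", "workspaceName",
)


def build_trigger_payload(**values: str) -> str:
    """Return the JSON body for the HTTP trigger, failing on missing fields."""
    missing = [name for name in REQUIRED_FIELDS if not values.get(name)]
    if missing:
        raise ValueError(f"missing trigger fields: {', '.join(missing)}")
    return json.dumps({name: values[name] for name in REQUIRED_FIELDS})


# Placeholder values; in practice the secret would come from a key vault.
payload = build_trigger_payload(
    databricksInstance="adb-1234567890123456.7",
    tenantID="my-tenant-id",
    client_id="my-client-id",
    client_secret="my-secret",
    subscription="00000000-0000-0000-0000-000000000000",
    resourceGroup="my-rg",
    workspaceName="my-workspace",
)
```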

This is the first access token: we get an Azure AD access token for the Databricks login app, which will be used to access the Databricks instance.

This step is followed by a step that parses the returned JSON to get the access token out of it.
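Outside Logic Apps, the parsing step boils down to reading the access_token field from the response body. Here is a minimal sketch; the sample response is fabricated for illustration and only mirrors the field names in the Parse JSON schema, not real token values.

```python
import json


def extract_access_token(response_body: str) -> str:
    """Pull the access_token field out of a token endpoint response."""
    data = json.loads(response_body)
    token = data.get("access_token")
    if not token:
        raise ValueError("response has no access_token field")
    return token


# Fabricated response body with placeholder values.
sample_response = json.dumps({
    "token_type": "Bearer",
    "expires_in": "3599",
    "resource": "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d",
    "access_token": "placeholder-token",
})
token = extract_access_token(sample_response)
```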

Next, we issue an almost identical REST API call to authenticate, with only one difference: resource=https://management.core.windows.net/

This is the first API call to Databricks. There’s another one later in the app, but the principle is the same, so I’ll explain only this one here.

URL: The URL has the format **https://<databricksInstance>.azuredatabricks.net/api/2.0/clusters/list**, so I concatenated the input parameter into the URL.

Headers:

  • Authorization: the keyword Bearer followed by the access token we got for the Databricks login app (where the resource is the app ID).
  • X-Databricks-Azure-SP-Management-Token: the access token (without the Bearer keyword) for the Azure management endpoint.
  • X-Databricks-Azure-Workspace-Resource-Id: the resource ID of the workspace. I used the workspace name, resource group name, and subscription ID input parameters to create it.
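Put together, the URL and the three headers could be assembled as in the sketch below. It mirrors what the List_Clusters action in the logic app does, but the function name, its signature, and the sample values are my own, and it only builds the request pieces without sending anything.

```python
def databricks_request(instance, databricks_token, management_token, resource_id,
                       api_path="/api/2.0/clusters/list"):
    """Return the (url, headers) pair for a Databricks API call."""
    url = f"https://{instance}.azuredatabricks.net{api_path}"
    headers = {
        # Token issued for the Databricks login app, with the Bearer keyword.
        "Authorization": f"Bearer {databricks_token}",
        # Token issued for the Azure management endpoint, without Bearer.
        "X-Databricks-Azure-SP-Management-Token": management_token,
        # Azure resource ID of the workspace.
        "X-Databricks-Azure-Workspace-Resource-Id": resource_id,
    }
    return url, headers


# Placeholder values, for illustration only.
url, headers = databricks_request(
    "adb-1234567890123456.7",
    "databricks-token",
    "management-token",
    "/subscriptions/sub-id/resourceGroups/my-rg/providers/Microsoft.Databricks/workspaces/my-workspace",
)
```

The same pair can be reused for the second call by passing `api_path="/api/2.0/clusters/delete"` and sending the cluster ID in the body.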

And here’s the complete code of the logic app

				
{
    "definition": {
        "$schema": "https://schema.management.azure.com/providers/Microsoft.Logic/schemas/2016-06-01/workflowdefinition.json#",
        "actions": {
            "For_each": {
                "actions": {
                    "Condition": {
                        "actions": {
                            "Deallocate_Cluster": {
                                "inputs": {
                                    "body": {
                                        "cluster_id": "@items('For_each')?['cluster_id']"
                                    },
                                    "headers": {
                                        "Authorization": "Bearer @{body('Parse_Azure_AD_Access_Token')?['access_token']}",
                                        "X-Databricks-Azure-SP-Management-Token": "@body('Parse_Management_Endpoint_Token')?['access_token']",
                                        "X-Databricks-Azure-Workspace-Resource-Id": "/subscriptions/@{triggerBody()?['subscription']}/resourceGroups/@{triggerBody()?['resourceGroup']}/providers/Microsoft.Databricks/workspaces/@{triggerBody()?['workspaceName']}"
                                    },
                                    "method": "POST",
                                    "uri": "@{concat('https://',triggerBody()?['databricksInstance'],'.azuredatabricks.net/api/2.0/clusters/delete')}"
                                },
                                "runAfter": {},
                                "type": "Http"
                            }
                        },
                        "expression": {
                            "and": [
                                {
                                    "equals": [
                                        "@items('For_each')?['state']",
                                        "RUNNING"
                                    ]
                                }
                            ]
                        },
                        "runAfter": {},
                        "type": "If"
                    }
                },
                "foreach": "@body('Select')",
                "runAfter": {
                    "Select": [
                        "Succeeded"
                    ]
                },
                "type": "Foreach"
            },
            "Get_Azure_AD_Access_Token": {
                "inputs": {
                    "body": "grant_type=client_credentials\n&client_id=@{triggerBody()?['client_id']}\n&resource=2ff814a6-3304-4ab8-85cb-cd0e6f879c1d\n&client_secret=@{triggerBody()?['client_secret']}",
                    "headers": {
                        "Content-Type": "application/x-www-form-urlencoded"
                    },
                    "method": "POST",
                    "uri": "@{concat('https://login.microsoftonline.com/',triggerBody()?['tenantID'],'/oauth2/token')}"
                },
                "runAfter": {},
                "type": "Http"
            },
            "Get_Management_endpoint_token": {
                "inputs": {
                    "body": "grant_type=client_credentials\n&client_id=@{triggerBody()?['client_id']}\n&resource=https://management.core.windows.net/\n&client_secret=@{triggerBody()?['client_secret']}",
                    "headers": {
                        "Content-Type": "application/x-www-form-urlencoded"
                    },
                    "method": "POST",
                    "uri": "@{concat('https://login.microsoftonline.com/',triggerBody()?['tenantID'],'/oauth2/token')}"
                },
                "runAfter": {
                    "Parse_Azure_AD_Access_Token": [
                        "Succeeded"
                    ]
                },
                "type": "Http"
            },
            "List_Clusters": {
                "inputs": {
                    "headers": {
                        "Authorization": "Bearer @{body('Parse_Azure_AD_Access_Token')?['access_token']}",
                        "X-Databricks-Azure-SP-Management-Token": "@body('Parse_Management_Endpoint_Token')?['access_token']",
                        "X-Databricks-Azure-Workspace-Resource-Id": "/subscriptions/@{triggerBody()?['subscription']}/resourceGroups/@{triggerBody()?['resourceGroup']}/providers/Microsoft.Databricks/workspaces/@{triggerBody()?['workspaceName']}"
                    },
                    "method": "GET",
                    "uri": "@{concat('https://',triggerBody()?['databricksInstance'],'.azuredatabricks.net/api/2.0/clusters/list')}"
                },
                "runAfter": {
                    "Parse_Management_Endpoint_Token": [
                        "Succeeded"
                    ]
                },
                "type": "Http"
            },
            "Parse_Azure_AD_Access_Token": {
                "inputs": {
                    "content": "@body('Get_Azure_AD_Access_Token')",
                    "schema": {
                        "properties": {
                            "access_token": {
                                "type": "string"
                            },
                            "expires_in": {
                                "type": "string"
                            },
                            "expires_on": {
                                "type": "string"
                            },
                            "ext_expires_in": {
                                "type": "string"
                            },
                            "not_before": {
                                "type": "string"
                            },
                            "resource": {
                                "type": "string"
                            },
                            "token_type": {
                                "type": "string"
                            }
                        },
                        "type": "object"
                    }
                },
                "runAfter": {
                    "Get_Azure_AD_Access_Token": [
                        "Succeeded"
                    ]
                },
                "type": "ParseJson"
            },
            "Parse_Clusters": {
                "inputs": {
                    "content": "@body('List_Clusters')",
                    "schema": {
                        "properties": {
                            "clusters": {
                                "items": {
                                    "properties": {
                                        "autoscale": {
                                            "properties": {
                                                "max_workers": {
                                                    "type": "integer"
                                                },
                                                "min_workers": {
                                                    "type": "integer"
                                                }
                                            },
                                            "type": "object"
                                        },
                                        "autotermination_minutes": {
                                            "type": "integer"
                                        },
                                        "azure_attributes": {
                                            "properties": {
                                                "availability": {
                                                    "type": "string"
                                                },
                                                "first_on_demand": {
                                                    "type": "integer"
                                                },
                                                "spot_bid_max_price": {
                                                    "type": "integer"
                                                }
                                            },
                                            "type": "object"
                                        },
                                        "cluster_cores": {
                                            "type": "integer"
                                        },
                                        "cluster_id": {
                                            "type": "string"
                                        },
                                        "cluster_memory_mb": {
                                            "type": "integer"
                                        },
                                        "cluster_name": {
                                            "type": "string"
                                        },
                                        "cluster_source": {
                                            "type": "string"
                                        },
                                        "creator_user_name": {
                                            "type": "string"
                                        },
                                        "default_tags": {
                                            "properties": {
                                                "ClusterId": {
                                                    "type": "string"
                                                },
                                                "ClusterName": {
                                                    "type": "string"
                                                },
                                                "Creator": {
                                                    "type": "string"
                                                },
                                                "Vendor": {
                                                    "type": "string"
                                                }
                                            },
                                            "type": "object"
                                        },
                                        "driver": {
                                            "properties": {
                                                "host_private_ip": {
                                                    "type": "string"
                                                },
                                                "instance_id": {
                                                    "type": "string"
                                                },
                                                "node_id": {
                                                    "type": "string"
                                                },
                                                "private_ip": {
                                                    "type": "string"
                                                },
                                                "public_dns": {
                                                    "type": "string"
                                                },
                                                "start_timestamp": {
                                                    "type": "integer"
                                                }
                                            },
                                            "type": "object"
                                        },
                                        "driver_node_type_id": {
                                            "type": "string"
                                        },
                                        "enable_elastic_disk": {
                                            "type": "boolean"
                                        },
                                        "enable_local_disk_encryption": {
                                            "type": "boolean"
                                        },
                                        "executors": {
                                            "items": {
                                                "properties": {
                                                    "host_private_ip": {
                                                        "type": "string"
                                                    },
                                                    "instance_id": {
                                                        "type": "string"
                                                    },
                                                    "node_id": {
                                                        "type": "string"
                                                    },
                                                    "private_ip": {
                                                        "type": "string"
                                                    },
                                                    "public_dns": {
                                                        "type": "string"
                                                    },
                                                    "start_timestamp": {
                                                        "type": "integer"
                                                    }
                                                },
                                                "required": [
                                                    "public_dns",
                                                    "node_id",
                                                    "instance_id",
                                                    "start_timestamp",
                                                    "host_private_ip",
                                                    "private_ip"
                                                ],
                                                "type": "object"
                                            },
                                            "type": "array"
                                        },
                                        "init_scripts_safe_mode": {
                                            "type": "boolean"
                                        },
                                        "jdbc_port": {
                                            "type": "integer"
                                        },
                                        "last_state_loss_time": {
                                            "type": "integer"
                                        },
                                        "node_type_id": {
                                            "type": "string"
                                        },
                                        "spark_conf": {
                                            "properties": {
                                                "spark.databricks.delta.preview.enabled": {
                                                    "type": "string"
                                                }
                                            },
                                            "type": "object"
                                        },
                                        "spark_context_id": {
                                            "type": "integer"
                                        },
                                        "spark_env_vars": {
                                            "properties": {
                                                "PYSPARK_PYTHON": {
                                                    "type": "string"
                                                }
                                            },
                                            "type": "object"
                                        },
                                        "spark_version": {
                                            "type": "string"
                                        },
                                        "start_time": {
                                            "type": "integer"
                                        },
                                        "state": {
                                            "type": "string"
                                        },
                                        "state_message": {
                                            "type": "string"
                                        },
                                        "terminated_time": {
                                            "type": "integer"
                                        },
                                        "termination_reason": {
                                            "properties": {
                                                "code": {
                                                    "type": "string"
                                                },
                                                "parameters": {
                                                    "properties": {
                                                        "username": {
                                                            "type": "string"
                                                        }
                                                    },
                                                    "type": "object"
                                                },
                                                "type": {
                                                    "type": "string"
                                                }
                                            },
                                            "type": "object"
                                        }
                                    },
                                    "required": [
                                        "cluster_id",
                                        "spark_context_id",
                                        "cluster_name",
                                        "spark_version",
                                        "spark_conf",
                                        "node_type_id",
                                        "driver_node_type_id",
                                        "spark_env_vars",
                                        "autotermination_minutes",
                                        "enable_elastic_disk",
                                        "cluster_source",
                                        "enable_local_disk_encryption",
                                        "azure_attributes",
                                        "state",
                                        "state_message",
                                        "start_time",
                                        "last_state_loss_time",
                                        "autoscale",
                                        "default_tags",
                                        "creator_user_name",
                                        "init_scripts_safe_mode"
                                    ],
                                    "type": "object"
                                },
                                "type": "array"
                            }
                        },
                        "type": "object"
                    }
                },
                "runAfter": {
                    "List_Clusters": [
                        "Succeeded"
                    ]
                },
                "type": "ParseJson"
            },
            "Parse_Management_Endpoint_Token": {
                "inputs": {
                    "content": "@body('Get_Management_endpoint_token')",
                    "schema": {
                        "properties": {
                            "access_token": {
                                "type": "string"
                            },
                            "expires_in": {
                                "type": "string"
                            },
                            "expires_on": {
                                "type": "string"
                            },
                            "ext_expires_in": {
                                "type": "string"
                            },
                            "not_before": {
                                "type": "string"
                            },
                            "resource": {
                                "type": "string"
                            },
                            "token_type": {
                                "type": "string"
                            }
                        },
                        "type": "object"
                    }
                },
                "runAfter": {
                    "Get_Management_endpoint_token": [
                        "Succeeded"
                    ]
                },
                "type": "ParseJson"
            },
            "Select": {
                "inputs": {
                    "from": "@body('Parse_Clusters')?['clusters']",
                    "select": {
                        "cluster_id": "@item()?['cluster_id']",
                        "cluster_name": "@item()?['cluster_name']",
                        "state": "@item()?['state']"
                    }
                },
                "runAfter": {
                    "Parse_Clusters": [
                        "Succeeded"
                    ]
                },
                "type": "Select"
            }
        },
        "contentVersion": "1.0.0.0",
        "outputs": {},
        "parameters": {},
        "triggers": {
            "manual": {
                "description": "{     \"databricksInstance\":\"first part in the url\",     \"tenantID\":\"azure ad id\",     \"client_id\":\"\",     \"client_secret\":\"\",     \"subscription\":\"sub id\",     \"resourceGroup\":\"\",     \"workspaceName\":\"\" }",
                "inputs": {
                    "schema": {
                        "properties": {
                            "client_id": {
                                "type": "string"
                            },
                            "client_secret": {
                                "type": "string"
                            },
                            "databricksInstance": {
                                "type": "string"
                            },
                            "resourceGroup": {
                                "type": "string"
                            },
                            "subscription": {
                                "type": "string"
                            },
                            "tenantID": {
                                "type": "string"
                            },
                            "workspaceName": {
                                "type": "string"
                            }
                        },
                        "type": "object"
                    }
                },
                "kind": "Http",
                "type": "Request"
            }
        }
    },
    "parameters": {}
}
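For readers who want to port the logic to another service, the Select, For_each and Condition actions in the definition above amount to roughly the following Python. This is a sketch under my own naming (`terminate_request_bodies`), and the sample response is fabricated; it keeps only the clusters whose state is RUNNING and produces the body that would be POSTed to clusters/delete for each of them.

```python
def terminate_request_bodies(list_response: dict) -> list:
    """Return one clusters/delete request body per RUNNING cluster."""
    return [
        {"cluster_id": cluster["cluster_id"]}
        for cluster in list_response.get("clusters", [])
        if cluster.get("state") == "RUNNING"
    ]


# Fabricated clusters/list response with placeholder IDs.
sample_list_response = {
    "clusters": [
        {"cluster_id": "id-1", "cluster_name": "etl", "state": "RUNNING"},
        {"cluster_id": "id-2", "cluster_name": "adhoc", "state": "TERMINATED"},
    ]
}
bodies = terminate_request_bodies(sample_list_response)
```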
				
			
What do you think?