Latest news & articles
This article explores building a robust data lakehouse using the medallion architecture, which organizes data into three layers—Bronze for raw data ingestion, Silver for data transformation, and Gold for optimized data aggregation. Best practices for each layer are outlined, including creating a separate staging area for raw data in the Bronze layer, enforcing data quality checks in the Silver layer, and optimizing query performance in the Gold layer. By following these best practices, organizations can effectively manage data across the data lakehouse, ensuring structured, governed, and optimized data operations.
If you want to route all outbound traffic from Databricks clusters through Azure Firewall, you need to create a UDR (User Defined Route) and associate it with the subnets where the Databricks clusters are deployed. Then add a firewall policy that allows the trusted traffic. Trusted traffic here is mainly the mandatory traffic that Databricks needs to function, plus traffic to the storage accounts and package repositories that you deem trusted.
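As a sketch, the route and its subnet associations can be created with the Azure CLI. The resource group, VNet, subnet names, and firewall IP below are placeholders assumed for illustration; substitute your own:

```shell
# Create a route table and a default route that sends all outbound
# traffic to Azure Firewall (a virtual appliance) as the next hop.
az network route-table create \
  --resource-group my-rg \
  --name databricks-udr

az network route-table route create \
  --resource-group my-rg \
  --route-table-name databricks-udr \
  --name to-firewall \
  --address-prefix 0.0.0.0/0 \
  --next-hop-type VirtualAppliance \
  --next-hop-ip-address 10.0.1.4   # private IP of the Azure Firewall

# Associate the route table with both Databricks subnets
# (the host/public and container/private subnets of the workspace VNet).
az network vnet subnet update \
  --resource-group my-rg \
  --vnet-name databricks-vnet \
  --name public-subnet \
  --route-table databricks-udr

az network vnet subnet update \
  --resource-group my-rg \
  --vnet-name databricks-vnet \
  --name private-subnet \
  --route-table databricks-udr
```

This is a provisioning/config fragment, not a runnable sample; it assumes a VNet-injected workspace and an already-deployed firewall.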
An IP access list is one of the ways to network-isolate Azure Databricks. It is a list of IP addresses or IP ranges that are allowed to access the workspace, so you can restrict access to specific networks.
It is configurable only through the REST API.
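A minimal sketch of managing the list through the REST API with curl; the workspace URL, token, label, and CIDR range are placeholders. The feature is first switched on through the workspace-conf endpoint:

```shell
# Placeholders: set these to your workspace URL and an access token.
DATABRICKS_HOST="https://adb-1234567890123456.7.azuredatabricks.net"
DATABRICKS_TOKEN="<personal-access-token>"

# 1. Enable IP access lists on the workspace.
curl -X PATCH "$DATABRICKS_HOST/api/2.0/workspace-conf" \
  -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  -d '{"enableIpAccessLists": "true"}'

# 2. Create an ALLOW list containing the trusted IP range.
curl -X POST "$DATABRICKS_HOST/api/2.0/ip-access-lists" \
  -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  -d '{
        "label": "office",
        "list_type": "ALLOW",
        "ip_addresses": ["203.0.113.0/24"]
      }'

# 3. Inspect the configured lists.
curl -X GET "$DATABRICKS_HOST/api/2.0/ip-access-lists" \
  -H "Authorization: Bearer $DATABRICKS_TOKEN"
```

These calls require a live workspace and a token with admin rights, so treat this as a configuration sketch rather than a copy-paste script.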
Thank you to Olivier Martin for your valuable insights and contributions to this post. Olivier Martin is a Microsoft Cloud Solution Architect for data analytics & AI.
When creating a linked service to a Key Vault that uses a private endpoint, in a Data Factory or Synapse workspace with a managed virtual network, the UI has no way to test the connection or list the Key Vault's secrets and versions.
That's a known limitation when using a managed VNet. The workaround is simple: add the secret info manually (by editing the field rather than using the dropdown) and save the linked service. It will work when used in a pipeline or data flow.
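For reference, referencing a Key Vault secret by hand in another linked service's JSON definition looks roughly like the following; the linked service name and secret name are placeholder values, and with a managed VNet you save this without testing the connection:

```json
{
    "password": {
        "type": "AzureKeyVaultSecret",
        "store": {
            "referenceName": "AzureKeyVault1",
            "type": "LinkedServiceReference"
        },
        "secretName": "my-sql-password"
    }
}
```

Typing the secret name directly into this structure is the "edit, not dropdown" step, since the dropdown cannot enumerate secrets over the private endpoint.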
The documented minimum permissions required for the synapsesql Spark connector to read or write data from Synapse SQL pools grant high privileges to Spark users, even when the required operation is read-only. In this article, I'll provide a workaround.
Azure Synapse Analytics' most appealing feature at first glance is Synapse Studio: one unified UX across data stores, notebooks, and pipelines. The notebook experience is appreciated most by people who load data that takes minutes or hours to read, then operate on it, whether in data engineering, feature engineering, or ML training. The ability to divide your code into smaller chunks, and to control which chunk executes when, is a powerful productivity tool.
There is a catch, though: a notebook stores not only your code but also its results, so it effectively has a data storage capacity of its own. That worries some organizations that are highly regulated or handle highly confidential information.