TL;DR: Lakeflow is Databricks’ unified data-engineering experience that brings ingestion, transformation, and orchestration under one roof. It combines Lakeflow Connect (managed connectors), Lakeflow Declarative Pipelines (the successor to DLT), and Lakeflow Jobs (workflow orchestration) on top of serverless compute and Unity Catalog governance. It’s built to lower operational toil and accelerate time to value for data teams.
What is Databricks Lakeflow?
Lakeflow is Databricks’ end-to-end data-engineering solution. The original launch framed three pillars—Connect (ingest), Pipelines (transform), and Jobs (orchestrate)—with AI assistance, serverless compute, lineage, and quality monitoring built in. In June 2025, Databricks announced that Lakeflow had reached General Availability, formalizing it as the default way to build pipelines on the platform.
The Three Pillars
Lakeflow Connect — managed ingestion
Lakeflow Connect is a first-party set of managed connectors for databases and enterprise apps. As of now, Salesforce, Workday (RaaS), ServiceNow, Google Analytics Raw Data (GA4), and SQL Server are available on Databricks-managed serverless compute with Unity Catalog. By design, each ingested table is written to a streaming table in Unity Catalog. Some sources support CDC or other incremental modes; others are full extract.
The benefit is that you can ingest data from inside the Databricks ecosystem instead of standing up external tooling, which is simpler for teams with limited platform-engineering capacity. The caveat is that you're limited to the available connectors and their feature sets (e.g., not every source supports incremental loads). If Connect doesn’t cover a source yet, Partner Connect integrations like Fivetran broaden your options—but they introduce another vendor, additional cost, and a larger compliance surface area. Also note that Connect pipelines require a Unity Catalog–enabled, serverless-capable workspace.
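For illustration, here is a minimal sketch of what creating a managed ingestion pipeline can look like with the Databricks Python SDK. It assumes a Unity Catalog connection for the source (here called salesforce_conn) already exists; the connection, catalog, schema, and object names are placeholders, and the exact class and field names can vary between SDK versions, so check the Lakeflow Connect documentation for your connector.

```python
# Sketch: create a managed ingestion pipeline with the Databricks Python SDK.
# Assumes a Unity Catalog connection named "salesforce_conn" already exists.
# All names below are placeholders; class/field names may differ by SDK version.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import pipelines

w = WorkspaceClient()

w.pipelines.create(
    name="salesforce_ingest",
    serverless=True,
    catalog="main",            # destination catalog in Unity Catalog
    target="sales_raw",        # destination schema for the streaming tables
    ingestion_definition=pipelines.IngestionPipelineDefinition(
        connection_name="salesforce_conn",
        objects=[
            pipelines.IngestionConfig(
                table=pipelines.TableSpec(
                    source_table="Account",
                    destination_catalog="main",
                    destination_schema="sales_raw",
                )
            )
        ],
    ),
)
```

Each object you list lands as a streaming table in the destination schema, consistent with how Connect publishes ingested data to Unity Catalog.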
Lakeflow Declarative Pipelines — the transformation layer
Lakeflow Declarative Pipelines are the evolution of Delta Live Tables (DLT): you write flows in SQL or Python that incrementally process data and publish streaming tables, materialized views, and views. Lakeflow automates orchestration, incremental processing, and autoscaling, and includes built-in data-quality monitoring; Real-time mode (Public Preview) enables continuous, low-latency delivery of time-sensitive datasets, often without pipeline rewrites, depending on the workload.
Pipelines primarily manage streaming tables and materialized views (both Delta-backed and governed in Unity Catalog). They don’t directly publish “regular” external Delta tables as managed outputs accessible outside Databricks; if you need to write out to external Delta tables or to event systems like Kafka or Event Hubs, use sinks (Public Preview). You can also define views in pipeline SQL where appropriate.
This tool sits between no-code/low-code ETL and fully custom pipelines. It abstracts away most orchestration details so teams can build and operate pipelines faster, at the cost of some flexibility. For environments with relatively simple transformation requirements, the productivity trade-off may be worth it; complex transformations, however, may hit Lakeflow’s limitations and require notebooks instead. Some limitations to be aware of (a minimal pipeline sketch follows the list):
- Concurrency: up to 200 concurrent pipeline updates per workspace.
- SQL features: the pivot() function isn’t supported in Lakeflow (workarounds include pre-aggregation/reshaping or conditional aggregates).
- Time travel: supported on streaming tables; not supported on materialized views.
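To make the declarative model concrete, here is a minimal Python sketch in the DLT/Lakeflow Declarative Pipelines style: a streaming table ingesting raw files, an expectation dropping bad rows, and a materialized view aggregating downstream. The source path, table names, and columns are invented for illustration, and the sum(when(...)) aggregate shows the conditional-aggregation workaround for the unsupported pivot().

```python
import dlt  # Lakeflow Declarative Pipelines / DLT Python API
from pyspark.sql import functions as F

# `spark` is provided automatically inside pipeline source files.

# Streaming table: incrementally loads new files as they arrive.
# The volume path and column names are placeholders for illustration.
@dlt.table(comment="Raw orders ingested incrementally from cloud storage.")
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")
def orders_raw():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/Volumes/main/demo/raw_orders/")
    )

# Materialized view: aggregated downstream dataset. The sum(when(...))
# pattern stands in for pivot(), which pipelines don't support.
@dlt.table(comment="Daily revenue per customer, split by order status.")
def daily_revenue():
    orders = dlt.read("orders_raw")
    return (
        orders.groupBy("customer_id", F.to_date("order_ts").alias("order_date"))
        .agg(
            F.sum(F.when(F.col("status") == "open", F.col("amount")).otherwise(0)).alias("open_revenue"),
            F.sum(F.when(F.col("status") == "closed", F.col("amount")).otherwise(0)).alias("closed_revenue"),
        )
    )
```

Because daily_revenue reads orders_raw through dlt.read, the pipeline infers the dependency between the two datasets and orchestrates the refresh order for you.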
Lakeflow Jobs — orchestration
Lakeflow Jobs provides DAG-based workflows with scheduling, retries, branching and looping (If/else), task-level dependencies, and first-class tasks for pipelines, notebooks, SQL, and ML—plus serverless options. You can include a pipeline as a Pipeline task in a job, and you can orchestrate with Apache Airflow if that’s your preference. For teams already using Databricks Jobs, the experience is largely unchanged aside from the Lakeflow branding. The nice part is the seamless fit with existing infrastructure: you can build end-to-end workflows fully within Lakeflow—from ingestion to transformation to orchestration—on serverless compute and with Unity Catalog governance, but you can also mix and match with existing code, e.g., notebooks as required.
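As a sketch of the orchestration side, the following uses the Databricks Python SDK to create a job with two tasks: a Pipeline task that refreshes a Lakeflow pipeline, and a dependent notebook task. The pipeline ID and notebook path are placeholders, and argument names may differ slightly across SDK versions.

```python
# Sketch: define a two-task Lakeflow Job with the Databricks Python SDK.
# The pipeline ID and notebook path are placeholders for illustration.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

created = w.jobs.create(
    name="daily_orders_workflow",
    tasks=[
        # Refresh the declarative pipeline first.
        jobs.Task(
            task_key="refresh_pipeline",
            pipeline_task=jobs.PipelineTask(pipeline_id="<pipeline-id>"),
        ),
        # Then run a downstream notebook once the pipeline succeeds.
        jobs.Task(
            task_key="publish_report",
            depends_on=[jobs.TaskDependency(task_key="refresh_pipeline")],
            notebook_task=jobs.NotebookTask(notebook_path="/Workspace/reports/publish"),
        ),
    ],
)
print(f"Created job {created.job_id}")
```

The same job could equally be defined in the Jobs UI or via Databricks Asset Bundles; the SDK version is shown here only to make the task-dependency structure explicit.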
Conclusion
Lakeflow is a compelling option for teams looking to simplify data engineering on Databricks. By integrating ingestion, transformation, and orchestration into a unified experience, it reduces the operational overhead of managing separate tools. The serverless compute model further alleviates infrastructure management burdens.
However, Lakeflow’s managed nature means it may not suit every use case. Teams with complex transformation logic or specialized ingestion needs may find the abstractions limiting. Additionally, reliance on first-party connectors could pose challenges if your data sources aren’t yet supported. There is also a risk of vendor lock-in, as migrating away from Lakeflow could require significant rework of pipelines.
Overall, Lakeflow is an interesting choice for organizations prioritizing speed to value and operational simplicity in their data engineering workflows. Evaluating your team’s specific requirements against Lakeflow’s capabilities will help determine if it’s the right fit.
References
- Databricks — Introducing Databricks Lakeflow
  https://www.databricks.com/blog/introducing-databricks-lakeflow
- Databricks — Databricks Lakeflow Connect
  https://www.databricks.com/product/data-engineering/lakeflow-connect
- Databricks — Databricks Lakeflow Declarative Pipelines
  https://www.databricks.com/product/data-engineering/lakeflow-declarative-pipelines
- Databricks — Databricks Lakeflow Jobs
  https://learn.microsoft.com/en-us/azure/databricks/jobs/
- Databricks — Announcing the General Availability of Databricks Lakeflow
  https://www.databricks.com/blog/announcing-general-availability-databricks-lakeflow
- Microsoft Learn — What is Lakeflow Connect?
  https://learn.microsoft.com/en-us/azure/databricks/ingestion/overview
- Microsoft Learn — Managed connectors in Lakeflow Connect
  https://learn.microsoft.com/en-us/azure/databricks/ingestion/lakeflow-connect/
- Microsoft Learn — Managed connector FAQs — Azure Databricks
  https://learn.microsoft.com/en-us/azure/databricks/ingestion/lakeflow-connect/faq
- Microsoft Learn — Lakeflow Declarative Pipelines (overview)
  https://learn.microsoft.com/en-us/azure/databricks/ldp/
- Microsoft Learn — Lakeflow Declarative Pipelines concepts
  https://learn.microsoft.com/en-us/azure/databricks/ldp/concepts
- Microsoft Learn — Lakeflow Declarative Pipelines limitations
  https://learn.microsoft.com/en-us/azure/databricks/ldp/limitations
- Microsoft Learn — Use sinks to stream records to external services (Public Preview)
  https://learn.microsoft.com/en-us/azure/databricks/ldp/sinks
- Microsoft Learn — Real-time mode in Structured Streaming (Public Preview)
  https://learn.microsoft.com/en-us/azure/databricks/structured-streaming/real-time
- Microsoft Learn — Add branching logic with the If/else task
  https://learn.microsoft.com/en-us/azure/databricks/jobs/if-else
- Microsoft Learn — Run your Lakeflow Jobs with serverless compute
  https://learn.microsoft.com/en-us/azure/databricks/jobs/run-serverless-jobs
- Microsoft Learn — Pipeline task for jobs
  https://learn.microsoft.com/en-us/azure/databricks/jobs/pipeline
Max Foxley-Marrable
Dr. Max Foxley-Marrable is a seasoned Data Engineer and Data Scientist with over eight years of experience transforming complex datasets into actionable insights that drive innovation and operational efficiency. Specializing in advanced data processing with Python, Apache Spark, Microsoft Azure, and Databricks, he has designed and deployed scalable lakehouse architectures, machine learning models, and cloud-based solutions for both industry and academia. Holding a PhD in Astronomy and Astrophysics, along with multiple professional certifications from Databricks and Microsoft, Dr. Foxley-Marrable is recognized for his ability to solve challenging technical problems, lead high-impact projects, and translate complex concepts into practical, business-aligned solutions. His work spans data engineering, cloud architecture, AI applications, and technical education, all underpinned by a passion for innovation, collaboration, and delivering measurable results.