TL;DR: Lakeflow is Databricks’ unified data-engineering experience that brings ingestion, transformation, and orchestration under one roof. It combines Lakeflow Connect (managed connectors), Lakeflow Declarative Pipelines (the successor to DLT), and Lakeflow Jobs (workflow orchestration) on top of serverless compute and Unity Catalog governance. It’s built to lower operational toil and accelerate time to value for data teams.
What is Databricks Lakeflow?
Lakeflow is Databricks’ end-to-end data-engineering solution. The original launch framed three pillars—Connect (ingest), Pipelines (transform), and Jobs (orchestrate)—with AI assistance, serverless compute, lineage, and quality monitoring built in. In June 2025, Databricks announced that Lakeflow had reached General Availability, formalizing it as the default way to build pipelines on the platform.
The Three Pillars
Lakeflow Connect — managed ingestion
Lakeflow Connect is a first-party set of managed connectors for databases and enterprise apps. As of now, Salesforce, Workday (RaaS), ServiceNow, Google Analytics Raw Data (GA4), and SQL Server are available on Databricks-managed serverless compute with Unity Catalog. By design, each ingested table is written to a streaming table in Unity Catalog. Some sources support CDC or other incremental modes; others are full extract.
The benefit is that you can ingest data from inside the Databricks ecosystem instead of standing up external tooling, which is simpler for teams with limited platform-engineering capacity. The caveat is that you're limited to the available connectors and their feature sets (e.g., not every source supports incremental loads). If Connect doesn’t cover a source yet, Partner Connect integrations like Fivetran broaden your options—but they introduce another vendor, additional cost, and a larger compliance surface area. Also note that Connect pipelines require a Unity Catalog–enabled, serverless-capable workspace.
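For illustration, here is a minimal sketch of what creating a managed ingestion pipeline can look like with the Databricks Python SDK. It assumes a Unity Catalog connection for the source (here called salesforce_conn) already exists; the connection, catalog, schema, and object names are placeholders, and the exact class and field names can vary between SDK versions, so check the Lakeflow Connect documentation for your connector.

```python
# Sketch: create a managed ingestion pipeline with the Databricks Python SDK.
# Assumes a Unity Catalog connection named "salesforce_conn" already exists.
# All names below are placeholders; class/field names may differ by SDK version.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import pipelines

w = WorkspaceClient()

w.pipelines.create(
    name="salesforce_ingest",
    serverless=True,
    catalog="main",            # destination catalog in Unity Catalog
    target="sales_raw",        # destination schema for the streaming tables
    ingestion_definition=pipelines.IngestionPipelineDefinition(
        connection_name="salesforce_conn",
        objects=[
            pipelines.IngestionConfig(
                table=pipelines.TableSpec(
                    source_table="Account",
                    destination_catalog="main",
                    destination_schema="sales_raw",
                )
            )
        ],
    ),
)
```

Each object you list lands as a streaming table in the destination schema, consistent with how Connect publishes ingested data to Unity Catalog.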
Lakeflow Declarative Pipelines — the transformation layer
Lakeflow Declarative Pipelines are the evolution of Delta Live Tables (DLT): you write flows in SQL or Python that incrementally process data and publish streaming tables, materialized views, and views. Lakeflow automates orchestration, incremental processing, and autoscaling, and includes built-in data-quality monitoring; Real-time mode (Public Preview) enables continuous, low-latency delivery of time-sensitive datasets, often without pipeline rewrites, depending on the workload.
Pipelines primarily manage streaming tables and materialized views (both Delta-backed and governed in Unity Catalog). They don’t directly publish “regular” external Delta tables as managed outputs accessible outside Databricks; if you need to write out to external Delta tables or to event systems like Kafka or Event Hubs, use sinks (Public Preview). You can also define views in pipeline SQL where appropriate.
This tool sits between no-code/low-code ETL and fully custom pipelines. It abstracts away most orchestration details so teams can build and operate pipelines faster, at the cost of some flexibility. For environments with relatively simple transformation requirements, the productivity trade-off may be worth it; complex transformations, however, may hit Lakeflow’s limitations and require notebooks instead. Some limitations to be aware of (a minimal pipeline sketch follows the list):
- Concurrency: up to 200 concurrent pipeline updates per workspace.
- SQL features: the pivot() function isn’t supported in Lakeflow (workarounds include pre-aggregation/reshaping or conditional aggregates).
- Time travel: supported on streaming tables; not supported on materialized views.
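To make the declarative model concrete, here is a minimal Python sketch in the DLT/Lakeflow Declarative Pipelines style: a streaming table ingesting raw files, an expectation dropping bad rows, and a materialized view aggregating downstream. The source path, table names, and columns are invented for illustration, and the sum(when(...)) aggregate shows the conditional-aggregation workaround for the unsupported pivot().

```python
import dlt  # Lakeflow Declarative Pipelines / DLT Python API
from pyspark.sql import functions as F

# `spark` is provided automatically inside pipeline source files.

# Streaming table: incrementally loads new files as they arrive.
# The volume path and column names are placeholders for illustration.
@dlt.table(comment="Raw orders ingested incrementally from cloud storage.")
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")
def orders_raw():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/Volumes/main/demo/raw_orders/")
    )

# Materialized view: aggregated downstream dataset. The sum(when(...))
# pattern stands in for pivot(), which pipelines don't support.
@dlt.table(comment="Daily revenue per customer, split by order status.")
def daily_revenue():
    orders = dlt.read("orders_raw")
    return (
        orders.groupBy("customer_id", F.to_date("order_ts").alias("order_date"))
        .agg(
            F.sum(F.when(F.col("status") == "open", F.col("amount")).otherwise(0)).alias("open_revenue"),
            F.sum(F.when(F.col("status") == "closed", F.col("amount")).otherwise(0)).alias("closed_revenue"),
        )
    )
```

Because daily_revenue reads orders_raw through dlt.read, the pipeline infers the dependency between the two datasets and orchestrates the refresh order for you.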
Lakeflow Jobs — orchestration
Lakeflow Jobs provides DAG-based workflows with scheduling, retries, branching and looping (If/else), task-level dependencies, and first-class tasks for pipelines, notebooks, SQL, and ML—plus serverless options. You can include a pipeline as a Pipeline task in a job, and you can orchestrate with Apache Airflow if that’s your preference. For teams already using Databricks Jobs, the experience is largely unchanged aside from the Lakeflow branding. The nice part is the seamless fit with existing infrastructure: you can build end-to-end workflows fully within Lakeflow—from ingestion to transformation to orchestration—on serverless compute and with Unity Catalog governance, but you can also mix and match with existing code, e.g., notebooks as required.
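As a sketch of the orchestration side, the following uses the Databricks Python SDK to create a job with two tasks: a Pipeline task that refreshes a Lakeflow pipeline, and a dependent notebook task. The pipeline ID and notebook path are placeholders, and argument names may differ slightly across SDK versions.

```python
# Sketch: define a two-task Lakeflow Job with the Databricks Python SDK.
# The pipeline ID and notebook path are placeholders for illustration.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

created = w.jobs.create(
    name="daily_orders_workflow",
    tasks=[
        # Refresh the declarative pipeline first.
        jobs.Task(
            task_key="refresh_pipeline",
            pipeline_task=jobs.PipelineTask(pipeline_id="<pipeline-id>"),
        ),
        # Then run a downstream notebook once the pipeline succeeds.
        jobs.Task(
            task_key="publish_report",
            depends_on=[jobs.TaskDependency(task_key="refresh_pipeline")],
            notebook_task=jobs.NotebookTask(notebook_path="/Workspace/reports/publish"),
        ),
    ],
)
print(f"Created job {created.job_id}")
```

The same job could equally be defined in the Jobs UI or via Databricks Asset Bundles; the SDK version is shown here only to make the task-dependency structure explicit.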
Conclusion
Lakeflow is a compelling option for teams looking to simplify data engineering on Databricks. By integrating ingestion, transformation, and orchestration into a unified experience, it reduces the operational overhead of managing separate tools. The serverless compute model further alleviates infrastructure management burdens.
However, Lakeflow’s managed nature means it may not suit every use case. Teams with complex transformation logic or specialized ingestion needs may find the abstractions limiting. Additionally, reliance on first-party connectors could pose challenges if your data sources aren’t yet supported. There is also a risk of vendor lock-in, as migrating away from Lakeflow could require significant rework of pipelines.
Overall, Lakeflow is an interesting choice for organizations prioritizing speed to value and operational simplicity in their data engineering workflows. Evaluating your team’s specific requirements against Lakeflow’s capabilities will help determine if it’s the right fit.
References
- Databricks — Introducing Databricks Lakeflow
  https://www.databricks.com/blog/introducing-databricks-lakeflow
- Databricks — Databricks Lakeflow Connect
  https://www.databricks.com/product/data-engineering/lakeflow-connect
- Databricks — Databricks Lakeflow Declarative Pipelines
  https://www.databricks.com/product/data-engineering/lakeflow-declarative-pipelines
- Databricks — Databricks Lakeflow Jobs
  https://learn.microsoft.com/en-us/azure/databricks/jobs/
- Databricks — Announcing the General Availability of Databricks Lakeflow
  https://www.databricks.com/blog/announcing-general-availability-databricks-lakeflow
- Microsoft Learn — What is Lakeflow Connect?
  https://learn.microsoft.com/en-us/azure/databricks/ingestion/overview
- Microsoft Learn — Managed connectors in Lakeflow Connect
  https://learn.microsoft.com/en-us/azure/databricks/ingestion/lakeflow-connect/
- Microsoft Learn — Managed connector FAQs — Azure Databricks
  https://learn.microsoft.com/en-us/azure/databricks/ingestion/lakeflow-connect/faq
- Microsoft Learn — Lakeflow Declarative Pipelines (overview)
  https://learn.microsoft.com/en-us/azure/databricks/ldp/
- Microsoft Learn — Lakeflow Declarative Pipelines concepts
  https://learn.microsoft.com/en-us/azure/databricks/ldp/concepts
- Microsoft Learn — Lakeflow Declarative Pipelines limitations
  https://learn.microsoft.com/en-us/azure/databricks/ldp/limitations
- Microsoft Learn — Use sinks to stream records to external services (Public Preview)
  https://learn.microsoft.com/en-us/azure/databricks/ldp/sinks
- Microsoft Learn — Real-time mode in Structured Streaming (Public Preview)
  https://learn.microsoft.com/en-us/azure/databricks/structured-streaming/real-time
- Microsoft Learn — Add branching logic with the If/else task
  https://learn.microsoft.com/en-us/azure/databricks/jobs/if-else
- Microsoft Learn — Run your Lakeflow Jobs with serverless compute
  https://learn.microsoft.com/en-us/azure/databricks/jobs/run-serverless-jobs
- Microsoft Learn — Pipeline task for jobs
  https://learn.microsoft.com/en-us/azure/databricks/jobs/pipeline
Max Foxley-Marrable
Dr. Max Foxley-Marrable is a seasoned Data Engineer and Data Scientist with over eight years of experience transforming complex datasets into actionable insights that drive innovation and operational efficiency. Specializing in advanced data processing with Python, Apache Spark, Microsoft Azure, and Databricks, he has designed and deployed scalable lakehouse architectures, machine learning models, and cloud-based solutions for both industry and academia. Holding a PhD in Astronomy and Astrophysics, along with multiple professional certifications from Databricks and Microsoft, Dr. Foxley-Marrable is recognized for his ability to solve challenging technical problems, lead high-impact projects, and translate complex concepts into practical, business-aligned solutions. His work spans data engineering, cloud architecture, AI applications, and technical education, all underpinned by a passion for innovation, collaboration, and delivering measurable results.