Posts tagged with data-pipelines

Data Eng Daily · May 18 · 5 min read

Airflow 3.2 Finally Knows Which Partition Changed. Now What?

For years, Airflow's answer to "which data changed?" was a shrug.

airflow asset-partitioning dagster

Data Eng Daily · May 8 · 4 min read

Spark Declarative Pipelines Shipped. Here's What 20 Lines Replace.

Every Spark job I've inherited has the same skeleton: read from source, filter the garbage, join against a dimension table, write to a target, then wire...

spark spark-declarative-pipelines dbt

Data Eng Daily · Apr 26 · 5 min read

Your RAG Index Went Stale Eight Hours Ago

Somewhere right now, a support chatbot is confidently quoting a refund policy that was updated at 2 PM yesterday.

rag cdc vector-database

Data Eng Daily · Apr 17 · 5 min read

72% of Data Teams Write AI Code. 24% Test It.

dbt Labs dropped their annual State of Analytics Engineering report on Tuesday, and one number keeps rattling around my head: 72% of data teams now use...

dbt analytics-engineering data-quality

Data Eng Daily · Apr 15 · 5 min read

Stop Re-Embedding Your Entire Corpus Every Night

Most RAG teams treat embedding freshness the same way they treat data warehouse freshness — schedule a nightly batch job and hope nothing changes too fast.

rag cdc streaming

Data Eng Daily · Apr 12 · 5 min read

Airflow 2 Hits End-of-Life in Nine Days. Here's What Breaks.

Airflow 2 end-of-life lands on April 22. That's nine days from now.

airflow migration orchestration

Data Eng Daily · Apr 11 · 6 min read

Your Feature Table Has 200 Columns. Iceberg Rewrites All of Them.

You ship a daily feature pipeline.

iceberg ml-feature-store write-amplification

Data Eng Daily · Apr 10 · 5 min read

dbt on Flink Works — Unless Your Jobs Have State

Every team running both dbt and Flink has had the same conversation at some point: why are we maintaining two completely separate transformation stacks?

dbt apache-flink streaming

Data Eng Daily · Apr 8 · 5 min read

Meta Needed 50 AI Agents to Document What Their Engineers Already Knew

Every data team has That Person — the one who knows that the user_activity_v2 table actually feeds three downstream jobs through an intermediate field called...

ai-agents data-pipelines documentation

Data Eng Daily · Apr 6 · 5 min read

What Data Engineers Get Wrong About Pipeline Observability

Last Tuesday at 2:47 AM, a freshness SLA breach on our orders_enriched table woke up the on-call engineer.

opentelemetry observability data-pipelines

Data Eng Daily · Apr 4 · 5 min read

Kafka to Iceberg in 2026: Nine Options, Three That Matter

Every data team running Kafka eventually hits the same wall: how do I get these events into my lakehouse so analysts can actually query them?

kafka apache-iceberg streaming

Data Eng Daily · Apr 3 · 6 min read

Flink CDC 3.6.0: Oracle Finally Gets a Real Pipeline Connector

If you've been duct-taping Oracle CDC into Flink pipelines using the DataStream API and custom Debezium wrappers, version 3.6.

apache-flink cdc oracle

Data Eng Daily · Apr 2 · 5 min read

Stop Running Spark for 40 GB Jobs

Every quarter, someone on the team asks: "Do we really need this Spark cluster?" For most of the jobs running on it, the answer in 2026 is no.

duckdb apache-spark benchmarks

Data Eng Daily · Apr 1 · 4 min read

Airflow 2 EOL Is April 22 — Here's What Actually Breaks

Twenty days from now, Apache Airflow 2.x reaches end of life.

apache-airflow migration orchestration

Data Eng Daily · Mar 28 · 5 min read

dbt on Flink Won't Unify Your Data Stack

Three days ago Confluent dropped the dbt-confluent adapter, and the data engineering corner of the internet lost...

dbt apache-flink streaming