Most data pipelines fail silently when a source schema drifts. dbt tests run AFTER the model: they catch the broken state, but they do not prevent it from being written. We wire in Great Expectations as the OSS validation engine on every engagement, with a clear-eyed view of where it shines, where it doesn't, what we are NOT doing after GX Cloud's May-2026 shutdown announcement, and which alternatives (Soda, Pandera, dbt-native tests, Elementary) we layer alongside it. The post includes current GX 1.x Fluent-API code, the integration patterns that actually work in production, the real performance bottlenecks (with citations), the competitive landscape (GX vs Soda vs Pandera vs Anomalo vs Monte Carlo vs Bigeye), and the anti-patterns we audit in client engagements.
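The gap between testing after the write and validating before it can be sketched in plain Python. This is not the Great Expectations API: `EXPECTED_SCHEMA`, `validate_batch`, `gated_write`, and `ValidationError` are hypothetical names, and a list stands in for the target table. The point is only the ordering: checks run first, and a failing batch never reaches the sink.

```python
# Hypothetical expected schema for an incoming source table (illustrative only).
EXPECTED_SCHEMA = {"order_id": int, "amount": float, "currency": str}


class ValidationError(Exception):
    """Raised when an incoming batch fails the pre-write gate."""


def validate_batch(rows, schema):
    """Return a list of failure messages; an empty list means the batch passes."""
    failures = []
    for i, row in enumerate(rows):
        missing = schema.keys() - row.keys()
        extra = row.keys() - schema.keys()
        if missing:
            failures.append(f"row {i}: missing columns {sorted(missing)}")
        if extra:
            failures.append(f"row {i}: unexpected columns {sorted(extra)}")
        # Type check every expected column that is present.
        for col, typ in schema.items():
            if col in row and not isinstance(row[col], typ):
                failures.append(
                    f"row {i}: {col} is {type(row[col]).__name__}, expected {typ.__name__}"
                )
    return failures


def gated_write(rows, schema, sink):
    """Validate first; append to the sink only if every check passes."""
    failures = validate_batch(rows, schema)
    if failures:
        # Nothing is written: the drifted batch never lands in the target table.
        raise ValidationError("; ".join(failures))
    sink.extend(rows)


table = []
gated_write([{"order_id": 1, "amount": 9.5, "currency": "USD"}], EXPECTED_SCHEMA, table)

# A drifted batch: the source started sending amount as a string.
try:
    gated_write([{"order_id": 2, "amount": "9.50", "currency": "USD"}], EXPECTED_SCHEMA, table)
except ValidationError as err:
    print(err)

print(len(table))  # only the valid batch was written
```

A post-model dbt test would flag the second batch only after it had already been materialized; here the table still holds just the first batch.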
The canonical TwiceData engagement: the customer starts with a freshly provisioned AWS account and the vendor's Postgres database. Twelve weeks later they have an Iceberg lakehouse on S3, dbt-modeled metrics, a governed semantic layer, Looker dashboards designed by their own team, AND an AI chat surface that answers natural-language questions on the same data. This post walks through every layer of the build: the choices, the tradeoffs, the seams between layers, and the day-91 handoff where you keep the keys.
Most data teams default to SCD Type 2 because it's the only pattern they remember from Kimball. There are eight types, three modern variants, and three hybrid systems — and the right one for your pipeline is determined by signals in your incoming data, not by tradition. This article walks the diagnostic loop end-to-end: identify the data pattern, identify the query need, pick the type (or compose a hybrid), implement in dbt + Iceberg. Every type gets its own worked example.
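The query-need half of that diagnostic is easiest to see by applying the same change under Type 1 and Type 2. A minimal in-memory sketch, assuming a dict and a list as stand-ins for the dimension table; the function names and the `valid_from`/`valid_to` columns are hypothetical, not the dbt + Iceberg implementation the article builds.

```python
from datetime import date


def scd1_apply(dim, key, attrs):
    """Type 1: overwrite in place. Only the current value survives."""
    dim[key] = dict(attrs)


def scd2_apply(rows, key, attrs, as_of):
    """Type 2: close the current version and append a new one."""
    current = next(
        (r for r in rows if r["key"] == key and r["valid_to"] is None), None
    )
    if current is not None:
        if current["attrs"] == attrs:
            return  # unchanged attributes produce no new version
        current["valid_to"] = as_of  # half-open interval [valid_from, valid_to)
    rows.append({"key": key, "attrs": dict(attrs), "valid_from": as_of, "valid_to": None})


def scd2_as_of(rows, key, when):
    """Return the attributes that were current for `key` on date `when`."""
    for r in rows:
        if (
            r["key"] == key
            and r["valid_from"] <= when
            and (r["valid_to"] is None or when < r["valid_to"])
        ):
            return r["attrs"]
    return None


# Same customer, same plan change, two types.
type1 = {}
scd1_apply(type1, "cust-1", {"plan": "starter"})
scd1_apply(type1, "cust-1", {"plan": "pro"})

type2 = []
scd2_apply(type2, "cust-1", {"plan": "starter"}, date(2024, 1, 1))
scd2_apply(type2, "cust-1", {"plan": "pro"}, date(2024, 6, 1))

print(type1["cust-1"])                               # current value only
print(scd2_as_of(type2, "cust-1", date(2024, 3, 1)))  # value as of March
```

If no query ever asks "what was true on date X", the extra rows and the as-of predicate in Type 2 buy nothing, which is the kind of signal the diagnostic is meant to surface.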
The exact dbt model we drop into mid-market SaaS engagements — the four contract types it normalizes, the three tests that gate it, and the lineage hook that keeps it honest.