Most data pipelines fail silently when a source schema drifts. dbt tests run AFTER the model — they catch the broken state, they do not prevent it from being written. We wire Great Expectations as the OSS validation engine on every engagement, with a clear-eyed view of where it shines, where it doesn't, what we are NOT doing after GX Cloud's May-2026 shutdown announcement, and which alternatives (Soda, Pandera, dbt-native tests, Elementary) we layer alongside it. Includes the current GX 1.x Fluent-API code, the integration patterns that actually work in production, the real performance bottlenecks (with citations), the competitive landscape (GX vs Soda vs Pandera vs Anomalo vs Monte Carlo vs Bigeye), and the anti-patterns we audit in client engagements.
The canonical TwiceData engagement: customer starts with a freshly provisioned AWS account and the vendor's Postgres database. Twelve weeks later they have a Iceberg lakehouse on S3, dbt-modeled metrics, a governed semantic layer, Looker dashboards their team designs, AND an AI chat surface that answers natural-language questions on the same data. This post walks through every layer of the build — the choices, the tradeoffs, the seams between layers, and the day-91 handoff where you keep the keys.
Most data teams default to SCD Type 2 because it's the only pattern they remember from Kimball. There are eight types, three modern variants, and three hybrid systems — and the right one for your pipeline is determined by signals in your incoming data, not by tradition. This article walks the diagnostic loop end-to-end: identify the data pattern, identify the query need, pick the type (or compose a hybrid), implement in dbt + Iceberg. Every type gets its own worked example.
Most prompt engineering is informed guessing. DSPy treats prompts like programs: define a signature, build a dataset, pick an optimizer, ship the compiled artifact. Here is the loop we run, the optimizer trade-offs we actually face, and the signal-detection trick (emotion markers, escalation phrases) that quietly does most of the work.