Building reliable LLM pipelines: eleven systems, one lesson
Across eleven production AI systems, one pattern shows up every time a pipeline fails.
We have shipped eleven production LLM pipelines across clients in fintech, legal, and operations. They range from simple document classifiers to multi-step reasoning chains that coordinate five model calls before producing an output. One failure pattern shows up across all of them.
It is not hallucination. It is not latency. It is implicit state — the assumption that upstream components are working correctly, baked silently into downstream logic.
The failure pattern
Consider a three-step pipeline. Step 1 extracts entities from a document, Step 2 classifies those entities, and Step 3 generates a report. When Step 1 fails silently, returning an empty list instead of raising an error, Steps 2 and 3 run without complaint on the empty input and produce a confident, empty, wrong report. No exception is raised. The system has succeeded in the technical sense while failing in every meaningful sense.
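To make the failure concrete, here is a minimal sketch of that three-step shape. The function names, the hard-coded label, and the sample document are all hypothetical stand-ins for real model calls; the only point is that every step is individually valid on empty input.

```python
from collections import Counter

def extract_entities(document: str) -> list[str]:
    # Hypothetical Step 1: stands in for a model call whose output could not
    # be parsed. It swallows the problem and returns an empty list.
    return []

def classify_entities(entities: list[str]) -> dict[str, str]:
    # Hypothetical Step 2: labels each entity. Perfectly valid on empty input.
    return {entity: "organization" for entity in entities}

def generate_report(labels: dict[str, str]) -> str:
    # Hypothetical Step 3: summarizes the labels. Also valid on empty input.
    categories = Counter(labels.values())
    return f"Found {len(labels)} entities across {len(categories)} categories."

report = generate_report(
    classify_entities(extract_entities("Acme Corp acquired Initech in 2019."))
)
print(report)  # Found 0 entities across 0 categories.
```

Nothing in this run raises, logs a warning, or returns a non-success status, yet the report is wrong.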
The fix is not error handling at Step 3. The fix is assertions at Step 1’s output boundary: expected at least N entities, got zero. Halt here, not at the end.
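A boundary assertion of this kind might look like the sketch below. `BoundaryError`, `check_min_entities`, and the default minimum of 1 are illustrative names and values, not part of any library; the right threshold depends on the documents a given pipeline handles.

```python
class BoundaryError(Exception):
    """Raised when a step's output violates its contract (hypothetical name)."""

def check_min_entities(entities: list[str], minimum: int = 1) -> list[str]:
    # Validate Step 1's output before anything downstream sees it.
    if len(entities) < minimum:
        raise BoundaryError(
            f"extract_entities: expected at least {minimum} entities, "
            f"got {len(entities)}"
        )
    return entities

# Usage: wrap the extraction result at the boundary, before Step 2 runs.
try:
    check_min_entities([])
except BoundaryError as err:
    message = str(err)
    print(message)  # extract_entities: expected at least 1 entities, got 0
```

The sketch raises an explicit exception rather than using Python's `assert` statement, because `assert` is stripped when the interpreter runs with `-O`, and a boundary check that disappears in production defeats the purpose.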
What this costs
Explicit boundary assertions slow you down at the start. Every step needs a contract, and that contract takes time to write and to keep current as the pipeline evolves. Clients who pushed back on this work early in a project regretted it without exception.
The payoff is observability. When a pipeline has assertions at every boundary, you always know which step failed and why. Debugging a production failure takes minutes instead of hours. The system is legible under pressure — which is when legibility matters most.
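One way to get "which step failed and why" for free is to attach a contract to every step and let a small runner enforce it. The following is a sketch under assumptions: `Step`, `StepFailed`, and `run_pipeline` are hypothetical names, and the lambdas stand in for real model calls.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

class StepFailed(Exception):
    """Carries the failing step's name and the contract violation."""
    def __init__(self, step: str, reason: str):
        super().__init__(f"step '{step}' failed: {reason}")
        self.step = step
        self.reason = reason

@dataclass
class Step:
    name: str
    run: Callable[[Any], Any]
    # The contract returns a reason string on violation, or None if the output is acceptable.
    contract: Callable[[Any], Optional[str]]

def run_pipeline(steps: list[Step], data: Any) -> Any:
    for step in steps:
        data = step.run(data)
        reason = step.contract(data)
        if reason is not None:
            # Halt at the boundary where the violation occurred.
            raise StepFailed(step.name, reason)
    return data

steps = [
    Step("extract", lambda doc: [],
         lambda out: None if out else "expected at least 1 entity, got 0"),
    Step("classify", lambda ents: {e: "organization" for e in ents},
         lambda out: None),
]

try:
    run_pipeline(steps, "some document")
except StepFailed as err:
    failure = (err.step, err.reason)
    print(err)  # step 'extract' failed: expected at least 1 entity, got 0
```

Because the exception carries the step name and the reason as attributes, the same information can go to logs or alerts without parsing a message string.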