2024-02-20

Where bad records should go

One malformed row shouldn't fail a million-row job. But it also shouldn't vanish. The records you can't process need a destination, not a silent drop.

Every ingestion pipeline eventually meets a row it can't parse: a broken timestamp, a missing key, an encoding that shouldn't exist. The two common reactions are both wrong: crashing the whole job over one bad row, or silently skipping it so the data disappears without a trace. The right answer is a dead-letter destination, a place where unprocessable records are parked instead of dropped.

Quarantine, don't discard

When a record fails validation, route it to a separate table or location along with the reason it failed and the time it arrived. The main pipeline continues with the good records; the bad ones are preserved, countable, and inspectable.

good   -> analytical table
bad    -> dead_letter (row, error, ingested_at)
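
The routing above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the `validate` function, the `id` and `ts` field names, and the in-memory lists standing in for the two tables are all assumptions made for the example.

```python
import json
from datetime import datetime, timezone


def validate(row):
    """Hypothetical validator: raises when a row is unusable."""
    if "id" not in row:
        raise ValueError("missing key: id")
    # Raises KeyError if "ts" is absent, ValueError if it is not ISO 8601.
    datetime.fromisoformat(row["ts"])
    return row


def route(rows):
    """Split rows into good records and dead-letter entries."""
    good, dead_letter = [], []
    for row in rows:
        try:
            good.append(validate(row))
        except (ValueError, KeyError, TypeError) as exc:
            # Keep the original record, the failure reason, and the arrival time.
            dead_letter.append({
                "row": json.dumps(row),
                "error": f"{type(exc).__name__}: {exc}",
                "ingested_at": datetime.now(timezone.utc).isoformat(),
            })
    return good, dead_letter


good, dead_letter = route([
    {"id": 1, "ts": "2024-02-20T10:00:00"},
    {"id": 2, "ts": "not-a-time"},
    {"ts": "2024-02-20T10:05:00"},
])
```

Here one row lands in `good` and two in `dead_letter`, and the job itself never raises. In a real pipeline you would likely store the raw input exactly as it was received rather than a re-serialized copy, so that parsing problems can be reproduced later.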

Watch the dead-letter rate

A handful of quarantined rows is normal. A sudden spike usually means something changed upstream, not that a hundred records broke individually, so alert on the rate rather than on any single failure. The dead-letter table then doubles as an early-warning signal for source problems.
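
A rate check can be as small as the sketch below. The 1% threshold and the 100-row minimum are placeholder assumptions, not recommendations; the right values depend on how noisy your source normally is.

```python
def dead_letter_rate(good_count, bad_count):
    """Fraction of rows in a batch that were quarantined."""
    total = good_count + bad_count
    return bad_count / total if total else 0.0


def should_alert(good_count, bad_count, threshold=0.01, min_rows=100):
    """Alert on the batch's dead-letter rate, not on individual failures.

    threshold and min_rows are hypothetical defaults for illustration.
    """
    if good_count + bad_count < min_rows:
        # Assumption: batches this small are too noisy to judge by rate.
        return False
    return dead_letter_rate(good_count, bad_count) > threshold
```

With these defaults, three bad rows in a million stay quiet, while 100 bad rows out of 1,000 trigger an alert.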

A dropped record is a bug you'll never see. A quarantined one is a question you can answer later.

Make replay easy

Once you fix the cause — or the upstream data is corrected — you'll want those rows back. Because the dead-letter store keeps the original record, replaying it through the now-fixed pipeline is straightforward, and nothing was lost in the meantime.
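
A replay pass might look like the following sketch. It assumes dead-letter entries shaped like the `(row, error, ingested_at)` layout above, with `row` stored as JSON text; `fixed_process` is a made-up stand-in for whatever your corrected pipeline step is.

```python
import json
from datetime import datetime


def replay(dead_letter, process):
    """Re-run quarantined rows; return (recovered, still_failing)."""
    recovered, still_failing = [], []
    for entry in dead_letter:
        row = json.loads(entry["row"])
        try:
            recovered.append(process(row))
        except Exception as exc:
            # Keep the entry quarantined, with the latest failure reason.
            still_failing.append(
                {**entry, "error": f"{type(exc).__name__}: {exc}"}
            )
    return recovered, still_failing


def fixed_process(row):
    # Hypothetical fix: the pipeline now accepts day-first dates.
    parsed = datetime.strptime(row["ts"], "%d/%m/%Y")
    return {"id": row["id"], "ts": parsed.date().isoformat()}


quarantined = [
    {"row": json.dumps({"id": 2, "ts": "20/02/2024"}),
     "error": "ValueError: bad timestamp",
     "ingested_at": "2024-02-20T10:00:00+00:00"},
    {"row": json.dumps({"id": 3, "ts": "garbage"}),
     "error": "ValueError: bad timestamp",
     "ingested_at": "2024-02-20T10:01:00+00:00"},
]

recovered, still_failing = replay(quarantined, fixed_process)
```

Rows that still fail stay in the dead-letter store with their original `ingested_at`, so a replay never loses anything either.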

