
AI slop is a systems problem — the eval loop

By Max Zwisler · Published April 16, 2026 · 2 min read

When people complain about AI slop — generic copy, confident wrong numbers, the same five adjectives in every paragraph — they usually blame the model. Then they switch models, and the slop survives the migration. That is the tell: slop is a property of the system, not the weights.

Slop survives because nothing rejects it

A human writer has an editor, a fact-check, a brand reviewer — or at minimum the fear of a colleague reading the draft. A bare model call has none of that. Whatever comes out goes straight to the reader. The fix is not a better prompt; it is putting the editor back, as software.

Flow diagram: agent, check, send — with a gate before send and a human escalation path

What an eval loop actually checks

Our standard pass for any outbound text runs three layers:

  • Factual. Every number, name and date in the output is matched against the source data in the company brain. A claim without a source fails. This single check kills the most damaging slop category — the confidently wrong figure in a client report.
  • Brand. Banned-word lists (every team has theirs), tone rules, structural requirements. Ours rejects "unlock", "seamlessly" and any sentence about today's fast-paced world on sight.
  • Format. Does it parse? Does it fit the template? Are the links live? Is the German actually German, and not English syntax translated word for word?

Each layer is boring. Stacked, they reject roughly a third of first attempts on our own production loops — drafts nobody ever sees, fixed and re-checked in seconds.
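To make the stacking concrete, here is a minimal sketch of the three layers as plain functions. It is an illustration under stated assumptions, not our production code: the names (`Failure`, `check_factual`, `evaluate`), the regex for spotting figures, the exact-match lookup against a set of source facts, and the 600-character limit are all hypothetical. A real factual check would also cover names and dates, and a real format check would parse the template and test links.

```python
import re
from dataclasses import dataclass

# Hypothetical banned list, seeded with the phrases named in the article.
BANNED_PHRASES = ("unlock", "seamlessly", "fast-paced world")


@dataclass(frozen=True)
class Failure:
    layer: str   # "factual", "brand" or "format"
    detail: str  # human-readable reason, suitable for feeding back to a generator


def check_factual(text: str, source_facts: set[str]) -> list[Failure]:
    """Flag every figure in the text that has no exact match in the source data."""
    figures = re.findall(r"\d+(?:[.,]\d+)*%?", text)
    return [Failure("factual", f"unsourced figure: {f}")
            for f in figures if f not in source_facts]


def check_brand(text: str) -> list[Failure]:
    """Flag banned phrases using a plain case-insensitive substring match."""
    lowered = text.lower()
    return [Failure("brand", f"banned phrase: {p}")
            for p in BANNED_PHRASES if p in lowered]


def check_format(text: str, max_chars: int = 600) -> list[Failure]:
    """Flag empty or over-length output; the limit is an arbitrary placeholder."""
    failures = []
    if not text.strip():
        failures.append(Failure("format", "empty output"))
    if len(text) > max_chars:
        failures.append(Failure("format", f"longer than {max_chars} characters"))
    return failures


def evaluate(text: str, source_facts: set[str]) -> list[Failure]:
    """Run all three layers; an empty list means the draft passes."""
    return check_factual(text, source_facts) + check_brand(text) + check_format(text)


draft = "We help you unlock 37% growth."
failures = evaluate(draft, source_facts={"12%", "2025"})
print([f.layer for f in failures])
```

The sample draft fails twice: once in the factual layer for the unsourced 37%, and once in the brand layer for "unlock". Returning a list of failures rather than a single pass/fail flag keeps the reasons available for whatever handles the rejection.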

Failures route back, not forward

The critical design decision: a failed check returns to the generator with the failure attached, not to a human inbox. The human sees attempt three, already verified, with the trail available if they want it. Review time drops from editing to approving.
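The routing described above can be sketched as a short retry loop. Everything here is an assumption made for illustration: the function and field names, the limit of three attempts, and the two `fake_*` stand-ins that take the place of a real model call and a real check stack. The escalation branch corresponds to the human escalation path in the flow diagram.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LoopResult:
    text: str
    status: str                                # "verified" or "escalated"
    trail: list = field(default_factory=list)  # (attempt, draft, failures) per try


def run_eval_loop(
    generate: Callable[[list[str]], str],
    evaluate: Callable[[str], list[str]],
    max_attempts: int = 3,
) -> LoopResult:
    """Regenerate with the failures attached until the draft passes or attempts run out."""
    feedback: list[str] = []
    trail = []
    draft = ""
    for attempt in range(1, max_attempts + 1):
        draft = generate(feedback)      # the generator sees the previous failures
        failures = evaluate(draft)
        trail.append((attempt, draft, failures))
        if not failures:
            return LoopResult(draft, "verified", trail)
        feedback = failures             # route back, not forward
    # Attempts exhausted: hand over to a human together with the full trail.
    return LoopResult(draft, "escalated", trail)


# Stand-ins for a real model call and a real check stack (hypothetical).
def fake_generate(feedback: list[str]) -> str:
    if any("unlock" in f for f in feedback):
        return "Cut reporting time by a third."
    return "Unlock faster reporting."


def fake_evaluate(draft: str) -> list[str]:
    return ["banned phrase: unlock"] if "unlock" in draft.lower() else []


result = run_eval_loop(fake_generate, fake_evaluate)
print(result.status, len(result.trail))
```

In this run the first draft is rejected, the second passes, and the reviewer receives a verified text plus a two-entry trail. Keeping the trail on the result is what lets the human approve quickly and still inspect earlier attempts when they want to.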

The compounding part

Every rejection is data. Recurring factual failures point to gaps in the brain; recurring tone failures become new rules; recurring format failures fix the template. The eval loop is not just a filter — it is the mechanism by which the whole system learns. Teams that skip it don't just ship worse output. They ship the same worse output, forever.
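One simple way to turn rejections into that kind of signal is to count them. The sketch below assumes each rejection is logged as a `(layer, detail)` pair. The threshold of three, the `FOLLOW_UP` wording, and the sample log entries are invented for the example and are not real data.

```python
from collections import Counter

# Follow-up per layer, paraphrasing the article; the wording is illustrative.
FOLLOW_UP = {
    "factual": "fill the gap in the source data",
    "brand": "add a tone rule",
    "format": "fix the template",
}


def recurring_failures(rejections, threshold=3):
    """Return failures seen at least `threshold` times, most frequent first."""
    counts = Counter(rejections)
    return [
        (layer, detail, n, FOLLOW_UP[layer])
        for (layer, detail), n in counts.most_common()
        if n >= threshold
    ]


# Made-up rejection log for demonstration only.
log = (
    [("factual", "no source for Q3 revenue")] * 4
    + [("brand", "banned phrase: unlock")] * 3
    + [("format", "broken link")]
)

for row in recurring_failures(log):
    print(row)
```

With this log, the factual and brand failures cross the threshold and come back with a suggested follow-up, while the one-off broken link is ignored. Where to set the threshold is a judgment call that depends on volume.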

Frequently asked questions

An eval loop is a check that runs every output against clear criteria before it reaches the reader. Failures route back, not forward.

AI slop persists not because of the model, but because nothing between the model and the reader checks the work. The eval loop closes that gap.

eval-loops · quality · systems
