Holdouts are not a trick to slow the newsroom; they are a way to keep experiments honest when everyone wants an answer by Friday. The failure mode is familiar: a treatment looks victorious because the control group was starved of traffic or saddled with a broken layout. Readers feel the unevenness even if the dashboard never flags it.

We start with parity checks on depth distributions—not because we worship p-values, but because humans recognize skew faster when it is drawn plainly. If the control tail is thinner, we pause before anyone declares victory. The pause is a feature: it gives your counsel and your designers the same screenshot to discuss.

Stop rules should read like memos, not like code. “We end the test if either arm drops below N engaged sessions for two consecutive weekdays” is boring and beautiful. Boring text survives handoffs; clever math rarely does.

Finally, archive everything. Future you will not remember why a headline variant was shelved. A short paragraph in the experiment log saves entire afternoons of archaeology. Respect readers by respecting your own memory.