2 Comments

So true, I've seen the same patterns at a few companies I've worked for too! One more consequence of separate goals for Infra vs. App teams: when there aren't enough "playground" and "load testing" environments with Kafka and/or other data-intensive frameworks (like Flink), App teams do not test every change on "true" infra. As a result, merges and deploys from multiple features/teams cause endless failures in the one-and-only real env, sometimes taking weeks (!!) to finally get a build and deploy working. And then you start seeing weird issues in Prod once the real load hits your systems!

Agreed! I've seen a pattern where production data is mirrored to a staging environment but heavily sampled (5-10%). Unfortunately, this is not always enough for a real test.

This can also make local development quite painful.
