BellJar: A new framework for testing system recoverability at scale

Building infrastructure that can easily recover from outages, particularly outages involving adjacent infrastructure, too often becomes a murky exploration of nuanced fate-sharing between systems. Untangling dependencies and uncovering side effects of unavailability has historically been time-consuming work. A lack of great tooling built for this, and the rarity of infrastructure outages, makes reasoning about them [...]

The post BellJar: A new framework for testing system recoverability at scale appeared first on Engineering at Meta.
http://dlvr.it/SPqvD7
