Power Loss Siren: Making Meta resilient to power loss events

There are thousands of distributed services running on millions of servers in Meta’s data centers. Part of keeping those services reliable is making them resilient to power loss events as our data center fleet grows. To help increase resiliency, we built the Power Loss Siren (PLS), a rack-level, low-latency, distributed [...] The post Power Loss Siren: Making Meta resilient to power loss events appeared first on Engineering at Meta.
http://dlvr.it/SFVvTJ

Popular posts from this blog

How Meta enforces purpose limitation via Privacy Aware Infrastructure at scale

Fully Sharded Data Parallel: faster AI training with fewer GPUs

Risk-driven backbone management during COVID-19 and beyond