Asicmon: A platform agnostic observability system for AI accelerators

We will be hosting a talk about our work, “A Platform Agnostic Observability System for AI Accelerators,” during our virtual Systems @Scale event at 10:20 a.m. PT on Wednesday, June 30, followed by a live Q&A session. Please submit any questions to systemsatscale@fb.com before the event. Accelerators are special-purpose hardware devices optimized for specific [...] The post Asicmon: A platform agnostic observability system for AI accelerators appeared first on Facebook Engineering.
http://dlvr.it/S2dhyb
