Posts

Inside Bento: Jupyter Notebooks at Meta

This episode of the Meta Tech Podcast is all about Bento, Meta’s internal distribution of Jupyter Notebooks, an open-source web-based computing platform. Bento allows our engineers to mix code, text, and multimedia in a single document and serves a wide range of use cases at Meta from prototyping to complex machine learning workflows. Pascal Hartig [...] Read More... The post Inside Bento: Jupyter Notebooks at Meta appeared first on Engineering at Meta. http://dlvr.it/TDMB5y

Simulator-based reinforcement learning for data center cooling optimization

We’re sharing more about the role that reinforcement learning plays in helping us optimize our data centers’ environmental controls. Our reinforcement learning-based approach has helped us reduce energy consumption and water usage across various weather conditions. Meta is revamping its new data center design to optimize for artificial intelligence, and the same methodology will be [...] Read More... The post Simulator-based reinforcement learning for data center cooling optimization appeared first on Engineering at Meta. http://dlvr.it/TD4BJ2

Read Meta’s 2024 Sustainability Report

[...] Read More... The post Read Meta’s 2024 Sustainability Report appeared first on Engineering at Meta. http://dlvr.it/TCqSsP

Meta is getting ready for post-quantum cryptography

The Quantum Apocalypse is coming. The advent of quantum computers has raised real questions about the future of data privacy over the internet.  Someday, advances in quantum computing will make it possible to decrypt sensitive data that was encrypted using today’s complex cryptography systems. In the latest episode of the Meta Tech Podcast you’ll meet Sheran [...] Read More... The post Meta is getting ready for post-quantum cryptography appeared first on Engineering at Meta. http://dlvr.it/TCVChg

How Meta enforces purpose limitation via Privacy Aware Infrastructure at scale

At Meta, we’ve been diligently working to incorporate privacy into different systems of our software stack over the past few years. Today, we’re excited to share some cutting-edge technologies that are part of our Privacy Aware Infrastructure (PAI) initiative. These innovations mark a major milestone in our ongoing commitment to honoring user privacy.  PAI offers [...] Read More... The post How Meta enforces purpose limitation via Privacy Aware Infrastructure at scale appeared first on Engineering at Meta. http://dlvr.it/TCRbJ9

RETINAS: Real-Time Infrastructure Accounting for Sustainability

We are introducing a new metric, real-time server fleet utilization effectiveness, as part of the RETINAS initiative to help reduce emissions and achieve net zero emissions across our value chain in 2030. This new metric allows us to measure server resource usage (e.g., compute, storage) and efficiency in our large-scale data center server fleet in [...] Read More... The post RETINAS: Real-Time Infrastructure Accounting for Sustainability appeared first on Engineering at Meta. http://dlvr.it/TCPLW6

How PyTorch powers AI training and inference

Learn about new PyTorch advancements for LLMs and how PyTorch is enhancing every aspect of the LLM lifecycle. In this talk from AI Infra @ Scale 2024, software engineers Wanchao Liang and Evan Smothers are joined by Meta research scientist Kimish Patel to discuss our newest features and tools that enable large-scale training, memory efficient [...] Read More... The post How PyTorch powers AI training and inference appeared first on Engineering at Meta. http://dlvr.it/TCHqp0

Inside the hardware and co-design of MTIA

In this talk from AI Infra @ Scale 2024, Joel Colburn, a software engineer at Meta, technical lead Junqiang Lan, and software engineer Jack Montgomery discuss the second generation of MTIA, Meta’s in-house training and inference accelerator. They cover the co-design process behind building the second generation of Meta’s first-ever custom silicon for AI workloads, [...] Read More... The post Inside the hardware and co-design of MTIA appeared first on Engineering at Meta. http://dlvr.it/TCFYPv

Bringing Llama 3 to life

Llama 3 is Meta’s most capable openly-available LLM to date and the recently-released Llama 3.1 will enable new workflows, such as synthetic data generation and model distillation with unmatched flexibility, control, and state-of-the-art capabilities that rival the best closed source models.  At AI Infra @ Scale 2024, Meta engineers discussed every step of how we [...] Read More... The post Bringing Llama 3 to life appeared first on Engineering at Meta. http://dlvr.it/TCCC1D

Aparna Ramani discusses the future of AI infrastructure

Delivering new AI technologies at scale also means rethinking every layer of our infrastructure – from silicon and software systems to even our data center designs. For the second year in a row, Meta’s engineering and infrastructure teams returned for the AI Infra @ Scale conference, where they discussed the challenges of scaling up an [...] Read More... The post Aparna Ramani discusses the future of AI infrastructure appeared first on Engineering at Meta. http://dlvr.it/TC8WPd

How Meta animates AI-generated images at scale

We launched Meta AI with the goal of giving people new ways to be more productive and unlock their creativity with generative AI (GenAI). But GenAI also comes with challenges of scale. As we deploy new GenAI technologies at Meta, we also focus on delivering these services to people as quickly and efficiently as possible. [...] Read More... The post How Meta animates AI-generated images at scale appeared first on Engineering at Meta. http://dlvr.it/TBwnL0

A RoCE network for distributed AI training at scale

AI networks play an important role in interconnecting tens of thousands of GPUs, forming the foundational infrastructure for training and enabling large models with hundreds of billions of parameters, such as Llama 3.1 405B. This week at ACM SIGCOMM 2024 in Sydney, Australia, we are sharing details on the network we have built at Meta [...] Read More... The post A RoCE network for distributed AI training at scale appeared first on Engineering at Meta. http://dlvr.it/TBXCRm

DCPerf: An open source benchmark suite for hyperscale compute applications

We are open-sourcing DCPerf, a collection of benchmarks that represents the diverse categories of workloads that run in data center cloud deployments. We hope that DCPerf can be used more broadly by academia, the hardware industry, and internet companies to design and evaluate future products. DCPerf is available now on GitHub. Hyperscale and cloud datacenter [...] Read More... The post DCPerf: An open source benchmark suite for hyperscale compute applications appeared first on Engineering at Meta. http://dlvr.it/TBXC3t

Meet Caddy – Meta’s next-gen mixed reality CAD software

What happens when a team of mechanical engineers gets tired of looking at flat images of 3D models over Zoom? Meet the team behind Caddy, a new CAD app for mixed reality. They join Pascal Hartig (@passy) on the Meta Tech Podcast to talk about teaching themselves to code, disrupting the CAD software space, and [...] Read More... The post Meet Caddy – Meta’s next-gen mixed reality CAD software appeared first on Engineering at Meta. http://dlvr.it/T9mJpz

AI Lab: The secrets to keeping machine learning engineers moving fast

The key to developer velocity across AI lies in minimizing time to first batch (TTFB) for machine learning (ML) engineers. AI Lab is a pre-production framework used internally at Meta. It allows us to continuously A/B test common ML workflows – enabling proactive improvements and automatically preventing regressions on TTFB.  AI Lab prevents TTFB regressions [...] Read More... The post AI Lab: The secrets to keeping machine learning engineers moving fast appeared first on Engineering at Meta. http://dlvr.it/T9gRb1

Taming the tail utilization of ads inference at Meta scale

Tail utilization is a significant system issue and a major factor in overload-related failures and low compute utilization. The tail utilization optimizations at Meta have had a profound impact on model serving capacity footprint and reliability.  Failure rates, which are mostly timeout errors, were reduced by two-thirds; the compute footprint delivered 35% more work for [...] Read More... The post Taming the tail utilization of ads inference at Meta scale appeared first on Engineering at Meta. http://dlvr.it/T9QxWF

Meta’s approach to machine learning prediction robustness

Meta’s advertising business leverages large-scale machine learning (ML) recommendation models that power millions of ads recommendations per second across Meta’s family of apps. Maintaining reliability of these ML systems helps ensure the highest level of service and uninterrupted benefit delivery to our users and advertisers. To minimize disruptions and ensure our ML systems are intrinsically [...] Read More... The post Meta’s approach to machine learning prediction robustness appeared first on Engineering at Meta. http://dlvr.it/T9PxWg

The key to a happy Rust/C++ relationship

The history of Rust at Meta goes all the way back to 2016, when we first started using it for source control. Today, it has been widely embraced at Meta and is one of our primary supported server-side languages (along with C++, Python, and Hack). But that doesn’t mean there weren’t any growing pains. Aida [...] Read More... The post The key to a happy Rust/C++ relationship appeared first on Engineering at Meta. http://dlvr.it/T8lmgy

Leveraging AI for efficient incident response

We’re sharing how we streamline system reliability investigations using a new AI-assisted root cause analysis system. The system uses a combination of heuristic-based retrieval and large language model-based ranking to speed up root cause identification during investigations. Our testing has shown this new system achieves 42% accuracy in identifying root causes for investigations at their [...] Read More... The post Leveraging AI for efficient incident response appeared first on Engineering at Meta. http://dlvr.it/T8jLB2

PVF: A novel metric for understanding AI systems’ vulnerability against SDCs in model parameters

We’re introducing parameter vulnerability factor (PVF), a novel metric for understanding and measuring AI systems’ vulnerability against silent data corruptions (SDCs) in model parameters. PVF can be tailored to different AI models and tasks, adapted to different hardware faults, and even extended to the training phase of AI models. We’re sharing results of our own [...] Read More... The post PVF: A novel metric for understanding AI systems’ vulnerability against SDCs in model parameters appeared first on Engineering at Meta. http://dlvr.it/T8VMWQ

MLow: Meta’s low bitrate audio codec

At Meta, we support real-time communication (RTC) for billions of people through our apps, including WhatsApp, Instagram, and Messenger.  We are working to make RTC accessible by providing a high-quality experience for everyone – even those who might not have the fastest connections or the latest phones. As more and more people have relied on [...] Read More... The post MLow: Meta’s low bitrate audio codec appeared first on Engineering at Meta. http://dlvr.it/T8DkVR

How Meta trains large language models at scale

As we continue to focus our AI research and development on solving increasingly complex problems, one of the most significant and challenging shifts we’ve experienced is the sheer scale of computation required to train large language models (LLMs). Traditionally, our AI model training has involved training a massive number of models that required a comparatively [...] Read More... The post How Meta trains large language models at scale appeared first on Engineering at Meta. http://dlvr.it/T8C7Gt

Maintaining large-scale AI capacity at Meta

Meta is currently operating many data centers with GPU training clusters across the world. Our data centers are the backbone of our operations, meticulously designed to support the scaling demands of compute and storage. A year ago, however, as the industry reached a critical inflection point due to the rise of artificial intelligence (AI), we [...] Read More... The post Maintaining large-scale AI capacity at Meta appeared first on Engineering at Meta. http://dlvr.it/T8BcRq

Unlocking the power of mixed reality devices with MobileConfig

MobileConfig enables developers to centrally manage a mobile app’s configuration parameters in our data centers. Once a parameter value is changed on our central server, billions of app devices automatically fetch and apply the new value without app updates. These remotely managed configuration parameters serve various purposes such as A/B testing, feature rollout, and app [...] Read More... The post Unlocking the power of mixed reality devices with MobileConfig appeared first on Engineering at Meta. http://dlvr.it/T87k0N

Serverless Jupyter Notebooks at Meta

At Meta, Bento, our internal Jupyter notebooks platform, is a popular tool that allows our engineers to mix code, text, and multimedia in a single document. Use cases run the entire spectrum from what we call “lite” workloads that involve simple prototyping to heavier and more complex machine learning workflows. However, even though the lite [...] Read More... The post Serverless Jupyter Notebooks at Meta appeared first on Engineering at Meta. http://dlvr.it/T859wF

Composable data management at Meta

In recent years, Meta’s data management systems have evolved into a composable architecture that creates interoperability, promotes reusability, and improves engineering efficiency.  We’re sharing how we’ve achieved this, in part, by leveraging Velox, Meta’s open source execution engine, as well as work ahead as we continue to rethink our data management systems.  Data is at [...] Read More... The post Composable data management at Meta appeared first on Engineering at Meta. http://dlvr.it/T7Fm52

Post-quantum readiness for TLS at Meta

Today, the internet (like most digital infrastructure in general) relies heavily on the security offered by public-key cryptosystems such as RSA, Diffie-Hellman (DH), and elliptic curve cryptography (ECC). But the advent of quantum computers has raised real questions about the long-term privacy of data exchanged over the internet. In the future, significant advances in quantum [...] Read More... The post Post-quantum readiness for TLS at Meta appeared first on Engineering at Meta. http://dlvr.it/T7Flq2

Behind the scenes of Threads for web

When Threads first launched, one of the top feature requests was for a web client. In this episode of the Meta Tech Podcast, Pascal Hartig (@passy) sits down with Ally C. and Kevin C., two engineers on the Threads Web Team that delivered the basic version of Threads for web in just under three months. [...] Read More... The post Behind the scenes of Threads for web appeared first on Engineering at Meta. http://dlvr.it/T6t653

Building an infrastructure for AI’s future

[...] Read More... The post Building an infrastructure for AI’s future appeared first on Engineering at Meta. http://dlvr.it/T5NLK3

Introducing the next-gen Meta Training and Inference Accelerator

[...] Read More... The post Introducing the next-gen Meta Training and Inference Accelerator appeared first on Engineering at Meta. http://dlvr.it/T5KvGc

Bringing HDR photo support to Instagram and Threads

Meta’s family of apps serves trillions of image download requests every day. And if you’re into high-quality images, you’ve probably noticed that Instagram and Threads have added support for high dynamic range (HDR) photos. Now people on Threads and Instagram can upload and share images that are more true-to-life, with the full color and range [...] Read More... The post Bringing HDR photo support to Instagram and Threads appeared first on Engineering at Meta. http://dlvr.it/T4fLq4

Threads has entered the fediverse

Threads has entered the fediverse! As part of our beta experience, now available in a few countries, Threads users aged 18+ with public profiles can now choose to share their Threads posts to other ActivityPub-compliant servers. People on those servers can now follow federated Threads profiles and see, like, reply to, and repost posts from [...] Read More... The post Threads has entered the fediverse appeared first on Engineering at Meta. http://dlvr.it/T4QNyG

Optimizing RTC bandwidth estimation with machine learning

Bandwidth estimation (BWE) and congestion control play an important role in delivering high-quality real-time communication (RTC) across Meta’s family of apps. We’ve adopted a machine learning (ML)-based approach that allows us to solve networking problems holistically across layers such as BWE, network resiliency, and transport. We’re sharing our experiment results from this approach, some of [...] Read More... The post Optimizing RTC bandwidth estimation with machine learning appeared first on Engineering at Meta. http://dlvr.it/T4Mvz3

Better video for mobile RTC with AV1 and HD

At Meta, we support real-time communication (RTC) for billions of people through our apps, including Messenger, Instagram, and WhatsApp. We’ve seen significant benefits by adopting the AV1 codec for RTC. Here’s how we are improving the RTC video quality for our apps with tools like the AV1 codec, the challenges we face, and how we [...] Read More... The post Better video for mobile RTC with AV1 and HD appeared first on Engineering at Meta. http://dlvr.it/T4MvlT

Logarithm: A logging engine for AI training workflows and services

Systems and application logs play a key role in operations, observability, and debugging workflows at Meta. Logarithm is a hosted, serverless, multitenant service, used only internally at Meta, that consumes and indexes these logs and provides an interactive query interface to retrieve and view logs. In this post, we present the design behind Logarithm, and [...] Read More... The post Logarithm: A logging engine for AI training workflows and services appeared first on Engineering at Meta. http://dlvr.it/T4Fp7Z

Building Meta’s GenAI Infrastructure

Marking a major investment in Meta’s AI future, we are announcing two 24k GPU clusters. We are sharing details on the hardware, network, storage, design, performance, and software that help us extract high throughput and reliability for various AI workloads. We use this cluster design for Llama 3 training. We are strongly committed to open [...] Read More... The post Building Meta’s GenAI Infrastructure appeared first on Engineering at Meta. http://dlvr.it/T3z2gC

Making messaging interoperability with third parties safe for users in Europe

To comply with a new EU law, the Digital Markets Act (DMA), which comes into force on March 7th, we’ve made major changes to WhatsApp and Messenger to enable interoperability with third-party messaging services.  We’re sharing how we enabled third-party interoperability (interop) while maintaining end-to-end encryption (E2EE) and other privacy guarantees in our services as [...] Read More... The post Making messaging interoperability with third parties safe for users in Europe appeared first on Engineering at Meta. http://dlvr.it/T3h4SQ

How DotSlash makes executable deployment simpler

Andres Suarez and Michael Bolin, two software engineers at Meta, join Pascal Hartig (@passy) on the Meta Tech Podcast to discuss the ins and outs of DotSlash, a new open source tool from Meta. DotSlash takes the pain out of distributing binaries and toolchains to developers. Instead of committing large, platform-specific executables to a repository, [...] Read More... The post How DotSlash makes executable deployment simpler appeared first on Engineering at Meta. http://dlvr.it/T3HXYH

Aligning Velox and Apache Arrow: Towards composable data management

We’ve partnered with Voltron Data and the Arrow community to align and converge Apache Arrow with Velox, Meta’s open source execution engine. Apache Arrow 15 includes three new format layouts developed through this partnership: StringView, ListView, and Run-End-Encoding (REE). This new convergence helps Meta and the larger community build data management systems that are unified, [...] Read More... The post Aligning Velox and Apache Arrow: Towards composable data management appeared first on Engineering at Meta. http://dlvr.it/T31740

Meta loves Python

By now you’re already aware that Python 3.12 has been released. But did you know that several of its new features were developed by Meta? Meta engineer Pascal Hartig (@passy) is joined on the Meta Tech Podcast by Itamar Oren and Carl Meyer, two software engineers at Meta, to discuss their teams’ contributions to the [...] Read More... The post Meta loves Python appeared first on Engineering at Meta. http://dlvr.it/T2dx3M

Simple Precision Time Protocol at Meta

While deploying Precision Time Protocol (PTP) at Meta, we’ve developed a simplified version of the protocol (Simple Precision Time Protocol – SPTP), that can offer the same level of clock synchronization as unicast PTPv2 more reliably and with fewer resources. In our own tests, SPTP boasts comparable performance to PTP, but with significant improvements in [...] Read More... The post Simple Precision Time Protocol at Meta appeared first on Engineering at Meta. http://dlvr.it/T2Rjvv

DotSlash: Simplified executable deployment

We’ve open sourced DotSlash, a tool that makes large executables available in source control with a negligible impact on repository size, thus avoiding I/O-heavy clone operations. With DotSlash, a set of platform-specific executables is replaced with a single script containing descriptors for the supported platforms. DotSlash handles transparently fetching, decompressing, and verifying the appropriate remote [...] Read More... The post DotSlash: Simplified executable deployment appeared first on Engineering at Meta. http://dlvr.it/T2NRkB
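
The idea of replacing platform-specific executables with a single descriptor file can be sketched roughly as follows. This is an illustrative Python sketch only, not DotSlash's actual file format or implementation: the descriptor fields (`url`, `size`, `sha256`), the platform key scheme, the example URL, and the helper names `platform_key`, `select_descriptor`, and `verify` are all assumptions made for the example.

```python
import hashlib
import platform


def platform_key(system=None, machine=None):
    """Build a lookup key such as 'linux-x86_64' (hypothetical key scheme)."""
    system = (system or platform.system()).lower()
    machine = (machine or platform.machine()).lower()
    return f"{system}-{machine}"


def select_descriptor(descriptors, key):
    """Pick the descriptor for one platform, failing loudly if it is missing."""
    try:
        return descriptors[key]
    except KeyError:
        raise LookupError(f"no executable described for platform {key!r}") from None


def verify(payload, descriptor):
    """Check a fetched artifact against the size and digest in its descriptor."""
    return (
        len(payload) == descriptor["size"]
        and hashlib.sha256(payload).hexdigest() == descriptor["sha256"]
    )


# Stand-in for a downloaded artifact; a real tool would fetch it from the URL.
payload = b"pretend this is a compressed executable"

# Hypothetical descriptor table: small metadata instead of a large binary.
descriptors = {
    "linux-x86_64": {
        "url": "https://example.com/tool-linux-x86_64.tar.gz",
        "size": len(payload),
        "sha256": hashlib.sha256(payload).hexdigest(),
    },
}

chosen = select_descriptor(descriptors, platform_key("Linux", "x86_64"))
print(chosen["url"], verify(payload, chosen))
```

Only the small descriptor table would live in source control in a scheme like this, which is why the repository size barely changes when a new executable is added.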

Improving machine learning iteration speed with faster application build and packaging

Slow build times and inefficiencies in packaging and distributing execution files were costing our ML/AI engineers a significant amount of time while working on our training stack. By addressing these issues head-on, we were able to reduce this overhead by double-digit percentages.  In the fast-paced world of AI/ML development, it’s crucial to ensure that our [...] Read More... The post Improving machine learning iteration speed with faster application build and packaging appeared first on Engineering at Meta. http://dlvr.it/T22Qcp

Lazy is the new fast: How Lazy Imports and Cinder accelerate machine learning at Meta

At Meta, the quest for faster model training has yielded an exciting milestone: the adoption of Lazy Imports and the Python Cinder runtime. The outcome? Up to 40 percent time to first batch (TTFB) improvements, along with a 20 percent reduction in Jupyter kernel startup times. This advancement facilitates swifter experimentation capabilities and elevates the [...] Read More... The post Lazy is the new fast: How Lazy Imports and Cinder accelerate machine learning at Meta appeared first on Engineering at Meta. http://dlvr.it/T1Z9nc
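
For readers unfamiliar with the general idea of lazy imports, here is a minimal sketch using the standard library's `importlib.util.LazyLoader`. It is not Meta's Lazy Imports feature or the Cinder runtime, which work at the runtime level; it only illustrates the underlying principle of deferring a module's execution until it is first used. The helper name `lazy_import` is an assumption for the example.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return a module whose body runs only on first attribute access."""
    if name in sys.modules:
        # Already imported (lazily or not); reuse the existing module object.
        return sys.modules[name]
    spec = importlib.util.find_spec(name)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot find module {name!r}")
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    # With LazyLoader this only arms the deferral; the module body has not run.
    loader.exec_module(module)
    return module


colorsys = lazy_import("colorsys")          # cheap: module body not executed yet
print(colorsys.rgb_to_hsv(1.0, 0.0, 0.0))   # first attribute access runs the real import
```

The cost of importing is paid only by code paths that actually touch the module, which is the kind of saving that shortens startup when many imported modules go unused.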

How Meta is advancing GenAI

What’s going on with generative AI (GenAI) at Meta? And what does the future have in store? In this episode of the Meta Tech Podcast, Meta engineer Pascal Hartig (@passy) speaks with Devi Parikh, an AI research director at Meta. They cover a wide range of topics, including the history and future of GenAI and the most [...] Read More... The post How Meta is advancing GenAI appeared first on Engineering at Meta. http://dlvr.it/T1G1WR

How Meta built the infrastructure for Threads

On July 5, 2023, Meta launched Threads, the newest product in our family of apps, to an unprecedented success that saw it garner over 100 million sign ups in its first five days. A small, nimble team of engineers built Threads over the course of only five months of technical work. While the app’s production [...] Read More... The post How Meta built the infrastructure for Threads appeared first on Engineering at Meta. http://dlvr.it/T0LrDX

AI debugging at Meta with HawkEye

HawkEye is the powerful toolkit used internally at Meta for monitoring, observability, and debuggability of the end-to-end machine learning (ML) workflow that powers ML-based products. HawkEye supports recommendation and ranking models across several products at Meta. Over the past two years, it has facilitated order-of-magnitude improvements in the time spent debugging production issues. [...] Read More... The post AI debugging at Meta with HawkEye appeared first on Engineering at Meta. http://dlvr.it/T0Lqqs

Building end-to-end security for Messenger

We are beginning to upgrade people’s personal conversations on Messenger to use end-to-end encryption (E2EE) by default. Meta is publishing two technical white papers on end-to-end encryption: Our Messenger end-to-end encryption whitepaper describes the core cryptographic protocol for transmitting messages between clients. The Labyrinth encrypted storage protocol whitepaper explains our protocol for end-to-end encrypting stored [...] Read More... The post Building end-to-end security for Messenger appeared first on Engineering at Meta. http://dlvr.it/Szpb5c

Writing and linting Python at scale

Python plays a big part at Meta. It powers Instagram’s backend and plays an important role in our configuration systems, as well as much of our AI work. Meta even made contributions to Python 3.12, the latest version of Python. On this episode of the Meta Tech Podcast, Meta engineer Pascal Hartig (@passy) is joined by Amethyst [...] Read More... The post Writing and linting Python at scale appeared first on Engineering at Meta. http://dlvr.it/Sz7cLc

Watch: Meta’s engineers on building network infrastructure for AI

Meta is building for the future of AI at every level – from hardware like MTIA v1, Meta’s first-generation AI inference accelerator to publicly released models like Llama 2, Meta’s next-generation large language model, as well as new generative AI (GenAI) tools like Code Llama. Delivering next-generation AI products and services at Meta’s scale also [...] Read More... The post Watch: Meta’s engineers on building network infrastructure for AI appeared first on Engineering at Meta. http://dlvr.it/Syt3dm