Desa aikmual

Posts

Simulator-based reinforcement learning for data center cooling optimization

September 10, 2024

We’re sharing more about the role that reinforcement learning plays in helping us optimize our data centers’ environmental controls. Our reinforcement learning-based approach has helped us reduce energy consumption and water usage across various weather conditions. Meta is revamping its new data center design to optimize for artificial intelligence and the same methodology will be [...] Read More... The post Simulator-based reinforcement learning for data center cooling optimization appeared first on Engineering at Meta. http://dlvr.it/TD4BJ2

Read Meta’s 2024 Sustainability Report

September 04, 2024

[...] Read More... The post Read Meta’s 2024 Sustainability Report appeared first on Engineering at Meta. http://dlvr.it/TCqSsP

Meta is getting ready for post-quantum cryptography

August 28, 2024

The Quantum Apocalypse is coming. The advent of quantum computers has raised real questions about the future of data privacy over the internet. Someday, advances in quantum computing will make it possible to decrypt sensitive data that was encrypted using today’s complex cryptography systems. In the latest episode of the Meta Tech Podcast you’ll meet Sheran [...] Read More... The post Meta is getting ready for post-quantum cryptography appeared first on Engineering at Meta. http://dlvr.it/TCVChg

How Meta enforces purpose limitation via Privacy Aware Infrastructure at scale

August 27, 2024

At Meta, we’ve been diligently working to incorporate privacy into different systems of our software stack over the past few years. Today, we’re excited to share some cutting-edge technologies that are part of our Privacy Aware Infrastructure (PAI) initiative. These innovations mark a major milestone in our ongoing commitment to honoring user privacy. PAI offers [...] Read More... The post How Meta enforces purpose limitation via Privacy Aware Infrastructure at scale appeared first on Engineering at Meta. http://dlvr.it/TCRbJ9

RETINAS: Real-Time Infrastructure Accounting for Sustainability

August 26, 2024

We are introducing a new metric, real-time server fleet utilization effectiveness, as part of the RETINAS initiative to help reduce emissions and achieve net zero emissions across our value chain in 2030. This new metric allows us to measure server resource usage (e.g., compute, storage) and efficiency in our large-scale data center server fleet in [...] Read More... The post RETINAS: Real-Time Infrastructure Accounting for Sustainability appeared first on Engineering at Meta. http://dlvr.it/TCPLW6

How PyTorch powers AI training and inference

August 23, 2024

Learn about new PyTorch advancements for LLMs and how PyTorch is enhancing every aspect of the LLM lifecycle. In this talk from AI Infra @ Scale 2024, software engineers Wanchao Liang and Evan Smothers are joined by Meta research scientist Kimish Patel to discuss our newest features and tools that enable large-scale training, memory efficient [...] Read More... The post How PyTorch powers AI training and inference appeared first on Engineering at Meta. http://dlvr.it/TCHqp0

Inside the hardware and co-design of MTIA

August 22, 2024

In this talk from AI Infra @ Scale 2024, Joel Colburn, a software engineer at Meta, technical lead Junqiang Lan, and software engineer Jack Montgomery discuss the second generation of MTIA, Meta’s in-house training and inference accelerator. They cover the co-design process behind building the second generation of Meta’s first-ever custom silicon for AI workloads, [...] Read More... The post Inside the hardware and co-design of MTIA appeared first on Engineering at Meta. http://dlvr.it/TCFYPv

Bringing Llama 3 to life

August 21, 2024

Llama 3 is Meta’s most capable openly-available LLM to date and the recently-released Llama 3.1 will enable new workflows, such as synthetic data generation and model distillation with unmatched flexibility, control, and state-of-the-art capabilities that rival the best closed source models. At AI Infra @ Scale 2024, Meta engineers discussed every step of how we [...] Read More... The post Bringing Llama 3 to life appeared first on Engineering at Meta. http://dlvr.it/TCCC1D

Aparna Ramani discusses the future of AI infrastructure

August 20, 2024

Delivering new AI technologies at scale also means rethinking every layer of our infrastructure – from silicon and software systems to our data center designs. For the second year in a row, Meta’s engineering and infrastructure teams returned for the AI Infra @ Scale conference, where they discussed the challenges of scaling up an [...] Read More... The post Aparna Ramani discusses the future of AI infrastructure appeared first on Engineering at Meta. http://dlvr.it/TC8WPd

How Meta animates AI-generated images at scale

August 14, 2024

We launched Meta AI with the goal of giving people new ways to be more productive and unlock their creativity with generative AI (GenAI). But GenAI also comes with challenges of scale. As we deploy new GenAI technologies at Meta, we also focus on delivering these services to people as quickly and efficiently as possible. [...] Read More... The post How Meta animates AI-generated images at scale appeared first on Engineering at Meta. http://dlvr.it/TBwnL0

A RoCE network for distributed AI training at scale

August 05, 2024

AI networks play an important role in interconnecting tens of thousands of GPUs, forming the foundational infrastructure for training, enabling large models with hundreds of billions of parameters such as Llama 3.1 405B. This week at ACM SIGCOMM 2024 in Sydney, Australia, we are sharing details on the network we have built at Meta [...] Read More... The post A RoCE network for distributed AI training at scale appeared first on Engineering at Meta. http://dlvr.it/TBXCRm

DCPerf: An open source benchmark suite for hyperscale compute applications

August 05, 2024

We are open-sourcing DCPerf, a collection of benchmarks that represents the diverse categories of workloads that run in data center cloud deployments. We hope that DCPerf can be used more broadly by academia, the hardware industry, and internet companies to design and evaluate future products. DCPerf is available now on GitHub. Hyperscale and cloud datacenter [...] Read More... The post DCPerf: An open source benchmark suite for hyperscale compute applications appeared first on Engineering at Meta. http://dlvr.it/TBXC3t

Meet Caddy – Meta’s next-gen mixed reality CAD software

What happens when a team of mechanical engineers gets tired of looking at flat images of 3D models over Zoom? Meet the team behind Caddy, a new CAD app for mixed reality. They join Pascal Hartig (@passy) on the Meta Tech Podcast to talk about teaching themselves to code, disrupting the CAD software space, and [...] Read More... The post Meet Caddy – Meta’s next-gen mixed reality CAD software appeared first on Engineering at Meta. http://dlvr.it/T9mJpz

AI Lab: The secrets to keeping machine learning engineers moving fast

The key to developer velocity across AI lies in minimizing time to first batch (TTFB) for machine learning (ML) engineers. AI Lab is a pre-production framework used internally at Meta. It allows us to continuously A/B test common ML workflows – enabling proactive improvements and automatically preventing regressions on TTFB. AI Lab prevents TTFB regressions [...] Read More... The post AI Lab: The secrets to keeping machine learning engineers moving fast appeared first on Engineering at Meta. http://dlvr.it/T9gRb1

Taming the tail utilization of ads inference at Meta scale

Tail utilization is a significant system issue and a major factor in overload-related failures and low compute utilization. The tail utilization optimizations at Meta have had a profound impact on model serving capacity footprint and reliability. Failure rates, which are mostly timeout errors, were reduced by two-thirds; the compute footprint delivered 35% more work for [...] Read More... The post Taming the tail utilization of ads inference at Meta scale appeared first on Engineering at Meta. http://dlvr.it/T9QxWF

Meta’s approach to machine learning prediction robustness

Meta’s advertising business leverages large-scale machine learning (ML) recommendation models that power millions of ads recommendations per second across Meta’s family of apps. Maintaining reliability of these ML systems helps ensure the highest level of service and uninterrupted benefit delivery to our users and advertisers. To minimize disruptions and ensure our ML systems are intrinsically [...] Read More... The post Meta’s approach to machine learning prediction robustness appeared first on Engineering at Meta. http://dlvr.it/T9PxWg

The key to a happy Rust/C++ relationship

The history of Rust at Meta goes all the way back to 2016, when we first started using it for source control. Today, it has been widely embraced at Meta and is one of our primary supported server-side languages (along with C++, Python, and Hack). But that doesn’t mean there weren’t any growing pains. Aida [...] Read More... The post The key to a happy Rust/C++ relationship appeared first on Engineering at Meta. http://dlvr.it/T8lmgy

Leveraging AI for efficient incident response

We’re sharing how we streamline system reliability investigations using a new AI-assisted root cause analysis system. The system uses a combination of heuristic-based retrieval and large language model-based ranking to speed up root cause identification during investigations. Our testing has shown this new system achieves 42% accuracy in identifying root causes for investigations at their [...] Read More... The post Leveraging AI for efficient incident response appeared first on Engineering at Meta. http://dlvr.it/T8jLB2

PVF: A novel metric for understanding AI systems’ vulnerability against SDCs in model parameters

We’re introducing parameter vulnerability factor (PVF), a novel metric for understanding and measuring AI systems’ vulnerability against silent data corruptions (SDCs) in model parameters. PVF can be tailored to different AI models and tasks, adapted to different hardware faults, and even extended to the training phase of AI models. We’re sharing results of our own [...] Read More... The post PVF: A novel metric for understanding AI systems’ vulnerability against SDCs in model parameters appeared first on Engineering at Meta. http://dlvr.it/T8VMWQ

MLow: Meta’s low bitrate audio codec

At Meta, we support real-time communication (RTC) for billions of people through our apps, including WhatsApp, Instagram, and Messenger. We are working to make RTC accessible by providing a high-quality experience for everyone – even those who might not have the fastest connections or the latest phones. As more and more people have relied on [...] Read More... The post MLow: Meta’s low bitrate audio codec appeared first on Engineering at Meta. http://dlvr.it/T8DkVR

How Meta trains large language models at scale

As we continue to focus our AI research and development on solving increasingly complex problems, one of the most significant and challenging shifts we’ve experienced is the sheer scale of computation required to train large language models (LLMs). Traditionally, our AI model training has involved training a massive number of models that required a comparatively [...] Read More... The post How Meta trains large language models at scale appeared first on Engineering at Meta. http://dlvr.it/T8C7Gt

Maintaining large-scale AI capacity at Meta

Meta is currently operating many data centers with GPU training clusters across the world. Our data centers are the backbone of our operations, meticulously designed to support the scaling demands of compute and storage. A year ago, however, as the industry reached a critical inflection point due to the rise of artificial intelligence (AI), we [...] Read More... The post Maintaining large-scale AI capacity at Meta appeared first on Engineering at Meta. http://dlvr.it/T8BcRq

Unlocking the power of mixed reality devices with MobileConfig

MobileConfig enables developers to centrally manage a mobile app’s configuration parameters in our data centers. Once a parameter value is changed on our central server, billions of app devices automatically fetch and apply the new value without app updates. These remotely managed configuration parameters serve various purposes such as A/B testing, feature rollout, and app [...] Read More... The post Unlocking the power of mixed reality devices with MobileConfig appeared first on Engineering at Meta. http://dlvr.it/T87k0N

Serverless Jupyter Notebooks at Meta

At Meta, Bento, our internal Jupyter notebooks platform, is a popular tool that allows our engineers to mix code, text, and multimedia in a single document. Use cases run the entire spectrum from what we call “lite” workloads that involve simple prototyping to heavier and more complex machine learning workflows. However, even though the lite [...] Read More... The post Serverless Jupyter Notebooks at Meta appeared first on Engineering at Meta. http://dlvr.it/T859wF

Composable data management at Meta

In recent years, Meta’s data management systems have evolved into a composable architecture that creates interoperability, promotes reusability, and improves engineering efficiency. We’re sharing how we’ve achieved this, in part, by leveraging Velox, Meta’s open source execution engine, as well as work ahead as we continue to rethink our data management systems. Data is at [...] Read More... The post Composable data management at Meta appeared first on Engineering at Meta. http://dlvr.it/T7Fm52

Post-quantum readiness for TLS at Meta

Today, the internet (like most digital infrastructure in general) relies heavily on the security offered by public-key cryptosystems such as RSA, Diffie-Hellman (DH), and elliptic curve cryptography (ECC). But the advent of quantum computers has raised real questions about the long-term privacy of data exchanged over the internet. In the future, significant advances in quantum [...] Read More... The post Post-quantum readiness for TLS at Meta appeared first on Engineering at Meta. http://dlvr.it/T7Flq2

Behind the scenes of Threads for web

When Threads first launched, one of the top feature requests was for a web client. In this episode of the Meta Tech Podcast, Pascal Hartig (@passy) sits down with Ally C. and Kevin C., two engineers on the Threads Web Team that delivered the basic version of Threads for web in just under three months. [...] Read More... The post Behind the scenes of Threads for web appeared first on Engineering at Meta. http://dlvr.it/T6t653

Building an infrastructure for AI’s future

[...] Read More... The post Building an infrastructure for AI’s future appeared first on Engineering at Meta. http://dlvr.it/T5NLK3

Introducing the next-gen Meta Training and Inference Accelerator

[...] Read More... The post Introducing the next-gen Meta Training and Inference Accelerator appeared first on Engineering at Meta. http://dlvr.it/T5KvGc

Bringing HDR photo support to Instagram and Threads

Meta’s family of apps serves trillions of image download requests every day. And if you’re into high-quality images, you’ve probably noticed that Instagram and Threads have added support for high dynamic range (HDR) photos. Now people on Threads and Instagram can upload and share images that are more true-to-life, with the full color and range [...] Read More... The post Bringing HDR photo support to Instagram and Threads appeared first on Engineering at Meta. http://dlvr.it/T4fLq4

Threads has entered the fediverse

Threads has entered the fediverse! As part of our beta experience, now available in a few countries, Threads users aged 18+ with public profiles can now choose to share their Threads posts to other ActivityPub-compliant servers. People on those servers can now follow federated Threads profiles and see, like, reply to, and repost posts from [...] Read More... The post Threads has entered the fediverse appeared first on Engineering at Meta. http://dlvr.it/T4QNyG

Optimizing RTC bandwidth estimation with machine learning

Bandwidth estimation (BWE) and congestion control play an important role in delivering high-quality real-time communication (RTC) across Meta’s family of apps. We’ve adopted a machine learning (ML)-based approach that allows us to solve networking problems holistically across cross-layers such as BWE, network resiliency, and transport. We’re sharing our experiment results from this approach, some of [...] Read More... The post Optimizing RTC bandwidth estimation with machine learning appeared first on Engineering at Meta. http://dlvr.it/T4Mvz3

Better video for mobile RTC with AV1 and HD

At Meta, we support real-time communication (RTC) for billions of people through our apps, including Messenger, Instagram, and WhatsApp. We’ve seen significant benefits by adopting the AV1 codec for RTC. Here’s how we are improving the RTC video quality for our apps with tools like the AV1 codec, the challenges we face, and how we [...] Read More... The post Better video for mobile RTC with AV1 and HD appeared first on Engineering at Meta. http://dlvr.it/T4MvlT

Logarithm: A logging engine for AI training workflows and services

Systems and application logs play a key role in operations, observability, and debugging workflows at Meta. Logarithm is a hosted, serverless, multitenant service, used only internally at Meta, that consumes and indexes these logs and provides an interactive query interface to retrieve and view logs. In this post, we present the design behind Logarithm, and [...] Read More... The post Logarithm: A logging engine for AI training workflows and services appeared first on Engineering at Meta. http://dlvr.it/T4Fp7Z

Building Meta’s GenAI Infrastructure

Marking a major investment in Meta’s AI future, we are announcing two 24k GPU clusters. We are sharing details on the hardware, network, storage, design, performance, and software that help us extract high throughput and reliability for various AI workloads. We use this cluster design for Llama 3 training. We are strongly committed to open [...] Read More... The post Building Meta’s GenAI Infrastructure appeared first on Engineering at Meta. http://dlvr.it/T3z2gC

Making messaging interoperability with third parties safe for users in Europe

To comply with a new EU law, the Digital Markets Act (DMA), which comes into force on March 7th, we’ve made major changes to WhatsApp and Messenger to enable interoperability with third-party messaging services. We’re sharing how we enabled third-party interoperability (interop) while maintaining end-to-end encryption (E2EE) and other privacy guarantees in our services as [...] Read More... The post Making messaging interoperability with third parties safe for users in Europe appeared first on Engineering at Meta. http://dlvr.it/T3h4SQ

How DotSlash makes executable deployment simpler

February 26, 2024

Andres Suarez and Michael Bolin, two software engineers at Meta, join Pascal Hartig (@passy) on the Meta Tech Podcast to discuss the ins and outs of DotSlash, a new open source tool from Meta. DotSlash takes the pain out of distributing binaries and toolchains to developers. Instead of committing large, platform-specific executables to a repository, [...] Read More... The post How DotSlash makes executable deployment simpler appeared first on Engineering at Meta. http://dlvr.it/T3HXYH

Aligning Velox and Apache Arrow: Towards composable data management

February 20, 2024

We’ve partnered with Voltron Data and the Arrow community to align and converge Apache Arrow with Velox, Meta’s open source execution engine. Apache Arrow 15 includes three new format layouts developed through this partnership: StringView, ListView, and Run-End-Encoding (REE). This new convergence helps Meta and the larger community build data management systems that are unified, [...] Read More... The post Aligning Velox and Apache Arrow: Towards composable data management appeared first on Engineering at Meta. http://dlvr.it/T31740

Meta loves Python

February 12, 2024

By now you’re already aware that Python 3.12 has been released. But did you know that several of its new features were developed by Meta? Meta engineer Pascal Hartig (@passy) is joined on the Meta Tech Podcast by Itamar Oren and Carl Meyer, two software engineers at Meta, to discuss their teams’ contributions to the [...] Read More... The post Meta loves Python appeared first on Engineering at Meta. http://dlvr.it/T2dx3M

Simple Precision Time Protocol at Meta

February 07, 2024

While deploying Precision Time Protocol (PTP) at Meta, we’ve developed a simplified version of the protocol (Simple Precision Time Protocol – SPTP), that can offer the same level of clock synchronization as unicast PTPv2 more reliably and with fewer resources. In our own tests, SPTP boasts comparable performance to PTP, but with significant improvements in [...] Read More... The post Simple Precision Time Protocol at Meta appeared first on Engineering at Meta. http://dlvr.it/T2Rjvv

DotSlash: Simplified executable deployment

February 06, 2024

We’ve open sourced DotSlash, a tool that makes large executables available in source control with a negligible impact on repository size, thus avoiding I/O-heavy clone operations. With DotSlash, a set of platform-specific executables is replaced with a single script containing descriptors for the supported platforms. DotSlash handles transparently fetching, decompressing, and verifying the appropriate remote [...] Read More... The post DotSlash: Simplified executable deployment appeared first on Engineering at Meta. http://dlvr.it/T2NRkB

Improving machine learning iteration speed with faster application build and packaging

January 29, 2024

Slow build times and inefficiencies in packaging and distributing execution files were costing our ML/AI engineers a significant amount of time while working on our training stack. By addressing these issues head-on, we were able to reduce this overhead by double-digit percentages. In the fast-paced world of AI/ML development, it’s crucial to ensure that our [...] Read More... The post Improving machine learning iteration speed with faster application build and packaging appeared first on Engineering at Meta. http://dlvr.it/T22Qcp

Lazy is the new fast: How Lazy Imports and Cinder accelerate machine learning at Meta

January 18, 2024

At Meta, the quest for faster model training has yielded an exciting milestone: the adoption of Lazy Imports and the Python Cinder runtime. The outcome? Up to 40 percent time to first batch (TTFB) improvements, along with a 20 percent reduction in Jupyter kernel startup times. This advancement facilitates swifter experimentation capabilities and elevates the [...] Read More... The post Lazy is the new fast: How Lazy Imports and Cinder accelerate machine learning at Meta appeared first on Engineering at Meta. http://dlvr.it/T1Z9nc

How Meta is advancing GenAI

January 11, 2024

What’s going on with generative AI (GenAI) at Meta? And what does the future have in store? In this episode of the Meta Tech Podcast, Meta engineer Pascal Hartig (@passy) speaks with Devi Parikh, an AI research director at Meta. They cover a wide range of topics, including the history and future of GenAI and the most [...] Read More... The post How Meta is advancing GenAI appeared first on Engineering at Meta. http://dlvr.it/T1G1WR

How Meta built the infrastructure for Threads

December 19, 2023

On July 5, 2023, Meta launched Threads, the newest product in our family of apps, to an unprecedented success that saw it garner over 100 million sign ups in its first five days. A small, nimble team of engineers built Threads over the course of only five months of technical work. While the app’s production [...] Read More... The post How Meta built the infrastructure for Threads appeared first on Engineering at Meta. http://dlvr.it/T0LrDX

AI debugging at Meta with HawkEye

December 19, 2023

HawkEye is the powerful toolkit used internally at Meta for monitoring, observability, and debuggability of the end-to-end machine learning (ML) workflow that powers ML-based products. HawkEye supports recommendation and ranking models across several products at Meta. Over the past two years, it has facilitated order of magnitude improvements in the time spent debugging production issues. [...] Read More... The post AI debugging at Meta with HawkEye appeared first on Engineering at Meta. http://dlvr.it/T0Lqqs

Building end-to-end security for Messenger

December 06, 2023

We are beginning to upgrade people’s personal conversations on Messenger to use end-to-end encryption (E2EE) by default. Meta is publishing two technical white papers on end-to-end encryption: Our Messenger end-to-end encryption whitepaper describes the core cryptographic protocol for transmitting messages between clients. The Labyrinth encrypted storage protocol whitepaper explains our protocol for end-to-end encrypting stored [...] Read More... The post Building end-to-end security for Messenger appeared first on Engineering at Meta. http://dlvr.it/Szpb5c

Writing and linting Python at scale

November 21, 2023

Python plays a big part at Meta. It powers Instagram’s backend and plays an important role in our configuration systems, as well as much of our AI work. Meta even made contributions to Python 3.12, the latest version of Python. On this episode of the Meta Tech Podcast, Meta engineer Pascal Hartig (@passy) is joined by Amethyst [...] Read More... The post Writing and linting Python at scale appeared first on Engineering at Meta. http://dlvr.it/Sz7cLc

Watch: Meta’s engineers on building network infrastructure for AI

November 15, 2023

Meta is building for the future of AI at every level – from hardware like MTIA v1, Meta’s first-generation AI inference accelerator to publicly released models like Llama 2, Meta’s next-generation large language model, as well as new generative AI (GenAI) tools like Code Llama. Delivering next-generation AI products and services at Meta’s scale also [...] Read More... The post Watch: Meta’s engineers on building network infrastructure for AI appeared first on Engineering at Meta. http://dlvr.it/Syt3dm
