Engineering at Meta
https://engineering.fb.com/

Patch Me If You Can: AI Codemods for Secure-by-Default Android Apps
https://engineering.fb.com/2026/03/13/android/ai-codemods-secure-by-default-android-apps-meta-tech-podcast/
Fri, 13 Mar 2026

Even seemingly simple engineering tasks — like updating an API — can become monumental undertakings when you’re dealing with millions of lines of code and thousands of engineers, especially if the changes are security-related. Nowhere is this more apparent than in mobile security, where a single class of vulnerability can be replicated across hundreds of call sites scattered throughout a sprawling, multi-app codebase serving billions of users.

Meta’s Product Security team has developed a two-pronged strategy to address this:

  • Designing secure-by-default frameworks that wrap potentially unsafe Android OS APIs and make the secure path the easiest path for developers, and
  • Leveraging generative AI to automate the migration of existing code to those frameworks at scale.

The result is a system that can propose, validate, and submit security patches across millions of lines of code with minimal friction for the engineers who own them.

On this episode of the Meta Tech Podcast, Pascal Hartig talks to Alex and Tanu from Meta’s Product Security team about the challenges and learnings from making Meta’s mobile frameworks more secure at a scale few companies ever experience. Tune in and join us as we explore the crossroads of security, automation, and AI in mobile development.

Download or listen to the episode below:

You can also find the episode wherever you get your podcasts, including:

The Meta Tech Podcast is brought to you by Meta and highlights the work Meta’s engineers are doing at every level – from low-level frameworks to end-user features.

Send us feedback on Instagram, Threads, or X.

And if you’re interested in learning more about career opportunities at Meta, visit the Meta Careers page.

The post Patch Me If You Can: AI Codemods for Secure-by-Default Android Apps appeared first on Engineering at Meta.

How Advanced Browsing Protection Works in Messenger
https://engineering.fb.com/2026/03/09/security/how-advanced-browsing-protection-works-in-messenger/
Mon, 09 Mar 2026

  • We’re sharing the technical details behind how Advanced Browsing Protection (ABP) in Messenger protects the privacy of the links clicked on within chats while still warning people about malicious links.
  • We hope this post helps illuminate some of the engineering challenges and infrastructure components involved in providing this feature for our users.
  • While end-to-end encryption (E2EE) on Messenger ensures that direct messages and calls are protected, Messenger’s Safe Browsing feature safeguards against malicious links within end-to-end encrypted messages and calls on the app. If you’re sent an unsafe link for some reason – maybe it’s sent by someone you don’t know or by a friend whose account has been compromised – Safe Browsing warns you that the link points to an unsafe website that may try to steal passwords or other personal information from you.

    In its standard setting, Safe Browsing uses on-device models to analyze malicious links shared in chats. But we’ve extended this further with an advanced setting called Advanced Browsing Protection (ABP) that leverages a continually updated watchlist of millions more potentially malicious websites.

    To build ABP, we had to leverage a series of intricate infrastructure components and a complex system of cryptographic primitives, all working together with the goal of protecting user privacy in Messenger.

    Private Information Retrieval – The Starting Point for ABP

    ABP closely mirrors the setting for a cryptographic primitive known as private information retrieval (PIR). In the classical PIR setting, a client queries a server (that holds a database) to learn whether or not the subject of the query is a member of that database. This protocol aims for the server to learn as little information as possible (ideally no information) about the client’s query.

    In a theoretical setting, the server could send the entire database to the client, allowing the client to perform subsequent query lookups on its own, without needing to involve the server anymore. However, the database used by ABP needs to be updated frequently, and is too large to reasonably be sent down to the client. Furthermore, revealing the entire database to the client could inadvertently aid attackers attempting to circumvent the system.

    Other work has suggested that this approach can be improved upon by using an oblivious pseudorandom function (OPRF) and dividing the database into multiple shards (or “buckets”) so that the linear-time operation is performed over a fraction of the database.
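To make the exact-match building block concrete, here is a toy sketch of a Diffie-Hellman-style OPRF. This is purely illustrative: the group, hash, and parameters are assumptions for the example (and far too small to be secure), not the construction ABP actually uses.

```python
import hashlib
import secrets

P = 1019              # toy safe prime, P = 2*Q + 1 (NOT a secure parameter)
Q = (P - 1) // 2      # order of the quadratic-residue subgroup

def hash_to_group(msg: bytes) -> int:
    """Hash a message into the quadratic-residue subgroup (by squaring)."""
    return pow(int.from_bytes(hashlib.sha256(msg).digest(), "big") % P, 2, P)

def client_blind(msg: bytes) -> tuple[int, int]:
    """Client blinds H(msg) with a random exponent r, hiding msg from the server."""
    r = secrets.randbelow(Q - 1) + 1
    return pow(hash_to_group(msg), r, P), r

def server_evaluate(blinded: int, k: int) -> int:
    """Server applies its secret key k without learning the underlying message."""
    return pow(blinded, k, P)

def client_unblind(evaluated: int, r: int) -> int:
    """Client strips r, recovering H(msg)^k without ever learning k."""
    return pow(evaluated, pow(r, -1, Q), P)
```

The server can precompute H(entry)^k for every database entry; the client then learns only whether its unblinded output appears in that set, and the server never sees the plaintext query.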

    This existing approach was the starting point for our implementation of ABP, but there were two issues that we needed to adapt into our setting.

    1. An OPRF works well for queries that are exact matches against the database. However, URL-matching queries are not exact matches, as we will describe in more detail shortly.
    2. This also means that the client still needs to tell the server which bucket to look into. This inherently introduces a tradeoff between the privacy of the system and its efficiency/bandwidth: The less granular the buckets, the less efficient the protocol becomes, but the less information the client’s query leaks to the server.

    There are also other approaches, namely cryptographic constructions, which improve this tradeoff by employing lattice-based techniques to reduce the amount of sharding needed. However, at the time of writing, these did not appear to be practical enough to completely eliminate the need for sharding at our scale. This could be a promising future direction for the system, though, and for industrial applications of PIR in general.

    How ABP Handles Prefix Queries for URLs

    The server’s database entries consist of URL domains with (and without) paths, which do not always correspond to exact link matches. For instance, if an entry for “example.com” existed in our database and the client submits a query of the form “example.com/a/b/index.html”, this should be reported to the client as a match, even though the link contents do not match exactly.

    Instead, what we need is a privacy-preserving “URL-matching” scheme between the client’s query and each of the database entries. Subdomains are also a consideration here, but we’ve omitted them for the simplicity of this example.

    One simple approach we considered to address these prefix queries was to run a series of parallel PIR queries, one for each path prefix of the URL. So, in our running example of the client query “example.com/a/b/index.html”, the client would create PIR queries for:

    • example.com
    • example.com/a
    • example.com/a/b
    • example.com/a/b/index.html

    Functionally, this would satisfy prefix matching, but there is a privacy issue with this approach: Each of these path prefix queries leaks extra information about the client’s actual URL. If the PIR scheme we use does not leak any information to the server, then this might be acceptable, but if the server learns B bits of the client query, then in this scheme the server learns P * B bits, where P is the number of path prefixes in the URL. For extremely long URLs, this might even be enough to uniquely identify a plaintext link!
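The prefix expansion used by this strawman can be sketched with a small illustrative helper (not production code):

```python
def path_prefixes(url: str) -> list[str]:
    """Every path prefix of a URL, from the bare domain to the full link."""
    parts = url.split("/")
    return ["/".join(parts[:i]) for i in range(1, len(parts) + 1)]

print(path_prefixes("example.com/a/b/index.html"))
# ['example.com', 'example.com/a', 'example.com/a/b', 'example.com/a/b/index.html']
```

Each element of this list would become its own PIR query, which is exactly where the P-fold leakage comes from.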

    In order to reduce the leakage to the server, we can instead have the server group together links that share the same domain. This way, the client can again request just one bucket (the bucket corresponding to the URL’s domain), then check all the prefix URL path components for membership in that one bucket.

    This would indeed address the privacy issue so that the server only learns B bits. But it also creates a new efficiency problem: Bucket sizes can become unbalanced. We create buckets by hashing URLs. If we were to hash full URLs, we could expect bucket sizes to be approximately uniform because each blocklist entry is mapped to a bucket pseudorandomly. When we hash only domains, that’s no longer the case. If many blocklist entries share the same domain, they all end up in the same bucket. 

    It turns out that in practice many blocklisted URLs do share domains. For example, consider link-shortening services: These services might host many, many URLs (both malicious and benign) that all share the same domain. If many links share the same domain and, hence, belong in the same bucket, then the bucket might be too large to return to the client. And since we apply padding to buckets, the response size would be equal to the maximum across all buckets!

    Pre-processing Rulesets

    To address this problem, we have the server perform a pre-processing step in which it attempts to balance buckets by generating a “ruleset”: a set of operations to process and hash a given URL. The server computes this ruleset and shares it with clients ahead of time so that the client can apply the same set of rules at lookup time.

    Here’s an example of a ruleset containing three rules:

    Hash Prefix          # of Path Segments
    08bd4dd11758b503     2
    fe891588d205cf7f     1
    c078e5ff2e262830     4


    Each row is a rule that maps an 8-byte hash prefix to a certain number of path segments to append to the running URL query. Using our example of the link “example.com/a/b/index.html,” the client starts by computing a short hash of the domain: Hash(“example.com”). Let’s say that it matches one of the hashes in the ruleset, 08bd4dd11758b503. Then the client is instructed to recompute the hash after appending two path segments, meaning that the client computes the new hash as Hash(“example.com/a/b”) and again checks whether the ruleset contains an entry for the new hash. The client repeats these steps until the hash prefix does not exist in the ruleset, at which point it stops and outputs the first two bytes of that hash prefix as a bucket identifier.
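The client-side walk just described can be sketched as follows. The hash function and rule encoding here are illustrative assumptions, not the production scheme:

```python
import hashlib

def hash8(url: str) -> bytes:
    """Illustrative stand-in for the scheme's short hash (8-byte prefix)."""
    return hashlib.sha256(url.encode()).digest()[:8]

def bucket_for(url: str, ruleset: dict[bytes, int]) -> bytes:
    """Re-hash with more path segments while the ruleset matches the current hash."""
    domain, *segments = url.split("/")
    taken, h = 0, hash8(domain)
    while h in ruleset and taken < len(segments):
        taken = min(taken + ruleset[h], len(segments))
        h = hash8("/".join([domain] + segments[:taken]))
    return h[:2]   # first two bytes of the final hash form the bucket identifier

# Toy rule mirroring the worked example: the domain's hash maps to 2 segments.
ruleset = {hash8("example.com"): 2}
assert bucket_for("example.com/a/b/index.html", ruleset) == hash8("example.com/a/b")[:2]
```

With an empty ruleset, the walk degenerates to hashing just the domain, which matches the pre-ruleset behavior described above.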

    The server generates the ruleset in an iterative process. The server starts with the assumption that each URL is hashed only by its domain and computes the initial buckets. It then identifies the largest bucket and finds the most common domain in that bucket. Then, it breaks up that bucket by adding a rule to append one or more additional URL segments for that domain. This process is repeated until all buckets are below an acceptable threshold.
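A simplified sketch of that iterative balancing loop is below. The hash choice, the rule granularity (one extra path segment per rule), and the anchor-selection policy are all assumptions made for illustration; the production algorithm is only described at a high level above.

```python
import hashlib
from collections import Counter, defaultdict

def hash8(url: str) -> bytes:
    """Illustrative 8-byte short hash; the production hash is not specified here."""
    return hashlib.sha256(url.encode()).digest()[:8]

def walk(url: str, ruleset: dict[bytes, int]):
    """Follow the ruleset, appending path segments until no rule matches.
    Returns (terminal hash, current anchor string, whether deeper hashing is possible)."""
    domain, *segs = url.split("/")
    taken, h = 0, hash8(domain)
    while h in ruleset and taken < len(segs):
        taken = min(taken + ruleset[h], len(segs))
        h = hash8("/".join([domain] + segs[:taken]))
    return h, "/".join([domain] + segs[:taken]), taken < len(segs)

def build_ruleset(blocklist: list[str], max_bucket: int) -> dict[bytes, int]:
    """Repeatedly split the largest bucket by hashing its most common anchor deeper."""
    ruleset: dict[bytes, int] = {}
    while True:
        buckets = defaultdict(list)
        for url in blocklist:
            buckets[walk(url, ruleset)[0][:2]].append(url)
        largest = max(buckets.values(), key=len)
        if len(largest) <= max_bucket:
            return ruleset
        anchors = [a for _, a, deeper in (walk(u, ruleset) for u in largest) if deeper]
        if not anchors:
            return ruleset  # nothing in this bucket can be hashed any deeper
        # New rule: next time, hash this anchor one path segment deeper.
        ruleset[hash8(Counter(anchors).most_common(1)[0][0])] = 1
```

For a blocklist dominated by one shortener-style domain, the loop adds a rule at that domain’s hash, after which its URLs spread across buckets by their first path segment.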

    Because of the way the ruleset is generated, any URL that has a blocked prefix is guaranteed to hash to the bucket containing that entry. This invariant holds so long as the blocklist doesn’t contain redundant entries (e.g., one entry for “example.com” and another for “example.com/a”) and as long as the hash function used for ruleset mapping doesn’t produce any collisions among blocklist entries.

    At lookup time, the client uses the same ruleset to compute the URL’s bucket identifier. The client sends the bucket identifier to the server alongside an OPRF-blinded element for each path segment of the query link. The server responds with the bucket contents and the OPRF-blinded responses. Finally, the client unblinds the OPRF output and checks for an exact match of any of the OPRF outputs in the bucket contents. If a match is found, then the URL is flagged.

    Note that in order to hide the number of path segments of the query link from the server, we must appropriately pad up to a fixed maximum number of elements in order to prevent the length of the request from revealing information about the link. Likewise, we must also pad the bucket contents so that all buckets are of the same length, so that the length of the server response doesn’t reveal information about the client’s link.

    Safeguarding Client Queries

    Now, in the description of this protocol so far, the client still sends a bucket identifier (computed from the URL) to the server in order to be able to efficiently process the query. We can use additional mechanisms to further reduce the bits of information that a hypothetically adversarial server could glean from the client’s query, which we will cover in the following sections.

    Confidential Computing

    In order to limit the exposure of these hash prefixes to Meta’s servers, we leverage AMD’s SEV-SNP technology to provide a confidential virtual machine (CVM) in which the server-side code processes these hash prefixes. At a high level, the CVM provides an environment for us to run application code that we can generate attestation reports for. It also allows us to bootstrap a secure channel from a client to the CVM after the client establishes “trust” by verifying these attestation reports.

    An attestation report contains:

    • A container manifest containing hash digests of the CVM’s launch configuration and packages, which essentially acts as a commitment to the application logic running on the CVM.
    • A public key generated on CVM startup, corresponding to a private key that remains secured within the TEE.
    • A certificate chain, with its root certificate established by AMD’s Key Distribution Service.
    • A signature from the transparency log witness, which provides a uniqueness guarantee that mitigates server-side equivocation.

    Upon receiving this report, the client verifies all of the certificates/signatures and then uses the embedded public key to establish a secure channel with the CVM. This secure channel is used by the client to transmit the bucket identifier to the CVM, which then uses the corresponding decryption key to decrypt the client’s request to obtain the plaintext bucket identifier.

    Last year, we posted about our usage of AMD SEV-SNP for providing a trusted execution environment for WhatsApp Private Processing, and many of the details behind the hardware setup are similar there.

    One aspect missing from this verification procedure is releasing these artifacts so that external security researchers can validate them. We aim to provide a platform for hosting these artifacts in the near future.

    Oblivious RAM

    While the hardware guarantees provided by AMD SEV-SNP do allow us to reduce the exposure of these hash prefixes and send them through an encrypted channel, they are not sufficient by themselves to fully hide the prefixes from an observer that obtains administrative privileges on the host system and monitors memory accesses over time. Although the memory pages are encrypted through AMD’s Secure Nested Paging (SNP) technology, the patterns of access themselves must also be kept private.

    A straightforward way to address this would be to load the database into the machine’s memory on startup, and upon every client request, ensure that each one of the B buckets in the database is retrieved from memory, even though only one bucket is actually included in the server’s response. While this is fairly wasteful from a purely computational perspective (the B-1 accesses don’t actually factor into the response), the server can avoid directly leaking the bucket index being fetched to an adversary that can observe its memory access patterns when serving client requests.
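The idea of touching every bucket while keeping only one can be sketched with branch-free masking. This is a conceptual illustration only; Python gives no real constant-time guarantees, and it assumes all buckets have been padded to equal length, as described above:

```python
def oblivious_fetch(buckets: list[bytes], want: int) -> bytes:
    """Read every (equal-length, padded) bucket; keep only the wanted one via masking."""
    out = bytes(len(buckets[0]))
    for i, bucket in enumerate(buckets):
        mask = (-(i == want)) & 0xFF   # 0xFF for the target bucket, 0x00 otherwise
        out = bytes(o | (b & mask) for o, b in zip(out, bucket))
    return out

print(oblivious_fetch([b"aaaa", b"bbbb", b"cccc"], 1))  # b'bbbb'
```

Because every bucket is read on every request, the memory-access trace looks identical regardless of which bucket the client asked for.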

    For a really large database, these B-1 accesses can end up being a bottleneck on the overall runtime of the server. There are two methods we leverage to optimize this performance overhead without compromising on privacy:

    1. Since our database is (at the time of writing) not overwhelmingly large, we can fit multiple separate copies of the same database into memory on a single machine. Incoming client requests are assigned one of these copies based on availability, since the linear scan is inherently sequential.
    2. We can improve on the number of accesses asymptotically, from linear to sublinear, by relying on an algorithm called Path ORAM.

    The exact details of how Path ORAM works in our setting are beyond the scope of this post, but you can find more information in our open-source library for Path ORAM.

    Using Oblivious HTTP

    To further strengthen the privacy guarantees of ABP, we leverage a third-party proxy and the Oblivious HTTP (OHTTP) protocol to de-identify client requests. The third-party proxy sits in between the client and server, processing encrypted client requests by stripping out identifying information from them and forwarding these de-identified requests to the server, which, in turn, is able to decrypt the request payload. This makes it more difficult for the server to be able to observe identifiers (such as the client’s IP address).

    The ABP Request Lifecycle

    The overall lifecycle of ABP for a request works as follows:

    Pre-processing/background phase:

    1. On a periodic basis, the server pulls in the latest updates to the URL database, iteratively computing a ruleset that balances the database entries into similarly-sized buckets. 
    2. These buckets are then loaded onto a TEE using ORAM. 
    3. The TEE generates a keypair, and the public key is embedded in an attestation report, generated by AMD SEV-SNP hardware. 
    4. The attestation report and the current ruleset for the database are provided to the client upon request (through a third-party proxy).
    5. The client verifies the signatures contained in the attestation report, and locally stores a copy of the public key and database ruleset.

    And then, on each client request corresponding to a link click:

    1. The client, upon clicking a link in an E2EE chat, calculates the bucket identifier for the link by applying the rules of the “ruleset” to the URL. 
    2. This bucket identifier is encrypted for the specific CVM instance using its public key. 
    3. The client also computes a series of OPRF requests (blinded group elements), one for each path segment of the URL (padded). 
    4. The encrypted bucket identifier, along with these OPRF requests, is sent through a third-party proxy to the server, together with a client public key as part of the establishment of a secure channel.
    5. The server precomputes the server-side evaluation of the OPRF requests to produce OPRF responses. 
    6. The server then decrypts the bucket identifier, uses ORAM to look up the corresponding bucket contents, and returns the OPRF responses and bucket contents to the client, encrypted under the client’s public key.
    7. The client then decrypts the server’s response, and uses the bucket contents along with the OPRF responses to complete the OPRF evaluation and determine if a match was found. If a match was found, then the client displays a warning about the query link.

    The post How Advanced Browsing Protection Works in Messenger appeared first on Engineering at Meta.

    FFmpeg at Meta: Media Processing at Scale
    https://engineering.fb.com/2026/03/02/video-engineering/ffmpeg-at-meta-media-processing-at-scale/
    Mon, 02 Mar 2026

    FFmpeg is truly a multi-tool for media processing. As an industry-standard tool it supports a wide variety of audio and video codecs and container formats. It can also orchestrate complex chains of filters for media editing and manipulation. For the people who use our apps, FFmpeg plays an important role in enabling new video experiences and improving the reliability of existing ones.

    Meta executes ffmpeg (the main CLI application) and ffprobe (a utility for obtaining media file properties) binaries tens of billions of times a day, introducing unique challenges when dealing with media files. FFmpeg can easily perform transcoding and editing on individual files, but our workflows have additional requirements to meet our needs. For many years we had to rely on our own internally developed fork of FFmpeg to provide features that have only recently been added to FFmpeg, such as threaded multi-lane encoding and real-time quality metric computation.

    Over time, our internal fork came to diverge significantly from the upstream version of FFmpeg. At the same time, new versions of FFmpeg brought support for new codecs and file formats, and reliability improvements, all of which allowed us to ingest more diverse video content from users without disruptions. This necessitated that we support both recent open-source versions of FFmpeg alongside our internal fork. Not only did this create a gradually divergent feature set, it also created challenges around safely rebasing our internal changes to avoid regressions.

    As our internal fork became increasingly outdated, we collaborated with FFmpeg developers, FFlabs, and VideoLAN to develop features in FFmpeg that allowed us to fully deprecate our internal fork and rely exclusively on the upstream version for our use cases. Using upstreamed patches and refactorings we’ve been able to fill two important gaps that we had previously relied on our internal fork to fill: threaded, multi-lane transcoding and real-time quality metrics.  

    Building More Efficient Multi-Lane Transcoding for VOD and Livestreaming

    A video transcoding pipeline producing multiple outputs at different resolutions.

    When a user uploads a video through one of our apps, we generate a set of encodings to support Dynamic Adaptive Streaming over HTTP (DASH) playback. DASH playback allows the app’s video player to dynamically choose an encoding based on signals such as network conditions. These encodings can differ in resolution, codec, framerate, and visual quality level but they are created from the same source encoding, and the player can seamlessly switch between them in real time.

    In a very simple system, separate FFmpeg command lines can generate the encodings for each lane one by one, in serial. This could be optimized by running the commands in parallel, but that quickly becomes inefficient due to the duplicate work done by each process.

    To work around this, multiple outputs could be generated within a single FFmpeg command line, decoding the frames of a video once and sending them to each output’s encoder instance. This eliminates a lot of overhead by deduplicating the video decoding and process startup time overhead incurred by each command line. Given that we process over 1 billion video uploads daily, each requiring multiple FFmpeg executions, reductions in per-process compute usage yield significant efficiency gains.
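As a rough sketch, a single invocation can decode once and fan out to several encoder lanes via the `split` filter. The lane names, resolutions, bitrates, and codec choice below are illustrative assumptions, not Meta’s production configuration:

```python
# Build a single-process, multi-lane ffmpeg command: decode once, encode N lanes.
# Lane names, resolutions, and bitrates are illustrative assumptions.
LANES = [("720p", "1280:720", "2500k"), ("360p", "640:360", "800k")]

def multi_lane_cmd(src: str) -> list[str]:
    n = len(LANES)
    split = f"[0:v]split={n}" + "".join(f"[v{i}]" for i in range(n))
    scales = ";".join(
        f"[v{i}]scale={res}[out{i}]" for i, (_, res, _) in enumerate(LANES)
    )
    cmd = ["ffmpeg", "-i", src, "-filter_complex", f"{split};{scales}"]
    for i, (name, _, bitrate) in enumerate(LANES):
        cmd += ["-map", f"[out{i}]", "-c:v", "libx264", "-b:v", bitrate,
                f"out_{name}.mp4"]
    return cmd
```

The source is decoded a single time; `split` duplicates the decoded frames so each lane’s scaler and encoder receives the same input without a second decode.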

    Our internal FFmpeg fork provided an additional optimization to this: parallelized video encoding. While individual video encoders are often internally multi-threaded, previous FFmpeg versions executed each encoder in serial for a given frame when multiple encoders were in use. By running all encoder instances in parallel, better parallelism can be obtained overall.

    Thanks to contributions from FFmpeg developers, including those at FFlabs and VideoLAN, more efficient threading was implemented starting with FFmpeg 6.0, with the finishing touches landing in 8.0. This was directly influenced by the design of our internal fork and was one of the main features we had relied on it to provide. This development led to the most complex refactoring of FFmpeg in decades and has enabled more efficient encodings for all FFmpeg users.

    To fully migrate off of our internal fork we needed one more feature implemented upstream: real-time quality metrics.

    Enabling Real-Time Quality Metrics While Transcoding for Livestreams

    Visual quality metrics, which give a numeric representation of the perceived visual quality of media, can be used to quantify the quality loss incurred from compression. These metrics are categorized as reference or no-reference metrics, where the former compares a reference encoding to some other distorted encoding.

    FFmpeg can compute various visual quality metrics such as PSNR, SSIM, and VMAF using two existing encodings in a separate command line after encoding has finished. This is okay for offline or VOD use cases, but not for livestreaming where we might want to compute quality metrics in real time.
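For reference, the offline flavor looks roughly like this: decode both finished encodings and feed them to a metric filter such as `psnr`. This is an illustrative command builder, not our pipeline:

```python
def psnr_cmd(distorted: str, reference: str) -> list[str]:
    """Offline PSNR between two finished encodings using ffmpeg's psnr filter."""
    return ["ffmpeg", "-i", distorted, "-i", reference,
            "-lavfi", "[0:v][1:v]psnr", "-f", "null", "-"]

print(" ".join(psnr_cmd("encoded.mp4", "source.mp4")))
```

The `-f null -` sink discards the output; only the metric summary printed by the filter matters. For livestreams, waiting for both encodings to finish like this is exactly what we want to avoid.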

    To do this, we need to insert a video decoder after each video encoder used by each output lane. These provide bitmaps for each frame in the video after compression has been applied so that we can compare against the frames before compression. In the end, we can produce a quality metric for each encoded lane in real time using a single FFmpeg command line.

    Thanks to “in-loop” decoding, which was enabled by FFmpeg developers including those from FFlabs and VideoLAN, beginning with FFmpeg 7.0, we no longer have to rely on our internal FFmpeg fork for this capability.

    We Upstream When It Will Have the Most Community Impact

    Things like real-time quality metrics while transcoding and more efficient threading can bring efficiency gains to a variety of FFmpeg-based pipelines both in and outside of Meta, and we strive to enable these developments upstream to benefit the FFmpeg community and wider industry. However, there are some patches we’ve developed internally that don’t make sense to contribute upstream. These are highly specific to our infrastructure and don’t generalize well.

    FFmpeg supports hardware-accelerated decoding, encoding, and filtering with devices such as NVIDIA’s NVDEC and NVENC, AMD’s Unified Video Decoder (UVD), and Intel’s Quick Sync Video (QSV). Each device is supported through an implementation of standard APIs in FFmpeg, allowing for easier integration and minimizing the need for device-specific command line flags. We’ve added support for the Meta Scalable Video Processor (MSVP), our custom ASIC for video transcoding, through these same APIs, enabling the use of common tooling across different hardware platforms with minimal platform-specific quirks.

    As MSVP is only used within Meta’s own infrastructure, it would create a challenge for FFmpeg developers to support it without access to the hardware for testing and validation. In this case, it makes sense to keep patches like this internal since they wouldn’t provide benefit externally. We’ve taken on the responsibility of rebasing our internal patches onto more recent FFmpeg versions over time, utilizing extensive validation to ensure robustness and correctness during upgrades.

    Our Continued Commitment to FFmpeg

    With more efficient multi-lane encoding and real-time quality metrics, we were able to fully deprecate our internal FFmpeg fork for all VOD and livestreaming pipelines. And thanks to standardized hardware APIs in FFmpeg, we’ve been able to support our MSVP ASIC alongside software-based pipelines with minimal friction.

    FFmpeg has withstood the test of time with over 25 years of active development. Developments that improve resource utilization, add support for new codecs and features, and increase reliability enable robust support for a wider range of media. For people on our platforms, this means enabling new experiences and improving the reliability of existing ones. We plan to continue investing in FFmpeg in partnership with open source developers, bringing benefits to Meta, the wider industry, and people who use our products.

    Acknowledgments

    We would like to acknowledge contributions from the open source community, our partners in FFlabs and VideoLAN, and many Meta engineers, including Max Bykov, Jordi Cenzano Ferret, Tim Harris, Colleen Henry, Mark Shwartzman, Haixia Shi, Cosmin Stejerean, Hassene Tmar, and Victor Loh.

    The post FFmpeg at Meta: Media Processing at Scale appeared first on Engineering at Meta.

    Investing in Infrastructure: Meta’s Renewed Commitment to jemalloc
    https://engineering.fb.com/2026/03/02/data-infrastructure/investing-in-infrastructure-metas-renewed-commitment-to-jemalloc/
    Mon, 02 Mar 2026

  • Meta recognizes the long-term benefits of jemalloc, a high-performance memory allocator, in its software infrastructure.
  • We are renewing focus on jemalloc, aiming to reduce maintenance needs and modernize the codebase while continuing to evolve the allocator to adapt to the latest hardware and workloads.
  • We are committed to continuing jemalloc development with the open source community and welcome contributions and collaborations from the community.
  • Building a software system is a lot like building a skyscraper: The product everyone sees is the top, but the part that keeps it from falling over is the foundation buried in the dirt and the scaffolding hidden from sight.

    jemalloc, the high performance memory allocator, has consistently been a highly-leveraged component within our software stack, adapting over time to changes in underlying hardware and upper-layer software. Alongside the Linux kernel and the compilers, it has delivered long-term benefits to Meta, contributing to a reliable and performant infrastructure. 

    Listening, Reflecting, and Changing

    High leverage comes with high stakes. On the spectrum of practical versus principled engineering practice, foundational software components like jemalloc demand the highest rigor. Given the leverage jemalloc provides, however, it can be tempting to trade some of that rigor for short-term benefit. It requires strong self-discipline as an organization to resist that temptation and adhere to the core engineering principles.

    In recent years, there has been a gradual shift away from the core engineering principles that have long guided jemalloc’s development. While some decisions delivered immediate benefits, the resulting technical debt eventually slowed progress.

    We took the community’s feedback to heart. In the spirit of collaboration, we have reflected deeply on our stewardship and its impact on jemalloc’s long-term health. We have met with members of the community, including the project’s founder, Jason Evans, to share our introspection and explain how we are changing our approach. We have also started an effort to remove technical debt and rebuild a long-term roadmap for jemalloc.

    A New Chapter for jemalloc

    As a result of these conversations with the community, the original jemalloc open source repository has been unarchived. We are grateful for the opportunity to continue as stewards of the project. Meta is renewing its focus on jemalloc, aiming to reduce maintenance needs and modernize the codebase while continuing to evolve the allocator for the latest and emerging hardware and workloads.

    Looking ahead, our current plan for jemalloc focuses on several key areas of improvement:

    • Technical Debt Reduction: We are focusing on cleaning up technical debt, refactoring, and enhancing jemalloc to ensure it remains efficient, reliable and easy to use for all users.
    • Huge-Page Allocator: We will continue to improve jemalloc’s huge-page allocator (HPA) to better utilize transparent hugepages (THP) for improved CPU efficiency.
    • Memory Efficiency: We plan to deliver improvements to packing, caching, and purging mechanisms for optimized memory efficiency.
    • AArch64 Optimizations: We will make sure jemalloc has good out-of-the-box performance for the AArch64 (ARM64) platform.

    We know that trust is earned through action. Our hope is that, over time, our renewed commitment will be evident in the health and progress of jemalloc. We invite the community to join us in this new chapter — share your feedback and help shape jemalloc’s future. We look forward to collaborating with the community to drive jemalloc forward.

    The post Investing in Infrastructure: Meta’s Renewed Commitment to jemalloc appeared first on Engineering at Meta.

    ]]>
    23682
    RCCLX: Innovating GPU Communications on AMD Platforms https://engineering.fb.com/2026/02/24/data-center-engineering/rrcclx-innovating-gpu-communications-amd-platforms-meta/ Tue, 24 Feb 2026 21:30:54 +0000 https://engineering.fb.com/?p=23617 We are open-sourcing the initial version of RCCLX – an enhanced version of RCCL that we developed and tested on Meta’s internal workloads. RCCLX is fully integrated with Torchcomms and aims to empower researchers and developers to accelerate innovation, regardless of their chosen backend. Communication patterns for AI models are constantly evolving, as are hardware [...]

    Read More...

    The post RCCLX: Innovating GPU Communications on AMD Platforms appeared first on Engineering at Meta.

    ]]>
    We are open-sourcing the initial version of RCCLX – an enhanced version of RCCL that we developed and tested on Meta’s internal workloads. RCCLX is fully integrated with Torchcomms and aims to empower researchers and developers to accelerate innovation, regardless of their chosen backend.

    Communication patterns for AI models are constantly evolving, as are hardware capabilities. We want to iterate quickly on collectives, transports, and novel features on AMD platforms. Earlier, we developed and open-sourced CTran, a custom transport library, on the NVIDIA platform. With RCCLX, we have integrated CTran into AMD platforms, enabling AllToAllvDynamic, a GPU-resident collective. While not all CTran features are currently integrated into the open source RCCLX library, we aim to have them available in the coming months.

    In this post, we highlight two new features – Direct Data Access (DDA) and Low Precision Collectives. These features provide significant performance improvements on AMD platforms, and we are excited to share them with the community.

    Direct Data Access (DDA) – Lightweight Intra-node Collectives

    Large language model inference operates through two distinct computational stages, each with fundamentally different performance characteristics: 

    • The prefill stage processes the input prompt, which can span thousands of tokens, to generate a key-value (KV) cache for each transformer layer of the model. This stage is compute-bound because the attention mechanism scales quadratically with sequence length, making it highly demanding on GPU computational resources.
    • The decoding stage then utilizes and incrementally updates the KV cache to generate tokens one by one. Unlike prefill, decoding is memory-bound, as the I/O time of reading memory dominates attention time, with model weights and the KV cache occupying the majority of memory.

    Tensor parallelism enables models to be distributed across multiple GPUs by sharding individual layers into smaller, independent blocks that execute on different devices. However, one important challenge is that the AllReduce communication operation can contribute up to 30% of end-to-end (E2E) latency. To address this bottleneck, Meta developed two DDA algorithms.

    • The DDA flat algorithm improves allreduce latency for small message sizes by allowing each rank to directly load memory from other ranks and perform local reduce operations, reducing latency from O(n) steps to O(1) while increasing total data exchange from O(n) to O(n²).
    • The DDA tree algorithm breaks the allreduce into two phases (reduce-scatter and all-gather) and uses direct data access in each step, moving the same amount of data as the ring algorithm but reducing latency to a constant factor for slightly larger message sizes.
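
    The flat algorithm’s trade-off can be sketched as a toy, single-process simulation (illustrative only; real DDA performs direct GPU peer-memory loads rather than Python list reads):

```python
# Toy simulation of the DDA "flat" allreduce idea: every rank reads
# every peer's buffer directly and reduces locally, so the number of
# communication steps is O(1) at the cost of O(n^2) total data moved.

def dda_flat_allreduce(rank_buffers):
    """rank_buffers: one list of numbers per rank; returns per-rank results."""
    n_elems = len(rank_buffers[0])
    return [
        # Each rank "loads" all peers' data and reduces it locally in one step.
        [sum(buf[i] for buf in rank_buffers) for i in range(n_elems)]
        for _rank in rank_buffers
    ]

buffers = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
result = dda_flat_allreduce(buffers)
print(result[0])  # every rank ends up with [16.0, 20.0]
```

    The tree variant instead splits the operation into reduce-scatter and all-gather phases, trading some of that extra data movement for scalability at larger message sizes.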

    The performance improvements of DDA over baseline communication libraries are substantial, particularly on AMD hardware. With AMD MI300X GPUs, DDA outperforms the RCCL baseline by 10-50% for decode (small message sizes) and yields a 10-30% speedup for prefill. These improvements resulted in an approximately 10% reduction in time-to-incremental-token (TTIT), directly enhancing the user experience during the critical decoding phase.

    Low-precision Collectives

    Low-precision (LP) collectives are a set of distributed communication algorithms — AllReduce, AllGather, AllToAll, and ReduceScatter — optimized for AMD Instinct MI300/MI350 GPUs to accelerate AI training and inference workloads. These collectives support both FP32 and BF16 data types, leveraging FP8 quantization for up to 4:1 compression, which significantly reduces communication overhead and improves scalability and resource utilization for large message sizes (≥16MB).

    The algorithms use parallel peer-to-peer (P2P) mesh communication, fully exploiting AMD’s Infinity Fabric for high bandwidth and low latency, while compute steps are performed in high precision (FP32) to maintain numerical stability. Precision loss is primarily dictated by the number of quantization operations — typically one or two per data type in each collective — and whether the data can be adequately represented within the FP8 range. 

    By dynamically enabling LP collectives, users can selectively activate these optimizations in E2E scenarios that benefit most from the performance gains. Based on internal experiments, we have observed significant speedups for FP32 and notable improvements for BF16; it’s important to note that these collectives have been tuned for single-node deployments at this time.

    Reducing precision can affect numerical accuracy, so we tested for this and found the accuracy acceptable for our workloads. This flexible approach allows teams to maximize throughput while maintaining acceptable numerical accuracy, and it is now fully integrated and available in RCCLX for AMD platforms — simply set the environment variable RCCL_LOW_PRECISION_ENABLE=1 to get started.
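
    As a rough illustration of why the representable range and the number of quantization steps matter, here is a minimal scale-based 8-bit round trip in plain Python (a toy int8 scheme, not the hardware FP8 formats RCCLX actually uses):

```python
# Toy scale-based quantization: map floats to signed 8-bit integers with
# one shared scale factor, then recover them. Each round trip introduces
# a bounded error of at most scale / 2; values far outside the
# representable range would lose much more precision.

def quantize(values, levels=127):
    """Map floats to signed 8-bit ints with a single shared scale factor."""
    peak = max(abs(v) for v in values)
    scale = peak / levels if peak else 1.0
    return [round(v / scale) for v in values], scale

def dequantize(quantized, scale):
    return [x * scale for x in quantized]

data = [0.5, -1.25, 3.0, 0.0]
q, scale = quantize(data)
restored = dequantize(q, scale)
print(max(abs(a - b) for a, b in zip(data, restored)))  # small, bounded error
```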

    MI300 – Float LP AllReduce speedup.
    MI300 – Float LP AllGather speedup.
    MI300 – Float LP AllToAll speedup.
    MI300 – Float LP ReduceScatter speedup.

    We are observing the following results from E2E inference workload evaluations when selectively enabling LP collectives:

    • ~0.3% delta on GSM8K evaluation runs.
    • ~9–10% decrease in latency.
    • ~7% increase in throughput.

    The throughput measurements shown in the graphs were obtained using param-bench rccl-tests. For the MI300, the tests were run on RCCLX built with ROCm 6.4, and for the MI350, on RCCLX built with ROCm 7.0. Each test included 10 warmup iterations followed by 100 measurement iterations. The reported results represent the average throughput across the measurement iterations.

    Easy adaptation of AI models

    RCCLX is integrated with the Torchcomms API as a custom backend. We aim for this backend to have feature parity with our NCCLX backend (for NVIDIA platforms). Torchcomms gives users a single communication API across platforms: they do not need to change the APIs they’re familiar with to port their applications to AMD or other platforms, even when using the novel features provided by CTran.

    RCCLX Quick Start Guide

    Install Torchcomms with RCCLX backend by following the installation instructions in the Torchcomms repo.

    import torch
    import torchcomms
    
    # Eagerly initialize a communicator using the MASTER_PORT/MASTER_ADDR/
    # RANK/WORLD_SIZE environment variables provided by torchrun.
    # This communicator is bound to a single device.
    comm = torchcomms.new_comm("rcclx", torch.device("hip"), name="my_comm")
    print(f"I am rank {comm.get_rank()} of {comm.get_size()}!")
    
    t = torch.full((10, 20), comm.get_rank(), dtype=torch.float)
    
    # Run an allreduce on the current stream.
    comm.allreduce(t, torchcomms.ReduceOp.SUM, async_op=False)
    

    Acknowledgements

    We extend our gratitude to the AMD RCCL team for their ongoing collaboration. We also want to recognize the many current and former Meta employees whose contributions were vital in developing torchcomms and torchcomms-backends for production-scale training and inference. In particular, we would like to give special thanks to Dingming Wu, Qiye Tan, Pavan Balaji, Yan Cui, Zhe Qu, Ahmed Khan, Ajit Mathews, CQ Tang, Srinivas Vaidyanathan, Harish Kumar Chandrappa, Peng Chen, Shashi Gandham, and Omar Baldonado.

    The post RCCLX: Innovating GPU Communications on AMD Platforms appeared first on Engineering at Meta.

    ]]>
    23617
    The Death of Traditional Testing: Agentic Development Broke a 50-Year-Old Field, JiTTesting Can Revive It https://engineering.fb.com/2026/02/11/developer-tools/the-death-of-traditional-testing-agentic-development-jit-testing-revival/ Wed, 11 Feb 2026 17:00:05 +0000 https://engineering.fb.com/?p=23650 WHAT IT IS The rise of agentic software development means code is being written, reviewed, and shipped faster than ever before across the entire industry. It also means that testing frameworks need to evolve for this rapidly changing landscape. Faster development demands faster testing that can catch bugs as they land in a codebase, without [...]

    Read More...

    The post The Death of Traditional Testing: Agentic Development Broke a 50-Year-Old Field, JiTTesting Can Revive It appeared first on Engineering at Meta.

    ]]>
    WHAT IT IS

    The rise of agentic software development means code is being written, reviewed, and shipped faster than ever before across the entire industry. It also means that testing frameworks need to evolve for this rapidly changing landscape. Faster development demands faster testing that can catch bugs as they land in a codebase, without requiring regular updates and maintenance.

    Just-in-Time Tests (JiTTests) are a fundamentally novel approach to testing in which tests are automatically generated by large language models (LLMs) on the fly to catch bugs – even ones that traditional testing might not catch – just in time, before the code lands in production.

    A Catching JiTTest focuses specifically on finding regressions introduced by a code change. This type of testing reimagines decades of software testing theory and practice. While traditional testing relies on static test suites, manual authoring, and ongoing maintenance, Catching JiTTests require no test maintenance and no test code review, meaning engineers can focus their expertise on real bugs, not false positives. Catching JiTTests use sophisticated techniques to maximize test signal value and minimize false positive drag, targeting test signals where they matter most: on serious failures.

    HOW TESTING TRADITIONALLY WORKS

    Under the traditional paradigm, tests are manually written as new code lands in a codebase and continually executed, requiring regular updates and maintenance. The engineers building these tests face the challenge of checking the behavior not only of the current code, but of all possible future changes. Inherent uncertainty about future changes results in tests that either catch nothing or, when they do fire, produce false positives. Agentic development dramatically increases the pace of code change, straining the test development burden and scaling the cost of false positives and test maintenance to the breaking point.

    HOW CATCHING JITTESTS WORK

    Broadly, JiTTests are bespoke tests, tailored to a specific code change, that give engineers simple, actionable feedback about unexpected behavior changes without the need to read or write test code. LLMs can generate JiTTests automatically the moment a pull request is submitted. And since the JiTTest itself is LLM-generated, it can often infer the plausible intention of a code change and simulate possible faults that may result from it.

    With an understanding of intent, Catching JiTTests can significantly drive down instances of false positives.

    Here are the key steps of the Catching JiTTest process:

    1. New code lands in the codebase.
    2. The system infers the intention of the code change.
    3. It creates mutants (code versions with faults deliberately inserted) to simulate what could go wrong.
    4. It generates and runs tests to catch those faults.
    5. Ensembles of rule-based and LLM-based assessors focus the signal on true positive failures.
    6. Engineers receive clear, relevant reports about unexpected changes right when it matters most.
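
    A heavily simplified, runnable sketch of steps 3-5 (in real Catching JiTTests both the mutants and the tests are LLM-generated; the functions below are hand-written stand-ins, and all names are invented for illustration):

```python
# Toy illustration of the mutant-based catching idea: insert a fault into
# a changed function, then check whether a generated test distinguishes
# the original from the mutant. A test only carries signal if it passes
# on the intended behavior AND fails on the simulated fault.

def changed_code(price, qty):
    # The (hypothetical) code change under review.
    return price * qty

def mutant(price, qty):
    # A deliberately inserted fault simulating a plausible regression.
    return price + qty

def generated_test(fn):
    # A bespoke test "generated" for this specific change.
    return fn(3, 4) == 12

assert generated_test(changed_code)   # passes on the real change
assert not generated_test(mutant)     # catches the simulated fault
print("mutant caught: the test provides real signal")
```

    When a generated test cannot kill any mutant, it is discarded rather than surfaced to engineers, which is how the process keeps false-positive drag low.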

    WHY IT MATTERS

    Catching JiTTests are designed for the world of AI-powered agentic software development and accelerate testing by focusing on serious unexpected bugs. With them, engineers no longer have to spend time writing, reviewing, and maintaining complex test code. Catching JiTTests, by design, kill many of the issues with traditional testing in one stroke:

    • They are generated on-the-fly for each code change and do not reside in the codebase, eliminating ongoing maintenance costs and shifting effort from humans to machines.
    • They are tailored to each change, making them more robust and less prone to breaking due to intended updates.
    • They automatically adapt as the code changes.
    • They only require human review when a bug is actually caught.

    This all amounts to an important shift in testing infrastructure where the focus moves from generic code quality to whether a test actually finds faults in a specific change without raising a false positive. It helps improve testing overall while also allowing it to keep up with the pace of agentic coding.

    READ THE PAPER

    Just-in-Time Catching Test Generation at Meta

    The post The Death of Traditional Testing: Agentic Development Broke a 50-Year-Old Field, JiTTesting Can Revive It appeared first on Engineering at Meta.

    ]]>
    23650
    Building Prometheus: How Backend Aggregation Enables Gigawatt-Scale AI Clusters https://engineering.fb.com/2026/02/09/data-center-engineering/building-prometheus-how-backend-aggregation-enables-gigawatt-scale-ai-clusters/ Mon, 09 Feb 2026 17:00:33 +0000 https://engineering.fb.com/?p=23611 We’re sharing details of the role backend aggregation (BAG) plays in building Meta’s gigawatt-scale AI clusters like Prometheus. BAG allows us to seamlessly connect thousands of GPUs across multiple data centers and regions. Our BAG implementation is connecting two different network fabrics – Disaggregated Schedule Fabric (DSF) and Non-Scheduled Fabric (NSF). Once it’s complete our AI [...]

    Read More...

    The post Building Prometheus: How Backend Aggregation Enables Gigawatt-Scale AI Clusters appeared first on Engineering at Meta.

    ]]>
  • We’re sharing details of the role backend aggregation (BAG) plays in building Meta’s gigawatt-scale AI clusters like Prometheus.
  • BAG allows us to seamlessly connect thousands of GPUs across multiple data centers and regions.
  • Our BAG implementation is connecting two different network fabrics – Disaggregated Schedule Fabric (DSF) and Non-Scheduled Fabric (NSF).
  • Once it’s complete, our AI cluster, Prometheus, will deliver 1 gigawatt of capacity to enhance and enable new and existing AI experiences across Meta products.

    Prometheus’ infrastructure will span several data center buildings in a single large region, interconnecting tens of thousands of GPUs.

    A key piece of scaling and connecting this infrastructure is backend aggregation (BAG), which we use to seamlessly connect GPUs and data centers with robust, high-capacity networking. By leveraging modular hardware, advanced routing, and resilient topologies, BAG ensures both performance and reliability at unprecedented scale.

    As our AI clusters continue to grow, we expect BAG to play an important role in meeting future demands and driving innovation across Meta’s global network.

    What Is Backend Aggregation?

    BAG is a centralized Ethernet-based super spine network layer that primarily functions to interconnect multiple spine layer fabrics across various data centers and regions within large clusters. Within Prometheus, for example, the BAG layer serves as the aggregation point between regional networks and Meta’s backbone, enabling the creation of mega AI clusters. BAG is designed to support immense bandwidth needs, with inter-BAG capacities reaching the petabit range (e.g., 16-48 Pbps per region pair).

    We use backend aggregation (BAG) to interconnect data center regions to share compute and other resources into large clusters.

    How BAG Is Helping Us Build Gigawatt-Scale AI Clusters 

    To address the challenge of interconnecting tens of thousands of GPUs, we’re deploying distributed BAG layers regionally.

    How We Interconnect BAG Layers

    BAG layers are strategically distributed across regions to serve subsets of L2 fabrics, adhering to distance, buffer, and latency constraints. Inter-BAG connectivity utilizes either a planar (direct match) or spread connection topology, chosen based on site size and fiber availability.

    • Planar topology connects BAG switches one-to-one between regions following the plane, offering simplified management but concentrating potential failure domains.
    • Spread connection topology distributes links across multiple BAG switches/planes, enhancing path diversity and resilience.

    An example of an inter-BAG network topology.
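
    The two topologies can be sketched as link assignments between BAG planes in two regions (a hypothetical four-plane example; real deployments are chosen per site based on size and fiber availability):

```python
# Sketch of the two inter-BAG connection topologies as (region A switch,
# region B switch) link pairs. Planar keeps management simple but
# concentrates failure domains; spread buys path diversity.

def planar_links(n_planes):
    # One-to-one: switch i in region A pairs only with switch i in B.
    return [(i, i) for i in range(n_planes)]

def spread_links(n_planes):
    # Each switch in A spreads its links across every plane in B.
    return [(i, j) for i in range(n_planes) for j in range(n_planes)]

print(planar_links(4))       # [(0, 0), (1, 1), (2, 2), (3, 3)]
print(len(spread_links(4)))  # 16 pairings across planes
# Losing switch 0 in region B removes an entire planar path, but only
# a quarter of each switch's links under the spread topology.
```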

    How a BAG Layer Connects to L2 Fabrics

    So far, we’ve discussed how the BAG layers are interconnected; now let’s see how a BAG layer connects downstream to L2 fabrics.

    We’ve used two main fabric technologies, Disaggregated Schedule Fabric (DSF) and Non-Scheduled Fabric (NSF), to build L2 networks.

    Below is an example of DSF L2 zones across five data center buildings connected to the BAG layer via a special backend edge pod in each building. 

    A BAG inter-building connection for DSF fabric across five data centers.

    Below is an example of NSF L2 connected to BAG planes. Each BAG plane connects to matching Spine Training Switches (STSWs) from all spine planes. Effective oversubscription is 4.98:1.  

    A BAG inter-building connection for NSF fabric.

    Careful management of oversubscription ratios assists in balancing scale and performance. Typical oversubscription from L2 to BAG is around 4.5:1, while BAG-to-BAG oversubscription varies based on regional requirements and link capacity.
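
    Oversubscription is simply the ratio of aggregate downstream to upstream bandwidth; a quick sketch with hypothetical port counts:

```python
# Oversubscription ratio: aggregate downstream bandwidth divided by
# aggregate upstream bandwidth. Port counts below are hypothetical,
# chosen only to reproduce the typical 4.5:1 L2-to-BAG figure.

def oversubscription(down_ports, up_ports, port_gbps=800):
    return (down_ports * port_gbps) / (up_ports * port_gbps)

print(f"{oversubscription(144, 32)}:1")  # 4.5:1
```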

    Hardware and Routing 

    Meta’s implementation of BAG uses a modular chassis equipped with Jericho3 (J3) ASIC line cards, each providing up to 432x800G ports for high-capacity, scalable, and resilient interconnect. The central hub BAG employs a larger chassis to accommodate numerous spokes and long-distance links with varied cable lengths for optimized buffer utilization.

    Routing within BAG uses eBGP with link bandwidth attributes, enabling Unequal Cost Multipath (UCMP) for efficient load balancing and robust failure handling. BAG-to-BAG connections are secured with MACsec, aligning with network security requirements.
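
    The effect of link bandwidth attributes on UCMP can be sketched as bandwidth-proportional traffic splitting (illustrative only; real UCMP is computed in the BGP and forwarding planes, and the link names below are invented):

```python
# Sketch of Unequal Cost Multipath (UCMP): traffic is split across next
# hops in proportion to advertised link bandwidth, so a degraded link
# carries proportionally less load instead of an equal share.

def ucmp_split(flows, next_hops):
    """next_hops maps link name -> bandwidth (Gbps); returns flows per link."""
    total = sum(next_hops.values())
    return {hop: round(flows * bw / total) for hop, bw in next_hops.items()}

# Hypothetical peer-BAG links: one healthy 800G link, one degraded to 400G.
print(ucmp_split(300, {"bag-a": 800, "bag-b": 400}))
# {'bag-a': 200, 'bag-b': 100}
```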

    Designing the Network for Resilience

    The network design meticulously details port striping, IP addressing schemes, and comprehensive failure domain analysis to ensure high availability and minimize the impact of failures. Failure modes are analyzed at the BAG, data hall, and power distribution levels. We also employ various strategies to mitigate blackholing risks, including draining affected BAG planes and conditional route aggregation.

    Considerations for Long Cable Distances

    An important advantage of BAG’s distributed architecture is that it keeps the distance from the L2 edge short, which is important for shallow-buffer NSF switches. The longer BAG-to-BAG cable distances dictate that we use deep-buffer switches for the BAG role. This provides a large headroom buffer to support lossless congestion control protocols like PFC.

    Building Prometheus and Beyond

    As a technology, BAG is playing an important role in Meta’s next generation of AI infrastructure. By centralizing the interconnection of regional networks, BAG helps enable the gigawatt-scale Prometheus cluster, ensuring seamless, high-capacity networking across tens of thousands of GPUs. This thoughtful design, leveraging modular hardware and resilient topologies, positions BAG to not only meet the demands of Prometheus but also to drive the future innovation and scalability of Meta’s global AI network for years to come.

    The post Building Prometheus: How Backend Aggregation Enables Gigawatt-Scale AI Clusters appeared first on Engineering at Meta.

    ]]>
    23611
    No Display? No Problem: Cross-Device Passkey Authentication for XR Devices https://engineering.fb.com/2026/02/04/security/cross-device-passkey-authentication-for-xr-devices-meta-quest/ Wed, 04 Feb 2026 22:00:07 +0000 https://engineering.fb.com/?p=23599 We’re sharing a novel approach to enabling cross-device passkey authentication for devices with inaccessible displays (like XR devices). Our approach bypasses the use of QR codes and enables cross-device authentication without the need for an on-device display, while still complying with all trust and proximity requirements. This approach builds on work done by the FIDO [...]

    Read More...

    The post No Display? No Problem: Cross-Device Passkey Authentication for XR Devices appeared first on Engineering at Meta.

    ]]>
  • We’re sharing a novel approach to enabling cross-device passkey authentication for devices with inaccessible displays (like XR devices).
  • Our approach bypasses the use of QR codes and enables cross-device authentication without the need for an on-device display, while still complying with all trust and proximity requirements.
  • This approach builds on work done by the FIDO Alliance and we hope it will open the door to bring secure, passwordless authentication to a whole new ecosystem of devices and platforms.
  • Passkeys are a significant leap forward in authentication, offering a phishing-resistant, cryptographically secure alternative to traditional passwords.

    Generally, the standard cross-device passkey flow, where someone registers or authenticates on a desktop device by approving the action on their nearby mobile device, relies on the familiar step of scanning a QR code with a phone camera. But how can we facilitate this flow for XR devices with a head-mounted display or no screen at all, or for other devices with an inaccessible display, like smart home hubs and industrial sensors?

    We’ve taken a novel approach to adapting the WebAuthn passkey flow and FIDO’s CTAP hybrid protocol for this unique class of devices that either lack a screen entirely or whose screen is not easily accessible to another device’s camera. Our implementation has been developed and is now broadly available on Meta Quest devices powered by Meta Horizon OS. We hope that this approach can also ensure robust security built on the strength of existing passkey frameworks, without sacrificing usability, for users of a variety of other screenless IoT devices, consumer electronics, and industrial hardware.

    The Challenge: No Screen, No QR Code

    The standard cross-device flow relies on two primary mechanisms:

    1. QR code scanning: The relying party displays a QR code on the desktop/inaccessible device, which the mobile authenticator scans to establish a secure link.
    2. Bluetooth/NFC proximity: The devices use local communication protocols to discover each other and initiate the secure exchange.

    For devices with no display, the QR code method is impossible. Proximity-based discovery is feasible, but initiating the user verification step and confirming the intent without any on-device visual feedback can introduce security and usability risks. People need clear assurance that they are approving the correct transaction on the correct device.

    Our Solution: Using a Companion App for Secure Message Transport

    Scanning a QR code sends the authenticator device a command to initiate a hybrid (cross-device) login flow with a nonce that identifies the unauthenticated device client. But if a user has a companion application – like the Meta Horizon app – that uses the same account as the device, we can use that application to pass this same request to the authenticator OS and execute it using general link/intent execution.

    We made the flow easy to navigate by using in-app notifications to show users when a login request has been initiated, take them directly into the application, and immediately execute the login request.

    For simplicity, we opted to begin the hybrid flow as soon as the application is opened since the user would have had to take some action (clicking the notification or opening the app) to trigger this and there is an additional user verification step in hybrid implementations on iOS and Android.

    Here’s how this plays out on a Meta Quest with the Meta Horizon mobile app:

    1. The Hybrid Flow Message Is Generated

    When a passkey login is initiated on the Meta Quest, the headset’s browser locally constructs the same payload that would have been embedded in a QR Code – including a fresh ECDH public key, a session-specific secret, and routing information used later in the handshake. Instead of rendering this information into an image (QR code), the browser encodes it into a FIDO URL (the standard mechanism defined for hybrid transport) that instructs the mobile device to begin the passkey authentication flow.
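
    Conceptually, the encoding step looks like the sketch below (a toy JSON/base64 encoding for illustration only; the actual hybrid transport uses the CBOR payload and FIDO URL encoding defined by the CTAP specification, and the field names here are invented):

```python
import base64
import json
import os

# Toy sketch: bundle the handshake material a QR code would have carried
# (public key, session secret, routing info) into a FIDO-style URL that
# can be delivered over a push channel instead of rendered as an image.
# This is NOT the real CTAP encoding; field names are illustrative.

def make_toy_fido_url(ecdh_public_key: bytes, routing_id: bytes) -> str:
    payload = {
        "pk": base64.urlsafe_b64encode(ecdh_public_key).decode(),
        "secret": base64.urlsafe_b64encode(os.urandom(16)).decode(),
        "routing": base64.urlsafe_b64encode(routing_id).decode(),
    }
    blob = base64.urlsafe_b64encode(json.dumps(payload).encode()).decode()
    return f"FIDO:/{blob}"

url = make_toy_fido_url(os.urandom(33), os.urandom(3))
print(url[:6])  # the "FIDO:/" prefix, followed by the encoded payload
```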

    2. The Message Is Sent to the Companion App

    After the FIDO URL is generated, the headset requires a secure and deterministic method for transferring it to the user’s phone. Because the device cannot present a QR code, the system leverages the Meta Horizon app’s authenticated push channel to deliver the FIDO URL directly to the mobile device. When the user selects the passkey option in the login dialog, the headset encodes the FIDO URL as structured data within a GraphQL-based push notification. 

    The Meta Horizon app, signed in with the same account as the headset, receives this payload and validates the delivery context to ensure it is routed to the correct user. 

    3. The Application Sends a Notification of the Login Request

    After the FIDO URL is delivered to the mobile device, the platform’s push service surfaces it as a standard iOS or Android notification indicating that a login request is pending. When the user taps the notification, the operating system routes the deep link to the Meta Horizon app. The app then opens the FIDO URL using the system URL launcher and invokes the operating system passkey interface.

    For users who have notifications turned off or disabled, launching the Meta Horizon app directly will also trigger a query to the backend for any pending passkey requests associated with the user’s account. If a valid request exists (requests expire after five minutes), the app automatically initiates the same passkey flow by opening the FIDO URL.

    Once the FIDO URL is opened, the mobile device begins the hybrid transport sequence, including broadcasting the BLE advertisement, establishing the encrypted tunnel, and producing the passkey assertion. In this flow, the system notification and the app launch path both serve as user consent surfaces and entry points into the standard hybrid transport workflow.

    4. The App Executes the Hybrid Command

    Once the user approves the action on their mobile device, the secure channel is established as per WebAuthn standards. The main difference is the challenge exchange timing:

    1. The inaccessible device generates the standard WebAuthn challenge and waits.
    2. The mobile authenticator initiates the secure BLE/NFC connection.
    3. The challenge is transmitted over this secure channel.
    4. Upon UV success, the mobile device uses the relevant key material to generate the AuthenticatorAssertionResponse or AuthenticatorAttestationResponse.
    5. The response is sent back to the inaccessible device.

    The inaccessible device then acts as the conduit, forwarding the response to the relying party server to complete the transaction, exactly as a standard display-equipped device would.
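
    The round trip in steps 1-5 can be sketched with a symmetric-key stand-in (real passkeys use public-key WebAuthn assertions over an encrypted tunnel; HMAC is used here only to keep the sketch self-contained and runnable):

```python
import hashlib
import hmac
import os

passkey = os.urandom(32)  # stand-in for the credential's private key

def device_generate_challenge():
    # Step 1: the display-less device creates a WebAuthn-style challenge.
    return os.urandom(32)

def phone_sign(challenge, key):
    # Step 4: after user verification succeeds, the phone produces the
    # assertion over the challenge (HMAC standing in for a signature).
    return hmac.new(key, challenge, hashlib.sha256).digest()

def relying_party_verify(challenge, assertion, key):
    # Step 5: the device forwards the assertion to the server unchanged;
    # the relying party verifies it against the original challenge.
    expected = hmac.new(key, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(assertion, expected)

challenge = device_generate_challenge()
assertion = phone_sign(challenge, passkey)  # sent over the secure channel
print(relying_party_verify(challenge, assertion, passkey))  # True
```

    Note that the inaccessible device never sees the key material; it only ferries the challenge out and the assertion back, exactly as described above.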

    Impact and Future Direction

    This novel implementation successfully bypasses the need for an on-device display in the cross-device flow and still complies with the proximity and other trust challenges that exist today for cross-device passkey login. We hope that our solution paves the way for secure, passwordless authentication across a wider range of different platforms and ecosystems, moving passkeys beyond just mobile and desktop environments and into the burgeoning world of wearable and IoT devices. 

    We are proud to build on top of the excellent work already done in this area by our peers in the FIDO Alliance and mobile operating systems committed to this work and building a robust and interoperable ecosystem for secure and easy login.

    The post No Display? No Problem: Cross-Device Passkey Authentication for XR Devices appeared first on Engineering at Meta.

    ]]>
    23599
    Rust at Scale: An Added Layer of Security for WhatsApp https://engineering.fb.com/2026/01/27/security/rust-at-scale-security-whatsapp/ Tue, 27 Jan 2026 15:00:09 +0000 https://engineering.fb.com/?p=23541 WhatsApp has adopted and rolled out a new layer of security for users – built with Rust – as part of its effort to harden defenses against malware threats. WhatsApp’s experience creating and distributing our media consistency library in Rust to billions of devices and browsers proves Rust is production ready at a global scale. [...]

    Read More...

    The post Rust at Scale: An Added Layer of Security for WhatsApp appeared first on Engineering at Meta.

    ]]>
  • WhatsApp has adopted and rolled out a new layer of security for users – built with Rust – as part of its effort to harden defenses against malware threats.
  • WhatsApp’s experience creating and distributing our media consistency library in Rust to billions of devices and browsers proves Rust is production ready at a global scale.
    Our Media Handling Strategy

    WhatsApp provides default end-to-end encryption so that over 3 billion people can message securely every day. Online security is an adversarial space, and to ensure users can keep messaging securely, we constantly adapt and evolve our strategy against cybersecurity threats – all while supporting the WhatsApp infrastructure that helps people connect.

    For example, WhatsApp, like many other applications, allows users to share media and other types of documents. WhatsApp helps protect users by warning about dangerous attachments like APKs, yet rare and sophisticated malware could be hidden within a seemingly benign file like an image or video. These maliciously crafted files might target unpatched vulnerabilities in the operating system, libraries distributed by the operating system, or the application itself.

    To help protect against such potential threats, WhatsApp is increasingly using the Rust programming language, including in our media-sharing functionality. Rust is a memory safe language offering numerous security benefits. We believe this is the largest global rollout of any library written in Rust.
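
The memory safety property referenced above can be illustrated with a minimal sketch (not WhatsApp code): an out-of-bounds read in Rust is either a checked `None` or a deterministic panic, never the silent memory corruption possible in C and C++.

```rust
// Illustration only: Rust's bounds checking turns an out-of-bounds read
// into an explicit, recoverable condition rather than undefined behavior.
fn read_byte(data: &[u8], index: usize) -> Option<u8> {
    // `get` performs a bounds check and returns None instead of reading
    // past the end of the buffer.
    data.get(index).copied()
}
```

This is the class of guarantee that makes Rust attractive for code paths that parse untrusted input.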

    To help explain why and how we rolled this out, we should first look back at a key OS-level vulnerability that sent an important signal to WhatsApp around hardening media-sharing defenses.

    2015 Android Vulnerability: A Wake-up Call for Media File Protections

    In 2015, Android devices, and the applications that ran on them, became vulnerable to the “Stagefright” vulnerability. The bug lay in the processing of media files by operating system-provided libraries, so WhatsApp and other applications could not patch the underlying vulnerability. Because it could often take months for people to update to the latest version of their software, we set out to find solutions that would keep WhatsApp users safe, even in the event of an operating system vulnerability. 

    At that time, we realized that a cross-platform C++ library WhatsApp had already developed to send and consistently format MP4 files (called “wamedia”) could be modified to detect files that do not adhere to the MP4 standard and might trigger bugs in a vulnerable OS library on the receiver side – putting a recipient’s security at risk. We rolled out this check and protected WhatsApp users from the Stagefright vulnerability far more rapidly than if we had depended on users updating the OS itself.
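
To make the conformance idea concrete: MP4 files are a sequence of “boxes,” each starting with a 4-byte big-endian size and a 4-byte type tag, and a size field that overruns the file is exactly the kind of non-conformance that could trip a vulnerable parser. The sketch below is hypothetical and much simpler than wamedia’s actual checks; it conservatively rejects the standard’s special size values 0 and 1.

```rust
// Hypothetical sketch of an MP4 structural conformance check, not
// WhatsApp's implementation. Walks top-level boxes and rejects any box
// whose declared size would read past the end of the buffer.
fn mp4_boxes_are_well_formed(data: &[u8]) -> bool {
    let mut offset: usize = 0;
    while offset < data.len() {
        // Every box needs at least an 8-byte header (size + type tag).
        if data.len() - offset < 8 {
            return false;
        }
        let size = u32::from_be_bytes([
            data[offset],
            data[offset + 1],
            data[offset + 2],
            data[offset + 3],
        ]) as usize;
        // Sizes 0 ("extends to end of file") and 1 ("64-bit size follows")
        // are legal in the standard; this sketch conservatively rejects
        // anything smaller than the header itself.
        if size < 8 {
            return false;
        }
        // Reject a box that claims to extend past the end of the file.
        match offset.checked_add(size) {
            Some(end) if end <= data.len() => offset = end,
            _ => return false,
        }
    }
    true
}
```

A file failing such a check can be refused before it ever reaches an OS media library on the receiving device.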

    But because media checks run automatically on download and process untrusted inputs, we identified early on that wamedia was a prime candidate for using a memory safe language. 

    Our Solution: Rust at Scale

    Rather than an incremental rewrite, we developed the Rust version of wamedia in parallel with the original C++ version. We used differential fuzzing and extensive integration and unit tests to ensure compatibility between the two implementations.
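
The differential-testing idea above can be sketched as follows. Everything here is a hypothetical stand-in: in practice the reference would be the C++ wamedia called over FFI, and inputs would come from a fuzzer rather than hand-picked values.

```rust
// Sketch of differential testing between two parser implementations.
// Both stand-ins are trivially simple; the point is the harness shape.

#[derive(Debug, PartialEq)]
enum Verdict {
    Accept,
    Reject,
}

// Stand-in for the legacy C++ implementation (would be an extern "C" call).
fn reference_parse(data: &[u8]) -> Verdict {
    if data.len() >= 8 { Verdict::Accept } else { Verdict::Reject }
}

// Stand-in for the new Rust implementation under test.
fn rust_parse(data: &[u8]) -> Verdict {
    if data.len() >= 8 { Verdict::Accept } else { Verdict::Reject }
}

/// One fuzzing iteration: both implementations must agree on every input.
/// Any divergence is a compatibility bug to investigate before rollout.
fn implementations_agree(data: &[u8]) -> bool {
    reference_parse(data) == rust_parse(data)
}
```

Running a harness like this over fuzzer-generated inputs is what lets a parallel rewrite claim byte-for-byte behavioral compatibility with the original.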

    Two major hurdles were the initial binary-size increase from bringing in the Rust standard library and the build-system support required for the diverse platforms WhatsApp runs on. WhatsApp made a long-term bet to build that support. In the end, we replaced 160,000 lines of C++ (excluding tests) with 90,000 lines of Rust (including tests). The Rust version showed performance and runtime memory usage advantages over the C++ version. Given this success, Rust was fully rolled out to all WhatsApp users across many platforms: Android, iOS, Mac, Web, Wearables, and more. With this positive evidence in hand, we expect memory safe languages to play an ever-increasing part in WhatsApp’s overall approach to application and user security.

    Over time, we’ve added more checks for non-conformant structures within certain file types to help protect downstream libraries from parser-differential exploit attempts. Additionally, we check higher-risk file types, even if structurally conformant, for risk indicators. For instance, PDFs are often a vehicle for malware, and the presence of embedded files and scripting elements within a PDF further raises risk. We also detect when one file type masquerades as another through a spoofed extension or MIME type. Finally, we uniformly flag known dangerous file types, such as executables or applications, for special handling in the application UX. Altogether, we call this ensemble of checks “Kaleidoscope.” This system protects people on WhatsApp from potentially malicious unofficial clients and attachments. Although format checks will not stop every attack, this layer of defense helps mitigate many of them.
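
One of the Kaleidoscope-style checks described above, masquerade detection, can be sketched by comparing a file’s claimed extension against its content’s magic bytes. The rules below are hypothetical examples, not WhatsApp’s actual logic or signature list.

```rust
// Hypothetical sketch of file-type masquerade detection: does the content
// actually look like what the extension claims? An executable renamed to
// ".jpg", for example, fails the JPEG magic-byte check.
fn extension_matches_content(extension: &str, data: &[u8]) -> bool {
    match extension.to_ascii_lowercase().as_str() {
        // JPEG files begin with FF D8 FF.
        "jpg" | "jpeg" => data.starts_with(&[0xFF, 0xD8, 0xFF]),
        // PDF files begin with "%PDF-".
        "pdf" => data.starts_with(b"%PDF-"),
        // PNG files begin with an 8-byte signature.
        "png" => data.starts_with(&[0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A]),
        // Extensions this sketch doesn't know about are not flagged.
        _ => true,
    }
}
```

A mismatch would feed into the special-handling path in the application UX rather than silently delivering the file.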

    Each month, these libraries are distributed to billions of phones, laptops, desktops, watches, and browsers running multiple operating systems for people on WhatsApp, Messenger, and Instagram. To our knowledge, this is the largest deployment of Rust code to such a diverse set of end-user platforms and products. Our experience speaks to the production readiness and unique value proposition of Rust on the client side.

    How Rust Fits Into WhatsApp’s Approach to App Security

    This is just one example of WhatsApp’s many investments in security. It’s why we built default end-to-end encryption for personal messages and calls, offer end-to-end encrypted backups, use key transparency technology to verify secure connections, provide additional calling protections, and more.

    WhatsApp has a strong track record of being loud when we find issues and working to hold bad actors accountable. For example, WhatsApp reports CVEs for important issues we find in our applications, even if we do not find evidence of exploitation. We do this to give people on WhatsApp the best chance of protecting themselves by seeing a security advisory and updating quickly.

    To ensure application security, we first must identify and quantify the sources of risk. We do this through internal and external audits like NCC Group’s public assessment of WhatsApp’s end-to-end encrypted backups, fuzzing, static analysis, supply chain management, and automated attack surface analysis. We also recently expanded our Bug Bounty program to introduce the WhatsApp Research Proxy – a tool that makes research into WhatsApp’s network protocol more effective.

    Next, we reduce the identified risk. Like many others in the industry, we found that the majority of the high-severity vulnerabilities we published were due to memory safety issues in code written in the C and C++ programming languages. To combat this, we invest in three parallel strategies:

    1. Design the product to minimize unnecessary attack surface exposure.
    2. Invest in security assurance for the remaining C and C++ code.
    3. Default to memory safe languages, rather than C and C++, for new code.

    WhatsApp has added protections like CFI, hardened memory allocators, safer buffer handling APIs, and more. C and C++ developers have specialized security training, development guidelines, and automated security analysis on their changes. We also have strict SLAs for fixing issues uncovered by the risk identification process.

    Accelerating Rust Adoption to Enhance Security

    Rust enabled WhatsApp’s security team to develop a secure, high performance, cross-platform library to ensure media shared on the platform is consistent and safe across devices. This is an important step forward in adding additional security behind the scenes for users and part of our ongoing defense-in-depth approach. Security teams at WhatsApp and Meta are highlighting opportunities for high impact adoption of Rust to interested teams, and we anticipate accelerating adoption of Rust over the coming years.

    The post Rust at Scale: An Added Layer of Security for WhatsApp appeared first on Engineering at Meta.
