
Hyper-V Enlightenments, VMBus, and the Synthetic Device Model

noreply@paragmali.com (Parag Mali) — Thu, 14 May 2026 00:00:00 GMT

Hyper-V's guest OSes do not see emulated 1990s hardware. They see a published, versioned hypervisor ABI called the **Top-Level Functional Specification**, a transport called **VMBus** that consists of two ring buffers per channel, and a catalogue of synthetic devices whose backends live in the privileged root partition. This design is what makes Windows and Linux equally fast inside Hyper-V, and it is also why the host-side parsers in `vmswitch.sys` keep producing critical CVEs. The 2024 OpenHCL paravisor moves those parsers into the guest's own trust boundary in memory-safe Rust, which is the most consequential change to the Hyper-V device model since 2008.

1. The Type-1 hypervisor foundation

Open Task Manager on a modern Windows 11 desktop, switch to the Performance tab, and look at the line that says "Virtualization: Enabled." That single line hides one of the most consequential design choices in modern operating systems: when Microsoft shipped Hyper-V with Windows Server 2008 in June 2008 [@ms-hyperv-server-overview], they did not bolt a virtualization product on top of Windows. They put a small hypervisor underneath it.

That ordering matters more than it sounds. In the older Microsoft Virtual Server 2005 model, Windows ran on the bare metal and a user-mode service emulated PC hardware for guests inside it. In the Hyper-V architecture documented by Microsoft in 2008 [@ms-hyperv-architecture], the hypervisor boots first and Windows itself becomes a guest of the hypervisor. Microsoft calls this guest the root partition. Every other VM on the box is a child partition.

**Type-1 hypervisor**: a hypervisor that runs directly on the physical hardware rather than inside a host operating system. Hyper-V, VMware ESXi, and Xen are Type-1; VirtualBox and the original Microsoft Virtual Server are Type-2 (hosted). In a Type-1 design no general-purpose OS sits between the hypervisor and the silicon, which lets the hypervisor enforce isolation directly using CPU virtualization extensions like Intel VT-x and AMD-V.

The root partition is not just another VM. It is a privileged partition: it owns the physical I/O devices, runs the parent stack of synthetic-device backends, and brokers everything that touches real hardware. Children get virtual processors and a slice of memory, and they communicate with the root over a software bus called VMBus that we will spend most of this article taking apart.

```mermaid
flowchart TD
    HW["Physical hardware (CPU, RAM, NICs, NVMe)"]
    HV["Hyper-V hypervisor (microkernel)"]
    Root["Root partition (Windows Server)"]
    VSP["Virtualization Service Providers (VSPs): vmswitch.sys, storvsp.sys, ..."]
    C1["Child partition: Windows VM"]
    C2["Child partition: Linux VM"]
    VSC1["VSCs: netvsc, storvsc, ..."]
    VSC2["VSCs: hv_netvsc, hv_storvsc, ..."]
    HW --> HV
    HV --> Root
    HV --> C1
    HV --> C2
    Root --> VSP
    VSP -. "VMBus channel" .-> VSC1
    VSP -. "VMBus channel" .-> VSC2
    C1 --> VSC1
    C2 --> VSC2
```

The hypervisor itself is small by design. The Hyper-V architecture page on Microsoft Learn [@ms-hyperv-architecture-perf] describes it as a microkernel: it does the minimum a hypervisor must do (CPU scheduling, memory partitioning, interrupt routing, an inter-partition message bus) and pushes everything else, including the device models, out to the root partition. This is the opposite of the early VMware ESX design, where the hypervisor itself contained large device drivers.

The microkernel choice was pragmatic, not ideological. A monolithic hypervisor with built-in NIC and storage drivers would have been a catastrophic certification problem: every NIC firmware update would risk a hypervisor patch. By delegating I/O to the Windows root partition, Microsoft re-used the entire Windows driver stack.

The split also explains why Hyper-V "feels Windows-shaped" even though it is technically not Windows. The root partition is Windows, with all of its drivers, its WMI, its event log, its Get-VM PowerShell cmdlets. The hypervisor underneath is a small, separate binary (hvix64.exe on Intel, hvax64.exe on AMD) that you almost never have a reason to think about. Microsoft itself goes further: in the same architecture document, it stresses that all device-model traffic flows through the root: "the management operating system hosts virtual service providers (VSPs) that communicate over the VMBus to handle device access requests from child partitions" (Microsoft Learn: Overview of Hyper-V [@ms-overview-hyper-v]).

This sets up the question the rest of the article answers: if the hypervisor is small, the guest is unmodified Windows or Linux, and the root partition owns the real devices, then how does a guest actually do disk and network I/O at gigabit-or-better speeds without paying enormous costs to traverse all of these boundaries?

The short answer is in three pieces: enlightenments (the guest knows it is virtualized and uses hypercalls), VMBus (the inter-partition transport), and the VSP/VSC pair (split drivers that share memory through VMBus rings). The next section starts with the first of those three.

2. Enlightenments: what "knowing you are virtualized" buys you

In the early 2000s, the dominant intuition was that a hypervisor's job is to fool the guest. A perfectly faithful emulation of an Intel 440BX motherboard, a DEC 21140 NIC, and an IDE controller is what made VMware Workstation a useful product in 1999. It is also what made Microsoft Virtual Server 2005 too slow to saturate gigabit links: every `out` instruction on a fake NIC port trapped to the hypervisor, was decoded against an in-memory chip model, and produced a synthetic interrupt that itself trapped on the way out. The Microsoft Virtual Server retrospective on Wikipedia [@wikipedia-virtual-server] notes that the architecture had no paravirtualization support and that performance was constrained relative to later hardware-assisted designs.

Hyper-V's answer was to drop the pretence. If the guest knows it is in a VM, it can use a fast path designed for VMs instead of pretending to drive imaginary chips. Microsoft calls this knowledge an enlightenment, and the Hyper-V feature discovery page [@ms-tlfs-feature-discovery] is the contract a guest uses to learn what enlightenments the hypervisor offers.

**Enlightenment**: a modification or feature in a guest operating system that takes advantage of running under a specific hypervisor. An enlightened guest detects the hypervisor (on x86, by reading the `cpuid` leaves at `0x40000000` and above), then opts in to using paravirtual interfaces (hypercalls, synthetic timers, synthetic interrupt controllers, shared TSC pages) instead of trapping on emulated hardware. An unenlightened guest would still boot, but it would run slower.
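The detection step in that definition can be sketched in a few lines. JavaScript cannot execute `cpuid`, so the sketch below only decodes register values that some native helper is assumed to have captured; the function names and the sample object are hypothetical, and the register values are the ones that spell Hyper-V's `Microsoft Hv` vendor signature.

```javascript
// Decode the 12-byte hypervisor vendor signature that CPUID leaf 0x40000000
// returns in EBX, ECX, EDX (four ASCII bytes per register, low byte first).
function decodeVendorSignature(ebx, ecx, edx) {
  let text = '';
  for (const reg of [ebx, ecx, edx]) {
    for (let shift = 0; shift < 32; shift += 8) {
      text += String.fromCharCode((reg >>> shift) & 0xff);
    }
  }
  return text;
}

// Hypothetical helper: true when the captured registers identify Hyper-V.
function looksLikeHyperV(regs) {
  return decodeVendorSignature(regs.ebx, regs.ecx, regs.edx) === 'Microsoft Hv';
}

// Assumed sample capture, not read from real hardware.
const sampleRegs = { ebx: 0x7263694d, ecx: 0x666f736f, edx: 0x76482074 };
console.log(decodeVendorSignature(sampleRegs.ebx, sampleRegs.ecx, sampleRegs.edx));
console.log(looksLikeHyperV(sampleRegs));
```

A guest that sees a different signature at that leaf is running under some other hypervisor, or under none, and should not assume any of the interfaces described here.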

Detection is the cheap part. The Linux kernel's Hyper-V overview document [@kernel-hyperv-overview] describes four cooperating mechanisms, layered atop one another: implicit traps that the hypervisor handles transparently, explicit hypercalls the guest issues on purpose, synthetic registers exposed as model-specific registers (MSRs) in the architectural CPU register file, and VMBus for high-bandwidth device traffic. Each layer builds on the one below it.

Key idea: The contract between Hyper-V and its guests is published. Microsoft maintains the Top-Level Functional Specification as a public document under the Open Specification Promise. That single decision is why Linux ships an in-tree Hyper-V driver stack and why VMBus is not a black box.

The hypercall page

The first thing an enlightened guest does is set up a hypercall page. The TLFS Hypercall Interface page [@ms-tlfs-hypercall] describes the dance: the guest writes its identity into HV_X64_MSR_GUEST_OS_ID (MSR 0x40000000), then writes a guest-physical address and an enable bit into HV_X64_MSR_HYPERCALL (MSR 0x40000001). The hypervisor responds by populating that page with the right opcode for the current CPU: vmcall on Intel, vmmcall on AMD. From that moment on, "make a hypercall" is a normal call into a known address rather than an opcode the kernel must hand-assemble per CPU vendor.

This trick neatly externalises the vendor-specific calling convention. Microsoft can later swap to a new opcode (say, on ARM64, where the equivalent is an HVC instruction) without any guest code change. The guest just learns the new page contents.
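The two MSR writes described above can be sketched as follows. This is a minimal illustration, assuming the layout in which bit 0 of HV_X64_MSR_HYPERCALL is the enable bit and bits 63:12 hold the guest page frame number; check the TLFS before relying on it. The `wrmsr` callback, the guest OS ID value, and the page address are hypothetical stand-ins, since JavaScript cannot write MSRs.

```javascript
const HV_X64_MSR_GUEST_OS_ID = 0x40000000;
const HV_X64_MSR_HYPERCALL = 0x40000001;
const PAGE_SIZE = 4096n;

// Build the value written to HV_X64_MSR_HYPERCALL: page frame number in the
// upper bits, enable flag in bit 0 (assumed layout, see lead-in).
function encodeHypercallMsr(pageGpa) {
  if (pageGpa % PAGE_SIZE !== 0n) {
    throw new RangeError('hypercall page must be 4 KiB aligned');
  }
  return pageGpa | 1n;
}

// Order matters: the guest identifies itself first, then enables the page.
function setupHypercallPage(wrmsr, guestOsId, pageGpa) {
  wrmsr(HV_X64_MSR_GUEST_OS_ID, guestOsId);
  wrmsr(HV_X64_MSR_HYPERCALL, encodeHypercallMsr(pageGpa));
}

// Record the writes instead of performing them (placeholder values).
const msrWrites = [];
setupHypercallPage((msr, value) => msrWrites.push([msr, value]), 0x1234n, 0x7f000n);
for (const [msr, value] of msrWrites) {
  console.log(`wrmsr 0x${msr.toString(16)} <- 0x${value.toString(16)}`);
}
```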

The same TLFS page documents two hypercall classes: simple hypercalls (one operation, returns or faults) and rep (repeated) hypercalls that take a counter and a start index, so a long-running operation can yield mid-flight without losing work. Three calling conventions exist: a memory-based one for large parameter blocks, a register-only fast variant for the very common case of one or two inputs, and an XMM-register variant that lets a guest pass up to 112 bytes of input through SSE registers.
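A rep hypercall's counter and start index travel in the 64-bit hypercall input value. The sketch below assumes the field positions I understand the TLFS to use (call code in bits 15:0, fast flag in bit 16, rep count in bits 43:32, rep start index in bits 59:48) and pairs them with a toy "hypervisor" that processes a limited number of elements per entry. The toy model, the `budget` parameter, and the call code used in the example are illustrative assumptions, not real hypervisor behaviour.

```javascript
// Pack the fields of a hypercall input value (assumed bit positions, see lead-in).
function encodeHypercallInput({ callCode, fast = false, repCount = 0, repStartIndex = 0 }) {
  if (callCode < 0 || callCode > 0xffff) throw new RangeError('call code is 16 bits');
  if (repCount < 0 || repCount > 0xfff) throw new RangeError('rep count is 12 bits');
  if (repStartIndex < 0 || repStartIndex > 0xfff) throw new RangeError('rep start index is 12 bits');
  return BigInt(callCode)
    | (fast ? 1n << 16n : 0n)
    | (BigInt(repCount) << 32n)
    | (BigInt(repStartIndex) << 48n);
}

// Toy model: handle at most `budget` elements per entry, then report progress.
function toyRepHypercall(input, budget) {
  const repCount = Number((input >> 32n) & 0xfffn);
  const start = Number((input >> 48n) & 0xfffn);
  const repsCompleted = Math.min(repCount, start + budget);
  return { repsCompleted, finished: repsCompleted === repCount };
}

// Re-issue the call from the reported index until every element is done.
function runToCompletion(callCode, repCount, budget) {
  if (budget < 1) throw new RangeError('budget must be at least 1');
  let repStartIndex = 0;
  let entries = 0;
  for (;;) {
    const input = encodeHypercallInput({ callCode, repCount, repStartIndex });
    const result = toyRepHypercall(input, budget);
    entries += 1;
    if (result.finished) return entries;
    repStartIndex = result.repsCompleted;
  }
}

console.log(runToCompletion(0x0002, 10, 4));
```

The point of the start index is that no work is repeated: each re-entry resumes where the previous one stopped.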

That XMM variant is unusual enough to flag. Most kernel ABIs do not touch SSE in privileged code because saving and restoring the full SSE state is expensive. Hyper-V's hypercall ABI uses XMM precisely because the round-trip cost of a hypercall is dominated by the VMEXIT itself, so squeezing a few more bytes into registers is cheaper than spilling them to memory and reading them back.

Synthetic interrupts and synthetic timers

A guest's virtual processor has its own emulated local APIC by default, but an enlightened guest can also use a Synthetic Interrupt Controller (SynIC), defined in the TLFS. Each virtual processor gets 16 SINT slots, a per-CPU shared message page, and a per-CPU shared event page. SINTs are how VMBus signals events to the guest without going through the legacy LAPIC fast path.

**SINT** (synthetic interrupt source): one of 16 logical interrupt sources per virtual processor that the Hyper-V Synthetic Interrupt Controller can signal. SINTs are reachable through MSRs (`HV_X64_MSR_SINT0` through `HV_X64_MSR_SINT15`) and back the doorbell mechanism for VMBus channels and for synthetic timers. They are paravirtual: they would not exist on a bare-metal CPU.
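As a rough illustration of how those sixteen MSRs are addressed, the sketch below computes the MSR index for a SINT slot and packs a SINT register value. It assumes `HV_X64_MSR_SINT0` is `0x40000090`, that the slots are consecutive, and that the register carries the interrupt vector in bits 7:0 and a mask flag in bit 16; treat those as my reading of the TLFS and the Linux headers rather than a substitute for them. The vector in the example is arbitrary.

```javascript
const HV_X64_MSR_SINT0 = 0x40000090; // assumed base index, see lead-in
const SINT_COUNT = 16;

// MSR index for SINT slot 0..15.
function sintMsr(index) {
  if (!Number.isInteger(index) || index < 0 || index >= SINT_COUNT) {
    throw new RangeError('SINT index must be 0..15');
  }
  return HV_X64_MSR_SINT0 + index;
}

// Pack a SINT register value: vector in the low byte, mask flag in bit 16.
function encodeSint({ vector, masked = false }) {
  if (!Number.isInteger(vector) || vector < 0 || vector > 0xff) {
    throw new RangeError('vector must fit in 8 bits');
  }
  return BigInt(vector) | (masked ? 1n << 16n : 0n);
}

console.log(`0x${sintMsr(2).toString(16)}`);
console.log(`0x${encodeSint({ vector: 0x30, masked: true }).toString(16)}`);
```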

The clock side is even more interesting. The Linux kernel Hyper-V clocks documentation [@kernel-clocks] describes a reference TSC page that the hypervisor maintains in shared memory: it contains a scale factor and an offset such that

$$ \text{guest\_time} = \big( (\text{TSC} \times \text{scale}) \gg 64 \big) + \text{offset} $$

ticks at a constant 10 MHz frequency regardless of the underlying TSC. The guest's clock_gettime and gettimeofday can read TSC, multiply, shift, add, and return, all in user space via vDSO, with no kernel transition and no hypercall.

A web server that calls `clock_gettime` once per request, on a million-requests-per-second box, is a ridiculous workload that real systems run constantly. Without enlightenments, every call would be a `rdmsr` on a virtualised TSC or a trap into the hypervisor. With the reference TSC page, the same call is four arithmetic ops and a memory load. The kernel doc explains that this scale and offset survive live migration: "in the case of a live migration to a host with a different TSC frequency, Hyper-V adjusts the scale and offset values in the shared page so that the 10 MHz frequency is maintained" (Linux kernel: Hyper-V clocks [@kernel-clocks]).
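The arithmetic is small enough to write out. The sketch below uses BigInt for the 128-bit intermediate product and derives a scale factor from an assumed TSC frequency, to show why two hosts with different TSC rates still produce the same 10 MHz reference time. The helper names and the frequencies are illustrative; a real reader of the page would also need to check the page's validity and sequence fields, which this sketch omits.

```javascript
const REFERENCE_HZ = 10_000_000n; // the 10 MHz reference frequency

// Scale factor such that (tsc * scale) >> 64 counts 10 MHz ticks.
function scaleForTscFrequency(tscHz) {
  return (REFERENCE_HZ << 64n) / tscHz;
}

// guest_time = ((TSC * scale) >> 64) + offset
function referenceTime(tsc, scale, offset) {
  return ((tsc * scale) >> 64n) + offset;
}

// One second of TSC ticks on two hypothetical hosts.
for (const tscHz of [2_500_000_000n, 3_000_000_000n]) {
  const scale = scaleForTscFrequency(tscHz);
  console.log(`${tscHz} Hz host: ${referenceTime(tscHz, scale, 0n)} reference ticks`);
}
```

Both hosts land within one tick of ten million; the small shortfall comes from integer truncation of the scale factor. After a live migration, the offset is what lets the reference time continue from its pre-migration value rather than restart.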

Synthetic timers complete the picture. Each virtual CPU has four synthetic timers programmable via MSRs; they fire SINTs into the SynIC. The guest does not need to touch an emulated PIT or HPET. Combined, SynIC + synthetic timers + the reference TSC page mean that an enlightened guest can do most of its time-keeping and inter-partition signalling without ever touching the legacy interrupt/timer chip surface.

The TLFS as a contract

All of this is published. The Top-Level Functional Specification [@ms-tlfs] is the document a guest author reads to know which MSRs to write, which cpuid leaves to query, which hypercalls exist, and which features the hypervisor signals via feature flags. Microsoft maintains it under the Open Specification Promise. That promise is a deliberate contractual choice. Without it, Linux could not ship drivers/hv/ in-tree and Microsoft could not credibly claim that Linux is a first-class Hyper-V guest. The TLFS is the artefact that makes the rest of the architecture cooperative rather than reverse-engineered.

The next layer up uses these primitives to build something more ambitious: a general-purpose inter-partition transport.

3. VMBus: the inter-partition transport

If enlightenments are the alphabet, VMBus is the language that synthetic devices speak. The Linux kernel VMBus document [@kernel-vmbus] puts the definition tersely: "VMBus is a software construct provided by Hyper-V to guest VMs. It consists of a control path and common facilities used by synthetic devices that Hyper-V presents to guest VMs. The common facilities include software channels for communicating between the device driver in the guest VM and the synthetic device implementation that is part of Hyper-V, and signaling primitives to allow Hyper-V and the guest to interrupt each other."

There is a lot in that paragraph. Let me unpack it, because this is the architectural core.

**VMBus**: a software-only inter-partition communication bus provided by Hyper-V. It has a control path (channel offer, open, close, rescind) and per-device data channels built on shared-memory ring buffers. VMBus is not a real bus in any hardware sense; nothing on the PCIe topology is named VMBus. It is a contract between guest drivers and the hypervisor.

Channels and the offer protocol

Every synthetic device a guest sees corresponds to a VMBus channel. The root partition advertises (OfferChannel) the list of devices a guest is permitted to use. The guest's VMBus driver iterates the offers, matches each to a class GUID (synthetic SCSI is one GUID, synthetic NIC is another, the input-style vmbusrhid device is a third), and binds an in-kernel device driver to each one. The reverse operation, RescindChannel, lets the host revoke a device cleanly, which is what happens during live migration when an SR-IOV virtual function gets pulled out from under a running VM.

```mermaid
sequenceDiagram
    participant Root as Root partition (VSP)
    participant HV as Hyper-V hypervisor
    participant Guest as Guest VM (VSC)
    Root->>HV: OfferChannel(class_guid, instance_guid)
    HV->>Guest: ChannelOffer message via SynIC
    Guest->>HV: OpenChannel(ringbuf_gpa, signal_event)
    HV->>Root: Channel opened
    loop steady-state I/O
        Guest->>Root: write descriptor + payload to ring, signal SINT
        Root->>Guest: write response to ring, signal SINT
    end
    Root->>HV: RescindChannel(instance_guid)
    HV->>Guest: ChannelRescind via SynIC
    Guest->>Root: CloseChannel
```
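The guest side of the offer and rescind flow is essentially a lookup table keyed by class GUID. The sketch below models only that bookkeeping. The class name, the method names, and the string identifiers are hypothetical; in particular, `class-nic` and `class-scsi` stand in for the real class GUIDs, which are not reproduced here.

```javascript
// Toy model of a guest VMBus driver core: match offers to drivers by class,
// remember which instance is bound to what, and drop bindings on rescind.
class ToyVmbusGuest {
  constructor(driversByClass) {
    this.driversByClass = driversByClass; // Map: class id -> driver name
    this.bound = new Map();               // Map: instance id -> driver name
  }

  // Returns the driver bound to the offer, or null if no driver claims it.
  handleOffer(offer) {
    const driver = this.driversByClass.get(offer.classId);
    if (driver === undefined) return null;
    this.bound.set(offer.instanceId, driver);
    return driver;
  }

  // Returns true if a bound device was removed.
  handleRescind(instanceId) {
    return this.bound.delete(instanceId);
  }
}

const guest = new ToyVmbusGuest(new Map([
  ['class-nic', 'hv_netvsc'],   // placeholder for the synthetic NIC class GUID
  ['class-scsi', 'hv_storvsc'], // placeholder for the synthetic SCSI class GUID
]));

guest.handleOffer({ classId: 'class-nic', instanceId: 'nic-0' });
guest.handleOffer({ classId: 'class-scsi', instanceId: 'disk-0' });
guest.handleOffer({ classId: 'class-unknown', instanceId: 'x-0' }); // ignored
guest.handleRescind('nic-0'); // e.g. the host revokes the device
console.log([...guest.bound.entries()]);
```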

Two ring buffers, one channel

Each open channel is two unidirectional ring buffers in shared memory: one for guest-to-host messages, one for host-to-guest. Each ring has a 4 KiB header page that holds the read index, the write index, and control flags, plus a power-of-two payload region. The guest tells the hypervisor which guest-physical pages back the ring through an object called a GPA Descriptor List (GPADL), built up via the vmbus_establish_gpadl API.

The kernel doc reveals a small but durable engineering detail. The guest maps the ring buffer into contiguous kernel virtual address space in three parts: the header page first, the ring contents next, and then the ring contents a second time. Why? Because that lets a copy loop walk past the end of the ring without writing wrap-around code; the next byte after the ring's last byte is the ring's first byte, by virtual-memory arrangement. It is a variant of the mirrored ring buffer trick (sometimes called a "magic ring buffer") found in other high-throughput software. It costs a little address space; it saves wrap-around handling on every copy.

The Linux kernel doc is explicit that this double-mapping convenience exists in the guest only. If you are writing a userspace tool that ingests a captured VMBus ring (for forensics or debugging) you must implement wrap-around manually. This is exactly the kind of detail that source code documentation captures and prose articles forget.
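For a tool that has only a flat capture of the ring contents, the wrap-around has to be done by hand. The sketch below contrasts the two approaches on a tiny ring: a two-part copy over a single mapping, and a plain slice over a doubled copy that imitates the kernel's second mapping. The function names are hypothetical, and the doubled array is an ordinary copy rather than a true second virtual mapping of the same pages.

```javascript
// Read `length` bytes starting at `readIndex`, wrapping at the end of the ring.
function readFromRing(ring, readIndex, length) {
  if (readIndex < 0 || readIndex >= ring.length) throw new RangeError('bad read index');
  if (length < 0 || length > ring.length) throw new RangeError('bad length');
  const out = new Uint8Array(length);
  const firstPart = Math.min(length, ring.length - readIndex);
  out.set(ring.subarray(readIndex, readIndex + firstPart), 0); // up to the end
  out.set(ring.subarray(0, length - firstPart), firstPart);    // wrapped remainder
  return out;
}

// With the ring contents laid out twice back to back, one slice is enough.
function readFromMirroredRing(mirrored, readIndex, length) {
  return mirrored.slice(readIndex, readIndex + length);
}

const ring = Uint8Array.from([0, 1, 2, 3, 4, 5, 6, 7]);
const mirrored = new Uint8Array(ring.length * 2);
mirrored.set(ring, 0);
mirrored.set(ring, ring.length);

console.log(Array.from(readFromRing(ring, 6, 4)));
console.log(Array.from(readFromMirroredRing(mirrored, 6, 4)));
```

Both calls return the same four bytes; the mirrored version simply has no branch for the wrap.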

The total amount of GPADL-shared memory a single guest can hold is capped per Windows version. The kernel doc records the numbers: roughly 1280 MiB on Windows Server 2019 and later, roughly 384 MiB on earlier hosts (Linux kernel: VMBus [@kernel-vmbus]). For a guest with 30+ channels (multiple netvsc subchannels, multiple storvsc subchannels, vPCI, KVP, time sync, VSS, balloon, framebuffer), that ceiling is real but not yet limiting at typical ring sizes of 1 to 16 MiB per direction.
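A quick back-of-the-envelope check makes the headroom concrete. The sketch below counts only the two ring payload regions per channel; it ignores header pages and any other GPADL-backed buffers, and it assumes every channel uses the same ring size, which real guests generally do not. The channel count of 30 is the figure from the paragraph above.

```javascript
const LIMIT_WS2019_AND_LATER_MIB = 1280; // approximate limit on newer hosts
const LIMIT_EARLIER_HOSTS_MIB = 384;     // approximate limit on earlier hosts

// Two unidirectional rings per channel, same size in each direction.
function ringFootprintMiB(channelCount, ringMiBPerDirection) {
  return channelCount * 2 * ringMiBPerDirection;
}

for (const ringMiB of [1, 4, 16]) {
  const total = ringFootprintMiB(30, ringMiB);
  console.log(
    `${ringMiB} MiB rings: ${total} MiB total,`,
    `within newer limit: ${total <= LIMIT_WS2019_AND_LATER_MIB},`,
    `within earlier limit: ${total <= LIMIT_EARLIER_HOSTS_MIB}`
  );
}
```

Under these assumptions, 30 channels need 60 MiB at 1 MiB per direction and 960 MiB at 16 MiB per direction, so the upper end of the range fits the newer limit but would not fit the earlier one if every channel were sized that large.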

The doorbell

Shared memory alone is not enough. The guest can write into the ring all it wants; the host will not look until it is told to. Conversely, the host can write into the ring; the guest will not check until something signals it. That signal is the doorbell, and it is implemented via the Synthetic Interrupt Controller SINTs introduced in the previous section.

When the guest enqueues a request and the host's read pointer is already chasing it (i.e., the host is still processing the last batch), the guest can suppress the doorbell entirely. Only the first request after the host has caught up triggers a hypercall. This is interrupt coalescing in software, and it is the single most important performance lever on a software data plane: the round-trip cost of a VMEXIT is amortised across many packets.
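The suppression rule can be modelled with a counter. The sketch below is a deliberately simplified stand-in: it rings the doorbell only when a message lands in an empty queue, which approximates "the host has caught up". A real VMBus ring tracks read and write indices and an interrupt-mask field in the ring header instead; the class and its method names are hypothetical.

```javascript
// Toy producer/consumer queue that counts how many doorbells it would ring.
class ToyDoorbellRing {
  constructor() {
    this.pending = [];
    this.doorbells = 0;
  }

  // Guest side: signal only on the empty -> non-empty transition.
  enqueue(message) {
    const hostHadCaughtUp = this.pending.length === 0;
    this.pending.push(message);
    if (hostHadCaughtUp) this.doorbells += 1;
  }

  // Host side: consume everything that is currently queued.
  drain() {
    const batch = this.pending;
    this.pending = [];
    return batch;
  }
}

const txRing = new ToyDoorbellRing();
['pkt1', 'pkt2', 'pkt3'].forEach((p) => txRing.enqueue(p)); // one doorbell
txRing.drain();                                             // host catches up
['pkt4', 'pkt5'].forEach((p) => txRing.enqueue(p));         // one more doorbell
console.log(`messages: 5, doorbells: ${txRing.doorbells}`);
```

The busier the producer, the larger each batch and the fewer doorbells per message, which is the amortisation the paragraph describes.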

Note: This same shape, shared memory rings plus an event-channel doorbell, was the central insight of Xen's split-driver paravirtualization model in 2003 [@xen-pv-wiki]. Hyper-V's contribution was not the shape; it was packaging the shape so unmodified Windows guests could use it via in-box drivers, and publishing the protocol so unmodified Linux could too.

VSPs and VSCs

The two endpoints of a channel have specific names. The Virtualization Service Provider (VSP) is the kernel module in the root partition that owns the device backend. The Virtualization Service Client (VSC) is the guest-side driver that talks to the VSP through the channel. Microsoft's own architecture page is precise: "the Hyper-V-specific I/O architecture consists of virtualization service providers (VSPs) in the root partition and virtualization service clients (VSCs) in the child partition. Each service is exposed as a device over VM Bus, which acts as an I/O bus and enables high-performance communication between VMs that use mechanisms such as shared memory" (Microsoft Learn: Hyper-V architecture [@ms-hyperv-architecture-perf]).

**VSP** (Virtualization Service Provider): a kernel module in the root partition that exposes a synthetic device backend to guests over a VMBus channel. Examples: `vmswitch.sys` (synthetic NIC), `storvsp.sys` (synthetic SCSI), the `vmbusrhid` server (synthetic input). **VSC** (Virtualization Service Client): the matching driver in the guest that consumes the channel and presents an OS-native device interface (a NIC, a SCSI controller, a keyboard) to the rest of the kernel.

The split is symmetric in transport (both sides use the same ring) but asymmetric in trust. The VSP runs in the most privileged context on the box, the root partition's kernel. The VSC runs in a normal guest kernel. Every byte that flows from guest to host crosses a trust boundary and gets parsed by code with full system privilege. The next two sections will return to this fact at length, because it is where the security story lives.

Why this works for closed-source guests

The Xen project tried something similar in 2003 with netfront/blkfront rings and event channels, but Xen PV required a paravirtualised guest kernel: the guest had to know it was running on Xen at compile time. Closed-source guests like Windows could not be modified, so Xen's wiki [@xen-pv-wiki] documents PV-on-HVM as a workaround.

Hyper-V finessed this with hardware virtualization. The guest kernel runs unmodified inside VT-x or AMD-V; CPU-level privilege separation handles the privileged instructions. The only thing the guest needs to do to opt into VMBus is load a driver. Every supported Windows version since Windows 7 / Server 2008 R2 ships those drivers in-box. Linux ships them in-tree from kernel 2.6.32 onward. There is no separate "install paravirt drivers" step, which is why Hyper-V "just works" for almost any guest you point at it.

The transport is settled. What rides on it is a catalogue.

4. Synthetic device classes: storage, network, input, video, vPCI

A modern Hyper-V guest, on first boot, sees a small zoo of devices that have nothing to do with PC hardware. There is no IDE controller, no PS/2 keyboard, no Cirrus VGA. There is a synthetic SCSI controller, a synthetic NIC, a synthetic keyboard and mouse, a synthetic framebuffer, and (often) a synthetic PCI passthrough channel. Each is a VSP/VSC pair on top of VMBus.

The Linux kernel VMBus document [@kernel-vmbus] enumerates the catalogue: synthetic SCSI controller (storvsc), synthetic NIC (netvsc), synthetic framebuffer (synthvid), synthetic keyboard, synthetic mouse, PCI passthrough, plus the non-device services: heartbeat, time sync, shutdown, memory balloon, KVP exchange, and online backup (VSS).

```mermaid
flowchart LR
    subgraph Guest
        nv["netvsc (NIC)"]
        st["storvsc (SCSI)"]
        sv["synthvid (framebuffer)"]
        kb["hyperv-keyboard"]
        ms["hyperv-mouse"]
        pc["pci-hyperv (vPCI)"]
        kvp["hv_kvp (KVP)"]
        ts["hv_utils (timesync, shutdown, heartbeat)"]
    end
    subgraph Root
        vsw["vmswitch.sys"]
        sto["storvsp.sys"]
        sfb["synthvid VSP"]
        rhid["vmbusrhid VSP"]
        vpci["vPCI VSP"]
        kvpd["KVP daemon"]
        tsd["IS daemons"]
    end
    nv -- "VMBus channel" --- vsw
    st -- "VMBus channel(s)" --- sto
    sv -- "VMBus channel" --- sfb
    kb -- "VMBus channel" --- rhid
    ms -- "VMBus channel" --- rhid
    pc -- "VMBus channel" --- vpci
    kvp -- "VMBus channel" --- kvpd
    ts -- "VMBus channel" --- tsd
```

Synthetic SCSI: storvsc

The storvsc VSC presents itself to the guest as a SCSI host bus adapter. Disks attached to the VM appear as SCSI LUNs hanging off that HBA. The wire protocol uses ring buffers carrying SRB (SCSI Request Block) style commands. To scale, storvsc can open multiple sub-channels, one per host CPU, so that I/O completion interrupts and request submission spread across cores rather than serialising on a single VMBus channel.
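The effect of sub-channels is easiest to see as a mapping from the submitting CPU to a channel. The sketch below uses a plain modulo mapping, which is an assumption made for illustration; the actual drivers choose channels with their own policy, and the channel names here are hypothetical.

```javascript
// Pick the channel a given CPU submits on (illustrative modulo policy).
function pickChannel(channels, cpu) {
  if (channels.length === 0) throw new RangeError('no channels open');
  return channels[cpu % channels.length];
}

// Count how many submissions each channel receives for a list of CPUs.
function spreadAcrossChannels(channels, cpus) {
  const counts = new Map(channels.map((name) => [name, 0]));
  for (const cpu of cpus) {
    const name = pickChannel(channels, cpu);
    counts.set(name, counts.get(name) + 1);
  }
  return counts;
}

const openChannels = ['primary', 'sub1', 'sub2', 'sub3'];
const submittingCpus = [0, 1, 2, 3, 4, 5, 6, 7];
console.log(Object.fromEntries(spreadAcrossChannels(openChannels, submittingCpus)));
console.log(Object.fromEntries(spreadAcrossChannels(['primary'], submittingCpus)));
```

With four channels, eight CPUs split evenly across them; with only the primary channel, all eight contend for the same ring.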

This is also why Hyper-V's "Generation 2" VMs work. A Generation 2 VM [@ms-gen1-gen2-vms], introduced in Windows Server 2012 R2 in 2013, has no IDE controller in the boot path at all. UEFI loads the OS loader from a synthetic SCSI device, the OS loader hands off to the kernel, and the kernel binds storvsc to the same device. The legacy IDE emulator simply never runs. That removes a lot of attack surface and lets boot volumes grow up to 64 TB on VHDX.

Synthetic NIC: netvsc

netvsc is the synthetic NIC. The wire protocol historically wrapped Microsoft's NDIS-style RNDIS frames around payloads sent through the channel ring, which is why some Linux discussions mention "RNDIS frames over VMBus." The Linux driver lives in drivers/net/hyperv/ and the kernel netvsc documentation [@kernel-netvsc] describes how it can spread receive-side traffic across multiple VMBus subchannels via Receive Side Scaling.

netvsc is also the one device class where Hyper-V composes with hardware passthrough. Section 8 will take this apart in detail; for now, note that the same netvsc VSC can run alongside an SR-IOV virtual function in the guest, with netvsc acting as the slow-path failover and the VF carrying the steady-state traffic.

Synthetic input: vmbusrhid

The synthetic keyboard, the synthetic mouse, and a few related input streams ride on a server in the root partition called vmbusrhid (the name is shorthand for "VMBus relay HID"). It is a small surface in bytes, but architecturally it has the same shape as netvsc: guest-controllable messages parsed in kernel mode in the root partition. Anyone evaluating the trust boundary should treat it the same way as netvsc, even though the data rate is six orders of magnitude lower.

Note: A path that carries 100 keystrokes per second is, on the wire, almost free. As an attack surface, it is identical to a path that carries a million packets per second: both are guest-controlled bytes parsed by privileged code. Section 7 walks through why the security community treats vmbusrhid the way it treats vmswitch.sys.

Synthetic video: synthvid

synthvid is a synthetic framebuffer. It is what lets you connect to a Hyper-V VM through the Virtual Machine Connection client without dragging in an emulated VGA. It is intentionally simple: there is no 3D acceleration in the synthetic path. Workloads that need GPU acceleration use a different mechanism, vPCI / DDA, to assign a real GPU to the VM.

vPCI: synthetic PCI passthrough

The most subtle device class is pci-hyperv, which exposes a virtual PCIe topology to the guest. The Linux kernel vPCI document [@kernel-vpci] describes the trick: a passthrough device is offered to the guest initially over VMBus (the channel carries the device's PCI configuration space and BARs), and once the guest's vPCI driver has constructed a real PCI device object for it, the device has a dual identity: it is both a VMBus device and a normal PCIe device. The vendor driver can then load against it.

This is the mechanism behind both Hyper-V's Discrete Device Assignment (DDA) [@ms-dda] and Azure's Accelerated Networking, which we will return to in Section 8. The DDA planning document is explicit that Microsoft formally supports DDA for GPUs and NVMe storage as device classes; other PCIe devices are "likely to work" but require vendor support.

Generation-1 vs Generation-2: a quick decoder

Putting the device classes side by side clarifies why the move from Generation-1 to Generation-2 VMs simplified so much:

| Element | Generation-1 VM (legacy) | Generation-2 VM (since 2013) |
| --- | --- | --- |
| Firmware | BIOS | UEFI with Secure Boot |
| Boot disk | Emulated IDE | Synthetic SCSI (`storvsc`) |
| Network on boot | Emulated DEC 21140 fallback | Synthetic NIC (`netvsc`) |
| Input | Emulated PS/2 + `vmbusrhid` | `vmbusrhid` only |
| Display | Emulated VGA + `synthvid` | `synthvid` only |
| Max boot VHDX | 2 TB | 64 TB |
| Source | Microsoft Learn: Gen 1 vs Gen 2 [@ms-gen1-gen2-vms] | Same |

Generation-2 is what the Hyper-V architecture wanted to be from the beginning: an all-synthetic stack with no fallback to imaginary 1990s chipsets. The two-generation existence was not a design preference; it was the cost of supporting older operating systems whose boot loaders only knew about BIOS and IDE. Today, every modern Windows and modern Linux supports Generation-2; Generation-1 remains for legacy guests.

Counting boundary crossings

The shape of the hot path is now visible. To send one network packet from a guest:

- The guest writes one descriptor and one payload copy into the netvsc TX ring (one memory copy).
- The guest possibly fires a doorbell (one hypercall, often suppressed if the host has not caught up).
- The host's vmswitch.sys reaps the descriptor, parses it, and forwards it through the virtual switch to a real NIC.

A single packet's hot path is at most one hypercall and one memory copy in the guest, plus host-side ring traversal. Section 8's comparison table will quantify how this stacks up against virtio and SR-IOV, but the scale is clear: paravirt I/O on Hyper-V is orders of magnitude cheaper per packet than full PC emulation, and the gap closes only when you go all the way to hardware passthrough.

The catalogue is set. Now, who actually wrote the Linux side of all this?

5. Linux Integration Services: Microsoft writes Linux drivers

In December 2009, Microsoft did something quietly historic. Linux kernel 2.6.32 merged a set of drivers under drivers/staging/hv/, contributed by Microsoft itself, that taught the Linux kernel to be an enlightened Hyper-V guest. The kernel.org Hyper-V index page [@kernel-hyperv-index] is the maintained landing page for that work. Over the next several releases the drivers moved out of staging/, settled at drivers/hv/, drivers/net/hyperv/, drivers/scsi/storvsc_drv.c, and drivers/pci/controller/pci-hyperv.c, and became the default in every mainstream distribution.

That set of drivers is collectively called Linux Integration Services (LIS).

**Linux Integration Services (LIS)**: the set of in-kernel Hyper-V guest drivers that Microsoft contributes to upstream Linux. It includes `hv_vmbus` (the VMBus core), `hv_netvsc` (synthetic NIC), `hv_storvsc` (synthetic SCSI), `hv_utils` (KVP, time sync, shutdown, heartbeat, VSS), `pci-hyperv` (vPCI), and `hv_balloon` (memory ballooning). The same code that Microsoft maintains in the Linux tree powers Linux guests on Hyper-V on Windows Server, on Azure, and on developer Hyper-V on Windows 11.

The reason this matters is bigger than convenience. In 2009, Linux had a long, painful history with Hyper-V's competitors. VMware shipped open-vm-tools but the deepest paravirt drivers (VMXNET3, PVSCSI) lived in vendor packages. Xen's PV drivers existed in-tree but their evolution depended on Citrix and the Xen project. By contributing the full driver stack upstream and committing to keep it there, Microsoft chose a different route: they put the spec (the TLFS) and the implementation (LIS) in the open at the same time.

Microsoft did not just publish a hypervisor specification and hope Linux would adopt it. They wrote the Linux drivers themselves and upstreamed them, and then they kept doing it for fifteen years.

You can see the maintenance pattern in any current kernel. The drivers/hv/ directory has continuous commit activity from Microsoft engineers. Kernel-doc files like the VMBus [@kernel-vmbus], clocks [@kernel-clocks], vPCI [@kernel-vpci], overview [@kernel-hyperv-overview], and CoCo VM [@kernel-coco] pages are written by the same engineers who write the drivers. Several of those documents are the most lucid descriptions of the architecture that exist anywhere in public.

One unexpected consequence: the Linux kernel docs are often easier to read for the architecture than Microsoft's own customer-facing docs. The customer docs answer "how do I configure this?"; the kernel docs answer "what is actually happening?" When researching this article, I found that the cleanest single description of VMBus channel lifecycle is the Linux kernel doc, not the TLFS.

What "in-box" really means

Both major guests now ship VMBus support without any post-install step:

- On Windows, the VMBus client stack is built into every supported Windows version since Windows 7 / Windows Server 2008 R2. The legacy Integration Services package, which once shipped as an ISO you mounted into the VM, is no longer needed on supported Windows.
- On Linux, the drivers are in-tree from kernel 2.6.32 (December 2009) onward and ship in every mainstream distro.

The kernel.org Hyper-V overview document [@kernel-hyperv-overview] explicitly warns against installing legacy LIS packages on top of a kernel that already has the in-tree drivers: it can break MSI-X handling and PCI passthrough. This is the kind of operational footgun that survives precisely because the in-box answer is correct and the LIS package is a holdover from earlier kernels.

A practical smoke test

You can confirm a Linux guest is using its enlightenments without any vendor tooling. The kernel exposes cpuid leaves and Hyper-V detection through dmesg and through /sys. A small script makes it concrete:

```javascript
// This logic mirrors what `dmesg | grep -i hyperv` and a peek into
// /sys/devices/virtual/misc/vmbus would tell you on a real Linux Hyper-V guest.
// The observations below are hard-coded sample values, not read from a live system.

const guestObservations = {
  cpuidLeaf: 0x40000000,        // first hypervisor CPUID leaf
  cpuidVendor: 'Microsoft Hv',  // vendor signature Hyper-V reports at that leaf
  guestOsIdMsr: 0x40000000,     // HV_X64_MSR_GUEST_OS_ID, written by the guest
  hypercallMsr: 0x40000001,     // HV_X64_MSR_HYPERCALL, locates the hypercall page
  vmbusModuleLoaded: true,
  netvscDevice: '/sys/class/net/eth0/device/driver',
  netvscDriverName: 'hv_netvsc',
  storvscModuleLoaded: true,
};

// True only if the hypervisor is Hyper-V and both synthetic drivers are in use.
function isEnlightenedHyperVGuest(o) {
  if (o.cpuidVendor !== 'Microsoft Hv') return false;
  if (!o.vmbusModuleLoaded) return false;
  if (o.netvscDriverName !== 'hv_netvsc') return false;
  if (!o.storvscModuleLoaded) return false;
  return true;
}

console.log(
  isEnlightenedHyperVGuest(guestObservations)
    ? 'Yes: Hyper-V enlightened, using netvsc + storvsc'
    : 'No: running on emulated PC hardware or non-Hyper-V hypervisor'
);
```

The point is not the script itself (anyone can write a few lines of awk against dmesg); it is that the verification surface is public. The hypervisor vendor signature, the MSRs, the kernel module names, and the /sys paths are all documented. There is nothing to reverse-engineer.
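As one concrete piece of that public surface, CPUID leaf 0x40000000 returns a 12-byte hypervisor vendor signature packed into EBX, ECX, and EDX, and on Hyper-V it spells "Microsoft Hv". The sketch below decodes three register values into that string. The function name is hypothetical, and the register values are hard-coded (they are simply the ASCII bytes of the signature); on a real guest you would read them with a CPUID tool.

```javascript
// Decode the hypervisor vendor signature returned by CPUID leaf 0x40000000.
// Each 32-bit register carries four ASCII bytes, least-significant byte first.
function decodeHypervisorSignature(ebx, ecx, edx) {
  let out = '';
  for (const reg of [ebx, ecx, edx]) {
    for (let shift = 0; shift < 32; shift += 8) {
      out += String.fromCharCode((reg >>> shift) & 0xff);
    }
  }
  return out;
}

// Hard-coded illustration values: the ASCII bytes of "Microsoft Hv".
const signature = decodeHypervisorSignature(0x7263694d, 0x666f736f, 0x76482074);
console.log(signature === 'Microsoft Hv' ? 'Hyper-V detected' : `Other hypervisor: ${signature}`);
```

The same decoding works for other hypervisors' signatures, which is why a guest can tell Hyper-V apart from its peers before loading any driver.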

Why this earned trust

Two pieces of practical evidence persuaded the Linux community that LIS was not a strategic trap:

The drivers stayed upstream. From 2009 to the present, Microsoft has maintained the drivers/hv/ tree, responded to maintainer feedback, and shipped patches through the normal kernel process.
The TLFS stayed accurate. Successive Hyper-V releases either matched what the TLFS said or updated the TLFS. There was no second, secret protocol.

The combination put Microsoft in the unusual position of being the most open hypervisor vendor for Linux guest support. (VirtIO on KVM has a richer cross-vendor story; that comparison is Section 8.) This open posture is also what set up the 2024 OpenVMM open-sourcing as a credible move rather than a stunt.

But before we get to OpenVMM, we need to look at a different way Hyper-V matters: not just as a substrate for VMs, but as a substrate for in-VM security boundaries inside Windows itself.

6. VBS and HVCI: Hyper-V as the trust anchor inside Windows

Up to this point the article has treated Hyper-V as a virtualization product: a thing that hosts VMs. Starting in Windows 10 and Windows Server 2016 [@ms-server-2016], Microsoft began using the same hypervisor for a different job: enforcing security boundaries inside a single OS install. The umbrella name is Virtualization-Based Security (VBS).

The mechanism is simple in description and subtle in consequences. The hypervisor splits a single guest's address space into two Virtual Trust Levels (VTLs). The lower one, VTL0, runs the normal Windows kernel and user mode (this is where explorer.exe and your browser live). The higher one, VTL1, runs a much smaller stack called the Secure Kernel plus a set of isolated user-mode services called trustlets. A compromise of VTL0, even of ntoskrnl.exe, cannot read or write VTL1 memory because the hypervisor enforces that boundary using the same hardware machinery (Intel EPT / AMD NPT, plus Intel VT-d / AMD-Vi for DMA) that it uses to isolate one VM from another.

Virtual Trust Level (VTL): A Hyper-V construct that partitions a single guest's address space into multiple privilege tiers enforced by the hypervisor. VTL0 hosts the normal kernel and user mode; VTL1 hosts the Secure Kernel and trustlets. The hypervisor presents each VTL with its own separate set of memory mappings, system registers, and interrupt state, so code running at VTL0 cannot read VTL1's memory even if it runs as NT AUTHORITY\SYSTEM.

```mermaid
flowchart TD
    HV["Hyper-V hypervisor"]
    subgraph Guest["A single Windows guest"]
        subgraph VTL0["VTL0 (normal world)"]
            User0["User mode: apps"]
            Kernel0["NT kernel"]
        end
        subgraph VTL1["VTL1 (secure world)"]
            SK["Secure Kernel"]
            Trustlets["Trustlets: LSAIso, BIOiso, ..."]
        end
    end
    HV --> Guest
    HV -. "EPT + IOMMU enforcement" .-> VTL0
    HV -. "EPT + IOMMU enforcement" .-> VTL1
    Kernel0 -. "VTL switch (hypercall)" .-> SK
```

What lives in VTL1

The flagship inhabitant of VTL1 is Hypervisor-protected Code Integrity (HVCI), which moves kernel-mode code integrity checking into the Secure Kernel. With HVCI on, no VTL0 driver can mark a kernel page as both writable and executable; the Secure Kernel mediates the page tables and refuses the request. The result is that attackers who already have code execution in the NT kernel cannot trivially load arbitrary unsigned kernel code or build new executable JIT pages on the fly.
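The write-xor-execute rule in that paragraph can be illustrated with a toy mediator. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, the "pages" are just map entries, and nothing here reflects the Secure Kernel's real interface. It shows only the shape of the policy: lower-trust code asks, and the higher-trust owner of the permissions decides.

```javascript
// Toy model of a W^X policy: a mediator owns the page permissions and
// refuses any request that would make a page writable and executable.
// All names are hypothetical; this is not the Secure Kernel's interface.
class PagePermissionMediator {
  constructor() {
    this.pages = new Map(); // page number -> { write, execute }
  }

  // Called on behalf of lower-trust code; returns true if the change is applied.
  requestPermissions(pageNumber, { write, execute }) {
    if (write && execute) {
      return false; // a writable and executable page is never granted
    }
    this.pages.set(pageNumber, { write, execute });
    return true;
  }
}

const mediator = new PagePermissionMediator();
console.log(mediator.requestPermissions(0x1000, { write: true, execute: false })); // data page
console.log(mediator.requestPermissions(0x2000, { write: false, execute: true })); // code page
console.log(mediator.requestPermissions(0x3000, { write: true, execute: true }));  // W+X request
```

The design point is that the check lives with the component that owns the permissions, so compromising the requester does not let the attacker skip it.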

The other tenants of VTL1 are trustlets. The most familiar is lsaiso.exe (LSA Isolation), which holds the cached domain credentials that historically lived in lsass.exe and were the prime target for tools like Mimikatz. With Credential Guard on, those secrets move to a trustlet whose memory is unreadable from VTL0; even SYSTEM-level malware in the normal world cannot extract them. Other trustlets handle biometric template storage, key isolation for code integrity policy, and similar small, security-sensitive workloads.

Why the hypervisor is the right place for this

Putting these protections inside the hypervisor rather than inside the kernel has a property that no in-kernel mitigation can match: the protected component does not share an address space with the attacker. A defence built inside ntoskrnl.exe (PatchGuard, KASLR, control-flow guard) lives in the same memory the attacker is trying to corrupt. A defence built inside VTL1 lives in memory the attacker cannot touch, because the page tables that map it are themselves invisible from VTL0.

Note: Pre-VBS Windows had decades of memory-safety bugs in the NT kernel. After VBS, exploiting one of those bugs no longer immediately yields the attacker the ability to read LSASS secrets or load arbitrary kernel code. The attacker now needs a second bug, in the much smaller Secure Kernel codebase. The defender's effective budget went up by a large multiplier without rewriting a single line of NT.

How this connects back to VMBus

VBS would not be possible without the work the previous sections described. The Secure Kernel is what runs in VTL1; it needs to communicate with VTL0 for ordinary system services (the lsaiso.exe process must respond to authentication requests from VTL0 callers, the HVCI mediator must answer page-table requests, and so on). The signalling and shared-memory primitives that make those calls cheap are the same SynIC and shared-page primitives that VMBus uses between partitions.

In other words, the architecture Microsoft built in 2008 to give a Windows VM a fast network card became, in 2016, the architecture that gives a single Windows install a security boundary stronger than its own kernel. The same hypervisor, the same trust-mediation primitives, two completely different applications.

Windows Server 2019 [@ms-server-2019] extended this further with Hyper-V isolation for containers, where a container's lightweight VM gets its own kernel inside a tiny VTL0 of its own. The pattern is consistent: every time Windows wanted a stronger isolation primitive, the answer was "use the hypervisor."

This dual-use is the reason a serious Windows security review touches the Hyper-V codebase even on machines that nobody thinks of as virtualization hosts. A Hyper-V escape (a guest-to-host VMBus exploit) is not just "an exploit against Azure"; it is also, on a typical Windows 11 desktop with VBS enabled, an exploit against the boundary that protects LSASS secrets from kernel-mode malware.

That makes the next section's question urgent: how strong is the VMBus boundary, in practice?

7. VMBus security: every message is a parser at the trust boundary

Here is the part of the architecture worth being honest about. The same property that makes VMBus fast, namely that the host-side VSP runs in the root partition's kernel and parses guest-supplied bytes directly, also makes the VSP the most consequential piece of attack surface in the entire stack. Microsoft itself prices it that way: the Hyper-V Bug Bounty Program [@ms-bounty-hyperv] pays up to USD 250,000 specifically for guest-to-host escapes that hit this surface, which is among the highest payouts Microsoft offers for any category of vulnerability.

Key idea: Every byte that crosses a VMBus channel from a guest is a byte that a kernel-mode parser in the most privileged partition on the host has to interpret. The performance argument for a software data plane and the security argument against it are the same argument, looked at from opposite directions.

The historical record

Three CVEs make the pattern concrete:

CVE-2017-0075 is the Hyper-V escape that the Qihoo 360 Vulcan Team demonstrated at Pwn2Own 2017. The NVD entry [@nvd-cve-2017-0075] describes it as a Hyper-V flaw that "allows guest OS users to execute arbitrary code on the host OS via a crafted application." The reachable code was in a VMBus message handler on the host side.
CVE-2021-28476 is the canonical example. The NVD record [@nvd-cve-2021-28476] classifies it as a critical Hyper-V remote code execution vulnerability with a CVSS score of 9.9. The Akamai writeup with Guardicore and SafeBreach [@akamai-cve-2021-28476] traces the bug to vmswitch.sys, the synthetic-NIC VSP, and shows it had been present in production since the August 2019 vmswitch build. The exploit primitive is exactly what the architecture invites: a guest crafts an OID-style RNDIS request, sends it through the netvsc VMBus channel, and the host's kernel parser misvalidates a length, producing memory corruption in the most privileged kernel on the box.
CVE-2024-21407 is a more recent Hyper-V remote code execution vulnerability patched in March 2024 (NVD [@nvd-cve-2024-21407]). Its existence demonstrates that the bug class did not vanish; the same shape (guest-controlled message, host kernel parser, escalation to host code execution) keeps reappearing.

The MSRC bounty page lists awards ranging from \$5,000 for low-impact bugs to \$250,000 for full guest-to-host escapes (Microsoft bounty page [@ms-bounty-hyperv]). That price point is not a marketing number; it is Microsoft signalling what its threat model says these bugs are worth. A defender pricing their own controls should give any VSP code path that parses guest-controlled data the same level of attention as a remote, internet-facing service.

Why the bug class is structural

The pattern in all three CVEs is the same:

A guest writes carefully crafted bytes into a VMBus channel ring.
The guest fires the doorbell.
The host's VSP, running in the root partition's kernel, dequeues the message.
The VSP parses the message in C or C++ kernel code.
A memory-safety mistake (length confusion, missing bounds check, integer overflow) becomes a write or read primitive in the host kernel.

There is no exotic mechanism here. The exploit surface is "kernel C code parsing untrusted input," which has been the dominant source of remote-code-execution bugs in operating systems since the 1990s. The novelty is the location: the parser runs inside the kernel of the most privileged partition on the box, with access to every other tenant's memory.
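To make the "length confusion" step concrete, here is a minimal sketch of a parser that validates guest-supplied offset and length fields against the bytes it actually received. The message layout (two little-endian 32-bit fields followed by a payload) is invented for illustration and is not the RNDIS or VMBus wire format; the function name is likewise hypothetical.

```javascript
// Sketch of a bounds-checked parser for a hypothetical guest message:
//   bytes 0..3  little-endian payload offset
//   bytes 4..7  little-endian payload length
// The layout is invented for illustration; it is not RNDIS or VMBus.
function parseGuestMessage(bytes) {
  const HEADER_SIZE = 8;
  if (bytes.length < HEADER_SIZE) {
    throw new RangeError('message shorter than header');
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const offset = view.getUint32(0, true);
  const length = view.getUint32(4, true);

  // Both fields are guest-controlled, so check them against the bytes
  // actually received instead of trusting the header.
  if (offset < HEADER_SIZE || offset > bytes.length) {
    throw new RangeError('payload offset out of bounds');
  }
  // Compare against the remaining space rather than computing offset + length,
  // the form that stays correct in languages where that sum can wrap.
  if (length > bytes.length - offset) {
    throw new RangeError('payload length exceeds message');
  }
  return bytes.subarray(offset, offset + length);
}

const wellFormed = new Uint8Array([8, 0, 0, 0, 2, 0, 0, 0, 0xaa, 0xbb]);
console.log(parseGuestMessage(wellFormed));

const lyingLength = new Uint8Array([8, 0, 0, 0, 0xff, 0xff, 0, 0, 0xaa, 0xbb]);
try {
  parseGuestMessage(lyingLength);
} catch (err) {
  console.log('rejected:', err.message);
}
```

In a memory-unsafe language, omitting either check turns the final slice into a read or write past the end of the buffer, which is the primitive the list above describes.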

```mermaid
sequenceDiagram
    participant Mal as Malicious guest VM
    participant Ring as VMBus ring (shared memory)
    participant SInt as Synthetic Interrupt Controller
    participant VSP as Host VSP (e.g., vmswitch.sys, kernel)
    Mal->>Ring: Write crafted RNDIS-style message
    Mal->>SInt: Hypercall: signal channel event
    SInt-->>VSP: SINT delivered on host CPU
    VSP->>Ring: Read message header
    note over VSP: Length confusion / missing bounds check
    VSP->>VSP: Out-of-bounds write in root partition kernel
    note over VSP: Result: arbitrary code in the most privileged partition
```

Mitigations short of a rewrite

Microsoft's first line of defence is the same one every kernel team uses: ASLR, control-flow integrity, kernel hardening, fuzzing the parsers, code review of every new device class, and, on Azure specifically, isolating each tenant's compute hypervisor so a single compromised host does not become a multi-tenant disaster. The MSRC bounty program is partly a procurement mechanism for this same effort: pay researchers to find and report bugs before attackers find them in the wild.

A second line of defence is Generation-2 VMs (Microsoft Learn [@ms-gen1-gen2-vms]), which remove the legacy emulators (IDE, PS/2, PIC) from the host data path entirely. Every emulator removed is one fewer parser in the most privileged kernel.

A third is the "minimise root-partition exposure" guidance on the Microsoft Hyper-V architecture page [@ms-hyperv-architecture-perf]: configure hosts with the smallest set of root-partition services that the workload requires, since every service is potential attack surface.

These all help, but none of them change the structural fact that VSPs parse guest-controlled data in C/C++ kernel code. The next architectural shift, the one that does change that fact, is what Section 9 is about.

Side channels and the Spectre era

VMBus also has to defend against side-channel attacks across the partition boundary. The same Spectre / Meltdown / L1TF mitigations that apply to a multi-tenant hypervisor in general apply to Hyper-V specifically. Microsoft's broader hypervisor mitigation strategy interacts with VMBus mostly indirectly: the SynIC, the hypercall page, and the timer subsystem all needed audit and adjustment when these classes of attacks emerged. The detail is largely outside the scope of an article about the device model, but the takeaway is consistent with the rest of this section: any shared CPU resource between partitions is a potential attack surface, and "shared via the hypervisor's bus" is no exception.

The structural answer to all of this, the one Microsoft itself has been working toward, is to change the languages and the trust boundaries. To set that up, the next section first widens the field by comparing VMBus to its peer in the KVM world, virtio.

8. VMBus vs virtio: two answers to the same question

Hyper-V is not the only hypervisor with a paravirt I/O story. The KVM world evolved its own answer to the same problem at roughly the same time, and it ended up with a different design with different trade-offs. The standard is virtio.

The original virtio paper, Rusty Russell's "virtio: Towards a De-Facto Standard For Virtual I/O Devices" [@rusty-virtio-paper], was published in ACM SIGOPS Operating Systems Review in 2008, the same year Hyper-V shipped. The proposal was explicit in its motivation: every hypervisor was reinventing paravirt drivers, and a single hypervisor-independent specification could let one guest driver work everywhere. OASIS later standardised virtio 1.0 in 2016, then virtio 1.1 in 2019 [@oasis-virtio-1-1], then virtio 1.2 as a Committee Specification in 2022 [@oasis-virtio-1-2].

virtio: A hypervisor-independent paravirtual I/O specification, governed by OASIS. A virtio device is presented to the guest over a transport (PCI, MMIO, or s390 channel I/O) that advertises capability bits. The data plane is a generic ring layout called a **virtqueue**: a ring of descriptors, an `avail` ring (guest-to-host), and a `used` ring (host-to-guest). Each device class (virtio-net, virtio-blk, virtio-scsi, virtio-fs, virtio-gpu) defines its own message format on top of virtqueues.

The same shape, viewed sideways

Architecturally, virtio and VMBus are sibling answers to the same shaped problem.

```mermaid
flowchart LR
    subgraph virtio_pci["virtio over PCI"]
        gv["Guest virtio driver"]
        vq["virtqueue (descriptors + avail + used)"]
        host_be["Host backend (vhost-net, vhost-user, OpenVMM)"]
        gv -- "PIO doorbell write" --> host_be
        gv -- "shared memory" --- vq
        host_be -- "shared memory" --- vq
        host_be -- "MSI-X" --> gv
    end
    subgraph vmbus["Hyper-V VMBus"]
        gv2["Guest VSC"]
        ring["Two ring buffers + GPADL"]
        vsp["Host VSP (kernel)"]
        gv2 -- "Hypercall doorbell" --> vsp
        gv2 -- "shared memory" --- ring
        vsp -- "shared memory" --- ring
        vsp -- "SINT" --> gv2
    end
```

Both:

Use shared-memory rings for payload.
Use a doorbell for signalling.
Batch many requests per doorbell so per-message hypercall cost amortises.
Have per-class device protocols layered on top of a common transport.

The phrase "shared-memory rings" hides a small subtlety: a ring buffer is a circular buffer with separate read and write indices. Producer and consumer can run concurrently as long as they only touch their own index, which is what makes ring buffers a wait-free communication primitive on cache-coherent hardware.
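The ring-plus-doorbell pattern shared by both designs can be sketched in a few lines. This is a single-threaded toy, so it does not demonstrate the concurrency or memory-ordering concerns a real shared-memory ring has to handle, and the signalling policy (ring the doorbell only when the ring goes from empty to non-empty) is an assumption chosen to illustrate batching, not a statement of what either transport does. The class name and fields are hypothetical.

```javascript
// Single-producer / single-consumer ring buffer sketch.
// The producer only advances writeIndex; the consumer only advances readIndex.
// One slot is left unused so "full" and "empty" are distinguishable.
class RingBuffer {
  constructor(capacity) {
    this.slots = new Array(capacity + 1);
    this.readIndex = 0;
    this.writeIndex = 0;
    this.doorbells = 0; // how many times the consumer was signalled
  }

  isEmpty() {
    return this.readIndex === this.writeIndex;
  }

  // Producer side: returns false when the ring is full.
  push(item) {
    const next = (this.writeIndex + 1) % this.slots.length;
    if (next === this.readIndex) return false;
    const wasEmpty = this.isEmpty();
    this.slots[this.writeIndex] = item;
    this.writeIndex = next;
    // Assumed policy: signal only on an empty -> non-empty transition,
    // so a burst of pushes costs a single doorbell.
    if (wasEmpty) this.doorbells += 1;
    return true;
  }

  // Consumer side: drain everything currently in the ring.
  drain() {
    const items = [];
    while (!this.isEmpty()) {
      items.push(this.slots[this.readIndex]);
      this.readIndex = (this.readIndex + 1) % this.slots.length;
    }
    return items;
  }
}

const ring = new RingBuffer(4);
for (const packet of ['a', 'b', 'c']) ring.push(packet);
console.log(ring.drain(), 'doorbells:', ring.doorbells);
```

Three packets cross the ring for one signal, which is the amortisation the third bullet refers to.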

The differences are where the world bites:

| Dimension | VMBus | virtio (1.2) |
| --- | --- | --- |
| Transport | Software-only "bus", channel offer/open/close | PCI, MMIO, s390 channel I/O |
| Doorbell | Hypercall (`HvCallSignalEvent`) | PIO write to a doorbell BAR |
| Reverse signal | Synthetic interrupt (SINT) | MSI-X |
| Standardisation | Microsoft-owned, Open Specification Promise [@ms-tlfs] | OASIS-ratified, multi-vendor |
| Windows in-box drivers | Yes, every supported version | No; out-of-box signed VirtIO INFs from cloud vendors |
| Device classes beyond I/O | Yes: KVP, time sync, VSS, balloon | Limited; non-I/O often built on virtio-vsock or out-of-band agents |
| Cross-hypervisor portability | Hyper-V only | Universal: KVM, QEMU, Cloud Hypervisor, Firecracker, Xen HVM, OpenVMM |
| Spec governance | Single vendor under OSP | Multi-vendor with formal conformance clauses |
| Source for Linux side | drivers/hv/ [@kernel-hyperv-index] | drivers/virtio in the Linux tree |

Where each design wins

Virtio's strongest claim is portability. The same Linux guest VM image, with the same in-tree virtio drivers, runs on KVM, QEMU, Cloud Hypervisor, AWS Firecracker, and (since 2024) Microsoft's own OpenVMM, which added virtio backend support. A workload that has to move between cloud providers benefits from this directly: the guest does not need a different driver stack per host.

Virtio also has a richer multi-vendor governance story. The spec is OASIS-ratified, with explicit conformance clauses; multiple commercial hypervisors implement it; multiple SmartNIC vendors implement virtio data planes in hardware (the vDPA and VDUSE work, described by Red Hat [@redhat-vdpa] and the Linux kernel VDUSE doc [@kernel-vduse]).

VMBus's strongest claim is integration. Every supported Windows ships with the VSCs in-box; there is nothing for an admin to install. The transport carries not just I/O but a service catalogue: KVP for guest configuration, time sync, VSS for online backup, the heartbeat and shutdown channels. The TLFS, while owned by Microsoft, is published under the Open Specification Promise and is a single document a guest author can read end-to-end.

This is why "VirtIO drivers for Windows" exist as a separate project (the Fedora/Red Hat-signed virtio-win package) for KVM clouds: out of the box, Windows does not know virtio. The Hyper-V world inverts the problem: out of the box, Linux does not need any third-party install because the drivers are upstream.

Where they coexist

The most interesting recent development is that the two camps have stopped being purely competitive. Microsoft's OpenVMM [@github-openvmm] implements both VMBus and virtio backends, so a Linux guest using virtio drivers can run on a Microsoft-developed VMM, and a Windows guest using VMBus drivers can run on the same VMM. This is partially ideological (Microsoft is no longer pretending its way is the only way) and partially pragmatic (a single VMM that supports both transports is simpler than maintaining two).

Beyond the protocol-level comparison, both VMBus and virtio sit inside a larger composition with hardware passthrough, where the transport becomes the slow path and a real PCIe device carries the steady-state traffic.

Hardware passthrough as a complement

The composition that runs almost every modern Azure VM is VMBus + SR-IOV, packaged as Accelerated Networking [@ms-accelerated-networking]. The same VM gets both a synthetic NIC (netvsc over VMBus) and an SR-IOV virtual function. The Linux netvsc driver documentation describes the failover mechanic: "If SR-IOV is enabled in both the vSwitch and the guest configuration, then the Virtual Function (VF) device is passed to the guest as a PCI device. In this case, both a synthetic (netvsc) and VF device are visible in the guest OS and both NIC's have the same MAC address. The VF is enslaved by netvsc device. The netvsc driver will transparently switch the data path to the VF when it is available and up." (Linux kernel: netvsc [@kernel-netvsc]).

When live migration starts, Azure revokes the VF, the data plane falls back to the netvsc/VMBus path, the VM moves, and a new VF on the destination host gets re-attached, all without dropping TCP connections. The VMBus path was never the production hot path, but its existence is what enables migration. The KVM world's analogue is vDPA, which gives a virtio-shaped guest interface backed by a hardware data plane.
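The failover sequence in the last two paragraphs can be modelled as a very small state machine. This is a sketch under stated assumptions: the class and method names are hypothetical and do not correspond to the Linux netvsc driver's functions; the only rule taken from the quoted documentation is that the data path uses the VF when it is "available and up" and otherwise stays on the synthetic path.

```javascript
// Toy model of the netvsc / VF data-path switch described above.
// Class and method names are hypothetical, not the Linux driver's API.
class SyntheticNic {
  constructor() {
    this.vfPresent = false;
    this.vfUp = false;
  }

  // The VF carries traffic only while it is both present and up;
  // otherwise traffic flows over the synthetic (VMBus) path.
  get dataPath() {
    return this.vfPresent && this.vfUp ? 'vf' : 'synthetic';
  }

  attachVf() {
    this.vfPresent = true;
  }

  setVfLink(up) {
    this.vfUp = this.vfPresent && up;
  }

  revokeVf() {
    this.vfPresent = false;
    this.vfUp = false;
  }
}

const nic = new SyntheticNic();
const trace = [nic.dataPath];   // before any VF exists
nic.attachVf();
nic.setVfLink(true);
trace.push(nic.dataPath);       // steady state
nic.revokeVf();
trace.push(nic.dataPath);       // VF revoked for live migration
nic.attachVf();
nic.setVfLink(true);
trace.push(nic.dataPath);       // new VF attached on the destination host
console.log(trace.join(' -> '));
```

Because the synthetic path is always a valid answer for `dataPath`, the guest's network identity never disappears while the VF comes and goes.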

A modern Azure NIC stack is pushing this even further. Azure Boost [@ms-azure-boost] moves both storage and networking data planes off the host CPU into dedicated FPGAs, with a stable Microsoft-engineered NIC interface called MANA [@ms-mana]. Microsoft's documentation reports up to 200 Gbps of network bandwidth and 6.6 million IOPS on local storage with this design, with the host's vmswitch still acting as the live-migration fallback path. The architectural insight is that the VMBus-based slow path is the durable invariant; what changes is whether the steady-state data plane is software, an SR-IOV VF, or a SmartNIC firmware path. Frameworks like DPDK [@dpdk-about] sit on top of whichever data plane the VM exposes.

What none of this changes is the property Section 7 cared about: as long as a host-side VSP exists and parses guest-controlled bytes in kernel C/C++, the bug class is open. The next section is about the architectural move that closes it.

9. OpenVMM and OpenHCL: the 2024 open-source pivot

In 2024, Microsoft did two things that would have been hard to imagine a decade earlier. First, they open-sourced OpenVMM [@github-openvmm], a Rust implementation of the virtualization stack including the VSPs and the VMBus protocol. Second, they introduced OpenHCL [@ms-openhcl-deep-explainer], a "paravisor" configuration of OpenVMM that runs inside a confidential VM as a higher-trust mediator between the workload and the (now-untrusted) host.

Both moves are explained by the same trend the article has been circling: confidential computing fundamentally inverts the trust boundary, and the device model has to follow.

Paravisor: A higher-privileged software layer that runs *inside* a guest VM (not on the host) and mediates the guest's interaction with the hypervisor. In the Hyper-V model, a paravisor lives in VTL2 of the same VM whose workload runs in VTL0; the host hypervisor is outside the VM's trust boundary. The paravisor presents the workload with a familiar VMBus + VSP interface while internally talking to a hardware-isolated confidential VM substrate (AMD SEV-SNP or Intel TDX).

What changed in confidential computing

The classical Hyper-V trust model places the root partition at the apex of trust. The guest trusts the host. Memory the guest writes is, in the worst case, readable by the host. In confidential computing, that is no longer acceptable. A regulated workload (a healthcare database, a financial processor) needs to run in a VM whose contents are protected even from a malicious or compromised hypervisor. AMD's SEV-SNP and Intel's TDX are CPU features that encrypt and integrity-protect VM memory in hardware so that a compromised host cannot read the guest's secrets.

Azure Confidential Computing [@ms-confidential-computing] made these capabilities available as a product starting around 2022. The Azure confidential VM options page [@ms-coco-vm-options] documents the SKUs.

This breaks the old VMBus story. In the classical model, the host's vmswitch.sys reads the guest's network packets out of the VMBus ring. In a confidential VM, the host can no longer be allowed to see those bytes; letting it do so would defeat the entire point. So the question becomes: where does the synthetic-device backend live, if not in the host?

The paravisor answer

The Linux kernel's Hyper-V CoCo VMs document [@kernel-coco] describes the design directly: "Paravisor mode. In this mode, a paravisor layer between the guest and the host provides some operations needed to run as a CoCo VM. The guest operating system can have fewer CoCo enlightenments than is required in the fully-enlightened case ... some aspects of CoCo VMs are handled by the Hyper-V paravisor while the guest OS must be enlightened for other aspects."

OpenHCL is that paravisor. It runs in a higher-trust virtual trust level inside the same confidential VM (VTL2), it has access to the encrypted-memory primitives the CPU provides, and it presents the workload (in VTL0) with the same VMBus + VSP world a non-confidential VM would see. The workload OS does not need to be heavily modified; it sees what looks like Hyper-V, talks to what look like normal VSPs, and never has to know that those VSPs are now inside its own VM rather than on the host.

```mermaid
flowchart TD
    HW["Confidential CPU (SEV-SNP / TDX)"]
    HV["Host hypervisor (untrusted by the workload)"]
    subgraph CoCoVM["Confidential VM (memory encrypted)"]
        VTL2["VTL2: OpenHCL paravisor (Rust VSPs)"]
        VTL0["VTL0: workload OS (Windows or Linux, lightly enlightened)"]
        VTL0 -- "VMBus, looks normal" --- VTL2
    end
    HW --> HV
    HV --> CoCoVM
    HV -. "no access to guest plaintext" .-> CoCoVM
```

The Rust rewrite

The other half of the story is memory safety. Recall Section 7's CVE list: every headline Hyper-V escape in the past decade involved a parser bug in C/C++ kernel code. OpenVMM's choice to implement the entire VMM, including the VSPs, in Rust is a direct response to that history. Rust's ownership model and bounds-checked slice indexing rule out, by construction, a large class of memory-safety bugs (use-after-free, out-of-bounds access on slices, double-free) that produced those CVEs.

This does not magically eliminate every vulnerability. A logic bug in a state machine, an integer-overflow on a length field, a side-channel timing leak: all of these still exist in Rust. But the categories that produced CVE-2017-0075, CVE-2021-28476, and CVE-2024-21407 are exactly the categories Rust was designed to make hard.

Garbage-collected languages are wrong for a kernel-mode parser: GC pauses are unacceptable in a hypervisor-adjacent fast path, and you cannot afford a runtime that allocates memory during interrupt handling. Rust's compile-time memory safety with no GC is, today, the only mature option that gives you both the safety and the predictability a VSP needs. Microsoft's choice is consistent with the rest of the industry; comparable low-level infrastructure projects (Cloudflare's `quiche` and Pingora, the Android Bluetooth stack) have all converged on Rust.

What you can actually look at

OpenVMM is not a press release; it is a public repository that ships:

The full Rust source tree at github.com/microsoft/openvmm [@github-openvmm].
A separate repository for the Linux kernel fork that the paravisor runs on top of, at github.com/microsoft/OHCL-Linux-Kernel [@github-ohcl-linux].
Project documentation centred at openvmm.dev [@openvmm-dev].
Both VMBus and virtio backends, so the same VMM can host Windows guests on VMBus and Linux guests on virtio.
Documentation through the deeper Microsoft Tech Community explainer [@ms-openhcl-deep-explainer] and the original announcement [@ms-openhcl-announce] describing the paravisor's role.

For a security researcher or a regulated-cloud customer, this is a meaningful change. For the first time, the VMBus + VSP stack is auditable end-to-end in source.

If you want to see how a VSP actually consumes a channel, the OpenVMM repository contains the Rust modules that implement the VMBus channel state machine. Cloning the repo and grepping for `Channel::open` and `RingBuffer` shows the same offer/open/close/rescind pattern Section 3 described, expressed in Rust types whose lifetimes the compiler checks. Reading the same logic in Rust after reading the Linux C version in `drivers/hv/channel_mgmt.c` is a useful exercise; the abstraction is identical, and the safety guarantees diverge.

What still has to be solved

The kernel CoCo doc is candid about an open architectural problem that OpenHCL alone cannot solve: "Unfortunately, there is no standardized enumeration of feature/functions that might be provided in the paravisor, and there is no standardized mechanism for a guest OS to query the paravisor for the feature/functions it provides. The understanding of what the paravisor provides is hard-coded in the guest OS." (Linux kernel: CoCo VMs [@kernel-coco]).

In other words, the TLFS gave us a portable contract between guests and Hyper-V hypervisors. The paravisor world does not yet have an equivalent portable contract between guests and paravisors. Today's guests have OpenHCL-specific knowledge baked in. A future "paravisor TLFS" would let any compliant paravisor host any compliant guest, the same way the original TLFS did for the hypervisor. That standard does not exist yet, and writing it is the most consequential open problem in this corner of the architecture.

The architecture is moving. Section 10 takes stock of what that means for engineers building or operating on this stack today.

10. Engineering takeaways and open problems

A working architecture is one where the trade-offs are visible. Hyper-V's enlightenments + VMBus + VSP/VSC stack is a working architecture in exactly that sense: every property it has, including the security ones, is a consequence of design choices a reader can name.

What the design optimises for

Three explicit optimisations:

In-box drivers for closed-source guests. Hardware virtualization handles privileged CPU instructions; the guest only needs to load a VMBus client driver to opt in to the fast path. Every supported Windows ships those drivers in-box. Every modern Linux ships them in-tree. There is no "install paravirt drivers" step, which is a large reason "it just works."
A single transport that carries everything. VMBus carries 12+ device classes plus non-device services (KVP, time sync, VSS, balloon, heartbeat). One protocol, one set of primitives, one debugging surface. This is the engineering equivalent of "everything is a file" applied to inter-partition communication.
Live migration. Because the data plane is software in the root partition, the VM is not bound to a specific host. The VSPs serialise their state during migration without guest cooperation. This is the property that makes VMBus the durable invariant under hardware-passthrough acceleration: SR-IOV gives you throughput; VMBus gives you mobility.

What it pays for those properties

Two costs:

The host CPU is on the data plane. A software ring serviced by vmswitch.sys cannot sustain a 100 GbE NIC's line rate on a single host CPU core. Microsoft's answer is hybrid composition with SR-IOV (Accelerated Networking [@ms-accelerated-networking]) and SmartNIC offload (Azure Boost + MANA [@ms-azure-boost]). The KVM analogue is vDPA [@redhat-vdpa]. Both accept the structural truth that for the highest throughputs, the host CPU has to leave the data plane.
The host kernel parses guest-controlled bytes. Section 7's CVE record is the catalogue of what that costs. The architectural answer is OpenHCL: move the parser into the guest's own trust boundary and rewrite it in Rust.

A four-property idealisation

It is useful to write down what an idealised paravirt I/O stack would do, so it is clear which properties any real stack today is trading away.

The four idealised properties:

Zero hypercalls per packet in steady state.
Live-migration parity with a software baseline.
Cross-vendor / cross-hypervisor portability of the guest driver.
No host-side memory-unsafe parser of guest-controlled data.

| Approach | (1) Zero hypercall | (2) Live migration | (3) Portability | (4) No unsafe host parser |
| --- | --- | --- | --- | --- |
| VMBus + in-kernel VSP | partial (batched) | yes | no | no |
| virtio + vhost-net | partial (batched) | yes | yes | no |
| SR-IOV / DDA | yes | no | no | yes |
| Accelerated Networking (VMBus + SR-IOV) | yes (steady) | yes | no | no |
| vDPA | yes | partial | yes | no |
| OpenHCL paravisor + VMBus | partial | yes | partial | yes |
| Azure Boost + MANA | yes | yes | no | partial |

No single approach today matches all four properties. The Hyper-V production composition is roughly (VMBus baseline) + (Accelerated Networking for throughput) + (OpenHCL for confidential workloads). The KVM-world composition is (virtio baseline) + (vDPA / SmartNIC for throughput). SmartNIC-based stacks (Azure Boost, AWS Nitro, Google's offload) approach the same four-corner problem from yet another angle.
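The matrix is small enough to encode as data and query directly. The sketch below copies the cell values from the table, normalising "partial (batched)" to `partial` and "yes (steady)" to `yes`; the property keys and the helper name are shorthand chosen for this example.

```javascript
// The comparison matrix, encoded as data. Cell values are copied from the
// table ("partial (batched)" -> 'partial', "yes (steady)" -> 'yes').
// Property keys are shorthand invented for this sketch.
const matrix = {
  'VMBus + in-kernel VSP':     { zeroHypercall: 'partial', liveMigration: 'yes',     portability: 'no',      noUnsafeHostParser: 'no' },
  'virtio + vhost-net':        { zeroHypercall: 'partial', liveMigration: 'yes',     portability: 'yes',     noUnsafeHostParser: 'no' },
  'SR-IOV / DDA':              { zeroHypercall: 'yes',     liveMigration: 'no',      portability: 'no',      noUnsafeHostParser: 'yes' },
  'Accelerated Networking':    { zeroHypercall: 'yes',     liveMigration: 'yes',     portability: 'no',      noUnsafeHostParser: 'no' },
  'vDPA':                      { zeroHypercall: 'yes',     liveMigration: 'partial', portability: 'yes',     noUnsafeHostParser: 'no' },
  'OpenHCL paravisor + VMBus': { zeroHypercall: 'partial', liveMigration: 'yes',     portability: 'partial', noUnsafeHostParser: 'yes' },
  'Azure Boost + MANA':        { zeroHypercall: 'yes',     liveMigration: 'yes',     portability: 'no',      noUnsafeHostParser: 'partial' },
};

// Return the approaches that score a full 'yes' on every required property.
function approachesWith(requiredProperties) {
  return Object.entries(matrix)
    .filter(([, props]) => requiredProperties.every((key) => props[key] === 'yes'))
    .map(([name]) => name);
}

const allFour = ['zeroHypercall', 'liveMigration', 'portability', 'noUnsafeHostParser'];
console.log(approachesWith(['zeroHypercall', 'liveMigration']));
console.log(approachesWith(allFour));
```

Querying for all four properties returns an empty list, which is the claim in the paragraph above restated mechanically.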

This is a synthesis, not a single-source claim: the matrix combines properties documented separately in the Microsoft Accelerated Networking docs [@ms-accelerated-networking], the Linux kernel CoCo doc [@kernel-coco], the Discrete Device Assignment doc [@ms-dda], the SR-IOV overview [@ms-sriov-overview], the Linux netvsc driver doc [@kernel-netvsc], the VDUSE userspace interface [@kernel-vduse], the vPCI doc [@kernel-vpci], and the OpenHCL explainer [@ms-openhcl-deep-explainer]. Each individual cell is sourced; the ranking is the author's reading of those sources.

Practical pitfalls for operators

A few things the customer-facing docs do not always say plainly:

vmbusrhid is not low-risk. The keyboard/mouse channel is a kernel-level RPC surface from guest to root. Treat it the same way you would treat netvsc when modelling threat exposure.
Generation-2 VMs reduce attack surface. Choosing Generation-2 for new workloads removes the legacy IDE/PS/2/PIC emulators from the host data path entirely (Microsoft Learn: Gen 1 vs Gen 2 [@ms-gen1-gen2-vms]).
Mixing in-box and out-of-band Integration Services breaks things. Modern Windows and modern Linux already have the drivers; installing the legacy LIS package on top can break MSI-X handling and PCI passthrough (Linux kernel: overview [@kernel-hyperv-overview]).
DDA is not SR-IOV. Discrete Device Assignment covers any PCIe device passthrough, but Microsoft formally supports only GPUs and NVMe as device classes (Microsoft Learn: DDA planning [@ms-dda]).
Confidential VMs do not have the same device set. Hardware constraints reduce or alter the device classes available; always validate that the specific synthetic devices your workload depends on are present in the target SKU (Linux kernel: CoCo [@kernel-coco]).

Note:
1. Confidential VM (SEV-SNP / TDX)? Use the OpenHCL paravisor mode (Azure CoCo VM options [@ms-coco-vm-options]).
2. Need ≥40 Gbps with live migration? Use Accelerated Networking; on Boost-enabled SKUs, Boost adds another tier of offload.
3. Need ≥100 Gbps and accept binding to host? Use Discrete Device Assignment / SR-IOV.
4. Maximum guest portability across hypervisors? Use virtio; for bandwidth-sensitive workloads, vDPA.
5. Default Hyper-V workload, broad device coverage, native migration? VMBus + VSP (the default).
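The numbered rules in the note read naturally as a first-match-wins decision function. The sketch below encodes them in that order; the requirement field names (`confidentialVm`, `gbps`, `liveMigration`, `acceptHostBinding`, `crossHypervisor`, `bandwidthSensitive`) are hypothetical and exist only for this illustration.

```javascript
// First matching rule wins, in the same order as the numbered note above.
// All field names on `req` are hypothetical, chosen for this sketch.
function chooseIoPath(req) {
  const gbps = req.gbps ?? 0;
  if (req.confidentialVm) return "OpenHCL paravisor";
  if (gbps >= 40 && req.liveMigration) return "Accelerated Networking";
  if (gbps >= 100 && req.acceptHostBinding) return "DDA / SR-IOV";
  if (req.crossHypervisor) return req.bandwidthSensitive ? "virtio + vDPA" : "virtio";
  return "VMBus + VSP"; // the default Hyper-V path
}

console.log(chooseIoPath({ gbps: 50, liveMigration: true }));
console.log(chooseIoPath({ gbps: 100, acceptHostBinding: true }));
console.log(chooseIoPath({}));
```

Ordering matters: a workload that needs 100 Gbps and live migration lands on Accelerated Networking, because rule 2 is checked before the host-bound option in rule 3.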

Open problems worth watching

The substantive open problems are:

A standardised paravisor feature-enumeration interface. OpenHCL is the first auditable paravisor, but there is no portable contract a guest can use to query "what does this paravisor support." The TLFS gave us this for hypervisors; the paravisor analogue is missing (Linux kernel: CoCo [@kernel-coco]).
Confidential-VM-friendly live migration with paravirt devices. Hardware-attested state cannot be cloned trivially; today's pragmatic answer is to constrain migration in CoCo VMs. A general solution is open.
A formal model of the VMBus offer/rescind state machine. The kernel docs describe it narratively. A model that the VSP code could be checked against would let static analysis rule out the bug class behind the headline CVEs.
Live-migrating stateful SR-IOV VFs without device cooperation. Vendor proposals exist; an industry standard does not.
Erasing memory-unsafety in legacy VSPs. The Rust rewrite path in OpenVMM is correct; the multi-year engineering effort to convert every existing VSP is real. CVE-2024-21407 is recent enough to remind everyone the bug class is still producing fresh entries.

What to remember in five years

The most important sentence in this article is one I have been quietly preparing throughout: the durable architectural invariant in Hyper-V is shared-memory ring + doorbell, with a published guest-side contract. Everything else, including the choice of programming language for the VSP, the question of whether the data plane is software or hardware, and even whether the trust boundary places the VSP on the host or in a paravisor, is implementation. The transport is the invariant. That is the lesson the next decade of CoCo VMs and SmartNIC offload is converging toward: keep the contract stable, and let everything else change.
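To make "shared-memory ring + doorbell" concrete, here is a minimal single-producer, single-consumer sketch. It is not the VMBus ring layout or packet format: the `Ring` class, the array standing in for shared memory, and the policy of signalling only on the empty to non-empty transition are all simplifying assumptions made for illustration.

```javascript
// Minimal single-producer/single-consumer ring with a doorbell.
// Illustrative only: this is not the VMBus ring layout or packet format.
class Ring {
  constructor(slots, doorbell) {
    this.buf = new Array(slots); // stand-in for the shared memory region
    this.head = 0;               // next slot the consumer reads
    this.tail = 0;               // next slot the producer writes
    this.count = 0;              // number of queued messages
    this.doorbell = doorbell;    // callback standing in for the signal to the peer
  }

  // Producer side. Returns false when the ring is full (caller retries later).
  push(msg) {
    if (this.count === this.buf.length) return false;
    const wasEmpty = this.count === 0;
    this.buf[this.tail] = msg;
    this.tail = (this.tail + 1) % this.buf.length;
    this.count += 1;
    if (wasEmpty) this.doorbell(); // signal only on empty -> non-empty
    return true;
  }

  // Consumer side: drain everything currently queued in one wake-up.
  drain() {
    const out = [];
    while (this.count > 0) {
      out.push(this.buf[this.head]);
      this.head = (this.head + 1) % this.buf.length;
      this.count -= 1;
    }
    return out;
  }
}

let signals = 0;
const ring = new Ring(4, () => { signals += 1; });
ring.push("pkt1");
ring.push("pkt2");
ring.push("pkt3");
const firstBatch = ring.drain();
ring.push("pkt4");
const secondBatch = ring.drain();
console.log(signals, firstBatch, secondBatch);
```

Four packets cost two doorbells here, which is the batching effect behind the "partial (batched)" entries in the earlier matrix: the contract is the ring and the signal, and everything behind the callback is free to change.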

FAQ

No. Hyper-V guest drivers have been in the upstream Linux kernel since 2.6.32 in December 2009, and today's set (`hv_vmbus`, `hv_netvsc`, `hv_storvsc`, `hv_utils`, `pci-hyperv`, `hv_balloon`) ships in every mainstream distribution. The legacy LIS package is a holdover from the era before in-tree support and can in fact break MSI-X handling and PCI passthrough if installed on top of a modern kernel (Linux kernel: Hyper-V overview [@kernel-hyperv-overview]).

Because the trust gradient is asymmetric. The VSP runs in the root partition's kernel, the most privileged context on the box; the VSC runs in a normal guest kernel. Bytes flowing from guest to host get parsed by code with full system privilege. A VSC bug typically harms only the guest; a VSP bug can be a cross-tenant compromise. The pattern is visible in the CVE record: CVE-2017-0075 [@nvd-cve-2017-0075], CVE-2021-28476 [@nvd-cve-2021-28476], and CVE-2024-21407 [@nvd-cve-2024-21407] all hit host-side parsers.

For live migration. SR-IOV gives you near-bare-metal throughput but binds the VM to a specific physical NIC; you cannot migrate that state. Keeping a VMBus-backed `netvsc` device in the same guest gives the hypervisor a software path it can fall back to during migration windows. The Linux kernel netvsc doc describes this failover explicitly: when SR-IOV is enabled, the VF is enslaved by netvsc and the data path switches transparently when the VF is up (Linux kernel: netvsc [@kernel-netvsc]).

OpenHCL is a *configuration* of OpenVMM, not a separate codebase. OpenVMM is the Rust virtualization stack at github.com/microsoft/openvmm [@github-openvmm]; OpenHCL is OpenVMM run as a paravisor inside a confidential VM's higher-trust virtual trust level (VTL2), so that the synthetic-device backends sit inside the guest's own trust boundary rather than on a host the guest cannot trust. The same Rust code can run as a host-side VMM (when paired with a hypervisor on the host) or as an in-guest paravisor (when running inside a SEV-SNP or TDX VM).

Both directions exist with caveats. OpenVMM, when used as a host VMM, supports both VMBus and virtio backends, so a Linux virtio guest can run on a Microsoft-developed VMM (github.com/microsoft/openvmm [@github-openvmm]). Native Hyper-V on a Windows Server host historically expects VMBus-driven guests; there is no in-box virtio device emulation on a stock Hyper-V Server. KVM hosts can technically present a VMBus-shaped device, but in practice the production answer on KVM is virtio.

Generation-2 VMs use UEFI with Secure Boot, boot from synthetic SCSI, and have no emulated IDE, PS/2, or PIC in the data path (Microsoft Learn: Gen 1 vs Gen 2 [@ms-gen1-gen2-vms]). Every emulator that is removed is one fewer parser running in the most privileged kernel on the host, so the host-side attack surface is meaningfully smaller. Generation-1 still exists for legacy guests that only know how to boot from BIOS + IDE.

VBS uses the Hyper-V hypervisor to split a single Windows install into VTL0 (the normal kernel and apps) and VTL1 (the Secure Kernel and trustlets like `lsaiso.exe`). The hypervisor enforces that VTL0 cannot read or modify VTL1's memory, even with kernel privileges. So an attacker who already has SYSTEM-level code execution in the normal world cannot trivially extract LSASS secrets or load arbitrary unsigned kernel code; the hypervisor stops them. This works on any modern Windows machine with the right CPU features, regardless of whether you ever run a VM yourself (Microsoft Learn: Windows Server 2016 What's New [@ms-server-2016]).

Windows Sandbox vs Windows Defender Application Guard: Two Hyper-V Sandboxes, Different Threat Models

noreply@paragmali.com (Parag Mali) — Thu, 14 May 2026 00:00:00 GMT

**Two Windows features, the same plumbing, opposite fates.** Windows Sandbox (2019) and Windows Defender Application Guard (2017) both spin up a Hyper-V child partition using the Host Compute Service (HCS) API [@learn-microsoft-com-hcs-overview] on top of the same Virtualization-Based Security [@learn-microsoft-com-oem-vbs] substrate. Sandbox is disposable, on-demand, and aimed at running an untrusted executable. WDAG was persistent, automatic, and aimed at rendering an untrusted website inside Microsoft Edge. WDAG was deprecated for Edge [@learn-microsoft-com-application-guard] and then removed entirely in Windows 11 24H2 [@learn-microsoft-com-guard-overview]; Sandbox still ships. The reason is not that one model was wrong -- it is that operational economics and threat models diverged. This article explains the shared substrate, the architectural differences, the deprecation story, and what replaced WDAG in the 2026 Windows isolation stack.

1. The Hyper-V Isolation Layer Both Features Share

Open two Microsoft Learn pages side by side -- the Windows Sandbox architecture page [@learn-microsoft-com-sandbox-architecture] and the Microsoft Defender Application Guard overview [@learn-microsoft-com-guard-overview] -- and the descriptions almost rhyme. The MDAG overview opens by calling its container an "isolated Hyper-V-enabled container"; the Sandbox architecture page describes its guest as a dynamically generated, "kernel-isolated" Windows image that the host can destroy. Two pages, two teams, the same noun and the same verb. The two features are siblings: built by overlapping teams, on top of the exact same compute substrate, and shipped roughly two years apart.

That substrate is named, and worth naming carefully, because the rest of the article rests on it.

Host Compute Service (HCS): The Windows API that creates, starts, configures, queries, and destroys Hyper-V "compute systems" -- the umbrella term Microsoft uses for either a virtual machine or a container with its own kernel. The HCS reference docs [@learn-microsoft-com-reference-apioverview] list functions like `HcsCreateComputeSystem`, `HcsStartComputeSystem`, and `HcsGetComputeSystemProperties` -- a kernel32-shaped surface for child partitions, parameterized by JSON.

When Microsoft first documented HCS publicly, it framed it as a kernel32-equivalent for child partitions. The HCS overview [@learn-microsoft-com-hcs-overview] describes the API in terms of compute systems (the umbrella term for either a virtual machine or a container with its own kernel), with "configurations and properties... stored in a JSON file which will then be passed through the HCS APIs to create the compute system." The API reference [@learn-microsoft-com-reference-apioverview] notes that the DLL "exports a set of C-style Windows API functions, using JSON schema as configuration"; Microsoft's hcsshim repository [@github-com-microsoft-hcsshim] provides the Go binding used by Moby and containerd. The framing matters. HCS is not a sandbox. It is the mechanism a sandbox feature uses to ask Hyper-V for "a kernel of my own, isolated from yours, please." What the consumer does with that kernel -- what it boots, what it shares, when it tears it down -- is the threat model. That is where Sandbox and WDAG diverge.

Underneath HCS sits Hyper-V itself, and underneath Hyper-V sits Virtualization-Based Security (VBS).

Virtualization-Based Security (VBS): A Windows feature that runs the normal Windows kernel as a Hyper-V guest, then uses a second, smaller secure kernel running at a higher Virtual Trust Level (VTL1) to isolate keys, policies, and code-integrity decisions from the main kernel. The OEM VBS guidance [@learn-microsoft-com-oem-vbs] documents the boot path: the hypervisor launches first, then hosts the NT kernel in the root partition.

VBS itself does not run untrusted user code. Its job is to give Windows a hypervisor that ships in the box on every supported SKU and that is anchored by Secure Boot. Once that hypervisor exists, the cost of "spawn a second Windows guest just for this task" stops being "boot a full VM (minutes, gigabytes)" and starts being "ask HCS for a child partition (seconds, hundreds of megabytes)." Hyper-V on Windows [@learn-microsoft-com-windows-about] is the desktop face of that hypervisor; Hyper-V isolation containers [@learn-microsoft-com-hyperv-container] on Windows Server are the server face. Sandbox and WDAG are two desktop-flavored consumers of the same pipe.

flowchart TD
    HW[Hardware: VT-x/AMD-V, IOMMU, TPM]
    HV[Hyper-V hypervisor]
    Root[Root partition: host NT kernel]
    HCS[Host Compute Service API]
    WS[Windows Sandbox child partition]
    WDAG[WDAG child partition]
    HVC[Hyper-V Windows Container]
    HW --> HV
    HV --> Root
    Root --> HCS
    HCS --> WS
    HCS --> WDAG
    HCS --> HVC

Notice the diagram's most important detail: the host NT kernel is also a Hyper-V guest. It runs in the root partition, which has full physical-device access; the sandbox and WDAG partitions run as L1 guests with no physical-device access at all. The Hyper-V VM boundary -- the membrane between root and L1 -- is what Microsoft commits to defending. That commitment is published: the Microsoft Security Servicing Criteria for Windows [@microsoft-com-servicing-criteria] names the Hyper-V VM as a serviced security boundary, and the Hyper-V Bounty Program [@microsoft-com-hyper-v] pays up to $250,000 for a guest-to-host escape. The boundary has a dollar sign attached to it, which is how you know it counts.

Key idea: The Hyper-V VM boundary is the serviced security boundary. Windows Sandbox and WDAG ride on top of that boundary; neither product is the boundary. Bugs in the host-side broker, the clipboard channel, or the policy engine are not Hyper-V escapes -- they are integration bugs in features that happen to run inside a VM.

This distinction matters operationally. When the Windows Sandbox configuration docs [@learn-microsoft-com-wsb-file] warn that enabling vGPU "can potentially increase the attack surface of the sandbox," they are talking about widening a brokered channel from the L1 guest into the root partition's display stack. That channel runs through the VM boundary; abusing it does not require breaking Hyper-V. The same logic applies to mapped folders, the clipboard, networking, and the Edge-window remoting path WDAG used. Each is a deliberate hole in the boundary, made for a reason, with its own threat model.

With the substrate established, the next question is what Sandbox does on top of it. The answer turns on a single observation: the host already has a Windows image. Why download another one?

2. Windows Sandbox: The Disposable Desktop

Windows Sandbox shipped in Windows 10 1903 [@learn-microsoft-com-sandbox-install], the May 2019 update, as an optional Windows feature named Containers-DisposableClientVM -- the install page documents both the version requirement and the exact PowerShell feature name. The naming is loud about the design goal. A container (rather than a VM you manage with vmconnect), disposable (state is destroyed on close), for client workloads (interactive desktop apps, not server workloads). Microsoft's original Windows Sandbox announcement on the Kernel Internals blog [@web-archive-org-p-301849] (preserved on the Internet Archive after Microsoft retired the canonical URL) frames the use cases plainly: trying out an installer, opening a suspicious download, testing an executable from email -- the same use cases the current Sandbox overview [@learn-microsoft-com-sandbox-overview] still enumerates.

The trick that makes this feel cheap is the dynamic base image.

Dynamic base image: The disk template Windows Sandbox uses to boot its guest. According to the Windows Sandbox architecture page [@learn-microsoft-com-sandbox-architecture], the on-disk package is approximately 30 MB compressed and expands to about 500 MB when installed. The expansion is mostly *pointers* back to the host's own immutable OS files; pristine private copies exist only for files the guest may legitimately mutate.

A traditional VM ships a 4-8 GB VHDX with its own copy of Windows. The dynamic base image inverts that. Read-only files in the host -- ntdll.dll, the bulk of System32, the side-by-side cache -- are reflected into the guest by reference. The guest sees a complete Windows install at boot. The host barely paid for it.

Memory uses an even sharper trick: direct map.

Direct map: A Hyper-V memory-sharing optimization, described on the Windows Sandbox architecture page [@learn-microsoft-com-sandbox-architecture], in which immutable physical pages are shared read-only between the host and the sandbox guest. When the guest loads `ntdll.dll`, the same physical RAM that already holds `ntdll.dll` on the host is mapped into the guest's address space rather than duplicated.

The page is read-only from both sides, so a guest exploit cannot scribble into the host's copy. The win is memory pressure: dozens of megabytes for the guest's mutable state instead of hundreds of megabytes for a fresh Windows kernel image. The same architecture page describes the memory model directly: "containers collaborate with the host to dynamically determine how host resources are allocated. This method is similar to how processes normally compete for memory on the host... If the host is under memory pressure, it can reclaim memory from the container much like it would with a process." A sandbox that does little uses little, and a sandbox that does a lot pulls pages from the host the same way a heavy host process would.

Direct map is conceptually the same trick Linux uses to share glibc between processes -- the same physical pages of a read-only binary backing many virtual address spaces. The Windows Sandbox application is to use that primitive across a Hyper-V partition boundary, not just across processes inside one kernel.

Graphics use a third mechanism. A vGPU runs the guest's display through WDDM 2.5+, sharing the host GPU much like another process on the host would. When the host GPU lacks a compatible WDDM driver, the guest falls back to WARP, Microsoft's CPU-backed Direct3D rasterizer. The same Sandbox architecture page [@learn-microsoft-com-sandbox-architecture] cited above documents both branches: "a system with a compatible GPU and graphics drivers (WDDM 2.5 or newer) is required. Incompatible systems render apps in Windows Sandbox with Microsoft's CPU-based rendering technology, Windows Advanced Rasterization Platform (WARP)." That fallback is slow but means Sandbox starts on essentially any Pro+ Windows 10/11 install, not only on machines with modern discrete GPUs.

sequenceDiagram
    participant U as User
    participant Host as Host (root partition)
    participant HCS as HCS API
    participant G as Sandbox guest
    U->>Host: Launch WindowsSandbox.exe (with optional .wsb)
    Host->>HCS: HcsCreateComputeSystem(JSON)
    HCS->>G: Allocate partition, mount dynamic base image
    HCS->>G: Direct-map immutable host pages
    HCS->>G: Start guest (boot stripped NT, run LogonCommand)
    G-->>Host: RDP-style remoting of desktop
    U->>G: Drop .exe, run, observe
    U->>Host: Close Sandbox window
    Host->>HCS: HcsTerminateComputeSystem
    HCS->>G: Destroy partition, discard all writable state

What the user configures is the small set of seams between guest and host. Configuration lives in a .wsb file, an XML document the user double-clicks to launch a customized sandbox. The Windows Sandbox configuration page [@learn-microsoft-com-wsb-file] enumerates every supported element: <vGPU>, <Networking>, <MappedFolders>, <LogonCommand>, <MemoryInMB>, <AudioInput>, <VideoInput>, <ProtectedClient>, <PrinterRedirection>, <ClipboardRedirection>. Each line is a knob on the boundary.

A minimal hostile-binary harness disables almost every shared channel:

    // Produce a Windows Sandbox configuration that minimizes shared channels.
    // Run this in Node, save the output as triage.wsb, then double-click it.
    const sample = "C:\\Users\\analyst\\Samples\\suspicious.exe";

    // Every optional channel is disabled; the only seam left open is one
    // read-only mapped folder that carries the sample into the guest.
    const wsb = [
      "<Configuration>",
      "  <VGpu>Disable</VGpu>",
      "  <Networking>Disable</Networking>",
      "  <AudioInput>Disable</AudioInput>",
      "  <VideoInput>Disable</VideoInput>",
      "  <PrinterRedirection>Disable</PrinterRedirection>",
      "  <ClipboardRedirection>Disable</ClipboardRedirection>",
      "  <ProtectedClient>Enable</ProtectedClient>",
      "  <MappedFolders>",
      "    <MappedFolder>",
      "      <HostFolder>C:\\Users\\analyst\\Samples</HostFolder>",
      "      <SandboxFolder>C:\\Samples</SandboxFolder>",
      "      <ReadOnly>true</ReadOnly>",
      "    </MappedFolder>",
      "  </MappedFolders>",
      "</Configuration>"
    ].join("\n");

    console.log(wsb);
    // The host folder maps to C:\Samples, so rewrite the host prefix accordingly.
    console.log("\nSample path inside guest: " + sample.replace("C:\\Users\\analyst", "C:"));

Note: The default .wsb-less Windows Sandbox launch enables networking and clipboard redirection. The configuration docs [@learn-microsoft-com-wsb-file] warn that enabled networking "could expose untrusted applications to the internal network." For malware triage, the explicit <Networking>Disable</Networking> form above is the right starting point.

A few cost-of-doing-business limits are worth flagging up front. Only one Sandbox can run at a time, per the Sandbox overview [@learn-microsoft-com-sandbox-overview], which also restricts the feature to Pro/Enterprise/Education SKUs ("Windows Sandbox is currently not supported on Windows Home edition") and inherits the VBS-capable-hardware requirement from the OEM VBS guidance [@learn-microsoft-com-oem-vbs]. And while in-sandbox state survives a shutdown /r inside the guest (a Windows 11 22H2 refinement called out in the same overview page), state still dies when the host UI window closes -- "disposable" is the contract, not a marketing word.

A Containers-DisposableClientVM partition, then, is essentially a Hyper-V container with a desktop bolted on, a shared OS image, a few configurable channels, and a strict "destroy on close" lifecycle. WDAG, the older sibling, took the same building blocks and arranged them around a completely different question: not "run this executable once" but "render this website transparently."

3. WDAG: The Persistent Browser Container

Windows Defender Application Guard shipped in Windows 10 1709 -- the Fall Creators Update [@learn-microsoft-com-build-16299] (Microsoft's UWP what's-new page identifies "Windows 10 build 16299 (also known as the Fall Creators Update or version 1709)"), with a GA start date of October 17, 2017 [@learn-microsoft-com-and-education] per the Microsoft Lifecycle page. The WDAG install guide [@learn-microsoft-com-app-guard] still lists standalone-mode support starting at "Windows 10 Enterprise edition, version 1709 and later." At launch it integrated with the legacy EdgeHTML-based Edge; later it integrated with Chromium-based Edge for Business and, beginning in 2020, with Microsoft 365 Apps for Enterprise to wrap untrusted Office documents -- the MDAG overview [@learn-microsoft-com-guard-overview] explicitly enumerates the file types: "Application Guard helps prevent untrusted Word, PowerPoint, and Excel files from accessing trusted resources." The product was rebranded from "Windows Defender Application Guard" to "Microsoft Defender Application Guard" along the way -- the isolatedapplauncher.h header notes [@learn-microsoft-com-api-isolatedapplauncher] state directly that "Windows Defender Application Guard (WDAG) is now Microsoft Defender Application Guard (MDAG). The WDAG name is deprecated, but it is still used in some APIs." This article uses WDAG for continuity.

The use case was specific and corporate: in a managed enterprise, an employee may need to follow a customer's link, open an inbound attachment, or read a marketing site -- content that originates outside the organization's network perimeter. WDAG's job was to render that content in a partition that cannot talk to the intranet, so that a renderer exploit chained to a kernel LPE inside the guest could not pivot to corporate file shares, Active Directory, or the user's own home directory.

The trust boundary was policy-defined.

Network Isolation policy: The Group Policy mechanism Windows uses to enumerate "what counts as inside the enterprise network," documented under Configure Microsoft Defender Application Guard [@learn-microsoft-com-app-guard-2]. Administrators populate Enterprise Network Domains, Cloud Resources, Internal Proxies, and IPv4/IPv6 Subnets. Anything inside the list is rendered in the host browser; anything outside is reissued inside the WDAG container.

This is the property Windows Sandbox does not have. Sandbox is a manual tool; WDAG was transparent. Click an external link in Outlook, and a separate Edge window opens hosting the rendered page -- the user did not have to know about, configure, or invoke a sandbox. The host-side broker forwarded the navigation; the in-container Edge rendered the page; an RDP-family remoting path relayed pixels back to the host UI. Microsoft has not publicly named the inner protocol, but its visible behavior -- per-window remoting of a single app onto the host desktop -- is the signature of the Remote Applications Integrated Locally (RAIL) virtual channel specified in [MS-RDPERP], the RDP extension that "presents a remote application... as a local user application." The implementation may differ in detail; the architectural shape is the same.

Where Sandbox is "destroy on close," WDAG was "warm at logon."

sequenceDiagram
    participant U as User
    participant Logon as Winlogon
    participant Broker as Host WDAG broker
    participant Cont as WDAG container
    participant Edge as Container Edge
    Logon->>Broker: User signs in
    Broker->>Cont: HCS pre-warm container (pruned image)
    Cont->>Edge: Boot, idle, await navigation
    U->>Broker: Click https://external.example
    Broker->>Broker: Network Isolation policy check (out of zone)
    Broker->>Edge: Reissue URL inside container
    Edge-->>Broker: Render via RDP-family remoting (RAIL-shaped)
    Broker-->>U: Display container window on host desktop
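The policy check in the middle of that sequence is, in shape, a membership test: is the destination inside the administrator's list or not. The sketch below is a rough model of that decision, not WDAG's implementation. The policy object, the `contoso` names, and the `routeNavigation` helper are hypothetical; the real policy was delivered through Group Policy, also covered internal proxies and IPv6 subnets, and its matching rules may differ from the simple suffix and prefix tests used here.

```javascript
// Hypothetical policy shape for illustration; the real policy was set through
// Group Policy and also covered internal proxies and IPv6 subnets.
const policy = {
  enterpriseDomains: ["contoso.com"],
  cloudResources: ["contoso.sharepoint.com"],
  ipv4Subnets: [{ base: "10.0.0.0", prefix: 8 }],
};

// Convert dotted-quad IPv4 to a number without bitwise ops (avoids sign issues).
const ipv4ToInt = (ip) => ip.split(".").reduce((acc, octet) => acc * 256 + Number(octet), 0);

// True when ip and base share the same network prefix.
function inSubnet(ip, { base, prefix }) {
  const size = 2 ** (32 - prefix);
  return Math.floor(ipv4ToInt(ip) / size) === Math.floor(ipv4ToInt(base) / size);
}

const isIpv4 = (host) => /^\d{1,3}(\.\d{1,3}){3}$/.test(host);

// Exact match or subdomain match; "evilcontoso.com" must not match "contoso.com".
const matchesDomain = (host, domain) => host === domain || host.endsWith("." + domain);

// Returns "host" for in-zone destinations and "container" for everything else.
function routeNavigation(url, pol) {
  const host = new URL(url).hostname.toLowerCase();
  const trusted = isIpv4(host)
    ? pol.ipv4Subnets.some((subnet) => inSubnet(host, subnet))
    : [...pol.enterpriseDomains, ...pol.cloudResources].some((d) => matchesDomain(host, d));
  return trusted ? "host" : "container";
}

console.log(routeNavigation("https://intranet.contoso.com/wiki", policy));
console.log(routeNavigation("https://external.example/", policy));
```

The default matters more than the matching logic: anything that fails to match is treated as untrusted and goes to the container, so a gap in the policy costs usability rather than isolation.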

The configuration surface was correspondingly broader than Sandbox's .wsb. Group Policy settings controlled clipboard direction (upload, download, both, neither), file print to host, microphone and camera access, hardware acceleration, and -- most operationally consequential -- whether downloads escape the container at all. The Microsoft Edge WDAG configuration guidance [@learn-microsoft-com-application-guard] listed knobs like ApplicationGuardUploadBlockingEnabled and ApplicationGuardPassiveModeEnabled for security-versus-usability tuning.

WDAG also exposed a small detection API for applications running inside it, the IsolatedAppLauncher COM interface [@learn-microsoft-com-api-isolatedapplauncher]. The methods IsProcessInWDAGContainer and IsProcessInIsolatedContainer let an app know "am I in the guest?" -- useful for, say, disabling drivers that cannot work inside the partition. The header carries the deprecation notice today.

The detection-from-inside API is itself a useful tell about WDAG's threat model. If the guest needed to know it was the guest, that means software running inside it could legitimately do guest-specific things -- license activation, optional installers, telemetry. WDAG was meant to host fully functional applications, not a stripped renderer. That breadth is part of what made its operational cost real.

What it improved over the AppContainer-wrapped Edge of 2016 was substantial. A renderer-to-kernel exploit in the container's stripped Windows guest landed in a throwaway kernel that had no view of the host filesystem, no route to the corporate intranet, no host clipboard write path, and no persistence across reboot. The kernel attack surface of the partition is, in the limit, the Hyper-V VM boundary. That is a serviced Microsoft boundary; an AppContainer-wrapped Edge is not.

So why was WDAG retired and Sandbox kept? Because in the WDAG threat model, the "warm partition for every employee, all day long, on every device" became a tax that the in-browser sandbox could finally outrun.

4. The Lifecycle Divergence

The clearest way to see why two siblings on the same substrate diverged is to put their lifecycles side by side. Sandbox boots on user gesture, runs as long as one window stays open, dies on close. WDAG booted at user logon, stayed resident the entire session, and discarded state only when the user signed out (or, for the persistent variant, never). The same Hyper-V partition primitive, allocated at very different points in the day, for very different durations, was being asked to solve very different problems.

The point at which a security primitive's instance lifetime is anchored. Sandbox is *gesture-bound*: an instance exists because a user explicitly launched it. WDAG was *session-bound*: an instance existed because a user signed in. AppContainer is *process-bound*: an instance exists because a sandboxed binary is running. Each binding implies a different cost model and a different set of failure modes.

Gesture binding is cheap in expectation. Most users open Sandbox seldom -- when a particular file looks suspicious, when an installer wants admin rights for unclear reasons, when an analyst is reproducing a sample. The cost of the partition is paid on demand and only by the user who needed it. The Containers-DisposableClientVM feature in the Sandbox install docs [@learn-microsoft-com-sandbox-install] ships as an optional feature: enabled but unused, it costs nothing beyond the dynamic base image on disk.

Session binding has the opposite cost profile. Every employee, on every device, pays for a Hyper-V child partition at every logon, whether or not they ever browse an untrusted URL. The partition holds memory. The host-side broker holds memory. The pre-warmed Edge holds memory. The Network Isolation policy must be evaluated on every navigation. The clipboard, downloads, and print paths require ongoing brokering. Even on a workstation that idles all day in front of an intranet portal, WDAG was a tax line item.

Key idea: A serviced security boundary is cheap when it is allocated by gesture and expensive when it is allocated by session. WDAG bet that the per-session cost would amortize against frequent untrusted browsing; that bet failed for most enterprise users because the marginal extra protection over an in-browser sandbox (Edge ESM + ACG + CET) was small for the typical navigation, even though it remained large for the worst-case navigation.

The threat models also point in opposite directions. Sandbox optimizes for a single, contained interaction with an untrusted artifact: drop the .exe, run it, watch what happens, close the window. Anything the artifact does inside the partition -- registry writes, scheduled-task creation, persistence attempts -- is annihilated when the window closes. WDAG was optimizing for the opposite: many small, transparent interactions with untrusted content (clicks, navigations, document opens), all of which had to feel seamless or the user would route around them.

gantt
    title Sandbox vs WDAG lifecycle on a typical 8-hour workday
    dateFormat HH:mm
    axisFormat %H:%M
    section Windows Sandbox
    Idle (no cost) :done, ws1, 09:00, 11:00
    Analyst opens suspicious.exe :crit, ws2, 11:00, 11:15
    Idle (no cost) :done, ws3, 11:15, 16:00
    Test installer :crit, ws4, 16:00, 16:10
    Idle (no cost) :done, ws5, 16:10, 17:00
    section WDAG (legacy)
    Pre-warmed partition resident :crit, wd1, 09:00, 17:00
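The chart's asymmetry is easy to put a number on. The sketch below sums the partition-resident minutes for each feature using the intervals from the Gantt chart above; the workday itself is the article's illustrative scenario, not a measurement, and the helper names are made up for this example.

```javascript
// Intervals transcribed from the Gantt chart above (HH:mm, same workday).
// Only the periods in which a Hyper-V partition is actually resident are listed.
const sandboxActive = [["11:00", "11:15"], ["16:00", "16:10"]];
const wdagResident = [["09:00", "17:00"]];

// Convert "HH:mm" to minutes since midnight.
const toMinutes = (hhmm) => {
  const [h, m] = hhmm.split(":").map(Number);
  return h * 60 + m;
};

// Total length of a list of [start, end] intervals, in minutes.
const totalMinutes = (intervals) =>
  intervals.reduce((sum, [start, end]) => sum + toMinutes(end) - toMinutes(start), 0);

const sandboxMinutes = totalMinutes(sandboxActive);
const wdagMinutes = totalMinutes(wdagResident);
const ratio = wdagMinutes / sandboxMinutes;

console.log(`Sandbox resident: ${sandboxMinutes} min`);
console.log(`WDAG resident: ${wdagMinutes} min`);
console.log(`WDAG / Sandbox: ${ratio.toFixed(1)}x`);
```

In this scenario the gesture-bound partition is resident for 25 minutes and the session-bound one for 480, roughly a 19x difference per user per day before any untrusted content has been opened.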

The lifecycle asymmetry also rewards engineering effort unevenly. Hardening an in-browser renderer pays off thousands of times per session -- every page load benefits. Hardening a per-employee Hyper-V partition pays off only on the navigations that actually leave the trusted zone, which for most users is a small fraction of clicks. As the in-browser side became dramatically stronger between 2018 and 2023 -- Site Isolation, V8 sandboxing, Arbitrary Code Guard (ACG), Control-flow Enforcement Technology (CET), the new Edge Enhanced Security Mode -- the marginal value of WDAG dropped while its operational cost did not.

That set up the deprecation decision.

5. The Deprecation Decision

The retirement happened in two visible steps and a long invisible runway.

The first visible step was the Edge-side deprecation, announced in 2023. The Microsoft Edge and Microsoft Defender Application Guard [@learn-microsoft-com-application-guard] page now opens with the banner: "Microsoft Defender Application Guard, including the Windows Isolated App Launcher APIs, is deprecated for Microsoft Edge for Business and will no longer be updated." The same page makes the operational substitute explicit: "The additional security features in Edge make it very secure without needing Application Guard," then enumerates Defender SmartScreen, Enhanced Security Mode, website typo protection, and Data Loss Prevention as the replacement set for the WDAG-for-Edge scenario.

The second visible step was the OS-level removal in Windows 11 version 24H2. The MDAG overview banner [@learn-microsoft-com-guard-overview] is unambiguous: "Starting with Windows 11, version 24H2, Microsoft Defender Application Guard, including the Windows Isolated App Launcher APIs, is no longer available." The isolatedapplauncher.h API reference [@learn-microsoft-com-api-isolatedapplauncher] carries the matching deprecation notice for the COM surface. Code paths that called IIsolatedAppLauncher on a 24H2 box now hit a removed feature; the API names remain in the documentation as historical record.

Note: The Edge-side deprecation and the OS-side removal in Windows 11 24H2 are often conflated in coverage. They are different. The Edge-side deprecation stopped updates and the New-Tab-Page entry point; existing fleets on earlier Windows versions retained the underlying feature. The 24H2 removal pulled the kernel-mode plumbing -- the Sandbox install page [@learn-microsoft-com-sandbox-install] even calls out a side-effect: "Beginning in Windows 11, version 24H2, inbox store apps like Calculator, Photos, Notepad and Terminal are not available inside Windows Sandbox," because the underlying app-isolation broker was reworked as part of the same cleanup.

The invisible runway behind those two banners is the more interesting story. Microsoft's Security Servicing Criteria for Windows [@microsoft-com-servicing-criteria] page names "Hyper-V VM" as a serviced security boundary but does not name WDAG or Application Guard. WDAG was always a feature that used the Hyper-V VM boundary; it was never the boundary itself. A bug in the WDAG broker, the Network Isolation policy evaluator, the clipboard channel, or the host-side window remoting was an integration bug, not a Hyper-V escape, and so was never going to attract the kind of bounty payout -- "up to $250,000 USD" per the Hyper-V Bounty Program [@microsoft-com-hyper-v] -- that the underlying boundary attracts. The economic shape of WDAG's bug surface always favored deprecation: it was a complex, brokered feature whose worst plausible CVE was a privilege-escalation inside Edge, not a guest-to-host RCE.

Key idea: WDAG's deprecation was overdetermined. (1) The session-bound cost model could not be amortized for most users. (2) The in-browser mitigations (ACG, CET, Edge ESM, Site Isolation) closed the marginal-security gap on the typical navigation. (3) The integration-bug class -- broker, clipboard, policy -- was never going to be on Microsoft's serviced security boundary list. Each reason alone could have justified retirement; all three together made it inevitable.

What was kept is just as telling. The Hyper-V VM boundary stayed; the bounty program stayed; the HCS API stayed; Windows Sandbox stayed; Hyper-V isolation containers on Windows Server stayed. Microsoft did not retire any of the things that made WDAG technically possible -- only the specific arrangement of those things into a session-bound, transparent, browser-targeted feature. The mechanism was kept; the productization was retired.

6. The WDAG Replacement Stack

There is no single replacement for WDAG. There is a stack of complementary features, each of which absorbs a slice of WDAG's old job. The Edge-deprecation page enumerates the substitute set; what follows pulls each item back to its primary source.

6.1 Edge Enhanced Security Mode (in-browser sandbox)

WDAG-for-Edge is now Microsoft Edge Enhanced Security Mode (ESM). The browse-safer page [@learn-microsoft-com-browse-safer] is explicit about what changed: "Enhanced security mode in Microsoft Edge mitigates memory-related vulnerabilities by disabling just-in-time (JIT) JavaScript compilation and enabling additional operating system protections for the browser. These protections include Hardware-enforced Stack Protection and Arbitrary Code Guard (ACG)."

Functionally, ESM gives up the Hyper-V partition boundary and replaces it with a set of process-mitigation policies that make the in-browser sandbox materially harder to escape. The renderer still runs on the host kernel, but with JIT disabled (closing a large class of write-then-execute primitives), CET enforcing shadow stacks, and ACG blocking dynamic code generation. For unfamiliar sites only, the browser flips into this mode automatically; familiar sites keep JIT on for performance.

The trade is explicit on the page: "Developers should be aware that the WebAssembly (WASM) interpreter running in enhanced security mode might not yield the expected level of performance." ESM is consciously slower in exchange for a smaller attack surface, and it concedes that this trade is only worth making on a subset of navigations. WDAG made the same concession at the partition level; ESM makes it at the mitigation level.

6.2 Smart App Control (OS-level binary trust)

For downloads -- the other half of "untrusted content reaches the user" -- the replacement is Smart App Control [@learn-microsoft-com-control-overview]. The Microsoft Learn page describes it as "an app execution control feature that combines Microsoft's app intelligence services and Windows' code integrity features to protect users from untrusted or potentially dangerous code." The Windows 11 Security Book [@learn-microsoft-com-driver-control] clarifies the mechanism: Smart App Control "blocks untrusted or unsigned applications" by predicting safety from a cloud intelligence service, and "blocks unknown script files and macros from the web."

The replacement logic is direct. WDAG protected the host kernel from a malicious download by running the download inside a Hyper-V partition. Smart App Control protects the host kernel from a malicious download by not running it at all unless app intelligence predicts it is safe or it is signed by a trusted CA. The first approach contains the blast radius after execution; the second prevents execution altogether. For the common case -- a user clicking through a suspicious-looking installer -- prevention strictly dominates containment.
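The difference between the two approaches can be sketched as two tiny functions. This is a hypothetical model of the rule as described in the paragraph above, not Smart App Control's real decision logic or any Windows API; the `predictedSafe` and `signedByTrustedCA` fields are illustrative names.

```javascript
// Prevention: decide before execution whether the binary may run at all.
// binary: { predictedSafe: boolean, signedByTrustedCA: boolean } (hypothetical shape)
function admit(binary) {
  if (binary.predictedSafe) {
    return { run: true, reason: "app intelligence predicts safe" };
  }
  if (binary.signedByTrustedCA) {
    return { run: true, reason: "signed by a trusted CA" };
  }
  return { run: false, reason: "unknown and unsigned: blocked before execution" };
}

// Containment: always run, but inside a partition that bounds the blast radius.
function contain(binary) {
  return { run: true, where: "Hyper-V partition" };
}

const installer = { predictedSafe: false, signedByTrustedCA: false };
console.log(admit(installer));   // never executes
console.log(contain(installer)); // executes, but inside the partition
```

The model makes the trade visible: `admit` can return `run: false`, while `contain` never does, so containment always pays the partition cost.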

6.3 Defender for Endpoint network protection (policy-defined trust boundary)

For the policy-defined "enterprise vs not-enterprise" boundary that the Network Isolation policy used to draw for WDAG, the replacement is Defender for Endpoint's network protection [@learn-microsoft-com-network-protection] feature. The page describes it as "expand[ing] the scope of Microsoft Defender SmartScreen to block all outbound HTTP(S) traffic that attempts to connect to poor-reputation sources (based on the domain or hostname)," operating at the OS level. The same page is precise about which processes it covers on Windows: it enforces against non-Microsoft browsers and non-browser processes (for example, PowerShell), and its own Note states that "on Windows, network protection doesn't monitor Microsoft Edge. For processes other than Microsoft Edge and Internet Explorer, web protection scenarios use network protection for inspection and enforcement." For Microsoft Edge on Windows, the same reputational source feed (SmartScreen) operates inside the browser; Network Protection is the cross-process extension of that feed to the other process classes.
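The coverage split quoted above can be written down as a small lookup. This is a sketch under stated assumptions: the function name and the process names are illustrative, and the mapping encodes only what the quoted Note says about Windows, nothing more.

```javascript
// Hypothetical helper: which component applies the reputation feed to a given process on Windows.
function reputationEnforcer(processName) {
  const name = processName.toLowerCase();
  if (name === "msedge.exe") {
    // Edge is not monitored by network protection; SmartScreen runs inside the browser.
    return "SmartScreen inside the browser";
  }
  if (name === "iexplore.exe") {
    // The quoted Note excludes Internet Explorer from network protection without further detail.
    return "outside network protection (not detailed in the quoted Note)";
  }
  // Non-Microsoft browsers and non-browser processes such as PowerShell.
  return "Defender network protection";
}

["msedge.exe", "chrome.exe", "powershell.exe"].forEach(p =>
  console.log(p + " -> " + reputationEnforcer(p))
);
```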

The shift is from "isolate, then permit anything inside the isolated zone" (WDAG) to "enforce a reputational/IOC-based block list at the host network stack" (Defender). The replacement gives up the partition boundary in exchange for matching coverage across every process on the device, not just Edge.

6.4 Office Protected View and Office SmartScreen (per-document admission)

The Office slice of WDAG -- the variant that wrapped untrusted Word, PowerPoint, and Excel files in a Hyper-V partition -- was always a layered feature on top of an older, cheaper primitive: Office Protected View [@support-microsoft-com-8e43-2bbcdbcb6653]. The Microsoft Support page describes the primitive directly: "files from these potentially unsafe locations are opened as read only or in Protected View. By using Protected View, you can read a file, see its contents and enable editing while reducing the risks." The page also calls out the WDAG layering explicitly: "If your machine has Application Guard for Microsoft 365 enabled, documents that previously opened in Protected View will now open in Application Guard for Microsoft 365" -- which is exactly the slice that 24H2 removed. Without WDAG, Protected View is the residual primitive: documents from the internet, untrusted senders, or unsafe locations open read-only in a stripped-down Office process until the user opts in to editing.

Around Protected View sits a second admission layer: SmartScreen-derived reputation checks on the artifact itself. The Microsoft 365 Apps internet-macros guidance [@learn-microsoft-com-macros-blocked] sets the policy directly: "VBA macros are a common way for malicious actors to gain access to deploy malware and ransomware. Therefore, to help improve security in Office, we're changing the default behavior of Office applications to block macros in files from the internet." The page describes how Office uses the Mark of the Web -- the same signal SmartScreen uses for binaries -- to decide whether macros in a given document are admitted. Where the WDAG-for-Office configuration would have re-rendered the document inside a Hyper-V partition regardless of macro content, the 2026 replacement turns the question off: macros in internet-origin documents simply do not run.
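On NTFS, the Mark of the Web is stored in an alternate data stream named `Zone.Identifier`, whose text records a `ZoneId` (3 is the Internet zone, 4 is Restricted sites). The sketch below parses that text and applies a deliberately simplified admission rule. It ignores Trusted Locations, signed macros, and the other policy branches in Microsoft's guidance, and the function names are illustrative.

```javascript
// Parse the text of a Zone.Identifier stream and return its ZoneId, or null if absent.
function parseZoneId(streamText) {
  const match = /^ZoneId=(\d+)\s*$/m.exec(streamText);
  return match ? Number(match[1]) : null;
}

// Simplified rule (an assumption for illustration): zones 3 and 4 count as internet-origin.
function macrosBlocked(streamText) {
  const zone = parseZoneId(streamText);
  return zone !== null && zone >= 3;
}

const downloaded =
  "[ZoneTransfer]\r\nZoneId=3\r\nHostUrl=https://example.com/report.docm\r\n";
const intranetCopy = "[ZoneTransfer]\r\nZoneId=1\r\n";

console.log(macrosBlocked(downloaded));   // true: internet-origin document
console.log(macrosBlocked(intranetCopy)); // false: local intranet zone
console.log(macrosBlocked(""));           // false: no Mark of the Web present
```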

Together, the two features are the document analog of Smart App Control + Defender network protection: a read-only fallback for the artifact itself (Protected View) and a reputation-driven admission policy for its riskiest payload (macros). Neither replaces a partition; the union covers WDAG's Office slice at the cost of giving up the kernel boundary around the document.

The stack is not a one-for-one swap. Each feature trades the Hyper-V VM boundary for something cheaper to operate. The aggregate covers the WDAG threat model at typical navigation cost; for the unusual case where an enterprise still wants a hard kernel boundary around an untrusted workload, Microsoft's recommended fallback [@learn-microsoft-com-application-guard] is explicit: "If your organization requires container-based isolation, we recommend Windows Sandbox or Azure Virtual Desktop (AVD)." The Hyper-V partition is still there. It just is no longer running every employee's browser, all day, by default.

7. Why Sandbox Survives

If the deprecation reasoning above is right, the natural question is why the same substrate, allocated by the same HCS API, in the same kind of Hyper-V partition, survived as Windows Sandbox. The answer is that everything that made WDAG expensive maps to an operational advantage in Sandbox.

The lifecycle is gesture-bound, not session-bound. The cost is paid by the user who explicitly asked for it, when they asked for it, and not before. Per the Sandbox install page [@learn-microsoft-com-sandbox-install], the feature ships disabled by default; turning it on costs the dynamic base image on disk and nothing in memory until launch.

The substitute set is empty. For the "run an untrusted executable once" threat model, no in-process mitigation suite plays the role ESM plays for browsing. There is no in-AppContainer answer to "I want to detonate suspicious.exe and observe it"; Smart App Control prevents execution rather than containing it; AppLocker policies refuse the run rather than sandboxing it. The use case Sandbox fills is what is left over when prevention fails or is operationally unacceptable, and there is no cheaper way to fill it.

Key idea: The "run an untrusted executable" threat model lacks a cheaper substitute, so Sandbox's Hyper-V partition cost is the floor. WDAG's "render an untrusted website" threat model gained a cheaper substitute (Edge ESM), so WDAG's same Hyper-V partition cost stopped being the floor and became the ceiling -- and was retired.

The integration surface is also far smaller. Sandbox exposes one launcher binary, one configuration file format, no policy engine, no clipboard direction policies (just on/off), no network zoning, no pre-warmed worker process, no host-side broker for an embedded browser. The host-side attack surface is correspondingly thin: a .wsb parser, an HCS caller, a window-host process. Each WDAG-style integration bug class -- network-isolation evaluator, browser broker, document-routing logic -- has no analog.

A useful contrast: the isolatedapplauncher.h [@learn-microsoft-com-api-isolatedapplauncher] page exposes IIsolatedAppLauncher, IIsolatedProcessLauncher, IsProcessInIsolatedContainer, and IsProcessInWDAGContainer. That is the application surface of a feature that hosts third-party processes and lets them know they are inside it. Sandbox has nothing comparable, because no third-party application is supposed to ship with "behave differently inside Sandbox" logic. The guest is a fresh Windows install, the artifact is whatever the user dropped in, and the host does not need to expose an "am I inside?" predicate.

Sandbox also benefits from the negative space of the deprecation. The replacement stack (Edge ESM, Smart App Control, Defender network protection) collectively pushes the unbearable workloads off the Hyper-V partition: the always-on browser, the heavyweight Office document host, the network-zone enforcer. What is left for Sandbox is the relatively small set of workloads where partition isolation is the right answer: malware triage, installer testing, one-off compatibility checks, isolated developer environments. The feature is now used closer to its design intent than it was in the WDAG era.

This is the rare case where deprecating a sibling makes a feature more aligned with its purpose. Before WDAG retirement, the question "should this be in Sandbox or WDAG?" had a complicated answer involving who you were and what you were doing. After 24H2, the question collapses: if you want a partition, you mean Sandbox.

8. The 2026 Isolation Stack

With WDAG gone, the post-24H2 Windows isolation taxonomy is best read as four orthogonal primitives, each answering a different question about an untrusted workload. None alone substitutes for WDAG's combined function; together they cover the ground WDAG used to. The tiers below are numbered for reference; the numbering does not imply a single linear cost/strength axis. Process mitigations live inside AppContainer-wrapped processes (they are not "below" AppContainer on a cost axis, they are within it), and admission primitives (Smart App Control, Defender network protection) are an orthogonal family that decides whether a binary or destination is allowed at all -- a question that runs before capability containment, not above or below it.

Isolation primitive: A mechanism that answers a specific question about how an untrusted workload is constrained. The four families that the 2026 Windows stack composes are: (1) *process-internal mitigations* that limit what a corrupted-memory exploit can do (ACG, CET, Edge ESM); (2) *capability sandboxes* that limit what a process can name and reach (AppContainer); (3) *admission policies* that decide whether a binary may run or a destination may be contacted at all (Smart App Control, Defender network protection); and (4) *kernel-partition boundaries* that put an entire second NT kernel between the workload and the user's data (Hyper-V VM). Each family has its own cost shape; only the kernel-partition boundary appears on the Microsoft Servicing Criteria for Windows [@microsoft-com-servicing-criteria] as a serviced security boundary.

Tier 1: Process mitigations (Edge ESM, ACG, CET)

The cheapest family runs the workload on the host kernel under a stricter set of process-level controls. Edge Enhanced Security Mode is the visible UI; the underlying primitives are the Hardware-enforced Stack Protection [@techcommunity-microsoft-com-p-2163340] (CET) and Arbitrary Code Guard [@learn-microsoft-com-protection-reference] (ACG) referenced from the browse-safer page [@learn-microsoft-com-browse-safer]. These mitigations apply inside whatever container the binary already runs in -- an AppContainer-wrapped Edge renderer, or a non-AppContainer process -- and limit the privileges a corrupted-memory exploit can give itself. They are sufficient for the vast majority of untrusted navigations after the JIT-disabling trade is made.

Tier 2: AppContainer (capability-bound sandbox)

The capability sandbox is the AppContainer [@learn-microsoft-com-appcontainer-isolation] primitive. The Microsoft Learn page enumerates the isolation slices under six section headings -- Credential isolation, Device isolation, File isolation, Network isolation, Process isolation, and Window isolation -- all enforced at the OS, none requiring a partition. AppContainer is the wrapper around Edge's renderer, around UWP apps, around modern app-isolation packages. It is the cheap container that scales to every process on the device. Inside an AppContainer, Tier 1 mitigations still apply; AppContainer constrains what the process can name, Tier 1 constrains what a corrupted process can do. They are complementary, not stacked.

Tier 3: Smart App Control + Defender network protection (policy at the OS edge)

Adjacent to the process tiers, the policy tier decides what binaries and what destinations are allowed at all. Smart App Control governs what runs; Defender network protection governs what the device talks to. These are not isolation primitives in the strict sense -- they are admission primitives. They settle the question before a partition ever has to answer it.

Tier 4: Hyper-V VM (Windows Sandbox, Windows Server Hyper-V isolation containers)

At the top of the cost curve sits the Hyper-V VM boundary -- the only tier whose worst-case escape pays the $250,000 bounty [@microsoft-com-hyper-v]. Windows Sandbox is the desktop face; Hyper-V isolation containers [@learn-microsoft-com-hyperv-container] on Windows Server ("each container runs inside of a highly optimized virtual machine and effectively gets its own kernel") are the server face. The 2026 stack uses this tier sparingly, on user gesture, for workloads where the cheaper tiers are not enough.

flowchart TD
    T1["Tier 1: Process mitigations<br/>ACG, CET, Edge ESM"]
    T2["Tier 2: AppContainer<br/>Capability-bound process sandbox"]
    T3["Tier 3: Policy admission<br/>Smart App Control + Defender network protection"]
    T4["Tier 4: Hyper-V VM<br/>Windows Sandbox / HV isolation containers"]
    T1 ~~~ T2 ~~~ T3 ~~~ T4
    T1 -.->|untrusted browsing| ESM2[Edge ESM]
    T2 -.->|modern apps| UWP[UWP / packaged apps]
    T3 -.->|admit/deny| SAC2[SAC + DNP]
    T4 -.->|on-demand detonation| WS2[Windows Sandbox]

The taxonomy is layered without being strictly linear. Read it as a four-question pipeline rather than a four-rung ladder. Tier 3 (Smart App Control + Defender network protection) decides what runs and what the device talks to -- admission, not containment. Whatever Tier 3 admits then lands inside Tier 2 (AppContainer), which constrains the binary's capabilities -- file, network, device, window, credential, and process scope. Inside that AppContainer-wrapped process, Tier 1 (ACG, CET, Edge ESM) applies process-internal mitigations that limit what a corrupted-memory exploit can do. Tier 4 (Hyper-V VM) is the last-resort kernel boundary: when the workload is hostile enough that the host kernel itself must be assumed reachable from the renderer, Windows Sandbox or a Hyper-V isolation container puts an entire NT kernel between the artifact and the user's data. WDAG used to plug a session-bound, transparent variant of Tier 4 underneath every Edge navigation; once Tier 1 hardening (ACG, CET, ESM) closed the marginal-security gap on the typical navigation, that session-bound Tier 4 paid more than it saved, and was deleted.
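The pipeline reading can be sketched as a function that returns the layers a workload passes through. The workload descriptor (`admitted`, `needsKernelBoundary`) and the function name are hypothetical, and checking the Tier 4 question first is an assumption made here to model the explicit user gesture; none of this is a Windows API.

```javascript
// Return the ordered list of layers applied to a workload.
// workload: { admitted: boolean, needsKernelBoundary: boolean } (hypothetical shape)
function isolationPlan(workload) {
  // Tier 4: last-resort kernel boundary, chosen on explicit user gesture.
  if (workload.needsKernelBoundary) {
    return ["Tier 4: Hyper-V VM (Windows Sandbox or isolation container)"];
  }
  // Tier 3: admission decides whether anything runs on the host at all.
  if (!workload.admitted) {
    return ["Tier 3: denied at admission"];
  }
  // Admitted workloads land in a capability sandbox with process mitigations inside it.
  return [
    "Tier 3: admitted",
    "Tier 2: AppContainer capability limits",
    "Tier 1: process mitigations (ACG, CET, Edge ESM)",
  ];
}

console.log(isolationPlan({ admitted: true, needsKernelBoundary: false }));
console.log(isolationPlan({ admitted: false, needsKernelBoundary: false }));
console.log(isolationPlan({ admitted: false, needsKernelBoundary: true }));
```

The point of the shape is that Tiers 1 to 3 compose on the same workload, while Tier 4 replaces the host-kernel path entirely.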

9. Engineering Takeaways

A few rules of thumb fall out of the architectural comparison.

Use the lowest tier whose boundary you actually need. If your threat is "this binary may try to escape the renderer," AppContainer + process mitigations suffice. If your threat is "this binary may try to escape the kernel," you need a Hyper-V partition; that means Sandbox or, for production, an isolation container. The Microsoft Servicing Criteria for Windows [@microsoft-com-servicing-criteria] lists which boundaries Microsoft commits to defending; anything not on that list is best treated as a defense-in-depth layer, not a primary boundary.

Disable shared channels first; renegotiate them only when forced. The default .wsb-less Sandbox enables networking and clipboard redirection -- per the configuration docs [@learn-microsoft-com-wsb-file], enabled networking "can expose untrusted applications to the internal network." For malware triage, build the paranoid .wsb from the snippet in section 2 and only loosen it when a specific analysis step requires it.
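As a rough illustration of "disable first, loosen later", the helper below emits a locked-down `.wsb` document as a string. `Networking`, `ClipboardRedirection`, and `VGpu` are elements from the Windows Sandbox configuration schema; the helper itself, its parameter names, and its defaults are assumptions for this sketch, and the section 2 snippet remains the reference configuration.

```javascript
// Build a Windows Sandbox configuration with shared channels disabled unless overridden.
// Each setting accepts "Enable", "Disable", or "Default".
function buildWsb({ networking = "Disable", clipboard = "Disable", vGpu = "Disable" } = {}) {
  return [
    "<Configuration>",
    `  <Networking>${networking}</Networking>`,
    `  <ClipboardRedirection>${clipboard}</ClipboardRedirection>`,
    `  <VGpu>${vGpu}</VGpu>`,
    "</Configuration>",
  ].join("\n");
}

// Paranoid default for malware triage.
console.log(buildWsb());

// Loosened for one analysis step that needs network access.
console.log(buildWsb({ networking: "Enable" }));
```

Saving the output with a `.wsb` extension and opening it launches Sandbox with those settings, so each relaxation is an explicit, reviewable change.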

{`
// A naive but useful decision tree. Print the recommended tier for a workload.
function pickTier(workload) {
  const w = workload.toLowerCase();
  if (w.includes("untrusted exe") || w.includes("malware")) {
    return "Tier 4: Windows Sandbox (on-demand, gesture-bound)";
  }
  if (w.includes("untrusted website") || w.includes("unfamiliar site")) {
    return "Tier 1: Edge Enhanced Security Mode + Tier 3: Defender network protection";
  }
  if (w.includes("untrusted document") || w.includes("office")) {
    return "Tier 3: Smart App Control + Office Protected View";
  }
  // Match "third-party" on its own so "third-party packaged app" is caught too.
  if (w.includes("third-party") || w.includes("uwp")) {
    return "Tier 2: AppContainer";
  }
  return "Default: ship the workload under Tier 2 AppContainer; escalate if the threat model justifies it.";
}

[
  "Detonate untrusted exe from email",
  "Open an unfamiliar site for research",
  "Open an untrusted Office document",
  "Run a third-party packaged app"
].forEach(w => console.log(w + " -> " + pickTier(w)));
`}

Avoid features whose own boundary is not a serviced boundary. A useful litmus test is to read the Microsoft Security Servicing Criteria for Windows [@microsoft-com-servicing-criteria] and ask whether the feature's own claimed isolation appears there. WDAG didn't; Hyper-V VM did. Designs that rely on a non-serviced isolation are more brittle, both technically (no bounty pressure on the boundary) and operationally (no commitment to fix integration bugs out-of-band).

Plan for deprecation of integration features, not of primitives. Microsoft retired WDAG; it did not retire HCS, Hyper-V isolation containers, or AppContainer. The primitives outlast the productizations. A codebase that depends on the primitives (calling HCS directly, wrapping a workload in AppContainer) is more durable than one that depends on a packaged feature (calling IIsolatedAppLauncher, relying on Network Isolation policy semantics) whose lifecycle is set by product economics.

Note: Workloads that traverse a Sandbox boundary leave specific telemetry. The host event log records Containers-DisposableClientVM start/stop events; HCS partition allocations are visible to ETW; mapped-folder access from inside the guest crosses a brokered channel that surfaces in host file-system filters. If an incident response playbook expects to see "Hyper-V partition created" or "container-disposable-client-vm started," those are the canonical signals from the substrate, not from Sandbox-specific telemetry.

10. Open Problems and Future Direction

The 2026 stack is good. It is not finished. Several open problems remain visible at the seams.

Transparent, gesture-priced isolation. WDAG's session-bound model failed; Sandbox's gesture-bound model is too explicit for everyday use. There is no current Windows feature that combines (a) automatic, policy-driven launch ("this URL is outside the trust zone, isolate it"), (b) Sandbox-style on-demand allocation, and (c) sub-second cold start. Each pair of those three is achievable -- WDAG had (a) and (b) but not (c); Sandbox has (b) and (c) but not (a); ESM has (a) and (c) but gives up the partition boundary. Closing all three simultaneously is the standing open problem.

Hardware-rooted attestation of the partition. Today's Hyper-V partition is anchored by VBS, which is anchored by Secure Boot, which is anchored by the platform firmware. Microsoft Pluton [@learn-microsoft-com-security-processor] -- "a secure crypto-processor built into the CPU... designed to provide the functionality of the Trusted Platform Module (TPM) and deliver other security functionality beyond what is possible with the TPM 2.0 specification" -- raises the floor on what a guest can attest about, opening a path to confidential-VM-style guarantees on the client. The shape of that integration with Sandbox is not yet public.

Confidential client VMs. On the server, Azure confidential VMs [@learn-microsoft-com-vm-overview] provide a "hardware-enforced boundary between your application and the virtualization stack" via AMD SEV-SNP and Intel TDX, with "secure key release with cryptographic binding between the platform's successful attestation and the VM's encryption keys." Whether that boundary -- guest memory unreadable by the host kernel -- ever shows up under client Windows Sandbox or Hyper-V is an open architectural question. If it does, it changes the Hyper-V VM threat model: a malicious host (or compromised host kernel) could no longer read guest memory, which would close a category of risk that currently sits outside the bounty scope.

AI-agent action containment. WDAG's specific shape -- a session-bound partition that transparently absorbed risky actions on behalf of a user -- is suggestive of an emerging problem: containing the actions of AI agents that take tool-using steps inside a user's session. Today's stack does not have a feature shaped quite like this. Sandbox is too explicit; AppContainer is too process-bound; ESM is browser-only; Smart App Control is admit/deny, not contain/observe. An "AI-agent action sandbox" would need WDAG's transparency without WDAG's resident-per-employee cost. The architectural question is whether the lessons of WDAG's retirement should make the next attempt look like Sandbox-with-a-policy-trigger or like AppContainer-with-a-stronger-boundary.

Shared-microarchitectural-state side channels. The Hyper-V VM boundary is a logical boundary. The CPU caches, branch predictors, and prefetch units are still shared with the host. Spectre-class side channels survive partition boundaries and survive confidential-VM boundaries; the canonical Meltdown/Spectre disclosure [@meltdownattack-com] frames the primitive as a class of transient-execution attacks against the shared microarchitectural state itself, and Microsoft's KB4072698 guidance for speculative-execution side-channel vulnerabilities [@support-microsoft-com-b632-0d96f30c8c8e] catalogs a long succession of advisories (ADV180002, ADV180012, ADV180018 L1TF, ADV190013 MDS, ADV220002 MMIO Stale Data, CVE-2022-23825 Branch Type Confusion, CVE-2022-0001 Branch History Injection, CVE-2023-20569 AMD Return Address Predictor) that each required Hyper-V-host mitigation to keep the partition boundary effective. Mitigations close known variants at the cost of performance, but the underlying primitive -- one core, two security domains -- does not change. The ideal sandbox -- sub-second cold start, single-digit-MB resident overhead, transparent policy launch, and a partition boundary that closes all microarchitectural channels -- remains unachievable on shared silicon. This is not a Windows-specific problem; it is the lower bound on what any client-side sandbox can deliver.

Key idea: The 2026 isolation stack is the right shape for the threats it was designed against: a malicious binary, an unfamiliar website, an untrusted document, a third-party app. It is not yet shaped for the threats that will dominate 2027 and beyond: confidential client compute, agent action containment, hardware-rooted attestation. Watching where the next partition-shaped primitive appears -- or fails to -- is how the architecture will continue to evolve.

Windows Sandbox and WDAG are the cleanest natural experiment Windows has run on Hyper-V isolation as a product. Same substrate, same partition primitive, same bounty-protected boundary; opposite lifecycle bindings, opposite threat models, opposite outcomes. The substrate survives because the substrate is the boundary; the productizations come and go because they are bets on how to spend that boundary's budget. WDAG bet on session-bound transparency and lost to a cheaper process-mitigation stack; Sandbox bet on gesture-bound disposability and remains the right answer for the workload it was designed for. The story is less about Hyper-V and more about lifecycle: when you allocate isolation matters as much as how much isolation you allocate.

Above Ring Zero: How the Windows Hypervisor Became a Security Primitive

noreply@paragmali.com (Parag Mali) — Sun, 10 May 2026 00:00:00 GMT

**The Windows hypervisor is the program that loaded before Windows did.** It runs at a privilege level the Windows kernel cannot reach and owns the page tables that decide which memory the Windows kernel may even see. Virtualization-Based Security, Credential Guard, HVCI (Memory Integrity in Windows Security), Application Control, VBS Enclaves, and System Guard Secure Launch are all built by composing five primitives the hypervisor exposes -- partitions, hypercalls, intercepts, SynIC, and per-VTL SLAT. The substrate is real, alive, and producing two to four public CVEs per year; the residual attack surface (firmware below, side channels above, IOMMU bypass beside, hypervisor rollback) is where Windows security still earns its hardest miles.

1. Above Ring Zero

On a Windows 11 machine with VBS turned on, a kernel-mode driver running with full Ring-0 privilege cannot read a single byte of the LSASS process's credential cache. It cannot load an unsigned driver. It cannot patch ntoskrnl.exe. It cannot disable HVCI without a reboot. None of this is enforced by Windows. It is enforced by a different program -- one that loaded before Windows did, that runs at a privilege level the Windows kernel cannot reach, and that owns the page tables that say which memory the Windows kernel may even see. That program is the Windows hypervisor [@ms-hyperv-architecture, @ms-tlfs-vsm].

The intuition this fact violates is older than most readers' careers. "SYSTEM owns the box." Every introductory security course teaches it. Local administrator escalates to SYSTEM, SYSTEM loads a driver, the driver runs in the kernel, and the kernel can do anything to the machine. That model is correct for a Windows installation running without Virtualization-Based Security. It is wrong, in three specific and load-bearing ways, for a Windows installation that has VBS turned on.

Virtualization-Based Security (VBS): A Windows security architecture that uses the Hyper-V hypervisor to create a small, isolated execution environment alongside the normal Windows operating system. The hypervisor allocates a portion of memory, configures its second-level page tables to make that memory unreadable and unwritable from normal kernel mode, and runs Microsoft-signed code there -- the Secure Kernel and isolated user-mode trustlets -- that the regular NT kernel cannot reach. Credential Guard, HVCI, Application Control, and System Guard all sit on top of this primitive [@ms-tlfs-vsm].

The binary in question is named hvix64.exe on Intel hosts and hvax64.exe on AMD hosts. It is loaded by hvloader.efi before winload.exe ever runs. By the time the Windows boot manager hands control to the NT kernel, the hypervisor has already configured the CPU's virtualization extensions, allocated its own private memory, taken ownership of the IOMMU, and set up the per-partition second-level page tables that decide which physical pages each partition can see [@ms-tlfs-pdf]. From the NT kernel's point of view, the machine starts up already inside a guest partition. There is no escape upward.

Loose security writing sometimes calls the hypervisor's privilege level "Ring -1." That phrase is colloquial. Intel's manuals say "VMX root operation"; AMD's manuals say "SVM host mode." Both terms denote a CPU operating mode that sits architecturally outside the four-ring privilege stack the guest OS sees, not a fifth ring inside it.

This article is about the program that loaded first. The siblings in this series -- on the Secure Kernel, on Credential Guard and NTLMless, on Secure Boot, and on Adminless -- all assume what this article explains. Each of them describes a policy: the Secure Kernel enforces code integrity; Credential Guard isolates LSASS; Adminless raises the bar on local administrator. None of those policies would be enforceable without a piece of software running at a privilege level the policy's adversary cannot reach. The hypervisor is that piece of software, and "security primitive" is how Microsoft, the security research community, and the bug-bounty market all describe its current role.

By the end of this article you will know five things. First, why the hypervisor became a security primitive -- the architectural failure of Ring-0 defenses that Microsoft fought for a decade and finally gave up on in 2015. Second, how it became one, in three steps: Popek and Goldberg's 1974 virtualizability theorem; Intel VT-x and AMD-V in 2005-2006; and David Hepkin and Arun Kishan's 2013 patent on hierarchical Virtual Trust Levels [@us9430642b2-patent]. Third, what it enforces, feature by feature, with the hypervisor primitive that backs each: HVCI rides on per-VTL SLAT; Credential Guard rides on SynIC plus the secure-call ABI; System Guard Secure Launch rides on DRTM [@ms-system-guard-secure-launch]. Fourth, where it has actually failed in public -- six worked CVEs across three distinct attack classes, all narrowly localized. Fifth, what is structurally outside its mandate: firmware below the hypervisor, microarchitectural side channels above it, IOMMU bypass beside it, and hypervisor rollback through the update pipeline.

The story is half engineering and half conceptual inversion. How did a server-consolidation hypervisor that shipped in 2008 with Windows Server 2008 -- a product whose original marketing pitch was "run more VMs per box" -- become the architectural substrate that protects every load-bearing Windows security boundary in 2026? The answer begins in 1974, with a paper that defined what a hypervisor even is. But the political and engineering thread begins in 2003, five years before Hyper-V shipped, in San Mateo, California.

2. Origins -- Connectix to Viridian to Hyper-V

Microsoft entered the virtualization market three years late and by acquisition. On February 19, 2003, the company bought Connectix, a small San Mateo software house founded in 1988 that had built Virtual PC for Macintosh and, later, Virtual PC for Windows. The Connectix engineers became the nucleus of what Microsoft would internally call the Windows Server Virtualization team. The acquired products shipped as Microsoft Virtual PC 2004 and Microsoft Virtual Server 2005. Both were Type-2 hypervisors -- user-mode applications that ran on top of Windows, using software techniques rather than CPU virtualization extensions, because the CPU virtualization extensions did not yet exist on shipping x86 hardware.

A hypervisor that runs directly on hardware rather than as an application on top of a host operating system. The hypervisor owns the CPU, the second-level page tables, and (in the security-relevant case) the IOMMU; guest operating systems run at a lower privilege level, in partitions or virtual machines that the hypervisor schedules and isolates. IBM's CP-67/CMS in 1968 is the genre's origin; VMware ESX, Xen, and the Microsoft hypervisor (`hvix64.exe`/`hvax64.exe`) are the modern examples [@wp-hypervisor].

In 2005, the team began a new project under the codename "Viridian." The goal was a Type-1 micro-kernelized hypervisor for x86-64 -- a fresh build, not a derivative of Virtual Server -- that required hardware virtualization extensions at install time. Intel's VT-x had shipped in November 2005 with the Pentium 4 662/672; AMD-V had shipped on May 23, 2006 with the Socket AM2 platform, initially available across Athlon 64 X2 and Athlon 64 FX and select Athlon 64 models. Both were now broadly enough deployed that Microsoft could make hardware virtualization a system requirement rather than a configuration option. Three years later, on June 26, 2008 (Wikipedia's body text gives this date; the infobox states June 28), Hyper-V reached RTM and was delivered as a Windows Server 2008 feature through Windows Update [@wp-hyperv].

Microsoft ships two hypervisor binaries: hvix64.exe for Intel hosts (using VT-x) and hvax64.exe for AMD hosts (using AMD-V). The instruction-set-architecture divergence is real -- Intel uses vmcall to enter the hypervisor; AMD uses vmmcall -- but the hypercall ABI surface above that single instruction is identical, so the rest of the Microsoft hypervisor codebase is shared between the two binaries.

The 2008 design choices are worth naming individually because the ones that mattered for server consolidation turned out, seven years later, to also be the ones that mattered for security. Three deserve flagging:

Micro-kernelized architecture. The hypervisor binary contains only the minimum machinery needed to virtualize the CPU, schedule VMs, and enforce memory isolation. It does not contain device drivers. It does not contain a network stack. It does not contain a filesystem.
Root partition plus child partitions. From the Microsoft architecture documentation: "The Microsoft hypervisor must have at least one parent, or root, partition, running Windows. The virtualization management stack runs in the parent partition and has direct access to hardware devices. The root partition then creates the child partitions which host the guest operating systems" [@ms-hyperv-architecture]. The root partition is a full Windows install; the child partitions are guest VMs.
VMBus, VSP, and VSC. Inter-partition I/O happens over the VMBus -- a paravirtualized message channel. A Virtualization Service Provider (VSP) runs in the root partition and owns the real device; a Virtualization Service Client (VSC) runs in each child partition and talks to the VSP over VMBus. Device emulation lives in the root partition's user-mode and kernel-mode code, not in the hypervisor binary itself. This is the choice that, seven years later, kept the hypervisor's Trusted Computing Base small enough to be defensible.

flowchart TD
    subgraph Root["Root partition (Windows Server)"]
        RD["Real device drivers"]
        VSP["Virtualization Service Providers"]
        VMM["VM Worker Processes (vmwp.exe)"]
    end
    subgraph Child1["Child partition 1 (guest OS)"]
        VSC1["Virtualization Service Clients"]
        Guest1["Guest kernel + apps"]
    end
    subgraph Child2["Child partition 2 (guest OS)"]
        VSC2["Virtualization Service Clients"]
        Guest2["Guest kernel + apps"]
    end
    HV["Microsoft Hypervisor (hvix64.exe / hvax64.exe)"]
    HW["Hardware (CPU, RAM, NIC, disk)"]
    Root -. VMBus .- Child1
    Root -. VMBus .- Child2
    Root --> HV
    Child1 --> HV
    Child2 --> HV
    HV --> HW

The micro-kernel, root-plus-child, and VMBus choices were defensible server engineering. The rationale was that emulating a NIC, a SCSI controller, or a graphics adapter inside the hypervisor binary would balloon the binary's size, lock its code-review cycles to those of every device the company shipped, and force the same security-critical code that scheduled CPUs to also handle Ethernet frame parsing. Putting device emulation in a normal Windows process inside the root partition -- the VM Worker Process vmwp.exe -- meant the hypervisor binary could stay small enough to reason about.
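The division of labor between VSC and VSP can be sketched in a few lines of Python. This is a toy model and not the VMBus wire protocol: the `Channel` class, the message dictionaries, the `DISK_SECTORS` constant, and the validation rules in `vsp_handle` are all hypothetical names invented for illustration. The only point it makes is structural: the child posts requests, the root-partition side validates every guest-controlled field before acting, and nothing resembling a device parser lives in the hypervisor.

```python
from collections import deque

class Channel:
    """Toy stand-in for one VMBus channel: two one-way queues."""
    def __init__(self):
        self.to_root = deque()    # child (VSC) -> root (VSP)
        self.to_child = deque()   # root (VSP) -> child (VSC)

DISK_SECTORS = 1024  # hypothetical virtual disk size, in sectors

def vsc_read(channel, sector, count):
    """Guest side: post a read request. Every field is guest-controlled."""
    channel.to_root.append({"op": "read", "sector": sector, "count": count})

def vsp_handle(channel):
    """Root side: validate one request, then answer it."""
    req = channel.to_root.popleft()
    sector, count = req.get("sector"), req.get("count")
    ok = (
        req.get("op") == "read"
        and isinstance(sector, int) and isinstance(count, int)
        and sector >= 0 and count > 0
        and sector + count <= DISK_SECTORS   # reject out-of-range reads
    )
    status = "ok" if ok else "rejected"
    channel.to_child.append({"status": status})
    return status

ch = Channel()
vsc_read(ch, sector=10, count=8)
first = vsp_handle(ch)                 # in range
vsc_read(ch, sector=1020, count=64)    # runs past the end of the disk
second = vsp_handle(ch)
```

If the bounds check in `vsp_handle` were missing, the bug would sit in root-partition code rather than in the hypervisor, which is the containment property the 2008 split was buying.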

The 2008 design goal was, again, server consolidation. Microsoft's positioning materials at the time named "run more VMs per box, get better hardware use" as the customer pitch. Nothing in the 2008 Hyper-V documentation describes the hypervisor as a security primitive for the host OS. The security re-purposing -- the moment Hyper-V's hardware-privilege isolation became the way Windows itself protected its own kernel from itself -- did not arrive until 2015. To understand why it arrived at all, we have to back up thirty-four years to a 1974 paper that defined what virtualization formally requires.

3. The Theoretical Anchor -- Popek, Goldberg, and SLAT

Before Microsoft could build a hypervisor that ran security-critical code at a higher privilege than the Windows kernel, two unrelated decisions had to land. One was made in 1974, by two researchers who would never see Windows. The other was made in 2005, by Intel.

In July 1974, Gerald Popek of UCLA and Robert Goldberg of Harvard published "Formal Requirements for Virtualizable Third Generation Architectures" in Communications of the ACM. The paper laid down three properties any "true" virtual machine monitor must satisfy:

Equivalence. Programs run on the VMM exhibit behavior essentially identical to behavior on the bare machine, except for differences due to timing and resource availability.
Resource control. The VMM, not the guest, controls the system resources -- CPU time slices, memory, devices.
Efficiency. A statistically dominant subset of the instruction stream executes directly on hardware, without VMM intervention.

The theorem that gave the paper its lasting reputation followed from those properties. Let a sensitive instruction be one that either reads or modifies privileged state (the processor's mode bits, page-table base register, interrupt mask). Let a privileged instruction be one that traps when executed in user mode. Then a sufficient condition for an ISA to be virtualizable is that every sensitive instruction is privileged. The intuition is simple: the VMM must get a chance to see -- and to handle -- every guest action that touches the machine's privileged state. If the CPU silently lets the guest do something privileged-feeling without trapping, the VMM cannot maintain equivalence and control simultaneously.

A property of a processor architecture: every sensitive instruction in the instruction set is privileged. An architecture with this property can be virtualized "classically" -- with a thin trap-and-emulate hypervisor whose only entry points are the traps the CPU raises on privileged-instruction violations. An architecture without this property requires software workarounds (binary translation, paravirtualization) or hardware extensions (VT-x, AMD-V) before a Popek-Goldberg-style VMM can be built.

For three decades, x86 was famously not virtualizable in the Popek-Goldberg sense. John Robin and Cynthia Irvine enumerated the problem in their 2000 USENIX Security paper: seventeen protected-mode instructions on the IA-32 architecture either read or modified privileged state without trapping from user mode.

The Robin and Irvine enumeration includes instructions like SGDT (store global descriptor table register), SIDT (store interrupt descriptor table register), SLDT (store local descriptor table register), SMSW (store machine status word), and PUSHF/POPF (push/pop flags including IOPL). Each of these silently returned or accepted privileged state from user mode without raising a fault. The aggregate effect was that no classical Popek-Goldberg VMM could correctly virtualize an unmodified x86 guest -- every one of those seventeen instructions was a hole the VMM could not see through.

VMware Workstation, released in 1999 by VMware Inc. (which had been founded the year prior by Mendel Rosenblum, Diane Greene, Scott Devine, Ellen Wang, and Edouard Bugnion), worked around the problem with binary translation: it dynamically rewrote each protected-mode guest instruction stream to substitute or trap the seventeen offenders. The technique imposed double-digit overhead, made debugging miserable, and was a security liability in its own right -- the binary translator itself was a parser of arbitrary attacker-controlled code.
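The Popek-Goldberg condition is a subset test, and it can be written down directly. The sketch below is a deliberately tiny model: the `ISA` table holds only the six instructions named above plus two contrasting entries (`MOV_CR3` and `ADD`) added here for illustration, and the function names are hypothetical. It is not a description of the full seventeen-instruction enumeration.

```python
# Toy ISA description: does the instruction touch privileged state
# ("sensitive"), and does it trap in user mode ("privileged")?
ISA = {
    "SGDT":    {"sensitive": True,  "privileged": False},
    "SIDT":    {"sensitive": True,  "privileged": False},
    "SLDT":    {"sensitive": True,  "privileged": False},
    "SMSW":    {"sensitive": True,  "privileged": False},
    "PUSHF":   {"sensitive": True,  "privileged": False},
    "POPF":    {"sensitive": True,  "privileged": False},
    "MOV_CR3": {"sensitive": True,  "privileged": True},   # contrast: traps
    "ADD":     {"sensitive": False, "privileged": False},  # contrast: harmless
}

def holes(isa):
    """Sensitive instructions the VMM never gets to see."""
    return sorted(n for n, f in isa.items()
                  if f["sensitive"] and not f["privileged"])

def classically_virtualizable(isa):
    """Sufficient condition: every sensitive instruction is privileged."""
    return not holes(isa)

legacy_holes = holes(ISA)
legacy_ok = classically_virtualizable(ISA)

# Model the hardware fix: every sensitive instruction can now be intercepted.
fixed = {n: {**f, "privileged": f["privileged"] or f["sensitive"]}
         for n, f in ISA.items()}
fixed_ok = classically_virtualizable(fixed)
```

Binary translation and hardware extensions are two different ways of making `holes` return an empty list: the first rewrites the offenders in software, the second lets the CPU trap them.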

Intel and AMD ended the problem in hardware. Intel VT-x (codename Vanderpool, November 2005) and AMD-V (codename Pacifica, May 2006) added a new CPU mode -- VMX root operation for Intel, SVM host mode for AMD -- and a new instruction-interception mechanism. A VM exit could be configured to fire on every sensitive instruction the hypervisor wished to intercept, transferring control to the host with a structured exit reason and an opaque, host-controlled snapshot of guest state. After 2006, x86-64 became Popek-Goldberg-virtualizable in hardware [@wp-x86-virtualization].

sequenceDiagram
    participant Guest as Guest OS (VMX non-root)
    participant CPU as CPU hardware
    participant HV as Hypervisor (VMX root)
    Guest->>CPU: MOV CR3, rax (sensitive instr)
    CPU->>HV: VM-EXIT (reason 28: CR access)
    HV->>HV: Read VMCS exit-qualification
    HV->>HV: Validate, emulate, update SLAT
    HV->>CPU: VMRESUME
    CPU->>Guest: Continue guest at next instruction
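The exit-and-resume cycle in the diagram can be modeled as a small interpreter loop. This is a sketch under loud assumptions: the two-instruction "guest ISA" (`ADD`, `MOV_CR3`), the `INTERCEPTED` set, and the `Hypervisor` class are hypothetical, and a real VM exit is a hardware mode switch rather than a Python method call. What the model preserves is the control flow: ordinary instructions run directly, intercepted ones hand control to the hypervisor, and the guest resumes at the next instruction.

```python
INTERCEPTED = {"MOV_CR3"}   # ops the hypervisor asked the CPU to exit on

class Hypervisor:
    def __init__(self):
        self.exits = []        # log of exit reasons
        self.guest_cr3 = None  # virtualized register, held on the guest's behalf

    def handle_exit(self, op, arg):
        """Validate the guest's request, then emulate its effect."""
        self.exits.append(op)
        if op == "MOV_CR3":
            if arg % 4096 != 0:
                raise ValueError("page-table base must be page-aligned")
            self.guest_cr3 = arg

def run_guest(program, hv):
    acc, direct = 0, 0
    for op, arg in program:
        if op in INTERCEPTED:
            hv.handle_exit(op, arg)   # "VM exit", then resume
        elif op == "ADD":
            acc += arg                # runs directly, no hypervisor involved
            direct += 1
        else:
            raise NotImplementedError(op)
    return acc, direct

hv = Hypervisor()
program = [("ADD", 1), ("ADD", 2), ("MOV_CR3", 0x1000), ("ADD", 3)]
acc, direct = run_guest(program, hv)
```

Three of the four instructions never involve the hypervisor, which is the efficiency property; the one that touches privileged state always does, which is the resource-control property.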

One more architectural element was needed before any of this could be a security primitive rather than just a virtualization primitive. Classical x86 paging maps a virtual address to a physical address through a single CPU-walked page table. In a virtualized system that single table cannot be enough, because the guest needs its own virtual-to-physical map and the host needs to remap the guest's "physical" address to a real machine-physical address. Hypervisors running on the first generations of VT-x simulated this two-level mapping in software through shadow page tables, which the hypervisor had to maintain alongside the guest's tables on every page-table edit. Shadow paging was correct but slow, and it gave the hypervisor no clean way to enforce a different memory map for different parts of the same guest.

Second-Level Address Translation (SLAT) -- Intel's Extended Page Tables (EPT, shipped with Nehalem in November 2008) and AMD's Nested Page Tables (NPT, shipped with the Barcelona-generation Opteron on September 10, 2007) -- solved both problems in hardware. The guest walks its own page table from virtual to "guest physical"; the CPU then walks a second, hypervisor-owned page table from "guest physical" to "system physical." Two key properties follow. First, the hypervisor has exclusive control of the second-level mapping; the guest cannot read, write, or even know that it exists. Second, because the second-level mapping is per-partition, the hypervisor can give two partitions different views of the same machine physical memory -- the same page can be readable in one partition and entirely absent in another.

A hardware feature on Intel (EPT) and AMD (NPT) CPUs that lets the hypervisor maintain a second page table mapping guest-physical addresses to system-physical addresses. The CPU walks the guest's own page table for the virtual-to-guest-physical mapping, then walks the hypervisor's table for the guest-physical-to-system-physical mapping. Because the second table is hypervisor-controlled and per-partition, the hypervisor can give different partitions -- and, in VBS, different Virtual Trust Levels inside the same partition -- different views of physical memory. SLAT is the bedrock of VTL memory protection [@ms-tlfs-pdf].
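A two-stage walk is easy to show with flat dictionaries standing in for the page tables. This is a minimal sketch, assuming single-level tables keyed by page number; real x86 tables are multi-level radix trees with permission bits, and the names `walk`, `translate`, `slat_a`, and `slat_b` are hypothetical.

```python
PAGE = 4096

def walk(table, addr):
    """Translate addr through a {page_number: page_number} table."""
    page, offset = divmod(addr, PAGE)
    if page not in table:
        raise LookupError(f"no mapping for page {page:#x}")
    return table[page] * PAGE + offset

def translate(guest_pt, slat, gva):
    gpa = walk(guest_pt, gva)   # stage 1: guest-owned, virtual -> guest physical
    return walk(slat, gpa)      # stage 2: hypervisor-owned, guest physical -> system physical

# Guest virtual page 0x10 maps to guest physical page 0x2.
guest_pt = {0x10: 0x2}

# Two partitions with different second-level views of guest physical page 0x2.
slat_a = {0x2: 0x7F}   # present: backed by system physical page 0x7F
slat_b = {}            # absent: this partition cannot reach the page at all

spa = translate(guest_pt, slat_a, 0x10123)
```

The guest only ever edits `guest_pt`. Whether an access lands on real memory is decided by the second table, which it can neither read nor write.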

Hyper-V required VT-x or AMD-V at install time from day one. SLAT became mandatory with Windows Server 2016 and Windows 10 1607 [@ms-hyperv-architecture].

Popek and Goldberg gave us the property. Intel and AMD gave us the hardware. Microsoft used both to build a server hypervisor in 2008. But for the first seven years of Hyper-V's life, none of that machinery protected Windows from itself. Microsoft hadn't yet noticed the architectural problem that made it necessary -- or rather, they had noticed the problem (PatchGuard's bypass record was public) and had not yet conceded that the problem was structural. The concession came in 2015. What forced it was the same-privilege paradox.

4. The Same-Privilege Paradox -- Why PatchGuard Was Never Enough

PatchGuard, which Microsoft shipped in 2005 with Windows Server 2003 SP1 x64, ran inside ntoskrnl.exe at Ring 0 and scanned a curated list of kernel structures -- the system service dispatch table, the interrupt descriptor table, the kernel image's .text section -- at randomized intervals to detect tampering. Within months, bypasses were documented in Skywing's Uninformed writeups. Microsoft kept shipping it. Researchers kept bypassing it. The pattern lasted a decade. The reason is not that PatchGuard's authors were sloppy [@wp-kpp]. The reason is structural, and naming it correctly is the first of the three insights this article is built around.

Key idea: Any defense reachable by mov from Ring 0 is defeasible by mov from Ring 0.

The intuition is simple. PatchGuard is a piece of code. It lives in the kernel's virtual address space at some page. It owns a timer that re-runs it periodically. It maintains a randomization seed for which structures it checks next. It has a callback path into KeBugCheckEx if it detects tampering. Every one of those four assets -- the code page, the timer callback, the randomization seed, the bug-check path -- is a kernel data structure or a kernel virtual address. An attacker with Ring-0 code execution can locate each of them by searching the same kernel address space PatchGuard searches. They can patch the callback so the timer no-ops. They can patch the seed so the randomization is predictable. They can patch the bug-check path so it reports success. They can do all of this with a sequence of plain mov instructions. PatchGuard cannot defend against this, because PatchGuard's defenses live in the same place its attacker's writes do.

PatchGuard and its attacker are colleagues, not adversaries. They share an office. The office is `ntoskrnl.exe`'s virtual address space, and there is no key on the door.

This is the same-privilege paradox. It is not an implementation bug. It does not yield to better obfuscation, more randomization, or harder-to-find timers. It is an architectural ceiling. A defense at privilege level $P$ cannot be enforced against an attacker who also runs at privilege level $P$, because the defender's state lives in the attacker's address space. The defender can be made expensive to find; it cannot be made impossible to find, because the attacker has the same instructions, the same address-space view, and the same MMU privileges as the defender.

Note: The same-privilege paradox is a property of where the defense lives, not of how clever the defense is. PatchGuard's authors did add randomization. They did add multiple decoy callbacks. They did add cryptographically derived integrity checks. None of those additions changes the basic fact that the attacker, holding the same Ring-0 privilege, can locate and edit each of them. The architectural fix is not better PatchGuard. The architectural fix is moving the defender to a privilege level the attacker cannot reach.
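The paradox can be reduced to a dozen lines. The model below is an illustration, not a description of PatchGuard's internals: the `Memory` class, the `WATCHDOG_ENABLED` address, and the `"ring0"` / `"hypervisor"` writer labels are all hypothetical. It contrasts a defender whose state sits in ordinary kernel memory with one whose state sits behind a write check the attacker's privilege level cannot satisfy.

```python
class Memory:
    def __init__(self):
        self.cells = {}
        self.hv_protected = set()   # addresses only the hypervisor may write

    def write(self, addr, value, writer):
        if addr in self.hv_protected and writer != "hypervisor":
            raise PermissionError(f"{writer} may not write {addr:#x}")
        self.cells[addr] = value

WATCHDOG_ENABLED = 0x1000   # hypothetical location of the defender's state

def attack(mem):
    """A Ring-0 attacker tries to switch the defender off with a plain write."""
    try:
        mem.write(WATCHDOG_ENABLED, False, writer="ring0")
        return "defense disabled"
    except PermissionError:
        return "write blocked"

# Case 1: defender and attacker share a privilege level.
same_priv = Memory()
same_priv.write(WATCHDOG_ENABLED, True, writer="ring0")
result_same = attack(same_priv)

# Case 2: the defender's state is guarded from a level above Ring 0.
guarded = Memory()
guarded.write(WATCHDOG_ENABLED, True, writer="hypervisor")
guarded.hv_protected.add(WATCHDOG_ENABLED)
result_guarded = attack(guarded)
```

No amount of obfuscation changes the outcome of the first case; only the check in `write`, enforced by something the attacker cannot edit, changes the second.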

Once the paradox is named, the defender's choice is binary. Either give up on having a defense at all -- treat Ring 0 as a free-fire zone where any malware that gets there has won -- or move the defender to a privilege level above Ring 0, at a hardware boundary the attacker's mov instructions cannot cross. Microsoft picked the second. It is the only architecturally honest choice.

To make it work, Microsoft needed three things. The first was a hypervisor already deployed on every Windows install. They had that since 2008. The second was a way to put a piece of Windows itself -- code, data, secrets -- inside the hypervisor's protection without spawning a separate VM, because spawning a separate VM doubles the system's resource cost and forces every Windows process to choose between living on the normal side or the secure side. That required an architectural idea that did not yet exist in 2010: a way to split a single partition into two privilege levels, each with its own SLAT mapping and its own register state. The third was a way to ensure the hypervisor itself could not be silently replaced or rolled back beneath the OS. That required a hardware-rooted measurement -- a DRTM event -- that the OS could attest to.

The architectural idea is the subject of section 6. The DRTM measurement is the subject of section 11. Both of them required a decade-long conversation about whether the hypervisor itself could be trusted at all -- a conversation that ran in parallel during the same years and that briefly seemed to argue the opposite case. We turn to that conversation next.

5. The Hyperjacking Era -- SubVirt, Blue Pill, and CloudBurst

While Microsoft was finishing Hyper-V, the security community was establishing that a hypervisor was not just a defense -- it was also the most powerful possible attacker against the OS sitting above it. Three demonstrations in three years made the point unmistakable.

SubVirt. In May 2006, Samuel King and Peter Chen at the University of Michigan, joined by Yi-Min Wang, Chad Verbowski, Helen Wang, and Jacob Lorch at Microsoft Research, presented "SubVirt: Implementing Malware with Virtual Machines" at IEEE S&P [@king-subvirt-2006]. Their construction was a Virtual Machine Based Rootkit (VMBR). A privileged installer running inside a legitimate OS installed a malicious VMM at boot time; on the next reboot, the malicious VMM ran first, brought up the original OS as a guest on top of it, and gained the privileged position of seeing every CPU instruction, every memory access, and every I/O the OS performed. The original OS had no architectural way to tell it was no longer the most-privileged software on the box. SubVirt was demonstrated against Windows XP (using Microsoft Virtual PC as the malicious VMM substrate) and against Linux (using VMware Workstation), specifically to show that the technique was not tied to any one operating system or any one hypervisor product.

Blue Pill. Three months later, at Black Hat USA 2006, Joanna Rutkowska of COSEINC demonstrated "Subverting Vista Kernel for Fun and Profit" [@wp-blue-pill]. Her tool, codenamed Blue Pill, took a step beyond SubVirt by doing the VMM insertion at runtime rather than at boot. The technique: a Ring-0 driver, running inside an already-booted Windows install on an AMD-V capable host, executed VMRUN against an attacker-controlled Virtual Machine Control Block (VMCB) whose initial state matched the current physical CPU. The CPU dropped out of SVM host mode and re-entered as a guest under the attacker's VMM. The OS continued running normally, with no boot-loader modification and no reboot.

In 2007, Rutkowska and Alexander Tereshkin returned to Black Hat USA with the more polished "IsGameOver() Anyone?" presentation, refining the technique and addressing the early critics' detection ideas [@wp-blue-pill].

Rutkowska's marketing claim that Blue Pill was "100% undetectable" attracted a public counter-effort: in 2007, Edgar Barbosa, Nate Lawson, Peter Ferrie, and Tom Ptacek all proposed detection techniques relying on side channels (timing artifacts of trapped instructions, TSC skew, structural differences in how RDTSC behaves under VT-x). The claim softened in subsequent publications, but the underlying point survived: a hostile thin hypervisor below a victim OS can be made arbitrarily difficult to detect from inside that OS, and the only architecturally clean way to know what you are running under is to measure the boot chain before the OS starts.

CloudBurst. At Black Hat USA 2009, Kostya Kortchinsky of Immunity Inc. presented CLOUDBURST. It was the first publicly demonstrated arbitrary-code-execution guest-to-host escape against a commercial hypervisor: a heap overflow in VMware's emulated SVGA-II graphics adapter, tracked as CVE-2009-1244 [@nvd-cve-2009-1244]. A guest VM, executing entirely inside a VMware-managed user-mode process on the host, could overflow a buffer in that process and gain host code execution. CloudBurst's lasting operational lesson was not the specific bug but the attack surface: device emulation -- not the trap-and-emulate core of the hypervisor -- is the largest piece of guest-attacker-controlled code in any commercial VMM. Every Hyper-V guest-to-host escape Microsoft has shipped a patch for since 2018 lands in either this device-emulation surface or the hypercall input-validation surface that mediates the same kinds of structured guest-controlled input.

flowchart TD
    subgraph Before["Before hyperjacking"]
        OS1["Victim OS"]
        FW1["Firmware (UEFI)"]
        HW1["Hardware"]
        OS1 --> FW1
        FW1 --> HW1
    end
    subgraph After["After hyperjacking"]
        OS2["Victim OS (now a guest)"]
        VMM["Hostile VMM (SubVirt / Blue Pill)"]
        FW2["Firmware (UEFI)"]
        HW2["Hardware"]
        OS2 --> VMM
        VMM --> FW2
        FW2 --> HW2
    end

The three demonstrations established a difficult dual truth. The hypervisor is the most powerful defender against an OS-level attacker, and it is the most powerful attacker against an OS-level defender. The same primitive can play either role; which role it plays in any given system depends only on whose hypervisor it is and whether the OS above it can prove that. SubVirt-style attacks did not require Microsoft to invent anything new -- they only had to be a possibility -- to force Microsoft into a design constraint: any "hypervisor as security primitive" architecture has to start by being the only hypervisor on the box, with a measurement of the hypervisor binary recorded in a TPM platform configuration register so that any malicious VMBR underneath could be detected at attestation time. This is the role that System Guard Secure Launch (DRTM) plays in the architecture, and we will return to it in section 11.
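Why a recorded measurement exposes an inserted VMM follows from how a TPM platform configuration register is updated: the new value is a hash of the old value concatenated with the digest of whatever was just loaded, so the register encodes the whole sequence. The sketch below assumes a SHA-256 bank and a register that starts at all zeros; the component byte strings and the `measure_chain` helper are hypothetical stand-ins, and a real DRTM launch involves CPU-specific instructions and event logs that are not modeled here.

```python
import hashlib

def extend(pcr: bytes, component: bytes) -> bytes:
    """PCR extend: new = H(old || H(component))."""
    digest = hashlib.sha256(component).digest()
    return hashlib.sha256(pcr + digest).digest()

def measure_chain(components) -> bytes:
    pcr = bytes(32)               # assumed initial register value
    for blob in components:
        pcr = extend(pcr, blob)
    return pcr

# Hypothetical boot chains, as opaque byte strings.
expected = measure_chain([b"firmware", b"hypervisor-v1"])
hijacked = measure_chain([b"firmware", b"hostile-vmm", b"hypervisor-v1"])
swapped  = measure_chain([b"hypervisor-v1", b"firmware"])
```

A verifier that knows `expected` rejects both other chains. The hostile VMM cannot undo its own measurement, because there is no operation that sets the register to a chosen value, only one that folds more data into it.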

Blue Pill (offense) and VBS (defense) are architecturally identical. Each is a thin Type-1 hypervisor that interposes between firmware and OS. Each owns the CPU's virtualization mode, the second-level page tables, and the IOMMU. Each is invisible to the OS unless the OS can prove what is underneath it. The only differences between them are whose hypervisor it is, whether it was measured at load time, and what it does with its privilege. The defense is the offense, run by the right people, in the right order, and attested to.

By 2010 the security community had agreed: the hypervisor is the most powerful primitive in the system, and whoever owns the SLAT page tables owns the box. Joanna Rutkowska's Invisible Things Lab launched Qubes OS, an explicitly hypervisor-rooted security OS, on April 7, 2010 [@qubes-introducing-2010]. Microsoft owned the SLAT page tables. They had a hypervisor on every Windows install. They had a server-consolidation product. What they did not yet have was a reason to re-purpose any of it for security. The reason was already being filed at the United States Patent and Trademark Office. The priority date was September 17, 2013.

6. The Pivot -- VSM, VTLs, and the Hepkin-Kishan Patent

On September 17, 2013, David Hepkin and Arun Kishan filed United States patent application 14/186,415, which would issue on August 30, 2016 as US Patent 9,430,642 B2 [@us9430642b2-patent]. The patent's title, "Providing virtual secure mode with different virtual trust levels," reads like marketing now because the words it introduced -- "Virtual Trust Level," "VTL," "Virtual Secure Mode" -- became Microsoft's own canonical terminology. In 2013 the words did not exist. The patent describes, in 2013, exactly what Microsoft shipped twenty-two months later in Windows 10 build 10240 [@ms-tlfs-vsm].

The patent's claim language is unusually specific. It teaches a virtual-machine manager that makes "multiple different virtual trust levels available to virtual processors of a virtual machine"; it teaches that "different memory access protections (such as the ability to read, write, and/or execute memory) can be associated with different portions of memory (e.g., memory pages) for each virtual trust level"; and it teaches that "the virtual trust levels are organized as a hierarchy with a higher level virtual trust level being more privileged than a lower virtual trust level." Each of those phrases is now a feature of the shipping Microsoft hypervisor.

A hypervisor-managed privilege level inside a single partition. Each VTL has its own SLAT mapping (so the same machine page can be readable in one VTL and absent in another), its own virtual-processor register state (so a VTL transition is a context switch, not a procedure call), and its own interrupt subsystem (so interrupts targeted at one VTL do not preempt code running in another). VTLs are hierarchical: a higher VTL can read all of a lower VTL's memory, but not vice versa. The shipping Microsoft hypervisor implements two VTLs (VTL0 = Normal world, VTL1 = Secure world); the architecture admits up to sixteen [@ms-tlfs-vsm].

Windows 10, released on July 29, 2015, and Windows Server 2016 both shipped VBS atop the existing Hyper-V hypervisor [@wp-windows-10]. The architectural innovation -- the thing the patent was for -- was that VTL0 (Normal world, containing the NT kernel, user mode, and LSASS) and VTL1 (Secure world, containing the Secure Kernel and Isolated User Mode trustlets) ran inside the same partition rather than in two separate partitions. VBS is not a second VM. It is a per-VTL SLAT split inside the root partition, plus a per-VTL register-state snapshot, plus a per-VTL interrupt delivery surface. The hypervisor switches SLAT contexts on VTL transitions, exactly as it would switch SLAT contexts on a partition switch -- but the switch happens inside a single partition's address space, so there is no extra VM scheduling and no extra OS image to manage.

flowchart TD
    subgraph Root["Root partition"]
        subgraph VTL0["VTL0 -- Normal world"]
            NT["NT kernel (ntoskrnl.exe)"]
            User["User mode (lsass.exe, applications)"]
        end
        subgraph VTL1["VTL1 -- Secure world"]
            SK["Secure Kernel (securekernel.exe)"]
            IUM["Isolated User Mode trustlets"]
            LSAISO["LSAISO.EXE"]
            VTPM["vTPM trustlet"]
            IUM --- LSAISO
            IUM --- VTPM
        end
    end
    HV["Microsoft Hypervisor (hvix64 / hvax64)"]
    HW["Hardware (CPU, RAM, IOMMU, TPM)"]
    VTL0 -. "Secure call (hypercall + SynIC)" .-> VTL1
    VTL1 --> HV
    VTL0 --> HV
    HV --> HW

The Hyper-V Top-Level Functional Specification, chapter 15, names the architectural facts verbatim. "VSM achieves and maintains isolation through Virtual Trust Levels (VTLs). VTLs are enabled and managed on both a per-partition and per-virtual processor basis." "Virtual Trust Levels are hierarchical, with higher levels being more privileged than lower levels." "Architecturally, up to 16 levels of VTLs are supported; however a hypervisor may choose to implement fewer than 16 VTL's. Currently, only two VTLs are implemented." The C-level definition #define HV_NUM_VTLS 2 is published in the same specification [@ms-tlfs-vsm]. Two VTLs are what ships; the architecture has room for more.
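The per-VTL memory view can be sketched as one permission map per trust level inside a single partition. Only `HV_NUM_VTLS` comes from the specification quoted above; the `Partition` class, its method names, and the page numbers are hypothetical, and the real interface is a set of hypercalls rather than Python methods. The sketch shows two things: the same page can carry different rights in VTL0 and VTL1, and only the higher level may change the lower level's rights.

```python
HV_NUM_VTLS = 2   # value quoted from the TLFS

class Partition:
    def __init__(self, pages):
        full = frozenset("rwx")
        # One second-level permission map per VTL: page -> allowed access kinds.
        self.slat = [{p: full for p in pages} for _ in range(HV_NUM_VTLS)]

    def set_protection(self, caller_vtl, target_vtl, page, perms):
        """Hierarchy rule: a VTL may only restrict levels below itself."""
        if not caller_vtl > target_vtl:
            raise PermissionError("caller VTL is not above the target VTL")
        self.slat[target_vtl][page] = frozenset(perms)

    def allowed(self, vtl, page, kind):
        return kind in self.slat[vtl][page]

SECRET_PAGE, CODE_PAGE = 0x100, 0x200   # hypothetical page numbers
part = Partition([SECRET_PAGE, CODE_PAGE])

# VTL1 hides a secrets page from VTL0 entirely (Credential Guard style).
part.set_protection(1, 0, SECRET_PAGE, "")
# VTL1 makes a code page read+execute but not writable in VTL0 (HVCI style).
part.set_protection(1, 0, CODE_PAGE, "rx")
```

A VTL0 kernel that calls `set_protection(0, 0, ...)` to restore its own write access gets a `PermissionError`, which is the model's version of the NT kernel being unable to edit the mapping that constrains it.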

VSM enables operating system software in the root and guest partitions to create isolated regions of memory for storage and processing of system security assets. Access to these isolated regions is controlled and granted solely through the hypervisor, which is a highly privileged, highly trusted part of the system's Trusted Compute Base (TCB). -- Microsoft, *Hyper-V Top-Level Functional Specification*, chapter 15 [@ms-tlfs-vsm]

This is the second insight the article is built around: VBS is not a re-architecture. It is a re-purposing. The hypervisor was already on every Windows install for unrelated reasons. The 2015 pivot did not require new hardware, new VMs, or new CPUs. It required a new way to organize what was already there -- two SLAT mappings instead of one, two register snapshots instead of one, a secure-call ABI on top of the SynIC -- and a Windows-side Secure Kernel binary to run inside the new VTL1 view. The patent gave the design its formal expression; the engineering had been waiting since 2008 for the right architectural insight.

David Hepkin spent over a decade on the NT kernel architecture team before the VSM design; Arun Kishan was an NT kernel architect and is now Microsoft's Corporate Vice President for the Operating Systems Platform group. Neither is a virtualization specialist by background. Their patent is, in retrospect, a kernel-team idea about how to put a piece of the kernel itself behind a hardware boundary the kernel cannot cross -- exactly the kind of design that an architect who had lived inside ntoskrnl.exe for years would invent.

Alex Ionescu's Black Hat USA 2015 deck "Battle of SKM and IUM: How Windows 10 Rewrites OS Architecture" reverse-engineered the entire VSM stack within four weeks of Windows 10 RTM [@ionescu-bh-2015]. The vocabulary Ionescu introduced has become the canonical research language for talking about VBS: VTL as "synthetic ring level managed by the hypervisor"; trustlets for the user-mode processes that run inside VTL1's Isolated User Mode; Signature Level 12 plus the IUM EKU 1.3.6.1.4.1.311.10.3.37 as the loader's signing requirement. Microsoft's own developer documentation now uses the same terms [@ms-iso-user-mode-trustlets].

The pivot, then, was not a sudden re-architecture. It was the cash-out of a deliberate multi-year engineering plan that began at least twenty-two months before Windows 10 RTM. To see what VBS actually enforces -- and which hypervisor primitive backs each piece of that enforcement -- we need to walk the hypervisor's public surface. There are five surfaces. They are the architectural body of the article.

7. Architecture Tour -- The Hypervisor's Public Surface

What does the Windows hypervisor actually look like as a piece of software? It is a small kernel, on the order of one to two hundred thousand lines of C and C++ by community estimate; Microsoft has not published a primary line count. It has five externally visible surfaces, all of which are documented in the Hyper-V Top-Level Functional Specification (TLFS) v6.0b [@ms-tlfs-pdf]. We walk them in turn.

7.1 Partitions, VMBus, and the VSP/VSC pair

A partition is the hypervisor's unit of isolation. From the Microsoft architecture page: "The Microsoft hypervisor must have at least one parent, or root, partition, running Windows. The virtualization management stack runs in the parent partition and has direct access to hardware devices. The root partition then creates the child partitions which host the guest operating systems" [@ms-hyperv-architecture]. The root partition is a full Windows install with privileged hypercalls and direct access to hardware; each child partition is a guest VM with only the hardware the root has chosen to expose.

A guest VM does I/O over the VMBus. A network packet, for example, travels from the guest application down to the guest's Windows NDIS stack; through the synthetic NIC miniport driver (the VSC) in the guest's kernel; over the VMBus message channel; into the network VSP in the root partition; into the root's real NDIS stack; into the physical NIC driver; out the wire. The hypervisor's role in this chain is structural: it owns the VMBus message channel, the SynIC interrupts that notify the VSP and VSC of new traffic, and the per-partition SLAT mappings that decide which bytes either side can read.
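The transport in that path is the two-ring-buffers-per-channel design mentioned in the introduction. A minimal sketch of the idea follows, assuming a fixed number of packet slots per ring. The `Ring` and `Channel` classes are hypothetical illustrations; the real VMBus ring layout, packet headers, and SynIC signaling are not modeled.

```javascript
// Toy model of one VMBus-style channel: two single-producer/single-consumer
// rings, one per direction. All names here are hypothetical illustrations.
class Ring {
  constructor(capacity) {
    this.slots = new Array(capacity);
    this.capacity = capacity;
    this.readIndex = 0;  // advanced only by the consumer
    this.writeIndex = 0; // advanced only by the producer
  }
  // One slot stays empty so that "full" and "empty" are distinguishable.
  isEmpty() { return this.readIndex === this.writeIndex; }
  isFull() { return (this.writeIndex + 1) % this.capacity === this.readIndex; }
  push(packet) {
    if (this.isFull()) return false; // producer must back off and retry
    this.slots[this.writeIndex] = packet;
    this.writeIndex = (this.writeIndex + 1) % this.capacity;
    return true;
  }
  pop() {
    if (this.isEmpty()) return undefined;
    const packet = this.slots[this.readIndex];
    this.readIndex = (this.readIndex + 1) % this.capacity;
    return packet;
  }
}

class Channel {
  constructor(capacity) {
    this.guestToHost = new Ring(capacity); // VSC writes, VSP reads
    this.hostToGuest = new Ring(capacity); // VSP writes, VSC reads
  }
}

const channel = new Channel(4);
channel.guestToHost.push({ type: 'tx', bytes: 1500 });
const received = channel.guestToHost.pop(); // VSP side drains the ring
channel.hostToGuest.push({ type: 'tx-complete', ok: true });
console.log(received.type, channel.hostToGuest.pop().type);
// Output: tx tx-complete
```

Each ring has exactly one writer and one reader, which is why the two directions can share memory without either side taking a lock on the other's index.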

The architectural implication is that device emulation lives in the root partition, not in the hypervisor binary. The TCB the hypervisor binary itself has to protect is narrow. The TCB the root partition's drivers have to protect is much wider -- but those drivers live in normal Windows kernel mode, where Microsoft has thirty years of tooling. This is why almost every public Hyper-V CVE since 2018 has landed in vmswitch.sys, storvsp.sys, or the NT Kernel Integration VSP, rather than in hvix64.exe itself.

Note: Putting device emulation in the root partition means the hypervisor binary does not need to parse Ethernet frames, SCSI commands, USB descriptors, or graphics-adapter command rings. The trade-off is that the root partition becomes part of the TCB -- a root-partition kernel-mode bug is a hypervisor-equivalent break -- but the small hypervisor binary itself can be reviewed, fuzzed, and reasoned about as a single piece of code.

7.2 The hypercall ABI

Hypercalls are how partitions request services from the hypervisor. The TLFS documents two flavors. A fast hypercall passes its parameters inline in CPU registers: on x64, rcx carries a 64-bit hypercall input value (the low 16 bits are the call code; the upper 48 bits are a control word with fields for the Fast flag, variable-header size, Rep Count, and Rep Start Index), rdx carries the first input parameter, and r8 carries the second. A slow hypercall instead passes the GPA (guest physical address) of an input-parameter page in rdx, and the GPA of an output-parameter page in r8; the actual parameter content lives in those pages. The instruction that triggers the hypercall is vmcall on Intel and vmmcall on AMD; the hypervisor maps both onto the same internal entry point [@ms-tlfs-pdf].

Hypercall: A guest-to-hypervisor call. The guest issues `vmcall` (Intel) or `vmmcall` (AMD); the CPU traps via VM-EXIT into the hypervisor in VMX root mode; the hypervisor reads the call code from `rcx`, reads the inputs from registers (fast) or from a GPA-pointed page (slow), services the request, writes outputs back, and returns via VM-ENTRY. Hypercalls are the only legitimate way for a partition to invoke hypervisor services [@ms-tlfs-pdf].

```javascript
// A JavaScript model of the rcx hypercall input value layout.
// In a real hypercall the guest sets rcx, rdx, r8 and issues vmcall / vmmcall.
function packHypercallInput({ callCode, fastFlag, varHeaderSize, isNested, repCount, repStartIdx }) {
  // rcx layout (TLFS section 3 "Hypercall Interface", verbatim bit map)
  //   bits  0..15  Call Code
  //   bit   16     Fast (1 = inline params in rdx/r8)
  //   bits 17..26  Variable header size (in QWORDs)
  //   bits 27..30  RsvdZ
  //   bit   31     Is Nested
  //   bits 32..43  Rep Count
  //   bits 44..47  RsvdZ
  //   bits 48..59  Rep Start Index
  //   bits 60..63  RsvdZ
  let rcx = 0n;
  rcx |= BigInt(callCode) & 0xFFFFn;
  if (fastFlag) rcx |= 1n << 16n;
  rcx |= (BigInt(varHeaderSize) & 0x3FFn) << 17n;
  if (isNested) rcx |= 1n << 31n;
  rcx |= (BigInt(repCount) & 0xFFFn) << 32n;
  rcx |= (BigInt(repStartIdx) & 0xFFFn) << 48n;
  return rcx;
}

// HvCallPostMessage = 0x005C (TLFS section 11).
// The Fast bit is set here purely to illustrate the encoding.
const rcx = packHypercallInput({
  callCode: 0x005C,
  fastFlag: 1,
  varHeaderSize: 0,
  isNested: 0,
  repCount: 0,
  repStartIdx: 0,
});
console.log('rcx = 0x' + rcx.toString(16).padStart(16, '0'));
// Output: rcx = 0x000000000001005c
```
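Reading the value back is the mirror image of packing it. The sketch below splits a 64-bit input value into its fields using the same bit positions as the packing example; the function name `unpackHypercallInput` is hypothetical and exists only for illustration.

```javascript
// Inverse of the packing model: extract each field of the hypercall input
// value by shifting and masking. Bit positions match the layout listed above.
function unpackHypercallInput(rcx) {
  const field = (shift, width) =>
    Number((rcx >> BigInt(shift)) & ((1n << BigInt(width)) - 1n));
  return {
    callCode: field(0, 16),
    fastFlag: field(16, 1),
    varHeaderSize: field(17, 10),
    isNested: field(31, 1),
    repCount: field(32, 12),
    repStartIdx: field(48, 12),
  };
}

const decoded = unpackHypercallInput(0x000000000001005cn);
console.log(decoded.callCode.toString(16), decoded.fastFlag);
// Output: 5c 1
```

A rep hypercall would show non-zero `repCount` and `repStartIdx` values in the same decode.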

The call-code space is small and well-documented: a few hundred codes, each one a structured request with typed inputs and outputs. The hypercall path is also where the most consequential 2024 Hyper-V CVE lived. CVE-2024-21407 was a use-after-free in hvix64.exe's handling of a specific file-operation hypercall, the rare case where the bug was in the hypervisor binary itself rather than in a root-partition driver [@nvd-cve-2024-21407].

7.3 Intercepts

Intercepts are how the hypervisor virtualizes guest behavior. The TLFS distinguishes four categories: instruction intercepts (CPUID, MSR reads/writes, I/O-port instructions), exception intercepts (page faults, general protection faults), memory-access intercepts (a guest tries to read or write a specific guest-physical-address region), and partition-state intercepts (a guest hits a state that the hypervisor wants to be notified about). Each is configured per-partition through the Intel VMCS execution-control bits or the AMD VMCB control fields [@ms-tlfs-pdf].

Intercept: A configurable hypervisor notification on a specific guest event. The hypervisor programs the VMCS or VMCB to fire a VM-EXIT when the guest issues a particular instruction, raises a particular exception, accesses a particular memory region, or transitions to a particular state. Intercepts are the policy mechanism that lets the hypervisor implement device emulation, security checks, and VTL transitions [@ms-tlfs-pdf].

For VBS, the load-bearing intercept is the memory-access intercept. When VTL0 code tries to access a region whose VTL0 SLAT mapping is unreadable or unwritable, the access traps to the hypervisor with the offending GPA; the hypervisor can deliver the intercept to the VTL1 Secure Kernel as a secure call, letting VTL1 see what VTL0 was trying to do and decide whether to allow it. This is how HVCI's W^X enforcement is wired: a VTL0 page that is marked writable in VTL0's SLAT is marked non-executable in the same SLAT; an attempt to switch the same page to executable becomes a memory-access intercept that VTL1 must approve.
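The W^X flow can be sketched as a toy model in which a request for execute permission is routed to an approver that stands in for VTL1. Everything here is an assumption made for illustration: `makeVtl0Slat`, the `signedImages` allowlist, and the string results are hypothetical, and real SLAT entries are hardware page-table entries rather than JavaScript objects.

```javascript
// Toy model of the HVCI flow: VTL0's SLAT view never holds a page that is both
// writable and executable, and a request for +X is decided by a VTL1 approver.
function makeVtl0Slat(approveExecutable) {
  const pages = new Map(); // guest page number -> { r, w, x }
  return {
    mapData(gpn) { pages.set(gpn, { r: true, w: true, x: false }); },
    // Models the memory-access intercept: VTL1 decides, then W is dropped.
    requestExecutable(gpn, contents) {
      const entry = pages.get(gpn);
      if (!entry) return 'no-mapping';
      if (!approveExecutable(contents)) return 'denied';
      pages.set(gpn, { r: true, w: false, x: true }); // never w && x together
      return 'granted';
    },
    permissions(gpn) { return pages.get(gpn); },
  };
}

// Stand-in for VTL1's code-integrity check: an allowlist of "signed" images.
const signedImages = new Set(['driver-a']);
const slat = makeVtl0Slat((image) => signedImages.has(image));

slat.mapData(0x1000);
slat.mapData(0x2000);
console.log(slat.requestExecutable(0x1000, 'driver-a'));
console.log(slat.requestExecutable(0x2000, 'unsigned-blob'));
// Output:
// granted
// denied
```

The point of the model is where the decision lives: the code that wants the page to become executable is not the code that flips the permission.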

7.4 The Synthetic Interrupt Controller (SynIC)

The Synthetic Interrupt Controller, SynIC, is the hypervisor's per-virtual-processor event delivery surface. Each VP has 16 Synthetic Interrupt Source (SINT) lines, a message page (where the hypervisor places message-shaped events), an event-flag page (where it places bit-flag events), and a set of synthetic timers. SynIC is the bus on which VMBus traffic between VSP and VSC moves; it is also the bus on which VTL transitions between VTL0 and VTL1 are delivered inside the root partition [@ms-tlfs-pdf].

Synthetic Interrupt Controller (SynIC): A hypervisor-emulated interrupt controller, parallel to the hardware APIC, that delivers hypervisor-originated events to a virtual processor. Each VP has 16 SINT lines, a message page, an event-flag page, and synthetic timers. VMBus signaling rides on SynIC; secure-call delivery between VTL0 and VTL1 rides on SynIC; vTPM, virtual-PCI, and other paravirtualized device events ride on SynIC [@ms-tlfs-pdf].

For VBS, the secure-call ABI -- the way VTL0 code asks VTL1 to do something -- is built on SynIC. A VTL0 caller writes a request into a shared message page, signals a SINT, and yields the CPU; the hypervisor switches SLAT context to VTL1, delivers the message, and lets VTL1 read the request. When VTL1 finishes, it signals a SINT back to VTL0 and the hypervisor switches contexts again. Credential Guard's whole communication path between VTL0 LSASS and VTL1 LSAISO is one of these secure-call channels.
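The round trip described above can be sketched as a small state machine. This is a toy model under stated assumptions, not the Secure Kernel's real interface: the `Hypervisor` class, `registerService`, and the `'derive-key'` service are hypothetical names, and the shared page is a plain object.

```javascript
// Toy model of a secure-call round trip: request in, context switch to VTL1,
// dispatch, reply out, context switch back to VTL0.
class Hypervisor {
  constructor() {
    this.activeVtl = 0;
    this.sharedPage = null; // stands in for the shared message page
    this.vtl1Services = new Map();
    this.trace = [];
  }
  registerService(name, handler) { this.vtl1Services.set(name, handler); }
  switchTo(vtl) {
    this.trace.push(`VTL${this.activeVtl}->VTL${vtl}`);
    this.activeVtl = vtl;
  }
  secureCall(service, payload) {
    this.sharedPage = { service, payload }; // VTL0 writes the request
    this.switchTo(1);                       // signal, then switch context
    const handler = this.vtl1Services.get(this.sharedPage.service);
    const result = handler
      ? { ok: true, value: handler(this.sharedPage.payload) }
      : { ok: false, value: null };
    this.sharedPage = result;               // VTL1 writes the reply
    this.switchTo(0);                       // signal back, switch again
    return this.sharedPage;
  }
}

const hv = new Hypervisor();
// The secret is captured inside the handler's closure, so the caller only
// ever sees the derived result, never the secret itself.
hv.registerService('derive-key', (() => {
  const secret = 7;
  return (input) => input * secret;
})());

const reply = hv.secureCall('derive-key', 6);
console.log(reply.value, hv.trace.join(' '));
// Output: 42 VTL0->VTL1 VTL1->VTL0
```

The caller gets a result and nothing else, which is the same shape as LSASS asking LSAISO to perform a credential operation.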

7.5 Memory and per-VTL SLAT

The last surface is also the most important: memory. Guest physical addresses (GPAs) are translated to system physical addresses (SPAs) by per-partition SLAT page tables. The hypervisor has exclusive control of these tables; no partition, including the root, can read or modify them directly. For VBS specifically, the hypervisor maintains two SLAT mappings per partition -- one for VTL0 and one for VTL1 -- and switches between them on VTL transitions.

This is the architectural reason VTL0 kernel mode, even with full Ring-0 code execution, cannot read or execute VTL1 memory. The VTL0 page-table walker on a load from a VTL1-only page does not see the page at all; the SLAT walker on the host returns no mapping; the hardware MMU raises an EPT/NPT violation; the hypervisor handles the violation according to the VTL0 partition's intercept policy. In the security-relevant case, the hypervisor delivers an access-denied result to VTL0 and continues. There is no kernel-mode mov instruction sequence that can defeat this, because the gating happens in hardware page-table walks that VTL0 kernel mode cannot influence.
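A toy model makes the two-view idea concrete: one partition, two GPA-to-SPA maps, and a lookup that fails for VTL0 on a page only VTL1 has mapped. The names (`slatViews`, `translate`), the page numbers, and the `'slat-violation'` result are all hypothetical; real SLAT is a hardware page-table walk, not a map lookup.

```javascript
// Toy model of per-VTL SLAT: one partition, two GPA-to-SPA views.
const PAGE_SIZE = 0x1000n;

const slatViews = {
  0: new Map([[0x10n, { spn: 0x8010n, r: true, w: true, x: false }]]),
  1: new Map([
    [0x10n, { spn: 0x8010n, r: true, w: true, x: false }],
    [0x20n, { spn: 0x9020n, r: true, w: true, x: false }], // VTL1-only page
  ]),
};

function translate(vtl, gpa, access) {
  const entry = slatViews[vtl].get(gpa / PAGE_SIZE); // BigInt division floors
  if (!entry || !entry[access]) {
    return { fault: 'slat-violation' }; // models an EPT/NPT violation
  }
  return { spa: entry.spn * PAGE_SIZE + (gpa % PAGE_SIZE) };
}

const secretGpa = 0x20123n;
console.log(translate(1, secretGpa, 'r').spa.toString(16)); // 9020123
console.log(translate(0, secretGpa, 'r').fault);            // slat-violation
console.log(translate(0, 0x10008n, 'x').fault);             // slat-violation
```

The same GPA resolves in one view and faults in the other, and nothing VTL0 can write changes which view it is given.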

Five surfaces. Two of them -- the hypercall ABI and the device-emulation paths that surface over VMBus -- are where every public Hyper-V escape since 2018 has lived. The other three (intercepts, SynIC, per-VTL SLAT) are the substrate on which VBS, HVCI, Credential Guard, and System Guard Secure Launch are built. We turn to those next.

8. How the Hypervisor Enforces Each VBS Feature

The hypervisor itself does not know anything about credentials, code signing, application allowlisting, or DMA protection. It knows about partitions, VTLs, intercepts, SLAT entries, and hypercalls. Each Windows security feature is built by composing those primitives in a specific way. The mapping is precise and worth walking, because it is what makes the substrate a security primitive rather than just a virtualization product [@ms-hardware-root-of-trust].

HVCI / Memory Integrity. Hypervisor-protected Code Integrity is the most consequential VBS feature on a per-byte basis: it changes Windows from a system that lets the kernel execute any signed driver to one where the kernel cannot execute any page until VTL1 has approved it. VTL1's code-integrity service inspects every kernel-mode page mapping change request before the SLAT entry that would make the page executable in VTL0 is granted. The W^X invariant -- a single page can be writable or executable, but never both -- is enforced not by NT kernel cooperation but by the per-VTL SLAT, exactly as described in section 7.5. An NT-kernel attempt to mark a writable page executable becomes a memory-access intercept that VTL1's CI service evaluates [@ms-enable-vbs-hvci]. The hypervisor primitives composed: per-VTL SLAT + memory-access intercepts + secure-call ABI.

Trustlet: A user-mode process that runs inside VTL1's Isolated User Mode (IUM). Trustlets must be signed with the Windows System Component Verification certificate (Signature Level 12) and carry the IUM EKU `1.3.6.1.4.1.311.10.3.37`. The shipping inbox trustlets include `LSAISO.EXE` (Credential Guard), `VMSP.EXE` (host side of virtual TPM), and the vTPM provisioning trustlet [@ms-iso-user-mode-trustlets, @ionescu-bh-2015].

Credential Guard. LSAISO.EXE -- the LSA-Isolated trustlet -- runs in VTL1 Isolated User Mode. NTLM password hashes and Kerberos Ticket-Granting Tickets that LSASS used to keep in normal VTL0 memory are moved to VTL1 memory that VTL0 cannot read. VTL0 LSASS performs credential operations by sending a request to LSAISO over a secure-call channel mediated by the hypervisor's SynIC; LSAISO does the cryptographic work and returns a result. The plaintext of the credential never leaves VTL1. This is why a Ring-0 attacker on a Credential Guard-enabled Windows install cannot dump LSASS hashes -- they aren't in LSASS [@ms-iso-user-mode-trustlets]. The hypervisor primitives composed: per-VTL SLAT (to hide LSAISO's memory) + SynIC (to deliver secure calls) + intercepts (to catch VTL0 attempts to access LSAISO memory). See the sibling Credential Guard / NTLMless article for VTL1 internals.

Secure call: The VTL0-to-VTL1 calling convention. A VTL0 caller fills in a shared parameter page, signals a SynIC interrupt configured for VTL transition, and yields. The hypervisor switches SLAT context to VTL1, delivers the message, and lets the Secure Kernel dispatch it via `IumInvokeSecureService` to a registered VTL1 service. On return, the hypervisor switches contexts back. The whole round-trip is mediated by hypervisor primitives the calling VTL cannot bypass [@ionescu-bh-2015].

Application Control (WDAC). The same VTL1 code-integrity service that backs HVCI also evaluates user-mode policy. When VTL0 user mode tries to load a binary that is restricted by WDAC policy, the load becomes a secure call into VTL1; VTL1's policy engine evaluates the signature, the certificate chain, and the configured policy; the secure call returns approval or denial. WDAC policy lives in VTL1, the policy database lives in VTL1, and a VTL0 administrator who has been compromised cannot edit either. The hypervisor primitives composed: same as HVCI, plus a richer secure-call API for policy evaluation.

VBS Enclaves. A third-party application can load native code into a VTL1 IUM enclave. The enclave executes in VTL1, with its memory hidden from VTL0; the application talks to the enclave through a secure-call ABI exposed by the Secure Kernel. Architecturally parallel to Credential Guard but available to ordinary application developers. The hypervisor primitives composed: per-VTL SLAT (to hide enclave memory) + secure-call ABI (to invoke enclave code) + a Secure Kernel API for enclave creation, attestation, and destruction.

System Guard Secure Launch (DRTM). Intel TXT's SENTER instruction (and AMD's SKINIT on AMD platforms) executes a hardware-rooted dynamic measurement of the hypervisor and the Secure Kernel into TPM PCRs 17-22 after firmware initialization [@ms-system-guard-secure-launch]. This re-establishes the trust root post-firmware: a pre-boot firmware compromise that survived UEFI Secure Boot cannot silently poison the hypervisor's launch state without showing up as an unexpected measurement in a PCR that VTL1 can read. The hypervisor primitives composed: DRTM event registration with the hardware + TPM PCR extension + a VTL1-side attestation API. See the sibling Secure Boot article for the static-RTM half of the same story.

Kernel DMA Protection. External devices over Thunderbolt, USB4, or hot-plug PCIe can issue DMA to arbitrary physical addresses, bypassing the CPU's MMU entirely. The hypervisor configures the IOMMU (Intel VT-d / AMD-Vi) to deny DMA from externally-attached devices outside of explicitly-authorized memory regions, and to refuse DMA from any device before its kernel-mode driver has been loaded under a trusted policy [@ms-kernel-dma-protection]. The hypervisor primitives composed: hypervisor-owned IOMMU configuration + memory-access intercepts on the IOMMU configuration MMIO region.

The shape of the table is the point.

Feature	Composed primitives	Hypervisor mechanism
HVCI	per-VTL SLAT + memory-access intercepts + secure-call ABI	VTL1 vets each VTL0 page-mapping change before granting +X
Credential Guard	per-VTL SLAT + SynIC + intercepts	LSAISO trustlet memory absent from VTL0 SLAT mapping
WDAC (AppControl)	secure-call ABI + VTL1 policy engine	VTL0 binary load = secure call into VTL1 CI service
VBS Enclaves	per-VTL SLAT + secure-call ABI	Third-party VTL1 IUM enclave invoked over secure call
System Guard Secure Launch	hardware DRTM (TXT/SKINIT) + TPM PCR extension	`SENTER` / `SKINIT` measures hypervisor into PCRs 17-22
Kernel DMA Protection	hypervisor-owned IOMMU + MMIO intercepts	VT-d/AMD-Vi denies DMA outside authorized regions

The hypervisor knows nothing about NTLM hashes, Kerberos tickets, code-signing certificates, WDAC policy XML, or DMA-region authorization. All of that policy lives in VTL1 -- in the Secure Kernel, in LSAISO, in the WDAC service. The hypervisor only provides the *mechanism* for one piece of policy to evaluate a request from another piece of policy in isolation. This is the architectural separation that lets the hypervisor binary stay small and the Windows-side security feature set keep growing.

The pattern: each feature is a different composition of the same five primitives (partitions, hypercalls, intercepts, SynIC, per-VTL SLAT). The hypervisor is genuinely a primitive in the formal sense -- a small set of mechanisms that compose into many security policies. If the hypervisor is the mechanism, the boundary the hypervisor enforces is the contract. Microsoft commits to servicing certain attacks against that boundary and explicitly excludes others. To know what we are getting, we need to read the contract.

9. The Security Boundary Microsoft Commits To

The Microsoft Security Servicing Criteria for Windows is a public document. It enumerates which classes of attack Microsoft will issue a CVE and an out-of-band patch for, and which it will not. For the hypervisor, the document is unusually specific [@ms-msrc-servicing-criteria].

The two relevant boundaries:

Hypervisor / virtualization boundary. An L1-guest-to-host or guest-to-guest break is a serviced boundary. If a guest VM can execute code in the root partition or in another guest's address space, Microsoft will issue a CVE.
Virtual Secure Mode (VBS) boundary. VTL0 kernel-mode code reading or writing VTL1 memory, or executing VTL1 code, is a serviced break. If a Ring-0 attacker in VTL0 can defeat the per-VTL SLAT, Microsoft will issue a CVE.

What the servicing criteria do not commit to is also worth naming. A same-VTL elevation of privilege inside a guest (a guest user becoming guest SYSTEM) is not a hypervisor break -- it is a Windows EoP, serviced under the Windows kernel boundary, not the hypervisor boundary. A denial-of-service of the host from a guest is generally not a serviced hypervisor break unless it produces a memory corruption that an attacker can ride to RCE. An administrator in the root partition reading guest memory is not a break at all -- the root partition is part of the hypervisor's TCB by definition, and root-partition admin is hypervisor-admin in the threat model.

The dollar figures for these boundaries are documented in the Microsoft Hyper-V Bounty Program [@ms-msrc-bounty-hyperv]. The program ranges from $5,000 for the lowest-impact qualifying submission up to $250,000 for the highest. The eligibility language is verbatim:

An eligible submission includes a Remote Code Execution (RCE) vulnerability in Microsoft Hyper-V that enables a L1 guest virtual machine to compromise the hypervisor, escape from the guest virtual machine to the host, or escape to another L1 guest virtual machine. -- Microsoft Hyper-V Bounty Program [@ms-msrc-bounty-hyperv]

$250,000 is the highest standing Hyper-V bounty in the industry. Comparable programs from the other major hypervisor vendors do not publish the same calibration. KVM is a community project with no vendor-paid bounty pool of equivalent size. Xen is a Linux Foundation project that runs a bug bounty through HackerOne but does not publicly attach a $250,000 figure to a guest-to-host RCE. ESXi (Broadcom) does not publish a standing bounty program with a per-bug ceiling; bounty payments for ESXi RCEs typically flow through Pwn2Own and similar marketplaces, where Trend Micro's Zero Day Initiative sets the prize for any given competition.

The bounty calibration is itself a data point. If $250,000 were too high, Microsoft would be drowning in submissions; if it were too low, the public CVE record would show more hypervisor breaks reported through Pwn2Own than directly to MSRC. The current equilibrium -- two to four Microsoft-direct Hyper-V CVEs per year, plus zero Pwn2Own Hyper-V guest-to-host escapes through Pwn2Own Berlin 2025 [@zdi-pwn2own-day3] -- is consistent with the bounty being calibrated roughly correctly relative to the cost of finding a real bug.

Vendor	Hypervisor	Published bounty	Ceiling	Servicing-criteria boundary published
Microsoft	Hyper-V / `hvix64.exe`	Yes	$250,000	Yes, verbatim language
Xen Project	Xen	Yes (HackerOne)	Lower, varies	Yes, security policy
KVM	KVM (community)	No standing program	--	No vendor-published criteria
Broadcom/VMware	ESXi	No standing public bounty	--	Vendor advisories per CVE
seL4 Project	seL4	No (proof-rooted argument)	--	Functional-correctness proof [@sel4-whitepaper]

The seL4 row is included because seL4 is the only hypervisor in the table whose claim to a security boundary is mathematical rather than operational. seL4 ships approximately ten thousand lines of C and assembly with a machine-checked proof of functional correctness against a higher-level specification. The proof took roughly twenty-five person-years and covers a microkernel that does not by itself ship the full surface area of Hyper-V. The Microsoft hypervisor is unverified, and at the line count estimated in section 7 it is at least an order of magnitude larger; its security argument is operational (a small TCB, heavy fuzzing, a standing bounty, public servicing) rather than mathematical.

A serviced boundary is a contract. Contracts are not promises; they are obligations that come due when an attacker finds a way around them. To see what the contract has actually had to pay out, we read the public CVE record.

10. The Public Track Record -- Six Worked CVEs Across Three Classes

We do not need an exhaustive Hyper-V CVE catalog to understand the boundary's real shape. Six worked examples, drawn from three distinct attack classes, cover every public failure mode the boundary has produced since 2018. We walk them in order.

Class A: Device emulation in the root partition

CVE-2021-28476 (vmswitch.sys, May 2021, CVSS 9.9). Discovered by Ophir Harpaz at Guardicore Labs and Peleg Hadar at SafeBreach Labs using Guardicore's hAFL1 hypervisor fuzzer, the bug sat in the host-side vmswitch.sys driver's handling of a guest-controlled OID_SWITCH_NIC_REQUEST parameter. The driver dereferenced an attacker-influenced object pointer; the host kernel performed an arbitrary pointer dereference; the guest gained RCE in the root partition's kernel mode. The CVSS 9.9 score (AV:N/AC:L/PR:L/UI:N/S:C/C:H/I:H/A:H) reflects guest-to-host RCE with Azure-scale blast radius: the bug was reachable from the vmswitch driver shipped in Windows builds well before the May 2021 patch, per the Guardicore Labs technical analysis [@nvd-cve-2021-28476]. The bug is the canonical anchor for "device emulation in the root partition is the largest Hyper-V attack surface."

CVE-2025-21333 (NT Kernel Integration VSP, January 2025, CWE-122). The first publicly-acknowledged in-the-wild exploited Hyper-V CVE. The "Hyper-V NT Kernel Integration VSP" is a relatively new component that ties the Windows kernel-mode container architecture to Hyper-V's VSP/VSC pattern. A guest-controlled input triggered a heap-based buffer overflow on the host side of the integration; the host's address space was corruptible from a guest [@nvd-cve-2025-21333]. The operational pattern matches the vmswitch family: a host-side component receives structured, attacker-shaped input from a guest, and the host-side component overflows.
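The 9.9 quoted for CVE-2021-28476 can be reproduced from its vector string. The sketch below applies the base-score equations and metric weights from the public FIRST CVSS v3.1 specification; the helper names (`WEIGHTS`, `roundUp`, `baseScore`) are ours, and the code handles base metrics only.

```javascript
// CVSS v3.1 base-score arithmetic (base metrics only, no temporal/environmental).
const WEIGHTS = {
  AV: { N: 0.85, A: 0.62, L: 0.55, P: 0.2 },
  AC: { L: 0.77, H: 0.44 },
  UI: { N: 0.85, R: 0.62 },
  CIA: { H: 0.56, L: 0.22, N: 0 },
  PR: {
    U: { N: 0.85, L: 0.62, H: 0.27 }, // scope unchanged
    C: { N: 0.85, L: 0.68, H: 0.5 },  // scope changed
  },
};

// The specification's "round up to one decimal", done on scaled integers
// to avoid floating-point drift.
function roundUp(value) {
  const scaled = Math.round(value * 100000);
  return scaled % 10000 === 0
    ? scaled / 100000
    : (Math.floor(scaled / 10000) + 1) / 10;
}

function baseScore(vector) {
  const m = Object.fromEntries(vector.split('/').map((part) => part.split(':')));
  const changed = m.S === 'C';
  const iss =
    1 - (1 - WEIGHTS.CIA[m.C]) * (1 - WEIGHTS.CIA[m.I]) * (1 - WEIGHTS.CIA[m.A]);
  const impact = changed
    ? 7.52 * (iss - 0.029) - 3.25 * Math.pow(iss - 0.02, 15)
    : 6.42 * iss;
  const exploitability =
    8.22 * WEIGHTS.AV[m.AV] * WEIGHTS.AC[m.AC] * WEIGHTS.PR[m.S][m.PR] * WEIGHTS.UI[m.UI];
  if (impact <= 0) return 0;
  const raw = changed ? 1.08 * (impact + exploitability) : impact + exploitability;
  return roundUp(Math.min(raw, 10));
}

console.log(baseScore('AV:N/AC:L/PR:L/UI:N/S:C/C:H/I:H/A:H'));
// Output: 9.9
```

The `S:C` (scope changed) term is what encodes "guest to host": the vulnerable component and the impacted component sit under different security authorities, and the score is scaled up by 1.08 as a result.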

Class B: The hypercall input-validation path

CVE-2024-21407 (Hyper-V hypercall UAF, March 2024, CVSS 8.1, CWE-416). The rare case where the bug is in hvix64.exe / hvax64.exe itself, not in a root-partition driver. A guest crafted specially-formed file-operation hypercalls; the hypervisor dereferenced freed memory; the guest gained arbitrary host code execution [@nvd-cve-2024-21407].

CVE-2024-30092 (Hyper-V RCE, October 2024, CWE-20 + CWE-829). A Hyper-V remote code execution that combined improper input validation with inclusion of functionality from an untrusted control sphere -- another hypercall-path-class bug [@nvd-cve-2024-30092].

CVE-2024-49117 (Hyper-V RCE, December 2024, CVSS 8.8). A third 2024 Hyper-V RCE; the December Patch Tuesday entry rounded out a year with three publicly disclosed Hyper-V RCEs, the most since the 2018 vmswitch family [@nvd-cve-2024-49117].

Class C: VTL0-to-VTL1 (the VBS break, not the hypervisor break)

CVE-2020-0917 and CVE-2020-0918 -- Amar and King, Black Hat USA 2020. Saar Amar and Daniel King's "Breaking VSM by Attacking SecureKernel" disclosed two paired vulnerabilities discovered with their Hyperseed hypercall fuzzer retargeted at securekernel!IumInvokeSecureService, the secure-call entry point. Vulnerability #1 -- which maps to CVE-2020-0917 -- is an out-of-bounds write in securekernel!SkmmObtainHotPatchUndoTable, the function that parses the hot-patch undo table at secure-call invocation time. Vulnerability #2 -- CVE-2020-0918 -- is a design flaw in SkmmUnmapMdl that lets VTL0 pass a fully attacker-controlled Memory Descriptor List to SkmiReleaseUnknownPTEs.

The Black Hat USA 2020 deck (verified via pdftotext at the canonical MSRC-Security-Research GitHub URL) explicitly labels Vulnerability #1 as OOB Write, in slides titled "The Vulnerable Function" and "The OOB" in the "Hardening SK" section [@amar-king-bh-2020]. Several secondary writeups across the web have transcribed the bug class as "OOB read," which is incorrect; the deck itself is the primary source and says write. The functions involved are also commonly conflated: IumInvokeSecureService is the secure-call dispatcher Hyperseed retargets to reach the buggy code; the actual bug is in SkmmObtainHotPatchUndoTable. The NVD entries for both CVEs are tracked as CWE-269 (Improper Privilege Management).

The Microsoft response is documented end-to-end in the same deck: the Secure Kernel pool was migrated to segment heap in mid-2019, four W+X regions were reduced to +X only, and SkpgContext -- a HyperGuard equivalent for Secure Kernel -- was introduced.

This is a different failure class than vmswitch RCE: not guest-to-host, but VTL0-to-VTL1 -- a Secure Kernel break reached through the hypervisor's secure-call dispatch from a privileged VTL0 attacker. Microsoft services it under the VBS / VSM boundary in the servicing criteria document, even though no guest VM is involved.

Key idea: Every public Hyper-V CVE since 2018 lives in one of three narrow code paths -- device emulation, hypercall input validation, or VTL0-to-VTL1 secure-call dispatch. The TLFS-visible primitives (intercepts, SynIC, per-VTL SLAT) have produced none.

The Pwn2Own dimension

Through Pwn2Own Berlin 2025, no public live Hyper-V guest-to-host escape has been demonstrated at Pwn2Own. The cross-vendor analogue -- and the industry's best calibration of how hard a hypervisor escape is to find when a researcher has a public dollar incentive and a deadline -- is the first-ever ESXi escape in Pwn2Own history, executed by Nguyen Hoang Thach of STAR Labs SG on Day Two (May 16, 2025) using a single integer overflow vulnerability in the hypervisor's DMA-handling path. The award was $150,000 plus 15 Master of Pwn points; STAR Labs went on to win overall Master of Pwn for the competition with $320,000 across three days [@zdi-pwn2own-day3].

The technique class is a TOCTOU on a length field read twice during a DMA operation: the first read validates the length, the second read uses it; race the second read and you write past a fixed-size buffer on the host heap. The exploit class is structurally the same as the vmswitch family, just landed in a different vendor's device-emulation path.
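The double-fetch pattern is easy to see in miniature. The sketch below is an illustration of the bug class only, not the ESXi code: `makeRacingDescriptor` is a hypothetical stand-in for guest-shared memory whose length field changes between two host reads, and the "copy" just reports how many bytes it would have moved.

```javascript
// Double-fetch (TOCTOU) illustration: validate one read of a shared length
// field, then act on a second read that the other side may have changed.
const HOST_BUFFER_SIZE = 16;

function makeRacingDescriptor(firstLength, laterLength) {
  let reads = 0;
  return {
    // First read returns the benign value; every later read returns the other.
    get length() { return reads++ === 0 ? firstLength : laterLength; },
  };
}

// Vulnerable: the check and the use each fetch `length` from shared memory.
function copyVulnerable(descriptor) {
  if (descriptor.length > HOST_BUFFER_SIZE) return { ok: false, copied: 0 };
  const used = descriptor.length; // second fetch: may differ from the checked one
  return { ok: true, copied: used, overflow: used > HOST_BUFFER_SIZE };
}

// Fixed: fetch once into host-private state, then validate and use that copy.
function copySafe(descriptor) {
  const length = descriptor.length; // single fetch
  if (length > HOST_BUFFER_SIZE) return { ok: false, copied: 0 };
  return { ok: true, copied: length, overflow: false };
}

console.log(copyVulnerable(makeRacingDescriptor(8, 4096)).overflow); // true
console.log(copySafe(makeRacingDescriptor(8, 4096)).copied);         // 8
console.log(copySafe(makeRacingDescriptor(4096, 8)).ok);             // false
```

The fix is structural rather than clever: anything the other side of the boundary can rewrite gets copied once into private memory before it is trusted.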

CVE	Class	Year	CVSS	Location	Source
CVE-2021-28476	A: device emulation	2021	9.9	`vmswitch.sys` (root partition)	[@nvd-cve-2021-28476]
CVE-2025-21333	A: device emulation	2025	7.8	NT Kernel Integration VSP (root partition)	[@nvd-cve-2025-21333]
CVE-2024-21407	B: hypercall path	2024	8.1	`hvix64.exe` / `hvax64.exe` (hypervisor binary)	[@nvd-cve-2024-21407]
CVE-2024-30092	B: hypercall path	2024	7.5	Hyper-V hypercall validation	[@nvd-cve-2024-30092]
CVE-2024-49117	B: hypercall path	2024	8.8	Hyper-V hypercall validation	[@nvd-cve-2024-49117]
CVE-2020-0917/0918	C: VTL0-to-VTL1	2020	6.8 (per MSRC)	`securekernel.exe` (VTL1, reached via secure call)	[@amar-king-bh-2020]

```mermaid
flowchart LR
    subgraph CA["Class A: device emulation (root partition)"]
        Vmswitch["vmswitch.sys -- CVE-2021-28476"]
        Vsp["NT Kernel Integration VSP -- CVE-2025-21333"]
    end
    subgraph CB["Class B: hypercall input validation (hypervisor binary)"]
        UAF["CVE-2024-21407 (UAF)"]
        Input["CVE-2024-30092"]
        Hpcall["CVE-2024-49117"]
    end
    subgraph CC["Class C: VTL0-to-VTL1 (secure call dispatch)"]
        Oob["CVE-2020-0917 (OOB write)"]
        Mdl["CVE-2020-0918 (SkmmUnmapMdl)"]
    end
    Guest["Guest VM"] --> CA
    Guest --> CB
    Vtl0["Privileged VTL0 (kernel)"] --> CC
```

This is the third insight the article is built around. The reader's prior model may have been "hypervisors fail in mysterious, deep ways; the boundary is fragile in unknown places." The new model is "every public Hyper-V escape since 2018 lives in one of three narrow code paths, and the TLFS-visible primitives have produced none." The narrowness of the failure space is itself a security argument. The hypervisor's micro-kernelized design has held; what has not always held are the components Microsoft chose to put next to the hypervisor, in the root partition's user mode and kernel mode, by deliberate architectural choice in 2008.

Six worked examples; three classes; one boundary; an unflinching public record. The boundary is alive and producing CVEs at roughly two to four per year. But every CVE so far has lived somewhere the hypervisor itself controls. The interesting question is what lives in places it does not control.

11. The Residual Attack Surface -- Beneath, Beside, and Around

The hypervisor enforces a clean boundary against everything above it -- the NT kernel, user mode, even other guest VMs. It cannot, by construction, enforce anything against what lives below or beside it. Three structural classes of residual attack matter. We walk each.

11.1 Firmware below the hypervisor

System Management Mode (SMM), the UEFI runtime, the Intel Management Engine (ME), and the AMD Platform Security Processor (PSP) all run at higher privilege than the hypervisor for parts of boot and runtime. SMM in particular is a CPU mode that is invoked through System Management Interrupts (SMI) and has unrestricted access to all of physical memory, including the hypervisor's own pages. If the OEM-supplied SMM handler contains an exploitable bug, an SMI can run attacker code in a privilege mode strictly above the hypervisor's.

The threat is not hypothetical. The Binarly research team's 2023 LogoFAIL disclosures showed entire classes of image-parser bugs in UEFI firmware reachable from a privileged OS context; BootHole (CVE-2020-10713, a buffer overflow in GRUB2's grub.cfg parser) and BlackLotus (CVE-2022-21894, a UEFI Secure Boot bypass) showed that pre-boot bugs in widely-deployed bootloaders could ride past Secure Boot. None of these is a hypervisor bug; all of them are residual attack surface from the hypervisor's point of view.

Microsoft's mitigation is the dynamic root of trust for measurement -- System Guard Secure Launch -- which we touched on in section 8. After UEFI Secure Boot has done its static-RTM job, Intel TXT's SENTER (or AMD's SKINIT) executes a CPU-hardware-rooted late launch: the CPU resets to a known state, runs an Intel- or AMD-signed Authenticated Code Module (ACM), and measures the hypervisor binary into TPM PCRs 17-22 before transferring control to it. The result is that even if pre-boot firmware is compromised, the post-DRTM PCR values reflect the actual hypervisor binary; a compromised UEFI cannot silently substitute a different hypervisor without changing the attestation [@ms-system-guard-secure-launch, @ms-hardware-root-of-trust]. The residual after DRTM: OEMs that don't ship Secure Launch on their motherboards, or that ship buggy SMM handlers that can be invoked after launch.
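The reason a substituted hypervisor cannot hide from attestation is the TPM's extend operation: a PCR is never set directly, only folded forward as a hash of its old value and the new measurement. The following is a minimal sketch of that idea, assuming a SHA-256 PCR bank and a PCR that starts zeroed at late launch; the image byte strings and the `replay` helper are hypothetical stand-ins, not real measurement-log contents.

```javascript
const { createHash } = require('node:crypto');

// TPM-style extend: newPCR = SHA-256(oldPCR || SHA-256(measuredData)).
function extend(pcr, data) {
  const digest = createHash('sha256').update(data).digest();
  return createHash('sha256').update(Buffer.concat([pcr, digest])).digest();
}

// Replay a measurement log against a PCR assumed to start at all zeroes.
function replay(events) {
  return events.reduce(extend, Buffer.alloc(32));
}

// Hypothetical stand-ins for the measured images.
const acm = Buffer.from('hypothetical ACM image');
const goodHv = Buffer.from('hypothetical hypervisor build N+1');
const oldHv = Buffer.from('hypothetical hypervisor build N');

const expected = replay([acm, goodHv]).toString('hex');
const observed = replay([acm, oldHv]).toString('hex');

// A verifier comparing the quoted PCR with the expected value sees the swap.
console.log(expected === observed); // false
```

Because each value depends on everything extended before it, there is no sequence of later extends that brings `observed` back to `expected`.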

11.2 Hardware side channels

Microarchitectural side-channel attacks cross the VTL boundary at the level of CPU implementation, not at the level of architectural specification. The 2018 Spectre and Meltdown disclosures -- followed by the L1TF, MDS, Retbleed, and CacheWarp families in the years since -- showed that speculatively-executed code on a CPU can leak microarchitectural state across privilege boundaries that the architectural ISA promises to protect.

Microsoft's mitigation cadence has been in-tree and aggressive: Kernel Virtual Address Shadow (the Windows equivalent of KPTI) for Meltdown; IBRS, STIBP, and retpolines for Spectre v2; HyperClear for L1TF on Hyper-V hosts. Microarchitectural mitigations have been a recurring item in Patch Tuesday releases since 2018; cumulatively the cost has been measurable but bounded.

Note: The microarchitectural ceiling is hardware, not software. Intel TDX and AMD SEV-SNP -- the two confidential-computing architectures that move the trust root from the hypervisor to per-VM hardware encryption -- both explicitly disclaim resistance to this class. If the CPU leaks across a Spectre-class side channel, no software-level isolation primitive (VTL, partition, SEAM, SEV-SNP) can fully recover the property. The mitigation is hardware that doesn't leak, and that mitigation arrives one CPU generation at a time.

11.3 IOMMU and DMA bypass

The IOMMU -- Intel VT-d, AMD-Vi -- is the hardware that gates DMA from peripheral devices to physical memory. If the IOMMU is configured correctly, a Thunderbolt-attached device cannot read or write arbitrary memory; it can only DMA to regions the OS has explicitly mapped for it. If the IOMMU is disabled, configured permissively, or has firmware bugs of its own, DMA becomes an end-run around every architectural protection above it -- including the hypervisor's.

The threat is again not hypothetical. Bjorn Ruytenberg's Thunderspy disclosure in 2020 documented seven DMA-class vulnerabilities in Thunderbolt 3 firmware, demonstrating that an attacker with physical access could read or modify arbitrary memory on a powered-on system through a malicious peripheral [@thunderspy]. The Microsoft mitigation is Kernel DMA Protection (Windows 10 1803 and later): the hypervisor configures the IOMMU at boot to deny DMA from externally-attached devices outside of explicitly authorized regions, and DMA from any peripheral whose driver has not been loaded under a trusted policy is refused at the IOMMU [@ms-kernel-dma-protection]. The structural residual: pre-boot DMA, before Windows has finished configuring the IOMMU; client motherboards that still ship with VT-d or AMD-Vi disabled in BIOS; OEMs that disable Kernel DMA Protection by default.
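The deny-by-default behaviour described above can be pictured with a toy model. This is a sketch only: `ToyIommu`, the device names, and the addresses are hypothetical, and a real IOMMU works on page tables and hardware device identifiers rather than JavaScript maps.

```javascript
// Toy IOMMU: each device gets a list of [start, end) ranges it may DMA to.
// A device with no mappings can reach nothing at all.
class ToyIommu {
  constructor() {
    this.maps = new Map(); // deviceId -> array of { start, end, perms }
  }

  map(deviceId, start, length, perms) {
    const list = this.maps.get(deviceId) ?? [];
    list.push({ start, end: start + length, perms });
    this.maps.set(deviceId, list);
  }

  // True only if the whole access sits inside one mapping that grants `op`.
  allows(deviceId, addr, length, op) {
    const list = this.maps.get(deviceId) ?? [];
    return list.some(
      (m) => addr >= m.start && addr + length <= m.end && m.perms.includes(op)
    );
  }
}

const iommu = new ToyIommu();
iommu.map('nic0', 0x100000, 0x1000, 'rw'); // driver-authorized buffer

console.log(iommu.allows('nic0', 0x100800, 0x100, 'w'));  // true: inside the buffer
console.log(iommu.allows('nic0', 0x100f80, 0x100, 'r'));  // false: runs past the end
console.log(iommu.allows('tb-dock', 0x100000, 8, 'r'));   // false: device never mapped
```

The residual cases in the text correspond to the model not being in force yet (pre-boot DMA) or not being in force at all (IOMMU disabled in firmware).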

11.4 Hypervisor downgrade and rollback

Alon Leviev's "Windows Downdate" at Black Hat USA 2024 disclosed a class of attack that the prior three sections do not cover: rollback of the hypervisor binary itself to a previously-vulnerable, but still validly-signed, build [@nvd-cve-2024-21302].

The structural argument: UEFI Secure Boot prevents loading an unsigned hvix64.exe. It does not prevent loading an older hvix64.exe that is still validly signed and has not been revoked. If Microsoft fixes a Secure Kernel bug in build N+1 and a VTL0 attacker can convince the system to load build N at the next reboot, the patched bug is alive again. CVE-2024-21302 demonstrated exactly this rollback against both the hypervisor and the Secure Kernel through manipulation of the Windows Update servicing pipeline. The mitigation is mandatory-update servicing combined with proactive revocation list (dbx) hygiene -- once an older binary's hash is in the UEFI revocation list, Secure Boot will refuse to load it -- and Microsoft completed mitigations across Windows 10 1507 through Windows Server 2019 in the July 8, 2025 update wave [@nvd-cve-2024-21302].
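The gap is easier to see when the load decision is written out. The sketch below is a simplified model, not the actual Secure Boot logic: `mayLoad`, the `signatureValid` flag, and the hash strings are hypothetical, and the revocation list is modelled as a plain set of hashes.

```javascript
// Simplified load policy: a binary loads if its signature verifies and its
// hash is not on the revocation list. Nothing here compares version numbers.
function mayLoad(binary, dbx) {
  if (!binary.signatureValid) {
    return { load: false, why: 'unsigned or bad signature' };
  }
  if (dbx.has(binary.hash)) {
    return { load: false, why: 'revoked in dbx' };
  }
  return { load: true, why: 'signed and not revoked' };
}

// Hypothetical older build: genuinely signed, but carrying a patched bug.
const buildN = {
  name: 'hypervisor build N (hypothetical)',
  hash: 'hash-of-build-N',
  signatureValid: true,
};

const dbx = new Set();
console.log(mayLoad(buildN, dbx).load); // true: old and vulnerable, still loads

dbx.add(buildN.hash); // revocation lands
console.log(mayLoad(buildN, dbx).load); // false: Secure Boot now refuses it
```

The policy has no notion of "newest"; until the old hash is revoked, a signed build N is indistinguishable from a signed build N+1.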

flowchart TD
  HW["Hardware (CPU, RAM, IOMMU, TPM)"]
  SM["System Management Mode (Ring -2) -- residual: SMM handler bugs"]
  FW["UEFI firmware -- residual: LogoFAIL, BootHole, BlackLotus"]
  DR["DRTM ACM (Intel TXT / AMD SKINIT)"]
  HV["Microsoft Hypervisor (hvix64 / hvax64)"]
  Iommu["IOMMU (VT-d / AMD-Vi) -- residual: Thunderspy, pre-boot DMA"]
  Vtl1["VTL1 (Secure Kernel + trustlets)"]
  Vtl0["VTL0 (NT kernel + user mode)"]
  Side["Microarchitectural side channels -- Spectre / Meltdown / MDS / Retbleed"]
  Update["Windows Update servicing -- residual: hypervisor rollback (CVE-2024-21302)"]
  HW --> SM
  SM --> FW
  FW --> DR
  DR --> HV
  HV --> Iommu
  HV --> Vtl1
  HV --> Vtl0
  Side -.->|"cross all boundaries"| HV
  Update -.->|"can roll hypervisor back"| HV

The hypervisor is necessary but not sufficient. The firmware-Secure-Boot-DRTM substrate beneath it, the microarchitectural ceiling above it, the IOMMU configuration beside it, and the Windows Update pipeline that decides which hypervisor build runs next are co-equal members of the same boundary. None of them is the hypervisor; all of them have to do their job for the hypervisor's guarantees to hold. The substrate is real, but the boundary is the combination of the substrate and what holds it up.

Necessary, not sufficient. That phrase is the article's honest answer to the question "how good is the substrate?" The answer is that the substrate is genuine, the boundary is published, the bounty calibration is the highest in the industry, the public CVE record is alive and narrow, and the residual attack surface lives in places the hypervisor cannot by construction control. The substrate is what we have explored in detail; what holds it up is what we have just sketched. The last section turns from theory to practice.

12. Practical Guide, FAQ, and Closing

If you have read this far, the natural next question is "is this on, on my machine, and how do I check?" The practical answer is short.

12.1 Enabling and verifying VBS

VBS is configurable through several paths: Group Policy (Computer Configuration > Administrative Templates > System > Device Guard), Intune, MDM CSPs (DeviceGuard/EnableVirtualizationBasedSecurity, DeviceGuard/ConfigureSystemGuardLaunch), the Windows Security UI, or directly via bcdedit /set hypervisorlaunchtype Auto. Verification is best done with three small commands.

msinfo32 -> the Device Guard / Virtualization-based Security row. "Services Configured" lists what policy has requested; "Services Running" lists what is actually active. Kernel DMA Protection and Secure Launch each appear as their own row.
Get-CimInstance -ClassName Win32_DeviceGuard -Namespace root\Microsoft\Windows\DeviceGuard -> VirtualizationBasedSecurityStatus (0 = off, 1 = enabled but not running, 2 = running); SecurityServicesRunning array (HVCI, Credential Guard, etc.); RequiredSecurityProperties (the policy floor).
bcdedit /enum -> hypervisorlaunchtype Auto is the default; loadoptions DISABLE_VBS_* is how an administrator can opt out (you should not see these flags on a properly-configured machine).

{`
// Given a parsed Win32_DeviceGuard object, compute whether VBS is healthy.
// The actual Win32_DeviceGuard schema is on Microsoft Learn; this is the
// decision logic an operator would write against it.
function checkVbsHealth(dg) {
  const result = { ok: false, reasons: [] };

  // VBS itself
  if (dg.VirtualizationBasedSecurityStatus !== 2) {
    result.reasons.push('VBS is not running (status != 2)');
  }

  // HVCI (Memory Integrity)
  if (!dg.SecurityServicesRunning.includes(2)) {
    result.reasons.push('HVCI / Memory Integrity is not running');
  }

  // Credential Guard
  if (!dg.SecurityServicesRunning.includes(1)) {
    result.reasons.push('Credential Guard is not running');
  }

  // Required floor properties: 1 = hypervisor support, 2 = Secure Boot,
  // 3 = DMA protection (property codes per Win32_DeviceGuard)
  const requiredFloor = [1, 2, 3];
  for (const r of requiredFloor) {
    if (!dg.AvailableSecurityProperties.includes(r)) {
      result.reasons.push('Missing required security property: ' + r);
    }
  }

  result.ok = result.reasons.length === 0;
  return result;
}

const example = {
  VirtualizationBasedSecurityStatus: 2,
  SecurityServicesRunning: [1, 2, 3],
  AvailableSecurityProperties: [1, 2, 3, 4, 5],
};
console.log(JSON.stringify(checkVbsHealth(example), null, 2));
// -> { ok: true, reasons: [] }
`}

Note: Three commands, in order: msinfo32 for the human-readable summary; Get-CimInstance -ClassName Win32_DeviceGuard -Namespace root\Microsoft\Windows\DeviceGuard | Format-List * for the structured detail; bcdedit /enum {current} to confirm hypervisorlaunchtype Auto and the absence of DISABLE_VBS_* load options. If all three agree that VBS, HVCI, and Credential Guard are running, you are in the configuration this article describes.
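The numeric codes are easier to audit once they are turned back into names. The sketch below assumes the Win32_DeviceGuard output has been converted to JSON (for example by piping it through ConvertTo-Json) and parsed into an object. `describeDeviceGuard` is a hypothetical helper, and the code-to-name table covers only the values this article discusses plus Secure Launch; check Microsoft's schema documentation before relying on it.

```javascript
// Status values as listed in the text above.
const VBS_STATUS = {
  0: 'off',
  1: 'enabled but not running',
  2: 'running',
};

// Service codes used in this article (assumed mapping; verify against the schema).
const SERVICES = {
  1: 'Credential Guard',
  2: 'HVCI / Memory Integrity',
  3: 'System Guard Secure Launch',
};

// Turn the numeric fields into a human-readable summary.
function describeDeviceGuard(dg) {
  const status = VBS_STATUS[dg.VirtualizationBasedSecurityStatus] ?? 'unknown';
  const running = (dg.SecurityServicesRunning ?? [])
    .filter((code) => code !== 0) // 0 means "no services"
    .map((code) => SERVICES[code] ?? 'unknown service code ' + code);
  return { status, running };
}

const summary = describeDeviceGuard({
  VirtualizationBasedSecurityStatus: 2,
  SecurityServicesRunning: [1, 2],
});
console.log(summary.status);             // running
console.log(summary.running.join(', ')); // Credential Guard, HVCI / Memory Integrity
```

Unknown codes are reported rather than dropped, so a newer Windows build that adds a service does not silently disappear from the summary.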

12.2 Operational pitfalls

Two operational realities are worth flagging. First, HVCI has a driver block list and will refuse to enable Memory Integrity if any incompatible driver is installed; the usual offenders are older anti-cheat drivers, third-party virtualization clients (VMware Workstation pre-2021, VirtualBox pre-6.1), and certain disk-encryption or storage-filter drivers. Microsoft maintains a public block list; the Memory Integrity UI in Windows Security will report the specific blocking driver. Second, nested virtualization is supported for Hyper-V guests on Windows 10/11 client and Server 2016+, and is required by some development workflows (WSL2 with nested containers, certain Visual Studio device emulators). Nested virtualization changes the threat model: the L1 guest now runs its own hypervisor with its own VTL split. The L0 hypervisor still owns the box, however, so compromising an L1 guest, even one with VBS enabled, does not by itself give the attacker a path to the L0 host.

12.3 The substrate cross-reference

This article is the substrate of the Windows security series at paragmali.com. The siblings build on what is here:

Secure Boot in Windows -- the static-RTM half of the boot trust chain that hands off to the hypervisor.
VBS Trustlets: What Actually Runs in the Secure Kernel -- the VTL1 internals that the hypervisor's secure-call ABI delivers requests to.
NTLMless: The Death of NTLM in Windows -- the Credential Guard story from inside LSAISO.
Adminless: Administrator Protection in Windows -- the user-mode admin trust model that the kernel-mode VBS boundary makes possible.
Can This Code Do This? Windows Access Control -- the access-control surface that VBS supplements but does not replace.

12.4 Frequently asked questions

The 10-30 percent number is folklore from the pre-SLAT era or from systems running HVCI-incompatible drivers in compatibility mode. For typical workloads on modern hardware (post-2018 CPUs with VT-x or AMD-V and SLAT), the measured overhead of VBS plus HVCI plus Credential Guard sits in the low single digits. Gaming and high-throughput I/O workloads can show larger gaps, especially on systems where the BIOS forces nested virtualization off or where IOMMU is disabled. The trade-off for that overhead is the security-boundary set described in this article.

No. VBS is a Virtual Trust Level split *inside* the root partition. There are no extra VMs. The normal Windows install is VTL0; the Secure Kernel plus its trustlets is VTL1. Both VTLs live in the same partition, share the same physical CPU, and are scheduled by the hypervisor as separate VTL contexts -- not as separate VMs. A Hyper-V guest VM, by contrast, is a child partition entirely separate from the root partition. The two architectures share a hypervisor binary but use different parts of it.

No. SYSTEM is a high VTL0 user-mode token; the hypervisor sits architecturally above all of Ring 0, which is where SYSTEM-loaded kernel drivers ultimately run. The point of the entire article is that "SYSTEM owns the box" is wrong on a VBS-enabled Windows install. SYSTEM is the most privileged Windows identity; the hypervisor is the most privileged *software*, and the two are not the same thing.

No. Secure Boot prevents loading an *unsigned* `hvix64.exe`. It does not prevent loading an older, signed-but-vulnerable `hvix64.exe` that has not been added to the UEFI revocation list. That gap is what CVE-2024-21302 (Windows Downdate) exploited, and the mitigation is mandatory-update servicing combined with prompt revocation-list (`dbx`) hygiene [@nvd-cve-2024-21302].

No. seL4 is formally verified at approximately ten thousand lines of code with a roughly twenty-five-person-year proof effort. The Microsoft hypervisor is unverified at an estimated one to two hundred thousand lines of code. The hypervisor's security argument is operational -- a small TCB, heavy continuous fuzzing, a standing $5K-$250K bounty, public servicing criteria, an unflinching public CVE record -- rather than mathematical [@sel4-whitepaper, @ms-msrc-bounty-hyperv].

Yes, in terms of binary identity, servicing criteria, and bounty eligibility. The Microsoft hypervisor that boots on a Windows 11 client laptop and the one that boots on an Azure host server are derived from the same codebase, ship with the same servicing commitments, and qualify for the same Hyper-V bounty. The threat model differs -- Azure adds multi-tenant guest-to-guest isolation, hardware confidential-VM extensions, and a different management surface -- but the substrate is shared.

12.5 Closing

The reason SYSTEM on a Windows 11 box cannot read LSASS, load an unsigned driver, or patch ntoskrnl.exe is now fully accounted for. An hvix64.exe or hvax64.exe loaded by hvloader.efi before winload.exe ever ran. A VTL split inside the root partition, made possible by Hepkin and Kishan's 2013 patent and shipped with Windows 10 RTM in 2015. Per-VTL SLAT enforcement that the NT kernel architecturally cannot touch, because the SLAT tables live in pages the hypervisor never maps into a VTL0 view. A Microsoft-published security boundary and a $5,000-$250,000 bounty calibrating the boundary's value, both of which are unique in the industry at this writing. A public CVE record of six worked examples across three narrow classes that the boundary has had to pay out on since 2018. And a residual attack surface -- firmware below, side channels above, IOMMU bypass beside, hypervisor rollback through the update pipeline -- that the substrate cannot, by construction, eliminate.

The hypervisor is what every other article in this series sits on. Now you have the substrate in hand. The Secure Kernel article reads differently when you have walked the per-VTL SLAT yourself. The Credential Guard article reads differently when you know that LSAISO is invoked through a hypercall-mediated secure call. The Secure Boot article reads differently when you know that the hypervisor's DRTM measurement re-establishes the trust root after firmware. The Adminless article reads differently when you know that the privilege ceiling on Windows 11 is not Ring 0 but a hardware boundary above it.

Above Ring Zero is not a metaphor. It is an instruction-set state. The Windows hypervisor lives there, owns the page tables that say what the OS can see, and is the architectural reason "SYSTEM-on-Windows-11" cannot do things SYSTEM used to be allowed to do.

VBS Trustlets: What Actually Runs in the Secure Kernel

noreply@paragmali.com (Parag Mali) — Sun, 10 May 2026 00:00:00 GMT

**Trustlets are the user-mode processes Microsoft places in Virtual Trust Level 1** to hold the secrets a SYSTEM-privilege attacker on the Windows kernel must never reach: NTLM hashes, Kerberos tickets, biometric templates, virtual TPM keys, and (in 2025-2026) just-in-time admin tokens. A binary becomes a trustlet by passing five gates at load time: a process attribute, two specific signing EKUs at Signature Level 12, a `.tpolicy` PE section containing `s_IumPolicyMetadata`, a Trustlet Instance GUID, and a stripped-down loader path. Once loaded, the trustlet talks to the rest of Windows over ALPC, services an agent process in VTL0, and uses only 48 of NT's roughly 480 syscalls. The Hyper-V hypervisor refuses to map its pages into VTL0. That is what "isolated" means.

1. Four Locked Rooms

It is 3:14 a.m. and a red-team operator on a fully patched Windows 11 25H2 box has, after eight hours of careful work, achieved the prize: a SYSTEM-privilege write primitive in the NT kernel. For two decades that has been the moment when the engagement ends and the report writes itself. SYSTEM in the kernel meant every process, every page, every secret. Game over.

It is not game over.

The operator's target list has four items on it. The NTLM hashes and Kerberos Ticket-Granting Tickets sitting in lsass.exe. The user's fingerprint template, in whatever process the Windows Hello biometric pipeline puts it. The just-in-time admin token that Administrator Protection issued thirty seconds ago. The keys of the four Hyper-V virtual machines running on the box, including the one hosting the user's corporate VPN. Four secrets. Four user-mode processes. And on this 2026 machine, four locked rooms whose pages the operator's kernel write primitive cannot touch and whose contents the operator's kernel does not have permission to ask.

Those four processes are trustlets. They run in a different kernel from the one the operator just compromised, on a different virtual trust level enforced by a hypervisor running underneath both. The operator owns the NT kernel; the NT kernel does not own them. That sentence is what changed in 2015, and the rest of this piece is what it actually means.

This is not "Microsoft hid the memory better." It is not obfuscation, not a clever access-control rule, not a kernel mitigation that the next CVE will erase. It is an architectural relocation: the user-mode processes that hold the secrets no longer live in the operating system the attacker compromised. The hypervisor refuses to map their pages into Virtual Trust Level 0 ("VTL0"), and the operator's kernel is in VTL0.
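One way to picture "refuses to map" is a lookup in which the requesting trust level, not the requesting privilege ring, decides whether a page exists at all. The following is a toy model under that assumption; `ToyHypervisor`, the page number, and the error text are hypothetical, and real enforcement happens in second-level address translation hardware rather than in a map lookup.

```javascript
// Toy per-VTL memory view: each protected page records the lowest VTL
// that is allowed to see it. Unlisted pages are visible to every VTL.
class ToyHypervisor {
  constructor() {
    this.minVtlForPage = new Map(); // page number -> minimum VTL
  }

  protect(page, minVtl) {
    this.minVtlForPage.set(page, minVtl);
  }

  // Note what is absent: there is no "isSystem" or "isKernel" parameter.
  read(vtl, page) {
    const minVtl = this.minVtlForPage.get(page) ?? 0;
    if (vtl < minVtl) {
      throw new Error('VTL' + vtl + ' access to page ' + page + ' faults: not mapped in this view');
    }
    return 'contents of page ' + page;
  }
}

const hv = new ToyHypervisor();
const secretPage = 0x4242; // hypothetical page holding trustlet data
hv.protect(secretPage, 1);

console.log(hv.read(1, secretPage)); // contents of page 16962

try {
  hv.read(0, secretPage); // the compromised NT kernel lives here
} catch (err) {
  console.log(err.message);
}
```

The VTL0 caller's privilege inside its own kernel never enters the decision, which is the point the paragraph above is making.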

Key idea: Four user-mode processes survive a SYSTEM kernel write primitive on a 2026 Windows 11 box. That is what changed in 2015, and trustlets are the reason.

The promise of this piece is to explain trustlets at the level of "what does LsaIso.exe actually do, how is it built, how does it talk to the rest of the system, and where does the model end." Not at the level of "VBS isolates them." By the end, four locked rooms will have become something you can name, list, audit, and reason about. Where the public record runs out (some trustlet binary names and IDs are not on Microsoft's published list as of mid-2026), the piece will say so, and it will tell you what the actual records look like instead of inventing replacements.

So how does a user-mode process become unreachable from SYSTEM-in-the-NT-kernel? The answer is not new. It begins, like much of operating-system security, at MIT in the early 1970s.

2. The User-Mode-In-A-Higher-Privilege Problem

In March 1972 Michael Schroeder and Jerome Saltzer published a paper in the Communications of the ACM describing an unusual machine. The Multics team at MIT had been wrestling with a question that does not, at first glance, sound like a security question. What should happen when a user program calls a password-checking routine that needs to read the system password file? The user program must not be allowed to read that file directly. The routine must be allowed to read it. The two pieces of code run in the same process. How does the machine know which one is asking?

Schroeder and Saltzer's answer was eight hardware-enforced rings of privilege, with each segment in memory carrying a ring bracket in its descriptor word, and with cross-ring calls validated automatically by the hardware [@multicians-protection] [@multicians-papers]. The hardware that shipped this design was the Honeywell 6180 in 1973 [@wiki-protection-ring]. The pattern matters more than the gear. Some user code needed to run with more privilege than its caller and less privilege than the kernel. Multics arranged eight such layers from user code at the outermost ring down to the supervisor at ring 0 [@wiki-multics].
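The password-file puzzle can be worked through with a simplified ring-bracket check. This sketch assumes the commonly described three-number bracket (r1, r2, r3) and ignores the per-segment read/write/execute bits and outward calls; `ringAccess` and the bracket values chosen for the two segments are illustrative, not taken from a real Multics configuration.

```javascript
// Simplified ring brackets for one segment, with r1 <= r2 <= r3:
//   write   allowed from rings 0..r1
//   read    allowed from rings 0..r2
//   execute allowed in place from rings r1..r2
//   call through a gate allowed from rings r2+1..r3
function ringAccess(ring, { r1, r2, r3 }) {
  return {
    write: ring <= r1,
    read: ring <= r2,
    execute: ring >= r1 && ring <= r2,
    callViaGate: ring > r2 && ring <= r3,
  };
}

// Hypothetical brackets for the example in the text.
const passwordFile = { r1: 1, r2: 1, r3: 1 };    // data visible only to ring 1 and below
const passwordChecker = { r1: 1, r2: 1, r3: 5 }; // routine callable via gate from rings 2..5

const userRing = 4;
console.log(ringAccess(userRing, passwordFile).read);           // false: no direct read
console.log(ringAccess(userRing, passwordChecker).callViaGate); // true: may call the routine
console.log(ringAccess(1, passwordFile).read);                  // true: the routine, now in ring 1, can read
```

The same process holds two different answers to "may I read this file?" depending on which ring the currently executing code occupies.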

Trusted Computing Base (TCB): The set of hardware, firmware, and software whose correct operation is necessary to enforce a security policy. If any component of the TCB can be subverted, the policy can be subverted. The smaller the TCB, the easier it is to audit; the larger it is, the more places an attacker can find a foothold.

A few years later at Carnegie Mellon, William Wulf, Roy Levin, and the Hydra team took a different swing at the same problem. Hydra was a capability-based, object-oriented microkernel that ran on the C.mmp multiprocessor between 1971 and 1975 [@wiki-hydra]. Where Multics multiplied rings, Hydra multiplied vocabulary: every protected resource was an object addressable only through capability tokens, and security-critical subsystems lived not inside the kernel but as user-mode capability-holders trusted by the kernel to enforce their own policy. Levin et al.'s 1975 SOSP paper "Policy/Mechanism Separation in HYDRA" gave the design its slogan, and that slogan has outlived the system that produced it [@levy-capabook]. Hydra's "policy versus mechanism" phrasing still appears verbatim in modern object-capability literature, in the design discussion of WebAssembly's component model, and in seL4's published rationale.

For two decades the L4 family answered "but is this fast enough to be practical?" Jochen Liedtke's 1993 prototype, hand-coded in i386 assembly, ran inter-process communication twenty times faster than Carnegie Mellon's Mach microkernel [@wiki-l4]. His 1995 SOSP paper "On µ-Kernel Construction" was inducted into the ACM SIGOPS Hall of Fame in 2015 and is the foundational statement of the minimal-kernel, maximal-user-mode-trusted-services design. By 2010, OKL4, a commercial L4 derivative, had shipped in over one billion mobile devices [@wiki-l4].

Microkernel: A kernel design that pushes as much functionality as possible out of kernel mode and into user-mode "servers" that communicate via inter-process calls. Filesystem code, networking stacks, even device drivers can run as user-mode processes. The kernel itself shrinks to a few thousand lines of code that schedule processes, route messages, and enforce memory isolation, and nothing else.

In 2009 the lineage reached an end that nobody had reached before. Gerwin Klein, Kevin Elphinstone, Gernot Heiser and the NICTA team published "seL4: Formal Verification of an OS Kernel" at SOSP, reporting a machine-checked proof of functional correctness from a formal specification down to the C implementation [@sel4-sosp-paper]. seL4 was open-sourced in July 2014 [@wiki-sel4]; the seL4 Foundation's About page states plainly that seL4 stands out because of its thoroughgoing formal verification [@sel4-about]. The result was a kernel of about 8,700 lines of C, formally verified from specification to C implementation, with sub-microsecond inter-process calls.

Schroeder and Saltzer asked it for hardware rings. Hydra asked it for capabilities. Liedtke asked it for inter-process speed. Klein and Heiser asked it of formal logic. The question stayed the same: how do you let some user-mode code hold a secret that some other code in the same machine is not allowed to read, when both pieces of code are scheduled by the same kernel? The Multics answer was rings. The Hydra answer was capabilities. The L4 answer was a tiny kernel plus IPC. The seL4 answer was a tiny kernel plus IPC, plus a proof.

The Microsoft answer, in July 2015, was a hypervisor.

timeline
  title User-mode-in-higher-privilege lineage
  1972 : Multics 8-ring hardware
       : Honeywell 6180 ring brackets
  1974 : Hydra capabilities
  1975 : Policy vs mechanism
  1993 : L4 microkernel
       : Fast user-mode IPC
       : Windows NT ships ring 0/3
  2007 : Vista Protected Processes
  2009 : seL4 verification
  2013 : Windows 8.1 PPL
  2015 : Windows 10 IUM ships
       : Trustlets 0-3 enumerated
  2024 : VBS Enclaves go third-party
  2026 : Administrator Protection

If the architectural answer was already in the 1970s academic literature, why did Microsoft wait until 2015 to ship it on Windows? Because three earlier attempts to ship user-mode isolation on Windows -- under three different names, between 2007 and 2013 -- each failed in the same way.

3. Three Tries Before Trustlets

Before 2015 Microsoft tried three times to ship user-mode isolation on Windows. All three shipped in production. All three failed in the same way.

2007: Vista Protected Processes

Windows Vista introduced Protected Processes in January 2007. The motivation was not credential security; it was Digital Rights Management. The Protected Media Path required a set of binaries -- audiodg.exe, mfpmp.exe, and a handful of others involved in Blu-ray playback -- whose memory non-protected processes could not read, whose threads could not be debugged from outside, and whose DLL imports could not be hijacked at runtime [@wiki-pmp]. The kernel enforced these rules by refusing to grant the relevant access masks (PROCESS_VM_READ, PROCESS_VM_WRITE, THREAD_ALL_ACCESS) to handles requested from non-protected processes.
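The access-mask refusal can be modelled in a few lines. This is a deliberately reduced sketch: `toyOpenProcess` is a hypothetical stand-in for the kernel's handle-creation check, it covers only the process rights (not thread rights), and the real kernel's rule set is richer. The bit values are the ones defined in the Windows SDK headers.

```javascript
// Process access-right bits as defined in the Windows SDK.
const PROCESS_VM_READ = 0x0010;
const PROCESS_VM_WRITE = 0x0020;
const PROCESS_QUERY_LIMITED_INFORMATION = 0x1000;

// Rights the model never grants on a protected target to an unprotected caller.
const DENIED_TO_UNPROTECTED = PROCESS_VM_READ | PROCESS_VM_WRITE;

// Hypothetical model of the check performed when a handle is requested.
function toyOpenProcess(caller, target, desiredAccess) {
  const crossesLine = target.isProtected && !caller.isProtected;
  if (crossesLine && (desiredAccess & DENIED_TO_UNPROTECTED) !== 0) {
    return { granted: 0, status: 'ACCESS_DENIED' };
  }
  return { granted: desiredAccess, status: 'SUCCESS' };
}

const mediaProcess = { isProtected: true };
const debuggerProcess = { isProtected: false };

console.log(toyOpenProcess(debuggerProcess, mediaProcess, PROCESS_VM_READ).status);
// ACCESS_DENIED
console.log(toyOpenProcess(debuggerProcess, mediaProcess, PROCESS_QUERY_LIMITED_INFORMATION).status);
// SUCCESS
```

Everything in this check runs inside the NT kernel, so code that already executes in that kernel can simply skip it.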

The mechanism was elegant. The threat model was not. Alex Ionescu announced in January 2007 -- within weeks of Vista's general availability -- that he had developed a bypass method for the Protected Media Path [@wiki-pmp]. The same NT kernel that enforced the protection was the kernel an attacker would compromise to bypass it. A signed kernel driver, or any of the long stream of subsequent kernel vulnerabilities, would walk straight through.

2012: AppContainer and the LowBox token

Windows 8 introduced AppContainer process isolation in October 2012, originally to support Windows Store apps (later unified as the Universal Windows Platform in Windows 10) [@wiki-uwp]. Each AppContainer process ran with a LowBox token: a low-integrity primary token plus a SID, plus a set of named capabilities (internetClient, picturesLibrary, and so on), plus a per-AppContainer named-object subtree under \Sessions\<N>\AppContainerNamedObjects\<SID>. The NT kernel checked the SID against object DACLs at every object access, denying access by default and granting it only where the AppContainer's declared capabilities matched the requested operation.
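The deny-by-default rule reduces to a set-membership test. In the sketch below, `toyAccessCheck`, the placeholder SID, and the use of capability names in place of capability SIDs are all simplifications for illustration; the real access check walks the object's DACL entry by entry.

```javascript
// Deny by default: access is granted only if the object's allow list names
// the AppContainer's own SID or one of the capabilities its token carries.
function toyAccessCheck(token, objectAllowList) {
  const held = new Set([token.appContainerSid, ...token.capabilities]);
  return objectAllowList.some((entry) => held.has(entry));
}

const storeApp = {
  appContainerSid: 'S-1-15-2-EXAMPLE', // placeholder, not a real SID
  capabilities: ['internetClient'],
};

console.log(toyAccessCheck(storeApp, ['picturesLibrary'])); // false: capability not declared
console.log(toyAccessCheck(storeApp, ['internetClient']));  // true: declared and allowed
console.log(toyAccessCheck(storeApp, []));                  // false: nothing allowed, so denied
```

An object that says nothing about AppContainers is unreachable from one, which is the opposite of the default for ordinary Windows tokens.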

This is a Hydra-style capability lattice bolted onto NT's existing access-control system. It is a useful sandboxing primitive for untrusted code, and modern browsers (the Edge renderer, the Chromium sandbox) consume it for exactly that purpose. It is not a defence against an attacker who already has kernel code execution. In August 2018 James Forshaw at Google Project Zero published an exploit for Issue 1550 that turned the AppContainer named-object namespace itself into an arbitrary-directory-creation primitive [@forshaw-2018]:

The AppInfo service... calls the undocumented API CreateAppContainerToken... As the API is called without impersonating the user... the object directories are created with the identity of the service, which is SYSTEM.

A low-integrity caller could direct that SYSTEM-owned creation at any directory it pleased and use the result to elevate. The lattice held; the lattice's enforcer did not. AppContainers continue to ship, doing their actual job (sandboxing untrusted code) reasonably well. They were never going to answer the trustlet question (isolating trusted code from a compromised kernel) because they are NT-kernel-enforced.

2013: Protected Process Light (PPL) and `RunAsPPL`

Windows 8.1 generalised the Vista mechanism into a signer-level lattice. Each protected process now had a two-dimensional protection level: a signer (WinTcb, Windows, Antimalware, Authenticode, others) and a protection type (PsProtectedTypeProtected for full protected processes, PsProtectedTypeProtectedLight for PPL). Higher-signer processes could manipulate lower-signer ones; lower-signer processes could not reach across the line. The first canonical use case was anti-malware services that registered an Early Launch Anti-Malware (ELAM) driver and then ran their user-mode service as a Protected Process Light [@msdocs-protecting-am].

Protected Process Light (PPL): A Windows 8.1 process attribute that constrains which other processes can request high-privilege access to it. PPL extends the Vista Protected Process mechanism with a signer-level lattice (WinTcb > Windows > Antimalware > Authenticode > None) and a protection type. The NT kernel enforces the rules. LSASS running as a PPL is the canonical use case, exposed to administrators via the `RunAsPPL` registry value [@itm4n-runasppl].
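The lattice can be approximated as a two-part comparison. This sketch treats the signer ordering quoted above as strictly linear and uses illustrative type labels; the real kernel consults a dominance table that is not a simple ranking, so `dominates` and both rank tables should be read as assumptions made for the example.

```javascript
// Linearised signer ordering, lowest to highest, as quoted in the text.
const SIGNER_RANK = { None: 0, Authenticode: 1, Antimalware: 2, Windows: 3, WinTcb: 4 };

// Illustrative labels for the protection type dimension.
const TYPE_RANK = { None: 0, ProtectedLight: 1, Protected: 2 };

// A caller may request full access only if it is at least as strong as the
// target on both dimensions.
function dominates(caller, target) {
  return (
    TYPE_RANK[caller.type] >= TYPE_RANK[target.type] &&
    SIGNER_RANK[caller.signer] >= SIGNER_RANK[target.signer]
  );
}

const adminTool = { type: 'None', signer: 'None' };
const amService = { type: 'ProtectedLight', signer: 'Antimalware' };
const tcbProcess = { type: 'ProtectedLight', signer: 'WinTcb' };

console.log(dominates(adminTool, amService));  // false: unprotected caller
console.log(dominates(tcbProcess, amService)); // true: higher signer, same type
console.log(dominates(amService, tcbProcess)); // false: lower signer
```

The comparison says nothing about code running in kernel mode, which never has to ask for a handle in the first place.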

Alex Ionescu's 2013 essay "The Evolution of Protected Processes Part 3" documented the resulting Signing Levels table -- Signature Level 12 named "Windows," Level 13 "Windows Protected Process Light," Level 14 "Windows TCB" [@ionescu-ppp3] [@ionescu-ppp1]. That table is the load-bearing reference for every later trustlet design: every IUM binary on a 2026 Windows machine must satisfy at least Signature Level 12. Microsoft shipped LSASS-as-PPL ("LSA Protection," exposed through the RunAsPPL registry value under HKLM\SYSTEM\CurrentControlSet\Control\Lsa) as the canonical example: a way to keep the lower-privileged half of an administrator's session from reading credential material out of LSASS memory.

It worked, for some values of "worked." It worked against pass-the-hash tools that ran as an ordinary administrator without a signed kernel driver. It did not work against an attacker willing to load any signed driver, and -- as became clear in 2021 -- it did not work even from userland once the bypass class was identified.

In August 2018 James Forshaw, in the same Project Zero post that exposed the AppContainer issue, also documented a DefineDosDevice plus Known-DLL hijack technique. By creating a symbolic link in the NT object manager namespace that aliased a Known DLL section, an administrative caller could induce a target PPL process to load arbitrary code at the next image load [@forshaw-2018]. In 2021 the researcher who blogs as itm4n weaponised the same primitive into PPLdump, a userland tool that dumped lsass.exe memory from an administrator command prompt with no kernel driver involved [@itm4n-runasppl]. itm4n's writeup is honest about what this means:

Like any other protection though, it is not bulletproof and it is not sufficient on its own, but it is still particularly efficient.

Microsoft closed the DefineDosDevice corner of this class in Windows 10 21H2 build 19044.1826, shipped in July 2022 [@itm4n-end-of-ppldump]. That is nearly nine years of mainstream PPL deployment during which the LSASS-as-PPL credential boundary was bypassable without ring 0 access at all.

The pattern

Three primitives. Three different protection mechanisms. One common failure mode.

Mechanism	Year	Enforcer	Threat model	Defeated by	Status today
Vista Protected Process	2007	NT kernel	Untrusted user code reading DRM-protected media buffers	Signed kernel drivers; Ionescu Jan 2007 [@wiki-pmp]	Superseded by PPL for non-DRM use
AppContainer / LowBox	2012	NT kernel	Untrusted store-app code escaping its capability sandbox	SYSTEM-owned directory creation via service impersonation [@forshaw-2018]	Active for sandboxing untrusted code; not a trustlet substitute
Protected Process Light (`RunAsPPL`)	2013	NT kernel	Userland administrative attacker reading LSASS credential material	`DefineDosDevice` plus Known-DLL hijack; PPLdump 2021 [@itm4n-runasppl]	Active as defence-in-depth; closed in build 19044.1826, July 2022
Isolated User Mode / trustlets	2015	Hypervisor + Secure Kernel	VTL0 kernel attacker reading user-mode secrets	Secure-call interface bugs; agent-side RPC residual [@amar-bh2020]	Active; subject of this article

The first three rows share one diagnosis. Every NT-kernel-enforced isolation primitive shares the attacker's TCB. Improving the lattice the NT kernel enforces does not move the security ceiling, because the NT kernel itself can be compromised; once it is, any policy decision the NT kernel makes is the attacker's policy decision. Microsoft's own VBS hardware-requirements page admits the diagnosis verbatim:

VBS uses hardware virtualization and the Windows hypervisor to create an isolated virtual environment that becomes the root of trust of the OS that assumes the kernel can be compromised. -- Microsoft, OEM VBS hardware requirements [@msdocs-oem-vbs]

Note: RunAsPPL is useful defence in depth. It is not, and has never been, a substitute for Credential Guard. itm4n's 2021 PPLdump release was the proof for the userland half of that statement; signed-driver loaders are the proof for the ring-zero half. If your threat model includes a determined attacker with administrative rights, Credential Guard is the boundary; PPL is the speed bump in front of it [@itm4n-runasppl].

If every primitive the NT kernel enforces shares the attacker's TCB, the kernel that enforces user-mode isolation has to be a different kernel. In July 2015 Microsoft shipped one.

4. July 2015: The Hypervisor Becomes the Arbiter

On 29 July 2015 Microsoft shipped Windows 10 build 10240 [@wiki-win10-history]. Two new ideas shipped with it. The first was Hyper-V's hypervisor running underneath the NT kernel even on a laptop, not just on a server hosting virtual machines [@wiki-hyperv]. The second was a separate kernel running alongside the NT kernel, at a different Virtual Trust Level. Together those two ideas produce a substrate where the long-time equation "SYSTEM kernel write primitive equals every secret in user-mode memory" is no longer true.

Virtual Trust Level (VTL): A hypervisor-managed privilege axis added on top of x86's existing ring 0 / ring 3 split. Each VTL has its own kernel mode and its own user mode. Higher VTLs can read and write lower-VTL memory; lower VTLs cannot read or write higher-VTL memory at all. The Hyper-V Top-Level Functional Specification reserves up to 16 VTLs; the current Hyper-V implementation sets `HV_NUM_VTLS` to 2 (`#define HV_NUM_VTLS 2`) [@msdocs-vsm].

The Hyper-V Top-Level Functional Specification states the rule directly: "VSM achieves and maintains isolation through Virtual Trust Levels (VTLs)... Architecturally, up to 16 levels of VTLs are supported; however a hypervisor may choose to implement fewer than 16 VTL's. Currently, only two VTLs are implemented" [@msdocs-vsm]. The NT kernel runs in VTL0 ring 0; user-mode applications run in VTL0 ring 3. The Secure Kernel runs in VTL1 ring 0; trustlets run in VTL1 ring 3. Each VTL transition takes the CPU through a VMEXIT and back, with VMCS save and restore on each crossing [@quarkslab-virtual-journey]. The architectural cap of sixteen VTLs is in the published specification but is not deployed. Stocking the unused slots would require both hypervisor changes and a new design for who manages the additional kernel images. The two-VTL design is the entire shipped product.

Quarkslab's reverse-engineering team put the practical consequence in one sentence in their IUM-debugging writeup: "VTL0 is the Normal World, where the traditional kernel-mode and user-mode code run in ring 0 and ring 3, respectively. On top of that, a new world appears: VTL1 is the privileged Secure World, where the Secure Kernel runs in ring 0, and a limited number of IUM processes run in ring 3. Code running in VTL0, even in ring 0, cannot access the higher-privileged VTL1" [@quarkslab-debug-ium].

That sentence is the architectural fact the whole article rests on. The hypervisor configures each guest physical page's permissions on a per-VTL basis using the CPU's Second Level Address Translation tables. A page can be readable from VTL0 and VTL1, readable from VTL1 only, or readable from neither. On Intel hardware, the per-VTL permissions are implemented with Extended Page Tables (EPT); on AMD they use Nested Page Tables (NPT). The hypervisor keeps the per-VTL EPT/NPT entries in its own memory, not in the guest's.

Second Level Address Translation (SLAT): The hardware mechanism (Intel EPT, AMD NPT) that lets a hypervisor define page-level read, write, and execute permissions independent of the guest's own page tables. With VTLs, SLAT entries are per-VTL: a page's permissions when the CPU is executing VTL1 code can differ from the same page's permissions when the CPU is executing VTL0 code. A SYSTEM-privilege VTL0 attacker who edits the NT kernel's page tables cannot change the VTL1-side permissions, because those live in hypervisor-managed structures that VTL0 page-table writes do not touch.

flowchart LR
  subgraph VTL0["VTL0 (Normal World)"]
    ring3_0["Ring 3: lsass.exe, vmwp.exe, user apps"]
    ring0_0["Ring 0: NT kernel + signed drivers"]
    ring3_0 --> ring0_0
  end
  subgraph VTL1["VTL1 (Secure World)"]
    ring3_1["Ring 3: LsaIso.exe, vmsp.exe, trustlets"]
    ring0_1["Ring 0: Secure Kernel (securekernel.exe)"]
    ring3_1 --> ring0_1
  end
  VTL0 -. ALPC over agent ALPC port .-> VTL1
  VTL1 -. read VTL0 memory .-> VTL0
  hv["Hyper-V hypervisor: per-VTL SLAT permissions"]
  VTL0 --> hv
  VTL1 --> hv

The VTL hierarchy is not symmetric. VTL1 code can read VTL0 memory; that is how a trustlet can dispatch the contents of an lsass.exe RPC request the moment after VTL0 wrote it. VTL0 code cannot read VTL1 memory under any condition the hypervisor permits. A kernel write primitive in VTL0 lets the attacker corrupt the NT kernel's data structures, modify drivers, and walk every VTL0 process's pages. The attacker can do every one of those things and not be one byte closer to the contents of LsaIso.exe.
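The asymmetry can be sketched as a toy model. This is not how the hypervisor stores SLAT entries: the `slat` table, the page names, and `effectiveAccess` are all hypothetical, invented for illustration. The sketch only encodes the rule stated above, that an access must pass both the guest's own page tables and the hypervisor-owned per-VTL entry.

```javascript
// Toy model of per-VTL SLAT permissions (illustrative only).
// The table shape, page names, and function name are assumptions,
// not Hyper-V data structures.
const slat = {
  "page:lsass-heap":  { vtl0: "rw", vtl1: "rw" },
  "page:lsaiso-heap": { vtl0: "",   vtl1: "rw" },
};

// An access succeeds only if the guest's own page tables AND the
// hypervisor-owned SLAT entry for the current VTL both allow it.
function effectiveAccess(vtl, page, kind, guestPageTables) {
  const guestAllows = (guestPageTables[page] || "").includes(kind);
  const entry = slat[page] || {};
  const slatAllows = (entry["vtl" + vtl] || "").includes(kind);
  return guestAllows && slatAllows;
}

// A VTL0 kernel attacker rewrites the NT page tables to map everything "rw".
const attackerPageTables = {
  "page:lsass-heap": "rw",
  "page:lsaiso-heap": "rw",
};

// Page tables in effect while the CPU executes VTL1 code.
const secureKernelPageTables = {
  "page:lsass-heap": "r",
  "page:lsaiso-heap": "rw",
};

console.log(effectiveAccess(0, "page:lsass-heap", "r", attackerPageTables));     // true
console.log(effectiveAccess(0, "page:lsaiso-heap", "r", attackerPageTables));    // false
console.log(effectiveAccess(1, "page:lsass-heap", "r", secureKernelPageTables)); // true
```

The second call is the point of the model: the attacker controls `attackerPageTables` completely, yet the result is still `false`, because the `slat` table is not something VTL0 code can write.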

Microsoft's IUM documentation at Windows 10 RTM named two trustlets explicitly: Trustlet ID 0 = the Secure Kernel Process (hosts Device Guard and Hypervisor-protected Code Integrity policy decisions), and Trustlet ID 1 = LSAISO.EXE (Credential Guard's isolated LSA, holding NTLM hashes and Kerberos Ticket-Granting Tickets out of VTL0 reach). Two more (IDs 2 and 3, covered in §6) also shipped on the RTM image and were enumerated a week later by Ionescu's Black Hat reverse-engineering [@msdocs-ium] [@ionescu-bh2015]. Microsoft Learn's IUM page introduces the vocabulary the rest of this piece will use:

Trustlets (also known as trusted processes, secure processes, or IUM processes) are programs running as IUM processes in VSM... With VSM enabled, the Local Security Authority (LSASS) environment runs as a trustlet.

A week after Windows 10 shipped, on 5 August 2015, Alex Ionescu walked into a Black Hat USA briefing room in Mandalay Bay and reverse-engineered the entire thing in front of an audience [@ionescu-bh2015-infocondb]. His talk, "Battle of the SKM and IUM: How Windows 10 Rewrites OS Architecture," is the canonical first public account of the trustlet model and the source from which Microsoft's own later documentation borrows terminology one for one [@ionescu-bh2015]. Almost every concrete fact in the next section -- the syscall allow-list, the EKUs, the .tpolicy section, the Trustlet Instance GUID -- traces back to that single deck.

Now we know what world a trustlet lives in. What, architecturally, is one?

5. The Five Gates

A trustlet is not a special process class the way a Protected Process is. It is an ordinary Portable Executable binary that has been loaded under five very specific conditions. Walk through them once and you will be able to recognise a trustlet in a dumpbin /headers listing. The status is mechanical, not categorical. Chapter 9 of Windows Internals, Seventh Edition, Part 2 (Allievi, Russinovich, Ionescu, Solomon) covers the same architecture from the kernel-team side as a reference complement to Ionescu's BH2015 reverse-engineering [@windows-internals-7e-pt2].

Trustlet: A Windows user-mode process that runs in Virtual Trust Level 1 user mode (ring 3 of the Secure World), scheduled by the Secure Kernel and isolated from VTL0 by Hyper-V's per-VTL SLAT enforcement. A binary becomes a trustlet only if it satisfies five load-time conditions: a process attribute, two signing EKUs at Signature Level 12, a `.tpolicy` PE section containing `s_IumPolicyMetadata`, a Trustlet Instance GUID, and a stripped-down loader path. Trustlets are sometimes also called "trusted processes," "secure processes," or "IUM processes" [@msdocs-ium].

Isolated User Mode (IUM): The user-mode environment of Virtual Trust Level 1. IUM is, structurally, ring 3 of VTL1. Its inhabitants are trustlets; its kernel is the Secure Kernel; its system-call surface is approximately one-tenth of NT's. Quarkslab's IUM-debugging writeup describes IUM as the place where *"a limited number of IUM processes run in ring 3"* of VTL1; Microsoft's Win32 documentation describes the same architectural placement with different wording [@quarkslab-debug-ium] [@msdocs-ium].

Gate 1: the process attribute

VTL0 user-mode code cannot call CreateProcess and produce a trustlet. The Win32 API does not expose the necessary primitive. A trustlet is born via a direct NtCreateUserProcess syscall that carries a PsAttributeSecureProcess attribute with a 64-bit Trustlet ID. Only callers that already live in VTL1, or callers in VTL0 that hold a specific brokering capability, can request that attribute and have the Secure Kernel honour it [@ionescu-bh2015].

This is intentional. The Win32 layering is one of the surfaces an attacker can compromise, so the trustlet boot path bypasses it. There is no "trustlet via shell" -- not for an administrator, not for SYSTEM, not for the Secure Kernel itself other than through the documented internal path.

Gate 2: two EKUs at Signature Level 12

The binary must be signed with a certificate chain that contains two specific Enhanced Key Usage identifiers, and the resulting Signing Level must be 12 or higher. From Ionescu's BH2015 deck (correcting a typo in the slide): "They must have a Signature Level of 12... This means they must have the Windows System Component Verification EKU (1.3.6.1.4.1.311.10.3.6)... They must have the IUM EKU 1.3.6.1.4.1.311.10.3.37" [@ionescu-bh2015].

Enhanced Key Usage (EKU): An X.509 certificate extension that restricts the purposes a certificate can be used for. An EKU is an object identifier (OID); a code-signing certificate that claims an OID of `1.3.6.1.4.1.311.10.3.6` is asserting it is valid for the "Windows System Component Verification" purpose. The Windows code-integrity subsystem (`ci.dll`) checks the required EKU against the actual certificate when it verifies the image's signature, and refuses to load the image if the EKU is missing or the certificate is not chained to a trusted root [@ionescu-ppp3].

Both EKUs are required. The Windows System Component Verification EKU establishes the binary as a Microsoft-signed Windows component. The IUM EKU asserts the binary's intent to load as a trustlet. A PPL EKU may sit on top, layering the PPL signer-level check on the trustlet check, but the two-EKU minimum is what Signing Level 12 enforces. The system-component EKU check is skipped when both Test Signing is enabled and the local machine trusts the Microsoft Test Root. That is the exact attack class Ionescu names verbatim in the BH2015 deck: "compromise the platform via Test Signing" disables the signing gate that defines trustlet identity.

Gate 3: the `.tpolicy` section and `s_IumPolicyMetadata`

Every trustlet image must contain a PE section named .tpolicy marked IMAGE_SCN_CNT_INITIALIZED_DATA | IMAGE_SCN_MEM_READ. The section must export the symbol s_IumPolicyMetadata, a structure with three required components: a version byte set to 1, a 64-bit Trustlet ID that must match the one the process attribute requested, and a per-trustlet policy table containing entries for ETW (event tracing), debug permissions, crash-dump key release, and other trustlet-specific runtime knobs [@ionescu-bh2015].

The Secure Kernel parses this section at load time via an internal routine the deck names SkpspFindPolicy. A binary with no .tpolicy section, or with one whose Trustlet ID disagrees with the process-attribute Trustlet ID, or whose version byte is anything other than 1, fails the gate. The Secure Kernel does not "infer" a trustlet identity; it reads it out of the binary the attacker would have had to sign.

Gate 4: the Trustlet Instance GUID

Once gates 1-3 pass, the trustlet calls a secure-service routine the deck names IumSetTrustletInstance, identified by secure-call ordinal 0x80000001. That routine binds the running process to a Trustlet Instance GUID, the runtime identity by which the Secure Kernel discriminates one instance of a trustlet from another. Hyper-V partition GUIDs flow into this identifier for the vTPM trustlets, so that the secrets a partition's vTPM holds are scoped to that partition's Instance GUID.

The same Instance GUID can be shared across distinct Trustlet IDs. That is the architectural primitive Microsoft uses for trustlet-to-trustlet authentication: the host-side Hyper-V vTPM (vmsp.exe, Trustlet ID 2) and the vTPM provisioning trustlet (ID 3) cooperate on a single partition's secrets by sharing the partition's Instance GUID. The Secure Kernel's SkCapabilities table hardcodes which Trustlet IDs are permitted to invoke which secure-storage operations against an Instance GUID; for the 2015-era IUM surface, the only ID-discriminated rules are CheckByTrustletId 2 for SecureStorageGet and CheckByTrustletId 3 for SecureStorageSet [@ionescu-bh2015].
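A minimal sketch of that rule, assuming nothing about the real `SkCapabilities` layout: the two Trustlet-ID checks come from the text above, while the object shapes, the `secureCall` function, and the `partition-A` / `partition-B` identifiers are hypothetical stand-ins for illustration.

```javascript
// Toy model of Trustlet-ID-checked secure storage (illustrative only).
// The two rules follow the text; every data structure here is an assumption.
const skCapabilities = {
  SecureStorageGet: { checkByTrustletId: 2 },
  SecureStorageSet: { checkByTrustletId: 3 },
};

const storage = new Map(); // Instance GUID -> secret blob

function secureCall(caller, operation, blob) {
  const rule = skCapabilities[operation];
  if (!rule) return { ok: false, reason: "unknown operation" };
  if (caller.trustletId !== rule.checkByTrustletId) {
    return { ok: false, reason: "trustlet ID not permitted" };
  }
  if (operation === "SecureStorageSet") {
    storage.set(caller.instanceGuid, blob);
    return { ok: true };
  }
  if (!storage.has(caller.instanceGuid)) {
    return { ok: false, reason: "no secret for this instance" };
  }
  return { ok: true, blob: storage.get(caller.instanceGuid) };
}

// Placeholder identities; real Instance GUIDs are partition GUIDs.
const provisioning = { trustletId: 3, instanceGuid: "partition-A" };
const vmsp = { trustletId: 2, instanceGuid: "partition-A" };
const lsaIso = { trustletId: 1, instanceGuid: "partition-A" };
const otherVmsp = { trustletId: 2, instanceGuid: "partition-B" };

console.log(secureCall(provisioning, "SecureStorageSet", "sealed-secret").ok); // true
console.log(secureCall(vmsp, "SecureStorageGet").blob);                        // sealed-secret
console.log(secureCall(lsaIso, "SecureStorageGet").reason);    // trustlet ID not permitted
console.log(secureCall(otherVmsp, "SecureStorageGet").reason); // no secret for this instance
```

Two independent checks are in play: the Trustlet ID decides which operation a caller may invoke, and the Instance GUID decides which partition's secret it can reach.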

Gate 5: the stripped-down loader

A trustlet's image loader is not the standard NT loader. The Secure Kernel routes trustlet loads through a path the deck names LdrpIsSecureProcess, which skips an unusually long list of features. Application Verifier hooks: skipped. Image File Execution Options registry checks: skipped. SxS / Fusion DLL redirection: skipped. The CSRSS connection ordinary NT processes establish during startup: skipped (the BASE_STATIC_SERVER_DATA structure CSRSS would normally hand back is fabricated locally on the trustlet's heap so dependent calls do not crash). Safer, AuthZ, Software Restriction Policies: all skipped. Any DLL load triggered from VTL0: refused.

The result is a loader path with no attack surface against VTL0 environment variables, no susceptibility to NT's normal "load this DLL instead" knobs, and no opportunity for the user's CSRSS process to inject anything into the trustlet's address space. The system-call surface available inside the trustlet is restricted to roughly fifty allowed entries. Ionescu's deck states the count verbatim: "Only 48 system calls are currently allowed from IUM Trustlets" [@ionescu-bh2015].
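The allow-list idea is default-deny dispatch. The sketch below assumes nothing about the Secure Kernel's actual dispatcher; the syscall names are deliberately placeholders (the deck's list of 48 is not reproduced here), and `dispatchFromTrustlet` is a hypothetical function name.

```javascript
// Toy syscall allow-list (illustrative only).
// The names are placeholders, not entries from the real list of 48.
const IUM_ALLOWED_SYSCALLS = new Set([
  "PlaceholderSyscallA",
  "PlaceholderSyscallB",
]);

function dispatchFromTrustlet(name, handlers) {
  // Default-deny: the allow-list is consulted before any handler lookup.
  if (!IUM_ALLOWED_SYSCALLS.has(name)) {
    return { ok: false, reason: "syscall not on the IUM allow-list" };
  }
  const handler = handlers[name];
  if (typeof handler !== "function") {
    return { ok: false, reason: "no handler registered" };
  }
  return { ok: true, value: handler() };
}

const handlers = {
  PlaceholderSyscallA: () => "handled A",
  // A handler exists for C, but C is not on the allow-list.
  PlaceholderSyscallC: () => "never reached",
};

console.log(dispatchFromTrustlet("PlaceholderSyscallA", handlers).ok); // true
console.log(dispatchFromTrustlet("PlaceholderSyscallC", handlers).ok); // false
```

Checking the list first means that the existence of a handler is never enough on its own to make a call reachable from a trustlet.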

sequenceDiagram
  participant Caller as Caller (VTL1 or brokered VTL0)
  participant NT as NtCreateUserProcess
  participant CI as ci.dll (CipMincryptToSigningLevel)
  participant SK as Secure Kernel (SkpspFindPolicy)
  participant Ldr as LdrpIsSecureProcess
  participant Iset as IumSetTrustletInstance
  Caller->>NT: Create with PsAttributeSecureProcess + Trustlet ID
  NT->>CI: Verify EKUs System Component plus IUM and Signing Level ge 12
  CI-->>NT: Pass or fail
  NT->>SK: Parse .tpolicy, validate s_IumPolicyMetadata
  SK-->>NT: Pass or fail
  NT->>Ldr: Strip down loader and deny VTL0-triggered DLL loads
  Ldr-->>NT: Image mapped under IUM rules
  NT->>Iset: Bind Trustlet Instance GUID
  Iset-->>NT: Trustlet alive in VTL1

Gate	What it checks	Where it lives	Failure outcome
1. Process attribute	`PsAttributeSecureProcess` with 64-bit Trustlet ID, requested via `NtCreateUserProcess`	NT kernel boot path	Normal NT process; no IUM bit ever set [@ionescu-bh2015]
2. EKUs + Signing Level	Windows System Component EKU (`1.3.6.1.4.1.311.10.3.6`) AND IUM EKU (`1.3.6.1.4.1.311.10.3.37`); Signing Level >= 12	`ci.dll` integrity check, `CipMincryptToSigningLevel`	Load refused; no trustlet [@ionescu-ppp3] [@ionescu-bh2015]
3. `.tpolicy` + `s_IumPolicyMetadata`	PE section with version 1, matching Trustlet ID, and per-trustlet policy entries	Secure Kernel `SkpspFindPolicy`	Load refused; no trustlet [@ionescu-bh2015]
4. Trustlet Instance GUID	`IumSetTrustletInstance` secure-call ordinal `0x80000001`; per-partition scoping for vTPM	Secure Kernel runtime	Process exists but cannot bind to per-instance secret storage
5. Loader strip-down	Skip Application Verifier, IFEO, SxS, CSRSS, Safer, AuthZ, SRP; deny VTL0-triggered DLL loads	NT `LdrpIsSecureProcess`	Normal NT loader runs; image loads but is not isolated

The pseudocode below walks each gate in order against a fake binary descriptor. It is not a loader, it is not an exploit, and it is not a security tool. It is a teaching aid: if you can read it, you can read the trustlet load path.

```javascript
// Trustlet load-time gate check (educational pseudocode).
// Inspired by Ionescu BH2015 reverse-engineering of Win10 RTM (2015).
// Not a real loader; not a security tool.

const WINDOWS_SYSTEM_COMPONENT_EKU = "1.3.6.1.4.1.311.10.3.6";
const IUM_EKU = "1.3.6.1.4.1.311.10.3.37";
const MIN_SIGNING_LEVEL = 12; // "Windows"

function loadTrustlet(bin) {
  // Gate 1: process attribute
  if (!bin.attr || !bin.attr.PsAttributeSecureProcess) {
    return "fail at gate 1: no PsAttributeSecureProcess attribute";
  }
  const requestedId = bin.attr.PsAttributeSecureProcess.trustletId;

  // Gate 2: two EKUs at Signing Level 12+
  const ekus = (bin.cert && bin.cert.ekus) || [];
  if (!ekus.includes(WINDOWS_SYSTEM_COMPONENT_EKU)) {
    return "fail at gate 2: missing Windows System Component EKU";
  }
  if (!ekus.includes(IUM_EKU)) {
    return "fail at gate 2: missing IUM EKU";
  }
  if ((bin.cert.signingLevel || 0) < MIN_SIGNING_LEVEL) {
    return "fail at gate 2: signing level below 12";
  }

  // Gate 3: .tpolicy section with s_IumPolicyMetadata
  const tpol = bin.sections && bin.sections[".tpolicy"];
  if (!tpol || !tpol.exports || !tpol.exports.s_IumPolicyMetadata) {
    return "fail at gate 3: no .tpolicy section with s_IumPolicyMetadata";
  }
  const meta = tpol.exports.s_IumPolicyMetadata;
  if (meta.version !== 1 || meta.trustletId !== requestedId) {
    return "fail at gate 3: malformed or mismatched s_IumPolicyMetadata";
  }

  // Gate 4: Trustlet Instance GUID (bound at runtime via IumSetTrustletInstance)
  const instance = bin.runtime && bin.runtime.instanceGuid;
  if (!instance) {
    return "fail at gate 4: no Trustlet Instance GUID bound";
  }

  // Gate 5: stripped-down loader (skip Application Verifier, IFEO, SxS, CSRSS,
  // Safer, AuthZ, SRP; deny VTL0-triggered DLL loads).
  // We don't simulate the loader here; we just refuse VTL0-injected DLL loads.
  if (bin.loaderTriggers && bin.loaderTriggers.fromVtl0) {
    return "fail at gate 5: VTL0-triggered DLL load denied";
  }

  return "trustlet loaded: id=" + requestedId + " instance=" + instance;
}

// Smoke test.
const sample = {
  attr: { PsAttributeSecureProcess: { trustletId: 1 } },
  cert: {
    ekus: ["1.3.6.1.4.1.311.10.3.6", "1.3.6.1.4.1.311.10.3.37"],
    signingLevel: 12,
  },
  sections: {
    ".tpolicy": {
      exports: { s_IumPolicyMetadata: { version: 1, trustletId: 1 } },
    },
  },
  // Placeholder GUID for illustration; an empty string would fail gate 4.
  runtime: { instanceGuid: "00000000-0000-0000-0000-000000000001" },
  loaderTriggers: { fromVtl0: false },
};
console.log(loadTrustlet(sample));
```

Key idea: A trustlet is what passes all five gates. There is no other definition. Status is mechanical, not categorical: it is what the Secure Kernel's load path produces when a properly signed binary with a properly formed .tpolicy section is launched through NtCreateUserProcess with a proper secure-process attribute.

All five gates pass. The binary is now a trustlet. It is running in VTL1 user mode. The hypervisor refuses to map its pages into VTL0. Now what does it do? Who does it talk to?

6. The Inbox Roster

Five gates. Pass them all and you become a trustlet. As of mid-2026, Microsoft passes them on behalf of the components listed below.

The agent / trustlet pattern

Before the roster, the pattern. Almost every shipping trustlet has a partner: an agent process in VTL0 that does the high-volume work of integrating with the rest of the operating system, and the trustlet itself in VTL1 holding the secret material. The two talk over an Advanced Local Procedure Call (ALPC) port whose server end is hosted by the trustlet.

ALPC: A Windows inter-process communication primitive optimised for fast, fixed-size message exchange between processes on the same machine. The NT kernel hosts ALPC ports as named kernel objects (e.g., `\RPC Control\LSA_ISO_RPC_SERVER`); clients open a port and exchange messages with the server. For trustlets, the ALPC server runs inside the trustlet in VTL1; clients in VTL0 send requests, the Secure Kernel marshals the request across the VTL boundary, and the trustlet returns a result back to VTL0. The hash never leaves VTL1; the request and response do.

flowchart LR
  NetClient[Network or local client]
  Agent["lsass.exe (VTL0 agent)<br/>protocol parsing<br/>session state<br/>network I/O"]
  SK["Secure Kernel<br/>(VTL1 ring 0)<br/>marshals secure calls"]
  Trustlet["LsaIso.exe (VTL1 trustlet)<br/>NTLM hashes<br/>Kerberos TGTs<br/>EncryptData / DecryptData"]
  NetClient -->|"network protocol"| Agent
  Agent -->|"ALPC: LSA_ISO_RPC_SERVER"| SK
  SK -->|"IUM Base API"| Trustlet
  Trustlet -->|"opaque blob"| SK
  SK --> Agent

The roster below names the agent for each trustlet where Microsoft has published one. Where the agent is not publicly named, the row says so.

Trustlet ID 0 -- the Secure Kernel Process

The first inhabitant of VTL1 user mode. Hosts Device Guard and Hypervisor-protected Code Integrity policy decisions. Architecturally close to a daemon: it does not service external clients; it provides services the Secure Kernel itself relies on for policy decisions about whether a given image is permitted to load in VTL0 [@ionescu-bh2015].

Trustlet ID 1 -- `LsaIso.exe` (Credential Guard)

The canonical trustlet. Holds NTLM hashes and Kerberos Ticket-Granting Tickets. Its agent in VTL0 is lsass.exe, the Local Security Authority Subsystem Service that has held those secrets directly for every version of Windows NT until 2015. The ALPC port name is LSA_ISO_RPC_SERVER. The IUM-side API the trustlet exposes is narrow: EncryptData and DecryptData on opaque blobs, plus a handful of internal management operations [@msdocs-credential-guard].

The Microsoft Learn explanation is the verbatim public account:

With Credential Guard enabled, the LSA process in the operating system talks to a component called the isolated LSA process that stores and protects those secrets, LSAIso.exe. Data stored by the isolated LSA process is protected using VBS and isn't accessible to the rest of the operating system. LSA uses remote procedure calls to communicate with the isolated LSA process [@msdocs-credential-guard].

A VTL0 caller -- including SYSTEM-in-the-NT-kernel -- can ask the trustlet to encrypt a freshly supplied credential or to authenticate a freshly received challenge. It cannot ask the trustlet to expose the underlying NTLM hash. The hash never leaves VTL1. That is the entire point.
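The shape of that interface can be sketched with a closure. This is a toy, not Credential Guard: HMAC-SHA-256 stands in for whatever the real trustlet computes, and `createTrustlet`, `respondToChallenge`, and the placeholder secret are hypothetical names chosen for illustration. What the sketch preserves is the property that the caller can have the secret used on its behalf but is never handed the secret.

```javascript
const crypto = require("crypto");

// Toy model of the agent / trustlet split (illustrative only).
// HMAC-SHA-256 is a stand-in operation, not the NTLM or Kerberos computation.
function createTrustlet(secret) {
  // `secret` lives only in this closure; no method returns it.
  return Object.freeze({
    respondToChallenge(challenge) {
      return crypto
        .createHmac("sha256", secret)
        .update(challenge)
        .digest("hex");
    },
  });
}

const trustlet = createTrustlet(Buffer.from("placeholder-secret"));

// The agent handles the protocol side and forwards only the challenge.
const agent = {
  handleChallenge(challenge) {
    return trustlet.respondToChallenge(challenge);
  },
};

const response = agent.handleChallenge("nonce-1234");
console.log(response.length);       // 64 (hex characters of a SHA-256 MAC)
console.log(Object.keys(trustlet)); // [ 'respondToChallenge' ]
```

An attacker who fully controls `agent` can still request responses to challenges of their choosing, which is the residual the article's later sections return to; what they cannot do through this interface is read the secret itself.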

Trustlet ID 2 -- `vmsp.exe` (Hyper-V vTPM, host side)

The Hyper-V Virtual Trusted Platform Module on the host side. One vmsp.exe instance per guest partition; the agent is vmwp.exe, the Hyper-V Virtual Machine Worker Process for that partition. The Instance GUID is the partition's GUID, so that the keys a partition's vTPM holds are scoped to that partition and that partition only. Storage primitives include a Mailbox primitive (protected by a per-instance Security Cookie) and a Secure Storage primitive that produces Ingress and Egress blobs encrypted with per-Instance IDK material [@ionescu-bh2015] [@msdocs-guarded-fabric].

Shielded VMs on Windows Server 2016 and later consume vmsp.exe. A shielded VM, per Microsoft Learn, "has a virtual TPM, is encrypted using BitLocker, and can run only on healthy and approved hosts in the fabric" [@msdocs-guarded-fabric]. The vTPM keys live in the host's vmsp.exe trustlet; the BitLocker volume master key in the guest is sealed against that vTPM; and a SYSTEM-privilege NT-kernel write primitive on the host cannot read the partition's vTPM secrets even though the host can otherwise reach the partition's memory.

Trustlet ID 3 -- vTPM provisioning trustlet

Pushes initial secrets into a partition's Instance GUID at vTPM creation time. The Secure Kernel's SkCapabilities array hardcodes CheckByTrustletId 2 for SecureStorageGet and CheckByTrustletId 3 for SecureStorageSet; those are the only Trustlet-ID-checked secure-storage operations in the 2015-era IUM secure-call surface [@ionescu-bh2015]. The pair of trustlets cooperates on the same Instance GUID so the provisioning trustlet writes and vmsp.exe reads, with the Secure Kernel enforcing that no other trustlet can do either.

Enhanced Sign-in Security (ESS) biometric matching component (Windows 11+)

Microsoft Learn documents the architectural placement of Windows Hello's facial-recognition algorithm verbatim:

When ESS is enabled, the face algorithm is protected using VBS to isolate it from the rest of Windows. The hypervisor is used to specify and protect memory regions, so that they can only be accessed by processes running in VBS. The hypervisor allows the face camera to write to these memory regions providing an isolated pathway... Sensors that support ESS have a certificate embedded during manufacturing [@msdocs-ess].

The page also documents the certificate chain that authenticates the camera to the matcher and the match-on-sensor requirement for fingerprint readers under ESS. Microsoft does not publicly name the binary that hosts the face algorithm, and it does not publicly assign that binary a Trustlet ID. The architectural placement is a trustlet. The naming is not on the record.

Administrator Protection / Adminless issuer (Windows 11, rolling out 2025-26)

In October 2025 Microsoft shipped a preview of Administrator Protection in KB5067036 [@kb5067036] and reverted the rollout in the same update note [@msdocs-admin-protection]. The Microsoft Learn page describes the security model:

Once authorized, Windows uses a hidden, system-generated, profile-separated user account to create an isolated admin token. This token is issued to the requesting process and is destroyed once the process ends, ensuring that admin privileges don't persist. Administrator protection introduces a new security boundary with support to fix any reported security bugs [@msdocs-admin-protection].

The implementation surface that issues those tokens is not publicly named. The architectural family resemblance to a trustlet is strong, and the "new security boundary with support to fix any reported security bugs" line is the formal commitment Microsoft makes for VBS-isolated components. Whether the issuer is a trustlet, a VBS Enclave, or a separately isolated VTL0 process is, as of mid-2026, not on the public record.

Third-party VBS Enclaves (Windows 11 24H2 and later)

For the first time since 2015, the trustlet primitive is exposed to third-party developers. A VBS Enclave is a DLL signed with a Trusted Signing certificate and loaded into a VTL1 enclave region of a host process: the host creates the enclave with CreateEnclave and calls into it with CallEnclave. The OS support is narrow:

Windows 11 Build 26100.2314 or later... Windows Server 2025 or later... Visual Studio 2022 version 17.9 or later... The Windows Software Development Kit (SDK) version 10.0.22621.3233 or later, which provides veiid.exe (the VBS Enclave import ID binding utility) and signtool.exe... A Trusted Signing account [@msdocs-vbs-enclaves].

Azure SQL's "Always Encrypted with secure enclaves" is the public flagship consumer. The architectural difference from an inbox trustlet is the API surface and the enclave-versus-process model: a VBS Enclave is a region inside an existing process's address space, not a separately scheduled process. The threat model is identical: the host (the rest of the process, including its VTL0 code) is the attacker, the enclave is the defender [@pulapaka-vbs-enclaves].

Roster table

Trustlet ID	Binary	VTL0 agent	ALPC endpoint	Secret / operation	Source
0	Secure Kernel Process	(internal; no external agent)	(internal)	Device Guard / HVCI policy decisions	[@ionescu-bh2015]
1	`LsaIso.exe`	`lsass.exe`	`LSA_ISO_RPC_SERVER`	NTLM hashes, Kerberos TGTs; `EncryptData` / `DecryptData`	[@msdocs-credential-guard] [@ionescu-bh2015]
2	`vmsp.exe`	`vmwp.exe` (per partition)	per-instance, partition GUID scoped	Hyper-V vTPM, host side; secure storage `Get`	[@ionescu-bh2015] [@msdocs-guarded-fabric]
3	vTPM provisioning trustlet	(Hyper-V provisioning agent)	per-instance, partition GUID scoped	Initial secret provisioning; secure storage `Set`	[@ionescu-bh2015]
(unpublished)	ESS face-algorithm component	Hello biometric pipeline; sensor-issued cert auth	not publicly named	Face template matching (fingerprint matching under ESS is match-on-sensor)	[@msdocs-ess]
(unpublished)	Administrator Protection issuer	UAC / Authorization Manager broker	not publicly named	Just-in-time admin token issuance	[@msdocs-admin-protection]
(third-party)	VBS Enclave DLL	host process (`CreateEnclave` caller)	direct calls via `CallEnclave`	Application-defined; e.g., Azure SQL Always Encrypted	[@msdocs-vbs-enclaves] [@pulapaka-vbs-enclaves]

The published authoritative trustlet list still stops at Trustlet IDs 0-3 from August 2015. Every roster published after that point has been inferred from secondary evidence: kernel symbols, ALPC port enumeration via NtQuerySystemInformation, documented architectural placements. Microsoft has not republished an authoritative roster for any later Windows release.

Two entries in the list above are placed in VBS-isolated territory by Microsoft's published documentation but have not been publicly named or numbered. The ESS face-algorithm matcher is documented to live in VBS-isolated memory, with sensor-certificate authentication and template-encryption keys held in VBS, but the binary's name and Trustlet ID are not on the public record [@msdocs-ess]. The Administrator Protection token issuer's implementation surface is even less precisely specified -- "a hidden, system-generated, profile-separated user account" inside "a new security boundary," but no commitment to whether the issuer is a trustlet, a VBS Enclave, or a separate isolated process [@msdocs-admin-protection]. This article will not invent names or numbers for either. Empirical enumeration via `NtQuerySystemInformation(SystemIsolatedUserModeInformation)` on a current Windows 11 build is the only way to obtain a current roster, and that route is outside the scope of this piece.

Note: Credential Guard prevents the memory-resident NTLM hash or Kerberos TGT from being read out of VTL0. It does not protect typed-in credentials, the agent-side relay surface, plaintext-secret protocols (CredSSP / NTLMv1 / MS-CHAPv2 / Digest), or liveness; the full four-item enumeration with citations lives in Section 10. Microsoft documents one corner of the limit verbatim: Credential Guard "doesn't prevent an attacker with malware on the PC from using the privileges associated with any credential" [@msdocs-credential-guard].

The published roster stops at Trustlet IDs 0-3 from 2015. The actual roster on a 2026 box is bigger. How much bigger Microsoft hasn't said. That is one of the open problems Section 9 will pick up.

7. Competing Approaches

Microsoft is not alone. The same threat model -- "protect user-mode code from a compromised OS kernel" -- has been answered six other ways. None is strictly better than a trustlet. None is strictly worse. The right answer depends on what platform you are on, what threat model you have, and what workload you are trying to protect.

Trusted Execution Environment (TEE): A hardware-enforced or hypervisor-enforced execution context whose memory and state are inaccessible to the surrounding host operating system, including its kernel. The Open Mobile Terminal Platform (OMTP) first defined the term, and GlobalPlatform now publishes the standard APIs (TEE Client API for the host, TEE Internal Core API for the trusted code). Windows trustlets, Intel SGX enclaves, ARM TrustZone Trusted Applications, AMD SEV-SNP confidential VMs, Apple's Secure Enclave, and seL4 user-mode security servers are all variants of TEE [@wiki-tee].

Intel SGX

Software Guard Extensions launched with the sixth-generation Intel Core processors (Skylake) in 2015 [@wiki-sgx]. SGX adds two CPU instructions with different privilege requirements: ENCLS (ring 0; the OS issues leaves like ECREATE on behalf of a user-mode application) and ENCLU (ring 3; the application issues leaves like EENTER and EEXIT to enter and leave its enclave) [@intel-sdm-sgx]. The result is a user-mode-controllable enclave whose memory is encrypted on the way out of the CPU's Enclave Page Cache to DRAM. The CPU microcode itself, plus the Quoting Enclave, is the TCB. Neither the OS kernel nor the hypervisor sits in the trust path.

That sounded ideal in 2015. It has not aged well. Foreshadow (USENIX Security 2018, Van Bulck et al.) demonstrated that transient-execution attacks could extract not only enclave memory but the platform's attestation key [@foreshadow-usenix]. The Foreshadow team's site states the consequence:

Foreshadow demonstrates how speculative execution can be exploited for reading the contents of SGX-protected memory as well as extracting the machine's private attestation key... due to SGX's privacy features, an attestation report cannot be linked to the identity of its signer. Thus, it only takes a single compromised SGX machine to erode trust in the entire SGX system. -- Foreshadow project site [@foreshadow-attack-eu]

SGAxe (attestation-key extraction) [@sgaxe], Plundervolt (software-controlled undervolting to fault SGX computations) [@plundervolt], SgxPectre (branch-target injection across the enclave boundary) [@sgxpectre], and others followed. Intel deprecated SGX on 11th-generation Core and later client CPUs, which incidentally removed Ultra HD Blu-ray playback on officially licensed software including PowerDVD [@wiki-sgx]. SGX continues on Xeon for confidential cloud workloads but is no longer a target architects pick on Windows clients.

The Ultra HD Blu-ray collapse is the closest the SGX deprecation has come to mainstream visibility. PowerDVD's SGX dependency meant that a client SGX deprecation broke a consumer product line, and CyberLink had to ship updates rerouting around the dropped CPU feature.

AMD SEV-SNP and Intel TDX

AMD's Secure Encrypted Virtualization with Secure Nested Paging (SEV-SNP), introduced on EPYC 7003 (Milan, launched 15 March 2021) [@wiki-amd-epyc], and Intel's Trust Domain Extensions (TDX), introduced on 4th-generation Xeon Scalable (Sapphire Rapids, launched 10 January 2023) [@wiki-sapphire-rapids], provide whole-VM confidential computing [@amd-sev-overview] [@intel-tdx-overview]. AMD's verbatim claim: "SEV-SNP adds strong memory integrity protection to help prevent malicious hypervisor-based attacks like data replay, memory re-mapping, and more to create an isolated execution environment" [@amd-sev-overview]. Intel's verbatim claim about TDX: "A CPU-measured Intel TDX module enables Intel TDX. This software module runs in a new CPU Secure Arbitration Mode (SEAM) as a peer virtual machine manager (VMM)" [@intel-tdx-overview]. The AMD SEV-SNP whitepaper "Strengthening VM Isolation with Integrity Protection and More" is the canonical technical reference [@amd-sev-snp-whitepaper].

The granularity is different from a trustlet. SEV-SNP and TDX isolate an entire virtual machine from its hypervisor and host. They do not isolate a process from its own VM's kernel. For "this user-mode process should be protected from a SYSTEM kernel write primitive on the same OS," a trustlet is the primitive; for "this entire VM should be protected from a compromised cloud provider," a CVM is the primitive. Use the right one.

ARM TrustZone and OP-TEE

TrustZone is the two-world hardware split that has shipped on every Cortex-A processor since the mid-2000s. The Wikipedia ARM architecture article states verbatim that "the Security Extensions, marketed as TrustZone Technology, is in ARMv6KZ and later application profile architectures," the lineage every Cortex-A core inherits [@wiki-arm-architecture]. The CPU enforces a Non-Secure World and a Secure World; switching between the two is mediated by a Secure Monitor Call (SMC) instruction. OP-TEE is the canonical open-source secure-world OS for Cortex-A TrustZone, with Trusted Applications running as user-mode binaries in Secure World EL-0 and the OP-TEE OS itself running at EL-1 [@optee-about]. The OP-TEE about page describes the design: "OP-TEE is a Trusted Execution Environment (TEE) designed as companion to a non-secure Linux kernel running on Arm; Cortex-A cores using the TrustZone technology" [@optee-about].

TrustZone is the closest non-Windows analogue to a trustlet at the architectural level. The vocabulary maps one for one.

Concept	Windows VBS / IUM	ARM TrustZone / OP-TEE
Isolation primitive	Hyper-V hypervisor + SLAT	TrustZone Address Space Controller; CPU NS/S bit
Secure-side kernel	Secure Kernel (VTL1 ring 0)	OP-TEE OS (Secure World EL-1)
Secure-side user mode	IUM (VTL1 ring 3)	Trusted Applications (Secure World EL-0)
Agent / supplicant	The trustlet's VTL0 agent (e.g., `lsass.exe`)	`tee-supplicant` and TEE Client API on the Linux side
Trust gate	Microsoft EKUs + Signature Level 12	OP-TEE TA signing key configured at build time

Apple Secure Enclave Processor (SEP)

Apple's answer is a dedicated on-die security subsystem. SEP is a separate processor core, isolated from the Application Processor on the same SoC, with its own boot ROM, its own AES engine, and its own random number generator. It has been in every iPhone since iPhone 5s (2013), every Apple Silicon Mac, and every Apple Watch from Series 1 [@apple-sep]. Apple's verbatim description:

The Secure Enclave Processor runs an Apple-customized version of the L4 microkernel. It's designed to operate efficiently at a lower clock speed that helps to protect it against clock and power attacks [@apple-sep].

SEP is the strongest counter to microarchitectural side channels among the production options, because the cores genuinely do not share microarchitectural state with the Application Processor. The price is that everything is firmware-class: patching a SEP bug means rolling SEP firmware on every Apple device, not pushing an OS update. The cycle is slower and more centralised.

seL4 plus user-mode security servers

The academic conscience of the lineage. About 8,700 lines of formally verified C, with machine-checked proofs of functional correctness, confidentiality, and integrity [@sel4-sosp-paper] [@sel4-about]. Sub-microsecond IPC. The price is that seL4 is a separation microkernel, not a desktop OS; building a Credential-Guard-equivalent on seL4 means designing the application architecture from the microkernel up, not retrofitting it onto a Windows-compatible stack. seL4 has shipping deployments in defence (the DARPA HACMS programme), automotive ECUs, and Qualcomm's Hexagon DSP secure OS.

When to pick which

A decision table of the kind a colleague would actually use.

You want	Pick
Protect a user-mode Windows process from a SYSTEM kernel write primitive	Trustlet (inbox) or VBS Enclave (third-party) [@msdocs-vbs-enclaves]
Protect an entire VM from your cloud provider's host	AMD SEV-SNP or Intel TDX [@amd-sev-overview] [@intel-tdx-overview]
Protect a user-mode Linux-on-ARM service from a compromised Linux kernel	TrustZone + OP-TEE Trusted Application [@optee-about]
Hold an iPhone owner's Touch ID / Face ID template safely from iOS	Apple SEP [@apple-sep]
Build a high-assurance system with a machine-checked proof of kernel correctness	seL4 [@sel4-sosp-paper]
Run Intel SGX enclaves on Xeon for confidential cloud	SGX (modulo Foreshadow-class side channels) [@foreshadow-attack-eu]

Trustlets are the right answer for Windows. They are not the right answer for every platform, every threat model, or every workload. They are also not without limits on Windows itself. What are those?

8. The Floor of the Threat Model

By 2020 the trustlet model had been shipping for five years. Two researchers at the Microsoft Security Response Center, Saar Amar and Daniel King, pointed a fuzzer at the secure-call interface for two weeks and reported back with five VTL0-to-VTL1 bugs [@amar-bh2020]. Their Black Hat USA 2020 talk, "Breaking VSM by Attacking Secure Kernel," is the most important public document on what the trustlet model actually guarantees and what it does not [@amar-publications].

The talk is honest in a way Microsoft is rarely honest about its own products. The slides enumerate the bugs by CVE number, name the specific Secure Kernel routines they exploited, and -- unusually -- list the hardening changes Microsoft shipped because of what was found. Reading the deck is the closest thing to a Q-and-A with the Secure Kernel team.

Bug class 1: the secure-call interface is the floor

The Secure Kernel exposes about three dozen "secure services" callable from VTL0 via the IumInvokeSecureService dispatcher. Each takes a parameter block from VTL0, parses it inside VTL1, and returns. That dispatcher is, by definition, the largest VTL0-controllable input surface in the model. Amar and King retargeted the Hyperseed hypercall fuzzer, originally written by Daniel King and Shawn Denbow for hypercall fuzzing, at securekernel!IumInvokeSecureService [@amar-bh2020]. Two weeks of fuzzing produced five bugs.
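To make "a fuzzer pointed at a parser of caller-controlled input" concrete, here is a minimal sketch in Python. It is not Hyperseed and does not model the real secure-call ABI: the parameter-block layout, the `toy_dispatch` function, the service count of 36, and the planted bug are all hypothetical, chosen only to show the shape of the technique.

```python
import random

NUM_SERVICES = 36  # stand-in for "about three dozen"; the exact count is an assumption


def toy_dispatch(block: bytes) -> int:
    """Toy stand-in for a secure-call dispatcher (hypothetical block layout).

    byte 0: service id, byte 1: entry count, bytes 2..: one byte per entry.
    """
    if len(block) < 2:
        raise ValueError("short block")
    service_id, count = block[0], block[1]
    if service_id >= NUM_SERVICES:
        raise ValueError("unknown service")
    total = 0
    # Planted bug: `count` is caller-controlled and never checked against
    # the real payload length, so this loop can read past the end.
    for i in range(count):
        total += block[2 + i]
    return total


def fuzz(trials: int, seed: int = 0) -> list[bytes]:
    """Throw random parameter blocks at the dispatcher; collect crashing inputs."""
    rng = random.Random(seed)
    crashes = []
    for _ in range(trials):
        length = rng.randint(0, 8)
        block = bytes(rng.randrange(256) for _ in range(length))
        try:
            toy_dispatch(block)
        except ValueError:
            pass  # clean rejection: the parser did its job
        except IndexError:
            crashes.append(block)  # out-of-bounds read: a finding
    return crashes
```

Calling `fuzz(2000)` returns the blocks that triggered the out-of-bounds read. A real harness differs in every detail that matters (coverage feedback, state setup, crash triage inside VTL1), but the loop of generate, dispatch, and classify is the same.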

Two of them shipped with public CVE numbers in 2020. CVE-2020-0917 is an out-of-bounds read in the secure-call surface; CVE-2020-0918 is a design flaw in SkmmUnmapMdl where a VTL0 caller could pass a fully attacker-controlled Memory Descriptor List to SkmiReleaseUnknownPTEs [@nvd-cve-2020-0917] [@nvd-cve-2020-0918] [@amar-bh2020]. The NVD entries describe both with the same boilerplate ("Windows Hyper-V Elevation of Privilege Vulnerability") and classify the CWE as "Insufficient Information"; the technical detail lives in the Amar/King deck.

Microsoft hardened in response. The Amar/King deck enumerates what changed:

The Secure Kernel pool moved to segment heap in mid-2019, breaking the heap layout the public exploit depended on.
Four W+X regions in VTL1 were reduced to +X only, eliminating attacker-controlled code-injection targets.
SkpgContext, a HyperGuard-style control-flow integrity check for the Secure Kernel, was introduced [@amar-bh2020].

Malwarelet: Alex Ionescu's term for an attacker-controlled trustlet, enabled by a substrate compromise rather than a trustlet bug. If Test Signing is on, or if a production Microsoft signing key leaks, or if Secure Boot can be bypassed, an attacker can sign and load their own "trustlet" that passes the five gates of Section 5 and operates with VTL1 privilege. The trustlet model itself remains intact; the trust roots underneath it are what fail [@ionescu-bh2015].

Bug class 2: denial of service is not a security boundary

Amar's deck states the rule that excludes liveness from the VBS threat model verbatim:

VTL0 can DOS VTL1 by design. -- Saar Amar and Daniel King, Black Hat USA 2020 [@amar-bh2020]

The hypervisor schedules VTL1; VTL0 is the agent for almost every communication channel into VTL1; VTL0 can stop talking to VTL1 at any time. None of this is, in Microsoft's stated model, a security violation. A VTL0 kernel attacker who can prevent Credential Guard from issuing tickets has not stolen any credential; they have, in the language of the threat model, achieved denial of service, which is out of scope. This matters in practice: a defender cannot reason about a trustlet "always being available." They can only reason about its memory not being readable from VTL0 when it is available.

Bug class 3: the agent RPC surface lives in VTL0

The trustlet's pages are safe even from VTL0 ring 0. The agent process that services the trustlet's ALPC port is not safe. The agent is lsass.exe for Credential Guard, vmwp.exe for the vTPM, presumably the Hello biometric pipeline for ESS. Every byte of every protocol whose state machine the agent implements is reachable from VTL0. The hash never leaves VTL1; the authentication outcomes the hash produces can be relayed.

In December 2022 Oliver Lyak published "Pass-the-Challenge: Defeating Windows Defender Credential Guard" [@lyak-pass-the-challenge]. The technique recovers usable NTLM challenge responses from encrypted credential blobs that LsaIso.exe returns to lsass.exe in VTL0:

In this blog post, we present new techniques for recovering the NTLM hash from an encrypted credential protected by Windows Defender Credential Guard. While previous techniques for bypassing Credential Guard focus on attackers targeting new victims who log into a compromised server, these new techniques can also be applied to victims logged on before the server was compromised [@lyak-pass-the-challenge].

A network authentication protocol that uses NTLM works in challenge-response form: the server sends a challenge, the client encrypts it with its NTLM hash, the server (or a domain controller) verifies the response. With Credential Guard, the client's NTLM hash lives in `LsaIso.exe`; only `LsaIso.exe` can perform the encryption. A VTL0 attacker who can talk to `lsass.exe` can ask `lsass.exe` to ask `LsaIso.exe` to compute an NTLM response for an attacker-supplied challenge. The attacker never sees the hash; they see an authentication response computed with it. Many real-world relay attacks need only the response, not the hash. Lyak's writeup is the worked example; the architectural fact is that the agent RPC channel is a VTL0 surface even though the hash itself is not.
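The oracle pattern described above can be sketched in a few lines of Python. This is a deliberately simplified model, not NTLM: the response is a plain HMAC-SHA-256 over the challenge, standing in for the real NTLM response computation, and `ToyIsolatedSecret`, `ToyAgent`, and `server_verifies` are hypothetical names. Python attribute privacy is also no substitute for VTL1 isolation; the point is only which values cross the boundary.

```python
import hashlib
import hmac


class ToyIsolatedSecret:
    """Stands in for LsaIso.exe: holds the secret, exposes only an operation on it."""

    def __init__(self, secret: bytes) -> None:
        self._secret = secret  # never returned to callers

    def respond(self, challenge: bytes) -> bytes:
        # Simplified stand-in for the real NTLM response computation.
        return hmac.new(self._secret, challenge, hashlib.sha256).digest()


class ToyAgent:
    """Stands in for lsass.exe in VTL0: relays requests to the isolated side."""

    def __init__(self, isolated: ToyIsolatedSecret) -> None:
        self._isolated = isolated

    def compute_response(self, challenge: bytes) -> bytes:
        # The agent will do this for any caller that can reach it,
        # including an attacker relaying someone else's challenge.
        return self._isolated.respond(challenge)


def server_verifies(known_secret: bytes, challenge: bytes, response: bytes) -> bool:
    """What the server (or domain controller) does with the response."""
    expected = hmac.new(known_secret, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)
```

An attacker who holds a `ToyAgent` and a server-issued challenge obtains a response that `server_verifies` accepts, without ever reading the secret. That is the whole residual: the isolation protects the key material, not the use of it.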

Microsoft documents one corner of the limit verbatim: Credential Guard "doesn't prevent an attacker with malware on the PC from using the privileges associated with any credential" [@msdocs-credential-guard]. The "use" is the agent-side operation; the trustlet is doing the cryptography, and the cryptography is being used by the attacker.

Bug class 4: trustlet-to-trustlet via shared Instance GUIDs

Trustlets that share an Instance GUID can read and write storage blobs the Secure Kernel scopes per-Instance. The pair vmsp.exe and the vTPM provisioning trustlet uses exactly this primitive: provisioning writes, vmsp.exe reads, the Secure Kernel hard-codes which Trustlet IDs may invoke SecureStorageSet versus SecureStorageGet on each Instance GUID. The defence is in the SkCapabilities table; bugs in that table are exploit-class.

In Ionescu's vocabulary, a "malwarelet" is the worst case here: an attacker-controlled trustlet -- enabled by a Secure Boot or Test Signing compromise -- could request access to the Instance GUIDs of other trustlets, and any missing rule in SkCapabilities would let it read what those trustlets stored. There are no public exploits in this class as of mid-2026. There also is not a published audit of the table.

Bug class 5: substrate compromise (Secure Boot, firmware, signing keys)

If Test Signing is on; if a production signing key leaks; if Secure Boot can be bypassed to boot a kernel that accepts attacker-controlled trustlet roots; if the UEFI firmware itself permits a DMA attack against early-boot memory -- the entire trustlet model is moot. Ionescu's BH2015 deck states the diagnosis: "VBS' key weakness is its reliance on Secure Boot" [@ionescu-bh2015]. Rafal Wojtczuk's Black Hat USA 2016 attack-surface analysis empirically validated the warning, demonstrating one non-critical VBS-feature bypass and one critical firmware exploit [@wojtczuk-bh2016]. The firmware below VBS is the substrate trustlets sit on; the trustlet model is no stronger than that substrate.

flowchart TD
    Attacker["VTL0 kernel attacker"]
    SK["Secure Kernel"]
    Trustlet["Trustlet (VTL1 user)"]
    Agent["VTL0 agent process (lsass.exe, vmwp.exe...)"]
    Substrate["Substrate: UEFI firmware, Secure Boot, signing roots"]
    Attacker -->|"1. Secure-call interface bugs<br/>CVE-2020-0917, CVE-2020-0918"| SK
    Attacker -->|"2. DoS by design (out of scope)"| SK
    Attacker -->|"3. Agent RPC surface<br/>Pass-the-Challenge"| Agent
    Agent -->|"authentication outcome"| Trustlet
    Attacker -->|"4. Trustlet-to-trustlet<br/>via shared Instance GUID"| Trustlet
    Substrate -->|"5. Substrate compromise<br/>malwarelets, BootHole-class"| SK
    Substrate --> Trustlet

The Hyperseed fuzzer had a prior life. Daniel King and Shawn Denbow first presented it at OffensiveCon 2019 as a hypercall fuzzer [@amar-bh2020]. The retargeting at the secure-call interface is the same tool, pointed at a different parser. The two-weeks-five-bugs result is therefore not "Microsoft wrote bad code" but "a well-built fuzzer aimed at a complex parser will find bugs in ~2 weeks." That is the empirical bar for an unverified TCB.

Key idea: The trustlet model is hypervisor-strong against the VTL0 kernel; it is not stronger than the substrate it sits on. Five attack classes -- secure-call interface bugs, designed-out denial-of-service, the agent RPC residual, trustlet-to-trustlet via shared Instance GUIDs, and substrate compromise -- bound what the model can guarantee. None of them invalidates trustlets; all of them are reasons to deploy trustlets alongside other controls rather than as a sole defence.

The trustlet model has a finite, audited attack surface. The surface is not zero. Liveness is not promised. The firmware and Secure Boot underneath everything still matter. What is new on this surface in 2024 to 2026?

9. Open Problems

Three things you might expect Microsoft to have published by 2026 -- the current inbox trustlet roster, an architecture diagram of Administrator Protection on par with Credential Guard's, and a public CVE wave around VBS Enclaves -- are still partial or missing. Here is the frontier.

1. Trustlet enumeration drift. Ionescu's August 2015 enumeration of Trustlet IDs 0 through 3 remains the only authoritative published list. Eleven years later, the ESS biometric matcher has not been named with a Trustlet ID and the Administrator Protection issuer has not been committed to as a trustlet at all. A researcher with a debugger and the Quarkslab IUM-debugging recipe can recover the current roster empirically [@quarkslab-debug-ium]; Microsoft has not republished it.

2. VBS Enclave trust-boundary hardening. Microsoft's Security Response Center published a blog post in June 2025 -- "Everything Old Is New Again" -- explicitly committing to host-to-enclave pointer validation, copy-before-check discipline, and TOCTOU avoidance as the active hardening surface for VBS Enclaves [@ms-everything-old]. The post is unambiguous that a CVE wave is foreseeable as researchers turn their attention to the host-enclave seam. As of the publication of this article no public CVE has been issued against a VBS Enclave-using product, but Microsoft's narrowing of supported Windows builds in 2025 (from "Windows 11 24H2 or later" to "Windows 11 Build 26100.2314 or later") is the kind of build-floor adjustment that historically precedes a documented hardening change [@msdocs-vbs-enclaves].

3. Side channels against VTL1. Transient-execution attacks against VTL1 memory have not been publicly demonstrated end to end. The Foreshadow class of attacks against SGX is the existence proof that a co-resident TEE can leak through microarchitectural side channels, and the threat model explicitly includes them [@foreshadow-attack-eu]. There is no VBS-specific transient-execution mitigation; platform-wide mitigations (Kernel Virtual Address Shadow, Retpoline, Indirect Branch Restricted Speculation) are the only defence. A demonstration of "Foreshadow-against-LsaIso" would not be surprising; its absence to date is, given the research community's interest, mildly so.

4. Debugging asymmetry. Researchers have a working trustlet-debugging recipe; defenders have an explicit "no" from Microsoft. The Quarkslab writeup walks through nested virtualisation to attach to a trustlet under controlled conditions [@quarkslab-debug-ium]; Microsoft's product-facing page states verbatim that "it is not possible to attach to an IUM process" and that "other APIs, such as CreateRemoteThread, VirtualAllocEx, and Read/WriteProcessMemory will also not work as expected when used against Trustlets" [@msdocs-ium]. The asymmetry favours offence: an attacker with the time, hardware, and tooling Quarkslab demonstrates can study trustlet internals in ways a defender on a production box cannot. Live-system trustlet introspection for incident response is the missing capability.

5. Administrator Protection transparency. As of 10 May 2026, the Administrator Protection feature has been shipped in preview (KB5067036, 28 October 2025), then reverted in the same update note pending a future re-rollout [@kb5067036] [@msdocs-admin-protection]. There is no architecture diagram on the level of Credential Guard's "how it works" page. There is no published Trustlet ID. There is no public commitment to whether the token issuer is a trustlet, a VBS Enclave, or something else inside the new security boundary. For a feature that materially changes the local-elevation model of Windows, that is unusual reticence.

6. Cross-architecture portability. A workload that wants to run as a trustlet on Windows, a Confidential VM on Linux, a Trusted Application on ARM, and a Secure Enclave Application on Apple silicon must, today, be written four times. GlobalPlatform's TEE Client API standardises one side of TrustZone, the Open Enclave SDK abstracts a subset of SGX and TrustZone, and VBS Enclaves do their own thing. No universal portable TEE API exists. For workloads where portability matters more than peak isolation, this is the open problem with the most direct commercial pressure behind it.

Why has Microsoft not republished the trustlet roster? Two answers, both incomplete. The defensive answer: an enumerated trustlet list is an attacker's targeting list, and Microsoft prefers not to publish targeting lists for components whose exact attack surface is still under active study. The historical answer: the 2015 list was a side-effect of Ionescu reverse-engineering Windows 10 RTM. There has been no comparable public reverse-engineering push for any post-2015 Windows release at the same level of completeness, and Microsoft has not chosen to fill the gap with first-party documentation. Empirical enumeration via `NtQuerySystemInformation(SystemIsolatedUserModeInformation)` works on a live system, but doing it on every Windows 11 servicing build is a research programme, not a citation.

These are questions a researcher with a year of grant time could move the field on. The next section is the question a practitioner has today.

10. Practitioner Guide

What changes in a real workflow once you know what a trustlet is? Four short answers.

Windows administrator

Verify Credential Guard is actually running before you assume it is. Two ways.

Note: GUI: Run `msinfo32` and check "Virtualization-based security Services Running." You should see at least "Credential Guard" and ideally "Hypervisor enforced Code Integrity."

PowerShell: `Get-CimInstance -ClassName Win32_DeviceGuard -Namespace root\Microsoft\Windows\DeviceGuard`. The properties `SecurityServicesRunning` and `VirtualizationBasedSecurityStatus` are the load-bearing ones; values of 1 and 2 respectively indicate Credential Guard is running with VBS in full enforcement [@msdocs-credential-guard].
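If you collect those two properties across a fleet, the interpretation step is small enough to sketch. The Python below assumes you have already exported `SecurityServicesRunning` (a list of integers) and `VirtualizationBasedSecurityStatus` (an integer) from the query above. The function names are hypothetical, and the value meanings beyond the two the text states (status 0 and 1, service 2 as Hypervisor enforced Code Integrity) are my reading of the `Win32_DeviceGuard` documentation and should be verified against it.

```python
# Value meanings are assumptions to verify against the Win32_DeviceGuard docs.
VBS_STATUS = {
    0: "VBS not enabled",
    1: "VBS enabled but not running",
    2: "VBS enabled and running",
}
SERVICES = {
    1: "Credential Guard",
    2: "Hypervisor enforced Code Integrity",
}


def credential_guard_running(services_running: list[int], vbs_status: int) -> bool:
    """True only when VBS is running AND Credential Guard is among the services."""
    return vbs_status == 2 and 1 in services_running


def describe(services_running: list[int], vbs_status: int) -> str:
    """Human-readable one-line summary for a report."""
    status = VBS_STATUS.get(vbs_status, f"unknown status {vbs_status}")
    names = [SERVICES.get(s, f"service {s}") for s in services_running if s != 0]
    return f"{status}; running: {', '.join(names) if names else 'none'}"
```

For example, `describe([1, 2], 2)` yields `"VBS enabled and running; running: Credential Guard, Hypervisor enforced Code Integrity"`, while a host reporting `[2]` has code integrity but not Credential Guard and should fail the check.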

Enumerating live trustlets on a 2026 box requires more care than enumerating ordinary processes. Process Explorer's Image tab carries an IUM marker for trustlet processes. Sysinternals Sigcheck on a candidate binary surfaces the Signing Level. The Microsoft Learn IUM page is explicit that "other APIs, such as CreateRemoteThread, VirtualAllocEx, and Read/WriteProcessMemory will also not work as expected when used against Trustlets" [@msdocs-ium] -- the same APIs many EDR products rely on for behavioural monitoring will silently fail or report sentinel values when targeted at a trustlet. Plan detections accordingly.

Security researcher

The Quarkslab blog post "Debugging Windows Isolated User Mode (IUM) Processes" is the canonical recipe for attaching to a trustlet under nested virtualisation [@quarkslab-debug-ium]. The empirical enumeration path is `NtQuerySystemInformation` with class `SystemIsolatedUserModeInformation`; the structure returned includes a count of running trustlets and their identifying metadata.

The driver-side pattern Microsoft documents for "is this process a trustlet?" is `IsSecureProcess`, an internal Win32K predicate the IUM page names as the canonical check. Tools that need to behave differently against trustlets (memory scanners, integrity checkers, EDR sensors) should call the supported equivalent rather than parsing process attributes by hand [@msdocs-ium].

Application developer (VBS Enclaves)

If you are writing third-party code that needs trustlet-class isolation, the primitive you target is a VBS Enclave, not a trustlet. The toolchain is specific:

Visual Studio 2022 version 17.9 or later.
Windows SDK version 10.0.22621.3233 or later (provides veiid.exe, the VBS Enclave import ID binding utility, and signtool.exe).
A Trusted Signing account for production signing [@msdocs-vbs-enclaves].

The architectural rule is never trust the host. The host process's address space is reachable by the enclave; the enclave's address space is not reachable by the host. Range-validate every pointer the host hands the enclave; copy before you check (so the host cannot mutate the data between your check and your use); avoid TOCTOU windows. Microsoft's "Everything Old Is New Again" post is explicit that this is the hardening surface researchers are looking at right now [@ms-everything-old].
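The copy-before-check discipline is easier to see side by side with the mistake it prevents. The following is a language-neutral sketch in Python, not enclave code: `host_mem` is a `bytearray` standing in for host-writable memory, the `race` callback simulates the host mutating that memory between check and use, and `MAX_MESSAGE`, `looks_safe`, and both handlers are hypothetical. A real enclave would additionally confirm that the host range lies entirely outside the enclave's own address range, which this model does not represent.

```python
MAX_MESSAGE = 64  # hypothetical upper bound on a host-supplied message


def validate_range(host_mem: bytearray, offset: int, length: int) -> None:
    """Reject ranges that are oversized or fall outside host memory."""
    if not 0 <= length <= MAX_MESSAGE:
        raise ValueError("bad length")
    if offset < 0 or offset + length > len(host_mem):
        raise ValueError("range outside host memory")


def looks_safe(data: bytes) -> bool:
    """Toy content policy: ASCII alphanumerics only."""
    return data.isalnum()


def handle_unsafe(host_mem, offset, length, race=lambda: None) -> bytes:
    """Check in place, then re-read host memory: a TOCTOU window."""
    validate_range(host_mem, offset, length)
    if not looks_safe(bytes(host_mem[offset:offset + length])):
        raise ValueError("rejected")
    race()  # the host may change the bytes here
    return bytes(host_mem[offset:offset + length])  # uses unchecked data


def handle_safe(host_mem, offset, length, race=lambda: None) -> bytes:
    """Copy into private memory first; check and use only the copy."""
    validate_range(host_mem, offset, length)
    private = bytes(host_mem[offset:offset + length])  # immutable snapshot
    if not looks_safe(private):
        raise ValueError("rejected")
    race()  # host mutations no longer matter
    return private
```

With `host_mem = bytearray(b"hello123")` and a `race` that overwrites the first five bytes with `b"!!!!!"`, `handle_unsafe` returns data that never passed `looks_safe`, while `handle_safe` returns the bytes it actually validated.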

The development guide includes a sample with a comment that captures the discipline:

Every DLL loaded in an enclave requires a configuration. This configuration is defined using a global const variable named __enclave_config of type IMAGE_ENCLAVE_CONFIG... // DO NOT SHIP DEBUGGABLE ENCLAVES TO PRODUCTION [@msdocs-vbs-enclaves-dev-guide].

The IMAGE_ENCLAVE_POLICY_DEBUGGABLE flag is for development only. The VbsEnclaveTooling repository on GitHub provides a NuGet package and a code generator that make the cross-VTL marshalling less error-prone, plus reference documentation including Edl.md, HelloWorldWalkthrough.md, and CodeGeneration.md [@vbs-enclave-tooling].

1. Confirm OS support: Windows 11 Build 26100.2314+ or Windows Server 2025+ [@msdocs-vbs-enclaves].
2. Install Visual Studio 2022 17.9+ and Windows SDK 10.0.22621.3233+.
3. Acquire a Trusted Signing account; configure `signtool.exe` for it.
4. Define `__enclave_config` as `IMAGE_ENCLAVE_CONFIG`; set family/image/SVN fields.
5. Use `veiid.exe` to bind import IDs.
6. Sign the enclave DLL with `signtool.exe` and the Trusted Signing certificate.
7. Test with `IMAGE_ENCLAVE_POLICY_DEBUGGABLE` set; remove it before production.
8. Range-validate every host-supplied pointer; copy before check.
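The version floors in steps 1 and 2 are easy to get wrong with string comparison ("17.10" sorts before "17.9" as text). A minimal preflight sketch in Python, assuming you supply the installed versions yourself: the `MINIMUMS` keys and the `unmet_requirements` helper are hypothetical names, and the Windows Server 2025 alternative from step 1 is not modelled.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn '10.0.22621.3233' into (10, 0, 22621, 3233) for numeric comparison."""
    return tuple(int(part) for part in text.split("."))


# Floors taken from the checklist above.
MINIMUMS = {
    "windows_build": parse_version("26100.2314"),
    "windows_sdk": parse_version("10.0.22621.3233"),
    "visual_studio": parse_version("17.9"),
}


def unmet_requirements(installed: dict[str, str]) -> list[str]:
    """Return the names of components that are missing or below their floor."""
    unmet = []
    for name, minimum in MINIMUMS.items():
        have = installed.get(name)
        if have is None or parse_version(have) < minimum:
            unmet.append(name)
    return unmet
```

An empty list means the toolchain floors are met; it says nothing about signing (steps 3 and 6), which needs a Trusted Signing account rather than a version check.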

Defender

Know what Credential Guard does not protect, because that is where most exposure remains.

Note: The trustlet protects memory-resident NTLM hashes and Kerberos TGTs from a VTL0 kernel attacker. It does not protect:
- Supplied credentials at the logon prompt (keyloggers, screen-scrapers, hardware shimming).
- The agent RPC channel (Pass-the-Challenge-class relay against lsass.exe is reachable from VTL0) [@lyak-pass-the-challenge].
- Protocols that require a usable secret in plaintext: CredSSP, NTLMv1, MS-CHAPv2, Digest. These are unsupported with the trustlet-protected token by design [@msdocs-credential-guard].
- Liveness: a VTL0 kernel attacker can stop talking to VTL1 and prevent the trustlet from being available. Denial of service is out of the VBS threat model [@amar-bh2020].

The summary: trustlets shrink the credential-theft attack surface, they do not eliminate it.

The trustlet model is finite, audited, and useful. Use the lock; do not assume the lock is the only thing on the door.

11. Frequently asked questions

No. Protected Process Light (PPL) and trustlets sit in the same lineage but differ at the architectural level. A PPL is enforced by the NT kernel, which is also the attacker's likely foothold; itm4n's 2021 PPLdump showed the result over eight years of LSASS-as-PPL deployment [@itm4n-runasppl]. A trustlet is enforced by the Hyper-V hypervisor and the Secure Kernel, both running in a different Virtual Trust Level from the NT kernel; a VTL0 kernel write primitive does not touch the trustlet's pages [@quarkslab-debug-ium]. The signing-level lattice is similar (both rely on Signature Level 12); the enforcement architecture is not.

Not directly. Inbox trustlets require the Microsoft IUM EKU (`1.3.6.1.4.1.311.10.3.37`), which Microsoft does not grant to third parties [@ionescu-bh2015]. Since Windows 11 24H2, the third-party-shippable equivalent is a VBS Enclave: a DLL signed with a Trusted Signing certificate, loaded into an enclave region of a host process via `CreateEnclave` and `CallEnclave`. The architectural threat model is identical (the host is the attacker, the enclave is the defender); the API surface and the enclave-versus-process model differ. VBS Enclaves require Windows 11 Build 26100.2314 or later, Windows SDK 10.0.22621.3233 or later, Visual Studio 2022 17.9 or later, and a Trusted Signing account [@msdocs-vbs-enclaves].

No. It means that the *memory-resident* NTLM hash or Kerberos TGT cannot be read out of `LsaIso.exe` by a VTL0 kernel attacker. It does not mean credentials are unstealable. Section 10 enumerates the four classes of residual exposure -- typed-in credentials, the agent-side RPC relay (Pass-the-Challenge) [@lyak-pass-the-challenge], plaintext-secret protocols (CredSSP / NTLMv1 / MS-CHAPv2 / Digest are unsupported with the trustlet-protected token), and liveness (denial of service against VTL1 is out of the VBS threat model) -- with citations [@msdocs-credential-guard] [@amar-bh2020].

For that trustlet, yes; for the model, by design. The Secure Kernel plus trustlets are the VBS TCB. Amar and King's 2020 work demonstrated practical VTL0-to-VTL1 vulnerabilities (CVE-2020-0917, CVE-2020-0918) [@amar-bh2020] [@nvd-cve-2020-0917] [@nvd-cve-2020-0918]; Microsoft hardened in response, moving the Secure Kernel pool to segment heap, reducing four W+X regions to +X only, and introducing `SkpgContext` HyperGuard for VTL1 [@amar-bh2020]. The surface remains finite and audited; the trustlet model is hypervisor-strong against the VTL0 kernel and not stronger than the substrate it sits on.

Not on ESS-capable systems. The Microsoft Learn page is clear that *"when ESS is enabled, the face algorithm is protected using VBS to isolate it from the rest of Windows... The hypervisor is used to specify and protect memory regions, so that they can only be accessed by processes running in VBS"* [@msdocs-ess]. The biometric *template* is encrypted with VBS-only keys and lives in VBS-isolated memory. The TPM still has a role -- it holds the per-user Hello *private keys* that authenticate against the local credential provider -- but the biometric template itself does not live in the TPM [@msdocs-tpm].

No. The Microsoft Learn page describes the new model: an authorised user triggers a Windows Hello-backed prompt; Windows then *"uses a hidden, system-generated, profile-separated user account to create an isolated admin token. This token is issued to the requesting process and is destroyed once the process ends"* [@msdocs-admin-protection]. The in-session prompt is still there; the elevated token's *origin* is what changed (from a split-token impersonation of the same account to a transient system-generated admin account). The October 2025 preview shipped in KB5067036 and was then reverted in the same update note pending a future rollout [@kb5067036]. As of 10 May 2026 the feature is not generally available.

<StudyGuide slug="vbs-trustlets-what-actually-runs-in-the-secure-kernel" keyTerms={[ { term: "Trustlet", definition: "A user-mode process running in VTL1 user mode, scheduled by the Secure Kernel, isolated from VTL0 by per-VTL SLAT permissions. Defined by passing five load-time gates." }, { term: "Virtual Trust Level (VTL)", definition: "A hypervisor-managed privilege axis added on top of x86 rings. Currently two VTLs are implemented out of an architecturally supported sixteen." }, { term: "Isolated User Mode (IUM)", definition: "Ring 3 of VTL1. The user-mode environment trustlets run in. Restricted to about 48 of NT's ~480 syscalls." }, { term: "Secure Kernel", definition: "The kernel that runs in VTL1 ring 0. Schedules trustlets, parses .tpolicy sections, enforces SkCapabilities rules on secure-call invocations." }, { term: "IUM EKU", definition: "The Enhanced Key Usage OID 1.3.6.1.4.1.311.10.3.37. Required alongside the Windows System Component Verification EKU for a binary to be loaded as a trustlet at Signature Level 12." }, { term: "Trustlet Instance GUID", definition: "A runtime identifier the Secure Kernel uses to scope per-instance secrets. Set via IumSetTrustletInstance; shared between cooperating trustlets (e.g., vmsp.exe and the vTPM provisioning trustlet) so they can read each other's storage blobs under SkCapabilities control." }, { term: "Malwarelet", definition: "Ionescu's term for an attacker-controlled trustlet, enabled by a Test Signing or Secure Boot compromise rather than by a trustlet-internal bug." }, { term: "ALPC", definition: "Advanced Local Procedure Call: the Windows IPC primitive used by VTL0 agent processes to communicate with their VTL1 trustlet counterparts." } ]} questions={[ { q: "Name the five gates a Windows binary must pass at load time to become a trustlet.", a: "(1) PsAttributeSecureProcess process attribute with a 64-bit Trustlet ID. (2) Two EKUs at Signature Level 12: Windows System Component Verification (1.3.6.1.4.1.311.10.3.6) and IUM (1.3.6.1.4.1.311.10.3.37). (3) A .tpolicy PE section exporting s_IumPolicyMetadata with matching Trustlet ID. (4) A Trustlet Instance GUID bound via IumSetTrustletInstance. (5) The stripped-down LdrpIsSecureProcess loader path." }, { q: "Why does a SYSTEM-privilege NT-kernel write primitive on Windows 11 25H2 fail to read LsaIso.exe memory?", a: "Because the NT kernel runs in VTL0, LsaIso.exe runs in VTL1, and the Hyper-V hypervisor configures per-VTL SLAT entries that refuse VTL0 read access to VTL1-only pages. The attacker's kernel write primitive can edit NT kernel structures but cannot change the hypervisor-managed SLAT entries." }, { q: "What does Pass-the-Challenge demonstrate about the limits of Credential Guard?", a: "That while the NTLM hash itself never leaves VTL1, the agent process (lsass.exe in VTL0) can be asked to ask the trustlet to compute an authentication response for an attacker-supplied challenge. The resulting response is reachable by the VTL0 attacker and is sufficient for many relay attacks. The hash is protected; the authentication outcomes it produces are not." }, { q: "What is the practical floor of the trustlet attack surface that Amar and King exposed at Black Hat USA 2020?", a: "The secure-call interface (IumInvokeSecureService) parses VTL0-controlled inputs in VTL1. Hyperseed retargeted at it found five VTL0->VTL1 bugs in two weeks, including CVE-2020-0917 (OOB read in the secure-call surface) and CVE-2020-0918 (SkmmUnmapMdl design flaw). Microsoft responded with segment-heap migration, W+X reduction, and SkpgContext (Secure Kernel HyperGuard)." }, { q: "What is the third-party equivalent of an inbox trustlet on Windows 11 24H2 and later?", a: "A VBS Enclave: a DLL signed with a Trusted Signing certificate and loaded into an enclave region of a host process via CreateEnclave / CallEnclave. Requires Windows 11 Build 26100.2314 or later, Windows SDK 10.0.22621.3233 or later, and Visual Studio 2022 17.9 or later." } ]} />