# Hyper-V Enlightenments, VMBus, and the Synthetic Device Model

> How Hyper-V guests get high-performance device I/O without emulating legacy hardware: enlightenments, the TLFS, VMBus rings, the VSP/VSC pair, and why the host-side parser is the attack surface.

*Published: 2026-05-14*
*Canonical: https://paragmali.com/blog/hyper-v-enlightenments-vmbus-and-the-synthetic-device-model*
*License: CC BY 4.0 - https://creativecommons.org/licenses/by/4.0/*

---
<TLDR>
Hyper-V's guest OSes do not see emulated 1990s hardware. They see a published, versioned hypervisor ABI called the **Top-Level Functional Specification**, a transport called **VMBus** that consists of two ring buffers per channel, and a catalogue of synthetic devices whose backends live in the privileged root partition. This design is what makes Windows and Linux equally fast inside Hyper-V, and it is also why the host-side parsers in `vmswitch.sys` keep producing critical CVEs. The 2024 OpenHCL paravisor moves those parsers into the guest's own trust boundary in memory-safe Rust, which is the most consequential change to the Hyper-V device model since 2008.
</TLDR>

## 1. The Type-1 hypervisor foundation

Open `Task Manager` on a modern Windows 11 desktop, switch to the `Performance` tab, and look at the line that says "Virtualization: Enabled." That single line hides one of the most consequential design choices in modern operating systems: when Microsoft shipped [Hyper-V with Windows Server 2008 in June 2008](https://learn.microsoft.com/en-us/windows-server/virtualization/hyper-v/hyper-v-on-windows-server), they did not bolt a virtualization product on top of Windows. They put a small hypervisor *underneath* it.

That ordering matters more than it sounds. In the older Microsoft Virtual Server 2005 model, Windows ran on the bare metal and a user-mode service emulated PC hardware for guests inside it. In the [Hyper-V architecture documented by Microsoft in 2008](https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/about/), the hypervisor boots first and Windows itself becomes a guest of the hypervisor. Microsoft calls this guest the **root partition**. Every other VM on the box is a **child partition**.

<Definition term="Type-1 hypervisor (bare-metal hypervisor)">
A hypervisor that runs directly on the physical hardware rather than inside a host operating system. Hyper-V, VMware ESXi, and Xen are Type-1; VirtualBox and the original Microsoft Virtual Server are Type-2 (hosted). In a Type-1 design no general-purpose OS sits between the hypervisor and the silicon, which lets the hypervisor enforce isolation directly using CPU virtualization extensions like Intel VT-x and AMD-V.
</Definition>

The root partition is not just another VM. It is a privileged partition: it owns the physical I/O devices, runs the parent stack of synthetic-device backends, and brokers everything that touches real hardware. Children get virtual processors and a slice of memory, and they communicate with the root over a software bus called VMBus that we will spend most of this article taking apart.

<Mermaid caption="Hyper-V's Type-1 partition model. The hypervisor sits on the hardware; the root partition owns physical devices and hosts the synthetic-device backends; child partitions run guest OSes and talk to the root over VMBus.">
flowchart TD
    HW["Physical hardware (CPU, RAM, NICs, NVMe)"]
    HV["Hyper-V hypervisor (microkernel)"]
    Root["Root partition (Windows Server)"]
    VSP["Virtualization Service Providers (VSPs): vmswitch.sys, storvsp.sys, ..."]
    C1["Child partition: Windows VM"]
    C2["Child partition: Linux VM"]
    VSC1["VSCs: netvsc, storvsc, ..."]
    VSC2["VSCs: hv_netvsc, hv_storvsc, ..."]
    HW --> HV
    HV --> Root
    HV --> C1
    HV --> C2
    Root --> VSP
    VSP -. "VMBus channel" .-> VSC1
    VSP -. "VMBus channel" .-> VSC2
    C1 --> VSC1
    C2 --> VSC2
</Mermaid>

The hypervisor itself is small by design. The [Hyper-V architecture page on Microsoft Learn](https://learn.microsoft.com/en-us/windows-server/administration/performance-tuning/role/hyper-v-server/architecture) describes it as a microkernel: it does the minimum a hypervisor must do (CPU scheduling, memory partitioning, interrupt routing, an inter-partition message bus) and pushes everything else, including the device models, out to the root partition. This is the opposite of the early VMware ESX design, where the hypervisor itself contained large device drivers.

<Sidenote>The microkernel choice was pragmatic, not ideological. A monolithic hypervisor with built-in NIC and storage drivers would have been a catastrophic certification problem: every NIC firmware update would risk a hypervisor patch. By delegating I/O to the Windows root partition, Microsoft re-used the entire Windows driver stack.</Sidenote>

The split also explains why Hyper-V "feels Windows-shaped" even though it is technically not Windows. The root partition is Windows, with all of its drivers, its WMI, its event log, its `Get-VM` PowerShell cmdlets. The hypervisor underneath is a small, separate binary (`hvix64.exe` on Intel, `hvax64.exe` on AMD) that you almost never have a reason to think about. Microsoft itself goes further: in the same architecture document, it stresses that all device-model traffic flows through the root: "the management operating system hosts virtual service providers (VSPs) that communicate over the VMBus to handle device access requests from child partitions" ([Microsoft Learn: Overview of Hyper-V](https://learn.microsoft.com/en-us/windows-hardware/drivers/network/overview-of-hyper-v)).

This sets up the question the rest of the article answers: if the hypervisor is small, the guest is unmodified Windows or Linux, and the root partition owns the real devices, then how does a guest actually do disk and network I/O at gigabit-or-better speeds without paying enormous costs to traverse all of these boundaries?

The short answer is in three pieces: **enlightenments** (the guest knows it is virtualized and uses hypercalls), **VMBus** (the inter-partition transport), and the **VSP/VSC pair** (split drivers that share memory through VMBus rings). The next section starts with the first of those three.

## 2. Enlightenments: what "knowing you are virtualized" buys you

In the early 2000s, the dominant intuition was that a hypervisor's job is to fool the guest. A perfectly faithful emulation of an Intel 440BX motherboard, a DEC 21140 NIC, and an IDE controller is what made VMware Workstation a useful product in 1999. It is also what made Microsoft Virtual Server 2005 too slow to saturate gigabit links: every `out` instruction on a fake NIC port trapped to the hypervisor, was decoded against an in-memory chip model, and produced a synthetic interrupt that itself trapped on the way out. The [Microsoft Virtual Server retrospective on Wikipedia](https://en.wikipedia.org/wiki/Microsoft_Virtual_Server) notes that the architecture had no paravirtualization support and that performance was constrained relative to later hardware-assisted designs.

Hyper-V's answer was to drop the pretence. If the guest *knows* it is in a VM, it can use a fast path designed for VMs instead of pretending to drive imaginary chips. Microsoft calls this knowledge an **enlightenment**, and the [Hyper-V feature discovery page](https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/feature-discovery) is the contract a guest uses to learn what enlightenments the hypervisor offers.

<Definition term="Enlightenment">
A modification or feature in a guest operating system that takes advantage of running under a specific hypervisor. An enlightened guest detects the hypervisor (on x86, by reading the `cpuid` leaves at `0x40000000` and above), then opts in to using paravirtual interfaces (hypercalls, synthetic timers, synthetic interrupt controllers, shared TSC pages) instead of trapping on emulated hardware. An unmodified guest would still boot, but slower.
</Definition>

Detection is the cheap part. The Linux kernel's [Hyper-V overview document](https://www.kernel.org/doc/html/latest/virt/hyperv/overview.html) describes four cooperating mechanisms, layered atop one another: implicit traps that the hypervisor handles transparently, **explicit hypercalls** the guest issues on purpose, **synthetic registers** exposed as additional model-specific registers (MSRs) in an address range reserved for hypervisor use, and **VMBus** for high-bandwidth device traffic. Each layer builds on the one below it.

> **Key idea:** The contract between Hyper-V and its guests is *published*. Microsoft maintains the **Top-Level Functional Specification** as a public document under the Open Specification Promise. That single decision is why Linux ships an in-tree Hyper-V driver stack and why VMBus is not a black box.

### The hypercall page

The first thing an enlightened guest does is set up a hypercall page. The [TLFS Hypercall Interface page](https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/hypercall-interface) describes the dance: the guest writes its identity into `HV_X64_MSR_GUEST_OS_ID` (MSR `0x40000000`), then writes a guest-physical address and an `enable` bit into `HV_X64_MSR_HYPERCALL` (MSR `0x40000001`). The hypervisor responds by populating that page with the right opcode for the current CPU: `vmcall` on Intel, `vmmcall` on AMD. From that moment on, "make a hypercall" is a normal `call` into a known address rather than an opcode the kernel must hand-assemble per CPU vendor.

<Sidenote>This trick neatly externalises the vendor-specific calling convention. Microsoft can later swap to a new opcode (say, on ARM64, where the equivalent is an `HVC` instruction) without any guest code change. The guest just learns the new page contents.</Sidenote>
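
Concretely, the value written to `HV_X64_MSR_HYPERCALL` is a packed 64-bit word: bit 0 is the enable flag and bits 63:12 carry the guest page frame number of the hypercall page. A minimal sketch of the packing, with helper names that are mine rather than from any real guest:

<RunnableCode lang="js" title="Packing HV_X64_MSR_HYPERCALL (sketch)">{`
// Field layout per the TLFS hypercall-interface page: bit 0 = enable,
// bits 11:1 reserved, bits 63:12 = guest page frame number (GPFN).
// BigInt throughout, because JS number bitwise ops truncate to 32 bits.
const ENABLE_BIT = 1n;

function encodeHypercallMsr(guestPhysAddr) {
  if (guestPhysAddr % 4096n !== 0n) throw new Error('hypercall page must be 4 KiB aligned');
  const gpfn = guestPhysAddr / 4096n;     // page frame number
  return (gpfn << 12n) | ENABLE_BIT;      // reserved bits stay zero
}

function decodeHypercallMsr(value) {
  return {
    enabled: (value & ENABLE_BIT) === 1n,
    guestPhysAddr: (value >> 12n) << 12n, // GPFN back to a byte address
  };
}

// A guest that placed its hypercall page at GPA 0x1234000:
const msr = encodeHypercallMsr(0x1234000n);
console.log(msr.toString(16));            // "1234001"
console.log(decodeHypercallMsr(msr));     // { enabled: true, guestPhysAddr: 19087360n }
`}</RunnableCode>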

The same TLFS page documents two hypercall classes: **simple** hypercalls (one operation, returns or faults) and **rep** (repeated) hypercalls that take a counter and a start index, so a long-running operation can yield mid-flight without losing work. Three calling conventions exist: a memory-based one for large parameter blocks, a register-only fast variant for the very common case of one or two inputs, and an XMM-register variant that lets a guest pass up to 112 bytes of input through SSE registers.

That XMM variant is unusual enough to flag. Most kernel ABIs do not touch SSE in privileged code because saving and restoring the full SSE state is expensive. Hyper-V's hypercall ABI uses XMM precisely because the round-trip cost of a hypercall is dominated by the `VMEXIT` itself, so squeezing a few more bytes into registers is cheaper than spilling them to memory and reading them back.
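
The 64-bit hypercall input value is itself a small packed structure. The field positions below are as I read the TLFS hypercall-interface page (call code in bits 15:0, the fast flag in bit 16, rep count in bits 43:32, rep start index in bits 59:48); the code is a sketch, not a real ABI binding:

<RunnableCode lang="js" title="Encoding a rep hypercall input value (sketch)">{`
// Sketch of the 64-bit hypercall input value. Field positions per the
// TLFS hypercall-interface page; names are mine.
function hypercallInput({ callCode, fast = false, repCount = 0, repStart = 0 }) {
  return (BigInt(callCode) & 0xffffn)          // bits 15:0  - which hypercall
    | ((fast ? 1n : 0n) << 16n)                // bit 16     - register-based fast convention
    | ((BigInt(repCount) & 0xfffn) << 32n)     // bits 43:32 - total rep count
    | ((BigInt(repStart) & 0xfffn) << 48n);    // bits 59:48 - resume index after a partial run
}

// A rep hypercall over 64 elements, resuming at element 10 after the
// hypervisor yielded mid-flight. (0x4b is an arbitrary call code for
// illustration, not a real one.)
console.log(hypercallInput({ callCode: 0x4b, repCount: 64, repStart: 10 }).toString(16));
// -> "a00400000004b"
`}</RunnableCode>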

### Synthetic interrupts and synthetic timers

A guest's virtual processor has its own emulated local APIC by default, but an enlightened guest can also use a **Synthetic Interrupt Controller (SynIC)**, defined in the TLFS. Each virtual processor gets 16 SINT slots, a per-CPU shared message page, and a per-CPU shared event page. SINTs are how VMBus signals events to the guest without going through the legacy LAPIC fast path.

<Definition term="Synthetic Interrupt (SINT)">
One of 16 logical interrupt sources per virtual processor that the Hyper-V Synthetic Interrupt Controller can signal. SINTs are reachable through MSRs (`HV_X64_MSR_SINT0` through `HV_X64_MSR_SINT15`) and back the doorbell mechanism for VMBus channels and for synthetic timers. They are paravirtual: they would not exist on a bare-metal CPU.
</Definition>

The clock side is even more interesting. The [Linux kernel Hyper-V clocks documentation](https://www.kernel.org/doc/html/latest/virt/hyperv/clocks.html) describes a **reference TSC page** that the hypervisor maintains in shared memory: it contains a scale factor and an offset such that

$$
\text{guest\_time} = \bigl( (\text{TSC} \times \text{scale}) \gg 64 \bigr) + \text{offset}
$$

ticks at a constant 10 MHz frequency regardless of the underlying TSC. The guest's `clock_gettime` and `gettimeofday` can read TSC, multiply, shift, add, and return, all in user space via vDSO, with no kernel transition and no hypercall.
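
In code, the vDSO's job is one widening multiply, one shift, one add. A sketch with BigInt standing in for the 128-bit intermediate product a real implementation computes in assembly (the scale value here is invented for illustration):

<RunnableCode lang="js" title="Reference TSC page arithmetic (sketch)">{`
// guest_time = ((TSC * scale) >> 64) + offset, per the reference TSC page.
// The reference counter ticks at 10 MHz, i.e. in 100 ns units.
function referenceTime(tsc, scale, offset) {
  return ((tsc * scale) >> 64n) + offset;
}

// Invented example: a 2.5 GHz TSC. scale is chosen so that
// tsc * scale / 2^64 = tsc / 250, mapping 2.5e9 Hz onto 1e7 Hz.
const scale = (1n << 64n) / 250n;
console.log(referenceTime(2_500_000_000n, scale, 0n));
// -> 9999999n: one second of 100 ns ticks, within integer rounding.

// On live migration to a host with a different TSC frequency, Hyper-V
// rewrites scale/offset in the shared page; the formula never changes.
`}</RunnableCode>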

<Aside label="Why this matters at scale">
A web server that calls `clock_gettime` once per request, on a million-requests-per-second box, is a ridiculous workload that real systems run constantly. Without enlightenments, every call would be a `rdmsr` on a virtualised TSC or a trap into the hypervisor. With the reference TSC page, the same call is four arithmetic ops and a memory load. The kernel doc explains that this scale and offset survive live migration: "in the case of a live migration to a host with a different TSC frequency, Hyper-V adjusts the scale and offset values in the shared page so that the 10 MHz frequency is maintained" ([Linux kernel: Hyper-V clocks](https://www.kernel.org/doc/html/latest/virt/hyperv/clocks.html)).
</Aside>

Synthetic timers complete the picture. Each virtual CPU has four synthetic timers programmable via MSRs; they fire SINTs into the SynIC. The guest does not need to touch an emulated PIT or HPET. Combined, SynIC + synthetic timers + the reference TSC page mean that an enlightened guest can do most of its time-keeping and inter-partition signalling without ever touching the legacy interrupt/timer chip surface.

### The TLFS as a contract

All of this is published. The [Top-Level Functional Specification](https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/tlfs) is the document a guest author reads to know which MSRs to write, which `cpuid` leaves to query, which hypercalls exist, and which features the hypervisor signals via feature flags. Microsoft maintains it under the Open Specification Promise. That promise is a deliberate contractual choice. Without it, Linux could not ship `drivers/hv/` in-tree and Microsoft could not credibly claim that Linux is a first-class Hyper-V guest. The TLFS is the artefact that makes the rest of the architecture cooperative rather than reverse-engineered.

The next layer up uses these primitives to build something more ambitious: a general-purpose inter-partition transport.

## 3. VMBus: the inter-partition transport

If enlightenments are the alphabet, VMBus is the language that synthetic devices speak. The [Linux kernel VMBus document](https://www.kernel.org/doc/html/latest/virt/hyperv/vmbus.html) puts the definition tersely: "VMBus is a software construct provided by Hyper-V to guest VMs. It consists of a control path and common facilities used by synthetic devices that Hyper-V presents to guest VMs. The common facilities include software channels for communicating between the device driver in the guest VM and the synthetic device implementation that is part of Hyper-V, and signaling primitives to allow Hyper-V and the guest to interrupt each other."

There is a lot in that paragraph. Let me unpack it, because this is the architectural core.

<Definition term="VMBus">
A software-only inter-partition communication bus provided by Hyper-V. It has a control path (channel offer, open, close, rescind), and per-device data channels built on shared memory ring buffers. VMBus is not a real bus in any hardware sense; nothing on the PCIe topology is named VMBus. It is a contract between guest drivers and the hypervisor.
</Definition>

### Channels and the offer protocol

Every synthetic device a guest sees corresponds to a **VMBus channel**. The root partition advertises (`OfferChannel`) the list of devices a guest is permitted to use. The guest's VMBus driver iterates the offers, matches each to a class GUID (synthetic SCSI is one GUID, synthetic NIC is another, the input-style `vmbusrhid` device is a third), and binds an in-kernel device driver to each one. The reverse operation, `RescindChannel`, lets the host revoke a device cleanly, which is what happens during live migration when an SR-IOV virtual function gets pulled out from under a running VM.

<Mermaid caption="VMBus channel lifecycle. The root partition offers channels; the guest opens those it can drive; both sides exchange messages on the per-channel ring buffers; the host eventually rescinds the channel.">
sequenceDiagram
    participant Root as Root partition (VSP)
    participant HV as Hyper-V hypervisor
    participant Guest as Guest VM (VSC)
    Root->>HV: OfferChannel(class_guid, instance_guid)
    HV->>Guest: ChannelOffer message via SynIC
    Guest->>HV: OpenChannel(ringbuf_gpa, signal_event)
    HV->>Root: Channel opened
    loop steady-state I/O
        Guest->>Root: write descriptor + payload to ring, signal SINT
        Root->>Guest: write response to ring, signal SINT
    end
    Root->>HV: RescindChannel(instance_guid)
    HV->>Guest: ChannelRescind via SynIC
    Guest->>Root: CloseChannel
</Mermaid>
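
Driver binding on the guest side is, at its core, a GUID table lookup. The class GUIDs below are the well-known values from the Linux `drivers/hv` sources (worth verifying against the tree before relying on them); the dispatch itself is a sketch:

<RunnableCode lang="js" title="Binding VMBus offers to VSC drivers by class GUID (sketch)">{`
// Every VMBus offer names a class GUID (the device type) and an
// instance GUID (this particular device). Class GUIDs as found in the
// Linux drivers/hv sources:
const VSC_BY_CLASS_GUID = {
  'f8615163-df3e-46c5-913f-f2d2f965ed0e': 'hv_netvsc',       // synthetic NIC
  'ba6163d9-04a1-4d29-b605-72e2ffb1dc7f': 'hv_storvsc',      // synthetic SCSI
  'f912ad6d-2b17-48ea-bd65-f927a61c7684': 'hyperv-keyboard', // synthetic keyboard
  'cfa8b69e-5b4a-4cc0-b98b-8ba1a1f3f95a': 'hid-hyperv',      // synthetic mouse
};

function bindOffers(offers) {
  return offers.map(({ classGuid, instanceGuid }) => ({
    instanceGuid,
    driver: VSC_BY_CLASS_GUID[classGuid] ?? null, // unknown class: leave unbound
  }));
}

console.log(bindOffers([
  { classGuid: 'f8615163-df3e-46c5-913f-f2d2f965ed0e', instanceGuid: 'nic-0' },
  { classGuid: 'ba6163d9-04a1-4d29-b605-72e2ffb1dc7f', instanceGuid: 'scsi-0' },
]));
`}</RunnableCode>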

### Two ring buffers, one channel

Each open channel is two unidirectional ring buffers in shared memory: one for guest-to-host messages, one for host-to-guest. Each ring has a 4 KiB header page that holds the read index, the write index, and control flags, plus a payload region that is a whole number of pages. The guest tells the hypervisor which guest-physical pages back the ring through an object called a **GPA Descriptor List** (GPADL), built up via the `vmbus_establish_gpadl` API.

The kernel doc reveals a small but durable engineering detail: the guest maps each ring buffer twice in kernel virtual address space, header page first, ring contents next, and then *the ring contents again*, contiguously. Why? Because that lets a copy loop walk past the end of the ring without writing wrap-around code; the next byte after the ring's last byte is the ring's first byte, by virtual-memory arrangement. It is the classic "magic ring buffer" trick: it costs a little address space and saves a branch on every payload byte.

<Sidenote>The Linux kernel doc is explicit that this double-mapping convenience exists in the guest only. If you are writing a userspace tool that ingests a captured VMBus ring (for forensics or debugging) you must implement wrap-around manually. This is exactly the kind of detail that source code documentation captures and prose articles forget.</Sidenote>
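
Here is what that manual wrap-around looks like when all you have is the ring payload as a flat byte array, say from a memory capture (a sketch; in a real capture the indices come from the ring's header page):

<RunnableCode lang="js" title="Manual wrap-around read of a captured VMBus ring (sketch)">{`
// Reading len bytes from a captured ring payload without the guest's
// double-mapping trick: the wrap must be explicit.
function ringRead(payload, readIndex, len) {
  const size = payload.length;
  const out = new Uint8Array(len);
  const firstChunk = Math.min(len, size - readIndex);
  out.set(payload.subarray(readIndex, readIndex + firstChunk), 0);
  out.set(payload.subarray(0, len - firstChunk), firstChunk); // wrapped remainder
  return { data: out, newReadIndex: (readIndex + len) % size };
}

// With the kernel's double mapping, the same read is one straight copy:
// the byte after payload[size - 1] is payload[0] again, by VA arrangement.
const ring = new Uint8Array(16).map((_, i) => i);
console.log(ringRead(ring, 14, 4)); // data: bytes 14, 15, 0, 1; newReadIndex: 2
`}</RunnableCode>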

The total amount of GPADL-shared memory a single guest can hold is capped per Windows version. The kernel doc records the numbers: roughly **1280 MiB on Windows Server 2019 and later**, roughly **384 MiB on earlier hosts** ([Linux kernel: VMBus](https://www.kernel.org/doc/html/latest/virt/hyperv/vmbus.html)). For a guest with 30+ channels (multiple netvsc subchannels, multiple storvsc subchannels, vPCI, KVP, time sync, VSS, balloon, framebuffer), that ceiling is real but not yet limiting at typical ring sizes of 1 to 16 MiB per direction.

### The doorbell

Shared memory alone is not enough. The guest can write into the ring all it wants; the host will not look until it is told to. Conversely, the host can write into the ring; the guest will not check until something signals it. That signal is the doorbell, and it is implemented via the **Synthetic Interrupt Controller** SINTs introduced in the previous section.

When the guest enqueues a request and the host's read pointer is already chasing it (i.e., the host is still processing the last batch), the guest can suppress the doorbell entirely. Only the *first* request after the host has caught up triggers a hypercall. This is **interrupt coalescing in software**, and it is the single most important performance lever on a software data plane: the round-trip cost of a `VMEXIT` is amortised across many packets.
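
A sketch of that producer-side decision (the fields stand in for the read and write indices in the real 4 KiB ring header; the suppression condition is simplified):

<RunnableCode lang="js" title="Producer-side doorbell suppression (sketch)">{`
// Signal the host only when it may have gone idle, i.e. when the ring
// was empty before this write. Otherwise the host is still mid-batch
// and will see the new message on its next pass, with no extra VMEXIT.
function enqueue(ring, message) {
  const wasEmpty = ring.readIndex === ring.writeIndex; // host drained everything
  ring.buffer[ring.writeIndex % ring.buffer.length] = message;
  ring.writeIndex += 1;
  if (wasEmpty) {
    ring.doorbells += 1; // hypercall: signal the channel's event -> SINT on a host CPU
  }
}

const ring = { buffer: new Array(8), readIndex: 0, writeIndex: 0, doorbells: 0 };
for (let i = 0; i < 5; i++) enqueue(ring, 'pkt' + i);
console.log(ring.doorbells); // 1: five messages, one hypercall
`}</RunnableCode>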

> **Note:** This same shape, shared memory rings plus an event-channel doorbell, was the central insight of [Xen's split-driver paravirtualization model in 2003](https://wiki.xenproject.org/wiki/Paravirtualization_(PV)). Hyper-V's contribution was not the shape; it was packaging the shape so unmodified Windows guests could use it via in-box drivers, and publishing the protocol so unmodified Linux could too.

### VSPs and VSCs

The two endpoints of a channel have specific names. The **Virtualization Service Provider (VSP)** is the kernel module in the root partition that owns the device backend. The **Virtualization Service Client (VSC)** is the guest-side driver that talks to the VSP through the channel. Microsoft's own architecture page is precise: "the Hyper-V-specific I/O architecture consists of virtualization service providers (VSPs) in the root partition and virtualization service clients (VSCs) in the child partition. Each service is exposed as a device over VM Bus, which acts as an I/O bus and enables high-performance communication between VMs that use mechanisms such as shared memory" ([Microsoft Learn: Hyper-V architecture](https://learn.microsoft.com/en-us/windows-server/administration/performance-tuning/role/hyper-v-server/architecture)).

<Definition term="VSP / VSC">
**VSP** (Virtualization Service Provider): a kernel module in the root partition that exposes a synthetic device backend to guests over a VMBus channel. Examples: `vmswitch.sys` (synthetic NIC), `storvsp.sys` (synthetic SCSI), the `vmbusrhid` server (synthetic input). **VSC** (Virtualization Service Client): the matching driver in the guest that consumes the channel and presents an OS-native device interface (a NIC, a SCSI controller, a keyboard) to the rest of the kernel.
</Definition>

The split is symmetric in transport (both sides use the same ring) but asymmetric in trust. The VSP runs in the *most* privileged context on the box, the root partition's kernel. The VSC runs in a normal guest kernel. Every byte that flows from guest to host crosses a trust boundary and gets parsed by code with full system privilege. The next two sections will return to this fact at length, because it is where the security story lives.

### Why this works for closed-source guests

The Xen project tried something similar in 2003 with `netfront`/`blkfront` rings and event channels, but Xen PV required a paravirtualised guest kernel: the guest had to know it was running on Xen at compile time. Closed-source guests like Windows could not be modified that way, which is why [Xen's wiki](https://wiki.xenproject.org/wiki/Paravirtualization_(PV)) documents PV-on-HVM, paravirtual drivers inside a hardware-assisted guest, as the later workaround.

Hyper-V finessed this with hardware virtualization. The guest kernel runs unmodified inside VT-x or AMD-V; CPU-level privilege separation handles the privileged instructions. The only thing the guest needs to do to opt into VMBus is *load a driver*. Every supported Windows version since Windows 7 / Server 2008 R2 ships those drivers in-box. Linux ships them in-tree from kernel 2.6.32 onward. There is no separate "install paravirt drivers" step, which is why Hyper-V "just works" for almost any guest you point at it.

The transport is settled. What rides on it is a catalogue.

## 4. Synthetic device classes: storage, network, input, video, vPCI

A modern Hyper-V guest, on first boot, sees a small zoo of devices that have nothing to do with PC hardware. There is no IDE controller, no PS/2 keyboard, no Cirrus VGA. There is a synthetic SCSI controller, a synthetic NIC, a synthetic keyboard and mouse, a synthetic framebuffer, and (often) a synthetic PCI passthrough channel. Each is a VSP/VSC pair on top of VMBus.

The [Linux kernel VMBus document](https://www.kernel.org/doc/html/latest/virt/hyperv/vmbus.html) enumerates the catalogue: synthetic SCSI controller (`storvsc`), synthetic NIC (`netvsc`), synthetic framebuffer (`synthvid`), synthetic keyboard, synthetic mouse, PCI passthrough, plus the non-device services: heartbeat, time sync, shutdown, memory balloon, KVP exchange, and online backup (VSS).

<Mermaid caption="The synthetic device catalogue. Each class is a VSP/VSC pair sharing one or more VMBus channels. Storage and networking can use multiple sub-channels for parallelism.">
flowchart LR
    subgraph Guest
        nv["netvsc (NIC)"]
        st["storvsc (SCSI)"]
        sv["synthvid (framebuffer)"]
        kb["hyperv-keyboard"]
        ms["hyperv-mouse"]
        pc["pci-hyperv (vPCI)"]
        kvp["hv_kvp (KVP)"]
        ts["hv_utils (timesync, shutdown, heartbeat)"]
    end
    subgraph Root
        vsw["vmswitch.sys"]
        sto["storvsp.sys"]
        sfb["synthvid VSP"]
        rhid["vmbusrhid VSP"]
        vpci["vPCI VSP"]
        kvpd["KVP daemon"]
        tsd["IS daemons"]
    end
    nv -- "VMBus channel" --- vsw
    st -- "VMBus channel(s)" --- sto
    sv -- "VMBus channel" --- sfb
    kb -- "VMBus channel" --- rhid
    ms -- "VMBus channel" --- rhid
    pc -- "VMBus channel" --- vpci
    kvp -- "VMBus channel" --- kvpd
    ts -- "VMBus channel" --- tsd
</Mermaid>

### Synthetic SCSI: storvsc

The `storvsc` VSC presents itself to the guest as a SCSI host bus adapter. Disks attached to the VM appear as SCSI LUNs hanging off that HBA. The wire protocol uses ring buffers carrying SRB (SCSI Request Block) style commands. To scale, storvsc can open multiple **sub-channels**, typically one per guest virtual processor, so that I/O completion interrupts and request submission spread across cores rather than serialising on a single VMBus channel.
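
The spreading itself is simple: once the sub-channels exist, a request issued on a given virtual CPU rides the channel associated with that CPU. A sketch (the real selection in the `storvsc` driver is more careful about affinity and channel counts):

<RunnableCode lang="js" title="Spreading I/O across storvsc sub-channels (sketch)">{`
// One primary channel plus N-1 sub-channels, each with its own rings
// and its own interrupt target CPU. Requests spread by issuing vCPU.
function channelFor(vcpu, channels) {
  return channels[vcpu % channels.length];
}

const channels = ['primary', 'sub-1', 'sub-2', 'sub-3'];
for (const vcpu of [0, 1, 2, 3, 4, 5]) {
  console.log('vCPU ' + vcpu + ' -> ' + channelFor(vcpu, channels));
}
// Submission and completion now spread across four rings instead of
// serialising on one.
`}</RunnableCode>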

This is also why Hyper-V's "Generation 2" VMs work. A [Generation 2 VM](https://learn.microsoft.com/en-us/windows-server/virtualization/hyper-v/plan/should-i-create-a-generation-1-or-2-virtual-machine-in-hyper-v), introduced in Windows Server 2012 R2 in 2013, has no IDE controller in the boot path at all. UEFI loads the OS loader from a synthetic SCSI device, the OS loader hands off to the kernel, and the kernel binds storvsc to the same device. The legacy IDE emulator simply never runs. That removes a lot of attack surface and lets boot volumes grow up to 64 TB on VHDX.

### Synthetic NIC: netvsc

`netvsc` is the synthetic NIC. The wire protocol historically wrapped Microsoft's NDIS-style RNDIS frames around payloads sent through the channel ring, which is why some Linux discussions mention "RNDIS frames over VMBus." The Linux driver lives in `drivers/net/hyperv/` and the [kernel netvsc documentation](https://docs.kernel.org/networking/device_drivers/ethernet/microsoft/netvsc.html) describes how it can spread receive-side traffic across multiple VMBus subchannels via Receive Side Scaling.

netvsc is also the one device class where Hyper-V composes with hardware passthrough. Section 8 will take this apart in detail; for now, note that the same `netvsc` VSC can run alongside an SR-IOV virtual function in the guest, with `netvsc` acting as the slow-path failover and the VF carrying the steady-state traffic.

### Synthetic input: vmbusrhid

The synthetic keyboard, the synthetic mouse, and a few related input streams ride on a server in the root partition called `vmbusrhid` (the name is shorthand for "VMBus relay HID"). It is a small surface in bytes, but architecturally it has the same shape as netvsc: guest-controllable messages parsed in kernel mode in the root partition. Anyone evaluating the trust boundary should treat it the same way as netvsc, even though the data rate is six orders of magnitude lower.

> **Note:** A path that carries 100 keystrokes per second is, on the wire, almost free. As an attack surface, it is identical to a path that carries a million packets per second: both are guest-controlled bytes parsed by privileged code. Section 7 walks through why the security community treats `vmbusrhid` the way it treats `vmswitch.sys`.

### Synthetic video: synthvid

`synthvid` is a synthetic framebuffer. It is what lets you connect to a Hyper-V VM through the Virtual Machine Connection client without dragging in an emulated VGA. It is intentionally simple: there is no 3D acceleration in the synthetic path. Workloads that need GPU acceleration use a different mechanism, vPCI / DDA, to assign a real GPU to the VM.

### vPCI: synthetic PCI passthrough

The most subtle device class is `pci-hyperv`, which exposes a virtual PCIe topology to the guest. The Linux kernel [vPCI document](https://www.kernel.org/doc/html/latest/virt/hyperv/vpci.html) describes the trick: a passthrough device is offered to the guest *initially* over VMBus (the channel carries the device's PCI configuration space and BARs), and once the guest's vPCI driver has constructed a real PCI device object for it, the device dual-identifies as a normal PCIe device. The vendor driver can then load against it.

This is the mechanism behind both Hyper-V's [Discrete Device Assignment (DDA)](https://learn.microsoft.com/en-us/windows-server/virtualization/hyper-v/plan/plan-for-deploying-devices-using-discrete-device-assignment) and Azure's Accelerated Networking, which we will return to in Section 8. The DDA planning document is explicit that Microsoft formally supports DDA for **GPUs and NVMe storage** as device classes; other PCIe devices are "likely to work" but require vendor support.

### Generation-1 vs Generation-2: a quick decoder

Putting the device classes side by side clarifies why the move from Generation-1 to Generation-2 VMs simplified so much:

| Element | Generation-1 VM (legacy) | Generation-2 VM (since 2013) |
|---|---|---|
| Firmware | BIOS | UEFI with Secure Boot |
| Boot disk | Emulated IDE | Synthetic SCSI (`storvsc`) |
| Network on boot | Emulated DEC 21140 fallback | Synthetic NIC (`netvsc`) |
| Input | Emulated PS/2 + `vmbusrhid` | `vmbusrhid` only |
| Display | Emulated VGA + `synthvid` | `synthvid` only |
| Max boot VHDX | 2 TB | 64 TB |
| Source | [Microsoft Learn: Gen 1 vs Gen 2](https://learn.microsoft.com/en-us/windows-server/virtualization/hyper-v/plan/should-i-create-a-generation-1-or-2-virtual-machine-in-hyper-v) | Same |

Generation-2 is what the Hyper-V architecture wanted to be from the beginning: an all-synthetic stack with no fallback to imaginary 1990s chipsets. The two-generation existence was not a design preference; it was the cost of supporting older operating systems whose boot loaders only knew about BIOS and IDE. Today, every modern Windows and modern Linux supports Generation-2; Generation-1 remains for legacy guests.

### Counting boundary crossings

The shape of the hot path is now visible. To send one network packet from a guest:

1. The guest writes one descriptor and one payload copy into the netvsc TX ring (one memory copy).
2. The guest possibly fires a doorbell (one hypercall, often suppressed if the host has not caught up).
3. The host's `vmswitch.sys` reaps the descriptor, parses it, and forwards it through the virtual switch to a real NIC.

A single packet's hot path is **at most one hypercall and one memory copy in the guest**, plus host-side ring traversal. Section 8's comparison table will quantify how this stacks up against virtio and SR-IOV, but the scale is clear: paravirt I/O on Hyper-V is orders of magnitude cheaper per packet than full PC emulation, and the gap closes only when you go all the way to hardware passthrough.

The catalogue is set. Now, who actually wrote the Linux side of all this?

## 5. Linux Integration Services: Microsoft writes Linux drivers

In December 2009, Microsoft did something quietly historic. Linux kernel 2.6.32 merged a set of drivers under `drivers/staging/hv/`, contributed by Microsoft itself, that taught the Linux kernel to be an enlightened Hyper-V guest. The [kernel.org Hyper-V index page](https://www.kernel.org/doc/html/latest/virt/hyperv/index.html) is the maintained landing page for that work. Over the next several releases the drivers moved out of `staging/`, settled at `drivers/hv/`, `drivers/net/hyperv/`, `drivers/scsi/storvsc_drv.c`, and `drivers/pci/controller/pci-hyperv.c`, and became the default in every mainstream distribution.

That set of drivers is collectively called **Linux Integration Services (LIS)**.

<Definition term="Linux Integration Services (LIS)">
The set of in-kernel Hyper-V guest drivers that Microsoft contributes to upstream Linux. Includes `hv_vmbus` (the VMBus core), `hv_netvsc` (synthetic NIC), `hv_storvsc` (synthetic SCSI), `hv_utils` (KVP, time sync, shutdown, heartbeat, VSS), `pci-hyperv` (vPCI), and `hv_balloon` (memory ballooning). The same code that Microsoft maintains in the Linux tree powers Linux guests on Hyper-V on Windows Server, on Azure, and on developer Hyper-V on Windows 11.
</Definition>

The reason this matters is bigger than convenience. In 2009, Linux had a long, painful history with Hyper-V's competitors. VMware shipped `open-vm-tools` but the deepest paravirt drivers (VMXNET3, PVSCSI) lived in vendor packages. Xen's PV drivers existed in-tree but their evolution depended on Citrix and the Xen project. By contributing the full driver stack upstream and committing to keep it there, Microsoft chose a different route: they put the *spec* (the TLFS) and the *implementation* (LIS) in the open at the same time.

<PullQuote>
Microsoft did not just publish a hypervisor specification and hope Linux would adopt it. They wrote the Linux drivers themselves and upstreamed them, and then they kept doing it for fifteen years.
</PullQuote>

You can see the maintenance pattern in any current kernel. The `drivers/hv/` directory has continuous commit activity from Microsoft engineers. Kernel-doc files like the [VMBus](https://www.kernel.org/doc/html/latest/virt/hyperv/vmbus.html), [clocks](https://www.kernel.org/doc/html/latest/virt/hyperv/clocks.html), [vPCI](https://www.kernel.org/doc/html/latest/virt/hyperv/vpci.html), [overview](https://www.kernel.org/doc/html/latest/virt/hyperv/overview.html), and [CoCo VM](https://www.kernel.org/doc/html/latest/virt/hyperv/coco.html) pages are written by the same engineers who write the drivers. Several of those documents are the most lucid descriptions of the architecture that exist anywhere in public.

<Sidenote>One unexpected consequence: the Linux kernel docs are often easier to read for the architecture than Microsoft's own customer-facing docs. The customer docs answer "how do I configure this?"; the kernel docs answer "what is actually happening?" When researching this article, I found that the cleanest single description of VMBus channel lifecycle is the Linux kernel doc, not the TLFS.</Sidenote>

### What "in-box" really means

Both major guests now ship VMBus support without any post-install step:

- On Windows, the VMBus client stack is built into every supported Windows version since Windows 7 / Windows Server 2008 R2. The legacy Integration Services package, which once shipped as an ISO you mounted into the VM, is no longer needed on supported Windows.
- On Linux, the drivers are in-tree from kernel 2.6.32 (December 2009) onward and ship in every mainstream distro.

The [kernel.org Hyper-V overview document](https://www.kernel.org/doc/html/latest/virt/hyperv/overview.html) explicitly warns against installing legacy LIS packages on top of a kernel that already has the in-tree drivers: it can break MSI-X handling and PCI passthrough. This is the kind of operational footgun that survives precisely because the in-box answer is correct and the LIS package is a holdover from earlier kernels.

### A practical smoke test

You can confirm a Linux guest is using its enlightenments without any vendor tooling. The kernel exposes `cpuid` leaves and Hyper-V detection through `dmesg` and through `/sys`. A small script makes it concrete:

<RunnableCode lang="js" title="Decode the typical 'is this a Hyper-V enlightened Linux guest' check">{`
// This logic mirrors what \`dmesg | grep -i hyperv\` and a peek into
// /sys/devices/virtual/misc/vmbus would tell you on a real Linux Hyper-V guest.

const guestObservations = {
  cpuidLeaf: '0x40000000',        // base cpuid leaf for hypervisor discovery
  cpuidVendor: 'Microsoft Hv',    // vendor signature in EBX/ECX/EDX of that leaf
  guestOsIdMsr: 0x40000000,       // HV_X64_MSR_GUEST_OS_ID, written by the guest
  hypercallMsr: 0x40000001,       // HV_X64_MSR_HYPERCALL, maps the hypercall page
  vmbusModuleLoaded: true,        // hv_vmbus, the VMBus core module
  netvscDevice: '/sys/class/net/eth0/device/driver',
  netvscDriverName: 'hv_netvsc',
  storvscModuleLoaded: true,
};

function isEnlightenedHyperVGuest(o) {
  if (o.cpuidVendor !== 'Microsoft Hv') return false;   // not Hyper-V (or no hypervisor at all)
  if (!o.vmbusModuleLoaded) return false;               // no VMBus core, no channels
  if (o.netvscDriverName !== 'hv_netvsc') return false; // NIC is not the synthetic VSC
  return true;
}

console.log(
  isEnlightenedHyperVGuest(guestObservations)
    ? 'Yes: Hyper-V enlightened, using netvsc + storvsc'
    : 'No: running on emulated PC hardware or non-Hyper-V hypervisor'
);
`}</RunnableCode>

The point is not the script itself (anyone can write a few lines of `awk` against `dmesg`); it is that the verification surface is *public*. The CPU vendor signature, the MSRs, the kernel module names, the `/sys` paths are all documented. There is nothing to reverse-engineer.

### Why this earned trust

Two pieces of practical evidence persuaded the Linux community that LIS was not a strategic trap:

1. **The drivers stayed upstream.** From 2009 to the present, Microsoft has maintained the `drivers/hv/` tree, responded to maintainer feedback, and shipped patches through the normal kernel process.
2. **The TLFS stayed accurate.** Successive Hyper-V releases either matched what the TLFS said or updated the TLFS. There was no second, secret protocol.

The combination put Microsoft in the unusual position of being the most open hypervisor vendor for Linux guest support. (VirtIO on KVM has a richer cross-vendor story; that comparison is Section 8.) This open posture is also what set up the 2024 OpenVMM open-sourcing as a credible move rather than a stunt.

But before we get to OpenVMM, we need to look at a different way Hyper-V matters: not just as a substrate for VMs, but as a substrate for in-VM security boundaries inside Windows itself.

## 6. VBS and HVCI: Hyper-V as the trust anchor inside Windows

Up to this point the article has treated Hyper-V as a virtualization product: a thing that hosts VMs. Starting in Windows 10 and [Windows Server 2016](https://learn.microsoft.com/en-us/windows-server/get-started/whats-new-in-windows-server-2016), Microsoft began using the same hypervisor for a different job: enforcing security boundaries inside a single OS install. The umbrella name is **Virtualization-Based Security (VBS)**.

The mechanism is simple in description and subtle in consequences. The hypervisor splits a single guest's address space into two **Virtual Trust Levels (VTLs)**. The lower one, VTL0, runs the normal Windows kernel and user mode (this is where `explorer.exe` and your browser live). The higher one, VTL1, runs a much smaller stack called the **Secure Kernel** plus a set of isolated user-mode services called **trustlets**. A compromise of VTL0, even of `ntoskrnl.exe`, cannot read or write VTL1 memory because the hypervisor enforces that boundary using the same hardware machinery (Intel EPT / AMD NPT, plus Intel VT-d / AMD-Vi for DMA) that it uses to isolate one VM from another.

<Definition term="Virtual Trust Level (VTL)">
A Hyper-V construct that partitions a single guest's address space into multiple privilege tiers enforced by the hypervisor. VTL0 hosts the normal kernel and user mode; VTL1 hosts the Secure Kernel and trustlets. The hypervisor presents each VTL with its own separate set of memory mappings, system registers, and interrupt state, so code running at VTL0 cannot read VTL1's memory even if it has run-as-NT-AUTHORITY-SYSTEM privilege.
</Definition>

<Mermaid caption="VBS partitions one guest into VTL0 (normal kernel) and VTL1 (Secure Kernel and trustlets). The hypervisor enforces the boundary using the same memory-isolation hardware it uses between VMs.">
flowchart TD
    HV["Hyper-V hypervisor"]
    subgraph Guest["A single Windows guest"]
        subgraph VTL0["VTL0 (normal world)"]
            User0["User mode: apps"]
            Kernel0["NT kernel"]
        end
        subgraph VTL1["VTL1 (secure world)"]
            SK["Secure Kernel"]
            Trustlets["Trustlets: LSAIso, BIOiso, ..."]
        end
    end
    HV --> Guest
    HV -. "EPT + IOMMU enforcement" .-> VTL0
    HV -. "EPT + IOMMU enforcement" .-> VTL1
    Kernel0 -. "VTL switch (hypercall)" .-> SK
</Mermaid>

### What lives in VTL1

The flagship inhabitant of VTL1 is **Hypervisor-protected Code Integrity (HVCI)**, which moves kernel-mode code-integrity enforcement into the Secure Kernel. With HVCI on, no VTL0 driver can make a kernel page both writable and executable; the Secure Kernel mediates the second-level page tables and refuses the request. The result is that attackers who already have code execution in the NT kernel cannot trivially load arbitrary unsigned kernel code or build new executable JIT pages on the fly.
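
The invariant HVCI enforces is compact enough to state as code. What follows is a toy model of the policy decision only; the real enforcement happens in the Secure Kernel through second-level page tables, nothing like JavaScript:

<RunnableCode lang="js" title="A toy model of the HVCI W^X invariant">{`
// Toy model: no kernel page may be writable and executable at once,
// and only pages backed by a signed image may become executable.
function requestPagePermissions(page, { write, execute }) {
  if (write && execute) {
    return { granted: false, reason: 'W+X refused' };
  }
  if (execute && !page.fromSignedImage) {
    return { granted: false, reason: 'unsigned code page refused' };
  }
  return { granted: true };
}

console.log(requestPagePermissions({ fromSignedImage: false }, { write: true,  execute: true }));
console.log(requestPagePermissions({ fromSignedImage: false }, { write: false, execute: true }));
console.log(requestPagePermissions({ fromSignedImage: true  }, { write: false, execute: true }));
`}</RunnableCode>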

The other tenants of VTL1 are **trustlets**. The most familiar is `lsaiso.exe` (LSA Isolation), which holds the cached domain credentials that historically lived in `lsass.exe` and were the prime target for tools like Mimikatz. With Credential Guard on, those secrets move to a trustlet whose memory is unreadable from VTL0; even SYSTEM-level malware in the normal world cannot extract them. Other trustlets handle biometric template storage, key isolation for code integrity policy, and similar small, security-sensitive workloads.

### Why the hypervisor is the right place for this

Putting these protections inside the hypervisor rather than inside the kernel has a property that no in-kernel mitigation can match: **the protected component does not share an address space with the attacker**. A defence built inside `ntoskrnl.exe` (`PatchGuard`, `KASLR`, control-flow guard) lives in the same memory the attacker is trying to corrupt. A defence built inside VTL1 lives in memory the attacker cannot touch, because the page tables that map it are themselves invisible from VTL0.

> **Note:** Pre-VBS Windows had decades of memory-safety bugs in the NT kernel. After VBS, exploiting one of those bugs no longer immediately yields the attacker the ability to read LSASS secrets or load arbitrary kernel code. The attacker now needs a *second* bug, in the much smaller Secure Kernel codebase. The defender's effective budget went up by a large multiplier without rewriting a single line of NT.

### How this connects back to VMBus

VBS would not be possible without the work the previous sections described. The Secure Kernel is what runs in VTL1; it needs to communicate with VTL0 for ordinary system services (the `lsaiso.exe` process must respond to authentication requests from VTL0 callers, the HVCI mediator must answer page-table requests, and so on). The signalling and shared-memory primitives that make those calls cheap are the same SynIC and shared-page primitives that VMBus uses between partitions.

In other words, the architecture Microsoft built in 2008 to give a Windows VM a fast network card became, in 2016, the architecture that gives a single Windows install a security boundary stronger than its own kernel. The same hypervisor, the same trust-mediation primitives, two completely different applications.

[Windows Server 2019](https://learn.microsoft.com/en-us/windows-server/get-started/whats-new-in-windows-server-2019) extended the container side of the story, building on the Hyper-V isolation for containers introduced in Windows Server 2016: a container's lightweight VM gets its own kernel inside a tiny VTL0 of its own. The pattern is consistent: every time Windows wanted a stronger isolation primitive, the answer was "use the hypervisor."

This dual-use is the reason a serious Windows security review touches the Hyper-V codebase even on machines that nobody thinks of as virtualization hosts. A Hyper-V escape (a guest-to-host VMBus exploit) is not just "an exploit against Azure"; it is also, on a typical Windows 11 desktop with VBS enabled, an exploit against the boundary that protects LSASS secrets from kernel-mode malware.

That makes the next section's question urgent: how strong is the VMBus boundary, in practice?

## 7. VMBus security: every message is a parser at the trust boundary

Here is the part of the architecture worth being honest about. The same property that makes VMBus fast, namely that the host-side VSP runs in the root partition's kernel and parses guest-supplied bytes directly, also makes the VSP the most consequential piece of attack surface in the entire stack. Microsoft itself prices it that way: the [Hyper-V Bug Bounty Program](https://www.microsoft.com/en-us/msrc/bounty-hyper-v) pays up to **USD 250,000** specifically for guest-to-host escapes that hit this surface, which is among the highest payouts Microsoft offers for any category of vulnerability.

> **Key idea:** Every byte that crosses a VMBus channel from a guest is a byte that a kernel-mode parser in the most privileged partition on the host has to interpret. The performance argument for a software data plane and the security argument against it are the same argument, looked at from opposite directions.

### The historical record

Three CVEs make the pattern concrete:

- **CVE-2017-0075** is an early public example of the class. The [NVD entry](https://nvd.nist.gov/vuln/detail/CVE-2017-0075) describes it as a Hyper-V flaw that "allows guest OS users to execute arbitrary code on the host OS via a crafted application." The reachable code was in a VMBus message handler on the host side.

- **CVE-2021-28476** is the canonical example. The [NVD record](https://nvd.nist.gov/vuln/detail/CVE-2021-28476) classifies it as a critical Hyper-V remote code execution vulnerability with a CVSS score of 9.9. The [Akamai writeup with Guardicore and SafeBreach](https://www.akamai.com/blog/security/critical-vulnerability-in-hyper-v-allowed-attackers-to-exploit-azure) traces the bug to `vmswitch.sys`, the synthetic-NIC VSP, and shows it had been present in production since the August 2019 vmswitch build. The exploit primitive is exactly what the architecture invites: a guest crafts an OID-style RNDIS request, sends it through the netvsc VMBus channel, and the host's kernel parser misvalidates a length, producing memory corruption in the most privileged kernel on the box.

- **CVE-2024-21407** is a more recent Hyper-V remote code execution vulnerability patched in March 2024 ([NVD](https://nvd.nist.gov/vuln/detail/CVE-2024-21407)). Its existence demonstrates that the bug class did not vanish; the same shape (guest-controlled message, host kernel parser, escalation to host code execution) keeps reappearing.

<Aside label="Reading the bounty as a defender">
The MSRC bounty page ranges from $5,000 for low-impact bugs to $250,000 for full guest-to-host escapes ([Microsoft bounty page](https://www.microsoft.com/en-us/msrc/bounty-hyper-v)). That price point is not a marketing number; it is Microsoft signalling what its threat model says these bugs are worth. A defender pricing their own controls should treat any VSP code path that parses guest-controlled data as a category that justifies the same level of attention as remote internet-facing services.
</Aside>

### Why the bug class is structural

The pattern in all three CVEs is the same:

1. A guest writes carefully crafted bytes into a VMBus channel ring.
2. The guest fires the doorbell.
3. The host's VSP, running in the root partition's kernel, dequeues the message.
4. The VSP parses the message in C or C++ kernel code.
5. A memory-safety mistake (length confusion, missing bounds check, integer overflow) becomes a write or read primitive in the host kernel.

There is no exotic mechanism here. The exploit surface is "kernel C code parsing untrusted input," which has been the dominant source of remote-code-execution bugs in operating systems since the 1990s. The novelty is the location: the parser sits below the most privileged supervisor on the box, with full access to every other tenant's memory.
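
A miniature of the bug class, in the shape a VSP-style parser might take (entirely illustrative; no relation to the actual CVE code paths):

<RunnableCode lang="js" title="The VSP length-confusion bug class, miniaturised">{`
// The recurring VSP mistake: trusting a guest-supplied length field.
function parseMessageUnsafe(ring, msg) {
  // BUG: msg.payloadLength came from the guest. In kernel C, using it
  // unchecked as a copy length reads or writes past the buffer.
  return ring.slice(msg.offset, msg.offset + msg.payloadLength);
}

function parseMessageChecked(ring, msg) {
  const { offset, payloadLength } = msg;
  if (!Number.isInteger(offset) || !Number.isInteger(payloadLength)) throw new Error('bad header');
  if (offset < 0 || payloadLength < 0) throw new Error('negative field');
  if (offset + payloadLength > ring.length) throw new Error('length exceeds ring'); // the missing check
  return ring.slice(offset, offset + payloadLength);
}

const ring = new Uint8Array(64);
try {
  // Guest claims a 1 MiB payload inside a 64-byte ring:
  parseMessageChecked(ring, { offset: 8, payloadLength: 1 << 20 });
} catch (e) {
  console.log('rejected: ' + e.message);
}
// In JS the unsafe version merely returns a short array; in kernel C the
// same mistake is an out-of-bounds access in the root partition.
`}</RunnableCode>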

<Mermaid caption="The VMBus attack surface. A guest's bytes traverse the ring buffer, are dequeued by a VSP in the root partition's kernel, and reach a parser whose mistakes become host-kernel code execution.">
sequenceDiagram
    participant Mal as Malicious guest VM
    participant Ring as VMBus ring (shared memory)
    participant SInt as Synthetic Interrupt Controller
    participant VSP as Host VSP (e.g., vmswitch.sys, kernel)
    Mal->>Ring: Write crafted RNDIS-style message
    Mal->>SInt: Hypercall: signal channel event
    SInt-->>VSP: SINT delivered on host CPU
    VSP->>Ring: Read message header
    note over VSP: Length confusion / missing bounds check
    VSP->>VSP: Out-of-bounds write in root partition kernel
    note over VSP: Result: arbitrary code in the most privileged partition
</Mermaid>

### Mitigations short of a rewrite

Microsoft's first line of defence is the same one every kernel team uses: ASLR, control-flow integrity, kernel hardening, fuzzing the parsers, code review of every new device class, and, on Azure specifically, isolating each tenant's compute hypervisor so a single compromised host does not become a multi-tenant disaster. The MSRC bounty program is partly a procurement mechanism for this same effort: pay researchers to find and report bugs before attackers find them in the wild.

A second line of defence is **Generation-2 VMs** ([Microsoft Learn](https://learn.microsoft.com/en-us/windows-server/virtualization/hyper-v/plan/should-i-create-a-generation-1-or-2-virtual-machine-in-hyper-v)), which remove the legacy emulators (IDE, PS/2, PIC) from the host data path entirely. Every emulator removed is one fewer parser in the most privileged kernel.

A third is the [Microsoft Hyper-V architecture page](https://learn.microsoft.com/en-us/windows-server/administration/performance-tuning/role/hyper-v-server/architecture)'s "minimise root-partition exposure" guidance: configure hosts with the smallest set of root-partition services that the workload requires, since every service is potential surface.

These all help, but none of them change the structural fact that VSPs parse guest-controlled data in C/C++ kernel code. The next architectural shift, the one that does change that fact, is what Section 9 is about.

### Side channels and the Spectre era

VMBus also has to defend against side-channel attacks across the partition boundary. The same Spectre / Meltdown / L1TF mitigations that apply to a multi-tenant hypervisor in general apply to Hyper-V specifically. Microsoft's broader hypervisor mitigation strategy interacts with VMBus mostly indirectly: the SynIC, the hypercall page, and the timer subsystem all needed audit and adjustment when these classes of attacks emerged. The detail is largely outside the scope of an article about the device model, but the takeaway is consistent with the rest of this section: any shared CPU resource between partitions is a potential attack surface, and "shared via the hypervisor's bus" is no exception.

The structural answer to all of this, the one Microsoft itself has been working toward, is to change the languages and the trust boundaries. To set that up, the next section first widens the field by comparing VMBus to its peer in the KVM world, virtio.

## 8. VMBus vs virtio: two answers to the same question

Hyper-V is not the only hypervisor with a paravirt I/O story. The KVM world evolved its own answer to the same problem at roughly the same time, and it ended up with a different design with different trade-offs. The standard is **virtio**.

The original virtio paper, [Rusty Russell's "virtio: Towards a De-Facto Standard For Virtual I/O Devices"](https://ozlabs.org/~rusty/virtio-spec/virtio-paper.pdf), appeared in 2008, the same year Hyper-V shipped. The proposal was explicit in its motivation: every hypervisor was reinventing paravirt drivers, and a single hypervisor-independent specification could let one guest driver work everywhere. OASIS later standardised virtio 1.0 in 2016, then [virtio 1.1 in 2019](https://docs.oasis-open.org/virtio/virtio/v1.1/virtio-v1.1.pdf), then [virtio 1.2 as a Committee Specification in 2022](https://docs.oasis-open.org/virtio/virtio/v1.2/virtio-v1.2.html).

<Definition term="virtio (virtqueue)">
A hypervisor-independent paravirtual I/O specification, governed by OASIS. A virtio device is presented to the guest over a transport (PCI, MMIO, or s390 channel I/O) that advertises capability bits. The data plane is a generic ring layout called a **virtqueue**: a ring of descriptors, an `avail` ring (guest-to-host), and a `used` ring (host-to-guest). Each device class (virtio-net, virtio-blk, virtio-scsi, virtio-fs, virtio-gpu) defines its own message format on top of virtqueues.
</Definition>
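
The split virtqueue layout is concrete enough to sketch: a descriptor table, an `avail` ring of descriptor indices, and a `used` ring of completions. Names follow the spec loosely; this is a toy model, not a binary-compatible layout:

<RunnableCode lang="js" title="A toy split virtqueue">{`
// Guest posts descriptor indices in avail; the host returns completed
// indices (plus bytes written) in used.
function makeVirtqueue(size) {
  return {
    desc: new Array(size).fill(null), // { addr, len, writable } descriptors
    avail: [],                        // guest -> host: indices into desc
    used: [],                         // host -> guest: { id, len }
  };
}

const vq = makeVirtqueue(256);

// Guest: post a 1500-byte buffer for the device to fill (virtio-net RX).
vq.desc[0] = { addr: 0x10000, len: 1500, writable: true };
vq.avail.push(0);                // then: doorbell write to the notify region

// Host: consume the descriptor, deliver a packet, report completion.
const id = vq.avail.shift();
vq.used.push({ id, len: 98 });   // then: MSI-X interrupt into the guest

console.log(vq.used);            // [ { id: 0, len: 98 } ]
`}</RunnableCode>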

### The same shape, viewed sideways

Architecturally, virtio and VMBus are sibling answers to the same shaped problem.

<Mermaid caption="virtio over PCI alongside VMBus. Both use shared-memory rings and a doorbell, but the transport, signalling, and standardisation paths differ.">
flowchart LR
    subgraph virtio_pci["virtio over PCI"]
        gv["Guest virtio driver"]
        vq["virtqueue (descriptors + avail + used)"]
        host_be["Host backend (vhost-net, vhost-user, OpenVMM)"]
        gv -- "PIO doorbell write" --> host_be
        gv -- "shared memory" --- vq
        host_be -- "shared memory" --- vq
        host_be -- "MSI-X" --> gv
    end
    subgraph vmbus["Hyper-V VMBus"]
        gv2["Guest VSC"]
        ring["Two ring buffers + GPADL"]
        vsp["Host VSP (kernel)"]
        gv2 -- "Hypercall doorbell" --> vsp
        gv2 -- "shared memory" --- ring
        vsp -- "shared memory" --- ring
        vsp -- "SINT" --> gv2
    end
</Mermaid>

Both:

- Use **shared-memory rings** for payload.<MarginNote>The phrase "shared-memory rings" hides a small subtlety: a ring buffer is a circular buffer with separate read and write indices. Producer and consumer can run concurrently as long as they only touch their own index, which is what makes ring buffers a wait-free communication primitive on cache-coherent hardware.</MarginNote>
- Use a **doorbell** for signalling.
- **Batch** many requests per doorbell so per-message hypercall cost amortises.
- Have **per-class device protocols** layered on top of a common transport.

The differences are where the world bites:

| Dimension | VMBus | virtio (1.2) |
|---|---|---|
| Transport | Software-only "bus", channel offer/open/close | PCI, MMIO, s390 channel I/O |
| Doorbell | Hypercall (`HV_SIGNAL_EVENT`) | PIO or MMIO write to the device's notify region |
| Reverse signal | Synthetic interrupt (SINT) | MSI-X |
| Standardisation | Microsoft-owned, [Open Specification Promise](https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/tlfs) | OASIS-ratified, multi-vendor |
| Windows in-box drivers | Yes, every supported version | No; out-of-box signed VirtIO INFs from cloud vendors |
| Device classes beyond I/O | Yes: KVP, time sync, VSS, balloon | Limited; non-I/O often built on virtio-vsock or out-of-band agents |
| Cross-hypervisor portability | Hyper-V only | Universal: KVM, QEMU, Cloud Hypervisor, Firecracker, Xen HVM, OpenVMM |
| Spec governance | Single vendor under OSP | Multi-vendor with formal conformance clauses |
| Source for Linux side | [drivers/hv/](https://www.kernel.org/doc/html/latest/virt/hyperv/index.html) | drivers/virtio in the Linux tree |

### Where each design wins

Virtio's strongest claim is portability. The same Linux guest VM image, with the same in-tree virtio drivers, runs on KVM, QEMU, Cloud Hypervisor, AWS Firecracker, and (since 2024) Microsoft's own OpenVMM, which added virtio backend support. A workload that has to move between cloud providers benefits from this directly: the guest does not need a different driver stack per host.

Virtio also has a richer multi-vendor governance story. The spec is OASIS-ratified, with explicit conformance clauses; multiple commercial hypervisors implement it; multiple SmartNIC vendors implement virtio data planes in hardware (the `vDPA` and `VDUSE` work, [described by Red Hat](https://www.redhat.com/en/blog/introduction-vdpa-kernel-framework) and the [Linux kernel VDUSE doc](https://www.kernel.org/doc/html/latest/userspace-api/vduse.html)).

VMBus's strongest claim is **integration**. Every supported Windows ships with the VSCs in-box; there is nothing for an admin to install. The transport carries not just I/O but a service catalogue: KVP for guest configuration, time sync, VSS for online backup, the heartbeat and shutdown channels. The TLFS, while owned by Microsoft, is published under the Open Specification Promise and is a *single* document a guest author can read end-to-end.

<Sidenote>This is why "VirtIO drivers for Windows" exist as a separate project (the Fedora/Red Hat-signed `virtio-win` package) for KVM clouds: out of the box, Windows does not know virtio. The Hyper-V world inverts the problem: out of the box, Linux does not need any third-party install because the drivers are upstream.</Sidenote>

### Where they coexist

The most interesting recent development is that the two camps have stopped being purely competitive. Microsoft's [OpenVMM](https://github.com/microsoft/openvmm) implements both VMBus and virtio backends, so a Linux guest using virtio drivers can run on a Microsoft-developed VMM, and a Windows guest using VMBus drivers can run on the same VMM. This is partially ideological (Microsoft is no longer pretending its way is the only way) and partially pragmatic (a single VMM that supports both transports is simpler than maintaining two).

Beyond the protocol-level comparison, both VMBus and virtio sit inside a larger composition with hardware passthrough, where the **transport becomes the slow path** and a real PCIe device carries the steady-state traffic.

### Hardware passthrough as a complement

The composition that runs almost every modern Azure VM is **VMBus + SR-IOV**, packaged as [Accelerated Networking](https://learn.microsoft.com/en-us/azure/virtual-network/accelerated-networking-overview). The same VM gets both a synthetic NIC (`netvsc` over VMBus) and an SR-IOV virtual function. The Linux netvsc driver documentation describes the failover mechanic: "If SR-IOV is enabled in both the vSwitch and the guest configuration, then the Virtual Function (VF) device is passed to the guest as a PCI device. In this case, both a synthetic (netvsc) and VF device are visible in the guest OS and both NICs have the same MAC address. The VF is enslaved by netvsc device. The netvsc driver will transparently switch the data path to the VF when it is available and up." ([Linux kernel: netvsc](https://docs.kernel.org/networking/device_drivers/ethernet/microsoft/netvsc.html)).

When live migration starts, Azure revokes the VF, the data plane falls back to the netvsc/VMBus path, the VM moves, and a new VF on the destination host gets re-attached, all without dropping TCP connections. The VMBus path was never the production hot path, but its existence is what enables migration. The KVM world's analogue is **vDPA**, which gives a virtio-shaped guest interface backed by a hardware data plane.
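
The failover rule at the heart of this is deliberately small. Below is a hedged sketch of the decision the netvsc documentation describes; every name here is invented for illustration, and the real implementation is C in `drivers/net/hyperv/`, built on netdev enslavement rather than an enum.

```rust
/// Which path actually carries a packet in steady state.
enum DataPath {
    SyntheticVmbus, // always available; live-migration-safe
    SriovVf,        // near-bare-metal; revoked during migration
}

struct NetvscPort {
    vf_present: bool, // VF PCI device currently exposed to the guest
    vf_link_up: bool, // VF reports carrier
}

impl NetvscPort {
    /// The whole failover policy is one decision, made per transmit:
    /// prefer the VF, fall back to VMBus, never stall the queue.
    fn pick(&self) -> DataPath {
        if self.vf_present && self.vf_link_up {
            DataPath::SriovVf
        } else {
            DataPath::SyntheticVmbus
        }
    }
}

fn main() {
    let mut port = NetvscPort { vf_present: true, vf_link_up: true };
    assert!(matches!(port.pick(), DataPath::SriovVf)); // steady state

    // Live migration begins: the host revokes the VF. Traffic shifts to
    // the VMBus ring without the TCP stack noticing, because both paths
    // sit behind one netdev with one MAC address.
    port.vf_present = false;
    assert!(matches!(port.pick(), DataPath::SyntheticVmbus));
    println!("failover path selected");
}
```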

A modern Azure NIC stack is pushing this even further. [Azure Boost](https://learn.microsoft.com/en-us/azure/azure-boost/overview) moves both storage and networking data planes off the host CPU into dedicated FPGAs, with a stable Microsoft-engineered NIC interface called [MANA](https://learn.microsoft.com/en-us/azure/virtual-network/accelerated-networking-mana-overview). Microsoft's documentation reports up to 200 Gbps of network bandwidth and 6.6 million IOPS on local storage with this design, with the host's vmswitch still acting as the live-migration fallback path. The architectural insight is that the VMBus-based slow path is the durable invariant; what changes is whether the steady-state data plane is software, an SR-IOV VF, or a SmartNIC firmware path. Frameworks like [DPDK](https://www.dpdk.org/about/) sit on top of whichever data plane the VM exposes.

What none of this changes is the property Section 7 cared about: as long as a host-side VSP exists and parses guest-controlled bytes in kernel C/C++, the bug class is open. The next section is about the architectural move that closes it.

## 9. OpenVMM and OpenHCL: the 2024 open-source pivot

In 2024, Microsoft did two things that would have been hard to imagine a decade earlier. First, they open-sourced [OpenVMM](https://github.com/microsoft/openvmm), a Rust implementation of the virtualization stack including the VSPs and the VMBus protocol. Second, they introduced [OpenHCL](https://techcommunity.microsoft.com/blog/windowsosplatform/openhcl-the-new-open-source-paravisor/4273172), a "paravisor" configuration of OpenVMM that runs *inside* a confidential VM as a higher-trust mediator between the workload and the (now-untrusted) host.

Both moves are explained by the same trend the article has been circling: confidential computing fundamentally inverts the trust boundary, and the device model has to follow.

<Definition term="Paravisor">
A higher-privileged software layer that runs *inside* a guest VM (not on the host) and mediates the guest's interaction with the hypervisor. In the Hyper-V model, a paravisor lives in VTL2 of the same VM whose workload runs in VTL0; the host hypervisor is outside the VM's trust boundary. The paravisor presents the workload with a familiar VMBus + VSP interface while internally talking to a hardware-isolated confidential VM substrate (AMD SEV-SNP or Intel TDX).
</Definition>

### What changed in confidential computing

The classical Hyper-V trust model places the root partition at the apex of trust. The guest trusts the host. Memory the guest writes is, in the worst case, readable by the host. In **confidential computing**, that is no longer acceptable. A regulated workload (a healthcare database, a financial processor) needs to run in a VM whose contents are protected even from a malicious or compromised hypervisor. AMD's **SEV-SNP** and Intel's **TDX** are CPU features that encrypt and integrity-protect VM memory in hardware so that a compromised host cannot read the guest's secrets.

[Azure Confidential Computing](https://learn.microsoft.com/en-us/azure/confidential-computing/) made these capabilities available as a product starting around 2022. The [Azure confidential VM options page](https://learn.microsoft.com/en-us/azure/confidential-computing/virtual-machine-options) documents the SKUs.

This breaks the old VMBus story. In the classical model, the host's `vmswitch.sys` reads the guest's network packets out of the VMBus ring. In a confidential VM, that is exactly what the protection forbids: if the host could still read those bytes, the encryption would be pointless. So the question becomes: where does the synthetic-device backend live, if not in the host?

### The paravisor answer

The Linux kernel's [Hyper-V CoCo VMs document](https://www.kernel.org/doc/html/latest/virt/hyperv/coco.html) describes the design directly: "Paravisor mode. In this mode, a paravisor layer between the guest and the host provides some operations needed to run as a CoCo VM. The guest operating system can have fewer CoCo enlightenments than is required in the fully-enlightened case ... some aspects of CoCo VMs are handled by the Hyper-V paravisor while the guest OS must be enlightened for other aspects."

OpenHCL is that paravisor. It runs in a higher-trust virtual trust level inside the same confidential VM (VTL2), it has access to the encrypted-memory primitives the CPU provides, and it presents the workload (in VTL0) with the same VMBus + VSP world a non-confidential VM would see. The workload OS does not need to be heavily modified; it sees what looks like Hyper-V, talks to what look like normal VSPs, and never has to know that those VSPs are now inside its own VM rather than on the host.

<Mermaid caption="OpenHCL as a paravisor. The host hypervisor is no longer trusted by the workload; the paravisor in VTL2 of the same VM provides the synthetic-device backends from inside the guest's trust boundary.">
flowchart TD
    HW["Confidential CPU (SEV-SNP / TDX)"]
    HV["Host hypervisor (untrusted by the workload)"]
    subgraph CoCoVM["Confidential VM (memory encrypted)"]
        VTL2["VTL2: OpenHCL paravisor (Rust VSPs)"]
        VTL0["VTL0: workload OS (Windows or Linux, lightly enlightened)"]
        VTL0 -- "VMBus, looks normal" --- VTL2
    end
    HW --> HV
    HV --> CoCoVM
    HV -. "no access to guest plaintext" .-> CoCoVM
</Mermaid>

### The Rust rewrite

The other half of the story is **memory safety**. Recall Section 7's CVE list: every headline Hyper-V escape in the past decade involved a parser bug in C/C++ kernel code. OpenVMM's choice to implement the entire VMM, including the VSPs, in Rust is a direct response to that history. Rust's ownership model rules out, by construction, a large class of memory-safety bugs (use-after-free, out-of-bounds access on slices, double-free) that produced those CVEs.

This does not magically eliminate every vulnerability. A logic bug in a state machine, an integer overflow on a length field, a side-channel timing leak: all of these are still possible in Rust. But the categories that produced CVE-2017-0075, CVE-2021-28476, and CVE-2024-21407 are exactly the categories Rust was designed to make hard.
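
To make that boundary concrete, here is a minimal sketch, assuming an invented message format (a 4-byte little-endian length, then the payload), of what parsing guest-controlled bytes looks like in safe Rust. Overflow does not disappear, as just noted, but the check becomes an idiomatic one-liner, and the out-of-bounds read is not expressible in safe code at all.

```rust
// Invented format for illustration: 4-byte little-endian payload length,
// then the payload. Real VMBus packet descriptors are richer than this.
fn parse_payload(ring_bytes: &[u8]) -> Result<&[u8], &'static str> {
    // Checked slicing: a short buffer becomes an Err, not an out-of-bounds read.
    let header = ring_bytes.get(..4).ok_or("truncated header")?;
    let len = u32::from_le_bytes([header[0], header[1], header[2], header[3]]) as usize;

    // The length is attacker-controlled. checked_add turns the classic
    // `offset + len` wraparound into a visible, testable branch.
    let end = 4usize.checked_add(len).ok_or("length overflows")?;

    // get(range) returns None instead of reading past the ring mapping.
    ring_bytes.get(4..end).ok_or("length exceeds buffer")
}

fn main() {
    assert_eq!(parse_payload(&[3, 0, 0, 0, b'a', b'b', b'c']), Ok(&b"abc"[..]));
    // A hostile length field fails safely instead of corrupting host memory.
    assert!(parse_payload(&[0xff, 0xff, 0xff, 0xff]).is_err());
}
```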

<Aside label="Why Rust, not Java or Go">
Garbage-collected languages are wrong for a kernel-mode parser: GC pauses are unacceptable in a hypervisor-adjacent fast path, and you cannot afford a runtime that allocates memory during interrupt handling. Rust's compile-time memory safety with no GC is, today, the only mature option that gives you both the safety and the predictability a VSP needs. Microsoft's choice is consistent with the rest of the industry; comparable rewrites of low-level systems infrastructure (Cloudflare's `quiche` QUIC implementation, Mozilla's Stylo CSS engine in Firefox, the Android Bluetooth stack) have all converged on Rust.
</Aside>

### What you can actually look at

OpenVMM is not a press release; it is a public repository that ships:

- The full Rust source tree at [github.com/microsoft/openvmm](https://github.com/microsoft/openvmm).
- A separate repository for the Linux kernel fork that the paravisor runs on top of, at [github.com/microsoft/OHCL-Linux-Kernel](https://github.com/microsoft/OHCL-Linux-Kernel).
- Project documentation centred at [openvmm.dev](https://openvmm.dev/).
- Both VMBus and virtio backends, so the same VMM can host Windows guests on VMBus and Linux guests on virtio.
- Documentation through the deeper [Microsoft Tech Community explainer](https://techcommunity.microsoft.com/blog/windowsosplatform/openhcl-the-new-open-source-paravisor/4273172) and the [original announcement](https://techcommunity.microsoft.com/blog/windowsosplatform/openhcl-the-new-open-source-paravisor/4242991) describing the paravisor's role.

For a security researcher or a regulated-cloud customer, this is a meaningful change. For the first time, the VMBus + VSP stack is auditable end-to-end in source.

<Spoiler kind="hint" label="Try it: read the OpenVMM VMBus channel-state code">
If you want to see how a VSP actually consumes a channel, the OpenVMM repository contains the Rust modules that implement the VMBus channel state machine. Cloning the repo and grepping for `Channel::open` and `RingBuffer` shows the same offer/open/close/rescind pattern Section 3 described, expressed in Rust types whose lifetimes the compiler checks. Reading the same logic in Rust after reading the Linux C version in `drivers/hv/channel_mgmt.c` is a useful exercise; the abstraction is identical, and the safety guarantees diverge.
</Spoiler>
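
For readers not ready to clone the repository, here is a hypothetical, much-simplified rendering of the same idea; these types are mine, not OpenVMM's. The mechanism it shows is real, though: transitions consume the previous state, so code holding a pre-rescind handle fails to compile instead of touching memory it no longer owns.

```rust
#[derive(Debug)]
enum ChannelState {
    Offered,             // root sent the offer; guest has not opened
    Open { gpadl: u32 }, // ring mapped; gpadl is the guest's buffer handle
    Rescinded,           // root withdrew the offer; no I/O may follow
}

#[derive(Debug, PartialEq)]
enum ChannelError {
    NotOpen,
    AlreadyRescinded,
}

impl ChannelState {
    /// Taking `self` by value retires the old state at the call site:
    /// the borrow checker will not let stale handles be reused.
    fn open(self, gpadl: u32) -> Result<ChannelState, ChannelError> {
        match self {
            ChannelState::Offered => Ok(ChannelState::Open { gpadl }),
            ChannelState::Rescinded => Err(ChannelError::AlreadyRescinded),
            already @ ChannelState::Open { .. } => Ok(already), // idempotent here
        }
    }

    fn rescind(self) -> ChannelState {
        // Legal from any state; afterwards every I/O path must observe
        // Rescinded and refuse, rather than racing the teardown.
        ChannelState::Rescinded
    }

    fn send(&self) -> Result<(), ChannelError> {
        match self {
            ChannelState::Open { .. } => Ok(()),
            _ => Err(ChannelError::NotOpen),
        }
    }
}

fn main() {
    let ch = ChannelState::Offered;
    let ch = ch.open(7).expect("offered -> open");
    println!("state after open: {:?}", ch);
    ch.send().expect("open channel accepts I/O");
    let ch = ch.rescind();
    assert_eq!(ch.send(), Err(ChannelError::NotOpen)); // no I/O after rescind
}
```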

### What still has to be solved

The kernel CoCo doc is candid about an open architectural problem that OpenHCL alone cannot solve: "Unfortunately, there is no standardized enumeration of feature/functions that might be provided in the paravisor, and there is no standardized mechanism for a guest OS to query the paravisor for the feature/functions it provides. The understanding of what the paravisor provides is hard-coded in the guest OS." ([Linux kernel: CoCo VMs](https://www.kernel.org/doc/html/latest/virt/hyperv/coco.html)).

In other words, the TLFS gave us a portable contract between guests and Hyper-V hypervisors. The paravisor world does not yet have an equivalent portable contract between guests and paravisors. Today's guests have OpenHCL-specific knowledge baked in. A future "paravisor TLFS" would let any compliant paravisor host any compliant guest, the same way the original TLFS did for the hypervisor. That standard does not exist yet, and writing it is the most consequential open problem in this corner of the architecture.

The architecture is moving. Section 10 takes stock of what that means for engineers building or operating on this stack today.

## 10. Engineering takeaways and open problems

A working architecture is one where the trade-offs are *visible*. Hyper-V's enlightenments + VMBus + VSP/VSC stack is a working architecture in exactly that sense: every property it has, including the security ones, is a consequence of design choices a reader can name.

### What the design optimises for

Three explicit optimisations:

1. **In-box drivers for closed-source guests.** Hardware virtualization handles privileged CPU instructions; the guest only needs to load a VMBus client driver to opt in to the fast path. Every supported Windows ships those drivers in-box. Every modern Linux ships them in-tree. There is no "install paravirt drivers" step, which is a large reason "it just works."
2. **A single transport that carries everything.** VMBus carries 12+ device classes plus non-device services (KVP, time sync, VSS, balloon, heartbeat). One protocol, one set of primitives, one debugging surface. This is the engineering equivalent of "everything is a file" applied to inter-partition communication.
3. **Live migration.** Because the data plane is software in the root partition, the VM is not bound to a specific host. The VSPs serialise their state during migration without guest cooperation. This is the property that makes VMBus the durable invariant under hardware-passthrough acceleration: SR-IOV gives you throughput; VMBus gives you mobility.

### What it pays for those properties

Two costs:

1. **The host CPU is on the data plane.** A software ring serviced by `vmswitch.sys` cannot match a 100 GbE NIC's line rate per host CPU core. Microsoft's answer is hybrid composition with SR-IOV ([Accelerated Networking](https://learn.microsoft.com/en-us/azure/virtual-network/accelerated-networking-overview)) and SmartNIC offload ([Azure Boost + MANA](https://learn.microsoft.com/en-us/azure/azure-boost/overview)). The KVM analogue is [vDPA](https://www.redhat.com/en/blog/introduction-vdpa-kernel-framework). Both of these accept the structural truth that for the highest throughputs, the host CPU has to leave the data plane.
2. **The host kernel parses guest-controlled bytes.** Section 7's CVE record is the catalogue of what that costs. The architectural answer is OpenHCL: move the parser into the guest's own trust boundary and rewrite it in Rust.

### A four-property idealisation

It is useful to write down what an idealised paravirt I/O stack would do, so it is clear which properties any real stack today is trading away.

The four idealised properties:

1. **Zero hypercalls per packet** in steady state.
2. **Live-migration parity** with a software baseline.
3. **Cross-vendor / cross-hypervisor portability** of the guest driver.
4. **No host-side memory-unsafe parser** of guest-controlled data.

| Approach | (1) Zero hypercall | (2) Live migration | (3) Portability | (4) No unsafe host parser |
|---|---|---|---|---|
| VMBus + in-kernel VSP | partial (batched) | yes | no | no |
| virtio + vhost-net | partial (batched) | yes | yes | no |
| SR-IOV / DDA | yes | no | no | yes |
| Accelerated Networking (VMBus + SR-IOV) | yes (steady) | yes | no | no |
| vDPA | yes | partial | yes | no |
| OpenHCL paravisor + VMBus | partial | yes | partial | yes |
| Azure Boost + MANA | yes | yes | no | partial |

No single approach today matches all four properties. The Hyper-V production composition is roughly **(VMBus baseline) + (Accelerated Networking for throughput) + (OpenHCL for confidential workloads)**. The KVM-world composition is **(virtio baseline) + (vDPA / SmartNIC for throughput)**. SmartNIC-based stacks (Azure Boost, AWS Nitro, Google's offload) approach the same four-corner problem from yet another angle.

This is a synthesis, not a single-source claim: the matrix combines properties documented separately in the [Microsoft Accelerated Networking docs](https://learn.microsoft.com/en-us/azure/virtual-network/accelerated-networking-overview), the [Linux kernel CoCo doc](https://www.kernel.org/doc/html/latest/virt/hyperv/coco.html), the [Discrete Device Assignment doc](https://learn.microsoft.com/en-us/windows-server/virtualization/hyper-v/plan/plan-for-deploying-devices-using-discrete-device-assignment), the [SR-IOV overview](https://learn.microsoft.com/en-us/windows-hardware/drivers/network/overview-of-single-root-i-o-virtualization--sr-iov-), the [Linux netvsc driver doc](https://docs.kernel.org/networking/device_drivers/ethernet/microsoft/netvsc.html), the [VDUSE userspace interface](https://www.kernel.org/doc/html/latest/userspace-api/vduse.html), the [vPCI doc](https://www.kernel.org/doc/html/latest/virt/hyperv/vpci.html), and the [OpenHCL explainer](https://techcommunity.microsoft.com/blog/windowsosplatform/openhcl-the-new-open-source-paravisor/4273172). Each individual cell is sourced; the ranking is the author's reading of those sources.

### Practical pitfalls for operators

A few things the customer-facing docs do not always say plainly:

- **`vmbusrhid` is not low-risk.** The keyboard/mouse channel is a kernel-level RPC surface from guest to root. Treat it the same way you would treat netvsc when modelling threat exposure.
- **Generation-2 VMs reduce attack surface.** Choosing Generation-2 for new workloads removes the legacy IDE/PS/2/PIC emulators from the host data path entirely ([Microsoft Learn: Gen 1 vs Gen 2](https://learn.microsoft.com/en-us/windows-server/virtualization/hyper-v/plan/should-i-create-a-generation-1-or-2-virtual-machine-in-hyper-v)).
- **Mixing in-box and out-of-band Integration Services breaks things.** Modern Windows and modern Linux already have the drivers; installing the legacy LIS package on top can break MSI-X handling and PCI passthrough ([Linux kernel: overview](https://www.kernel.org/doc/html/latest/virt/hyperv/overview.html)).
- **DDA is not SR-IOV.** Discrete Device Assignment covers any PCIe device passthrough, but Microsoft formally supports only **GPUs and NVMe** as device classes ([Microsoft Learn: DDA planning](https://learn.microsoft.com/en-us/windows-server/virtualization/hyper-v/plan/plan-for-deploying-devices-using-discrete-device-assignment)).
- **Confidential VMs do not have the same device set.** Hardware constraints reduce or alter the device classes available; always validate the specific synthetic devices your workload depends on are present in the target SKU ([Linux kernel: CoCo](https://www.kernel.org/doc/html/latest/virt/hyperv/coco.html)).

> **Note:**
> 1. Confidential VM (SEV-SNP / TDX)? Use the OpenHCL paravisor mode ([Azure CoCo VM options](https://learn.microsoft.com/en-us/azure/confidential-computing/virtual-machine-options)).
> 2. Need ≥40 Gbps with live migration? Use Accelerated Networking; on Boost-enabled SKUs, Boost adds another tier of offload.
> 3. Need ≥100 Gbps and accept binding to host? Use Discrete Device Assignment / SR-IOV.
> 4. Maximum guest portability across hypervisors? Use virtio; for bandwidth-sensitive workloads, vDPA.
> 5. Default Hyper-V workload, broad device coverage, native migration? VMBus + VSP (the default).

### Open problems worth watching

The substantive open problems are:

1. **A standardised paravisor feature-enumeration interface.** OpenHCL is the first auditable paravisor, but there is no portable contract a guest can use to query "what does this paravisor support." The TLFS gave us this for hypervisors; the paravisor analogue is missing ([Linux kernel: CoCo](https://www.kernel.org/doc/html/latest/virt/hyperv/coco.html)).
2. **Confidential-VM-friendly live migration with paravirt devices.** Hardware-attested state cannot be cloned trivially; today's pragmatic answer is to constrain migration in CoCo VMs. A general solution is open.
3. **A formal model of the VMBus offer/rescind state machine.** The kernel docs describe it narratively. A model that the VSP code could be checked against would let static analysis rule out the bug class behind the headline CVEs.
4. **Live-migrating stateful SR-IOV VFs without device cooperation.** Vendor proposals exist; an industry standard does not.
5. **Erasing memory-unsafety in legacy VSPs.** The Rust rewrite path in OpenVMM is correct; the multi-year engineering effort to convert every existing VSP is real. CVE-2024-21407 is recent enough to remind everyone the bug class is still producing fresh entries.

### What to remember in five years

The most important sentence in this article is one I have been quietly preparing throughout: the durable architectural invariant in Hyper-V is **shared-memory ring + doorbell, with a published guest-side contract**. Everything else, including the choice of programming language for the VSP, the question of whether the data plane is software or hardware, and even whether the trust boundary places the VSP on the host or in a paravisor, is implementation. The transport is the invariant. That is the lesson the next decade of CoCo VMs and SmartNIC offload is converging toward: keep the contract stable, and let everything else change.

## FAQ

<FAQ title="Frequently asked questions">

<FAQItem question="Do I have to install Linux Integration Services on a modern Linux Hyper-V guest?">
No. The drivers (`hv_vmbus`, `hv_netvsc`, `hv_storvsc`, `hv_utils`, `pci-hyperv`, `hv_balloon`) have been in the upstream Linux kernel since 2.6.32 in December 2009 and ship in every mainstream distribution. The legacy LIS package is a holdover from the era before in-tree support and can in fact break MSI-X handling and PCI passthrough if installed on top of a modern kernel ([Linux kernel: Hyper-V overview](https://www.kernel.org/doc/html/latest/virt/hyperv/overview.html)).
</FAQItem>

<FAQItem question="Why is the host-side VSP code path the security focus, not the guest-side VSC?">
Because the trust gradient is asymmetric. The VSP runs in the root partition's kernel, the most privileged context on the box; the VSC runs in a normal guest kernel. Bytes flowing from guest to host get parsed by code with full system privilege. A VSC bug typically harms only the guest; a VSP bug can be a cross-tenant compromise. The pattern is visible in the CVE record: [CVE-2017-0075](https://nvd.nist.gov/vuln/detail/CVE-2017-0075), [CVE-2021-28476](https://nvd.nist.gov/vuln/detail/CVE-2021-28476), and [CVE-2024-21407](https://nvd.nist.gov/vuln/detail/CVE-2024-21407) all hit host-side parsers.
</FAQItem>

<FAQItem question="If Accelerated Networking uses an SR-IOV VF for the steady-state path, why keep VMBus at all?">
For live migration. SR-IOV gives you near-bare-metal throughput but binds the VM to a specific physical NIC; you cannot migrate that state. Keeping a VMBus-backed `netvsc` device in the same guest gives the hypervisor a software path it can fall back to during migration windows. The Linux kernel netvsc doc describes this failover explicitly: when SR-IOV is enabled, the VF is enslaved by netvsc and the data path switches transparently when the VF is up ([Linux kernel: netvsc](https://docs.kernel.org/networking/device_drivers/ethernet/microsoft/netvsc.html)).
</FAQItem>

<FAQItem question="Is OpenHCL the same thing as OpenVMM?">
OpenHCL is a *configuration* of OpenVMM, not a separate codebase. OpenVMM is the Rust virtualization stack at [github.com/microsoft/openvmm](https://github.com/microsoft/openvmm); OpenHCL is OpenVMM run as a paravisor inside a confidential VM's higher-trust virtual trust level (VTL2), so that the synthetic-device backends sit inside the guest's own trust boundary rather than on a host the guest cannot trust. The same Rust code can run as a host-side VMM (when paired with a hypervisor on the host) or as an in-guest paravisor (when running inside a SEV-SNP or TDX VM).
</FAQItem>

<FAQItem question="Can I run virtio devices on Hyper-V or VMBus devices on KVM?">
Both directions exist with caveats. OpenVMM, when used as a host VMM, supports both VMBus and virtio backends, so a Linux virtio guest can run on a Microsoft-developed VMM ([github.com/microsoft/openvmm](https://github.com/microsoft/openvmm)). Native Hyper-V on a Windows Server host historically expects VMBus-driven guests; there is no in-box virtio device emulation on a stock Hyper-V Server. KVM hosts can technically present a VMBus-shaped device, but in practice the production answer on KVM is virtio.
</FAQItem>

<FAQItem question="What is the relationship between Generation-2 VMs and security?">
Generation-2 VMs use UEFI with Secure Boot, boot from synthetic SCSI, and have no emulated IDE, PS/2, or PIC in the data path ([Microsoft Learn: Gen 1 vs Gen 2](https://learn.microsoft.com/en-us/windows-server/virtualization/hyper-v/plan/should-i-create-a-generation-1-or-2-virtual-machine-in-hyper-v)). Every emulator that is removed is one fewer parser running in the most privileged kernel on the host, so the host-side attack surface is meaningfully smaller. Generation-1 still exists for legacy guests that only know how to boot from BIOS + IDE.
</FAQItem>

<FAQItem question="How does VBS make a Windows desktop more secure if the desktop is not a virtualization host?">
VBS uses the Hyper-V hypervisor to split a single Windows install into VTL0 (the normal kernel and apps) and VTL1 (the Secure Kernel and trustlets like `lsaiso.exe`). The hypervisor enforces that VTL0 cannot read or modify VTL1's memory, even with kernel privileges. So an attacker who already has SYSTEM-level code execution in the normal world cannot trivially extract LSASS secrets or load arbitrary unsigned kernel code; the hypervisor stops them. This works on any modern Windows machine with the right CPU features, regardless of whether you ever run a VM yourself ([Microsoft Learn: Windows Server 2016 What's New](https://learn.microsoft.com/en-us/windows-server/get-started/whats-new-in-windows-server-2016)).
</FAQItem>

</FAQ>

<StudyGuide slug="hyper-v-enlightenments-vmbus-and-the-synthetic-device-model" keyTerms={[
  { term: "Type-1 hypervisor", definition: "A hypervisor that runs directly on hardware rather than inside a host OS. Hyper-V is Type-1; the original Microsoft Virtual Server was Type-2." },
  { term: "Root partition", definition: "The privileged partition under Hyper-V that owns physical I/O devices and hosts the synthetic-device VSPs. Runs Windows Server." },
  { term: "Child partition", definition: "An unprivileged partition that hosts a guest OS. Communicates with the root partition over VMBus." },
  { term: "Enlightenment", definition: "A guest-OS modification or feature that takes advantage of running under a specific hypervisor by using paravirtual interfaces (hypercalls, synthetic timers, SINTs) instead of trapping on emulated hardware." },
  { term: "Top-Level Functional Specification (TLFS)", definition: "Microsoft's published hypervisor ABI for Hyper-V, governing hypercalls, synthetic MSRs, synthetic interrupts, synthetic timers, and the VMBus protocol. Released under the Open Specification Promise." },
  { term: "VMBus", definition: "Hyper-V's software-only inter-partition transport. Has a control path (channel offer/open/close/rescind) and per-device shared-memory ring channels with SINT-based doorbells." },
  { term: "VSP / VSC", definition: "Virtualization Service Provider (root-partition kernel module that owns a synthetic-device backend) and Virtualization Service Client (guest-side driver that consumes the channel)." },
  { term: "Synthetic Interrupt Controller (SynIC)", definition: "Per-vCPU synthetic interrupt subsystem with 16 SINT slots and shared message/event pages; the doorbell mechanism for VMBus and synthetic timers." },
  { term: "Reference TSC page", definition: "A guest-readable page maintained by Hyper-V containing scale and offset such that the guest can compute a 10 MHz monotonic clock from the hardware TSC entirely in user space." },
  { term: "Generation-2 VM", definition: "A Hyper-V VM that boots UEFI with Secure Boot from synthetic SCSI, with no emulated IDE/PS/2/PIC. Reduces host-side attack surface and supports VHDX up to 64 TB." },
  { term: "Discrete Device Assignment (DDA)", definition: "Hyper-V's general PCIe-passthrough mechanism. Microsoft formally supports GPUs and NVMe; other devices may work with vendor support." },
  { term: "Accelerated Networking", definition: "An Azure/Hyper-V feature that attaches both a synthetic NIC (netvsc over VMBus) and an SR-IOV virtual function to a guest, with netvsc as the live-migration fallback path." },
  { term: "VBS / HVCI / VTL", definition: "Virtualization-Based Security uses the Hyper-V hypervisor to split a single guest into Virtual Trust Levels (VTL0 normal, VTL1 secure). HVCI (Hypervisor-protected Code Integrity) and trustlets like lsaiso.exe live in VTL1." },
  { term: "Paravisor", definition: "A higher-trust software layer running inside a confidential VM (typically in VTL2) that mediates between the workload and the untrusted host hypervisor; presents the workload with a familiar VMBus + VSP world." },
  { term: "OpenVMM / OpenHCL", definition: "Microsoft's 2024 open-source Rust virtualization stack and its paravisor configuration. Re-implements the VSPs in memory-safe Rust to address the bug class behind CVE-2017-0075, CVE-2021-28476, and CVE-2024-21407." }
]} questions={[
  { q: "Why does Microsoft maintain the Top-Level Functional Specification under the Open Specification Promise rather than as an internal document?", a: "Because the OSP is what makes it legally and practically safe for the Linux community to ship in-tree drivers (drivers/hv/) implementing the hypervisor's guest-side ABI. Without the published, OSP-protected spec, Linux could only support Hyper-V via reverse-engineering, which would not have been politically or technically acceptable upstream. The OSP is the contractual artefact that turned 'Hyper-V can host Linux' from a vendor claim into a maintained, in-tree reality." },
  { q: "Walk through the lifecycle of a single network packet from a Hyper-V guest's userspace to the wire.", a: "(1) The guest application calls send(); (2) the guest TCP/IP stack hands a packet to the hv_netvsc driver; (3) hv_netvsc allocates a slot in the netvsc TX VMBus ring, copies the descriptor and payload, and writes the new write index; (4) if the host is not already chasing the writes, the guest issues a HV_SIGNAL_EVENT hypercall (one VMEXIT) to fire the SINT for that channel; (5) the host's vmswitch.sys VSP reaps the descriptor from the ring, parses the RNDIS frame, and forwards it to the virtual switch; (6) the virtual switch dispatches it to a real NIC. In the steady state, a single VMEXIT can amortise across many packets through batching." },
  { q: "Explain why the host-side VSP is the historical CVE locus for Hyper-V escapes.", a: "Because the VSP runs in the root partition's kernel (the most privileged context on the box) and parses guest-controlled bytes from the VMBus ring. Any memory-safety mistake (length confusion, missing bounds check, integer overflow) in C/C++ kernel code translates directly to code execution in the most privileged supervisor on the host. CVE-2017-0075, CVE-2021-28476 (vmswitch.sys), and CVE-2024-21407 all instantiate this pattern. The attack surface is structural, not incidental." },
  { q: "What does an enlightened Linux guest do when it first boots on Hyper-V, before any network or storage I/O happens?", a: "It executes cpuid leaf 0x40000000 to detect the Microsoft hypervisor signature; reads further leaves to enumerate available enlightenments; writes HV_X64_MSR_GUEST_OS_ID to declare itself; writes HV_X64_MSR_HYPERCALL with a guest-physical address and an enable bit, prompting the hypervisor to populate that page with the right vmcall/vmmcall opcode; sets up SINT slots and a per-CPU SynIC message page; optionally reads the reference TSC page; loads the hv_vmbus driver, which begins receiving channel offers from the root partition; and binds class-specific drivers (hv_netvsc, hv_storvsc, etc.) to each offered channel." },
  { q: "Why is OpenHCL described as a paravisor rather than a hypervisor or a VMM?", a: "Because it sits inside a guest VM (in VTL2 of that VM), not on the host, and its job is to mediate between the guest workload and a hypervisor that the guest does not trust. A hypervisor on the host runs underneath all VMs; a VMM owns and controls VMs from outside; a paravisor lives inside one VM, at higher privilege than that VM's workload, and presents the workload with a familiar device-model surface (VMBus + VSPs) that is now backed by code inside the guest's own trust boundary rather than by the host kernel. The architecture inverts the historical Hyper-V trust model so that confidential VMs can be protected from a malicious host." },
  { q: "Compare VMBus's ring-buffer transport to virtio's virtqueues. What is the same and what is different?", a: "Same: shared-memory rings carrying descriptors and payload; doorbell-based signalling so per-message hypercall cost amortises across batches; per-device-class protocols layered on a common transport. Different: VMBus uses a software-only 'bus' with offer/open/close/rescind control, while virtio rides on a real PCI/MMIO/channel-I/O transport with a generic capability-bit mechanism. VMBus's reverse signal is a SINT; virtio's is MSI-X. VMBus is Microsoft-owned under the OSP; virtio is OASIS-ratified and multi-vendor. VMBus has in-box Windows drivers and broader synthetic-service coverage (KVP, time sync, VSS); virtio has cross-hypervisor portability and a multi-vendor implementation pool." }
]} />
