Hyper-V Enlightenments, VMBus, and the Synthetic Device Model
How Hyper-V guests get high-performance device I/O without emulating legacy hardware: enlightenments, the TLFS, VMBus rings, the VSP/VSC pair, and why the host-side parser is the attack surface.
1. The Type-1 hypervisor foundation
Open Task Manager on a modern Windows 11 desktop, switch to the Performance tab, and look at the line that says "Virtualization: Enabled." That single line hides one of the most consequential design choices in modern operating systems: when Microsoft shipped Hyper-V with Windows Server 2008 in June 2008, they did not bolt a virtualization product on top of Windows. They put a small hypervisor underneath it.
That ordering matters more than it sounds. In the older Microsoft Virtual Server 2005 model, Windows ran on the bare metal and a user-mode service emulated PC hardware for guests inside it. In the Hyper-V architecture documented by Microsoft in 2008, the hypervisor boots first and Windows itself becomes a guest of the hypervisor. Microsoft calls this guest the root partition. Every other VM on the box is a child partition.
Type-1 hypervisor: a hypervisor that runs directly on the physical hardware rather than inside a host operating system. Hyper-V, VMware ESXi, and Xen are Type-1; VirtualBox and the original Microsoft Virtual Server are Type-2 (hosted). In a Type-1 design no general-purpose OS sits between the hypervisor and the silicon, which lets the hypervisor enforce isolation directly using CPU virtualization extensions like Intel VT-x and AMD-V.
The root partition is not just another VM. It is a privileged partition: it owns the physical I/O devices, runs the parent stack of synthetic-device backends, and brokers everything that touches real hardware. Children get virtual processors and a slice of memory, and they communicate with the root over a software bus called VMBus that we will spend most of this article taking apart.
Diagram source
flowchart TD
HW["Physical hardware (CPU, RAM, NICs, NVMe)"]
HV["Hyper-V hypervisor (microkernel)"]
Root["Root partition (Windows Server)"]
VSP["Virtualization Service Providers (VSPs): vmswitch.sys, storvsp.sys, ..."]
C1["Child partition: Windows VM"]
C2["Child partition: Linux VM"]
VSC1["VSCs: netvsc, storvsc, ..."]
VSC2["VSCs: hv_netvsc, hv_storvsc, ..."]
HW --> HV
HV --> Root
HV --> C1
HV --> C2
Root --> VSP
VSP -. "VMBus channel" .-> VSC1
VSP -. "VMBus channel" .-> VSC2
C1 --> VSC1
C2 --> VSC2
The hypervisor itself is small by design. The Hyper-V architecture page on Microsoft Learn describes it as a microkernel: it does the minimum a hypervisor must do (CPU scheduling, memory partitioning, interrupt routing, an inter-partition message bus) and pushes everything else, including the device models, out to the root partition. This is the opposite of the early VMware ESX design, where the hypervisor itself contained large device drivers.
The microkernel choice was pragmatic, not ideological. A monolithic hypervisor with built-in NIC and storage drivers would have been a catastrophic certification problem: every NIC firmware update would risk a hypervisor patch. By delegating I/O to the Windows root partition, Microsoft re-used the entire Windows driver stack.
The split also explains why Hyper-V "feels Windows-shaped" even though it is technically not Windows. The root partition is Windows, with all of its drivers, its WMI, its event log, its Get-VM PowerShell cmdlets. The hypervisor underneath is a small, separate binary (hvix64.exe on Intel, hvax64.exe on AMD) that you almost never have a reason to think about. Microsoft itself goes further: in the same architecture document, it stresses that all device-model traffic flows through the root: "the management operating system hosts virtual service providers (VSPs) that communicate over the VMBus to handle device access requests from child partitions" (Microsoft Learn: Overview of Hyper-V).
This sets up the question the rest of the article answers: if the hypervisor is small, the guest is unmodified Windows or Linux, and the root partition owns the real devices, then how does a guest actually do disk and network I/O at gigabit-or-better speeds without paying enormous costs to traverse all of these boundaries?
The short answer is in three pieces: enlightenments (the guest knows it is virtualized and uses hypercalls), VMBus (the inter-partition transport), and the VSP/VSC pair (split drivers that share memory through VMBus rings). The next section starts with the first of those three.
2. Enlightenments: what "knowing you are virtualized" buys you
In the early 2000s, the dominant intuition was that a hypervisor's job is to fool the guest. A perfectly faithful emulation of an Intel 440BX motherboard, a DEC 21140 NIC, and an IDE controller is what made VMware Workstation a useful product in 1999. It is also what made Microsoft Virtual Server 2005 too slow to saturate gigabit links: every out instruction on a fake NIC port trapped to the hypervisor, was decoded against an in-memory chip model, and produced a synthetic interrupt that itself trapped on the way out. The Microsoft Virtual Server retrospective on Wikipedia notes that the architecture had no paravirtualization support and that performance was constrained relative to later hardware-assisted designs.
Hyper-V's answer was to drop the pretence. If the guest knows it is in a VM, it can use a fast path designed for VMs instead of pretending to drive imaginary chips. Microsoft calls this knowledge an enlightenment, and the Hyper-V feature discovery page is the contract a guest uses to learn what enlightenments the hypervisor offers.
Enlightenment: a modification or feature in a guest operating system that takes advantage of running under a specific hypervisor. An enlightened guest detects the hypervisor (on x86, by reading the cpuid leaves at 0x40000000 and above), then opts in to using paravirtual interfaces (hypercalls, synthetic timers, synthetic interrupt controllers, shared TSC pages) instead of trapping on emulated hardware. An unmodified guest would still boot, but slower.
Detection is the cheap part. The Linux kernel's Hyper-V overview document describes four cooperating mechanisms, layered atop one another: implicit traps that the hypervisor handles transparently, explicit hypercalls the guest issues on purpose, synthetic registers exposed as model-specific registers (MSRs) in the architectural CPU register file, and VMBus for high-bandwidth device traffic. Each layer builds on the one below it.
The contract between Hyper-V and its guests is published. Microsoft maintains the Top-Level Functional Specification as a public document under the Open Specification Promise. That single decision is why Linux ships an in-tree Hyper-V driver stack and why VMBus is not a black box.
The hypercall page
The first thing an enlightened guest does is set up a hypercall page. The TLFS Hypercall Interface page describes the dance: the guest writes its identity into HV_X64_MSR_GUEST_OS_ID (MSR 0x40000000), then writes a guest-physical address and an enable bit into HV_X64_MSR_HYPERCALL (MSR 0x40000001). The hypervisor responds by populating that page with the right opcode for the current CPU: vmcall on Intel, vmmcall on AMD. From that moment on, "make a hypercall" is a normal call into a known address rather than an opcode the kernel must hand-assemble per CPU vendor.
The same indirection ports across architectures: on ARM64, the hypervisor can populate the page with that architecture's hypervisor-call opcode (the HVC instruction) without any guest code change. The guest just learns the new page contents.
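The handshake is mechanical enough to model in a few lines. This is an illustrative sketch, not driver code: the MSR numbers match the TLFS, but the "hypervisor" here is just a function that fills a fake page with the vendor-correct opcode.

```javascript
// Toy model of the hypercall-page MSR dance from the TLFS. Everything
// below is a simulation; real guests do this with wrmsr instructions.
const HV_X64_MSR_GUEST_OS_ID = 0x40000000;
const HV_X64_MSR_HYPERCALL = 0x40000001;

function makeHypervisor(cpuVendor) {
  const msrs = new Map();
  const guestPages = new Map(); // gpa -> simulated page contents

  return {
    writeMsr(msr, value) {
      if (msr === HV_X64_MSR_HYPERCALL) {
        const enabled = (value & 1n) === 1n; // bit 0: enable
        const gpa = value & ~0xfffn;         // bits 63:12: page-aligned GPA
        if (enabled) {
          // The hypervisor fills the page with the right opcode for the CPU.
          guestPages.set(gpa, cpuVendor === 'intel' ? 'vmcall; ret' : 'vmmcall; ret');
        }
      }
      msrs.set(msr, value);
    },
    readPage(gpa) { return guestPages.get(gpa); },
  };
}

// Guest side: identify yourself first, then enable the hypercall page.
const hv = makeHypervisor('amd');
hv.writeMsr(HV_X64_MSR_GUEST_OS_ID, 0x8100_0000_0000_0000n); // guest identity
hv.writeMsr(HV_X64_MSR_HYPERCALL, 0x1000n | 1n);             // GPA 0x1000, enable
console.log(hv.readPage(0x1000n)); // → "vmmcall; ret" on this AMD model
```

From here on, "make a hypercall" is an ordinary call through that page; the guest never hand-assembles vendor opcodes.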
The same TLFS page documents two hypercall classes: simple hypercalls (one operation, returns or faults) and rep (repeated) hypercalls that take a counter and a start index, so a long-running operation can yield mid-flight without losing work. Three calling conventions exist: a memory-based one for large parameter blocks, a register-only fast variant for the very common case of one or two inputs, and an XMM-register variant that lets a guest pass up to 112 bytes of input through SSE registers.
That XMM variant is unusual enough to flag. Most kernel ABIs do not touch SSE in privileged code because saving and restoring the full SSE state is expensive. Hyper-V's hypercall ABI uses XMM precisely because the round-trip cost of a hypercall is dominated by the VMEXIT itself, so squeezing a few more bytes into registers is cheaper than spilling them to memory and reading them back.
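The rep mechanism is worth sketching too, because the resumption contract is the whole point. This is a toy model: the `budget` parameter stands in for whatever per-invocation work limit the hypervisor enforces, and the returned index models the reps-completed count the TLFS describes.

```javascript
// Simulated "rep" hypercall: the hypervisor may process only part of the
// batch per call and reports how far it got, so no completed work is redone.
function hvRepHypercall(entries, startIndex, budget) {
  const done = Math.min(entries.length - startIndex, budget);
  return { repsCompleted: startIndex + done };
}

// Guest side: re-issue from the returned index until the batch is finished.
function issueRepCall(entries) {
  let start = 0, invocations = 0;
  while (start < entries.length) {
    ({ repsCompleted: start } = hvRepHypercall(entries, start, 4));
    invocations++; // each re-entry costs a VMEXIT, but loses no work
  }
  return invocations;
}

console.log(issueRepCall(new Array(10))); // → 3 calls for 10 reps at budget 4
```

The payoff is scheduling, not speed: a long-running operation can yield to the hypervisor's scheduler mid-batch without restarting from rep zero.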
Synthetic interrupts and synthetic timers
A guest's virtual processor has its own emulated local APIC by default, but an enlightened guest can also use a Synthetic Interrupt Controller (SynIC), defined in the TLFS. Each virtual processor gets 16 SINT slots, a per-CPU shared message page, and a per-CPU shared event page. SINTs are how VMBus signals events to the guest without going through the legacy LAPIC fast path.
SINT: one of 16 logical interrupt sources per virtual processor that the Hyper-V Synthetic Interrupt Controller can signal. SINTs are reachable through MSRs (HV_X64_MSR_SINT0 through HV_X64_MSR_SINT15) and back the doorbell mechanism for VMBus channels and for synthetic timers. They are paravirtual: they would not exist on a bare-metal CPU.
The clock side is even more interesting. The Linux kernel Hyper-V clocks documentation describes a reference TSC page that the hypervisor maintains in shared memory: it contains a scale factor and an offset such that
reference_time = ((tsc × scale) >> 64) + offset
ticks at a constant 10 MHz frequency regardless of the underlying TSC. The guest's clock_gettime and gettimeofday can read the TSC, multiply, shift, add, and return, all in user space via the vDSO, with no kernel transition and no hypercall.
Synthetic timers complete the picture. Each virtual CPU has four synthetic timers programmable via MSRs; they fire SINTs into the SynIC. The guest does not need to touch an emulated PIT or HPET. Combined, SynIC + synthetic timers + the reference TSC page mean that an enlightened guest can do most of its time-keeping and inter-partition signalling without ever touching the legacy interrupt/timer chip surface.
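The reference-TSC arithmetic above is compact enough to run. A sketch using BigInt for the 64.64 fixed-point math; the scale and offset values are invented for illustration (the real ones are written into the shared page by the hypervisor).

```javascript
// Toy model of the reference TSC page: the hypervisor publishes scale and
// offset such that ((tsc * scale) >> 64) + offset ticks at 10 MHz.
function readReferenceTime(tscPage, tscValue) {
  return ((tscValue * tscPage.scale) >> 64n) + tscPage.offset; // 100 ns units
}

// Invented example: a 3 GHz TSC. scale is the tick ratio in 64.64 fixed point.
const tscPage = {
  scale: (10_000_000n << 64n) / 3_000_000_000n, // (10 MHz / 3 GHz) << 64
  offset: 0n,
};

// One second's worth of TSC ticks reads back as ~10,000,000 reference ticks.
const t0 = readReferenceTime(tscPage, 3_000_000_000n);
console.log(t0); // ~10 million (9999999n here, from fixed-point truncation)
```

Note that the whole read path is multiply, shift, add: nothing here needs a hypercall, which is why it can live in the vDSO.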
The TLFS as a contract
All of this is published. The Top-Level Functional Specification is the document a guest author reads to know which MSRs to write, which cpuid leaves to query, which hypercalls exist, and which features the hypervisor signals via feature flags. Microsoft maintains it under the Open Specification Promise. That promise is a deliberate contractual choice. Without it, Linux could not ship drivers/hv/ in-tree and Microsoft could not credibly claim that Linux is a first-class Hyper-V guest. The TLFS is the artefact that makes the rest of the architecture cooperative rather than reverse-engineered.
The next layer up uses these primitives to build something more ambitious: a general-purpose inter-partition transport.
3. VMBus: the inter-partition transport
If enlightenments are the alphabet, VMBus is the language that synthetic devices speak. The Linux kernel VMBus document puts the definition tersely: "VMBus is a software construct provided by Hyper-V to guest VMs. It consists of a control path and common facilities used by synthetic devices that Hyper-V presents to guest VMs. The common facilities include software channels for communicating between the device driver in the guest VM and the synthetic device implementation that is part of Hyper-V, and signaling primitives to allow Hyper-V and the guest to interrupt each other."
There is a lot in that paragraph. Let me unpack it, because this is the architectural core.
VMBus: a software-only inter-partition communication bus provided by Hyper-V. It has a control path (channel offer, open, close, rescind), and per-device data channels built on shared memory ring buffers. VMBus is not a real bus in any hardware sense; nothing on the PCIe topology is named VMBus. It is a contract between guest drivers and the hypervisor.
Channels and the offer protocol
Every synthetic device a guest sees corresponds to a VMBus channel. The root partition advertises (OfferChannel) the list of devices a guest is permitted to use. The guest's VMBus driver iterates the offers, matches each to a class GUID (synthetic SCSI is one GUID, synthetic NIC is another, the input-style vmbusrhid device is a third), and binds an in-kernel device driver to each one. The reverse operation, RescindChannel, lets the host revoke a device cleanly, which is what happens during live migration when an SR-IOV virtual function gets pulled out from under a running VM.
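The offer-matching step amounts to a GUID-keyed dispatch table. A sketch, assuming the two class GUIDs shown (they match the well-known values in the Linux tree's include/linux/hyperv.h; treat that file as the authoritative list):

```javascript
// Illustrative only: how a guest VMBus core might bind drivers to channel
// offers by class GUID. Instance GUIDs distinguish multiple devices of one class.
const drivers = new Map([
  ['ba6163d9-04a1-4d29-b605-72e2ffb1dc7f', 'storvsc'], // synthetic SCSI class
  ['f8615163-df3e-46c5-913f-f2d2f965ed0e', 'netvsc'],  // synthetic NIC class
]);

function onOffer(offer) {
  const driver = drivers.get(offer.classGuid);
  if (!driver) return `offer ${offer.instanceGuid}: no driver, ignored`;
  return `offer ${offer.instanceGuid}: bound to ${driver}`;
}

console.log(onOffer({
  classGuid: 'f8615163-df3e-46c5-913f-f2d2f965ed0e',
  instanceGuid: 'nic-0',
})); // → "offer nic-0: bound to netvsc"
```

An unmatched offer is simply ignored, which is also why a guest without a given VSC just sees nothing rather than a broken device.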
Diagram source
sequenceDiagram
participant Root as Root partition (VSP)
participant HV as Hyper-V hypervisor
participant Guest as Guest VM (VSC)
Root->>HV: OfferChannel(class_guid, instance_guid)
HV->>Guest: ChannelOffer message via SynIC
Guest->>HV: OpenChannel(ringbuf_gpa, signal_event)
HV->>Root: Channel opened
loop steady-state I/O
Guest->>Root: write descriptor + payload to ring, signal SINT
Root->>Guest: write response to ring, signal SINT
end
Root->>HV: RescindChannel(instance_guid)
HV->>Guest: ChannelRescind via SynIC
Guest->>Root: CloseChannel
Two ring buffers, one channel
Each open channel is two unidirectional ring buffers in shared memory: one for guest-to-host messages, one for host-to-guest. Each ring has a 4 KiB header page that holds the read index, the write index, and control flags, plus a power-of-two payload region. The guest tells the hypervisor which guest-physical pages back the ring through an object called a GPA Descriptor List (GPADL), built up via the vmbus_establish_gpadl API.
The kernel doc reveals a small but durable engineering detail. It maps the ring buffer twice in the guest's kernel virtual address space: header page first, then the ring contents, then the ring contents again, contiguously. Why? Because that lets a copy loop walk past the end of the ring without writing wrap-around code; the next byte after the ring's last byte is the ring's first byte, by virtual-memory arrangement. The same double-mapping trick shows up in other high-performance ring implementations, such as the Linux BPF ring buffer. It costs a little address space; it saves a wrap check on every copy.
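A toy model makes the trick concrete. There is no mmap here, so the "second mapping" is simulated by mirroring every store into a second half of the array; the payoff is the same: a span that crosses the ring's end reads as one contiguous slice.

```javascript
// Simulated double-mapped ("magic") ring buffer. In the real thing the
// mirror is free: the same physical pages are mapped at two virtual addresses.
class MirroredRing {
  constructor(size) {
    this.size = size;                    // payload bytes (power of two)
    this.buf = new Uint8Array(size * 2); // [size, 2*size) mirrors [0, size)
    this.readIdx = 0;
    this.writeIdx = 0;
  }
  push(bytes) {
    for (const b of bytes) {
      const i = this.writeIdx++ % this.size;
      this.buf[i] = b;             // the first "mapping"
      this.buf[i + this.size] = b; // the mirror (hardware does this for free)
    }
  }
  // The payoff: a wrapping span reads as one contiguous slice, no branch.
  pull(len) {
    const start = this.readIdx % this.size;
    this.readIdx += len;
    return this.buf.subarray(start, start + len); // may cross this.size safely
  }
}

const ring = new MirroredRing(8);
ring.push([1, 2, 3, 4, 5, 6]);
ring.pull(6);                   // drain; read index now at 6
ring.push([7, 8, 9, 10]);       // this write wraps past the physical end
console.log([...ring.pull(4)]); // → [7, 8, 9, 10], copied with no wrap code
```

The copy loop in `pull` never tests "did I hit the end?", which is exactly the branch the kernel's double mapping removes from the hot path.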
The doorbell
Signalling a channel is a hypercall: when one side enqueues work, it rings a doorbell that raises a SINT on the other side. The trick is that the doorbell is conditional. When the guest enqueues a request while the host's read pointer is still chasing the previous batch (i.e., the host is still processing), the guest can suppress the doorbell entirely. Only the first request after the host has caught up triggers a hypercall. This is interrupt coalescing in software, and it is the single most important performance lever on a software data plane: the round-trip cost of a VMEXIT is amortised across many packets.
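The coalescing logic fits in a few lines. A sketch, under the simplifying assumption that the sender can see the receiver's read index in the shared ring header (the real protocol also uses explicit interrupt-mask flags):

```javascript
// Software doorbell coalescing: only the write that makes the ring
// non-empty pays for a hypercall; the rest ride along for free.
function makeChannel() {
  let hypercalls = 0;
  const ring = { readIdx: 0, writeIdx: 0 };
  return {
    send() {
      const wasEmpty = ring.readIdx === ring.writeIdx; // host has caught up
      ring.writeIdx++;
      if (wasEmpty) hypercalls++; // ring the doorbell (a VMEXIT) only then
    },
    hostDrain() { ring.readIdx = ring.writeIdx; }, // host processes the batch
    stats: () => hypercalls,
  };
}

const ch = makeChannel();
for (let i = 0; i < 100; i++) ch.send(); // 100 requests while the host is busy
console.log(ch.stats()); // → 1: a single doorbell covered the whole burst
ch.hostDrain();
ch.send();
console.log(ch.stats()); // → 2: the next burst needs exactly one more
```

One hundred requests, one VMEXIT: that ratio is the entire argument for ring buffers over trap-per-operation emulation.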
VSPs and VSCs
The two endpoints of a channel have specific names. The Virtualization Service Provider (VSP) is the kernel module in the root partition that owns the device backend. The Virtualization Service Client (VSC) is the guest-side driver that talks to the VSP through the channel. Microsoft's own architecture page is precise: "the Hyper-V-specific I/O architecture consists of virtualization service providers (VSPs) in the root partition and virtualization service clients (VSCs) in the child partition. Each service is exposed as a device over VM Bus, which acts as an I/O bus and enables high-performance communication between VMs that use mechanisms such as shared memory" (Microsoft Learn: Hyper-V architecture).
VSP (Virtualization Service Provider): a kernel module in the root partition that exposes a synthetic device backend to guests over a VMBus channel. Examples: vmswitch.sys (synthetic NIC), storvsp.sys (synthetic SCSI), the vmbusrhid server (synthetic input). VSC (Virtualization Service Client): the matching driver in the guest that consumes the channel and presents an OS-native device interface (a NIC, a SCSI controller, a keyboard) to the rest of the kernel.
The split is symmetric in transport (both sides use the same ring) but asymmetric in trust. The VSP runs in the most privileged context on the box, the root partition's kernel. The VSC runs in a normal guest kernel. Every byte that flows from guest to host crosses a trust boundary and gets parsed by code with full system privilege. The next two sections will return to this fact at length, because it is where the security story lives.
Why this works for closed-source guests
The Xen project tried something similar in 2003 with its netfront/blkfront rings and event channels, but Xen PV required a paravirtualised guest kernel: the guest had to know it was running on Xen at compile time. Closed-source guests like Windows could not be modified, so the Xen wiki eventually documented PV-on-HVM drivers as a workaround.
Hyper-V finessed this with hardware virtualization. The guest kernel runs unmodified inside VT-x or AMD-V; CPU-level privilege separation handles the privileged instructions. The only thing the guest needs to do to opt into VMBus is load a driver. Every supported Windows version since Windows 7 / Server 2008 R2 ships those drivers in-box. Linux ships them in-tree from kernel 2.6.32 onward. There is no separate "install paravirt drivers" step, which is why Hyper-V "just works" for almost any guest you point at it.
The transport is settled. What rides on it is a catalogue.
4. Synthetic device classes: storage, network, input, video, vPCI
A modern Hyper-V guest, on first boot, sees a small zoo of devices that have nothing to do with PC hardware. There is no IDE controller, no PS/2 keyboard, no Cirrus VGA. There is a synthetic SCSI controller, a synthetic NIC, a synthetic keyboard and mouse, a synthetic framebuffer, and (often) a synthetic PCI passthrough channel. Each is a VSP/VSC pair on top of VMBus.
The Linux kernel VMBus document enumerates the catalogue: synthetic SCSI controller (storvsc), synthetic NIC (netvsc), synthetic framebuffer (synthvid), synthetic keyboard, synthetic mouse, PCI passthrough, plus the non-device services: heartbeat, time sync, shutdown, memory balloon, KVP exchange, and online backup (VSS).
Diagram source
flowchart LR
subgraph Guest
nv["netvsc (NIC)"]
st["storvsc (SCSI)"]
sv["synthvid (framebuffer)"]
kb["hyperv-keyboard"]
ms["hyperv-mouse"]
pc["pci-hyperv (vPCI)"]
kvp["hv_kvp (KVP)"]
ts["hv_utils (timesync, shutdown, heartbeat)"]
end
subgraph Root
vsw["vmswitch.sys"]
sto["storvsp.sys"]
sfb["synthvid VSP"]
rhid["vmbusrhid VSP"]
vpci["vPCI VSP"]
kvpd["KVP daemon"]
tsd["IS daemons"]
end
nv -- "VMBus channel" --- vsw
st -- "VMBus channel(s)" --- sto
sv -- "VMBus channel" --- sfb
kb -- "VMBus channel" --- rhid
ms -- "VMBus channel" --- rhid
pc -- "VMBus channel" --- vpci
kvp -- "VMBus channel" --- kvpd
ts -- "VMBus channel" --- tsd
Synthetic SCSI: storvsc
The storvsc VSC presents itself to the guest as a SCSI host bus adapter. Disks attached to the VM appear as SCSI LUNs hanging off that HBA. The wire protocol uses ring buffers carrying SRB (SCSI Request Block) style commands. To scale, storvsc can open multiple sub-channels, one per host CPU, so that I/O completion interrupts and request submission spread across cores rather than serialising on a single VMBus channel.
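The sub-channel spreading is simple modulo arithmetic. A sketch (the routing rule here, CPU number mod sub-channel count, is an illustrative simplification of per-CPU channel selection):

```javascript
// Toy model of multi-channel I/O spreading: requests issued on CPU n go to
// sub-channel n mod N, so submission and completion stay CPU-local.
function makeStorvsc(numSubchannels) {
  const counts = new Array(numSubchannels).fill(0); // requests per ring
  return {
    submit(cpu) {
      const ch = cpu % numSubchannels;
      counts[ch]++;
      return ch;
    },
    counts,
  };
}

const hba = makeStorvsc(4);
[0, 1, 2, 3, 4, 5, 6, 7].forEach((cpu) => hba.submit(cpu));
console.log(hba.counts); // → [2, 2, 2, 2]: I/O spread evenly across rings
```

Without the sub-channels, every CPU's I/O would serialise on one ring's indices and one SINT; with them, the lock and interrupt load scales with core count.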
This is also why Hyper-V's "Generation 2" VMs work. A Generation 2 VM, introduced in Windows Server 2012 R2 in 2013, has no IDE controller in the boot path at all. UEFI loads the OS loader from a synthetic SCSI device, the OS loader hands off to the kernel, and the kernel binds storvsc to the same device. The legacy IDE emulator simply never runs. That removes a lot of attack surface and lets boot volumes grow up to 64 TB on VHDX.
Synthetic NIC: netvsc
netvsc is the synthetic NIC. The wire protocol historically wrapped payloads in Microsoft's RNDIS (Remote NDIS) message format before sending them through the channel ring, which is why some Linux discussions mention "RNDIS frames over VMBus." The Linux driver lives in drivers/net/hyperv/ and the kernel netvsc documentation describes how it can spread receive-side traffic across multiple VMBus subchannels via Receive Side Scaling.
netvsc is also the one device class where Hyper-V composes with hardware passthrough. Section 8 will take this apart in detail; for now, note that the same netvsc VSC can run alongside an SR-IOV virtual function in the guest, with netvsc acting as the slow-path failover and the VF carrying the steady-state traffic.
Synthetic input: vmbusrhid
The synthetic keyboard, the synthetic mouse, and a few related input streams ride on a server in the root partition called vmbusrhid (the name is shorthand for "VMBus relay HID"). It is a small surface in bytes, but architecturally it has the same shape as netvsc: guest-controllable messages parsed in kernel mode in the root partition. Anyone evaluating the trust boundary should treat it the same way as netvsc, even though the data rate is six orders of magnitude lower.
Synthetic video: synthvid
synthvid is a synthetic framebuffer. It is what lets you connect to a Hyper-V VM through the Virtual Machine Connection client without dragging in an emulated VGA. It is intentionally simple: there is no 3D acceleration in the synthetic path. Workloads that need GPU acceleration use a different mechanism, vPCI / DDA, to assign a real GPU to the VM.
vPCI: synthetic PCI passthrough
The most subtle device class is pci-hyperv, which exposes a virtual PCIe topology to the guest. The Linux kernel vPCI document describes the trick: a passthrough device is offered to the guest initially over VMBus (the channel carries the device's PCI configuration space and BARs), and once the guest's vPCI driver has constructed a real PCI device object for it, the device dual-identifies as a normal PCIe device. The vendor driver can then load against it.
This is the mechanism behind both Hyper-V's Discrete Device Assignment (DDA) and Azure's Accelerated Networking, which we will return to in Section 8. The DDA planning document is explicit that Microsoft formally supports DDA for GPUs and NVMe storage as device classes; other PCIe devices are "likely to work" but require vendor support.
Generation-1 vs Generation-2: a quick decoder
Putting the device classes side by side clarifies why the move from Generation-1 to Generation-2 VMs simplified so much:
| Element | Generation-1 VM (legacy) | Generation-2 VM (since 2013) |
|---|---|---|
| Firmware | BIOS | UEFI with Secure Boot |
| Boot disk | Emulated IDE | Synthetic SCSI (storvsc) |
| Network on boot | Emulated DEC 21140 fallback | Synthetic NIC (netvsc) |
| Input | Emulated PS/2 + vmbusrhid | vmbusrhid only |
| Display | Emulated VGA + synthvid | synthvid only |
| Max boot VHDX | 2 TB | 64 TB |
| Source | Microsoft Learn: Gen 1 vs Gen 2 | Same |
Generation-2 is what the Hyper-V architecture wanted to be from the beginning: an all-synthetic stack with no fallback to imaginary 1990s chipsets. The two-generation existence was not a design preference; it was the cost of supporting older operating systems whose boot loaders only knew about BIOS and IDE. Today, every modern Windows and modern Linux supports Generation-2; Generation-1 remains for legacy guests.
Counting boundary crossings
The shape of the hot path is now visible. To send one network packet from a guest:
- The guest writes one descriptor and one payload copy into the netvsc TX ring (one memory copy).
- The guest possibly fires a doorbell (one hypercall, often suppressed if the host has not caught up).
- The host's vmswitch.sys reaps the descriptor, parses it, and forwards it through the virtual switch to a real NIC.
A single packet's hot path is at most one hypercall and one memory copy in the guest, plus host-side ring traversal. Section 8's comparison table will quantify how this stacks up against virtio and SR-IOV, but the scale is clear: paravirt I/O on Hyper-V is orders of magnitude cheaper per packet than full PC emulation, and the gap closes only when you go all the way to hardware passthrough.
The catalogue is set. Now, who actually wrote the Linux side of all this?
5. Linux Integration Services: Microsoft writes Linux drivers
In December 2009, Microsoft did something quietly historic. Linux kernel 2.6.32 merged a set of drivers under drivers/staging/hv/, contributed by Microsoft itself, that taught the Linux kernel to be an enlightened Hyper-V guest. The kernel.org Hyper-V index page is the maintained landing page for that work. Over the next several releases the drivers moved out of staging/, settled at drivers/hv/, drivers/net/hyperv/, drivers/scsi/storvsc_drv.c, and drivers/pci/controller/pci-hyperv.c, and became the default in every mainstream distribution.
That set of drivers is collectively called Linux Integration Services (LIS).
The set of in-kernel Hyper-V guest drivers that Microsoft contributes to upstream Linux. Includes hv_vmbus (the VMBus core), hv_netvsc (synthetic NIC), hv_storvsc (synthetic SCSI), hv_utils (KVP, time sync, shutdown, heartbeat, VSS), pci-hyperv (vPCI), and hv_balloon (memory ballooning). The same code that Microsoft maintains in the Linux tree powers Linux guests on Hyper-V on Windows Server, on Azure, and on developer Hyper-V on Windows 11.
The reason this matters is bigger than convenience. In 2009, Linux had a long, painful history with Hyper-V's competitors. VMware shipped open-vm-tools but the deepest paravirt drivers (VMXNET3, PVSCSI) lived in vendor packages. Xen's PV drivers existed in-tree but their evolution depended on Citrix and the Xen project. By contributing the full driver stack upstream and committing to keep it there, Microsoft chose a different route: they put the spec (the TLFS) and the implementation (LIS) in the open at the same time.
Microsoft did not just publish a hypervisor specification and hope Linux would adopt it. They wrote the Linux drivers themselves and upstreamed them, and then they kept doing it for fifteen years.
You can see the maintenance pattern in any current kernel. The drivers/hv/ directory has continuous commit activity from Microsoft engineers. Kernel-doc files like the VMBus, clocks, vPCI, overview, and CoCo VM pages are written by the same engineers who write the drivers. Several of those documents are the most lucid descriptions of the architecture that exist anywhere in public.
What "in-box" really means
Both major guests now ship VMBus support without any post-install step:
- On Windows, the VMBus client stack is built into every supported Windows version since Windows 7 / Windows Server 2008 R2. The legacy Integration Services package, which once shipped as an ISO you mounted into the VM, is no longer needed on supported Windows.
- On Linux, the drivers are in-tree from kernel 2.6.32 (December 2009) onward and ship in every mainstream distro.
The kernel.org Hyper-V overview document explicitly warns against installing legacy LIS packages on top of a kernel that already has the in-tree drivers: it can break MSI-X handling and PCI passthrough. This is the kind of operational footgun that survives precisely because the in-box answer is correct and the LIS package is a holdover from earlier kernels.
A practical smoke test
You can confirm a Linux guest is using its enlightenments without any vendor tooling. The kernel exposes cpuid leaves and Hyper-V detection through dmesg and through /sys. A small script makes it concrete:
// This logic mirrors what `dmesg | grep -i hyperv` and a peek into
// /sys/bus/vmbus would tell you on a real Linux Hyper-V guest.
const guestObservations = {
  cpuidSig: '0x40000000', // cpuid leaf exposing the hypervisor vendor signature
  guestOsIdMsr: 0x40000000, // HV_X64_MSR_GUEST_OS_ID, written by the guest
  hypercallMsr: 0x40000001, // HV_X64_MSR_HYPERCALL, maps the hypercall page
  vmbusModuleLoaded: true,
  netvscDevice: '/sys/class/net/eth0/device/driver',
  netvscDriverName: 'hv_netvsc',
  storvscModuleLoaded: true,
};
function isEnlightenedHyperVGuest(o) {
  if (o.cpuidSig !== '0x40000000') return false; // no Hyper-V cpuid leaves
  if (!o.vmbusModuleLoaded) return false;        // VMBus core not bound
  if (o.netvscDriverName !== 'hv_netvsc') return false; // NIC is not synthetic
  return true;
}
console.log(
  isEnlightenedHyperVGuest(guestObservations)
    ? 'Yes: Hyper-V enlightened, using netvsc + storvsc'
    : 'No: running on emulated PC hardware or non-Hyper-V hypervisor'
);
The point is not the script itself (anyone can write a few lines of awk against dmesg); it is that the verification surface is public. The CPU vendor signature, the MSRs, the kernel module names, the /sys paths are all documented. There is nothing to reverse-engineer.
Why this earned trust
Two pieces of practical evidence persuaded the Linux community that LIS was not a strategic trap:
- The drivers stayed upstream. From 2009 to the present, Microsoft has maintained the drivers/hv/ tree, responded to maintainer feedback, and shipped patches through the normal kernel process.
- The TLFS stayed accurate. Successive Hyper-V releases either matched what the TLFS said or updated the TLFS. There was no second, secret protocol.
The combination put Microsoft in the unusual position of being the most open hypervisor vendor for Linux guest support. (VirtIO on KVM has a richer cross-vendor story; that comparison is Section 8.) This open posture is also what set up the 2024 OpenVMM open-sourcing as a credible move rather than a stunt.
But before we get to OpenVMM, we need to look at a different way Hyper-V matters: not just as a substrate for VMs, but as a substrate for in-VM security boundaries inside Windows itself.
6. VBS and HVCI: Hyper-V as the trust anchor inside Windows
Up to this point the article has treated Hyper-V as a virtualization product: a thing that hosts VMs. Starting in Windows 10 and Windows Server 2016, Microsoft began using the same hypervisor for a different job: enforcing security boundaries inside a single OS install. The umbrella name is Virtualization-Based Security (VBS).
The mechanism is simple in description and subtle in consequences. The hypervisor splits a single guest's address space into two Virtual Trust Levels (VTLs). The lower one, VTL0, runs the normal Windows kernel and user mode (this is where explorer.exe and your browser live). The higher one, VTL1, runs a much smaller stack called the Secure Kernel plus a set of isolated user-mode services called trustlets. A compromise of VTL0, even of ntoskrnl.exe, cannot read or write VTL1 memory because the hypervisor enforces that boundary using the same hardware machinery (Intel EPT / AMD NPT, plus Intel VT-d / AMD-Vi for DMA) that it uses to isolate one VM from another.
Virtual Trust Level (VTL): a Hyper-V construct that partitions a single guest's address space into multiple privilege tiers enforced by the hypervisor. VTL0 hosts the normal kernel and user mode; VTL1 hosts the Secure Kernel and trustlets. The hypervisor presents each VTL with its own separate set of memory mappings, system registers, and interrupt state, so code running at VTL0 cannot read VTL1's memory even if it has run-as-NT-AUTHORITY-SYSTEM privilege.
Diagram source
flowchart TD
HV["Hyper-V hypervisor"]
subgraph Guest["A single Windows guest"]
subgraph VTL0["VTL0 (normal world)"]
User0["User mode: apps"]
Kernel0["NT kernel"]
end
subgraph VTL1["VTL1 (secure world)"]
SK["Secure Kernel"]
Trustlets["Trustlets: LsaIso, BioIso, ..."]
end
end
HV --> Guest
HV -. "EPT + IOMMU enforcement" .-> VTL0
HV -. "EPT + IOMMU enforcement" .-> VTL1
Kernel0 -. "VTL switch (hypercall)" .-> SK
What lives in VTL1
The flagship inhabitant of VTL1 is Hypervisor-protected Code Integrity (HVCI), which moves kernel-mode page-table integrity checking into the Secure Kernel. With HVCI on, no VTL0 driver can mark a kernel page as both writable and executable; the Secure Kernel mediates the page tables and refuses the request. The result is that attackers who already have code execution in the NT kernel cannot trivially load arbitrary unsigned kernel code or build new executable JIT pages on the fly.
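The HVCI invariant reduces to a single refused state transition. A toy model (the real mediation happens on extended page tables, not on a JavaScript object, and the permission encoding here is invented):

```javascript
// Toy model of the HVCI rule: the Secure Kernel (VTL1) mediates kernel
// page-permission changes requested from VTL0 and refuses writable+executable.
const PAGE_W = 0b01; // writable
const PAGE_X = 0b10; // executable

function secureKernelSetPermissions(page, requested) {
  if ((requested & PAGE_W) && (requested & PAGE_X)) {
    return { ok: false, reason: 'W+X refused by VTL1' }; // the HVCI invariant
  }
  page.perms = requested; // any W^X-respecting combination is allowed
  return { ok: true };
}

const page = { perms: 0 };
console.log(secureKernelSetPermissions(page, PAGE_W | PAGE_X).ok); // → false
console.log(secureKernelSetPermissions(page, PAGE_X).ok);          // → true
```

The crucial property is not the check itself (any kernel could write that `if`) but where it runs: in VTL1, in memory the VTL0 attacker cannot patch.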
The other tenants of VTL1 are trustlets. The most familiar is lsaiso.exe (LSA Isolation), which holds the cached domain credentials that historically lived in lsass.exe and were the prime target for tools like Mimikatz. With Credential Guard on, those secrets move to a trustlet whose memory is unreadable from VTL0; even SYSTEM-level malware in the normal world cannot extract them. Other trustlets handle biometric template storage, key isolation for code integrity policy, and similar small, security-sensitive workloads.
Why the hypervisor is the right place for this
Putting these protections inside the hypervisor rather than inside the kernel has a property that no in-kernel mitigation can match: the protected component does not share an address space with the attacker. A defence built inside ntoskrnl.exe (PatchGuard, KASLR, control-flow guard) lives in the same memory the attacker is trying to corrupt. A defence built inside VTL1 lives in memory the attacker cannot touch, because the page tables that map it are themselves invisible from VTL0.
How this connects back to VMBus
VBS would not be possible without the work the previous sections described. The Secure Kernel is what runs in VTL1; it needs to communicate with VTL0 for ordinary system services (the lsaiso.exe process must respond to authentication requests from VTL0 callers, the HVCI mediator must answer page-table requests, and so on). The signalling and shared-memory primitives that make those calls cheap are the same SynIC and shared-page primitives that VMBus uses between partitions.
In other words, the architecture Microsoft built in 2008 to give a Windows VM a fast network card became, in 2016, the architecture that gives a single Windows install a security boundary stronger than its own kernel. The same hypervisor, the same trust-mediation primitives, two completely different applications.
Windows Server 2016 extended this further with Hyper-V isolation for containers, where a container runs inside a lightweight utility VM with its own kernel. The pattern is consistent: every time Windows wanted a stronger isolation primitive, the answer was "use the hypervisor."
This dual-use is the reason a serious Windows security review touches the Hyper-V codebase even on machines that nobody thinks of as virtualization hosts. A Hyper-V escape (a guest-to-host VMBus exploit) is not just "an exploit against Azure"; it is also, on a typical Windows 11 desktop with VBS enabled, an exploit against the boundary that protects LSASS secrets from kernel-mode malware.
That makes the next section's question urgent: how strong is the VMBus boundary, in practice?
7. VMBus security: every message is a parser at the trust boundary
Here is the part of the architecture worth being honest about. The same property that makes VMBus fast, namely that the host-side VSP runs in the root partition's kernel and parses guest-supplied bytes directly, also makes the VSP the most consequential piece of attack surface in the entire stack. Microsoft itself prices it that way: the Hyper-V Bug Bounty Program pays up to USD 250,000 specifically for guest-to-host escapes that hit this surface, which is among the highest payouts Microsoft offers for any category of vulnerability.
Every byte that crosses a VMBus channel from a guest is a byte that a kernel-mode parser in the most privileged partition on the host has to interpret. The performance argument for a software data plane and the security argument against it are the same argument, looked at from opposite directions.
The historical record
Three CVEs make the pattern concrete:
- CVE-2017-0075 is the Hyper-V escape that the Qihoo 360 Vulcan Team demonstrated at Pwn2Own 2017. The NVD entry describes it as a Hyper-V flaw that "allows guest OS users to execute arbitrary code on the host OS via a crafted application." The reachable code was in a VMBus message handler on the host side.
- CVE-2021-28476 is the canonical example. The NVD record classifies it as a critical Hyper-V remote code execution vulnerability with a CVSS score of 9.9. The Akamai writeup with Guardicore and SafeBreach traces the bug to vmswitch.sys, the synthetic-NIC VSP, and shows it had been present in production since the August 2019 vmswitch build. The exploit primitive is exactly what the architecture invites: a guest crafts an OID-style RNDIS request, sends it through the netvsc VMBus channel, and the host's kernel parser misvalidates a length, producing memory corruption in the most privileged kernel on the box.
- CVE-2024-21407 is a more recent Hyper-V remote code execution vulnerability patched in March 2024 (NVD). Its existence demonstrates that the bug class did not vanish; the same shape (guest-controlled message, host kernel parser, escalation to host code execution) keeps reappearing.
Why the bug class is structural
The pattern in all three CVEs is the same:
- A guest writes carefully crafted bytes into a VMBus channel ring.
- The guest fires the doorbell.
- The host's VSP, running in the root partition's kernel, dequeues the message.
- The VSP parses the message in C or C++ kernel code.
- A memory-safety mistake (length confusion, missing bounds check, integer overflow) becomes a write or read primitive in the host kernel.
There is no exotic mechanism here. The exploit surface is "kernel C code parsing untrusted input," which has been the dominant source of remote-code-execution bugs in operating systems since the 1990s. The novelty is the location: the parser sits below the most privileged supervisor on the box, with full access to every other tenant's memory.
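The length-confusion step in that five-step pattern is worth seeing concretely. The sketch below uses Python standing in for the kernel C, with an invented two-field header (nothing here is the real RNDIS or VMBus wire format): the unchecked parser trusts the guest's claimed payload length, the checked one validates it against the bytes actually present.

```python
import struct

HDR = struct.Struct("<HH")  # invented header: (msg_type, payload_len), little-endian

def parse_unchecked(ring_bytes):
    # BUG pattern: trust the guest-supplied length field. Python slicing
    # silently truncates, but the kernel-C equivalent is a memcpy of
    # claimed_len bytes running past the end of the received buffer.
    msg_type, claimed_len = HDR.unpack_from(ring_bytes, 0)
    return ring_bytes[HDR.size : HDR.size + claimed_len]

def parse_checked(ring_bytes):
    # FIX pattern: every guest-controlled length is validated against the
    # number of bytes the host actually received before any copy happens.
    if len(ring_bytes) < HDR.size:
        raise ValueError("short message")
    msg_type, claimed_len = HDR.unpack_from(ring_bytes, 0)
    if claimed_len > len(ring_bytes) - HDR.size:
        raise ValueError("claimed length exceeds bytes received")
    return ring_bytes[HDR.size : HDR.size + claimed_len]

# A malicious guest claims 65535 payload bytes but supplies two.
evil = HDR.pack(1, 0xFFFF) + b"AB"
```

In C, the unchecked version is an out-of-bounds read or write in the root partition's kernel; the checked version is the entire fix for a bug of this shape.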
Diagram source
sequenceDiagram
participant Mal as Malicious guest VM
participant Ring as VMBus ring (shared memory)
participant SInt as Synthetic Interrupt Controller
participant VSP as Host VSP (e.g., vmswitch.sys, kernel)
Mal->>Ring: Write crafted RNDIS-style message
Mal->>SInt: Hypercall: signal channel event
SInt-->>VSP: SINT delivered on host CPU
VSP->>Ring: Read message header
note over VSP: Length confusion / missing bounds check
VSP->>VSP: Out-of-bounds write in root partition kernel
note over VSP: Result: arbitrary code in the most privileged partition
Mitigations short of a rewrite
Microsoft's first line of defence is the same one every kernel team uses: ASLR, control-flow integrity, kernel hardening, fuzzing the parsers, code review of every new device class, and, on Azure specifically, isolating each tenant's compute hypervisor so a single compromised host does not become a multi-tenant disaster. The MSRC bounty program is partly a procurement mechanism for this same effort: pay researchers to find and report bugs before attackers find them in the wild.
A second line of defence is Generation-2 VMs (Microsoft Learn), which remove the legacy emulators (IDE, PS/2, PIC) from the host data path entirely. Every emulator removed is one fewer parser in the most privileged kernel.
A third is the Microsoft Hyper-V architecture page's "minimise root-partition exposure" guidance: configure hosts with the smallest set of root-partition services that the workload requires, since every service is potential surface.
These all help, but none of them change the structural fact that VSPs parse guest-controlled data in C/C++ kernel code. The next architectural shift, the one that does change that fact, is what Section 9 is about.
Side channels and the Spectre era
VMBus also has to defend against side-channel attacks across the partition boundary. The same Spectre / Meltdown / L1TF mitigations that apply to a multi-tenant hypervisor in general apply to Hyper-V specifically. Microsoft's broader hypervisor mitigation strategy interacts with VMBus mostly indirectly: the SynIC, the hypercall page, and the timer subsystem all needed audit and adjustment when these classes of attacks emerged. The detail is largely outside the scope of an article about the device model, but the takeaway is consistent with the rest of this section: any shared CPU resource between partitions is a potential attack surface, and "shared via the hypervisor's bus" is no exception.
The structural answer to all of this, the one Microsoft itself has been working toward, is to change the languages and the trust boundaries. To set that up, the next section first widens the field by comparing VMBus to its peer in the KVM world, virtio.
8. VMBus vs virtio: two answers to the same question
Hyper-V is not the only hypervisor with a paravirt I/O story. The KVM world evolved its own answer to the same problem at roughly the same time, and it ended up with a different design with different trade-offs. The standard is virtio.
The original virtio paper, Rusty Russell's "virtio: Towards a De-Facto Standard For Virtual I/O Devices", was published at OLS 2008, the same year Hyper-V shipped. The proposal was explicit in its motivation: every hypervisor was reinventing paravirt drivers, and a single hypervisor-independent specification could let one guest driver work everywhere. OASIS later standardised virtio 1.0 in 2016, then virtio 1.1 in 2019, then virtio 1.2 as a Committee Specification in 2023.
A hypervisor-independent paravirtual I/O specification, governed by OASIS. A virtio device is presented to the guest over a transport (PCI, MMIO, or s390 channel I/O) that advertises capability bits. The data plane is a generic ring layout called a virtqueue: a ring of descriptors, an avail ring (guest-to-host), and a used ring (host-to-guest). Each device class (virtio-net, virtio-blk, virtio-scsi, virtio-fs, virtio-gpu) defines its own message format on top of virtqueues.
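The descriptor/avail/used split can be modelled in a few lines. This is a simulation of the flow, not the virtio memory layout (real virtqueues are index rings in shared memory with feature-negotiated semantics; the class and method names here are invented):

```python
# Split-virtqueue sketch: the guest places descriptor indices on the avail
# ring; the host consumes them in order and returns completions, with a
# byte count, on the used ring.
class ToyVirtqueue:
    def __init__(self, size):
        self.desc = [None] * size   # descriptor table (here: raw payloads)
        self.avail = []             # guest -> host: indices ready to process
        self.used = []              # host -> guest: (index, bytes_processed)

    def guest_submit(self, idx, payload):
        self.desc[idx] = payload
        self.avail.append(idx)      # equivalent of bumping avail->idx

    def host_service(self):
        while self.avail:
            idx = self.avail.pop(0)
            processed = len(self.desc[idx])   # the "device" handles the buffer
            self.used.append((idx, processed))

vq = ToyVirtqueue(8)
vq.guest_submit(0, b"packet-1")
vq.guest_submit(1, b"pkt2")
vq.host_service()
```

The asymmetry is the design: the guest only ever appends to avail, the host only ever appends to used, so neither side ever writes memory the other is producing into.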
The same shape, viewed sideways
Architecturally, virtio and VMBus are sibling answers to the same shaped problem.
Diagram source
flowchart LR
subgraph virtio_pci["virtio over PCI"]
gv["Guest virtio driver"]
vq["virtqueue (descriptors + avail + used)"]
host_be["Host backend (vhost-net, vhost-user, OpenVMM)"]
gv -- "PIO doorbell write" --> host_be
gv -- "shared memory" --- vq
host_be -- "shared memory" --- vq
host_be -- "MSI-X" --> gv
end
subgraph vmbus["Hyper-V VMBus"]
gv2["Guest VSC"]
ring["Two ring buffers + GPADL"]
vsp["Host VSP (kernel)"]
gv2 -- "Hypercall doorbell" --> vsp
gv2 -- "shared memory" --- ring
vsp -- "shared memory" --- ring
vsp -- "SINT" --> gv2
end
Both:
- Use shared-memory rings for payload. The phrase "shared-memory rings" hides a small subtlety: a ring buffer is a circular buffer with separate read and write indices. Producer and consumer can run concurrently as long as they only touch their own index, which is what makes ring buffers a wait-free communication primitive on cache-coherent hardware.
- Use a doorbell for signalling.
- Batch many requests per doorbell so per-message hypercall cost amortises.
- Have per-class device protocols layered on top of a common transport.
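The read/write-index discipline behind the first bullet can be shown in a few lines. This is a single-threaded simulation, not the VMBus or virtqueue layout (real rings add interrupt-masking fields and memory barriers), but the ownership rule is the real one: the producer owns the write index, the consumer owns the read index, and neither touches the other's.

```python
class SpscRing:
    """Single-producer single-consumer ring with monotonically increasing
    indices; occupancy is write_idx - read_idx, so no slot is wasted."""
    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.capacity = capacity
        self.write_idx = 0   # total items ever produced (producer-owned)
        self.read_idx = 0    # total items ever consumed (consumer-owned)

    def produce(self, item):
        if self.write_idx - self.read_idx == self.capacity:
            return False                          # ring full: caller backs off
        self.buf[self.write_idx % self.capacity] = item
        self.write_idx += 1                       # publish after the write
        return True

    def consume(self):
        if self.read_idx == self.write_idx:
            return None                           # ring empty
        item = self.buf[self.read_idx % self.capacity]
        self.read_idx += 1
        return item
```

On cache-coherent hardware the same shape, plus barriers to order the payload write before the index publish, is what lets both sides run concurrently without locks.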
The differences are where the world bites:
| Dimension | VMBus | virtio (1.2) |
|---|---|---|
| Transport | Software-only "bus", channel offer/open/close | PCI, MMIO, s390 channel I/O |
| Doorbell | Hypercall (HV_SIGNAL_EVENT) | PIO write to a doorbell BAR |
| Reverse signal | Synthetic interrupt (SINT) | MSI-X |
| Standardisation | Microsoft-owned, Open Specification Promise | OASIS-ratified, multi-vendor |
| Windows in-box drivers | Yes, every supported version | No; separately installed signed virtio-win INFs from cloud vendors |
| Device classes beyond I/O | Yes: KVP, time sync, VSS, balloon | Limited; non-I/O often built on virtio-vsock or out-of-band agents |
| Cross-hypervisor portability | Hyper-V only | Universal: KVM, QEMU, Cloud Hypervisor, Firecracker, Xen HVM, OpenVMM |
| Spec governance | Single vendor under OSP | Multi-vendor with formal conformance clauses |
| Source for Linux side | drivers/hv/ | drivers/virtio in the Linux tree |
Where each design wins
Virtio's strongest claim is portability. The same Linux guest VM image, with the same in-tree virtio drivers, runs on KVM, QEMU, Cloud Hypervisor, AWS Firecracker, and (since 2024) Microsoft's own OpenVMM, which added virtio backend support. A workload that has to move between cloud providers benefits from this directly: the guest does not need a different driver stack per host.
Virtio also has a richer multi-vendor governance story. The spec is OASIS-ratified, with explicit conformance clauses; multiple commercial hypervisors implement it; multiple SmartNIC vendors implement virtio data planes in hardware (the vDPA and VDUSE work, described by Red Hat and the Linux kernel VDUSE doc).
VMBus's strongest claim is integration. Every supported Windows ships with the VSCs in-box; there is nothing for an admin to install. The transport carries not just I/O but a service catalogue: KVP for guest configuration, time sync, VSS for online backup, the heartbeat and shutdown channels. The TLFS, while owned by Microsoft, is published under the Open Specification Promise and is a single document a guest author can read end-to-end.
This is why "VirtIO drivers for Windows" exist as a separate project (the Fedora/Red Hat-signed virtio-win package) for KVM clouds: out of the box, Windows does not know virtio. The Hyper-V world inverts the problem: out of the box, Linux does not need any third-party install because the drivers are upstream.
Where they coexist
The most interesting recent development is that the two camps have stopped being purely competitive. Microsoft's OpenVMM implements both VMBus and virtio backends, so a Linux guest using virtio drivers can run on a Microsoft-developed VMM, and a Windows guest using VMBus drivers can run on the same VMM. This is partially ideological (Microsoft is no longer pretending its way is the only way) and partially pragmatic (a single VMM that supports both transports is simpler than maintaining two).
Beyond the protocol-level comparison, both VMBus and virtio sit inside a larger composition with hardware passthrough, where the transport becomes the slow path and a real PCIe device carries the steady-state traffic.
Hardware passthrough as a complement
The composition that runs almost every modern Azure VM is VMBus + SR-IOV, packaged as Accelerated Networking. The same VM gets both a synthetic NIC (netvsc over VMBus) and an SR-IOV virtual function. The Linux netvsc driver documentation describes the failover mechanic: "If SR-IOV is enabled in both the vSwitch and the guest configuration, then the Virtual Function (VF) device is passed to the guest as a PCI device. In this case, both a synthetic (netvsc) and VF device are visible in the guest OS and both NIC's have the same MAC address. The VF is enslaved by netvsc device. The netvsc driver will transparently switch the data path to the VF when it is available and up." (Linux kernel: netvsc).
When live migration starts, Azure revokes the VF, the data plane falls back to the netvsc/VMBus path, the VM moves, and a new VF on the destination host gets re-attached, all without dropping TCP connections. The VMBus path was never the production hot path, but its existence is what enables migration. The KVM world's analogue is vDPA, which gives a virtio-shaped guest interface backed by a hardware data plane.
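The failover mechanic reduces to a per-packet data-path selection. The sketch below is a simulation of the policy the netvsc documentation describes, not the driver's code (class and method names are invented):

```python
# Sketch of the netvsc-style failover policy: transmit on the SR-IOV VF
# whenever one is attached and up, otherwise fall back to the synthetic
# (VMBus) path. Both paths share one MAC address, so upper layers and
# TCP connections never notice the switch.
class FailoverNic:
    def __init__(self):
        self.vf_present = False
        self.vf_up = False
        self.tx_log = []

    def attach_vf(self):   # host passes the VF through (steady state)
        self.vf_present, self.vf_up = True, True

    def revoke_vf(self):   # host revokes the VF (live migration starts)
        self.vf_present, self.vf_up = False, False

    def send(self, pkt):
        path = "vf" if (self.vf_present and self.vf_up) else "synthetic"
        self.tx_log.append(path)
        return path

nic = FailoverNic()
nic.send(b"a")      # no VF yet: synthetic path
nic.attach_vf()
nic.send(b"b")      # VF path
nic.revoke_vf()     # migration window
nic.send(b"c")      # back on the synthetic path, connections preserved
```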
A modern Azure NIC stack is pushing this even further. Azure Boost moves both storage and networking data planes off the host CPU into dedicated FPGAs, with a stable Microsoft-engineered NIC interface called MANA. Microsoft's documentation reports up to 200 Gbps of network bandwidth and 6.6 million IOPS on local storage with this design, with the host's vmswitch still acting as the live-migration fallback path. The architectural insight is that the VMBus-based slow path is the durable invariant; what changes is whether the steady-state data plane is software, an SR-IOV VF, or a SmartNIC firmware path. Frameworks like DPDK sit on top of whichever data plane the VM exposes.
What none of this changes is the property Section 7 cared about: as long as a host-side VSP exists and parses guest-controlled bytes in kernel C/C++, the bug class is open. The next section is about the architectural move that closes it.
9. OpenVMM and OpenHCL: the 2024 open-source pivot
In 2024, Microsoft did two things that would have been hard to imagine a decade earlier. First, they open-sourced OpenVMM, a Rust implementation of the virtualization stack including the VSPs and the VMBus protocol. Second, they introduced OpenHCL, a "paravisor" configuration of OpenVMM that runs inside a confidential VM as a higher-trust mediator between the workload and the (now-untrusted) host.
Both moves are explained by the same trend the article has been circling: confidential computing fundamentally inverts the trust boundary, and the device model has to follow.
A higher-privileged software layer that runs inside a guest VM (not on the host) and mediates the guest's interaction with the hypervisor. In the Hyper-V model, a paravisor lives in VTL2 of the same VM whose workload runs in VTL0; the host hypervisor is outside the VM's trust boundary. The paravisor presents the workload with a familiar VMBus + VSP interface while internally talking to a hardware-isolated confidential VM substrate (AMD SEV-SNP or Intel TDX).
What changed in confidential computing
The classical Hyper-V trust model places the root partition at the apex of trust. The guest trusts the host. Memory the guest writes is, in the worst case, readable by the host. In confidential computing, that is no longer acceptable. A regulated workload (a healthcare database, a financial processor) needs to run in a VM whose contents are protected even from a malicious or compromised hypervisor. AMD's SEV-SNP and Intel's TDX are CPU features that encrypt and integrity-protect VM memory in hardware so that a compromised host cannot read the guest's secrets.
Azure Confidential Computing made these capabilities available as a product starting around 2022. The Azure confidential VM options page documents the SKUs.
This breaks the old VMBus story. In the classical model, the host's vmswitch.sys reads the guest's network packets out of the VMBus ring. In a confidential VM, that is exactly what the protection forbids: letting the host read those bytes would defeat the entire point. So the question becomes: where does the synthetic-device backend live, if not in the host?
The paravisor answer
The Linux kernel's Hyper-V CoCo VMs document describes the design directly: "Paravisor mode. In this mode, a paravisor layer between the guest and the host provides some operations needed to run as a CoCo VM. The guest operating system can have fewer CoCo enlightenments than is required in the fully-enlightened case ... some aspects of CoCo VMs are handled by the Hyper-V paravisor while the guest OS must be enlightened for other aspects."
OpenHCL is that paravisor. It runs in a higher-trust virtual trust level inside the same confidential VM (VTL2), it has access to the encrypted-memory primitives the CPU provides, and it presents the workload (in VTL0) with the same VMBus + VSP world a non-confidential VM would see. The workload OS does not need to be heavily modified; it sees what looks like Hyper-V, talks to what look like normal VSPs, and never has to know that those VSPs are now inside its own VM rather than on the host.
Diagram source
flowchart TD
HW["Confidential CPU (SEV-SNP / TDX)"]
HV["Host hypervisor (untrusted by the workload)"]
subgraph CoCoVM["Confidential VM (memory encrypted)"]
VTL2["VTL2: OpenHCL paravisor (Rust VSPs)"]
VTL0["VTL0: workload OS (Windows or Linux, lightly enlightened)"]
VTL0 -- "VMBus, looks normal" --- VTL2
end
HW --> HV
HV --> CoCoVM
HV -. "no access to guest plaintext" .-> CoCoVM
The Rust rewrite
The other half of the story is memory safety. Recall Section 7's CVE list: every headline Hyper-V escape in the past decade involved a parser bug in C/C++ kernel code. OpenVMM's choice to implement the entire VMM, including the VSPs, in Rust is a direct response to that history. Rust's ownership model rules out, by construction, a large class of memory-safety bugs (use-after-free, out-of-bounds access on slices, double-free) that produced those CVEs.
This does not magically eliminate every vulnerability. A logic bug in a state machine, an integer-overflow on a length field, a side-channel timing leak: all of these still exist in Rust. But the categories that produced CVE-2017-0075, CVE-2021-28476, and CVE-2024-21407 are exactly the categories Rust was designed to make hard.
What you can actually look at
OpenVMM is not a press release; it is a public repository that ships:
- The full Rust source tree at github.com/microsoft/openvmm.
- A separate repository for the Linux kernel fork that the paravisor runs on top of, at github.com/microsoft/OHCL-Linux-Kernel.
- Project documentation centred at openvmm.dev.
- Both VMBus and virtio backends, so the same VMM can host Windows guests on VMBus and Linux guests on virtio.
- Documentation through the deeper Microsoft Tech Community explainer and the original announcement describing the paravisor's role.
For a security researcher or a regulated-cloud customer, this is a meaningful change. For the first time, the VMBus + VSP stack is auditable end-to-end in source.
Try it: read the OpenVMM VMBus channel-state code
If you want to see how a VSP actually consumes a channel, the OpenVMM repository contains the Rust modules that implement the VMBus channel state machine. Cloning the repo and grepping for Channel::open and RingBuffer shows the same offer/open/close/rescind pattern Section 3 described, expressed in Rust types whose lifetimes the compiler checks. Reading the same logic in Rust after reading the Linux C version in drivers/hv/channel_mgmt.c is a useful exercise; the abstraction is identical, and the safety guarantees diverge.
What still has to be solved
The kernel CoCo doc is candid about an open architectural problem that OpenHCL alone cannot solve: "Unfortunately, there is no standardized enumeration of feature/functions that might be provided in the paravisor, and there is no standardized mechanism for a guest OS to query the paravisor for the feature/functions it provides. The understanding of what the paravisor provides is hard-coded in the guest OS." (Linux kernel: CoCo VMs).
In other words, the TLFS gave us a portable contract between guests and Hyper-V hypervisors. The paravisor world does not yet have an equivalent portable contract between guests and paravisors. Today's guests have OpenHCL-specific knowledge baked in. A future "paravisor TLFS" would let any compliant paravisor host any compliant guest, the same way the original TLFS did for the hypervisor. That standard does not exist yet, and writing it is the most consequential open problem in this corner of the architecture.
The architecture is moving. Section 10 takes stock of what that means for engineers building or operating on this stack today.
10. Engineering takeaways and open problems
A working architecture is one where the trade-offs are visible. Hyper-V's enlightenments + VMBus + VSP/VSC stack is a working architecture in exactly that sense: every property it has, including the security ones, is a consequence of design choices a reader can name.
What the design optimises for
Three explicit optimisations:
- In-box drivers for closed-source guests. Hardware virtualization handles privileged CPU instructions; the guest only needs to load a VMBus client driver to opt in to the fast path. Every supported Windows ships those drivers in-box. Every modern Linux ships them in-tree. There is no "install paravirt drivers" step, which is a large reason "it just works."
- A single transport that carries everything. VMBus carries 12+ device classes plus non-device services (KVP, time sync, VSS, balloon, heartbeat). One protocol, one set of primitives, one debugging surface. This is the engineering equivalent of "everything is a file" applied to inter-partition communication.
- Live migration. Because the data plane is software in the root partition, the VM is not bound to a specific host. The VSPs serialise their state during migration without guest cooperation. This is the property that makes VMBus the durable invariant under hardware-passthrough acceleration: SR-IOV gives you throughput; VMBus gives you mobility.
What it pays for those properties
Two costs:
- The host CPU is on the data plane. A software ring serviced by vmswitch.sys cannot match a 100 GbE NIC's line rate per host CPU core. Microsoft's answer is hybrid composition with SR-IOV (Accelerated Networking) and SmartNIC offload (Azure Boost + MANA). The KVM analogue is vDPA. Both accept the structural truth that for the highest throughputs, the host CPU has to leave the data plane.
- The host kernel parses guest-controlled bytes. Section 7's CVE record is the catalogue of what that costs. The architectural answer is OpenHCL: move the parser into the guest's own trust boundary and rewrite it in Rust.
A four-property idealisation
It is useful to write down what an idealised paravirt I/O stack would do, so it is clear which properties any real stack today is trading away.
The four idealised properties:
- Zero hypercalls per packet in steady state.
- Live-migration parity with a software baseline.
- Cross-vendor / cross-hypervisor portability of the guest driver.
- No host-side memory-unsafe parser of guest-controlled data.
| Approach | (1) Zero hypercall | (2) Live migration | (3) Portability | (4) No unsafe host parser |
|---|---|---|---|---|
| VMBus + in-kernel VSP | partial (batched) | yes | no | no |
| virtio + vhost-net | partial (batched) | yes | yes | no |
| SR-IOV / DDA | yes | no | no | yes |
| Accelerated Networking (VMBus + SR-IOV) | yes (steady) | yes | no | no |
| vDPA | yes | partial | yes | no |
| OpenHCL paravisor + VMBus | partial | yes | partial | yes |
| Azure Boost + MANA | yes | yes | no | partial |
No single approach today matches all four properties. The Hyper-V production composition is roughly (VMBus baseline) + (Accelerated Networking for throughput) + (OpenHCL for confidential workloads). The KVM-world composition is (virtio baseline) + (vDPA / SmartNIC for throughput). SmartNIC-based stacks (Azure Boost, AWS Nitro, Google's offload) approach the same four-corner problem from yet another angle.
This is a synthesis, not a single-source claim: the matrix combines properties documented separately in the Microsoft Accelerated Networking docs, the Linux kernel CoCo doc, the Discrete Device Assignment doc, the SR-IOV overview, the Linux netvsc driver doc, the VDUSE userspace interface, the vPCI doc, and the OpenHCL explainer. Each individual cell is sourced; the ranking is the author's reading of those sources.
Practical pitfalls for operators
A few things the customer-facing docs do not always say plainly:
- The synthetic keyboard/mouse (HID) channel is not low-risk. It is a kernel-level RPC surface from guest to root; treat it the same way you would treat netvsc when modelling threat exposure.
- Generation-2 VMs reduce attack surface. Choosing Generation-2 for new workloads removes the legacy IDE/PS/2/PIC emulators from the host data path entirely (Microsoft Learn: Gen 1 vs Gen 2).
- Mixing in-box and out-of-band Integration Services breaks things. Modern Windows and modern Linux already have the drivers; installing the legacy LIS package on top can break MSI-X handling and PCI passthrough (Linux kernel: overview).
- DDA is not SR-IOV. Discrete Device Assignment covers any PCIe device passthrough, but Microsoft formally supports only GPUs and NVMe as device classes (Microsoft Learn: DDA planning).
- Confidential VMs do not have the same device set. Hardware constraints reduce or alter the device classes available; always validate the specific synthetic devices your workload depends on are present in the target SKU (Linux kernel: CoCo).
Open problems worth watching
The substantive open problems are:
- A standardised paravisor feature-enumeration interface. OpenHCL is the first auditable paravisor, but there is no portable contract a guest can use to query "what does this paravisor support." The TLFS gave us this for hypervisors; the paravisor analogue is missing (Linux kernel: CoCo).
- Confidential-VM-friendly live migration with paravirt devices. Hardware-attested state cannot be cloned trivially; today's pragmatic answer is to constrain migration in CoCo VMs. A general solution is open.
- A formal model of the VMBus offer/rescind state machine. The kernel docs describe it narratively. A model that the VSP code could be checked against would let static analysis rule out the bug class behind the headline CVEs.
- Live-migrating stateful SR-IOV VFs without device cooperation. Vendor proposals exist; an industry standard does not.
- Erasing memory-unsafety in legacy VSPs. The Rust rewrite path in OpenVMM is correct; the multi-year engineering effort to convert every existing VSP is real. CVE-2024-21407 is recent enough to remind everyone the bug class is still producing fresh entries.
What to remember in five years
The most important sentence in this article is one I have been quietly preparing throughout: the durable architectural invariant in Hyper-V is shared-memory ring + doorbell, with a published guest-side contract. Everything else, including the choice of programming language for the VSP, the question of whether the data plane is software or hardware, and even whether the trust boundary places the VSP on the host or in a paravisor, is implementation. The transport is the invariant. That is the lesson the next decade of CoCo VMs and SmartNIC offload is converging toward: keep the contract stable, and let everything else change.
FAQ
Frequently asked questions
Do I have to install Linux Integration Services on a modern Linux Hyper-V guest?
No. The drivers (hv_vmbus, hv_netvsc, hv_storvsc, hv_utils, pci-hyperv, hv_balloon) have been in the upstream Linux kernel since 2.6.32 in December 2009 and ship in every mainstream distribution. The legacy LIS package is a holdover from the era before in-tree support and can in fact break MSI-X handling and PCI passthrough if installed on top of a modern kernel (Linux kernel: Hyper-V overview).
Why is the host-side VSP code path the security focus, not the guest-side VSC?
Because the trust gradient is asymmetric. The VSP runs in the root partition's kernel, the most privileged context on the box; the VSC runs in a normal guest kernel. Bytes flowing from guest to host get parsed by code with full system privilege. A VSC bug typically harms only the guest; a VSP bug can be a cross-tenant compromise. The pattern is visible in the CVE record: CVE-2017-0075, CVE-2021-28476, and CVE-2024-21407 all hit host-side parsers.
If Accelerated Networking uses an SR-IOV VF for the steady-state path, why keep VMBus at all?
For live migration. SR-IOV gives you near-bare-metal throughput but binds the VM to a specific physical NIC; you cannot migrate that state. Keeping a VMBus-backed netvsc device in the same guest gives the hypervisor a software path it can fall back to during migration windows. The Linux kernel netvsc doc describes this failover explicitly: when SR-IOV is enabled, the VF is enslaved by netvsc and the data path switches transparently when the VF is up (Linux kernel: netvsc).
Is OpenHCL the same thing as OpenVMM?
OpenHCL is a configuration of OpenVMM, not a separate codebase. OpenVMM is the Rust virtualization stack at github.com/microsoft/openvmm; OpenHCL is OpenVMM run as a paravisor inside a confidential VM's higher-trust virtual trust level (VTL2), so that the synthetic-device backends sit inside the guest's own trust boundary rather than on a host the guest cannot trust. The same Rust code can run as a host-side VMM (when paired with a hypervisor on the host) or as an in-guest paravisor (when running inside a SEV-SNP or TDX VM).
Can I run virtio devices on Hyper-V or VMBus devices on KVM?
Both directions exist with caveats. OpenVMM, when used as a host VMM, supports both VMBus and virtio backends, so a Linux virtio guest can run on a Microsoft-developed VMM (github.com/microsoft/openvmm). Native Hyper-V on a Windows Server host historically expects VMBus-driven guests; there is no in-box virtio device emulation on a stock Hyper-V Server. KVM hosts can technically present a VMBus-shaped device, but in practice the production answer on KVM is virtio.
What is the relationship between Generation-2 VMs and security?
Generation-2 VMs use UEFI with Secure Boot, boot from synthetic SCSI, and have no emulated IDE, PS/2, or PIC in the data path (Microsoft Learn: Gen 1 vs Gen 2). Every emulator that is removed is one fewer parser running in the most privileged kernel on the host, so the host-side attack surface is meaningfully smaller. Generation-1 still exists for legacy guests that only know how to boot from BIOS + IDE.
How does VBS make a Windows desktop more secure if the desktop is not a virtualization host?
VBS uses the Hyper-V hypervisor to split a single Windows install into VTL0 (the normal kernel and apps) and VTL1 (the Secure Kernel and trustlets like lsaiso.exe). The hypervisor enforces that VTL0 cannot read or modify VTL1's memory, even with kernel privileges. So an attacker who already has SYSTEM-level code execution in the normal world cannot trivially extract LSASS secrets or load arbitrary unsigned kernel code; the hypervisor stops them. This works on any modern Windows machine with the right CPU features, regardless of whether you ever run a VM yourself (Microsoft Learn: Windows Server 2016 What's New).
Study guide
Key terms
- Type-1 hypervisor
- A hypervisor that runs directly on hardware rather than inside a host OS. Hyper-V is Type-1; the original Microsoft Virtual Server was Type-2.
- Root partition
- The privileged partition under Hyper-V that owns physical I/O devices and hosts the synthetic-device VSPs. Runs Windows Server.
- Child partition
- An unprivileged partition that hosts a guest OS. Communicates with the root partition over VMBus.
- Enlightenment
- A guest-OS modification or feature that takes advantage of running under a specific hypervisor by using paravirtual interfaces (hypercalls, synthetic timers, SINTs) instead of trapping on emulated hardware.
- Top-Level Functional Specification (TLFS)
- Microsoft's published hypervisor ABI for Hyper-V, governing hypercalls, synthetic MSRs, synthetic interrupts, synthetic timers, and the VMBus protocol. Released under the Open Specification Promise.
- VMBus
- Hyper-V's software-only inter-partition transport. Has a control path (channel offer/open/close/rescind) and per-device shared-memory ring channels with SINT-based doorbells.
- VSP / VSC
- Virtualization Service Provider (root-partition kernel module that owns a synthetic-device backend) and Virtualization Service Client (guest-side driver that consumes the channel).
- Synthetic Interrupt Controller (SynIC)
- Per-vCPU synthetic interrupt subsystem with 16 SINT slots and shared message/event pages; the doorbell mechanism for VMBus and synthetic timers.
- Reference TSC page
- A guest-readable page maintained by Hyper-V containing scale and offset such that the guest can compute a 10 MHz monotonic clock from the hardware TSC entirely in user space.
- Generation-2 VM
- A Hyper-V VM that boots UEFI with Secure Boot from synthetic SCSI, with no emulated IDE/PS/2/PIC. Reduces host-side attack surface and supports VHDX up to 64 TB.
- Discrete Device Assignment (DDA)
- Hyper-V's general PCIe-passthrough mechanism. Microsoft formally supports GPUs and NVMe; other devices may work with vendor support.
- Accelerated Networking
- An Azure/Hyper-V feature that attaches both a synthetic NIC (netvsc over VMBus) and an SR-IOV virtual function to a guest, with netvsc as the live-migration fallback path.
- VBS / HVCI / VTL
- Virtualization-Based Security uses the Hyper-V hypervisor to split a single guest into Virtual Trust Levels (VTL0 normal, VTL1 secure). HVCI (Hypervisor-protected Code Integrity) and trustlets like lsaiso.exe live in VTL1.
- Paravisor
- A higher-trust software layer running inside a confidential VM (typically in VTL2) that mediates between the workload and the untrusted host hypervisor; presents the workload with a familiar VMBus + VSP world.
- OpenVMM / OpenHCL
- Microsoft's 2024 open-source Rust virtualization stack and its paravisor configuration. Re-implements the VSPs in memory-safe Rust to address the bug class behind CVE-2017-0075, CVE-2021-28476, and CVE-2024-21407.
Comprehension questions
Why does Microsoft maintain the Top-Level Functional Specification under the Open Specification Promise rather than as an internal document?
Because the OSP is what makes it legally and practically safe for the Linux community to ship in-tree drivers (drivers/hv/) implementing the hypervisor's guest-side ABI. Without the published, OSP-protected spec, Linux could only support Hyper-V via reverse-engineering, which would not have been politically or technically acceptable upstream. The OSP is the contractual artefact that turned 'Hyper-V can host Linux' from a vendor claim into a maintained, in-tree reality.
Walk through the lifecycle of a single network packet from a Hyper-V guest's userspace to the wire.
(1) The guest application calls send(); (2) the guest TCP/IP stack hands a packet to the hv_netvsc driver; (3) hv_netvsc allocates a slot in the netvsc TX VMBus ring, copies the descriptor and payload, and writes the new write index; (4) if the host is not already chasing the writes, the guest issues a HV_SIGNAL_EVENT hypercall (one VMEXIT) to fire the SINT for that channel; (5) the host's vmswitch.sys VSP reaps the descriptor from the ring, parses the RNDIS frame, and forwards it to the virtual switch; (6) the virtual switch dispatches it to a real NIC. In the steady state, a single VMEXIT can amortise across many packets through batching.
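Steps (3) and (4) are where the batching lives. A simplified user-space sketch of a VMBus-style TX ring producer, assuming a single-producer ring; the field names (`wr_idx`, `int_mask`) and the `ring_signal` stand-in for the signal-event hypercall are illustrative, not the real `hv_ring_buffer` layout:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE 4096

struct ring {
    uint32_t wr_idx;     /* guest-owned write index (wraps mod 2^32) */
    uint32_t rd_idx;     /* host-owned read index */
    uint32_t int_mask;   /* host sets this while it is actively polling */
    uint8_t  data[RING_SIZE];
};

static int signals_sent;                          /* doorbell hypercalls issued */
static void ring_signal(void) { signals_sent++; } /* stands in for HvSignalEvent */

/* Returns 0 on success, -1 if the ring is full. */
static int ring_produce(struct ring *r, const void *desc, uint32_t len)
{
    uint32_t used = r->wr_idx - r->rd_idx;        /* unsigned wrap is intentional */
    if (used + len > RING_SIZE)
        return -1;
    for (uint32_t i = 0; i < len; i++)
        r->data[(r->wr_idx + i) % RING_SIZE] = ((const uint8_t *)desc)[i];
    /* A real driver issues a write barrier here so the host can never
     * observe the new index before the payload bytes. */
    r->wr_idx += len;
    /* Signal only if the host is not already chasing the writes: one
     * doorbell (one VMEXIT) amortises over everything queued since. */
    if (!r->int_mask)
        ring_signal();
    return 0;
}
```

In the real hv_netvsc path the descriptor is an RNDIS-framed packet and the barrier and signalling conditions come from the VMBus ring-buffer protocol; the shape of the amortisation is the same.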
Explain why the host-side VSP is the historical CVE locus for Hyper-V escapes.
Because the VSP runs in the root partition's kernel (the most privileged context on the box) and parses guest-controlled bytes from the VMBus ring. Any memory-safety mistake (length confusion, missing bounds check, integer overflow) in C/C++ kernel code translates directly to code execution in the most privileged supervisor on the host. CVE-2017-0075, CVE-2021-28476 (vmswitch.sys), and CVE-2024-21407 all instantiate this pattern. The attack surface is structural, not incidental.
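The bug class is concrete enough to sketch. The following is a hypothetical host-side handler, not code from vmswitch.sys: the message layout and names are invented, but the hardened shape (validate the guest-controlled length against the transport-level length before touching the payload) is the general fix for this pattern:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct msg {
    uint32_t payload_len;   /* guest-controlled */
    uint8_t  payload[];     /* guest-controlled bytes from the ring */
};

/* The vulnerable shape copies payload_len bytes without checking it
 * against ring_len, the number of bytes actually read from the ring.
 * A guest that lies about payload_len then walks host kernel memory
 * past the buffer. The hardened version rejects the lie up front. */
static int handle_msg(const uint8_t *ring_bytes, size_t ring_len,
                      uint8_t *out, size_t out_len)
{
    if (ring_len < sizeof(struct msg))
        return -1;                               /* truncated header */
    const struct msg *m = (const struct msg *)ring_bytes;
    if (m->payload_len > ring_len - sizeof(struct msg))
        return -1;                               /* length confusion: reject */
    if (m->payload_len > out_len)
        return -1;                               /* don't overflow the copy target */
    memcpy(out, m->payload, m->payload_len);
    return (int)m->payload_len;
}
```

Every such check is one a C programmer must remember to write; the Rust rewrite in OpenVMM moves the whole class to compile-time and runtime enforcement instead.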
What does an enlightened Linux guest do when it first boots on Hyper-V, before any network or storage I/O happens?
It executes cpuid leaf 0x40000000 to detect the Microsoft hypervisor signature; reads further leaves to enumerate available enlightenments; writes HV_X64_MSR_GUEST_OS_ID to declare itself; writes HV_X64_MSR_HYPERCALL with a guest-physical address and an enable bit, prompting the hypervisor to populate that page with the right vmcall/vmmcall opcode; sets up SINT slots and a per-CPU SynIC message page; optionally reads the reference TSC page; loads the hv_vmbus driver, which begins receiving channel offers from the root partition; and binds class-specific drivers (hv_netvsc, hv_storvsc, etc.) to each offered channel.
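The first two steps can be simulated in user space. This sketch checks the CPUID 0x40000000 vendor signature ("Microsoft Hv" spread across EBX/ECX/EDX) and composes a guest-OS-ID value; the actual MSR writes are privileged and are shown only as comments, and the OS-ID field layout here is an illustrative simplification of the TLFS encoding:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* EBX/ECX/EDX of CPUID leaf 0x40000000 spell "Microsoft Hv" on Hyper-V. */
static int is_hyperv(uint32_t ebx, uint32_t ecx, uint32_t edx)
{
    char sig[13];
    memcpy(sig + 0, &ebx, 4);
    memcpy(sig + 4, &ecx, 4);
    memcpy(sig + 8, &edx, 4);
    sig[12] = '\0';
    return strcmp(sig, "Microsoft Hv") == 0;
}

/* Simplified GUEST_OS_ID: open-source OSes set the top bit and pack an
 * OS type and kernel version below it (field widths here are a subset
 * of the real TLFS layout, chosen for illustration). */
static uint64_t guest_os_id(uint8_t os_type, uint32_t version)
{
    return (1ULL << 63) | ((uint64_t)os_type << 48) | version;
}

/* A real guest would then continue:
 *   wrmsr(HV_X64_MSR_GUEST_OS_ID, guest_os_id(...));
 *   wrmsr(HV_X64_MSR_HYPERCALL, hypercall_page_gpa | 1);  // enable bit
 * after which the hypervisor fills that page with the correct
 * vmcall/vmmcall opcode for the CPU vendor. */
```

Note the ordering constraint from the TLFS: the GUEST_OS_ID write must precede the hypercall-page enable, which is why the declaration step comes first in the boot sequence above.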
Why is OpenHCL described as a paravisor rather than a hypervisor or a VMM?
Because it sits inside a guest VM (in VTL2 of that VM), not on the host, and its job is to mediate between the guest workload and a hypervisor that the guest does not trust. A hypervisor on the host runs underneath all VMs; a VMM owns and controls VMs from outside; a paravisor lives inside one VM, at higher privilege than that VM's workload, and presents the workload with a familiar device-model surface (VMBus + VSPs) that is now backed by code inside the guest's own trust boundary rather than by the host kernel. The architecture inverts the historical Hyper-V trust model so that confidential VMs can be protected from a malicious host.
Compare VMBus's ring-buffer transport to virtio's virtqueues. What is the same and what is different?
Same: shared-memory rings carrying descriptors and payload; doorbell-based signalling so per-message hypercall cost amortises across batches; per-device-class protocols layered on a common transport. Different: VMBus uses a software-only 'bus' with offer/open/close/rescind control, while virtio rides on a real PCI/MMIO/channel-I/O transport with a generic capability-bit mechanism. VMBus's reverse signal is a SINT; virtio's is MSI-X. VMBus is Microsoft-owned under the OSP; virtio is OASIS-ratified and multi-vendor. VMBus has in-box Windows drivers and broader synthetic-service coverage (KVP, time sync, VSS); virtio has cross-hypervisor portability and a multi-vendor implementation pool.
References
- Hyper-V on Windows Server. https://learn.microsoft.com/en-us/windows-server/virtualization/hyper-v/hyper-v-on-windows-server ↩
- About Hyper-V on Windows. https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/about/ ↩
- Hyper-V TLFS: Feature Discovery. https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/feature-discovery ↩
- Hyper-V Top-Level Functional Specification. https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/tlfs ↩
- Linux Kernel: Hyper-V Enlightenments. https://www.kernel.org/doc/html/latest/virt/hyperv/index.html ↩
- What's new in Windows Server 2016. https://learn.microsoft.com/en-us/windows-server/get-started/whats-new-in-windows-server-2016 ↩
- What's new in Windows Server 2019. https://learn.microsoft.com/en-us/windows-server/get-started/whats-new-in-windows-server-2019 ↩
- Azure Confidential Computing. https://learn.microsoft.com/en-us/azure/confidential-computing/ ↩
- OpenHCL: the new open-source paravisor (announce). https://techcommunity.microsoft.com/blog/windowsosplatform/openhcl-the-new-open-source-paravisor/4242991 ↩
- Microsoft Virtual Server (Wikipedia). https://en.wikipedia.org/wiki/Microsoft_Virtual_Server ↩
- Linux Kernel: Hyper-V Overview. https://www.kernel.org/doc/html/latest/virt/hyperv/overview.html ↩
- Hyper-V TLFS: Hypercall Interface. https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/hypercall-interface ↩
- Linux Kernel: VMBus. https://www.kernel.org/doc/html/latest/virt/hyperv/vmbus.html ↩
- Hyper-V Architecture (Performance Tuning). https://learn.microsoft.com/en-us/windows-server/administration/performance-tuning/role/hyper-v-server/architecture ↩
- NVD: CVE-2017-0075. https://nvd.nist.gov/vuln/detail/CVE-2017-0075 ↩
- NVD: CVE-2021-28476. https://nvd.nist.gov/vuln/detail/CVE-2021-28476 ↩
- virtio: Towards a De-Facto Standard For Virtual I/O Devices (2008). https://ozlabs.org/~rusty/virtio-spec/virtio-paper.pdf ↩
- OASIS Virtio 1.1 Specification. https://docs.oasis-open.org/virtio/virtio/v1.1/virtio-v1.1.pdf ↩
- microsoft/openvmm (GitHub). https://github.com/microsoft/openvmm ↩
- Generation 1 vs Generation 2 VMs in Hyper-V. https://learn.microsoft.com/en-us/windows-server/virtualization/hyper-v/plan/should-i-create-a-generation-1-or-2-virtual-machine-in-hyper-v ↩
- OpenHCL: the new open-source paravisor (deep explainer). https://techcommunity.microsoft.com/blog/windowsosplatform/openhcl-the-new-open-source-paravisor/4273172 ↩
- Linux Kernel: Hyper-V Clocks. https://www.kernel.org/doc/html/latest/virt/hyperv/clocks.html ↩
- Overview of Hyper-V (Network Drivers). https://learn.microsoft.com/en-us/windows-hardware/drivers/network/overview-of-hyper-v ↩
- Akamai: Critical Vulnerability in Hyper-V (CVE-2021-28476). https://www.akamai.com/blog/security/critical-vulnerability-in-hyper-v-allowed-attackers-to-exploit-azure ↩
- NVD: CVE-2024-21407. https://nvd.nist.gov/vuln/detail/CVE-2024-21407 ↩
- Microsoft Hyper-V Bounty Program. https://www.microsoft.com/en-us/msrc/bounty-hyper-v ↩
- OASIS Virtio 1.2 Specification. https://docs.oasis-open.org/virtio/virtio/v1.2/virtio-v1.2.html ↩
- Discrete Device Assignment Planning. https://learn.microsoft.com/en-us/windows-server/virtualization/hyper-v/plan/plan-for-deploying-devices-using-discrete-device-assignment ↩
- Linux Kernel: vPCI. https://www.kernel.org/doc/html/latest/virt/hyperv/vpci.html ↩
- Overview of SR-IOV. https://learn.microsoft.com/en-us/windows-hardware/drivers/network/overview-of-single-root-i-o-virtualization--sr-iov- ↩
- Azure Accelerated Networking. https://learn.microsoft.com/en-us/azure/virtual-network/accelerated-networking-overview ↩
- Linux Kernel: netvsc Driver. https://docs.kernel.org/networking/device_drivers/ethernet/microsoft/netvsc.html ↩
- Red Hat: Introduction to vDPA Kernel Framework. https://www.redhat.com/en/blog/introduction-vdpa-kernel-framework ↩
- Linux Kernel: VDUSE. https://www.kernel.org/doc/html/latest/userspace-api/vduse.html ↩
- microsoft/OHCL-Linux-Kernel (GitHub). https://github.com/microsoft/OHCL-Linux-Kernel ↩
- OpenVMM project documentation. https://openvmm.dev/ ↩
- Linux Kernel: Hyper-V CoCo VMs. https://www.kernel.org/doc/html/latest/virt/hyperv/coco.html ↩
- Azure Confidential VM Options. https://learn.microsoft.com/en-us/azure/confidential-computing/virtual-machine-options ↩
- Azure Boost Overview. https://learn.microsoft.com/en-us/azure/azure-boost/overview ↩
- Microsoft Azure Network Adapter (MANA). https://learn.microsoft.com/en-us/azure/virtual-network/accelerated-networking-mana-overview ↩
- About DPDK. https://www.dpdk.org/about/ ↩
- Xen Project: Paravirtualization (PV). https://wiki.xenproject.org/wiki/Paravirtualization_(PV) ↩