# eBPF vs ETW: Two Generations of Kernel Observability

> Why Windows ETW emits events and Linux eBPF computes them -- and what eBPF-for-Windows reveals about the convergence of two operating systems.

*Published: 2026-05-16*
*Canonical: https://paragmali.com/blog/ebpf-vs-etw-two-generations-of-kernel-observability*
*License: CC BY 4.0 - https://creativecommons.org/licenses/by/4.0/*

---
<TLDR>
**ETW (Windows 2000) is event emission only.** Per-CPU lock-free ring buffers, manifest-defined providers, kernel-mediated dispatch. Sessions filter by provider, keyword, and level; every enabled event is fully serialized and crosses the kernel/user boundary.

**eBPF (Linux 2014) inverts the model.** The consumer ships verified bytecode into the kernel; programs filter and aggregate at the hook site before any data crosses the boundary. JIT-compiled, with hooks across kprobe, uprobe, tracepoint, XDP, TC, and LSM.

**The verifier is the trust boundary -- and the catch.** Rice's theorem says no in-kernel verifier can be simultaneously sound, complete, and decidable. Linux's verifier trades soundness in the corner cases (CVE-2023-2163 and three predecessors); PREVAIL (the verifier used by eBPF-for-Windows) trades completeness more heavily for stronger formal grounding.

**eBPF-for-Windows is the first cross-OS-portable kernel-observability primitive.** PREVAIL verifies in user mode, `bpf2c` transliterates verified bytecode to C, MSVC compiles to a signed `.sys` driver. Networking-subset hooks only as of 2026; full kprobe-equivalent coverage is the work in progress.
</TLDR>

## 1. The SOC Analyst Sees the Same Thing Twice

A Security Operations Center analyst opens two `Sysmon/Operational` event channels side by side. One channel is streaming from a Red Hat Enterprise Linux host; the other is streaming from a Windows Server 2022 domain controller. The XML configuration is the same. The Event IDs are the same. A `ProcessCreate` record from either host carries the same `Image`, `CommandLine`, `ParentImage`, `IntegrityLevel`, and `Hashes` fields. Detection rules written against one channel match the other. To the analyst, the two operating systems are interchangeable.

Underneath, they are not even close.

On the Windows side, every event was emitted by a kernel provider -- `Microsoft-Windows-Sysmon`, `Microsoft-Windows-Threat-Intelligence`, `Microsoft-Windows-Kernel-Process` -- before the Sysmon user-mode service ever ran its XML filter. The kernel produced a fully formatted event, dropped it into a per-CPU ring buffer, and let user space pick it up. Every enabled event made the kernel-to-user trip in full. The filter inside Sysmon's user-mode service is what kept the on-disk log small. The wire between the kernel and the consumer carried the full firehose.

On the Linux side, no kernel module owned by Microsoft is running. The same Sysmon binary is attached to roughly twenty Linux kernel probes through [the `SysinternalsEBPF` library](https://github.com/microsoft/SysmonForLinux). Each probe is an eBPF program: bytecode that was [compiled by clang, verified by the kernel before load, JIT-compiled to native instructions, and attached to a hook inside the kernel](https://ebpf.io/what-is-ebpf/). When `execve` fires, the verified program runs on the producing CPU, reads its arguments out of the kernel context, decides whether the call matches the XML configuration's predicates, and -- only then -- writes a record into a ring buffer. The events that arrive in user space were already filtered inside the kernel. The wire carries only what the configuration cares about.

The output channels match because [Sysmon for Linux is engineered to look exactly like Sysmon for Windows](https://github.com/microsoft/SysmonForLinux). The substrate underneath is engineered for two different decades. ETW is from 2000. eBPF is from 2014. The fourteen-year gap shows up not in features but in *how the kernel does its job*.

> **Key idea:** ETW emits. eBPF computes. That gap is the entire generation difference. Everything else in this article is a consequence of it.

This article is about why those two designs exist, why the second one is strictly more powerful, why "strictly more powerful" cost the Linux kernel a new class of CVE, and what Microsoft's [`microsoft/ebpf-for-windows`](https://github.com/microsoft/ebpf-for-windows) project -- now in its sixth year of development -- reveals about which design wins at the point of convergence. By the end you will know both substrates well enough to choose between them, understand their failure modes, and see why "two generations" is not marketing language but a literal description of the engineering arc.

## 2. A Tale of Two Lineages

In 1992, Van Jacobson and Steven McCanne at Lawrence Berkeley Laboratory wrote [a small virtual machine for packet filtering](https://www.tcpdump.org/papers/bpf-usenix93.pdf). In 2000, a separate Microsoft team shipped a kernel event bus inside Windows 2000. Neither group knew the other existed. Each was solving a different version of the same problem: *how do you watch the kernel from user space without owning the kernel?*

The two answers ran in parallel for more than two decades before they collided.

**1992 -- The BSD Packet Filter.** McCanne and Jacobson published "The BSD Packet Filter: A New Architecture for User-level Packet Capture" at USENIX Winter 1993, describing work first distributed in 1992. The motivation was painfully concrete: `tcpdump` was copying every packet through the kernel-user boundary, then discarding the ones the user did not want. BPF moved that filter into the kernel. A tiny two-register, 32-bit virtual machine evaluated a user-supplied predicate against each packet before any copy; only matching packets crossed into user space. The architectural insight that would survive thirty years is one sentence: *filter where the data is produced, not where it is consumed.*

<Definition term="eBPF (Extended Berkeley Packet Filter)">
A safe, sandboxed virtual machine inside the Linux kernel that runs user-supplied programs at attached hook points. Programs are written in restricted C, compiled to a 64-bit RISC-style bytecode, statically verified before load, and JIT-compiled to native code. The "extended" version, introduced in [Linux 3.18 (December 2014)](https://www.kernel.org/doc/html/latest/bpf/index.html), generalized BPF from a packet-filter language into a general kernel-extensibility mechanism.
</Definition>

**2000 -- Event Tracing for Windows.** Microsoft shipped ETW with Windows 2000. [The reference portal](https://learn.microsoft.com/en-us/windows/win32/etw/event-tracing-portal) describes the design Microsoft had been refining since the late 1990s: a kernel-mediated event bus with three roles -- providers, sessions, and consumers -- and per-CPU lock-free ring buffers. ETW's architectural insight was the inverse of BPF's: *event identity and causal order are first-class. A kernel-mediated dispatch makes them cheap.* A `tcpdump` filter wants to throw events away. A security telemetry system wants to keep them, attribute them, and order them.

<Definition term="ETW (Event Tracing for Windows)">
A kernel-mediated tracing facility shipped in Windows 2000. Providers (kernel or user-mode components) emit structured events to per-CPU ring buffers; sessions own the buffers and select which providers to enable at which level; consumers receive the event stream either in real time or by reading the on-disk `.etl` log. ETW is documented at [`learn.microsoft.com/.../etw/event-tracing-portal`](https://learn.microsoft.com/en-us/windows/win32/etw/event-tracing-portal).
</Definition>

**2003-2005 -- DTrace.** Bryan Cantrill, Mike Shapiro, and Adam Leventhal at Sun Microsystems started work in 2003 on what would become the first production-grade dynamic tracing system. [DTrace shipped publicly in Solaris 10 in January 2005](https://en.wikipedia.org/wiki/DTrace) and quickly ported to FreeBSD and macOS. Its central idea -- safe in-kernel scripts attached to probes, with a single language for tracing the entire system -- is the spiritual ancestor of every modern kernel observability tool, including eBPF.<Sidenote>Wikipedia gives DTrace's initial public release as January 2005, with Sun's internal development starting around 2003. The "DTrace 2003" claim that appears in some retrospectives conflates project inception with public release; we use the 2005 ship date here and note 2003 only as a development start.</Sidenote> Linux could not adopt it directly: DTrace is licensed under the CDDL, which is GPLv2-incompatible.

**2005 -- SystemTap.** Red Hat attempted to fill the Linux DTrace gap with [SystemTap](https://sourceware.org/systemtap/). The architectural compromise that doomed it: SystemTap scripts compile to a *kernel module*, loaded at runtime. Allowing user-supplied kernel modules to be loaded on demand is a privileged operation by definition, so production SystemTap deployments restricted use to local root. That made the observability pitch moot: if you already have root, you can use any debugging tool. SystemTap survives as a niche tracing system; it did not become the Linux answer to DTrace.

**1992-2014 -- classic BPF stagnates.** The original BPF VM kept finding new jobs. [Linux Socket Filtering](https://www.kernel.org/doc/Documentation/networking/filter.txt) ported the BSD filter into the Linux kernel in 1997. seccomp-bpf in 2012 gave it a second job: filtering system calls for sandboxing. But the language remained a 32-bit two-register packet-filter VM. It could not be extended to general kernel observability without rewriting the instruction set architecture from the ground up.

**2014 -- eBPF.** Alexei Starovoitov's "extended BPF" patch series landed in [Linux 3.18 in December 2014](https://www.kernel.org/doc/html/latest/bpf/index.html), described in LWN's contemporaneous article on [Starovoitov's eBPF patch set](https://lwn.net/Articles/603983/). The rewrite was thorough: 64-bit instruction set, eleven registers, maps for in-kernel state, helper calls into kernel APIs, a JIT compiler, and -- the part that mattered most -- a kernel verifier that statically proves safety before any program runs. The verifier is what turned the packet filter into a general kernel extension mechanism. Without it, every BPF program would have to be trusted; with it, untrusted user code can execute in kernel mode.

By the time eBPF shipped, Windows had ETW everywhere. Linux had `auditd`'s pull-based audit log and a handful of `perf` events. Then Starovoitov rewrote BPF, and the architectural balance shifted overnight. The next decade of Linux observability was built on the new instruction set. The next decade of Windows observability stayed on ETW. The two designs ran in parallel until 2021, when Microsoft announced that eBPF would also run on Windows.

<Mermaid caption="Timeline of kernel observability primitives, 1992-2026.">
flowchart LR
    A[BPF — 1992 — LBL]
    B[ETW — 2000 — Windows 2000]
    C[DTrace — 2005 — Solaris 10]
    D[SystemTap — 2005 — Red Hat]
    E[seccomp-bpf — 2012 — Linux 3.5]
    F[eBPF — 2014 — Linux 3.18]
    G[BPF Trampoline — 2019 — Linux 5.5]
    H[BPF Ringbuf — 2020 — Linux 5.8]
    I[eBPF for Windows — 2021 — Microsoft]
    J[RFC 9669 BPF ISA — 2024 — IETF]
    A --> B --> C --> D --> E --> F --> G --> H --> I --> J
</Mermaid>

The diagram lays the substrate stories side by side. Each arrow is an architectural decision that constrained what came after. The next two sections walk each design end to end -- ETW first, because it is older and emission-only and easier to internalize.

## 3. ETW: Pure Event Emission

A natural question that turns out to be the wrong one: *why didn't Microsoft just keep extending performance counters?* By the late 1990s, Windows already had a mature counter facility in `perfmon`, documented at the [Windows Performance Counters portal](https://learn.microsoft.com/en-us/windows/win32/perfctrs/performance-counters-portal). It exposed CPU percentage, page-fault rate, queue lengths, and hundreds of other scalar metrics. If you wanted to know how loaded your system was, perfmon told you.

It also told you almost nothing useful for security telemetry.

<Aside label="Why perfmon failed for security telemetry">
Three structural failures of the counter model show up the moment you try to use it as the substrate for an EDR.

1. **Sampling-rate floor.** A counter can only be observed at the rate the consumer queries. On a busy host -- sshd children, container init forks, a CI runner -- process-creation rates routinely exceed any sane query rate. The counter aggregates the events it cannot expose into a single integer that hides the structure of what happened.
2. **No identity.** "Three hundred process creations in the last second" is a counter. "User `bob` ran `/tmp/.x` with parent `/usr/sbin/cron` at 14:33:07.221Z" is an event. The security model requires identity; the counter model erases it.
3. **No causal order.** Two counters sampled in sequence are not causally ordered with respect to the system events they describe. ETW's per-CPU buffers with QPC timestamps preserve causal order across CPUs to within the timer's accuracy.
</Aside>
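The counter-versus-event distinction fits in a few lines. The sketch below is illustrative JavaScript with made-up data, not real telemetry: the counter model collapses a second of activity into one integer, while the event model keeps the identity fields a detection rule needs.

```javascript
// Illustrative only: the same second of activity, seen two ways.
const events = [
  { ts: "14:33:07.221Z", user: "bob",  image: "/tmp/.x",       parent: "/usr/sbin/cron" },
  { ts: "14:33:07.302Z", user: "root", image: "/usr/bin/sshd", parent: "/usr/sbin/sshd" },
  { ts: "14:33:07.415Z", user: "ci",   image: "/usr/bin/make", parent: "/bin/sh" },
];

// The counter model: everything above collapses into one integer.
const processCreationsPerSecond = events.length;

// The event model: identity survives, so a detection rule can fire.
const suspicious = events.filter(
  (e) => e.image.startsWith("/tmp/") && e.parent === "/usr/sbin/cron"
);

console.log(processCreationsPerSecond); // 3
console.log(suspicious.length);         // 1 -- the bob/cron/tmp event
```

A counter sampled once per second would report "3" and nothing else; the rule over events fires on exactly the record that matters.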

The fix was not a faster perfmon. The fix was an entirely different shape of telemetry. ETW was that shape: push-based, per-event, kernel-attributed, with stable schemas declared up front. The contrast between perfmon (a sampling counter) and ETW (an event bus) is not parametric. The two systems answer different questions. Security needs the event-bus answer.

### Provider, session, consumer

ETW's data plane has three roles, every one of them a kernel-mediated object.

A *provider* is a kernel or user-mode component that calls `EventWrite` or `EtwWrite` to emit a structured event. Providers identify themselves by GUID. They declare the schema of their events ahead of time: classic providers via MOF, [manifest-based providers (Vista and later)](https://learn.microsoft.com/en-us/windows/win32/etw/about-event-tracing) via an XML manifest compiled into a binary `WEVT_TEMPLATE` resource, or [TraceLogging](https://learn.microsoft.com/en-us/windows/win32/tracelogging/trace-logging-portal) for self-describing events. The schema is part of the contract: a consumer that knows the provider's manifest knows the field layout of every event the provider will ever emit.

A *session* is a kernel object created by `StartTrace`. It owns a set of per-CPU buffers and a list of enabled providers, with per-provider level and keyword masks. Sessions can write events to disk (`.etl` files) or be consumed in real time.<MarginNote>The `.etl` file extension stands for "Event Trace Log." It is the on-disk format read by Windows Performance Analyzer and by `tracerpt.exe` for post-hoc analysis.</MarginNote>

A *consumer* is a user-mode process that calls `OpenTrace` and `ProcessTrace` and receives event callbacks. EDR agents like Sysmon, Defender, and the third-party agents that ship with [Microsoft Defender for Endpoint](https://learn.microsoft.com/en-us/defender-endpoint/) are real-time consumers.
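The three roles reduce to a small dispatch model. The sketch below is a toy, not the Win32 API (real code calls `StartTrace`, `EnableTraceEx2`, `OpenTrace`, and `ProcessTrace`), and the GUID and keyword values are invented. What it preserves is the shape of the contract: the session's level and keyword mask is the only check that happens before the event is fully serialized into the buffer.

```javascript
// Toy model of ETW dispatch: provider -> session buffer -> consumer.
const KEYWORD_PROCESS = 0x1n; // illustrative 64-bit keyword bits
const KEYWORD_NETWORK = 0x2n;

function makeSession(enabledProviders) {
  // enabledProviders: provider GUID -> { level, keywordMask }
  const buffer = [];
  return {
    emit(guid, level, keyword, payload) {
      const enable = enabledProviders[guid];
      if (!enable) return;                                // provider not enabled
      if (level > enable.level) return;                   // too verbose for session
      if ((keyword & enable.keywordMask) === 0n) return;  // keyword filtered
      buffer.push({ guid, level, keyword, payload });     // fully serialized event
    },
    drain: (consumer) => buffer.forEach(consumer),        // real-time callback
  };
}

const session = makeSession({
  "{guid-a}": { level: 4, keywordMask: KEYWORD_PROCESS },
});
session.emit("{guid-a}", 4, KEYWORD_PROCESS, { image: "cmd.exe" });  // kept
session.emit("{guid-a}", 4, KEYWORD_NETWORK, { dport: 443 });        // dropped
session.emit("{guid-b}", 1, KEYWORD_PROCESS, { image: "calc.exe" }); // dropped

const seen = [];
session.drain((e) => seen.push(e));
console.log(seen.length); // 1
```

Note what the model cannot express: there is no predicate over the payload. Anything finer than provider, level, and keyword must wait until the event reaches the consumer.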

<Definition term="Provider, Session, Consumer">
ETW's three-role architecture. *Providers* emit events into per-CPU ring buffers. *Sessions* are kernel objects that own buffers and select which providers to enable. *Consumers* are user-mode processes that read the buffers in real time or open the on-disk `.etl` file. The taxonomy is defined in [the ETW provider documentation](https://learn.microsoft.com/en-us/windows/win32/etw/about-event-tracing).
</Definition>

### The per-CPU ring buffer

The algorithmic core of ETW is a per-CPU lock-free ring buffer. When a provider on CPU 3 calls `EventWrite`, the kernel formats the event according to the provider's manifest, stamps it with a QPC timestamp, and `memcpy`s the result into the per-CPU buffer for CPU 3. A kernel writer thread drains the buffer asynchronously into the session's destination -- either an `.etl` file on disk or a consumer's callback queue. The producer-side cost is constant: a function call plus a buffered `memcpy`, all on the local CPU, with no cross-CPU synchronization.

<Definition term="QPC (QueryPerformanceCounter)">
The Windows monotonic timestamp source used for ETW event timestamps. QPC is backed by hardware timers (TSC on modern x86, generic counter on ARM64) and provides a high-resolution counter that does not go backward.
</Definition>

QPC guarantees monotonic timestamps per CPU.<Sidenote>QPC is monotonic per CPU on modern hardware, but cross-CPU ordering still relies on the kernel writer thread's serialization when events from different CPUs are merged into a single output stream. Per-event timestamps from different CPUs can be ordered after the fact, but the merge happens in the writer, not in the producer.</Sidenote>
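A toy model of that writer-side merge, in illustrative JavaScript rather than kernel code: each per-CPU buffer is already ordered (QPC is monotonic per CPU), and the writer performs a timestamp-ordered merge into a single output stream.

```javascript
// Per-CPU buffers, each internally time-ordered by QPC timestamp.
const perCpu = [
  [{ qpc: 100, ev: "fork" }, { qpc: 340, ev: "exit" }], // CPU 0
  [{ qpc: 210, ev: "exec" }],                           // CPU 1
];

function writerMerge(buffers) {
  // The kernel writer effectively performs a k-way merge across buffers;
  // a flatten-and-sort gives the same result for this sketch.
  return buffers.flat().sort((a, b) => a.qpc - b.qpc);
}

const stream = writerMerge(perCpu).map((e) => e.ev);
console.log(stream); // [ 'fork', 'exec', 'exit' ]
```

The producers never synchronized with each other; cross-CPU order is reconstructed after the fact, in the writer, from the timestamps.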

<Mermaid caption="ETW dispatch: provider -> per-CPU buffer -> session writer -> consumer.">
flowchart LR
    P1[Provider on CPU 0]
    P2[Provider on CPU 1]
    P3[Provider on CPU 2]
    B0[Per-CPU buffer 0]
    B1[Per-CPU buffer 1]
    B2[Per-CPU buffer 2]
    W[Kernel writer thread]
    S[Session]
    F[.etl file]
    C[Real-time consumer]
    P1 -- EventWrite --> B0
    P2 -- EventWrite --> B1
    P3 -- EventWrite --> B2
    B0 --> W
    B1 --> W
    B2 --> W
    W --> S
    S --> F
    S --> C
</Mermaid>

### The cost story

[Microsoft's reference portal](https://learn.microsoft.com/en-us/windows/win32/etw/event-tracing-portal) describes ETW as "high-volume, low-overhead," and two decades of practitioner experience broadly agree. The most useful practical writeup is [Bruce Dawson's *ETW Central* index](https://randomascii.wordpress.com/2015/09/24/etw-central/), which links to more than forty blog posts on real ETW deployments and measurements. The honest summary -- anchored to Dawson's experience plus the architectural reason (per-CPU lock-free buffers and a `memcpy` per event) -- is that typical telemetry configurations sit in the low single-digit-percent CPU range, while pathological "log everything" configurations can produce user-visible slowdowns on the order of 5-10%. These are practitioner estimates, not benchmarked figures. The [BenchmarkDotNet documentation](https://benchmarkdotnet.org/articles/configs/diagnosers.html) for the `EtwProfiler` diagnoser acknowledges the cost in its own way: *"In order to not affect main results we perform a separate run if any diagnoser is used."* The overhead is small, but it is not zero.

The cost has a structural cause. ETW has no in-kernel filter. The producer pays the full event-formatting cost on every emission, and the only filter is the session's level and keyword mask. If you enable a provider, every event that provider emits flows through the buffer. Filtering happens at the consumer, in user mode, after the event has crossed the boundary.

### The Threat-Intelligence provider

ETW providers are not equal. The most architecturally important one for security is `Microsoft-Windows-Threat-Intelligence`, a kernel-only provider that emits signals only the kernel can see: image loads, remote-thread creations, `VirtualProtect` changes that flip memory from data to executable. Only a process running under [Protected Process Light with the AntiMalware signer](https://learn.microsoft.com/en-us/sysinternals/downloads/sysmon) can subscribe. That is why Defender, [CrowdStrike Falcon, SentinelOne, and Carbon Black](https://github.com/repnz/etw-providers-docs) all run as PPL-Antimalware: it is the entry ticket to the kernel-only telemetry that distinguishes serious EDR from script-level monitoring.

> **Note:** ETW's biggest weakness is that providers run inside the very process they are observing. A process can patch its own copy of `ntdll!EtwEventWrite` with a `ret` instruction and silence its own emissions before they reach the kernel buffer. EDR vendors monitor for this integrity violation out of band, treating the patch itself as a high-confidence detection signal. The very existence of the tell is an admission that ETW's original design assumed an honest user-mode producer -- a reasonable assumption in 2000, increasingly untenable in 2025.

[Sysmon 6.20](https://learn.microsoft.com/en-us/sysinternals/downloads/sysmon), released in 2018, was the version that tied ETW into the modern EDR stack as a turnkey configuration.<Sidenote>The 2018 Sysmon 6.20 release added the configuration schema that the cybersecurity community converged on. By 2026, the same XML configuration -- including the `ProcessCreate`, `NetworkConnect`, `ImageLoad`, and `FileCreate` event IDs -- works on both Sysmon for Windows and Sysmon for Linux.</Sidenote> Sysmon, Microsoft's own free reference consumer authored by [Mark Russinovich and Thomas Garnier](https://learn.microsoft.com/en-us/sysinternals/downloads/sysmon), demonstrated that an XML configuration plus an ETW consumer plus protected-process status was enough to build a useful EDR. Sysmon is not Defender; it is the open shape that the commercial EDR vendors built proprietary versions of.

### Closing on ETW

ETW emits. Every enabled event crosses the kernel-user boundary, fully formatted, with no in-kernel filtering language whatsoever. The session's level and keyword mask is a coarse on/off switch, not a programmable filter. Aggregation, sampling, and stack-trace folding happen in user mode, after the event is already across the boundary.

Now you can read the question that drove Starovoitov's 2014 rewrite: *what if you could filter in the kernel itself? What if you could compute -- not just emit?*

## 4. eBPF: Programmable In-Kernel Computation

The architectural inversion is one sentence. ETW is the producer telling the consumer what happened. eBPF is the consumer telling the producer what to compute. The producer is the kernel; the consumer is a user-mode process that has compiled, verified, and attached a small program that will run inside the kernel at a chosen hook. The roles are inverted, the data flow is inverted, and the trust model is inverted.

### The lifecycle

A canonical eBPF program goes through six stages before it does any useful work. The flow below is the same on every Linux kernel since 3.18, with refinements added over the years for BTF (BPF Type Format), CO-RE (Compile Once, Run Everywhere), and link primitives:

```text
1. clang -target bpf -g -O2 -c prog.c -o prog.o         # ELF with BTF (-g emits BTF)
2. fd = bpf(BPF_PROG_LOAD, &attr)                       # kernel verifier runs
3. for each map referenced:
       map_fd = bpf(BPF_MAP_CREATE, &attr)
4. link = bpf(BPF_LINK_CREATE, kprobe|tracepoint|xdp|lsm|cgroup, fd)
5. at hook fire: JIT-compiled native code runs on the
   producing CPU, reads context, calls bpf_* helpers,
   writes to map or ringbuf
6. user space mmaps the ringbuf and consumes records
```

The lifecycle is documented in [the canonical kernel BPF documentation index](https://www.kernel.org/doc/html/latest/bpf/index.html). It is worth lingering on stage 2. Between the user-space `bpf()` syscall and the moment the kernel hands back a file descriptor for the loaded program, a static analyzer runs. That analyzer is the most consequential piece of code in this entire article. We treat it on its own in section 5.

<Mermaid caption="eBPF load and attach lifecycle: clang -> verifier -> JIT -> kernel hook.">
flowchart TD
    A["Restricted C source — (prog.c)"]
    B["clang -target bpf — BPF ELF + BTF"]
    C[bpf BPF_PROG_LOAD]
    D[Kernel verifier]
    E[JIT compiler]
    F[Kernel hook]
    G[bpf BPF_MAP_CREATE]
    H["BPF maps — (arrays, hashes, ringbuf)"]
    I["bpf BPF_LINK_CREATE — (kprobe/xdp/lsm/...)"]
    J[Hook fires]
    K[User space mmap ringbuf]
    A --> B --> C --> D
    D -->|reject| Z[EINVAL to user space]
    D -->|accept| E --> F
    C --> G --> H
    F --> I --> J
    J --> H
    H --> K
</Mermaid>

### Hooks: where programs attach

The thing that distinguishes eBPF from a packet filter is its hook surface. A *hook* is a place inside the kernel where a verified program can be attached, fired at the moment something happens. Linux has a lot of hooks.

<Definition term="Hook (eBPF)">
An attachment point in kernel code where a verified eBPF program runs. Different hook types receive different context arguments: a kprobe receives the function's CPU registers; an XDP program receives a packet buffer; an LSM hook receives the security operation's parameters. The hook type also determines what helpers and map types the verifier allows.
</Definition>

The hook taxonomy, drawn from [the kernel BPF docs](https://www.kernel.org/doc/html/latest/bpf/index.html) and [Cilium's BPF architecture reference](https://docs.cilium.io/en/latest/reference-guides/bpf/architecture/), is broad:

- `kprobe` and `kretprobe` -- entry and return of any non-inlined kernel function.
- `fentry` and `fexit` -- BPF trampoline replacement for kprobes, with no `int3` trap-frame cost.
- `uprobe` -- any user-space symbol in any process.
- `tracepoint` -- stable kernel tracepoints with version-locked schemas.
- `perf_event` -- sampling-profile hooks tied to perf events.
- `XDP` -- the earliest point in the NIC driver's receive path, before allocation of an `sk_buff`.
- `TC` -- Linux traffic-control qdisc hooks.
- `LSM` -- Linux Security Module hooks (mandatory-access-control points), available since Linux 5.7.
- `cgroup`, `sched`, `sock_ops` -- policy and socket-state hooks.
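The Definition above says the hook type determines the context a program receives and what the verifier allows it to touch. The sketch below is a toy model of that contract: the hook names are real, but the field lists and the `checkProgram` helper are drastic simplifications of what the verifier actually enforces.

```javascript
// Toy model: each hook type exposes a different context shape.
const hookContexts = {
  kprobe:     ["regs"],             // CPU registers at function entry
  xdp:        ["data", "data_end"], // raw packet buffer bounds
  lsm:        ["file", "cred"],     // security operation parameters
  tracepoint: ["args"],             // version-locked schema fields
};

// A program attached to one hook may only read that hook's fields.
function checkProgram(hook, fieldsRead) {
  const allowed = new Set(hookContexts[hook] || []);
  return fieldsRead.every((f) => allowed.has(f));
}

console.log(checkProgram("xdp", ["data", "data_end"])); // true
console.log(checkProgram("xdp", ["regs"]));             // false: not an XDP field
```

The real verifier does this with typed context structs and per-program-type helper allowlists, but the principle is the same: the attach point is part of the program's type.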

<Mermaid caption="eBPF hook surface across the Linux kernel.">
flowchart TD
    K["eBPF — Programs"]
    T["Tracing — (kprobe, fentry, — uprobe, tracepoint)"]
    N["Networking — (XDP, TC, sock_ops, — sk_lookup)"]
    S["Security — (LSM, seccomp, — landlock)"]
    P["Policy & scheduling — (cgroup, sched, — perf_event)"]
    K --> T
    K --> N
    K --> S
    K --> P
</Mermaid>

That hook surface is what makes eBPF the universal Linux instrumentation substrate. Once a developer learns the load-verify-attach lifecycle, the same toolchain instruments a TCP retransmit, a `do_sys_open` call, an LSM `file_open` check, and an XDP fast-path drop -- all in the same language with the same verifier and the same JIT.

### Maps: in-kernel state

The second piece of architecture eBPF adds over classic BPF is the *map* -- a kernel-managed key-value store accessible from inside a verified program and from user space. Maps are how eBPF programs hold state between invocations and how they communicate with user space.

<Definition term="BPF Map">
A kernel-managed data structure that an eBPF program can read and write from inside the kernel, and a user-space process can read and write through the `bpf()` syscall. Common map types include hash, array, LRU hash, per-CPU hash, ring buffer, and program array (used for tail calls). Each map has a maximum capacity declared at creation and a verifier-checked size for keys and values.
</Definition>

[The kernel hash-map documentation](https://docs.kernel.org/bpf/map_hash.html) distinguishes shared and per-CPU variants. The decision between them is one of the consequential design choices in writing real eBPF code.

| Map type | Cross-CPU semantics | Update cost | Memory cost | Best for |
| --- | --- | --- | --- | --- |
| `BPF_MAP_TYPE_HASH` | One value per key, shared across CPUs | Atomic `__sync_fetch_and_add` or `BPF_F_LOCK` spinlock | `max_entries * (key_size + value_size)` | State that must be globally consistent |
| `BPF_MAP_TYPE_PERCPU_HASH` | Separate value slot per CPU | Non-atomic read-modify-write | `max_entries * value_size * num_cpus` | Counters and histograms where rate matters and snapshot consistency does not |
| `BPF_MAP_TYPE_RINGBUF` | Single MPSC ring with global FIFO order | Reservation-spinlock on producer | Fixed buffer | Event streams whose user-space order must match cross-CPU producer order |

The per-CPU variant exists because cache-coherence cost on a contended hash slot dominates the time spent updating it; per-CPU maps remove that contention entirely at the price of cross-CPU consistency. A per-CPU counter on a 96-vCPU host occupies `96 * value_size` bytes per key, but updates are local loads and stores. A shared counter on the same host is `value_size` bytes per key, but every increment is an atomic.
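The trade-off is easy to simulate. This sketch is illustrative JavaScript, not BPF: the size arithmetic mirrors the formulas in the table, and the merge-at-read loop is the pattern user space actually performs against a per-CPU map.

```javascript
const NUM_CPUS = 96, VALUE_SIZE = 8, KEY_SIZE = 16, MAX_ENTRIES = 1024;

// Shared hash: one slot per key; in the kernel every increment is atomic.
const shared = new Map();
function sharedInc(key) { shared.set(key, (shared.get(key) || 0) + 1); }

// Per-CPU hash: one slot per key *per CPU*; increments are plain stores.
const perCpu = Array.from({ length: NUM_CPUS }, () => new Map());
function perCpuInc(cpu, key) {
  const m = perCpu[cpu];
  m.set(key, (m.get(key) || 0) + 1);
}
function perCpuRead(key) {
  // User space merges across CPUs at read time.
  return perCpu.reduce((sum, m) => sum + (m.get(key) || 0), 0);
}

for (let i = 0; i < 1000; i++) {
  sharedInc("vfs_read");
  perCpuInc(i % NUM_CPUS, "vfs_read");
}
console.log(shared.get("vfs_read"), perCpuRead("vfs_read")); // 1000 1000

// The memory asymmetry from the table:
console.log(MAX_ENTRIES * (KEY_SIZE + VALUE_SIZE)); // shared: 24576 bytes
console.log(MAX_ENTRIES * VALUE_SIZE * NUM_CPUS);   // per-CPU values: 786432 bytes
```

Both maps report the same total; the per-CPU variant pays roughly `num_cpus` times the value memory to make the hot-path increment contention-free.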

<Definition term="BPF Ring Buffer (ringbuf)">
A multi-producer single-consumer kernel-to-user transport added in Linux 5.8 and documented at [`docs.kernel.org/bpf/ringbuf.html`](https://docs.kernel.org/bpf/ringbuf.html). Unlike the legacy `perf_event_array` (one ring per CPU), the BPF ringbuf is a single ring shared across all CPUs, with cross-CPU producer ordering preserved in the user-visible record stream.
</Definition>

[The ringbuf documentation](https://docs.kernel.org/bpf/ringbuf.html) is explicit about why the design exists: *"more efficient memory utilization by sharing ring buffer across CPUs; preserving ordering of events that happen sequentially in time, even across multiple CPUs (e.g., fork/exec/exit events for a task)."* A security telemetry consumer that needs to see `fork` on CPU 0 before `kill` on CPU 1 cannot use a per-CPU ring; it needs a single MPSC ring. The trade-off is real: the producer pays a brief spinlock for slot reservation, where a per-CPU ring would pay nothing. For event streams the trade is worth it; for histograms it is not.
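A toy model of why the single ring preserves what per-CPU rings cannot. This is illustrative JavaScript, not kernel code; in the real implementation the global ordering comes from a short spinlock-protected reservation in `bpf_ringbuf_reserve`.

```javascript
// Single MPSC ring: every producer, on any CPU, takes the next global slot.
let globalSeq = 0;
const mpscRing = [];
function ringbufReserveCommit(cpu, ev) {
  // Kernel: brief spinlock to reserve a slot, then lock-free commit.
  mpscRing.push({ seq: globalSeq++, cpu, ev });
}

// A task forks on CPU 0, execs on CPU 1, exits back on CPU 0:
ringbufReserveCommit(0, "fork");
ringbufReserveCommit(1, "exec");
ringbufReserveCommit(0, "exit");

console.log(mpscRing.map((r) => r.ev)); // [ 'fork', 'exec', 'exit' ]
```

With per-CPU rings, CPU 0's ring would hold `fork, exit` and CPU 1's would hold `exec`; the consumer would have to reconstruct the interleaving from timestamps. The single ring makes the cross-CPU order a property of the data structure itself.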

### The aggregation pattern

The reason eBPF is strictly more powerful than ETW is captured in one bpftrace one-liner. The DSL [`bpftrace`](https://github.com/iovisor/bpftrace) -- inspired explicitly by DTrace -- compiles a single-line query into a verified eBPF program:

```bpftrace
kprobe:vfs_read { @[comm] = hist(arg2); }
```

This program attaches to the `vfs_read` kernel function. For every call, it indexes a per-CPU map by the calling process's name (`comm`), buckets the `arg2` value (the read length) into a power-of-two histogram, and increments the bucket. Nothing crosses the kernel-user boundary while `vfs_read` is firing -- not at 10K calls per second, not at 10M. When the user hits Ctrl-C, bpftrace iterates the per-CPU maps from user space, merges the buckets across CPUs, and prints a histogram.

ETW cannot do this. To produce the same histogram with ETW, a consumer would have to subscribe to every `vfs_read`-equivalent kernel event, receive each one in user mode, compute its bucket, and update an in-process histogram. The kernel-user wire would carry the full firehose. eBPF carries only the final histogram.

<RunnableCode lang="js" title="The bpftrace histogram pattern in pseudocode">{`
// The bpftrace one-liner:
//   kprobe:vfs_read { @[comm] = hist(arg2); }
// lowers (conceptually) to this kernel-side and user-side flow.

// --- inside the kernel, at every vfs_read call ---
function on_vfs_read(ctx) {
  const comm = bpf_get_current_comm();
  const len  = ctx.regs.rdx;                  // arg2: read length (3rd arg = rdx)
  const bucket = log2(len);                   // 0..63

  // per-CPU hash keyed by (comm, bucket); no cross-CPU atomics.
  const key = { comm, bucket };
  const slot = percpu_map.lookup_or_init(key, { value: 0 });
  slot.value += 1;
}

// --- in user space, on Ctrl-C ---
function print_histogram() {
  const merged = {};
  for (const cpu of all_cpus) {
    for (const [key, count] of percpu_map.iter(cpu)) {
      merged[key] = (merged[key] || 0) + count;
    }
  }
  render_power_of_two_histogram(merged);
}
`}</RunnableCode>

The kernel-side per-event cost is a few instructions plus a non-atomic increment. The user-space cost is paid once, at print time. The wire between kernel and user carries one batch read of the entire per-CPU map. ETW's equivalent would carry every single `vfs_read` event in full.

### The instruction-count and complexity limits

Two distinct limits constrain what the verifier will accept. The constants are easy to confuse, and earlier drafts of this article confused them. The correct distinction comes straight from the kernel headers.

`BPF_MAXINSNS` is defined as 4096 in `include/uapi/linux/bpf_common.h`. This is the maximum number of bytecode instructions per program for unprivileged callers. A program longer than 4096 instructions is rejected at load time regardless of what the verifier finds.

`BPF_COMPLEXITY_LIMIT_INSNS` is defined as 1,000,000 in `kernel/bpf/verifier.c`. This is the maximum number of instructions the verifier will *process* during its symbolic execution, counting every revisit of an instruction along a different path. It applies to privileged callers with `CAP_BPF`, who are allowed to load larger programs but still bound the cost of verifying them.

<Sidenote>The two limits answer different questions. `BPF_MAXINSNS = 4096` bounds the *size* of an unprivileged program. `BPF_COMPLEXITY_LIMIT_INSNS = 1,000,000` bounds the *cost* of verification for privileged programs. Conflating them is a common error: production EDRs run with `CAP_BPF` plus `CAP_PERFMON` or root and load programs much longer than 4096 instructions, but the verifier's exploration is still bounded.</Sidenote>

[Linux 5.16 (January 2022)](https://www.kernel.org/doc/html/latest/bpf/index.html) made `kernel.unprivileged_bpf_disabled` default on, via the `BPF_UNPRIV_DEFAULT_OFF` Kconfig option.<Sidenote>The change followed a series of verifier soundness CVEs, including CVE-2020-8835 and CVE-2021-3490, that were exploitable from unprivileged user space. Production EDRs run with `CAP_BPF` plus `CAP_PERFMON` or full root; the unprivileged path is reserved for sandboxed workloads where the kernel team has weighed the risk.</Sidenote>

### The JIT and the trampoline

[Brendan Gregg's *BPF Performance Tools*](https://www.brendangregg.com/bpf-performance-tools-book.html), published by Addison-Wesley in 2019 ([ISBN-13 9780136554820](https://www.pearson.com/en-us/subject-catalog/p/bpf-performance-tools-linux-and-application-observability/P200000007897/9780136554820)), reports a 10x to 12x speedup of the JIT over the interpreter on x86-64. The figure is approximate -- the workload, the kernel version, and the program shape all matter -- but the order of magnitude is consistent across kernel docs and measurements. The JIT is what makes eBPF practically usable inside hot kernel paths.

A second performance refinement landed in 2019 with the BPF trampoline patch series. [Starovoitov's v1 cover letter](https://lore.kernel.org/bpf/20191102220025.2475981-1-ast@kernel.org/) introduced `fentry` and `fexit` -- BPF program attach points that use a tiny JIT-emitted dispatcher to call the attached programs directly, rather than relying on kprobe's `int3` trap mechanism. The framing is worth quoting:

<PullQuote>
"Unlike k[ret]probe there is practically zero overhead to call a set of BPF programs before or after kernel function." -- Alexei Starovoitov, [BPF trampoline cover letter](https://lore.kernel.org/bpf/20191102220025.2475981-1-ast@kernel.org/)
</PullQuote>

[The v3 patch in the same series](https://lore.kernel.org/bpf/20191108064039.2041889-4-ast@kernel.org/) explains the structural reason: *"To avoid the high cost of retpoline the attached BPF programs are called directly."* kprobe goes through an indirect-jump dispatch, which on Spectre-mitigated kernels pays a retpoline penalty per call. The BPF trampoline replaces the indirect jump with a direct call patched in at attach time, eliminating that penalty entirely. The qualitative result is "practically zero overhead" relative to the function call itself. The exact numbers vary; the architectural reason does not.

### Tail calls

`bpf_tail_call(ctx, &prog_array, index)` is a helper that, when the `prog_array` slot at `index` contains a loaded program, replaces the current program's execution context with the target program's. The architecture is documented in the [Cilium BPF architecture reference](https://docs.cilium.io/en/latest/reference-guides/bpf/architecture/), which describes the 33-call nesting ceiling: *"This, too, comes with an upper nesting limit of 33 calls, and is usually used to decouple parts of the program logic, for example, into stages."* The 33-call cap bounds the worst-case execution time of a chain that the verifier cannot symbolically follow (the destination is a runtime-resolved map slot, not a static call target). We will return to the security implications of tail calls in section 7.

> **Key idea:** eBPF inverts the observability model. ETW asks the kernel "what happened?" eBPF asks the kernel "compute this and tell me the answer." The asymmetry is the reason a histogram of `vfs_read` lengths costs nothing on the wire under eBPF, and costs a fully formatted event per call under ETW.

eBPF is strictly more powerful than ETW: programmable filter, programmable aggregation, hooks everywhere. But that power has a cost that does not exist in ETW at all. The verifier.

## 5. The Verifier: Where Mathematics Meets the Kernel

May 2023. NIST publishes [CVE-2023-2163](https://nvd.nist.gov/vuln/detail/CVE-2023-2163). The advisory describes the eBPF verifier in every Linux kernel since 5.4 quietly accepting programs it should have rejected: *"Incorrect verifier pruning in BPF in Linux Kernel >=5.4 leads to unsafe code paths being incorrectly marked as safe, resulting in arbitrary read/write in kernel memory, lateral privilege escalation, and container escape."* The fix was a small correction to a state-pruning heuristic. The lesson is bigger than the patch: *no in-kernel verifier for a Turing-complete instruction set can be simultaneously sound, complete, and decidable.* That is not a bug. It is a theorem.

### Rice's theorem in the kernel

Alan Turing proved in 1936 that the halting problem is undecidable: no algorithm can decide, for every possible program, whether that program halts on every input. Henry Gordon Rice extended the result in 1953: any *non-trivial semantic property* of a program -- including memory safety, type safety, and bounded resource use -- is undecidable for the general case. The verifier has to decide a non-trivial semantic property: *does this eBPF program access kernel memory only through valid pointers, with valid offsets, and terminate?*

It cannot. Not in general. The verifier has to give up at least one of three properties:

- *Soundness* -- never accept an unsafe program.
- *Completeness* -- never reject a safe program.
- *Scalability* -- run in polynomial time on real programs.

<Aside label="Why Rice's theorem applies here">
The halting problem is about a single property: termination. Rice's theorem generalizes the result to all non-trivial extensional properties -- any property that depends on what a program computes rather than how it is written. Memory safety on a Turing-complete instruction set is a non-trivial extensional property: there exist programs that are safe and programs that are unsafe. Rice's theorem says no decision procedure can correctly classify every program. Any real verifier must therefore be an *approximation* -- either it sometimes rejects safe programs (loss of completeness), sometimes accepts unsafe ones (loss of soundness), or runs out of resources on hard inputs (loss of scalability).
</Aside>

[Jia and colleagues at HotOS 2023](https://sigops.org/s/conferences/hotos/2023/papers/jia.pdf) formalized this trilemma for in-kernel verifiers. The paper's title is the thesis: *"Kernel Extension Verification Is Untenable."* The authors argue that any verifier for a kernel extension language with the expressiveness of eBPF must trade off at least one of the three properties, and that real verifiers ship by trading all three approximately.

<PullQuote>
"Kernel Extension Verification Is Untenable." -- Jia et al., HotOS 2023, [`sigops.org/s/conferences/hotos/2023/papers/jia.pdf`](https://sigops.org/s/conferences/hotos/2023/papers/jia.pdf)
</PullQuote>

<Mermaid caption="The soundness-completeness-scalability triangle: a verifier can be at most two of the three.">
flowchart TD
    A[Soundness — never accept — unsafe programs]
    B[Completeness — never reject — safe programs]
    C[Scalability — polynomial time — on real programs]
    A --- B
    B --- C
    C --- A
    X["No verifier can have — all three on a — Turing-complete ISA"]
    A -.-> X
    B -.-> X
    C -.-> X
</Mermaid>

The Linux verifier ships with all three approximately. PREVAIL, the verifier used by eBPF-for-Windows, ships with stronger soundness and weaker completeness. The two designs occupy different points on the triangle, and the difference shows up in production.

### The Linux verifier

[The kernel verifier documentation](https://docs.kernel.org/bpf/verifier.html) describes the algorithm:

> "The safety of the eBPF program is determined in two steps. First step does DAG check to disallow loops and other CFG validation. ... Second step starts from the first insn and descends all possible paths. It simulates execution of every insn and observes the state change of registers and stack."

The state the verifier tracks is a register-state lattice. Each register holds a type from a finite set: `PTR_TO_CTX` (a pointer to the program's context argument), `PTR_TO_MAP_VALUE` (a pointer into a map entry), `PTR_TO_MAP_VALUE_OR_NULL` (the return type of `bpf_map_lookup_elem`, which can be null), `SCALAR_VALUE` (an integer with min/max range), and so on. Each register also has a min/max range that tightens at every operation.

<Definition term="Verifier (eBPF)">
The kernel-side static analyzer that proves termination and memory safety of every eBPF program before load. The Linux verifier is documented at [`docs.kernel.org/bpf/verifier.html`](https://docs.kernel.org/bpf/verifier.html). It uses a register-state lattice plus min/max range tracking and explores all reachable program paths with state pruning to keep the cost manageable.
</Definition>

Consider the canonical pattern: look up a map value, check for null, dereference. Every eBPF tracing program does some version of this.

```c
struct value *v = bpf_map_lookup_elem(&map, &key);   // r0 := PTR_TO_MAP_VALUE_OR_NULL
if (!v) return 0;                                    // branch on r0 == 0
return v->field;                                     // deref r0 + offset(field)
```

The verifier traces both branches. On the taken branch (`r0 == 0`), the type stays nullable, and the program returns. On the not-taken branch, the verifier refines the type from `PTR_TO_MAP_VALUE_OR_NULL` to `PTR_TO_MAP_VALUE` -- the null qualifier is gone, the dereference is bounds-checked against the map's value size, and the program is accepted.

This refinement is exactly the thing that broke in CVE-2023-2163. The bug was not in the dereference logic; it was in the *state pruning* that keeps the verifier's exploration tractable. Once the verifier has visited a program point with a given abstract state, it prunes subsequent visits from different predecessors with "the same" state. CVE-2023-2163 was a case where the pruner treated an incoming state as covered by an already-verified one when it was not: the incoming state could reach values the verified state never included, and those values were never explored. The verifier accepted a program in which a register's true type at a join point did not match the type the verifier had pruned against. The program ran with hidden type confusion. Kernel arbitrary read/write followed.

### PREVAIL, the abstract-interpretation verifier

[PREVAIL](https://github.com/vbpf/ebpf-verifier), published by [Gershuni and colleagues at PLDI 2019](https://vbpf.github.io/assets/prevail-paper.pdf), takes a structurally different approach. Where Linux's verifier is a heuristic abstract interpreter with a discrete type lattice, PREVAIL uses *numerical abstract interpretation* over the *zone domain* plus intervals.

<Definition term="Abstract Interpretation">
A general framework for static analysis, introduced by Patrick and Radhia Cousot in 1977. The analyzer computes over an *abstract domain* -- intervals, zones, polyhedra, octagons -- rather than concrete program states. A safe abstract operation must over-approximate every possible concrete behavior. The soundness of the analysis reduces to the soundness of the abstract domain operations, which can be proved once and reused.
</Definition>

In the zone domain, the abstract state can express *relational* constraints between registers and memory base addresses -- not just "register `r0` is in `[base, base + size)`" but "`r0 - map_base` is in `[0, value_size)`." That extra expressiveness is what lets PREVAIL prove pointer-arithmetic safety more directly than the Linux verifier's case enumeration. Walking the same null-check program:

| Program point | Linux verifier (register lattice) | PREVAIL (zone domain) |
| --- | --- | --- |
| After `bpf_map_lookup_elem` | `PTR_TO_MAP_VALUE_OR_NULL` | r0 in {0} U [base, base+sz) |
| Taken branch (r0 == 0) | refined to NULL | r0 = 0 (equality) |
| Not-taken branch | `PTR_TO_MAP_VALUE` (qualifier dropped) | r0 - base in [0, sz) |
| At deref `v->field` | bounds-checked deref | r0 - base in [off, off+access) |

Both verifiers accept the program. The difference is in the proof strategy. Linux's verifier reasons case-by-case over a finite lattice; PREVAIL reasons numerically over an abstract domain whose soundness is proved once and reused. The PREVAIL paper [(Gershuni et al., PLDI 2019)](https://vbpf.github.io/assets/prevail-paper.pdf) showed that the zone-domain approach is sound and runs in polynomial time per fixed abstract domain.

<Mermaid caption="Abstract states at each program point of the null-check pattern: Linux verifier vs PREVAIL.">
flowchart LR
    A["r0 := bpf_map_lookup_elem"]
    B&#123;"r0 == 0?"&#125;
    C["return 0"]
    D["return r0->field"]
    A --> B
    B -- yes --> C
    B -- no --> D
    A -. "Linux: PTR_TO_MAP_VALUE_OR_NULL — PREVAIL: r0 in &#123;0&#125; U [base, base+sz)" .-> A
    C -. "Linux: NULL — PREVAIL: r0 = 0" .-> C
    D -. "Linux: PTR_TO_MAP_VALUE — PREVAIL: r0 - base in [0, sz)" .-> D
</Mermaid>

The trade-off is concrete. PREVAIL accepts a broader class of programs the Linux verifier rejects (some bounded loops, some longer programs), and rejects others the Linux verifier accepts (Linux's heuristic pruning is more aggressive than zone-domain reasoning in some patterns). The contrast is a *trade*, not a strict ordering. Each verifier is sound with respect to its own abstract domain. The Linux verifier's CVE history is what happens when the domain itself is implemented heuristically rather than from a once-and-for-all soundness proof. The work of [Paul Chaignon](https://pchaigno.github.io/ebpf/2023/09/06/prevail-understanding-the-windows-ebpf-verifier.html) walks through the architectural differences in more detail.

### Four CVEs, one pattern

The Linux verifier has shipped four widely disclosed soundness bugs, each one a case where the verifier accepted a program it should have rejected.

| CVE | Year | Subsystem at fault | Class |
| --- | --- | --- | --- |
| [CVE-2020-8835](https://nvd.nist.gov/vuln/detail/CVE-2020-8835) | 2020 | 32-bit register bounds tracking | Out-of-bounds read/write |
| [CVE-2021-3490](https://nvd.nist.gov/vuln/detail/CVE-2021-3490) | 2021 | ALU32 bitwise-op bounds tracking | Out-of-bounds R/W, local privilege escalation |
| [CVE-2022-23222](https://nvd.nist.gov/vuln/detail/CVE-2022-23222) | 2022 | `*_OR_NULL` type-state tracking | Local privilege escalation via type confusion |
| [CVE-2023-2163](https://nvd.nist.gov/vuln/detail/CVE-2023-2163) | 2023 | Branch-pruning logic | Arbitrary kernel R/W |

The CVE-2020-8835 NVD entry describes a flaw where the verifier *"did not properly restrict the register bounds for 32-bit operations, leading to out-of-bounds reads and writes in kernel memory."* CVE-2021-3490, also reported on the NVD, identifies the same class of bug in the bitwise-operation paths. The CVE-2022-23222 record is tracked across the [SUSE bug](https://bugzilla.suse.com/show_bug.cgi?id=1194765), [Debian DSA-5050](https://www.debian.org/security/2022/dsa-5050), and the [openwall oss-security disclosure thread](https://www.openwall.com/lists/oss-security/2022/01/13/1).

> **Note:** All four CVEs are the same shape: the verifier's abstract state at some program point was *narrower* than the program's true reachable state, so the verifier proved a property that did not hold. Each fix tightened the abstract operation that introduced the narrowing -- range-tracking for the 2020 and 2021 bugs, type-state for 2022, branch pruning for 2023. None of the fixes were "fix the runtime"; they were all "fix the static analysis." That is exactly the shape Rice's theorem predicts: a heuristic abstract interpreter that occasionally drops information at a join point.

> **Key idea:** The verifier is a research-grade static analyzer running as kernel code. When it gets the abstract domain wrong, the safety guarantee is a CVE. ETW does not have this failure mode because ETW does not run user-supplied code in the kernel.

ETW has driver signing as its safety mechanism. eBPF has the verifier. Microsoft's eBPF-for-Windows project asked an interesting question: *what if you want both?*

## 6. eBPF for Windows: The Convergence

On May 10, 2021, Dave Thaler of Microsoft published a blog post announcing a new project. The opening line is the kind of announcement that sounds modest and is not:

<PullQuote>
"Today we are excited to announce a new Microsoft open source project to make eBPF work on Windows 10 and Windows Server 2016 and later." -- Dave Thaler, ["Making eBPF work on Windows"](https://cloudblogs.microsoft.com/opensource/2021/05/10/making-ebpf-work-on-windows/), Microsoft Open Source Blog, May 2021
</PullQuote>

The promise was a near-source-compatible eBPF surface on NT, so that programs and toolchains written for Linux eBPF -- libbpf, bpftool, BCC, clang `-target bpf` -- would work on Windows with minimal change. The architectural surprise, visible only once you read the design docs, is that the Linux design does not port directly. The Windows trust model is different. The Windows code-integrity story is different. The choices Microsoft made reveal which parts of eBPF *are* genuinely portable and which parts are deeply Linux-shaped.

### Three execution modes

The [`microsoft/ebpf-for-windows` README](https://github.com/microsoft/ebpf-for-windows) decomposes the runtime into three modes:

1. *Native eBPF program (preferred, HVCI-compatible).* PREVAIL verifies the bytecode in user mode. On success, the [`bpf2c`](https://github.com/microsoft/ebpf-for-windows/tree/main/tests/bpf2c_tests/expected) tool transliterates each verified BPF instruction to equivalent C, MSVC compiles the C, and the result is a signed `.sys` kernel driver. The signed driver is what gets loaded into the kernel.
2. *JIT compiler.* A user-mode service (`eBPFSvc.exe`) calls the [uBPF](https://github.com/iovisor/ubpf) JIT to produce x64 or ARM64 native code, loaded into the kernel-mode execution context. Disabled on HVCI hosts because dynamic code generation cannot be SiPolicy-signed.
3. *Interpreter.* uBPF's interpreter, debug-only.

The native mode is the architecturally interesting one. It treats eBPF bytecode as a *source language* for a signed-driver compile, not as a target for a kernel-mode JIT. The choice is forced by Windows' kernel-mode security model.

<Definition term="HVCI (Hypervisor-enforced Code Integrity)">
A Windows feature that uses the hypervisor to enforce that only signed code runs in kernel mode. With HVCI on, the kernel will refuse to execute any page that does not match a Code Integrity policy signature. Dynamic code generation -- the kind a JIT does -- is impossible on an HVCI host unless the JIT itself is privileged to bless the pages it produces.
</Definition>

### bpf2c: the literal transliterator

The thing that makes the native pipeline work is `bpf2c`. It takes verified eBPF bytecode and emits portable C that any modern compiler can build into a kernel driver. The transliteration is one bytecode instruction per C statement. [A concrete excerpt from `droppacket_raw.c`](https://raw.githubusercontent.com/microsoft/ebpf-for-windows/main/tests/bpf2c_tests/expected/droppacket_raw.c), the expected output for the XDP-class [`droppacket.c`](https://github.com/microsoft/ebpf-for-windows/blob/main/tests/sample/droppacket.c) sample, shows the shape:

<RunnableCode lang="c" title="bpf2c output for droppacket.c (excerpt)">{`
// Excerpt from microsoft/ebpf-for-windows
//   tests/bpf2c_tests/expected/droppacket_raw.c
// One verified BPF instruction maps to one C statement.

#pragma code_seg(push, "xdp")
static uint64_t
DropPacket(void* context, const program_runtime_context_t* runtime_context)
{
  uint64_t stack[(UBPF_STACK_SIZE + 7) / 8];
  register uint64_t r0 = 0;
  register uint64_t r1 = 0;
  // ... r2 .. r6, r10 declarations ...

  // EBPF_OP_MOV64_REG pc=0 dst=r6 src=r1 offset=0 imm=0
  r6 = r1;
  // EBPF_OP_MOV64_IMM pc=1 dst=r1 src=r0 offset=0 imm=0
  r1 = IMMEDIATE(0);
  // EBPF_OP_STXDW pc=2 dst=r10 src=r1 offset=-8 imm=0
  WRITE_ONCE_64(r10, (uint64_t)r1, OFFSET(-8));

  // ... one C statement per verified BPF instruction ...

  r0 = runtime_context->helper_data[0].address(r1, r2, r3, r4, r5, context);
}
`}</RunnableCode>

<Definition term="bpf2c">
The eBPF-for-Windows transliterator from verified BPF bytecode to portable C suitable for MSVC compilation. The output is a signed-driver source file, one C statement per BPF instruction, that can be compiled and signed through the same pipeline as any other kernel driver. The golden test corpus lives at [`microsoft/ebpf-for-windows/tests/bpf2c_tests/expected`](https://github.com/microsoft/ebpf-for-windows/tree/main/tests/bpf2c_tests/expected).
</Definition>

Four things stand out in the excerpt. *One BPF instruction maps to one C statement*; the `// EBPF_OP_*` comments name the opcode, and the line below it is the equivalent C. The eBPF VM's eleven registers become eleven C `uint64_t` locals; MSVC's optimizer assigns them to native registers in the final `.sys`. The `#pragma code_seg(push, "xdp")` directive names the program section the same way `SEC("xdp")` does on Linux. And helper calls dispatch through a runtime table -- `runtime_context->helper_data[0].address(...)` -- so the signed driver remains portable across helper-ABI changes.

The result is a kernel module that is a signed driver in every Windows sense of the term: HVCI checks pass, Kernel Mode Code Integrity (KMCI) is satisfied, the Authenticode chain validates. eBPF-for-Windows native mode does not invent a new in-kernel trust boundary. It composes with the one Windows already has.

<Mermaid caption="eBPF-for-Windows native mode pipeline: source -> PREVAIL -> bpf2c -> MSVC -> signed .sys driver -> kernel.">
flowchart LR
    A["Restricted C source"]
    B["clang -target bpf"]
    C["BPF bytecode"]
    D["PREVAIL verifier — (user mode)"]
    E["bpf2c — transliterator"]
    F["Portable C"]
    G["MSVC compile"]
    H["Signed .sys driver"]
    I["Windows kernel — (HVCI / KMCI)"]
    A --> B --> C --> D --> E --> F --> G --> H --> I
</Mermaid>

### The verifier moved

The most consequential architectural choice in eBPF-for-Windows is not visible in the binary. PREVAIL does not run inside the kernel. It runs inside the user-mode `eBPFSvc.exe` service, which orchestrates verification and the subsequent compile-and-sign pipeline. The kernel never sees an unverified BPF program. By the time anything enters the kernel, it is either a signed driver (native mode) or a JIT-produced buffer that has already passed verification in user space (JIT mode, on non-HVCI hosts).

This is a deliberate divergence from Linux. Linux runs its verifier inside the kernel because the kernel is the only place that can prevent unprivileged user space from loading unsafe programs. Windows can move the verifier out of the kernel because the kernel-mode trust boundary -- *the thing that can run* -- is already protected by code signing. The verifier becomes a *correctness* check rather than a *safety* check at the kernel boundary; safety at the boundary is enforced by HVCI.

### Hook coverage as of 2026

The hook surface on Windows is narrower than Linux's. As of 2026, eBPF-for-Windows exposes XDP-class network hooks, BIND, SOCK_OPS, and SOCK_ADDR via Windows Filtering Platform callouts, plus process-creation and process-exit hooks via a separate process hook surface. There is no full kprobe surface. There are no LSM-equivalent hooks. [The project README](https://github.com/microsoft/ebpf-for-windows) labels itself "work-in-progress." The networking-subset claim in this article is not marketing softening; it is the actual hook list.

<Aside label="Why the verifier is the portability boundary">
The naive model of cross-OS eBPF says: same bytecode runtime, runs on both kernels. The actual model is more subtle and more interesting.

The bytecode is portable because both verifiers accept the same instruction encoding, now standardized at IETF as [RFC 9669](https://www.rfc-editor.org/rfc/rfc9669.html). The verifier is portable because PREVAIL is an abstract interpreter that does not depend on Linux-specific kernel data structures. The *runtime* is not portable: Linux runs verified bytecode through its in-kernel JIT; Windows transliterates verified bytecode to C and compiles it into a signed driver.

So the cross-platform abstraction is the verifier, not the runtime. PREVAIL is the contract; each OS lifts verified bytecode into its own trust model. Linux trusts the verifier's output enough to JIT it in kernel mode; Windows distrusts in-kernel dynamic code by policy and lifts the verified bytecode out through a signed-driver compile. The portability boundary moved from "same VM" to "same static analysis," and that is the architectural insight that makes the project work.
</Aside>

> **Key idea:** The runtime is not the cross-platform abstraction. The verifier is. PREVAIL is the contract; each OS lifts verified bytecode into its own trust model -- in-kernel JIT on Linux, signed-driver compile on Windows. eBPF-for-Windows is not "same kernel hook, different OS"; it is "same bytecode contract, different OS-specific lifting."

Cross-OS eBPF works for the networking subset today. The general kernel observability case -- arbitrary kprobes, full LSM hooks, deep process introspection -- is still Linux-only because the *hooks themselves* are Linux-internal. eBPF-for-Windows is a real convergence, but it is a *subset* convergence. Section 7 zooms out and compares the two designs across the full set of dimensions practitioners actually use to choose.

## 7. Head-to-Head: Performance and Trust Models

Two designs. One emits, one computes. Practitioners need to know what each one costs, where each one's edges cut, and what attack classes each design enables. The right form for that comparison is a table.

| Dimension | ETW | Linux eBPF | eBPF for Windows | DTrace |
| --- | --- | --- | --- | --- |
| In-kernel filter language | None (level + keyword mask only) | Verified bytecode | Verified bytecode | D scripting language |
| In-kernel aggregation | None | Maps (per-CPU and shared) | Maps | Aggregations primitive |
| Producer per-event cost | Constant: format + memcpy to per-CPU buffer | JIT-compiled native code at hook | JIT or signed-driver call at hook | Probe handler call |
| Verifier | Driver signing only | Linux in-kernel heuristic verifier | PREVAIL in user mode + KMCI | None (D is interpreted, safe-by-construction) |
| Verifier soundness incidents | Not applicable | 4 widely disclosed CVEs (2020-2023) | None disclosed | None |
| Hook coverage | Universal across Windows API surface | Universal: kprobe, uprobe, tracepoint, XDP, TC, LSM, sched | XDP, BIND, SOCK_OPS, SOCK_ADDR, process | Solaris/BSD/macOS provider set |
| Cross-platform | Windows only | Linux only | Source-compatible with Linux subset | Solaris, FreeBSD, macOS (legacy) |
| Transport | Per-CPU ring buffer, .etl files | Ringbuf, perf_event_array, maps | Ringbuf, maps | Per-CPU buffers |
| Trust model | Manifest registration + driver signing | Verifier + CAP_BPF + CAP_PERFMON | Verifier + HVCI + driver signing | Privilege check + safe-by-construction |
| Adoption pattern | Defender, Sysmon, CrowdStrike, SentinelOne, Carbon Black | Cilium, Falco, Tetragon, Tracee, Pixie, Sysmon for Linux | Pre-production; Azure test deployments | Solaris/macOS legacy + bpftrace via inspiration |
| Best suited for | Forensic capture across the entire Windows API surface | Hot-path filtering and aggregation with arbitrary kernel hooks | Cross-platform networking observability | Interactive debugging on Solaris-lineage systems |

### The asymptotic argument

Two designs can be compared asymptotically. ETW carries N events of average size S; the kernel-to-user wire cost is Ω(N·S) -- the unavoidable lower bound for streaming N events. eBPF can reduce that to O(M), where M is the aggregation size, for workloads that aggregate before the events cross the boundary. The bpftrace histogram from section 4 is the concrete example: `vfs_read` can fire ten million times per second while the user-side bandwidth is zero, because the per-CPU histogram never crosses the boundary until print time.

The asymmetry is the entire reason eBPF makes sense for high-frequency telemetry. It is also the reason every cloud-native observability tool from 2018 onward is on eBPF. When the producer rate exceeds the user-space consumption rate, you do not have a choice: you either drop events or aggregate them in-kernel. ETW can drop. Only eBPF can aggregate.

### The tail-call attack class

`bpf_tail_call(ctx, &prog_array, index)` is powerful and its power has structural consequences. From [the BPF trampoline v3 cover letter](https://lore.kernel.org/bpf/20191108064039.2041889-1-ast@kernel.org/), the kernel team is explicit that the trampoline was designed in part as a *replacement* for tail-call-based chaining: *"In many cases it can be used as a replacement for bpf_tail_call-based program chaining."* The motivation is structural -- there are three attack classes implicit in the tail-call mechanism, and the trampoline avoids them.

*Branch-target injection on the tail-call dispatcher.* Pre-mitigation kernels exposed an indirect branch from kernel mode -- the dispatcher selecting its target from a user-controllable `prog_array` index. That is exactly the shape of a Spectre-v2 gadget. Mitigation: retpolined dispatcher and the BPF trampoline replacement that avoids the indirect branch entirely.<Sidenote>The qualitative reason fentry beats kprobe is not a benchmark; it is the avoidance of a retpoline. The v3 patch cover letter spells this out: *"To avoid the high cost of retpoline the attached BPF programs are called directly."* Real numbers vary by microarchitecture, retpoline implementation, and the rest of the kernel-build configuration, but the structural reason is the same on every machine.</Sidenote>

*Recursion-bound bypass.* The 33-call cap protects the verifier's termination proof for a single program from being bypassed by chaining, but it is a per-execution counter. A sequence of attached programs at different attach points can still produce arbitrary aggregate work. The mitigation lives in per-event scheduling, not in the verifier.

*Speculative type confusion.* The verifier proves a single program's register-type invariants. The target of a tail call is selected at runtime from a map, so speculative execution can execute a different program under the calling program's type-state. Mitigation: indirect-call hardening shared with the rest of the kernel.

<Mermaid caption="Tail-call dispatcher as a Spectre-v2 gadget. Mitigation: BPF trampoline plus indirect-call hardening.">
flowchart LR
    A["Calling BPF program"]
    B["bpf_tail_call(ctx, &arr, idx)"]
    C["JIT dispatcher — (indirect jump)"]
    D&#123;"Map slot at idx"&#125;
    E["Target BPF program"]
    F["Speculative path — (wrong target)"]
    G["Retpoline / BPF trampoline — (direct call)"]
    A --> B --> C --> D
    D -- correct --> E
    D -. speculative .-> F
    G -. mitigation .-> C
</Mermaid>

### The ETW user-mode bypass

ETW has its own structural attack class, mentioned in section 3 and worth restating in the trust-model context: because the emission path runs inside the calling process, a process that wants to silence its own ETW emissions can patch `ntdll!EtwEventWrite` to a `ret` instruction in its own address space, and the kernel buffer never sees the event. EDR vendors monitor for this integrity violation out of band, and use the patch itself as a high-confidence detection signal.

> **Note:** The deeper question is whether any user-mode emission primitive can be tamper-resistant under hostile user-mode code. The current answer is "no": the mitigation has been to move the trust boundary into the kernel, via PPL, the kernel-only Threat-Intelligence provider, and (on Linux) LSM hooks that observe `mprotect` and image-load operations directly.
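The out-of-band check EDR vendors run reduces to comparing the in-memory thunk against its known-good prologue. A minimal sketch, in illustrative plain C rather than any vendor's actual scanner; the expected-byte parameter stands in for bytes read from the on-disk `ntdll` image:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative integrity check, not a real EDR. 0xC3 is x86 RET: a bare
 * RET at offset 0 of the EtwEventWrite thunk means the process is
 * silently dropping its own emissions. */
static int looks_patched(const uint8_t *thunk, uint8_t expected_first_byte) {
    if (thunk[0] == 0xC3)                    /* classic ret-patch */
        return 1;
    return thunk[0] != expected_first_byte;  /* any other drift vs on-disk */
}
```

Either branch firing is itself the high-confidence detection signal described above.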

### Trust models, side by side

ETW trusts manifest registration plus Code Integrity for kernel drivers. The kernel only emits events; the only adversary-controllable surface is the user-mode provider, and the integrity-violation tell catches the obvious attack.

Linux eBPF trusts the verifier plus `CAP_BPF` and `CAP_PERFMON`. The verifier is the kernel-mode safety boundary; capabilities gate who can load programs at all. That boundary has repeatedly been the source of soundness CVEs and exploitation paths. Defense in depth: unprivileged eBPF off by default since 5.16, hardening of the indirect-call dispatcher, ongoing verifier work.

eBPF for Windows trusts PREVAIL plus HVCI driver signing. The verifier runs in user mode; the kernel only ever sees a signed driver or a JIT-emitted buffer that has already passed the verifier. The composition is *strictly more conservative* than Linux eBPF, because it stacks the verifier on top of the signing model rather than replacing it. Microsoft is using the Windows kernel-mode trust mechanism *and* adding the eBPF verifier to it, not choosing between them.

The next layer up from the kernel substrate is the consumer layer -- the agents and SIEM pipelines practitioners actually ship. That production stack is what determines which substrate practitioners reach for first.

## 8. Production Adoption: The Agent Layer

The substrate matters because the consumer stack does. On Linux, eBPF is the foundation of every serious cloud-native security and observability project. On Windows, ETW plays the same role. The portable subset is small but real, and it is growing.

### The Linux side

[Cilium](https://cilium.io/) is the dominant eBPF-based networking project, CNCF-graduated and shipping Kubernetes cluster networking, NetworkPolicy enforcement, and a service mesh implementation. [Falco](https://falco.org/), originally created by Sysdig and now CNCF-graduated, provides eBPF-based runtime threat detection driven by a rules engine. [Tetragon](https://tetragon.io/docs/overview/), a Cilium subproject, attaches eBPF programs to kprobes and LSM hooks for in-kernel enforcement -- not just observation but the ability to block. [Tracee](https://github.com/aquasecurity/tracee) from Aqua Security is an eBPF runtime security tool. [Pixie](https://docs.px.dev/), originally Pixie Labs and now under New Relic, uses eBPF for auto-instrumentation of services running in Kubernetes.

[Sysmon for Linux](https://github.com/microsoft/SysmonForLinux) is the most architecturally interesting member of the list. Microsoft, the company that built ETW and Sysmon, ported Sysmon to Linux by replacing the ETW back end with eBPF kprobes via the `SysinternalsEBPF` library. The XML configuration schema and Event IDs are preserved, so SOC analysts see the same channel from either OS. It is the production demonstration that ETW and eBPF can be made surface-equivalent to a consumer.

### The Windows side

[Sysmon](https://learn.microsoft.com/en-us/sysinternals/downloads/sysmon) is the canonical ETW consumer reference design, authored by Mark Russinovich and Thomas Garnier and distributed free by Microsoft. [Microsoft Defender for Endpoint](https://learn.microsoft.com/en-us/defender-endpoint/) is the commercial Microsoft EDR product, ETW-driven and cloud-connected. CrowdStrike Falcon, SentinelOne, and Carbon Black are the major third-party EDRs, all built on ETW. [krabsetw](https://github.com/microsoft/krabsetw) is Microsoft's C++ ETW consumer library; the `Microsoft.Diagnostics.Tracing.TraceEvent` package is the .NET equivalent.

### The toolchain layer

The eBPF world comes with a toolchain that does not have a direct ETW counterpart. [`libbpf`](https://github.com/libbpf/libbpf) is the canonical C library for loading and managing eBPF programs. [`bpftool`](https://github.com/libbpf/bpftool) is the inspection utility. [`BCC`](https://github.com/iovisor/bcc) is the older Python-binding toolkit. [`bpftrace`](https://github.com/iovisor/bpftrace) is the DSL inspired by DTrace. [`cilium/ebpf`](https://github.com/cilium/ebpf) is the Go library; [`aya`](https://github.com/aya-rs/aya) and [`libbpf-rs`](https://github.com/libbpf/libbpf-rs) are the Rust libraries. The toolchain coverage tells you something about the substrate: a Go developer can write an eBPF program and have it loaded by their existing service binary, because the load-verify-attach lifecycle has a Go binding.

ETW has its own toolchain -- `tracerpt.exe`, Windows Performance Analyzer, BenchmarkDotNet, krabsetw -- but the toolchain is shaped around *consuming* events, not around emitting programs into the kernel. The asymmetry of the toolchains mirrors the asymmetry of the substrates.

### The decision guide

<Spoiler kind="solution" label="Which substrate should I pick first?">
**Windows EDR or building on Microsoft Defender for Endpoint.** Use ETW plus Sysmon plus the `Microsoft-Windows-Threat-Intelligence` provider. eBPF for Windows is not yet a substitute for Defender-grade kernel telemetry; the hook surface is too narrow.

**Linux runtime-security or cluster networking.** Use eBPF. Pick `libbpf` or `cilium/ebpf` for the language binding. Attach LSM hooks for enforcement; fentry for observability. The verifier will fight you; that is expected.

**Cross-platform networking observability with one source surface.** Use eBPF for Windows and Linux eBPF together, restricted to the XDP, SOCK_ADDR, SOCK_OPS, and BIND hooks. The Linux source compiles unchanged on Windows for this subset.

**Forensic capture across the full Windows API surface.** Use ETW into `.etl` files, analyzed in Windows Performance Analyzer. Nothing else covers that breadth on Windows.
</Spoiler>

> **Note:** The Sysmon-for-Linux case study is the cleanest practical justification for the abstract-surface convergence. If your SIEM consumes Sysmon XML and matches on Event ID and field, you can run a fleet of Windows hosts on ETW and Linux hosts on eBPF and the SIEM will not know the difference. The substrate is invisible at the consumer's contract; what matters is that the contract is preserved across the back-end change. This is the production realization of the engineering pattern -- different mechanisms, identical schemas -- that the rest of the article has been describing in architectural terms.

The consumer stack has converged at the surface layer: XML configs, Event IDs, EDR vendor APIs. The substrate has not, and the open problems in the next section are what stands in the way.

## 9. Open Problems and the Frontier

What can we not do yet? Four open problems will shape the next five years of kernel observability.

### 9.1 Verifier-driven false rejection

Programs that PREVAIL and a human can both prove safe still get rejected by the Linux verifier, which returns the cryptic *"verifier complexity limit reached"* error. EDR vendors end up fighting the verifier rather than writing the program they want. The workarounds are real and ugly: `__attribute__((noinline))` annotations to force the compiler to emit function boundaries the verifier can prune around, explicit bound assertions that re-derive properties the compiler already knows, `bpf_loop()` to externalize loops the verifier cannot trace. The HotOS 2023 thesis is exactly that this is not a bug -- it is a property of any heuristic verifier under the soundness-completeness-scalability triangle. The completeness leg is the one the Linux verifier gives up first, every time.
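The bound-assertion workaround looks like this in practice. The sketch below is plain C so it compiles as an ordinary function, but the eBPF C is written the same way; `MAP_SLOTS` is a made-up size standing in for a map's `max_entries`.

```c
#include <assert.h>
#include <stdint.h>

/* The explicit-bound idiom, in plain C for illustration. */
#define MAP_SLOTS 256
static uint64_t counters[MAP_SLOTS];

static uint64_t read_slot(uint32_t idx) {
    if (idx >= MAP_SLOTS)    /* re-derives a bound the caller may already guarantee */
        return 0;
    return counters[idx];    /* after the branch, idx carries a proven upper bound */
}
```

The check is what the verifier's register-range tracking consumes: past the branch it knows `idx < 256`, and that proven range is what licenses the array access.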

The frontier here is twofold. On one side, the verifier is becoming more capable: bounded loops, `bpf_for_each_map_elem`, kfuncs, and the trampoline-based attach mechanisms have all expanded what the verifier can prove. On the other side, PREVAIL's polynomial-time abstract-interpretation approach represents an alternative architectural lineage. Neither approach removes the underlying undecidability. Both make the rejection threshold higher.

### 9.2 Cross-OS eBPF ABI

The eBPF Foundation-backed [RFC 9669](https://www.rfc-editor.org/rfc/rfc9669.html), published by the IETF's BPF working group in October 2024, standardized the *instruction set architecture* for BPF programs. The RFC describes the 64-bit ISA, the encoding of instructions, the memory model, and the verifier's basic obligations. It is the cleanest cross-OS contract eBPF has ever had.

What the RFC does *not* standardize: helpers, map types, and hook semantics. Those remain Linux-defined-in-practice. The eBPF-for-Windows helper set is a subset, with extensions for Windows-specific concepts. The FreeBSD and illumos ports have their own subsets. A single observability agent that runs everywhere needs more than a standardized ISA; it needs a standardized helper API and a standardized hook taxonomy. Today, EDR vendors writing cross-OS agents ship two distinct programs that share a build system and not much else.

> **Note:** RFC 9669 is the ISA standard. It defines what BPF bytecode looks like and what the verifier must check. It does not define which helpers a program can call, what the map types are, or what hooks the program can attach to. Those are the parts that vary between Linux, Windows, and the BSDs. Standardizing them is more of a committee problem than a research problem -- a meaningful subset is achievable; a full superset probably is not.

### 9.3 ETW evasion at the trust boundary

The user-mode `EtwEventWrite` patching attack class is roughly 2020-vintage but has not gone away. The kernel-emitted `Microsoft-Windows-Threat-Intelligence` provider is the current best mitigation: kernel signals cannot be patched from user mode, so an attacker who silences user-mode emissions still trips kernel-only signals on `mprotect`, image load, and remote thread creation.

The deeper structural question is whether any user-mode primitive can ever be tamper-resistant under hostile user-mode code. The short answer is no, which is why the mitigations keep moving the trust boundary into the kernel: through PPL, through LSM, through signed drivers. On Linux, the same pattern shows up: hostile-user-mode-resistant telemetry must run inside the kernel, which is why the LSM hooks are the part of the eBPF hook surface that matters most for EDR.

### 9.4 Hot-path overhead at scale

Production environments routinely run Falco, Cilium, and a vendor EDR on the same kernel, each attaching probes to the same hook. The marginal cost of an eBPF kprobe on a five-million-events-per-second syscall is not zero, and the cost compounds non-linearly when three different agents attach to the same hook with three different programs.

The current partial mitigations are real. `fentry`/`fexit` plus the BPF trampoline removed the per-attach trap-frame cost. `kprobe.multi`, added in Linux 5.18, lets a single program attach to multiple functions with one trampoline. BPF-link iteration lets one agent observe what another has attached. But none of these compose perfectly: three different vendors with three different agents end up with three different trampolines on the same function. The structural fix is *trampoline sharing*, and the implementation is attach-type-specific.<Sidenote>The multi-agent attach problem is the eBPF version of a familiar systems issue: when N independent consumers each install their own instrumentation at the same point, the cost is N times the cost of one. Linux has solved this once for kprobes (with `kprobe.multi`) and is solving it again for the BPF trampoline. Whether the same pattern can be made cheap for fentry attaches across LSM hooks is an open implementation question.</Sidenote>

The frontier of kernel observability is not "build a new substrate." It is "make the existing substrates compose under multi-tenant production load."

## 10. Two Generations

Return to the SOC analyst from section 1. The Sysmon Operational channel looks the same on both hosts. Now you know why -- and also why the similarity is a deliberate engineering choice rather than a coincidence.

ETW is mature, has full Windows coverage, is emission-only. It is a *catalog* of events. Every Windows subsystem registers a provider, every provider declares a manifest, every event has a stable schema. A consumer that knows the manifest knows what to expect. The trust boundary is the kernel-mode driver signing model. The cost is that aggregation, sampling, and filtering all happen in user space, after the event has crossed the boundary.

eBPF is programmable, has filter and aggregation in-kernel, has a verifier. It is a *language* for asking questions of the kernel, not a catalog of pre-defined answers. The trust boundary is the verifier, which is a research-grade static analyzer running as kernel code. Linux's verifier shipped four widely-disclosed soundness bugs in four years. PREVAIL trades completeness more heavily to buy a stronger soundness story. The trade-offs are not finished.

eBPF-for-Windows is the convergence experiment. The native mode -- PREVAIL plus `bpf2c` plus MSVC plus a signed `.sys` driver -- is the first cross-OS-portable kernel-observability primitive. As of 2026 it covers a networking subset of hooks, not the full Linux surface. That gap is not architectural; it is a list of hooks Microsoft has not yet exposed. The pattern is generalizable: cross-OS observability lives in the verifier, not in the runtime, and each OS lifts verified bytecode into its own trust model.

The generation gap is literal. ETW (2000) is an event bus. eBPF (2014) is a programmable kernel substrate. Both will still ship in 2035. Both will still be the right answer for some workloads. The interesting work for the next decade is in the convergence layer -- helper-API standardization, hook-point taxonomy alignment, verifier completeness -- and in the multi-tenant production engineering that makes ten different agents on one kernel cheaper than ten times one agent.

> **Key idea:** Kernel observability has matured from event emission to programmable kernel computation. That generation gap is why eBPF-for-Windows -- a small, work-in-progress project -- is one of the more architecturally significant operating-system-telemetry events of the last decade. The portable abstraction is not the runtime. It is the static analyzer.

<FAQ title="Frequently asked questions">

<FAQItem question="Is eBPF replacing ETW on Windows?">
No. As of 2026, [eBPF for Windows](https://github.com/microsoft/ebpf-for-windows) covers a networking-heavy subset of hooks -- XDP, BIND, SOCK_OPS, SOCK_ADDR, and process creation and exit -- and is not yet a substitute for Defender-grade kernel telemetry. ETW remains the canonical Windows observability substrate. The convergence between the two is real for the networking subset, and is the work-in-progress for the rest of the surface.
</FAQItem>

<FAQItem question="Why does Linux's eBPF verifier have soundness CVEs?">
Because it is a heuristic abstract interpreter on a Turing-complete ISA, and Rice's theorem says no such verifier can be simultaneously sound, complete, and decidable. Real verifiers approximate all three, and the soundness leg fails first when state pruning loses information at a join point. [CVE-2023-2163](https://nvd.nist.gov/vuln/detail/CVE-2023-2163), [CVE-2022-23222](https://nvd.nist.gov/vuln/detail/CVE-2022-23222), [CVE-2021-3490](https://nvd.nist.gov/vuln/detail/CVE-2021-3490), and [CVE-2020-8835](https://nvd.nist.gov/vuln/detail/CVE-2020-8835) are all instances of that pattern.
</FAQItem>

<FAQItem question="Can I write one observability agent for Linux and Windows?">
For the networking subset (XDP, SOCK_ADDR, SOCK_OPS, BIND), yes -- [eBPF for Windows](https://github.com/microsoft/ebpf-for-windows) is source-compatible with Linux eBPF for those hooks. For arbitrary kprobes or LSM hooks, no -- those hooks are Linux-internal and eBPF for Windows does not expose equivalents. Cross-platform agents typically ship two binaries that share a build system.
</FAQItem>

<FAQItem question="Is unprivileged eBPF safe to leave enabled?">
Since [Linux 5.16 (January 2022)](https://www.kernel.org/doc/html/latest/bpf/index.html), unprivileged eBPF is disabled by default via `kernel.unprivileged_bpf_disabled`. Production EDRs run with `CAP_BPF` plus `CAP_PERFMON` or root. Leaving unprivileged eBPF enabled was the entry point for several verifier CVEs, so the conservative default is correct.
</FAQItem>

<FAQItem question="What's the difference between kprobe and fentry?">
A kprobe is a runtime breakpoint mechanism: the kernel patches a trap instruction at the target address, and the trap handler invokes the attached eBPF program. fentry uses [the BPF trampoline](https://lore.kernel.org/bpf/20191102220025.2475981-1-ast@kernel.org/) -- a small JIT-emitted dispatcher that calls attached BPF programs with a direct call, avoiding the retpoline penalty an indirect dispatch would pay on Spectre-mitigated kernels. Starovoitov's framing: *"practically zero overhead"* for fentry, relative to the kprobe trap-frame cost.
</FAQItem>

<FAQItem question="Does ETW have any programmable filter at all?">
No. ETW sessions filter by provider, keyword, and level. That is it. Any per-event computation -- counting, sampling, stack-trace folding, downsampling -- runs in user mode on the consumer side, after the event has crossed the kernel-user boundary. The lack of an in-kernel filter language is the structural reason eBPF can do things ETW cannot, like aggregate ten million `vfs_read` calls per second into a histogram without saturating the wire.
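The aggregation in question reduces to bumping a power-of-two bucket per event and shipping only the bucket table. A plain-C model of the idea, not loadable eBPF; the 64-slot size is illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* Model of in-kernel aggregation: instead of emitting one event per call,
 * bump a power-of-two latency bucket at the hook site. Only the 64-slot
 * table ever crosses to user space. */
#define SLOTS 64
static uint64_t hist[SLOTS];

static unsigned log2_slot(uint64_t ns) {
    unsigned slot = 0;
    while (ns >>= 1)
        slot++;                      /* floor(log2(ns)); 0 for ns <= 1 */
    return slot < SLOTS ? slot : SLOTS - 1;
}

static void record(uint64_t latency_ns) {
    hist[log2_slot(latency_ns)]++;   /* fixed per-event cost, no emission */
}
```

The consumer reads 64 counters per interval instead of millions of serialized events per second.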
</FAQItem>

<FAQItem question="How does Sysmon for Linux work without ETW?">
[Sysmon for Linux](https://github.com/microsoft/SysmonForLinux) replaces the ETW back end with eBPF kprobes via Microsoft's `SysinternalsEBPF` library. The XML configuration schema, Event IDs, and Operational channel output are preserved, so a SIEM consumer sees identical telemetry from either OS. It is the production demonstration that ETW and eBPF can be made surface-equivalent to a consumer.
</FAQItem>

</FAQ>

<StudyGuide slug="ebpf-vs-etw-two-generations-of-kernel-observability" keyTerms={[
  { term: "ETW", definition: "Event Tracing for Windows. The Windows 2000-onward kernel-mediated event bus, with providers, sessions, consumers, and per-CPU ring buffers." },
  { term: "eBPF", definition: "Extended Berkeley Packet Filter. A safe, sandboxed kernel virtual machine introduced in Linux 3.18 (2014) that runs verified user-supplied bytecode at attached hook points." },
  { term: "Verifier", definition: "The kernel-side static analyzer that proves termination and memory safety of every eBPF program before load. The Linux verifier uses a heuristic register-state lattice; PREVAIL uses zone-domain abstract interpretation." },
  { term: "BPF Map", definition: "A kernel-managed key-value store accessible from inside an eBPF program and from user space. Types include hash, array, per-CPU hash, and ring buffer." },
  { term: "Ringbuf", definition: "The BPF ring buffer map type (Linux 5.8). A multi-producer single-consumer transport that preserves cross-CPU event ordering." },
  { term: "HVCI", definition: "Hypervisor-enforced Code Integrity. The Windows feature that uses the hypervisor to enforce kernel-mode code signing. Blocks dynamic kernel-mode code generation by default." },
  { term: "PREVAIL", definition: "The user-mode eBPF verifier used by eBPF for Windows. Based on numerical abstract interpretation over the zone domain plus intervals, with formal grounding in Gershuni et al. PLDI 2019." },
  { term: "bpf2c", definition: "The eBPF-for-Windows transliterator that emits portable C from verified BPF bytecode, one C statement per BPF instruction. The C is compiled by MSVC into a signed .sys driver." }
]} questions={[
  { q: "Why did performance counters fail for security telemetry?", a: "Three structural reasons: sampling-rate floor (counters aggregate at the consumer's query rate, hiding individual events), no event identity (a count tells you N happened, not which user did what), and no causal order (two counters sampled in sequence are not causally ordered with respect to the events they describe)." },
  { q: "What three properties does the soundness-completeness-scalability triangle say a verifier can't have all of?", a: "Soundness (never accept an unsafe program), completeness (never reject a safe program), and scalability (run in polynomial time on real programs). Rice's theorem implies no decision procedure for a non-trivial semantic property on a Turing-complete ISA can have all three. Real verifiers must trade off." },
  { q: "How does eBPF for Windows lift verified bytecode into the Windows kernel?", a: "In native mode, PREVAIL verifies the bytecode in user space. On success, the bpf2c tool transliterates each verified BPF instruction to one C statement, MSVC compiles the C to a signed .sys kernel driver, and the kernel loads the driver through the standard Authenticode / HVCI / KMCI signing pipeline." },
  { q: "Name two structural attack-class implications of bpf_tail_call.", a: "Branch-target injection on the tail-call dispatcher (an indirect jump from kernel mode selecting its target from a user-controllable map slot is a Spectre-v2 gadget) and speculative type confusion (the verifier proves a single program's register types, but a tail call's target is a runtime-resolved map slot, so speculative execution can run a different program under the wrong type-state)." }
]} />
