# Control Flow Integrity on Windows: CFG, XFG, and the CET Shadow Stack

> Three generations of control-flow integrity on Windows -- the CFG bitmap (2014), the XFG prototype-hash (never fully shipped), and the Intel CET shadow stack (2020). Why each shipped, what each closes, and what the ~70% memory-safety statistic still leaves open.

*Published: 2026-05-12*
*Canonical: https://paragmali.com/blog/control-flow-integrity-on-windows-cfg-xfg-and-the-cet-shadow*
*License: CC BY 4.0 - https://creativecommons.org/licenses/by/4.0/*

---
<TLDR>
Windows ships three generations of control-flow integrity in 2026. **CFG** (Control Flow Guard, 2014) is a per-process bitmap of valid indirect-call targets, two bits per 16 bytes of address space. **XFG** (eXtended Flow Guard, announced 2019) refines CFG with a 64-bit per-prototype hash stored eight bytes before each function entry, but was never fully instrumented in shipping Windows and is now deprecated. **Intel CET** (Tiger Lake silicon, September 2, 2020) adds a CPU-managed shadow stack and an `ENDBR64`-based indirect branch tracker; Windows uses only the shadow-stack half. User-mode shadow stack is default-on for `/CETCOMPAT`-marked binaries on CET-capable hardware. Kernel-mode shadow stack is **off** by default on Windows 11 24H2 and Windows Server 2025, requires Virtualization-Based Security plus Hypervisor-enforced Code Integrity, and must be enabled explicitly. None of these mitigations close the data-only attack class identified by Hu and colleagues in 2016, and roughly 70% of CVEs Microsoft issued between 2006 and 2018 were memory-safety bugs the entire CFI stack cannot prevent.
</TLDR>

## 1. One Status Code, Two Processes

Open PowerShell on a Windows 11 24H2 machine and run `Get-ProcessMitigation -Name notepad.exe`. Run it again with `-Name msedge.exe`. Three rows will be different: `CFG.StrictMode`, `UserShadowStack.Enable`, and `UserShadowStack.StrictMode`. The same operating system, on the same hardware, is applying different control-flow integrity contracts to two different processes. This article is the answer to *why*.

To make the question concrete, here is the failure mode each contract is trying to prevent. Crash a process with a deliberately corrupted indirect-call target and Windows reports `STATUS_STACK_BUFFER_OVERRUN` (0xC0000409). Dig into the fast-fail subcode and you find `FAST_FAIL_GUARD_ICALL_CHECK_FAILURE`, value `0x0A` in `winnt.h`. That is the canonical CFG fast-fail.

Trip a corrupted return address while user-mode Hardware-enforced Stack Protection is active and the same status code fires with a CET-specific subcode, this time raised by the CPU's `#CP` (Control Protection) exception rather than by a compiler-inserted check thunk [@ms-hsp-techcommunity].

<Sidenote>`FAST_FAIL_GUARD_ICALL_CHECK_FAILURE` is defined in the Windows SDK `winnt.h` header with value `0x0A`. The Microsoft Learn CFG primary documents the runtime check-thunk path that routes the fault [@ms-learn-cfg].</Sidenote>

<RunnableCode lang="js" title="A/B test: querying per-process CFI state">{`
// Emulates Get-ProcessMitigation -Name notepad.exe vs msedge.exe
// In real PowerShell this calls GetProcessMitigationPolicy() under the hood.

const policies = {
  'notepad.exe': {
    CFG:  { Enable: 'ON',  StrictMode: 'OFF' },
    USS:  { Enable: 'OFF', StrictMode: 'OFF', AuditEnable: 'OFF' }
  },
  'msedge.exe': {
    CFG:  { Enable: 'ON',  StrictMode: 'ON' },
    USS:  { Enable: 'ON',  StrictMode: 'ON',  AuditEnable: 'OFF' }
  }
};

for (const [proc, p] of Object.entries(policies)) {
  console.log(proc);
  console.log('  CFG.Enable                  :', p.CFG.Enable);
  console.log('  CFG.StrictMode              :', p.CFG.StrictMode);
  console.log('  UserShadowStack.Enable      :', p.USS.Enable);
  console.log('  UserShadowStack.StrictMode  :', p.USS.StrictMode);
}
`}</RunnableCode>

The three rows that differ are the three generations of Windows CFI. CFG arrived first, in November 2014 on Windows 8.1 Update 3. Five years later, in 2019, Microsoft announced its prototype-hash refinement called eXtended Flow Guard. Roughly five years after that, an academic measurement in 2024 and a Black Hat retrospective in 2025 confirmed XFG was never fully shipped. In between, in September 2020, Intel shipped the first commercial silicon with a hardware shadow stack, and Microsoft routed `#CP` faults from that silicon into the same `STATUS_STACK_BUFFER_OVERRUN` channel. Each generation closes a different attack class. The status code is the lens.

To understand why three generations exist, we have to start where the attacker did: in 1996, with a stack and a buffer.

<Mermaid caption="Three generations of Windows CFI on a 2026 release timeline">
timeline
    title Three generations of Windows control-flow integrity
    1996 : Aleph One Phrack 49-14 : Stack smashing tutorial
    1997 : Solar Designer BugTraq : Return-into-libc
    2004 : Windows XP SP2 ships DEP
    2005 : Abadi et al. name CFI
    2007 : Shacham CCS 2007 : ROP
    2011 : Bletsch et al. ASIACCS 2011 : JOP
    2014 : Windows 8.1 Update 3 ships CFG
    2015 : Schuster et al. IEEE S&P 2015 : COOP
    2016 : Hu et al. IEEE S&P 2016 : DOP
    2016 : Intel publishes CET spec
    2019 : Weston announces XFG at BlueHat Shanghai
    2020 : Tiger Lake ships CET silicon : AMD Zen 3 ships compatible shadow stack
    2024 : WOOT 2024 measures Windows CFI coverage
    2025 : McGarr BHUSA : XFG never fully instrumented
</Mermaid>

## 2. The Attack That Started Everything

Aleph One, the pseudonym of Elias Levy (BugTraq moderator and later CTO of SecurityFocus), sat down in November 1996 and wrote Phrack Magazine Volume 7, Issue 49, File 14 of 16: *Smashing The Stack For Fun And Profit* [@aleph1-1996-phrack]. The tutorial is meticulous. Levy walks the reader from C's stack-frame layout through `gets()` and `strcpy()` to a working shellcode payload that overflows a fixed-size automatic buffer, overwrites the saved return address, and redirects `ret` to attacker-supplied instructions in the same buffer.

It is the first widely-distributed step-by-step exposition of stack-buffer-overflow exploitation. Every Windows control-flow-integrity story has to recap it, because every later defense is a reaction to the bug class it demonstrated.

<Sidenote>"Aleph One" is the pen name of Elias Levy, BugTraq mailing-list moderator from 1996 to 2001 and later CTO of SecurityFocus. The pseudonym refers to aleph-one, the first uncountable cardinal in Cantor's set theory (the second transfinite cardinal, after aleph-null) [@wiki-elias-levy].</Sidenote>

The natural question the 1996 article raises is the one every defender after Levy had to answer. If overflowing a stack buffer can rewrite the return address, what stops the attacker from making `ret` point anywhere they want?

For a decade the answer was "almost nothing." Researchers prototyped stack-canary schemes (StackGuard, then `/GS` in Visual C++ 2002) and proposed compiler-rewriting defenses, but the fundamental shift waited for hardware [@ms-learn-gs].

On September 23, 2003, AMD shipped the Athlon 64 with the NX bit -- bit 63 of the AMD64 page-table entry, marketed under the label "Enhanced Virus Protection." Intel followed with the XD bit on Prescott-based Pentium 4 in 2004 [@wiki-nx-bit]. With per-page no-execute enforcement in silicon, an operating system could finally mark data pages as non-executable and refuse to dispatch a `jmp` into the stack. Windows XP Service Pack 2, on August 6, 2004, was the first mainstream OS to enable hardware-enforced Data Execution Prevention by default for system binaries on NX-capable CPUs [@wiki-xp-sp2].

DEP did exactly what it advertised. It also broke the attacker's model. If data pages cannot execute, no amount of clever shellcode injection helps -- the bytes in the buffer simply will not run. The next move belongs to Solar Designer.

On August 10, 1997 -- seven years before DEP shipped -- Alexander Peslyak, posting as Solar Designer on the BugTraq mailing list, published the first public exploit demonstrating *code reuse*: overflow the buffer, redirect `ret` not to attacker-supplied shellcode but to the entry of an existing libc function such as `system()`, and hand-craft the stack so the function's arguments come from attacker-controlled data [@solar-1997-bugtraq]. Solar Designer was prescient. In the same post he observed that this method "might sometimes be better than usual one (with shellcode) even if the stack is executable."

If the data pages cannot execute, reuse the code already there. That single sentence is the structural premise of every code-reuse attack from that day forward.

<Definition term="Control-Flow Integrity (CFI)">
A static safety property defined by Abadi, Budiu, Erlingsson and Ligatti in 2005: every indirect control-flow transfer at runtime must follow an edge in a precomputed static control-flow graph of the program. CFI as a contract makes no claim about *data* integrity -- only about the targets of `call`, `jmp` through a register, and `ret`. Modern Windows mitigations (CFG, XFG, CET) each implement part of this contract.
</Definition>

By the mid-2000s the structural answer was overdue. In November 2005, at the 12th ACM Conference on Computer and Communications Security, Martin Abadi, Mihai Budiu and Ulfar Erlingsson (all at Microsoft Research Silicon Valley) together with Jay Ligatti named the contract: *Control-Flow Integrity* [@abadi-2005-mr]. Their paper's abstract states the thesis plainly: "enforcement of a basic safety property, Control-Flow Integrity (CFI), can prevent such attacks from arbitrarily controlling program behavior."

Every indirect control-flow transfer at runtime must follow an edge in a precomputed static control-flow graph. The paper demonstrated a prototype binary rewriter that placed a unique ID-check label before every indirect-call target and inserted a label-comparison stub before every indirect call and return, refusing to dispatch unless the labels matched. Benchmarks reported 16% average overhead -- impractical for production, but the contract was now formal.
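
The label-comparison scheme described above can be sketched as a toy model. This is not the paper's binary rewriter: functions are plain objects, `cfiLabel` is a hypothetical field standing in for the ID placed before each target, and `checkedIndirectCall` is a hypothetical stand-in for the inserted comparison stub.

```javascript
// Toy model of label-based CFI (not the Abadi et al. rewriter itself).
// Each legal indirect-call target carries a label; each call site knows
// which label it expects. All names here are hypothetical.

function makeTarget(name, cfiLabel, body) {
  return { name, cfiLabel, body };
}

// Stand-in for the comparison stub inserted before an indirect call.
function checkedIndirectCall(expectedLabel, target, ...args) {
  if (!target || target.cfiLabel !== expectedLabel) {
    throw new Error(`CFI violation: ${target ? target.name : 'unknown target'}`);
  }
  return target.body(...args);
}

const LABEL_INT_HANDLER = 0x12345678; // one label per equivalence class
const LABEL_STR_HANDLER = 0x0badf00d;

const double = makeTarget('double', LABEL_INT_HANDLER, (x) => x * 2);
const shout = makeTarget('shout', LABEL_STR_HANDLER, (s) => s.toUpperCase());

// Legal edge: the call site expects an int handler and gets one.
console.log(checkedIndirectCall(LABEL_INT_HANDLER, double, 21));

// Corrupted pointer: same call site, target from the wrong class.
try {
  checkedIndirectCall(LABEL_INT_HANDLER, shout, 'hi');
} catch (e) {
  console.log(e.message);
}
```

The granularity of the scheme is set entirely by how many targets share a label: one label for everything degenerates into a single equivalence class, while one label per prototype narrows the legal set considerably.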

The contract was the easy part. Implementing it took Microsoft another nine years. In the interim, attackers built three generations of code reuse: ROP, JOP and COOP. We have to understand all three before we can read CFG's source listing.

## 3. The Mitigation Stack Before CFI

Between Aleph One's 1996 tutorial and the first Windows CFI shipment in 2014 lies an eighteen-year sequence of defenses that closed every direct attack path one by one. None of them protected indirect transfers. Together they make injecting attacker-controlled instructions uneconomic and force the attacker into code reuse. The attacker's choice in 2007 is no longer "inject" -- it is "reuse."

The pieces matter because each is referenced by name in the CFG documentation. Visual C++ 2002 introduced `/GS`, which inserts a random cookie between local buffers and the saved return address and validates it on function epilogue [@ms-learn-gs]. A contiguous overflow that overwrites the return address must also pass through the cookie, and the runtime check terminates the process before `ret` dispatches. Stack-cookie schemes do not stop the attacker who has a non-contiguous write primitive, but they raise the cost of the canonical exploit Aleph One described.
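
A minimal sketch of the cookie check, assuming a simplified frame layout. The array-based frame, the slot count, and the function names are illustrative assumptions rather than the MSVC implementation; the point is only the difference between a contiguous overflow and a non-contiguous write.

```javascript
// Toy model of a /GS-style stack cookie. The frame layout and names are
// illustrative assumptions, not the MSVC implementation.

const BUF_SLOTS = 4;

function makeFrame(cookie, returnAddress) {
  // Layout, low to high: [buf0..buf3][cookie][saved return address]
  return [...new Array(BUF_SLOTS).fill(0), cookie, returnAddress];
}

// Unchecked copy, like strcpy(): writes past the buffer if input is long.
function unsafeCopy(frame, input) {
  for (let i = 0; i < input.length && i < frame.length; i++) {
    frame[i] = input[i];
  }
}

// Epilogue check: compare the in-frame cookie with the process-wide value.
function epilogue(frame, processCookie) {
  if (frame[BUF_SLOTS] !== processCookie) {
    return 'fast-fail: cookie corrupted';
  }
  return `ret to 0x${frame[BUF_SLOTS + 1].toString(16)}`;
}

// Placeholder value; a real cookie is randomised at process start.
const processCookie = 0x5eed1234;

const benign = makeFrame(processCookie, 0x401000);
unsafeCopy(benign, [1, 2, 3]);
console.log(epilogue(benign, processCookie));

// Contiguous overflow: reaching the return address tramples the cookie.
const smashed = makeFrame(processCookie, 0x401000);
unsafeCopy(smashed, [0x41, 0x41, 0x41, 0x41, 0x41, 0xdeadbeef]);
console.log(epilogue(smashed, processCookie));

// Non-contiguous write: the cookie is skipped, so the check passes.
const skipped = makeFrame(processCookie, 0x401000);
skipped[BUF_SLOTS + 1] = 0xdeadbeef;
console.log(epilogue(skipped, processCookie));
```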

<MarginNote>Vista RTM was build 6000, shipped November 8, 2006 [@wiki-vista]. ASLR was opt-in via `/DYNAMICBASE` and required PE images to be re-linked; pre-Vista binaries continued to load at fixed bases until the developer recompiled.</MarginNote>

DEP, again on Windows XP SP2 in August 2004, paired the NX bit with a per-process OptIn / OptOut / AlwaysOn / AlwaysOff policy surface. ASLR followed two years later in Windows Vista RTM build 6000 on November 8, 2006: image-base, heap, stack and PEB/TEB locations randomised per boot or process, with binaries opting in via the `/DYNAMICBASE` linker flag [@ms-learn-aslr-vista]. The PaX project on Linux had pioneered the technique in July 2001, OpenBSD 3.4 shipped ASLR by default in 2003, and Linux mainline followed in 2005; Vista was the third major OS in the column [@wiki-aslr].

SafeSEH (a linker-side table of legal exception handlers, validated at SEH dispatch) and its runtime sibling SEHOP closed the SEH-overwrite technique David Litchfield formalised in his September 2003 NGSSoftware paper [@ms-learn-safeseh]. That class became canonical in the 2004-2008 browser and Office client-side exploit lineage, distinct from the return-address-overwrite worms (Code Red 2001 against IIS, Slammer 2003 against SQL Server 2000, Blaster 2003 and Sasser 2004 against RPC and LSASS) that motivated DEP and `/GS`.

By late 2007, every direct attack was closed. Stack canaries caught contiguous overflows on epilogue. DEP refused to dispatch into data pages. ASLR forced the attacker to leak a pointer before any hardcoded address would resolve. SafeSEH constrained the SEH chain.

The structural gap each left was the same: *indirect calls and indirect jumps remained unconstrained*. An attacker who could write a corrupted function pointer through any means -- type-confusion, use-after-free, integer-overflow-feeding-allocator -- could still redirect an indirect call to any legitimately-executable byte in the process. Hovav Shacham turned the question inside out. Instead of inventing new instructions, he used the ones already there.

## 4. Three Generations of CFI on Windows

### 4.1 Generation 1: Control Flow Guard

Shacham steps up to a CCS 2007 podium in Alexandria, Virginia, and demonstrates a Turing-complete instruction set discovered *inside* an unmodified libc binary -- `ret`-terminated gadgets at byte offsets the binary's author never intended [@shacham-2007-rop]. The audience now has a name for what is coming: *return-oriented programming*.

The paper, "The Geometry of Innocent Flesh on the Bone: Return-into-libc without Function Calls (on the x86)," constructs a complete shellcode -- load, store, arithmetic, logic, control flow, system calls -- entirely from these gadgets, without injecting a single byte of executable code. The CCS 2017 Test-of-Time Award acknowledges the paper's formative impact a decade later [@ccs-2017-awards]. Wikipedia summarises the canonical arc concisely: "With data execution prevention, an adversary cannot directly execute instructions written to a buffer ... To defeat this protection, a return-oriented programming attack does not inject malicious instructions, but rather uses instruction sequences already present in executable memory, called 'gadgets', by manipulating return addresses" [@wiki-rop].

<Definition term="Gadget">
A short, attacker-useful instruction sequence ending in a control-flow transfer (typically `ret` for ROP, an indirect `jmp` for JOP, or an indirect `call` through a vtable for COOP). The defining property of a ROP or JOP gadget is that it exists *unintentionally* inside a legitimate executable image, at a byte offset the binary's author never planned for the CPU to start decoding; COOP gadgets are the exception, being whole virtual functions invoked out of their intended context. Variable-length x86 instructions make unintended gadgets plentiful: any `0xC3` byte in the middle of a function is a potential `ret`-terminated tail.
</Definition>

<Definition term="Forward edge and backward edge">
The forward edge of a control-flow graph is any indirect transfer whose target is determined at runtime: indirect `call`, indirect `jmp`, virtual dispatch through a vtable, function-pointer call. The backward edge is the `ret` instruction returning to a caller. CFI implementations frequently address only one edge: CFG and XFG check forward edges with a bitmap or a hash; the Intel CET shadow stack checks the backward edge by comparing the popped return address against a CPU-managed parallel copy [@wiki-cfi].
</Definition>

If gadgets are everywhere, how do you stop the attacker from calling them? Microsoft's answer arrived seven years later. On November 18, 2014, Windows 8.1 Update 3 (KB3000850) shipped Control Flow Guard [@ms-learn-cfg]. CFG is a *coarse-grained* forward-edge CFI scheme. The contract is a single equivalence class per process: every address-taken function in any module loaded into the process is a legal indirect-call target; every other byte is not.

The mechanism has four moving parts. First, the MSVC compiler invoked with `/guard:cf` enumerates every function whose address is taken anywhere in the module and emits a check thunk -- `__guard_check_icall_fptr` for "check then return to caller," `__guard_dispatch_icall_fptr` for "check then tail-call dispatch" -- at every indirect call site [@ms-learn-guard-flag]. Second, the linker emits a per-binary Function ID (FID) table inside `IMAGE_LOAD_CONFIG_DIRECTORY` in the PE file, listing the relative virtual addresses of every legal target.

Third, at image load time the Windows loader merges per-module FID tables into a process-wide bitmap (two bits per 16 bytes of code) backed by a kernel-managed, read-only mapping. Fourth, the check thunk indexes the bitmap by the target address; if the bit is clear, the thunk invokes `__fastfail(FAST_FAIL_GUARD_ICALL_CHECK_FAILURE)`, which raises `STATUS_STACK_BUFFER_OVERRUN` and terminates the process.

<Definition term="FID table (Function ID table)">
A PE-file structure inside `IMAGE_LOAD_CONFIG_DIRECTORY` that enumerates the relative virtual addresses of every address-taken function in the binary. The Windows loader's `LdrpProtectAndRelocateImage` routine merges every loaded module's FID table into a single process-wide bitmap. `dumpbin /loadconfig` displays the table's contents and the `CF Instrumented` and `FID table present` flags.
</Definition>

> **Note:** `/guard:cf` alone is a silent no-op. The Microsoft Learn primary states the contract bluntly: "The `/DYNAMICBASE` linker option is also required" [@ms-learn-guard-flag]. Without `/DYNAMICBASE`, the linker omits the FID table entirely; the binary loads, no error fires, and the resulting image is not CFG-protected. Every CFG-aware build of a Windows binary must pass both flags.

The bitmap layout is worth a moment. McGarr's Black Hat USA 2025 *Out of Control* deck documents the state machine in detail [@mcgarr-bhusa25-deck-primary]. Two bits per 16 bytes of address space encode four states: `(0,0)` means no valid target; `(1,0)` means a 16-byte-aligned valid target; `(1,1)` means a non-aligned valid target (function entry is not 16-byte aligned); and `(0,1)` is the suppressed-target marker the loader sets for entries the linker has decided are unsafe.

The arithmetic of the bitmap is what reshaped the Windows 8.1 address space.

<Sidenote>Alex Ionescu walked through the arithmetic in his 2014 writeup: a 128 TB user-mode virtual address space at 16-byte granularity contains 2^43 (roughly 8.8 trillion) possible target slots, and at 2 bits per slot that is a 2 TB bitmap. The memory manager paginates the bitmap sparsely, so resident commit is tiny, but the *reservation* could not coexist with the older 8 TB user-VA layout. CFG is the reason Windows 8.1 went from 8 TB to 128 TB user VA [@alex-ionescu-cfg].</Sidenote>

<Aside label="Why CFG required a 128 TB address space">
Before Windows 8.1 Update 3, the Windows user-mode virtual address space on x64 was 8 TB per process. CFG's bitmap reservation is sized to cover the entire user-mode address space at 16-byte granularity -- one byte of bitmap per 64 bytes of VA. On the new 128 TB layout that is a 2 TB bitmap; had Microsoft instead sized the bitmap for the legacy 8 TB layout it would have been only 128 GB, but they did not. Alex Ionescu walked through the consequence after the November 2014 ship: dropping the 2 TB bitmap sized for the new 128 TB layout into the legacy 8 TB user VA would have cut the usable address space by 25% per process (2 TB / 8 TB) and pushed per-process commit to roughly 4 GB. So the engineering decision Microsoft made was to grow the user VA to 128 TB on the way to landing CFG. The 2 TB bitmap is the largest single contiguous reservation any Windows process has ever made, and most of its bytes are never touched [@alex-ionescu-cfg].
</Aside>
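
The sizing arithmetic above is easy to check by hand or in a few lines. The function name `cfgBitmapBytes` is hypothetical, and "TB" and "GB" are treated as binary units (2^40 and 2^30 bytes), which is the convention the figures above appear to follow.

```javascript
// Reproduces the CFG bitmap sizing arithmetic. Units are binary
// (1 TB = 2^40 bytes, 1 GB = 2^30 bytes); the function name is hypothetical.

const TB = 2 ** 40;
const GB = 2 ** 30;

function cfgBitmapBytes(userVaBytes, granularity = 16, bitsPerSlot = 2) {
  const slots = userVaBytes / granularity; // one slot per 16 bytes of VA
  return (slots * bitsPerSlot) / 8;        // bits -> bytes
}

const newLayout = cfgBitmapBytes(128 * TB);
const legacyLayout = cfgBitmapBytes(8 * TB);

console.log('bitmap for 128 TB user VA:', newLayout / TB, 'TB');
console.log('bitmap for 8 TB user VA  :', legacyLayout / GB, 'GB');
console.log('VA bytes per bitmap byte :', (128 * TB) / newLayout);
console.log('2 TB as share of 8 TB VA :', (newLayout / (8 * TB)) * 100, '%');
```

All values stay below 2^53, so plain JavaScript numbers represent them exactly.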

The first independent reverse-engineering writeup came from Trend Micro in 2015: Jack Tang's *Exploring Control Flow Guard in Windows 10* walks the `ntdll.dll!LdrpCallInitRoutine` path, the per-module `__guard_check_icall_fptr` import resolution, the `MEMORY_BASIC_INFORMATION.Protect & PAGE_TARGETS_INVALID` flag, and the exact bitmap layout in full disassembly [@trend-micro-cfg]. The shape of the field is now a matter of public record.

<Mermaid caption="The CFG build pipeline: compiler enumerates address-taken functions, linker writes the FID table, loader merges into a process bitmap, check thunk consults the bitmap at every indirect call.">
flowchart LR
    A[MSVC compiler with /guard:cf] -->|emit check thunks| B[Object files with FID metadata]
    B --> C[Linker with /DYNAMICBASE]
    C -->|writes| D[FID table in IMAGE_LOAD_CONFIG_DIRECTORY]
    D --> E[PE binary on disk]
    E --> F[Windows loader at image load]
    F -->|LdrpProtectAndRelocateImage| G[Process-wide CFG bitmap]
    G --> H[__guard_check_icall_fptr at every indirect call]
    H -->|bit clear| I[STATUS_STACK_BUFFER_OVERRUN]
    H -->|bit set| J[Dispatch indirect call]
</Mermaid>

The two-bit-per-16-byte state machine deserves a table.

| Two-bit value | Meaning                                                                            |
| ------------- | ---------------------------------------------------------------------------------- |
| `(0, 0)`      | Address is not a legal indirect-call target. Check thunk fast-fails.               |
| `(1, 0)`      | Address is a legal target *and* is 16-byte aligned.                                |
| `(1, 1)`      | Address is a legal target but is *not* 16-byte aligned (function entry is misaligned). |
| `(0, 1)`      | Suppressed target: the linker marked the entry as deliberately invalid.            |
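
The lookup the check thunk performs can be sketched as a toy model. The state names follow the table; the `Map`-backed sparse bitmap, the class, and the method names are hypothetical. One assumption goes beyond the table: because two bits cannot record an offset, the sketch lets a `(1, 1)` slot admit any address inside it.

```javascript
// Toy model of the CFG bitmap check. State names follow the table above;
// the Map-backed bitmap and all function names are hypothetical.

const State = Object.freeze({
  INVALID: '(0,0)',
  VALID_ALIGNED: '(1,0)',
  VALID_UNALIGNED: '(1,1)',
  SUPPRESSED: '(0,1)',
});

const SLOT_SIZE = 16; // one two-bit entry per 16 bytes of address space

class CfgBitmap {
  constructor() {
    this.slots = new Map(); // sparse: absent slots read as INVALID
  }

  // Loader side: register a function entry from a module's FID table.
  markValid(address) {
    const aligned = address % SLOT_SIZE === 0;
    this.slots.set(Math.floor(address / SLOT_SIZE),
      aligned ? State.VALID_ALIGNED : State.VALID_UNALIGNED);
  }

  markSuppressed(address) {
    this.slots.set(Math.floor(address / SLOT_SIZE), State.SUPPRESSED);
  }

  // Check-thunk side: may this indirect call dispatch?
  check(target) {
    const state = this.slots.get(Math.floor(target / SLOT_SIZE)) ?? State.INVALID;
    if (state === State.VALID_ALIGNED) return target % SLOT_SIZE === 0;
    // Assumption: an unaligned-valid slot admits any address inside it.
    if (state === State.VALID_UNALIGNED) return true;
    return false; // INVALID or SUPPRESSED: the real thunk would fast-fail
  }
}

const bitmap = new CfgBitmap();
bitmap.markValid(0x7ff610001000);      // aligned function entry
bitmap.markValid(0x7ff610002008);      // unaligned function entry
bitmap.markSuppressed(0x7ff610003000);

console.log(bitmap.check(0x7ff610001000)); // registered, aligned
console.log(bitmap.check(0x7ff610001004)); // mid-function address, same slot
console.log(bitmap.check(0x7ff610002008)); // registered, unaligned
console.log(bitmap.check(0x7ff610003000)); // suppressed
console.log(bitmap.check(0x7ff610004000)); // never registered
```

Note what the model does not ask: which call site is making the call. Every registered target is legal from every call site, which is the coarse-grained property the rest of this section turns on.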

The Becker, Hollick and Classen *SoK* paper at USENIX WOOT 2024 measured CFG coverage on a Windows 11 Insider Preview developer build 23440 at 97.37% of x64 PE files (only 2.63% unprotected), and 99.09% on `C:\Windows\System32` (0.91% unprotected) [@woot24-becker-pdf]. The mitigation has reached near-universal coverage on the system surface; the gap is in third-party code that has not opted in.

CFG worked, and it broke, in precisely the way 2007-era CFI papers had predicted. The first major bypass came from the JIT side. On October 11, 2016, Microsoft Patch Tuesday MS16-119 shipped a cumulative update to Microsoft Edge. Theori's Frontier Squad followed with a December 13, 2016 writeup describing the bypass it closed [@theori-chakra-jit].

The Chakra JavaScript engine generated native code into a temporary writable buffer, then copied it to executable memory. While the code was in the temporary buffer, an adversary with a write primitive could rewrite the bytes the JIT was about to emit, smuggling attacker-chosen instructions into legitimately CFG-valid territory. JIT-emitted code is, by construction, registered as a valid CFG target through `SetProcessValidCallTargets` [@ms-learn-setprocessvalidcalltargets] -- there is no way to ship a working JavaScript runtime otherwise. CFG cannot tell intended JIT output from substituted JIT output.

<PullQuote>
"CFG didn't offer any granularity over the valid call targets. Any protected indirect call was allowed to call any valid call target. In large binaries, valid call targets could easily be in the thousands, giving attackers plenty of flexibility to bypass CFG by chaining valid C++ virtual functions."
-- Quarkslab, *How the MSVC compiler generates XFG function prototype hashes*
</PullQuote>

The deeper structural bypass arrived first, at IEEE Symposium on Security and Privacy 2015. Felix Schuster and his coauthors at Ruhr-University Bochum and TU Darmstadt published *Counterfeit Object-oriented Programming* -- the COOP attack [@schuster-2015-coop]. Their observation was structural. C++ virtual calls dispatch through vtables. Every vtable entry points at a function whose address has been taken. Every such function is therefore a valid CFG target by construction.

An attacker who can corrupt an object's vtable pointer can chain valid virtual calls and reach Turing-completeness without leaving the CFG-valid set. The CFG bitmap never fires. The check passes every time.

> **Key idea:** CFG asks the same question of every indirect call: is this target's address bit set in the process-wide bitmap? Every C++ virtual method is address-taken. Every vtable entry is in the bitmap. Schuster's COOP attack stays inside the legal set and reaches Turing-completeness without CFG ever firing. To close COOP, the check has to ask a harder question: does this function's signature match the call site's?

### 4.2 Generation 1.5: eXtended Flow Guard

Bletsch and colleagues at NC State and the National University of Singapore published *Jump-Oriented Programming: A New Class of Code-Reuse Attack* at ASIACCS 2011 in Hong Kong [@bletsch-2011-jop]. JOP replaces ROP's `ret`-terminated gadgets with *indirect-jump-terminated* gadgets dispatched by a separate dispatcher gadget, typically an indirect `jmp` that updates a virtual program counter held in a chosen register. The attack defeats any defense that single-targets `ret`.

Schuster's COOP, four years later, defeated CFG itself. By 2019 the natural defensive answer was overdue, and David Weston walked on stage at BlueHat Shanghai 2019 to announce Microsoft's response: *eXtended Flow Guard* [@dwizzle-presentations].

<Definition term="COOP (Counterfeit Object-Oriented Programming)">
A code-reuse attack class identified by Schuster, Tendyck, Liebchen, Davi, Sadeghi and Holz in IEEE S&P 2015 [@schuster-2015-coop]. COOP chains C++ virtual calls dispatched through legitimately-existing vtables. Because every vtable entry is the address of a virtual method whose address has been taken, every dispatch target is a valid CFG bitmap entry by construction. COOP reaches Turing-completeness without ever violating coarse-grained forward-edge CFI. The attack is the structural reason XFG was designed.
</Definition>

If COOP attacks survive CFG by staying inside the legal set, what makes a target's signature legal? XFG's answer is a 64-bit truncated SHA-256 hash of each function's prototype, computed at compile time and stored eight bytes before the function entry. The call site loads the expected hash into `r10`. Dispatch goes through `__guard_dispatch_icall_fptr_xfg`, which reads the eight bytes at `[rax - 8]` and compares them to `r10`. Mismatch raises `STATUS_STACK_BUFFER_OVERRUN`. Quarkslab's 2020 teardown documents the dispatch in detail [@quarkslab-xfg].

<Sidenote>Quarkslab's reverse-engineering shows the XFG dispatch thunk also ORs bit 0 of `r10` before the comparison, a feature that lets the loader downgrade XFG to plain CFG semantics for modules that did not opt into the hash check [@quarkslab-xfg].</Sidenote>

<Mermaid caption="XFG dispatch: the call site loads the expected prototype hash into r10, the thunk reads the actual hash stored eight bytes before the function entry, and the comparison decides whether to dispatch or fast-fail.">
sequenceDiagram
    participant CallSite as XFG-instrumented call site
    participant Thunk as __guard_dispatch_icall_fptr_xfg
    participant Target as Target function entry
    CallSite->>CallSite: load expected hash into r10
    CallSite->>Thunk: call thunk(rax = target)
    Thunk->>Target: read 8 bytes at [rax - 8]
    Thunk->>Thunk: compare with r10
    alt hashes match
        Thunk->>Target: dispatch indirect call
    else mismatch
        Thunk->>Thunk: STATUS_STACK_BUFFER_OVERRUN
    end
</Mermaid>
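
The dispatch sequence can be modelled in a few lines. This is a sketch under stated assumptions: `prototypeHash` is a stand-in (a 64-bit FNV-1a-style hash over a signature string) and not the hash MSVC computes, and `defineFunction` and `xfgDispatch` are hypothetical names for the "hash before the entry" layout and the thunk.

```javascript
// Toy model of XFG-style dispatch. prototypeHash() is a stand-in
// (64-bit FNV-1a-style over a signature string); MSVC derives its hash
// differently. All names here are hypothetical.

function prototypeHash(signature) {
  let h = 0xcbf29ce484222325n;
  for (const ch of signature) {
    h ^= BigInt(ch.codePointAt(0));
    h = (h * 0x100000001b3n) & 0xffffffffffffffffn; // keep 64 bits
  }
  return h;
}

// Each function stores its hash "before the entry", mirroring [rax - 8].
function defineFunction(name, signature, body) {
  return { name, hashBeforeEntry: prototypeHash(signature), body };
}

// Stand-in for the dispatch thunk: compare, then dispatch or fast-fail.
function xfgDispatch(expectedHash, target, ...args) {
  if (target.hashBeforeEntry !== expectedHash) {
    throw new Error(`XFG fast-fail: ${target.name} has the wrong prototype`);
  }
  return target.body(...args);
}

const area = defineFunction('Shape::area', 'double(Shape*)', () => 12.5);
const perimeter = defineFunction('Shape::perimeter', 'double(Shape*)', () => 15);
const setOwner = defineFunction('File::setOwner', 'void(File*, const char*)', () => 'owner changed');

// The call site was compiled expecting double(Shape*).
const expected = prototypeHash('double(Shape*)');

console.log(xfgDispatch(expected, area));      // intended target
console.log(xfgDispatch(expected, perimeter)); // same prototype: still allowed
try {
  xfgDispatch(expected, setOwner);             // COOP-style hop across types
} catch (e) {
  console.log(e.message);
}
```

The second call is the interesting one: functions that share a prototype remain interchangeable, so the hash shrinks the legal target set without reducing it to a single function.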

The toolchain is narrower than CFG's. The `/guard:xfg` flag shipped in Visual Studio 2019 Preview 16.5 [@mcgarr-examining-xfg]. Connor McGarr's 2020 *Examining XFG* writeup is the canonical practitioner-side reference, documenting the thunk, the hash placement, and the build contract. Upstream LLVM and Clang shipped no equivalent.

Critically, `/guard:xfg` is *not documented* on the Microsoft Learn `/guard` page, which lists only `/guard:cf` and `/guard:cf-` [@ms-learn-guard-flag]. That documentation absence is a leading indicator.

> **Note:** The `/guard:xfg` compiler flag is documented only in third-party reverse-engineering writeups, never on the canonical Microsoft Learn `/guard` page [@ms-learn-guard-flag]. Upstream LLVM and Clang have no equivalent XFG-instrumentation pass. The closest Linux peer is Sami Tolvanen's kCFI work, which shipped a 32-bit prototype hash in Linux 6.1 (December 2022) [@lwn-kcfi].

<Aside label="Why XFG is documented nowhere on Microsoft Learn">
The Microsoft Learn `/guard` flag page documents `/guard:cf` and the negative form `/guard:cf-` in full prose, including the `/DYNAMICBASE` requirement. It does not name `/guard:xfg`, and a search across Microsoft Learn turns up only product-blog entries and Visual Studio release notes, never canonical reference documentation for the flag's semantics. For a feature whose mechanics are public and whose tooling has shipped in MSVC since Visual Studio 2019 Preview 16.5, this is an unusual absence. The signal it sends to internal Microsoft developers and to ISV partners writing CFI-aware code is the same: XFG is not a product-graduated feature, and code should not rely on it. Quarkslab and McGarr have done the work Microsoft did not.
</Aside>

The WOOT 2024 paper is the empirical measurement. Becker, Hollick and Classen analysed the Windows 11 Insider Preview developer build 23440 and reported the numbers verbatim in Table 4: 85.73% of executables carry XFG instrumentation, 85.70% of DLLs, 97.04% of `C:\Windows\System32` DLLs, with a geometric-mean equivalence-class size of 1.37 [@woot24-becker-pdf] [@woot24-becker-abstract].

Translation: on Insider Preview builds, the OS-side coverage is high (effectively all of `System32`), but the 14% gap on executables outside the system directory is in third-party code that has not adopted `/guard:xfg`. The geometric-mean equivalence class of 1.37 means the hash narrows the legal target set dramatically -- a typical XFG-protected call site admits one or two prototype-matching candidates rather than the thousands an unrefined CFG bitmap would admit.

| File class                                     | CFG (unprotected) | XFG (coverage) | PA (coverage)  |
| ---------------------------------------------- | ----------------- | -------------- | -------------- |
| Executables (Windows 11 x64 Insider 23440)     | 2.68%             | 85.73%         | n/a            |
| DLLs (Windows 11 x64 Insider 23440)            | 2.62%             | 85.70%         | n/a            |
| `C:\Windows\System32` DLLs (x64)               | 0.91%             | 97.04%         | n/a            |
| Combined (Windows 11 x64 Insider 23440)        | 2.63%             | 85.70%         | n/a            |
| Windows 11 ARM64 Insider Preview build 23419   | n/a               | n/a            | 92%            |

Source: Becker et al., USENIX WOOT 2024, Table 4 and §5.3 [@woot24-becker-pdf]. PA denotes ARM64 Pointer Authentication.
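
For readers unfamiliar with the statistic, here is how a geometric mean of equivalence-class sizes behaves. The sample sizes are made up for illustration and are not data from the WOOT 2024 paper.

```javascript
// Geometric mean of equivalence-class sizes. The sample sizes below are
// made up for illustration; they are not data from the WOOT 2024 paper.

function geometricMean(values) {
  const logSum = values.reduce((acc, v) => acc + Math.log(v), 0);
  return Math.exp(logSum / values.length);
}

// One entry per call site: how many functions share its prototype hash.
const classSizes = [1, 1, 1, 1, 2, 1, 3, 1, 1, 8];

const arithmeticMean = classSizes.reduce((a, b) => a + b, 0) / classSizes.length;
console.log('arithmetic mean:', arithmeticMean);
console.log('geometric mean :', geometricMean(classSizes).toFixed(2));
```

The geometric mean damps the effect of a few large classes, so a value close to 1 says that most call sites have a single matching target even if a handful have many.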

<PullQuote>
"eXtended Control Flow Guard (XFG) was an attempt to address this. XFG was never fully instrumented (UM/KM) and is now deprecated."
-- Connor McGarr, *Out of Control*, Black Hat USA 2025 [@mcgarr-bhusa25-deck-primary]
</PullQuote>

The retrospective verdict came in August 2025. McGarr's Black Hat USA 2025 deck names XFG as "never fully instrumented (UM/KM) and is now deprecated" [@mcgarr-bhusa25-deck-primary].

The reason Microsoft de-prioritised XFG is not documented by Microsoft. The most defensible reading, consistent with the public timeline, is this: once Intel CET silicon arrived in September 2020, hardware CFI on the *backward* edge -- the territory software CFG and XFG never touched -- became the strategic priority. XFG was the right answer to COOP. It was also a software answer to a problem the silicon was about to absorb. By the time Tiger Lake shipped, Microsoft was already pivoting.

### 4.3 Generation 2: Intel CET, Shadow Stack and Indirect Branch Tracking

Intel published document 334525-001, *Control-Flow Enforcement Technology Specification*, Revision 1.0, in June 2016 -- four years before any silicon shipped [@wiki-shadow-stack]. The specification defines two independent components. SHSTK is the Shadow Stack, the backward-edge piece. IBT is Indirect Branch Tracking, the forward-edge piece. They are siblings, not parent and child. Tiger Lake (11th Gen Intel Core Mobile) shipped on September 2, 2020 as the first commercial silicon with both [@wiki-tiger-lake]. AMD followed with a compatible shadow-stack implementation in Zen 3, which first shipped on November 5, 2020 in Ryzen 5000 "Vermeer" and later reached servers as Epyc 7003 "Milan" [@wiki-zen-3]. The two-vendor consensus locked in.

<Definition term="Shadow Stack (SHSTK)">
A CPU-managed second stack of return addresses, write-protected by a CET-specific page-table bit. On `call`, the CPU pushes the return address onto both the regular stack and the shadow stack. On `ret`, it pops both, compares, and raises a `#CP` (Control Protection) exception on mismatch. Ordinary stores cannot modify shadow-stack contents; the only explicit write paths are `WRSS`, which executes only where the OS has enabled it for the current privilege level, and `WRUSS`, which requires CPL 0 and performs its access as a user-class access. Software shadow stacks predated CET (StackShield 1998, RAD 2001, SmashGuard 2006), but all of them stored the second stack at user privilege where an attacker with an arbitrary-write primitive could forge it. SHSTK is the first widely-deployed shadow stack with hardware-rooted integrity [@wiki-shadow-stack].
</Definition>

<Definition term="Indirect Branch Tracking (IBT)">
The forward-edge half of Intel CET. Every legal indirect-branch target must begin with `ENDBR64` (on x86-64) or `ENDBR32` (on x86). The CPU maintains a per-mode tracker state machine: an indirect call or indirect jump transitions the tracker out of `IDLE`, and the next instruction at the branch target must be `ENDBR64` to transition it back; any other instruction raises `#CP` [@felix-endbr64]. `ENDBR64` is a no-op for direct execution paths, so it is safe to sprinkle at the entry of every address-taken function. IBT first shipped in the Tiger Lake generation [@wiki-ibt]. As of May 2026, Windows enables only SHSTK; IBT is documented in the architecture but is not turned on by the OS [@mcgarr-bhusa25-deck-primary].
</Definition>
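
The tracker described above can be sketched as a two-state machine. This is a minimal model, not the CPU's implementation: the instruction trace is a hypothetical array of mnemonics, and `WAIT_FOR_ENDBRANCH` is simply the label this sketch uses for the non-idle state.

```js
// Toy model of the IBT tracker walking an executed-instruction trace.
// The mnemonics ("call_indirect", "jmp_indirect", "endbr64") are
// illustrative strings, not decoded machine code.
const IDLE = "IDLE";
const WAIT_FOR_ENDBRANCH = "WAIT_FOR_ENDBRANCH";

function runIbt(trace) {
  let state = IDLE;
  for (const insn of trace) {
    if (state === WAIT_FOR_ENDBRANCH) {
      // The first instruction at the branch target must be the landing pad.
      if (insn !== "endbr64") return "#CP";
      state = IDLE;
      continue;
    }
    if (insn === "call_indirect" || insn === "jmp_indirect") {
      state = WAIT_FOR_ENDBRANCH;
    }
    // Any other instruction, including endbr64 reached while IDLE, is a no-op here.
  }
  return "ok";
}
```

Calling `runIbt(["call_indirect", "add"])` models an indirect call that lands in the middle of a function: the tracker sees `add` where it expected the landing pad and reports `#CP`.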

<Mermaid caption="The Intel CET umbrella: SHSTK protects the backward edge with a CPU-managed shadow stack; IBT protects the forward edge with ENDBR64 landing pads. Windows uses only SHSTK as of 2026.">
flowchart TD
    A[Intel CET] --> B[SHSTK<br/>Shadow Stack<br/>backward edge]
    A --> C[IBT<br/>Indirect Branch Tracking<br/>forward edge]
    B --> D[CPU pushes return address on call]
    B --> E[CPU compares on ret]
    B --> F[#CP fault on mismatch]
    C --> G[ENDBR64 required at indirect-branch target]
    C --> H[CPU tracker state machine]
    C --> I[#CP fault on non-ENDBR target]
    B -.-> J[Windows enforces this]
    C -.-> K[Windows does not enforce this]
</Mermaid>

The SHSTK mechanism is direct. On `call`, the CPU pushes the return address to both the regular stack and the shadow stack. On `ret`, it pops from both and compares. Mismatch raises `#CP` -- the Control Protection exception, vector 21.
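
A minimal sketch of that push, pop, and compare sequence, assuming a toy CPU in which the regular stack is an attacker-writable array and the shadow stack is touched only by `call` and `ret`. The class and function names are hypothetical.

```js
// Toy model of SHSTK: call pushes to both stacks, ret pops both and compares.
class ControlProtectionFault extends Error {}

class ShadowStackCpu {
  constructor() {
    this.stack = [];   // regular stack: ordinary writes can reach it
    this.shadow = [];  // shadow stack: only call() and ret() modify it
  }
  call(returnAddress) {
    this.stack.push(returnAddress);
    this.shadow.push(returnAddress);
  }
  ret() {
    const fromStack = this.stack.pop();
    const fromShadow = this.shadow.pop();
    if (fromStack !== fromShadow) {
      throw new ControlProtectionFault("#CP: return address mismatch");
    }
    return fromStack;
  }
}

// Models a classic return-address overwrite followed by ret.
function demoReturnOverwrite() {
  const cpu = new ShadowStackCpu();
  cpu.call(0x401000);
  cpu.stack[0] = 0x41414141; // attacker corrupts the regular stack only
  try {
    cpu.ret();
    return "returned";
  } catch (e) {
    if (e instanceof ControlProtectionFault) return "#CP";
    throw e;
  }
}
```

In this model the overwrite in `demoReturnOverwrite` never reaches the shadow copy, so the comparison in `ret` fails and the function reports the fault instead of returning to the attacker's address.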

The shadow stack lives on pages marked with a CET-specific page-table bit; an ordinary `mov` to those pages faults. Two dedicated instructions are the only legitimate way to write to a shadow stack: `WRSS`, which faults unless the OS has set the write-enable control (`WR_SHSTK_EN`) for the current privilege level, and `WRUSS`, which requires CPL 0 and performs its write *with user-class access* [@felix-wrss] [@felix-wruss]. The instruction family rounds out with `INCSSP` for unwinding the shadow-stack pointer, `RDSSP` for reading it, and `SAVEPREVSSP` / `RSTORSSP` for context-switch primitives [@felix-incssp] [@felix-rdssp].

<Sidenote>The `WRUSS` privilege oddity is worth pausing on. The instruction can only execute when CPL is 0, but the processor treats its shadow-stack access as a *user-class* access for the purpose of page-permission checks: "The WRUSS instruction can be executed only if CPL = 0, however the processor treats its shadow-stack accesses as user accesses" [@felix-wruss]. That carve-out is what lets the kernel implement SEH unwinding and `longjmp` over a user shadow stack without violating the userspace memory model.</Sidenote>

Windows integration begins where the silicon ends. The Microsoft Tech Community post *Understanding Hardware-enforced Stack Protection*, published on March 24, 2020 (about five months before Tiger Lake shipped), announced the plumbing [@ms-hsp-techcommunity]. The `#CP` fault is delivered to user mode as `STATUS_STACK_BUFFER_OVERRUN` -- the same status code CFG fast-fails use, with a CET-specific subcode that lets debuggers distinguish the two.

The `/CETCOMPAT` linker flag, available beginning in Visual Studio 2019 and exposed in the IDE in version 16.7, sets `IMAGE_DLLCHARACTERISTICS_EX_CET_COMPAT` in the PE header [@ms-learn-cetcompat]. The bit matters most in *compatibility* mode, where a shadow-stack fault is fatal only if it occurs in a `/CETCOMPAT`-marked module; in *strict* mode the fault is fatal in any module, marked or not.

The per-process policy lives in `PROCESS_MITIGATION_USER_SHADOW_STACK_POLICY`, a struct of ten single-bit fields [@ms-learn-shadow-stack-policy]. The fields are, in declared order: `EnableUserShadowStack`, `AuditUserShadowStack`, `SetContextIpValidation`, `AuditSetContextIpValidation`, `EnableUserShadowStackStrictMode`, `BlockNonCetBinaries`, `BlockNonCetBinariesNonEhcont`, `AuditBlockNonCetBinaries`, `CetDynamicApisOutOfProcOnly`, and `SetContextIpValidationRelaxedMode`, followed by `ReservedFlags : 22`.

The default state on Windows 11 24H2 on CET-capable hardware is `EnableUserShadowStack = TRUE` in *compatibility mode* for processes whose executable is `/CETCOMPAT`-marked: the shadow stack is active in those processes, but a mismatch is fatal only when the faulting `ret` is in a `/CETCOMPAT`-marked module. Strict mode is opt-in.

| Policy bit                              | Role                                                                                            |
| --------------------------------------- | ----------------------------------------------------------------------------------------------- |
| `EnableUserShadowStack`                 | Master switch. TRUE enables HSP for the process in compatibility mode.                          |
| `AuditUserShadowStack`                  | Log shadow-stack violations rather than fast-failing. Used for canary builds.                   |
| `SetContextIpValidation`                | Validates the instruction pointer set via `SetThreadContext`, closing that shadow-stack bypass. |
| `AuditSetContextIpValidation`           | Audit-mode variant of the above.                                                                |
| `EnableUserShadowStackStrictMode`       | Fault is fatal in every module, not just `/CETCOMPAT`-marked ones.                              |
| `BlockNonCetBinaries`                   | Refuse to load any module without `IMAGE_DLLCHARACTERISTICS_EX_CET_COMPAT`.                     |
| `BlockNonCetBinariesNonEhcont`          | Same as above but exempts modules with EH continuation metadata.                                |
| `AuditBlockNonCetBinaries`              | Audit-mode variant of `BlockNonCetBinaries`.                                                    |
| `CetDynamicApisOutOfProcOnly`           | JIT shadow-stack APIs must be invoked from a different process.                                 |
| `SetContextIpValidationRelaxedMode`     | Loosens `SetContextIpValidation` for compatibility with older debuggers.                        |
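
Because the ten fields plus `ReservedFlags : 22` fill exactly one 32-bit word, the policy can be decoded from a single flags value. The sketch below assumes the first declared field occupies bit 0, which is how MSVC lays out bit-fields but is an assumption here rather than something the documentation above states. The function name is hypothetical.

```js
// Field names in declared order; index in this array = assumed bit position.
const USER_SHADOW_STACK_FIELDS = [
  "EnableUserShadowStack",
  "AuditUserShadowStack",
  "SetContextIpValidation",
  "AuditSetContextIpValidation",
  "EnableUserShadowStackStrictMode",
  "BlockNonCetBinaries",
  "BlockNonCetBinariesNonEhcont",
  "AuditBlockNonCetBinaries",
  "CetDynamicApisOutOfProcOnly",
  "SetContextIpValidationRelaxedMode",
];

// Decodes a 32-bit flags word into named booleans plus the reserved bits.
function decodeUserShadowStackPolicy(flags) {
  const policy = {};
  USER_SHADOW_STACK_FIELDS.forEach((name, bit) => {
    policy[name] = ((flags >>> bit) & 1) === 1;
  });
  // The remaining upper 22 bits are ReservedFlags.
  policy.ReservedFlags = flags >>> USER_SHADOW_STACK_FIELDS.length;
  return policy;
}
```

Under that layout assumption, `decodeUserShadowStackPolicy(0x11)` describes a process with the master switch and strict mode both set.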

Critically, McGarr's BHUSA 2025 deck states verbatim: "Windows only uses the Shadow Stack feature of CET" [@mcgarr-bhusa25-deck-primary]. IBT is implemented in the CPU and targeted by GCC's `-fcf-protection=full` on Linux, but Windows does not enable it. The forward edge on Windows in 2026 is still a software story.

Hardware closes the backward edge. But the forward edge is still in software, and the kernel-mode story is still off by default. Why?

## 5. Hardware-Enforced Backward-Edge Protection

The shadow stack is not a new idea. The Wikipedia *Shadow stack* article documents three software-shadow-stack ancestors before Intel CET [@wiki-shadow-stack]. StackShield shipped in 1998. Return Address Defender (RAD) followed in 2001. SmashGuard arrived in 2006. Each kept a parallel stack of return addresses and compared the popped value at `ret`. Each paid one of two costs: per-call overhead from the compare-and-branch check, or a second stack at *user* privilege where an attacker with an arbitrary-write primitive could overwrite the shadow copy along with the regular one.

<MarginNote>StackShield (1998), RAD (2001), SmashGuard (2006), LLVM `-fsanitize=shadow-call-stack`. Every software shadow stack before CET lived at user privilege; the cost of integrity was either runtime overhead or a register reservation an attacker could subvert.</MarginNote>

What does the CPU give you that the compiler cannot? Three things, in declining order of structural significance.

First, a page-table attribute the CPU itself enforces. Shadow-stack pages are marked SHSTK in the page table. A regular `mov` to those pages faults, no matter how clever the attacker's write primitive is.

The explicit write surface is exactly two instructions, `WRSS` and `WRUSS`. `WRUSS` is CPL-0-only, and `WRSS` works at a given privilege level only if the OS has enabled it there, so user code gets no write path unless the kernel grants one. Compatibility for existing C and C++ unwind paths -- SEH on Windows, `setjmp`/`longjmp` in the C runtime, C++ exception unwinding -- routes through these instructions, executed by the kernel on behalf of user-mode code that needs to legitimately rewind the shadow stack. The shadow stack is, structurally, a piece of CPU state that user code cannot write directly.

<Aside label="The setjmp, longjmp, and SEH unwind problem">
A `longjmp` is a long jump: control transfers across multiple stack frames in a single call. The C runtime saves a `jmp_buf` containing the stack pointer, the instruction pointer, and the register file, and `longjmp` restores them. On a CET-equipped system, the regular stack pointer is restored normally but the shadow stack pointer must also rewind by the same number of frames. SEH unwinding poses the same problem: when a structured exception handler dispatches, the runtime walks the SEH chain and unwinds the stack one frame at a time. Both paths require legitimately popping multiple shadow-stack entries in a single sequence. Intel solved this with `INCSSP` for the trivial unwind case (advance the shadow-stack pointer by a count of frames) and with `WRUSS` for the harder case where the kernel needs to write specific values back onto a user shadow stack. The engineering work to make every existing unwind path CET-compatible occupied compiler teams and C-runtime maintainers for the better part of two years between 2018 and 2020 [@felix-incssp] [@felix-wruss].
</Aside>
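
The trivial unwind case can be sketched as follows. The model treats the shadow stack as an array and a saved depth as the hypothetical `jmp_buf` field; it also assumes a single `INCSSP` can skip at most 255 entries, because the instruction takes its count from the low 8 bits of its operand, so deeper unwinds loop.

```js
// Maximum entries one INCSSP can skip in this model (8-bit count).
const INCSSP_MAX = 255;

// Rewinds `shadow` to `savedDepth` entries and returns how many
// INCSSP-style steps the unwind needed.
function unwindShadowStack(shadow, savedDepth) {
  let remaining = shadow.length - savedDepth;
  if (remaining < 0) {
    throw new RangeError("saved depth is deeper than the current shadow stack");
  }
  let steps = 0;
  while (remaining > 0) {
    const count = Math.min(remaining, INCSSP_MAX);
    shadow.length -= count; // model of INCSSP: discard `count` entries
    remaining -= count;
    steps += 1;
  }
  return steps;
}
```

A `longjmp` across three frames costs one step in this model; one across 600 frames costs three. Nothing is written to the shadow stack in either case, which is why this path needs no privileged instruction.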

Second, a single CPU-visible event at the moment of mismatch. The compare-and-branch sequence that software shadow stacks emit takes multiple instructions, each of which can be raced by a concurrent attacker thread that wins the window between the compare and the trap. The CET `ret` instruction performs the compare and raises `#CP` atomically; there is no user-visible instruction between the comparison and the fault. The CPU enforces the invariant; user code cannot race it.

Third, performance. Intel and Microsoft both characterise shadow-stack overhead as single-digit percent on typical workloads [@intel-cet-technical-look], with Microsoft's *Understanding Hardware-enforced Stack Protection* announcement describing the cost as negligible [@ms-hsp-techcommunity]. WOOT 2024 measures below 2% on production workloads and 3% to 8% on micro-benchmarks [@woot24-becker-pdf]. Software shadow stacks, by contrast, typically pay 5% to 10% on `call`-heavy workloads plus a memory cost the hardware version does not.

<Mermaid caption="The legitimate write paths to a shadow stack. Regular writes fault. WRSS (CPL 0) and WRUSS (CPL 0 with user-class access) are the only mutation paths. SEH unwinding and longjmp route through these instructions via kernel-mediated helpers.">
flowchart TD
    A[User-mode mov to SHSTK page] -->|page-table SHSTK bit| B[Faults]
    C[Compiler-emitted call/ret] -->|hardware push/pop| D[Shadow stack pointer updated]
    E[longjmp] --> F[INCSSP advances SSP]
    E --> G[Kernel may invoke WRUSS]
    H[SEH unwind] --> G
    G --> I[Shadow stack legitimately rewound]
    J[Kernel] -->|CPL 0 only| K[WRSS writes shadow stack]
</Mermaid>

The atomicity argument is the structural one. The performance is the marketing one. The page-table attribute is the security one. Together they explain why hardware backward-edge protection is a generational step on Windows rather than an incremental improvement on the shadow-stack lineage.

> **Key idea:** Shadow stack is the first time Windows has had a backward-edge story. Every prior Windows mitigation -- /GS, DEP, ASLR, SafeSEH, CFG, XFG -- treated `ret` either as something to guard a single frame around (the `/GS` cookie) or as something to ignore. The forward-edge story is still in software. The asymmetry matters.

So what is the state in 2026?

## 6. CFI on Windows in 2026

A snapshot of every CFI surface currently shipping. On a freshly-installed Windows 11 24H2 box, the operational picture stitches together cleanly into four layers.

### 6.1 User-mode Hardware-enforced Stack Protection

User-mode HSP is default-on for `/CETCOMPAT`-marked binaries on CET-capable hardware, announced by Microsoft in March 2020 [@ms-hsp-techcommunity]. Compatibility mode is the default; strict mode is opt-in via `EnableUserShadowStackStrictMode` [@ms-learn-shadow-stack-policy]. The minimum supported client is Windows 10 version 2004 (build 19041), which means every supported consumer Windows release of the last six years has the API surface. The `SetContextIpValidation` bit is the load-bearing addition; it closes the CET bypass through `SetThreadContext` by validating that any instruction-pointer write made through that API targets a CET-instrumented landing.

### 6.2 Kernel-mode Hardware-enforced Stack Protection

Kernel-mode HSP is **off** by default on Windows 11 24H2 and Windows Server 2025. The Microsoft Learn primary states the prerequisite list verbatim: "Windows 11 2022 update or newer; 11th Gen Intel Core Mobile processors and AMD Zen 3 Core (and newer); Virtualization-based security (VBS) and Hypervisor-enforced code integrity (HVCI) are enabled" [@ms-learn-kernel-hsp]. Activation is via Windows Security under Device Security and Core Isolation, or via Group Policy.

The [HVCI](/blog/wdac--hvci-code-integrity-at-every-layer-in-windows/) prerequisite is non-negotiable: kernel-mode HSP relies on the hypervisor to enforce the write-protected page-table bit on shadow-stack pages, because the same NT kernel an attacker would compromise is the one that would otherwise own those mappings.

> **Note:** The Microsoft Learn page for Kernel-mode Hardware-enforced Stack Protection states explicitly: "Kernel-mode Hardware-enforced Stack Protection is off by default, but customers can turn it on if the prerequisites are met" [@ms-learn-kernel-hsp]. This is a load-bearing correction to a common misconception. Even on hardware that supports CET, kernel ROP is *not* mitigated by default. The opt-in surface requires VBS plus HVCI plus an explicit user action.

<Mermaid caption="The kernel-mode HSP prerequisite chain: CPU support, Windows version, VBS, HVCI, and an explicit user opt-in. Any missing link silently disables the mitigation.">
flowchart TD
    A[11th Gen Intel Core Mobile or AMD Zen 3 or newer] --> B[Windows 11 2022 update or newer]
    B --> C[Virtualization-based Security enabled]
    C --> D[Hypervisor-enforced Code Integrity enabled]
    D --> E[User opt-in via Windows Security or Group Policy]
    E --> F[Kernel-mode HSP active]
    A -.->|missing| G[Silent no-op]
    C -.->|missing| G
    E -.->|missing| G
</Mermaid>
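
The chain in the diagram amounts to a conjunction of checks. A minimal sketch, assuming a hypothetical `machine` object whose boolean fields stand in for whatever inventory data an administrator actually collects; none of these field names is a Windows API.

```js
// Each entry: [hypothetical field name, reason reported when it is false].
const KERNEL_HSP_PREREQUISITES = [
  ["cpuHasShadowStack", "CPU predates 11th Gen Intel Core Mobile / AMD Zen 3"],
  ["windows11_2022UpdateOrNewer", "OS older than the Windows 11 2022 update"],
  ["vbsEnabled", "Virtualization-based security is off"],
  ["hvciEnabled", "Hypervisor-enforced code integrity is off"],
  ["userOptedIn", "Kernel-mode HSP was never switched on"],
];

// Reports whether kernel-mode HSP would be active and, if not, why.
function kernelHspStatus(machine) {
  const missing = KERNEL_HSP_PREREQUISITES
    .filter(([field]) => !machine[field])
    .map(([, reason]) => reason);
  return { active: missing.length === 0, missing };
}
```

Returning the list of unmet prerequisites rather than a bare boolean is a deliberate choice in this sketch: the real failure mode is silent, so a fleet audit benefits from naming the missing link.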

<Sidenote>Synacktiv's SSTIC 2025 paper, *Analyzing the Windows kernel shadow stack mitigation* by Remi Jullian and Alexandre Aulnette of Synacktiv's reverse-engineering team, is the canonical practitioner reference for the kernel-mode implementation [@synacktiv-sstic25]. The paper walks the hypervisor calls, the `KscpCfgDispatchUserCallTargetEs*` functions named in McGarr's BHUSA 2025 deck, and the bypass surfaces a researcher should look at first.</Sidenote>

### 6.3 Pointer Authentication on Windows-on-ARM

Windows on ARM ships ARMv8.3-A Pointer Authentication. The mechanism is different in detail from CET but parallel in role: a small cryptographic MAC over a 64-bit pointer, computed and stripped by dedicated instructions. McGarr's 2023 *Windows ARM64 Internals: Deconstructing Pointer Authentication* writeup is the practitioner reference [@mcgarr-windows-pac]. The exact quote from the post nails the scope: "Windows currently only uses PAC for 'instruction pointers' ... and it also it only uses 'key B' for cryptographic signatures and, therefore, loads the target pointer signing value into the `APIBKeyLo_EL1` and `APIBKeyHi_EL1` AArch64 system registers."

<Definition term="PAC (Pointer Authentication Code)">
An ARMv8.3-A feature in which 64-bit pointers carry a small cryptographic MAC in unused upper bits, generated and verified by dedicated `PACI*`, `AUTI*`, and `XPAC*` instructions. Compiler-generated Windows-on-ARM code uses `PACIBSP` to sign the return address on function entry, `AUTIBSP` to verify it on exit, and `XPACLRI` to strip the MAC for debug-print paths. Windows uses key B (`APIBKeyLo_EL1`/`APIBKeyHi_EL1`) for instruction-pointer signing; the kernel-managed key is derived by `OslPrepareTarget` via `SymCryptRngAesGenerate` at boot [@mcgarr-windows-pac].
</Definition>
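
The sign, verify, and strip roles can be illustrated with a toy model. Everything below is an assumption made for illustration: the MAC occupies bits 48 to 63, the keyed mixing function is an arbitrary stand-in rather than the cipher real hardware uses, and a failed check throws, whereas real hardware may instead produce a pointer that faults when used.

```js
const PAC_SHIFT = 48n;                        // assumed position of the MAC
const ADDR_MASK = (1n << PAC_SHIFT) - 1n;     // low 48 bits hold the address
const U64 = 0xFFFFFFFFFFFFFFFFn;

// Stand-in keyed MAC: mixes address, modifier (e.g. SP) and key into 16 bits.
function toyMac(address, modifier, key) {
  let x = (address ^ (modifier * 0x9E3779B97F4A7C15n) ^ key) & U64;
  x ^= x >> 33n;
  x = (x * 0xFF51AFD7ED558CCDn) & U64;
  x ^= x >> 29n;
  return x & 0xFFFFn;
}

// PACIBSP analogue: place the MAC in the unused upper bits.
function signPointer(address, modifier, key) {
  const addr = address & ADDR_MASK;
  return (toyMac(addr, modifier, key) << PAC_SHIFT) | addr;
}

// AUTIBSP analogue: recompute the MAC and compare before use.
function authenticatePointer(signed, modifier, key) {
  const addr = signed & ADDR_MASK;
  if ((signed >> PAC_SHIFT) !== toyMac(addr, modifier, key)) {
    throw new Error("PAC authentication failed");
  }
  return addr;
}

// XPACLRI analogue: drop the MAC without checking it.
function stripPointer(signed) {
  return signed & ADDR_MASK;
}
```

With only 16 MAC bits in this model, a forged pointer has a small but non-zero chance of passing, which is the sense in which PAC's equivalence class is small but not a singleton.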

The `LOADER_PARAMETER_EXTENSION.PointerAuthKernelIpEnabled` bit controls activation; `PointerAuthKernelIpKey` holds the kernel-managed key. The instruction triple `PACIBSP` / `AUTIBSP` / `XPACLRI` is sprinkled at function entry, exit, and debug-print paths respectively. WOOT 2024 measured 92% PA file coverage on Windows 11 ARM64 Insider Preview developer build 23419 [@woot24-becker-pdf]. The structural answer to backward-edge integrity on ARM is therefore PAC, not a shadow stack -- and Windows-on-ARM gets that protection by default on Snapdragon X Elite and X Plus machines.

### 6.4 Coverage in production

The WOOT 2024 measurements summarise the operational picture cleanly. CFG coverage on Windows 11 Insider Preview developer build 23440 is 97.37% of x64 PE files, 99.09% on `System32`; XFG coverage is 85.7% on PE files, 97.0% on `System32`; PA coverage on the Windows 11 ARM64 Insider Preview developer build 23419 is 92% [@woot24-becker-pdf]. CET shadow-stack adoption tracks the `/CETCOMPAT` linker flag's penetration across the OS surface; on the system DLLs in 24H2 it is at or near total. Translation: on a modern Windows 11 system, control-flow protection is nearly universal in the OS itself and opt-in for user applications.

Almost everything in Windows itself is protected. The third-party-app and JIT-runtime surfaces are not. And the question of what to do about COOP, now that XFG is deprecated, is genuinely open.

## 7. How Other Platforms Solve the Same Problem

Step outside Windows for a moment. What does Linux do? What does Apple do? What does Android do?

Linux's answer is kCFI. The `-fsanitize=cfi-icall` flag, originally an LLVM jump-table forward-edge CFI, shipped in Linux 5.13 in June 2021. The replacement design, `-fsanitize=kcfi`, shipped in Linux 6.1 in December 2022 [@lwn-kcfi]. The mechanism is a 32-bit prototype hash placed before each function entry, padded with `INT3` instructions to keep the hash bytes from becoming a useful gadget.

Jonathan Corbet's LWN writeup describes the design: "When code is compiled with -fsanitize=kcfi, the entry point to each function is preceded by a 32-bit value representing the prototype of that function. This value is (part of) a hash calculated from the C++ mangled name for the function and its arguments." kCFI is XFG's closest peer in design. Unlike XFG, it shipped, was documented, and remains supported.
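
A toy model of that check, assuming a hypothetical image in which each function entry is an integer address and the 32-bit value sits four bytes before it. The hash here is FNV-1a over a prototype string, chosen only because it is short; the real value is derived from the mangled type name as the quote describes.

```js
// Stand-in 32-bit hash (FNV-1a). Not the hash kCFI actually uses.
function prototypeHash(prototype) {
  let h = 0x811c9dc5;
  for (const ch of prototype) {
    h = Math.imul(h ^ ch.codePointAt(0), 0x01000193) >>> 0;
  }
  return h;
}

class KcfiImage {
  constructor() {
    this.preamble = new Map(); // address -> 32-bit hash stored before an entry
    this.nextEntry = 0x1000;
  }
  // Lays out a function and stores its prototype hash at entry - 4.
  define(prototype) {
    const entry = this.nextEntry;
    this.nextEntry += 0x40;
    this.preamble.set(entry - 4, prototypeHash(prototype));
    return entry;
  }
  // Caller-side check: compare the stored hash with the expected prototype.
  indirectCall(target, expectedPrototype) {
    if (this.preamble.get(target - 4) !== prototypeHash(expectedPrototype)) {
      throw new Error("kCFI trap");
    }
    return target;
  }
}
```

Two functions defined with the same prototype string are interchangeable under this check, which is the equivalence-class limit any prototype-hash scheme shares. A target that is not a function entry has no stored hash and always traps.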

<MarginNote>Sami Tolvanen of Google's Android kernel team is the patch-series author for Linux kCFI. His earlier `-fsanitize=cfi-icall` work in LLVM landed first.</MarginNote>

Apple's answer is PAC, deployed by default on every Apple Silicon Mac (since the M1 in November 2020) and on every iOS device since the A12 in 2018 [@apple-platform-security]. The hardened runtime plus the `com.apple.security.cs.allow-jit` entitlement is the declarative JIT story, because PAC interacts badly with code generation that wants to sign and verify its own pointers; Apple's solution was to require an explicit entitlement for any process that wants JIT capability and to enforce a separate W^X policy on JIT memory [@apple-dev-allow-jit].

Android's answer is ARMv8.5-A Memory Tagging Extension on Pixel 8 and later [@source-android-mte]. MTE is adjacent to CFI rather than within its design space: a tagged-allocator scheme that catches use-after-free and out-of-bounds memory accesses at hardware speed, before they corrupt a control-flow target in the first place. MTE complements PAC; it does not replace it.

| Platform                | Forward edge                                          | Backward edge                                  | Memory safety adjuncts                      |
| ----------------------- | ----------------------------------------------------- | ---------------------------------------------- | ------------------------------------------- |
| Windows 11 x86-64       | CFG (default); XFG (Insider, deprecated)              | CET Shadow Stack (default-on user mode)        | -- |
| Windows 11 ARM64        | -- (no forward-edge CFI documented; PAC is backward) | ARMv8.3 PAC, key B                             | -- |
| Linux mainline          | `-fsanitize=cfi-icall` (LTO jump tables) / kCFI hash   | LLVM software shadow-call-stack; CET on x86-64 | `-fcf-protection=full` (CET); MTE on ARM    |
| macOS / iOS             | --                                                    | ARMv8.3 PAC                                    | Hardened runtime; W^X JIT                   |
| Android (Pixel 8+)      | LLVM CFI                                              | ARMv8.3 PAC                                    | ARMv8.5 MTE (tagged allocator)              |
| CHERI / CHERIoT         | Capability-bound pointers (all edges)                 | Capability-bound return addresses              | 128-bit hardware capabilities               |

The capability-hardware future is CHERI -- Capability Hardware Enhanced RISC Instructions -- and its embedded sibling CHERIoT. The structural shift CHERI makes is to encode 128-bit hardware capabilities into the pointer itself: every pointer carries provenance, bounds, and permissions, all enforced by the CPU. A capability cannot be forged, narrowed beyond its grant, or reused after revocation. Pointer integrity is enforced at the silicon, not at the call site [@cheri-cambridge]. Microsoft Research's Project Snowflake explores the same design space [@msr-snowflake].

Three platforms, three answers. None is a complete answer. To understand why, we have to look at the bug class no CFI variant can close.

## 8. What CFI Cannot Close

Hong Hu and his coauthors at the National University of Singapore published *Data-Oriented Programming: On the Expressiveness of Non-Control Data Attacks* at IEEE Symposium on Security and Privacy in May 2016 [@hu-2016-dop]. The paper's abstract is the load-bearing observation: "In this paper we show that such attacks are Turing-complete. We present a systematic technique called data-oriented programming (DOP) to construct expressive non-control data exploits for arbitrary x86 programs. ... 8 out of 9 real-world programs have gadgets to simulate arbitrary computations and 2 of them are confirmed to be able to build Turing-complete attacks. All the attacks work in the presence of ASLR and DEP."

The structural point is what makes DOP devastating to the CFI design space. A DOP attack never violates the static control-flow graph. The attacker chains short non-control-data corruptions -- writes to variables, flags, configuration values, never to a code pointer -- and computes inside the program's legitimate control flow.

The CFI bitmap, the prototype hash, the shadow stack, the IBT tracker, the PAC MAC: none of them are designed to detect data writes. They are designed to detect control-flow transfers to illegal targets. A DOP exploit never goes to an illegal target. It stays on the legitimate path and rearranges what the program computes along the way.

<Definition term="Data-Oriented Programming (DOP)">
A code-reuse attack class identified by Hu, Shinde, Sendroiu, Chua, Saxena and Liang at IEEE S&P 2016. DOP chains short data-flow-stitching gadgets to compute arbitrary functions using only legitimate, in-CFG control flow. The exploits never violate the static control-flow graph. DOP is structurally invisible to every CFI variant -- CFG, XFG, IBT, SHSTK, PAC -- because none of these mechanisms validate data writes; they only validate the targets of indirect transfers [@hu-2016-dop].
</Definition>
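
The underlying idea can be shown with a single non-control-data corruption. The sketch below is far simpler than a DOP gadget chain and is not taken from the paper: a hypothetical request handler copies a name into an 8-byte buffer without a bounds check, and the byte after the buffer is an authorization flag. The memory layout and function names are invented for illustration.

```js
const NAME_OFFSET = 0;
const NAME_SIZE = 8;          // intended capacity of the name buffer
const IS_ADMIN_OFFSET = 8;    // flag stored directly after the buffer

// Returns the access decision plus the sequence of functions "executed".
function handleRequest(inputBytes) {
  const memory = new Uint8Array(16);
  const trace = [];

  trace.push("copyName");
  // Bug: the loop is bounded by the whole memory, not by NAME_SIZE.
  for (let i = 0; i < inputBytes.length && i < memory.length; i++) {
    memory[NAME_OFFSET + i] = inputBytes[i];
  }

  trace.push("checkAccess");
  const granted = memory[IS_ADMIN_OFFSET] !== 0;

  trace.push("respond");
  return { granted, trace };
}
```

A nine-byte input whose last byte is non-zero flips the decision, yet the trace is identical to the benign case: no return address or function pointer changed, so a shadow stack or a call-target check has nothing to object to.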

<Mermaid caption="What CFI closes and what it does not. CFI mitigations validate the targets of indirect transfers, so they close ROP, JOP and COOP. They do not validate data writes, so DOP, use-after-free, and arbitrary-write primitives survive untouched.">
flowchart TD
    A[Memory-safety bug] --> B[Control-flow hijacking]
    A --> C[Data-only attack]
    B --> D[ROP - closed by SHSTK]
    B --> E[JOP - closed by CFG/IBT/XFG]
    B --> F[COOP - closed by XFG and PAC]
    C --> G[DOP - not closed by any CFI]
    C --> H[Use-after-free - not closed by CFI]
    C --> I[Arbitrary-write primitive - not closed by CFI]
</Mermaid>

Even within the forward-edge attacks CFI does try to close, the precision is limited. The Burow, Carr, Nash, Larsen, Franz, Brunthaler and Payer survey at ACM Computing Surveys 2017 is the canonical reference on the precision dimension [@burow-2017-csur].

CFG's equivalence class is every address-taken function in the binary -- thousands, on any non-trivial DLL. XFG narrows the class to the functions sharing a prototype hash. WOOT 2024 measured the geometric mean of XFG equivalence classes on Windows 11 Insider Preview at 1.37: a typical XFG-protected call site admits roughly one or two prototype-matching candidates [@woot24-becker-pdf].
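
To make the precision metric concrete, the sketch below groups a hypothetical list of address-taken functions by prototype and takes the geometric mean of the class sizes. Whether the WOOT 2024 figure averages per class or per call site is not stated in the text above, so the per-class averaging here is an assumption.

```js
// Groups functions by prototype; returns a Map of prototype -> class size.
function equivalenceClasses(functions) {
  const classes = new Map();
  for (const { prototype } of functions) {
    classes.set(prototype, (classes.get(prototype) ?? 0) + 1);
  }
  return classes;
}

// Geometric mean computed in log space to avoid overflow on large inputs.
function geometricMean(values) {
  const logSum = values.reduce((sum, v) => sum + Math.log(v), 0);
  return Math.exp(logSum / values.length);
}

// Compares the single CFG class with the per-prototype XFG classes.
function precisionReport(functions) {
  const sizes = [...equivalenceClasses(functions).values()];
  return {
    cfgClassSize: functions.length,           // CFG: any address-taken function
    xfgGeometricMean: geometricMean(sizes),   // XFG: per-prototype classes
  };
}
```

For five functions split across three prototypes as two, one, and two, the CFG class has five members while the XFG geometric mean is the cube root of four, a little under 1.6.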

PAC's equivalence class is the count of functions whose signed-with-key-B return addresses collide on the same MAC -- much smaller in practice, but still non-singleton. None of these mitigations achieve the single-target precision a fully type-aware fine-grained CFI would offer.

JIT and dynamic code constitute their own carve-out. Any platform with runtime code generation must mark JIT-emitted code as valid CFI territory through some API -- on Windows, `SetProcessValidCallTargets` is the surface, plus the `PAGE_TARGETS_INVALID` page-protection flag for memory that has not yet been marked. The Theori MS16-119 Chakra JIT bypass remains the canonical demonstration that JIT carve-outs are a structural CFI weakness, not an implementation bug [@theori-chakra-jit].
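
The carve-out can be pictured with a simplified bitmap model. It assumes one validity bit per 16-byte slot and treats only 16-byte-aligned addresses as callable; the real bitmap encodes more states than this, and the class and method names are hypothetical stand-ins for the role `SetProcessValidCallTargets` plays.

```js
// One entry per 16-byte slot of address space that is a valid call target.
class CfgBitmapModel {
  constructor() {
    this.validSlots = new Set();
  }
  // Stand-in for the runtime marking a freshly emitted JIT entry as valid.
  markValid(address) {
    this.validSlots.add(Math.floor(address / 16));
  }
  // Stand-in for the check performed before an indirect call.
  isValidCallTarget(address) {
    return address % 16 === 0 && this.validSlots.has(Math.floor(address / 16));
  }
}
```

In this model a JIT region starts with no valid slots, so every indirect call into it fails until the runtime marks an entry. The weakness the paragraph above describes is that whoever can drive the marking step decides what counts as valid.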

And then there is the structural ceiling. Matt Miller's BlueHat IL 2019 talk *Trends, challenges, and shifts in software vulnerability mitigation* supplies the empirical baseline: roughly 70% of CVEs Microsoft issued each year between 2006 and 2018 were memory-safety bugs, and the share stayed stable across a window in which ASLR, CFG, CIG, and ACG all shipped on top of the earlier `/GS`, SafeSEH, and DEP [@matt-miller-bluehat-il-2019]. CET arrived after the window closed.

The Becker et al. WOOT 2024 §1 statement corroborates from the academic side: "Memory safety vulnerabilities make up two thirds of security issues in large code bases across the industry" [@woot24-becker-pdf]. Note the careful framing: this is the *bug class* statistic, not the *exploit class*. CFI closes a *subclass* of memory-corruption exploitation. The bigger box is still open.

> **Key idea:** CFI closes the control-flow-hijacking subclass of memory-corruption exploitation. The 70% memory-safety statistic is the structural ceiling. The exits from that ceiling are not within the CFI design space. They are memory-safe languages (Rust closing the bug class at compile time) and capability hardware (CHERI and CHERIoT closing pointer integrity at the silicon). CFI is one layer in a multi-layer story.

The real answers, then, are not new CFI variants. They are memory-safe languages -- Rust adoption in the Windows kernel, in the .NET runtime, in the WinRT projection -- and capability hardware. Neither is a substitute for the CFI layer that exists today, but neither is a CFI primitive either. They live at a different floor of the stack.

So where is the research moving?

## 9. Open Problems

The 2026-2030 research surface on Windows CFI has at least five named unknowns.

The first is kernel CFG and kernel CET bypasses. McGarr's Black Hat USA 2025 deck *Out of Control* names the area explicitly: kernel-mode CFG and kernel-mode CET surfaces have active bypass research, including PTE-manipulation attacks against the kCFG bitmap when HVCI is disabled, and the `nt!KscpCfgDispatchUserCallTargetEs[No]Smep` dispatch function on the kernel side [@mcgarr-bhusa25-deck-primary].

The Synacktiv SSTIC 2025 paper is the canonical reverse-engineering reference for the kernel-mode HSP implementation, and it walks the bypass surface a researcher would attack first [@synacktiv-sstic25].

The second is the XFG deprecation story. What fills the COOP-shaped forward-edge gap on shipping Windows x86-64 now that XFG is deprioritised? The candidates are IBT (free if Windows turned it on, but coarse: every `ENDBR64` is a legal target), an academic refinement like FineIBT (not deployed), or an unnamed type-aware MSVC successor that Microsoft has not publicly committed to. The honest answer is: nothing has XFG's fine-grained shape on Windows x86-64 in 2026. The COOP-shaped attack surface is open.

The third is Memory Tagging Extension on Windows-on-ARM. No Snapdragon X Elite or X Plus stepping currently sold supports ARMv8.5-A MTE in hardware, and Windows has no documented MTE-tagged allocator. The Pixel 8 line shipped MTE on Android in 2023 [@google-project-zero-mte] [@wiki-pixel-8]; Apple Silicon shipped a different MTE-adjacent tagging scheme [@apple-platform-security]; Windows is the third major platform on ARM and has the smallest MTE story. Whether Windows-on-ARM gets MTE in the next Snapdragon generation, and whether Microsoft ships a tagged Windows kernel allocator if it does, is open future work.

The fourth is CFI for managed runtimes. The .NET and WebAssembly host code-generation paths are the same carve-out Theori demonstrated in 2016 against Chakra. The .NET runtime in particular runs through `RyuJIT` to emit native code that must be marked CFG-valid through `SetProcessValidCallTargets` [@ms-learn-setprocessvalidcalltargets]. Whether Microsoft ships a finer-grained CFI for managed-runtime-emitted code -- one that bounds the equivalence class to "methods of this type" rather than "any address-taken function in the process" -- is not a public roadmap item.

The fifth is forward-edge precision after XFG. The Burow et al. CSUR 2017 survey's analytical framing is the one to keep in mind: precision is the size of the equivalence class admitted at each call site. CFG admits thousands. XFG admits roughly one to two on the WOOT 2024 measurement. The fine-grained ideal is one. Microsoft has not publicly committed to a successor type-aware forward-edge CFI for Windows x86-64.

Knowing what is open is half the practitioner's job. Knowing how to verify what is currently shipping is the other half.

## 10. Verifying CFI on Any Windows Binary

What follows is a reproducible workflow the reader can run on their own machine right now.

**Compile with CFI.** The MSVC command line for the full stack is `cl /guard:cf main.cpp /link /DYNAMICBASE /HIGHENTROPYVA /CETCOMPAT`. Order matters: switches before `/link` go to the compiler, switches after `/link` go to the linker, and `/CETCOMPAT` is a linker-only option [@ms-learn-cetcompat]. Both `/guard:cf` *and* `/DYNAMICBASE` are required for CFG; `/guard:cf` alone is a silent no-op [@ms-learn-guard-flag].

`/guard:xfg` adds XFG instrumentation on MSVC since Visual Studio 2019 Preview 16.5 [@mcgarr-examining-xfg]. `/CETCOMPAT` marks the binary as shadow-stack-compatible. In the default compatibility mode, a shadow-stack violation is fatal only when it occurs in a module carrying that mark; strict mode makes every violation fatal regardless of the mark. `/HIGHENTROPYVA` extends ASLR's randomisation range and is required for the 128 TB user VA that CFG's bitmap reservation depends on [@ms-learn-highentropyva].
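
The "order matters" rule is easy to check mechanically. The following is a minimal sketch, assuming a whitespace-separated command line with no quoted paths; `splitClArgs` and `misplacedLinkerSwitches` are hypothetical helpers written for this article, not part of any MSVC tooling.

```javascript
// Splits a cl.exe-style command line at the /link separator.
// Assumes plain whitespace separation (no quoted arguments).
function splitClArgs(commandLine) {
  const tokens = commandLine.trim().split(/\s+/).slice(1); // drop "cl"
  const linkAt = tokens.findIndex(t => t.toLowerCase() === '/link');
  if (linkAt === -1) {
    return { compiler: tokens, linker: [] };
  }
  return {
    compiler: tokens.slice(0, linkAt),
    linker: tokens.slice(linkAt + 1),
  };
}

// Linker options discussed in this section.
const LINKER_ONLY = ['/CETCOMPAT', '/DYNAMICBASE', '/HIGHENTROPYVA'];

// Returns linker switches that were placed before /link by mistake.
function misplacedLinkerSwitches(commandLine) {
  const { compiler } = splitClArgs(commandLine);
  return compiler.filter(t => LINKER_ONLY.includes(t.toUpperCase()));
}

const good = 'cl /guard:cf main.cpp /link /DYNAMICBASE /HIGHENTROPYVA /CETCOMPAT';
const bad = 'cl /guard:cf /CETCOMPAT main.cpp';

console.log(splitClArgs(good));
console.log('misplaced:', misplacedLinkerSwitches(bad));
```

A check like this could sit in a build-script lint step, where a linker switch on the wrong side of `/link` would otherwise go unnoticed until someone inspects the resulting binary.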

**Inspect a binary on disk.** `dumpbin /loadconfig binary.exe` reports `CF Instrumented`, `FID table present`, `Long jump target table`, and `XFG functions present`. `dumpbin /headers binary.exe` reports `CET Compatible`, the printed form of the `IMAGE_DLLCHARACTERISTICS_EX_CET_COMPAT` bit, if the binary was linked with `/CETCOMPAT`. `link /DUMP /HEADERS` is the linker-side equivalent and produces the same information. Both tools ship with the Visual Studio C++ toolchain.
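
Part of what `dumpbin /headers` prints comes from a single 16-bit field, `DllCharacteristics`, in the PE optional header. The sketch below reads that field and decodes the four bits relevant here. The bit values and header offsets are taken from the published PE format; the `fakeImage` helper and its synthetic buffer are assumptions made so the example runs without a real binary.

```javascript
// IMAGE_DLLCHARACTERISTICS_* bit values from the PE format documentation.
const DLL_CHARACTERISTICS = {
  HIGH_ENTROPY_VA: 0x0020,
  DYNAMIC_BASE:    0x0040,
  NX_COMPAT:       0x0100,
  GUARD_CF:        0x4000,
};

// Reads the DllCharacteristics field from a PE image held in memory.
function readDllCharacteristics(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const peOffset = view.getUint32(0x3c, true);          // e_lfanew
  if (view.getUint32(peOffset, true) !== 0x00004550) {  // "PE\0\0" signature
    throw new Error('not a PE image');
  }
  const optionalHeader = peOffset + 4 + 20;             // signature + COFF header
  return view.getUint16(optionalHeader + 70, true);     // same offset in PE32 and PE32+
}

// Returns the names of the flags set in the given field value.
function decodeFlags(value) {
  return Object.keys(DLL_CHARACTERISTICS)
    .filter(name => (value & DLL_CHARACTERISTICS[name]) !== 0);
}

// Hypothetical helper: a synthetic header with just enough bytes to parse.
function fakeImage(dllCharacteristics) {
  const bytes = new Uint8Array(0x200);
  const view = new DataView(bytes.buffer);
  view.setUint32(0x3c, 0x80, true);                     // e_lfanew points at 0x80
  view.setUint32(0x80, 0x00004550, true);               // PE signature
  view.setUint16(0x80 + 24 + 70, dllCharacteristics, true);
  return bytes;
}

console.log(decodeFlags(readDllCharacteristics(fakeImage(0x4160))));
```

On a real file, the `Buffer` returned by Node's `fs.readFileSync` can be passed to `readDllCharacteristics` directly. This field only says the image is marked as CFG-aware; the CET-compatibility mark and the FID table live elsewhere in the image, which is why the `dumpbin` commands above remain the practical tool.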

**Inspect a running process.** `Get-ProcessMitigation -Name notepad.exe` in PowerShell reports CFG, ASLR, DEP, shadow-stack, [ACG and CIG](/blog/process-mitigation-policies-cfg-acg-cig-and-the-layer-betwee/) state per process [@ms-learn-cfg]. `Set-ProcessMitigation` writes the per-image policy for a given process name; the change applies the next time that process starts, not to instances already running. `Get-ProcessMitigation -System` reports system-wide defaults. Under the hood, the cmdlet is built on `GetProcessMitigationPolicy`.

<RunnableCode lang="js" title="Reproducing the verification workflow">{`
// Mocks the verification workflow on sample data: dumpbin output is passed
// in as strings and Get-ProcessMitigation output as a plain object. On
// Windows, the Win32 GetProcessMitigationPolicy API is queried once per
// policy class and fills a policy-specific structure on each call.

function inspectBinary(name, dumpbinHeaders, dumpbinLoadConfig) {
  // Case-insensitive match so either casing of the dumpbin line is accepted.
  const cetCompat = /CET compatible/i.test(dumpbinHeaders);
  const cfInstrumented = dumpbinLoadConfig.includes('CF Instrumented');
  const xfgPresent = dumpbinLoadConfig.includes('XFG functions present');

  console.log('--- ' + name + ' ---');
  console.log('  CFG       :', cfInstrumented ? 'INSTRUMENTED' : 'absent');
  console.log('  XFG       :', xfgPresent ? 'INSTRUMENTED' : 'absent');
  console.log('  CETCOMPAT :', cetCompat ? 'YES' : 'NO');
}

function inspectProcess(name, mitigationPolicy) {
  console.log('Process: ' + name);
  console.log('  CFG.Enable                 :', mitigationPolicy.CFG.Enable);
  console.log('  UserShadowStack.Enable     :', mitigationPolicy.USS.Enable);
  console.log('  UserShadowStack.StrictMode :', mitigationPolicy.USS.StrictMode);
  console.log('  ASLR.BottomUp              :', mitigationPolicy.ASLR.BottomUp);
  console.log('  DEP.Enable                 :', mitigationPolicy.DEP.Enable);
}

// Sample strings shaped like the dumpbin lines the checks above look for.
inspectBinary('msedge.exe',
  'CET Compatible',
  'CF Instrumented, FID table present, XFG functions present');

// Sample values only; real values come from Get-ProcessMitigation.
inspectProcess('msedge.exe', {
  CFG:  { Enable: 'ON' },
  USS:  { Enable: 'ON', StrictMode: 'ON' },
  ASLR: { BottomUp: 'ON' },
  DEP:  { Enable: 'ON' }
});
`}</RunnableCode>

**Programmatic policy installation.** The two API surfaces are `SetProcessMitigationPolicy`, which sets the policy of the current process at runtime, and `UpdateProcThreadAttribute(PROC_THREAD_ATTRIBUTE_MITIGATION_POLICY)`, which sets the policy of a child process at `CreateProcess` time. The latter is the only race-free entry point for hardened child processes -- it is impossible for child code to execute before the policy is installed.

> **Note:** Use `UpdateProcThreadAttribute(PROC_THREAD_ATTRIBUTE_MITIGATION_POLICY)` paired with `CreateProcess`, not in-process `SetProcessMitigationPolicy` after the fact. The latter leaves a window between process creation and policy installation in which child code can execute without the mitigation. The `UpdateProcThreadAttribute` approach carries the policy in the attribute list of the `STARTUPINFOEX` passed to `CreateProcess`, so the policy is in place before the first instruction of the child runs.
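
The value handed to `PROC_THREAD_ATTRIBUTE_MITIGATION_POLICY` is a pair of 64-bit words with a small field per mitigation. The sketch below models how those words are composed. The shift values are the author's recollection of the `PROCESS_CREATION_MITIGATION_POLICY_*` macros in `winnt.h` and should be treated as assumptions to verify against the installed Windows SDK; `buildMitigationPolicy` is a hypothetical helper, and the native calls themselves are not shown.

```javascript
// ASSUMPTION: bit positions recalled from winnt.h; verify against your SDK.
const POLICY_CFG_ALWAYS_ON = 0x1n << 40n;                 // first policy word
const POLICY2_USER_SHADOW_STACKS_ALWAYS_ON = 0x1n << 28n; // second policy word
const POLICY2_USER_SHADOW_STACKS_STRICT = 0x3n << 28n;    // second policy word

// Composes the two 64-bit words of the mitigation-policy attribute value.
// shadowStack: 'defer' leaves the field at zero, 'on' or 'strict' set it.
function buildMitigationPolicy({ cfg = false, shadowStack = 'defer' } = {}) {
  let policy1 = 0n;
  let policy2 = 0n;
  if (cfg) {
    policy1 |= POLICY_CFG_ALWAYS_ON;
  }
  if (shadowStack === 'on') {
    policy2 |= POLICY2_USER_SHADOW_STACKS_ALWAYS_ON;
  } else if (shadowStack === 'strict') {
    policy2 |= POLICY2_USER_SHADOW_STACKS_STRICT;
  }
  return [policy1, policy2];
}

// Formats a 64-bit word as zero-padded hexadecimal.
const hex = value => '0x' + value.toString(16).padStart(16, '0');

const words = buildMitigationPolicy({ cfg: true, shadowStack: 'strict' });
console.log(words.map(hex));
```

In native code the equivalent array would be attached to an attribute list prepared with `InitializeProcThreadAttributeList` and then passed to `CreateProcess` with the `EXTENDED_STARTUPINFO_PRESENT` flag, which is what makes the policy part of process creation rather than a later step.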

**Turn on kernel-mode HSP.** Windows Security -> Device Security -> Core Isolation -> "Kernel-mode Hardware-enforced Stack Protection." HVCI is the prerequisite; if it is off, the toggle is not available. Group Policy exposes the same setting at `Computer Configuration / Administrative Templates / System / Device Guard / Turn On Virtualization Based Security / Kernel-mode Hardware-enforced Stack Protection`.

<Spoiler kind="solution" label="Inspecting your own machine: a verification one-liner">

Open PowerShell as administrator and run:

```powershell
Get-ProcessMitigation -Id $PID |
  Select-Object CFG, ASLR, DEP, UserShadowStack
```

The output is the policy of the current PowerShell session. To check a binary on disk, run the following from a Developer PowerShell for Visual Studio, where `dumpbin` is on the path:

```powershell
dumpbin /headers C:\Windows\System32\notepad.exe | findstr /C:"CET"
dumpbin /loadconfig C:\Windows\System32\notepad.exe | findstr /C:"FID"
```

The first command prints `CET Compatible` if the binary was linked with `/CETCOMPAT`. The second prints the FID-table presence line if the binary was built with CFG.

</Spoiler>

Now the reader can answer the question §1 raised: why does the same OS apply different contracts to different processes? Because each process opts in, and the opt-in surface has ten bits.

## 11. Frequently Asked Questions

<FAQ title="Practitioner questions about Windows CFI">

<FAQItem question="Is CFG, XFG, or CET the answer to memory-corruption exploitation?">
No. None of them close the data-only attack class. Hu and colleagues proved at IEEE S&P 2016 that Data-Oriented Programming is Turing-complete and never violates the static control-flow graph, which means every CFI variant is structurally blind to it [@hu-2016-dop]. CFI closes the control-flow-hijacking subclass of memory-corruption exploitation. The 70% memory-safety statistic from Matt Miller's BlueHat IL 2019 talk counts the underlying bugs, and CFI removes none of them; it only constrains how a bug can be turned into a hijacked control transfer [@matt-miller-bluehat-il-2019].
</FAQItem>

<FAQItem question="Why was XFG deprecated?">
The public-facing reason is documented in Connor McGarr's Black Hat USA 2025 retrospective: "XFG was never fully instrumented (UM/KM) and is now deprecated" [@mcgarr-bhusa25-deck-primary]. The most defensible reading of why is that hardware CET on the backward edge -- territory software CFG and XFG never touched -- became the strategic priority once Tiger Lake silicon arrived in September 2020. WOOT 2024 measured XFG at 85.7% of x64 PE files on Insider Preview, never reaching the universal coverage CFG achieves [@woot24-becker-pdf].
</FAQItem>

<FAQItem question="Is kernel-mode HSP on by default in Windows 11 24H2?">
No. The Microsoft Learn page states the default verbatim: "Kernel-mode Hardware-enforced Stack Protection is off by default, but customers can turn it on if the prerequisites are met" [@ms-learn-kernel-hsp]. The prerequisites are an 11th-gen Intel Core Mobile CPU or AMD Zen 3 or newer, Windows 11 2022 update or newer, VBS enabled, HVCI enabled, and an explicit user opt-in via Windows Security or Group Policy.
</FAQItem>

<FAQItem question="Does Windows-on-ARM have MTE?">
No, not as of May 2026. The Snapdragon X Elite and X Plus steppings shipping in 2026 Windows-on-ARM machines do not support ARMv8.5-A Memory Tagging Extension in hardware, and Windows has no documented MTE-tagged allocator. Pointer Authentication is shipped (92% PA file coverage on Insider Preview build 23419 per WOOT 2024) but MTE is not [@woot24-becker-pdf] [@mcgarr-windows-pac].
</FAQItem>

<FAQItem question="Does AMD ship CET?">
Yes. AMD Zen 3 shipped with a compatible shadow-stack implementation, first in Ryzen 5000 "Vermeer" desktop parts on November 5, 2020 and then in Epyc 7003 "Milan" server parts in March 2021 [@wiki-zen-3]. Microsoft's Kernel-mode HSP documentation explicitly names "AMD Zen 3 Core (and newer)" as a CET prerequisite [@ms-learn-kernel-hsp]. The instruction encodings follow the Intel CET specification, so OS code paths are shared.
</FAQItem>

<FAQItem question="What is the difference between CFG and /GS?">
Different invariants, different timing. `/GS` is a stack-cookie check on function *epilogue*: a random value is placed between local buffers and the saved return address, and the runtime check fires before `ret` if the cookie has been overwritten. CFG is an indirect-call target check at the *call site*: before each indirect call, a compiler-inserted thunk consults a bitmap to verify the target address. `/GS` detects contiguous stack-buffer overflows; CFG constrains the target of an attacker-controlled function-pointer write. They are complementary, not substitutes.
</FAQItem>

<FAQItem question="What is the difference between HVCI and kernel-mode HSP?">
HVCI is W^X for kernel pages. The hypervisor enforces that kernel memory marked executable is not writable from any source, including the NT kernel itself, by managing the second-level address translation tables that the kernel cannot touch. Kernel-mode HSP is the CET-based ROP mitigation for ring 0: a CPU-managed shadow stack of kernel return addresses, with a `#CP` fault on mismatch. HVCI is a prerequisite for kernel-mode HSP because the shadow-stack pages need to be write-protected by the hypervisor; the NT kernel cannot guarantee its own non-mutability after a code-execution compromise [@ms-learn-kernel-hsp].
</FAQItem>

<FAQItem question="Will Rust replace CFI?">
No. Rust closes memory-safety bugs at compile time. CFI closes the exploitation surface at runtime against bugs that did make it past the compiler. Both layers ship in parallel. Microsoft is moving selected low-level components to Rust (the Project Mu UEFI firmware [@github-microsoft-mu], segments of the GDI subsystem in the Windows kernel) but CFI remains the runtime layer for everything in the C and C++ surface. The two are complementary; one does not replace the other.
</FAQItem>

</FAQ>

The story this article tells closes around a structural admission. CFI is one layer of a defence stack. The 1996-to-2016 attack-class genealogy -- stack smash, return-into-libc, ROP, JOP, COOP, DOP -- produced a matching defence genealogy on Windows: `/GS`, DEP, ASLR, CFG, XFG, CET shadow stack. Each generation closes the gap the previous attacker class opened. Each leaves open exactly the territory the next attacker class will occupy.

DOP and the 70% memory-safety statistic are the territory no CFI generation has touched. That territory is the one Rust closes at compile time, and CHERI and CHERIoT close in silicon. The future of memory-corruption defence on Windows is not a fourth generation of CFI. It is the combination of memory-safe languages in the kernel and capability hardware underneath the language.

CFI is necessary and not sufficient. Now you know which bit is which.

<StudyGuide slug="control-flow-integrity-on-windows-cfg-xfg-and-intel-cet-shadow-stack" keyTerms={[
  { term: "CFG", definition: "Control Flow Guard. Shipped Windows 8.1 Update 3 (November 2014). Per-process bitmap of valid indirect-call targets, indexed by target address." },
  { term: "XFG", definition: "eXtended Flow Guard. Announced BlueHat Shanghai 2019. 64-bit prototype-hash refinement of CFG; never fully instrumented in shipping Windows; deprecated per McGarr BHUSA 2025." },
  { term: "CET", definition: "Intel Control-flow Enforcement Technology. Hardware feature shipped in Tiger Lake (September 2, 2020). Two components: SHSTK (Shadow Stack) and IBT (Indirect Branch Tracking)." },
  { term: "SHSTK", definition: "Shadow Stack. CPU-managed parallel stack of return addresses, write-protected by a CET page-table bit. Mismatch on ret raises #CP." },
  { term: "IBT", definition: "Indirect Branch Tracking. Forward-edge half of CET. Indirect-branch targets must begin with ENDBR64; mismatch raises #CP. Windows does not enable IBT as of 2026." },
  { term: "FID table", definition: "Function ID table. Per-binary PE structure inside IMAGE_LOAD_CONFIG_DIRECTORY listing every address-taken function. Loader merges per-module tables into a process-wide CFG bitmap." },
  { term: "COOP", definition: "Counterfeit Object-Oriented Programming. Schuster et al. IEEE S&P 2015. Chains C++ virtual calls dispatched through legitimate vtables, every target a valid CFG bit. The attack that motivated XFG." },
  { term: "DOP", definition: "Data-Oriented Programming. Hu et al. IEEE S&P 2016. Turing-complete attack via non-control data corruption. Invisible to every CFI variant because it never violates the control-flow graph." },
  { term: "PAC", definition: "Pointer Authentication Code. ARMv8.3-A feature. Cryptographic MAC over a 64-bit pointer in unused upper bits. Windows-on-ARM uses key B for instruction-pointer signing on return addresses." },
  { term: "HVCI", definition: "Hypervisor-enforced Code Integrity. W^X for kernel pages enforced by the hypervisor via second-level address translation. Prerequisite for kernel-mode HSP." }
]} />
