Q: What is the difference between CFG and /GS?

Different invariants, different timing. /GS is a stack-cookie check in the function epilogue: a random value is placed between local buffers and the saved return address, and the runtime check fires before ret if the cookie has been overwritten. CFG is an indirect-call target check at the call site: every indirect call is routed through a thunk that consults a bitmap to verify the target address before the call dispatches. /GS detects contiguous stack-buffer overflows; CFG constrains the target of an attacker-controlled function-pointer write. They are complementary, not substitutes.

TL;DR

Windows ships three generations of control-flow integrity in 2026. CFG (Control Flow Guard, 2014) is a per-process bitmap of valid indirect-call targets, two bits per 16 bytes of address space. XFG (eXtended Flow Guard, announced 2019) refines CFG with a 64-bit per-prototype hash stored eight bytes before each function entry, but was never fully instrumented in shipping Windows and is now deprecated. Intel CET (Tiger Lake silicon, September 2, 2020) adds a CPU-managed shadow stack and an ENDBR64-based indirect branch tracker; Windows uses only the shadow-stack half. User-mode shadow stack is default-on for /CETCOMPAT-marked binaries on CET-capable hardware. Kernel-mode shadow stack is off by default on Windows 11 24H2 and Windows Server 2025, requires Virtualization-Based Security plus Hypervisor-enforced Code Integrity, and must be enabled explicitly. None of these mitigations closes the data-only attack class identified by Hu and colleagues in 2016, and roughly 70% of the CVEs Microsoft issued between 2006 and 2018 were memory-safety bugs that the entire CFI stack cannot prevent.

1. One Status Code, Two Processes

Open PowerShell on a Windows 11 24H2 machine and run Get-ProcessMitigation -Name notepad.exe. Run it again with -Name msedge.exe. Three rows will be different: CFG.Enable, UserShadowStack.Enable, and UserShadowStack.StrictMode. The same operating system, on the same hardware, is applying three different control-flow integrity contracts to two different processes. This article is the answer to why.

To make the question concrete, here is the failure mode each contract is trying to prevent. Crash a process with a deliberately corrupted indirect-call target and Windows reports STATUS_STACK_BUFFER_OVERRUN (0xC0000409). Dig into the fast-fail subcode and you find FAST_FAIL_GUARD_ICALL_CHECK_FAILURE, value 0x0A in winnt.h. That is the canonical CFG fast-fail.

Trip a corrupted return address while user-mode Hardware-enforced Stack Protection is active and the same status code fires with a CET-specific subcode, this time raised by the CPU's #CP (Control Protection) exception rather than by a compiler-inserted check thunk ^[1]. The Microsoft Learn CFG page, the primary source, documents the runtime check-thunk path that routes the CFG fault ^[2].

JavaScript A/B test: querying per-process CFI state

// Emulates Get-ProcessMitigation -Name notepad.exe vs msedge.exe
// In real PowerShell this calls GetProcessMitigationPolicy() under the hood.

const policies = {
  'notepad.exe': {
    CFG: { Enable: 'ON',  StrictMode: 'OFF' },
    USS: { Enable: 'OFF', StrictMode: 'OFF', AuditEnable: 'OFF' }
  },
  'msedge.exe': {
    CFG: { Enable: 'ON',  StrictMode: 'ON' },
    USS: { Enable: 'ON',  StrictMode: 'ON',  AuditEnable: 'OFF' }
  }
};

for (const [proc, p] of Object.entries(policies)) {
  console.log(proc);
  console.log('  CFG.Enable                  :', p.CFG.Enable);
  console.log('  UserShadowStack.Enable      :', p.USS.Enable);
  console.log('  UserShadowStack.StrictMode  :', p.USS.StrictMode);
}

The three rows that differ are the three generations of Windows CFI. CFG arrived first, in November 2014 on Windows 8.1 Update 3. Five years later, in 2019, Microsoft announced its prototype-hash refinement, eXtended Flow Guard. An academic measurement in 2024 and a Black Hat retrospective in 2025 then confirmed XFG was never fully shipped. In between, in September 2020, Intel shipped the first commercial silicon with a hardware shadow stack, and Microsoft routed #CP faults from that silicon into the same STATUS_STACK_BUFFER_OVERRUN channel. Each generation closes a different attack class. The status code is the lens.

To understand why three generations exist, we have to start where the attacker did: in 1996, with a stack and a buffer.

Diagram source

timeline
title Three generations of Windows control-flow integrity
1996 : Aleph One Phrack 49-14 : Stack smashing tutorial
1997 : Solar Designer BugTraq : Return-into-libc
2004 : Windows XP SP2 ships DEP
2005 : Abadi et al. name CFI
2007 : Shacham CCS 2007 : ROP
2011 : Bletsch et al. ASIACCS 2011 : JOP
2014 : Windows 8.1 Update 3 ships CFG
2015 : Schuster et al. IEEE S&P 2015 : COOP
2016 : Hu et al. IEEE S&P 2016 : DOP
2016 : Intel publishes CET spec
2019 : Weston announces XFG at BlueHat Shanghai
2020 : Tiger Lake ships CET silicon : AMD Zen 3 ships compatible shadow stack
2024 : WOOT 2024 measures Windows CFI coverage
2025 : McGarr BHUSA : XFG never fully instrumented

Three generations of Windows CFI on a 2026 release timeline

2. The Attack That Started Everything

Aleph One, the pseudonym of Elias Levy (BugTraq moderator and later CTO of SecurityFocus), sat down in November 1996 and wrote Phrack Magazine Volume 7, Issue 49, File 14 of 16: Smashing The Stack For Fun And Profit ^[3]. The tutorial is meticulous. Levy walks the reader from C's stack-frame layout through gets() and strcpy() to a working shellcode payload that overflows a fixed-size automatic buffer, overwrites the saved return address, and redirects ret to attacker-supplied instructions in the same buffer.

It is the first widely distributed step-by-step exposition of stack-buffer-overflow exploitation. Every Windows control-flow-integrity story has to recap it, because every later defense is a reaction to the bug class it demonstrated. Levy moderated the BugTraq mailing list from 1996 to 2001. His pseudonym refers to aleph-one, the first uncountable cardinal in Cantor's set theory ^[4].

The natural question the 1996 article raises is the one every defender after Levy had to answer. If overflowing a stack buffer can rewrite the return address, what stops the attacker from making ret point anywhere they want?

For a decade the answer was "almost nothing." Researchers prototyped stack-canary schemes (StackGuard, then /GS in Visual C++ 2002) and proposed compiler-rewriting defenses, but the fundamental shift waited for hardware ^[5].

On September 23, 2003, AMD shipped the Athlon 64 with the NX bit -- bit 63 of the AMD64 page-table entry, marketed under the label "Enhanced Virus Protection." Intel followed with the XD bit on Prescott-based Pentium 4 in 2004 ^[6]. With per-page no-execute enforcement in silicon, an operating system could finally mark data pages as non-executable and refuse to dispatch a jmp into the stack. Windows XP Service Pack 2, on August 6, 2004, was the first mainstream OS to enable hardware-enforced Data Execution Prevention by default for system binaries on NX-capable CPUs ^[7].

DEP did exactly what it advertised. It also broke the attacker's model. If data pages cannot execute, no amount of clever shellcode injection helps -- the bytes in the buffer simply will not run. The next move belongs to Solar Designer.

On August 10, 1997 -- seven years before DEP shipped -- Alexander Peslyak, posting as Solar Designer on the BugTraq mailing list, published the first public exploit demonstrating code reuse: overflow the buffer, redirect ret not to attacker-supplied shellcode but to the entry of an existing libc function such as system(), and hand-craft the stack so the function's arguments come from attacker-controlled data ^[8]. Solar Designer was prescient. In the same post he observed that this method "might sometimes be better than usual one (with shellcode) even if the stack is executable."

If the data pages cannot execute, reuse the code already there. That single sentence is the structural premise of every code-reuse attack from that day forward.

Control-Flow Integrity (CFI)

A static safety property defined by Abadi, Budiu, Erlingsson and Ligatti in 2005: every indirect control-flow transfer at runtime must follow an edge in a precomputed static control-flow graph of the program. CFI as a contract makes no claim about data integrity -- only about the targets of call, jmp through a register, and ret. Modern Windows mitigations (CFG, XFG, CET) each implement part of this contract.

By the mid-2000s the structural answer was overdue. In November 2005, at the 12th ACM Conference on Computer and Communications Security, Martin Abadi, Mihai Budiu and Ulfar Erlingsson (all at Microsoft Research Silicon Valley) together with Jay Ligatti named the contract: Control-Flow Integrity ^[9]. Their paper's abstract states the thesis plainly: "enforcement of a basic safety property, Control-Flow Integrity (CFI), can prevent such attacks from arbitrarily controlling program behavior."

Every indirect control-flow transfer at runtime must follow an edge in a precomputed static control-flow graph. The paper demonstrated a prototype binary rewriter that placed a unique ID-check label before every indirect-call target and inserted a label-comparison stub before every indirect call and return, refusing to dispatch unless the labels matched. Benchmarks reported 16% average overhead -- impractical for production, but the contract was now formal.

The contract was the easy part. Implementing it took Microsoft another nine years. Over roughly the same period, attackers built three generations of code reuse: ROP in 2007, JOP in 2011 and COOP in 2015, months after CFG shipped. We have to understand all three before we can read CFG's source listing.

3. The Mitigation Stack Before CFI

Between Aleph One's 1996 tutorial and the first Windows CFI shipment in 2014 lies an eighteen-year sequence of defenses that closed every direct attack path one by one. None of them protected indirect transfers. Together they make injecting attacker-controlled instructions uneconomic and force the attacker into code reuse. The attacker's choice in 2007 is no longer "inject" -- it is "reuse."

The pieces matter because each is referenced by name in the CFG documentation. Visual C++ 2002 introduced /GS, which inserts a random cookie between local buffers and the saved return address and validates it on function epilogue ^[5]. A contiguous overflow that overwrites the return address must also pass through the cookie, and the runtime check terminates the process before ret dispatches. Stack-cookie schemes do not stop the attacker who has a non-contiguous write primitive, but they raise the cost of the canonical exploit Aleph One described.
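The cookie check can be modelled in a few lines. The sketch below is a toy simulation, not the MSVC implementation: the frame layout, the 8-byte slot sizes, the cookie value, and the names makeFrame, overflow and epilogueCheck are all assumptions made for illustration.

```javascript
// Toy model of a /GS-protected frame: [ buffer | cookie | return address ].
// Layout, sizes, and function names are illustrative assumptions.
const BUF_SIZE = 16;
const COOKIE_OFF = BUF_SIZE;       // cookie sits right after the buffer
const RET_OFF = BUF_SIZE + 8;      // saved return address follows the cookie

function makeFrame(cookie, returnAddress) {
  const frame = new DataView(new ArrayBuffer(BUF_SIZE + 16));
  frame.setBigUint64(COOKIE_OFF, cookie, true);
  frame.setBigUint64(RET_OFF, returnAddress, true);
  return frame;
}

// Contiguous write starting at the buffer base, like an unchecked strcpy().
function overflow(frame, bytes) {
  bytes.forEach((b, i) => frame.setUint8(i, b));
}

// Epilogue check: compare the frame's cookie with the expected value.
function epilogueCheck(frame, cookie) {
  return frame.getBigUint64(COOKIE_OFF, true) === cookie;
}

const cookie = 0x2b992ddfa232n;    // stand-in for the per-process random cookie
const frame = makeFrame(cookie, 0x401000n);

overflow(frame, new Array(BUF_SIZE).fill(0x41));        // fits: cookie intact
const okAfterFit = epilogueCheck(frame, cookie);

overflow(frame, new Array(BUF_SIZE + 16).fill(0x41));   // reaches the return address
const okAfterSmash = epilogueCheck(frame, cookie);

console.log({ okAfterFit, okAfterSmash });
```

The second write does overwrite the saved return address, but it cannot get there without also overwriting the cookie, so the epilogue check fails before ret would use the corrupted value. A write primitive that skips the cookie slot would pass this check, which is the non-contiguous case the paragraph above describes.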

DEP, again on Windows XP SP2 in August 2004, paired the NX bit with a per-process OptIn / OptOut / AlwaysOn / AlwaysOff policy surface. ASLR followed two years later in Windows Vista RTM build 6000 on November 8, 2006: image-base, heap, stack and PEB/TEB locations randomised per boot or process, with binaries opting in via the /DYNAMICBASE linker flag ^[11]. The PaX project on Linux had pioneered the technique in July 2001, OpenBSD 3.4 shipped ASLR by default in 2003, and Linux mainline followed in 2005; Vista was the third major OS in the column ^[12].

※ Vista RTM was build 6000, shipped November 8, 2006 ^[10]. ASLR was opt-in via /DYNAMICBASE and required PE images to be re-linked; pre-Vista binaries continued to load at fixed bases until the developer recompiled.

SafeSEH (a linker-side table of legal exception handlers, validated at SEH dispatch) and its runtime sibling SEHOP closed the SEH-overwrite technique David Litchfield formalised in his September 2003 NGSSoftware paper ^[13]. That class became canonical in the 2004-2008 browser and Office client-side exploit lineage, distinct from the return-address-overwrite worms (Code Red 2001 against IIS, Slammer 2003 against SQL Server 2000, Blaster 2003 and Sasser 2004 against RPC and LSASS) that motivated DEP and /GS.

By late 2007, every direct attack was closed. Stack canaries caught contiguous overflows on epilogue. DEP refused to dispatch into data pages. ASLR forced the attacker to leak a pointer before any hardcoded address would resolve. SafeSEH constrained the SEH chain.

The structural gap each left was the same: indirect calls and indirect jumps remained unconstrained. An attacker who could write a corrupted function pointer through any means -- type-confusion, use-after-free, integer-overflow-feeding-allocator -- could still redirect an indirect call to any legitimately-executable byte in the process. Hovav Shacham turned the question inside out. Instead of inventing new instructions, he used the ones already there.

4. Three Generations of CFI on Windows

4.1 Generation 1: Control Flow Guard

Shacham steps up to a CCS 2007 podium in Alexandria, Virginia, and demonstrates a Turing-complete instruction set discovered inside an unmodified libc binary -- ret-terminated gadgets at byte offsets the binary's author never intended ^[14]. The audience now has a name for what is coming: return-oriented programming.

The paper, "The Geometry of Innocent Flesh on the Bone: Return-into-libc without Function Calls (on the x86)," constructs a complete shellcode -- load, store, arithmetic, logic, control flow, system calls -- entirely from these gadgets, without injecting a single byte of executable code. The CCS 2017 Test-of-Time Award acknowledges the paper's formative impact a decade later ^[15]. Wikipedia summarises the canonical arc concisely: "With data execution prevention, an adversary cannot directly execute instructions written to a buffer ... To defeat this protection, a return-oriented programming attack does not inject malicious instructions, but rather uses instruction sequences already present in executable memory, called 'gadgets', by manipulating return addresses" ^[16].

Gadget

A short, attacker-useful instruction sequence ending in a control-flow transfer (typically ret for ROP, an indirect jmp for JOP, or an indirect call through a vtable for COOP). The defining property of a gadget is that it exists unintentionally inside a legitimate executable image, at a byte offset the binary's author never planned for the CPU to start decoding. Variable-length x86 instructions make gadgets plentiful: any random 0xC3 byte in the middle of a function is a potential ret-terminated tail.
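To make the "any 0xC3 byte" point concrete, here is a minimal sketch of the first step of a gadget search. The byte sequence is a hypothetical example and findRetOffsets is an illustrative name; real gadget finders also disassemble backwards from each hit to see which instruction sequences decode into it.

```javascript
// Find every offset of a 0xC3 (near ret) byte in a code buffer.
// Each hit is only a candidate tail for a ret-terminated gadget.
function findRetOffsets(code) {
  const offsets = [];
  code.forEach((byte, i) => {
    if (byte === 0xc3) offsets.push(i);
  });
  return offsets;
}

// Hypothetical function body: mov eax, 0x00C35F58 ; ret
// The immediate operand hides the bytes 58 5F C3, which decode as
// "pop rax; pop rdi; ret" if the CPU starts decoding at offset 1.
const code = Uint8Array.from([0xb8, 0x58, 0x5f, 0xc3, 0x00, 0xc3]);
const retOffsets = findRetOffsets(code);
console.log(retOffsets);
```

The scan finds two candidates: the intended ret at the end of the function and an unintended one inside the immediate operand.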

Forward edge and backward edge

The forward edge of a control-flow graph is any indirect transfer whose target is determined at runtime: indirect call, indirect jmp, virtual dispatch through a vtable, function-pointer call. The backward edge is the ret instruction returning to a caller. CFI implementations frequently address only one edge: CFG and XFG check forward edges with a bitmap or a hash; the Intel CET shadow stack checks the backward edge by comparing the popped return address against a CPU-managed parallel copy ^[17].

If gadgets are everywhere, how do you stop the attacker from calling them? Microsoft's answer arrived seven years later. On November 18, 2014, Windows 8.1 Update 3 (KB3000850) shipped Control Flow Guard ^[2]. CFG is a coarse-grained forward-edge CFI scheme. The contract is a single equivalence class per process: every address-taken function in any module loaded into the process is a legal indirect-call target; every other byte is not.

The mechanism has four moving parts. First, the MSVC compiler invoked with /guard:cf enumerates every function whose address is taken anywhere in the module and emits a check thunk -- __guard_check_icall_fptr for "check then return to caller," __guard_dispatch_icall_fptr for "check then tail-call dispatch" -- at every indirect call site ^[18]. Second, the linker emits a per-binary Function ID (FID) table inside IMAGE_LOAD_CONFIG_DIRECTORY in the PE file, listing the relative virtual addresses of every legal target.

Third, at image load time the Windows loader merges per-module FID tables into a process-wide bitmap (two bits per 16 bytes of code) backed by a kernel-managed, read-only mapping. Fourth, the check thunk indexes the bitmap by the target address; if the bit is clear, the thunk invokes __fastfail(FAST_FAIL_GUARD_ICALL_CHECK_FAILURE), which raises STATUS_STACK_BUFFER_OVERRUN and terminates the process.

FID table (Function ID table)

A PE-file structure inside IMAGE_LOAD_CONFIG_DIRECTORY that enumerates the relative virtual addresses of every address-taken function in the binary. The Windows loader's LdrpProtectAndRelocateImage routine merges every loaded module's FID table into a single process-wide bitmap. dumpbin /loadconfig displays the table's contents and the CF Instrumented and FID table present flags.

The bitmap layout is worth a moment. McGarr's Black Hat USA 2025 Out of Control deck documents the state machine in detail ^[19]. Two bits per 16 bytes of address space encode four states: (0,0) means no valid target; (1,0) means a 16-byte-aligned valid target; (1,1) means a non-aligned valid target (function entry is not 16-byte aligned); and (0,1) is the suppressed-target marker the loader sets for entries the linker has decided are unsafe.

The arithmetic of the bitmap is what reshaped the Windows 8.1 address space. Alex Ionescu walked through it in his 2014 writeup: a 128 TB user-mode virtual address space at 16-byte granularity contains 2^43 (roughly 8.8 trillion) possible target slots, and at 2 bits per slot that is a 2 TB bitmap. The memory manager backs the bitmap sparsely, so resident commit is tiny, but the reservation could not coexist with the older 8 TB user-VA layout. CFG is the reason Windows 8.1 went from 8 TB to 128 TB user VA ^[20].

Before Windows 8.1 Update 3, the Windows user-mode virtual address space on x64 was 8 TB per process. CFG's bitmap reservation is sized to cover the entire user-mode address space at 16-byte granularity -- one byte of bitmap per 64 bytes of VA. On the new 128 TB layout that is a 2 TB bitmap; had Microsoft instead sized the bitmap for the legacy 8 TB layout it would have been only 128 GB, but they did not. Alex Ionescu walked through the consequence after the November 2014 ship: dropping the 2 TB bitmap sized for the new 128 TB layout into the legacy 8 TB user VA would have cut the usable address space by 25% per process (2 TB / 8 TB) and pushed per-process commit to roughly 4 GB. So the engineering decision Microsoft made was to grow the user VA to 128 TB on the way to landing CFG. The 2 TB bitmap is the largest single contiguous reservation any Windows process has ever made, and most of its bytes are never touched ^[20].
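The sizing figures above can be checked mechanically. This is a small worked calculation, assuming only the stated parameters (one slot per 16 bytes, two bits per slot); the function name cfgBitmapBytes is illustrative.

```javascript
// Bitmap sizing for CFG, using BigInt so the 2^47-scale values stay exact.
const TB = 1n << 40n;
const GB = 1n << 30n;

function cfgBitmapBytes(userVaBytes) {
  const slots = userVaBytes / 16n;   // one slot per 16 bytes of address space
  const bits = slots * 2n;           // two state bits per slot
  return bits / 8n;                  // bytes of bitmap
}

const newLayout = cfgBitmapBytes(128n * TB);     // bitmap for a 128 TB user VA
const legacyLayout = cfgBitmapBytes(8n * TB);    // bitmap for an 8 TB user VA
const shareOfLegacyVa = (newLayout * 100n) / (8n * TB);

console.log(newLayout / TB, 'TB bitmap for a 128 TB user VA');
console.log(legacyLayout / GB, 'GB bitmap for an 8 TB user VA');
console.log(shareOfLegacyVa, 'percent of an 8 TB user VA');
```

The three results reproduce the 2 TB, 128 GB and 25% figures quoted in the text.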

The first independent reverse-engineering writeup came from Trend Micro in 2015: Jack Tang's Exploring Control Flow Guard in Windows 10 walks the ntdll.dll!LdrpCallInitRoutine path, the per-module __guard_check_icall_fptr import resolution, the MEMORY_BASIC_INFORMATION.Protect & PAGE_TARGETS_INVALID flag, and the exact bitmap layout in full disassembly ^[21]. The shape of the field is now a matter of public record.

Diagram source

flowchart LR
A[MSVC compiler with /guard:cf] -->|emit check thunks| B[Object files with FID metadata]
B --> C[Linker with /DYNAMICBASE]
C -->|writes| D[FID table in IMAGE_LOAD_CONFIG_DIRECTORY]
D --> E[PE binary on disk]
E --> F[Windows loader at image load]
F -->|LdrpProtectAndRelocateImage| G[Process-wide CFG bitmap]
G --> H[__guard_check_icall_fptr at every indirect call]
H -->|bit clear| I[STATUS_STACK_BUFFER_OVERRUN]
H -->|bit set| J[Dispatch indirect call]

The CFG build pipeline: compiler enumerates address-taken functions, linker writes the FID table, loader merges into a process bitmap, check thunk consults the bitmap at every indirect call.

The two-bit-per-16-byte state machine deserves a table.

Two-bit value	Meaning
`(0, 0)`	Address is not a legal indirect-call target. Check thunk fast-fails.
`(1, 0)`	Address is a legal target and is 16-byte aligned.
`(1, 1)`	Address is a legal target but is not 16-byte aligned (function entry is misaligned).
`(0, 1)`	Suppressed target: the linker marked the entry as deliberately invalid.
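The table translates into a short lookup routine. The following is a toy model under stated assumptions, not the ntdll implementation: the bitmap is a sparse Map rather than a flat reservation, the table's (a, b) pair is written as the binary number 0bab, and the class and method names are invented for the sketch.

```javascript
// Toy CFG bitmap: one two-bit state per 16-byte slot, stored sparsely.
const STATE = { INVALID: 0b00, ALIGNED: 0b10, UNALIGNED: 0b11, SUPPRESSED: 0b01 };

class CfgBitmap {
  constructor() {
    this.slots = new Map();   // slot index (address >> 4) -> two-bit state
  }
  markValid(address) {
    const aligned = (address & 0xfn) === 0n;
    this.slots.set(address >> 4n, aligned ? STATE.ALIGNED : STATE.UNALIGNED);
  }
  suppress(address) {
    this.slots.set(address >> 4n, STATE.SUPPRESSED);
  }
  // Returns true when the check thunk would let the indirect call proceed.
  check(target) {
    const state = this.slots.get(target >> 4n) ?? STATE.INVALID;
    if (state === STATE.UNALIGNED) return true;
    if (state === STATE.ALIGNED) return (target & 0xfn) === 0n;
    return false;             // INVALID or SUPPRESSED: fast-fail
  }
}

const bitmap = new CfgBitmap();
bitmap.markValid(0x7ff610001000n);   // aligned function entry
bitmap.markValid(0x7ff610002008n);   // misaligned function entry
bitmap.suppress(0x7ff610003000n);

const results = [
  bitmap.check(0x7ff610001000n),     // registered, aligned
  bitmap.check(0x7ff610001004n),     // mid-function byte in an ALIGNED slot
  bitmap.check(0x7ff610002008n),     // registered, misaligned
  bitmap.check(0x7ff610003000n),     // suppressed
  bitmap.check(0x7ff610004000n),     // never registered
];
console.log(results);
```

The sparse Map mirrors the way the real reservation is mostly untouched: only slots that hold a registered target carry any state, and everything else reads as "not a legal target".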

The Becker, Hollick and Classen SoK paper at USENIX WOOT 2024 measured CFG coverage on a Windows 11 Insider Preview developer build 23440 at 97.37% of x64 PE files (only 2.63% unprotected), and 99.09% on C:\Windows\System32 (0.91% unprotected) ^[22]. The mitigation has reached near-universal coverage on the system surface; the gap is in third-party code that has not opted in.

CFG worked, and it broke, in precisely the way 2007-era CFI papers had predicted. The first major bypass came from the JIT side. On October 11, 2016, Microsoft Patch Tuesday MS16-119 shipped a cumulative update to Microsoft Edge. Theori's Frontier Squad followed with a December 13, 2016 writeup describing the bypass it closed ^[23].

The Chakra JavaScript engine generated native code into a temporary writable buffer, then copied it to executable memory. While the code was in the temporary buffer, an adversary with a write primitive could rewrite the bytes the JIT was about to emit, smuggling attacker-chosen instructions into legitimately CFG-valid territory. JIT-emitted code is, by construction, registered as a valid CFG target through SetProcessValidCallTargets ^[24] -- there is no way to ship a working JavaScript runtime otherwise. CFG cannot tell intended JIT output from substituted JIT output.

CFG didn't offer any granularity over the valid call targets. Any protected indirect call was allowed to call any valid call target. In large binaries, valid call targets could easily be in the thousands, giving attackers plenty of flexibility to bypass CFG by chaining valid C++ virtual functions. -- Quarkslab, How the MSVC compiler generates XFG function prototype hashes

The deeper structural bypass arrived first, at IEEE Symposium on Security and Privacy 2015. Felix Schuster and his coauthors at Ruhr-University Bochum and TU Darmstadt published Counterfeit Object-oriented Programming -- the COOP attack ^[25]. Their observation was structural. C++ virtual calls dispatch through vtables. Every vtable entry points at a function whose address has been taken. Every such function is therefore a valid CFG target by construction.

An attacker who can corrupt an object's vtable pointer can chain valid virtual calls and reach Turing-completeness without leaving the CFG-valid set. The CFG bitmap never fires. The check passes every time.

CFG asks the same question of every indirect call: is this target's address bit set in the process-wide bitmap? Every C++ virtual method is address-taken. Every vtable entry is in the bitmap. Schuster's COOP attack stays inside the legal set and reaches Turing-completeness without CFG ever firing. To close COOP, the check has to ask a harder question: does this function's signature match the call site's?
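A minimal sketch shows why the coarse check cannot see a COOP chain. The addresses and method names below are hypothetical, and the Map stands in for the process-wide bitmap.

```javascript
// Coarse-grained forward-edge check: one equivalence class per process.
const addressTaken = new Map([
  [0x1000, 'Widget::draw'],
  [0x1040, 'File::close'],
  [0x1080, 'Buffer::resize'],
]);

// CFG-style question: is the target a registered address-taken function?
const cfgAllows = (target) => addressTaken.has(target);

// The call site was written to reach Widget::draw. A COOP chain points the
// object at a counterfeit vtable whose entries are other virtual methods.
const coopChain = [0x1040, 0x1080, 0x1040];
const midFunctionGadget = 0x1043;   // ROP-style target, not a function entry

const coopPasses = coopChain.every(cfgAllows);
const gadgetPasses = cfgAllows(midFunctionGadget);
console.log({ coopPasses, gadgetPasses });
```

The mid-function gadget is rejected, which is what CFG was built for. Every step of the COOP chain is accepted, because the check never asks whether the target matches what the call site expected.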

4.2 Generation 1.5: eXtended Flow Guard

Bletsch and colleagues at NC State and the National University of Singapore published Jump-Oriented Programming: A New Class of Code-Reuse Attack at ASIACCS 2011 in Hong Kong ^[26]. JOP replaces ROP's ret-terminated gadgets with indirect-jump-terminated gadgets dispatched by a separate dispatcher gadget, typically an indirect jmp that updates a virtual program counter held in a chosen register. The attack defeats any defense that single-targets ret.

Schuster's COOP, four years later, defeated CFG itself. By 2019 the natural defensive answer was overdue, and David Weston walked on stage at BlueHat Shanghai 2019 to announce Microsoft's response: eXtended Flow Guard ^[27].

COOP (Counterfeit Object-Oriented Programming)

A code-reuse attack class identified by Schuster, Tendyck, Liebchen, Davi, Sadeghi and Holz in IEEE S&P 2015 ^[25]. COOP chains C++ virtual calls dispatched through legitimately-existing vtables. Because every vtable entry is the address of a virtual method whose address has been taken, every dispatch target is a valid CFG bitmap entry by construction. COOP reaches Turing-completeness without ever violating coarse-grained forward-edge CFI. The attack is the structural reason XFG was designed.

If COOP attacks survive CFG by staying inside the legal set, what makes a target's signature legal? XFG's answer is a SHA-256 hash of each function's prototype, truncated to 64 bits, computed at compile time and stored eight bytes before the function entry. The call site loads the expected hash into r10. Dispatch goes through __guard_xfg_dispatch_icall_fptr, which reads the eight bytes at [rax - 8] and compares them to r10. Mismatch raises STATUS_STACK_BUFFER_OVERRUN. Quarkslab's 2020 teardown documents the dispatch in detail, including the fact that the thunk ORs bit 0 of r10 before the comparison, a feature that lets the loader downgrade XFG to plain CFG semantics for modules that did not opt into the hash check ^[28].

Diagram source

sequenceDiagram
participant CallSite as XFG-instrumented call site
participant Thunk as __guard_xfg_dispatch_icall_fptr
participant Target as Target function entry
CallSite->>CallSite: load expected hash into r10
CallSite->>Thunk: call thunk(rax = target)
Thunk->>Target: read 8 bytes at [rax - 8]
Thunk->>Thunk: compare with r10
alt hashes match
Thunk->>Target: dispatch indirect call
else mismatch
Thunk->>Thunk: STATUS_STACK_BUFFER_OVERRUN
end

XFG dispatch: the call site loads the expected prototype hash into r10, the thunk reads the actual hash stored eight bytes before the function entry, and the comparison decides whether to dispatch or fast-fail.
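The comparison in the diagram can be sketched as follows. This is a model of the idea only: hash64 is a stand-in (FNV-1a), not the digest MSVC computes, the prototype strings are invented rather than the compiler's real type encoding, and a Map replaces the "eight bytes before the entry" storage.

```javascript
// Stand-in 64-bit hash (FNV-1a). An assumption of this sketch, chosen only
// because it is short and needs no imports.
function hash64(text) {
  let h = 0xcbf29ce484222325n;
  for (const ch of text) {
    h ^= BigInt(ch.codePointAt(0));
    h = (h * 0x100000001b3n) & 0xffffffffffffffffn;
  }
  return h;
}

// Toy memory: entry address -> hash stored just before the function entry.
const storedHash = new Map();
function defineFunction(entry, prototype) {
  storedHash.set(entry, hash64(prototype));
}

// Dispatch thunk: compare the call site's expected hash with the stored one.
function xfgDispatch(expectedHash, target) {
  if (storedHash.get(target) !== expectedHash) {
    throw new Error('fast-fail: prototype hash mismatch');
  }
  return target;   // a real thunk would jump to the target here
}

defineFunction(0x1000, 'void(Widget*)');
defineFunction(0x1040, 'int(File*, int)');

const expected = hash64('void(Widget*)');      // value the call site loads
const dispatched = xfgDispatch(expected, 0x1000);

let rejected = false;
try {
  xfgDispatch(expected, 0x1040);               // valid CFG target, wrong prototype
} catch {
  rejected = true;
}
console.log({ dispatched, rejected });
```

Both functions would pass a bitmap-only check. Only the one whose stored hash matches the call site's expectation is dispatched, which is the narrowing that turns one process-wide equivalence class into many small ones.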

The toolchain is narrower than CFG's. The /guard:xfg flag shipped in Visual Studio 2019 Preview 16.5 ^[29]. Connor McGarr's 2020 Examining XFG writeup is the canonical practitioner-side reference, documenting the thunk, the hash placement, and the build contract. Upstream LLVM and Clang shipped no equivalent.

Critically, /guard:xfg is not documented on the Microsoft Learn /guard page, which lists only /guard:cf and /guard:cf- ^[18]. That documentation absence is a leading indicator.

The WOOT 2024 paper is the empirical measurement. Becker, Hollick and Classen analysed the Windows 11 Insider Preview developer build 23440 and reported the numbers verbatim in Table 4: 85.73% of executables carry XFG instrumentation, 85.70% of DLLs, 97.04% of C:\Windows\System32 DLLs, with a geometric-mean equivalence-class size of 1.37 ^[22] ^[31].

Translation: on Insider Preview builds, the OS-side coverage is high (effectively all of System32), but the 14% gap on executables outside the system directory is in third-party code that has not adopted /guard:xfg. The geometric-mean equivalence class of 1.37 means the hash narrows the legal target set dramatically: a typical XFG-protected call site admits one or two prototype-matching candidates rather than the thousands an unrefined CFG bitmap would admit.

File class	CFG (unprotected)	XFG (coverage)	PA (coverage)
Executables (Windows 11 x64 Insider 23440)	2.68%	85.73%	n/a
DLLs (Windows 11 x64 Insider 23440)	2.62%	85.70%	n/a
`C:\Windows\System32` DLLs (x64)	0.91%	97.04%	n/a
Combined (Windows 11 x64 Insider 23440)	2.63%	85.70%	n/a
Windows 11 ARM64 Insider Preview build 23419	n/a	n/a	92%

Source: Becker et al., USENIX WOOT 2024, Table 4 and §5.3 ^[22].

eXtended Control Flow Guard (XFG) was an attempt to address this. XFG was never fully instrumented (UM/KM) and is now deprecated. -- Connor McGarr, Out of Control, Black Hat USA 2025 ^[19]

The retrospective verdict came in August 2025. McGarr's Black Hat USA 2025 deck names XFG as "never fully instrumented (UM/KM) and is now deprecated" ^[19].

The reason Microsoft de-prioritised XFG is not documented by Microsoft. The most defensible reading, consistent with the public timeline, is this: once Intel CET silicon arrived in September 2020, hardware CFI on the backward edge -- the territory software CFG and XFG never touched -- became the strategic priority. XFG was the right answer to COOP. It was also a software answer to a problem the silicon was about to absorb. By the time Tiger Lake shipped, Microsoft was already pivoting.

4.3 Generation 2: Intel CET, Shadow Stack and Indirect Branch Tracking

Intel published document 334525-001, Control-Flow Enforcement Technology Specification, Revision 1.0, in June 2016 -- four years before any silicon shipped ^[32]. The specification defines two independent components. SHSTK is the Shadow Stack, the backward-edge piece. IBT is Indirect Branch Tracking, the forward-edge piece. They are siblings, not parent and child. Tiger Lake (11th Gen Intel Core Mobile) shipped on September 2, 2020 as the first commercial silicon with both ^[33]. AMD Zen 3 (Ryzen 5000 "Vermeer" and Epyc 7003 "Milan") shipped a compatible implementation on November 5, 2020 ^[34]. The two-vendor consensus locked in.

Shadow Stack (SHSTK)

A CPU-managed second stack of return addresses, write-protected by a CET-specific page-table bit. On call, the CPU pushes the return address onto both the regular stack and the shadow stack. On ret, it pops both, compares, and raises a #CP (Control Protection) exception on mismatch. Only two instructions can legitimately mutate shadow-stack contents, and the OS controls both: WRSS, which faults unless the OS has set the WR_SHSTK_EN enable bit for the current privilege level, and WRUSS (CPL 0 with user-class access). Software shadow stacks predated CET (StackShield 1998, RAD 2001, SmashGuard 2006), but all of them stored the second stack at user privilege where an attacker with an arbitrary-write primitive could forge it. SHSTK is the first widely-deployed shadow stack with hardware-rooted integrity ^[32].

Indirect Branch Tracking (IBT)

The forward-edge half of Intel CET. Every legal indirect-branch target must begin with ENDBR64 (on x86-64) or ENDBR32 (on x86). The CPU maintains a per-mode tracker state machine: an indirect call or indirect jump transitions the tracker out of IDLE, and the next instruction at the branch target must be ENDBR64 to transition it back; any other instruction raises #CP ^[35]. ENDBR64 is a no-op for direct execution paths, so it is safe to sprinkle at the entry of every address-taken function. IBT first shipped in the Tiger Lake generation ^[36]. As of May 2026, Windows enables only SHSTK; IBT is documented in the architecture but is not turned on by the OS ^[19].
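The tracker can be sketched as a two-state machine. The model below is a simplification for illustration: the instruction stream is a list of mnemonic strings in execution order, so the element after an indirect branch is the first instruction at its target. WAIT_FOR_ENDBRANCH is Intel's name for the non-IDLE state; the function name runWithIbt is invented.

```javascript
// Toy IBT tracker over a stream of executed instructions.
function runWithIbt(instructions) {
  let tracker = 'IDLE';
  for (const insn of instructions) {
    if (tracker === 'WAIT_FOR_ENDBRANCH') {
      if (insn !== 'endbr64') return '#CP';   // landed on a non-ENDBR target
      tracker = 'IDLE';
      continue;
    }
    if (insn === 'call-indirect' || insn === 'jmp-indirect') {
      tracker = 'WAIT_FOR_ENDBRANCH';
    }
    // Any other instruction, including endbr64 reached while IDLE,
    // leaves the tracker unchanged.
  }
  return 'ok';
}

const legalTarget = runWithIbt(['call-indirect', 'endbr64', 'mov', 'ret']);
const gadgetTarget = runWithIbt(['jmp-indirect', 'pop', 'ret']);
const directPath = runWithIbt(['endbr64', 'mov', 'ret']);
console.log({ legalTarget, gadgetTarget, directPath });
```

The third case is the reason ENDBR64 can be placed at every address-taken function entry: reached by a direct path it changes nothing.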

Diagram source

flowchart TD
A[Intel CET] --> B[SHSTK<br/>Shadow Stack<br/>backward edge]
A --> C[IBT<br/>Indirect Branch Tracking<br/>forward edge]
B --> D[CPU pushes return address on call]
B --> E[CPU compares on ret]
B --> F[#CP fault on mismatch]
C --> G[ENDBR64 required at indirect-branch target]
C --> H[CPU tracker state machine]
C --> I[#CP fault on non-ENDBR target]
B -.-> J[Windows enforces this]
C -.-> K[Windows does not enforce this]

The Intel CET umbrella: SHSTK protects the backward edge with a CPU-managed shadow stack; IBT protects the forward edge with ENDBR64 landing pads. Windows uses only SHSTK as of 2026.

The SHSTK mechanism is direct. On call, the CPU pushes the return address to both the regular stack and the shadow stack. On ret, it pops from both and compares. Mismatch raises #CP -- the Control Protection exception, vector 21.
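A minimal sketch of that push, pop, and compare sequence, written as a JavaScript simulation. The `ToyCpu` class, `ControlProtectionFault`, and the addresses are hypothetical names chosen for illustration; a real shadow stack lives in protected pages and is maintained by the processor, not by code like this.

```javascript
// Toy model of SHSTK semantics: call pushes to both stacks, ret compares.
class ControlProtectionFault extends Error {
  constructor(expected, actual) {
    super('#CP: shadow stack holds ' + expected + ', regular stack holds ' + actual);
    this.name = 'ControlProtectionFault';
  }
}

class ToyCpu {
  constructor() {
    this.stack = [];        // regular stack: attacker-writable in this model
    this.shadowStack = [];  // shadow stack: only call() and ret() touch it
  }
  call(returnAddress) {
    this.stack.push(returnAddress);
    this.shadowStack.push(returnAddress);
  }
  ret() {
    const actual = this.stack.pop();
    const expected = this.shadowStack.pop();
    if (actual !== expected) throw new ControlProtectionFault(expected, actual);
    return actual;
  }
}

function demoShadowStack() {
  const cpu = new ToyCpu();
  cpu.call(0x401000);
  const clean = cpu.ret();                         // copies match
  cpu.call(0x402000);
  cpu.stack[cpu.stack.length - 1] = 0xdeadbeef;    // simulated return-address overwrite
  try {
    cpu.ret();
    return { clean, faulted: false };
  } catch (e) {
    return { clean, faulted: e instanceof ControlProtectionFault };
  }
}

console.log(demoShadowStack());
```

The overwrite only reaches the regular stack, so the second `ret` sees two different values and raises the simulated #CP.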

The shadow stack lives on pages marked with a CET-specific page-table bit; an ordinary mov to those pages faults. Two privileged instructions are the only legitimate way to write to a shadow stack: WRSS requires CPL 0 (kernel mode), and WRUSS requires CPL 0 with user-class access ^[37] ^[38]. The instruction family rounds out with INCSSP for unwinding the shadow-stack pointer, RDSSP for reading it, and SAVEPREVSSP / RSTORSSP for context-switch primitives ^[39] ^[40]. The WRUSS privilege oddity is worth pausing on. The instruction can only execute when CPL is 0, but the processor treats its shadow-stack access as a user-class access for the purpose of page-permission checks: "The WRUSS instruction can be executed only if CPL = 0, however the processor treats its shadow-stack accesses as user accesses" ^[38]. That carve-out is what lets the kernel implement SEH unwinding and longjmp over a user shadow stack without violating the userspace memory model.

Windows integration begins where the silicon ends. The Microsoft Tech Community post Understanding Hardware-enforced Stack Protection, published on March 24, 2020 (six months before Tiger Lake shipped), announced the plumbing ^[1]. The #CP fault is delivered to user mode as STATUS_STACK_BUFFER_OVERRUN -- the same status code CFG fast-fails use, with a CET-specific subcode that lets debuggers distinguish the two.

The /CETCOMPAT linker flag, available beginning in Visual Studio 2019 and exposed in the GUI in version 16.7, sets IMAGE_DLLCHARACTERISTICS_EX_CET_COMPAT in the PE header ^[41]. The bit matters most in compatibility mode, where a shadow-stack fault is fatal only in /CETCOMPAT-marked modules; in strict mode the fault is fatal in any module, marked or not.

The per-process policy lives in PROCESS_MITIGATION_USER_SHADOW_STACK_POLICY, a struct of ten single-bit fields ^[42]. The fields are, in declared order: EnableUserShadowStack, AuditUserShadowStack, SetContextIpValidation, AuditSetContextIpValidation, EnableUserShadowStackStrictMode, BlockNonCetBinaries, BlockNonCetBinariesNonEhcont, AuditBlockNonCetBinaries, CetDynamicApisOutOfProcOnly, and SetContextIpValidationRelaxedMode, followed by ReservedFlags : 22.

The default state on Windows 11 24H2 on CET-capable hardware is EnableUserShadowStack = TRUE in compatibility mode for processes whose executable is marked /CETCOMPAT: the shadow stack is active, but a fault is fatal only when the offending return executes in a /CETCOMPAT-marked module. Strict mode is opt-in.

| Policy bit | Role |
| --- | --- |
| `EnableUserShadowStack` | Master switch. TRUE enables HSP for the process in compatibility mode. |
| `AuditUserShadowStack` | Log shadow-stack violations rather than fast-failing. Used for canary builds. |
| `SetContextIpValidation` | Closes the `SetThreadContext` route around CET by validating the IP write. |
| `AuditSetContextIpValidation` | Audit-mode variant of `SetContextIpValidation`. |
| `EnableUserShadowStackStrictMode` | Fault is fatal in every module, not just `/CETCOMPAT`-marked ones. |
| `BlockNonCetBinaries` | Refuse to load any module without `IMAGE_DLLCHARACTERISTICS_EX_CET_COMPAT`. |
| `BlockNonCetBinariesNonEhcont` | Same as `BlockNonCetBinaries`, but exempts modules with EH continuation metadata. |
| `AuditBlockNonCetBinaries` | Audit-mode variant of `BlockNonCetBinaries`. |
| `CetDynamicApisOutOfProcOnly` | JIT shadow-stack APIs must be invoked from a different process. |
| `SetContextIpValidationRelaxedMode` | Loosens `SetContextIpValidation` for compatibility with older debuggers. |
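To make the bit layout concrete, here is a small decoding sketch in JavaScript. It assumes the ten fields occupy bits 0 through 9 of a 32-bit flags word in declared order, which is how MSVC lays out bit-fields on little-endian targets; that ordering, and the helper names `decodeUserShadowStackPolicy` and `describeMode`, are assumptions for illustration rather than anything the Windows headers promise by name.

```javascript
// Field names in declared order; index in the array is the assumed bit position.
const USER_SHADOW_STACK_POLICY_BITS = [
  'EnableUserShadowStack',
  'AuditUserShadowStack',
  'SetContextIpValidation',
  'AuditSetContextIpValidation',
  'EnableUserShadowStackStrictMode',
  'BlockNonCetBinaries',
  'BlockNonCetBinariesNonEhcont',
  'AuditBlockNonCetBinaries',
  'CetDynamicApisOutOfProcOnly',
  'SetContextIpValidationRelaxedMode',
];

// Turn a raw 32-bit flags value into an object of named booleans.
function decodeUserShadowStackPolicy(flags) {
  const policy = {};
  USER_SHADOW_STACK_POLICY_BITS.forEach((name, bit) => {
    policy[name] = ((flags >>> bit) & 1) === 1;
  });
  return policy;
}

// Summarise the enforcement mode the decoded bits imply.
function describeMode(policy) {
  if (!policy.EnableUserShadowStack) return 'off';
  return policy.EnableUserShadowStackStrictMode ? 'strict' : 'compatibility';
}

console.log(describeMode(decodeUserShadowStackPolicy(0x01)));
console.log(describeMode(decodeUserShadowStackPolicy(0x11)));
```

With bit 0 alone set the process is in compatibility mode; setting bit 4 as well moves it to strict mode.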

Critically, McGarr's BHUSA 2025 deck states verbatim: "Windows only uses the Shadow Stack feature of CET" ^[19]. IBT is implemented in the CPU, and GCC's -fcf-protection=full emits the ENDBR64 landing pads it needs on Linux, but Windows leaves it off. The forward edge on Windows in 2026 is still a software story.

Hardware closes the backward edge. But the forward edge is still in software, and the kernel-mode story is still off by default. Why?

5. Hardware-Enforced Backward-Edge Protection

The shadow stack is not a new idea. The Wikipedia Shadow stack article documents three software-shadow-stack ancestors before Intel CET ^[32]. StackShield shipped in 1998. Return Address Defender (RAD) followed in 2001. SmashGuard arrived in 2006. Each kept a parallel stack of return addresses and compared the popped value at ret. Each paid one of two costs: per-call overhead from the compare-and-branch check, or a second stack at user privilege where an attacker with an arbitrary-write primitive could overwrite the shadow copy along with the regular one.

※ StackShield (1998), RAD (2001), SmashGuard (2006), LLVM -fsanitize=shadow-call-stack. Every software shadow stack before CET lived at user privilege; the cost of integrity was either runtime overhead or a register reservation an attacker could subvert.

What does the CPU give you that the compiler cannot? Three things, in declining order of structural significance.

First, a page-table attribute the CPU itself enforces. Shadow-stack pages are marked SHSTK in the page table. A regular mov to those pages faults, no matter how clever the attacker's write primitive is.

The privileged-write surface is exactly two instructions, WRSS and WRUSS, and both are CPL-0-only. Compatibility for existing C and C++ unwind paths -- SEH on Windows, setjmp/longjmp in the C runtime, C++ exception unwinding -- routes through these privileged instructions, called by the kernel on behalf of user-mode code that needs to legitimately rewind the shadow stack. The shadow stack is, structurally, a piece of CPU state that user code cannot mutate at all.

A longjmp is a long jump: control transfers across multiple stack frames in a single operation. setjmp saves a jmp_buf containing the stack pointer, the instruction pointer, and the callee-saved registers, and longjmp restores them. On a CET-equipped system, the regular stack pointer is restored normally but the shadow stack pointer must also rewind by the same number of frames. SEH unwinding poses the same problem: when a structured exception handler dispatches, the runtime walks the SEH chain and unwinds the stack one frame at a time. Both paths require legitimately popping multiple shadow-stack entries in a single sequence. Intel solved this with INCSSP for the trivial unwind case (advance the shadow-stack pointer by a count of frames) and with WRUSS for the harder case where the kernel needs to write specific values back onto a user shadow stack. The engineering work to make every existing unwind path CET-compatible occupied compiler teams and C-runtime maintainers for the better part of two years between 2018 and 2020 ^[39] ^[38].
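The trivial unwind case can be sketched as follows. This JavaScript model treats the shadow-stack depth as a stand-in for the shadow-stack pointer and lets a hypothetical `toyLongjmp` discard the frames pushed since `toySetjmp`; the class and function names are invented for illustration, and real runtimes work with addresses and the INCSSP instruction rather than array lengths.

```javascript
// Toy shadow stack whose only unwind primitive discards a count of entries,
// mirroring the role INCSSP plays for longjmp-style unwinding.
class ToyShadowStack {
  constructor() { this.entries = []; }
  push(returnAddress) { this.entries.push(returnAddress); }
  get ssp() { return this.entries.length; }   // depth stands in for the SSP
  incssp(count) {
    if (count < 0 || count > this.entries.length) {
      throw new RangeError('invalid unwind count: ' + count);
    }
    this.entries.length -= count;             // drop the newest `count` entries
  }
}

// setjmp role: remember how deep the shadow stack was.
function toySetjmp(shadow) {
  return { savedDepth: shadow.ssp };
}

// longjmp role: discard every entry pushed after the matching setjmp.
function toyLongjmp(shadow, jmpBuf) {
  const framesToDiscard = shadow.ssp - jmpBuf.savedDepth;
  shadow.incssp(framesToDiscard);
  return framesToDiscard;
}

const shadow = new ToyShadowStack();
shadow.push(0x1000);
const buf = toySetjmp(shadow);
shadow.push(0x2000); shadow.push(0x3000); shadow.push(0x4000);
console.log('frames discarded:', toyLongjmp(shadow, buf));
```

Nothing is written to the shadow stack here, only discarded, which is why this case does not need the privileged write path.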

Second, a single CPU-visible event at the moment of mismatch. The compare-and-branch sequence that software shadow stacks emit takes multiple instructions, each of which can be raced by a concurrent attacker thread that wins the window between the compare and the trap. The CET ret instruction performs the compare and raises #CP atomically; there is no user-visible instruction between the comparison and the fault. The CPU enforces the invariant; user code cannot race it.

Third, performance. Intel and Microsoft both characterise shadow-stack overhead as single-digit percent on typical workloads ^[43], with Microsoft's Understanding Hardware-enforced Stack Protection announcement describing the cost as negligible ^[1]. WOOT 2024 measures below 2% on production workloads and 3% to 8% on micro-benchmarks ^[22]. Software shadow stacks, by contrast, typically pay 5% to 10% on call-heavy workloads plus a memory cost the hardware version does not.

Diagram source

flowchart TD
A[User-mode mov to SHSTK page] -->|page-table SHSTK bit| B[Faults]
C[Compiler-emitted call/ret] -->|hardware push/pop| D[Shadow stack pointer updated]
E[longjmp] --> F[INCSSP advances SSP]
E --> G[Kernel may invoke WRUSS]
H[SEH unwind] --> G
G --> I[Shadow stack legitimately rewound]
J[Kernel] -->|CPL 0 only| K[WRSS writes shadow stack]

The legitimate write paths to a shadow stack. Regular writes fault. WRSS (CPL 0) and WRUSS (CPL 0 with user-class access) are the only mutation paths. SEH unwinding and longjmp route through these instructions via kernel-mediated helpers.

The atomicity argument is the structural one. The performance is the marketing one. The page-table attribute is the security one. Together they explain why hardware backward-edge protection is a generational step on Windows rather than an incremental improvement on the shadow-stack lineage.

Shadow stack is the first time Windows has had a backward-edge story. Every prior Windows mitigation -- /GS, DEP, ASLR, SafeSEH, CFG, XFG -- treated ret either as something to guard a single frame around (the /GS cookie) or as something to ignore. The forward-edge story is still in software. The asymmetry matters.

So what is the state in 2026?

6. CFI on Windows in 2026

A snapshot of every CFI surface currently shipping. On a freshly-installed Windows 11 24H2 box, the operational picture stitches together cleanly into four layers.

6.1 User-mode Hardware-enforced Stack Protection

User-mode HSP is default-on for /CETCOMPAT-marked binaries on CET-capable hardware, announced by Microsoft in March 2020 ^[1]. Compatibility mode is the default; strict mode is opt-in via EnableUserShadowStackStrictMode ^[42]. The minimum supported client is Windows 10 version 2004 (build 19041), which means every supported consumer Windows release of the last six years has the API surface. The SetContextIpValidation bit is the load-bearing addition; it closes the SetThreadContext route around CET by validating that any instruction-pointer write through SetThreadContext targets a CET-instrumented landing.

6.2 Kernel-mode Hardware-enforced Stack Protection

Kernel-mode HSP is off by default on Windows 11 24H2 and Windows Server 2025. The Microsoft Learn page states the prerequisite list verbatim: "Windows 11 2022 update or newer; 11th Gen Intel Core Mobile processors and AMD Zen 3 Core (and newer); Virtualization-based security (VBS) and Hypervisor-enforced code integrity (HVCI) are enabled" ^[44]. Activation is via Windows Security under Device Security and Core Isolation, or via Group Policy.

The HVCI prerequisite is non-negotiable: kernel-mode HSP relies on the hypervisor to enforce the write-protected page-table bit on shadow-stack pages, because the same NT kernel an attacker would compromise is the one that would otherwise own those mappings.

Diagram source

flowchart TD
A[11th Gen Intel Core Mobile or AMD Zen 3 or newer] --> B[Windows 11 2022 update or newer]
B --> C[Virtualization-based Security enabled]
C --> D[Hypervisor-enforced Code Integrity enabled]
D --> E[User opt-in via Windows Security or Group Policy]
E --> F[Kernel-mode HSP active]
A -.->|missing| G[Silent no-op]
C -.->|missing| G
E -.->|missing| G

The kernel-mode HSP prerequisite chain: CPU support, Windows version, VBS, HVCI, and an explicit user opt-in. Any missing link silently disables the mitigation.
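The chain reads naturally as a conjunction of checks. The sketch below models it in JavaScript; the `machine` object and its property names are hypothetical inputs invented for this example, not values returned by any Windows API.

```javascript
// Evaluate the kernel-mode HSP prerequisite chain for a described machine.
// Every property on `machine` is a hypothetical boolean supplied by the caller.
function kernelHspStatus(machine) {
  const prerequisites = [
    ['CET-capable CPU (11th Gen Intel Core Mobile, AMD Zen 3, or newer)', machine.cetCapableCpu],
    ['Windows 11 2022 update or newer', machine.windows11_2022UpdateOrNewer],
    ['VBS enabled', machine.vbsEnabled],
    ['HVCI enabled', machine.hvciEnabled],
    ['Explicit opt-in via Windows Security or Group Policy', machine.optedIn],
  ];
  // Collect the labels of every prerequisite that is not met.
  const missing = prerequisites
    .filter(([, met]) => !met)
    .map(([label]) => label);
  return { active: missing.length === 0, missing };
}

console.log(kernelHspStatus({
  cetCapableCpu: true,
  windows11_2022UpdateOrNewer: true,
  vbsEnabled: true,
  hvciEnabled: false,
  optedIn: true,
}));
```

Returning the list of missing links, rather than a bare boolean, is a deliberate choice here: the mitigation itself fails silently, so a checker is most useful when it says which link broke.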

Synacktiv's SSTIC 2025 paper, Analyzing the Windows kernel shadow stack mitigation by Remi Jullian and Alexandre Aulnette of Synacktiv's reverse-engineering team, is the canonical practitioner reference for the kernel-mode implementation ^[45]. The paper walks the hypervisor calls, the KscpCfgDispatchUserCallTargetEs* functions named in McGarr's BHUSA 2025 deck, and the bypass surfaces a researcher should look at first.

6.3 Pointer Authentication on Windows-on-ARM

Windows on ARM ships ARMv8.3-A Pointer Authentication. The mechanism is different in detail from CET but parallel in role: a small cryptographic MAC over a 64-bit pointer, computed and stripped by dedicated instructions. McGarr's 2023 Windows ARM64 Internals: Deconstructing Pointer Authentication writeup is the practitioner reference ^[46]. The exact quote from the post nails the scope: "Windows currently only uses PAC for 'instruction pointers' ... and it also it only uses 'key B' for cryptographic signatures and, therefore, loads the target pointer signing value into the APIBKeyLo_EL1 and APIBKeyHi_EL1 AArch64 system registers."

PAC (Pointer Authentication Code)

An ARMv8.3-A feature in which 64-bit pointers carry a small cryptographic MAC in unused upper bits, generated and verified by dedicated PACI*, AUTI*, and XPAC* instructions. Compiler-emitted code on Windows-on-ARM uses PACIBSP to sign the return address on function entry, AUTIBSP to verify it on exit, and XPACLRI to strip the MAC for debug-print paths. Windows uses key B (APIBKeyLo_EL1/APIBKeyHi_EL1) for instruction-pointer signing; the kernel-managed key is derived by OslPrepareTarget via SymCryptRngAesGenerate at boot ^[46].
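The sign, verify, and strip roles can be illustrated with a toy model. Everything below is an assumption made for the sketch: a 48-bit virtual address with the MAC in the upper 16 bits, a non-cryptographic mixing function standing in for the real PAC cipher, and a thrown error standing in for an authentication failure (real hardware signals failure differently depending on the CPU).

```javascript
// Toy pointer authentication using BigInt. NOT cryptographic.
const VA_BITS = 48n;
const VA_MASK = (1n << VA_BITS) - 1n;
const MAC_MASK = 0xffffn;
const U64 = 0xffffffffffffffffn;

// Stand-in for the PAC cipher: mixes pointer, modifier (e.g. SP), and key.
function toyMac(pointer, modifier, key) {
  let x = (pointer ^ (modifier * 0x9e3779b97f4a7c15n) ^ key) & U64;
  x = (x * 0xbf58476d1ce4e5b9n) & U64;
  x ^= x >> 31n;
  return x & MAC_MASK;
}

// PACIBSP role: place the MAC in the unused upper bits.
function sign(pointer, modifier, key) {
  const p = pointer & VA_MASK;
  return (toyMac(p, modifier, key) << VA_BITS) | p;
}

// AUTIBSP role: recompute the MAC and compare before using the pointer.
function authenticate(signedPointer, modifier, key) {
  const p = signedPointer & VA_MASK;
  const mac = signedPointer >> VA_BITS;
  if (mac !== toyMac(p, modifier, key)) {
    throw new Error('pointer authentication failed');
  }
  return p;
}

// XPACLRI role: drop the MAC without checking it.
function strip(signedPointer) {
  return signedPointer & VA_MASK;
}

const key = 0x0123456789abcdefn;       // hypothetical key B value
const stackPointer = 0x7ffd0000n;      // hypothetical modifier
const signedReturn = sign(0x7ff612341000n, stackPointer, key);
console.log(authenticate(signedReturn, stackPointer, key) === 0x7ff612341000n);
```

An attacker who overwrites the stored return address without knowing the key has to guess the MAC bits, which is where the non-singleton equivalence class discussed later comes from.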

The LOADER_PARAMETER_EXTENSION.PointerAuthKernelIpEnabled bit controls activation; PointerAuthKernelIpKey holds the kernel-managed key. The instruction triple PACIBSP / AUTIBSP / XPACLRI is sprinkled at function entry, exit, and debug-print paths respectively. WOOT 2024 measured 92% PA file coverage on Windows 11 ARM64 Insider Preview developer build 23419 ^[22]. The structural answer to backward-edge integrity on ARM is therefore PAC, not a shadow stack -- and Windows-on-ARM gets that protection by default on Snapdragon X Elite and X Plus machines.

6.4 Coverage in production

The WOOT 2024 measurements summarise the operational picture cleanly. CFG coverage on Windows 11 Insider Preview developer build 23440 is 97.37% of x64 PE files, 99.09% on System32; XFG coverage is 85.7% on PE files, 97.0% on System32; PA coverage on the Windows 11 ARM64 Insider Preview developer build 23419 is 92% ^[22]. CET shadow-stack adoption tracks the /CETCOMPAT linker flag's penetration across the OS surface; on the system DLLs in 24H2 it is at or near total. Translation: on a modern Windows 11 system, control-flow protection is almost-everywhere in the OS, and opt-in on user applications.

Almost everything in Windows itself is protected. The third-party-app and JIT-runtime surfaces are not. And the question of what to do about COOP, now that XFG is deprecated, is genuinely open.

7. How Other Platforms Solve the Same Problem

Step outside Windows for a moment. What does Linux do? What does Apple do? What does Android do?

Linux's answer is kCFI. The -fsanitize=cfi-icall flag, originally an LLVM jump-table forward-edge CFI, shipped in Linux 5.13 in June 2021. The replacement design, -fsanitize=kcfi, shipped in Linux 6.1 in December 2022 ^[30]. The mechanism is a 32-bit prototype hash placed before each function entry, padded with INT3 instructions to keep the hash bytes from becoming a useful gadget.

Jonathan Corbet's LWN writeup describes the design: "When code is compiled with -fsanitize=kcfi, the entry point to each function is preceded by a 32-bit value representing the prototype of that function. This value is (part of) a hash calculated from the C++ mangled name for the function and its arguments." kCFI occupies the same design point XFG aimed for: a prototype hash stored before each function entry. Unlike XFG, it shipped, was documented, and remains supported.
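A rough sketch of the check, in JavaScript. The hash here is FNV-1a over a prototype string, chosen only because it is short to write; it is a stand-in, not the hash kCFI actually computes, and `defineFunction` and `checkedIndirectCall` are hypothetical helpers that model "tag stored before the entry point" and "compare at the call site".

```javascript
// 32-bit FNV-1a, used here purely as a placeholder prototype hash.
function fnv1a32(text) {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h >>> 0;
}

// Model a compiled function: its type tag travels with its entry point.
function defineFunction(prototype, body) {
  return { typeHash: fnv1a32(prototype), body };
}

// Model an instrumented indirect call site: the caller knows the prototype
// it expects and refuses any target whose tag differs.
function checkedIndirectCall(target, expectedPrototype, ...args) {
  if (target.typeHash !== fnv1a32(expectedPrototype)) {
    throw new Error('kCFI trap: prototype hash mismatch');
  }
  return target.body(...args);
}

const add = defineFunction('int(int,int)', (a, b) => a + b);
console.log(checkedIndirectCall(add, 'int(int,int)', 2, 3));
```

Two functions with the same prototype string receive the same tag, so either is an acceptable target at a matching call site; that shared tag is the equivalence class the precision discussion refers to.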

※ Sami Tolvanen of Google's Android kernel team is the patch-series author for Linux kCFI. His earlier -fsanitize=cfi-icall work in LLVM landed first.

Apple's answer is PAC, deployed by default on every Apple Silicon Mac (since the M1 in November 2020) and on every iOS device since the A12 in 2018 ^[47]. The hardened runtime plus the com.apple.security.cs.allow-jit entitlement is the declarative JIT story, because PAC interacts badly with code generation that wants to sign and verify its own pointers; Apple's solution was to require an explicit entitlement for any process that wants JIT capability and to enforce a separate W^X policy on JIT memory ^[48].

Android's answer is ARMv8.5-A Memory Tagging Extension on Pixel 8 and later ^[49]. MTE is adjacent to CFI rather than within its design space: a tagged-allocator scheme that catches use-after-free and out-of-bounds memory accesses at hardware speed, before they corrupt a control-flow target in the first place. MTE complements PAC; it does not replace it.

| Platform | Forward edge | Backward edge | Memory safety adjuncts |
| --- | --- | --- | --- |
| Windows 11 x86-64 | CFG (default); XFG (Insider, deprecated) | CET Shadow Stack (default-on user mode) | -- |
| Windows 11 ARM64 | -- (no forward-edge CFI documented; PAC is backward) | ARMv8.3 PAC, key B | -- |
| Linux mainline | `-fsanitize=cfi-icall` (LTO jump tables) / kCFI hash | LLVM software shadow-call-stack; CET on x86-64 | `-fcf-protection=full` (CET); MTE on ARM |
| macOS / iOS | -- | ARMv8.3 PAC | Hardened runtime; W^X JIT |
| Android (Pixel 8+) | LLVM CFI | ARMv8.3 PAC | ARMv8.5 MTE (tagged allocator) |
| CHERI / CHERIoT | Capability-bound pointers (all edges) | Capability-bound return addresses | 128-bit hardware capabilities |

The capability-hardware future is CHERI -- Capability Hardware Enhanced RISC Instructions -- and its embedded sibling CHERIoT. The structural shift CHERI makes is to encode 128-bit hardware capabilities into the pointer itself: every pointer carries provenance, bounds, and permissions, all enforced by the CPU. A capability cannot be forged, narrowed beyond its grant, or reused after revocation. Pointer integrity is enforced at the silicon, not at the call site ^[50]. Microsoft Research's Project Snowflake explores the same design space ^[51].

Three platforms, three answers. None is a complete answer. To understand why, we have to look at the bug class no CFI variant can close.

8. What CFI Cannot Close

Hong Hu and his coauthors at the National University of Singapore published Data-Oriented Programming: On the Expressiveness of Non-Control Data Attacks at IEEE Symposium on Security and Privacy in May 2016 ^[52]. The paper's abstract is the load-bearing observation: "In this paper we show that such attacks are Turing-complete. We present a systematic technique called data-oriented programming (DOP) to construct expressive non-control data exploits for arbitrary x86 programs. ... 8 out of 9 real-world programs have gadgets to simulate arbitrary computations and 2 of them are confirmed to be able to build Turing-complete attacks. All the attacks work in the presence of ASLR and DEP."

The structural point is what makes DOP devastating to the CFI design space. A DOP attack never violates the static control-flow graph. The attacker chains short non-control-data corruptions -- writes to variables, flags, configuration values, never to a code pointer -- and computes inside the program's legitimate control flow.

The CFI bitmap, the prototype hash, the shadow stack, the IBT tracker, the PAC MAC: none of them are designed to detect data writes. They are designed to detect control-flow transfers to illegal targets. A DOP exploit never goes to an illegal target. It stays on the legitimate path and rearranges what the program computes along the way.
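A toy illustration of a single non-control-data write, the building block that DOP chains together. The memory layout, the `runRequest` function, and the allow-list standing in for a CFI check are all invented for this sketch; it shows one corrupted flag, not a full DOP exploit.

```javascript
// The only indirect-call target in this toy program.
function handleRequest(isAdmin) {
  return isAdmin ? 'admin view' : 'user view';
}

function runRequest(input) {
  // Flat "memory": an 8-byte name buffer followed directly by an isAdmin flag.
  const memory = new Uint8Array(16);
  const NAME_OFFSET = 0;
  const IS_ADMIN_OFFSET = 8;

  // Bug: the copy is bounded by the input length, not by the 8-byte buffer.
  for (let i = 0; i < input.length && NAME_OFFSET + i < memory.length; i++) {
    memory[NAME_OFFSET + i] = input[i];
  }

  // Stand-in for a CFI check: the target must be on the allow-list.
  const validTargets = new Set([handleRequest]);
  const target = handleRequest;          // the code pointer is never corrupted
  if (!validTargets.has(target)) throw new Error('CFI violation');

  // The call is legitimate; only the data it consumes has changed.
  return target(memory[IS_ADMIN_OFFSET] !== 0);
}

console.log(runRequest([65, 66, 67]));            // short name, flag untouched
console.log(runRequest(new Array(9).fill(1)));    // ninth byte lands on the flag
```

Both calls pass the target check, because the target never changes. The second one still returns the privileged result.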

Data-Oriented Programming (DOP)

A code-reuse attack class identified by Hu, Shinde, Sendroiu, Chua, Saxena and Liang at IEEE S&P 2016. DOP chains short data-flow-stitching gadgets to compute arbitrary functions using only legitimate, in-CFG control flow. The exploits never violate the static control-flow graph. DOP is structurally invisible to every CFI variant -- CFG, XFG, IBT, SHSTK, PAC -- because none of these mechanisms validate data writes; they only validate the targets of indirect transfers ^[52].

Diagram source

flowchart TD
A[Memory-safety bug] --> B[Control-flow hijacking]
A --> C[Data-only attack]
B --> D[ROP - closed by SHSTK]
B --> E[JOP - closed by CFG/IBT/XFG]
B --> F[COOP - closed by XFG and PAC]
C --> G[DOP - not closed by any CFI]
C --> H[Use-after-free - not closed by CFI]
C --> I[Arbitrary-write primitive - not closed by CFI]

What CFI closes and what it does not. CFI mitigations validate the targets of indirect transfers, so they close ROP, JOP and COOP. They do not validate data writes, so DOP, use-after-free, and arbitrary-write primitives survive untouched.

Even within the forward-edge attacks CFI does try to close, the precision is limited. The Burow, Carr, Nash, Larsen, Franz, Brunthaler and Payer survey at ACM Computing Surveys 2017 is the canonical reference on the precision dimension ^[53].

CFG admits every address-taken function as a target -- thousands, on any non-trivial DLL. XFG narrows the equivalence class to the functions sharing a prototype hash. WOOT 2024 measured the geometric mean of XFG equivalence-class sizes on Windows 11 Insider Preview at 1.37: a typical XFG-protected call site admits roughly one or two prototype-matching candidates ^[22].
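For readers unfamiliar with the metric, this is how a geometric mean over equivalence-class sizes is computed. The input array below is made-up sample data, not the WOOT 2024 dataset, and `geometricMean` is a helper written for this example.

```javascript
// Geometric mean of per-call-site equivalence-class sizes, computed in log
// space so that large products do not overflow.
function geometricMean(classSizes) {
  if (classSizes.length === 0) throw new RangeError('no call sites supplied');
  const logSum = classSizes.reduce((sum, size) => sum + Math.log(size), 0);
  return Math.exp(logSum / classSizes.length);
}

// Hypothetical sample: most call sites admit one target, a few admit more.
const sampleClassSizes = [1, 1, 1, 2, 4];
console.log(geometricMean(sampleClassSizes).toFixed(2));
```

The geometric mean damps the influence of a handful of very large classes, which is why a figure near 1 can coexist with individual call sites that admit many targets.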

PAC's equivalence class is the count of functions whose signed-with-key-B return addresses collide on the same MAC -- much smaller in practice, but still non-singleton. None of these mitigations achieve the single-target precision a fully type-aware fine-grained CFI would offer.

JIT and dynamic code constitute their own carve-out. Any platform with runtime code generation must mark JIT-emitted code as valid CFI territory through some API -- on Windows, SetProcessValidCallTargets is the surface, plus the PAGE_TARGETS_INVALID page-protection flag for memory that has not yet been marked. The Theori MS16-119 Chakra JIT bypass remains the canonical demonstration that JIT carve-outs are a structural CFI weakness, not an implementation bug ^[23].
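The carve-out is easier to see with a toy bitmap. The sketch below uses one bit per 16 bytes and accepts only 16-byte-aligned targets, which simplifies the real encoding; `ToyCfgBitmap`, `markValid`, and `guardedIndirectCall` are hypothetical names, with `markValid` playing the role that SetProcessValidCallTargets plays for a JIT.

```javascript
// Toy CFG bitmap: one bit per 16-byte slot of a small address space.
class ToyCfgBitmap {
  constructor(addressSpaceBytes) {
    this.bits = new Uint8Array(Math.ceil(addressSpaceBytes / 16 / 8));
  }
  // Role of SetProcessValidCallTargets: declare an address a legal target.
  markValid(address) {
    const slot = address >>> 4;
    this.bits[slot >>> 3] |= 1 << (slot & 7);
  }
  isValid(address) {
    if ((address & 0xf) !== 0) return false;   // toy rule: aligned targets only
    const slot = address >>> 4;
    return (this.bits[slot >>> 3] & (1 << (slot & 7))) !== 0;
  }
}

// Role of the CFG check thunk: consult the bitmap before transferring control.
function guardedIndirectCall(bitmap, address) {
  if (!bitmap.isValid(address)) {
    throw new Error('fast-fail: invalid indirect-call target');
  }
  return address;
}

const bitmap = new ToyCfgBitmap(0x20000);
const jitEntry = 0x10040;                       // hypothetical JIT-emitted function
console.log(bitmap.isValid(jitEntry));          // freshly allocated: not a target
bitmap.markValid(jitEntry);                     // the runtime registers it
console.log(bitmap.isValid(jitEntry));
```

Whoever can call the marking API, or corrupt its arguments, decides what counts as a valid target; that is the structural weakness the JIT bypasses exploit.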

And then there is the structural ceiling. Matt Miller's BlueHat IL 2019 talk Trends, challenges, and shifts in software vulnerability mitigation contains the empirical floor: roughly 70% of CVEs Microsoft issued each year between 2006 and 2018 were memory-safety bugs, and the share has been stable across a window that includes the rollout of /GS, SafeSEH, DEP, ASLR, CFG, ACG, and CIG ^[54].

The Becker et al. WOOT 2024 §1 statement corroborates from the academic side: "Memory safety vulnerabilities make up two thirds of security issues in large code bases across the industry" ^[22]. Note the careful framing: this is the bug class statistic, not the exploit class. CFI closes a subclass of memory-corruption exploitation. The bigger box is still open.

CFI closes the control-flow-hijacking subclass of memory-corruption exploitation. The 70% memory-safety statistic is the structural ceiling. The exits from that ceiling are not within the CFI design space. They are memory-safe languages (Rust closing the bug class at compile time) and capability hardware (CHERI and CHERIoT closing pointer integrity at the silicon). CFI is one layer in a multi-layer story.

The real answers, then, are not new CFI variants. They are memory-safe languages -- Rust adoption in the Windows kernel, in the .NET runtime, in the WinRT projection -- and capability hardware. Neither is a substitute for the CFI layer that exists today, but neither is a CFI primitive either. They live at a different floor of the stack.

So where is the research moving?

9. Open Problems

The 2026-2030 research surface on Windows CFI has at least five named unknowns.

The first is kernel CFG and kernel CET bypasses. McGarr's Black Hat USA 2025 deck Out of Control names the area explicitly: kernel-mode CFG and kernel-mode CET surfaces have active bypass research, including PTE-manipulation attacks against the kCFG bitmap when HVCI is disabled, and the nt!KscpCfgDispatchUserCallTargetEs[No]Smep dispatch function on the kernel side ^[19].

The Synacktiv SSTIC 2025 paper is the canonical reverse-engineering reference for the kernel-mode HSP implementation, and it walks the bypass surface a researcher would attack first ^[45].

The second is the XFG deprecation story. What fills the COOP-shaped forward-edge gap on shipping Windows x86-64 now that XFG is deprioritised? The candidates are IBT (free if Windows turned it on, but coarse: every ENDBR64 is a legal target), an academic refinement like FineIBT (not deployed), or an unnamed type-aware MSVC successor that Microsoft has not publicly committed to. The honest answer is: nothing has XFG's fine-grained shape on Windows x86-64 in 2026. The COOP-shaped attack surface is open.

The third is Memory Tagging Extension on Windows-on-ARM. No Snapdragon X Elite or X Plus stepping currently sold supports ARMv8.5-A MTE in hardware, and Windows has no documented MTE-tagged allocator. The Pixel 8 line shipped MTE on Android in 2023 ^[55] ^[56]; Apple Silicon shipped a different MTE-adjacent tagging scheme ^[47]; Windows is the third major platform on ARM and has the smallest MTE story. Whether Windows-on-ARM gets MTE in the next Snapdragon generation, and whether Microsoft ships a tagged Windows kernel allocator if it does, is open future work.

The fourth is CFI for managed runtimes. The .NET and WebAssembly host code-generation paths are the same carve-out Theori demonstrated in 2016 against Chakra. The .NET runtime in particular runs through RyuJIT to emit native code that must be marked CFG-valid through SetProcessValidCallTargets ^[24]. Whether Microsoft ships a finer-grained CFI for managed-runtime-emitted code -- one that bounds the equivalence class to "methods of this type" rather than "any address-taken function in the process" -- is not a public roadmap item.

The fifth is forward-edge precision after XFG. The Burow et al. CSUR 2017 survey's analytical framing is the one to keep in mind: precision is the size of the equivalence class admitted at each call site. CFG admits thousands. XFG admits roughly one to two on the WOOT 2024 measurement. The fine-grained ideal is one. Microsoft has not publicly committed to a successor type-aware forward-edge CFI for Windows x86-64.

Knowing what is open is half the practitioner's job. Knowing how to verify what is currently shipping is the other half.

10. Verifying CFI on Any Windows Binary

A reproducible workflow the reader can run on their own machine right now.

Compile with CFI. The MSVC command line for the full stack is cl /guard:cf main.cpp /link /DYNAMICBASE /HIGHENTROPYVA /CETCOMPAT. Order matters: switches before /link go to the compiler, switches after /link go to the linker, and /CETCOMPAT is a linker-only option ^[41]. Both /guard:cf and /DYNAMICBASE are required for CFG; /guard:cf alone is a silent no-op ^[18].

/guard:xfg adds XFG instrumentation on MSVC since Visual Studio 2019 Preview 16.5 ^[29]. /CETCOMPAT marks the binary as shadow-stack-compatible, which determines whether shadow-stack faults in that module are fatal in compatibility mode. /HIGHENTROPYVA extends ASLR's randomisation range and is required for the 128 TB user VA that CFG's bitmap reservation depends on ^[57].

Inspect a binary on disk. dumpbin /loadconfig binary.exe reports CF Instrumented, FID table present, Long jump target table, and XFG functions present. dumpbin /headers binary.exe prints a CET Compatible line, reflecting the IMAGE_DLLCHARACTERISTICS_EX_CET_COMPAT bit, if the binary was linked with /CETCOMPAT. link /DUMP /HEADERS is the linker-side equivalent and produces the same information. Both tools ship in any Visual Studio install.

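Inspect a running process. Get-ProcessMitigation -Name notepad.exe in PowerShell reports the CFG, ASLR, DEP, shadow-stack, ACG and CIG settings for that executable ^[2]; add -RunningProcesses, or pass -Id with a process ID, to read the state of live instances rather than the configured policy. Set-ProcessMitigation configures the policy for a given process name, and the change takes effect the next time that executable starts, not in processes already running. Get-ProcessMitigation -System reports system-wide defaults. The cmdlet is implemented atop the GetProcessMitigationPolicy API.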

Reproducing the verification workflow (JavaScript)

// Mirrors, on sample data, what dumpbin and Get-ProcessMitigation report
// for a single binary. This is a simulation: it scans strings and prints a
// summary. It does not call GetProcessMitigationPolicy or any Win32 API.

function inspectBinary(name, dumpbinHeaders, dumpbinLoadConfig) {
  // Match the "CET Compatible" line case-insensitively.
  const cetCompat = /CET compatible/i.test(dumpbinHeaders);
  const cfInstrumented = dumpbinLoadConfig.includes('CF Instrumented');
  const xfgPresent = dumpbinLoadConfig.includes('XFG functions present');

  console.log('--- ' + name + ' ---');
  console.log('  CFG       :', cfInstrumented ? 'INSTRUMENTED' : 'absent');
  console.log('  XFG       :', xfgPresent ? 'INSTRUMENTED' : 'absent');
  console.log('  CETCOMPAT :', cetCompat ? 'YES' : 'NO');
}

function inspectProcess(name, mitigationPolicy) {
  console.log('Process: ' + name);
  console.log('  CFG.Enable                 :', mitigationPolicy.CFG.Enable);
  console.log('  UserShadowStack.Enable     :', mitigationPolicy.USS.Enable);
  console.log('  UserShadowStack.StrictMode :', mitigationPolicy.USS.StrictMode);
  console.log('  ASLR.BottomUp              :', mitigationPolicy.ASLR.BottomUp);
  console.log('  DEP.Enable                 :', mitigationPolicy.DEP.Enable);
}

// Sample strings standing in for tool output, not captured from a real run.
inspectBinary('msedge.exe',
  'CET Compatible',
  'CF Instrumented, FID table present, XFG functions present');

// Sample policy values standing in for Get-ProcessMitigation output.
inspectProcess('msedge.exe', {
  CFG:  { Enable: 'ON' },
  USS:  { Enable: 'ON', StrictMode: 'ON' },
  ASLR: { BottomUp: 'ON' },
  DEP:  { Enable: 'ON' }
});

Programmatic policy installation. The two API surfaces are SetProcessMitigationPolicy, which sets the policy of the current process at runtime, and UpdateProcThreadAttribute(PROC_THREAD_ATTRIBUTE_MITIGATION_POLICY), which sets the policy of a child process at CreateProcess time. The latter is the only race-free entry point for hardened child processes -- it is impossible for child code to execute before the policy is installed.

Turn on kernel-mode HSP. Windows Security -> Device Security -> Core Isolation -> "Kernel-mode Hardware-enforced Stack Protection." HVCI is the prerequisite; if it is off, the toggle is not available. Group Policy exposes the same setting at Computer Configuration / Administrative Templates / System / Device Guard / Turn On Virtualization Based Security / Kernel-mode Hardware-enforced Stack Protection.

Inspecting your own machine: a verification one-liner

Open PowerShell as administrator and run:

Get-ProcessMitigation -Id $PID |
  Select-Object CFG, ASLR, DEP, UserShadowStack

The output is the policy of the current PowerShell session. To check a binary on disk:

dumpbin /headers C:\Windows\System32\notepad.exe | findstr /C:"CET"
dumpbin /loadconfig C:\Windows\System32\notepad.exe | findstr /C:"FID"

The first command prints a CET Compatible line if the binary was linked with /CETCOMPAT. The second prints the FID-table presence line if the binary was built with CFG (/guard:cf).

Now the reader can answer the question §1 raised: why does the same OS apply different contracts to different processes? Because each process opts in, and the opt-in surface has ten bits.

11. Frequently Asked Questions

Practitioner questions about Windows CFI

Is CFG, XFG, or CET the answer to memory-corruption exploitation?

No. None of them closes the data-only attack class. Hu and colleagues showed at IEEE S&P 2016 that Data-Oriented Programming is Turing-complete and never violates the static control-flow graph, which means every CFI variant is structurally blind to it ^[52]. CFI closes only the control-flow-hijacking subclass of memory-corruption exploitation. The roughly 70% of Microsoft CVEs that Matt Miller's BlueHat IL 2019 talk attributes to memory-safety bugs marks the structural ceiling: CFI constrains how those bugs can be exploited, but it does not remove them ^[54].

Why was XFG deprecated?

The public-facing reason is documented in Connor McGarr's Black Hat USA 2025 retrospective: "XFG was never fully instrumented (UM/KM) and is now deprecated" ^[19]. The most defensible reading is that hardware CET on the backward edge, territory that software CFG and XFG never touched, became the strategic priority once Tiger Lake silicon arrived in September 2020. WOOT 2024 measured XFG at 85.7% of x64 PE files on Insider Preview, short of the universal coverage CFG achieves ^[22].

Is kernel-mode HSP on by default in Windows 11 24H2?

No. The Microsoft Learn page states the default verbatim: "Kernel-mode Hardware-enforced Stack Protection is off by default, but customers can turn it on if the prerequisites are met" ^[44]. The prerequisites are an 11th-gen Intel Core Mobile CPU or AMD Zen 3 or newer, Windows 11 2022 update or newer, VBS enabled, HVCI enabled, and an explicit user opt-in via Windows Security or Group Policy.
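The prerequisite list reads naturally as a checklist. The sketch below encodes it as data and reports what is missing; the field names (cetCapableCpu, vbsEnabled, and so on) and the helper missingKernelHspPrereqs are hypothetical, and a real check would query the OS rather than take a hand-filled object.

```javascript
// Toy checklist for the kernel-mode HSP prerequisites listed above.
// Keys are hypothetical field names; labels paraphrase the article's list.
const KERNEL_HSP_PREREQS = [
  ['cetCapableCpu', 'CET-capable CPU (11th-gen Intel Core Mobile or AMD Zen 3, or newer)'],
  ['win11_2022UpdateOrNewer', 'Windows 11 2022 update or newer'],
  ['vbsEnabled', 'VBS enabled'],
  ['hvciEnabled', 'HVCI enabled'],
  ['userOptIn', 'explicit opt-in via Windows Security or Group Policy'],
];

// Returns the labels of every prerequisite that is not explicitly true.
function missingKernelHspPrereqs(machine) {
  return KERNEL_HSP_PREREQS
    .filter(([key]) => machine[key] !== true)
    .map(([, label]) => label);
}

console.log(missingKernelHspPrereqs({
  cetCapableCpu: true,
  win11_2022UpdateOrNewer: true,
  vbsEnabled: true,
  hvciEnabled: false,
  userOptIn: false,
}));
```

An empty result means every listed prerequisite is met; anything else explains why the toggle is unavailable or the feature is still off.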

Does Windows-on-ARM have MTE?

No, not as of May 2026. The Snapdragon X Elite and X Plus steppings shipping in 2026 Windows-on-ARM machines do not support ARMv8.5-A Memory Tagging Extension in hardware, and Windows has no documented MTE-tagged allocator. Pointer Authentication is shipped (92% PA file coverage on Insider Preview build 23419 per WOOT 2024) but MTE is not ^[22] ^[46].

Does AMD ship CET?

Yes. AMD Zen 3 ships a compatible shadow-stack implementation, first in Ryzen 5000 "Vermeer" desktop parts on November 5, 2020 and then in Epyc 7003 "Milan" server parts in March 2021 ^[34]. Microsoft's Kernel-mode HSP documentation explicitly names "AMD Zen 3 Core (and newer)" as a CET prerequisite ^[44]. The instruction encodings follow the Intel CET specification, so OS code paths are shared.

What is the difference between CFG and /GS?

Different invariants, different timing. /GS is a stack-cookie check in the function epilogue: a random value is placed between local buffers and the saved return address, and the runtime check fires before ret if the cookie has been overwritten. CFG is an indirect-call target check at the call site, before control transfers: every indirect call site invokes a thunk that consults a bitmap to verify the target address. /GS detects contiguous stack-buffer overflows; CFG constrains the target of an attacker-controlled function-pointer write. They are complementary, not substitutes.
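The two checks can be contrasted with a toy model. This is a sketch under loud simplifications, not the NT implementation: the bitmap uses one bit per 16-byte slot and rejects unaligned targets outright, the "stack frame" is a six-element array, and every name (ToyCfgBitmap, callIndirect, runWithStackCookie) is hypothetical.

```javascript
// Toy model only: one bit per 16-byte slot; real CFG uses a richer encoding.
class ToyCfgBitmap {
  constructor(addressSpaceBytes) {
    this.bits = new Uint8Array(Math.ceil(addressSpaceBytes / 16 / 8));
  }
  markValid(target) {
    const slot = target >>> 4;
    this.bits[slot >>> 3] |= 1 << (slot & 7);
  }
  isValid(target) {
    if (target & 0xf) return false; // simplification: aligned entries only
    const slot = target >>> 4;
    return (this.bits[slot >>> 3] & (1 << (slot & 7))) !== 0;
  }
}

// CFG-style check: runs at the call site, before control transfers.
function callIndirect(bitmap, functions, target, arg) {
  if (!bitmap.isValid(target)) {
    throw new Error('CFG: invalid indirect-call target 0x' + target.toString(16));
  }
  return functions.get(target)(arg);
}

// /GS-style check: runs in the epilogue, just before the return.
function runWithStackCookie(cookie, writes) {
  // Frame layout: 4 buffer slots, then the cookie, then the return address.
  const frame = [0, 0, 0, 0, cookie, 'return-to-caller'];
  writes.forEach((value, i) => { frame[i] = value; }); // unchecked copy
  if (frame[4] !== cookie) throw new Error('/GS: stack cookie overwritten');
  return frame[5];
}

const bitmap = new ToyCfgBitmap(0x10000);
const functions = new Map([[0x1000, x => x + 1]]);
bitmap.markValid(0x1000);
console.log(callIndirect(bitmap, functions, 0x1000, 41)); // 42
console.log(runWithStackCookie(0xC00C1E, [1, 2, 3, 4])); // return-to-caller
```

A write that runs past the fourth buffer slot trips the cookie check, while a corrupted function pointer trips the bitmap check; neither model notices the other's failure, which is the sense in which the two are complementary.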

What is the difference between HVCI and kernel-mode HSP?

HVCI is W^X for kernel pages. The hypervisor enforces that kernel memory marked executable is not writable from any source, including the NT kernel itself, by managing the second-level address translation tables that the kernel cannot touch. Kernel-mode HSP is the CET-based ROP mitigation for ring 0: a CPU-managed shadow stack of kernel return addresses, with a #CP fault on mismatch. HVCI is a prerequisite for kernel-mode HSP because the shadow-stack pages need to be write-protected by the hypervisor; the NT kernel cannot guarantee its own non-mutability after a code-execution compromise ^[44].
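The shadow-stack half of that answer can be sketched as a toy model. It assumes nothing about the real CPU or kernel data structures: the class ToyShadowStackCpu is hypothetical, addresses are plain numbers, and write protection of the shadow copy is modelled only by convention (the "attacker" in the example touches the ordinary stack alone).

```javascript
// Toy model of a CPU-managed shadow stack; not the real CET mechanism.
class ToyShadowStackCpu {
  constructor() {
    this.stack = [];  // ordinary stack: attacker-writable in this model
    this.shadow = []; // shadow stack: assumed write-protected
  }
  // call pushes the return address onto both stacks.
  call(returnAddress) {
    this.stack.push(returnAddress);
    this.shadow.push(returnAddress);
  }
  // ret pops both and faults if they disagree.
  ret() {
    const fromStack = this.stack.pop();
    const fromShadow = this.shadow.pop();
    if (fromStack !== fromShadow) {
      throw new Error('#CP: return address mismatch');
    }
    return fromStack;
  }
}

const cpu = new ToyShadowStackCpu();
cpu.call(0x401000);
console.log(cpu.ret() === 0x401000); // true

cpu.call(0x401000);
cpu.stack[0] = 0xdeadbeef; // simulated overwrite of the saved return address
try {
  cpu.ret();
} catch (e) {
  console.log(e.message); // #CP: return address mismatch
}
```

The model also shows why the protection of the shadow copy matters: if the attacker could write this.shadow as well, the comparison would pass, which is the gap the hypervisor closes in the kernel-mode case.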

Will Rust replace CFI?

No. Rust prevents memory-safety bugs at compile time. CFI constrains exploitation at runtime for the bugs that did make it past the compiler. Both layers ship in parallel. Microsoft is introducing Rust into selected low-level components (the Mu UEFI firmware project ^[58], segments of the GDI subsystem), but CFI remains the runtime layer for everything in the C and C++ surface. The two are complementary; one does not replace the other.

The story this article tells closes around a structural admission. CFI is one layer of a defence stack. The 1996-to-2016 attack-class genealogy (stack smash, return-into-libc, ROP, JOP, COOP, DOP) produced a matching defence genealogy on Windows: /GS, DEP, ASLR, CFG, XFG, CET shadow stack. Each generation closes the gap the previous attacker class opened. Each leaves open exactly the territory the next attacker class will occupy.

DOP and the 70% memory-safety statistic are the territory no CFI generation has touched. That territory is the one Rust closes at compile time, and CHERI and CHERIoT close at the silicon. The future of memory-corruption defence on Windows is not a fourth generation of CFI. It is the combination of memory-safe languages in the kernel and capability hardware underneath the language.

CFI is necessary and not sufficient. Now you know which bit is which.

Study guide

Key terms

CFG: Control Flow Guard. Shipped Windows 8.1 Update 3 (November 2014). Per-process bitmap of valid indirect-call targets, indexed by target address.
XFG: eXtended Flow Guard. Announced BlueHat Shanghai 2019. 64-bit prototype-hash refinement of CFG; never fully instrumented in shipping Windows; deprecated per McGarr BHUSA 2025.
CET: Intel Control-flow Enforcement Technology. Hardware feature shipped in Tiger Lake (September 2, 2020). Two components: SHSTK (Shadow Stack) and IBT (Indirect Branch Tracking).
SHSTK: Shadow Stack. CPU-managed parallel stack of return addresses, write-protected by a CET page-table bit. Mismatch on ret raises #CP.
IBT: Indirect Branch Tracking. Forward-edge half of CET. Indirect-branch targets must begin with ENDBR64; mismatch raises #CP. Windows does not enable IBT as of 2026.
FID table: Function ID table. Per-binary PE structure inside IMAGE_LOAD_CONFIG_DIRECTORY listing every address-taken function. Loader merges per-module tables into a process-wide CFG bitmap.
COOP: Counterfeit Object-Oriented Programming. Schuster et al. IEEE S&P 2015. Chains C++ virtual calls dispatched through legitimate vtables, every target a valid CFG bit. The attack that motivated XFG.
DOP: Data-Oriented Programming. Hu et al. IEEE S&P 2016. Turing-complete attack via non-control data corruption. Invisible to every CFI variant because it never violates the control-flow graph.
PAC: Pointer Authentication Code. ARMv8.3-A feature. Cryptographic MAC over a 64-bit pointer in unused upper bits. Windows-on-ARM uses key B for instruction-pointer signing on return addresses.
HVCI: Hypervisor-enforced Code Integrity. W^X for kernel pages enforced by the hypervisor via second-level address translation. Prerequisite for kernel-mode HSP.

References

[1] Baiju V Patel (2020). Understanding Hardware-enforced Stack Protection. https://techcommunity.microsoft.com/blog/windowsosplatform/understanding-hardware-enforced-stack-protection/1247815
[2] Control Flow Guard for platform security. https://learn.microsoft.com/en-us/windows/win32/secbp/control-flow-guard
[3] Aleph One (1996). Smashing The Stack For Fun And Profit. http://phrack.org/issues/49/14.html - Phrack Magazine Volume 7, Issue 49, File 14 of 16, November 1996.
[4] Elias Levy (Wikipedia). https://en.wikipedia.org/wiki/Elias_Levy
[5] /GS (Buffer Security Check) -- MSVC build reference. https://learn.microsoft.com/en-us/cpp/build/reference/gs-buffer-security-check
[6] NX bit (Wikipedia). https://en.wikipedia.org/wiki/NX_bit
[7] Windows XP Service Pack 2 (Wikipedia). https://en.wikipedia.org/wiki/Windows_XP_SP2
[8] Solar Designer (1997). Getting around non-executable stack (and fix). https://marc.info/?l=bugtraq&m=87602746719512 - BugTraq mailing list, 1997-08-10.
[9] Martin Abadi, Mihai Budiu, Ulfar Erlingsson, & Jay Ligatti (2005). Control-Flow Integrity: Principles, Implementations, and Applications. https://www.microsoft.com/en-us/research/publication/control-flow-integrity-principles-implementations-and-applications/ - The foundational CFI paper, ACM CCS 2005. DOI 10.1145/1102120.1102165.
[10] Windows Vista (Wikipedia). https://en.wikipedia.org/wiki/Windows_Vista
[11] Michael Howard (2006). Address Space Layout Randomization in Windows Vista. https://learn.microsoft.com/en-us/archive/blogs/michael_howard/address-space-layout-randomization-in-windows-vista
[12] Address space layout randomization (Wikipedia). https://en.wikipedia.org/wiki/Address_space_layout_randomization
[13] /SAFESEH (Image has Safe Exception Handlers) -- MSVC linker reference. https://learn.microsoft.com/en-us/cpp/build/reference/safeseh-image-has-safe-exception-handlers
[14] Hovav Shacham (2007). The Geometry of Innocent Flesh on the Bone: Return-into-libc without Function Calls (on the x86). https://hovav.net/ucsd/dist/geometry.pdf - ACM CCS 2007. DOI 10.1145/1315245.1315313.
[15] CCS 2017 Test-of-Time Award. https://www.sigsac.org/ccs/CCS2017/awards.html
[16] Return-oriented programming (Wikipedia). https://en.wikipedia.org/wiki/Return-oriented_programming
[17] Control-flow integrity (Wikipedia). https://en.wikipedia.org/wiki/Control-flow_integrity
[18] /guard (Enable Control Flow Guard) -- MSVC build reference. https://learn.microsoft.com/en-us/cpp/build/reference/guard-enable-control-flow-guard
[19] Connor McGarr (2025). Out Of Control: How kCFG and kCET Redefine Control Flow Integrity in the Windows Kernel. https://github.com/connormcgarr/Presentations/blob/master/BlackHat-US-25-McGarr-Out-Of-Control-KCFG-And-KCET.pdf - Black Hat USA 2025.
[20] Alex Ionescu. Windows 8.1 Kernel Patch Protection, Address Space and Behavior Changes. https://www.alex-ionescu.com/windows-8-1-address-space-and-behavior-changes/
[21] Jack Tang (2015). Exploring Control Flow Guard in Windows 10. https://documents.trendmicro.com/assets/wp/exploring-control-flow-guard-in-windows10.pdf
[22] Lucas Becker, Matthias Hollick, & Jiska Classen (2024). SoK: On the Effectiveness of Control-Flow Integrity in Practice. https://www.usenix.org/system/files/woot24-becker.pdf - USENIX WOOT 2024.
[23] Theori Frontier Squad (2016). Chakra JIT CFG Bypass. https://theori.io/blog/chakra-jit-cfg-bypass
[24] SetProcessValidCallTargets function (memoryapi.h). https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-setprocessvalidcalltargets
[25] Felix Schuster, Thomas Tendyck, Christopher Liebchen, Lucas Davi, Ahmad-Reza Sadeghi, & Thorsten Holz (2015). Counterfeit Object-Oriented Programming: On the Difficulty of Preventing Code Reuse Attacks in C++ Applications. https://www.ieee-security.org/TC/SP2015/papers-archived/6949a745.pdf - IEEE Symposium on Security and Privacy 2015. DOI 10.1109/SP.2015.51.
[26] Tyler K. Bletsch, Xuxian Jiang, Vincent W. Freeh, & Zhenkai Liang (2011). Jump-Oriented Programming: A New Class of Code-Reuse Attack. https://www.comp.nus.edu.sg/~liangzk/papers/asiaccs11.pdf - ASIACCS 2011. DOI 10.1145/1966913.1966919.
[27] David Weston (2019). Advancing Windows Security (BlueHat Shanghai 2019). https://github.com/dwizzzle/Presentations
[28] How the MSVC compiler generates XFG function prototype hashes. https://blog.quarkslab.com/how-the-msvc-compiler-generates-xfg-function-prototype-hashes.html
[29] Connor McGarr (2020). Exploit Development: Between a Rock and a (Xtended Flow) Guard Place: Examining XFG. https://connormcgarr.github.io/examining-xfg/
[30] Jonathan Corbet (2022). kCFI in the Linux kernel. https://lwn.net/Articles/898040/
[31] Lucas Becker, Matthias Hollick, & Jiska Classen. SoK: On the Effectiveness of Control-Flow Integrity in Practice (abstract). https://www.usenix.org/conference/woot24/presentation/becker
[32] Shadow stack (Wikipedia). https://en.wikipedia.org/wiki/Shadow_stack
[33] Tiger Lake (Wikipedia). https://en.wikipedia.org/wiki/Tiger_Lake
[34] Zen 3 (Wikipedia). https://en.wikipedia.org/wiki/Zen_3
[35] ENDBR64 instruction reference. https://www.felixcloutier.com/x86/endbr64
[36] Indirect branch tracking (Wikipedia). https://en.wikipedia.org/wiki/Indirect_branch_tracking
[37] WRSSD / WRSSQ instruction reference. https://www.felixcloutier.com/x86/wrssd:wrssq
[38] WRUSSD / WRUSSQ instruction reference. https://www.felixcloutier.com/x86/wrussd:wrussq
[39] INCSSPD / INCSSPQ instruction reference. https://www.felixcloutier.com/x86/incsspd:incsspq
[40] RDSSPD / RDSSPQ instruction reference. https://www.felixcloutier.com/x86/rdsspd:rdsspq
[41] /CETCOMPAT -- MSVC linker reference. https://learn.microsoft.com/en-us/cpp/build/reference/cetcompat
[42] PROCESS_MITIGATION_USER_SHADOW_STACK_POLICY structure. https://learn.microsoft.com/en-us/windows/win32/api/winnt/ns-winnt-process_mitigation_user_shadow_stack_policy
[43] Baiju V Patel (2020). A Technical Look at Intel's Control-flow Enforcement Technology. https://software.intel.com/content/www/us/en/develop/articles/technical-look-control-flow-enforcement-technology.html - Intel primary on CET design and characterised overhead. Cross-referenced from Microsoft Learn /CETCOMPAT.
[44] Kernel-mode Hardware-enforced Stack Protection. https://learn.microsoft.com/en-us/windows-server/security/kernel-mode-hardware-stack-protection
[45] Remi Jullian & Alexandre Aulnette (2025). Analyzing the Windows kernel shadow stack mitigation. https://www.synacktiv.com/sites/default/files/2025-06/sstic_windows_kernel_shadow_stack_mitigation.pdf
[46] Connor McGarr (2023). Exploit Development: Unveiling Windows ARM64 Pointer Authentication (PAC). https://connormcgarr.github.io/windows-pac-arm64/
[47] Operating system integrity -- Apple Platform Security. https://support.apple.com/guide/security/operating-system-integrity-sec8b776536b/web
[48] Allow execution of JIT-compiled code entitlement -- Apple Developer Documentation. https://developer.apple.com/documentation/bundleresources/entitlements/com_apple_security_cs_allow-jit
[49] Arm Memory Tagging Extension (MTE) -- Android Source. https://source.android.com/docs/security/test/memory-safety/arm-mte
[50] Robert N. M. Watson, Simon W. Moore, Peter Sewell, Brooks Davis, & Peter Neumann. Capability Hardware Enhanced RISC Instructions (CHERI). https://www.cl.cam.ac.uk/research/security/ctsrd/cheri - Cambridge / SRI International CHERI project page; canonical CHERI / CHERIoT entry point.
[51] Project Snowflake: Non-blocking Safe Manual Memory Management in .NET. https://www.microsoft.com/en-us/research/publication/project-snowflake-non-blocking-safe-manual-memory-management-net/ - Microsoft Research publication page for the Snowflake non-blocking safe manual memory management project.
[52] Hong Hu, Shweta Shinde, Sendroiu Adrian, Zheng Leong Chua, Prateek Saxena, & Zhenkai Liang (2016). Data-Oriented Programming: On the Expressiveness of Non-Control Data Attacks. https://huhong789.github.io/advanced-DOP/ - IEEE S&P 2016. DOI 10.1109/SP.2016.62.
[53] Nathan Burow, Scott A. Carr, Joseph Nash, Per Larsen, Michael Franz, Stefan Brunthaler, & Mathias Payer (2017). Control-Flow Integrity: Precision, Security, and Performance. https://nebelwelt.net/publications/files/17CSUR.pdf - ACM Computing Surveys 50(1) Art. 16. DOI 10.1145/3054924.
[54] Matt Miller (2019). Trends, challenges, and shifts in software vulnerability mitigation. https://github.com/microsoft/MSRC-Security-Research/blob/master/presentations/2019_02_BlueHatIL/2019_01%20-%20BlueHatIL%20-%20Trends%2C%20challenge%2C%20and%20shifts%20in%20software%20vulnerability%20mitigation.pdf - BlueHat IL 2019. The ~70% memory-safety statistic primary.
[55] Mark Brand (2023). First handset with MTE on the market. https://googleprojectzero.blogspot.com/2023/11/first-handset-with-mte-on-market.html - Google Project Zero, November 2023. Pixel 8 first-handset MTE primary.
[56] Pixel 8 (Wikipedia). https://en.wikipedia.org/wiki/Pixel_8
[57] /HIGHENTROPYVA (Support 64-bit ASLR) -- MSVC linker reference. https://learn.microsoft.com/en-us/cpp/build/reference/highentropyva-support-64-bit-aslr
[58] Project Mu -- microsoft/mu. https://github.com/microsoft/mu - Microsoft's Project Mu modular UEFI firmware repository; includes Rust-based components.
